Colocation Analysis (Spatial Statistics)

Summary

Measures local patterns of spatial association, or colocation, between two categories of point features using the colocation quotient statistic.

Learn more about how Colocation Analysis works

Illustration

Colocation Analysis diagram

Usage

  • This tool only accepts point features. The categories to analyze can be stored in a field of a single dataset or in fields of two separate datasets. Alternatively, two separate datasets can each be treated as a single category. For example, a point dataset with many types of restaurants can be treated as the single category RESTAURANTS, and another point dataset with many types of crimes can be treated as the single category CRIMES.

  • The tool will determine, for each feature of the Category of Interest, whether features of the Neighboring Category are more or less present in its neighborhood than the overall spatial distribution of the categories would suggest. For example, for each feature of category A, a local colocation quotient (LCLQ) value of 1 means that the feature is as likely to have a category B feature as a neighbor as would be expected under a random distribution. An LCLQ value greater than 1 means that the feature is more likely than random to have a category B neighbor, and an LCLQ value less than 1 means that it is less likely than random to have a category B neighbor.

    Note:

    The colocation relationship of this analysis is not symmetric. The colocation quotient values calculated when comparing category A to category B will be different than the colocation quotient values calculated when comparing category B to category A.

    Also, if you have category C in your neighborhood, the resulting colocation quotients will be different than if you only had categories A and B. Depending on the question you are asking, it may be important to create a subset of your data to include only categories A and B. However, when creating a subset, you are losing information about the other categories present. Selecting and creating a subset of your data is important in cases in which you are sure that the occurrence of one category is not at all affected by the occurrence of another.

  • Spatial relationships can be defined using a Distance band, K nearest neighbors, or a spatial weights matrix file through the Neighborhood Type parameter.

  • You can analyze your data using space-time windows by specifying the Time Field of Interest, Time Field of Neighboring Categories, and Temporal Relationship Type parameters. Space-time windows control which features are included in the neighborhood analyzed. Features that are near each other in space and time will be analyzed together because all feature relationships are assessed relative to the location and time stamp of the target feature. You can specify whether the tool searches for features before or after the target feature, or you can define a time span so that the tool searches for features both before and after the target feature being analyzed.

  • The Number of Permutations parameter is used to calculate p-values. Choosing the number of permutations is a balance between precision and increased processing time. While the default is 99 permutations, it is recommended that you increase the number of permutations for your final analysis results.

  • A global colocation quotient can also be calculated by specifying a path for the Output Table for Global Relationships parameter. This table contains colocation quotients that measure the spatial association between all categories in your dataset, so you can explore other relationships in your data and identify other categories that are strongly colocated globally. If you find such categories, you can extend your analysis in two ways. You can run the tool again with those categories as the categories of interest to explore the local nature of the relationship, or you can run the tool again with those categories removed if you think they are introducing unnecessary bias in your results.

  • The output of this tool is a map displaying each of the Input Features of Interest symbolized by whether they were significantly colocated with or isolated from the Input Neighboring Features. The tool adds fields to the Output Features including the Local Colocation Quotient calculated, p-value, LCLQ Bin used for symbolization, and the LCLQ Type. An optional Output Table for Global Relationships can be specified that will report the global colocation quotients between all the categories in the Field of Interest parameter and all the categories present in the Field Containing Neighboring Category parameter.

  • This tool supports parallel processing and uses 50 percent of available processors by default. The number of processors can be increased or decreased using the Parallel Processing Factor environment.
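
The interpretation of LCLQ values described above can be illustrated with a minimal sketch. It assumes the unweighted form of the quotient found in the colocation quotient literature: the share of category B among a feature's k nearest neighbors, divided by the share of category B among all other features. The function name, the input format, the sample coordinates, and the brute-force neighbor search are illustrative assumptions and not the tool's implementation, which also applies kernel weights and permutation testing.

```python
import math


def local_colocation_quotient(points, index, neighbor_category, k):
    """Unweighted LCLQ sketch for the feature at position `index`.

    points: list of (x, y, category) tuples (hypothetical input format).
    """
    x0, y0, _ = points[index]
    # Rank all other features by Euclidean distance to the target feature.
    others = [p for i, p in enumerate(points) if i != index]
    others.sort(key=lambda p: math.hypot(p[0] - x0, p[1] - y0))
    neighbors = others[:k]

    # Share of the neighboring category among the k nearest neighbors.
    local_share = sum(1 for p in neighbors if p[2] == neighbor_category) / k
    # Share of the neighboring category among all other features.
    global_share = sum(1 for p in others if p[2] == neighbor_category) / len(others)
    if global_share == 0:
        raise ValueError("The neighboring category has no features.")
    return local_share / global_share


# Made-up sample data: one A surrounded by Bs, one A surrounded by other As.
points = [
    (0, 0, "A"), (1, 0, "B"), (0, 1, "B"),
    (10, 10, "A"), (11, 10, "A"), (10, 11, "A"),
    (5, 5, "B"),
]

print(local_colocation_quotient(points, 0, "B", k=2))  # 2.0: B is more present than expected
print(local_colocation_quotient(points, 3, "B", k=2))  # 0.0: B is absent from the neighborhood
```

In this sample, half of all other features are category B, so a feature whose two nearest neighbors are both B scores 2, and a feature with no B among its nearest neighbors scores 0.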

Syntax

arcpy.stats.ColocationAnalysis(input_type, in_features_of_interest, output_features, {field_of_interest}, {time_field_of_interest}, {category_of_interest}, {input_feature_for_comparison}, {field_for_comparison}, {time_field_for_comparison}, {category_for_comparison}, neighborhood_type, {number_of_neighbors}, {distance_band}, {weights_matrix_file}, {temporal_relationship_type}, {time_step_interval}, {number_of_permutations}, {local_weighting_scheme}, {output_table})
Parameter | Explanation | Data Type
input_type

Specifies whether the in_features_of_interest parameter values will come from the same dataset with specified categories, different datasets with specified categories, or different datasets that will be treated as their own category (for example, one dataset with all points representing cheetahs and a second dataset in which all points represent gazelles).

  • SINGLE_DATASET: The categories to be analyzed exist in a field in a single dataset.
  • TWO_DATASETS: The categories to be analyzed exist in fields of separate datasets.
  • DATASETS_WITHOUT_CATEGORIES: Two separate datasets representing two categories will be analyzed.
String
in_features_of_interest

The feature class containing points with representative categories.

Feature Layer
output_features

The output feature class containing all the in_features_of_interest parameter values with fields representing the local colocation quotient scores and p-values.

Feature Class
field_of_interest
(Optional)

The field containing the category or categories to be analyzed.

Field
time_field_of_interest
(Optional)

A date field with an optional time stamp for each feature to analyze points using a space-time window. Features near each other in space and time will be considered neighbors and will be analyzed together.

Field
category_of_interest
(Optional)

The base category for the analysis. The tool will identify, for each feature of the category_of_interest value, the degree to which it is colocated with or isolated from the category_for_comparison parameter value.

String
input_feature_for_comparison
(Optional)

The input feature class containing the points with the categories that will be compared.

Feature Layer
field_for_comparison
(Optional)

The field from the input_feature_for_comparison parameter containing the category to be compared.

Field
time_field_for_comparison
(Optional)

A date field with a time stamp for each feature to analyze your points using a space-time window. Features near each other in space and time will be considered neighbors and will be analyzed together.

Field
category_for_comparison
(Optional)

The neighboring category for the analysis. The tool will identify the degree to which the category_of_interest parameter value is attracted to or isolated from the category_for_comparison value.

String
neighborhood_type

Specifies how the spatial relationships among features will be defined.

  • DISTANCE_BAND: Each feature will be analyzed within the context of neighboring features. Neighboring features inside the critical distance specified by the distance_band parameter receive a weight of one and exert influence on computations for the target feature. Neighboring features outside the critical distance receive a weight of zero and have no influence on a target feature's computations.
  • K_NEAREST_NEIGHBORS: The closest k features will be included in the analysis as neighbors. The number of neighbors is specified by the number_of_neighbors parameter. Each neighbor's influence in the analysis is weighted based on the distance to the farthest neighbor. This is the default.
  • GET_SPATIAL_WEIGHTS_FROM_FILE: When SINGLE_DATASET is used as the input_type, spatial relationships will be defined by a specified spatial weights matrix file. Each neighbor's influence in the analysis is weighted based on the distance to the farthest neighbor. The path to the spatial weights file is specified by the weights_matrix_file parameter.
String
number_of_neighbors
(Optional)

The number of neighbors around each feature that will be used to test for local relationships between categories. If no value is provided, the default of 8 is used. The provided value must be large enough to detect the relationships between features but small enough to still identify local patterns.

Long
distance_band
(Optional)

The neighborhood size is a constant or fixed distance for each feature. All features within this distance will be used to test for local relationships between categories. If no value is provided, the distance used will be the average distance at which each feature has at least eight neighbors.

Linear Unit
weights_matrix_file
(Optional)

The path to a file containing weights that define spatial, and potentially temporal, relationships among features.

File
temporal_relationship_type
(Optional)

Specifies how temporal relationships among features will be defined.

  • BEFORE: The time window will extend back in time for each of the in_features_of_interest values. Neighboring features must have a date/time stamp that occurs before the date/time stamp of the feature of interest to be included in the analysis. This is the default.
  • AFTER: The time window will extend forward in time for each of the in_features_of_interest values. Neighboring features must have a date/time stamp that occurs after the date/time stamp of the feature of interest to be included in the analysis.
  • SPAN: The time window will extend both back and forward in time for each of the in_features_of_interest values. Neighboring features that have a date/time stamp that occurs within the time_step_interval value before or after the date/time stamp of the feature of interest will be included in the analysis. For example, if the time_step_interval parameter is set to 1 week, the window will look 1 week before and 1 week after the target feature.
String
time_step_interval
(Optional)

An integer and unit of measurement representing the number of time units composing the time window.

Time Unit
number_of_permutations
(Optional)

The number of permutations that will be used to create a reference distribution. Choosing the number of permutations is a balance between precision and processing time: more permutations give more robust and precise results but take longer to calculate.

  • 99: The analysis will use 99 permutations. With 99 permutations, the smallest possible pseudo p-value is 0.02 and all other pseudo p-values will be multiples of this value. This is the default.
  • 199: The analysis will use 199 permutations. With 199 permutations, the smallest possible pseudo p-value is 0.01 and all other pseudo p-values will be multiples of this value.
  • 499: The analysis will use 499 permutations. With 499 permutations, the smallest possible pseudo p-value is 0.004 and all other pseudo p-values will be multiples of this value.
  • 999: The analysis will use 999 permutations. With 999 permutations, the smallest possible pseudo p-value is 0.002 and all other pseudo p-values will be multiples of this value.
  • 9999: The analysis will use 9,999 permutations. With 9,999 permutations, the smallest possible pseudo p-value is 0.0002 and all other pseudo p-values will be multiples of this value.
Long
local_weighting_scheme
(Optional)

Specifies the kernel type that will be used to provide the spatial weighting. The kernel defines how each feature is related to other features within its neighborhood.

  • BISQUARE: Features will be weighted based on the distance to the farthest neighbor or the edge of the distance band, and a weight of 0 will be assigned to any feature outside the neighborhood specified.
  • GAUSSIAN: Features will be weighted based on the distance to the farthest neighbor or the edge of the distance band, but the weights will drop off more quickly than with the Bisquare option. A weight of 0 will be assigned to any feature outside the neighborhood specified. This is the default.
  • NONE: No weighting scheme will be applied, and all features within the neighborhood will be given a weight of 1 and contribute equally. All features outside the neighborhood will be given a weight of 0.
String
output_table
(Optional)

A table that includes the global colocation quotients between all the categories in the field_of_interest parameter and all the categories in the field_for_comparison parameter. This table can help you determine which categories to analyze locally.

If DATASETS_WITHOUT_CATEGORIES is used as the input_type parameter value, global colocation quotients will be calculated within each dataset and between the two datasets.

Table
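
The smallest pseudo p-values listed for the number_of_permutations parameter are all consistent with 2 / (permutations + 1). The sketch below reproduces those values. The formula is inferred from the listed numbers rather than taken from the tool's implementation, and the function name is hypothetical.

```python
def smallest_pseudo_p_value(permutations):
    """Smallest pseudo p-value for a permutation count (assumed form).

    Inferred from the values documented for 99, 199, 499, 999, and 9999
    permutations; not taken from the tool's source.
    """
    return 2 / (permutations + 1)


# Print the floor for each permutation count the parameter accepts.
for n in (99, 199, 499, 999, 9999):
    print(n, smallest_pseudo_p_value(n))
```

This makes the trade-off concrete: a result can only be reported as significant at a given level if the permutation count allows a pseudo p-value that small.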

Code sample

ColocationAnalysis example 1 (Python window)

The following Python window scripts demonstrate how to use the ColocationAnalysis function.

import arcpy
arcpy.env.workspace = r"C:\Analysis"

# Two categories from the same categorical field.
# Find the colocation of elementary schools and middle schools
arcpy.stats.ColocationAnalysis("SINGLE_DATASET", r"Colocation.gdb\Schools",
                               r"Outputs.gdb\School_Colocation", "Facility_Type", None,
                               "Elementary", None, None, None, "Middle", "K_NEAREST_NEIGHBORS",
                               8, None, None, "BEFORE", None, 99, "BISQUARE",
                               r"Outputs.gdb\Global_School_Colocation")

# Categories from different datasets without categories
# Find the colocation of schools and hospitals
arcpy.stats.ColocationAnalysis("DATASETS_WITHOUT_CATEGORIES", r"Colocation.gdb\Schools",
                               r"Outputs.gdb\Schools_Hospitals", None, None, '',
                               r"Colocation.gdb\Hospitals", None, None, '', "DISTANCE_BAND",
                               None, "30 Kilometers", None, "BEFORE", None, 199, "GAUSSIAN",
                               None)

# Categories from two datasets
# Find the colocation of elementary schools and hospitals
arcpy.stats.ColocationAnalysis("TWO_DATASETS", r"Colocation.gdb\Schools",
                               r"Outputs.gdb\Elementary_Hospitals", "Facility_Type", None,
                               "Elementary", r"Colocation.gdb\Hospitals", None, None, '',
                               "K_NEAREST_NEIGHBORS", 15, None, None, "BEFORE", None, 499,
                               "NONE", None)

ColocationAnalysis example 2 (stand-alone script)

The following stand-alone Python script demonstrates how to use the ColocationAnalysis function.

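# Analyze the spatial relationships (colocation) between school and hospital locations

import arcpy

# Workspace that the relative paths below resolve against
# (assumed to match example 1)
arcpy.env.workspace = r"C:\Analysis"

# Two categories from the same categorical field.
# Find the colocation of elementary schools and middle schools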

intype = "SINGLE_DATASET"
infc_interest = r"Colocation.gdb\Schools"
outfc = r"Outputs.gdb\School_Colocation"
field_interest = "Facility_Type"
time_field = ""
cat_interest = "Elementary"
infc_neigh = ""
field_neigh = ""
time_field_neigh = ""
cat_neigh = "Middle"
neighborhood_type = "K_NEAREST_NEIGHBORS"
num_neighbors = 8
dist_band = ""
swm_file = ""
temporal_type = ""
time_step_interval = ""
num_permutation = 99
weighting_scheme = "BISQUARE"
out_global_tbl = r"Outputs.gdb\Global_School_Colocation"

arcpy.stats.ColocationAnalysis(intype, infc_interest, outfc, field_interest,
                               time_field, cat_interest, infc_neigh, field_neigh,
                               time_field_neigh, cat_neigh, neighborhood_type,
                               num_neighbors, dist_band, swm_file, temporal_type,
                               time_step_interval, num_permutation, weighting_scheme,
                               out_global_tbl)

# Categories from different datasets without categories
# Find the colocation of schools and hospitals

intype = "DATASETS_WITHOUT_CATEGORIES"
infc_interest = r"Colocation.gdb\Schools"
outfc = r"Outputs.gdb\Schools_Hospitals"
field_interest = ""
time_field = ""
cat_interest = ""
infc_neigh = r"Colocation.gdb\Hospitals"
field_neigh = ""
time_field_neigh = ""
cat_neigh = ""
neighborhood_type = "DISTANCE_BAND"
num_neighbors = ""
dist_band = "30 Kilometers"
swm_file = ""
temporal_type = ""
time_step_interval = ""
num_permutation = 199
weighting_scheme = "GAUSSIAN"
out_global_tbl = ""

arcpy.stats.ColocationAnalysis(intype, infc_interest, outfc, field_interest,
                               time_field, cat_interest, infc_neigh, field_neigh,
                               time_field_neigh, cat_neigh, neighborhood_type,
                               num_neighbors, dist_band, swm_file, temporal_type,
                               time_step_interval, num_permutation, weighting_scheme,
                               out_global_tbl)

# Categories from two datasets
# Find the colocation of elementary schools and hospitals

intype = "TWO_DATASETS"
infc_interest = r"Colocation.gdb\Schools"
outfc = r"Outputs.gdb\Elementary_Hospitals"
field_interest = "Facility_Type"
time_field = ""
cat_interest = "Elementary"
infc_neigh = r"Colocation.gdb\Hospitals"
field_neigh = ""
time_field_neigh = ""
cat_neigh = ""
neighborhood_type = "K_NEAREST_NEIGHBORS"
num_neighbors = 15
dist_band = ""
swm_file = ""
temporal_type = ""
time_step_interval = ""
num_permutation = 499
weighting_scheme = "NONE"
out_global_tbl = ""

arcpy.stats.ColocationAnalysis(intype, infc_interest, outfc, field_interest,
                               time_field, cat_interest, infc_neigh, field_neigh,
                               time_field_neigh, cat_neigh, neighborhood_type,
                               num_neighbors, dist_band, swm_file, temporal_type,
                               time_step_interval, num_permutation, weighting_scheme,
                               out_global_tbl)
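
Because the function takes 19 positional parameters, a misplaced argument is easy to introduce. One alternative, sketched below, is to collect the settings as keyword arguments named after the parameters in the Syntax section and drop the ones left empty. The helper name colocation_kwargs is hypothetical, and the sketch assumes that ColocationAnalysis accepts keyword arguments in the way arcpy geoprocessing functions generally do. The call itself is left commented out so that the snippet runs without an ArcGIS installation.

```python
def colocation_kwargs(**settings):
    """Return only the settings that were actually specified (hypothetical helper)."""
    return {name: value for name, value in settings.items()
            if value not in (None, "")}


# Same settings as the first analysis above, expressed by parameter name.
params = colocation_kwargs(
    input_type="SINGLE_DATASET",
    in_features_of_interest=r"Colocation.gdb\Schools",
    output_features=r"Outputs.gdb\School_Colocation",
    field_of_interest="Facility_Type",
    time_field_of_interest="",  # empty, so it is dropped
    category_of_interest="Elementary",
    category_for_comparison="Middle",
    neighborhood_type="K_NEAREST_NEIGHBORS",
    number_of_neighbors=8,
    number_of_permutations=99,
    local_weighting_scheme="BISQUARE",
    output_table=r"Outputs.gdb\Global_School_Colocation",
)

# With ArcGIS Pro available, the call would then be:
# import arcpy
# arcpy.stats.ColocationAnalysis(**params)
```

Parameters that are omitted fall back to the tool defaults, so only the settings that differ between runs need to be spelled out.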

Licensing information

  • Basic: Yes
  • Standard: Yes
  • Advanced: Yes

Related topics