How Colocation Analysis works

The Colocation Analysis tool measures local patterns of spatial association between two categories of point features using the colocation quotient statistic. The tool outputs a map showing the likelihood of spatial association between the two categories analyzed, with added fields that include the colocation quotient value and p-value. You can also specify an optional output table that reports the associations from every category in the Input Features of Interest parameter to every category represented in the Input Neighboring Features parameter.

Potential applications

The following are potential applications for the Colocation Analysis tool:

  • Are certain business types likely to be colocated (such as coffee shops and retail stores)?
  • Are residential thefts more likely to occur near, or be colocated with, certain housing types?
  • Are there specific areas in your study area where failed restaurant inspections are colocated with insect infestations?

How the colocation quotient is calculated

Each feature in the Category of Interest (category A) is evaluated individually for colocation with features of the Neighboring Category (category B) found within its neighborhood. In general, if the proportion of B points within the neighborhood of A is greater than the global proportion of B, the colocation quotient will be high. If the neighborhood of A contains many other A points or many points of categories other than B, the colocation quotient between the Category of Interest (category A) and the Neighboring Category (category B) will be small.
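
As a rough worked example, assume the quotient is the ratio of the local proportion of B to the global proportion of B, with no distance weighting, and with the global proportion taken over the N - 1 points other than the target. That form follows the papers listed under Additional resources and is an assumption here; the numbers are made up for illustration. Suppose a category A point has 5 neighbors, 3 of which are category B, in a study area of 101 points that contains 20 category B points:

```latex
\mathrm{LCLQ}_{A_i \to B}
  = \frac{3/5}{20/(101 - 1)}
  = \frac{0.6}{0.2}
  = 3
```

Under these assumptions, a value of 3 means that category B is three times as prevalent around this point as it is in the study area overall.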

If two Datasets without categories are used as the Input Type, the Input Features of Interest are treated as category A and the Input Neighboring Features are treated as category B.

Caution:

The colocation relationship of this analysis is not symmetric. The colocation quotient values calculated when comparing category A to category B will generally differ from those calculated when comparing category B to category A.

Note:

If a third category (category C) is present in a neighborhood, the resulting colocation quotients will differ from those you would get with only categories A and B. Depending on the question you are asking, it may be important to extract a subset of your data that includes only categories A and B. However, extracting a subset discards information about the other categories present. Selecting and extracting a subset of your data is most appropriate in cases where you are sure that the occurrence of one category is not at all affected by the occurrence of another.

The local colocation quotient calculated from point Ai in the Category of Interest A to the Neighboring Category B is given as:

Local colocation quotient equation

Where N_B is the total number of category B points present in the study area, and N is the total number of points in the study area (including all categories present). N_{Ai→B} is the weighted average of the number of category B points in the neighborhood of each category A point (A_i). The weighting is based on a distance decay function that gives features closer to the target feature more weight in the calculations than features that are farther away. The function can be a Gaussian or Bisquare kernel, as specified in the Local Weighting Scheme parameter. To apply no weighting scheme, choose None for the Local Weighting Scheme parameter.

N_{Ai→B} represents the weighted average of the number of type B points in the neighborhood of each A_i based on either a Gaussian or Bisquare kernel function given as:

Weighted average equation

Where f_ij is a binary variable indicating whether point j is a category B point: it is equal to 1 if point j is a category B point and 0 otherwise. The kernel function equations are given as:

Kernel function equations

Note:

If the value of w_ij is negative for the Bisquare kernel, the weight assigned is 0.

Illustration of the different local weighting schemes
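
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's implementation: the function names `kernel_weight` and `local_clq` are hypothetical, the neighborhood is a fixed distance band whose radius doubles as the kernel bandwidth, the kernel forms (Gaussian as exp(-0.5 (d/b)^2), Bisquare as (1 - (d/b)^2)^2) and the N - 1 term in the global proportion follow the formulation in Wang et al. (2017), and the target point is assumed not to belong to category B.

```python
import math


def kernel_weight(d, bandwidth, scheme="gaussian"):
    """Weight of a neighbor at distance d (kernel forms are assumptions)."""
    if scheme == "none":
        return 1.0  # every neighbor counts equally
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    ratio_sq = (d / bandwidth) ** 2
    if scheme == "gaussian":
        return math.exp(-0.5 * ratio_sq)
    if scheme == "bisquare":
        # Neighbors at or beyond the bandwidth receive a weight of 0.
        return (1.0 - ratio_sq) ** 2 if ratio_sq < 1.0 else 0.0
    raise ValueError(f"unknown weighting scheme: {scheme}")


def local_clq(i, points, categories, neighbor_cat, bandwidth, scheme="gaussian"):
    """Local colocation quotient from point i to neighbor_cat.

    Assumes point i itself is not of category neighbor_cat.
    Returns None when the quotient is undefined.
    """
    n = len(points)
    n_b = sum(1 for c in categories if c == neighbor_cat)
    xi, yi = points[i]
    weighted_b = 0.0  # sum of w_ij * f_ij
    total_w = 0.0     # sum of w_ij
    for j, (xj, yj) in enumerate(points):
        if j == i:
            continue
        d = math.hypot(xj - xi, yj - yi)
        if d > bandwidth:
            continue  # outside the distance band
        w = kernel_weight(d, bandwidth, scheme)
        total_w += w
        if categories[j] == neighbor_cat:
            weighted_b += w
    if total_w == 0 or n_b == 0:
        return None  # no usable neighbors, or no B points at all
    local_share = weighted_b / total_w   # weighted share of B near point i
    global_share = n_b / (n - 1)         # share of B among all other points
    return local_share / global_share


pts = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
cats = ["A", "B", "A", "B", "C"]
print(local_clq(0, pts, cats, "B", bandwidth=2.0, scheme="none"))
```

In this toy dataset, half of the neighbors of the first point are category B and half of all other points are category B, so the local proportion matches the global proportion.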

A global colocation quotient can also be calculated to provide a measure of spatial association between all categories in your dataset. This can allow you to explore other relationships in your data, as you may find other strongly colocated categories globally. The global colocation quotient equation is given as:

Global colocation quotient equation

where N is the total number of features, N_A is the number of features of category A, and N'_B is the number of features of category B. This equation is calculated for every combination of categories in your dataset.
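
One way to read the global quotient is through the nearest-neighbor formulation in Leslie and Kronenfeld (2011), listed under Additional resources. The sketch below is a minimal illustration under that assumption and may differ from what the tool computes: the function name `global_clq` is hypothetical, the numerator counts category A points whose nearest neighbor is category B, N'_B is taken as N_B - 1 when A and B are the same category, and ties between equally near neighbors are resolved by input order rather than shared.

```python
import math


def global_clq(points, categories, cat_a, cat_b):
    """Global colocation quotient from cat_a to cat_b (nearest-neighbor form)."""
    n = len(points)
    a_idx = [i for i, c in enumerate(categories) if c == cat_a]
    n_a = len(a_idx)
    n_b = sum(1 for c in categories if c == cat_b)
    # Assumption: a point cannot be its own neighbor, so drop one B when A == B.
    n_b_adj = n_b - 1 if cat_a == cat_b else n_b
    if n < 2 or n_a == 0 or n_b_adj <= 0:
        return None
    hits = 0  # A points whose nearest neighbor is a B point
    for i in a_idx:
        xi, yi = points[i]
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi),
        )
        if categories[nearest] == cat_b:
            hits += 1
    return (hits / n_a) / (n_b_adj / (n - 1))


pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
cats = ["A", "B", "A", "C"]
for a in ("A", "B", "C"):
    for b in ("A", "B", "C"):
        print(a, "->", b, global_clq(pts, cats, a, b))
```

Looping over every ordered pair of categories, as in the usage lines, mirrors the statement that the quotient is calculated for every combination of categories.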

Permutations are used to calculate a p-value for each of the Input Features of Interest to determine whether the observed colocation quotient values are statistically significant. For each feature, the local colocation quotient is first calculated using its neighborhood. Then, for each permutation, the categories of all other points are randomly rearranged across the entire study area (keeping the location and category of the target point constant), and a new local colocation quotient is calculated for the feature using the rearranged categories in its neighborhood. The result is a reference distribution of colocation quotient values that is compared to the actual colocation quotient value of the feature to determine the probability that the observed value could be found in the random distribution from the permutations. This distribution shows the range of colocation quotient values that could reasonably be due to randomness. If the p-value is small (less than 0.05), the actual colocation quotient for the feature is statistically significant. The default for the tool is 99 permutations; however, the precision of the calculated p-value improves as the number of permutations increases.
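
The permutation procedure can be sketched as follows. This is a simplified illustration, not the tool's implementation: the function names are hypothetical, the quotient is computed without distance weighting inside a fixed radius, and the p-value is a one-sided pseudo p-value, (r + 1) / (m + 1), where r counts permuted quotients at least as extreme as the observed one and m is the number of permutations. The exact p-value formula the tool uses is not stated here, so treat that part as an assumption.

```python
import math
import random


def unweighted_lclq(i, points, categories, neighbor_cat, radius):
    """Unweighted local colocation quotient inside a fixed radius."""
    n = len(points)
    n_b = sum(1 for c in categories if c == neighbor_cat)
    xi, yi = points[i]
    neighbors = [j for j, (x, y) in enumerate(points)
                 if j != i and math.hypot(x - xi, y - yi) <= radius]
    if not neighbors or n_b == 0:
        return None
    share = sum(1 for j in neighbors if categories[j] == neighbor_cat) / len(neighbors)
    return share / (n_b / (n - 1))


def permutation_p_value(i, points, categories, neighbor_cat, radius,
                        permutations=99, seed=0):
    """Return (observed quotient, pseudo p-value) for target point i."""
    rng = random.Random(seed)
    observed = unweighted_lclq(i, points, categories, neighbor_cat, radius)
    if observed is None:
        return None, None
    others = [j for j in range(len(points)) if j != i]
    other_cats = [categories[j] for j in others]
    as_extreme = 0
    for _ in range(permutations):
        # Rearrange the categories of all other points; the target keeps its
        # own location and category, and all locations stay fixed.
        rng.shuffle(other_cats)
        shuffled = list(categories)
        for j, c in zip(others, other_cats):
            shuffled[j] = c
        simulated = unweighted_lclq(i, points, shuffled, neighbor_cat, radius)
        # One-sided comparison in the direction of the observed value.
        if observed > 1 and simulated >= observed:
            as_extreme += 1
        elif observed <= 1 and simulated <= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (permutations + 1)


# Toy data: one A point surrounded by three B points, plus 20 distant C points.
pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1)] + [(100 + k, 100) for k in range(20)]
cats = ["A", "B", "B", "B"] + ["C"] * 20
print(permutation_p_value(0, pts, cats, "B", radius=1.0))
```

With 99 permutations, the smallest p-value this scheme can report is 1 / 100 = 0.01, which is why more permutations give a more precise p-value.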

Neighborhood type

You can define the Neighborhood Type in one of three ways. The Distance band option ensures that the scale of analysis is the same across all neighborhoods in the study area. This means that denser areas will have more points considered in the analysis than sparser areas. The K nearest neighbors option is adaptive in its distance and ensures that the neighborhood of every feature contains the same number of neighbors. You can also specify a .swm file created by the Generate Spatial Weights Matrix tool to define spatial weights in other ways.
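
The difference between the first two options can be illustrated with a small sketch. The function names are hypothetical, distances are planar Euclidean, and ties in the nearest-neighbor search are resolved by input order; none of this is taken from the tool itself.

```python
import math


def distance_band_neighbors(i, points, radius):
    """Indices of all points within a fixed radius of point i."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points)
            if j != i and math.hypot(x - xi, y - yi) <= radius]


def k_nearest_neighbors(i, points, k):
    """Indices of the k points closest to point i, nearest first."""
    xi, yi = points[i]
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi))
    return others[:k]


# Three points clustered together and one isolated point.
pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(distance_band_neighbors(0, pts, 1.5))  # fixed scale, count varies
print(distance_band_neighbors(3, pts, 1.5))  # the isolated point finds nothing
print(k_nearest_neighbors(3, pts, 2))        # fixed count, scale varies
```

The isolated point has an empty neighborhood under the distance band but still receives two neighbors under K nearest neighbors, at the cost of reaching much farther.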

Using a space-time window

If your data has date and time fields, you can divide your analysis into a series of space-time windows. By specifying Time Field of Interest, Time Field of Neighboring Categories, and Temporal Relationship Type, you can control which features are included in the neighborhood analyzed. Features that are near each other in space and time will be analyzed together, because all feature relationships are assessed relative to the location and time stamp of the target feature. In the example below, a 1-kilometer Distance Band finds 6 neighbors for the feature labeled Jan 31. However, in the bottom example, a 1-kilometer Distance Band and a 1-day space-time window after the target feature finds only 2 other neighbors.

Applying a space-time window versus no space-time window

Suppose you were analyzing wildfire origins and camper locations in a region. If you ran the Colocation Analysis tool using only the Distance Band option for Neighborhood Type to define feature relationships, the result would be a map showing locations of wildfire origin points and whether they were colocated with all campers recorded in your dataset. If you then ran the analysis again, defining a space-time window with the parameters above, you would ensure that camper locations recorded a year ago have no effect on your analysis of wildfire origins that occurred this year. Understanding this temporal characteristic of wildfires and campers can have important implications for how you allocate fire resources.
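
The space-time filtering described in this section can be sketched as below. This is an illustration under stated assumptions: the function name `space_time_neighbors` and the relationship labels "after", "before", and "span" are hypothetical stand-ins for the Temporal Relationship Type choices, and both the distance band and the time window are treated as inclusive.

```python
import math
from datetime import datetime, timedelta


def space_time_neighbors(target, candidates, radius, window, relationship="after"):
    """Indices of candidates near the target in both space and time.

    target and candidates are (x, y, timestamp) tuples.
    """
    tx, ty, tt = target
    selected = []
    for idx, (x, y, t) in enumerate(candidates):
        if math.hypot(x - tx, y - ty) > radius:
            continue  # outside the distance band
        delta = t - tt
        if relationship == "after":
            in_window = timedelta(0) <= delta <= window
        elif relationship == "before":
            in_window = -window <= delta <= timedelta(0)
        elif relationship == "span":
            in_window = abs(delta) <= window
        else:
            raise ValueError(f"unknown relationship: {relationship}")
        if in_window:
            selected.append(idx)
    return selected


target = (0.0, 0.0, datetime(2024, 1, 31, 12, 0))
candidates = [
    (0.5, 0.0, datetime(2024, 1, 31, 18, 0)),  # near, 6 hours later
    (0.5, 0.0, datetime(2024, 1, 29, 12, 0)),  # near, 2 days earlier
    (0.2, 0.0, datetime(2024, 2, 1, 10, 0)),   # near, 22 hours later
    (5.0, 0.0, datetime(2024, 1, 31, 13, 0)),  # too far away
    (0.1, 0.1, datetime(2024, 1, 30, 20, 0)),  # near, 16 hours earlier
]
one_day = timedelta(days=1)
print(space_time_neighbors(target, candidates, 1.0, one_day, "after"))
```

Only candidates that pass both the distance test and the time test relative to the target feature are kept, so the same distance band can yield far fewer neighbors once a time window is applied.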

Interpreting results

When the Colocation Analysis tool is run, it adds six fields to the resulting Output Features. The Local Colocation Quotient field contains the resulting quotient score for each of the Input Features of Interest, and the p-value is also reported. The local colocation quotients are binned (LCLQ Bin), labeled (LCLQ Type), and displayed on the map according to each feature's LCLQ Type. Features of the Category of Interest (category A) that have a local colocation quotient greater than one are more likely to have features of the Neighboring Category (category B) within their neighborhood. Features that have colocation quotients less than one are less likely to have category B within their neighborhood. If a feature has a colocation quotient equal to one, the proportion of categories within its neighborhood is a good representation of the proportion of categories throughout the entire study area.

LCLQ Bin | LCLQ Type | Description
0 | Colocated - Significant | Local colocation quotient is greater than 1 with a p-value less than 0.05.
1 | Colocated - Not Significant | Local colocation quotient is greater than 1 with a p-value greater than 0.05.
2 | Isolated - Significant | Local colocation quotient is equal to or less than 1 with a p-value less than 0.05.
3 | Isolated - Not Significant | Local colocation quotient is equal to or less than 1 with a p-value greater than 0.05.
4 | Undefined | The feature has no other features within its neighborhood, or the bandwidth is equal to 0.
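
The binning rules can be expressed as a small helper. The function name `lclq_bin` is hypothetical, and a p-value of exactly 0.05 is treated as not significant here, which is an assumption because the rules above do not cover that boundary.

```python
def lclq_bin(lclq, p_value, alpha=0.05):
    """Map a local colocation quotient and p-value to (LCLQ Bin, LCLQ Type)."""
    if lclq is None or p_value is None:
        return 4, "Undefined"  # no neighbors, or bandwidth equal to 0
    significant = p_value < alpha  # assumption: exactly alpha is not significant
    if lclq > 1:
        if significant:
            return 0, "Colocated - Significant"
        return 1, "Colocated - Not Significant"
    if significant:
        return 2, "Isolated - Significant"
    return 3, "Isolated - Not Significant"


print(lclq_bin(1.8, 0.01))
print(lclq_bin(0.4, 0.20))
```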

For each feature's neighborhood, the Neighboring Categories field lists all categories found within the neighborhood specified. The Neighbor Prevalence field captures how often that combination of neighboring categories appears in the neighborhoods of the features of interest, expressed as a proportion. For instance, if category B appears as a neighboring category, the Neighbor Prevalence of B is equal to the number of features for which B appeared as a neighboring category divided by the total number of Input Features of Interest. This can be helpful for exploring how commonly this combination (or a subset of the combination) of categories appears in your study area. The following table shows that category A appears in 100 percent of neighborhoods, while the combination of A and C appears in 20 percent of the neighborhoods:

Neighborhood category combinations | Neighbor Prevalence
A | 1
A | 1
A, B | 0.4
A, B | 0.4
A, C | 0.2
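
The values in this table can be reproduced with a short sketch. It assumes, based on the table rather than on a stated formula, that a combination counts as appearing in a neighborhood whenever that neighborhood contains at least all of the combination's categories; the function name `neighbor_prevalence` is hypothetical.

```python
def neighbor_prevalence(neighborhoods):
    """Prevalence of each feature's neighboring-category combination.

    neighborhoods holds one collection of category labels per feature of
    interest. A combination is counted for every neighborhood that contains
    all of its categories (assumed subset rule).
    """
    sets = [frozenset(n) for n in neighborhoods]
    total = len(sets)
    return [sum(1 for other in sets if combo <= other) / total for combo in sets]


neighborhoods = [{"A"}, {"A"}, {"A", "B"}, {"A", "B"}, {"A", "C"}]
print(neighbor_prevalence(neighborhoods))
```

Category A is present in all five neighborhoods, the pair A and B in two of them, and the pair A and C in one, which gives the proportions listed in the table.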

A scatter plot that displays the relationship between the local colocation quotients and the calculated p-values is also created. You can access it below the Output Features in the Contents pane.

LCLQ scatter plot

Additional resources

  • Timothy F. Leslie & Barry J. Kronenfeld (2011). "The Colocation Quotient: A New Measure of Spatial Association Between Categorical Subsets of Points." Geographical Analysis 43 (3), 306-326. doi: 10.1111/j.1538-4632.2011.00821.x
  • Fahui Wang, Yujie Hu, Shuai Wang & Xiaojuan Li (2017). "Local Indicator of Colocation Quotient with a Statistical Significance Test: Examining Spatial Association of Crime and Facilities." The Professional Geographer 69 (1), 22-31. doi: 10.1080/00330124.2016.1157498