How Cluster and Outlier Analysis (Anselin Local Moran's I) works

Given a set of features (Input Feature Class parameter value) and an analysis field (Input Field parameter value), the Cluster and Outlier Analysis (Anselin Local Moran's I) tool identifies spatial clusters of features with high or low values. The tool also identifies spatial outliers. To do this, the tool calculates a local Moran's I value, a z-score, a pseudo p-value, and a code representing the cluster type for each statistically significant feature. The z-scores and pseudo p-values represent the statistical significance of the computed index values.

Calculations

Local Moran's I mathematics

View additional mathematics for the local Moran's I statistic.
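The formulas themselves are not reproduced in the text here. As a rough illustration, the following Python sketch computes one common formulation of the local Moran's I index: the deviation of each feature from the global mean, scaled by the variance of all other features, multiplied by the weighted sum of its neighbors' deviations. The function name local_morans_i and the dense weights matrix are assumptions made for this example; it is not the tool's implementation, and it does not compute z-scores or p-values.

```python
import numpy as np

def local_morans_i(values, weights):
    """Local Moran's I index for every feature (illustrative sketch).

    values  : 1-D sequence holding the analysis field, length n
    weights : n x n spatial weights matrix; weights[i][j] > 0 when
              feature j is a neighbor of feature i
    """
    x = np.asarray(values, dtype=float)
    w = np.array(weights, dtype=float)   # copy so the caller's matrix is untouched
    np.fill_diagonal(w, 0.0)             # a feature is not its own neighbor
    n = x.size
    dev = x - x.mean()                   # deviation from the global mean
    # Variance of all values except feature i (one value per feature).
    s2 = (np.sum(dev ** 2) - dev ** 2) / (n - 1)
    lag = w @ dev                        # weighted sum of neighbor deviations
    return dev / s2 * lag

# Six features along a line; each feature neighbors the adjacent ones.
chain = np.zeros((6, 6))
for k in range(5):
    chain[k, k + 1] = chain[k + 1, k] = 1.0

index = local_morans_i([10, 9, 10, 1, 2, 1], chain)
print(np.round(index, 3))
```

In this example, the features at either end of the line have neighbors with similar values and receive a positive index, while the third feature sits on the boundary between high and low values and receives a negative index.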

Interpretation

A positive value for the index (I) indicates that a feature has neighboring features with similarly high or low attribute values; this feature is part of a cluster. A negative value for the index indicates that a feature has neighboring features with dissimilar values; this feature is an outlier. In either instance, the p-value for the feature must be small enough for the cluster or outlier to be considered statistically significant. For more information about determining statistical significance, see What is a z-score? What is a p-value? The local Moran's I index (I) is a relative measure and can only be interpreted within the context of its computed z-score or p-value. The z-scores and p-values reported in the output feature class are uncorrected for multiple testing or spatial dependency.

The cluster/outlier type (COType) field distinguishes between a statistically significant cluster of high values (HH), cluster of low values (LL), outlier in which a high value is surrounded primarily by low values (HL), and outlier in which a low value is surrounded primarily by high values (LH). Statistical significance is set at the 95 percent confidence level. When no FDR correction is applied, features with p-values smaller than 0.05 are considered statistically significant. The FDR correction reduces this p-value threshold from 0.05 to a value that better reflects the 95 percent confidence level given multiple testing. Features with no neighbors will have field value NN, and features that are not significant will have empty text in the field.
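The classification described above can be sketched in Python. The cotype function follows the rules stated in this section: a positive index indicates a cluster, a negative index indicates an outlier, and the sign of the feature's own deviation from the mean separates high from low. The fdr_threshold function assumes a Benjamini-Hochberg style procedure; the exact procedure the tool uses is not described here, so treat it as an illustration of how a corrected threshold can fall below 0.05. Both function names and their parameters are hypothetical.

```python
def fdr_threshold(p_values, alpha=0.05):
    """Largest p-value that passes a Benjamini-Hochberg style test.

    Assumption: the tool's FDR correction may differ in detail.
    Returns 0.0 when no p-value passes.
    """
    ordered = sorted(p_values)
    m = len(ordered)
    threshold = 0.0
    for rank, p in enumerate(ordered, start=1):
        # Each ranked p-value is compared to a progressively looser cutoff.
        if p <= alpha * rank / m:
            threshold = p
    return threshold

def cotype(local_i, deviation, significant, n_neighbors):
    """Cluster/outlier code for one feature.

    local_i     : local Moran's I index of the feature
    deviation   : the feature's value minus the mean of all values
    significant : True when the feature passed the significance test
    n_neighbors : number of neighbors the feature has
    """
    if n_neighbors == 0:
        return "NN"
    if not significant:
        return ""
    if local_i > 0:                      # similar neighbors: cluster
        return "HH" if deviation > 0 else "LL"
    if local_i < 0:                      # dissimilar neighbors: outlier
        return "HL" if deviation > 0 else "LH"
    return ""

p_values = [0.001, 0.008, 0.039, 0.041, 0.2]
cutoff = fdr_threshold(p_values)
print(cutoff)
print(cotype(-1.7, 2.4, 0.008 <= cutoff, 5))
```

With these five p-values, four are below 0.05, but only the two smallest pass the corrected threshold.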

Output

This tool creates a new output feature class with the following attributes for each feature in the input feature class: local Moran's I index, z-score, p-value, and the cluster/outlier type.

When this tool runs, the output feature class is automatically added to the table of contents with default rendering applied to the COType field. The rendering applied is defined by a layer file in <ArcGIS Pro>\Resources\ArcToolBox\Templates\Layers. You can reapply the default rendering, if needed, using the Apply Symbology From Layer tool.

Permutations

Permutations are used to determine how likely it would be to find the actual spatial distribution of the values you are analyzing. They do this by comparing the observed result to results from a set of randomly generated datasets. Even with complete spatial randomness (CSR), some degree of clustering will always be observed simply due to chance. Permutations generate many random datasets, and the local Moran's I values of those datasets are compared to the local Moran's I value of the original data. To do this, each permutation randomly rearranges the neighborhood values around each feature and calculates the local Moran's I value of this random data. By reviewing the distribution of the local Moran's I values generated from permutations, you can determine the range of local Moran's I values that could reasonably be due to randomness. If there is a statistically significant spatial pattern in the data, you expect the local Moran's I values generated from permutations to display less clustering than the local Moran's I value from the original data. A pseudo p-value is then calculated by determining the proportion of local Moran's I values generated from permutations that display more clustering than the original data. If this proportion (the pseudo p-value) is small (less than 0.05), you can conclude that the data does display statistically significant clustering.
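The permutation procedure can be illustrated for a single feature with a short Python sketch. It makes several assumptions that are not stated in this topic: neighbors are weighted equally, the neighborhood is refilled by sampling without replacement from all other features, and the pseudo p-value is computed as (count + 1) / (permutations + 1), a common convention. The function name pseudo_p_value and its parameters are hypothetical.

```python
import numpy as np

def pseudo_p_value(values, neighbors, i, n_permutations=499, seed=0):
    """Pseudo p-value for feature i from random neighborhood permutations.

    values    : analysis field for all features
    neighbors : indices of the neighbors of feature i (equal weights assumed)
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    dev = x - x.mean()
    k = len(neighbors)

    # Unscaled local index: the variance term is constant for feature i,
    # so it does not change how permutations rank against the observation.
    observed = dev[i] * dev[neighbors].sum()

    others = np.delete(dev, i)           # feature i keeps its own value
    simulated = np.empty(n_permutations)
    for r in range(n_permutations):
        # Fill the neighborhood with k randomly chosen other features.
        sample = rng.choice(others, size=k, replace=False)
        simulated[r] = dev[i] * sample.sum()

    # Count permutations at least as extreme as the observed value,
    # in the direction of the observed sign (cluster or outlier).
    if observed >= 0:
        extreme = np.sum(simulated >= observed)
    else:
        extreme = np.sum(simulated <= observed)
    return (extreme + 1) / (n_permutations + 1)

# Five high values that neighbor each other, plus 25 lower values.
values = [100.0] * 5 + [float(v) for v in range(25)]
print(pseudo_p_value(values, neighbors=[1, 2, 3, 4], i=0))
```

Because a random neighborhood almost never reproduces four high values at once, very few permutations match the observed index and the pseudo p-value is small.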

Choosing the number of permutations is a balance between precision and increased processing time. Increasing the number of permutations increases precision by increasing the range of possible values for the pseudo p-value. For example, with 99 permutations, the precision of the pseudo p-value is 0.01, and for 999 permutations, the precision is 0.001. These values are computed by dividing 1 by the number of permutations plus one: 1/(1+99) and 1/(1+999). A lower number of permutations can be used when first exploring a problem, but it is a best practice to increase the permutations to the highest number feasible for final results.
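The arithmetic above can be checked with a few lines of Python. The function name pseudo_p_precision is hypothetical, and the permutation counts other than 99 and 999 are included only for comparison.

```python
def pseudo_p_precision(n_permutations):
    """Smallest step between possible pseudo p-values: 1 / (permutations + 1)."""
    return 1 / (n_permutations + 1)

for n in (99, 499, 999, 9999):
    print(n, pseudo_p_precision(n))
```

The precision is also the smallest pseudo p-value that can be reported, so a stricter significance threshold requires more permutations.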

Best practice guidelines

Keep the following in mind when using the Cluster and Outlier Analysis (Anselin Local Moran's I) tool:

  • Results are only reliable if the input feature class contains at least 30 features.
  • This tool requires an input field such as a count, rate, or other numeric measurement. If you are analyzing point data, where each point represents a single event or incident, you may not have a specific numeric attribute to evaluate (a severity ranking, count, or other measurement). If you want to find locations with many incidents (hot spots) or locations with very few incidents (cold spots), you need to aggregate the incident data before analysis. The Hot Spot Analysis (Getis-Ord Gi*) tool is also effective for finding hot and cold spots. Only the Cluster and Outlier Analysis (Anselin Local Moran's I) tool, however, will identify statistically significant spatial outliers (a high value surrounded by low values or a low value surrounded by high values).
  • Select an appropriate conceptualization of spatial relationships.
  • When you select the Space time window conceptualization, you can identify space-time clusters and outliers. See Space-time cluster analysis for more information.
  • Select an appropriate distance band or threshold distance.
    • All features should have at least one neighbor.
    • No feature should have all other features as a neighbor.
    • Especially if the values for the input field are skewed, each feature should have about eight neighbors.
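
The distance band guidelines above can be checked before running the tool. The following Python sketch counts the neighbors each feature would have for a candidate distance, assuming point coordinates in a projected coordinate system and straight-line distances. The function names neighbor_counts and check_distance_band are hypothetical and are not part of the tool.

```python
import numpy as np

def neighbor_counts(coords, distance_band):
    """Number of other features within distance_band of each feature."""
    pts = np.asarray(coords, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))   # pairwise straight-line distances
    within = dist <= distance_band
    np.fill_diagonal(within, False)           # do not count the feature itself
    return within.sum(axis=1)

def check_distance_band(coords, distance_band):
    """Summarize neighbor counts against the guidelines listed above."""
    counts = neighbor_counts(coords, distance_band)
    n = len(counts)
    return {
        "min_neighbors": int(counts.min()),
        "max_neighbors": int(counts.max()),
        "mean_neighbors": float(counts.mean()),
        "every_feature_has_a_neighbor": bool(counts.min() >= 1),
        "no_feature_has_all_others": bool(counts.max() < n - 1),
    }

# A 5 x 5 grid of points spaced one unit apart.
grid = [(col, row) for row in range(5) for col in range(5)]
for band in (1.0, 1.5, 10.0):
    print(band, check_distance_band(grid, band))
```

In this grid, a band of 1.5 units gives interior features eight neighbors, while a band of 10 units makes every feature a neighbor of every other feature, which violates the second guideline.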

Potential applications

The Cluster and Outlier Analysis (Anselin Local Moran's I) tool identifies concentrations of high values, concentrations of low values, and spatial outliers. It can help you answer questions such as the following:

  • Where are the sharpest boundaries between affluence and poverty in a study area?
  • Are there locations in a study area with anomalous spending patterns?
  • Where are the unexpectedly high rates of diabetes across the study area?

Applications can be found in many fields including economics, resource management, biogeography, political geography, and demographics.

Additional resources

Anselin, Luc. "Local Indicators of Spatial Association—LISA," Geographical Analysis 27(2): 93–115, 1995.

Mitchell, Andy. The ESRI Guide to GIS Analysis, Volume 2. ESRI Press, 2005.