How Spatial Autocorrelation (Global Moran's I) works

The Spatial Autocorrelation (Global Moran's I) tool measures spatial autocorrelation based on both feature locations and feature values simultaneously. Given a set of features and an associated attribute, it evaluates whether the pattern expressed is clustered, dispersed, or random. The tool calculates the Moran's I Index value and both a z-score and p-value to evaluate the significance of that index. P-values are numerical approximations of the area under the curve for a known distribution, limited by the test statistic.

Calculations

Mathematics used to compute Global Moran's I

View additional mathematics for Global Moran's I
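
For reference, the Global Moran's I statistic and its z-score are commonly written as shown below. The notation is an assumption chosen here for illustration: z_i is the deviation of the value for feature i from the mean, w_{i,j} is the spatial weight between features i and j, n is the number of features, and S_0 is the sum of all spatial weights. The expression for E[I^2], which depends on the weights and on the distribution of the values, is omitted.

```latex
\[
I = \frac{n}{S_0}\,
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j}\, z_i z_j}{\sum_{i=1}^{n} z_i^{2}},
\qquad
S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j}
\]
\[
z_I = \frac{I - \mathrm{E}[I]}{\sqrt{\mathrm{V}[I]}},
\qquad
\mathrm{E}[I] = -\frac{1}{n-1},
\qquad
\mathrm{V}[I] = \mathrm{E}[I^{2}] - \mathrm{E}[I]^{2}
\]
```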

The calculations behind the Global Moran's I statistic are shown above. The tool computes the mean and variance for the attribute being evaluated. Then, for each feature value, it subtracts the mean, creating a deviation from the mean. The deviation values for each pair of neighboring features (features within the specified distance band, for example) are multiplied together to create a cross-product. The numerator for the Global Moran's I statistic includes these summed cross-products. Suppose features A and B are neighbors, and the mean for all feature values is 10. The range of possible cross-product results is as follows:

Feature values    Deviations    Cross-products
A=50, B=40        40, 30        1200
A=8, B=6          -2, -4        8
A=20, B=2         10, -8        -80
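
The three cases above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `cross_product` is hypothetical and is used only for illustration.

```python
def cross_product(value_a, value_b, mean):
    """Return both deviations from the mean and their cross-product.

    Hypothetical helper for illustration; not part of the tool.
    """
    dev_a = value_a - mean
    dev_b = value_b - mean
    return dev_a, dev_b, dev_a * dev_b


MEAN = 10  # mean of all feature values, as in the example above

for a, b in [(50, 40), (8, 6), (20, 2)]:
    dev_a, dev_b, product = cross_product(a, b, MEAN)
    print(f"A={a}, B={b}: deviations {dev_a}, {dev_b}; cross-product {product}")
```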

When values for neighboring features are either both larger than the mean or both smaller than the mean, the cross-product will be positive. When one value is smaller than the mean and the other is larger than the mean, the cross-product will be negative. In all cases, the larger the deviation from the mean, the larger the cross-product result. If the values in the dataset tend to cluster spatially (high values cluster near other high values; low values cluster near other low values), the Moran's Index will be positive. When high values repel other high values, and tend to be near low values, the index will be negative. If positive cross-product values balance negative cross-product values, the index will be near zero. The numerator is normalized by the variance so that index values fall between -1.0 and +1.0 (see the Additional information section below for exceptions).
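
The behavior described above can be sketched with a small, self-contained calculation. The example assumes six features arranged in a row, with binary weights in which only adjacent features are neighbors. The function names and the weighting scheme are assumptions made for illustration, not the tool's implementation.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix.

    weights[i][j] is the spatial weight between features i and j
    (0 when they are not neighbors, and 0 on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]             # deviations from the mean
    s0 = sum(sum(row) for row in weights)      # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]    # weighted cross-products
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


def chain_weights(n):
    """Binary weights for n features in a row: adjacent features are neighbors."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
clustered = [1, 1, 1, 9, 9, 9]      # like values sit next to each other
alternating = [1, 9, 1, 9, 1, 9]    # high values always sit next to low values

print(morans_i(clustered, w))       # positive index (about 0.6)
print(morans_i(alternating, w))     # negative index (about -1.0)
```

In the clustered arrangement, four of the five neighboring pairs lie on the same side of the mean, so positive cross-products dominate. In the alternating arrangement, every neighboring pair straddles the mean, so every cross-product is negative.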

After the tool computes the index value, it computes the Expected Index value. The Expected and Observed Index values are then compared. Given the number of features in the dataset and the variance for the data values overall, the tool computes a z-score and p-value indicating whether this difference is statistically significant or not. Index values cannot be interpreted directly; they can only be interpreted within the context of the null hypothesis.
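
The comparison of the Observed and Expected Index values can be sketched as follows. The sketch assumes that the p-value is a two-tailed probability taken from the standard normal distribution, and it takes the variance as a given input rather than deriving it. The input numbers are made up for illustration and are not output from a real run of the tool.

```python
import math


def expected_index(n):
    """Expected Moran's I under the null hypothesis for n features."""
    return -1.0 / (n - 1)


def z_score(observed, expected, variance):
    """Standardized difference between the observed and expected index."""
    return (observed - expected) / math.sqrt(variance)


def two_tailed_p_value(z):
    """Two-tailed p-value from the standard normal distribution (assumed)."""
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up inputs for illustration only.
n = 100
observed = 0.25
variance = 0.004

z = z_score(observed, expected_index(n), variance)
p = two_tailed_p_value(z)
print(f"Expected index: {expected_index(n):.6f}")
print(f"z-score: {z:.3f}, p-value: {p:.6f}")
```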

Interpretation

The Global Moran's I statistic is an inferential statistic, which means that the results of the analysis are always interpreted within the context of its null hypothesis. For the Global Moran's I statistic, the null hypothesis states that the attribute being analyzed is randomly distributed among the features in your study area; in other words, the spatial processes promoting the observed pattern of values are random chance. Imagine that you could pick up the values for the attribute you are analyzing and throw them down onto your features, letting each value fall where it may. This process (picking up and throwing down the values) is an example of a random chance spatial process.
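
The "pick up and throw down" idea can be simulated by repeatedly shuffling the values among the features and recomputing the index each time. Note that this is a permutation test, a different technique from the z-score calculation described earlier, and it is shown only to illustrate the null hypothesis. The function names, the row-of-features layout, the number of shuffles, and the seed are all assumptions.

```python
import random


def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


def permutation_p_value(values, weights, permutations=999, seed=0):
    """Pseudo p-value: share of shuffles at least as extreme as the observed index."""
    rng = random.Random(seed)
    expected = -1.0 / (len(values) - 1)
    observed = morans_i(values, weights)
    shuffled = list(values)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # throw the values down at random
        if abs(morans_i(shuffled, weights) - expected) >= abs(observed - expected):
            extreme += 1
    return observed, (extreme + 1) / (permutations + 1)


# 20 features in a row; only adjacent features are neighbors.
n = 20
weights = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
values = list(range(1, n + 1))  # values increase steadily along the row

observed, pseudo_p = permutation_p_value(values, weights)
print(f"Observed index: {observed:.3f}, pseudo p-value: {pseudo_p:.3f}")
```

A small pseudo p-value means that very few random arrangements of the same values produce an index as extreme as the observed one.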

When the p-value returned by this tool is statistically significant, you can reject the null hypothesis. The interpretation of the results is summarized as follows:

  • The p-value is not statistically significant.

    You cannot reject the null hypothesis. It is quite possible that the spatial distribution of feature values is the result of random spatial processes. The observed spatial pattern of feature values could very well be one of many, many possible versions of complete spatial randomness (CSR).

  • The p-value is statistically significant, and the z-score is positive.

    You may reject the null hypothesis. The spatial distribution of high values and/or low values in the dataset is more spatially clustered than would be expected if underlying spatial processes were random.

  • The p-value is statistically significant, and the z-score is negative.

    You may reject the null hypothesis. The spatial distribution of high values and low values in the dataset is more spatially dispersed than would be expected if underlying spatial processes were random. A dispersed spatial pattern often reflects some type of competitive process: a feature with a high value repels other features with high values; similarly, a feature with a low value repels other features with low values.
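
The interpretation rules above can be expressed as a small decision function. This is a sketch only: the function name is hypothetical, and the 0.05 significance level is an assumed default. Choose the level that suits your analysis.

```python
def interpret(z_score, p_value, alpha=0.05):
    """Map a z-score and p-value to the interpretation described above.

    alpha is the significance level; 0.05 is an assumption, not a tool default.
    """
    if p_value >= alpha:
        return "not significant: cannot reject the null hypothesis"
    if z_score > 0:
        return "significant: more clustered than expected under randomness"
    return "significant: more dispersed than expected under randomness"


print(interpret(z_score=0.8, p_value=0.42))
print(interpret(z_score=4.1, p_value=0.00004))
print(interpret(z_score=-3.2, p_value=0.0014))
```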

Note:

The null hypothesis for both the High/Low Clustering (General G) tool and the Spatial Autocorrelation (Global Moran's I) tool is complete spatial randomness. The interpretation of z-scores for the High/Low Clustering (General G) tool is different, however.

Output

The Spatial Autocorrelation tool returns five values: the Moran's I Index, Expected Index, Variance, z-score, and p-value. The tool provides these values as geoprocessing messages and as derived output values for use in models or scripts. Optionally, the tool will create a report as an .html file with a graphical summary of results. The path to the report is included with the messages summarizing the tool parameters. Click the path to open the report file.

Best practices

The following considerations should be made when using this tool:

  • The Input Feature Class parameter value should contain at least 30 features. Results will not be reliable with fewer than 30 features.

  • Ensure that the specified Conceptualization of Spatial Relationships parameter value is appropriate.

    Learn more about best practices for selecting a conceptualization of spatial relationships

  • Ensure that the specified Distance Band or Threshold Distance parameter value is appropriate. The following should be true:
    • All features should have at least one neighbor.
    • No feature should have all other features as neighbors.
    • If the values of the Input Field parameter are skewed, features should have about eight neighbors each.
  • For input polygon features, you should almost always apply row standardization (the Row option of the Standardization parameter).
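
The neighbor-count conditions listed above can be checked before running the analysis. The sketch below assumes point features with planar (projected) coordinates and a fixed distance band measured as straight-line distance. The function names and the sample coordinates are hypothetical.

```python
import math


def neighbor_counts(points, distance_band):
    """Count, for each point, the other points within the distance band."""
    counts = []
    for i, (xi, yi) in enumerate(points):
        count = sum(
            1
            for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= distance_band
        )
        counts.append(count)
    return counts


def check_distance_band(points, distance_band):
    """Report whether a distance band satisfies the neighbor conditions."""
    counts = neighbor_counts(points, distance_band)
    return {
        "every_feature_has_a_neighbor": min(counts) >= 1,
        "no_feature_has_all_others": max(counts) < len(points) - 1,
        "min_neighbors": min(counts),
        "max_neighbors": max(counts),
    }


# Hypothetical coordinates: three nearby points and one distant point.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]

print(check_distance_band(points, 1.5))  # too small: the distant point has no neighbor
print(check_distance_band(points, 8.5))  # too large: one point neighbors all others
```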

Additional information

The Hot Spot Analysis (Getis-Ord Gi*) tool may indicate statistically significant hot spots even when the results from this tool are not statistically significant. The global statistics from the Spatial Autocorrelation (Global Moran's I) tool assess the overall pattern and trend of your data. They are most effective when the spatial pattern is consistent across the study area. Local statistics (like the Hot Spot Analysis (Getis-Ord Gi*) tool) assess each feature within the context of neighboring features and compare the local situation to the global situation.

Consider an example. When you compute a mean or average for a set of values, you are also computing a global statistic. If all the values are near 20, the mean will also be near 20, and that result will be a very good representation and summary of the dataset as a whole. But if half of the values are near 1 and the other half of the values are near 100, the mean will be near 50. There might not be any data values anywhere near 50, so the mean value is not a good representation or summary of the dataset as a whole. If you create a histogram of the data values, you will see the bimodal distribution.

Similarly, global spatial statistics, including the Spatial Autocorrelation (Global Moran's I) tool, are most effective when the spatial processes being measured are consistent across the study area. Results will then be a good representation and summary of the overall spatial pattern. For more information, see Getis and Ord (1992), "The Analysis of Spatial Association by Use of Distance Statistics," and the analysis of SIDS they present.

The results from this tool are different from the results of the High/Low Clustering (Getis-Ord General G) tool. These two tools measure different spatial patterns. See Interpretation of High/Low Clustering (Getis-Ord General G) results for more information.

Z-scores and p-values are not comparable across different study areas. When the study area is fixed, however (for example, all analyses are for California counties), the Input Field parameter value is comparable (for example, all analyses involve some type of population count), and the tool parameters are the same, you may compare statistically significant z-scores to get a sense of the intensity of spatial clustering or spatial dispersion or to better understand trends over time. You can also run the analysis for a series of increasing Distance Band or Threshold Distance parameter values to see the distance or scale where the processes promoting spatial clustering are most pronounced.

In general, the Global Moran's Index is bounded by -1.0 and 1.0. This is always the case when your weights are row standardized. When you don't row standardize the weights, there may be instances where the index value falls outside the -1.0 to 1.0 range, indicating a problem with your parameter settings. The most common problems are the following:

  • The Input Field parameter value is strongly skewed (create a histogram of the data values to see this), and the Conceptualization of Spatial Relationships or Distance Band parameter value is such that some features have very few neighbors. The Global Moran's I statistic is asymptotically normal, which means for skewed data, you will want each feature to have at least eight neighbors. The default value computed for the Distance Band or Threshold Distance parameter ensures that every feature has at least one neighbor, but this may not be sufficient, especially when values in the Input Field parameter value are strongly skewed.
  • The Conceptualization of Spatial Relationships parameter's Inverse Distance option is used, and the inverted distances are very small.
  • The Standardization parameter is not set to the Row option but should be. Whenever your data has been aggregated, unless the aggregation scheme relates directly to the field you are analyzing, specify the Row option.
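
Row standardization, referred to above, divides each spatial weight by the sum of the weights in its row, so that the weights for each feature's neighbors sum to 1. The following is a minimal sketch with a hypothetical function name and a made-up weights matrix; it is not the tool's implementation.

```python
def row_standardize(weights):
    """Divide each weight by its row sum; rows with no neighbors stay all zero."""
    standardized = []
    for row in weights:
        total = sum(row)
        standardized.append([w / total if total else 0.0 for w in row])
    return standardized


# Made-up binary weights: feature 0 has two neighbors, features 1 and 2 have one.
weights = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]

for row in row_standardize(weights):
    print(row)
```

After standardization, a feature with many neighbors no longer contributes more to the statistic simply because it has more neighbors.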

Example applications

The following are example applications of the tool:

  • Identify an appropriate neighborhood distance for a variety of spatial analysis methods by finding the distance where spatial autocorrelation is strongest.
  • Measure broad trends in ethnic or racial segregation over time—is segregation increasing or decreasing?
  • Summarize the diffusion of an idea, disease, or trend over space and time—is the idea, disease, or trend remaining isolated and concentrated, or spreading and becoming more diffuse?

Additional resources

The following books and journal articles have further information about this tool:

Getis, Arthur, and J. K. Ord. "The Analysis of Spatial Association by Use of Distance Statistics." Geographical Analysis 24, no. 3 (1992).

Goodchild, Michael F. Spatial Autocorrelation. Catmog 47. Geo Books, 1986.

Griffith, Daniel. Spatial Autocorrelation: A Primer. Resource Publications in Geography. Association of American Geographers, 1987.

Mitchell, Andy. The ESRI Guide to GIS Analysis, Volume 2. ESRI Press, 2005.