How Incremental Spatial Autocorrelation works

In much of spatial data analysis, the scale of the analysis matters. The default value for the Conceptualization of Spatial Relationships parameter of the Hot Spot Analysis tool, for example, is a fixed distance band, and many density tools require a radius value. The distance you provide should relate to the scale of the question you are trying to answer or to the scale of remediation you are considering. For example, suppose you want to understand childhood obesity. What is the scale of analysis? Is it at the individual household or neighborhood level? If so, the distance you use to define the scale of the analysis will be small, encompassing the homes within a block or two of each other. Alternatively, what will be the scale of remediation? Maybe the question involves where to increase after-school fitness programs as a way to potentially reduce childhood obesity. In that case, the distance will likely reflect school zones.

Sometimes it’s fairly easy to determine an appropriate scale of analysis; if you are analyzing commuting patterns and know that the average journey to work is 12 miles, for example, 12 miles is an appropriate distance to use for the analysis. Other times it is more difficult to justify any particular analysis distance. This is when the Incremental Spatial Autocorrelation tool is helpful.

Whenever you see spatial clustering in the landscape, you are seeing evidence of underlying spatial processes at work. Knowing something about the spatial scale at which those underlying processes operate can help you select an appropriate analysis distance. The Incremental Spatial Autocorrelation tool runs the Spatial Autocorrelation (Global Moran’s I) tool for a series of increasing distances, measuring the intensity of spatial clustering for each distance. The intensity of clustering is determined by the z-score returned. Typically, as the distance increases, so does the z-score, indicating intensification of clustering. At a particular distance, however, the z-score generally peaks. Sometimes there are multiple peaks.
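The idea of running Global Moran's I over a series of increasing distances can be sketched in plain Python. This is a minimal illustration, not the tool's implementation: it assumes binary fixed-distance-band weights, Euclidean distances between point coordinates, and the variance of Moran's I under the normality assumption. The function names `morans_i_z_score` and `incremental_z_scores` are hypothetical, and the z-scores the tool reports may differ.

```python
import math

def morans_i_z_score(points, values, band):
    """Global Moran's I z-score using binary fixed-distance-band weights.

    Assumption: variance is computed under the normality assumption;
    the tool's own calculation may differ.
    """
    n = len(points)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    den = sum(d * d for d in dev)
    # w[i][j] is 1 when feature j lies within the band of feature i.
    w = [[1.0 if i != j and math.dist(points[i], points[j]) <= band else 0.0
          for j in range(n)] for i in range(n)]
    s0 = sum(sum(row) for row in w)
    if s0 == 0 or den == 0:
        return None  # no neighbors at this distance, or constant values
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    i_stat = (n / s0) * (num / den)
    expected = -1.0 / (n - 1)
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2
                   for i in range(n) for j in range(n))
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2
             for i in range(n))
    var = ((n * n * s1 - n * s2 + 3 * s0 * s0)
           / ((n * n - 1) * s0 * s0)) - expected ** 2
    if var <= 1e-12:
        return None  # degenerate case, e.g. every feature neighbors every other
    return (i_stat - expected) / math.sqrt(var)

def incremental_z_scores(points, values, beginning, increment, n_bands):
    """Return (distance, z-score) pairs for a series of increasing distances."""
    distances = [beginning + k * increment for k in range(n_bands)]
    return [(d, morans_i_z_score(points, values, d)) for d in distances]
```

Plotting the returned pairs with distance on the x-axis and z-score on the y-axis gives a curve of the kind described above.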

Incremental Spatial Autocorrelation graph

Peaks reflect distances where the spatial processes promoting clustering are most pronounced. The color of each point on the graph corresponds to the statistical significance of the z-score values.

Color legend for statistical significance
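To relate a z-score to statistical significance, you can convert it to a two-tailed p-value from the standard normal distribution. The sketch below does this with the standard library; the function names and the 90, 95, and 99 percent bins are assumptions chosen for illustration and may not match the legend's classes exactly.

```python
import math

def p_value_from_z(z):
    """Two-tailed p-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def confidence_label(z):
    """Classify a z-score into conventional confidence bins (assumed bins)."""
    p = p_value_from_z(z)
    if p < 0.01:
        return "99% confidence"
    if p < 0.05:
        return "95% confidence"
    if p < 0.10:
        return "90% confidence"
    return "not significant"
```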

One strategy for identifying an appropriate scale of analysis is to select the distance associated with the statistically significant peak that best reflects the scale of the question. Often this is the first statistically significant peak.
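The strategy above can be sketched as a small search over (distance, z-score) pairs. This is an illustration under stated assumptions: a peak is taken to be a z-score higher than both of its neighbors, and 1.96 (the two-tailed 95 percent critical value) is used as the significance cutoff. The function names are hypothetical, and the tool may identify peaks differently.

```python
def find_peaks(results):
    """Return (distance, z) pairs whose z-score exceeds both neighboring z-scores."""
    peaks = []
    for k in range(1, len(results) - 1):
        z_prev = results[k - 1][1]
        z_here = results[k][1]
        z_next = results[k + 1][1]
        if z_here > z_prev and z_here > z_next:
            peaks.append(results[k])
    return peaks

def first_significant_peak(results, z_critical=1.96):
    """Return the first peak at or above the cutoff, or None if there is none."""
    for distance, z in find_peaks(results):
        if z >= z_critical:
            return distance, z
    return None
```

A z-score series that only rises has no interior maximum, so `first_significant_peak` returns `None` for it.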

Determine the Beginning Distance and Distance Increment values

All distance measurements are based on feature centroids. The default Beginning Distance parameter value is the smallest distance that ensures every feature has at least one neighboring feature. This is generally a good choice unless the dataset includes locational outliers, which inflate that distance. Determine whether there are locational outliers; if there are, select all but the outlier features and run Incremental Spatial Autocorrelation on only the selected features. If you find a peak distance for the selection set, use that distance to create a spatial weights matrix file based on all of the features (even the outliers). When you run the Generate Spatial Weights Matrix tool to create the spatial weights matrix file, set the Number of Neighbors parameter so that every feature, including each outlier, has at least that many neighboring features.

The default Distance Increment parameter value is the average distance to each feature's nearest neighboring feature. If you've determined an appropriate beginning distance using the strategies above and still don't see a peak distance, you may want to experiment with smaller or larger increment values.
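The two default values described in this section can be approximated from nearest neighbor distances. The sketch below assumes the features are represented by centroid coordinates in a projected coordinate system and uses plain Euclidean distance; the function names are hypothetical, and the tool's own defaults may be computed differently in detail.

```python
import math

def nearest_neighbor_distances(points):
    """Distance from each point to its closest other point."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def default_distances(points):
    """Return (beginning distance, increment) derived from nearest neighbors.

    Beginning distance: the largest nearest neighbor distance, which is the
    smallest band that gives every feature at least one neighbor.
    Increment: the average nearest neighbor distance.
    """
    nn = nearest_neighbor_distances(points)
    return max(nn), sum(nn) / len(nn)
```

For the points `(0, 0)`, `(1, 0)`, `(2, 0)`, and `(10, 0)`, the beginning distance is 8.0 because of the single outlying point; without that point it drops to 1.0. This is why locational outliers are worth excluding before you look for a peak.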

No peak distance

In some cases, you may use the Incremental Spatial Autocorrelation tool and get a graph with a z-score that continues to rise with increasing distances; there is no peak. This most often happens when data has been aggregated and the scale of the processes affecting the Input Field variable is smaller than the aggregation scheme. You can try making the Distance Increment value smaller to see if that captures more subtle peaks. Sometimes, however, there is no peak because there are multiple spatial processes, each operating at a different distance, in the study area. This is often the case with large point datasets that are noisy (no clear spatial pattern to the point data values you're analyzing). In this case, you need to justify the scale of analysis using other criteria.

Interpret the results

When you run the Incremental Spatial Autocorrelation tool, the z-score results for each distance are written as messages. To access the messages, hover over the progress bar, click the pop-out button, or expand the messages section in the Geoprocessing pane. You can also access the messages for a previously run tool through the geoprocessing history. When you specify a path for the optional Output Table parameter, a table is created that includes fields for Distance, Morans I, Expected I, variance, z_score, and p_value.

By examining the Spatial Autocorrelation by Distance line chart and z-score values written as messages, you can determine if there are any peak distances. In the image below, the chart has two peak z-scores associated with distances of 8100 and 11500 feet.

Spatial Autocorrelation by Distance chart

If you use the Output Table parameter to create a table of the autocorrelation values, the table includes the Spatial Autocorrelation by Distance line chart. This is the same chart that appears in the messages.
