With much of the spatial data analysis you do, the scale of your analysis will be important. The default Conceptualization of Spatial Relationships for the Hot Spot Analysis tool, for example, is Fixed distance band and requires you to specify a distance value. For many density tools you will be asked to provide a radius value. The distance you select should relate to the scale of the question you are trying to answer or to the scale of remediation you are considering. Suppose, for example, you want to understand childhood obesity. What is your scale of analysis? Is it at the individual household or neighborhood level? If so, the distance you use to define your scale of analysis will be small, encompassing the homes within a block or two of each other. Alternatively, what will be the scale of remediation? Perhaps your question involves where to increase after-school fitness programs as a way to potentially reduce childhood obesity. In that case, your distance will likely be reflective of school zones. Sometimes it’s fairly easy to determine an appropriate scale of analysis; if you are analyzing commuting patterns and know that the average journey to work is 12 miles, for example, then 12 miles would be an appropriate distance to use for your analysis. Other times it is more difficult to justify any particular analysis distance. This is when the Incremental Spatial Autocorrelation tool is most helpful.
Whenever you see spatial clustering in the landscape, you are seeing evidence of underlying spatial processes at work. Knowing something about the spatial scale at which those underlying processes operate can help you select an appropriate analysis distance. The Incremental Spatial Autocorrelation tool runs the Spatial Autocorrelation (Global Moran’s I) tool for a series of increasing distances, measuring the intensity of spatial clustering for each distance. The intensity of clustering is determined by the z-score returned. Typically, as the distance increases, so does the z-score, indicating intensification of clustering. At some particular distance, however, the z-score generally peaks. Sometimes you will see multiple peaks.
Peaks reflect distances where the spatial processes promoting clustering are most pronounced. The color of each point on the graph corresponds to the statistical significance of the z-score values.
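The idea of running Global Moran's I at a series of increasing distances can be sketched in plain Python. This is a minimal illustration, not the tool's implementation. It assumes binary fixed distance band weights with no row standardization, and it computes the z-score under the normality null hypothesis, so the tool's own z-scores may differ. The function names, the 6 by 6 grid of points, and the values are all hypothetical.

```python
import math


def morans_i_zscore(points, values, distance):
    """Return (Moran's I, z-score) using binary fixed distance band weights.

    Assumption: the variance is computed under the normality null
    hypothesis; the tool's own calculation may differ.
    """
    n = len(points)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    # w[i][j] = 1 when feature j lies within the distance band of feature i.
    w = [[1.0 if i != j and math.dist(points[i], points[j]) <= distance else 0.0
          for j in range(n)] for i in range(n)]

    s0 = sum(map(sum, w))
    if s0 == 0:
        raise ValueError("no feature has a neighbor at this distance")

    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    i_obs = (n / s0) * cross / sum(d * d for d in dev)
    expected = -1.0 / (n - 1)

    # Terms of the variance of I under the normality assumption.
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    row = [sum(w[i]) for i in range(n)]
    col = [sum(w[j][i] for j in range(n)) for i in range(n)]
    s2 = sum((row[i] + col[i]) ** 2 for i in range(n))
    variance = ((n * n * s1 - n * s2 + 3 * s0 * s0)
                / ((n * n - 1) * s0 * s0) - expected ** 2)

    # The variance collapses to zero when every feature neighbors every other.
    z = (i_obs - expected) / math.sqrt(variance) if variance > 0 else float("nan")
    return i_obs, z


def incremental_autocorrelation(points, values, beginning, increment, bands):
    """Evaluate Moran's I for `bands` distances starting at `beginning`."""
    results = []
    for k in range(bands):
        d = beginning + k * increment
        i_obs, z = morans_i_zscore(points, values, d)
        results.append((d, i_obs, z))
    return results


# Hypothetical data: a 6 x 6 grid with high values on the left half.
points = [(x, y) for x in range(6) for y in range(6)]
values = [10.0 if x < 3 else 0.0 for x, y in points]

results = incremental_autocorrelation(points, values, 1.0, 1.0, 4)
for d, i_obs, z in results:
    print(f"distance={d:.1f}  I={i_obs:.3f}  z={z:.2f}")
```

Plotting the z-score against distance for output like this gives the kind of graph described above. The sketch builds a full weights matrix at every distance, so it is only practical for small datasets.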
One strategy for identifying an appropriate scale of analysis is to select the distance associated with the statistically significant peak that best reflects the scale of your question. Often this is the first statistically significant peak.
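A simple way to express this strategy in code is to look for z-scores that are higher than both of their neighbors and that exceed a critical value. The sketch below is illustrative only: the function name is hypothetical, the distances and z-scores are invented, and the 1.96 threshold is an assumption corresponding to a 95 percent confidence level.

```python
def find_significant_peaks(distances, z_scores, critical_z=1.96):
    """Return (distance, z-score) pairs that are local maxima above critical_z."""
    peaks = []
    for k in range(1, len(z_scores) - 1):
        is_local_max = z_scores[k] > z_scores[k - 1] and z_scores[k] > z_scores[k + 1]
        if is_local_max and z_scores[k] > critical_z:
            peaks.append((distances[k], z_scores[k]))
    return peaks


# Invented results for twelve distance bands (distances in feet).
distances = [3000, 4000, 5000, 6000, 7000, 8000,
             9000, 10000, 11000, 12000, 13000, 14000]
z_scores = [2.1, 3.4, 4.8, 4.2, 4.5, 5.6,
            6.3, 5.9, 6.1, 6.8, 7.4, 7.0]

peaks = find_significant_peaks(distances, z_scores)
first_peak = peaks[0] if peaks else None
max_peak = max(peaks, key=lambda p: p[1]) if peaks else None

print("All peaks:", peaks)
print("First peak:", first_peak)
print("Maximum peak:", max_peak)
```

With these invented values, the first peak and the maximum peak fall at different distances, so you would still need to decide which one best reflects the scale of your question.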
How do I select the Beginning Distance and Distance Increment values?
All distance measurements are based on feature centroids, and the default Beginning Distance is the smallest distance that will ensure every feature has at least one neighboring feature. This is generally a good choice, unless your dataset includes locational outliers, because a single distant feature can make the default much larger than the spacing of the rest of the data. If you do have locational outliers, select all but the outlier features and run Incremental Spatial Autocorrelation on just the selected features. If you find a peak distance for the selection set, use that distance to create a spatial weights matrix file based on all of your features (even the outliers). When you run the Generate Spatial Weights Matrix tool to create the spatial weights matrix file, set the Number of Neighbors parameter to some value so that all features will have at least that many neighboring features.
The default Distance Increment is the average distance to each feature's nearest neighboring feature. If you've determined an appropriate starting distance using the strategies above and still don't see a peak distance, you may want to experiment with smaller or larger increment distances.
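The two defaults described above can be approximated from nearest neighbor distances. The following is a rough sketch under stated assumptions: it uses straight-line (Euclidean) distances between centroids, the function names are hypothetical, and the five centroids are invented so that one of them is a locational outlier.

```python
import math


def nearest_neighbor_distances(points):
    """Return each point's Euclidean distance to its nearest other point."""
    distances = []
    for i, p in enumerate(points):
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        distances.append(nearest)
    return distances


def default_distances(points):
    """Approximate the default Beginning Distance and Distance Increment.

    Beginning distance: the largest nearest neighbor distance, so that
    every feature has at least one neighbor.
    Increment: the average nearest neighbor distance.
    """
    nn = nearest_neighbor_distances(points)
    return max(nn), sum(nn) / len(nn)


# Invented centroids; the last one is a locational outlier.
centroids = [(0, 0), (3, 0), (0, 4), (3, 4), (30, 4)]

with_outlier = default_distances(centroids)
without_outlier = default_distances(centroids[:-1])

print("All features:      beginning=%.1f, increment=%.1f" % with_outlier)
print("Outlier excluded:  beginning=%.1f, increment=%.1f" % without_outlier)
```

In this invented example the single outlier raises the beginning distance from 3 to 27 units, which illustrates why it can help to set outliers aside before looking for a peak distance.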
What if the graph never peaks?
In some cases, you will use the Incremental Spatial Autocorrelation tool and get a graph with a z-score that just continues to rise with increasing distances; there is no peak. This most often happens in cases where data has been aggregated and the scale of the processes impacting your Input Field variable is smaller than the aggregation scheme. You can try making your Distance Increment smaller to see if this captures more subtle peaks. Sometimes, however, you won't get a peak because there are multiple spatial processes, each operating at a different distance, in your study area. This is often the case with large point datasets that are noisy (no clear spatial pattern to the point data values you're analyzing). In this case, you will need to justify your scale of analysis using some other criteria.
When you run the Incremental Spatial Autocorrelation tool, the z-score results for each distance are written as messages at the bottom of the Geoprocessing pane during tool execution. You may access the messages by hovering over the progress bar, clicking on the pop-out button, or expanding the messages section in the Geoprocessing pane. You may also access the messages for a previously run tool via the Geoprocessing History. When you specify a path for the optional Output Table parameter, a table is created that includes fields for Distance, Morans I, Expected I, variance, z_score, and p_value.
By examining the z-score values written as messages or by opening and examining those values in the Output Table, you can determine if there are any peak distances. More typically, however, you would identify peak distances by looking at the graphic in the optional Output Report file. The report has three pages. An example of the first page of the report is shown below. Notice that this graph has three peak z-scores associated with distances of 5000, 9000, and 13000 feet. A halo will be drawn to highlight both the first peak distance and the maximum peak distance, but all peaks represent distances where the spatial processes promoting clustering are most pronounced. You can select the peak that best reflects the scale of your analytical question. In some cases, there will only be one halo because the first and the maximum peaks are found at the same distance. If none of the z-score peaks are statistically significant, then none of the peaks will have the light blue halo. Notice that the color of the plotted z-score corresponds to the legend showing the critical values for statistical significance.
On page two of the report, the distances and z-score values are presented in table format. The last page of the report documents the parameter settings used when the tool was run. To get a report file, provide a path for the Output Report parameter.
- The Spatial Pattern Analysis Tutorial walks through an analysis of Dengue Fever data that uses the Incremental Spatial Autocorrelation tool.
- See Selecting a Fixed Distance Band in Modeling Spatial Relationships.
- How Hot Spot Analysis Works includes a discussion of finding an appropriate scale of analysis.
- For an up-to-date list of all of the spatial statistics resources available, go to http://esriurl.com/spatialstats