A fixed distance band can be thought of as a moving analysis window that momentarily settles on top of each feature and views that feature in the context of its neighbors. The following guidelines and best practices will help you identify an appropriate distance band for your analysis:
- Select a distance based on what you know about the geographic extent of the spatial processes promoting clustering for the phenomena you are studying. Often you won't know this, but if you do, use this knowledge to select a distance value. Suppose, for example, you know that the average journey-to-work commute distance is 15 miles. Using 15 miles for the distance band is a good strategy for analyzing commute data.
- Use a distance band that is large enough to ensure that all features will have at least one neighbor, or results will not be valid.
- If the input data is skewed (and does not create a bell curve when you plot the values as a histogram), make sure that your distance band is neither too small (most features have only one or two neighbors) nor too large (several features include all other features as neighbors), as that will make resultant z-scores less reliable.
- Z-scores are reliable (even with skewed data) as long as the distance band is large enough to ensure several neighbors (approximately eight) for each feature. Even if none of the features have all other features as neighbors, performance issues and potential memory limitations can result if you create a distance band in which features have thousands of neighbors.
- Sometimes ensuring that all features have at least one neighbor results in some features having many thousands of neighbors, which is not ideal. This can happen when some of your features are spatial outliers. To resolve this problem, find an appropriate distance band for all but the spatial outliers, and use the Generate Spatial Weights Matrix tool to create a spatial weights matrix file using that distance. When you run the Generate Spatial Weights Matrix tool, however, specify a minimum number of neighbors value for the Number of Neighbors parameter. For example, suppose you are evaluating access to healthy food in Los Angeles County using census tract data. You know that more than 90 percent of the population lives within 3 miles of shopping opportunities. You may find that distances between tracts (based on tract centroids) in the downtown region are about 1,000 meters on average, but distances between tracts in outlying areas are more than 18,000 meters. To ensure that every feature has at least one neighbor, your distance band needs to be more than 18,000 meters. This scale of analysis (distance) is not appropriate for the questions you are asking. The solution is to create a spatial weights matrix file for the census tract feature class using the Generate Spatial Weights Matrix tool. Specify a value for Distance Band or Threshold Distance that makes sense for all but the spatial outliers (for example, 4,800 meters, which is approximately 3 miles) and a minimum number of neighbors value for the Number of Neighbors parameter (for example, 2). This will apply the 4,800-meter fixed-distance neighborhood to all features except those that do not have at least 2 neighbors using that distance. For those outlier features (and only those), the distance will be expanded just far enough to ensure that every feature has at least 2 neighbors.
- Use a distance band that reflects maximum spatial autocorrelation. Whenever you see spatial clustering on the landscape, you are seeing evidence of underlying spatial processes at work. The distance band that exhibits maximum clustering, as measured by the Incremental Spatial Autocorrelation tool, is the distance at which those spatial processes are most active or most pronounced. Run the Incremental Spatial Autocorrelation tool and note where the resulting z-score seems to peak. Use the distance associated with the peak value for your analysis.
Note:
Enter distance values using the same units as specified by the spatial reference of the layer or the Output Coordinate System geoprocessing environment.
- Every peak represents a distance at which the processes promoting spatial clustering are pronounced. Multiple peaks are common. The peaks associated with larger distances generally reflect broad trends (a broad east-to-west trend, for example, where the west is a giant hot spot and the east is a giant cold spot). You will usually be most interested in peaks associated with smaller distances, often the first peak.
- An inconspicuous peak often means there are many different spatial processes operating at a variety of spatial scales. You may want to look for other criteria to determine the fixed distance to use for your analysis (perhaps the most effective distance for remediation).
- If the z-score never peaks (in other words, it keeps increasing) and if you are using aggregated data (for example, counties), it usually means the aggregation scheme is too coarse; the spatial processes of interest are operating at a scale that is smaller than the scale of your aggregation units. If you can move to a smaller scale of analysis (moving from counties to tracts for example), this may help find a peak distance. If you are working with point data and the z-score never peaks, it means there are many different spatial processes operating at a variety of spatial scales and you will likely need to come up with different criteria for determining the fixed distance to use in your analysis. Also, when you run the Incremental Spatial Autocorrelation tool, confirm that the Beginning Distance value isn't too large.
- If you do not specify a beginning distance, the Incremental Spatial Autocorrelation tool will use the distance that ensures all features have at least one neighbor. If your data includes spatial outliers, however, that distance may be too large for your analysis and may be the reason you do not see a pronounced peak in the Output Report File. The solution is to run the Incremental Spatial Autocorrelation tool on a selection set that temporarily excludes all spatial outliers. If a peak is found with the outliers excluded, use the Generate Spatial Weights Matrix strategy outlined above with that peak distance applied to all of your features (including the spatial outliers), and require each feature to have at least one or two neighbors. If you're not sure if any of your features are spatial outliers, try the following:
- For polygon data, render polygon areas using a Standard Deviation rendering scheme and consider polygons with areas more than three standard deviations above the mean to be spatial outliers. You can use the Calculate Field tool to create a field with polygon areas if you don't already have one.
- For point data, use the Near tool to compute each feature's nearest neighbor distance. To do this, set both the Input Features and Near Features parameters of the Near tool to your point dataset. Once you have a field with nearest neighbor distances, render those values using a Standard Deviation rendering scheme and consider distances more than three standard deviations above the mean to be spatial outliers.
- Try not to get stuck on the idea that there is only one correct distance band. Reality is never that simple. Most likely, there are multiple or interacting spatial processes promoting observed clustering. Rather than thinking you need one distance band, think of the pattern analysis tools as effective methods for exploring spatial relationships at multiple spatial scales. Consider that when you change the scale of the analysis (by changing the distance band value), you could be asking a different question. Suppose you are looking at income data. Small distance bands let you examine neighborhood income patterns, middle-scale distances might reflect community or city income patterns, and the largest distance bands would highlight broad regional income patterns.
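
Several of the checks described above (neighbor counts for a candidate distance band, spatial outlier detection from nearest neighbor distances, a fixed band combined with a minimum number of neighbors, and reading the first z-score peak) can be tried on plain coordinates before running any tool. The following is a minimal sketch in standard-library Python, not the implementation used by the Generate Spatial Weights Matrix or Incremental Spatial Autocorrelation tools. All function names and the sample points are hypothetical, distances are assumed to be Euclidean in projected units, and a feature at exactly the band distance is assumed to count as a neighbor.

```python
import math
import statistics


def pairwise_distances(points):
    """Full matrix of Euclidean distances between (x, y) points."""
    return [[math.dist(p, q) for q in points] for p in points]


def nearest_neighbor_distances(dist):
    """Distance from each feature to its closest other feature."""
    return [min(d for j, d in enumerate(row) if j != i)
            for i, row in enumerate(dist)]


def neighbor_counts(dist, band):
    """Number of other features within `band` of each feature (inclusive)."""
    return [sum(1 for j, d in enumerate(row) if j != i and d <= band)
            for i, row in enumerate(dist)]


def flag_spatial_outliers(nn_distances, n_sd=3.0):
    """True where a nearest neighbor distance exceeds mean + n_sd * SD."""
    mean = statistics.mean(nn_distances)
    sd = statistics.pstdev(nn_distances)
    return [d > mean + n_sd * sd for d in nn_distances]


def neighbors_with_minimum(dist, band, min_neighbors):
    """Fixed band for every feature, widened to the k nearest only where needed."""
    result = []
    for i, row in enumerate(dist):
        ranked = sorted((d, j) for j, d in enumerate(row) if j != i)
        within = [j for d, j in ranked if d <= band]
        if len(within) < min_neighbors:
            # Assumption: expand just far enough to reach min_neighbors features.
            within = [j for _, j in ranked[:min_neighbors]]
        result.append(within)
    return result


def first_peak(distances, z_scores):
    """Distance at the first local maximum of the z-scores, or None."""
    for k in range(1, len(z_scores) - 1):
        if z_scores[k - 1] < z_scores[k] > z_scores[k + 1]:
            return distances[k]
    return None


# Hypothetical data: a 5 x 5 grid of points 1 unit apart plus one remote point.
points = [(x, y) for x in range(5) for y in range(5)] + [(50, 50)]
dist = pairwise_distances(points)
nn = nearest_neighbor_distances(dist)

one_neighbor_band = max(nn)  # smallest band giving every feature a neighbor
print(max(neighbor_counts(dist, one_neighbor_band)))  # largest neighborhood
print(flag_spatial_outliers(nn).count(True))          # features flagged
print(min(len(n) for n in neighbors_with_minimum(dist, 1.5, 2)))
```

In this made-up dataset, the band needed to give the remote point a neighbor is far larger than the grid spacing, which is the situation the spatial outlier guidance addresses. Using a small band with a minimum neighbor count keeps the local neighborhoods for the grid points and widens the search only for the remote point. Note that the three-standard-deviation rule can only flag a single extreme value when there are at least 11 features, so very small datasets need a different criterion.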