Summary
Measures spatial autocorrelation based on feature locations and attribute values using the Global Moran's I statistic.
Learn more about how Spatial Autocorrelation (Global Moran's I) works
Usage
The Spatial Autocorrelation tool returns five values: the Moran's I Index, Expected Index, Variance, z-score, and p-value. These values are written as messages at the bottom of the Geoprocessing pane during tool execution and passed as derived output values for potential use in models or scripts. To access the messages, hover over the progress bar and click the pop-out button, or expand the details section of the messages in the Geoprocessing pane. You can also access the messages and details of a previously run tool via the geoprocessing history. Optionally, you can create an HTML report file with a graphical summary of results using this tool. The path to the report will be included with the messages summarizing the tool execution parameters. Click this path to open the report file.
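To make the five returned values concrete, the following is a minimal pure-Python sketch of the textbook Global Moran's I calculation, with the variance taken under the randomization assumption and a two-sided p-value from the normal distribution. It is an illustration only, not the tool's implementation; the function name `global_morans_i` and the dense nested-list weights layout are assumptions made for this example.

```python
import math


def global_morans_i(values, weights):
    """Return (index, expected_index, variance, z_score, p_value).

    values  -- list of n numeric attribute values
    weights -- n x n nested list; weights[i][j] is the spatial weight
               between features i and j (diagonal assumed to be 0)
    """
    n = len(values)
    if n < 4:
        raise ValueError("the randomization variance needs at least 4 features")
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    m2 = sum(d * d for d in z)
    if m2 == 0:
        raise ValueError("all values are identical; the index is undefined")

    s0 = sum(sum(row) for row in weights)  # sum of all weights
    if s0 == 0:
        raise ValueError("no feature has any neighbors")
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    index = (n / s0) * (cross / m2)
    expected = -1.0 / (n - 1)

    # Terms of the variance under the randomization assumption.
    s1 = 0.5 * sum((weights[i][j] + weights[j][i]) ** 2
                   for i in range(n) for j in range(n))
    s2 = sum((sum(weights[i]) + sum(row[i] for row in weights)) ** 2
             for i in range(n))
    kurtosis = n * sum(d ** 4 for d in z) / m2 ** 2
    a = n * ((n * n - 3 * n + 3) * s1 - n * s2 + 3 * s0 ** 2)
    b = kurtosis * ((n * n - n) * s1 - 2 * n * s2 + 6 * s0 ** 2)
    c = (n - 1) * (n - 2) * (n - 3) * s0 ** 2
    variance = (a - b) / c - expected ** 2

    z_score = (index - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z_score) / math.sqrt(2))  # two-sided
    return index, expected, variance, z_score, p_value


# Four features in a row; each is a neighbor of the adjacent ones.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
index, expected, variance, z_score, p_value = global_morans_i(
    [1, 2, 3, 4], line_weights)
print(index, expected, variance, z_score, p_value)
```

The dense matrix keeps the sketch short; for real datasets a sparse neighbor structure would be more practical.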
Given a set of features and an associated attribute, this tool evaluates whether the pattern expressed is clustered, dispersed, or random. When the z-score or p-value indicates statistical significance, a positive Moran's I index value indicates a tendency toward clustering, while a negative Moran's I index value indicates a tendency toward dispersion.
- The z-score and p-value are measures of statistical significance that tell you whether to reject the null hypothesis. For this tool, the null hypothesis states that the values associated with features are spatially uncorrelated, that is, randomly distributed across the study area.
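The interpretation rule described above can be sketched as a small helper. The function name `interpret_morans_i` and the 0.05 significance level are assumptions for illustration; choose the confidence level appropriate for your analysis.

```python
def interpret_morans_i(index, p_value, alpha=0.05):
    """Classify a Global Moran's I result as clustered, dispersed, or random.

    alpha is an assumed significance level, not a value set by the tool.
    """
    if p_value >= alpha:
        # Not significant: the null hypothesis cannot be rejected.
        return "random"
    if index > 0:
        return "clustered"
    if index < 0:
        return "dispersed"
    # An index of exactly zero is not covered by the rule above.
    return "random"


print(interpret_morans_i(0.42, 0.001))
print(interpret_morans_i(-0.30, 0.01))
print(interpret_morans_i(0.05, 0.60))
```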
The Input Field should contain a variety of values. The math for this statistic requires some variation in the variable being analyzed; it cannot be solved if all input values are 1, for example. If you want to use this tool to analyze the spatial pattern of incident data, consider aggregating your incident data. Optimized Hot Spot Analysis may also be used to analyze the spatial pattern of incident data.
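A quick pre-check for this condition might look like the following sketch. The helper name `has_variation` is hypothetical; it only tests whether the field holds more than one distinct value, which is the minimum needed for the statistic to be defined.

```python
def has_variation(values):
    """Return True when the values are not all identical."""
    return len(set(values)) > 1


print(has_variation([1, 1, 1, 1]))   # constant field
print(has_variation([1, 4, 2, 1]))   # field with variation
```

Comparing distinct values avoids relying on a floating-point sum of squared deviations being exactly zero.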
Note:
Incident data are points representing events (crime, traffic accidents) or objects (trees, stores) where your focus is on presence or absence rather than some measured attribute associated with each point.
When the Input Feature Class is not projected (that is, when coordinates are given in degrees, minutes, and seconds) or when the output coordinate system is set to a Geographic Coordinate System, distances are computed using chordal measurements. Chordal distance measurements are used because they can be computed quickly and provide very good estimates of true geodesic distances, at least for points within about thirty degrees of each other. Chordal distances are based on an oblate spheroid. Given any two points on the earth's surface, the chordal distance between them is the length of a line, passing through the three-dimensional earth, to connect those two points. Chordal distances are reported in meters.
Caution:
Be sure to project your data if your study area extends beyond 30 degrees. Chordal distances are not a good estimate of geodesic distances beyond 30 degrees.
When chordal distances are used in the analysis, the Distance Band or Threshold Distance parameter, if specified, should be given in meters.
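As an illustration of the chordal distance idea, the sketch below converts latitude and longitude on an oblate spheroid to three-dimensional Cartesian coordinates and measures the straight line between them. The WGS84 parameters and the function names are assumptions for this example; the tool's own spheroid depends on the coordinate system in use, and its implementation may differ.

```python
import math

# WGS84 ellipsoid parameters (assumed for illustration).
WGS84_A = 6378137.0            # semi-major axis, meters
WGS84_F = 1 / 298.257223563    # flattening


def to_cartesian(lat_deg, lon_deg):
    """Convert a point on the spheroid surface to Earth-centered X, Y, Z."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    e2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared
    # Radius of curvature in the prime vertical.
    nu = WGS84_A / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    x = nu * math.cos(lat) * math.cos(lon)
    y = nu * math.cos(lat) * math.sin(lon)
    z = nu * (1 - e2) * math.sin(lat)
    return x, y, z


def chordal_distance(lat1, lon1, lat2, lon2):
    """Straight-line distance through the earth, in meters."""
    return math.dist(to_cartesian(lat1, lon1), to_cartesian(lat2, lon2))


# Two points one degree of longitude apart on the equator.
print(chordal_distance(0.0, 0.0, 0.0, 1.0))
```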
For line and polygon features, feature centroids are used in distance computations. For multipoints, polylines, or polygons with multiple parts, the centroid is computed using the weighted mean center of all feature parts. The weighting for point features is 1, for line features is length, and for polygon features is area.
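The weighted mean center for a multipart feature can be sketched as follows. The function name and the `(x, y, weight)` tuple layout are assumptions for illustration; each tuple holds the center of one part and its weight (1 for points, length for lines, area for polygons).

```python
def weighted_mean_center(parts):
    """Return the (x, y) weighted mean center of feature parts.

    parts -- iterable of (x, y, weight) tuples, one per part
    """
    parts = list(parts)
    total = sum(weight for _, _, weight in parts)
    if total == 0:
        raise ValueError("total weight is zero")
    x = sum(px * weight for px, _, weight in parts) / total
    y = sum(py * weight for _, py, weight in parts) / total
    return x, y


# A two-part polygon: one part of area 1 centered at (0, 0),
# one part of area 3 centered at (4, 0).
print(weighted_mean_center([(0.0, 0.0, 1.0), (4.0, 0.0, 3.0)]))
```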
The Conceptualization of Spatial Relationships parameter value should reflect inherent relationships among the features you are analyzing. The more realistically you can model how features interact with each other in space, the more accurate your results will be. Recommendations are outlined in Selecting a conceptualization of spatial relationships: Best practices. Here are some additional tips:
- Fixed distance band
The default Distance Band or Threshold Distance parameter value will ensure that each feature has at least one neighbor. This is important, but often this default will not be the most appropriate distance to use for your analysis. Additional strategies for selecting an appropriate scale (distance band) for your analysis are outlined in Distance band (sphere of influence).
- Inverse distance or Inverse distance squared
When zero is entered for the Distance Band or Threshold Distance parameter value, all features are considered neighbors of all other features; when this parameter is left blank, the default distance will be applied.
Weights for distances less than 1 become unstable when they are inverted. Consequently, features separated by less than 1 unit of distance are given a weight of 1.
For the inverse distance options (Inverse distance, Inverse distance squared, and Zone of indifference), any two points that are coincident will be given a weight of 1 to avoid zero division. This ensures that features are not excluded from analysis.
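The weighting rule for the inverse distance options can be sketched as a small function. The name `inverse_distance_weight` and the `power` argument (1 for inverse distance, 2 for inverse distance squared) are assumptions for illustration; the sketch ignores any distance band cutoff.

```python
def inverse_distance_weight(distance, power=1):
    """Return an inverse distance weight, clamped to 1 for short distances.

    Distances below 1 unit, including coincident points at distance 0,
    get a weight of 1 so that the inversion stays stable and never
    divides by zero.
    """
    if distance < 0:
        raise ValueError("distance must not be negative")
    if distance < 1:
        return 1.0
    return 1.0 / distance ** power


for d in (0.0, 0.5, 2.0, 10.0):
    print(d, inverse_distance_weight(d), inverse_distance_weight(d, power=2))
```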
In Python, the derived output of this tool contains the Moran's I index value, z-score, p-value, and an HTML report file. For example, if you assign the tool's Result object to a variable named MoranResult, then MoranResult[0] stores the Moran's I index value, MoranResult[1] stores the z-score, MoranResult[2] stores the p-value, and MoranResult[3] stores the file path of the HTML report file. If you do not output an HTML report file using the Generate Report parameter, the last derived output will be an empty string.
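A script might unpack those derived outputs along the following lines. The helper name `summarize_moran_result` is hypothetical, and the list used here is a stand-in with made-up placeholder values so the sketch runs without ArcGIS; in a real script you would pass the Result object returned by the tool. The `float()` conversion is a precaution in case the values arrive as strings.

```python
def summarize_moran_result(result):
    """Unpack the four derived outputs described above into a dictionary.

    result -- any indexable object with four items, such as the tool's
              Result object (assumed here; a plain list also works)
    """
    report_path = result[3]
    return {
        "index": float(result[0]),
        "z_score": float(result[1]),
        "p_value": float(result[2]),
        # An empty string means no HTML report was generated.
        "report": report_path if report_path else None,
    }


# Placeholder values only, standing in for a real Result object.
stand_in_result = ["0.25", "2.10", "0.036", ""]
summary = summarize_moran_result(stand_in_result)
print(summary)
```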
Additional options for the Conceptualization of Spatial Relationships parameter, including three-dimensional and space-time relationships, are available using the Generate Spatial Weights Matrix tool. To take advantage of these additional options, construct a spatial weights matrix file prior to analysis; select Get spatial weights from file for the Conceptualization of Spatial Relationships parameter; and for the Weights Matrix File parameter, specify the path to the spatial weights file you created.
Map layers can be used to define the Input Feature Class. When using a layer with a selection, only the selected features are included in the analysis.
If you provide a Weights Matrix File with a .swm extension, this tool is expecting a spatial weights matrix file created using the Generate Spatial Weights Matrix tool; otherwise, this tool is expecting an ASCII-formatted spatial weights matrix file. In some cases, behavior is different depending on which type of spatial weights matrix file you use:
- ASCII-formatted spatial weights matrix files:
  - Weights are used as is. Missing feature-to-feature relationships are treated as zeros.
  - If the weights are row standardized, results will likely be incorrect for analyses on selection sets. If you need to run your analysis on a selection set, convert the ASCII spatial weights file to an SWM file by reading the ASCII data into a table, then using the Convert table option with the Generate Spatial Weights Matrix tool.
- SWM-formatted spatial weights matrix files:
  - If the weights are row standardized, they will be restandardized for selection sets; otherwise, weights are used as is.
Running your analysis with an ASCII-formatted spatial weights matrix file is memory intensive. For analyses on more than 5,000 features, consider converting your ASCII-formatted spatial weights matrix file into an SWM-formatted file. First put your ASCII weights into a formatted table (using Excel, for example). Next, run the Generate Spatial Weights Matrix tool using Convert table for the Conceptualization of Spatial Relationships parameter. The output will be an SWM-formatted spatial weights matrix file.
For polygon features, you will almost always want to choose Row for the Standardization parameter. Row Standardization mitigates bias when the number of neighbors each feature has is a function of the aggregation scheme or sampling process, rather than reflecting the actual spatial distribution of the variable you are analyzing.
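To show what row standardization does, and why row-standardized weights need to be restandardized for a selection set, here is a minimal sketch. The function names and the dictionary-of-dictionaries layout are assumptions for illustration and do not reflect the SWM file format.

```python
def row_standardize(weights):
    """Divide each weight by the sum of its row so every row sums to 1.

    weights -- dict mapping feature id -> {neighbor id: weight}
    """
    standardized = {}
    for fid, neighbors in weights.items():
        total = sum(neighbors.values())
        if total:
            standardized[fid] = {nid: w / total for nid, w in neighbors.items()}
        else:
            standardized[fid] = {}
    return standardized


def select(weights, selected):
    """Keep only the rows and neighbor entries for the selected features."""
    return {fid: {nid: w for nid, w in neighbors.items() if nid in selected}
            for fid, neighbors in weights.items() if fid in selected}


# Feature A has three neighbors; B, C, and D each neighbor only A.
binary = {"A": {"B": 1, "C": 1, "D": 1}, "B": {"A": 1}, "C": {"A": 1}, "D": {"A": 1}}
standardized = row_standardize(binary)

# After selecting A, B, and C, A's row no longer sums to 1 ...
selection = select(standardized, {"A", "B", "C"})
print(sum(selection["A"].values()))

# ... until the selected rows are standardized again.
restandardized = row_standardize(selection)
print(sum(restandardized["A"].values()))
```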
The Modeling Spatial Relationships help topic provides additional information about this tool's parameters.
Note:
It is possible to run out of memory when you run this tool. This generally occurs when the Conceptualization of Spatial Relationships and Distance Band or Threshold Distance values you select result in features having a very large number of neighbors. You generally do not want to define spatial relationships so that features have thousands of neighbors. Instead, aim for every feature to have at least one neighbor and almost all features to have at least eight neighbors.
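Before committing to a distance band, you could count neighbors per feature to check these guidelines. The sketch below is a brute-force illustration on plain planar coordinates; the function name is hypothetical, and treating features exactly at the band distance as neighbors is an assumption.

```python
import math


def neighbor_counts(points, distance_band):
    """Count, for each point, the other points within distance_band."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= distance_band:
                counts[i] += 1
                counts[j] += 1
    return counts


# A 3 x 3 grid of points spaced 1 unit apart.
grid = [(x, y) for x in range(3) for y in range(3)]
counts = neighbor_counts(grid, 1.5)
print("minimum neighbors:", min(counts))
print("share with at least eight:", sum(c >= 8 for c in counts) / len(counts))
```

The pairwise loop is quadratic in the number of points, so it suits small exploratory checks rather than large datasets.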
Caution:
When using shapefiles, keep in mind that they cannot store null values. Tools or other procedures that create shapefiles from nonshapefile inputs may store or interpret null values as zero. In some cases, nulls are stored as very large negative values in shapefiles. This can lead to unexpected results. See Geoprocessing considerations for shapefile output for more information.
Syntax
arcpy.stats.SpatialAutocorrelation(Input_Feature_Class, Input_Field, {Generate_Report}, Conceptualization_of_Spatial_Relationships, Distance_Method, Standardization, {Distance_Band_or_Threshold_Distance}, {Weights_Matrix_File}, {number_of_neighbors})
Parameter | Explanation | Data Type |
Input_Feature_Class | The feature class for which spatial autocorrelation will be calculated. | Feature Layer |
Input_Field | The numeric field used in assessing spatial autocorrelation. | Field |
Generate_Report (Optional) | Specifies whether an HTML report file with a graphical summary of results will be created. | Boolean |
Conceptualization_of_Spatial_Relationships | Specifies how spatial relationships among features are defined. | String |
Distance_Method | Specifies how distances are calculated from each feature to neighboring features. | String |
Standardization | Specifies whether standardization of spatial weights will be applied. Row standardization is recommended whenever the distribution of your features is potentially biased due to sampling design or an imposed aggregation scheme. | String |
Distance_Band_or_Threshold_Distance (Optional) | The cutoff distance for the various inverse distance and fixed distance options. Features outside the specified cutoff for a target feature are ignored in analyses for that feature. However, for ZONE_OF_INDIFFERENCE, the influence of features outside the given distance is reduced with distance, while those inside the distance threshold are equally considered. The distance value entered should be in the units of the output coordinate system. For the inverse distance conceptualizations of spatial relationships, a value of 0 indicates that no threshold distance is applied; when this parameter is left blank, a default threshold value is computed and applied. This default value is the Euclidean distance that ensures every feature has at least one neighbor. This parameter has no effect when polygon contiguity (CONTIGUITY_EDGES_ONLY or CONTIGUITY_EDGES_CORNERS) or GET_SPATIAL_WEIGHTS_FROM_FILE spatial conceptualization is selected. | Double |
Weights_Matrix_File (Optional) | The path to a file containing weights that define spatial, and potentially temporal, relationships among features. | File |
number_of_neighbors (Optional) | An integer specifying the number of neighbors to include in the analysis. | Long |
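One way to picture the default threshold described for Distance_Band_or_Threshold_Distance is as the largest nearest-neighbor distance in the dataset, since that is the smallest band that leaves no feature without a neighbor. The sketch below illustrates that idea on planar coordinates; the function name is hypothetical, and the tool's actual default computation may differ.

```python
import math


def smallest_band_with_one_neighbor(points):
    """Return the largest nearest-neighbor distance among the points."""
    if len(points) < 2:
        raise ValueError("at least two points are required")
    nearest = []
    for i, p in enumerate(points):
        nearest.append(min(math.dist(p, q)
                           for j, q in enumerate(points) if j != i))
    return max(nearest)


# The isolated point at x = 5 is 4 units from its nearest neighbor.
band = smallest_band_with_one_neighbor([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
print(band)
```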
Code sample
The following Python window script demonstrates how to use the SpatialAutocorrelation tool.
import arcpy
arcpy.env.workspace = r"c:\data"
arcpy.SpatialAutocorrelation_stats("olsResults.shp", "Residual", "NO_REPORT",
                                   "GET_SPATIAL_WEIGHTS_FROM_FILE", "EUCLIDEAN_DISTANCE",
                                   "NONE", "#", "euclidean6Neighs.swm")
The following stand-alone Python script demonstrates how to use the SpatialAutocorrelation tool.
# Analyze the growth of regional per capita incomes in US
# Counties from 1969 -- 2002 using Ordinary Least Squares Regression

# Import system modules
import arcpy

# Set property to overwrite existing outputs
arcpy.env.overwriteOutput = True

# Local variables...
workspace = r"C:\Data"

try:
    # Set the current workspace (to avoid having to specify the full path
    # to the feature classes each time)
    arcpy.env.workspace = workspace

    # Growth as a function of {log of starting income, dummy for South
    # counties, interaction term for South counties, population density}
    # Process: Ordinary Least Squares...
    ols = arcpy.OrdinaryLeastSquares_stats("USCounties.shp", "MYID",
                                           "olsResults.shp", "GROWTH",
                                           "LOGPCR69;SOUTH;LPCR_SOUTH;PopDen69",
                                           "olsCoefTab.dbf",
                                           "olsDiagTab.dbf")

    # Create Spatial Weights Matrix (Can be based off input or output FC)
    # Process: Generate Spatial Weights Matrix...
    swm = arcpy.GenerateSpatialWeightsMatrix_stats("USCounties.shp", "MYID",
                                                   "euclidean6Neighs.swm",
                                                   "K_NEAREST_NEIGHBORS",
                                                   "#", "#", "#", 6)

    # Calculate Moran's I Index of Spatial Autocorrelation for
    # OLS Residuals using a SWM File.
    # Process: Spatial Autocorrelation (Morans I)...
    moransI = arcpy.SpatialAutocorrelation_stats("olsResults.shp", "Residual",
                                                 "NO_REPORT", "GET_SPATIAL_WEIGHTS_FROM_FILE",
                                                 "EUCLIDEAN_DISTANCE", "NONE", "#",
                                                 "euclidean6Neighs.swm")

    # Print the derived outputs: Moran's I index, z-score, and p-value
    print(moransI[0], moransI[1], moransI[2])

except arcpy.ExecuteError:
    # If an error occurred when running the tool, print out the error message.
    print(arcpy.GetMessages())
Environments
- Output Coordinate System
Feature geometry is projected to the Output Coordinate System prior to analysis. All mathematical computations are based on the Output Coordinate System spatial reference. When the Output Coordinate System is based on degrees, minutes, and seconds, geodesic distances are estimated using chordal distances.
Licensing information
- Basic: Yes
- Standard: Yes
- Advanced: Yes
Related topics
- An overview of the Analyzing Patterns toolset
- Modeling spatial relationships
- What is a z-score? What is a p-value?
- Find a geoprocessing tool
- Average Nearest Neighbor
- Cluster and Outlier Analysis (Anselin Local Moran's I)
- Hot Spot Analysis (Getis-Ord Gi*)
- Spatial weights
- How Spatial Autocorrelation (Global Moran's I) works