Incremental Spatial Autocorrelation (Spatial Statistics)

Summary

Measures spatial autocorrelation for a series of distances and optionally creates a line graph of those distances and their corresponding z-scores. Z-scores reflect the intensity of spatial clustering, and statistically significant peak z-scores indicate distances where spatial processes promoting clustering are most pronounced. These peak distances are often appropriate values to use for tools with a Distance Band or Distance Radius parameter.

Illustration

Incremental Spatial Autocorrelation
Z-score peaks reflect distances where the spatial processes promoting clustering are most pronounced.

Usage

  • This tool can help you select an appropriate Distance Threshold or Radius for tools that have these parameters, such as Hot Spot Analysis or Point Density.

  • The Incremental Spatial Autocorrelation tool measures spatial autocorrelation for a series of distance increments and reports, for each distance increment, the associated Moran's Index, Expected Index, Variance, z-score, and p-value. These values are written as messages at the bottom of the Geoprocessing pane during tool execution. To access the messages, hover over the progress bar, click the pop-out button, or expand the messages section in the Geoprocessing pane. You can also access the messages for a previously run tool through the Geoprocessing History. Optionally, this tool creates a PDF report file with a graphical summary of results. The path to the report is included with the messages summarizing the tool execution parameters. Click that path to open the report file.

  • When more than one statistically significant peak is present, clustering is pronounced at each of those distances. Select the peak distance that best corresponds to the scale of analysis you are interested in; often this is the first statistically significant peak encountered.

  • The Input Field should contain a variety of values. The math for this statistic requires some variation in the variable being analyzed; the statistic cannot be computed if all input values are 1, for example. If you want to use this tool to analyze the spatial pattern of incident data, consider aggregating your incident data.

  • When the Input Feature Class is not projected (that is, when coordinates are given in degrees, minutes, and seconds) or when the output coordinate system is set to a Geographic Coordinate System, distances are computed using chordal measurements. Chordal distance measurements are used because they can be computed quickly and provide very good estimates of true geodesic distances, at least for points within about 30 degrees of each other. Chordal distances are based on an oblate spheroid. Given any two points on the earth's surface, the chordal distance between them is the length of the straight line that passes through the three-dimensional earth to connect those two points. Chordal distances are reported in meters.

    Caution:

    Be sure to project your data if your study area extends beyond 30 degrees. Chordal distances are not a good estimate of geodesic distances beyond 30 degrees.

  • When chordal distances are used in the analysis, the Beginning Distance and Distance Increment parameters, if specified, should be given in meters.

  • For line and polygon features, feature centroids are used in distance computations. For multipoints, polylines, or polygons with multiple parts, the centroid is computed using the weighted mean center of all feature parts. The weighting for point features is 1, for line features is length, and for polygon features is area.

  • Map layers can be used to define the Input Feature Class. When using a layer with a selection, only the selected features are included in the analysis.

  • For polygon features, you will almost always want to choose Row for the Row Standardization parameter. Row Standardization mitigates bias when the number of neighbors each feature has is a function of the aggregation scheme or sampling process, rather than reflecting the actual spatial distribution of the variable you are analyzing.

  • If no Beginning Distance is given, the default value is the minimum distance for which each feature in the dataset has at least one neighbor. This may not be the most appropriate beginning distance if your dataset includes locational outliers.

  • If no Distance Increment is given, the smaller of either the average nearest neighbor distance or (Td - B) / I is used, where Td is a maximum threshold distance, B is the Beginning Distance, and I is the Number of Distance Bands. This algorithm ensures calculations will always be performed for the Number of Distance Bands specified and that the largest distance bands won't be so large that some features have all or almost all other features as neighbors.

  • If the Beginning Distance and/or Distance Increment specified will result in a distance band that is larger than the maximum threshold distance, the Distance Increment will automatically be scaled down. To avoid this adjustment, you can decrease the Distance Increment and/or decrease the Number of Distance Bands specified.

  • It is possible to run out of memory when you run this tool. This generally occurs when you specify a Beginning Distance and/or Distance Increment that results in features having a very large number of neighbors. You generally do not want to create spatial relationships where your features have thousands of neighbors. Use a smaller value for the Distance Increment and temporarily remove locational outliers so that you can start with a smaller Beginning Distance value.

  • Even if you let the tool calculate a Beginning Distance and Distance Increment for you, processing time can be long for large datasets. You can improve performance by:

    • Temporarily removing locational outliers.
    • Selecting features in a representative portion of the study area and running the analysis on just those features, rather than on all features.
    • Taking a random sample of features from the dataset and running the analysis on just those sampled features.

  • Distances are always based on the Output Coordinate System environment setting. The default setting for the Output Coordinate System environment is Same as Input. Input features are projected to the output coordinate system prior to analysis.

  • The optional Output Table will contain the distance value at each iteration, the Moran's I Index value, the expected Moran's I index value, the variance, the z-score, and the p-value. A peak is an increase in the z-score value followed by a decrease in the z-score value. For example, if this tool finds z-scores of 2.95, 3.68, and 3.12 for distances of 50, 100, and 150 meters, respectively, the peak would be 100 meters.

  • The optional Output Report File is created as a PDF file and may be accessed from the messages found at the bottom of the Geoprocessing pane.

  • On machines configured with the ArcGIS language packages for Arabic and other right-to-left languages, you might notice missing text or formatting problems in the PDF Output Report File. These problems are addressed in this article.

  • When no peak z-scores are identified, both the first peak z-score and maximum peak z-score derived output parameters return a blank.

  • When using this tool in Python scripts, the result object returned from tool execution has the following outputs:

    Position | Description | Data Type
    0 | First Peak | Double
    1 | Max Peak | Double
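
The peak definition given in the usage notes (a rise in z-score followed by a fall) can be illustrated with a short sketch in plain Python. The function names `find_peaks` and `first_and_max_peak` are hypothetical and are not part of ArcPy. The sketch looks only at the shape of the z-score series and does not test statistical significance, so it is an illustration of the idea rather than a reproduction of how the tool derives its First Peak and Max Peak outputs.

```python
def find_peaks(distances, z_scores):
    """Return (distance, z_score) pairs where the z-score rises, then falls."""
    peaks = []
    # The first and last entries cannot be peaks: each lacks one neighbor.
    for i in range(1, len(z_scores) - 1):
        if z_scores[i - 1] < z_scores[i] > z_scores[i + 1]:
            peaks.append((distances[i], z_scores[i]))
    return peaks


def first_and_max_peak(distances, z_scores):
    """Return (first_peak, max_peak), or (None, None) when no peak exists."""
    peaks = find_peaks(distances, z_scores)
    if not peaks:
        return None, None
    # The max peak is the peak with the largest z-score.
    return peaks[0], max(peaks, key=lambda peak: peak[1])


# The z-score series quoted in the usage notes.
print(find_peaks([50, 100, 150], [2.95, 3.68, 3.12]))  # [(100, 3.68)]
```

A series that only increases or only decreases has no peak, in which case the sketch returns `(None, None)`, mirroring the blank derived outputs described above.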

Parameters

Label | Explanation | Data Type
Input Features

The feature class for which spatial autocorrelation will be measured over a series of distances.

Feature Layer
Input Field

The numeric field used in assessing spatial autocorrelation.

Field
Number of Distance Bands

The number of times to increment the neighborhood size and analyze the dataset for spatial autocorrelation. The starting point and size of the increment are specified in the Beginning Distance and Distance Increment parameters, respectively.

Long
Beginning Distance
(Optional)

The distance at which to start the analysis of spatial autocorrelation and the distance from which to increment. The value entered for this parameter should be in the units of the Output Coordinate System environment setting.

Double
Distance Increment
(Optional)

The distance to increase after each iteration. The distance used in the analysis starts at the Beginning Distance and increases by the amount specified in the Distance Increment. The value entered for this parameter should be in the units of the Output Coordinate System environment setting.

Double
Distance Method
(Optional)

Specifies how distances are calculated from each feature to neighboring features.

  • Euclidean: The straight-line distance between two points (as the crow flies)
  • Manhattan: The distance between two points measured along axes at right angles (city block); calculated by summing the (absolute) difference between the x- and y-coordinates
String
Row Standardization
(Optional)

Row standardization is recommended whenever the distribution of your features is potentially biased due to sampling design or an imposed aggregation scheme.

  • Checked—Spatial weights will be standardized; each weight is divided by its row sum (the sum of the weights of all neighboring features).
  • Unchecked—No standardization of spatial weights is applied.
Boolean
Output Table
(Optional)

The table to be created with each distance band and associated z-score result.

Table
Output Report File
(Optional)

The PDF file to be created containing a line graph summarizing results.

File

Derived Output

Label | Explanation | Data Type
First Peak

The first peak z-score.

Double
Maximum Peak

The maximum peak z-score.

Double
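
How the Beginning Distance, Distance Increment, and Number of Distance Bands parameters combine can be sketched in plain Python. The helpers `default_increment` and `distance_bands` are hypothetical names, not ArcPy functions. The default increment follows the rule stated in the usage notes, the smaller of the average nearest neighbor distance and (Td - B) / I; the maximum threshold distance is passed in as an argument because this page does not say how the tool derives it. The sketch also assumes that the first band is the Beginning Distance itself.

```python
def default_increment(avg_nearest_neighbor, max_threshold, beginning, n_bands):
    """Smaller of the average nearest neighbor distance and (Td - B) / I."""
    return min(avg_nearest_neighbor, (max_threshold - beginning) / n_bands)


def distance_bands(beginning, increment, n_bands):
    """Distances analyzed: start at `beginning`, then add `increment` each time."""
    return [beginning + k * increment for k in range(n_bands)]


# Illustrative numbers only (units follow the output coordinate system).
increment = default_increment(avg_nearest_neighbor=200.0, max_threshold=1000.0,
                              beginning=100.0, n_bands=10)
print(distance_bands(100.0, increment, 3))  # [100.0, 190.0, 280.0]
```

Under these assumptions the largest band stays below the maximum threshold distance, which is consistent with the behavior the usage notes describe.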

arcpy.stats.IncrementalSpatialAutocorrelation(Input_Features, Input_Field, Number_of_Distance_Bands, {Beginning_Distance}, {Distance_Increment}, {Distance_Method}, {Row_Standardization}, {Output_Table}, {Output_Report_File})

Name | Explanation | Data Type
Input_Features

The feature class for which spatial autocorrelation will be measured over a series of distances.

Feature Layer
Input_Field

The numeric field used in assessing spatial autocorrelation.

Field
Number_of_Distance_Bands

The number of times to increment the neighborhood size and analyze the dataset for spatial autocorrelation. The starting point and size of the increment are specified in the Beginning_Distance and Distance_Increment parameters, respectively.

Long
Beginning_Distance
(Optional)

The distance at which to start the analysis of spatial autocorrelation and the distance from which to increment. The value entered for this parameter should be in the units of the Output Coordinate System environment setting.

Double
Distance_Increment
(Optional)

The distance to increase after each iteration. The distance used in the analysis starts at the Beginning_Distance and increases by the amount specified in the Distance_Increment. The value entered for this parameter should be in the units of the Output Coordinate System environment setting.

Double
Distance_Method
(Optional)

Specifies how distances are calculated from each feature to neighboring features.

  • EUCLIDEAN: The straight-line distance between two points (as the crow flies)
  • MANHATTAN: The distance between two points measured along axes at right angles (city block); calculated by summing the (absolute) difference between the x- and y-coordinates
String
Row_Standardization
(Optional)

Row standardization is recommended whenever feature distribution is potentially biased due to sampling design or to an imposed aggregation scheme.

  • ROW_STANDARDIZATION: Spatial weights are standardized by row. Each weight is divided by its row sum.
  • NO_STANDARDIZATION: No standardization of spatial weights is applied.
Boolean
Output_Table
(Optional)

The table to be created with each distance band and associated z-score result.

Table
Output_Report_File
(Optional)

The PDF file to be created containing a line graph summarizing results.

File

Derived Output

Name | Explanation | Data Type
First_Peak

The first peak z-score.

Double
Max_Peak

The maximum peak z-score.

Double
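
To make the Distance_Method and Row_Standardization options concrete, the following is a minimal sketch of Global Moran's I for a single distance band, written in plain Python without ArcPy. The names `pair_distance` and `morans_i` are hypothetical. The sketch assumes binary weights (1 for every other feature within the band, 0 otherwise) and the textbook form of the index; it returns only the index and its expected value, not the variance, z-score, or p-value, and it should not be read as the tool's actual implementation.

```python
import math


def pair_distance(p, q, method="EUCLIDEAN"):
    """Distance between two (x, y) points using the named method."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    if method == "MANHATTAN":
        return dx + dy          # sum of absolute x and y differences
    return math.hypot(dx, dy)   # straight-line distance


def morans_i(points, values, band, method="EUCLIDEAN", row_standardize=True):
    """Return (Moran's I, expected I) for one fixed distance band."""
    n = len(points)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("all values are equal; the index is undefined")
    cross = 0.0   # sum of w_ij * dev_i * dev_j
    s0 = 0.0      # sum of all weights
    for i in range(n):
        row = [1.0 if i != j and pair_distance(points[i], points[j], method) <= band
               else 0.0 for j in range(n)]
        row_sum = sum(row)
        if row_standardize and row_sum > 0:
            row = [w / row_sum for w in row]  # each weight divided by its row sum
        s0 += sum(row)
        cross += dev[i] * sum(w * d for w, d in zip(row, dev))
    if s0 == 0:
        raise ValueError("no feature has a neighbor within the band")
    return (n / s0) * (cross / denom), -1.0 / (n - 1)


pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
index, expected = morans_i(pts, [1, 2, 3, 4], band=1.0)
print(round(index, 4), round(expected, 4))  # 0.4 -0.3333
```

Running the same function over a list of increasing band values gives one index per distance, which is the pattern the tool reports. The `ValueError` for identical values reflects the usage note that the statistic requires some variation in the input field.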

Code sample

IncrementalSpatialAutocorrelation example 1 (Python window)

The following Python window script demonstrates how to use the IncrementalSpatialAutocorrelation tool.

import arcpy
import arcpy.stats as SS
arcpy.env.workspace = r"C:\ISA"
SS.IncrementalSpatialAutocorrelation("911CallsCount.shp", "ICOUNT", "20", "", "", "EUCLIDEAN",
                                     "ROW_STANDARDIZATION", "outTable.dbf", "outReport.pdf")

IncrementalSpatialAutocorrelation example 2 (stand-alone script)

The following stand-alone Python script demonstrates how to use the IncrementalSpatialAutocorrelation tool.

# Hot Spot Analysis of 911 calls in a metropolitan area
# using the Incremental Spatial Autocorrelation and Hot Spot Analysis Tool

# Import system modules
import arcpy
import arcpy.stats as SS

# Set property to overwrite existing output, by default
arcpy.env.overwriteOutput = True

# Local variables
workspace = r"C:\ISA"

try:
    # Set the current workspace (to avoid having to specify the full path to the feature classes each time)
    arcpy.env.workspace = workspace

    # Copy the input feature class and integrate the points to snap together at 30 feet
    # Process: Copy Features and Integrate
    cf = arcpy.CopyFeatures_management("911Calls.shp", "911Copied.shp","#", 0, 0, 0)
    integrate = arcpy.Integrate_management("911Copied.shp #", "30 Feet")

    # Use Collect Events to count the number of calls at each location
    # Process: Collect Events
    ce = SS.CollectEvents("911Copied.shp", "911Count.shp")

    # Use Incremental Spatial Autocorrelation to get the peak distance
    # Process: Incremental Spatial Autocorrelation
    isa = SS.IncrementalSpatialAutocorrelation(ce, "ICOUNT", "20", "", "", "EUCLIDEAN",
                                               "ROW_STANDARDIZATION", "outTable.dbf", "outReport.pdf")

    # Hot Spot Analysis of 911 Calls
    # Process: Hot Spot Analysis (Getis-Ord Gi*)
    distance = isa.getOutput(2)
    hs = SS.HotSpots(ce, "ICOUNT", "911HotSpots.shp", "Fixed Distance Band",
                     "Euclidean Distance", "None",  distance, "", "")

except arcpy.ExecuteError:
    # If an error occurred when running the tool, print out the error message.
    print(arcpy.GetMessages())
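
The usage notes state that the peak outputs are blank when no peak z-score is identified, and example 2 passes the derived output straight to Hot Spot Analysis. A script may want to guard against that case. The helper below is a hedged sketch: `parse_peak` is a hypothetical name, and it assumes the derived output arrives as a string that is empty (or as None) when no peak is found. It uses no ArcPy calls, so it can be tested on its own.

```python
def parse_peak(raw):
    """Convert a derived peak output to a float, or None when it is blank."""
    if raw is None:
        return None
    text = str(raw).strip()
    if not text:
        return None  # blank output: no peak was identified
    return float(text)


# Hypothetical use inside example 2 (commented out because it needs ArcPy):
# distance = parse_peak(isa.getOutput(2))
# if distance is None:
#     print("No peak found; choose a distance band another way.")
```

Checking for None before calling Hot Spot Analysis keeps a blank value from being passed as the distance band.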

Environments

Special cases

Output Coordinate System

Feature geometry is projected to the Output Coordinate System prior to analysis. All mathematical computations are based on the Output Coordinate System spatial reference. When the Output Coordinate System is based on degrees, minutes, and seconds, geodesic distances are estimated using chordal distances.
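
The chordal distance described on this page, the straight line through the earth between two surface points, can be sketched by converting each latitude and longitude to three-dimensional Cartesian coordinates on an oblate spheroid and measuring the straight-line separation. The sketch assumes the WGS84 ellipsoid and zero height; the names `to_ecef` and `chordal_distance` are hypothetical, and the spheroid and method the tool actually uses are not stated here.

```python
import math

# WGS84 ellipsoid constants (an assumption; the tool's spheroid is not stated).
WGS84_A = 6378137.0                    # semi-major axis, meters
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def to_ecef(lat_deg, lon_deg):
    """Geodetic latitude/longitude (degrees, height 0) to earth-centered x, y, z."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Radius of curvature in the prime vertical at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = n * math.cos(lat) * math.cos(lon)
    y = n * math.cos(lat) * math.sin(lon)
    z = n * (1.0 - WGS84_E2) * math.sin(lat)
    return x, y, z


def chordal_distance(lat1, lon1, lat2, lon2):
    """Straight-line (through-the-earth) distance in meters."""
    return math.dist(to_ecef(lat1, lon1), to_ecef(lat2, lon2))


# One degree of longitude along the equator: the chord is slightly shorter
# than the corresponding arc along the surface.
print(chordal_distance(0.0, 0.0, 0.0, 1.0))
```

Because a chord cuts through the earth rather than following its surface, the gap between chordal and geodesic distance grows with separation, which is why the guidance above limits chordal estimates to study areas within about 30 degrees.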

Licensing information

  • Basic: Yes
  • Standard: Yes
  • Advanced: Yes
