Optimized Hot Spot Analysis (Spatial Statistics)

ArcGIS Pro 3.4

Summary

Given incident points or weighted features (points or polygons), creates a map of statistically significant hot and cold spots using the Getis-Ord Gi* statistic. It evaluates the characteristics of the input feature class to produce optimal results.

Learn more about how Optimized Hot Spot Analysis works

Illustration

Optimized Hot Spot Analysis tool illustration

Usage

  • This tool identifies statistically significant spatial clusters of high values (hot spots) and low values (cold spots). It automatically aggregates incident data, identifies an appropriate scale of analysis, and corrects for both multiple testing and spatial dependence. This tool interrogates your data in order to determine settings that will produce optimal hot spot analysis results. If you want full control over these settings, use the Hot Spot Analysis tool instead.

    Note:

    Incident data are points representing events (crime, traffic accidents) or objects (trees, stores) where your focus is on presence or absence rather than a measured attribute associated with each point.

  • The computed settings used to produce optimal hot spot analysis results are reported as messages during tool execution. The associated workflows and algorithms are explained in How Optimized Hot Spot Analysis works.

  • This tool creates a new Output Feature Class with a z-score, p-value and confidence level bin (Gi_Bin) for each feature in the Input Feature Class. It also includes a field (NNeighbors) with the number of neighbors each feature included in its calculations.

  • The output of this tool includes a histogram charting the value of the variable analyzed (either the Analysis Field or the incident count within each polygon). The chart can be accessed by selecting the List By Charts tab on the Contents pane.

  • The Gi_Bin field identifies statistically significant hot and cold spots, corrected for multiple testing and spatial dependence using the False Discovery Rate (FDR) correction method. Features in the +/-3 bins (features with a Gi_Bin value of either +3 or -3) are statistically significant at the 99 percent confidence level; features in the +/-2 bins reflect a 95 percent confidence level; features in the +/-1 bins reflect a 90 percent confidence level; and the clustering for features with 0 for the Gi_Bin field is not statistically significant.
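
The Gi_Bin values described above can be translated into readable labels when post-processing the output table. The following is a minimal sketch; the function name `describe_gi_bin` and the label wording are illustrative assumptions, not part of the tool.

```python
# Confidence level (percent) for each absolute Gi_Bin value, as described above.
CONFIDENCE_BY_BIN = {3: 99, 2: 95, 1: 90}


def describe_gi_bin(gi_bin: int) -> str:
    """Return a plain-language label for a Gi_Bin value between -3 and 3."""
    if gi_bin not in range(-3, 4):
        raise ValueError(f"Gi_Bin must be between -3 and 3, got {gi_bin}")
    if gi_bin == 0:
        return "Not significant"
    kind = "Hot spot" if gi_bin > 0 else "Cold spot"
    return f"{kind} ({CONFIDENCE_BY_BIN[abs(gi_bin)]}% confidence)"


for value in (-3, 0, 2):
    print(value, describe_gi_bin(value))
```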

  • The z-score and p-value fields do not reflect any kind of FDR (False Discovery Rate) correction. For more information on z-scores and p-values, see What is a z-score? What is a p-value?
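
To see why an uncorrected p-value and an FDR-corrected bin can disagree, consider the standard Benjamini-Hochberg procedure. This is a sketch only: whether the tool applies exactly this procedure is an assumption, and the function name and sample p-values are invented for illustration.

```python
def fdr_significant(p_values, alpha):
    """Benjamini-Hochberg: flag the p-values that remain significant at FDR level alpha."""
    n = len(p_values)
    # Indices of the p-values, ordered from smallest (strongest) to largest.
    order = sorted(range(n), key=lambda i: p_values[i])

    # Find the largest 1-based rank k whose p-value satisfies p <= (k / n) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / n * alpha:
            cutoff_rank = rank

    # Every p-value ranked at or below the cutoff is kept as significant.
    keep = [False] * n
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            keep[idx] = True
    return keep


p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]  # invented sample values
uncorrected = [p <= 0.05 for p in p_values]
corrected = fdr_significant(p_values, 0.05)
print(sum(uncorrected), "significant before correction")
print(sum(corrected), "significant after FDR correction")
```

In this sample, four p-values fall below 0.05 on their own, but only the two smallest survive the correction. A feature can therefore have a small p-value and still receive a Gi_Bin of 0.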

  • When the Input Feature Class is not projected (that is, when coordinates are given in degrees, minutes, and seconds) or when the output coordinate system is set to a Geographic Coordinate System, distances are computed using chordal measurements. Chordal distance measurements are used because they can be computed quickly and provide very good estimates of true geodesic distances, at least for points within about 30 degrees of each other. Chordal distances are based on an oblate spheroid. Given any two points on the earth's surface, the chordal distance between them is the length of a line, passing through the three-dimensional earth, to connect those two points. Chordal distances are reported in meters.

    Caution:

    Be sure to project your data if your study area extends beyond 30 degrees. Chordal distances are not a good estimate of geodesic distances beyond 30 degrees.
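
The 30-degree guideline can be checked numerically. The sketch below compares chordal and great-circle distances on a sphere; the spherical earth model and mean radius are simplifying assumptions (the tool bases chordal distances on an oblate spheroid), and all function names are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean earth radius; spherical simplification (assumption)


def to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to 3D Cartesian coordinates in meters."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        EARTH_RADIUS_M * math.cos(lat) * math.cos(lon),
        EARTH_RADIUS_M * math.cos(lat) * math.sin(lon),
        EARTH_RADIUS_M * math.sin(lat),
    )


def chordal_distance(p, q):
    """Straight-line distance through the earth between two (lat, lon) points."""
    return math.dist(to_xyz(*p), to_xyz(*q))


def great_circle_distance(p, q):
    """Surface distance between two (lat, lon) points using the haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


for separation in (1, 10, 30, 90):
    p, q = (0.0, 0.0), (0.0, float(separation))
    ratio = chordal_distance(p, q) / great_circle_distance(p, q)
    print(f"{separation:>3} degrees: chord is {ratio:.4f} of the surface distance")
```

Under this spherical model, the chord underestimates the surface distance by roughly 1 percent at 30 degrees of separation and by roughly 10 percent at 90 degrees.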

  • The Input Features can be points or polygons. With polygons, an Analysis Field is required.

  • If you provide an Analysis Field, it should contain a variety of values. The math for this statistic requires some variation in the variable being analyzed; for example, it cannot solve if all input values are 1.
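
The need for variation follows from the form of the Getis-Ord Gi* statistic, whose denominator includes the standard deviation of the analyzed values. The sketch below computes Gi* with binary fixed-distance weights (each feature counts itself as a neighbor). It is a simplified illustration, not the tool's implementation; the function name, sample data, and error messages are assumptions.

```python
import math


def gi_star(points, values, distance_band):
    """Return a Gi* z-score per point using binary weights within distance_band."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of the analyzed values.
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if math.isclose(s, 0.0, abs_tol=1e-12):
        raise ValueError("Analysis values have no variation; Gi* cannot be computed")

    scores = []
    for pi in points:
        # Weight is 1 for every feature within the distance band, including pi itself.
        w = [1.0 if math.dist(pi, pj) <= distance_band else 0.0 for pj in points]
        sum_w = sum(w)
        sum_w2 = sum(wi * wi for wi in w)
        spread = (n * sum_w2 - sum_w ** 2) / (n - 1)
        if spread <= 0:
            raise ValueError("Distance band includes every feature; choose a smaller band")
        numerator = sum(wi * v for wi, v in zip(w, values)) - mean * sum_w
        scores.append(numerator / (s * math.sqrt(spread)))
    return scores


points = [(float(x), 0.0) for x in range(6)]   # invented sample locations
values = [10, 9, 8, 1, 1, 0]                   # high values on the left, low on the right
print([round(z, 2) for z in gi_star(points, values, 1.0)])
```

Calling `gi_star(points, [1] * 6, 1.0)` raises the "no variation" error, because the standard deviation in the denominator is zero.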

  • With an Analysis Field, this tool is appropriate for all data (points or polygons) including sampled data. In fact, this tool is effective and reliable even in cases where there is oversampling. With lots of features (oversampling), the tool has more information to compute accurate and reliable results. With few features (undersampling), the tool will still do all it can to produce accurate and reliable results, but there will be less information to work with.

    Because the underlying Getis-Ord Gi* statistic used by this tool is asymptotically normal, even when the Analysis Field contains skewed data, results are reliable.

  • With point data you will sometimes be interested in analyzing data values associated with each point feature and will consequently provide an Analysis Field. In other cases you will only be interested in evaluating the spatial pattern (clustering) of the point locations or point incidents. The decision to provide an Analysis Field or not will depend on the question you are asking.

    • Analyzing point features with an Analysis Field allows you to answer questions such as Where do high and low values cluster?
    • The analysis field you select might represent the following:
      • Counts (such as the number of traffic accidents at street intersections)
      • Rates (such as city unemployment, where each city is represented by a point feature)
      • Averages (such as the mean math test score among schools)
      • Indices (such as a consumer satisfaction score for car dealerships across the country)
    • Analyzing point features when there is no Analysis Field allows you to identify where point clustering is unusually intense or sparse (that is, statistically significant). This type of analysis answers questions such as Where are there many points? Where are there very few points?
  • When you don't provide an Analysis Field, the tool will aggregate your points in order to obtain point counts to use as an analysis field. There are three possible aggregation schemes:

    • For Count incidents within fishnet grid and Count incidents within hexagon grid, an appropriate polygon cell size is computed and used to create a fishnet or hexagon polygon mesh which is then positioned over the incident points and the points within each polygon cell are counted. If no Bounding Polygons Defining Where Incidents Are Possible feature layer is provided, the cells with zero points are removed and only the remaining cells are analyzed. When a bounding polygon feature layer is provided, all cells that fall within the bounding polygons are retained and analyzed. The point counts for each polygon cell are used as the analysis field.
      Note:

      Although fishnet grids are the more common aggregation shape used, hexagons may be a better option for certain analyses.

    • For Count incidents within aggregation polygons, you need to provide the Polygons For Aggregating Incidents Into Counts feature layer. The point incidents falling within each polygon will be counted and these polygons with their associated counts will then be analyzed. The Count incidents within aggregation polygons option is an appropriate aggregation strategy when points are associated with administrative units such as tracts, counties, or school districts. You might also use this option if you want the study area to remain fixed across multiple analyses to enhance making comparisons.
    • For Snap nearby incidents to create weighted points, a snap distance is computed and used to aggregate nearby incident points. Each aggregated point is given a count reflecting the number of incidents that were snapped together. The aggregated points are then analyzed with the incident counts serving as the analysis field. The Snap nearby incidents to create weighted points option is an appropriate aggregation strategy when you have many coincident, or nearly coincident, points and want to maintain aspects of the spatial pattern of the original point data.

    Note:

    In many cases you will want to try Snap nearby incidents to create weighted points, Count incidents within fishnet grid, and Count incidents within hexagon grid to see which result best reflects the spatial pattern of the original point data. Fishnet and hexagon solutions can artificially separate clusters of point incidents, but the output may be easier for some people to interpret than weighted point output. Although fishnet grids tend to be the most common aggregation shape used, hexagons may be a better option for certain analyses.

    Caution:

    Analysis of point data without specifying an Analysis Field only makes sense when you have all of the known point incidents and you can be confident there is no bias in the point distribution you are analyzing. With sampled data you will almost always be including an Analysis Field (unless you are specifically interested in the spatial pattern of your sampling scheme).

  • When you select Count incidents within fishnet grid or Count incidents within hexagon grid for the Incident Data Aggregation Method, you may optionally provide a Bounding Polygons Defining Where Incidents Are Possible feature layer. When no bounding polygons are provided, the tool cannot know if a location without an incident should be a zero to indicate that an incident is possible at that location, but didn't occur, or if the location should be removed from the analysis because incidents would never occur at that location. Consequently, when no bounding polygons are provided, only cells with at least one incident are retained for analysis. If this isn't the behavior you want, you can provide a Bounding Polygons Defining Where Incidents Are Possible feature layer to ensure that all locations within the bounding polygons are retained. Fishnet or hexagon cells with no underlying incidents will receive an incident count of zero.
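
The effect of bounding polygons on which cells are analyzed can be sketched as follows. This is a simplified illustration, not the tool's algorithm: a single rectangular extent stands in for the bounding polygons, the grid is anchored at the coordinate origin, and the function name and sample points are invented.

```python
import math
from collections import Counter


def fishnet_counts(points, cell_size, bounding_extent=None):
    """Count points per square cell, keyed by (column, row).

    bounding_extent is an optional (xmin, ymin, xmax, ymax) rectangle that
    stands in for the bounding polygons (a simplifying assumption).
    """
    def cell_of(x, y):
        return (math.floor(x / cell_size), math.floor(y / cell_size))

    if bounding_extent is None:
        # No bounds: only cells containing at least one incident exist.
        return dict(Counter(cell_of(x, y) for x, y in points))

    xmin, ymin, xmax, ymax = bounding_extent
    col0, row0 = cell_of(xmin, ymin)
    col1, row1 = math.ceil(xmax / cell_size), math.ceil(ymax / cell_size)
    # With bounds: every cell inside the extent is retained, starting at zero.
    counts = {(c, r): 0 for c in range(col0, col1) for r in range(row0, row1)}
    for x, y in points:
        # Incidents outside the bounds are excluded from the analysis.
        if xmin <= x < xmax and ymin <= y < ymax:
            counts[cell_of(x, y)] += 1
    return counts


incidents = [(5, 5), (7, 8), (25, 5), (95, 95), (150, 20)]  # invented sample points
unbounded = fishnet_counts(incidents, 10)
bounded = fishnet_counts(incidents, 10, bounding_extent=(0, 0, 100, 100))
print(len(unbounded), "cells analyzed without bounds")
print(len(bounded), "cells analyzed with bounds")
```

Without bounds, only the four occupied cells are analyzed. With the 100 by 100 extent, all 100 cells are analyzed, most of them with a count of zero, and the incident at (150, 20) is dropped.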

  • Any incidents falling outside the Bounding Polygons Defining Where Incidents Are Possible or the Polygons For Aggregating Incidents Into Counts will be excluded from analysis.

  • Instead of letting the tool choose optimal defaults for grid cell size and scale of analysis, the Override Settings can be used to set the Cell Size or Distance Band for the analysis.

  • The Cell Size option allows you to set the size of the grid used to aggregate your point data. You may decide to make each cell in the fishnet grid 50 meters by 50 meters, for example. If you are aggregating into hexagons, the Cell Size is the height of each hexagon and the width of the resulting hexagons will be 2 times the height divided by the square root of 3.

    Cell Size of hexagons versus fishnet grids
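
The hexagon width relationship stated above can be computed directly. The function name `hexagon_width` is illustrative only.

```python
import math


def hexagon_width(cell_size: float) -> float:
    """Width of a hexagon whose height equals cell_size: 2 * height / sqrt(3)."""
    return 2.0 * cell_size / math.sqrt(3.0)


# A Cell Size of 50 meters gives hexagons about 57.7 meters wide.
print(round(hexagon_width(50.0), 1))
```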

  • You should use the Generate Spatial Weights Matrix and Hot Spot Analysis (Getis-Ord Gi*) or the Space Time Pattern Mining tools if you want to identify space-time hot spots. More information about space-time cluster analysis is provided in the Space-Time Cluster Analysis topic and the Space Time Pattern Mining documentation.

  • Map layers can be used to define the Input Feature Class. When using a layer with a selection, only the selected features are included in the analysis.

  • The Output Features layer is automatically added to the table of contents with default rendering applied to the Gi_Bin field. The hot-to-cold rendering is defined by a layer file in <ArcGIS Pro>\Resources\ArcToolBox\Templates\Layers. You can reapply the default rendering, if needed, by using the Apply Symbology From Layer tool.

  • Caution:

    When using shapefiles, keep in mind that they cannot store null values. Tools or other procedures that create shapefiles from nonshapefile inputs may store or interpret null values as zero. In some cases, nulls are stored as very large negative values in shapefiles. This can lead to unexpected results. See Geoprocessing considerations for shapefile output for more information.

Parameters

Label | Explanation | Data Type
Input Features

The point or polygon feature class for which hot spot analysis will be performed.

Feature Layer
Output Features

The output feature class to receive the z-score, p-value, and Gi_Bin results.

Feature Class
Analysis Field
(Optional)

The numeric field (number of incidents, crime rates, test scores, and so on) to be evaluated.

Field
Incident Data Aggregation Method
(Optional)

The aggregation method to use to create weighted features for analysis from incident point data.

  • Count incidents within fishnet grid: A fishnet polygon mesh will overlay the incident point data and the number of incidents within each polygon cell will be counted. If no bounding polygon is provided in the Bounding Polygons Defining Where Incidents Are Possible parameter, only cells with at least one incident will be used in the analysis; otherwise, all cells within the bounding polygons will be analyzed.
  • Count incidents within hexagon grid: A hexagon polygon mesh will overlay the incident point data and the number of incidents within each polygon cell will be counted. If no bounding polygon is provided in the Bounding Polygons Defining Where Incidents Are Possible parameter, only cells with at least one incident will be used in the analysis; otherwise, all cells within the bounding polygons will be analyzed.
  • Count incidents within aggregation polygons: You provide aggregation polygons to overlay the incident point data in the Polygons For Aggregating Incidents Into Counts parameter. The incidents within each polygon are counted.
  • Snap nearby incidents to create weighted points: Nearby incidents will be aggregated together to create a single weighted point. The weight for each point is the number of aggregated incidents at that location.
String
Bounding Polygons Defining Where Incidents Are Possible
(Optional)

A polygon feature class defining where the incident Input Features could possibly occur.

Feature Layer
Polygons For Aggregating Incidents Into Counts
(Optional)

The polygons to use to aggregate the incident Input Features in order to get an incident count for each polygon feature.

Feature Layer
Density Surface
(Optional)

The Density Surface parameter is disabled; it remains as a tool parameter only to support backwards compatibility. The Kernel Density tool can be used if you would like a density surface visualization of your weighted points.

Raster Dataset
Cell Size
(Optional)

The size of the grid cells used to aggregate the Input Features. When aggregating into a hexagon grid, this distance is used as the height to construct the hexagon polygons.

Linear Unit
Distance Band
(Optional)

The spatial extent of the analysis neighborhood. This value determines which features are analyzed together in order to assess local clustering.

Linear Unit
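
The parameters above map to positional arguments when the tool is called from Python. The sketch below assumes the tool is exposed as `arcpy.stats.OptimizedHotSpotAnalysis` and that the aggregation keyword is `COUNT_INCIDENTS_WITHIN_FISHNET_POLYGONS`; verify both against the tool's Python syntax reference before relying on them. The helper names and dataset paths are hypothetical.

```python
def build_ohsa_args(input_features, output_features, analysis_field=None,
                    aggregation_method="COUNT_INCIDENTS_WITHIN_FISHNET_POLYGONS",
                    bounding_polygons=None, aggregation_polygons=None,
                    cell_size=None, distance_band=None):
    """Return the arguments in the same order as the parameter table above."""
    density_surface = None  # disabled parameter, kept only for backwards compatibility
    return [
        input_features,
        output_features,
        analysis_field,
        aggregation_method,
        bounding_polygons,
        aggregation_polygons,
        density_surface,
        cell_size,
        distance_band,
    ]


def run_ohsa(**kwargs):
    """Run the tool; requires the ArcGIS Pro Python environment."""
    import arcpy  # imported here so the helper above works without ArcGIS installed
    return arcpy.stats.OptimizedHotSpotAnalysis(*build_ohsa_args(**kwargs))


# Hypothetical paths; substitute your own data.
args = build_ohsa_args(
    r"C:\data\city.gdb\incidents",
    r"C:\data\city.gdb\incidents_hotspots",
    cell_size="50 Meters",
)
print(args)
```

Leaving Cell Size and Distance Band as `None` lets the tool compute its own values, as described in the usage notes.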

Environments

Special cases

Output Coordinate System

Feature geometry is projected to the Output Coordinate System prior to analysis. All mathematical computations are based on the Output Coordinate System spatial reference. When the Output Coordinate System is based on degrees, minutes, and seconds, geodesic distances are estimated using chordal distances.

Licensing information

  • Basic: Yes
  • Standard: Yes
  • Advanced: Yes