Spatial Outlier Detection (Spatial Statistics)

Summary

Identifies global or local spatial outliers in point features.

A global outlier is a point that is far away from all other points in a feature class. Global outliers are detected by examining distances between each point and one of its closest neighbors (by default, the closest neighbor) and detecting points where the distance is large.
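
The neighbor distance described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the tool's implementation: the function name `kth_neighbor_distances` and the sample coordinates are hypothetical, and distances are computed by brute force in 2D.

```python
import math

def kth_neighbor_distances(points, k=1):
    """Return, for each point, the distance to its k-th nearest other point."""
    distances = []
    for i, p in enumerate(points):
        # Distances to every other point; the point itself is excluded.
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        distances.append(others[k - 1])
    return distances

# Hypothetical coordinates: a tight cluster plus one isolated point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
nearest = kth_neighbor_distances(points, k=1)
third = kth_neighbor_distances(points, k=3)
print(nearest)
print(third)
```

The isolated point has a much larger neighbor distance than the clustered points, which is the property the global outlier test looks for.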

A local outlier is a point that is farther away from its neighbors than would be expected by the density of points in the surrounding area. Local outliers are detected by calculating the local outlier factor (LOF) of each feature. The LOF is a measure that describes how isolated a location is compared to its local neighbors, and a higher LOF value indicates greater isolation. The tool can also be used to produce a raster prediction surface that can be used to estimate whether new features will be classified as outliers given the spatial distribution of the data.
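
As a rough illustration of the LOF idea, the following sketch follows the definitions in Breunig et al. (2000): k-distance, reachability distance, local reachability density, and their ratio. It is a simplified brute-force version, not the tool's implementation. It assumes exactly k neighbors per point (ties at the k-th distance are not expanded) and no coincident points; the function name and sample coordinates are hypothetical.

```python
import math

def local_outlier_factors(points, k=2):
    """Compute a simplified local outlier factor for each point."""
    n = len(points)
    # For each point, (distance, index) pairs to every other point, nearest first.
    ranked = [
        sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for i in range(n)
    ]
    neighbors = [[j for _, j in ranked[i][:k]] for i in range(n)]
    k_distance = [ranked[i][k - 1][0] for i in range(n)]

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbor j.
        return max(k_distance[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [
        1.0 / (sum(reach_dist(i, j) for j in neighbors[i]) / k)
        for i in range(n)
    ]
    # LOF: mean density of the neighbors divided by the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical coordinates: four clustered points and one isolated point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
lof = local_outlier_factors(points, k=2)
print(lof)
```

Points inside the cluster get LOF values close to 1, while the isolated point gets a much larger value, consistent with higher LOF values indicating greater isolation.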

Learn more about how Spatial Outlier Detection works

Illustration

Spatial Outlier Detection tool illustration

Usage

  • This tool identifies points supplied in the Input Features parameter as either spatial outliers or spatial inliers. The Keep Only Spatial Outliers parameter can be used to return only the points identified as outliers.

  • The tool uses a local neighborhood around each feature, specified in the Number of Neighbors parameter. For local outlier detection, all points within the neighborhood are used, and the default is estimated by the tool at run time. For global outlier detection, only the farthest neighbor in the neighborhood is used, and the default is 1 (the closest neighbor). For example, a value of 3 indicates that global outliers are detected using distances to the third nearest neighbor of each point.

  • For local outlier detection, the Percent of Locations Considered Outliers parameter is used to establish a threshold for the LOF to designate each point feature as an outlier or inlier.

    Note:

    Small differences in values for the Percent of Locations Considered Outliers parameter may result in the same count of output features designated as outliers. This can occur when similarities in spatial distribution for features result in the same LOF value for multiple features.

  • The output layer includes two charts. The first is a bar chart that displays counts of outliers and inliers. The second chart is a histogram. For local outlier detection, the histogram displays the distribution of LOF values for all point features and the LOF threshold used to determine whether a feature is an outlier or an inlier. For global outlier detection, the histogram instead shows the distribution of neighbor distances and the associated threshold.

  • If the Input Features parameter value has a z-coordinate, the tool will honor the 3D nature of the data by detecting spatial outliers in 3D space. When added to a scene view, the output features display in 3D to visualize the 3D spatial outliers. If the unit of the z-coordinate (for example, meters) is not defined in a vertical coordinate system, the unit is assumed to be the same as that of the x,y coordinates.

  • The Output Prediction Raster parameter is an optional output that displays the values used to determine whether each cell is an outlier as a continuous surface across the study area. For local outlier detection, the raster contains the LOF value calculated for the cell. For global outlier detection, the raster contains the distance to the nearest neighbor. The output can be used to determine whether future observations are outliers without needing to recalculate the value of the new point. The output can only be created for 2D input features.

    Note:

    The neighbor distances and LOF values of the points will not match the values of the raster cells under each point, even if the points coincide with a cell center of the raster. This is because the feature does not use itself as a neighbor, but the raster cell does use the feature as a neighbor, so each calculation uses different neighbors and produces a different value.

  • For more information about the local outlier factor and optimizing parameters, see the following references:

    • Breunig, M. M., Kriegel, H. P., Ng, R. T., Sander, J. (2000). "LOF: identifying density-based local outliers." Proceedings of the 2000 ACM SIGMOD international conference on Management of data. (pp. 93-104).
    • Xu, Z., Kakde, D., Chaudhuri, A. (2019). "Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection." 2019 IEEE International Conference on Big Data. (pp. 4201-4207).
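
The note above about raster cells and point values can be illustrated with nearest neighbor distances. The sketch below is an assumption-laden illustration, not the tool's code: both function names and the sample coordinates are hypothetical. It shows why a point and a cell centered on that point get different values when the point excludes itself as a neighbor but the cell does not.

```python
import math

def nearest_distance_for_feature(points, index):
    """Nearest neighbor distance for an input point; the point itself is excluded."""
    p = points[index]
    return min(math.dist(p, q) for j, q in enumerate(points) if j != index)

def nearest_distance_for_cell(points, cell_center):
    """Nearest neighbor distance for a raster cell; every input point is a candidate."""
    return min(math.dist(cell_center, q) for q in points)

# Hypothetical points; the cell center coincides with the first point.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
feature_value = nearest_distance_for_feature(points, 0)
cell_value = nearest_distance_for_cell(points, (0.0, 0.0))
print(feature_value, cell_value)
```

Here the point's own value is the distance to the second point, while the coincident cell finds the point itself at distance zero, so the two values differ.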

Parameters

Label | Explanation | Data Type
Input Features

The point features used to build the spatial outlier detection model. Each point will be classified as an outlier or inlier based on its local outlier factor.

Feature Layer
Output Features

The output feature class containing the local outlier factor for each input feature as well as an indicator of whether the point is a spatial outlier.

Feature Class
Number of Neighbors
(Optional)

The number of neighbors used to detect spatial outliers for each input point.

For local outlier detection, the value must be at least 2, and all features within the neighborhood are used as neighbors. If no value is specified, a value is estimated at run time and is displayed as a geoprocessing message.

For global outlier detection, only the farthest neighbor in the neighborhood is used, and the default is 1 (the closest neighbor). For example, a value of 3 indicates that global outliers are detected using distances to the third nearest neighbor of each point.

Long
Percent of Locations Considered Outliers
(Optional)

The percent of locations to be identified as spatial outliers by defining the threshold of the local outlier factor. If no value is specified, a value is estimated at run time and is displayed as a geoprocessing message. A maximum of 50 percent of the features can be identified as spatial outliers.

Double
Output Prediction Raster
(Optional)

The output raster containing the local outlier factors at each cell, which is calculated based on the spatial distribution of the input features.

Raster Dataset
Outlier Type
(Optional)

Specifies the type of outlier that will be detected. A global outlier is a point that is far away from all other points in the feature class. A local outlier is a point that is farther away from its neighbors than would be expected by the density of points in the surrounding area.

  • Global: Global outliers of input points will be detected. This is the default.
  • Local: Local outliers of input points will be detected.
String
Detection Sensitivity
(Optional)

Specifies the sensitivity level that will be used to detect global outliers. The higher the sensitivity, the more points will be detected as outliers.

The sensitivity value will determine the threshold, and any point with a neighbor distance larger than this threshold will be identified as a global outlier. The thresholds are determined using the box plot rule, in which the threshold for high sensitivity is one interquartile range above the third quartile. For medium sensitivity, the threshold is 1.5 interquartile ranges above the third quartile. For low sensitivity, the threshold is two interquartile ranges above the third quartile.

  • Low: Outliers will be detected using low sensitivity. This option will detect the fewest outliers.
  • Medium: Outliers will be detected using moderate sensitivity. This is the default.
  • High: Outliers will be detected using high sensitivity. This option will detect the most outliers.
String
Keep Only Spatial Outliers
(Optional)

Specifies whether the output features will contain all input features or only features identified as spatial outliers.

  • Checked—The output features will only contain features identified as spatial outliers.
  • Unchecked—The output features will contain all input features. This is the default.

Boolean

arcpy.stats.SpatialOutlierDetection(in_features, output_features, {n_neighbors}, {percent_outlier}, {output_raster}, {outlier_type}, {sensitivity}, {keep_type})
Name | Explanation | Data Type
in_features

The point features used to build the spatial outlier detection model. Each point will be classified as an outlier or inlier based on its local outlier factor.

Feature Layer
output_features

The output feature class containing the local outlier factor for each input feature as well as an indicator of whether the point is a spatial outlier.

Feature Class
n_neighbors
(Optional)

The number of neighbors used to detect spatial outliers for each input point.

For local outlier detection, the value must be at least 2, and all features within the neighborhood are used as neighbors. If no value is specified, a value is estimated at run time and is displayed as a geoprocessing message.

For global outlier detection, only the farthest neighbor in the neighborhood is used, and the default is 1 (the closest neighbor). For example, a value of 3 indicates that global outliers are detected using distances to the third nearest neighbor of each point.

Long
percent_outlier
(Optional)

The percent of locations to be identified as spatial outliers by defining the threshold of the local outlier factor. If no value is specified, a value is estimated at run time and is displayed as a geoprocessing message. A maximum of 50 percent of the features can be identified as spatial outliers.

Double
output_raster
(Optional)

The output raster containing the local outlier factors at each cell, which is calculated based on the spatial distribution of the input features.

Raster Dataset
outlier_type
(Optional)

Specifies the type of outlier that will be detected. A global outlier is a point that is far away from all other points in the feature class. A local outlier is a point that is farther away from its neighbors than would be expected by the density of points in the surrounding area.

  • GLOBAL: Global outliers of input points will be detected. This is the default.
  • LOCAL: Local outliers of input points will be detected.
String
sensitivity
(Optional)

Specifies the sensitivity level that will be used to detect global outliers. The higher the sensitivity, the more points will be detected as outliers.

The sensitivity value will determine the threshold, and any point with a neighbor distance larger than this threshold will be identified as a global outlier. The thresholds are determined using the box plot rule, in which the threshold for high sensitivity is one interquartile range above the third quartile. For medium sensitivity, the threshold is 1.5 interquartile ranges above the third quartile. For low sensitivity, the threshold is two interquartile ranges above the third quartile.

  • LOW: Outliers will be detected using low sensitivity. This option will detect the fewest outliers.
  • MEDIUM: Outliers will be detected using moderate sensitivity. This is the default.
  • HIGH: Outliers will be detected using high sensitivity. This option will detect the most outliers.
String
keep_type
(Optional)

Specifies whether the output features will contain all input features or only features identified as spatial outliers.

  • KEEP_OUTLIER: The output features will only contain features identified as spatial outliers.
  • KEEP_ALL: The output features will contain all input features. This is the default.
Boolean
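
The box plot rule described for the sensitivity parameter can be sketched as follows. This is a minimal illustration, not the tool's implementation: the function name, the multiplier table, and the sample distances are hypothetical, and the quartile method (`statistics.quantiles` with `method="inclusive"`) is an assumption, since the documentation does not state how the tool computes quartiles.

```python
import statistics

# Interquartile range multipliers taken from the sensitivity description.
IQR_MULTIPLIERS = {"HIGH": 1.0, "MEDIUM": 1.5, "LOW": 2.0}

def outlier_threshold(neighbor_distances, sensitivity="MEDIUM"):
    """Return the third quartile plus a multiple of the interquartile range."""
    # Assumption: inclusive quartiles; the tool's quartile method is not documented here.
    q1, _, q3 = statistics.quantiles(neighbor_distances, n=4, method="inclusive")
    return q3 + IQR_MULTIPLIERS[sensitivity] * (q3 - q1)

# Hypothetical neighbor distances for 15 points.
distances = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5.2, 5.8, 7.5]
counts = {}
for level in ("LOW", "MEDIUM", "HIGH"):
    threshold = outlier_threshold(distances, level)
    # A point is flagged when its neighbor distance is larger than the threshold.
    flagged = [d for d in distances if d > threshold]
    counts[level] = len(flagged)
    print(level, threshold, flagged)
```

With these sample values, the threshold drops as the sensitivity rises, so high sensitivity flags the most distances and low sensitivity flags the fewest.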

Code sample

SpatialOutlierDetection example 1 (Python window)

The following Python window script demonstrates how to use the SpatialOutlierDetection function.

arcpy.stats.SpatialOutlierDetection("Transaction_Locations", 
            "Transactions_SpatialOutliers", 20, 5, 
            "Transactions_OutliersPredictionSurface")

SpatialOutlierDetection example 2 (stand-alone script)

The following stand-alone Python script demonstrates how to use the SpatialOutlierDetection function.

# Import system modules.
import arcpy

try:
    # Set the workspace and input features.
    arcpy.env.workspace = 'C:\\SpatialOutlierDetection\\MyData.gdb'
    inputFeatures = "PM25_AirQualityStations"

    # Set the name of the output features
    outputFeatures = "AirQualityStations_SpatialOutliers"

    # Set the number of neighbors
    numberNeighbors = 8

    # Set the percentage of locations considered outliers
    pcntLocationsAsOutliers = 10

    # Set the output prediction raster
    outputPredictionRaster = "airQualityStations_OutPredictionRaster"


    # Run the Spatial Outlier Detection tool
    arcpy.stats.SpatialOutlierDetection(inputFeatures, outputFeatures, 
            numberNeighbors, pcntLocationsAsOutliers, outputPredictionRaster)

except arcpy.ExecuteError:
    # If an error occurred when running the tool, print the error message.
    print(arcpy.GetMessages())

Environments

Cell Size

This environment only impacts the output raster.

Mask

This environment only impacts the output raster.

Snap Raster

This environment only impacts the output raster.

Extent

This environment only impacts the output raster.

Licensing information

  • Basic: Limited
  • Standard: Limited
  • Advanced: Limited

Related topics