The Neighborhood Summary Statistics tool calculates local summary statistics of one or more numeric fields of point or polygon features using neighborhoods. The local statistics include mean (average), median, standard deviation, interquartile range, skewness, and quantile imbalance. The neighborhood types include distance band, number of neighbors, polygon contiguity, Delaunay triangulation, and spatial weights files. You can geographically weight all local statistics using kernels.
The Neighborhood Type parameter has six options that can be used to define the features that are used as the neighbors of each focal feature. For all neighborhood types, the focal feature is used as a neighbor of itself by default. You can choose to exclude the focal feature as a neighbor by unchecking the Include Focal Feature in Calculations parameter.
- Distance band—All features within a specified distance (up to a maximum of 1,000 features) are used as neighbors. The default distance is the shortest distance that ensures each feature includes at least one additional neighbor.
- Number of neighbors—A fixed number of features closest to the focal feature are used as neighbors. This number does not include the focal feature itself, so if the focal feature is included in the calculations, the number of neighbors used in the calculations will be one larger than the specified value.
- Contiguity edges only—Any polygons sharing an edge with the focal feature are used as neighbors. This option is only applicable for polygon features.
- Contiguity edges corners—Any polygons sharing an edge or corner with the focal feature are used as neighbors. This option is only applicable for polygon features.
- Delaunay triangulation—Neighbors are defined by sharing edges or corners in their Delaunay triangulation. Using this option is equivalent to using the Create Thiessen Polygons tool on the points and using the Contiguity edges corners option on the Thiessen polygons. This option is only applicable for point features.
- Get spatial weights from file—Neighbors and weights of each feature are defined by a spatial weights matrix file specified in the Weights Matrix File parameter. You can create the files with the Generate Spatial Weights Matrix and Generate Network Spatial Weights tools.
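The two distance-based options above can be illustrated with a minimal sketch in plain Python. The function names, the `include_focal` flag, and the sample coordinates are hypothetical and are not part of the tool. The 1,000-feature cap on the distance band is not modeled here.

```python
import math

def distance_band_neighbors(points, focal, distance, include_focal=True):
    """Return indices of points within `distance` of points[focal]."""
    fx, fy = points[focal]
    result = []
    for j, (x, y) in enumerate(points):
        if j == focal:
            # The focal feature counts as its own neighbor only if requested.
            if include_focal:
                result.append(j)
            continue
        if math.hypot(x - fx, y - fy) <= distance:
            result.append(j)
    return result

def k_nearest_neighbors(points, focal, k, include_focal=True):
    """Return the k closest other points, plus the focal point if requested."""
    fx, fy = points[focal]
    others = [j for j in range(len(points)) if j != focal]
    others.sort(key=lambda j: math.hypot(points[j][0] - fx, points[j][1] - fy))
    # k does not count the focal feature, so including it yields k + 1 indices.
    return ([focal] if include_focal else []) + others[:k]

# Hypothetical point coordinates; index 0 is the focal feature.
points = [(0, 0), (1, 0), (0, 2), (5, 5), (3, 0)]
print(distance_band_neighbors(points, 0, 2.5))              # [0, 1, 2]
print(k_nearest_neighbors(points, 0, 2))                    # [0, 1, 2]
print(k_nearest_neighbors(points, 0, 2, include_focal=False))  # [1, 2]
```

With the focal feature included, the number-of-neighbors result holds three indices for a requested value of two, which matches the "one larger than the specified value" behavior described above.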
There are six summary statistics that can be calculated for each analysis field, specified with the Local Summary Statistic parameter. The six statistics fall into three classes: measures of centrality, measures of variability (spread), and measures of symmetry. Each class provides two statistics, one traditional and one robust. Robust statistics are statistical measures that are not strongly affected by a small number of outliers.
The All option of the Local Summary Statistic parameter is used by default to calculate all six statistics for each analysis field. The formulas for each statistic can be seen in the Formulas for the local statistics section.
Measures of centrality are used to estimate the middle or center of a distribution of values. You can use these options to smooth values in noisy data. The measures of centrality are the following:
- Mean (traditional)—The arithmetic mean (average) of the values of the analysis field.
- Median (robust)—The 50th percentile of the values of the analysis field. Half of the values fall below and half fall above the median.
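The difference between the traditional and robust measure of centrality shows up when a neighborhood contains a single extreme value. The following sketch uses only the Python standard library, and the neighborhood values are made up for illustration.

```python
from statistics import mean, median

# Hypothetical analysis-field values in one neighborhood.
neighborhood = [10, 11, 12, 13, 14]
# The same neighborhood with one value replaced by an outlier.
with_outlier = [10, 11, 12, 13, 140]

print(mean(neighborhood), median(neighborhood))    # 12 12
print(mean(with_outlier), median(with_outlier))    # 37.2 12
```

The mean moves from 12 to 37.2 while the median stays at 12, which is why the median is labeled robust.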
Measures of variability or spread are used to estimate the range of the distribution of likely values. You can use these options to investigate whether the variability in the analysis fields is similar across the map (called variance stationarity) or whether certain areas have higher local variability than others. The measures of variability are the following:
- Standard deviation (traditional)—The standard deviation of the values of the analysis field.
- Interquartile range (robust)—The range of the middle half of the values of the analysis field (the 75th percentile minus the 25th percentile). Half of the data fall within this range.
Measures of symmetry are used to measure whether the shape of a distribution is symmetric around the middle. These options can be used to investigate the frequency of high and low extreme values. The measures of symmetry are the following:
- Skewness (traditional)—The skewness of the values of the analysis field.
- Quantile imbalance (robust)—A value ranging from -1 to 1 indicating the position of the median relative to the 25th and 75th percentiles. Values close to -1 indicate that the median is close to the 25th percentile, and values close to 1 indicate that the median is close to the 75th percentile. Values close to 0 indicate symmetry where the median is halfway between the 25th and 75th percentiles.
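The two quartile-based statistics, interquartile range and quantile imbalance, can be sketched as follows. This is an unweighted illustration under two assumptions. First, quartiles come from the standard library's "inclusive" method, which may differ from the tool's own p-quantile definition. Second, the imbalance expression is one form chosen to match the sign convention described above (negative when the median is nearer the 25th percentile). The function names and sample values are hypothetical.

```python
from statistics import quantiles

def quartiles(values):
    # Assumption: "inclusive" quartiles from the standard library.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3

def interquartile_range(values):
    q1, _, q3 = quartiles(values)
    return q3 - q1

def quantile_imbalance(values):
    q1, q2, q3 = quartiles(values)
    if q3 == q1:
        # Assumption: report 0 when the quartiles coincide (undefined ratio).
        return 0.0
    # -1 when the median equals Q1, +1 when it equals Q3, 0 when halfway.
    return (2 * q2 - q1 - q3) / (q3 - q1)

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_tailed = [1, 1, 1, 2, 2, 3, 5, 8, 20]

print(interquartile_range(symmetric), quantile_imbalance(symmetric))        # 4.0 0.0
print(interquartile_range(right_tailed), quantile_imbalance(right_tailed))  # 4.0 -0.5
```

Both samples have the same interquartile range, but the second has its median closer to the 25th percentile, so its imbalance is negative.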
Null values in analysis fields
If any of the analysis fields have null values, the null values will be ignored in the calculations by default. You can choose to include the null values by unchecking the Ignore Null Values in Calculations parameter.
When null values are ignored in a calculation, the number of neighbors is adjusted down for all calculations. For example, if two out of six neighbors have null values, the mean is calculated by summing only the four nonnull values and dividing by four.
If null values are included, all statistics will be calculated as null if any of the values used in the calculation are null. For example, if a feature has a null value in an analysis field, all other features that take the feature as a neighbor will calculate null for all summary statistics of the analysis field.
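The two null-handling behaviors can be sketched for the local mean as follows. The function name, the `ignore_nulls` flag, and the use of `None` to stand in for a null value are assumptions made for this illustration.

```python
def local_mean(values, ignore_nulls=True):
    """Return (mean, neighbors_used) for one neighborhood.

    `None` stands in for a null value in the analysis field.
    """
    if ignore_nulls:
        kept = [v for v in values if v is not None]
        if not kept:
            return None, 0
        # The neighbor count is adjusted down to the non-null values only.
        return sum(kept) / len(kept), len(kept)
    if any(v is None for v in values):
        # With nulls included, a single null makes the statistic null.
        return None, len(values)
    return sum(values) / len(values), len(values)

# Six neighbors, two of which have null values.
values = [4.0, None, 6.0, 8.0, None, 2.0]
print(local_mean(values, ignore_nulls=True))   # (5.0, 4)
print(local_mean(values, ignore_nulls=False))  # (None, 6)
```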
The output features are symbolized in the map by the statistic specified in the Local Summary Statistic parameter, calculated for the first provided analysis field (or by distance to neighbors if no analysis fields are provided). If you choose All for the local summary statistic, the features display the results of the Mean statistic. The summary statistics for all other analysis fields are saved as fields in the output features along with copies of all analysis fields. There are also fields indicating the number of neighbors used for each analysis field.
Geographically weighted summary statistics
When the Neighborhood Type parameter is specified as Distance band or Number of neighbors, all statistics can be geographically weighted using the Local Weighting Scheme parameter. If you specify Get spatial weights from file for the Neighborhood Type parameter, the weights specified in the file are used as the weighting scheme. If you apply a weighting scheme, all summary statistics are weighted so that neighbors closer to the focal feature receive higher weights in the calculations. The weights come from a function, called a kernel, that decreases with distance from the focal feature. Two kernel functions are provided in the Local Weighting Scheme parameter.
The kernel functions depend on a bandwidth that controls how quickly the weights diminish with distance. The bandwidth for each kernel is provided in the Kernel Bandwidth parameter. If you do not provide a value, a default is estimated at runtime and displayed as a geoprocessing message. See How Kernel Density works for information about how this default bandwidth is calculated.
For the distance band neighborhood, the kernel bandwidth instead defaults to the same value as the Distance Band parameter.
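The role of the bandwidth can be seen in a small sketch. This section does not name the two kernels, so the Gaussian and bisquare forms below are assumptions: they are common choices in geographically weighted statistics and may not match the tool's exact definitions. All function names and sample values are hypothetical.

```python
import math

def gaussian_kernel(d, bandwidth):
    # Assumed form: the weight decays smoothly and never reaches zero.
    return math.exp(-0.5 * (d / bandwidth) ** 2)

def bisquare_kernel(d, bandwidth):
    # Assumed form: the weight falls to exactly zero at the bandwidth.
    if d >= bandwidth:
        return 0.0
    return (1 - (d / bandwidth) ** 2) ** 2

def normalized_weights(distances, bandwidth, kernel):
    """Kernel weights rescaled to sum to one."""
    raw = [kernel(d, bandwidth) for d in distances]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_mean(values, weights):
    return sum(w * x for w, x in zip(weights, values))

# Hypothetical neighborhood: distances to the focal feature and field values.
distances = [0.0, 1.0, 2.0]
values = [10.0, 20.0, 30.0]

weights = normalized_weights(distances, bandwidth=2.0, kernel=bisquare_kernel)
print(weights)
print(weighted_mean(values, weights))
```

With these inputs the neighbor at distance 2 receives zero weight under the bisquare form, so the weighted mean is pulled toward the values nearest the focal feature and falls below the unweighted mean of 20.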
Formulas for the local statistics
This section contains the formulas for the weighted and unweighted versions of all summary statistics of a single focal feature. These formulas are applied to every input feature for all analysis fields.
In all formulas, i = 1, ..., n are the neighbors of the focal feature (possibly including the focal feature itself) sorted by value (x_i) in increasing order. All weights (w_i) are normalized to sum to one before applying these formulas. The unweighted formula of each statistic is derived by setting w_i = 1/n for all neighbors i.
The following table shows the weighted and unweighted version of each traditional summary statistic.
|Statistic||Weighted formula||Unweighted formula|
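As an illustration of how the unweighted form follows from the weighted one, the mean and standard deviation can be written as below, using the same symbols (x_i, w_i, n) and weights normalized to sum to one. The mean is the standard weighted average. The standard deviation is shown in its uncorrected form, which is an assumption: the tool may apply a bias correction (for example, dividing by n - 1 in the unweighted case) that is not stated in this section.

```latex
% Weighted and unweighted mean (set w_i = 1/n to obtain the second form)
\bar{x}_w = \sum_{i=1}^{n} w_i x_i
\qquad\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

% Weighted and unweighted standard deviation (assumed uncorrected form)
s_w = \sqrt{\sum_{i=1}^{n} w_i \left( x_i - \bar{x}_w \right)^2}
\qquad\qquad
s = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}
```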
All robust statistics depend on the definition of a weighted p-quantile, where p is between 0 and 1. This definition is used to calculate the weighted median (p=0.5), first quartile (p=0.25), and third quartile (p=0.75). The p-quantile for a given p is defined as the following:
- Weighted p-quantile:
- Unweighted p-quantile:
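One common way to define a weighted p-quantile is as the smallest value whose cumulative normalized weight reaches p. The sketch below implements that step-function definition as an assumption. The tool's own definition, which may interpolate between values, is not reproduced here. The function name and sample data are hypothetical.

```python
def weighted_quantile(values, weights, p):
    """Smallest value whose cumulative normalized weight reaches p."""
    pairs = sorted(zip(values, weights))  # sort neighbors by value
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for x, w in pairs:
        cumulative += w / total           # weights normalized to sum to one
        if cumulative >= p - 1e-12:       # small tolerance for rounding
            return x
    return pairs[-1][0]

values = [5, 1, 3, 2]

# Equal weights reproduce an unweighted quantile under this definition.
print(weighted_quantile(values, [1, 1, 1, 1], 0.5))   # 2

# Placing most of the weight on the value 5 pulls the median up to it.
print(weighted_quantile(values, [8, 1, 1, 1], 0.5))   # 5
```

The weighted median, first quartile, and third quartile correspond to p = 0.5, 0.25, and 0.75.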
Using the above definition of the p-quantile, the following table shows the weighted and unweighted version of each robust summary statistic.
|Statistic||Weighted formula||Unweighted formula|
For additional information about geographically weighted summary statistics, see the following reference:
- Brunsdon, C., A.S. Fotheringham, M. Charlton. 2002. "Geographically weighted summary statistics — a framework for localised exploratory data analysis." Computers, Environment and Urban Systems 26 (6): 501-524. ISSN 0198-9715. https://doi.org/10.1016/S0198-9715(01)00009-6.