All comparisons are performed by comparing the significance level categories (99% hot, 95% hot, 90% hot, not significant, 90% cold, 95% cold, and 99% cold) of corresponding features and their neighbors in the two input layers. The similarity measures how closely the hot spots, cold spots, and nonsignificant areas of the two hot spot results align spatially. The association (or dependence) measures the strength of the underlying statistical relationship between the hot spot variables (similar to correlation for continuous variables). The distinction between similarity and association is important because two hot spot results are often highly similar (many corresponding features and their neighbors have the same significance level) yet have little association or dependence. In that case, despite the similar significance levels, efforts to influence one variable (such as mitigation efforts) are unlikely to produce changes in the other. Highly similar but unassociated results often occur when both hot spot results are dominated by a single category, such as not significant, or when both results have large clusters of features with the same significance level.
The similarity between the hot spot results is measured by a similarity value between 0 and 1. If many corresponding features in both results have the same significance level, the value will be close to 1. If many corresponding features do not have matching significance levels, the value will be close to 0. The association is measured by a kappa value: strongly associated results will have kappa values close to 1, and unassociated (independent) results will have kappa values close to 0 (or slightly negative). The kappa value is a rescaled version of the similarity value that accounts for spatial clustering and category frequencies in order to isolate the statistical association between the hot spot results. Both values use fuzzy set membership to allow partial matches between corresponding features based on significance level similarity and spatial neighborhoods. For example, 99% hot spots can be considered perfect matches to other 99% hot spots, partial matches to 95% hot spots, and complete mismatches to 99% cold spots. Two corresponding features can also be considered partial matches if the features themselves do not have the same significance level but their neighboring features do.
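The fuzzy categorical matching described above can be sketched in a few lines of Python. This is a minimal illustration that ignores neighbors: each pair of corresponding features contributes the weight between its two categories, and the similarity value is the average contribution. The category labels (`"99H"`, `"NS"`, and so on), the `similarity` and `crisp_weights` functions, and the 0.75 weight are hypothetical and are not the tool's implementation.

```python
# Hypothetical labels for the seven significance level categories, hot to cold.
CATEGORIES = ["99H", "95H", "90H", "NS", "90C", "95C", "99C"]
IDX = {c: i for i, c in enumerate(CATEGORIES)}


def crisp_weights():
    """7x7 weights where a category fully matches only itself (weight 1)."""
    n = len(CATEGORIES)
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def similarity(result1, result2, weights):
    """Average fuzzy match between the categories of corresponding features.

    Position k in result1 and result2 is one pair of corresponding features.
    Neighbors are ignored in this sketch.
    """
    if len(result1) != len(result2) or not result1:
        raise ValueError("Results must be non-empty and equally long")
    total = sum(weights[IDX[a]][IDX[b]] for a, b in zip(result1, result2))
    return total / len(result1)


weights = crisp_weights()
# Hypothetical partial match: 99% and 95% hot spots are "more similar than different".
weights[IDX["99H"]][IDX["95H"]] = weights[IDX["95H"]][IDX["99H"]] = 0.75

r1 = ["99H", "99H", "NS", "95C"]
r2 = ["99H", "95H", "NS", "99H"]
print(similarity(r1, r2, weights))  # (1 + 0.75 + 1 + 0) / 4 = 0.6875
```

The 99% hot/95% hot pair counts as a partial match, and the 95% cold/99% hot pair counts as a complete mismatch.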
The tool calculates a global similarity value and a global kappa value to measure the overall similarity and association between the hot spot results, as well as local versions of both values for each pair of corresponding features. The local values allow you to map the comparisons and explore areas with higher or lower similarity or association than the global values. The output features also include charts and custom symbology that highlight the areas where the hot spot results are most dissimilar and summarize the significance level pairs of all corresponding features.
The Input Hot Spot Result 1 and Input Hot Spot Result 2 parameter values must be the output features of the Hot Spot Analysis (Getis-Ord Gi*) or Optimized Hot Spot Analysis tools. Every feature in each result must be paired with a single feature of the other result so that their significance level categories can be compared. If the features of the two input hot spot results do not spatially align (such as polygons that do not have the same borders), the two feature layers will be intersected before the analysis, and the comparisons will be made on the feature intersections. Use caution when the two hot spot results are polygons of different sizes because the intersection will subdivide large polygons into many smaller polygons and change the frequencies of the significance level categories. At least 20 feature intersections are required to use the tool.
The results of the comparisons are returned through geoprocessing messages, a group layer of the output feature class, and charts.
The messages summarize the overall comparison between the hot spot results and include the following information:
- Similarity Value—A value between 0 and 1 measuring the overall similarity between the hot spot result layers. The value can be interpreted as a fuzzy probability that any pair of corresponding features has the same significance level category.
- Expected Similarity Value—The expected value of the similarity under the assumption that the two hot spot result layers are unassociated (independent). If the similarity value is larger than its expected value, this suggests an underlying dependence between the two maps. The value is mostly informational and is used to scale the similarity value when calculating the kappa value. The value is calculated by pairing each feature with random features in the other hot spot result and calculating the similarity. By pairing each feature with random features (rather than its corresponding feature), the expected value is spatially adjusted to account for spatial clustering and category frequencies in both hot spot results. The Number of Permutations parameter specifies the number of random pairings of each feature, and the expected similarity value is the average of the similarity values of the permutations.
- Spatial Fuzzy Kappa—A measure of the association between the hot spot analysis variables that is calculated by scaling the similarity value by its expected value. Hot spot results that are perfectly associated will have the value 1, and unassociated (independent) results will have a value close to 0. Negative values indicate a negative relationship between the hot spot analysis variables. While the value has no lower bound, the values are rarely less than -3 in practice.
- Summaries of the weights between each hot spot significance level pair.
- Message tables displaying counts and percentages of each hot spot significance level pair. In the tables, the counts and percentages of the significance levels of the second hot spot result layer are broken down by the categories of the first result layer. For example, among 90% significant hot spots in the first result layer, you can see the count and percent that were also 90% significant hot spots in the second result layer, along with the counts and percentages for all other significance level categories. This is especially useful when the two hot spot results represent the same variable measured at different times. In this case, the table allows you to see how the categories transitioned in the time between the measurements.
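The permutation-based expected similarity and the kappa rescaling can be sketched as follows. The docs state only that kappa scales the similarity value by its expected value; the formula (similarity − expected) / (1 − expected) used here is the standard Cohen-style kappa form and is an assumption. The sketch also pairs features by shuffling the second result without neighborhoods, so unlike the tool it does not adjust for spatial clustering. The function names and category labels are hypothetical. The example reproduces the situation described earlier: two results dominated by nonsignificant features are highly similar but unassociated.

```python
import random

CATEGORIES = ["99H", "95H", "90H", "NS", "90C", "95C", "99C"]  # hypothetical labels
IDX = {c: i for i, c in enumerate(CATEGORIES)}
CRISP = [[1.0 if i == j else 0.0 for j in range(7)] for i in range(7)]


def similarity(r1, r2, weights):
    """Average category weight over corresponding features (no neighbors)."""
    return sum(weights[IDX[a]][IDX[b]] for a, b in zip(r1, r2)) / len(r1)


def expected_similarity(r1, r2, weights, permutations=999, seed=None):
    """Average similarity when r1 is paired with randomly shuffled features of r2."""
    rng = random.Random(seed)
    shuffled = list(r2)
    total = 0.0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        total += similarity(r1, shuffled, weights)
    return total / permutations


def fuzzy_kappa(sim, expected):
    """Assumed kappa form: 1 for perfect agreement, near 0 under independence."""
    if expected >= 1.0:
        raise ValueError("Kappa is undefined when the expected similarity is 1")
    return (sim - expected) / (1.0 - expected)


# Both results are 95% not significant, but their hot spots are in different places.
r1 = ["99H"] * 5 + ["NS"] * 95
r2 = ["NS"] * 95 + ["99H"] * 5

sim = similarity(r1, r2, CRISP)                   # 0.9: high similarity
exp = expected_similarity(r1, r2, CRISP, seed=1)  # roughly 0.905
kappa = fuzzy_kappa(sim, exp)                     # roughly -0.05: no association
print(round(sim, 3), round(exp, 3), round(kappa, 3))
```

Under this form, the lowest possible kappa is −expected / (1 − expected), which grows without limit as the expected similarity approaches 1. This is consistent with kappa having no lower bound.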
The output features contain fields for the similarity value, expected similarity value, kappa value, and significance level categories of each pair of corresponding features. When the tool is run in a map, a group layer containing three layers is added so that you can explore the similarity, association, and significance level pairs spatially. The first layer displays the similarity values classified into five equal intervals between 0 and 1, with lower similarity values in darker colors to emphasize the most dissimilar areas. The second layer displays the spatial fuzzy kappa values symbolized with six equal-interval classes. The third layer displays each significance level combination with custom symbology that identifies features where one input hot spot result was a statistically significant hot spot and the other was a statistically significant cold spot. To reduce the number of combinations, the custom symbology does not distinguish between 90%, 95%, and 99% significance.
The final layer also has a heat chart and customized bar chart to further investigate the significance level pairs. These charts display the same information as the tables in the messages, but the charts are colored by the counts and percentages for ease of interpretation. You can also use selections between the charts and map to, for example, select all features that were 99% hot spots in one result and 99% cold spots in the other result, indicating the largest possible differences.
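The cross-tabulation behind the message tables and charts, along with a selection of 99% hot/99% cold pairs, can be sketched as follows. Rows are categories of the first result, and the percentages describe how the features in each first-result category are distributed across the second result's categories. The category labels and the `crosstab` and `select_pairs` functions are hypothetical.

```python
from collections import Counter

CATEGORIES = ["99H", "95H", "90H", "NS", "90C", "95C", "99C"]  # hypothetical labels


def crosstab(r1, r2):
    """Map each (result 1, result 2) pair to (count, percent of the result 1 category)."""
    pair_counts = Counter(zip(r1, r2))
    row_totals = Counter(r1)
    table = {}
    for a in CATEGORIES:
        for b in CATEGORIES:
            n = pair_counts[(a, b)]
            pct = 100.0 * n / row_totals[a] if row_totals[a] else 0.0
            table[(a, b)] = (n, pct)
    return table


def select_pairs(r1, r2, cat1, cat2):
    """Indices of corresponding features that are cat1 in result 1 and cat2 in result 2."""
    return [k for k, (a, b) in enumerate(zip(r1, r2)) if a == cat1 and b == cat2]


r1 = ["99H", "99H", "99H", "99H", "NS"]
r2 = ["99H", "95H", "99H", "99C", "NS"]
table = crosstab(r1, r2)
print(table[("99H", "99H")])  # (2, 50.0)
print(table[("99H", "99C")])  # (1, 25.0)
print(select_pairs(r1, r2, "99H", "99C"))  # [3]: the largest possible difference
```

Reversing the argument order of `crosstab` transposes the table, which mirrors what happens when the input layers are swapped.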
The Similarity Weighting Method parameter defines the similarity between each combination of significance level categories using fuzzy set membership. Each weight is a value between 0 and 1 that indicates how similarly the categories will be treated when performing comparisons. For example, you can define a weight of 0.75 between the 99% hot and 95% hot categories to indicate that they are not exactly the same, but they are more similar than they are different.
The default Fuzzy weights option weights categories by the closeness of the significance level (determined by critical value ratios). Other options allow you to combine categories by assigning a weight value of 1 between them. For example, the Combine 95% and 99% significant option combines 99% hot and 95% hot into a single category, combines 99% cold and 95% cold, and combines 90% hot, not significant, and 90% cold. This option treats all hot (or cold) spots at or above 95% significance as being the same (statistically significant) and all features below 95% significance as being the same (not statistically significant). This is useful when you intended to perform the two hot spot analyses at a 95% significance level, and you want to treat all 90% significant hot and cold spots as if they are not significant. The Reverse hot and cold relationships option assigns large similarity weights between hot and cold spots. For example, 99% hot spots are considered perfectly similar to 99% cold spots and completely dissimilar to other 99% hot spots. This option is useful for measuring the similarity and association between variables that have a negative relationship, such as comparing hot spots of infant mortality to cold spots of median income.
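The combine and reverse options can be sketched as transformations of a 7 by 7 weight matrix. This sketch rests on two assumptions. First, it starts from identity weights and leaves weights between groups at 0, whereas the tool's actual between-group weights are not stated here. Second, it assumes that reversing hot and cold relationships amounts to mirroring the second result's category axis. All names are hypothetical.

```python
CATEGORIES = ["99H", "95H", "90H", "NS", "90C", "95C", "99C"]  # hypothetical labels
IDX = {c: i for i, c in enumerate(CATEGORIES)}


def crisp_weights():
    """Identity weights: a category fully matches only itself."""
    n = len(CATEGORIES)
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def combine(base, groups):
    """Copy base and set weight 1 between all categories in the same group."""
    w = [row[:] for row in base]
    for group in groups:
        for a in group:
            for b in group:
                w[IDX[a]][IDX[b]] = 1.0
    return w


def reverse_hot_cold(base):
    """Mirror the second result's category axis so hot spots are compared with cold spots."""
    n = len(CATEGORIES)
    return [[base[i][n - 1 - j] for j in range(n)] for i in range(n)]


# Groups described for the Combine 95% and 99% significant option.
COMBINE_95_99 = [("99H", "95H"), ("99C", "95C"), ("90H", "NS", "90C")]

combined = combine(crisp_weights(), COMBINE_95_99)
reversed_w = reverse_hot_cold(crisp_weights())
print(combined[IDX["90H"]][IDX["90C"]])    # 1.0: both treated as not significant
print(reversed_w[IDX["99H"]][IDX["99C"]])  # 1.0: 99% hot matches 99% cold
print(reversed_w[IDX["99H"]][IDX["99H"]])  # 0.0: 99% hot no longer matches itself
```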
The Custom weights option allows you to define custom similarity weights to merge categories and define your preferences. You can provide the custom weights in the Custom Similarity Weights parameter. The parameter displays as a pop-out matrix with the 49 (7 by 7) significance level combinations. To specify a weight between a category pair, type the value into the associated cell and press Enter. You can export the custom weights to a table from the pop-out dialog box so that they can be reused later with the Get weights from table option.
Similarity weights only affect the calculation of the similarity and kappa values. Even if significance level categories are combined using similarity weights, the message tables, output layer symbology, and charts will treat them as separate categories.
When large proportions of both hot spot results are not significant, the similarity value will be large because the nonsignificant areas match. However, if the nonsignificant features are not of research interest, you may not want the similarity and kappa values to mostly reflect the abundance of nonsignificant areas in both results. Use the Exclude Nonsignificant Features parameter to exclude a pair of corresponding features from the comparisons when neither feature of the pair is statistically significant. When these pairs are excluded, the tool calculates conditional similarity and kappa values that compare only the statistically significant hot and cold spots, so the values reflect the similarity and association among those hot and cold spots alone.
If any significance level categories are combined with the nonsignificant category by providing a relative similarity weight of 1, those categories will also be excluded from the comparisons.
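The exclusion rule can be sketched as follows. A category is treated as nonsignificant if its weight with the not significant category is 1. A pair is dropped only when both of its features fall into that set, and the similarity is averaged over the remaining pairs. Neighbors are ignored, and the function names and category labels are hypothetical.

```python
CATEGORIES = ["99H", "95H", "90H", "NS", "90C", "95C", "99C"]  # hypothetical labels
IDX = {c: i for i, c in enumerate(CATEGORIES)}


def crisp_weights():
    n = len(CATEGORIES)
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def nonsignificant_like(weights):
    """Categories with weight 1 to the not significant category (including NS itself)."""
    ns = IDX["NS"]
    return {c for c in CATEGORIES if weights[IDX[c]][ns] == 1.0}


def conditional_similarity(r1, r2, weights):
    """Similarity after dropping pairs where both features are effectively not significant."""
    excluded = nonsignificant_like(weights)
    kept = [(a, b) for a, b in zip(r1, r2) if not (a in excluded and b in excluded)]
    if not kept:
        raise ValueError("No pairs remain after excluding nonsignificant features")
    return sum(weights[IDX[a]][IDX[b]] for a, b in kept) / len(kept)


r1 = ["NS"] * 8 + ["99H", "99H"]
r2 = ["NS"] * 8 + ["99H", "99C"]
w = crisp_weights()
print(conditional_similarity(r1, r2, w))  # 0.5: only the two significant pairs count

# Combining 90% hot with not significant also excludes 90H/NS pairs.
w[IDX["90H"]][IDX["NS"]] = w[IDX["NS"]][IDX["90H"]] = 1.0
print(conditional_similarity(r1 + ["90H"], r2 + ["NS"], w))  # still 0.5
```

With all ten pairs included, the same identity weights give (8 + 1 + 0) / 10 = 0.9. Most of that value comes from the matching nonsignificant features.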
If either of the input hot spot result layers contains overlapping polygons, the overlaps will be intersected into new features. This can prevent the similarity value from equaling 1 even for result layers with identical significance level categories. Use the XY Tolerance environment to remove unintended overlaps, such as those caused by geocoding errors. Review the number of output features to determine whether there are more intersections than expected.
The Number of Neighbors parameter specifies the number of additional neighboring features that will be used for distance similarity. As with the similarity weighting method, distance similarity allows partial matches when the features themselves do not have the same significance level but other features in their neighborhood do. Because hot spot analysis is a spatial method that uses local neighborhoods, the significance level of each feature characterizes the values of the feature and its closest neighbors, not just the feature alone. In this sense, a neighboring feature with a matching significance level should contribute partially to the similarity of the feature.
Partial similarity through neighbors is incorporated using a distance weight based on the ordering of the neighbors. The feature receives a distance weight of 1, and the weights decrease consistently for each additional neighbor. The overall similarity between any two features is their categorical similarity (from the similarity weighting method) multiplied by their distance similarity.
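Neighbor weighting can be sketched as follows. The docs say only that the weights decrease for each additional neighbor, so the linear decrease used here is an assumption. Taking the best weighted match within the neighborhood in each direction and then the smaller of the two directions follows the fuzzy kappa approach for map comparison; it is also an assumption about how the tool combines the values. Neighbor lists are given directly as hypothetical input, and all names are hypothetical.

```python
CATEGORIES = ["99H", "95H", "90H", "NS", "90C", "95C", "99C"]  # hypothetical labels
IDX = {c: i for i, c in enumerate(CATEGORIES)}
CRISP = [[1.0 if i == j else 0.0 for j in range(7)] for i in range(7)]


def distance_weights(k):
    """Weight 1 for the feature itself, then an assumed linear decrease over k neighbors."""
    return [1.0 - n / (k + 1) for n in range(k + 1)]


def local_similarity(i, r1, r2, neighbors, weights, k):
    """Fuzzy similarity of feature i using its k nearest neighbors, in both directions.

    neighbors[i] lists the other features ordered from nearest to farthest.
    Each candidate contributes categorical similarity times distance weight.
    """
    dw = distance_weights(k)
    cands = [i] + neighbors[i][:k]
    # Best match for r1[i] among nearby r2 features, and for r2[i] among nearby r1 features.
    r1_to_r2 = max(weights[IDX[r1[i]]][IDX[r2[j]]] * dw[n] for n, j in enumerate(cands))
    r2_to_r1 = max(weights[IDX[r1[j]]][IDX[r2[i]]] * dw[n] for n, j in enumerate(cands))
    return min(r1_to_r2, r2_to_r1)


r1 = ["99H", "NS", "NS", "NS"]
r2 = ["NS", "99H", "NS", "NS"]
neighbors = {0: [1, 2, 3], 1: [0, 2, 3], 2: [1, 3, 0], 3: [2, 1, 0]}
print(distance_weights(3))                               # [1.0, 0.75, 0.5, 0.25]
print(local_similarity(0, r1, r2, neighbors, CRISP, 0))  # 0.0: the features alone do not match
print(local_similarity(0, r1, r2, neighbors, CRISP, 1))  # 0.5: the 99H one neighbor away counts partially
```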
Changing the order of the input hot spot results will not affect the similarity values, but the expected similarity and kappa values will change slightly because of randomness in the permutations. The axes of the message tables and charts will also be reversed. Because the messages and charts display the significance level categories of the second hot spot result broken down by the categories of the first result, reversing the order of the input layers lets you see the categories of the first result broken down by the categories of the second instead. This can be easier to interpret in some cases.