The Spatial Association Between Zones tool measures the degree of spatial association between two regionalizations of the same area, where each regionalization is composed of a set of categories, called zones. The association between the sets of zones is determined by the area overlap between zones of each regionalization. It is highest when the areas of the zones of both regionalizations closely correspond spatially, and it is lowest when the zones of each regionalization overlap substantially with many zones of the other regionalization. The primary output is a global measure of spatial association, a single number ranging from 0 (no correspondence) to 1 (perfect spatial alignment of zones). Using optional outputs, the association can also be calculated and visualized for specific zones of either regionalization or for specific combinations of zones from both regionalizations. Each set of zones can be supplied as polygon features or a raster, and a categorical field is used to indicate the zone of each feature or raster cell.
This tool can be used in the following example applications:
- A forest manager can plan for pest management by comparing a map of forest type to a map of insect disease risk (low, medium, and high). This allows the forest manager to determine which types of forest are most and least at risk for insect-based disease.
- An ecologist has created a classified habitat suitability map for the gray wolf by incorporating variables such as slope, land cover, and water distance, and the ecologist is interested in measuring how well the final suitability map corresponds to each of the variables used to create it. The ecologist can run this tool multiple times to calculate a numerical measure of association between the suitability map and each variable. High correspondence with a single variable and low correspondence with all other variables may indicate that a single variable has disproportionate influence on the final suitability.
Measures of association
The statistic used to measure the spatial association between the zones is called the V-Measure, and it quantifies the amount of information that can be gained about one set of zones by observing values of the other set of zones. For example, if you knew the forest type of a location, how certain would you be of the soil type of the same location? Similarly, if you knew the soil type of a location, how confidently could you predict the forest type? To understand why these two questions are not the same, suppose a particular soil type only appears within a particular type of forest, but that forest type is composed of many types of soils. If you knew a location had this soil type, you would be certain of the forest type of the location because that soil type is only present in one type of forest. However, if you knew a location was this type of forest, you would not be certain of the location's soil type because many soil types appear in that type of forest. The more the forest type is divided into different types of soils, the more difficult it is to predict the soil type of the location. In the worst case, if the area of the forest is evenly divided between every type of soil, you would have no reason to predict one type of soil over another. Thus, to measure the association of forest type and soil type, you must look at the diversity of soil types within fixed forest types, and you must look at diversity of forest type within fixed soil types.
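The asymmetry described above can be seen in a small sketch. The overlap areas and the zone names below are invented for illustration, and `conditional_shares` is a hypothetical helper, not part of the tool. It computes, for each zone of one map, how its area is split among the zones of the other map.

```python
# Hypothetical overlap areas (arbitrary units), keyed by (soil type, forest type).
# The numbers are made up: "sandy" soil occurs only in "pine" forest,
# but "pine" forest is split evenly across three soil types.
overlap_area = {
    ("sandy", "pine"): 30.0,
    ("loam", "pine"): 30.0,
    ("clay", "pine"): 30.0,
    ("peat", "spruce"): 10.0,
}


def conditional_shares(areas, given_index):
    """Return {given zone: {other zone: share of the given zone's area}}."""
    other_index = 1 - given_index
    # Total area of each zone on the "given" side.
    totals = {}
    for key, area in areas.items():
        totals[key[given_index]] = totals.get(key[given_index], 0.0) + area
    # Fraction of each given zone's area that falls in each other zone.
    shares = {}
    for key, area in areas.items():
        given, other = key[given_index], key[other_index]
        shares.setdefault(given, {})[other] = area / totals[given]
    return shares


forest_given_soil = conditional_shares(overlap_area, given_index=0)
soil_given_forest = conditional_shares(overlap_area, given_index=1)

print(forest_given_soil["sandy"])  # {'pine': 1.0}
print(soil_given_forest["pine"])   # three soil types, each about one-third
```

Knowing the soil is sandy fixes the forest type, while knowing the forest is pine leaves three equally likely soil types, which is why both directions must be measured.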
For clarity, the first set of zones is called the input zones, and the second set of zones is called the overlay zones. The V-Measure is calculated by measuring the diversity of the overlay zones within the input zones and the diversity of the input zones within the overlay zones and calculating the harmonic mean of these two values. The three global measures of association are the following:
- Global Measure of Association—The V-Measure, a measure of the overall association between the input and overlay zones. The value ranges from 0 (no association) to 1 (perfect correspondence). The value does not depend on which of the two regionalizations is specified as the input and overlay zones (the input and overlay zones can be reversed, and this value will not change). The statistic is determined by calculating the harmonic mean of the following two global association measures.
- Global Correspondence of Overlay Zones within Input Zones—A measure of the consistency of the categories of the overlay within each of the input zones, ranging from 0 to 1. A value of 1 indicates that every input zone contains only a single overlay zone within it (perfect correspondence of zones). Values close to 0 indicate that the input zones are evenly divided into many categories of the overlay zones (low correspondence to a single overlay zone). This measure is referred to as completeness in the paper referenced in the Additional resources section.
- Global Correspondence of Input Zones within Overlay Zones—A measure of the consistency of the categories of the input zones within each of the overlay zones, ranging from 0 to 1. This value is analogous to the other global correspondence value, but it measures the variability of the input zones within the overlay zones. These two measures switch values if the input zones and overlay zones are reversed. This measure is referred to as homogeneity in the paper referenced in the Additional resources section.
These three global association measures are displayed in the geoprocessing messages and returned as derived outputs. These derived outputs can be referenced as variables in Python scripts or used as inputs to other tools in ModelBuilder. Formulas for each of the measures can be found in the reference in the Additional resources section below.
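As a rough illustration of how the three global measures relate, the sketch below uses the standard entropy-based definition of the V-measure applied to a table of overlap areas. It is an assumption that this matches the tool's computation in every detail; the exact formulas are in the referenced paper. The function names and the example tables are hypothetical.

```python
import math


def entropy(areas):
    """Shannon entropy (base 2) of areas treated as proportions of their sum."""
    total = sum(areas)
    return -sum((a / total) * math.log2(a / total) for a in areas if a > 0)


def global_measures(table):
    """table[i][j] is the overlap area of input zone i with overlay zone j.

    Returns (v_measure, overlay_within_input, input_within_overlay).
    """
    columns = list(zip(*table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in columns]
    total = sum(row_totals)

    # Area-weighted diversity of overlay zones inside each input zone.
    h_overlay_given_input = sum(
        (rt / total) * entropy(row) for row, rt in zip(table, row_totals) if rt > 0
    )
    # Area-weighted diversity of input zones inside each overlay zone.
    h_input_given_overlay = sum(
        (ct / total) * entropy(col) for col, ct in zip(columns, col_totals) if ct > 0
    )

    h_overlay = entropy(col_totals)
    h_input = entropy(row_totals)

    # Normalize by the overall diversity; a single-zone map counts as perfect.
    overlay_within_input = 1.0 if h_overlay == 0 else 1.0 - h_overlay_given_input / h_overlay
    input_within_overlay = 1.0 if h_input == 0 else 1.0 - h_input_given_overlay / h_input

    # Harmonic mean of the two correspondence values.
    s = overlay_within_input + input_within_overlay
    v = 0.0 if s == 0 else 2.0 * overlay_within_input * input_within_overlay / s
    return v, overlay_within_input, input_within_overlay


perfect = global_measures([[10.0, 0.0], [0.0, 20.0]])       # zones coincide
unrelated = global_measures([[10.0, 10.0], [10.0, 10.0]])   # every zone evenly split
```

With these made-up tables, `perfect` evaluates to 1 for all three measures and `unrelated` to 0. Transposing a table swaps the two correspondence values and leaves the harmonic mean unchanged, which mirrors the statement that reversing the input and overlay zones does not change the global measure.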
Investigate zone correspondence spatially
The two global correspondence measures used to calculate the V-Measure can each be partitioned spatially into each intersection of the input and overlay zones. Each of these intersections measures the correspondence of a particular combination of input and overlay zone, such as the correspondence between corn (crop type) and well-drained soil (soil drainage class). These specific combinations can be created using the Output Features parameter or the Output Raster parameter, depending on whether the zones were specified as polygons or rasters. Unlike the global association and correspondence measures, smaller values of the local correspondence measures indicate higher correspondence of zone combinations. The minimum value of 0 indicates perfect correspondence, and the local measures have no upper bound, but they are rarely greater than 2.
When added to a map, the output features or raster appear in a bivariate color scheme that simultaneously shows the correspondence of a specific overlay zone within a specific input zone and the correspondence of the input zone within the overlay zone. Lighter shades of blue indicate higher correspondence of the overlay zone within the input zone, and lighter shades of pink indicate higher correspondence of the input zone within the overlay zone. Areas displayed in the lightest shade of gray indicate the highest correspondence in both directions, and the darkest shade of purple indicates the lowest level of correspondence in both directions.
For the bivariate color scheme to be created, there must be at least three unique values for both the correspondence of overlay zones within input zones and the correspondence of input zones within overlay zones. If either measure has fewer than three unique values, it is recommended to view the aggregated intersections by zone instead.
For example, the image below shows the intersections of temperature regions and climate zones. The highest levels of overall correspondence are in the southernmost and northernmost regions of the country, indicated by the light gray intersection areas. The lowest levels of correspondence are in the middle and western regions.
The intersections also come with two charts to visualize the correspondence of specific zone combinations. The first chart is the Summary of Overlay Zones within Input Zones, a side-by-side bar chart split by each category of the input zones. Each of the side-by-side bar charts shows the intersection area of the overlay zones with the input zone.
For example, in the image below, the input zones are forest type, and the overlay zones are insects and disease risk category. For each forest type, a bar chart is shown for the area of intersection within each of the insects and disease risk categories. The bar chart on the far left shows that the California Mixed Conifer forest type largely intersected with the No Risk category; however, approximately one-third of the area is either At Risk or already has Insects/Disease Present. For the Lodgepole Pine forest type, only about one-third of the area is not at risk, but only a small percentage of the forest already has insects/disease present. For the Pinyon Juniper Woodland forest type on the far right, only a small fraction is at risk or currently has insects or disease present. All three bars are shorter for pinyon juniper woodland than California mixed conifer and lodgepole pine because this type of forest has a smaller overall area within the study domain.
The second chart is the Summary of Input Zones within Overlay Zones, and it is analogous to the first chart, but it shows the area intersections of the input zone within each of the overlay zones.
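The data behind both charts is a grouping of intersection areas, once by input zone and once by overlay zone. The sketch below shows that grouping on a few records. The forest and risk names echo the example above, but the area values are invented and are not read from the chart, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical intersection records: (input zone, overlay zone, area).
# Area values are invented for illustration only.
intersections = [
    ("Lodgepole Pine", "No Risk", 40.0),
    ("Lodgepole Pine", "At Risk", 70.0),
    ("Lodgepole Pine", "Insects/Disease Present", 10.0),
    ("Pinyon Juniper Woodland", "No Risk", 25.0),
    ("Pinyon Juniper Woodland", "At Risk", 3.0),
]


def summarize(records, group_index):
    """Group areas by the zone at group_index, split by the other zone."""
    other_index = 1 - group_index
    summary = defaultdict(dict)
    for record in records:
        group, other, area = record[group_index], record[other_index], record[2]
        summary[group][other] = summary[group].get(other, 0.0) + area
    return dict(summary)


# One group of bars per input zone (first chart).
by_input = summarize(intersections, group_index=0)
# One group of bars per overlay zone (second chart).
by_overlay = summarize(intersections, group_index=1)

for zone, bars in by_input.items():
    print(zone, bars)
```

Each inner dictionary corresponds to one group of side-by-side bars, so `by_input["Lodgepole Pine"]` holds the bar heights for that forest type.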
Aggregate intersections by zone
The maximum number of intersections between the input and overlay zones is equal to the number of categories of the input zones multiplied by the number of categories of the overlay zones. With so many potential combinations and a bivariate color scheme, even with the bar charts, it can be difficult to discern which specific input zones correspond to which specific overlay zones. When there are too many combinations to evaluate each one individually, it is helpful to aggregate the correspondence measures by zone for each of the input and overlay zones. This allows you to identify input and overlay zones that correspond to some zone of the other regionalization, but it won't tell you exactly which zone it corresponds to. The output features or output raster and charts can then be used to identify the exact combination or combinations of the correspondence. These aggregations can be created using the Correspondence of Overlay Zones within Input Zones and Correspondence of Input Zones within Overlay Zones parameters if the input and overlay zones are polygons. If either is supplied as a raster, the aggregations are saved as fields of the output raster. When added to a map, areas displayed in lighter colors indicate areas of higher correspondence.
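To make the idea of a per-zone summary concrete, the sketch below computes one plausible per-zone value: the diversity (entropy) of overlay zones inside each input zone, divided by the overall diversity of the overlay zones. This ratio is an assumption in the spirit of the referenced paper, not the tool's documented aggregation formula, and the table, zone names, and function names are hypothetical. Under this assumption, 0 means the input zone lies entirely within one overlay zone, and larger values mean weaker correspondence.

```python
import math


def entropy(areas):
    """Shannon entropy (base 2) of areas treated as proportions of their sum."""
    total = sum(areas)
    return -sum((a / total) * math.log2(a / total) for a in areas if a > 0)


def per_input_zone_diversity(table, input_names):
    """table[i][j] is the overlap area of input zone i with overlay zone j.

    Returns {input zone name: entropy within the zone / overall overlay entropy}.
    """
    col_totals = [sum(col) for col in zip(*table)]
    h_overlay = entropy(col_totals)
    if h_overlay == 0:
        raise ValueError("the overlay zones need at least two categories with area")
    return {name: entropy(row) / h_overlay for name, row in zip(input_names, table)}


# Made-up overlap areas for three input zones and three overlay zones,
# giving at most 3 * 3 = 9 possible intersections.
table = [
    [50.0, 0.0, 0.0],    # zone A: entirely inside one overlay zone
    [10.0, 10.0, 10.0],  # zone B: evenly split across all overlay zones
    [0.0, 20.0, 0.0],    # zone C: entirely inside one overlay zone
]
by_zone = per_input_zone_diversity(table, ["A", "B", "C"])
print(by_zone)
```

Zones A and C score 0, flagging them as corresponding to some single overlay zone. Zone B scores above 1 because it is more evenly divided than the study area as a whole, which shows why a ratio of this kind has no fixed upper bound.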
Additional resources
For more information and mathematical details, see the following reference:
- Nowosad, J., Stepinski, T. F. (2018). "Spatial association between regionalizations using the information-theoretical V-measure." International Journal of Geographical Information Science. https://doi.org/10.1080/13658816.2018.1511794