Understanding Moran eigenvectors

Many tools in the Spatial Statistics toolbox require the definition of a neighborhood (or conceptualization of spatial relationships) that defines which features are neighbors of each other and assigns a weight between each pair of neighbors. Together, the neighbors and weights define a spatial weights matrix (SWM) that represents the spatial relationships between all pairs of features. For N features, the SWM will have N rows and N columns (a square matrix) in which the rows represent the first feature of the pair, the columns represent the second feature of the pair, and the corresponding value in the matrix represents the weight (or relationship) between the pair. For example, when using a polygon contiguity neighborhood, any two polygons that are connected will have the value 1 in the corresponding cell and have the value 0 if they are not connected.
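As a minimal sketch of the idea, the following Python (NumPy) snippet builds a binary contiguity SWM for a small grid of square cells. The helper name `grid_contiguity_swm` and the 3-by-3 grid are illustrative assumptions and are not part of the toolbox; only shared edges are treated as connections here.

```python
import numpy as np

def grid_contiguity_swm(n_rows, n_cols):
    """Binary edge-contiguity SWM for a regular grid of square cells (hypothetical helper)."""
    n = n_rows * n_cols
    swm = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c                  # feature index of cell (r, c)
            if c + 1 < n_cols:                  # cell to the right shares an edge
                swm[i, i + 1] = swm[i + 1, i] = 1.0
            if r + 1 < n_rows:                  # cell below shares an edge
                swm[i, i + n_cols] = swm[i + n_cols, i] = 1.0
    return swm

swm = grid_contiguity_swm(3, 3)
print(swm.shape)       # one row and one column per feature
print(swm[4])          # weights between the center cell and every other cell
```

Row i and column j of the result hold the weight between features i and j, so the matrix is square and, for contiguity, symmetric.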

Any square, symmetric matrix can be decomposed into N independent (uncorrelated) components based on eigenvectors and eigenvalues, and each component represents an independent factor of the original matrix (similar to how principal component analysis refactors variables into uncorrelated components). These components contain all of the information of the original matrix but are refactored and separated so that they can be individually investigated, often revealing core structures hidden within the original matrix. When the matrix is a SWM, these eigenvectors are called Moran eigenvectors (also called spatial components) and represent core spatial patterns of the features and the SWM.
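The decomposition can be illustrated with NumPy's `numpy.linalg.eigh`, which handles symmetric matrices. This is only a sketch on a made-up four-feature SWM (a chain in which each feature neighbors the next); it decomposes the raw matrix, whereas the tools first double-center the SWM as described under Additional information.

```python
import numpy as np

# Hypothetical SWM for four features arranged in a chain: 0-1-2-3.
swm = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# eigh returns eigenvalues in ascending order; reorder so the largest comes first.
eigenvalues, eigenvectors = np.linalg.eigh(swm)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# The components are mutually orthogonal and together rebuild the original matrix.
gram = eigenvectors.T @ eigenvectors
reconstructed = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T
print(np.allclose(gram, np.eye(4)), np.allclose(reconstructed, swm))
```

Each column of `eigenvectors` assigns one value to each feature, which is what makes the components mappable.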

Each Moran eigenvector assigns a numeric value to each feature, and because they are usually mapped and symbolized to visualize the spatial patterns, they are often called Moran eigenvector maps (MEMs). The first several MEMs (those with the largest eigenvalues and strongest patterns) usually correspond to broad, global spatial patterns, such as north-south or east-west trends, and later MEMs (those with smaller eigenvalues and weaker patterns) usually represent more localized spatial patterns. For example, the following image shows various MEMs for a hexagonal tessellation using polygon contiguity to define the SWM. The top row displays the first four MEMs that represent broader spatial patterns, and the bottom row displays four later MEMs whose patterns are more localized.

Eight MEMs for the same features and SWM

It is important to note that the creation of MEMs only uses the SWM and the locations of the features but does not use any field or variable of the features, so the spatial patterns may not correspond to any variable present at the locations. Instead, they represent potential spatial patterns that can be combined to represent various spatial patterns of spatial variables. For example, if a field of the features has a broad west-to-east trend but also contains small clusters of low and high values, the spatial pattern of the variable could be represented by combining two MEMs: one representing the west-to-east trend and the other representing the clusters. More complicated spatial variables may require many different MEMs to adequately represent their spatial pattern.

MEMs are also closely related to the Moran's I statistic that measures the degree of spatial clustering (autocorrelation) of a spatial variable. The first MEM is the set of values of the features that results in the largest possible Moran's I value (the highest possible spatial autocorrelation). The second MEM is the set of values that results in the largest possible Moran's I value, given that the values must be uncorrelated with the values of the first MEM. The third MEM is the set of values that results in the largest possible Moran's I value, given that it must be uncorrelated with each of the first two MEMs, and so on. For N features, up to N MEMs can be created, though typically fewer than 25 percent of the MEMs represent useful spatial patterns.
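The relationship can be checked numerically. The sketch below assumes the standard Moran's I formula, I = (N / S0) * (z'Wz) / (z'z), where z holds the mean-centered values and S0 is the sum of all weights, and a 4-by-6 grid with edge contiguity; the function names are hypothetical. Under those assumptions, the Moran's I of each MEM equals its eigenvalue multiplied by N / S0.

```python
import numpy as np

def morans_i(values, swm):
    """Global Moran's I of a field for a given SWM."""
    z = values - values.mean()
    return (len(values) / swm.sum()) * (z @ swm @ z) / (z @ z)

def moran_eigenvectors(swm):
    """Eigenvalues and eigenvectors of the double-centered SWM, largest eigenvalue first."""
    n = swm.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    eigenvalues, eigenvectors = np.linalg.eigh(centering @ swm @ centering)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

# Edge-contiguity SWM for a 4-by-6 grid (24 features).
rows, cols = np.divmod(np.arange(24), 6)
manhattan = np.abs(rows[:, None] - rows) + np.abs(cols[:, None] - cols)
swm = (manhattan == 1).astype(float)

eigenvalues, mems = moran_eigenvectors(swm)
n = swm.shape[0]

i_first = morans_i(mems[:, 0], swm)
i_second = morans_i(mems[:, 1], swm)
print(i_first, (n / swm.sum()) * eigenvalues[0])    # the two values agree
print(np.corrcoef(mems[:, 0], mems[:, 1])[0, 1])    # approximately zero

# No randomly generated field exceeds the Moran's I of the first MEM.
rng = np.random.default_rng(0)
random_is = [morans_i(rng.normal(size=n), swm) for _ in range(100)]
print(max(random_is) <= i_first)
```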

The Moran's I value of the first MEM represents the largest possible Moran's I value for any field of the features. In other words, if even a single value changed at a single feature, the Moran's I value would decrease and the variable would be less clustered. This allows you to contextualize the Moran's I values of your actual data. A common misconception is that the largest possible Moran's I value is equal to 1 for any dataset and any SWM, but frequently the largest possible Moran's I value is significantly less than 1 (often as low as 0.6) depending on the features and the SWM. It is also possible for the largest Moran's I to be greater than 1, but this is not common. For example, if a field of your data has a Moran's I value equal to 0.65, that may not seem very high if you assume the largest possible value is equal to 1, but if the first MEM has a Moran's I value equal to 0.7, it means that the field has nearly the highest possible spatial autocorrelation for your SWM. This also helps you choose an appropriate SWM for your analysis because some SWMs will have substantially larger possible Moran's I values than others.
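To make the comparison concrete, the sketch below computes the largest attainable Moran's I for two SWMs built on the same 6-by-6 grid (edge contiguity, and edge-and-corner contiguity) and expresses the 0.65 versus 0.7 example above as a ratio. The function names are hypothetical, and the maximum is taken as the top eigenvalue of the double-centered SWM multiplied by N divided by the sum of the weights.

```python
import numpy as np

def max_morans_i(swm):
    """Largest attainable Moran's I for an SWM (hypothetical helper)."""
    n = swm.shape[0]
    sym = (swm + swm.T) / 2.0    # leaves the quadratic form z'Wz unchanged
    centered = (sym
                - sym.mean(axis=1, keepdims=True)
                - sym.mean(axis=0, keepdims=True)
                + sym.mean())
    # eigvalsh returns eigenvalues in ascending order, so the last one is the largest.
    return (n / swm.sum()) * np.linalg.eigvalsh(centered)[-1]

def relative_morans_i(observed, maximum):
    """Observed Moran's I as a fraction of the largest attainable value."""
    return observed / maximum

rows, cols = np.divmod(np.arange(36), 6)
d_row = np.abs(rows[:, None] - rows)
d_col = np.abs(cols[:, None] - cols)
edges_only = ((d_row + d_col) == 1).astype(float)
edges_corners = (np.maximum(d_row, d_col) == 1).astype(float)

print("edges only:       ", max_morans_i(edges_only))
print("edges and corners:", max_morans_i(edges_corners))

# The example from the text: a field with I = 0.65 when the first MEM has I = 0.7.
print(relative_morans_i(0.65, 0.70))
```

Reading an observed value against the maximum for the same SWM, rather than against 1, is what makes values from different SWMs comparable.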

Uses of MEMs in spatial analysis

MEMs have a wide variety of uses in spatial analysis, and the tools in the Spatial Component Utilities (Moran Eigenvectors) toolset create and use MEMs in various ways:

  • Decompose Spatial Structure (Moran Eigenvectors)—Creates the set of MEMs that have the highest Moran's I value for the input feature class and SWM. The input is a feature class, and the SWM is defined through neighborhood parameters. You can also control how many MEMs will be created by specifying a relative Moran's I threshold value and a maximum number of MEMs. The output is a feature class with the same features as the input with the MEMs included as fields. The MEMs created by the tool can be mapped to visualize the various spatial patterns of the SWM and to assess the maximum Moran's I value of any field of the features for the SWM.

  • Compare Neighborhood Conceptualizations—Suggests a neighborhood and weighting scheme that most accurately represents the spatial patterns of one or more fields of a feature class. The input is a feature class and one or more fields, and the output is a SWM file that can be used in other tools in the Spatial Statistics toolbox that allow using custom SWM files to define neighbors and weights, such as the Bivariate Spatial Association (Lee's L), Hot Spot Analysis (Getis-Ord Gi*), and Cluster and Outlier Analysis (Anselin Local Moran's I) tools. The tool selects the suggested SWM by determining which SWM creates MEMs that most closely resemble the spatial patterns of the input fields.

  • Create Spatial Component Explanatory Variables—Creates and selects a set of MEMs that best represent or explain the spatial patterns of multiple fields of an input feature class. This is useful when you want to create a model (such as an ordinary least-squares regression model) and want to account for the spatial patterns of the variables. You can provide the feature class and all variables (explanatory and dependent) in the tool, and the tool will create MEMs that are useful for representing the spatial patterns of the input fields. Including these MEMs as explanatory variables (in addition to the original explanatory variables) in the prediction model will generally improve the model, providing better estimates of the coefficients of the original explanatory variables and improving the accuracy of the predictions by accounting for the spatial patterns of the variables.

  • Filter Spatial Autocorrelation From Field—Creates and selects a set of MEMs that best remove the autocorrelation from an input field and produce a spatially filtered version of the input field. The input field will be separated into spatial components (the MEMs) and a nonspatial component (the spatially filtered version of the input field). The filtered field maintains the core statistical properties of the field while factoring out spatial effects, such as trends and clusters. The filtered field can then be used in correlation workflows or other analyses in which the effect of space is undesired and adds noise to the underlying signal of the field. For example, you can estimate the correlation between pollution and asthma rates while factoring out the spatial effects associated with both variables to isolate the direct correlation or relationship between the two variables. When the input field is a residual field from a prediction model, the selected MEMs can be used as explanatory variables of the prediction model (in addition to the original explanatory variables) to remove spatial autocorrelation from the residual term of the model. This is useful because an assumption of many prediction models is that the residuals are not spatially autocorrelated.
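The idea behind spatial filtering can be sketched as follows. This is not the tool's selection procedure, which is not described here: the sketch simply takes the first five MEMs of an 8-by-8 grid as a placeholder selection, and all function names and the synthetic field are assumptions. The field is split into a spatial part (its projection onto the selected MEMs) and a filtered part (what remains).

```python
import numpy as np

def moran_eigenvectors(swm):
    """Eigenvalues and eigenvectors of the double-centered SWM, largest eigenvalue first."""
    sym = (swm + swm.T) / 2.0
    centered = (sym
                - sym.mean(axis=1, keepdims=True)
                - sym.mean(axis=0, keepdims=True)
                + sym.mean())
    eigenvalues, eigenvectors = np.linalg.eigh(centered)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

def morans_i(values, swm):
    z = values - values.mean()
    return (len(values) / swm.sum()) * (z @ swm @ z) / (z @ z)

def filter_field(values, mems):
    """Split a field into a spatial part (explained by the MEMs) and a filtered part."""
    z = values - values.mean()
    spatial = mems @ (mems.T @ z)    # MEMs are orthonormal, so this is a projection
    return values - spatial, spatial

# Edge-contiguity SWM for an 8-by-8 grid and a synthetic field with a west-to-east trend.
rows, cols = np.divmod(np.arange(64), 8)
swm = ((np.abs(rows[:, None] - rows) + np.abs(cols[:, None] - cols)) == 1).astype(float)
rng = np.random.default_rng(0)
field = 0.5 * cols + rng.normal(size=64)

eigenvalues, mems = moran_eigenvectors(swm)
selected = mems[:, :5]               # placeholder selection, for illustration only
filtered, spatial = filter_field(field, selected)

print("Moran's I before filtering:", morans_i(field, swm))
print("Moran's I after filtering: ", morans_i(filtered, swm))
```

The filtered field keeps the mean of the original field and is uncorrelated with every selected MEM. The same selected MEMs could be appended as extra columns of explanatory variables in a regression model, which is the use described for residual fields above.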

Additional information

MEMs will only be created or selected if they have positive spatial autocorrelation, meaning that the patterns represent spatial clusters rather than dispersed patterns.

The number of MEMs created will be equal to 25 percent of the number of input features, up to a maximum of 100. The Create Spatial Component Explanatory Variables and Filter Spatial Autocorrelation From Field tools will select from these MEMs in order to most effectively create explanatory variables or filter spatial autocorrelation, respectively.

With the exception of the Decompose Spatial Structure (Moran Eigenvectors) tool (which uses a single specified neighborhood and weighting scheme), the tools will test 28 different SWMs and use the one that creates MEMs that are most effective for the purpose of the tool. The following SWMs are tested:

  • Five distance bands, each with unweighted, Gaussian, and bisquare kernels (15 total). The shortest distance band is the distance that results in at least one neighbor for each feature. The longest distance band is 20 percent of the diagonal extent of the input features. The other three distance bands are created by evenly incrementing between the shortest and longest distance bands. For polygon features, the distance between centroids is used to determine distances and neighbors.
  • Four different numbers of neighbors (8, 16, 32, and 64), each with unweighted, Gaussian, and bisquare kernels (12 total). The bandwidths will be adaptive and equal to the distance to the (K+1)th neighbor, for K neighbors. If there are fewer than K input features, larger numbers of neighbors will be skipped. For example, if there are 50 input features, the three SWMs using 64 nearest neighbors will be skipped. For polygon features, the distance between centroids is used to determine distances and neighbors.
  • For point features, the final SWM is a Delaunay triangulation neighborhood. For polygon features, the final SWM is a contiguity (edges and corners) neighborhood.
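The list above can be enumerated in a few lines of Python. The function and label names below are hypothetical, the shortest distance band is taken as an input rather than computed, and the skip rule follows the literal wording "fewer than K input features"; how the tools treat the boundary case is an assumption.

```python
KERNELS = ("unweighted", "Gaussian", "bisquare")
NEIGHBOR_COUNTS = (8, 16, 32, 64)

def candidate_swms(n_features, shortest_band, diagonal_extent, geometry):
    """List the candidate SWMs as (neighborhood, parameter, kernel) tuples."""
    longest_band = 0.20 * diagonal_extent
    step = (longest_band - shortest_band) / 4           # four even increments, five bands
    bands = [shortest_band + i * step for i in range(5)]

    candidates = [("distance band", band, kernel)
                  for band in bands for kernel in KERNELS]

    for k in NEIGHBOR_COUNTS:
        if n_features < k:                               # assumption: literal reading of the rule
            continue
        candidates += [("number of neighbors", k, kernel) for kernel in KERNELS]

    if geometry == "point":
        candidates.append(("Delaunay triangulation", None, None))
    else:
        candidates.append(("contiguity edges corners", None, None))
    return candidates

many = candidate_swms(1000, shortest_band=100.0, diagonal_extent=5000.0, geometry="polygon")
few = candidate_swms(50, shortest_band=100.0, diagonal_extent=5000.0, geometry="point")
print(len(many), len(few))
```

With 1,000 features all candidates are listed, and with 50 features the three SWMs that use 64 neighbors are dropped, matching the example in the list.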

See How Neighborhood Summary Statistics works and Modeling spatial relationships for more information about each neighborhood and kernel weighting. Alternatively, you can provide a custom .swm file in the Input Spatial Weights Matrix File parameter. If provided, the .swm file will be used to create and select MEMs, and the 28 SWMs above will not be tested.

Before calculating the MEMs, each SWM is adjusted such that the sum of every row and every column is equal to 0 (called double-centering). When the SWM is not symmetric, such as when the neighborhood is defined by a number of neighbors, the SWM is added to its transpose to make it symmetric before double-centering.
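A minimal sketch of this adjustment, assuming a made-up set of 20 random points and a three-nearest-neighbors SWM (the function name `prepare_swm` is hypothetical): the matrix is symmetrized by adding its transpose when needed, and then the row means and column means are subtracted and the grand mean is added back.

```python
import numpy as np

def prepare_swm(swm):
    """Symmetrize an SWM if needed, then double-center it."""
    if not np.allclose(swm, swm.T):
        swm = swm + swm.T                     # add the transpose to make it symmetric
    row_means = swm.mean(axis=1, keepdims=True)
    col_means = swm.mean(axis=0, keepdims=True)
    return swm - row_means - col_means + swm.mean()

# Three-nearest-neighbors SWM for 20 random points; such SWMs are generally not symmetric.
rng = np.random.default_rng(42)
points = rng.random((20, 2))
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)                # a feature is not its own neighbor
nearest = np.argsort(dist, axis=1)[:, :3]
swm = np.zeros((20, 20))
swm[np.arange(20)[:, None], nearest] = 1.0

centered = prepare_swm(swm)
print(np.allclose(centered.sum(axis=0), 0.0), np.allclose(centered.sum(axis=1), 0.0))
```

Subtracting the means this way is equivalent to multiplying the SWM on both sides by the centering matrix I - (1/N)11', which is why the resulting eigenvectors with nonzero eigenvalues have a mean of zero.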

References

The following resources were used to implement the tools:

  • Bauman, David, Thomas Drouet, Stéphane Dray, and Jason Vleminckx. 2018. "Disentangling good from bad practices in the selection of spatial or phylogenetic eigenvectors." Ecography 41, no. 10: 1638-1649. https://doi.org/10.1111/ecog.03380.

  • Bauman, David, Thomas Drouet, Marie-Josée Fortin, and Stéphane Dray. 2018. "Optimizing the choice of a spatial weighting matrix in eigenvector-based methods." Ecology 99, no. 10: 2159-2166. https://doi.org/10.1002/ecy.2469.

  • Blanchet, F. Guillaume, Pierre Legendre, and Daniel Borcard. 2008. "Forward selection of explanatory variables." Ecology 89, no. 9: 2623-2632. https://doi.org/10.1890/07-0986.1.

  • Dray, Stéphane, David Bauman, Guillaume Blanchet, Daniel Borcard, Sylvie Clappe, Guillaume Guenard, Thibaut Jombart, Guillaume Larocque, Pierre Legendre, Naima Madi, and Helene H. Wagner. 2022. "adespatial: Multivariate Multiscale Spatial Analysis." R package version 0.3-16. https://CRAN.R-project.org/package=adespatial.

  • Griffith, Daniel A. 2003. "Spatial Autocorrelation and Spatial Filtering." Advances in Spatial Science. Springer. ISBN 978-3-540-24806-4. https://doi.org/10.1007/978-3-540-24806-4.

  • Griffith, Daniel A., and Pedro R. Peres-Neto. 2006. "Spatial modeling in ecology: the flexibility of eigenfunction spatial analyses." Ecology 87, no. 10: 2603-2613. https://doi.org/10.1890/0012-9658(2006)87[2603:SMIETF]2.0.CO;2.
