# How Multiscale Geographically Weighted Regression (MGWR) works

The Multiscale Geographically Weighted Regression (MGWR) tool performs an advanced spatial regression technique that is used in geography, urban planning, and various other disciplines. It evolved from the Geographically Weighted Regression (GWR) model, which uses the explanatory and dependent variables within the neighborhood of a target feature to construct a local linear regression model for interpretation or prediction. GWR assumes that the neighborhood scale of every explanatory variable is identical; MGWR does not, and instead allows the scale of analysis to vary between explanatory variables. MGWR works best with large datasets that contain at least several hundred features and with datasets in which the dependent variable exhibits spatial heterogeneity. To model spatially varying relationships in smaller datasets, other tools may be more appropriate. The current MGWR tool only accepts continuous dependent variables. Do not run the model with binary or count data, because doing so may lead to a biased model and meaningless results.

Much of this topic will explain MGWR using comparisons to other regression methods. It will be helpful to have a basic understanding of Ordinary Least Squares (OLS) regression and a familiarity with the neighborhoods, weighting schemes, and diagnostics of GWR before continuing.

## Regression model selection

OLS, GWR, and MGWR are all linear regression models, but they operate at different spatial scales and make different assumptions about the spatial heterogeneity of a dataset (how much the relationships vary across the study area). OLS is a global model. It assumes that the data-generating process is stationary over space, so a single coefficient can account for the relationship between each explanatory variable and the dependent variable everywhere. GWR is a local model that relaxes the assumption of spatial stationarity by allowing the coefficients to vary over space. However, GWR assumes that all the local relationships operate at the same spatial scale by requiring that all explanatory variables use the same neighborhood. If one explanatory variable uses 20 neighbors for its calculations, all explanatory variables must also use 20 neighbors.

MGWR, however, not only allows the coefficients to vary over space but also allows the scale to vary across different explanatory variables. MGWR does this by using separate neighborhoods for each explanatory variable to account for different spatial scales of the relationships between each explanatory variable and the dependent variable. This allows the combining of explanatory variables that operate on relatively large spatial scales, such as temperature or atmospheric pressure, with variables that operate on smaller spatial scales, such as population density or median income.

MGWR estimates more accurate local coefficients and experiences fewer issues with multicollinearity than GWR. However, the processing time is much longer for MGWR than GWR, and it increases as the size of the data increases, particularly for datasets larger than 10,000 points.

When deciding which model to apply to your data, consider these questions:

• Should my model run at the local or global level?
• Do the explanatory variables in my model operate at different spatial scales?
  • If you suspect that the explanatory variables may operate at different scales and you want to identify and model those different scales, apply MGWR.
• How large is my dataset? How long can I afford to wait for results?
  • If your dataset is very large and you run the MGWR tool, expect to wait longer for the tool to execute. Using common hardware of the early 2020s (16 logical processors and 32 GB memory) and typical parameters, for datasets larger than approximately 10,000 points, the runtime will likely be several hours. For 50,000 points, the runtime will likely be several days. For 100,000 points or more, memory errors are likely to occur.

If you remain unsure about which local model, GWR or MGWR, to apply to your data, begin with MGWR. When MGWR runs, it also performs GWR under specific settings. In the geoprocessing messages, you can find the GWR diagnostics and compare them to the diagnostics from MGWR. Alternatively, you can run multiple tools (OLS, GWR, and MGWR) and use the AICc listed in the geoprocessing messages to compare the models and choose the best one. If you choose to run multiple tools, either scale all the models or leave all the models unscaled to ensure the outputs are comparable.

## Potential applications

MGWR can apply to many multivariate analyses and questions, such as the following:

• How do various features, such as the number of rooms, year built, lot area, and so on, influence the price of a house? Do the relationships significantly differ in different communities?
• How is the distribution of PM2.5 associated with economic variables, such as regional household income, number of cars per household, or percentage of gross domestic product contributed by agriculture?
• In precision agriculture, do soil conditions affect crop yield at the same spatial scale as atmospheric variables such as temperature, humidity, and precipitation?

## Performance and benchmark considerations

Multiple factors affect the runtime of MGWR. The most important is the number of features: the runtime grows cubically with the number of features. The neighborhood size and the number of explanatory variables also affect the runtime by requiring more calculations for each local model. To compute results as quickly as possible, MGWR employs parallel processing and, by default, uses half of the cores (logical processors) available on your machine. For better performance, you can increase the number of cores used through the Parallel Processing Factor environment.

## Tool inputs

There are several methods to provide the spatial scale of the explanatory variables.

### Neighborhood (bandwidth) selection

A key enhancement of MGWR is the ability to vary the bandwidth (neighborhood) of each explanatory variable in the linear regression equation. The neighborhood of an explanatory variable at a target location includes all the locations that will contribute to the estimate of the explanatory variable's coefficient in the local linear regression model. Each neighborhood is defined by a shape and an extent.

There are three options for the Neighborhood Selection Method parameter that will be used to estimate the optimal spatial scale separately for each of the explanatory variables:

• Golden Search—Determines the number of neighbors or distance band for each explanatory variable using the Golden Search algorithm. This method tests multiple combinations of values for each explanatory variable between a specified minimum and maximum value. The procedure is iterative and uses the results from previous values to select each new combination to be tested. The final values selected will have the smallest AICc. For the number of neighbors option, the minimum and maximum are specified using the Minimum Number of Neighbors and Maximum Number of Neighbors parameters. For the distance band option, the minimum and maximum are specified using the Minimum Search Distance and Maximum Search Distance parameters. The minimum and maximum values are shared for all explanatory variables, but the estimated number of neighbors or distance band will be different for each explanatory variable (unless two or more have the same spatial scale). This option takes the longest to calculate, especially for large or high-dimensional datasets.
• Manual Intervals—Determines the number of neighbors or distance band for each explanatory variable by incrementing the number of neighbors or distance band from a minimum value. For the number of neighbors option, the method starts with the value of the Minimum Number of Neighbors parameter. The number of neighbors is then increased by the value of the Number of Neighbors Increment parameter. This increment is repeated a certain number of times, specified using the Number of Increments parameter. For the distance band option, the method uses the Minimum Search Distance, Search Distance Increment, and Number of Increments parameters. The number of neighbors or distance band used by each explanatory variable will be one of the tested values, but the values may be different for each explanatory variable. This option is faster than Golden Search and frequently estimates comparable neighborhoods.
• User Defined—The number of neighbors or distance band that is used by all explanatory variables. The value is specified using the Number of Neighbors or Distance Band parameter. This option provides the most control if you know the optimal values.
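
As a rough illustration of how the first two options pick candidate values, the following Python sketch generates Manual Intervals candidates and runs a golden-section search over a toy score. The function names (`manual_interval_candidates`, `golden_section_search`, `toy_aicc`) and the score curve are hypothetical stand-ins, not the tool's implementation. In the tool, each candidate is scored by the AICc of a fitted model, and the search is carried out separately for each explanatory variable.

```python
import math

def manual_interval_candidates(minimum, increment, n_increments):
    """Candidate neighborhoods: the minimum plus each successive increment.

    Assumption: the minimum itself is tested, giving n_increments + 1 values.
    """
    return [minimum + k * increment for k in range(n_increments + 1)]

def golden_section_search(score, lower, upper, tol=1.0):
    """Minimize a unimodal score on [lower, upper] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = float(lower), float(upper)
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = score(c), score(d)
    while b - a > tol:
        if fc < fd:
            # The minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = score(c)
        else:
            # The minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = score(d)
    return round((a + b) / 2)

def toy_aicc(n_neighbors):
    """Hypothetical AICc curve with its minimum at 120 neighbors."""
    return (n_neighbors - 120) ** 2 / 50.0 + 1000.0

candidates = manual_interval_candidates(minimum=30, increment=30, n_increments=5)
best_manual = min(candidates, key=toy_aicc)
best_golden = golden_section_search(toy_aicc, lower=30, upper=300)
print(candidates, best_manual, best_golden)
```

Manual Intervals can only return one of the listed candidates, while the golden-section search narrows an interval around the minimum. That difference is the trade-off between speed and resolution described above.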

By default, the dependent parameters of each neighborhood selection method apply to all explanatory variables. However, customized neighborhood selection parameters can be provided only for particular explanatory variables using the corresponding override parameter for the neighborhood type and selection method: Number of Neighbors for Golden Search, Number of Neighbors for Manual Intervals, User Defined Number of Neighbors, Search Distance for Golden Search, Search Distance for Manual Intervals, or User Defined Search Distance. To use customized neighborhoods for particular explanatory variables, provide the explanatory variables in the first column of the corresponding override parameter, and provide the customized options of the neighborhood in the other columns. The columns have the same names as the parameters they override; for example, if you are using manual intervals with distance band, the Search Distance Increment column specifies customized values of the Search Distance Increment parameter. On the tool dialog box, customized neighborhood parameters are in the Customized Neighborhood Options parameter category pull-down menu.

### Local weighting scheme

MGWR applies a geographical weighting (kernel) function to the neighbors of each local model so that neighbors closer to the target feature have a larger impact on the results of the local model. The Multiscale Geographically Weighted Regression tool provides two kernel options in the Local Weighting Scheme parameter: Gaussian and Bisquare. To learn more about geographic weighting with kernels, see How Geographically Weighted Regression works. In MGWR, the weighting bandwidth varies across explanatory variables.
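
To make the two kernel shapes concrete, the following Python sketch evaluates common textbook forms of the Gaussian and bisquare kernels at a few distances. The exact constants used by the tool are not stated here, so these formulas and the function names `gaussian_weight` and `bisquare_weight` are assumptions for illustration only.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: weights decay smoothly and never reach exactly zero."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def bisquare_weight(distance, bandwidth):
    """Bisquare kernel: weights fall to zero at the bandwidth and beyond."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2

bandwidth = 10.0  # hypothetical bandwidth, in the same units as the distances
for distance in (0.0, 5.0, 10.0, 15.0):
    print(distance,
          round(gaussian_weight(distance, bandwidth), 3),
          round(bisquare_weight(distance, bandwidth), 3))
```

Because each explanatory variable in MGWR has its own bandwidth, the same neighbor can receive a large weight for one variable and a small or zero weight for another.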

### Scaled data and coefficients

By default, all explanatory variables and the dependent variable are scaled to have a mean of zero and a standard deviation of one (also called Z-score standardization). The estimated coefficients of scaled data values are interpreted in standard deviations; for example, a coefficient of 1.2 means that a one-standard-deviation increase in the explanatory variable is associated with a 1.2-standard-deviation increase in the dependent variable. Because all coefficients use a shared unit, the values can be directly compared to see which explanatory variables have the largest impact on the model. It is generally recommended to scale the variables, but scaling is especially important when the range of values of the variables varies significantly. You can choose to not scale the data by unchecking the Scale Data parameter.

In most linear regression models such as OLS and GWR, coefficients are invariant to linear scaling. This means that if you scale the input data, fit the regression model, and then unscale the result back to the original units, the result will be the same as if you did not scale the data at all. In MGWR, however, scaling and then unscaling will not result in the same model that you would receive from the original data. This is because backfitting is an iterative procedure in which the results of each step depend on the results of previous steps. Using different starting scales will affect the path of tested values and result in distinct MGWR models. The scaled results will typically be most accurate because scaling equalizes the variances of the variables, and the iterative procedure usually converges faster and to more accurate values when each variable contributes equal amounts to the total variance of the data. If the explanatory variables have different variances, the variables with larger variances have more influence on each step of the iterative estimation. In most cases, this influence will negatively affect the final bandwidths and coefficients for the model.
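
The invariance that holds for OLS can be checked with a small Python sketch. It fits a one-variable OLS model on raw and on Z-scored data and then unscales the standardized slope by multiplying it by the ratio of standard deviations. The data values and function names are hypothetical, and the use of the population standard deviation is an assumption. The sketch does not reproduce MGWR, where the iterative backfitting breaks this equivalence.

```python
from statistics import mean, pstdev

def ols_fit(x, y):
    """Simple OLS with one explanatory variable: returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def z_scores(values):
    """Scale values to mean zero and standard deviation one."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical data: lot area (square feet) and house price (dollars).
area = [1200.0, 1500.0, 1700.0, 2100.0, 2600.0, 3000.0]
price = [210000.0, 250000.0, 265000.0, 330000.0, 390000.0, 450000.0]

_, slope_raw = ols_fit(area, price)
_, slope_scaled = ols_fit(z_scores(area), z_scores(price))

# Unscale: multiply by sd(y) / sd(x) to return to dollars per square foot.
slope_unscaled = slope_scaled * pstdev(price) / pstdev(area)
print(slope_raw, slope_scaled, slope_unscaled)
```

The raw and unscaled slopes agree up to floating-point error, while the scaled slope is unit-free and can be compared across variables.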

For ease of interpretation of the scaled results, all coefficients of the tool outputs will contain a scaled value and the value unscaled to original data units. These outputs include extra fields on the output features (also added as layers to the output group layer) and extra rasters in the directory of the Output Coefficient Raster Workspace parameter. When predicting to new locations using the Prediction Locations and Output Predicted Features parameters, all predicted values are unscaled to original data units. See Tool outputs for more information about the outputs.

## Tool outputs

The tool produces a variety of different outputs, including a group layer for various fields of the output features, messages, and charts. The optional outputs include a feature class predicting values at new locations, a neighborhood table, and raster surfaces of each coefficient.

### Group layers and symbology

The default output symbology layer visualizes the standardized residuals of the local linear regression models with a classified color scheme. Examine the patterns of the residuals to determine whether the model is well specified. Residuals for well-specified regression models will be normally distributed and spatially random with no clustering of values. You can run the Spatial Autocorrelation (Global Moran's I) tool on the regression residuals to test whether they are spatially random. Statistically significant high and low clustering of residuals indicates that the MGWR model is not optimal.

The results for all coefficients of each explanatory variable are visualized in separate layers in a group layer. Each feature layer presents a divergent color scheme centered at zero. This allows you to use the color to identify which variables have positive and negative relationships with the dependent variable. The significance of the coefficients of each explanatory variable is also visualized in a feature layer. For points, green halos indicate statistically significant relationships with 95 percent confidence, and gray halos indicate nonsignificant relationships. For polygons, significant relationships are indicated with texture meshes in the polygons. Examine the coefficient layers and the significance layers to better understand the spatial variation in the explanatory variables. You can use your insight from this spatial variation to inform policy. Global policies may work well when variables are globally statistically significant and show little regional variation, but local policies may work better when variables are not globally significant but instead exhibit a positive relationship in some locations and a negative relationship in others.

### Messages and diagnostics

The messages provide information about the MGWR model and its performance. The messages have several sections.

#### Summary Statistics for Coefficient Estimates

The Summary Statistics for Coefficient Estimates section summarizes the mean, standard deviation, minimum, median, and maximum of the coefficient estimates across the study area. The mean value of each coefficient reflects the association between that explanatory variable and the dependent variable. The standard deviation indicates the spatial variation of each coefficient. A small standard deviation means that the coefficient is nearly constant across the study area, which implies that a global model such as OLS may fit that relationship well. If the Scale Data parameter is checked, you can compare the values across explanatory variables. If the Scale Data parameter is not checked, the values of the coefficients cannot be compared directly between explanatory variables because the units may vary.

#### Model Diagnostics

The Model Diagnostics section includes a table displaying several model diagnostics for GWR and MGWR, including R2, Adjusted R2, AICc, residual variance, and number of effective degrees of freedom. For more details on these model diagnostics, see How Geographically Weighted Regression works.

##### Note:

In some cases, the GWR model for comparison may fail to calculate. In this case, only the diagnostics for MGWR are displayed.

You can use the R2 and Adjusted R2 diagnostics to evaluate the goodness of fit of the model to the data. The higher the R2 and Adjusted R2, the better the model fits the data. Evaluate the complexity of the model by the number of explanatory variables and the Effective Degree of Freedom diagnostic. Simpler models have a higher Effective Degree of Freedom and fewer parameters. If a model has too many parameters, it runs the risk of overfitting the data. The AICc diagnostic accounts for both goodness of fit and the complexity of the model. The Multiscale Geographically Weighted Regression tool selects the model with the lowest AICc.
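
The trade-off that AICc captures can be illustrated with a short Python sketch. It uses a form of AICc commonly given in the GWR literature, based on the residual sum of squares and the trace of the hat matrix (the effective number of parameters). Whether the tool computes AICc in exactly this form is an assumption, and the function name and input values are hypothetical.

```python
import math

def aicc_local_regression(n, rss, trace_s):
    """AICc in a form commonly given in the GWR literature.

    n       -- number of features
    rss     -- residual sum of squares
    trace_s -- trace of the hat matrix (effective number of parameters)
    """
    sigma_hat = math.sqrt(rss / n)  # maximum-likelihood estimate of the error sd
    return (2 * n * math.log(sigma_hat)
            + n * math.log(2 * math.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))

# Hypothetical comparison: the second model fits slightly better (lower RSS)
# but uses many more effective parameters.
simple_model = aicc_local_regression(n=500, rss=120.0, trace_s=12.0)
complex_model = aicc_local_regression(n=500, rss=115.0, trace_s=60.0)
print(round(simple_model, 1), round(complex_model, 1))
```

In this made-up comparison, the small improvement in fit does not offset the penalty for the extra effective parameters, so the simpler model has the lower AICc.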

#### Summary of Explanatory Variables and Neighborhoods

The Summary of Explanatory Variables and Neighborhoods section displays the estimated neighborhood and significance levels of each explanatory variable. For neighborhoods based on number of neighbors, the optimal number of neighbors is displayed as a count and as a percentage of the total number of input features. For distance band neighborhoods, the optimal distance bands are displayed along with the distance as a percentage of the diagonal extent of the input features. The percentages of feature or extent are useful for characterizing the spatial scale of the explanatory variables; for example, if an explanatory variable uses 75 percent of the features as neighbors, the local regression models are closer to global models than local models. If another explanatory variable uses only 5 percent of the input features as neighbors, it is a more local model. For all neighborhood types, the count and percentage of local models that were statistically significant at a 95 percent confidence level are displayed for each explanatory variable.

#### Optimal Bandwidths Search History

The Optimal Bandwidths Search History section displays the search history of potential optimal bandwidths along with the AICc value for each set of tested values. The tool begins to search for the optimal bandwidth of each explanatory variable by assigning each variable the same value: the optimal bandwidth of GWR. The tool then adjusts the bandwidth of each variable at each iteration and estimates a new AICc value. As the iterations proceed, the AICc value decreases until it stabilizes or increases, which ends the iterations. The User Defined option typically requires the fewest iterations, while the Golden Search option typically requires the most.

#### Bandwidth Statistics Summary

The Bandwidth Statistics Summary section summarizes the values used to test whether each explanatory variable is statistically significant in each local model. These statistics include the optimal neighborhood (number of neighbors or distance band) of MGWR, the effective number of parameters, the adjusted significance level (alpha), and the adjusted critical value of pseudo-t-statistics. These values are used to create the fields related to statistical significance for each explanatory variable in the output features. The adjusted value of alpha is calculated by dividing the significance level (0.05) by the effective number of parameters; this controls the family-wise error rate (FWER) of the significance of the explanatory variables. The adjusted alpha is used as the significance level in a two-sided t-test with the effective number of degrees of freedom.
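
The adjustment described above amounts to a single division, as the following Python sketch shows. The effective number of parameters (4) is a made-up value. The critical value uses a normal approximation from the standard library rather than the t-distribution with the effective degrees of freedom that the text describes, so it is only a close stand-in when the degrees of freedom are large.

```python
from statistics import NormalDist

def adjusted_alpha(effective_parameters, alpha=0.05):
    """Divide the significance level by the effective number of parameters."""
    return alpha / effective_parameters

def approx_two_sided_critical_value(alpha):
    """Two-sided critical value under a normal approximation.

    Assumption: the t-distribution described in the text is replaced by a
    normal quantile, which is only close for large degrees of freedom.
    """
    return NormalDist().inv_cdf(1 - alpha / 2)

# Hypothetical variable whose coefficient surface uses 4 effective parameters.
alpha_adj = adjusted_alpha(4.0)
critical_adj = approx_two_sided_critical_value(alpha_adj)
critical_unadj = approx_two_sided_critical_value(0.05)
print(alpha_adj, round(critical_adj, 2), round(critical_unadj, 2))
```

A smaller adjusted alpha raises the critical value, so a local coefficient needs a larger pseudo-t statistic to be flagged as significant.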

### Output features

The Multiscale Geographically Weighted Regression tool outputs a feature class that includes local diagnostics for each feature. These diagnostics include regression residuals, standardized residuals, predicted values of the dependent variable, intercept, explanatory variable coefficients, coefficient standard errors, coefficient pseudo-t statistics, coefficient significance, influence, Cook's D, Local R2, and condition number. For more details about these diagnostics, see How Geographically Weighted Regression works.

### Charts

The following charts are added to the Contents pane:

• Relationship between Variables—A scatterplot matrix with up to 19 variables showing scatterplots and correlations between each pair of explanatory variables. Strong correlations between any pair indicate multicollinearity.
• Distribution of Standardized Residual—A histogram of the standardized residuals. The standardized residuals should be normally distributed with mean zero and standard deviation one.
• Standardized Residuals VS. Predicted—A scatterplot between the standardized residuals and their corresponding predicted values. The plot should be random and show no patterns or trends.

### Optional outputs

The following optional outputs can be specified in the Predictions Options and Additional Options drop-down menus:

• The Output Predicted Features parameter value is a feature class with predictions for the dependent variable at the locations specified by the Prediction Locations parameter.
• The Output Neighborhood Table parameter value saves a table containing the values of the Summary Statistics for Coefficient Estimates and Summary of Explanatory Variables and Neighborhoods sections of the messages.
• The Coefficient Raster Workspace parameter specifies a workspace (directory or geodatabase) where rasters of the coefficients are saved. These coefficient raster surfaces can help explain the spatial variation in the coefficients.

## Multicollinearity

Multicollinearity occurs when two or more explanatory variables are strongly correlated in a regression model. This can occur in OLS, Generalized Linear Regression (GLR), GWR, and MGWR models. Multicollinearity may negatively impact the estimation of coefficients and optimal neighborhoods because correlated explanatory variables share mutual information, and the regression model cannot differentiate between the effects of the variables. In moderate cases, the coefficient estimates may be biased and have high uncertainty. In extreme cases, the model may fail to calculate. For example, if a scatterplot matrix shows three variables that are all highly correlated with each other, a regression model using them as explanatory variables will likely encounter issues with multicollinearity.

### Identification and prevention of multicollinearity in MGWR

In an MGWR model, multicollinearity can occur in various situations:

• One of the explanatory variables is spatially clustered.

To prevent this, map each explanatory variable and identify the variables that have very few possible values or in which identical values are spatially clustered. If you observe these types of variables, consider removing them from the model or representing them in a way that increases the range of values. A variable for the number of bedrooms, for example, may be better represented as bedrooms per square foot.

• Two or more explanatory variables are highly correlated globally.

Run a global model using Generalized Linear Regression and examine the Variance Inflation Factor (VIF) for each explanatory variable. If the VIF values are large, for example, 7.5 or higher, global multicollinearity may prevent MGWR from running. In this case, the variables are redundant, so consider removing one of these variables from the model or combining them with other explanatory variables to increase the variation in the values.

• The defined neighborhood is too small.

Even if the two previous scenarios do not occur at the global scale, they may occur in a local model. To test this, check the local condition number in the output feature class. A high local condition number indicates that the results are unstable due to local multicollinearity. If this is the case, rerun the model using a larger number of neighbors or distance band. As a general rule, be skeptical of results for features whose condition number is greater than 30 or null. For shapefiles, null values are represented with the value -1.7976931348623158e+308. The condition number is scale-adjusted to correct for the number of explanatory variables in the model, and this allows you to directly compare the condition number between models that use a different number of explanatory variables.

Checking all these conditions may help with multicollinearity issues but may not always solve them.
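
As a small worked example of the global check, the following Python sketch computes the VIF for a pair of explanatory variables. With only two explanatory variables, the R2 from regressing one on the other equals their squared correlation, so VIF = 1 / (1 - r²). The data values and function names are hypothetical, and the sketch covers only this two-variable case, not the general VIF reported by a regression tool.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def vif_two_variables(x1, x2):
    """VIF = 1 / (1 - R^2); with two variables, R^2 is the squared correlation."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical values: total rooms and bedrooms move almost in lockstep.
rooms = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
bedrooms = [2.0, 2.0, 3.0, 3.0, 4.0, 5.0, 5.0]
vif = vif_two_variables(rooms, bedrooms)
print(round(vif, 1), vif >= 7.5)
```

Here the VIF is well above the 7.5 guideline mentioned above, which suggests that one of the two variables is redundant.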

## Coefficient and bandwidth estimation

The coefficients and bandwidths of the explanatory variables are estimated through a process called backfitting (Breiman et al. 1985). Originally developed for estimating the parameters of generalized additive models, the procedure moves through the explanatory variables one by one and uses a smoothing function to calibrate the coefficient while keeping all other explanatory variables constant. This process repeats over the explanatory variables until the values of the coefficients stabilize and no longer change between successive iterations.

When applied to MGWR (Fotheringham et al. 2017), the smoothing function is a univariate GWR model that regresses the previous residual-adjusted prediction against the single explanatory variable (treating all other explanatory variables as constants). This GWR model uses the same neighborhood selection method (Golden Search, manual intervals, or user-defined) to estimate the spatial scale of the explanatory variable. See the Additional resources section for a complete description of the process.

The backfitting algorithm must begin with initialized values of the coefficients. These initial values are estimated by a GWR model of all the explanatory variables. If this model fails due to multicollinearity, OLS is used instead. If the process does not converge after 25 iterations, the coefficient values of the final iteration are used.
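
The backfitting loop can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the tool's implementation, and it rests on several assumptions: locations lie on a one-dimensional line, each variable has a fixed (user-defined) Gaussian bandwidth, there is no intercept, and the coefficients start at zero rather than from a GWR fit. All names and data are hypothetical.

```python
import math
import random

def kernel_weights(u, target, bandwidth):
    """Gaussian weights of every location relative to one target location."""
    return [math.exp(-0.5 * ((ui - target) / bandwidth) ** 2) for ui in u]

def local_coefficients(u, x, partial, bandwidth):
    """Univariate smoother: a weighted least-squares slope (no intercept)
    of the partial residual on x, estimated at every location."""
    betas = []
    for target in u:
        w = kernel_weights(u, target, bandwidth)
        num = sum(wi * xi * ri for wi, xi, ri in zip(w, x, partial))
        den = sum(wi * xi * xi for wi, xi in zip(w, x))
        betas.append(num / den)
    return betas

def backfit(u, columns, y, bandwidths, max_iter=25, tol=1e-6):
    """Cycle through the variables, refitting each against the residual
    left by the others, until the coefficients stop changing."""
    n = len(y)
    betas = [[0.0] * n for _ in columns]  # simplification: start from zero
    for _ in range(max_iter):
        largest_change = 0.0
        for j, x in enumerate(columns):
            # Remove the current contribution of every other variable from y.
            partial = [
                y[i] - sum(betas[k][i] * columns[k][i]
                           for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            updated = local_coefficients(u, x, partial, bandwidths[j])
            change = max(abs(a - b) for a, b in zip(updated, betas[j]))
            largest_change = max(largest_change, change)
            betas[j] = updated
        if largest_change < tol:
            break
    return betas

# Hypothetical 1D study area: x1 has a constant effect, x2 a local one.
rng = random.Random(0)
n = 80
u = [i / (n - 1) for i in range(n)]
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * a + (1.0 + ui) * b for a, b, ui in zip(x1, x2, u)]

betas = backfit(u, [x1, x2], y, bandwidths=[1.0, 0.15])
rss_initial = sum(yi * yi for yi in y)
rss_final = sum((yi - betas[0][i] * x1[i] - betas[1][i] * x2[i]) ** 2
                for i, yi in enumerate(y))
print(round(rss_initial, 3), round(rss_final, 3))
```

The wide bandwidth for `x1` yields a nearly global coefficient, while the narrow bandwidth for `x2` lets its coefficient follow the local trend, which is the multiscale behavior the procedure is designed to capture.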