The Multiscale Geographically Weighted Regression (MGWR) tool performs an advanced spatial regression technique that is used in geography, urban planning, and various other disciplines. It evolved from the Geographically Weighted Regression (GWR) model that uses explanatory and dependent variables within the neighborhood of a target feature to construct a local linear regression model for interpretation or prediction.
The main motivation of GWR is that it may be too restrictive to use a single regression model for a large geographical region. Instead, GWR allows a different regression model at each spatial location, with the regression coefficients changing smoothly over the region. This means that at different locations in the study area, the explanatory variables have different impacts on the dependent variable. GWR does this by creating a weighted regression model for each spatial feature using the explanatory and dependent variables of the feature and its spatial neighbors. Neighbors that are closer to the feature receive higher weights and have larger influence on the local regression model.
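The idea of fitting a separate weighted regression at each location can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: locations are one-dimensional, there is a single explanatory variable, and the weights come from a Gaussian kernel with a hypothetical bandwidth. The function names are invented for this sketch and are not part of the tool.

```python
import math

def gaussian_weight(distance, bandwidth):
    # Closer neighbors receive weights nearer to 1 (assumed kernel form).
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_fit(target, locations, x, y, bandwidth):
    """Weighted least squares fit of y = b0 + b1 * x centered on `target`."""
    w = [gaussian_weight(abs(loc - target), bandwidth) for loc in locations]
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Synthetic data: the true slope drifts from 1.0 in the west to about 3.0 in the east.
locations = [float(i) for i in range(40)]
x = [float((i * 7) % 10) for i in range(40)]
y = [(1.0 + 0.05 * loc) * xi for loc, xi in zip(locations, x)]

west = local_fit(5.0, locations, x, y, bandwidth=3.0)
east = local_fit(35.0, locations, x, y, bandwidth=3.0)
print("west slope:", round(west[1], 2), "east slope:", round(east[1], 2))
```

Because each fit weights nearby observations most heavily, the estimated slope near the eastern end is larger than the one near the western end, which is the kind of spatial variation a single global model would average away.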
MGWR is an extension of GWR that allows the neighborhood around each spatial feature to vary between each explanatory variable. This means that for some explanatory variables, the neighborhood can be larger or smaller than for other variables. Allowing different neighborhoods for different explanatory variables is important because the relationships between the explanatory variables and dependent variable may operate on different spatial scales: coefficients of some variables may change gradually over the study area, while coefficients of other variables change quickly. Matching the neighborhood of each explanatory variable to the spatial scale of the explanatory variable is what allows MGWR to more accurately estimate the coefficients of the local regression model.
MGWR excels with large datasets that contain at least several hundred features and datasets in which the dependent variable exhibits spatial heterogeneity. To model spatially varying relationships in smaller datasets, other tools may be more appropriate. The current Multiscale Geographically Weighted Regression (MGWR) tool only accepts continuous dependent variables. Do not run the model with binary or count data. This may lead to a biased model and meaningless results.
Much of this topic will explain MGWR using comparisons to other regression methods. It will be helpful to have a basic understanding of Ordinary Least Squares (OLS) regression and a familiarity with the neighborhoods, weighting schemes, and diagnostics of GWR before continuing.
Learn more about OLS regression
Regression model selection
OLS, GWR, and MGWR are all linear regression models, but they operate at different spatial scales and make different assumptions about the spatial heterogeneity (the consistency of the relationships across the study area) of a dataset. OLS is a global model. It is assumed that the data-generating process is stationary over space, so a single coefficient can account for the relationship between each explanatory variable and the dependent variable everywhere. GWR is a local model that relaxes the assumption of spatial stationarity by allowing the coefficients to vary over space. However, in GWR, it is assumed that all the local relationships operate at the same spatial scale by requiring that all explanatory variables use the same neighborhood. For example, if one explanatory variable uses 20 neighbors, all explanatory variables must also use 20 neighbors.
MGWR, however, not only allows the coefficients to vary over space but also allows the scale to vary across different explanatory variables. MGWR does this by using separate neighborhoods for each explanatory variable to account for different spatial scales of the relationships between each explanatory variable and the dependent variable. This allows the combining of explanatory variables that operate on relatively large spatial scales, such as temperature or atmospheric pressure, with variables that operate on smaller spatial scales, such as population density or median income.
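The difference between the three models can be summarized with their regression equations. The notation below follows common usage in the GWR literature and is an assumption of this sketch, not taken from the tool: $(u_i, v_i)$ are the coordinates of feature $i$, $x_{ij}$ is the value of explanatory variable $j$ at feature $i$ (with $x_{i0} = 1$ for the intercept), and $b$ denotes a bandwidth.

```latex
\begin{aligned}
\text{OLS:}  \quad & y_i = \sum_{j=0}^{p} \beta_j \, x_{ij} + \varepsilon_i \\
\text{GWR:}  \quad & y_i = \sum_{j=0}^{p} \beta_{b,j}(u_i, v_i) \, x_{ij} + \varepsilon_i
                     \qquad \text{(one bandwidth } b \text{ shared by all variables)} \\
\text{MGWR:} \quad & y_i = \sum_{j=0}^{p} \beta_{b_j,j}(u_i, v_i) \, x_{ij} + \varepsilon_i
                     \qquad \text{(a separate bandwidth } b_j \text{ for each variable)}
\end{aligned}
```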
MGWR estimates more accurate local coefficients and experiences fewer issues with multicollinearity than GWR. However, the processing time is much longer for MGWR than GWR, especially for the Golden Search, Manual Intervals, or User Defined options of the Neighborhood Selection Method parameter. These three neighborhood selection methods are based on the backfitting algorithm, which is computation and memory intensive. The run time and memory usage increase significantly as the size of the data increases.
When deciding which model to apply to your data, consider these questions:
- Should my model run at the local or global level?
  - If you want a local model, apply GWR or MGWR. Otherwise, use OLS or another model such as the Forest-based Classification and Regression tool.
- Do the explanatory variables in my model operate at different spatial scales?
  - If you suspect that the explanatory variables may operate at different scales and you want to identify and model those different scales, apply MGWR.
- How large is my dataset? How long can I afford to wait for results?
  - If your dataset is very large and you run the MGWR tool, you should expect to wait longer for the tool to execute. Using common hardware of the early 2020s (16 logical processors and 32 GB memory) and typical parameters, for datasets larger than approximately 10,000 points, the run time will likely be several hours. For 50,000 points, the run time will likely be several days. For 100,000 points or more, memory errors are likely to occur.
If you remain unsure about which local model, GWR or MGWR, to apply to your data, begin with MGWR. When MGWR runs, it also performs GWR under specific settings. In the geoprocessing messages, you can find the GWR diagnostics and compare them to the diagnostics from MGWR. Alternatively, you can run multiple tools (OLS, GWR, and MGWR) and use the AICc listed in the geoprocessing messages to compare the models and choose the best one. If you choose to run multiple tools, either scale all the models or leave all the models unscaled to ensure the outputs are comparable.
Potential applications
MGWR can apply to many multivariate analyses and questions, such as the following:
- How do various features, such as the number of rooms, year built, lot area, and so on, influence the price of a house? Do the relationships significantly differ in different communities?
- How is the distribution of PM2.5 associated with economic variables, such as regional household income, number of cars per household, or percentage of gross domestic product contributed by agriculture?
- In precision agriculture, do soil conditions affect crop yield at the same spatial scale as atmospheric variables such as temperature, humidity, and precipitation?
Performance and benchmark considerations
Multiple factors affect the run time of MGWR. The most important factor for the run time is the number of features. The run time grows cubically with the number of features. The neighborhood size and number of explanatory variables also affect the run time of MGWR by requiring more calculations for each local model. To compute results as quickly as possible, MGWR employs parallel processing on your machine. Some computations will use all available cores, but others can be controlled by the Parallel Processing Factor environment.
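The cubic growth described above can be turned into a rough back-of-the-envelope estimate. The sketch below assumes that run time is proportional to the cube of the number of features and ignores every other factor (neighborhood size, number of explanatory variables, hardware, parallelism). The function name and the 2-hour baseline are hypothetical values chosen for illustration only.

```python
def extrapolate_run_time(known_features, known_hours, target_features, exponent=3.0):
    """Scale a measured run time to another dataset size, assuming
    run time grows as (number of features) ** exponent."""
    return known_hours * (target_features / known_features) ** exponent

# Hypothetical baseline: a 10,000-feature run that took 2 hours.
hours = extrapolate_run_time(10_000, 2.0, 50_000)
print(f"Estimated run time for 50,000 features: {hours:.0f} hours ({hours / 24:.1f} days)")
```

Treat the result as an order-of-magnitude guide for planning, not as a prediction of the actual run time.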
Tool inputs
There are several methods to provide the spatial scale of the explanatory variables.
Neighborhood (bandwidth) selection
A key enhancement of MGWR is the ability to vary the bandwidth (neighborhood) of each explanatory variable in the linear regression equation. The neighborhood of an explanatory variable at a target location includes all the locations that will contribute to the estimate of the explanatory variable's coefficient in the local linear regression model. Each neighborhood is defined by a number of neighbors around the target feature or by all neighbors within a fixed distance. The number of neighbors or distance can be different for each explanatory variable.
There are four options for the Neighborhood Selection Method parameter that can be used to estimate the optimal spatial scale for each of the explanatory variables:
Golden Search—Determines either the number of neighbors or distance band for each explanatory variable using the Golden Search algorithm. This method searches multiple combinations of values for each explanatory variable between a specified minimum and maximum value. The procedure is iterative and uses the results from previous values to select each new combination to be tested. The final values selected will have the smallest AICc. For the number of neighbors option, the minimum and maximum are specified using the Minimum Number of Neighbors and Maximum Number of Neighbors parameters. For the distance band option, the minimum and maximum are specified using the Minimum Search Distance and Maximum Search Distance parameters. The minimum and maximum values are shared for all explanatory variables, but the estimated number of neighbors or distance band will be different for each explanatory variable (unless two or more coincidentally have the same spatial scale). This option takes the longest time to calculate, especially for large or high-dimensional datasets.
Gradient Search—Determines the number of neighbors or distance band for each explanatory variable using a gradient-based optimization algorithm. To find the optimal bandwidth of each explanatory variable, Gradient Search takes the derivative of the AICc with respect to the bandwidths and updates the bandwidths until it finds the lowest AICc. For the number of neighbors option, the minimum and maximum are specified using the Minimum Number of Neighbors and Maximum Number of Neighbors parameters. For the Distance Band option, the minimum and maximum are specified using the Minimum Search Distance and Maximum Search Distance parameters. Like Golden Search, the minimum and maximum values are shared for all explanatory variables, but the estimated number of neighbors or distance band may be different for each explanatory variable (unless two or more coincidentally have the same spatial scale). This option estimates comparable neighborhoods to Golden Search but has a better runtime performance and requires significantly less memory usage.
Manual Intervals—Determines the number of neighbors or distance band for each explanatory variable by incrementing the number of neighbors or distance band from a minimum value. For the number of neighbors option, the method starts with the value of the Minimum Number of Neighbors parameter. The number of neighbors is then increased by the value of the Number of Neighbors Increment parameter. This increment is repeated a certain number of times, specified using the Number of Increments parameter. For the distance band option, the method uses the Minimum Search Distance, Search Distance Increment, and Number of Increments parameters. The number of neighbors or distance band used by each explanatory variable will be one of the tested values, but the values may be different for each explanatory variable. This option is faster than Golden Search and frequently estimates comparable neighborhoods.
User Defined—The number of neighbors or distance band that is used by all explanatory variables. The value is specified using the Number of Neighbors or Distance Band parameter. This option provides the most control if you know the optimal values.
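Two of the search strategies above can be illustrated with a short sketch. It is not the tool's implementation: `score` is a stand-in for the AICc of a fitted model, the search is over a single bandwidth instead of one per explanatory variable, and the sketch assumes that Manual Intervals tests the minimum value plus one additional candidate per increment. All function names are hypothetical.

```python
import math

def manual_interval_candidates(minimum, increment, number_of_increments):
    """Candidate bandwidths: the minimum, then one more value per increment (assumed)."""
    return [minimum + k * increment for k in range(number_of_increments + 1)]

def golden_section_search(score, low, high, tol=1.0):
    """Minimize a unimodal `score` on [low, high] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2          # about 0.618
    a, b = float(low), float(high)
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = score(c), score(d)
    while b - a > tol:
        if fc < fd:                            # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = score(c)
        else:                                  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = score(d)
    return (a + b) / 2

# Stand-in score with its minimum at 47 neighbors (purely illustrative).
def toy_score(bandwidth):
    return (bandwidth - 47.0) ** 2

print(manual_interval_candidates(30, 5, 4))
print(round(golden_section_search(toy_score, 30, 100, tol=0.5), 1))
```

Golden-section search reuses one of the two interior evaluations at each step, so every iteration needs only one new evaluation of the score. This matters when each evaluation requires fitting a model.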
By default, the dependent neighborhood parameters of each neighborhood selection method apply to all explanatory variables. However, customized neighborhood selection parameters can be provided for particular explanatory variables using the corresponding override parameter for the neighborhood type and selection method: Number of Neighbors for Golden Search, Number of Neighbors for Gradient Search, Number of Neighbors for Manual Intervals, User Defined Number of Neighbors, Search Distance for Golden Search, Search Distance for Gradient Search, Search Distance for Manual Intervals, or User Defined Search Distance. To use customized neighborhoods for particular explanatory variables, provide the explanatory variables in the first column of the corresponding override parameter, and provide the customized options of the neighborhood in the other columns. The columns have the same names as the parameters they override; for example, if you are using manual intervals with distance band, the Search Distance Increment column specifies customized values of the Search Distance Increment parameter. In the Geoprocessing pane, customized neighborhood parameters are in the Customized Neighborhood Options parameter category.
For example, suppose you use three explanatory variables with the Golden Search neighborhood selection method, 30 minimum neighbors, and 40 maximum neighbors. If the tool is run with these parameters, each of the three explanatory variables will use between 30 and 40 neighbors. If you instead want to use between 45 and 55 neighbors for only the second explanatory variable, you can provide the second explanatory variable, the custom minimum, and the custom maximum in the columns of the Number of Neighbors for Golden Search parameter. With these parameters, the first and third explanatory variables will use between 30 and 40 neighbors, and the second explanatory variable will use between 45 and 55 neighbors.
Local weighting scheme
MGWR estimates a local regression model for each target feature by applying a geographical weighting (kernel) function to the feature and its neighboring features. Neighbors that are closer to the target feature have a larger impact on the results of the local model. The kernel options are available in the Local Weighting Scheme parameter: Gaussian and Bisquare. To learn more about geographic weighting with kernels, see How Geographically Weighted Regression works. In MGWR, the weighting bandwidth varies across explanatory variables.
Note:
The Gradient Search neighborhood selection method only allows the bisquare kernel. The Gaussian kernel may be allowed in future versions.
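As a rough illustration of the two weighting schemes, the sketch below uses the kernel forms that commonly appear in the GWR literature. The exact formulas and bandwidth scaling used by the tool are not stated here, so treat these definitions as assumptions.

```python
import math

def gaussian_kernel(distance, bandwidth):
    """Weight decays smoothly with distance and never reaches exactly zero."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def bisquare_kernel(distance, bandwidth):
    """Weight decays to zero at the bandwidth; farther features get no weight."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2

for d in (0.0, 5.0, 10.0, 15.0):
    print(d, round(gaussian_kernel(d, 10.0), 4), round(bisquare_kernel(d, 10.0), 4))
```

The practical difference is that the bisquare kernel excludes features beyond the bandwidth entirely, while the Gaussian kernel gives distant features a small but nonzero weight.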
Scaled data and coefficients
By default, all explanatory variables and the dependent variable are scaled to have a mean equal to zero and standard deviation equal to one (also called a Z-score standardization). The estimated coefficients of scaled data values are interpreted in standard deviations; for example, a coefficient of 1.2 means that a one-standard-deviation increase in the explanatory variable is correlated with a 1.2-standard-deviation increase of the dependent variable. Because all coefficients use a shared unit, the values can be directly compared to see which explanatory variables have the largest impact on the model. It is generally recommended to scale the variables, and scaling is especially important when the range of values of the variables varies significantly. However, you can choose to not scale the data by unchecking the Scale Data parameter.
In most linear regression models such as OLS and GWR, coefficients are invariant to linear scaling. This means that if you scale the input data, fit the regression model, and then unscale the result back to the original units, the result will be the same as if you did not scale the data at all. In MGWR, however, scaling and then unscaling will not result in the same model that you would receive from the original data. This is because backfitting is an iterative procedure in which the results of each step depend on the results of previous steps. Using different starting scales will affect the path of tested values and result in distinct MGWR models. The scaled results will typically be more accurate because scaling equalizes the variances of the variables, and the iterative procedure usually converges faster and to more accurate values when each variable contributes equal amounts to the total variance of the data. If the explanatory variables have different variances (for example, by having different units), the variables with larger variances have more influence on each step of the iterative estimation. In most cases, this influence will negatively affect the final bandwidths and coefficients for the model.
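The invariance described above for OLS can be checked numerically. The sketch below fits a one-variable OLS model on made-up data twice, once directly and once on Z-score standardized values, and then converts the scaled slope back to original units with the standard relationship: unscaled slope = scaled slope × (standard deviation of y / standard deviation of x). It demonstrates only the OLS case; it does not reproduce MGWR's behavior.

```python
from statistics import mean, pstdev

def ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return y_bar - slope * x_bar, slope

def zscore(values):
    """Standardize to mean zero and standard deviation one."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Made-up data for illustration.
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [10.0, 14.0, 19.0, 20.0, 30.0]

direct_intercept, direct_slope = ols(x, y)
_, scaled_slope = ols(zscore(x), zscore(y))

# Convert the scaled coefficient back to original data units.
unscaled_slope = scaled_slope * pstdev(y) / pstdev(x)
unscaled_intercept = mean(y) - unscaled_slope * mean(x)

print(direct_slope, unscaled_slope)
print(direct_intercept, unscaled_intercept)
```

For OLS the two routes agree up to floating-point rounding. In MGWR the same round trip generally produces a different model, for the reasons given above.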
For ease of interpretation of the scaled results, all coefficients of the tool outputs will contain a scaled value and the value unscaled to original data units. These outputs include extra fields on the output features (also added as layers to the output group layer) and extra rasters in the directory of the Output Coefficient Raster Workspace parameter. When predicting to new locations using the Prediction Locations and Output Predicted Features parameters, all predicted values are unscaled to original data units. See Tool outputs for more information about the outputs.
Tool outputs
The tool produces a variety of outputs, including a group layer for various fields of the output features, messages, and charts. The optional outputs include a feature class that predicts values at new locations, a neighborhood table, and raster surfaces of each coefficient.
Group layers and symbology
The default output symbology layer visualizes the standardized residuals of the local linear regression models with a classified color scheme. Examine the patterns of the residuals to determine whether the model is well specified. Residuals for well-specified regression models will be normally distributed and spatially random with no clustering of values. You can run the Spatial Autocorrelation (Global Moran's I) tool on the regression residuals to test whether they are spatially random. Statistically significant high and low clustering of residuals indicates that the MGWR model is not optimal.
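To make the idea of spatially clustered residuals concrete, the sketch below computes the global Moran's I index for values on a line, using binary weights between adjacent points. It is a simplified illustration and not the Spatial Autocorrelation tool: it computes only the index, not the z-score or p-value needed to judge statistical significance, and the weights scheme and function names are assumptions.

```python
def morans_i(values, weights):
    """Global Moran's I for `values`, given spatial weights {(i, j): w_ij}."""
    n = len(values)
    center = sum(values) / n
    z = [v - center for v in values]
    s0 = sum(weights.values())
    cross = sum(w * z[i] * z[j] for (i, j), w in weights.items())
    return (n / s0) * cross / sum(d * d for d in z)

def chain_weights(n):
    """Binary adjacency for points on a line: each point neighbors the next one."""
    weights = {}
    for i in range(n - 1):
        weights[(i, i + 1)] = 1.0
        weights[(i + 1, i)] = 1.0
    return weights

clustered = morans_i([1, 1, 1, 0, 0, 0], chain_weights(6))     # similar values adjacent
alternating = morans_i([1, 0, 1, 0, 1, 0], chain_weights(6))   # dissimilar values adjacent
print(round(clustered, 3), round(alternating, 3))
```

A clearly positive index for residuals suggests clustering of over- and under-predictions, which is the pattern to watch for when checking whether a model is well specified.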
Layers of the coefficient and statistical significance of each explanatory variable are added to the map as a group layer, with separate subgroup layers for each explanatory variable. Each layer of coefficients presents a divergent color scheme centered at zero. This allows you to use the color to identify which variables have positive and negative relationships with the dependent variable. For points, statistically significant features (95 percent confidence) are indicated by green halos around the points, and nonsignificant relationships are indicated by gray halos. For polygons, significant relationships are indicated with texture meshes in the polygons. Examine the coefficient layers and the significance layers to better understand the spatial variation in the explanatory variables. You can use your insight from this spatial variation to inform policy. Global policies may work well when variables are globally statistically significant and show little regional variation, but local policies may work better when there is substantial spatial variation in the regression coefficients. In this case, it may be appropriate to initiate policies in areas where the local effect is positive and large. However, the same policies may not be appropriate in other areas where the effect is small or negative.
Messages and diagnostics
The messages provide information about the MGWR model and its performance. The messages have several sections.
Summary Statistics for Coefficient Estimates
The Summary Statistics for Coefficient Estimates section summarizes the mean, standard deviation, minimum, median, and maximum of the coefficient estimates across the study area. The mean value of each coefficient reflects the association between that explanatory variable and the dependent variable. The standard deviation indicates the spatial variation of each explanatory variable. A small standard deviation implies that a simpler method such as OLS may adequately model the data. If the Scale Data parameter is checked, you can compare the values across explanatory variables. If the Scale Data parameter is not checked, the value of the coefficients between explanatory variables cannot be compared directly because the units may vary.
Model Diagnostics
The Model Diagnostics section includes a table displaying several model diagnostics for GWR and MGWR, including R², Adjusted R², AICc, residual variance, and number of effective degrees of freedom. For more details on these model diagnostics, see How Geographically Weighted Regression works.
Note:
In some cases, the GWR model for comparison may fail to calculate. In this case, only the diagnostics for MGWR are displayed.
You can use the R² and Adjusted R² diagnostics to evaluate the goodness of fit of the model to the data. The higher the R² and Adjusted R², the better the model fits the data. Evaluate the complexity of the model by the number of explanatory variables and the Effective Degree of Freedom diagnostic. Simpler models have a higher Effective Degree of Freedom and fewer parameters. If a model has too many parameters, it runs the risk of overfitting the data. The AICc diagnostic accounts for both goodness of fit and the complexity of the model. The Multiscale Geographically Weighted Regression tool selects the model with the lowest AICc.
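The way AICc balances fit against complexity can be seen in a form of the statistic that is commonly used in the GWR literature. The sketch below assumes that form; the exact expression used by the tool is not given in this topic. Here `trace_s` stands for the effective number of parameters (the trace of the hat matrix), and the function name is hypothetical.

```python
import math

def aicc(residuals, trace_s):
    """Corrected AIC for a Gaussian local regression model (assumed form).

    residuals: observed minus predicted values
    trace_s:   effective number of parameters of the model
    """
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n   # maximum likelihood error variance
    return (n * math.log(sigma2)
            + n * math.log(2 * math.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))

# Same residuals, different complexity: the more complex model is penalized.
residuals = [0.5, -0.3, 0.2, -0.4, 0.1] * 20
print(round(aicc(residuals, trace_s=5), 2), round(aicc(residuals, trace_s=20), 2))
```

With identical residuals, the model with more effective parameters receives the higher (worse) AICc. A more complex model is preferred only if it reduces the residual variance enough to offset the penalty.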
Summary of Explanatory Variables and Neighborhoods
The Summary of Explanatory Variables and Neighborhoods section displays the estimated neighborhood and significance levels of each explanatory variable. For neighborhoods based on number of neighbors, the optimal number of neighbors is displayed as a count and as a percentage of the total number of input features. For distance band neighborhoods, the optimal distance bands are displayed along with the distance as a percentage of the diagonal extent of the input features. The percentages of feature or extent are useful for characterizing the spatial scale of the explanatory variables; for example, if an explanatory variable uses 75 percent of the features as neighbors, the local regression models are closer to global models than local models (indicating that a simpler model such as OLS may be adequate). If another explanatory variable uses only 5 percent of the input features as neighbors, it is a more local model. For all neighborhood types, the count and percentage of local models that were statistically significant at a 95 percent confidence level are displayed for each explanatory variable.
Optimal Bandwidths Search History
The Optimal Bandwidths Search History section displays the search history of potential optimal bandwidths along with the AICc value for each set of tested values. The tool begins to search for the optimal bandwidth of each explanatory variable by assigning each variable the same value: the optimal bandwidth of GWR. The tool then adjusts the bandwidth and coefficient of each variable at each iteration and estimates a new AICc value. As the iterations proceed, the AICc value decreases until it stabilizes or increases, which ends the iterations. The User Defined option typically requires the fewest iterations, while the Golden Search option typically requires the most. Although it uses a large number of iterations, the Gradient Search option typically has the fastest run time because each iteration can be computed quickly.
Note:
For Gradient Search with number of neighbors, the final AICc value displayed in the optimal bandwidth search history section will often be slightly different than the AICc value displayed in the model diagnostics section. This happens because Gradient Search uses a continuous representation of the number of neighbors during bandwidth optimization, which causes small amounts of imprecision in the calculated AICc value of each iteration. When reporting the AICc of the final model, use the value displayed in the model diagnostics section.
Bandwidth Statistics Summary
The Bandwidth Statistics Summary section summarizes the values used to test whether each explanatory variable is statistically significant in each local model. These statistics include the optimal neighborhood (number of neighbors or distance band) of MGWR, the effective number of parameters, the adjusted significance level (alpha), and the adjusted critical value of pseudo-t-statistics. These values are used to create the fields related to statistical significance for each explanatory variable in the output features. The adjusted value of alpha is calculated by dividing the significance level (0.05) by the effective number of parameters; this controls the family-wise error rate (FWER) of the significance of the explanatory variables. The adjusted alpha is used as the significance level in a two-sided t-test with the effective number of degrees of freedom.
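The adjustment of the significance level can be written out directly. In the sketch below, the adjusted alpha follows the rule stated above (0.05 divided by the effective number of parameters). The critical value is only an approximation: the tool uses a t-distribution with the effective number of degrees of freedom, while this sketch substitutes the standard normal distribution, which is close when the degrees of freedom are large. The function names are hypothetical.

```python
from statistics import NormalDist

def adjusted_alpha(effective_parameters, alpha=0.05):
    """Significance level corrected for multiple testing across local models."""
    return alpha / effective_parameters

def approximate_critical_value(alpha):
    """Two-sided critical value from the standard normal distribution
    (large-sample stand-in for the t-distribution)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

alpha_adj = adjusted_alpha(4.0)
print(alpha_adj, round(approximate_critical_value(alpha_adj), 3))
```

A variable with a larger effective number of parameters is tested against a smaller alpha and therefore needs a larger pseudo-t-statistic to be flagged as significant.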
Output features
The tool outputs a feature class that includes local diagnostics for each feature. These diagnostics include regression residuals, standardized residuals, predicted values of the dependent variable, intercept, explanatory variable coefficients, coefficient standard errors, coefficient pseudo-t statistics, coefficient significance, influence, Cook's D, Local R², and condition number. In a map, the output features are added as a layer and symbolized by the standardized residuals. A positive standardized residual means that the dependent variable value is greater than the predicted value (under-predicted), and a negative standardized residual means that the value is less than the predicted value (over-predicted). For more details about these fields and diagnostics, see How Geographically Weighted Regression works.
Charts
The following charts are added to the Contents pane:
- Relationship between Variables—A scatterplot matrix, with one dependent variable and up to nine explanatory variables, that shows the correlation between the dependent variable and each explanatory variable and the correlation between each pair of explanatory variables. Strong correlations between any pair indicate multicollinearity.
- Distribution of Standardized Residual—A histogram of the standardized residuals. The standardized residuals should be normally distributed with mean zero and standard deviation one.
- Standardized Residuals VS. Predicted—A scatterplot between the standardized residuals and their corresponding predicted values. The plot should be random and show no patterns or trends.
Optional outputs
The following optional outputs can be specified in the Predictions Options and Additional Options drop-down menus:
- Output Predicted Features—A feature class with predictions for the dependent variable at the locations specified by the Prediction Locations parameter.
- Output Neighborhood Table—A table containing the values of the Summary Statistics for Coefficient Estimates and Summary of Explanatory Variables and Neighborhoods sections of the messages.
- Coefficient Raster Workspace—A workspace (directory or geodatabase) where rasters of the coefficients are saved. These coefficient raster surfaces can help explain the spatial variation in the coefficients.
Multicollinearity
Multicollinearity occurs when two or more explanatory variables are strongly correlated in a regression model. This can occur in OLS, GLR, GWR, and MGWR models. Multicollinearity may negatively impact the estimation of coefficients and optimal neighborhoods because if the explanatory variables are correlated, they share mutual information, and the regression model cannot differentiate between the effects of the variables. In moderate cases, the coefficient estimates may be biased and have high uncertainty. In extreme cases, the model may fail to calculate. The following example shows a scatterplot matrix of three variables that are all highly correlated with each other. A regression model that uses them as explanatory variables will likely encounter issues with multicollinearity.
Identification and prevention of multicollinearity in MGWR
In an MGWR model, multicollinearity can occur in various situations including the following:
- One of the explanatory variables is strongly spatially clustered. Because MGWR fits local regression models, multicollinearity is likely to occur when a feature and all of its neighbors have approximately the same value for an explanatory variable.
  To prevent this, map each explanatory variable and identify the variables that have very few possible values or where identical values are spatially clustered. If you observe these types of variables, consider removing them from the model or representing them in a way that increases the range of values. A number of bedrooms variable, for example, may be better represented as bedrooms per square foot.
- Two or more explanatory variables are highly correlated globally.
  Run a global model using Generalized Linear Regression and examine the Variance Inflation Factor (VIF) for each explanatory variable. If the VIF values are large, for example, 7.5 or higher, global multicollinearity may prevent MGWR from running. In this case, the variables are redundant, so consider removing one of these variables from the model or combining them with other explanatory variables to increase the variation in the values.
- The defined neighborhood is too small.
  Multicollinearity can also involve multiple explanatory variables at the same time. This occurs when linear combinations of some explanatory variables are highly correlated with linear combinations of other explanatory variables, and it is most common with neighborhoods that have a small number of neighbors. To test this, check the local condition number in the output feature class. A high local condition number indicates that the results are unstable due to local multicollinearity. If this is the case, rerun the model using a larger number of neighbors or a larger distance band. As a rule, be skeptical of results for features whose condition number is greater than 30 or null. For shapefiles, null values are represented with the value -1.7976931348623158e+308. The condition number is scale-adjusted to correct for the number of explanatory variables in the model, which allows you to directly compare the condition number between models that use a different number of explanatory variables.
Checking all these conditions may help with multicollinearity issues but may not always solve them.
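The global VIF check mentioned above can be reproduced outside the tool as a quick screening step. The sketch below relies on a standard identity: the VIF of each explanatory variable equals the corresponding diagonal element of the inverse of the correlation matrix of the explanatory variables. The data, variable names, and function names are made up for illustration, and the values are not guaranteed to match the Generalized Linear Regression tool's output exactly.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF per variable: diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[1.0 if a == b else pearson(columns[a], columns[b]) for b in names]
            for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}

# Made-up variables: "rooms_b" is nearly a copy of "rooms_a".
data = {
    "rooms_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "age":     [2.0, 1.0, 4.0, 3.0, 5.0],
    "rooms_b": [1.1, 1.9, 3.0, 4.2, 4.9],
}
for name, value in vif(data).items():
    print(name, round(value, 1), "check for redundancy" if value >= 7.5 else "ok")
```

This screening only addresses global multicollinearity. Local multicollinearity within small neighborhoods still has to be checked with the condition number in the output features.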
Coefficient and bandwidth estimation
For all neighborhood selection methods except Gradient Search, the coefficients and bandwidths of the explanatory variables are estimated through a process called backfitting (Breiman and Friedman 1985). Originally developed for estimating the parameters of generalized additive models, the procedure moves through the explanatory variables one by one and uses a smoothing function to calibrate the coefficient while keeping all other explanatory variables constant. This process repeats over the explanatory variables until the values of the coefficients stabilize and no longer change between successive iterations.
When applied to MGWR (Fotheringham et al. 2017), the smoothing function is a univariate GWR model that regresses the previous residual-adjusted prediction against the single explanatory variable (treating all other explanatory variables as constants). This GWR model uses the same neighborhood selection method (Golden Search, Manual Intervals, or User Defined) to estimate the spatial scale of the explanatory variable. See the Additional resources section for a complete description of the process.
The backfitting algorithm must begin with initialized values of the coefficients. These initial values are estimated by a GWR model of all the explanatory variables. If this model fails due to multicollinearity, OLS is used instead. If the process does not converge after 25 iterations, the coefficient values of the final iteration are used.
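The backfitting loop described above can be sketched as follows. This is a deliberately simplified illustration, not the tool's implementation: locations are one-dimensional, each variable has a fixed bandwidth supplied by the caller (similar in spirit to the User Defined option), the kernel is an assumed Gaussian form, and the coefficients start at zero instead of being initialized from a GWR or OLS fit. The limit of 25 iterations is taken from the text above; all function names are hypothetical.

```python
import math

def gaussian_weight(distance, bandwidth):
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_coefficients(locations, x, target, bandwidth):
    """Univariate local regression of `target` on `x` (no intercept),
    returning one coefficient per location."""
    coefficients = []
    for u in locations:
        w = [gaussian_weight(abs(u - v), bandwidth) for v in locations]
        numerator = sum(wi * xi * ti for wi, xi, ti in zip(w, x, target))
        denominator = sum(wi * xi * xi for wi, xi in zip(w, x))
        coefficients.append(numerator / denominator)
    return coefficients

def backfit(locations, columns, y, bandwidths, max_iter=25, tol=1e-8):
    """Cycle through the variables, refitting each against the partial residual."""
    n, p = len(y), len(columns)
    betas = [[0.0] * n for _ in range(p)]      # assumption: start from zero
    for _ in range(max_iter):
        largest_change = 0.0
        for j in range(p):
            # Remove the current contribution of every variable except j.
            partial = [
                y[i] - sum(betas[k][i] * columns[k][i] for k in range(p) if k != j)
                for i in range(n)
            ]
            updated = local_coefficients(locations, columns[j], partial, bandwidths[j])
            change = max(abs(a - b) for a, b in zip(updated, betas[j]))
            largest_change = max(largest_change, change)
            betas[j] = updated
        if largest_change < tol:               # coefficients have stabilized
            break
    return betas

# Synthetic check: y = 2 + 3 * x1 everywhere, so the fitted surfaces should be flat.
n = 55
locations = [float(i) for i in range(n)]
ones = [1.0] * n                               # intercept treated as a variable
x1 = [float((i * 7) % 11 - 5) for i in range(n)]
y = [2.0 + 3.0 * v for v in x1]

intercepts, slopes = backfit(locations, [ones, x1], y, bandwidths=[1e6, 8.0])
print(round(intercepts[0], 4), round(slopes[0], 4))
```

Each variable is smoothed with its own bandwidth, which is the feature that distinguishes MGWR from GWR. In the tool, the bandwidth of each univariate model is also re-estimated inside the loop, which this sketch omits.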
Gradient Search
The Gradient Search neighborhood selection method option is a more recent approach to estimating optimal bandwidths in MGWR that does not use backfitting. The main benefits of Gradient Search are improved run times and efficient use of memory. This method is a second-order optimization algorithm that uses the gradient and Hessian matrix to minimize the AICc with respect to the spatial scale of the explanatory variables. Rather than updating the parameter of a single explanatory variable in each iterative step, the parameters of all explanatory variables are updated simultaneously by descending in the steepest direction of the gradient, corrected by the curvature of the AICc.
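In generic terms, a second-order update of this kind can be written as a Newton-type step. The expression below is a textbook form given only for orientation; the notation is an assumption of this sketch, and the tool's actual algorithm may include safeguards (such as step-size control or bound constraints) that are not shown. Here $\mathbf{b}^{(t)}$ is the vector of bandwidths of all explanatory variables at iteration $t$.

```latex
\mathbf{b}^{(t+1)} = \mathbf{b}^{(t)} - \mathbf{H}_t^{-1} \, \mathbf{g}_t,
\qquad
\mathbf{g}_t = \nabla_{\mathbf{b}} \, \mathrm{AICc}\big(\mathbf{b}^{(t)}\big),
\qquad
\mathbf{H}_t = \nabla^2_{\mathbf{b}} \, \mathrm{AICc}\big(\mathbf{b}^{(t)}\big)
```

The gradient $\mathbf{g}_t$ gives the direction in which the AICc changes fastest, and the Hessian $\mathbf{H}_t$ rescales that direction according to the local curvature, so all bandwidths move in a single step.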
The results produced by Golden Search and Gradient Search are usually very similar. The following image shows the true coefficient surface along with the estimated coefficient surface from Golden Search and Gradient Search. Notice that all surfaces are similar and estimate the true surface accurately.
The image below compares the run times for Golden Search and Gradient Search for different numbers of explanatory variables and different dataset sizes. The run time of Gradient Search is consistently about half the run time of Golden Search for the same number of explanatory variables.
The image below compares the memory usage of Golden Search and Gradient Search. The memory usage of Golden Search increases rapidly (quadratic growth) as the sample size increases, but the memory usage of Gradient Search is unaffected by the sample size.
The image below compares the AICc values of Golden Search and Gradient Search. The accuracy of the methods is very similar, but Golden Search achieves slightly lower AICc values (indicating a slightly more accurate estimate) than Gradient Search.
Additional resources
For more information, see the following:
- Breiman, L., and J. H. Friedman. 1985. "Estimating optimal transformations for multiple regression and correlations (with discussion)." Journal of the American Statistical Association 80 (391): 580–619. https://doi.org/10.2307/2288473. JSTOR 2288473.
- Brunsdon, C., A. S. Fotheringham, and M. E. Charlton. 1996. "Geographically weighted regression: A method for exploring spatial nonstationarity." Geographical Analysis 28: 281–298.
- Conn, A.R., N.I.M. Gould, and P.L. Toint. 2000. "Trust Region Methods." Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9780898719857.
- da Silva, A. R., and A. S. Fotheringham. 2016. "The multiple testing issue in geographically weighted regression." Geographical Analysis 48 (3): 233–247. https://doi.org/10.1111/gean.12084.
- Fotheringham, A. S., W. Yang, and W. Kang. 2017. "Multiscale geographically weighted regression (MGWR)." Annals of the American Association of Geographers 107: 1247–1265. https://doi.org/10.1080/24694452.2017.1352480.
- Oshan, T. M., Z. Li, W. Kang, L. J. Wolf, and A. S. Fotheringham. 2019. "mgwr: A Python implementation of multiscale geographically weighted regression for investigating process spatial heterogeneity and scale." ISPRS International Journal of Geo-Information 8: 269.
- Yu, H., A. S. Fotheringham, Z. Li, T. Oshan, W. Kang, and L. J. Wolf. 2020. "Inference in multiscale geographically weighted regression." Geographical Analysis 52: 87–106.
- Zhou, X., R. Assunção, H. Shao, M. Janikas, C. Huang, and H. Asefaw. 2023. "Gradient-based optimization for Multi-scale Geographically Weighted Regression." (under review)