How Geographically Weighted Regression (GWR) works

Geographically Weighted Regression (GWR) is one of several spatial regression techniques used in geography and other disciplines. GWR evaluates a local model of the variable or process you are trying to understand or predict by fitting a regression equation to every feature in the dataset. GWR constructs these separate equations by incorporating the dependent and explanatory variables of the features falling within the neighborhood of each target feature. The shape and extent of each neighborhood analyzed is based on the Neighborhood Type and Neighborhood Selection Method parameters. GWR should be applied to datasets with several hundred features. It is not an appropriate method for small datasets and does not work with multipoint data.

Note:

This tool has been updated for ArcGIS Pro 2.3. The update incorporates additional academic research and improvements to the method developed over the past several years, and it expands support for additional models. The addition of the Count (Poisson) and Binary (Logistic) models allows the tool to be applied to a wider range of problems.

Potential applications

Geographically Weighted Regression can help answer a variety of questions, including the following:

  • Is the relationship between educational attainment and income consistent across the study area?
  • Do certain illness or disease occurrences increase with proximity to water features?
  • What are the key variables that explain high forest fire frequency?
  • Which habitats should be protected to encourage the reintroduction of an endangered species?
  • Where are the districts in which children are achieving high test scores? What characteristics seem to be associated? Where is each characteristic most important?
  • Are the factors influencing higher cancer rates consistent across the study area?

Inputs

To run the GWR tool, provide the Input Features parameter with a field representing the Dependent Variable and one or more fields representing the Explanatory Variable(s). These fields must be numeric and have a range of values. Features that contain missing values in the dependent or explanatory variables will be excluded from the analysis; however, you can use the Fill Missing Values tool to complete the dataset before running GWR. Next, you must choose a Model Type based on the data you are analyzing. It is important to use an appropriate model for your data. Descriptions of the model types and how to determine the appropriate one for your data are below.

Model type

GWR provides three types of regression models: Continuous, Binary, and Count. These types of regression are known in statistical literature as Gaussian, Logistic, and Poisson, respectively. The Model Type for your analysis should be chosen based on how your Dependent Variable was measured or summarized as well as the range of values it contains.

Continuous (Gaussian)

Use the Continuous (Gaussian) Model Type if your Dependent Variable can take on a wide range of values such as temperature or total sales. Ideally, your dependent variable will be normally distributed. You can create a histogram of your dependent variable to verify that it is normally distributed. If the histogram is a symmetrical bell curve, use a Gaussian model type. Most of the values will be clustered near the mean, with few values departing radically from the mean. There should be as many values on the left side of the mean as on the right (the mean and median values for the distribution are the same). If your Dependent Variable does not appear to be normally distributed, consider reclassifying it to a binary variable. For example, if your dependent variable is average household income, you can recode it to a binary variable, in which 1 indicates above the national median income and 0 (zero) indicates below the national median income. A continuous field can be reclassified to a binary field using the Reclassify helper function in the Calculate Field tool.
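As a rough illustration of the two checks described above, the following sketch compares the mean and median of a dependent variable and then recodes it to a binary variable. The function names, the sample incomes, and the threshold are hypothetical, and treating a value equal to the threshold as 0 is an assumption. In ArcGIS Pro, the recoding itself would be done with the Calculate Field tool rather than with code like this.

```python
from statistics import mean, median

def mean_median_gap(values):
    """Return |mean - median| scaled by the value range; near 0 suggests symmetry."""
    spread = max(values) - min(values)
    return abs(mean(values) - median(values)) / spread if spread else 0.0

def recode_binary(values, threshold):
    """Recode to 1 (above the threshold) or 0 (at or below it; an assumption)."""
    return [1 if v > threshold else 0 for v in values]

# Made-up household incomes with one very large value (right-skewed).
incomes = [28_000, 31_000, 35_000, 40_000, 52_000, 75_000, 210_000]
gap = mean_median_gap(incomes)                     # noticeably above 0: skewed
binary = recode_binary(incomes, threshold=45_000)  # hypothetical median income
print(round(gap, 3), binary)
```

A large gap between mean and median is only a quick signal of skew; the histogram described above remains the more informative check.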

Binary (Logistic)

Use a Binary (Logistic) Model Type if your Dependent Variable can take on one of two possible values such as success and failure or presence and absence. The field containing your Dependent Variable must be numeric and contain only ones and zeros. Results will be easier to interpret if you code the event of interest, such as success or presence of an animal, as 1, as the regression will model the probability of 1. There must be variation of the ones and zeros in your data both globally and locally. If you create a histogram of your Dependent Variable, it should only show ones and zeros. You can use the Select By Circle tool to check for local variation by selecting various regions across the map and making sure there is a combination of ones and zeros in each region.

Count (Poisson)

Consider using a Count (Poisson) Model Type if your Dependent Variable is discrete and represents the number of occurrences of an event such as a count of crimes. Count models can also be used if your Dependent Variable represents a rate and the denominator of the rate is a fixed value such as sales per month or number of people with cancer per 10,000 population. A Count (Poisson) model assumes that the mean and variance of the Dependent Variable are equal, and the values of your Dependent Variable cannot be negative or contain decimals.
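The requirements above can be screened before running the tool. The sketch below is one possible check, not part of the GWR tool: it confirms that values are non-negative whole numbers and reports the ratio of variance to mean, which should be close to 1 if the equal mean and variance assumption roughly holds. The function name and the sample counts are hypothetical.

```python
from statistics import mean, pvariance

def check_count_variable(values):
    """Return (is_valid_count, variance_to_mean_ratio) for a candidate variable."""
    # Valid counts are non-negative and have no fractional part.
    is_valid = all(v >= 0 and float(v).is_integer() for v in values)
    m = mean(values)
    ratio = pvariance(values) / m if m else float("nan")
    return is_valid, ratio

crimes = [0, 2, 1, 3, 2, 4, 1, 3]  # made-up counts per district
valid, ratio = check_count_variable(crimes)
print(valid, ratio)
```

A ratio far above 1 suggests overdispersion; how far is too far is a judgment call that this sketch does not settle.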

Choosing a neighborhood (bandwidth)

A neighborhood (also known as a bandwidth) is the distance band or number of neighbors used for each local regression equation and is perhaps the most important parameter to consider for Geographically Weighted Regression, as it controls the degree of smoothing in the model. The shape and extent of the neighborhoods analyzed are based on the input for the Neighborhood Type and Neighborhood Selection Method parameters with one modification: when the number of features in the neighborhood exceeds 1000, only the closest 1000 are used in each local regression equation.

The Neighborhood Type parameter can be based on either Number of Neighbors or Distance Band. When Number of Neighbors is used, the neighborhood size is a function of a specified number of neighbors, which allows neighborhoods to be smaller where features are dense and larger where features are sparse. When Distance Band is used, the neighborhood size remains constant for each feature in the study area, resulting in more features per neighborhood where features are dense and fewer per neighborhood where they are sparse.
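The difference between the two neighborhood types can be seen in a small sketch. The functions and coordinates below are hypothetical, and counting the target feature as a member of its own neighborhood is an assumption made for simplicity, not a statement about how the tool counts neighbors.

```python
from math import hypot

def distance_band_neighbors(points, i, band):
    """Indices of features within a fixed distance of feature i (i included)."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points)
            if hypot(x - xi, y - yi) <= band]

def k_nearest_neighbors(points, i, k):
    """Indices of the k features closest to feature i (i included)."""
    xi, yi = points[i]
    order = sorted(range(len(points)),
                   key=lambda j: hypot(points[j][0] - xi, points[j][1] - yi))
    return order[:k]

# Dense cluster near the origin, sparse features farther east (made-up data).
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 0), (20, 0)]
dense_band = distance_band_neighbors(pts, 0, band=2.0)   # many members
sparse_band = distance_band_neighbors(pts, 5, band=2.0)  # only the feature itself
dense_knn = k_nearest_neighbors(pts, 0, k=3)             # small footprint
sparse_knn = k_nearest_neighbors(pts, 5, k=3)            # reaches much farther
print(dense_band, sparse_band, dense_knn, sparse_knn)
```

With a fixed band, the sparse feature ends up with almost no neighbors; with a fixed count, its neighborhood simply grows until it contains three features.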

The Neighborhood Selection Method parameter specifies how the size of the neighborhood is determined (the actual distance or number of neighbors used). The neighborhood selected with the Golden search or Manual intervals option is always based on minimizing the value of the Akaike Information Criterion (AICc). Alternatively, you can set a specific neighborhood distance or number of neighbors with the User defined option.

When the Golden search option is chosen, the tool determines the best values for the Distance band or Number of neighbors parameter using the golden section search method. Golden search first finds maximum and minimum distances and tests the AICc at various distances incrementally between them. When there are more than 1000 features in a dataset, the maximum distance is the distance at which any feature has at most 1000 neighbors. The minimum distance is the distance at which every feature has at least 20 neighbors. If there are fewer than 1000 features, the maximum distance is the distance at which every feature has n/2 neighbors (half the number of features as neighbors), and the minimum distance is the distance at which every feature has at least 5 percent of n (5 percent of the features in the dataset as neighbors). Golden search determines the distance or number of neighbors with the lowest AICc as the neighborhood size.
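The following is a minimal sketch of a textbook golden section search, together with the search bounds described above expressed as neighbor counts. It is not the tool's internal code. The toy_aicc function is a hypothetical stand-in for fitting GWR at a given neighborhood size and returning its AICc, and treating a dataset of exactly 1000 features like a smaller dataset is an assumption, since the text does not cover that case.

```python
from math import sqrt

INV_PHI = (sqrt(5) - 1) / 2  # about 0.618, the inverse golden ratio

def golden_section_minimize(score, lo, hi, tol=1e-3):
    """Return the x in [lo, hi] that approximately minimizes a unimodal score(x)."""
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = score(c), score(d)
    while hi - lo > tol:
        if fc < fd:   # the minimum lies in [lo, d]; reuse c as the new d
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = score(c)
        else:         # the minimum lies in [c, hi]; reuse d as the new c
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = score(d)
    return (lo + hi) / 2

def default_neighbor_bounds(n):
    """(minimum, maximum) neighbors to search for a dataset of n features."""
    if n > 1000:
        return 20, 1000
    return (5 * n + 99) // 100, n // 2  # ceil(5% of n) and n/2

def toy_aicc(size):
    """Hypothetical AICc curve with its minimum at a neighborhood size of 37."""
    return (size - 37.0) ** 2 + 500.0

lo, hi = default_neighbor_bounds(400)
best = golden_section_minimize(toy_aicc, float(lo), float(hi))
print(lo, hi, round(best, 2))
```

Each iteration reuses one of the two previous evaluations, so only one new model fit is needed per step, which matters when every evaluation is a full GWR run.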

The Minimum Search Distance and Maximum Search Distance parameters (for Distance Band) and Minimum Number of Neighbors and Maximum Number of Neighbors (for Number of Neighbors) can be used to limit the search range by setting the starting and ending distances for Golden search manually.

Local weighting scheme

The power of GWR is that it applies a geographical weighting to the features used in each of the local regression equations. Features that are farther away from the regression point are given less weight and thus have less influence on the regression results for the target feature; features that are closer have more weight in the regression equation. The weights are determined using a kernel, which is a distance decay function that determines how quickly weights decrease as distances increase. The Geographically Weighted Regression tool provides two kernel options in the Local Weighting Scheme parameter, Gaussian and Bisquare.

The Gaussian weighting scheme assigns a weight of one to the regression feature (feature i), and weights for the surrounding features (j features) smoothly and gradually decrease as the distance from the regression feature increases. For example, if features i and j are 0.25 units apart, the resulting weight in the equation will be approximately 0.88. If features i and j are 0.75 units apart, the resulting weight will only be approximately 0.32. Feature j will have less influence on the regression since it is farther away. A Gaussian weighting scheme never reaches zero, but weights for features far away from the regression feature can be quite small and have almost no impact on the regression. Conceptually, when using a Gaussian weighting scheme, every other feature in the input data is a neighboring feature and will be assigned a weight. However, for computational efficiency, when the number of neighboring features exceeds 1000, only the closest 1000 are incorporated into each local regression. A Gaussian weighting scheme ensures that each regression feature will have many neighbors and thus increases the chance that there will be variation in the values of those neighbors. This helps avoid a well-known problem in geographically weighted regression called local collinearity. Use a Gaussian weighting scheme when the influence of neighboring features becomes smoothly and gradually less important but that influence is always present regardless of how far away the surrounding features are.
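The example weights above can be reproduced with the weighting function exp(-d^2/b^2) given under Other implementation notes and tips. The text does not state the bandwidth used in the example; the sketch below assumes b^2 = 0.5, which is an inference made only because it yields the quoted values.

```python
from math import exp

def gaussian_weight(d, b):
    """Gaussian kernel weight exp(-d^2 / b^2) for distance d and bandwidth b."""
    return exp(-(d * d) / (b * b))

b = 0.5 ** 0.5  # assumed bandwidth (b^2 = 0.5); not stated in the text

w_self = gaussian_weight(0.0, b)   # the regression feature itself
w_near = gaussian_weight(0.25, b)  # feature 0.25 units away
w_far = gaussian_weight(0.75, b)   # feature 0.75 units away
print(w_self, round(w_near, 2), round(w_far, 2))
```

Under that assumption, the weight falls from 1 at the regression feature to roughly 0.88 and 0.32 at the two example distances, and it never reaches exactly zero for any finite distance within floating-point limits.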

The Bisquare weighting scheme is similar to Gaussian. It assigns a weight of one to the regression feature (feature i), and weights for the surrounding features (j features) smoothly and gradually decrease as the distance from the regression feature increases. However, all features outside of the neighborhood specified are assigned zero and do not impact the local regression for the target feature. When comparing a Bisquare weighting scheme to a Gaussian weighting scheme with the same neighborhood specifications, weights will decrease more quickly with Bisquare. Using a Bisquare weighting scheme allows you to specify a distance after which features will have no impact on the regression results. Since Bisquare excludes features after a certain distance, there is no guarantee that there will be sufficient features (with influence) in the surrounding neighborhood to produce a good local regression analysis. Use a Bisquare weighting scheme when the influence of neighboring features becomes smoothly and gradually less important and there is a distance after which that influence is no longer present. For example, regression is often used to model housing prices, and the sales price of surrounding houses is a common explanatory variable. These surrounding houses are called comps or comparable properties. Lending agencies sometimes establish rules that require a comparable house to be within a maximum distance. In this example, a Bisquare weighting scheme can be used with a neighborhood equal to the maximum distance specified by the lending institution.
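To make the contrast with Gaussian concrete, the sketch below uses the bisquare form commonly given in the GWR literature, (1 - (d/b)^2)^2 inside the bandwidth and 0 outside. The text does not give the tool's exact bisquare formula, so this form is an assumption, and the bandwidth of 1.0 stands in for a hypothetical maximum comp distance.

```python
from math import exp

def bisquare_weight(d, b):
    """Commonly cited bisquare kernel: (1 - (d/b)^2)^2 if d < b, else 0."""
    if d >= b:
        return 0.0
    return (1.0 - (d / b) ** 2) ** 2

def gaussian_weight(d, b):
    """Gaussian kernel exp(-d^2 / b^2), shown for comparison."""
    return exp(-((d / b) ** 2))

max_comp_distance = 1.0  # hypothetical lender rule, in map units
for d in (0.0, 0.5, 0.9, 1.5):
    print(d,
          round(bisquare_weight(d, max_comp_distance), 3),
          round(gaussian_weight(d, max_comp_distance), 3))
```

At every positive distance inside the bandwidth, the bisquare weight in this sketch is lower than the Gaussian weight, and beyond the bandwidth it is exactly zero while the Gaussian weight is merely small.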

Prediction

You can use the regression model that has been created to make predictions for other features (either points or polygons) in the same study area. Creating these predictions requires that each of the Prediction Locations has values for each of the Explanatory Variable(s) provided. If the field names from the Input Features and the Prediction Locations parameters do not match, a variable matching parameter is provided. When matching the explanatory variables, the fields from the Input Features and Prediction Locations parameters must be of the same type (double fields must be matched with double fields, for example).

Coefficient rasters

A powerful aspect of GWR is that it allows you to explore spatially varying relationships. One way to visualize how the relationships between the explanatory variables and the dependent variable vary across space is to create coefficient rasters. When you provide a path name for the Coefficient Raster Workspace parameter, the GWR tool will create coefficient raster surfaces for the model intercept and each explanatory variable. The resolution of the rasters is controlled by the Cell Size environment. A neighborhood (kernel) is constructed around each raster cell using the Neighborhood Type and Local Weighting Scheme parameters. Distance-based weights are calculated from the center of the raster cell to all the input features falling within the neighborhood (bandwidth). These weights are used to calculate a unique regression equation for that raster cell. The coefficients vary from raster cell to raster cell because the distance-based weights change, and potentially different input features will fall within the neighborhood (bandwidth).

Note:

There is currently no consensus on how to assess confidence in the coefficients from a GWR model. While t-tests have been used to base an inference on whether the estimated value of coefficients is significantly different than zero, the validity of this approach is still an area of active research. One approach to informally evaluate the coefficients is to divide the coefficient by the standard error provided for each feature as a way of scaling the magnitude of the estimation with the associated standard error and visualize those results, looking for clusters of high standard errors relative to their coefficients.
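The informal scaling described above amounts to one division per feature. In the sketch below, the function name, the sample values, and the display cutoff of 2.0 are all hypothetical; the cutoff is only a way to highlight features for visual inspection and is not a significance test.

```python
def coefficient_ratio(coefficient, std_error):
    """Coefficient scaled by its standard error; None when undefined."""
    if coefficient is None or std_error is None or std_error == 0:
        return None
    return coefficient / std_error

# Hypothetical (coefficient, standard error) pairs for three features.
rows = [(0.80, 0.10), (0.05, 0.20), (-0.60, 0.15)]
ratios = [coefficient_ratio(c, se) for c, se in rows]

# Features whose standard error is large relative to the coefficient.
weak = [i for i, r in enumerate(ratios) if r is not None and abs(r) < 2.0]
print(ratios, weak)
```

Mapping the ratio rather than the raw coefficient makes clusters of uncertain estimates easier to spot.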

Outputs

The Geographically Weighted Regression tool produces a variety of different outputs. A summary of the GWR model and statistical summaries are available as messages at the bottom of the Geoprocessing pane during tool execution. To access the messages, hover the pointer over the progress bar, click the pop-out button, or expand the messages section in the Geoprocessing pane. You can also access messages of a previously run Geographically Weighted Regression tool via the geoprocessing history. The tool also generates Output Features, charts, and, optionally, Output Predicted Features and coefficient raster surfaces. The Output Features and associated charts are automatically added to the Contents pane with a hot and cold rendering scheme applied to model residuals. The diagnostics and charts generated depend on the Model Type of the Input Features and are described below.

Global model statistics are calculated for all models.

Continuous (Gaussian)

Feature class and added fields

In addition to regression residuals, the Output Features parameter includes fields for observed and predicted y values, condition number (COND), Local R2, explanatory variable coefficients, and standard errors.

The Intercept (INTERCEPT), Standard Error of the Intercept (SE_INTERCEPT), Coefficients and Standard Errors for each of the explanatory variables, Predicted, Residual, Std Residual, Influence, Cook's D, and Local R-Squared are also reported.

Interpreting messages and diagnostics

Analysis details are provided in the messages including the number of features analyzed, the dependent and explanatory variables, and the number of neighbors specified. In addition, the diagnostics in the following screen capture are reported:

Model Diagnostics for the Continuous Model Type
  • R2—R-squared is a measure of goodness of fit. Its value varies from 0.0 to 1.0, with higher values being preferable. It may be interpreted as the proportion of dependent variable variance accounted for by the regression model. The denominator for the R2 computation is the sum of squared dependent variable values. Adding an extra explanatory variable to the model does not alter the denominator but does alter the numerator; this gives the impression of improvement in model fit that may not be real. See Adj R2 below.
  • AdjR2—Because of the problem described above for the R2 value, calculations for the adjusted R-squared value normalize the numerator and denominator by their degrees of freedom. This has the effect of compensating for the number of variables in a model, and consequently, the Adjusted R2 value is almost always less than the R2 value. However, in making this adjustment, you lose the interpretation of the value as a proportion of the variance explained. In GWR, the effective number of degrees of freedom is a function of the neighborhood used, so the adjustment may be quite marked in comparison to a global model such as Generalized Linear Regression (GLR). For this reason, AICc is preferred as a means of comparing models.
  • AICc—This is a measure of model performance and can be used to compare regression models. Taking into account model complexity, the model with the lower AICc value provides a better fit to the observed data. AICc is not an absolute measure of goodness of fit but is useful for comparing models with different explanatory variables as long as they apply to the same dependent variable. If the AICc values for two models differ by more than 3, the model with the lower AICc value is held to be better. Comparing the GWR AICc value to the GLR AICc value is one way to assess the benefits of moving from a global model (GLR) to a local regression model (GWR).
  • Sigma-Squared—This is the least-squares estimate of the variance (standard deviation squared) for the residuals. Smaller values of this statistic are preferable. This value is the normalized residual sum of squares, where the residual sum of squares is divided by the effective degrees of freedom of the residuals. Sigma-Squared is used for AICc computations.
  • Sigma-Squared MLE—This is the maximum likelihood estimate (MLE) of the variance (standard deviation squared) of the residuals. Smaller values of this statistic are preferable. This value is calculated by dividing the residual sum of squares by the number of input features.
  • Effective Degrees of Freedom—This value reflects a tradeoff between the variance of the fitted values and the bias in the coefficient estimates and is related to the choice of neighborhood size. As the neighborhood approaches infinity, the geographic weights for every feature approach 1, and the coefficient estimates will be very close to those for a global GLR model. For very large neighborhoods, the effective number of coefficients approaches the actual number; local coefficient estimates will have a small variance but will be quite biased. Conversely, as the neighborhood gets smaller and approaches zero, the geographic weights for every feature approach zero except for the regression point itself. For extremely small neighborhoods, the effective number of coefficients is the number of observations, and the local coefficient estimates will have a large variance but a low bias. The effective number is used to compute many other diagnostic measures.
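Several of the diagnostics above reduce to simple arithmetic on the residuals. The sketch below shows one way to compute R2, Sigma-Squared, and Sigma-Squared MLE, and to apply the "differ by more than 3" rule for AICc. The function names, the sample values, and the effective degrees of freedom of 2.5 are hypothetical, and R2 is computed here with the usual mean-centered total sum of squares, which is an assumption about the tool's exact formula.

```python
def fit_diagnostics(observed, predicted, residual_dof):
    """Return R2, Sigma-Squared, and Sigma-Squared MLE for a fitted model."""
    n = len(observed)
    y_bar = sum(observed) / n
    rss = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # residual SS
    tss = sum((y - y_bar) ** 2 for y in observed)                 # total SS
    return {
        "r2": 1.0 - rss / tss,
        "sigma_squared": rss / residual_dof,  # RSS / effective residual dof
        "sigma_squared_mle": rss / n,         # RSS / number of features
    }

def clearly_better(aicc_a, aicc_b, threshold=3.0):
    """Return 'a' or 'b' if one AICc is lower by more than threshold, else None."""
    if abs(aicc_a - aicc_b) <= threshold:
        return None
    return "a" if aicc_a < aicc_b else "b"

obs = [2.0, 4.0, 6.0, 8.0]
pred = [2.5, 3.5, 6.5, 7.5]
diag = fit_diagnostics(obs, pred, residual_dof=2.5)  # hypothetical effective dof
print(diag, clearly_better(100.0, 104.5))
```

Because the effective degrees of freedom are smaller than the number of features, Sigma-Squared is always at least as large as Sigma-Squared MLE for the same residuals.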

Output charts

A scatter plot matrix is provided in the Contents pane (including up to 19 variables) as well as a histogram of the deviance residual displaying a normal distribution line.

Binary (Logistic)

Feature class and added fields

The Intercept (INTERCEPT), Standard Error of the Intercept (SE_INTERCEPT), Coefficients, and Standard Errors for each of the explanatory variables, as well as the Probability of Being 1, Predicted, Deviance Residual, GInfluence, and Local Percent Deviance are reported.

Interpreting messages and diagnostics

Analysis details are provided in the messages including the number of features analyzed, the dependent and explanatory variables and the number of neighbors specified. In addition, the diagnostics in the following screen capture are reported:

Model Diagnostics for the Binary Model Type
  • % deviance explained by the global model (non-spatial)—This is a measure of goodness of fit and quantifies the performance of a global model (GLR). Its value varies from 0.0 to 1.0, with higher values being preferable. It can be interpreted as the proportion of dependent variable variance accounted for by the regression model.
  • % deviance explained by the local model—This is a measure of goodness of fit and quantifies the performance of a local model (GWR). Its value varies from 0.0 to 1.0, with higher values being preferable. It can be interpreted as the proportion of dependent variable variance accounted for by the local regression model.
  • % deviance explained by the local model vs global model—This proportion is one way to assess the benefits of moving from a global model (GLR) to a local regression model (GWR) by comparing the residual sum of squares of the local model to the residual sum of squares of the global model. Its value varies from 0.0 to 1.0, with higher values signifying the local regression model performed better than a global model.
  • AICc—This is a measure of model performance and can be used to compare regression models. Taking into account model complexity, the model with the lower AICc value provides a better fit to the observed data. AICc is not an absolute measure of goodness of fit but is useful for comparing models with different explanatory variables as long as they apply to the same dependent variable. If the AICc values for two models differ by more than 3, the model with the lower AICc value is held to be better. Comparing the GWR AICc value to the GLR AICc value is one way to assess the benefits of moving from a global model (GLR) to a local regression model (GWR).
  • Sigma-Squared—This value is the normalized residual sum of squares, in which the residual sum of squares is divided by the effective degrees of freedom of the residual. This is the least-squares estimate of the variance (standard deviation squared) of the residuals. Smaller values of this statistic are preferable. Sigma-Squared is used for AICc computations.
  • Sigma-Squared MLE—This is the maximum likelihood estimate (MLE) of the variance (standard deviation squared) of the residuals. Smaller values of this statistic are preferable. This value is calculated by dividing the residual sum of squares by the number of input features.
  • Effective Degrees of Freedom—This value reflects a tradeoff between the variance of the fitted values and the bias in the coefficient estimates and is related to the choice of neighborhood size. As the neighborhood approaches infinity, the geographic weights for every feature approach 1, and the coefficient estimates will be very close to those for a global GLR model. For very large neighborhoods, the effective number of coefficients approaches the actual number; local coefficient estimates will have a small variance but will be quite biased. Conversely, as the neighborhood gets smaller and approaches zero, the geographic weights for every feature approach zero except for the regression point itself. For extremely small neighborhoods, the effective number of coefficients is the number of observations, and the local coefficient estimates will have a large variance but a low bias. The effective number is used to compute many other diagnostic measures.

Output charts

A scatter plot matrix as well as box plots and a histogram of the deviance residuals are provided.

Count (Poisson)

Interpreting messages and diagnostics

Analysis details are provided in the messages including the number of features analyzed, the dependent and explanatory variables, and the number of neighbors specified. In addition, the diagnostics in the following screen capture are reported:

Model Diagnostics for the Count Model Type
  • % deviance explained by the global model (non-spatial)—This is a measure of goodness of fit and quantifies the performance of a global model (GLR). Its value varies from 0.0 to 1.0, with higher values being preferable. It can be interpreted as the proportion of dependent variable variance accounted for by the regression model.
  • % deviance explained by the local model—This is a measure of goodness of fit and quantifies the performance of the local model (GWR). Its value varies from 0.0 to 1.0, with higher values being preferable. It can be interpreted as the proportion of dependent variable variance accounted for by the local regression model.
  • % deviance explained by the local model vs global model—This proportion is one way to assess the benefits of moving from a global model (GLR) to a local regression model (GWR) by comparing the residual sum of squares of the local model to the residual sum of squares of the global model. Its value varies from 0.0 to 1.0, with higher values signifying the local regression model performed better than a global model.
  • AICc—This is a measure of model performance and can be used to compare regression models. Taking into account model complexity, the model with the lower AICc value provides a better fit to the observed data. AICc is not an absolute measure of goodness of fit but is useful for comparing models with different explanatory variables as long as they apply to the same dependent variable. If the AICc values for two models differ by more than 3, the model with the lower AICc value is held to be better. Comparing the GWR AICc value to the GLR AICc value is one way to assess the benefits of moving from a global model (GLR) to a local regression model (GWR).
  • Sigma-Squared—This value is the normalized residual sum of squares, in which the residual sum of squares is divided by the effective degrees of freedom of the residual. This is the least-squares estimate of the variance (standard deviation squared) of the residuals. Smaller values of this statistic are preferable. Sigma-Squared is used for AICc computations.
  • Sigma-Squared MLE—This is the maximum likelihood estimate (MLE) of the variance (standard deviation squared) of the residuals. Smaller values of this statistic are preferable. This value is calculated by dividing the residual sum of squares by the number of input features.
  • Effective Degrees of Freedom—This value reflects a tradeoff between the variance of the fitted values and the bias in the coefficient estimates and is related to the choice of neighborhood size. As the neighborhood approaches infinity, the geographic weights for every feature approach 1, and the coefficient estimates will be very close to those for a global GLR model. For very large neighborhoods, the effective number of coefficients approaches the actual number; local coefficient estimates will have a small variance but will be quite biased. Conversely, as the neighborhood gets smaller and approaches zero, the geographic weights for every feature approach zero except for the regression point itself. For extremely small neighborhoods, the effective number of coefficients is the number of observations, and the local coefficient estimates will have a large variance but a low bias. The effective number is used to compute many other diagnostic measures.

Output charts

A scatter plot matrix is provided in the Contents pane (including up to 19 variables) as well as a histogram of the deviance residual and normal distribution line.

Other implementation notes and tips

In global regression models, such as GLR, results are unreliable when two or more variables exhibit multicollinearity (when two or more variables are redundant or together tell the same story). The Geographically Weighted Regression tool builds a local regression equation for each feature in the dataset. When the values for a particular explanatory variable cluster spatially, you will likely have problems with local multicollinearity. The condition number in the Output Features parameter indicates when results are unstable due to local multicollinearity. As a rule of thumb, be skeptical of results for features with a condition number larger than 30, equal to Null or, for shapefiles, equal to -1.7976931348623158e+308. The condition number is scale-adjusted in order to correct for the number of explanatory variables in the model. This allows direct comparison of the condition number between models using different numbers of explanatory variables.
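The rule of thumb above translates directly into a filter on the condition number field. The sketch below is illustrative only: the function name and the sample values are hypothetical, and None stands in for a Null value read from a geodatabase feature class.

```python
SHAPEFILE_NULL = -1.7976931348623158e+308  # sentinel quoted in the text above

def is_unstable(cond, limit=30.0):
    """True when a condition number signals unreliable local results."""
    return cond is None or cond == SHAPEFILE_NULL or cond > limit

# Hypothetical condition numbers for five output features.
conds = [12.4, 31.0, None, SHAPEFILE_NULL, 30.0]
flagged = [i for i, c in enumerate(conds) if is_unstable(c)]
print(flagged)
```

A value of exactly 30 is not flagged here because the text says "larger than 30"; tightening the limit is a reasonable choice if you prefer to be more conservative.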

Model design errors often indicate a problem with global or local multicollinearity. To determine where the problem is, run the model using GLR and examine the VIF value for each explanatory variable. If some of the VIF values are large (above 7.5, for example), global multicollinearity is preventing GWR from solving. More likely, however, local multicollinearity is the problem. Try creating a thematic map for each explanatory variable. If the map reveals spatial clustering of identical values, consider removing those variables from the model or combining those variables with other explanatory variables to increase value variation. If, for example, you are modeling home values and have variables for bedrooms and bathrooms, you may want to combine these to increase value variation or to represent them as bathroom/bedroom square footage. Avoid using spatial regime artificial or binary variables for Gaussian or Poisson model types, spatially clustering categorical or nominal variables with the logistic model type, or variables with few possible values when constructing GWR models.

Problems with local multicollinearity can also prevent the tool from resolving an optimal Distance band or Number of neighbors. Try specifying Manual intervals or a User defined Distance Band or specific neighbor count. Then examine the condition numbers in the Output feature class to see which features are associated with local multicollinearity problems (condition numbers larger than 30). You may want to remove these problem features temporarily while you find an optimal distance or number of neighbors. Keep in mind that results associated with condition numbers greater than 30 are not reliable.

Parameter estimates and predicted values for GWR are computed using the following spatial weighting function: exp(-d^2/b^2). There may be differences in this weighting function among various GWR software implementations. Consequently, results from the GWR tool may not match results of other GWR software packages exactly.
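To show how this weighting function drives a local estimate, the following is a minimal sketch of a single local regression with one explanatory variable, solved as a weighted least squares problem. It is a simplified illustration rather than the tool's implementation: the function name, coordinates, values, and bandwidth are hypothetical, and the real tool handles multiple explanatory variables, neighborhood limits, and the other model types.

```python
from math import exp, hypot

def local_fit(points, x, y, target, bandwidth):
    """Weighted least squares of y on x, weighted by distance to target.

    Returns (intercept, slope) using weights exp(-d^2 / b^2).
    """
    w = [exp(-(hypot(px - target[0], py - target[1]) ** 2) / bandwidth ** 2)
         for px, py in points]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Two made-up clusters: y = x in the west, y = 3x in the east.
pts = [(0, 0), (1, 0), (0, 1), (100, 0), (101, 0), (100, 1)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 3.0, 6.0, 9.0]

west = local_fit(pts, x, y, target=(0, 0), bandwidth=5.0)
east = local_fit(pts, x, y, target=(100, 0), bandwidth=5.0)
print(west, east)
```

A single global regression on this data would report one blended slope, whereas the two local fits recover the different relationships in each cluster, which is the behavior GWR is designed to reveal.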

Additional resources

There are a number of resources to help you learn more about Generalized Linear Regression and Geographically Weighted Regression. Start with Regression analysis basics or work through the Regression Analysis tutorial.

The following are also helpful resources:

Brunsdon, C., Fotheringham, A. S., & Charlton, M. E. (1996). "Geographically weighted regression: a method for exploring spatial nonstationarity". Geographical analysis, 28(4), 281-298.

Fotheringham, Stewart A., Chris Brunsdon, and Martin Charlton. Geographically Weighted Regression: the analysis of spatially varying relationships. John Wiley & Sons, 2002.

Gollini, I., Lu, B., Charlton, M., Brunsdon, C., & Harris, P. (2013). GWmodel: an R package for exploring spatial heterogeneity using geographically weighted models. arXiv preprint arXiv:1306.0413.

Mitchell, Andy. The ESRI Guide to GIS Analysis, Volume 2. ESRI Press, 2005.

Nakaya, T., Fotheringham, A. S., Brunsdon, C., & Charlton, M. (2005). "Geographically weighted Poisson regression for disease association mapping". Statistics in medicine, 24(17), 2695-2717.

Páez, A., Farber, S., & Wheeler, D. (2011). "A simulation-based study of geographically weighted regression as a method for investigating spatially varying relationships". Environment and Planning A, 43(12), 2992-3010.