This tool performs Geographically Weighted Regression (GWR), a local form of regression used to model spatially varying relationships. The GWR tool provides a local model of the variable or process you are trying to understand or predict by fitting a regression equation to every feature in the dataset. The GWR tool constructs these separate equations by incorporating the dependent and explanatory variables of features within the neighborhood of each target feature. The shape and extent of each neighborhood is based on the input for the Neighborhood Type and Neighborhood Selection Method parameters, with one restriction: when the number of neighboring features exceeds 1,000, only the closest 1,000 are incorporated into each local equation.
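The 1,000-neighbor restriction can be pictured with a minimal sketch. This is not the tool's implementation; the function name `local_neighbors`, the tuple representation of features, and the choice to count the target itself as a neighbor are assumptions made for illustration only.

```python
import heapq
import math

MAX_NEIGHBORS = 1000  # cap described in the paragraph above


def local_neighbors(target, features, requested, cap=MAX_NEIGHBORS):
    """Return the features closest to target, limited to min(requested, cap).

    target and every feature are hypothetical (x, y) tuples in projected units.
    The result is ordered from nearest to farthest.
    """
    k = min(requested, cap)
    return heapq.nsmallest(k, features, key=lambda f: math.dist(target, f))


# 1,500 features along a line; a neighborhood of 1,200 is requested.
features = [(float(i), 0.0) for i in range(1500)]
neighbors = local_neighbors((0.0, 0.0), features, requested=1200)
print(len(neighbors))  # the cap limits the neighborhood to 1000 features
```

In this sketch, a requested neighborhood smaller than the cap is returned unchanged; only larger neighborhoods are truncated to the closest features.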
Apply the GWR tool to datasets with several hundred features for best results. It is not an appropriate method for small datasets. The tool does not work with multipoint data.
Use the Input Features parameter with a field representing the phenomenon you are modeling (the Dependent Variable value) and one or more fields representing the explanatory variables (the Explanatory Variable(s) value). These fields must be numeric and have a range of values. Features that contain missing values in the dependent or explanatory variables will be excluded from the analysis; however, you can use the Fill Missing Values tool to complete the dataset before running the GWR tool.
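To preview how many features would be excluded for missing values, a check along the following lines can help. It is a sketch only: the record layout (a list of dictionaries) and the field names `PRICE`, `ROOMS`, and `AGE` are hypothetical, and the tool's own exclusion logic is not reproduced here.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_incomplete(records, fields):
    """Keep only records that have a value for every field in fields."""
    return [r for r in records if not any(is_missing(r.get(f)) for f in fields)]


# Hypothetical attribute records: one dependent and two explanatory fields.
records = [
    {"PRICE": 250.0, "ROOMS": 3, "AGE": 12},
    {"PRICE": None, "ROOMS": 4, "AGE": 5},
    {"PRICE": 310.0, "ROOMS": 4, "AGE": float("nan")},
    {"PRICE": 180.0, "ROOMS": 2, "AGE": 40},
]
complete = drop_incomplete(records, ["PRICE", "ROOMS", "AGE"])
print(len(records) - len(complete), "records would be excluded")
```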
The GWR tool also produces Output Features values and adds fields reporting local diagnostic values. The Output Features values and associated charts are automatically added to the table of contents with a hot/cold rendering scheme applied to model residuals. A full explanation of each output and chart is provided in How Geographically Weighted Regression works.
The GWR tool produces a variety of outputs. A summary of the Geographically Weighted Regression model is available as a message at the bottom of the Geoprocessing pane during tool operation. To access the message, hover over the progress bar, click the pop-out button, or expand the messages section in the Geoprocessing pane. You can also access the messages of a previously run GWR tool via the geoprocessing history.
Choose the Model Type value that matches the kind of data in the dependent variable you are modeling. Using the correct model type is important for obtaining accurate regression results.
It is recommended that you use data in a projected coordinate system (rather than a geographic coordinate system). This is especially important when distance is a component of the analysis, as it is for Geographically Weighted Regression when you specify Distance band for the Neighborhood Type parameter.
Some of the GWR tool computations take advantage of multiple CPUs to increase performance and will automatically use up to eight threads/CPUs for processing.
It is common practice to explore data globally using the Generalized Linear Regression tool prior to exploring data locally using the GWR tool.
The Dependent Variable and Explanatory Variable(s) parameter values should be numeric fields containing a variety of values. There should be variation in these values both globally and locally. For this reason, do not use dummy explanatory variables to represent different spatial regimes in the Geographically Weighted Regression model (such as assigning a value of 1 to census tracts outside the urban core, while all others are assigned a value of 0). Because the GWR tool allows explanatory variable coefficients to vary, these spatial regime explanatory variables are unnecessary, and if included, will create problems with local multicollinearity.
In global regression models, such as Generalized Linear Regression, results are unreliable when two or more variables exhibit multicollinearity (when two or more variables are redundant or together tell the same story). The GWR tool builds a local regression equation for each feature in the dataset. When the values for a particular explanatory variable cluster spatially, it is likely that there are problems with local multicollinearity. The condition number field (COND) in the output feature class indicates when results are unstable due to local multicollinearity. As a general rule, be skeptical of results for features with a condition number greater than 30, equal to Null or, for shapefiles, equal to -1.7976931348623158e+308. The condition number is scale-adjusted to correct for the number of explanatory variables in the model. This allows direct comparison of the condition number between models using different numbers of explanatory variables.
Use caution when including nominal or categorical data in a Geographically Weighted Regression model. Where categories cluster spatially, there is a risk of encountering local multicollinearity issues. The condition number included in the Geographically Weighted Regression output indicates when local multicollinearity is a problem (a condition number less than 0, greater than 30, or set to Null). Results in the presence of local multicollinearity are unstable.
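The rule of thumb for condition numbers can be written as a small check. This is a minimal sketch: the function name `is_unstable` is hypothetical, and it assumes the condition number values have already been read from the output feature class, with Null represented as `None`.

```python
def is_unstable(cond, threshold=30.0):
    """Flag a local result as unreliable based on its condition number.

    A value is flagged when it is Null (None), negative (which covers the
    large negative placeholder written to shapefiles), or above the threshold.
    """
    return cond is None or cond < 0 or cond > threshold


# Hypothetical condition numbers for five features.
conditions = [12.4, 31.7, None, -1.7976931348623158e+308, 29.9]
flags = [is_unstable(c) for c in conditions]
print(flags)  # [False, True, True, True, False]
```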
To better understand regional variation among the coefficients of the explanatory variables, examine the optional raster coefficient surfaces created by the GWR tool. These raster surfaces are created in the Coefficient Raster Workspace parameter, under Additional Options, if specified. For polygon data, you can use graduated color or cold-to-hot rendering on each coefficient field in the Output Features value to examine changes across the study area.
You can use the GWR tool for prediction by supplying a Prediction Locations value (often this feature class is the same as the Input Features value), matching the explanatory variables, and specifying an Output Predicted Features value. If the Explanatory Variables to Match fields from the Input Features value match the Fields From Prediction Locations fields, they will automatically populate. If not, specify the correct fields.
A regression model is incorrectly specified if it is missing a key explanatory variable. Statistically significant spatial autocorrelation of the regression residuals or unexpected spatial variation among the coefficients of one or more explanatory variables suggests that the model is incorrectly specified. Make every effort (through GLR residual analysis and Geographically Weighted Regression coefficient variation analysis, for example) to discover these key missing variables so they can be included in the model.
Determine whether it makes sense for an explanatory variable to be nonstationary. For example, suppose you are modeling the density of a particular plant species as a function of several variables including ASPECT. If you find that the coefficient for the ASPECT variable changes across the study area, you are likely seeing evidence of a key missing explanatory variable (prevalence of competing vegetation, for example). Make every effort to include all key explanatory variables in the regression model.
When the result of a computation is infinity or undefined, the result for nonshapefiles will be Null; for shapefiles, the result will be -DBL_MAX = -1.7976931348623158e+308.
When using shapefiles, keep in mind that they cannot store null values. Tools or other procedures that create shapefiles from nonshapefile inputs may, consequently, store null values as zero or as a very small negative number (-DBL_MAX = -1.7976931348623158e+308). This can lead to unexpected results. For more information see Geoprocessing considerations for shapefile output.
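When reading results back from a shapefile, it can help to convert the placeholder value into an explicit missing value before computing statistics. The sketch below assumes the values are already loaded as Python floats; the helper name `sentinel_to_none` is hypothetical. Nulls that were stored as zero cannot be recovered this way, because they are indistinguishable from genuine zeros.

```python
import sys

# -DBL_MAX, the most negative finite double. Python displays it as
# -1.7976931348623157e+308; the literal -1.7976931348623158e+308 used in the
# text rounds to the same floating-point value.
SHAPEFILE_NULL = -sys.float_info.max


def sentinel_to_none(value):
    """Return None for the shapefile placeholder, otherwise the value itself."""
    return None if value == SHAPEFILE_NULL else value


values = [4.2, float("-1.7976931348623158e+308"), 0.0]
print([sentinel_to_none(v) for v in values])  # [4.2, None, 0.0]
```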
There are three options for the Neighborhood Selection Method parameter. When you select Golden search, the tool will find the best value for the Distance Band or Number of Neighbors parameter using the golden section search method. The Manual intervals option will test neighborhoods in increments between the distances specified. In either case, the neighborhood size used is the one that minimizes the corrected Akaike information criterion (AICc) value. Problems with local multicollinearity, however, will prevent both of these methods from resolving an optimal distance band or number of neighbors. If you receive an error or run into severe model design problems, you can try specifying a particular distance or neighborhood count using the User defined option. Then examine the condition numbers in the output feature class to see which features are associated with local multicollinearity problems.
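The idea behind golden section search can be sketched in a few lines. This is a generic textbook version, not the tool's implementation: `toy_aicc` is a made-up stand-in for fitting a model at a given distance band and returning its AICc, and the search range and tolerance are arbitrary assumptions.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # reciprocal of the golden ratio, about 0.618


def golden_section_minimize(score, low, high, tol=1e-3):
    """Return x in [low, high] that approximately minimizes a unimodal score."""
    a, b = low, high
    c = b - INV_PHI * (b - a)  # lower interior point
    d = a + INV_PHI * (b - a)  # upper interior point
    fc, fd = score(c), score(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new upper point.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = score(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new lower point.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = score(d)
    return (a + b) / 2


def toy_aicc(distance):
    """Hypothetical AICc curve with its minimum at a 2,500-unit distance band."""
    return (distance - 2500.0) ** 2 / 1e4 + 100.0


best = golden_section_minimize(toy_aicc, 500.0, 10000.0)
print(round(best))  # 2500
```

Each iteration reuses one of the two previous evaluations, so only one new score is computed per step. That matters when every evaluation means refitting the local models.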
Severe model design issues, or errors indicating that local equations do not include enough neighbors, often indicate a problem with global or local multicollinearity. To determine where the problem is, run a global model using Generalized Linear Regression and examine the VIF value for each explanatory variable. If some of the VIF values are large (above 7.5, for example), global multicollinearity is preventing Geographically Weighted Regression from solving. More likely, however, local multicollinearity is the problem. Try creating a thematic map for each explanatory variable. If the map reveals spatial clustering of identical values, consider removing those variables from the model or combining them with other explanatory variables to increase value variation. If, for example, you are modeling home values and have variables for bedrooms and bathrooms, you can combine these to increase value variation or represent them as bathroom/bedroom square footage. Avoid using spatial regime dummy variables, spatially clustering categorical or nominal variables, or variables with very few possible values when constructing Geographically Weighted Regression models.
Geographically Weighted Regression is a linear model subject to the same requirements as Generalized Linear Regression. Review the diagnostics explained in How Geographically Weighted Regression works to ensure that the Geographically Weighted Regression model is properly specified. The How regression models go bad section in the Regression analysis basics topic also includes information for ensuring your model is accurate.