Geographically Weighted Regression (GWR) (GeoAnalytics)

Summary

Performs Geographically Weighted Regression (GWR), which is a local form of linear regression that is used to model spatially varying relationships.

Note:

This tool provides a subset of the capabilities of the Geographically Weighted Regression (GWR) tool introduced in ArcGIS Pro 2.3.

For an understanding of the algorithms of the tool, see How Geographically Weighted Regression (GWR) works. This topic describes the Spatial Statistics toolbox tool; not all capabilities are included in the GeoAnalytics Server Toolbox tool at this time.
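
As rough intuition only, the following sketch shows what a local form of linear regression means: a separate weighted least squares fit around each target location, with nearby features counting more than distant ones. It is not the tool's implementation. The function names, the single explanatory variable, the bisquare-style weights, and the toy data are all assumptions made for illustration.

```python
import math


def bisquare_weight(distance, bandwidth):
    """Illustrative bisquare kernel: 1 at the target, 0 at or beyond the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def local_fit(target_xy, points, bandwidth):
    """Fit y = b0 + b1 * x by weighted least squares around target_xy.

    points is a list of (px, py, x, y) tuples: feature location,
    explanatory value, dependent value. Returns (b0, b1).
    """
    sw = swx = swy = swxx = swxy = 0.0
    for px, py, x, y in points:
        d = math.hypot(px - target_xy[0], py - target_xy[1])
        w = bisquare_weight(d, bandwidth)
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    denom = sw * swxx - swx ** 2
    if abs(denom) < 1e-12:
        raise ValueError("not enough variation among weighted neighbors")
    b1 = (sw * swxy - swx * swy) / denom
    b0 = (swy - b1 * swx) / sw
    return b0, b1


# Toy data (invented): y = 2x in the west cluster, y = 10 - x in the east.
points = [
    (0, 0, 1, 2), (1, 0, 2, 4), (0, 1, 3, 6), (1, 1, 4, 8),
    (10, 0, 1, 9), (11, 0, 2, 8), (10, 1, 3, 7), (11, 1, 4, 6),
]
west = local_fit((0.5, 0.5), points, bandwidth=3.0)
east = local_fit((10.5, 0.5), points, bandwidth=3.0)
print("west (b0, b1):", west)
print("east (b0, b1):", east)
```

Because the toy data were built with a slope of 2 in the west and -1 in the east, the two local fits recover different coefficients, which a single global regression could not represent.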

Usage

  • This geoprocessing tool is available with ArcGIS Enterprise 10.8.1 or later.

  • This tool performs Geographically Weighted Regression (GWR), a local form of regression used to model spatially varying relationships. The GWR tool provides a local model of the variable or process you are trying to understand or predict by fitting a regression equation to every feature in the dataset. The Geographically Weighted Regression (GWR) tool constructs these separate equations by incorporating the dependent and explanatory variables of features within the neighborhood of each target feature. The shape and extent of each neighborhood analyzed is based on the input for the Neighborhood Type and Neighborhood Selection Method parameters.

  • Apply the GWR tool to datasets with several hundred features for best results. It is not an appropriate tool for small datasets. The tool does not work with multipoint data.

  • Use the Input Features parameter with a field representing the phenomenon you are modeling (the Dependent Variable) and one or more fields representing the Explanatory Variable(s) parameter values. These fields must be numeric and have a range of values. Features that contain missing values in the dependent or explanatory variable will be excluded from the analysis. You can use the Calculate Field tool to modify values. If your data is available for use in ArcGIS Pro, use the Fill Missing Values tool to fill in missing values in the dataset before running the Geographically Weighted Regression (GWR) tool.

  • The Geographically Weighted Regression (GWR) tool also produces output features and adds fields reporting local diagnostic values. The Output Features parameter values and associated charts are automatically added to the table of contents with a hot/cold rendering scheme applied to model residuals. A full explanation of each output is provided in How Geographically Weighted Regression (GWR) works.

    Note:

    The Geographically Weighted Regression (GWR) tool produces a variety of outputs. A summary of the GWR model is available as a message at the bottom of the Geoprocessing pane during tool execution. You can access the message by hovering over the progress bar, clicking the pop-out button, or expanding the messages section in the Geoprocessing pane. You can also access the messages of a previously run Geographically Weighted Regression (GWR) tool via the geoprocessing history.

  • You must use projected data.

  • It is common practice to explore data globally using the Generalized Linear Regression tool prior to exploring data locally using the Geographically Weighted Regression (GWR) tool.

  • The Dependent Variable and Explanatory Variable(s) parameters must be numeric fields containing a variety of values. There should be variation in these values both globally and locally. For this reason, do not use dummy explanatory variables to represent different spatial regimes in your GWR model (such as assigning a value of 1 to census tracts outside the urban core, while all others are assigned a value of 0). Because the Geographically Weighted Regression (GWR) tool allows explanatory variable coefficients to vary, these spatial regime explanatory variables are unnecessary, and if included, will create problems with local multicollinearity.

  • In global regression models, such as Generalized Linear Regression, results are unreliable when two or more variables exhibit multicollinearity (when two or more variables are redundant or together tell the same story). The Geographically Weighted Regression (GWR) tool builds a local regression equation for each feature in the dataset. When the values for a particular explanatory variable cluster spatially, it is likely that there are problems with local multicollinearity. An adjusted condition number field (COND_ADJ) in the output feature class indicates when results are unstable due to local multicollinearity. As a general rule, be skeptical of results for features with an adjusted condition number greater than 30, equal to Null or, for shapefiles, equal to -1.7976931348623158e+308.

  • Use caution when including nominal or categorical data in a GWR model. Where categories cluster spatially, there is risk of encountering local multicollinearity issues. The adjusted condition number included in the GWR output indicates when local collinearity is a problem (an adjusted condition number less than 0, greater than 30, or set to Null). Results in the presence of local multicollinearity are unstable.

  • A regression model is incorrectly specified if it is missing a key explanatory variable. Statistically significant spatial autocorrelation of the regression residuals or unexpected spatial variation among the coefficients of one or more explanatory variables suggests that your model is incorrectly specified. You should make every effort (through GLR residual analysis and GWR coefficient variation analysis, for example) to discover these key missing variables and include them in the model.

  • Always question whether it makes sense for an explanatory variable to be nonstationary. For example, suppose you are modeling the density of a particular plant species as a function of several variables including ASPECT. If you find that the coefficient for the ASPECT variable changes across the study area, you are likely seeing evidence of a key missing explanatory variable (perhaps prevalence of competing vegetation, for example). You should make every effort to include all key explanatory variables in your regression model.

  • When the result of a computation is infinity or undefined, the result for nonshapefiles will be Null.

  • Severe model design issues, or errors indicating local equations do not include enough neighbors, often indicate a problem with global or local multicollinearity. To determine where the problem is, run a global model using Generalized Linear Regression and examine the VIF value for each explanatory variable. If some of the VIF values are large (above 7.5, for example), global multicollinearity is preventing GWR from solving. More likely, however, local multicollinearity is the problem. Try creating a thematic map for each explanatory variable. If the map reveals spatial clustering of identical values, consider removing those variables from the model or combining those variables with other explanatory variables to increase value variation. If, for example, you are modeling home values and have variables for bedrooms and bathrooms, you can combine these to increase value variation or represent them as bathroom/bedroom square footage. Avoid using spatial regime dummy variables, spatially clustering categorical or nominal variables, or variables with very few possible values when constructing GWR models.

  • Geographically Weighted Regression is a linear model subject to the same requirements as Generalized Linear Regression. Review the diagnostics explained in How Geographically Weighted Regression (GWR) works to ensure your GWR model is properly specified. Not all described diagnostics are available in the GeoAnalytics Server toolbox. The How regression models go bad section in the Regression analysis basics topic also includes information for ensuring that your model is accurate.

  • You can improve the performance of the Geographically Weighted Regression (GWR) tool by doing any or all of the following:

    • Set the extent environment so you only analyze data of interest.
    • Decrease the number of neighbors in your calculation.
    • Use the Number of neighbors option instead of the Distance band option in the Neighborhood Type parameter (neighborhood_type = "NUMBER_OF_NEIGHBORS" in Python).
    • Use fewer explanatory variables when possible.
    • Use data that is local to where the analysis is being run.

  • This geoprocessing tool is powered by ArcGIS GeoAnalytics Server. Analysis is completed on your GeoAnalytics Server, and results are stored in your content in ArcGIS Enterprise.

  • When running GeoAnalytics Server tools, the analysis is completed on the GeoAnalytics Server. For optimal performance, make data available to the GeoAnalytics Server through feature layers hosted on your ArcGIS Enterprise portal or through big data file shares. Data that is not local to your GeoAnalytics Server will be moved to your GeoAnalytics Server before analysis begins. This means that it will take longer to run a tool, and in some cases, moving the data from ArcGIS Pro to your GeoAnalytics Server may fail. The threshold for failure depends on your network speeds, as well as the size and complexity of the data. Therefore, it is recommended that you always share your data or create a big data file share.

    Learn more about sharing data to your portal

    Learn more about creating a big data file share through Server Manager

  • Similar analysis can also be completed using the Geographically Weighted Regression tool in the Spatial Statistics toolbox. Use the tool in the Spatial Statistics toolbox to complete the following workflows:

    • Use layers local to your ArcGIS Pro machine (for example, feature classes in a file geodatabase).
    • Predict to another layer or create a raster coefficient layer.
    • Model a binary (logistic) or count (Poisson) variable.
    • Define the neighborhood search using golden search or manual intervals.
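
The usage notes above state that features with missing values are excluded and that variables with very few possible values should be avoided. The following sketch shows one way to screen records before publishing them for analysis. It is a pre-check written for illustration, not part of the tool: the function name, the UrbanFlag field, the sample records, and the min_distinct threshold of 3 are all assumptions.

```python
import math


def screen_records(records, dependent, explanatory, min_distinct=3):
    """Drop records with missing values and report low-variation fields.

    records is a list of dicts keyed by field name.
    Returns (kept_records, dropped_count, low_variation_fields).
    min_distinct is an arbitrary illustrative threshold, not a tool setting.
    """
    fields = [dependent] + list(explanatory)
    kept = []
    for rec in records:
        values = [rec.get(f) for f in fields]
        missing = any(
            v is None or (isinstance(v, float) and math.isnan(v))
            for v in values
        )
        if not missing:
            kept.append(rec)
    low_variation = [
        f for f in fields if len({rec[f] for rec in kept}) < min_distinct
    ]
    return kept, len(records) - len(kept), low_variation


# Invented sample records; UrbanFlag is a hypothetical dummy variable.
records = [
    {"Fire_Frequency": 4, "TreeCover": 0.8, "UrbanFlag": 0},
    {"Fire_Frequency": 7, "TreeCover": 0.6, "UrbanFlag": 1},
    {"Fire_Frequency": None, "TreeCover": 0.5, "UrbanFlag": 1},
    {"Fire_Frequency": 2, "TreeCover": float("nan"), "UrbanFlag": 0},
    {"Fire_Frequency": 5, "TreeCover": 0.3, "UrbanFlag": 0},
]
kept, dropped, low_variation = screen_records(
    records, "Fire_Frequency", ["TreeCover", "UrbanFlag"]
)
print("kept:", len(kept), "dropped:", dropped)
print("fields with few distinct values:", low_variation)
```

A field flagged here, such as a 0/1 dummy variable, is a candidate for removal or for combining with another explanatory variable, as the notes on local multicollinearity suggest.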

Parameters

Label | Explanation | Data Type
Input Features

The point feature class containing the dependent and explanatory variables.

Feature Set
Dependent Variable

The numeric field containing the observed values that will be modeled.

Field
Model Type

Specifies the type of data that will be modeled.

  • Continuous (Gaussian) — The Dependent Variable value is continuous. The Gaussian model will be used, and the tool performs ordinary least squares regression.
String
Explanatory Variable(s)

A list of fields representing independent explanatory variables in the regression model.

Field
Output Features

The name of the output feature service.

String
Neighborhood Type

Specifies whether the neighborhood used is constructed as a fixed distance or allowed to vary in spatial extent depending on the density of the features.

  • Number of neighbors — The neighborhood size is a function of a specified number of neighbors included in calculations for each feature. Where features are dense, the spatial extent of the neighborhood is smaller; where features are sparse, the spatial extent of the neighborhood is larger.
  • Distance band —The neighborhood size is a constant or fixed distance for each feature.
String
Neighborhood Selection Method

Specifies how the neighborhood size will be determined.

  • User defined — The neighborhood size will be determined by either the Number of Neighbors or Distance Band parameter.
String
Number of Neighbors
(Optional)

The closest number of neighbors (up to 1000) to consider for each feature. The number must be an integer between 2 and 1000.

Long
Distance Band
(Optional)

The spatial extent of the neighborhood.

Linear Unit
Local Weighting Scheme
(Optional)

Specifies the kernel type that will be used to provide the spatial weighting in the model. The kernel defines how each feature is related to other features within its neighborhood.

  • Bisquare —A weight of 0 will be assigned to any feature outside the neighborhood specified. This is the default.
  • Gaussian —All features will receive weights, but weights become exponentially smaller the farther away they are from the target feature.
String
Data Store
(Optional)

Specifies the ArcGIS Data Store where the output will be saved. The default is Spatiotemporal big data store. All results stored in a spatiotemporal big data store will be stored in WGS84. Results stored in a relational data store will maintain their coordinate system.

  • Spatiotemporal big data store —Output will be stored in a spatiotemporal big data store. This is the default.
  • Relational data store —Output will be stored in a relational data store.
String
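
The difference between the two Local Weighting Scheme options described above can be sketched as follows. The formulas are common textbook forms of the bisquare and Gaussian kernels and are assumptions here; the exact expressions the tool uses are described in How Geographically Weighted Regression (GWR) works and may differ, for example in how the bandwidth scales the Gaussian curve.

```python
import math


def bisquare(distance, bandwidth):
    """Bisquare kernel: weight drops to exactly 0 at or beyond the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def gaussian(distance, bandwidth):
    """Gaussian kernel: weight decays with distance but never reaches 0.

    The -0.5 factor is one common convention (an assumption here).
    """
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


bandwidth = 10.0  # illustrative neighborhood size, in map units
for d in (0.0, 5.0, 10.0, 15.0):
    print(d, round(bisquare(d, bandwidth), 4), round(gaussian(d, bandwidth), 4))
```

In this sketch, a feature 15 units from the target gets no weight under the bisquare kernel but still contributes a small weight under the Gaussian kernel, which matches the option descriptions above.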

Derived Output

Label | Explanation | Data Type
Output

The output features.

Record Set
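
The output features include the adjusted condition number field (COND_ADJ) described in the usage notes. The following sketch applies the rule of thumb stated there (be skeptical when the value is Null, less than 0, or greater than 30) to values that have already been read from the output. The function name and the sample values are assumptions for illustration; how you read the field from the output layer is left out.

```python
def is_unstable(cond_adj, threshold=30.0):
    """Return True when a local result should be treated with skepticism.

    Follows the rule of thumb in the usage notes: Null, negative
    (including the shapefile placeholder -1.7976931348623158e+308),
    or greater than the threshold.
    """
    if cond_adj is None:
        return True
    if cond_adj < 0:
        return True
    return cond_adj > threshold


# Invented COND_ADJ values keyed by a hypothetical feature ID.
cond_adj_by_id = {
    1: 12.4,
    2: 31.0,
    3: None,
    4: -1.7976931348623158e+308,
    5: 30.0,
}
flagged = [fid for fid, value in cond_adj_by_id.items() if is_unstable(value)]
print("features to review:", flagged)
```

A value of exactly 30 is not flagged in this sketch because the guidance refers to values greater than 30.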

arcpy.geoanalytics.GWR(in_features, dependent_variable, model_type, explanatory_variables, output_features, neighborhood_type, neighborhood_selection_method, {number_of_neighbors}, {distance_band}, {local_weighting_scheme}, {data_store})
Name | Explanation | Data Type
in_features

The point feature class containing the dependent and explanatory variables.

Feature Set
dependent_variable

The numeric field containing the observed values that will be modeled.

Field
model_type

Specifies the type of data that will be modeled.

  • CONTINUOUS The dependent_variable value is continuous. The Gaussian model will be used, and the tool will perform ordinary least squares regression.
String
explanatory_variables
[explanatory_variables,...]

A list of fields representing independent explanatory variables in the regression model.

Field
output_features

The name of the output feature service.

String
neighborhood_type

Specifies whether the neighborhood used is constructed as a fixed distance or allowed to vary in spatial extent depending on the density of the features.

  • NUMBER_OF_NEIGHBORS The neighborhood size is a function of a specified number of neighbors included in calculations for each feature. Where features are dense, the spatial extent of the neighborhood is smaller; where features are sparse, the spatial extent of the neighborhood is larger.
  • DISTANCE_BAND The neighborhood size is a constant or fixed distance for each feature.
String
neighborhood_selection_method

Specifies how the neighborhood size will be determined.

  • USER_DEFINED The neighborhood size will be determined by either the number_of_neighbors or distance_band parameter.
String
number_of_neighbors
(Optional)

The closest number of neighbors (up to 1000) to consider for each feature. The number must be an integer between 2 and 1000.

Long
distance_band
(Optional)

The spatial extent of the neighborhood.

Linear Unit
local_weighting_scheme
(Optional)

Specifies the kernel type that will be used to provide the spatial weighting in the model. The kernel defines how each feature is related to other features within its neighborhood.

  • BISQUARE A weight of 0 will be assigned to any feature outside the neighborhood specified. This is the default.
  • GAUSSIAN All features will receive weights, but weights become exponentially smaller the farther away they are from the target feature.
String
data_store
(Optional)

Specifies the ArcGIS Data Store where the output will be saved. The default is SPATIOTEMPORAL_DATA_STORE. All results stored in a spatiotemporal big data store will be stored in WGS84. Results stored in a relational data store will maintain their coordinate system.

  • SPATIOTEMPORAL_DATA_STORE Output will be stored in a spatiotemporal big data store. This is the default.
  • RELATIONAL_DATA_STORE Output will be stored in a relational data store.
String

Derived Output

Name | Explanation | Data Type
output

The output features.

Record Set
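
The parameter rules in the tables above can be checked before the tool is called. The following sketch assembles the arguments in the documented order and validates the keyword values and the 2 to 1000 range for number_of_neighbors. The helper name build_gwr_args is hypothetical, and passing None for the neighborhood parameter that is not in use is an assumption.

```python
def build_gwr_args(in_features, dependent_variable, explanatory_variables,
                   output_features, neighborhood_type,
                   number_of_neighbors=None, distance_band=None,
                   local_weighting_scheme="BISQUARE",
                   data_store="SPATIOTEMPORAL_DATA_STORE"):
    """Validate inputs and return positional arguments in documented order."""
    if neighborhood_type == "NUMBER_OF_NEIGHBORS":
        if (not isinstance(number_of_neighbors, int)
                or not 2 <= number_of_neighbors <= 1000):
            raise ValueError("number_of_neighbors must be an integer from 2 to 1000")
        distance_band = None  # assumption: the unused option is passed as None
    elif neighborhood_type == "DISTANCE_BAND":
        if not distance_band:
            raise ValueError("distance_band is required, for example '5 Miles'")
        number_of_neighbors = None  # assumption: the unused option is passed as None
    else:
        raise ValueError("unknown neighborhood_type: %r" % (neighborhood_type,))
    if local_weighting_scheme not in ("BISQUARE", "GAUSSIAN"):
        raise ValueError("unknown local_weighting_scheme")
    if data_store not in ("SPATIOTEMPORAL_DATA_STORE", "RELATIONAL_DATA_STORE"):
        raise ValueError("unknown data_store")
    return (in_features, dependent_variable, "CONTINUOUS",
            list(explanatory_variables), output_features,
            neighborhood_type, "USER_DEFINED",
            number_of_neighbors, distance_band,
            local_weighting_scheme, data_store)


# Placeholder input name; replace with a layer or big data file share URL.
args = build_gwr_args("fireLocations", "Fire_Frequency",
                      ["GroundCover", "TreeCover"], "GWR_Output",
                      "NUMBER_OF_NEIGHBORS", number_of_neighbors=50)
print(args)
# Where ArcPy and a GeoAnalytics Server are available:
# arcpy.geoanalytics.GWR(*args)
```

Validating locally in this way surfaces simple mistakes, such as a neighbor count outside the allowed range, before a job is submitted to the server.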

Code sample

GWR example (stand-alone script)

The following stand-alone script demonstrates how to use the GWR tool.

In this script, you'll create a model to determine which environmental variables are associated with high forest fire frequency.

# Name: GWR.py
# Description: Run GWR on forest fire occurrence report data to understand 
#              which variables explain reoccurring forest fires
#
# Requirements: ArcGIS GeoAnalytics Server

# Import system modules
import arcpy

# Set local variables
inputFeatures = "https://analysis.org.com/server/rest/services/DataStoreCatalogs/bigDataFileShares_EcoData/BigDataCatalogServer/fireLocations"
outputLayerName = "GWR_ForestFireFrequency"
dependentVariable = "Fire_Frequency"
explanatoryVariables = ["GroundCover", "TreeCover", "SoilMoisture", "slope"]
distanceValue = "5 Miles"

# Execute GWR
arcpy.geoanalytics.GWR(inputFeatures, dependentVariable,
                       "CONTINUOUS", explanatoryVariables,
                       outputLayerName, "DISTANCE_BAND",
                       "USER_DEFINED", None, distanceValue,
                       "GAUSSIAN", "SPATIOTEMPORAL_DATA_STORE")

Environments

Output Coordinate System

The coordinate system that will be used for analysis. Analysis will be completed in the input coordinate system unless a different coordinate system is specified by this parameter. For GeoAnalytics Tools, final results will be stored in the spatiotemporal data store in WGS84.

Licensing information

  • Basic: Requires ArcGIS GeoAnalytics Server
  • Standard: Requires ArcGIS GeoAnalytics Server
  • Advanced: Requires ArcGIS GeoAnalytics Server
