Multiscale Geographically Weighted Regression (MGWR) (Spatial Statistics)—ArcGIS Pro

Summary

Performs multiscale geographically weighted regression (MGWR), which is a local form of linear regression that models spatially varying relationships.

MGWR is a local regression model where the coefficient values may vary across space. The bandwidth used to define the neighborhood around each feature may also vary between explanatory variables. This allows the model to capture the varying scales of the relationships between the explanatory variables and dependent variable. These neighborhoods are used with a geographically weighted kernel to estimate the coefficient of each explanatory variable in the regression models.

Learn more about how Multiscale Geographically Weighted Regression (MGWR) works

Illustration

A bisquare kernel is applied to the neighborhood of each explanatory variable. Each explanatory variable uses a different bandwidth to capture varying spatial relationships.
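As a rough illustration of the weighting described above, the following sketch uses the commonly cited textbook form of the bisquare kernel. This page does not give the tool's exact formula, so the function, the variable names, and the bandwidth values here are assumptions for illustration only.

```python
def bisquare_weight(distance, bandwidth):
    """Textbook bisquare kernel: 1 at the target feature, 0 at or beyond the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


# Hypothetical per-variable bandwidths (same distance unit as the data).
bandwidths = {"review": 500.0, "beds": 2000.0}

# The same neighbor, 400 units away, is weighted differently for each variable.
weights = {name: bisquare_weight(400.0, bw) for name, bw in bandwidths.items()}
```

With a smaller bandwidth, the neighbor at 400 units is close to the edge of the neighborhood and receives a small weight; with a larger bandwidth, the same neighbor receives a weight close to 1.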

Usage

The current model only accepts dependent variables representing continuous values. Do not use the tool with count, rate, or binary (indicator) dependent variables. Currently, the Model Type parameter's Continuous option is the only supported option. Other options may be added in future releases.
If a noncontinuous dependent variable is provided, the results may lack meaning, such as predictions of negative counts or probabilities larger than one.
Caution:
The explanatory variables (not the dependent variable) can be any type, but use caution when using count, rate, or binary explanatory variables. Local regression models using noncontinuous explanatory variables frequently experience local multicollinearity problems. If any explanatory variables are highly correlated either globally or locally, the tool may fail with error 110222 due to multicollinearity.
Learn more about multicollinearity
There should be both global and local variation in the fields provided in the Dependent Variable and Explanatory Variables parameters. Do not use fields that contain a single constant value, indicator explanatory variables that represent different spatial regimes, or categorical variables that are spatially clustered.
To use categorical explanatory variables, the categories must be converted to indicator (0 or 1) variables using the Encode Field tool. These indicator variables can then be used as explanatory variables in the Multiscale Geographically Weighted Regression (MGWR) tool.
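The sketch below shows, in plain Python, what converting a categorical field into indicator (0 or 1) variables produces. It does not call the Encode Field tool; the field values and the helper name are hypothetical and only illustrate the shape of the result.

```python
def encode_indicators(values):
    """Return one 0/1 list per category, keyed by category name."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical categorical field with three categories.
zoning = ["residential", "commercial", "residential", "industrial"]
indicators = encode_indicators(zoning)
```

In a regression with an intercept, the indicators for all categories sum to one for every feature, which is perfectly collinear with the intercept. A common practice is therefore to leave one category out as the reference category.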
There are three options for the Neighborhood Selection Method parameter that will be used to estimate the optimal spatial scale separately for each of the explanatory variables:
- Golden Search—Determines the number of neighbors or distance band for each explanatory variable using the Golden Search algorithm. This method tests multiple combinations of values for each explanatory variable between a specified minimum and maximum value. The procedure is iterative and uses the results from previous values to select each new combination to be tested. The final values selected will have the smallest AICc. For the number of neighbors option, the minimum and maximum are specified using the Minimum Number of Neighbors and Maximum Number of Neighbors parameters. For the distance band option, the minimum and maximum are specified using the Minimum Search Distance and Maximum Search Distance parameters. The minimum and maximum values are shared for all explanatory variables, but the estimated number of neighbors or distance band will be different for each explanatory variable (unless two or more have the same spatial scale). This option takes the longest to calculate, especially for large or high-dimensional datasets.
- Manual Intervals—Determines the number of neighbors or distance band for each explanatory variable by incrementing the number of neighbors or distance band from a minimum value. For the number of neighbors option, the method starts with the value of the Minimum Number of Neighbors parameter. The number of neighbors is then increased by the value of the Number of Neighbors Increment parameter. This increment is repeated a certain number of times, specified using the Number of Increments parameter. For the distance band option, the method uses the Minimum Search Distance, Search Distance Increment, and Number of Increments parameters. The number of neighbors or distance band used by each explanatory variable will be one of the tested values, but the values may be different for each explanatory variable. This option is faster than Golden Search and frequently estimates comparable neighborhoods.
- User Defined—The number of neighbors or distance band that is used by all explanatory variables. The value is specified using the Number of Neighbors or Distance Band parameter. This option provides the most control if you know optimal values.
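The Manual Intervals option can be pictured as building a short list of candidate neighborhood sizes and keeping the one with the smallest score. The sketch below is a minimal illustration, not the tool's implementation. It assumes the minimum is tested first and the increment is then applied the given number of times; this page does not state whether the minimum itself counts toward the Number of Increments value. The score function is a made-up stand-in for AICc.

```python
def manual_interval_candidates(minimum, increment, number_of_increments):
    """List the neighborhood sizes to test, starting from the minimum."""
    # Assumption: the increment is applied number_of_increments times after
    # the minimum, giving number_of_increments + 1 candidates in total.
    return [minimum + i * increment for i in range(number_of_increments + 1)]


def toy_score(size):
    """Hypothetical stand-in for the AICc of a model fitted with this size."""
    return (size - 42) ** 2


# Number of neighbors option: minimum 30, increment 5, 4 increments.
neighbor_candidates = manual_interval_candidates(30, 5, 4)
best_neighbors = min(neighbor_candidates, key=toy_score)

# Distance band option: minimum 1000, increment 250, 3 increments.
distance_candidates = manual_interval_candidates(1000.0, 250.0, 3)
```

In MGWR, this selection is made separately for each explanatory variable, so different variables can end up with different values from the same candidate list.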
By default, the dependent parameters of each neighborhood selection method apply to all explanatory variables. However, customized neighborhood selection parameters can be provided only for particular explanatory variables using the corresponding override parameter for the neighborhood type and selection method: Number of Neighbors for Golden Search, Number of Neighbors for Manual Intervals, User Defined Number of Neighbors, Search Distance for Golden Search, Search Distance for Manual Intervals, or User Defined Search Distance. To use customized neighborhoods for particular explanatory variables, provide the explanatory variables in the first column of the corresponding override parameter, and provide the customized options of the neighborhood in the other columns. The columns have the same names as the parameters they override; for example, if you are using manual intervals with distance band, the Search Distance Increment column specifies customized values of the Search Distance Increment parameter. On the tool dialog box, customized neighborhood parameters are in the Customized Neighborhood Options parameter category pull-down menu.
For example, suppose you use three explanatory variables with the Golden Search neighborhood type with 30 minimum neighbors and 40 maximum neighbors. If the tool is run with these parameters, each of the three explanatory variables will use between 30 and 40 neighbors. If you instead want to use between 45 and 55 neighbors for only the second explanatory variable, you can provide the second explanatory variable, the custom minimum, and the custom maximum in the columns of the Number of Neighbors for Golden Search parameter value. With these parameters, the first and third explanatory variables will use between 30 and 40 neighbors, and the second explanatory variable will use between 45 and 55 neighbors.
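The example above could be expressed in Python roughly as follows. This is a sketch under assumptions: the dataset paths and field names are borrowed from the code sample on this page or made up, the value table is written in the same semicolon-and-space string form that the code sample uses for distance_golden, and the call to arcpy is wrapped in a function so that the parameter values can be inspected without ArcGIS Pro installed.

```python
def value_table(rows):
    """Join rows into the 'a b c;d e f' string form used in this page's code sample."""
    return ";".join(" ".join(str(col) for col in row) for row in rows)


mgwr_kwargs = {
    "in_features": r"data.gdb\house_price",            # path from the code sample
    "dependent_variable": "price",
    "model_type": "CONTINUOUS",
    "explanatory_variables": "review;beds;areas",
    "output_features": r"data.gdb\house_price_mgwr",   # hypothetical output name
    "neighborhood_type": "NUMBER_OF_NEIGHBORS",
    "neighborhood_selection_method": "GOLDEN_SEARCH",
    # Shared range for all explanatory variables.
    "minimum_number_of_neighbors": 30,
    "maximum_number_of_neighbors": 40,
    # Override for the second explanatory variable only.
    "number_of_neighbors_golden": value_table([("beds", 45, 55)]),
}


def run_mgwr(kwargs):
    """Run the tool with keyword arguments (requires ArcGIS Pro and arcpy)."""
    import arcpy
    return arcpy.stats.MGWR(**kwargs)
```

Passing the parameters by keyword avoids the long run of None placeholders that a positional call needs for the unused optional parameters.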
Several model diagnostics are shown in the geoprocessing messages that can be used to determine the reliability of the MGWR model. Review these diagnostics before viewing any other tool outputs. If the model diagnostics are acceptable, view the charts and symbology of the output features to better understand the results.
Learn more about model diagnostics and tool outputs
Each MGWR local model is subject to the same requirements as the Generalized Linear Regression tool. The How regression models go bad section in the Regression analysis basics topic includes tips to ensure that a model is accurate. For more information about regression analysis, see What they don't tell you about regression analysis.
For the most accurate results, project the data to a projected coordinate system if the coordinates are stored as latitude and longitude. This is especially important when using the Distance Band option of the Neighborhood Type parameter because the optimizations require accurate measures of distance.
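To see why distances computed directly on latitude and longitude values can mislead, the sketch below compares the ground length of one degree of longitude at two latitudes using the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km, which is an approximation chosen for illustration and is not how the tool measures distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude measured along the equator and along 60 degrees north.
at_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
at_60_north = haversine_km(60.0, 0.0, 60.0, 1.0)
```

One degree of longitude at 60 degrees north covers about half the ground distance it covers at the equator, so a distance band expressed in degrees would not describe a consistent neighborhood size across the study area.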
If you check the Scale Data parameter, a layer will be created for each scaled coefficient. The coefficients that are rescaled to original data units are stored as fields in the output feature class. If coefficient rasters are created with the Coefficient Raster Workspace parameter, layers of the scaled coefficient rasters are created, and the rescaled rasters are saved in the workspace.
It is recommended that you scale the explanatory and dependent variables. This is especially important when the ranges of the variables differ significantly because scaling equalizes the variances of the variables. When numerically estimating the bandwidth and coefficients of each local model, the estimations usually converge faster and to more accurate values when each variable contributes an equal amount to the total variance of the data. If the explanatory variables have different variances, the variables with larger variances have more influence on each step of the iterative estimation. In most cases, this influence will negatively affect the final bandwidths and coefficients of the model.
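The following sketch illustrates scaling to mean zero and standard deviation one, and how a slope estimated on scaled data relates to the slope in original units. It uses a single global least-squares slope on made-up data and the sample standard deviation; both are assumptions for illustration, and this page does not describe the tool's exact rescaling procedure.

```python
from statistics import mean, stdev


def zscore(values):
    """Scale values to mean zero and (sample) standard deviation one."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


# Made-up data with an exact linear relationship: y = 2x + 3.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * v + 3.0 for v in x]

scaled_slope = slope(zscore(x), zscore(y))
# Multiply by sd(y) / sd(x) to return the slope to original data units.
rescaled_slope = scaled_slope * stdev(y) / stdev(x)
```

Here the scaled slope is 1 (the variables are perfectly correlated), and rescaling recovers the slope of 2 in original units.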
In some cases, the Manual Intervals option for the Neighborhood Selection Method parameter may estimate a lower AICc value than the Golden Search option even when searching the same range of distances or neighbors. Similarly, if you run Golden Search or Manual Intervals and then provide the estimated bandwidths or numbers of neighbors using the User Defined option, the outputs will not be exactly the same. Both behaviors are due to the path dependencies of the Golden Search and backfitting algorithms that are used to estimate MGWR model parameters. To reproduce the same MGWR results, you must run the tool with all the same parameter settings.
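One way such a difference can arise is sketched below with a generic golden-section search and an evenly spaced set of candidates over the same range. The score curve is made up and has two dips; it is not an AICc surface from the tool, and the search routine is a textbook version rather than the tool's implementation.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # about 0.618


def golden_section_min(f, lo, hi, tol=0.5):
    """Textbook golden-section search for a minimum of f on [lo, hi]."""
    x1 = hi - INV_PHI * (hi - lo)
    x2 = lo + INV_PHI * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            # Keep [lo, x2]; the old x1 becomes the new x2.
            hi, x2, f2 = x2, x1, f1
            x1 = hi - INV_PHI * (hi - lo)
            f1 = f(x1)
        else:
            # Keep [x1, hi]; the old x2 becomes the new x1.
            lo, x1, f1 = x1, x2, f2
            x2 = lo + INV_PHI * (hi - lo)
            f2 = f(x2)
    return (lo + hi) / 2


def toy_score(k):
    """Hypothetical score with a deep dip at 40 and a shallower dip at 85."""
    return min((k - 40) ** 2, (k - 85) ** 2 + 5)


golden_k = golden_section_min(toy_score, 30, 100)
grid_k = min(range(30, 101, 5), key=toy_score)
```

On this curve, the golden-section search narrows in on the shallower dip near 85 because its first two test points favor the right side of the range, while the evenly spaced candidates include 40 and find the lower score there.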

Parameters

Label	Explanation	Data Type
Input Features	The feature class containing the dependent and explanatory variables.	Feature Layer
Dependent Variable	The numeric field containing the observed values that will be modeled.	Field
Model Type	Specifies the regression model based on the values of the dependent variable. Currently, only continuous data is supported, and the parameter is hidden in the Geoprocessing pane. Do not use categorical, count, or binary dependent variables. Continuous—The dependent variable represents continuous values. This is the default.	String
Explanatory Variables	A list of fields that will be used as independent explanatory variables in the regression model.	Field
Output Features	The new feature class containing the coefficients, residuals, and significance levels of the MGWR model.	Feature Class
Neighborhood Type	Specifies whether the neighborhood will be a fixed distance or allowed to vary spatially depending on the density of the features. Number of Neighbors— The neighborhood size will be a specified number of closest neighbors for each feature. Where features are dense, the spatial extent of the neighborhood will be smaller; where features are sparse, the spatial extent of the neighborhood will be larger. Distance Band—The neighborhood size will be a constant or fixed distance for each feature.	String
Neighborhood Selection Method	Specifies how the neighborhood size will be determined. Golden Search—An optimal distance or number of neighbors will be identified by minimizing the AICc value using the Golden Search algorithm. Manual Intervals—A distance or number of neighbors will be identified by testing a range of values and choosing the value with the smallest AICc. If the Neighborhood Type parameter is set to Distance Band, the minimum value of this range is provided by the Minimum Search Distance parameter. The minimum value is then incremented by the value specified in the Search Distance Increment parameter. This is repeated the number of times specified by the Number of Increments parameter. If the Neighborhood Type parameter is set to Number of Neighbors, the minimum value, increment size, and number of increments are provided in the Minimum Number of Neighbors, Number of Neighbors Increment, and Number of Increments parameters, respectively. User Defined—The neighborhood size will be specified by either the Number of Neighbors parameter or the Distance Band parameter.	String
Minimum Number of Neighbors (Optional)	The minimum number of neighbors that each feature will include in its calculation. It is recommended that you use at least 30 neighbors.	Long
Maximum Number of Neighbors (Optional)	The maximum number of neighbors that each feature will include in its calculations.	Long
Distance Unit (Optional)	Specifies the unit of distance that will be used to measure the distances between features. US Survey Feet—Distances will be measured in US survey feet. Meters—Distances will be measured in meters. Kilometers—Distances will be measured in kilometers. US Survey Miles—Distances will be measured in US survey miles.	String
Minimum Search Distance (Optional)	The minimum search distance that will be applied to every explanatory variable. It is recommended that you provide a minimum distance that includes at least 30 neighbors for each feature.	Double
Maximum Search Distance (Optional)	The maximum neighborhood search distance that will be applied to all variables.	Double
Number of Neighbors Increment (Optional)	The number of neighbors by which manual intervals will increase for each neighborhood test.	Long
Search Distance Increment (Optional)	The distance by which manual intervals will increase for each neighborhood test.	Double
Number of Increments (Optional)	The number of neighborhood sizes to test when using manual intervals. The first neighborhood size is the value of the Minimum Number of Neighbors or Minimum Search Distance parameter.	Long
Number of Neighbors (Optional)	The number of neighbors that will be used for the user-defined neighborhood type.	Long
Distance Band (Optional)	The size of the distance band that will be used for the user-defined neighborhood type. All features within this distance will be included as neighbors in the local models.	Double
Number of Neighbors for Golden Search (Optional)	The customized Golden Search options for individual explanatory variables. For each explanatory variable to be customized, provide the variable, the minimum number of neighbors, and the maximum number of neighbors in the columns.	Value Table
Number of Neighbors for Manual Intervals (Optional)	The customized manual intervals options for individual explanatory variables. For each explanatory variable to be customized, provide the variable, the minimum number of neighbors, the number of neighbors increment, and the number of increments in the columns.	Value Table
User Defined Number of Neighbors (Optional)	The customized user-defined options for individual explanatory variables. For each explanatory variable to be customized, provide the variable and the number of neighbors in the columns.	Value Table
Search Distance for Golden Search (Optional)	The customized Golden Search options for individual explanatory variables. For each explanatory variable to be customized, provide the variable, the minimum search distance, and the maximum search distance in the columns.	Value Table
Search Distance for Manual Intervals (Optional)	The customized manual intervals options for individual explanatory variables. For each variable to be customized, provide the variable, the minimum search distance, search distance increments, and number of increments in the columns.	Value Table
User Defined Search Distance (Optional)	The customized user-defined options for individual explanatory variables. For each variable to be customized, provide the variable and the distance band in the columns.	Value Table
Prediction Locations (Optional)	A feature class with the locations where estimates will be computed. Each feature in this dataset should contain a value for every explanatory variable specified. The dependent variable for these features will be estimated using the model calibrated for the input feature class data. These feature locations should be close to (within 115 percent of the extent) or within the same study area as the input features.	Feature Layer
Explanatory Variables to Match (Optional)	The explanatory variables from the prediction locations that match corresponding explanatory variables from the input features.	Value Table
Output Predicted Features (Optional)	The output feature class that will receive dependent variable estimates for every prediction location.	Feature Class
Robust Prediction (Optional)	Specifies the features that will be used in the prediction calculations. Checked—Features with values greater than three standard deviations from the mean (value outliers) and features with weights of 0 (spatial outliers) will be excluded from the prediction calculations but will receive predictions in the output feature class. This is the default. Unchecked—Every feature will be used in the prediction calculations.	Boolean
Local Weighting Scheme (Optional)	Specifies the kernel type that will be used to provide the spatial weighting in the model. The kernel defines how each feature is related to other features within its neighborhood. Bisquare—A weight of zero will be assigned to any feature outside the neighborhood specified. This is the default. Gaussian—All features will receive weights, but weights become exponentially smaller the farther away they are from the target feature.	String
Output Neighborhood Table (Optional)	A table containing the output statistics of the MGWR model. A bar chart of estimated bandwidths or numbers of neighbors is included with the output.	Table View
Coefficient Raster Workspace (Optional)	The workspace where the coefficient rasters will be created. When this workspace is provided, rasters are created for the intercept and every explanatory variable. This parameter is only available with a Desktop Advanced license. If a directory is provided, the rasters will be TIFF (.tif) raster type.	Workspace
Scale Data (Optional)	Specifies whether the values of the explanatory and dependent variables will be scaled to have mean zero and standard deviation one prior to fitting the model. Checked—The values of the variables will be scaled. The results will contain scaled and unscaled versions of the explanatory variable coefficients. Unchecked—The values of the variables will not be scaled. All coefficients will be unscaled and in original data units.	Boolean
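The value outlier rule described for the Robust Prediction parameter can be sketched as follows. This is a minimal illustration on made-up values; it assumes the sample standard deviation and a single field, and this page does not state which fields the tool checks or how it computes the standard deviation.

```python
from statistics import mean, stdev


def value_outlier_flags(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [abs(v - m) > threshold * s for v in values]


# Made-up field values: nineteen typical values and one extreme value.
prices = [10.0] * 19 + [100.0]
flags = value_outlier_flags(prices)

# Values that would remain in the prediction calculations under this rule.
kept = [v for v, is_outlier in zip(prices, flags) if not is_outlier]
```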

Derived Output

Label	Explanation	Data Type
Coefficient Raster Layers	The output rasters of explanatory variable coefficients.	Raster
Output Layer Group	A group layer of the outputs. Each layer in the group represents a different field of the output features.	Group Layer

arcpy.stats.MGWR(in_features, dependent_variable, model_type, explanatory_variables, output_features, neighborhood_type, neighborhood_selection_method, {minimum_number_of_neighbors}, {maximum_number_of_neighbors}, {distance_unit}, {minimum_search_distance}, {maximum_search_distance}, {number_of_neighbors_increment}, {search_distance_increment}, {number_of_increments}, {number_of_neighbors}, {distance_band}, {number_of_neighbors_golden}, {number_of_neighbors_manual}, {number_of_neighbors_defined}, {distance_golden}, {distance_manual}, {distance_defined}, {prediction_locations}, {explanatory_variables_to_match}, {output_predicted_features}, {robust_prediction}, {local_weighting_scheme}, {output_table}, {coefficient_raster_workspace}, {scale})

Name	Explanation	Data Type
in_features	The feature class containing the dependent and explanatory variables.	Feature Layer
dependent_variable	The numeric field containing the observed values that will be modeled.	Field
model_type	Specifies the regression model based on the values of the dependent variable. Currently, only continuous data is supported, and the parameter is hidden in the Geoprocessing pane. Do not use categorical, count, or binary dependent variables. CONTINUOUS—The dependent variable represents continuous values. This is the default.	String
explanatory_variables [explanatory_variables,...]	A list of fields that will be used as independent explanatory variables in the regression model.	Field
output_features	The new feature class containing the coefficients, residuals, and significance levels of the MGWR model.	Feature Class
neighborhood_type	Specifies whether the neighborhood will be a fixed distance or allowed to vary spatially depending on the density of the features. NUMBER_OF_NEIGHBORS— The neighborhood size will be a specified number of closest neighbors for each feature. Where features are dense, the spatial extent of the neighborhood will be smaller; where features are sparse, the spatial extent of the neighborhood will be larger. DISTANCE_BAND—The neighborhood size will be a constant or fixed distance for each feature.	String
neighborhood_selection_method	Specifies how the neighborhood size will be determined. GOLDEN_SEARCH—An optimal distance or number of neighbors will be identified by minimizing the AICc value using the Golden Search algorithm. MANUAL_INTERVALS—A distance or number of neighbors will be identified by testing a range of values and choosing the value with the smallest AICc. If the neighborhood_type parameter is set to DISTANCE_BAND, the minimum value of this range is provided by the minimum_search_distance parameter. The minimum value is then incremented by the value specified in the search_distance_increment parameter. This is repeated the number of times specified by the number_of_increments parameter. If the neighborhood_type parameter is set to NUMBER_OF_NEIGHBORS, the minimum value, increment size, and number of increments are provided by the minimum_number_of_neighbors, number_of_neighbors_increment, and number_of_increments parameters, respectively. USER_DEFINED— The neighborhood size will be specified by either the number_of_neighbors parameter or the distance_band parameter.	String
minimum_number_of_neighbors (Optional)	The minimum number of neighbors that each feature will include in its calculation. It is recommended that you use at least 30 neighbors.	Long
maximum_number_of_neighbors (Optional)	The maximum number of neighbors that each feature will include in its calculations.	Long
distance_unit (Optional)	Specifies the unit of distance that will be used to measure the distances between features. FEET—Distances will be measured in US survey feet. METERS—Distances will be measured in meters. KILOMETERS—Distances will be measured in kilometers. MILES—Distances will be measured in US survey miles.	String
minimum_search_distance (Optional)	The minimum search distance that will be applied to every explanatory variable. It is recommended that you provide a minimum distance that includes at least 30 neighbors for each feature.	Double
maximum_search_distance (Optional)	The maximum neighborhood search distance that will be applied to all variables.	Double
number_of_neighbors_increment (Optional)	The number of neighbors by which manual intervals will increase for each neighborhood test.	Long
search_distance_increment (Optional)	The distance by which manual intervals will increase for each neighborhood test.	Double
number_of_increments (Optional)	The number of neighborhood sizes to test when using manual intervals. The first neighborhood size is the value of the minimum_number_of_neighbors or minimum_search_distance parameter.	Long
number_of_neighbors (Optional)	The number of neighbors that will be used for the user-defined neighborhood type.	Long
distance_band (Optional)	The size of the distance band that will be used for the user-defined neighborhood type. All features within this distance will be included as neighbors in the local models.	Double
number_of_neighbors_golden [number_of_neighbors_golden,...] (Optional)	The customized Golden Search options for individual explanatory variables. For each explanatory variable to be customized, provide the variable, the minimum number of neighbors, and the maximum number of neighbors in the columns.	Value Table
number_of_neighbors_manual [number_of_neighbors_manual,...] (Optional)	The customized manual intervals options for individual explanatory variables. For each explanatory variable to be customized, provide the variable, the minimum number of neighbors, the number of neighbors increment, and the number of increments in the columns.	Value Table
number_of_neighbors_defined [number_of_neighbors_defined,...] (Optional)	The customized user-defined options for individual explanatory variables. For each explanatory variable to be customized, provide the variable and the number of neighbors in the columns.	Value Table
distance_golden [distance_golden,...] (Optional)	The customized Golden Search options for individual explanatory variables. For each explanatory variable to be customized, provide the variable, the minimum search distance, and the maximum search distance in the columns.	Value Table
distance_manual [distance_manual,...] (Optional)	The customized manual intervals options for individual explanatory variables. For each variable to be customized, provide the variable, the minimum search distance, search distance increments, and number of increments in the columns.	Value Table
distance_defined [distance_defined,...] (Optional)	The customized user-defined options for individual explanatory variables. For each variable to be customized, provide the variable and the distance band in the columns.	Value Table
prediction_locations (Optional)	A feature class with the locations where estimates will be computed. Each feature in this dataset should contain a value for every explanatory variable specified. The dependent variable for these features will be estimated using the model calibrated for the input feature class data. These feature locations should be close to (within 115 percent of the extent) or within the same study area as the input features.	Feature Layer
explanatory_variables_to_match [explanatory_variables_to_match,...] (Optional)	The explanatory variables from the prediction locations that match corresponding explanatory variables from the input features.	Value Table
output_predicted_features (Optional)	The output feature class that will receive dependent variable estimates for every prediction location.	Feature Class
robust_prediction (Optional)	Specifies the features that will be used in the prediction calculations. ROBUST—Features with values greater than three standard deviations from the mean (value outliers) and features with weights of 0 (spatial outliers) will be excluded from the prediction calculations but will receive predictions in the output feature class. This is the default. NON_ROBUST—Every feature will be used in the prediction calculations.	Boolean
local_weighting_scheme (Optional)	Specifies the kernel type that will be used to provide the spatial weighting in the model. The kernel defines how each feature is related to other features within its neighborhood. BISQUARE—A weight of zero will be assigned to any feature outside the neighborhood specified. This is the default. GAUSSIAN—All features will receive weights, but weights become exponentially smaller the farther away they are from the target feature.	String
output_table (Optional)	A table containing the output statistics of the MGWR model. A bar chart of estimated bandwidths or numbers of neighbors is included with the output.	Table View
coefficient_raster_workspace (Optional)	The workspace where the coefficient rasters will be created. When this workspace is provided, rasters are created for the intercept and every explanatory variable. This parameter is only available with a Desktop Advanced license. If a directory is provided, the rasters will be TIFF (.tif) raster type.	Workspace
scale (Optional)	Specifies whether the values of the explanatory and dependent variables will be scaled to have mean zero and standard deviation one prior to fitting the model. SCALE_DATA—The values of the variables will be scaled. The results will contain scaled and unscaled versions of the explanatory variable coefficients. NO_SCALE_DATA—The values of the variables will not be scaled. All coefficients will be unscaled and in original data units.	Boolean

Derived Output

Name	Explanation	Data Type
coefficient_raster_layers	The output rasters of explanatory variable coefficients.	Raster
output_layer_group	A group layer of the outputs. Each layer in the group represents a different field of the output features.	Group Layer

Code sample

MGWR example 1 (Python window)

The following Python window script demonstrates how to use the MGWR function.

import arcpy
arcpy.stats.MGWR(r"data.gdb\house_price", "price", "CONTINUOUS",
                 "review;beds;areas", r"data.gdb\house_price_fit_model", 
                 "DISTANCE_BAND", "GOLDEN_SEARCH", None, None, None, None, 
                 None, None, None, None, None, None, None, None, None, 
                 "review # #;beds # #; areas # #", None, None, 
                 r"data.gdb\house_price", "review review;beds beds; areas areas", 
                 r"data.gdb\house_price_prediction", "ROBUST", "BISQUARE")

MGWR example 2 (stand-alone script)

The following stand-alone Python script demonstrates how to use the MGWR function.

# Run MGWR to predict house prices using "Distance Band" and "Golden Search"
# Import modules
import arcpy

# Set the current workspace
arcpy.env.workspace = "C:/data"

# Run MGWR 
arcpy.stats.MGWR(r"data.gdb\house_price", "price", "CONTINUOUS",
                 "review;beds;areas", r"data.gdb\house_price_fit_model", 
                 "DISTANCE_BAND", "GOLDEN_SEARCH", None, None, None, None, 
                 None, None, None, None, None, None, None, None, None, 
                 "review # #;beds # #; areas # #", None, None, 
                 r"data.gdb\house_price", "review review;beds beds; areas areas", 
                 r"data.gdb\house_price_prediction", "ROBUST", "BISQUARE")

Environments

Output Coordinate System, Geographic Transformations, Cell Size, Snap Raster, Parallel Processing Factor

Licensing information

Basic: Limited
Standard: Limited
Advanced: Yes
