Label  Explanation  Data Type 
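Input Features  The feature class containing the dependent and independent variables.  Feature Layer
Dependent Variable  The numeric field containing the observed values to be modeled.  Field
Model Type  Specifies the type of data that will be modeled.
 Continuous (Gaussian): The dependent variable is continuous, and ordinary least squares (OLS) regression will be performed.
 Binary (Logistic): The dependent variable represents presence or absence (1 or 0), and logistic regression will be used.
 Count (Poisson): The dependent variable is discrete and represents counts of events, such as crimes, and Poisson regression will be used.
 String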
Output Features  The new feature class that will contain the dependent variable estimates and residuals.  Feature Class 
Explanatory Variable(s)  A list of fields representing independent explanatory variables in the regression model.  Field 
Explanatory Distance Features (Optional)  Automatically creates explanatory variables by calculating a distance from the provided features to the Input Features features. Distances will be calculated from each of the input Explanatory Distance Features to the nearest Input Features. If the input Explanatory Distance Features are polygons or lines, the distance attributes are calculated as the distance between the closest segments of the pair of features.  Feature Layer
Prediction Locations (Optional)  A feature class containing features representing locations where estimates will be computed. Each feature in this dataset should contain values for all the explanatory variables specified. The dependent variable for these features will be estimated using the model calibrated for the input feature class data.  Feature Layer 
Match Explanatory Variables (Optional)  Matches the explanatory variables in the Prediction Locations parameter to corresponding explanatory variables from the Input Features parameter.  Value Table
Match Distance Features (Optional)  Matches the distance features specified for the Prediction Locations parameter on the left to corresponding distance features for the Input Features parameter on the right.  Value Table
Output Predicted Features (Optional)  The output feature class that will receive dependent variable estimates for each feature in the Prediction Locations parameter.  Feature Class
Summary
Performs generalized linear regression (GLR) to generate predictions or to model a dependent variable in terms of its relationship to a set of explanatory variables. This tool can be used to fit continuous (OLS), binary (logistic), and count (Poisson) models.
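For orientation, the three model types can be written in standard generalized linear model notation, where $x_{ik}$ is the value of explanatory variable $k$ for feature $i$ and the $\beta$ terms are the estimated coefficients. This is textbook GLM notation offered as a sketch; it is not necessarily the exact parameterization the tool reports.

$$
\begin{aligned}
\text{Continuous (OLS):}\quad & y_i = \beta_0 + \sum_{k=1}^{m} \beta_k x_{ik} + \varepsilon_i \\
\text{Binary (logistic):}\quad & \ln\frac{p_i}{1 - p_i} = \beta_0 + \sum_{k=1}^{m} \beta_k x_{ik}, \qquad p_i = \Pr(y_i = 1) \\
\text{Count (Poisson):}\quad & \ln \mu_i = \beta_0 + \sum_{k=1}^{m} \beta_k x_{ik}, \qquad \mu_i = \mathrm{E}[y_i]
\end{aligned}
$$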
Usage

The primary output for this tool is a report file that is available as messages at the bottom of the Geoprocessing pane during tool execution. To access the messages, hover over the progress bar, click the popout button, or expand the messages section in the Geoprocessing pane. You can also access the messages of a previous run of the tool in the geoprocessing history.
Use the Input Features parameter with a field representing the phenomenon you are modeling (the Dependent Variable parameter) and one or more fields representing the explanatory variables (the Explanatory Variable(s) parameter). These fields must be numeric and contain a range of values. Features with missing values in the dependent variable or in any explanatory variable are excluded from the analysis; to keep them, use the Fill Missing Values tool to complete the dataset before running this tool.
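Before running the tool, it can be useful to know how many features would be dropped because of missing values. The following is a minimal sketch in plain Python, assuming the attribute values have already been read into a list of dictionaries (for example, from an `arcpy.da.SearchCursor`, which returns `None` for null values). The `complete_cases` function and the field names are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def complete_cases(rows: Sequence[Row], fields: Sequence[str]) -> Tuple[List[Row], int]:
    """Keep only rows with a non-null value in every listed field.

    Returns the kept rows and the number of rows that would be excluded.
    """
    kept = [row for row in rows if all(row.get(f) is not None for f in fields)]
    return kept, len(rows) - len(kept)


# Hypothetical records: "y" is the dependent variable, "x1" an explanatory one
records = [
    {"y": 3.0, "x1": 1.2},
    {"y": None, "x1": 0.8},  # missing dependent value
    {"y": 5.0, "x1": None},  # missing explanatory value
    {"y": 4.0, "x1": 2.1},
]
kept, excluded = complete_cases(records, ["y", "x1"])
print(f"{excluded} of {len(records)} features would be excluded")
```

If the excluded count is large, filling the gaps first with Fill Missing Values may be preferable to losing those features.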

The Generalized Linear Regression tool also produces an output feature class containing coefficient information and diagnostics. The output feature class is automatically added to the table of contents with a rendering scheme applied to the model residuals. A full explanation of each output is provided in How Generalized Linear Regression works.
The option you choose for the Model Type parameter depends on the data you are modeling. Using the correct model is important for obtaining accurate results from the regression analysis.
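As a rough first check of which Model Type fits a dependent variable, the values themselves can be inspected. The sketch below is a hypothetical heuristic, not part of the tool: only 0s and 1s suggest Binary, non-negative whole numbers suggest Count, and anything else suggests Continuous. It returns the Python keywords used in the code samples on this page.

```python
from typing import Iterable, Optional


def suggest_model_type(values: Iterable[Optional[float]]) -> str:
    """Hypothetical heuristic starting point for the Model Type choice."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no non-null values to inspect")
    # Only 0s and 1s: presence/absence data, suggesting a logistic model
    if all(v in (0, 1) for v in observed):
        return "BINARY"
    # Non-negative whole numbers: possibly counts of events (Poisson model)
    if all(float(v).is_integer() and v >= 0 for v in observed):
        return "COUNT"
    return "CONTINUOUS"


print(suggest_model_type([0, 1, 1, 0]))
print(suggest_model_type([0, 3, 7, 2]))
print(suggest_model_type([2.5, 3.1, 0.4]))
```

Treat the result only as a prompt for judgment: a continuous measurement recorded in whole units (elevation in meters, for example) would be misread as counts.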

Model summary results and diagnostics are written to the messages window, and charts are created below the output feature class. The diagnostics and charts reported depend on the Model Type parameter value and are explained in detail in the How Generalized Linear Regression works topic.

Results from GLR are only reliable if the data and regression model satisfy all of the assumptions inherently required by this method. Review all resulting diagnostics and consult the Common regression problems, consequences, and solutions table in Regression analysis basics to ensure that the model is properly specified.
The Dependent Variable and Explanatory Variable(s) parameters should be numeric fields containing a variety of values. The tool cannot solve the model when a variable has the same value for every feature (for example, when all values in a field are 9.0).
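A constant field can be caught before running the tool. This minimal sketch assumes the field values have already been read into a dictionary mapping field names to value lists; the `constant_fields` function and the field names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def constant_fields(columns: Dict[str, Sequence[Optional[float]]]) -> List[str]:
    """Return the names of fields whose non-null values are all identical."""
    return [
        name
        for name, values in columns.items()
        if len({v for v in values if v is not None}) <= 1
    ]


# Hypothetical attribute columns; "x1" has the same value (9.0) everywhere
columns = {
    "y": [1.0, 2.0, 3.0],
    "x1": [9.0, 9.0, 9.0],
    "x2": [1.0, None, 2.0],
}
print(constant_fields(columns))
```

Any field this reports should be removed from the model or replaced before running Generalized Linear Regression.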
Explanatory variables can come from fields or be calculated from distance features using the Explanatory Distance Features parameter. You can use a combination of these explanatory variable types, but at least one type is required. The Explanatory Distance Features parameter is used to automatically create explanatory variables representing a distance from the provided features to the Input Features parameter features. Distances will be calculated from each of the input Explanatory Distance Features to the nearest Input Features. If the input Explanatory Distance Features are polygons or lines, the distance attributes are calculated as the distance between the closest segments of the pair of features. Note that distances are calculated differently for polygons than for lines; see How proximity tools calculate distance for details.
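To illustrate the idea of a distance-based explanatory variable, the sketch below computes one value per input point: the planar distance to the closest segment of the nearest line feature. This is a conceptual sketch assuming point inputs, line distance features, and coordinates in a projected coordinate system; it is not Esri's implementation, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_line(p: Point, vertices: Sequence[Point]) -> float:
    """Distance from p to the closest segment of a polyline."""
    return min(
        point_segment_distance(p, vertices[i], vertices[i + 1])
        for i in range(len(vertices) - 1)
    )


def distance_variable(points: Sequence[Point], lines: Sequence[Sequence[Point]]) -> List[float]:
    """One explanatory value per input point: distance to the nearest line."""
    return [min(distance_to_line(p, line) for line in lines) for p in points]


# Hypothetical data in projected units (for example, meters)
rivers = [[(0.0, 0.0), (10.0, 0.0)], [(20.0, 0.0), (20.0, 10.0)]]
sites = [(5.0, 3.0), (13.0, 4.0)]
print(distance_variable(sites, rivers))
```

The second site is closest to the endpoint (10, 0) of the first river, which is why the clamp to the segment ends matters.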
When the Explanatory Distance Features parameter is a component of the analysis, it is recommended that your data be projected using a projected coordinate system (rather than a geographic coordinate system) so that distances are measured accurately.
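The reason is that a degree of longitude does not correspond to a fixed ground distance. This sketch, assuming a spherical Earth with a mean radius of about 6,371 km (an approximation), uses the haversine formula to compare the ground length of one degree of longitude at the equator and at 60 degrees latitude. In unprojected coordinates, both pairs would appear exactly 1.0 unit apart.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius (spherical model)


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


# Both pairs are 1 degree of longitude apart in geographic coordinates
equator = haversine_m(0.0, 0.0, 1.0, 0.0)
high_lat = haversine_m(0.0, 60.0, 1.0, 60.0)
print(f"equator: {equator / 1000:.1f} km, 60N: {high_lat / 1000:.1f} km")
```

At 60 degrees latitude the same one-degree difference covers roughly half the ground distance, so distance variables computed in degrees would be distorted across the study area.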

When there is statistically significant spatial autocorrelation of the regression residuals, the GLR model will be considered incorrectly specified and, consequently, results from GLR are unreliable. Run the Spatial Autocorrelation tool on the regression residuals to assess this potential problem. Statistically significant spatial autocorrelation of regression residuals may indicate that one or more key explanatory variables are missing from the model.
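The Spatial Autocorrelation tool reports Global Moran's I for the residuals. To show what that statistic measures, the sketch below computes the index itself (without the z-score or p-value the tool also reports) on a tiny hypothetical example, using a hypothetical contiguity weights helper for features arranged along a line.

```python
from typing import List, Sequence


def line_contiguity(n: int) -> List[List[float]]:
    """Hypothetical binary weights: features i and i+1 along a line are neighbors."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I for residual values and a spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * (cross / sum(d * d for d in dev))


# Residuals that trend along the line (similar neighbors) vs. ones that alternate
print(morans_i([1.0, 2.0, 3.0, 4.0], line_contiguity(4)))
print(morans_i([1.0, -1.0, 1.0, -1.0], line_contiguity(4)))
```

A positive index means neighboring residuals tend to be similar (clustered over- or underpredictions), which is the pattern that signals a misspecified model when it is statistically significant.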

Visually inspect the over- and underpredictions evident in the regression residuals to see whether they provide clues about variables missing from the regression model. Running Hot Spot Analysis on the residuals can help you visualize spatial clustering of the over- and underpredictions.
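Hot Spot Analysis is based on the Getis-Ord Gi* statistic. The sketch below computes Gi* z-scores for residuals using the standard formula, with a hypothetical weights helper in which each feature is a neighbor of itself and of adjacent features along a line. It is an illustration of the statistic, not a replacement for the tool, which also handles neighborhood definitions and multiple-testing corrections.

```python
import math
from typing import List, Sequence


def line_weights_with_self(n: int) -> List[List[float]]:
    """Hypothetical weights: a feature, plus its immediate neighbors on a line."""
    return [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]


def gi_star(values: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Getis-Ord Gi* z-score for each feature."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for w in weights:
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        numerator = sum(wij * xj for wij, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # Undefined when all values are equal or every feature is a neighbor
        scores.append(numerator / denominator if denominator else float("nan"))
    return scores


# Hypothetical residuals: a run of overpredictions followed by underpredictions
residuals = [10.0, 10.0, 10.0, 1.0, 1.0, 1.0]
print(gi_star(residuals, line_weights_with_self(len(residuals))))
```

Large positive scores mark clusters of high residuals and large negative scores mark clusters of low residuals.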

When misspecification is the result of trying to model nonstationary variables using a global model (GLR is a global model), you can use the Geographically Weighted Regression tool to improve predictions and better understand the nonstationarity (regional variation) inherent in the explanatory variables.

When the result of a computation is infinity or undefined, the output for nonshapefile formats will be Null; for shapefiles, the output will be DBL_MAX (1.7976931348623158e+308).
Caution:
When using shapefiles, keep in mind that they cannot store null values. Tools or other procedures that create shapefiles from nonshapefile inputs may store or interpret null values as zero. In some cases, nulls are stored as very large negative values in shapefiles. This can lead to unexpected results. See Geoprocessing considerations for shapefile output for more information.
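When reading results back from shapefile output, the DBL_MAX placeholders can be converted to `None` so they are not mistaken for real values. This is a minimal sketch assuming the values have already been read into a Python list; the `restore_nulls` function is hypothetical. It cannot recover nulls that were stored as zero or as large negative values, as described in the caution above.

```python
import sys
from typing import List, Optional, Sequence

# Largest finite double; equals C's DBL_MAX on IEEE 754 platforms
DBL_MAX = sys.float_info.max


def restore_nulls(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace DBL_MAX placeholders with None, leaving other values unchanged."""
    return [None if v is not None and v == DBL_MAX else v for v in values]


# Hypothetical residual values read from a shapefile
print(restore_nulls([1.5, DBL_MAX, -0.25]))
```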
Parameters
arcpy.stats.GeneralizedLinearRegression(in_features, dependent_variable, model_type, output_features, explanatory_variables, {distance_features}, {prediction_locations}, {explanatory_variables_to_match}, {explanatory_distance_matching}, {output_predicted_features})
Name  Explanation  Data Type 
in_features  The feature class containing the dependent and independent variables.  Feature Layer 
dependent_variable  The numeric field containing the observed values to be modeled.  Field 
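model_type  Specifies the type of data that will be modeled.
 CONTINUOUS: The dependent_variable value is continuous, and ordinary least squares (OLS) regression will be performed.
 BINARY: The dependent_variable value represents presence or absence (1 or 0), and logistic regression will be used.
 COUNT: The dependent_variable value is discrete and represents counts of events, such as crimes, and Poisson regression will be used.
 String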
output_features  The new feature class that will contain the dependent variable estimates and residuals.  Feature Class 
explanatory_variables [explanatory_variables,...]  A list of fields representing independent explanatory variables in the regression model.  Field 
distance_features [distance_features,...] (Optional)  Automatically creates explanatory variables by calculating a distance from the provided features to the in_features features. Distances will be calculated from each of the input distance_features to the nearest in_features. If the input distance_features are polygons or lines, the distance attributes are calculated as the distance between the closest segments of the pair of features.  Feature Layer
prediction_locations (Optional)  A feature class containing features representing locations where estimates will be computed. Each feature in this dataset should contain values for all the explanatory variables specified. The dependent variable for these features will be estimated using the model calibrated for the input feature class data.  Feature Layer 
explanatory_variables_to_match [[Field from Prediction Locations, Field from Input Features],...] (Optional)  Matches the explanatory variables in the prediction_locations parameter to corresponding explanatory variables from the in_features parameter—for example, [["LandCover2000", "LandCover2010"], ["Income", "PerCapitaIncome"]].  Value Table 
explanatory_distance_matching [[Prediction Distance Features, Input Explanatory Distance Features],...] (Optional)  Matches the distance features specified for the prediction_locations parameter on the left to the corresponding distance features for the in_features parameter on the right—for example, [["stores2010", "stores2000"], ["freeways2010", "freeways2000"]].  Value Table
output_predicted_features (Optional)  The output feature class that will receive dependent variable estimates for each feature in prediction_locations.  Feature Class
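The matching parameters appear on this page in two forms: as a list of pairs (in the explanatory_variables_to_match description) and as a semicolon-delimited string such as "YRBLT YRBLT;TOTPOP TOTPOP" (in the standalone script sample). The hypothetical helper below converts the list form into the string form. It assumes field and feature class names contain no spaces or semicolons and rejects such names rather than guessing how they should be quoted.

```python
from typing import Sequence


def to_value_table_string(pairs: Sequence[Sequence[str]]) -> str:
    """Join [prediction, input] name pairs into a 'a b;c d' value table string."""
    for pair in pairs:
        if len(pair) != 2:
            raise ValueError(f"expected a pair of names, got {pair!r}")
        if any(" " in name or ";" in name for name in pair):
            raise ValueError(f"names with spaces or semicolons are not handled: {pair!r}")
    return ";".join(f"{left} {right}" for left, right in pairs)


pairs = [["YRBLT", "YRBLT"], ["TOTPOP", "TOTPOP"], ["AVGHINC", "AVGHINC"]]
print(to_value_table_string(pairs))
```

The list form shown in the parameter description can also be passed directly, so this conversion is only a convenience when a single string is preferred.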
Code sample
The following Python window script demonstrates how to use the GeneralizedLinearRegression function.
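import arcpy

# Set the workspace so feature classes can be referenced by name
arcpy.env.workspace = r"c:\data\project_data.gdb"

# Binary (logistic) model of landslide occurrence; distance to rivers is
# added as an explanatory variable through the distance_features parameter
arcpy.stats.GeneralizedLinearRegression("landslides", "occurred",
                                        "BINARY", "out_features",
                                        "eastness;northness;elevation;slope",
                                        "rivers")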
The following standalone Python script demonstrates how to use the GeneralizedLinearRegression function.
# Generalized linear regression using a count (Poisson) model to predict the
# number of crimes. The dependent variable (total number of crimes) is
# modeled using total population, the median age of housing, average
# household income, and the distance to the central business district (CBD).
import arcpy

# Set the current workspace (to avoid having to specify the full path to
# the feature classes each time)
arcpy.env.workspace = r"c:\data\project_data.gdb"

# Fit the model on crime_counts, then estimate total_crimes at
# prediction_locations; each prediction field and distance feature class
# is matched to its counterpart in the input data
arcpy.stats.GeneralizedLinearRegression(
    "crime_counts", "total_crimes", "COUNT", "out_features",
    "YRBLT;TOTPOP;AVGHINC", "CBD", "prediction_locations",
    "YRBLT YRBLT;TOTPOP TOTPOP;AVGHINC AVGHINC", "CBD CBD",
    "predicted_features")
Environments
Licensing information
 Basic: Yes
 Standard: Yes
 Advanced: Yes