Label | Explanation | Data Type |
Prediction Type | Specifies the operation mode of the tool. The tool can be run to train a model to only assess performance, predict features, or create a prediction surface. | String |
Input Training Features | The feature class containing the Variable to Predict parameter and, optionally, the explanatory training variables from fields. | Feature Layer |
Variable to Predict (Optional) | The variable from the Input Training Features parameter containing the values to be used to train the model. This field contains known (training) values of the variable that will be used to predict at unknown locations. | Field |
Treat Variable as Categorical (Optional) | Specifies whether the Variable to Predict is a categorical variable. | Boolean |
Explanatory Training Variables (Optional) | A list of fields representing the explanatory variables that help predict the value or category of the Variable to Predict. Check the Categorical check box for any variables that represent classes or categories (such as land cover or presence or absence). | Value Table |
Explanatory Training Distance Features (Optional) | Automatically creates explanatory variables by calculating a distance from the provided features to the Input Training Features. Distances will be calculated from each of the input Explanatory Training Distance Features to the nearest Input Training Features. If the input Explanatory Training Distance Features are polygons or lines, the distance attributes are calculated as the distance between the closest segments of the pair of features. | Feature Layer |
Explanatory Training Rasters (Optional) | Automatically creates explanatory training variables in your model whose values are extracted from rasters. For each feature in the Input Training Features, the value of the raster cell is extracted at that exact location. Bilinear raster resampling is used when extracting the raster value for continuous rasters. Nearest neighbor assignment is used when extracting a raster value from categorical rasters. Check the Categorical check box for any rasters that represent classes or categories such as land cover or presence or absence. | Value Table |
Input Prediction Features (Optional) | A feature class representing locations where predictions will be made. If explanatory variables were provided as fields in the training data, this feature class must also contain the corresponding fields. | Feature Layer |
Output Predicted Features (Optional) | The output feature class that will contain the prediction results. | Feature Class |
Output Prediction Surface (Optional) | The output raster containing the prediction results. The default cell size will be the maximum cell size of the raster inputs. To set a different cell size, use the cell size environment setting. | Raster Dataset |
Match Explanatory Variables (Optional) | A list of the Explanatory Variables specified from the Input Training Features on the right and their corresponding fields from the Input Prediction Features on the left. | Value Table |
Match Distance Features (Optional) | A list of the Explanatory Distance Features specified for the Input Training Features on the right. Corresponding feature sets should be specified for the Input Prediction Features on the left. Explanatory Distance Features that are more appropriate for the Input Prediction Features can be provided if those used for training are in a different study area or time period. | Value Table |
Match Explanatory Rasters (Optional) | A list of the Explanatory Rasters specified for the Input Training Features on the right. Corresponding rasters should be specified for the Input Prediction Features or the Prediction Surface to be created on the left. Explanatory Rasters that are more appropriate for the Input Prediction Features can be provided if those used for training are in a different study area or time period. | Value Table |
Output Trained Features (Optional) | Output Trained Features will contain all explanatory variables used for training (including sampled raster values and distance calculations), as well as the observed Variable to Predict field and accompanying predictions that can be used to further assess performance of the trained model. | Feature Class |
Output Variable Importance Table (Optional) | If specified, the table will contain information describing the importance of each explanatory variable (fields, distance features, and rasters) used in the model created. The chart created from this table can be accessed in the Contents pane. | Table |
Convert Polygons to Raster Resolution for Training (Optional) | Specifies how polygons are treated when training the model if the Input Training Features are polygons with a categorical Variable to Predict and only Explanatory Training Rasters have been specified. | Boolean |
Number of Trees (Optional) | The number of trees to create in the forest model. More trees will generally result in more accurate model prediction, but the model will take longer to calculate. The default number of trees is 100. | Long |
Minimum Leaf Size (Optional) | The minimum number of observations required to keep a leaf (that is, the terminal node on a tree without further splits). The default minimum is 5 for regression and 1 for classification. For very large datasets, increasing these numbers will decrease the run time of the tool. | Long |
Maximum Tree Depth (Optional) | The maximum number of splits that will be made down a tree. With a large maximum depth, more splits will be created, which may increase the chance of overfitting the model. The default is data driven and depends on the number of trees created and the number of variables included. | Long |
Data Available per Tree (%) (Optional) | Specifies the percentage of the Input Training Features used for each decision tree. The default is 100 percent of the data. Each decision tree in the forest is created using a random sample (approximately two-thirds) of the training data specified here. Using a lower percentage of the input data for each decision tree increases the speed of the tool for very large datasets. | Long |
Number of Randomly Sampled Variables (Optional) | Specifies the number of explanatory variables used to create each decision tree. Each of the decision trees in the forest is created using a random subset of the explanatory variables specified. Increasing the number of variables used in each decision tree will increase the chance of overfitting your model, particularly if there are one or more dominant variables. A common practice is to use the square root of the total number of explanatory variables (fields, distances, and rasters combined) if your Variable to Predict is numeric or divide the total number of explanatory variables (fields, distances, and rasters combined) by 3 if Variable to Predict is categorical. | Long |
Training Data Excluded for Validation (%) (Optional) | Specifies the percentage (between 10 percent and 50 percent) of Input Training Features to reserve as the test dataset for validation. The model will be trained without this random subset of data, and the observed values for those features will be compared to the predicted values. The default is 10 percent. | Double |
Output Classification Performance Table (Confusion Matrix) (Optional) | If specified, creates a confusion matrix for classification summarizing the performance of the model created. This table can be used to calculate other diagnostics beyond the accuracy and sensitivity measures the tool calculates in the output messages. | Table |
Output Validation Table (Optional) | If the Number of Runs for Validation specified is greater than 2, this table creates a chart of the distribution of R² for each model. This distribution can be used to assess the stability of your model. | Table |
Compensate for Sparse Categories (Optional) | If there are categories in your dataset that don't occur as often as others, checking this parameter will ensure that each category is represented in each tree. | Boolean |
Number of Runs for Validation (Optional) | The tool will run for the number of iterations specified. The distribution of the R² for each run can be displayed using the Output Validation Table parameter. When this is set and predictions are being generated, only the model that produced the highest R² value will be used for predictions. | Long |
Calculate Uncertainty (Optional) | Specifies whether prediction uncertainty will be calculated when training, predicting to features, or predicting to raster. | Boolean |
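
The validation parameters above (Training Data Excluded for Validation, Number of Runs for Validation, and Output Validation Table) can be illustrated with a minimal sketch. It assumes that R² is the standard coefficient of determination computed on the held-out validation features, which this reference does not state explicitly. The function names `r_squared` and `best_run` are hypothetical and are not part of the tool.

```python
def r_squared(observed, predicted):
    """Coefficient of determination for one validation run (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    if ss_total == 0:
        raise ValueError("R-squared is undefined when all observed values are equal")
    return 1.0 - ss_residual / ss_total


def best_run(runs):
    """Score each (observed, predicted) pair and return the index of the best run.

    `runs` is a list of (observed, predicted) tuples, one per validation run.
    Returns (index_of_highest_r2, list_of_r2_scores).
    """
    scores = [r_squared(observed, predicted) for observed, predicted in runs]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Hypothetical held-out values and predictions from two validation runs.
observed = [1.0, 2.0, 3.0, 4.0]
runs = [
    (observed, [2.0, 2.0, 3.0, 3.0]),  # weaker fit
    (observed, [1.0, 2.0, 3.0, 4.0]),  # perfect fit
]
index, scores = best_run(runs)
print("R-squared per run:", scores)
print("Run kept for prediction:", index)
```

With several runs, the spread of the scores gives a rough sense of model stability, and the returned index identifies the run that would be kept for prediction under the rule described for Number of Runs for Validation.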
Derived Output
Label | Explanation | Data Type |
Output Uncertainty Raster Layers | When the Calculate Uncertainty parameter is checked, the tool will calculate a 90 percent prediction interval around each predicted value of the Variable to Predict. | Raster Layer |
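
As a rough illustration of what a 90 percent prediction interval means, the sketch below takes the 5th and 95th percentiles of a set of per-tree predictions for one location. This is an assumption made for illustration only: this reference does not describe how the tool derives its interval, and `percentile` and `prediction_interval` are hypothetical helper names.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile of an ascending list, with q in [0, 1]."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def prediction_interval(tree_predictions, level=0.90):
    """Return (lower, upper) bounds that bracket `level` of the per-tree predictions."""
    ordered = sorted(tree_predictions)
    tail = (1.0 - level) / 2.0  # 0.05 in each tail for a 90 percent interval
    return percentile(ordered, tail), percentile(ordered, 1.0 - tail)


# Hypothetical predictions from 101 trees for a single location.
tree_predictions = [float(v) for v in range(1, 102)]
low, high = prediction_interval(tree_predictions)
print(f"90 percent interval: {low:.1f} to {high:.1f}")
```

A wide interval at a location suggests the individual trees disagree there, so the prediction at that location deserves less trust.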