Label | Explanation | Data Type |
Prediction Type | Specifies the operation mode of the tool. The tool can be run to train a model to only assess performance, predict features, or create a prediction surface. | String |
Input Training Features | The layer containing the field specified by the Variable to Predict parameter and the explanatory variable fields used for training. | Record Set |
Output Features Name (Optional) | The output feature layer name. | String |
Variable to Predict (Optional) | The variable from the Input Training Features parameter containing the values to be used to train the model. This field contains known (training) values of the variable that will be used to predict at unknown locations. | Field |
Treat Variable as Categorical (Optional) | Specifies whether Variable to Predict is a categorical variable. | Boolean |
Explanatory Variables (Optional) | A list of fields representing the explanatory variables that help predict the value or category of Variable to Predict. Check the Categorical check box for any variables that represent classes or categories (such as land cover or presence or absence). | Value Table |
Create Variable Importance Table (Optional) | Specifies whether the output table will contain information describing the importance of each explanatory variable used in the model. | Boolean |
Input Prediction Features (Optional) | A feature layer representing locations where predictions will be made. This feature layer must also contain fields for the explanatory variables that correspond to those used in the training data. | Record Set |
Match Explanatory Variables (Optional) | A list of Explanatory Variables specified from Input Training Features on the right and their corresponding fields from Input Prediction Features on the left. | Value Table |
Number of Trees (Optional) | The number of trees to create in the forest model. More trees will generally result in more accurate model prediction, but the model will take longer to calculate. The default number of trees is 100. | Long |
Minimum Leaf Size (Optional) | The minimum number of observations required to keep a leaf (that is, the terminal node on a tree without further splits). The default minimum for regression is 5, and the default for classification is 1. For very large data, increasing these numbers will decrease the run time of the tool. | Long |
Maximum Tree Depth (Optional) | The maximum number of splits that will be made down a tree. With a large maximum depth, more splits will be created, which may increase the chances of overfitting the model. The default is data driven and depends on the number of trees created and the number of variables included. | Long |
Data Available per Tree (%) (Optional) | The percentage of Input Training Features used for each decision tree. The default is 100 percent of the data. Each decision tree in the forest is created using a random sample (approximately two-thirds) of the data made available by this parameter. Using a lower percentage of the input data for each decision tree increases the speed of the tool for very large datasets. | Long |
Number of Randomly Sampled Variables (Optional) | The number of explanatory variables used to create each decision tree. Each decision tree in the forest is created using a random subset of the explanatory variables specified. Increasing the number of variables used in each decision tree will increase the chances of overfitting your model, particularly if there are one or more dominant variables. A common practice is to use the square root of the total number of explanatory variables if Variable to Predict is numeric, or divide the total number of explanatory variables by 3 if Variable to Predict is categorical. | Long |
Training Data Excluded for Validation (%) (Optional) | The percentage (between 10 percent and 50 percent) of Input Training Features to reserve as the test dataset for validation. The model will be trained without this random subset of data, and the observed values for those features will be compared to the predicted values. The default is 10 percent. | Long |
Data Store (Optional) | Specifies the ArcGIS Data Store where the output will be stored. All results stored in a spatiotemporal big data store will be stored in WGS84. Results stored in a relational data store will maintain their coordinate system. | String |
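The sampling-related parameters in the table above (Training Data Excluded for Validation, Data Available per Tree, and Number of Randomly Sampled Variables) can be illustrated with a minimal, plain-Python sketch. This is not the tool's implementation; the function names and the toy data are hypothetical and only mirror the behavior described in the parameter explanations.

```python
import math
import random


def split_for_validation(rows, percent_excluded=10, seed=0):
    """Reserve a random percentage of rows (10 to 50) as a test set."""
    if not 10 <= percent_excluded <= 50:
        raise ValueError("percent_excluded must be between 10 and 50")
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * percent_excluded / 100)
    # The model is trained without the test rows.
    return shuffled[n_test:], shuffled[:n_test]  # (training, test)


def rows_available_to_tree(training_rows, percent_available=100, seed=0):
    """Pick the random subset of training rows made available to one tree."""
    rng = random.Random(seed)
    n = max(1, round(len(training_rows) * percent_available / 100))
    return rng.sample(training_rows, n)


def suggested_variable_count(n_variables, categorical):
    """Rule of thumb as stated in the table (an illustration, not a tool default)."""
    if categorical:
        return max(1, n_variables // 3)
    return max(1, round(math.sqrt(n_variables)))


# Hypothetical toy data: 200 row identifiers standing in for training features.
rows = list(range(200))
training, test = split_for_validation(rows, percent_excluded=20)
tree_rows = rows_available_to_tree(training, percent_available=50)

print(len(training), len(test), len(tree_rows))
print(suggested_variable_count(16, categorical=False))
print(suggested_variable_count(16, categorical=True))
```

Lowering the percentage available per tree shrinks the data each tree sees, which is why the table notes that it speeds up the tool on very large datasets.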
Derived Output
Label | Explanation | Data Type |
Output Trained Features | The output containing the input variables used for training, the observed values of the Variable to Predict parameter, and the accompanying predictions, which can be used to further assess the performance of the model. | Record Set |
Variable of Importance Table | A table containing information describing the importance of each explanatory variable used in the created model. | Record Set |
Output Predicted Features | The layer that will receive the predictions of the model. | Record Set |
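Because the trained output pairs observed values with predictions, model performance can be checked further after the tool runs. The sketch below is one possible way to do that in plain Python, assuming the observed and predicted values have already been read into lists; the function names and sample values are hypothetical, and the tool's own diagnostics may be computed differently.

```python
def r_squared(observed, predicted):
    """Coefficient of determination for a numeric Variable to Predict."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def accuracy(observed, predicted):
    """Share of matching categories for a categorical Variable to Predict."""
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


# Hypothetical values standing in for fields of the trained output.
obs_numeric = [10.0, 20.0, 30.0, 40.0]
pred_numeric = [12.0, 18.0, 33.0, 37.0]
regression_score = r_squared(obs_numeric, pred_numeric)

obs_class = ["forest", "water", "urban", "water"]
pred_class = ["forest", "water", "water", "water"]
classification_score = accuracy(obs_class, pred_class)

print(regression_score, classification_score)
```

Scores computed on the features excluded for validation are generally a more honest indicator than scores computed on the features the model was trained on.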