Label | Explanation | Data Type |
Input Rasters | The single-band, multidimensional, or multiband raster datasets, or mosaic datasets, containing explanatory variables. | Mosaic Dataset; Mosaic Layer; Raster Dataset; Raster Layer; Image Service; String |
Target Raster or Points | The raster or point feature class containing the target variable (dependent variable) data. | Feature Class; Feature Layer; Raster Dataset; Raster Layer; Mosaic Layer; Image Service |
Output Regression Definition File | A JSON format file with an .ecd extension that contains attribute information, statistics, or other information for the regression model. | File |
Target Value Field (Optional) | The field name of the information to model in the target point feature class or raster dataset. | Field |
Target Dimension Field (Optional) | A date field or numeric field in the input point feature class that defines the dimension values. | Field |
Raster Dimension (Optional) | The dimension name of the input multidimensional raster (explanatory variables) that links to the dimension in the target data. | String |
Output Importance Table (Optional) | A table containing information describing the importance of each explanatory variable used in the model. A larger number indicates the corresponding variable is more correlated to the predicted variable and will contribute more in prediction. Values range between 0 and 1, and the sum of all the values equals 1. | Table |
Max Number of Trees (Optional) | The maximum number of trees in the forest. Increasing the number of trees will lead to higher accuracy rates, although this improvement will level off. The number of trees increases the processing time linearly. The default is 50. | Long |
Max Tree Depth (Optional) | The maximum depth of each tree in the forest. Depth determines the number of rules each tree can create, resulting in a decision. Trees will not grow any deeper than this setting. The default is 30. | Long |
Max Number of Samples (Optional) | The maximum number of samples that will be used for the regression analysis. A value that is less than or equal to 0 means that the system will use all the samples from the input target raster or point feature class to train the regression model. The default value is 10,000. | Long |
Average Points Per Cell (Optional) | Specifies whether the average will be calculated when multiple training points fall into one cell. This parameter is applicable only when the input target is a point feature class. Unchecked: All points will be used when multiple training points fall into a single cell. This is the default. Checked: The average value of the training points within a cell will be calculated. | Boolean |
Available with Image Analyst license.
Summary
Models the relationship between explanatory variables (independent variables) and a target dataset (dependent variable).
Usage
The tool can be used to train with a variety of data types. The input rasters (explanatory variables) can be one raster or a list of rasters, a single band or a multiband in which each band is an explanatory variable, a multidimensional raster in which the variables in the raster are the explanatory variables, or a combination of data types.
An input mosaic dataset will be treated as a raster dataset (not a collection of rasters). To use a collection of rasters as input, build multidimensional info for the mosaic dataset and use the result as input.
The input target can be a feature class or a raster. When the target is a feature class, the Target Value Field value must be set to a numeric field.
If the input target feature has a date field or a field that defines dimension, specify a value for both Target Value Field and Target Dimension Field.
The input raster target can also be a multidimensional raster.
If the input target is multidimensional, the corresponding input explanatory variables must have at least one multidimensional raster. Those that intersect the target dimensions will be used in training; other dimensionless rasters in the list will be applied to all dimensions. If no explanatory variables intersect or they are all dimensionless, no training will occur.
If the input target is dimensionless and the explanatory variables have dimension, the first slice will be used.
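The selection rule for a multidimensional target described above can be sketched in plain Python. This is only an illustration of one reading of the rule, not the tool's implementation: the function name `select_training_rasters` and the representation of each raster as a set of dimension names are assumptions made for the example.

```python
def select_training_rasters(target_dims, rasters):
    """Return the explanatory rasters that would take part in training.

    target_dims: set of dimension names on the target (hypothetical model).
    rasters: dict of raster name -> set of dimension names; an empty set
             stands for a dimensionless raster.
    An empty list means no training would occur.
    """
    # Rasters sharing at least one dimension with the target are used.
    intersecting = [name for name, dims in rasters.items() if dims & target_dims]
    # Dimensionless rasters are applied to every dimension value.
    dimensionless = [name for name, dims in rasters.items() if not dims]
    # With nothing intersecting (or only dimensionless inputs), no training occurs.
    if not intersecting:
        return []
    return intersecting + dimensionless


used = select_training_rasters(
    {"StdTime"},
    {"weather_variables.crf": {"StdTime"}, "dem.tif": set()},
)
print(used)
```

In this sketch a list containing only `dem.tif` returns an empty result, which mirrors the statement that no training occurs when all explanatory variables are dimensionless.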
If the output is a multidimensional raster, use CRF format. If the output is a dimensionless raster, it can be stored in any output raster format.
The cell sizes of the input explanatory variables will affect the training result and the processing time. By default, the tool uses the cell size of the first explanatory raster; you can change it using the Cell Size environment setting. In general, training with a cell size smaller than that of your data is not recommended.
The Output Importance Table can be used to analyze how much each explanatory variable contributes to predicting the target variable.
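As a rough illustration of how importance values might be inspected once they have been read from the table, the sketch below ranks variables and accumulates their share. The variable names and values are made up for the example, and the function `rank_importance` is hypothetical; the only property taken from this page is that values lie between 0 and 1 and sum to 1.

```python
import math


def rank_importance(importances):
    """Rank variables by importance and add a running cumulative share.

    importances: dict of variable name -> importance value (illustrative input).
    Returns a list of (name, importance, cumulative_importance) tuples.
    """
    total = sum(importances.values())
    # The importance values are documented to sum to 1; check before ranking.
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"importance values sum to {total}, expected 1")
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    result = []
    cumulative = 0.0
    for name, value in ranked:
        cumulative += value
        result.append((name, value, cumulative))
    return result


# Illustrative values only, not output from the tool.
example = {"temperature": 0.5, "wind_speed": 0.25, "elevation": 0.25}
for name, value, cumulative in rank_importance(example):
    print(name, value, cumulative)
```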
To create a scatter plot of predicted values and training values, you can use the Sample tool to extract predicted values from the predicted rasters. Then perform a table join using the LocationID field in the Sample tool output and the ObjectID field in the target feature class. If the target input is a raster, you can generate random points and extract values from both the input target raster and the predicted raster.
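The join described above can be sketched without any GIS library, assuming the Sample output and the target feature class have already been read into lists of dictionaries. The `LocationID` and `ObjectID` keys come from this page; the function name `pair_predictions` and the field names `predicted` and `mean_pm2.5` used in the usage lines are assumptions for illustration.

```python
def pair_predictions(sample_rows, target_rows, value_field, predicted_field):
    """Join Sample output to target records on LocationID == ObjectID.

    Returns a list of (observed, predicted) pairs suitable for a scatter plot.
    Sample rows with no matching target record are skipped.
    """
    # Index the observed values by ObjectID for a constant-time lookup.
    observed_by_id = {row["ObjectID"]: row[value_field] for row in target_rows}
    pairs = []
    for row in sample_rows:
        location_id = row["LocationID"]
        if location_id in observed_by_id:
            pairs.append((observed_by_id[location_id], row[predicted_field]))
    return pairs


# Illustrative records only.
sample_rows = [
    {"LocationID": 1, "predicted": 12.5},
    {"LocationID": 2, "predicted": 8.0},
    {"LocationID": 9, "predicted": 1.0},  # no matching target record
]
target_rows = [
    {"ObjectID": 1, "mean_pm2.5": 12.0},
    {"ObjectID": 2, "mean_pm2.5": 9.0},
]
pairs = pair_predictions(sample_rows, target_rows, "mean_pm2.5", "predicted")
print(pairs)
```

The resulting pairs can be passed to any plotting library, with observed values on one axis and predicted values on the other.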
Parameters
TrainRandomTreesRegressionModel(in_rasters, in_target_data, out_regression_definition, {target_value_field}, {target_dimension_field}, {raster_dimension}, {out_importance_table}, {max_num_trees}, {max_tree_depth}, {max_samples}, {average_points_per_cell})
Name | Explanation | Data Type |
in_rasters [in_rasters,...] | The single-band, multidimensional, or multiband raster datasets, or mosaic datasets, containing explanatory variables. | Mosaic Dataset; Mosaic Layer; Raster Dataset; Raster Layer; Image Service; String |
in_target_data | The raster or point feature class containing the target variable (dependent variable) data. | Feature Class; Feature Layer; Raster Dataset; Raster Layer; Mosaic Layer; Image Service |
out_regression_definition | A JSON format file with an .ecd extension that contains attribute information, statistics, or other information for the regression model. | File |
target_value_field (Optional) | The field name of the information to model in the target point feature class or raster dataset. | Field |
target_dimension_field (Optional) | A date field or numeric field in the input point feature class that defines the dimension values. | Field |
raster_dimension (Optional) | The dimension name of the input multidimensional raster (explanatory variables) that links to the dimension in the target data. | String |
out_importance_table (Optional) | A table containing information describing the importance of each explanatory variable used in the model. A larger number indicates the corresponding variable is more correlated to the predicted variable and will contribute more in prediction. Values range between 0 and 1, and the sum of all the values equals 1. | Table |
max_num_trees (Optional) | The maximum number of trees in the forest. Increasing the number of trees will lead to higher accuracy rates, although this improvement will level off. The number of trees increases the processing time linearly. The default is 50. | Long |
max_tree_depth (Optional) | The maximum depth of each tree in the forest. Depth determines the number of rules each tree can create, resulting in a decision. Trees will not grow any deeper than this setting. The default is 30. | Long |
max_samples (Optional) | The maximum number of samples that will be used for the regression analysis. A value that is less than or equal to 0 means that the system will use all the samples from the input target raster or point feature class to train the regression model. The default value is 10,000. | Long |
average_points_per_cell (Optional) | Specifies whether the average will be calculated when multiple training points fall into one cell. This parameter is applicable only when the input target is a point feature class. | Boolean |
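To make the effect of averaging points per cell concrete, the following sketch groups training points by the cell they fall into and averages their values. It is a conceptual illustration only: the function name, the `(x, y, value)` tuple layout, and the grid origin are assumptions, and the tool's internal behavior may differ.

```python
import math
from collections import defaultdict


def average_points_per_cell(points, cell_size, origin=(0.0, 0.0)):
    """Average the values of points that fall into the same grid cell.

    points: iterable of (x, y, value) tuples (hypothetical layout).
    cell_size: width and height of a square cell, in the same units as x and y.
    origin: lower-left corner of the grid (assumed for this sketch).
    Returns a dict of (column, row) -> mean value.
    """
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [running total, count]
    for x, y, value in points:
        col = math.floor((x - origin[0]) / cell_size)
        row = math.floor((y - origin[1]) / cell_size)
        accumulator = sums[(col, row)]
        accumulator[0] += value
        accumulator[1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


# Two points share cell (0, 0); the third falls into cell (1, 0).
points = [(1.0, 1.0, 10.0), (2.0, 3.0, 20.0), (12.0, 1.0, 5.0)]
print(average_points_per_cell(points, cell_size=10.0))
```

When averaging is not requested, each of the three points in this example would be used as its own training sample.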
Code sample
This Python window script models the relationship between explanatory variables and a target dataset.
# Import system modules
import arcpy
from arcpy.ia import *
# Check out the ArcGIS Image Analyst extension license
arcpy.CheckOutExtension("ImageAnalyst")
# Execute
arcpy.ia.TrainRandomTreesRegressionModel("weather_variables.crf;dem.tif", "pm2.5.shp", r"c:\data\pm2.5_trained.ecd", "mean_pm2.5", "date_collected", "StdTime", r"c:\data\pm2.5_importance.csv", 50, 30, 10000)
This Python stand-alone script models the relationship between explanatory variables and a target dataset.
# Import system modules
import arcpy
from arcpy.ia import *
# Check out the ArcGIS Image Analyst extension license
arcpy.CheckOutExtension("ImageAnalyst")
# Define input parameters
in_weather_variables = "C:/Data/ClimateVariables.crf"
in_dem_variable = "C:/Data/dem.tif"
in_target = "C:/Data/pm2.5_observations.shp"
target_value_field = "mean_pm2.5"
target_date_field = "date_collected"
raster_dimension = "StdTime"
out_model_definition = "C:/Data/pm2.5_trained_model.ecd"
out_importance_table = "C:/Data/pm2.5_importance_table.csv"
max_num_trees = 50
max_tree_depth = 30
max_num_samples = 10000
# Execute - train the random trees regression model
arcpy.ia.TrainRandomTreesRegressionModel([in_weather_variables, in_dem_variable], in_target, out_model_definition, target_value_field, target_date_field, raster_dimension, out_importance_table, max_num_trees, max_tree_depth, max_num_samples)
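Because in_rasters takes more than one raster, the paths must be combined into a single argument. Geoprocessing tools generally accept a multivalue parameter either as a Python list or as one semicolon-delimited string; the helper below sketches the string form. The function name `to_multivalue` is hypothetical and is not part of arcpy.

```python
def to_multivalue(paths):
    """Join raster paths into one semicolon-delimited multivalue string."""
    paths = list(paths)
    if not paths:
        raise ValueError("at least one raster path is required")
    for path in paths:
        # A semicolon inside a path would be read as a separator.
        if ";" in path:
            raise ValueError(f"path contains a semicolon: {path!r}")
    return ";".join(paths)


in_rasters = to_multivalue(["C:/Data/ClimateVariables.crf", "C:/Data/dem.tif"])
print(in_rasters)
```

The resulting string can be passed as the first argument of the tool; passing the list itself is usually the simpler choice in a script.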
Environments
Licensing information
- Basic: Requires Image Analyst
- Standard: Requires Image Analyst
- Advanced: Requires Image Analyst