Train Random Trees Regression Model (Image Analyst)—ArcGIS Pro

Available with Image Analyst license.

Summary

Models the relationship between explanatory variables (independent variables) and a target dataset (dependent variable).

Usage

The tool can be used to train with a variety of data types. The input rasters (explanatory variables) can be one raster or a list of rasters, a single band or a multiband in which each band is an explanatory variable, a multidimensional raster in which the variables in the raster are the explanatory variables, or a combination of data types.
An input mosaic dataset will be treated as a raster dataset (not a collection of rasters). To use a collection of rasters as input, build multidimensional info for the mosaic dataset and use the result as input.
The input target can be a feature class or a raster. When the target is a feature class, the Target Value Field value must be set to a numeric field.
If the input target feature has a date field or a field that defines dimension, specify a value for both Target Value Field and Target Dimension Field.
The input raster target can also be a multidimensional raster.
If the input target is multidimensional, the corresponding input explanatory variables must have at least one multidimensional raster. Those that intersect the target dimensions will be used in training; other dimensionless rasters in the list will be applied to all dimensions. If no explanatory variables intersect or they are all dimensionless, no training will occur.
If the input target is dimensionless and the explanatory variables have dimension, the first slice will be used.
If the output is a multidimensional raster, use CRF format. If the output is a dimensionless raster, it can be stored in any output raster format.
The cell sizes of the input explanatory variables will affect the training result and the processing time. By default, the tool uses the cell size of the first explanatory raster; you can change it using the Cell Size environment setting. In general, training with a cell size smaller than that of your data is not recommended.
The Output Importance Table can be used to analyze the importance of each explanatory variable in predicting the target variable.
To create a scatter plot of predicted values and training values, you can use the Sample tool to extract predicted values from predicted rasters. Then perform a table join using the LocationID field in the Sample tool output and the ObjectID field in the target feature class. If the target input is a raster, you can generate random points and extract values from both the input target raster and the predicted raster.
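The table join described in the last usage note can be sketched in plain Python. This is only an illustration of the pairing logic, not a replacement for the Sample tool or a geoprocessing join: the row dictionaries, the field names `pm25_predicted` and `mean_pm25`, and the helper `join_predictions` are hypothetical, and the values are made up. Only the LocationID and ObjectID keys come from the text above.

```python
def join_predictions(sample_rows, target_rows, predicted_field, target_field):
    """Pair each training value with its predicted value.

    sample_rows: rows from the Sample tool output, keyed by LocationID.
    target_rows: rows from the target feature class, keyed by ObjectID.
    Returns a list of (training_value, predicted_value) tuples.
    """
    targets_by_id = {row["ObjectID"]: row[target_field] for row in target_rows}
    pairs = []
    for row in sample_rows:
        location_id = row["LocationID"]
        predicted = row[predicted_field]
        # Skip sample rows with no matching target feature or no raster value.
        if location_id in targets_by_id and predicted is not None:
            pairs.append((targets_by_id[location_id], predicted))
    return pairs


# Made-up rows standing in for the two tables.
target_rows = [
    {"ObjectID": 1, "mean_pm25": 10.0},
    {"ObjectID": 2, "mean_pm25": 20.0},
    {"ObjectID": 3, "mean_pm25": 30.0},
]
sample_rows = [
    {"LocationID": 1, "pm25_predicted": 12.0},
    {"LocationID": 2, "pm25_predicted": 18.0},
    {"LocationID": 3, "pm25_predicted": None},   # no raster value at this point
    {"LocationID": 9, "pm25_predicted": 25.0},   # no matching target feature
]

pairs = join_predictions(sample_rows, target_rows, "pm25_predicted", "mean_pm25")
```

The resulting pairs can be passed to any plotting library as the x and y values of the scatter plot.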

Parameters

Label	Explanation	Data Type
Input Rasters	The single-band, multidimensional, or multiband raster datasets, or mosaic datasets, containing explanatory variables.	Mosaic Dataset; Mosaic Layer; Raster Dataset; Raster Layer; Image Service; String
Target Raster or Points	The raster or point feature class containing the target variable (dependent variable) data.	Feature Class; Feature Layer; Raster Dataset; Raster Layer; Mosaic Layer; Image Service
Output Regression Definition File	A JSON format file with an .ecd extension that contains attribute information, statistics, or other information for the regression model.	File
Target Value Field (Optional)	The field name of the information to model in the target point feature class or raster dataset.	Field
Target Dimension Field (Optional)	A date field or numeric field in the input point feature class that defines the dimension values.	Field
Raster Dimension (Optional)	The dimension name of the input multidimensional raster (explanatory variables) that links to the dimension in the target data.	String
Output Importance Table (Optional)	A table containing information describing the importance of each explanatory variable used in the model. A larger number indicates the corresponding variable is more correlated to the predicted variable and will contribute more in prediction. Values range between 0 and 1, and the sum of all the values equals 1.	Table
Max Number of Trees (Optional)	The maximum number of trees in the forest. Increasing the number of trees will lead to higher accuracy rates, although this improvement will level off. The number of trees increases the processing time linearly. The default is 50.	Long
Max Tree Depth (Optional)	The maximum depth of each tree in the forest. Depth determines the number of rules each tree can create, resulting in a decision. Trees will not grow any deeper than this setting. The default is 30.	Long
Max Number of Samples (Optional)	The maximum number of samples that will be used for the regression analysis. A value that is less than or equal to 0 means that the system will use all the samples from the input target raster or point feature class to train the regression model. The default value is 10,000.	Long
Average Points Per Cell (Optional)	Specifies whether the average will be calculated when multiple training points fall into one cell. This parameter is applicable only when the input target is a point feature class. Unchecked: All points will be used when multiple training points fall into a single cell. This is the default. Checked: The average value of the training points within a cell will be calculated.	Boolean
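To make the Average Points Per Cell option more concrete, the following sketch groups training points by the raster cell they fall into and averages their values. It is an assumption-laden illustration, not the tool's internal implementation: the function name, the lower-left grid origin, and the sample coordinates are all hypothetical.

```python
import math
from collections import defaultdict


def average_points_per_cell(points, x_origin, y_origin, cell_size):
    """Average the values of points that share a cell.

    points: iterable of (x, y, value) tuples.
    x_origin, y_origin: assumed lower-left corner of the grid.
    Returns a dict mapping (column, row) to the mean value in that cell.
    """
    values_by_cell = defaultdict(list)
    for x, y, value in points:
        column = math.floor((x - x_origin) / cell_size)
        row = math.floor((y - y_origin) / cell_size)
        values_by_cell[(column, row)].append(value)
    return {cell: sum(vals) / len(vals) for cell, vals in values_by_cell.items()}


# Two points share cell (0, 0); the third falls in cell (1, 0).
training_points = [(1.0, 1.0, 10.0), (2.0, 3.0, 20.0), (12.0, 1.0, 5.0)]
averaged = average_points_per_cell(training_points, 0.0, 0.0, 10.0)
```

With the option unchecked, the three points would instead be used as three separate training samples.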

TrainRandomTreesRegressionModel(in_rasters, in_target_data, out_regression_definition, {target_value_field}, {target_dimension_field}, {raster_dimension}, {out_importance_table}, {max_num_trees}, {max_tree_depth}, {max_samples}, {average_points_per_cell})

Name	Explanation	Data Type
in_rasters [in_rasters,...]	The single-band, multidimensional, or multiband raster datasets, or mosaic datasets, containing explanatory variables.	Mosaic Dataset; Mosaic Layer; Raster Dataset; Raster Layer; Image Service; String
in_target_data	The raster or point feature class containing the target variable (dependent variable) data.	Feature Class; Feature Layer; Raster Dataset; Raster Layer; Mosaic Layer; Image Service
out_regression_definition	A JSON format file with an .ecd extension that contains attribute information, statistics, or other information for the regression model.	File
target_value_field (Optional)	The field name of the information to model in the target point feature class or raster dataset.	Field
target_dimension_field (Optional)	A date field or numeric field in the input point feature class that defines the dimension values.	Field
raster_dimension (Optional)	The dimension name of the input multidimensional raster (explanatory variables) that links to the dimension in the target data.	String
out_importance_table (Optional)	A table containing information describing the importance of each explanatory variable used in the model. A larger number indicates the corresponding variable is more correlated to the predicted variable and will contribute more in prediction. Values range between 0 and 1, and the sum of all the values equals 1.	Table
max_num_trees (Optional)	The maximum number of trees in the forest. Increasing the number of trees will lead to higher accuracy rates, although this improvement will level off. The number of trees increases the processing time linearly. The default is 50.	Long
max_tree_depth (Optional)	The maximum depth of each tree in the forest. Depth determines the number of rules each tree can create, resulting in a decision. Trees will not grow any deeper than this setting. The default is 30.	Long
max_samples (Optional)	The maximum number of samples that will be used for the regression analysis. A value that is less than or equal to 0 means that the system will use all the samples from the input target raster or point feature class to train the regression model. The default value is 10,000.	Long
average_points_per_cell (Optional)	Specifies whether the average will be calculated when multiple training points fall into one cell. This parameter is applicable only when the input target is a point feature class. KEEP_ALL_POINTS: All points will be used when multiple training points fall into a single cell. This is the default. AVERAGE_POINTS_PER_CELL: The average value of the training points within a cell will be calculated.	Boolean
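The importance values are described as lying between 0 and 1 and summing to 1, which suggests a simple sanity check before ranking variables. The sketch below assumes the table has already been read into (variable name, importance) pairs; the variable names, the values, the tolerance, and the helper `rank_importance` are hypothetical, and the actual column names in the output table may differ.

```python
import math


def rank_importance(rows, tolerance=1e-6):
    """Validate importance values and return rows sorted from most to least important.

    rows: list of (variable_name, importance) tuples.
    Raises ValueError if a value is outside [0, 1] or the values do not sum to 1.
    """
    for name, importance in rows:
        if not 0.0 <= importance <= 1.0:
            raise ValueError(f"importance for {name!r} is outside [0, 1]: {importance}")
    total = math.fsum(importance for _, importance in rows)
    if abs(total - 1.0) > tolerance:
        raise ValueError(f"importance values sum to {total}, expected 1")
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up values for illustration only.
importance_rows = [("temperature", 0.5), ("dem", 0.2), ("wind_speed", 0.3)]
ranked = rank_importance(importance_rows)
```

A table that has been rounded for display may need a looser tolerance than the one assumed here.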

Code sample

TrainRandomTreesRegressionModel example 1 (Python window)

This Python window script models the relationship between explanatory variables and a target dataset.

# Import system modules 
import arcpy 
from arcpy.ia import * 

# Check out the ArcGIS Image Analyst extension license 
arcpy.CheckOutExtension("ImageAnalyst") 

# Execute  
arcpy.ia.TrainRandomTreesRegressionModel(["weather_variables.crf", "dem.tif"], "pm2.5.shp", r"c:\data\pm2.5_trained.ecd", "mean_pm2.5", "date_collected", "StdTime", r"c:\data\pm2.5_importanc.csv", 50, 30, 10000)

TrainRandomTreesRegressionModel example 2 (stand-alone script)

This Python stand-alone script models the relationship between explanatory variables and a target dataset.

# Import system modules 

import arcpy 
from arcpy.ia import * 

# Check out the ArcGIS Image Analyst extension license 
arcpy.CheckOutExtension("ImageAnalyst") 

# Define input parameters 
in_weather_variables = "C:/Data/ClimateVariables.crf"
in_dem_variable = "C:/Data/dem.tif"
in_target = "C:/Data/pm2.5_observations.shp"
target_value_field = "mean_pm2.5"
target_date_field = "date_collected"
raster_dimension = "StdTime"
out_model_definition = "C:/Data/pm2.5_trained_model.ecd"
out_importance_table = "C:/Data/pm2.5_importance_table.csv"
max_num_trees = 50
max_tree_depth = 30
max_num_samples = 10000

# Execute - train the random trees regression model
# The explanatory rasters are passed as a list, and the importance table is
# passed before max_num_trees to match the parameter order in the syntax.
arcpy.ia.TrainRandomTreesRegressionModel([in_weather_variables, in_dem_variable], in_target, out_model_definition, target_value_field, target_date_field, raster_dimension, out_importance_table, max_num_trees, max_tree_depth, max_num_samples)
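Because the optional parameters are positional, leaving one out shifts every value after it into the wrong slot. One way to guard against this is to pass the parameters by name. The sketch below assumes that the function accepts keyword arguments named as in the syntax above, as is usual for arcpy geoprocessing functions. The helpers `build_training_kwargs` and `train` are hypothetical, and the paths are illustrative.

```python
# Parameter names copied from the tool syntax; everything else is illustrative.
PARAMETER_ORDER = (
    "in_rasters", "in_target_data", "out_regression_definition",
    "target_value_field", "target_dimension_field", "raster_dimension",
    "out_importance_table", "max_num_trees", "max_tree_depth",
    "max_samples", "average_points_per_cell",
)
REQUIRED = PARAMETER_ORDER[:3]


def build_training_kwargs(**params):
    """Check parameter names and return them in the documented order."""
    unknown = sorted(set(params) - set(PARAMETER_ORDER))
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return {name: params[name] for name in PARAMETER_ORDER if name in params}


def train(**params):
    """Run the tool with named parameters (requires ArcGIS Pro and Image Analyst)."""
    import arcpy  # imported here so the helper above works without ArcGIS

    arcpy.CheckOutExtension("ImageAnalyst")
    return arcpy.ia.TrainRandomTreesRegressionModel(**build_training_kwargs(**params))


# Omitting out_importance_table no longer shifts max_num_trees into its place.
training_kwargs = build_training_kwargs(
    in_rasters=["C:/Data/ClimateVariables.crf", "C:/Data/dem.tif"],
    in_target_data="C:/Data/pm2.5_observations.shp",
    out_regression_definition="C:/Data/pm2.5_trained_model.ecd",
    target_value_field="mean_pm2.5",
    max_num_trees=50,
)
```

Calling `train(**training_kwargs)` would then run the tool in an environment where arcpy is available.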

Environments

Cell Size, Current Workspace, Extent, Geographic Transformations, Output Coordinate System, Scratch Workspace

Licensing information

Basic: Requires Image Analyst
Standard: Requires Image Analyst
Advanced: Requires Image Analyst