Train Random Trees Classifier (Spatial Analyst)

Available with Spatial Analyst license.

Available with Image Analyst license.

Summary

Generates an Esri classifier definition file (.ecd) using the Random Trees classification method.

The random trees classifier is an image classification technique that is resistant to overfitting and can work with segmented images and other ancillary raster datasets. For standard image inputs, the tool accepts multiband imagery with any bit depth and performs the Random Trees classification on a per-pixel or per-segment basis, depending on the input training feature file.

Usage

  • The Random Trees classification method is a supervised machine-learning classifier built from a collection of individual decision trees, each generated from a different sample and subset of the training data. They are called decision trees because, for every pixel that is classified, a number of decisions are made in rank order of importance. When you graph these decisions for a pixel, the result looks like a branch. When you classify the entire dataset, the branches form a tree.

    The method is called random trees because the dataset is classified a number of times based on a random subselection of training pixels, with a random subset of variables chosen for each tree, resulting in many decision trees. To make a final decision, each tree has a vote, and the most frequent tree output is used as the overall classification.

    This process corrects for the propensity of individual decision trees to overfit their training sample data. A number of trees are grown (by analogy, a forest), and variation among the trees is introduced by projecting the training data into a randomly chosen subspace before fitting each tree. The decision at each node is optimized by a randomized procedure.

  • For segmented rasters that have their key property set to Segmented, the tool computes the index image and associated segment attributes from the RGB segmented raster. The attributes are computed to generate the classifier definition file to be used in a separate classification tool. The attributes for each segment can be computed from any Esri-supported image.

  • Any Esri-supported raster is accepted as input, including raster products, segmented rasters, mosaics, image services, and generic raster datasets. Segmented rasters must be 8-bit rasters with 3 bands.

  • To create the training sample file, use the Training Samples Manager pane from the Classification Tools drop-down menu.

  • The Segment Attributes parameter is only active if one of the raster layer inputs is a segmented image.

  • A two-step process is necessary to classify time series raster data using the Continuous Change Detection and Classification (CCDC) algorithm. First, run the Analyze Changes Using CCDC tool, which is available with an Image Analyst extension license. Next, use those results as input to this training tool.

    The training sample data must have been collected at multiple times using the Training Samples Manager. The dimension value for each sample is listed in a field in the training sample feature class, which is specified in the Dimension Value Field parameter.
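
The voting scheme described in the first usage note can be illustrated with a small, self-contained Python sketch. It is not the tool's implementation: train_stump, train_forest, and classify are hypothetical names, each "tree" is reduced to a single split on one randomly chosen band, and the training pixels are made-up values. It only shows the idea of fitting many trees on random subselections of samples and variables and letting them vote.

```python
import random
from collections import Counter


def train_stump(samples, band):
    """Fit a one-split stand-in for a tree using a single band.

    The stump assigns a pixel to the class whose mean value in that band
    is closest. For two classes this is a threshold halfway between the
    class means.
    """
    values_by_class = {}
    for pixel, label in samples:
        values_by_class.setdefault(label, []).append(pixel[band])
    means = {label: sum(v) / len(v) for label, v in values_by_class.items()}

    def predict(pixel):
        return min(means, key=lambda label: abs(pixel[band] - means[label]))

    return predict


def train_forest(samples, num_trees, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen band."""
    rng = random.Random(seed)
    num_bands = len(samples[0][0])
    trees = []
    for _ in range(num_trees):
        # Random subselection of training pixels (drawn with replacement)
        bootstrap = [rng.choice(samples) for _ in samples]
        # Random subset of variables (here, a single band)
        band = rng.randrange(num_bands)
        trees.append(train_stump(bootstrap, band))
    return trees


def classify(trees, pixel):
    """Return the class that receives the most votes."""
    votes = Counter(tree(pixel) for tree in trees)
    return votes.most_common(1)[0][0]


# Made-up two-band training pixels: (band values, class name)
training = [
    ((10, 12), "water"), ((12, 9), "water"),
    ((11, 14), "water"), ((9, 10), "water"),
    ((80, 90), "urban"), ((85, 88), "urban"),
    ((78, 95), "urban"), ((90, 84), "urban"),
]
trees = train_forest(training, num_trees=25)
print(classify(trees, (11, 11)))
print(classify(trees, (85, 90)))
```

An odd number of trees is used so that a two-class vote cannot tie. An individual stump trained on an unlucky bootstrap sample may be wrong, but the majority vote is what the ensemble reports.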

Parameters

Label | Explanation | Data Type
Input Raster

The raster dataset to classify.

You can use any Esri-supported raster dataset. One option is a 3-band, 8-bit segmented raster dataset in which all the pixels in the same segment have the same color. The input can also be a single-band, 8-bit, grayscale segmented raster.

Raster Layer; Mosaic Layer; Image Service; String
Input Training Sample File

The training sample file or layer that delineates the training sites.

These can be either shapefiles or feature classes that contain the training samples. The following field names are required in the training sample file:

  • classname—A text field indicating the name of the class category
  • classvalue—A long integer field containing the integer value for each class category

Feature Layer
Output Classifier Definition File

A JSON file that contains attribute information, statistics, or other information for the classifier. An .ecd file is created.

File
Additional Input Raster
(Optional)

Ancillary raster datasets, such as a multispectral image or a DEM, will be incorporated to generate attributes and other required information for classification. This parameter is optional.

Raster Layer; Mosaic Layer; Image Service; String
Max Number of Trees
(Optional)

The maximum number of trees in the forest. Increasing the number of trees generally improves accuracy, although the improvement eventually levels off. Processing time increases linearly with the number of trees.

Long
Max Tree Depth
(Optional)

The maximum depth of each tree in the forest. Depth is another way of saying the number of rules each tree is allowed to create to come to a decision. Trees will not grow any deeper than this setting.

Long
Max Number of Samples Per Class
(Optional)

The maximum number of samples that will be used to define each class.

The default value of 1000 is recommended when the inputs are nonsegmented rasters. A value that is less than or equal to 0 means that the system will use all the samples from the training sites to train the classifier.

Long
Segment Attributes
(Optional)

Specifies the attributes that will be included in the attribute table associated with the output raster.

  • Converged color —The RGB color values will be derived from the input raster on a per-segment basis.
  • Mean digital number —The average digital number (DN) will be derived from the optional pixel image on a per-segment basis.
  • Standard deviation —The standard deviation will be derived from the optional pixel image on a per-segment basis.
  • Count of pixels —The number of pixels composing the segment, on a per-segment basis.
  • Compactness —The degree to which a segment is compact or circular, on a per-segment basis. The values range from 0 to 1, in which 1 is a circle.
  • Rectangularity —The degree to which the segment is rectangular, on a per-segment basis. The values range from 0 to 1, in which 1 is a rectangle.

String
Dimension Value Field
(Optional)

Contains dimension values in the input training sample feature class.

This parameter is required to classify a time series of raster data using the change analysis raster output from the Analyze Changes Using CCDC tool in the Image Analyst toolbox.

Field
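
The field requirement described for the Input Training Sample File parameter can be checked before the tool is run. The sketch below is a minimal illustration, not part of the tool: missing_required_fields is a hypothetical helper, the field lists are made up, and the case-insensitive comparison is an assumption of this sketch.

```python
REQUIRED_FIELDS = ("classname", "classvalue")


def missing_required_fields(field_names):
    """Return the required training sample fields absent from field_names.

    Names are compared case-insensitively, which is an assumption made
    for this sketch rather than documented tool behavior.
    """
    present = {name.lower() for name in field_names}
    return [name for name in REQUIRED_FIELDS if name not in present]


# Hypothetical field lists for two training sample feature classes
complete = ["OBJECTID", "Shape", "Classname", "Classvalue"]
incomplete = ["OBJECTID", "Shape", "Classname"]

print(missing_required_fields(complete))    # prints []
print(missing_required_fields(incomplete))  # prints ['classvalue']
```

Where ArcPy is available, the names could be gathered with arcpy.ListFields, for example [f.name for f in arcpy.ListFields(train_features)], and passed to the helper.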

TrainRandomTreesClassifier(in_raster, in_training_features, out_classifier_definition, {in_additional_raster}, {max_num_trees}, {max_tree_depth}, {max_samples_per_class}, {used_attributes}, {dimension_value_field})
Name | Explanation | Data Type
in_raster

The raster dataset to classify.

You can use any Esri-supported raster dataset. One option is a 3-band, 8-bit segmented raster dataset in which all the pixels in the same segment have the same color. The input can also be a single-band, 8-bit, grayscale segmented raster.

Raster Layer; Mosaic Layer; Image Service; String
in_training_features

The training sample file or layer that delineates the training sites.

These can be either shapefiles or feature classes that contain the training samples. The following field names are required in the training sample file:

  • classname—A text field indicating the name of the class category
  • classvalue—A long integer field containing the integer value for each class category

Feature Layer
out_classifier_definition

A JSON file that contains attribute information, statistics, or other information for the classifier. An .ecd file is created.

File
in_additional_raster
(Optional)

Ancillary raster datasets, such as a multispectral image or a DEM, will be incorporated to generate attributes and other required information for classification. This parameter is optional.

Raster Layer; Mosaic Layer; Image Service; String
max_num_trees
(Optional)

The maximum number of trees in the forest. Increasing the number of trees generally improves accuracy, although the improvement eventually levels off. Processing time increases linearly with the number of trees.

Long
max_tree_depth
(Optional)

The maximum depth of each tree in the forest. Depth is another way of saying the number of rules each tree is allowed to create to come to a decision. Trees will not grow any deeper than this setting.

Long
max_samples_per_class
(Optional)

The maximum number of samples that will be used to define each class.

The default value of 1000 is recommended when the inputs are nonsegmented rasters. A value that is less than or equal to 0 means that the system will use all the samples from the training sites to train the classifier.

Long
used_attributes
[used_attributes;used_attributes,...]
(Optional)

Specifies the attributes that will be included in the attribute table associated with the output raster.

  • COLOR: The RGB color values will be derived from the input raster on a per-segment basis.
  • MEAN: The average digital number (DN) will be derived from the optional pixel image on a per-segment basis.
  • STD: The standard deviation will be derived from the optional pixel image on a per-segment basis.
  • COUNT: The number of pixels composing the segment, on a per-segment basis.
  • COMPACTNESS: The degree to which a segment is compact or circular, on a per-segment basis. The values range from 0 to 1, in which 1 is a circle.
  • RECTANGULARITY: The degree to which the segment is rectangular, on a per-segment basis. The values range from 0 to 1, in which 1 is a rectangle.

This parameter is only enabled if the Segmented key property is set to true on the input raster. If the only input to the tool is a segmented image, the default attributes are COLOR, COUNT, COMPACTNESS, and RECTANGULARITY. If an in_additional_raster value is included as an input with a segmented image, MEAN and STD are also available attributes.

String
dimension_value_field
(Optional)

Contains dimension values in the input training sample feature class.

This parameter is required to classify a time series of raster data using the change analysis raster output from the Analyze Changes Using CCDC tool in the Image Analyst toolbox.

Field
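
The used_attributes value is passed as a semicolon-delimited string of keywords, as in the code samples for this tool. The following sketch shows one way to assemble and validate that string ahead of the call. build_used_attributes is a hypothetical helper, and rejecting MEAN and STD when no in_additional_raster value is supplied is this sketch's reading of the parameter description, not confirmed tool behavior.

```python
VALID_ATTRIBUTES = ("COLOR", "MEAN", "STD", "COUNT",
                    "COMPACTNESS", "RECTANGULARITY")
# Keywords described as derived from the optional pixel image
NEEDS_ADDITIONAL_RASTER = {"MEAN", "STD"}


def build_used_attributes(attributes, has_additional_raster):
    """Validate attribute keywords and join them with semicolons."""
    cleaned = [a.strip().upper() for a in attributes]
    unknown = [a for a in cleaned if a not in VALID_ATTRIBUTES]
    if unknown:
        raise ValueError(f"Unknown segment attributes: {unknown}")
    if not has_additional_raster:
        blocked = [a for a in cleaned if a in NEEDS_ADDITIONAL_RASTER]
        if blocked:
            raise ValueError(
                f"{blocked} require an in_additional_raster value")
    return ";".join(cleaned)


# Segmented image only: the default attributes listed above
print(build_used_attributes(
    ["COLOR", "COUNT", "COMPACTNESS", "RECTANGULARITY"], False))
# prints COLOR;COUNT;COMPACTNESS;RECTANGULARITY
```

Failing early with a ValueError keeps a misspelled keyword from reaching the geoprocessing call, where the resulting error may be harder to trace.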

Code sample

TrainRandomTreesClassifier example 1 (Python window)

This is a Python sample for the TrainRandomTreesClassifier tool.

import arcpy
from arcpy.sa import *

# Numeric parameters (data type Long) are passed as integers
TrainRandomTreesClassifier("c:/test/moncton_seg.tif",
                           "c:/test/train.gdb/train_features",
                           "c:/output/moncton_sig_SVM.ecd",
                           "c:/test/moncton.tif", 50, 30, 1000,
                           "COLOR;MEAN;STD;COUNT;COMPACTNESS;RECTANGULARITY")

TrainRandomTreesClassifier example 2 (stand-alone script)

This is a Python script sample for the TrainRandomTreesClassifier tool.

# Import system modules
import arcpy
from arcpy.sa import *

# Set local variables
inSegRaster = "c:/test/cities_seg.tif"
train_features = "c:/test/train.gdb/train_features"
out_definition = "c:/output/cities_sig.ecd"
in_additional_raster = "c:/cities.tif"
maxNumTrees = 50
maxTreeDepth = 30
maxSampleClass = 1000
attributes = "COLOR;MEAN;STD;COUNT;COMPACTNESS;RECTANGULARITY"

# Check out the ArcGIS Spatial Analyst extension license
arcpy.CheckOutExtension("Spatial")

# Execute
TrainRandomTreesClassifier(inSegRaster, train_features,
                           out_definition, in_additional_raster, maxNumTrees,
                           maxTreeDepth, maxSampleClass, attributes)

TrainRandomTreesClassifier example 3 (stand-alone script)

This example shows how to train a random trees classifier using a change analysis raster from the Image Analyst Analyze Changes Using CCDC tool.

# Import system modules
import arcpy
from arcpy.sa import *

# Check out the ArcGIS Spatial Analyst extension license
arcpy.CheckOutExtension("Spatial")


# Set local variables
in_changeAnalysisRaster = "c:/test/LandsatCCDC.crf"
train_features = "c:/test/train.gdb/train_features"
out_definition = "c:/output/change_detection.ecd"
additional_raster = ''
maxNumTrees = 50
maxTreeDepth = 30
maxSampleClass = 1000
attributes = None
dimension_field = "DateTime"

# Execute
arcpy.sa.TrainRandomTreesClassifier(
    in_changeAnalysisRaster, train_features,
    out_definition, additional_raster, maxNumTrees,
    maxTreeDepth, maxSampleClass, attributes, dimension_field)
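
The stand-alone samples above check out the Spatial Analyst extension but do not check it back in. The sketch below shows one possible wrapper that confirms the license is available and always returns it afterward. train_with_license is a hypothetical helper, and the arcpy module is passed in as an argument so that the function can be read and exercised without an ArcGIS installation. It relies only on arcpy.CheckExtension, arcpy.CheckOutExtension, and arcpy.CheckInExtension.

```python
def train_with_license(arcpy_module, *tool_args):
    """Check out Spatial Analyst, run the tool, and check the license in.

    arcpy_module is the imported arcpy package. tool_args are passed
    unchanged to TrainRandomTreesClassifier.
    """
    status = arcpy_module.CheckExtension("Spatial")
    if status != "Available":
        raise RuntimeError(
            f"Spatial Analyst license is not available: {status}")
    arcpy_module.CheckOutExtension("Spatial")
    try:
        return arcpy_module.sa.TrainRandomTreesClassifier(*tool_args)
    finally:
        # Runs whether the tool succeeds or raises an error
        arcpy_module.CheckInExtension("Spatial")


# Intended use on a machine with ArcGIS Pro (not run here):
#   import arcpy
#   import arcpy.sa
#   train_with_license(arcpy, inSegRaster, train_features, out_definition)
```

The try/finally block means the license is released even when the tool fails, which matters on shared concurrent-use licenses.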

Licensing information

  • Basic: Requires Spatial Analyst or Image Analyst
  • Standard: Requires Spatial Analyst or Image Analyst
  • Advanced: Requires Spatial Analyst or Image Analyst
