Subset Features (Data Management)

Summary

Divides the records of a feature class or table into two subsets: one subset to be used as training data, and one subset to be used as test features to compare and validate the output surface.

Usage

  • In the Random number generator environment, only the Mersenne Twister option is supported. If another option is chosen, Mersenne Twister is used instead.

  • Splitting a dataset into training and test features is common in interpolation, machine learning, and other analytical workflows that involve estimating and building models from data.

  • If multipart features are used as input, the output will be a subset of multipart features, not individual features.
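
The idea of a reproducible random split can be sketched outside ArcGIS in plain Python. This is only an illustration under stated assumptions, not the tool's implementation: the function name `split_records` and its parameters are hypothetical, and the records selected will not match those chosen by Subset Features. Python's standard `random` module also uses a Mersenne Twister generator, so a fixed seed gives the same split on every run.

```python
import random


def split_records(records, training_fraction=0.5, seed=0):
    """Randomly split records into (training, test) lists.

    Hypothetical helper for illustration only; it does not reproduce
    the exact selection made by the Subset Features tool.
    """
    if not 0.0 <= training_fraction <= 1.0:
        raise ValueError("training_fraction must be between 0 and 1")
    rng = random.Random(seed)  # Mersenne Twister generator with a fixed seed
    shuffled = list(records)   # copy so the caller's sequence is untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * training_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in for the object IDs of ten input features (assumed values)
object_ids = list(range(1, 11))
training, test = split_records(object_ids, training_fraction=0.7, seed=42)
print(len(training), len(test))
```

Every record lands in exactly one of the two lists, which mirrors how the tool divides the input between the training and test outputs.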

Parameters

Label | Explanation | Data Type
Input Features

The features or table from which subsets will be created.

Table View
Output Training Feature Class

The subset of training features that will be created.

Feature Class; Table
Output Test Feature Class
(Optional)

The subset of test features that will be created.

Feature Class; Table
Size of Training Feature Subset
(Optional)

The size of the output training feature class, entered either as a percentage of the input features or as an absolute number of features.

Double
Subset Size Units
(Optional)

Specifies whether the subset size value will be used as a percentage of the input features or as an absolute number of features.

  • Percentage of input: The subset size is the percentage of the input features that will be placed in the training dataset.
  • Absolute value: The subset size is the number of features that will be placed in the training dataset.
Boolean

arcpy.management.SubsetFeatures(in_features, out_training_feature_class, {out_test_feature_class}, {size_of_training_dataset}, {subset_size_units})
Name | Explanation | Data Type
in_features

The features or table from which subsets will be created.

Table View
out_training_feature_class

The subset of training features that will be created.

Feature Class; Table
out_test_feature_class
(Optional)

The subset of test features that will be created.

Feature Class; Table
size_of_training_dataset
(Optional)

The size of the output training feature class, entered either as a percentage of the input features or as an absolute number of features.

Double
subset_size_units
(Optional)

Specifies whether the subset size value will be used as a percentage of the input features or as an absolute number of features.

  • PERCENTAGE_OF_INPUT: The subset size is the percentage of the input features that will be placed in the training dataset.
  • ABSOLUTE_VALUE: The subset size is the number of features that will be placed in the training dataset.
Boolean
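
How the two unit keywords change the meaning of the same size value can be shown with a small plain-Python sketch. The helper `training_count` is hypothetical, and its rounding rule and range checks are assumptions made for illustration; the tool's own handling of fractional counts or out-of-range values may differ.

```python
def training_count(total, size, units="PERCENTAGE_OF_INPUT"):
    """Return how many of `total` records would go to the training subset.

    Hypothetical helper; the rounding rule and range checks are assumptions.
    """
    if units == "PERCENTAGE_OF_INPUT":
        if not 0 <= size <= 100:
            raise ValueError("a percentage must be between 0 and 100")
        return round(total * size / 100)
    if units == "ABSOLUTE_VALUE":
        if not 0 <= size <= total:
            raise ValueError("an absolute size cannot exceed the input count")
        return int(size)
    raise ValueError(f"unknown units: {units!r}")


# The same size value of 50 applied to 200 input features
print(training_count(200, 50, "PERCENTAGE_OF_INPUT"))  # 100
print(training_count(200, 50, "ABSOLUTE_VALUE"))       # 50
```

In both cases the remaining records (`total` minus the training count) are the ones that would be written to the test output.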

Code sample

SubsetFeatures example 1 (Python window)

Randomly select a training subset of the features. No test feature class is specified, so only the training output is written.

import arcpy
arcpy.management.SubsetFeatures("ca_ozone_pts", "C:/gapyexamples/output/training", 
                                "", "", "PERCENTAGE_OF_INPUT")

SubsetFeatures example 2 (stand-alone script)

Randomly split the features into two feature classes.

# Description: Randomly split the features into two feature classes.

# Import system modules
import arcpy

# Set environment settings
arcpy.env.workspace = "C:/dmpyexamples/data.gdb/data"

# Set local variables
inPointFeatures = "ca_ozone_pts"  # feature class in the geodatabase workspace set above
outtrainPoints = "C:/dmpyexamples/output.gdb/training"
outtestPoints = "C:/dmpyexamples/output.gdb/test"
subsetSize = 50
subsizeUnits = "PERCENTAGE_OF_INPUT"

# Run SubsetFeatures
arcpy.management.SubsetFeatures(inPointFeatures, outtrainPoints, outtestPoints, 
                                subsetSize, subsizeUnits)

Licensing information

  • Basic: Yes
  • Standard: Yes
  • Advanced: Yes
