Prepare Point Cloud Training Data (3D Analyst)

Summary

Generates the data that will be used to train and validate a point cloud classification model.

Usage

  • Review the input point cloud to ensure that its points are well classified for the objects of interest. The quality of the classification model depends on the quality of the data that is used for training and validation. If the point cloud's classification needs refinement, consider interactively editing the point classification.

    Learn more about interactive LAS classification editing

  • The point cloud training data is defined by a directory with a .pctd extension that contains two subdirectories: one with the data that will be used for training the classification model and one with the data that will be used for validating the trained model. An input point cloud must always be specified, as it provides the source of the data used for training. A training boundary can be defined to limit the points exported for training. Validation data is also required and can be specified in any one of the following ways:

    • Provide a validation point cloud. This dataset must reference a different set of points than the input point cloud.
    • Provide a validation point cloud with a validation boundary. This will result in the validation data being created from the portion of the validation point cloud overlapping the validation boundary.
    • Provide a training boundary and a validation boundary without a validation point cloud. This will result in the training data being created from the portions of the input point cloud that intersect the training boundary and the validation data being created from the portions that intersect the validation boundary. The boundary features must not overlap each other.
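
    The three options can be sketched as follows; the file names and block size are hypothetical, and only one configuration would be used per run:

    import arcpy

    # Option 1: a separate validation point cloud
    arcpy.ddd.PreparePointCloudTrainingData('train.lasd', '30 Meters', 'option1.pctd',
                                            validation_point_cloud='validate.lasd')

    # Option 2: a validation point cloud limited by a validation boundary
    arcpy.ddd.PreparePointCloudTrainingData('train.lasd', '30 Meters', 'option2.pctd',
                                            validation_point_cloud='validate.lasd',
                                            validation_boundary='validation_area.shp')

    # Option 3: one point cloud split by nonoverlapping training and validation boundaries
    arcpy.ddd.PreparePointCloudTrainingData('train.lasd', '30 Meters', 'option3.pctd',
                                            training_boundary='training_area.shp',
                                            validation_boundary='validation_area.shp')
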
  • The input point cloud should have a fairly consistent point density. Evaluate the point cloud to determine if it contains locations with a higher density of points, such as areas collected by overlapping flight line surveys or idling terrestrial scanners. For airborne lidar with overlapping flight lines, the Classify LAS Overlap tool can be used to flag the overlapping points and achieve a more consistent point distribution. Other types of point clouds with oversampled hot spots can be thinned to a regular distribution using the Thin LAS tool.
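
    For example, overlap points in a survey with overlapping flight lines could be flagged before the training data is prepared. This is a minimal sketch assuming the Classify LAS Overlap tool's in_las_dataset and sample_distance parameters, with a hypothetical sample distance; see that tool's documentation for the full parameter list:

    import arcpy

    # Flag points from overlapping flight lines so they can be excluded later,
    # for example through the Excluded Class Codes parameter of this tool.
    arcpy.ddd.ClassifyLasOverlap('flight_lines.lasd', '2 Meters')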

  • Points in the point cloud can be excluded from the training data by their class codes to improve training performance by reducing the number of points that must be processed. Excluded points should belong to classes that can be readily classified by other means and that do not provide useful context for the objects the model is being trained to identify. Consider filtering out points that are classified as overlap or noise. Ground classified points can also be filtered out if height from ground is computed during the generation of the training data.
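
    For example, noise and overlap points (commonly classes 7 and 18 for noise, and class 12 for overlap) could be kept out of the exported blocks; the file names below are hypothetical:

    import arcpy

    # Exclude noise and overlap points from the exported training data.
    arcpy.ddd.PreparePointCloudTrainingData('lidar.lasd', '30 Meters', 'assets_training.pctd',
                                            validation_point_cloud='lidar_validation.lasd',
                                            excluded_class_codes=[7, 12, 18])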

  • Reference height information can be incorporated into the training data to provide an additional attribute for the training process. This is done by specifying a raster in the Reference Surface parameter. This raster is used to derive the relative height attribute for each overlapping point. The attribute is calculated by taking the z-value of each point and subtracting the height obtained from the raster through bilinear interpolation. The inclusion of this information can help differentiate objects that have a distinct range of relative heights above the raster surface, and it provides another basis for the neural network to infer directional relationships. For example, when training for power lines and using a ground elevation raster as the reference surface, the power line points will likely fall within a particular range of relative heights above the ground. Additionally, when the reference height is based on the ground elevation, it can justify excluding ground points from the training data when their presence does not provide useful context for identifying the objects of interest. The neural network will attempt to learn the classification of all the data that is provided to it during training. Since a high quality ground classification can be achieved with the Classify LAS Ground tool, there is no need to train the neural network to identify and distinguish ground points from other classes. Ground points, which are typically represented by class 2, and sometimes by class 8 and class 20, can be excluded by listing them in the Excluded Class Codes parameter. When this is done, the neural network will process the training data more quickly, since ground points typically account for approximately half the total points captured in a lidar survey.
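
    The relative height derivation can be illustrated with a small standalone calculation; this is a conceptual sketch of the computation described above, not the tool's internal code:

    def bilinear(z00, z10, z01, z11, fx, fy):
        """Interpolate between four neighboring raster values; fx and fy are in [0, 1]."""
        top = z00 * (1 - fx) + z10 * fx
        bottom = z01 * (1 - fx) + z11 * fx
        return top * (1 - fy) + bottom * fy

    point_z = 110.2                                    # elevation of a lidar point
    surface_z = bilinear(102.5, 102.9, 102.6, 103.0,   # surrounding reference surface values
                         0.25, 0.60)                   # position of the point within the cell
    relative_height = point_z - surface_z              # about 7.5 in this example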

  • The raster surface used as input to the Reference Surface parameter can be generated from a subset of LAS points, such as ground classified points, by filtering the LAS dataset and using the LAS Dataset To Raster tool. The desired subset of points from the LAS dataset can be filtered using any combination of classification codes, return values, and classification flags. The point filters can be applied through the LAS dataset layer's properties dialog or the Make LAS Dataset Layer tool. A raster surface can also be generated from a point cloud scene layer using the Point Cloud To Raster tool.
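
    A ground-only reference surface might be generated as follows; this is a sketch with hypothetical file names, cell size, and interpolation settings (see the referenced tools for their full parameter lists):

    import arcpy

    # Filter the LAS dataset to ground points (class 2) and rasterize their elevation.
    arcpy.management.MakeLasDatasetLayer('lidar.lasd', 'ground_layer', class_code=[2])
    arcpy.conversion.LasDatasetToRaster('ground_layer', 'ground_elevation.tif', 'ELEVATION',
                                        'BINNING AVERAGE LINEAR', 'FLOAT', 'CELLSIZE', 2)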

  • The Excluded Class Codes parameter can be used to omit points associated with class codes that do not provide a useful context for inferring how to identify objects of interest. Omitting them will improve the speed of the training process by reducing the number of points that are evaluated. For example, building classified points are usually immaterial to training a classification model for objects such as traffic lights, power lines, and other assets. Building points can also be reliably classified using the Classify LAS Building tool. Specifying class 6, which represents buildings, as an excluded class will omit the building points from the training data. Any point cloud that will use a model trained with excluded classes should have those classes classified before applying the model. Those classes should also be listed in the Excluded Class Codes parameter of the Classify Point Cloud Using Trained Model and Evaluate Point Cloud Training Data tools so that the model can infer its classification using a point cloud that matches the characteristics of the data used for training the model.

  • The block point limit should reflect the block size and average point spacing of the data. The number of points in a given block can be approximated using the LAS Point Statistics As Raster tool with the Method parameter's Point Count option and the desired block size as the output raster's cell size. An image histogram of this raster can illustrate the distribution of points per block across the dataset. If the histogram shows a large number of blocks with wide variance, it may indicate irregularly sampled data containing hot spots of dense point collections. If a block contains more points than the block point limit, that block will be created multiple times to ensure all of its points are represented in the training data. For example, if the point limit is 10,000 and a given block contains 22,000 points, three blocks of approximately 7,333 points each will be created to ensure uniform sampling in each block. A block point limit that is significantly higher than the nominal number of points in most blocks should also be avoided, because some architectures up-sample the data to meet the point limit. For these reasons, use a block size and block point limit that are close to the point count anticipated for most of the blocks in the training data. Once the training data is created, a histogram is displayed in the tool's message window, and an image of it is stored in the folder containing the training and validation data. This histogram can be reviewed to determine whether an appropriate block size and point limit combination was specified. If the values indicate a suboptimal point limit, rerun the tool with a more appropriate value for the Block Point Limit parameter.
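
    The per-block point count could be estimated before choosing the block point limit. This is a sketch assuming the LAS Point Statistics As Raster tool's method and sampling parameters, with hypothetical file names and a 35-meter block size:

    import arcpy

    # Count points per cell, using a cell size that matches the planned block size,
    # then review the raster's histogram to judge a reasonable block point limit.
    arcpy.ddd.LasPointStatsAsRaster('training_source.lasd', 'point_count.tif',
                                    'POINT_COUNT', 'CELLSIZE', 35)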

  • Ensure that the output is written to a location with enough disk space to accommodate the training data. This tool creates partially overlapping blocks of uncompressed HDF5 files that replicate each point in four blocks. In blocks that exceed the block point limit, some points may be duplicated more than four times. The resulting training data can occupy at least three times the disk space of the source point cloud data.

Parameters

Label | Explanation | Data Type
Input Point Cloud

The point cloud that will be used to create the training data and, potentially, the validation data if no validation point cloud is provided. In this case, both the training boundary and the validation boundary must be defined.

LAS Dataset Layer; File
Block Size

The diameter of each block of training data that will be created from the input point cloud. As a general rule, the block size should be large enough to capture the objects of interest and their surrounding context.

Linear Unit
Output Training Data

The location and name of the output training data (*.pctd file).

File
Training Boundary Features
(Optional)

The polygon features that will delineate the subset of points from the input point cloud that will be used for training the model. This parameter is required when the Validation Point Cloud parameter value is not provided.

Feature Layer
Validation Point Cloud
(Optional)

The point cloud that will be used to validate the deep learning model during the training process. This dataset must reference a different set of points than the input point cloud to ensure the quality of the trained model. If a validation point cloud is not provided, the input point cloud can be used to define the training and validation datasets by providing polygon feature classes for the Training Boundary Features and Validation Boundary Features parameters.

LAS Dataset Layer; File
Validation Boundary Features
(Optional)

The polygon features that will delineate the subset of points to be used for validating the model during the training process. If a validation point cloud is not provided, the points will be sourced from the input point cloud and a polygon will be required for the Training Boundary Features parameter.

Feature Layer
Filter Blocks By Class Code
(Optional)

The class codes that will be used to limit the exported training data blocks. All points in a block will be exported if the block contains at least one point with a class code listed for this parameter, excluding points assigned a class specified in the Excluded Class Codes parameter or flagged as withheld. Any value in the range of 0 to 255 can be specified.

Value Table
Block Point Limit
(Optional)

The maximum number of points that will be allowed in each block of the training data. When a block contains points in excess of this value, multiple blocks will be created for the same location to ensure that all of the points are used when training. The default is 8,192.

Long
Reference Surface
(Optional)

The raster surface that will be used to provide relative height values for each point in the point cloud data. Points that do not overlap with the raster will be omitted from the analysis.

Raster Layer
Excluded Class Codes
(Optional)

The class codes that will be excluded from the training data. Any value in the range of 0 to 255 can be specified.

Long

arcpy.ddd.PreparePointCloudTrainingData(in_point_cloud, block_size, out_training_data, {training_boundary}, {validation_point_cloud}, {validation_boundary}, {class_codes_of_interest}, {block_point_limit}, {reference_height}, {excluded_class_codes})
Name | Explanation | Data Type
in_point_cloud

The point cloud that will be used to create the training data and, potentially, the validation data if no validation point cloud is provided. In this case, both the training boundary and the validation boundary must be defined.

LAS Dataset Layer; File
block_size

The diameter of each block of training data that will be created from the input point cloud. As a general rule, the block size should be large enough to capture the objects of interest and their surrounding context.

Linear Unit
out_training_data

The location and name of the output training data (*.pctd file).

File
training_boundary
(Optional)

The polygon features that will delineate the subset of points from the input point cloud that will be used for training the model. This parameter is required when the validation_point_cloud parameter value is not provided.

Feature Layer
validation_point_cloud
(Optional)

The source of the point cloud that will be used to validate the deep learning model. This dataset must reference a different set of points than the input point cloud to ensure the quality of the trained model. If a validation point cloud is not provided, the input point cloud can be used to define the training and validation datasets by providing polygon feature classes for the training_boundary and validation_boundary parameters.

LAS Dataset Layer; File
validation_boundary
(Optional)

The polygon features that will delineate the subset of points to be used for validating the model during the training process. If a validation point cloud is not provided, the points will be sourced from the input point cloud and a polygon will be required for the training_boundary parameter.

Feature Layer
class_codes_of_interest
[class_codes_of_interest,...]
(Optional)

The class codes that will be used to limit the exported training data blocks. All points in a block will be exported if the block contains at least one point with a class code listed for this parameter, excluding points assigned a class specified in the excluded_class_codes parameter or flagged as withheld. Any value in the range of 0 to 255 can be specified.

Value Table
block_point_limit
(Optional)

The maximum number of points that will be allowed in each block of the training data. When a block contains points in excess of this value, multiple blocks will be created for the same location to ensure that all of the points are used when training. The default is 8,192.

Long
reference_height
(Optional)

The raster surface that will be used to provide relative height values for each point in the point cloud data. Points that do not overlap with the raster will be omitted from the analysis.

Raster Layer
excluded_class_codes
[excluded_class_codes,...]
(Optional)

The class codes that will be excluded from the training data. Any value in the range of 0 to 255 can be specified.

Long

Code sample

PreparePointCloudTrainingData example (Python window)

The following sample demonstrates the use of this tool in the Python window.

import arcpy
arcpy.env.workspace = 'C:/data'

# Prepare training and validation data from separate training and validation point clouds,
# limiting exported blocks to those containing class 14 or 15 points, deriving relative
# height from a ground elevation raster, and excluding classes 2, 6, 8, 9, and 20.
arcpy.ddd.PreparePointCloudTrainingData('training_source.lasd', '35 Meters', 'vegetation_training.pctd',
                                        validation_point_cloud='validation_source.lasd',
                                        class_codes_of_interest=[14, 15],
                                        block_point_limit=12000,
                                        reference_height='Ground_Elevation.tif',
                                        excluded_class_codes=[2, 6, 8, 9, 20])

Licensing information

  • Basic: Requires 3D Analyst
  • Standard: Requires 3D Analyst
  • Advanced: Requires 3D Analyst

Related topics