Prepare Point Cloud Training Data (3D Analyst)—ArcGIS Pro

Summary

Generates the data that will be used to train and validate a PointCNN model for point cloud classification.

Usage

Review the input point cloud to ensure that its points are well classified for the objects of interest. The quality of the classification model depends on the quality of the data that is used for training and validation. If the point cloud's classification needs refinement, consider interactively editing the point classification.
Learn more about interactive LAS classification editing
The point cloud training data is defined by a directory with a .pctd extension with two subdirectories, one that contains the data that will be used for training the classification model and one that contains the data that will be used for validating the trained model. An input point cloud must always be specified, as it provides the source of the data used for training. The training boundary can be optionally defined to limit the points exported for training. The validation data is also required and can be specified by doing any one of the following:
- Provide a validation point cloud. This dataset must reference a different set of points than the input point cloud.
- Provide a validation point cloud with a validation boundary. This will result in the validation data being created from the portion of the validation point cloud overlapping the validation boundary.
- Provide a training boundary and a validation boundary without a validation point cloud. This will result in the training data being created from the portions of the input point cloud that intersect the training boundary, and the validation data being created from the portions of the input point cloud that intersect the validation boundary. The boundary features must not overlap each other.
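The three options above can be read as a small decision routine. The following is a minimal sketch in plain Python. The function name resolve_validation_source and its return labels are hypothetical and are not part of arcpy; the sketch only mirrors the rules stated in this section.

```python
def resolve_validation_source(validation_point_cloud=None,
                              training_boundary=None,
                              validation_boundary=None):
    """Return a label describing where the validation data would come from.

    Hypothetical helper: it mirrors the three options listed above and
    raises ValueError when none of them is satisfied.
    """
    if validation_point_cloud:
        if validation_boundary:
            # Only the part of the validation cloud inside the boundary is used.
            return "validation point cloud clipped by validation boundary"
        # The whole validation point cloud is used.
        return "validation point cloud"
    if training_boundary and validation_boundary:
        # Both subsets are taken from the input point cloud.
        return "input point cloud split by boundaries"
    raise ValueError(
        "Validation data is required: provide a validation point cloud, "
        "or both a training boundary and a validation boundary."
    )
```

A check like this can be run before calling the tool to catch an incomplete parameter combination early. It does not verify that the boundaries are non-overlapping or that the two point clouds reference different points.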
When training the point cloud classification model, create the training data using a block size that sufficiently captures the object being classified and the data that is relevant for capturing its surrounding context. The block size does not need to capture the entire object, provided that there is enough surrounding data to adequately infer a classification strategy. If a block contains more points than the block point limit, multiple blocks will be created for the same location. For example, if the Block Point Limit parameter value is 10,000 and a given block contains 22,000 points, three blocks of 10,000 points will be created to ensure uniform sampling in each block. Some points will be repeated in two blocks, but all points will be stored in at least one block.
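The arithmetic in the example above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the sketch assumes that every block created for an over-limit location is filled to exactly the point limit, as in the 22,000-point example. It does not reproduce the tool's actual sampling.

```python
import math


def blocks_for_location(point_count, block_point_limit):
    """Number of blocks needed so that every point is stored at least once."""
    if point_count <= block_point_limit:
        return 1
    return math.ceil(point_count / block_point_limit)


def repeated_point_slots(point_count, block_point_limit):
    """Block slots filled by repeated points, assuming full blocks (see lead-in)."""
    blocks = blocks_for_location(point_count, block_point_limit)
    if blocks == 1:
        return 0
    return blocks * block_point_limit - point_count


# The example from the text: a limit of 10,000 and a block holding 22,000 points.
example_blocks = blocks_for_location(22000, 10000)
example_repeats = repeated_point_slots(22000, 10000)
```

Under these assumptions, the 22,000-point location produces three blocks holding 30,000 point slots in total, so 8,000 slots are filled by points that also appear in another block.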
Avoid using a block size and block point limit that will result in the creation of many blocks that exceed the point limit. The number of points in a given block size can be approximated using the LAS Point Statistics As Raster tool by generating an output raster that uses the Point Count option for the Method parameter. This raster should have a cell size that matches the desired block size. You can examine the image histogram of this raster to approximate how many blocks fall under a given point count, and adjust the point limit accordingly.
The Block Point Limit parameter value must also account for the dedicated GPU memory capacity of the computer that will be used to train the deep learning model. Memory allocation during training will depend on the number of attributes that are used, the number of points in a given block, and the total number of blocks processed in a given iteration batch. If a larger block size is needed along with a larger point limit to effectively train the model, the batch size can be reduced in the training step to ensure that more points can be processed.
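The histogram check described above can be approximated in plain Python once the cell values of the point count raster are available as a list of numbers. The sketch below is a rough aid, not part of the tool. Both function names are hypothetical, and the raster's square cells only approximate the blocks the tool creates.

```python
import math


def share_of_blocks_over_limit(cell_point_counts, block_point_limit):
    """Fraction of occupied cells whose point count exceeds the limit."""
    occupied = [count for count in cell_point_counts if count > 0]
    if not occupied:
        return 0.0
    over = sum(1 for count in occupied if count > block_point_limit)
    return over / len(occupied)


def limit_covering(cell_point_counts, coverage=0.9):
    """Smallest observed count that is >= the counts of `coverage` of occupied cells."""
    occupied = sorted(count for count in cell_point_counts if count > 0)
    if not occupied:
        raise ValueError("no occupied cells")
    index = max(0, math.ceil(coverage * len(occupied)) - 1)
    return occupied[index]


# Made-up cell values for illustration; real values would come from the raster.
sample_counts = [0, 4000, 6000, 8000, 9000, 12000, 22000]
sample_share = share_of_blocks_over_limit(sample_counts, 10000)
sample_limit = limit_covering(sample_counts, coverage=0.5)
```

With these made-up counts, two of the six occupied cells exceed a limit of 10,000, and a limit of 8,000 would cover half of the occupied cells without splitting.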
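The trade-off between point limit and batch size can be illustrated with a simple budget calculation. This sketch assumes that the number of points processed per iteration (points per block multiplied by blocks per batch) is a usable proxy for memory use. Actual GPU memory consumption also depends on the number of attributes and on the model, so the budget value is an empirical assumption and the function name is hypothetical.

```python
def batch_size_for_budget(points_per_batch_budget, block_point_limit):
    """Largest batch size that keeps points per iteration within the budget.

    Returns at least 1. A result of 1 does not guarantee that a single
    block fits in memory; it only means the budget allows no more than one.
    """
    if block_point_limit <= 0:
        raise ValueError("block_point_limit must be positive")
    return max(1, points_per_batch_budget // block_point_limit)


# Suppose a limit of 8,192 points with a batch size of 4 is known to fit.
known_budget = 8192 * 4
# Doubling the point limit halves the batch size under the same budget.
doubled_limit_batch = batch_size_for_budget(known_budget, 16384)
```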
Ensure that the output is written to a location with enough disk space to accommodate the training data. This tool creates partially overlapping blocks of uncompressed HDF5 files that replicate each point in four blocks. In blocks that exceed the maximum point limit, some points may be duplicated more than four times. The resulting training data can occupy at least three times more disk space than the source point cloud data.
Reference height information can be incorporated into the training data to provide an additional attribute for the training process. This is done by specifying a raster in the Reference Surface parameter. This raster is used to derive the relative height attribute for each overlapping point. The attribute is calculated by taking the z-value of each point and subtracting the height obtained from the raster through bilinear interpolation. Including this information can help differentiate objects that have a distinct range of relative height from the raster surface. It also provides another basis for the neural network to infer directional relationships. For example, when training for power lines and using a ground elevation raster as the reference surface, the power line points will likely fall within a particular range of relative heights above the ground.
When the reference height is based on the ground elevation, ground points can also be left out of the training data if their presence does not provide useful context for identifying the objects of interest. The neural network will attempt to learn the classification of all the data that is provided to it during training. Since a high-quality ground classification can be achieved with the Classify LAS Ground tool, there is no need to train the neural network to identify and distinguish ground points from other classes. Ground points, which are typically represented by class 2, and sometimes by class 8 and class 20, can be excluded by listing them in the Excluded Class Codes parameter. When this is done, the neural network will process the training data more quickly, since ground points typically account for approximately half of the total points captured in a lidar survey.
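The relative height calculation (point z-value minus the bilinearly interpolated surface height) can be sketched in plain Python. This is a simplified illustration, not the tool's implementation. The surface is a small list of rows with at least two rows and two columns, positions are given in fractional column and row units, and georeferencing is omitted. Both function names are hypothetical.

```python
def bilinear(grid, x, y):
    """Interpolate a grid (list of rows) at fractional column x and row y.

    Returns None when the position does not overlap the grid.
    """
    rows, cols = len(grid), len(grid[0])
    if not (0 <= x <= cols - 1 and 0 <= y <= rows - 1):
        return None
    # Clamp so that the 2 x 2 neighborhood stays inside the grid.
    x0 = min(int(x), cols - 2)
    y0 = min(int(y), rows - 2)
    tx, ty = x - x0, y - y0
    top = grid[y0][x0] * (1 - tx) + grid[y0][x0 + 1] * tx
    bottom = grid[y0 + 1][x0] * (1 - tx) + grid[y0 + 1][x0 + 1] * tx
    return top * (1 - ty) + bottom * ty


def relative_height(point_z, grid, x, y):
    """Point z minus the interpolated surface height, or None if no overlap."""
    surface_z = bilinear(grid, x, y)
    if surface_z is None:
        return None
    return point_z - surface_z


# A made-up 2 x 2 ground surface and a point 12 units above its center.
ground = [[100.0, 102.0],
          [104.0, 106.0]]
height_above_ground = relative_height(115.0, ground, 0.5, 0.5)
```

Returning None for positions outside the grid parallels the tool's documented behavior of omitting points that do not overlap the reference surface.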
A raster surface that can be used as input to the Reference Surface parameter can be generated from a subset of LAS points, such as ground classified points, by filtering the LAS dataset from its layer properties and using the LAS Dataset To Raster tool. If the tool is used in Python, the Make LAS Dataset Layer tool can be used to filter for the desired points before creating the raster. A ground surface can also be generated from a point cloud scene layer using the Point Cloud To Raster tool. Raster surfaces that are not sourced from the input point cloud can also be used, but you must ensure that the z-values in the raster correspond appropriately with the z-values in the point cloud.
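The Python workflow mentioned above could look like the following sketch. It assumes ground points are stored in class 2. The function name, the layer name, the cell size, and the binning interpolation settings are illustrative choices rather than recommendations from this page. arcpy is imported inside the function so that the definition can be loaded on a machine without ArcGIS Pro.

```python
def build_ground_raster(las_dataset, out_raster, cell_size=1.0, ground_codes=(2,)):
    """Create a ground elevation raster from ground-classified LAS points."""
    import arcpy  # requires ArcGIS Pro; 3D Analyst is needed for the training tool itself

    ground_layer = "ground_points_lyr"  # hypothetical in-memory layer name

    # Filter the LAS dataset so that only the ground class codes are used.
    arcpy.management.MakeLasDatasetLayer(
        las_dataset, ground_layer, class_code=list(ground_codes)
    )

    # Interpolate the filtered points into an elevation raster.
    arcpy.conversion.LasDatasetToRaster(
        ground_layer,
        out_raster,
        value_field="ELEVATION",
        interpolation_type="BINNING AVERAGE LINEAR",
        data_type="FLOAT",
        sampling_type="CELLSIZE",
        sampling_value=cell_size,
    )
    return out_raster
```

The resulting raster can then be passed to the reference_height parameter. Its z-values are in the same vertical reference as the source point cloud because it is derived from those points.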
The Excluded Class Codes parameter can be used to omit points associated with class codes that do not provide a useful context for inferring how to identify objects of interest. Doing so will improve the speed of the training process by reducing the number of points that are evaluated. For example, building classified points are usually immaterial to training a classification model for objects such as traffic lights, power lines, and various railroad assets. Building points can also be classified using the Classify LAS Building tool. Specifying class 6, which represents buildings, as an excluded class would omit the building points from the training data. Any point cloud that will use a model trained with excluded classes should have those classes classified before applying the model. Those classes should also be listed in the Excluded Class Codes parameter of the Classify Point Cloud Using Trained Model and Evaluate Point Cloud Training Data tools so that the model can infer its classification using a point cloud that matches the characteristics of the data used for training the model.
If the spatial reference of the input point cloud does not use a projected coordinate system, the Output Coordinate System environment can be used to define a projected coordinate system that will be used when classifying its points.

Parameters

Label	Explanation	Data Type
Input Point Cloud	The point cloud that will be used to create the training data and, potentially, the validation data if no validation point cloud is specified. In this case, both the training boundary and the validation boundary must be defined.	LAS Dataset Layer; File
Block Size	The diameter of each circular HDF5 tile created from the input point cloud. As a general rule, the block size should be large enough to capture the objects of interest and their surrounding context.	Linear Unit
Output Training Data	The location and name of the output training data (*.pctd).	File
Training Boundary Features (Optional)	The boundary polygons that will delineate the subset of points from the input point cloud that will be used to train the deep learning model.	Feature Layer
Validation Point Cloud (Optional)	The point cloud that will be used to validate the deep learning model during the training process. This dataset must reference a different set of points than the input point cloud to ensure the quality of the trained model. If a validation point cloud is not specified, the input point cloud can be used to define the training and validation datasets by providing polygon feature classes for the Training Boundary Features and Validation Boundary Features parameters.	LAS Dataset Layer; File
Validation Boundary Features (Optional)	The polygon features that will delineate the subset of points to be used for evaluating the model during the training process. If a validation point cloud is not specified, the points will be sourced from the input point cloud.	Feature Layer
Filter Blocks By Class Code (Optional)	The class codes that will be used to limit the exported training data blocks. All points in the blocks that contain at least one of the values listed for this parameter will be exported, except the classes specified in the Excluded Class Codes parameter or the points that are flagged as Withheld. Any value in the range of 0 to 255 can be specified.	Value Table
Block Point Limit (Optional)	The maximum number of points that will be allowed in each block of the training data. When a block contains points in excess of this value, multiple blocks will be created for the same location to ensure that all of the points are used when training.	Long
Reference Surface (Optional)	The raster surface that will be used to provide relative height values for each point in the point cloud data. Points that do not overlap with the raster will be omitted from the analysis.	Raster Layer
Excluded Class Codes (Optional)	The class codes that will be excluded from the training data. Any value in the range of 0 to 255 can be specified.	Long

arcpy.ddd.PreparePointCloudTrainingData(in_point_cloud, block_size, out_training_data, {training_boundary}, {validation_point_cloud}, {validation_boundary}, {class_codes_of_interest}, {block_point_limit}, {reference_height}, {excluded_class_codes})

Name	Explanation	Data Type
in_point_cloud	The point cloud that will be used to create the training data and, potentially, the validation data if no validation point cloud is specified. In this case, both the training boundary and the validation boundary must be defined.	LAS Dataset Layer; File
block_size	The diameter of each circular HDF5 tile created from the input point cloud. As a general rule, the block size should be large enough to capture the objects of interest and their surrounding context.	Linear Unit
out_training_data	The location and name of the output training data (*.pctd).	File
training_boundary (Optional)	The boundary polygons that will delineate the subset of points from the input point cloud that will be used to train the deep learning model.	Feature Layer
validation_point_cloud (Optional)	The source of the point cloud that will be used to validate the deep learning model. This dataset must reference a different set of points than the input point cloud to ensure the quality of the trained model. If a validation point cloud is not specified, the input point cloud can be used to define the training and validation datasets by providing polygon feature classes for the training_boundary and validation_boundary parameters.	LAS Dataset Layer; File
validation_boundary (Optional)	The polygon features that will delineate the subset of points to be used for evaluating the model during the training process. If a validation point cloud is not specified, the points will be sourced from the input point cloud.	Feature Layer
class_codes_of_interest [class_codes_of_interest,...] (Optional)	The class codes that will be used to limit the exported training data blocks. All points in the blocks that contain at least one of the values listed for this parameter will be exported, except the classes specified in the excluded_class_codes parameter or the points that are flagged as Withheld. Any value in the range of 0 to 255 can be specified.	Value Table
block_point_limit (Optional)	The maximum number of points that will be allowed in each block of the training data. When a block contains points in excess of this value, multiple blocks will be created for the same location to ensure that all of the points are used when training.	Long
reference_height (Optional)	The raster surface that will be used to provide relative height values for each point in the point cloud data. Points that do not overlap with the raster will be omitted from the analysis.	Raster Layer
excluded_class_codes [excluded_class_codes,...] (Optional)	The class codes that will be excluded from the training data. Any value in the range of 0 to 255 can be specified.	Long
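The interaction between class_codes_of_interest, excluded_class_codes, and Withheld points can be sketched for a single block as follows. This is an interpretation of the parameter descriptions above, not the tool's implementation. The function name is hypothetical, and the sketch assumes that every point in the block, including withheld and excluded points, counts when checking whether the block contains a class of interest, which the descriptions do not state.

```python
def export_block(points, class_codes_of_interest=(), excluded_class_codes=()):
    """Return the points of one block that would be exported.

    `points` is a list of (class_code, withheld) tuples. An empty result
    means the block is skipped or has no exportable points.
    """
    interest = set(class_codes_of_interest)
    excluded = set(excluded_class_codes)

    # With a class filter, keep only blocks containing a class of interest.
    if interest and not any(code in interest for code, _ in points):
        return []

    # Export every remaining point except excluded classes and withheld points.
    return [
        (code, withheld)
        for code, withheld in points
        if code not in excluded and not withheld
    ]


# A made-up block: ground, wire, a withheld point, and a building point.
sample_block = [(2, False), (14, False), (5, True), (6, False)]
exported = export_block(sample_block, [14, 15], [2, 6])
```

Note that when the block qualifies, points of other classes are exported along with the classes of interest, so the list of interest limits which blocks are written rather than which classes appear in them.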

Code sample

PreparePointCloudTrainingData example (Python window)

The following sample demonstrates the use of this tool in the Python window.

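import arcpy

# Dataset names below are resolved relative to this workspace.
arcpy.env.workspace = 'C:/data'

# Create 35-meter blocks, keeping only blocks that contain class 14 or 15,
# and omit points in classes 2, 6, 8, 9, and 20 from the training data.
arcpy.ddd.PreparePointCloudTrainingData('training_source.lasd', '35 Meters', 'vegetation_training.pctd',
                                        validation_point_cloud='validation_source.lasd',
                                        class_codes_of_interest=[14, 15], block_point_limit=12000,
                                        reference_height='Ground_Elevation.tif',
                                        excluded_class_codes=[2, 6, 8, 9, 20])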

Environments

Current Workspace, Scratch Workspace, Extent, Parallel Processing Factor, Output Coordinate System, Geographic Transformations

Licensing information

Basic: Requires 3D Analyst
Standard: Requires 3D Analyst
Advanced: Requires 3D Analyst