Prepare Point Cloud Training Data (3D Analyst)

Summary

Generates the data that will be used to train and validate a PointCNN model for classifying a point cloud.

Usage

  • Review the input point cloud to ensure that its points are well classified for the objects of interest. The quality of the classification model depends on the quality of the data that is used for training and validation. If the point cloud's classification needs refinement, consider interactively editing the point classification.

    Learn more about interactive LAS classification editing

  • The point cloud training data is defined by a directory with a .pctd extension that contains two subdirectories: one with the data that will be used to train the classification model, and one with the data that will be used to validate the trained model. An input point cloud must always be specified, as it provides the source of the data used for training. A training boundary can optionally be defined to limit the points exported for training. Validation data is also required and can be specified in any one of the following ways:

    • Provide a validation point cloud. This dataset must reference a different set of points than the input point cloud.
    • Provide a validation point cloud with a validation boundary. This will result in the validation data being created from the portion of the validation point cloud overlapping the validation boundary.
    • Provide a training boundary and a validation boundary without a validation point cloud. The training data will be created from the portions of the input point cloud that intersect the training boundary, and the validation data will be created from the portions that intersect the validation boundary. The boundary features must not overlap each other. This configuration is illustrated in the second code sample below.
  • When training the point cloud classification model, export the training and validation data using a block size that sufficiently captures the object being classified and its surrounding context. The block size does not need to capture the entire object, provided there is enough surrounding data to provide the context necessary to achieve a reasonable classification. If a block contains more points than the specified point limit, multiple blocks will be created for the same location. For example, if the Block Point Limit parameter value is 10,000 and a given block contains 22,000 points, three blocks of 10,000 points will be created to ensure uniform sampling in each block. Some points will be repeated in two blocks, but all points will be stored in at least one block. The block subdivision sketch after this list illustrates this arithmetic.

  • Avoid using a block size and block point limit that will result in the creation of many blocks that exceed the point limit. The number of points in a given block size can be approximated with the LAS Point Statistics As Raster tool by generating an output raster that uses the Point Count option for the Method parameter and a cell size that matches the desired block size. Examine the histogram of this raster to see how many blocks would exceed a candidate point limit, and adjust the limit accordingly. The point-count raster sketch after this list shows this workflow.

    The Block Point Limit parameter value must also factor in the dedicated GPU memory capacity of the computer that will be used to train the deep learning model. Memory allocation during training depends on the number of attributes used to train the model, the number of points in a given block, and the total number of blocks processed in a given iteration batch. If a larger block size and point limit are needed to effectively train the model, reduce the batch size in the training step so that more points can be processed. The memory estimate sketch after this list illustrates this trade-off.

  • Ensure that the output is written to a location with enough disk space to accommodate the training data. This tool creates partially overlapping blocks of uncompressed HDF5 files that replicate each point in four blocks. In blocks that exceed the maximum point limit, some points may be duplicated more than four times. The resulting training data can occupy at least three times the disk space of the source point cloud data.

  • If the spatial reference of the input point cloud does not use a projected coordinate system, the Output Coordinate System environment can be used to define the projected coordinate system that will be used when classifying its points (see the coordinate system sketch after this list).
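
The following minimal block subdivision sketch illustrates the arithmetic described in the block size guidance above; the numbers mirror the 10,000-point example and are not taken from any particular dataset.

import math

block_point_limit = 10000   # Block Point Limit parameter value
points_at_location = 22000  # points falling within one block footprint

# Blocks created for this location: ceil(22000 / 10000) = 3. Each block
# holds 10,000 uniformly sampled points, so some points are repeated in
# two blocks, but every point is stored in at least one block.
num_blocks = math.ceil(points_at_location / block_point_limit)
print(num_blocks)  # 3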
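
A point-count raster sketch of the block assessment described above, assuming a hypothetical LAS dataset and output path. The LAS Point Statistics As Raster tool is run with the Point Count method and a cell size that matches an intended block size of 20 meters, so each cell value approximates the number of points in one block.

import arcpy

arcpy.env.workspace = 'C:/data'

# Each 20-meter cell of the output raster reports the number of points
# in one candidate block; examine its histogram to choose a point limit.
arcpy.ddd.LasPointStatsAsRaster('training_source.lasd', 'point_count.tif',
                                'POINT_COUNT', 'CELLSIZE', 20)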
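
The following memory estimate sketch is only an assumption-laden lower bound on the raw point data held in memory per batch; actual GPU memory use during training is considerably higher and depends on the model architecture and framework. All values are illustrative.

block_point_limit = 8192  # points per block
num_attributes = 4        # x, y, z plus one attribute such as intensity
batch_size = 2            # blocks processed per training iteration
bytes_per_value = 4       # float32

# If this estimate approaches the dedicated GPU memory, reduce batch_size.
approx_mb = block_point_limit * num_attributes * batch_size * bytes_per_value / 1024 ** 2
print(f'{approx_mb:.1f} MB of raw point data per batch')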
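
A minimal coordinate system sketch of defining the Output Coordinate System environment in Python, assuming NAD 1983 UTM Zone 17N (WKID 26917) purely as an example projection; substitute the projected coordinate system appropriate for your data.

import arcpy

# Define a projected coordinate system for the session so that the
# unprojected point cloud's points are processed in projected units.
arcpy.env.outputCoordinateSystem = arcpy.SpatialReference(26917)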

Parameters

Label | Explanation | Data Type
Input Point Cloud

The point cloud that will be used to create the training data and, potentially, the validation data if no validation point cloud is specified. In this case, both the training boundary and the validation boundary must be defined.

LAS Dataset Layer; File
Block Size

The two-dimensional width and height of each HDF5 tile created from the input point cloud. As a general rule, the block size should be large enough to capture the objects of interest and their surrounding context.

Linear Unit
Output Training Data

The location and name of the output training data (*.pctd).

File
Training Boundary Features
(Optional)

The boundary polygons that will delineate the subset of points from the input point cloud that will be used to train the deep learning model.

Feature Layer
Validation Point Cloud
(Optional)

The source of the point cloud that will be used to validate the deep learning model. This dataset must reference a different set of points than the input point cloud in order to ensure the quality of the trained model. If the validation point cloud is not specified, both the Training Boundary Features and Validation Boundary Features parameter values must be provided.

LAS Dataset Layer; File
Validation Boundary Features
(Optional)

The polygon features that will delineate the subset of points to be used for validating the trained model. If a validation point cloud is not specified, the points will be sourced from the input point cloud.

Feature Layer
Class Codes of Interest
(Optional)

The class codes that will limit the exported training data blocks to those that contain the specified values. All points in a block will be exported if the block contains at least one of the class codes listed in this parameter.

Long
Block Point Limit
(Optional)

The maximum number of points allowed in each block of the training data. When a block contains points in excess of this value, multiple blocks will be created for the same location to ensure that all of the points are used when training.

Long

arcpy.ddd.PreparePointCloudTrainingData(in_point_cloud, block_size, out_training_data, {training_boundary}, {validation_point_cloud}, {validation_boundary}, {class_codes_of_interest}, {block_point_limit})
Name | Explanation | Data Type
in_point_cloud

The point cloud that will be used to create the training data and, potentially, the validation data if no validation point cloud is specified. In this case, both the training boundary and the validation boundary must be defined.

LAS Dataset Layer; File
block_size

The two-dimensional width and height of each HDF5 tile created from the input point cloud. As a general rule, the block size should be large enough to capture the objects of interest and their surrounding context.

Linear Unit
out_training_data

The location and name of the output training data (*.pctd).

File
training_boundary
(Optional)

The boundary polygons that will delineate the subset of points from the input point cloud that will be used to train the deep learning model.

Feature Layer
validation_point_cloud
(Optional)

The source of the point cloud that will be used to validate the deep learning model. This dataset must reference a different set of points than the input point cloud in order to ensure the quality of the trained model. If the validation point cloud is not specified, both the training_boundary and validation_boundary parameter values must be provided.

LAS Dataset Layer; File
validation_boundary
(Optional)

The polygon features that will delineate the subset of points to be used for validating the trained model. If a validation point cloud is not specified, the points will be sourced from the input point cloud.

Feature Layer
class_codes_of_interest
[class_codes_of_interest,...]
(Optional)

The class codes that will limit the exported training data blocks to those that contain the specified values. All points in a block will be exported if the block contains at least one of the class codes listed in this parameter.

Long
block_point_limit
(Optional)

The maximum number of points allowed in each block of the training data. When a block contains points in excess of this value, multiple blocks will be created for the same location to ensure that all of the points are used when training.

Long

Code sample

PreparePointCloudTrainingData example (Python window)

The following sample demonstrates the use of this tool in the Python window.

import arcpy
# Set the workspace that contains the source LAS datasets.
arcpy.env.workspace = 'C:/data'
# Export 20-meter blocks; a separate point cloud supplies the validation data.
arcpy.ddd.PreparePointCloudTrainingData('training_source.lasd', '20 Meters',
                                        'vegetation_training.pctd',
                                        validation_point_cloud='validation_source.lasd')
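
PreparePointCloudTrainingData example 2 (stand-alone script)

The following sample illustrates the boundary-based configuration described in the Usage section, in which both the training data and the validation data are sourced from the same input point cloud through non-overlapping boundary polygons. The file names and the block point limit are placeholders, not values from the original documentation.

import arcpy

arcpy.env.workspace = 'C:/data'

# No validation point cloud is provided, so both boundaries must be
# specified, and the polygon features must not overlap each other.
arcpy.ddd.PreparePointCloudTrainingData(
    'training_source.lasd', '20 Meters', 'vegetation_training.pctd',
    training_boundary='training_area.shp',
    validation_boundary='validation_area.shp',
    block_point_limit=8192)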

Licensing information

  • Basic: Requires 3D Analyst
  • Standard: Requires 3D Analyst
  • Advanced: Requires 3D Analyst
