Creation of a deep learning model that can be used for point cloud classification involves two primary steps: the preparation of training data and the actual training. The first part is generally the hardest because it's on you to come up with the training data. Once you have that, most of the remaining work is performed by the computer.
Training data consists of classified points that you provide as examples for the neural network to learn from. Generally, the more examples you provide, the better the training. Your examples need to be as accurate as possible, so attention to correctness is important. The scope of the classification should cover whatever features or landscape elements are of interest to your application. It's also desirable to have diversity in the training data, so the model generalizes better, and diversity in the validation data, so the performance metrics obtained are more realistic.
Point clouds typically contain samples of all kinds of things, whatever a laser or photo can see. It's not realistic, nor is it desirable, for every point to be classified as a specific thing. What you need are points of interest to be classified correctly. You can leave other points in a background, or everything else, class. For example, if you're interested in powerlines, vegetation, or buildings, make sure to have them correctly classified. You can leave all the other points unclassified (for example, class 1 for LAS format lidar).
There isn't a fixed rule on how much data to use for training. Generally, the more examples you can provide, the better. Of course, there are practical limits and there will also be some point of diminishing returns for your effort.
Preparing training data
For training, you'll need a point cloud that's a good representation of the data you intend to classify. The ideal situation is when the data to be classified was collected as part of the same project as the training data, with the same hardware and the same collection specifications. That has the best potential for success. You can use other data, but the training data and the data to be classified should have similar characteristics. Nominal point spacing and density are key factors. Positional accuracy is another. If you opt to include point attributes, such as lidar return number and intensity, in the training to improve model prediction, make sure these attributes exist in the data targeted for classification.
You can classify training data using both manual and automated techniques. There are interactive LAS class code editing tools and a set of rule-based classifiers. Sometimes, using a combination of these can be helpful. For example, establish a base classification using the Classify LAS Ground and the Classify LAS Building geoprocessing tools. Then, select some good representative sub-areas, or tiles if using tiled LAS, from your dataset as training sites. Manually clean up these areas and add some other classes if appropriate. Use these edited and improved sub-areas as training data.
Validation data is also required. It's used to provide an unbiased evaluation of a model during training and is useful for identifying and preventing overfitting, where the model only works well on the training data but nothing else. Validation data should be similar to the training data in that it covers representative areas of interest and uses the same classification scheme but comes from different locations. You can use the same project data as that used for training, just different subsets. Generally, more training data is needed than validation data. There’s no fixed rule, but it’s not uncommon to use several times more training data than validation data.
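One way to draw separate training and validation subsets from the same project is to split the list of tiles at random. The following is a minimal sketch, not part of any ArcGIS tool. The tile names, the 80/20 ratio, and the function name split_tiles are assumptions chosen for illustration.

```python
import random

def split_tiles(tiles, validation_fraction=0.2, seed=42):
    """Randomly split tile names into disjoint training and validation lists."""
    shuffled = list(tiles)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one validation tile when there is more than one tile.
    if len(shuffled) > 1:
        n_val = max(1, round(len(shuffled) * validation_fraction))
    else:
        n_val = 0
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical tile names; substitute the LAS tiles from your own project.
tiles = [f"tile_{i:02d}.las" for i in range(10)]
training_tiles, validation_tiles = split_tiles(tiles)
print(len(training_tiles), "training tiles,", len(validation_tiles), "validation tiles")
```

A random split is only a starting point. You would still want to check that both subsets cover representative areas of interest.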
Once you have your training and validation data defined, you'll need to figure out an appropriate sampling neighborhood, or block size, that training should use for evaluating points.
Block size
Training and validation data is split up into manageable small blocks. These blocks of points are then placed into a format accessible to deep learning libraries. Size blocks appropriately. The goal for them is to contain a reasonable number of points relative to available GPU memory. Training will load points, and secondary data structures, for as many blocks at a time as set by a training parameter called batch size. There’s an interplay of several variables to pay attention to here. Batches control how many blocks are processed at a time. Blocks are sets of points and their attributes. The number of points in a block is determined by the size of block and the point density at the location of the block. Blocks will contain a relatively consistent number of points if the density of the point cloud is consistent. You can use datasets where the point density varies a great deal, but it will likely require more training to get it to work well.
While your estimate for the number of points in a block may be correct on average, there will always be variance, and you must establish an upper limit to the number of points in a block. You can do this with the block point limit parameter. When a block contains points in excess of this value, multiple blocks will be created for the same location to ensure that all of its data is used.
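The effect of the block point limit can be illustrated with a little arithmetic. The sketch below assumes the points at a location are simply divided into blocks of at most the limit. How the tool actually distributes points among the extra blocks is not described here, and the function name and the numbers are hypothetical.

```python
import math

def blocks_needed(point_count, block_point_limit):
    """Number of blocks required so that no block exceeds the point limit."""
    if point_count == 0:
        return 0
    return math.ceil(point_count / block_point_limit)

# Hypothetical limit and per-location point counts.
limit = 8192
for count in (3000, 8192, 20000):
    print(count, "points ->", blocks_needed(count, limit), "block(s)")
```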
Try starting with blocks sized to contain about 8,000 points on average.
Estimating a block size, which is the length of the side of a block, requires you to know the nominal point spacing of the data and desired number of points per block:
block_size = square_root(target_point_count) * 2d_point_spacing
When evaluating the block size, you can also take into consideration the size of the objects or features of interest. For example, if your features are significantly smaller than the block size estimated above, you can opt to reduce the block size accordingly.
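The block size formula above can be worked through in a few lines of Python. This is a sketch of the arithmetic only. The 0.5 m point spacing is an assumed example value, and estimate_block_size is a hypothetical helper name.

```python
import math

def estimate_block_size(target_point_count, point_spacing_2d):
    """Side length of a square block, in the units of the point spacing."""
    return math.sqrt(target_point_count) * point_spacing_2d

# Assumed example: about 8,000 points per block, 0.5 m nominal point spacing.
block_size = estimate_block_size(8000, 0.5)
print(f"Estimated block size: {block_size:.1f} m")
```

With these inputs the estimate is a little under 45 m per side. If the features of interest are much smaller than that, a smaller block size may be the better choice.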
For a GPU with 8 GB dedicated RAM, use the default batch size of 2 to load two blocks' worth of points into the GPU at a time. Monitor the GPU’s memory use. If you find a lot of GPU memory remains available during training, you can safely increase the batch size to process more blocks at a time.
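To get a feel for how block size, point density, and batch size interact, the expected number of points loaded at a time can be estimated as follows. This is a rough sketch with assumed values and hypothetical function names. It counts points only and doesn't model the secondary data structures that also occupy GPU memory.

```python
def points_per_block(block_size, point_density):
    """Expected points in a square block: area times points per unit area."""
    return block_size ** 2 * point_density

def points_per_batch(block_size, point_density, batch_size):
    """Expected points loaded at once: blocks per batch times points per block."""
    return batch_size * points_per_block(block_size, point_density)

# Assumed values: 45 m blocks, 4 points per square meter, batch size 2.
print(points_per_block(45.0, 4.0))
print(points_per_batch(45.0, 4.0, 2))
```

Because point count is not the same as memory use, monitoring the GPU during training remains the reliable way to decide whether a larger batch size is safe.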
Training involves the creation of a convolutional neural network (CNN) using your training and validation data. The resulting model is used to classify LAS format point clouds through a process called inferencing. PointCNN is the open source deep learning framework used by ArcGIS for training and inferencing. You can use the model on your own data or share it for others to use on theirs. The training process is resource intensive and can take a long time. Fortunately, the result is compact. The models themselves are usually between 15 and 20 MB in size.
Output models are composed of multiple files, which are placed together in an output folder. These include an Esri Model Definition (*.emd) file, which is a JSON file containing parameter settings, and a *.pth data file, plus additional files you can review to assess training results. A Deep Learning Package (*.dlpk) is also output to the folder. It packages all relevant files into one for sharing and publishing.
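Because the *.emd file is JSON, its settings can be inspected with any JSON reader. The sketch below uses Python's standard json module. The file name and the keys in the stand-in file are made up so the example runs without a trained model, and the keys in a real *.emd file will differ.

```python
import json
import os
import tempfile

def read_model_definition(emd_path):
    """Load an Esri Model Definition (.emd) file, which is JSON, into a dict."""
    with open(emd_path, "r", encoding="utf-8") as f:
        return json.load(f)

# Stand-in file with made-up keys; a real .emd file will contain other keys.
sample = {"ExampleSetting": 1, "AnotherExampleSetting": "value"}
emd_path = os.path.join(tempfile.mkdtemp(), "example_model.emd")
with open(emd_path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

settings = read_model_definition(emd_path)
for key, value in settings.items():
    print(f"{key}: {value}")
```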
By default, you train a model from scratch, but you can include a pretrained model in the process. When you do this, you're producing a new model by improving upon an existing one. The additional training provides more examples the model can use to improve its ability to predict correct classifications for points.
The minimum points per block setting on the training tool is used to skip training blocks containing an insufficient number of points. Often, blocks around a project perimeter don't have many points. Additionally, while creating the training data, the block point limit may have been reached for one or more blocks. Subsequent blocks are made to hold the overflow points. In either case, blocks with relatively few points aren't particularly useful, so it can be better not to include them in the training process.
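Conceptually, the minimum points per block setting acts as a simple filter over blocks. The sketch below illustrates the idea and is not the tool's implementation. The block names, point counts, and threshold are hypothetical.

```python
def filter_blocks(block_point_counts, min_points):
    """Keep only blocks that contain at least min_points points."""
    return {name: n for name, n in block_point_counts.items() if n >= min_points}

# Hypothetical blocks: a perimeter block and an overflow block hold few points.
blocks = {"interior_a": 8100, "interior_b": 7900, "edge": 350, "overflow": 120}
kept = filter_blocks(blocks, min_points=1000)
print(sorted(kept))
```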
The Managing Classes category on the Train Point Cloud Classification Model geoprocessing tool contains parameters associated with class remapping, classes of interest, and class naming.
Class remapping comes in handy when you need the output model to use a different set of class codes and their meanings relative to the input training data. It's also useful for merging classes. For example, you can combine three vegetation classes into one.
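The idea behind class remapping can be shown with a small lookup table. This sketch is only an illustration of the concept. The function name and sample codes are hypothetical, while codes 3, 4, and 5 are the low, medium, and high vegetation classes of the LAS standard.

```python
def remap_classes(class_codes, remap):
    """Replace codes found in remap; leave all other codes unchanged."""
    return [remap.get(code, code) for code in class_codes]

# Merge LAS vegetation codes 3 (low) and 4 (medium) into 5 (high).
vegetation_merge = {3: 5, 4: 5}
codes = [2, 3, 4, 5, 6, 3]
merged = remap_classes(codes, vegetation_merge)
print(merged)  # [2, 5, 5, 5, 6, 5]
```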
Class codes of interest are the focus of training. By default, all classes in the input training data are used to create the model. Multiple classes can unnecessarily complicate training if you're just interested in one class, or type, of feature. For example, if you're just interested in creating a model to classify power line conductor wires, you can set a code of interest to be just that one (for example, class 14 following the LAS standard). When you do this, you'll be prompted for a background class code. That's the code for everything else. Thus, even though the training data may contain more classes, the trained model will only know how to classify two of them: the class of interest and the background.
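The effect of choosing a code of interest and a background code can be sketched as follows. This illustrates the concept rather than the tool's internals. The function name is hypothetical, and using class 1 as the background code is an assumed choice.

```python
def keep_classes_of_interest(class_codes, codes_of_interest, background_code):
    """Keep codes of interest; assign every other point to the background code."""
    wanted = set(codes_of_interest)
    return [c if c in wanted else background_code for c in class_codes]

# Class 14 is wire conductor in the LAS standard; background code 1 is assumed.
codes = [2, 14, 5, 14, 6, 1]
two_class = keep_classes_of_interest(codes, [14], background_code=1)
print(two_class)  # [1, 14, 1, 14, 1, 1]
```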
The Training Parameters category on the Train Point Cloud Classification Model geoprocessing tool contains parameters specific to the training process itself rather than the data and classes involved.
Training is an iterative process. Passes over the data are made repeatedly until a criterion is met. One criterion is the Maximum Number of Epochs. An epoch represents one pass over the training data. Within an epoch, data is processed in batches. A batch is a collection of one or more blocks. Iterations Per Epoch is the percentage of batches processed within an epoch. Therefore, when you specify less than 100 percent, only a subset of the batches is processed. The Batch Size is the number of blocks in a batch. Blocks in a batch are processed in parallel. If your GPU has sufficient dedicated RAM, it can train using a larger batch size, which will typically reduce the time needed for training overall.
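The relationship between blocks, batch size, and Iterations Per Epoch can be worked through numerically. The following sketch uses hypothetical numbers and a hypothetical function name, and the rounding behavior is an assumption because the tool's exact rounding isn't described here.

```python
import math

def batches_per_epoch(block_count, batch_size, iterations_percent=100):
    """Batches processed in one epoch for a given Iterations Per Epoch percentage."""
    total_batches = math.ceil(block_count / batch_size)
    # Rounding up is an assumption; the tool may round differently.
    return math.ceil(total_batches * iterations_percent / 100)

# Hypothetical run: 5,000 training blocks with a batch size of 2.
print(batches_per_epoch(5000, 2))      # 2500
print(batches_per_epoch(5000, 2, 40))  # 1000
```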
The learning rate is a tuning parameter that controls how much to adjust the model each time its weights are updated as it heads toward a goal of minimal loss. It influences how much new information overrides old, and thus represents the speed at which the model learns.
The determination of a learning rate involves a trade-off. Too small a value can result in a long training time, with the possibility of the model even becoming stuck. Too large a value can result in learning a sub-optimal set of weights and an unstable learning process.
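The trade-off can be seen on a toy problem. The sketch below minimizes the one-parameter function f(w) = w² with plain gradient descent at three learning rates. It's an illustration of the general behavior only, not the optimizer the training tool uses, and the rates and step count are arbitrary.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2 * w  # derivative of w**2
        w -= learning_rate * gradient
    return w

for lr in (0.01, 0.4, 1.1):
    final = abs(gradient_descent(lr))
    print(f"learning rate {lr}: |w| after 20 steps = {final:.4g}")
```

With the smallest rate, w is still far from the minimum at 0 after 20 steps. The middle rate converges quickly. The largest rate overshoots on every step, so w moves further from the minimum.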
It's difficult to come up with an initial value for the learning rate. The Train Point Cloud Classification Model tool can estimate a value for you. It is recommended that you leave the parameter blank, which is the default, the first time you train a model and let the tool estimate a learning rate. The tool reports the learning rate in its messages and in a file named model_metrics.html, which is written to the output model folder when the training process completes. To learn more about the results generated by the Train Point Cloud Classification Model tool, see Assess point cloud training results.