Export Training Data For Deep Learning

Available with Spatial Analyst license.

Available with Image Analyst license.


Uses a remote sensing image to convert labeled vector or raster data into deep learning training datasets. The output is a folder of image chips and a folder of metadata files in the specified format.


  • This tool will create training datasets to support third-party deep learning applications, such as Google TensorFlow, PyTorch, or Microsoft CNTK.

  • Use your existing classification training sample data, or GIS feature class data such as a building footprint layer, to generate image chips containing the class sample from your source image. Image chips are often 256 pixel rows by 256 pixel columns, unless the training sample size is larger.

  • Deep learning class training samples are based on small subimages containing the feature or class of interest, called an image chip.
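As an illustration of how chips are cut from a larger image (a sketch only, not the tool's implementation), a sliding window over a raster array might look like this:

```python
import numpy as np

def extract_chips(image, tile=256, stride=256):
    """Slide a tile-by-tile window over a 2D array and collect image chips.

    Illustrative only; the tool performs the equivalent operation on the
    source raster and writes each chip to disk in the chosen format.
    """
    chips = []
    rows, cols = image.shape
    for y in range(0, rows - tile + 1, stride):
        for x in range(0, cols - tile + 1, stride):
            chips.append(image[y:y + tile, x:x + tile])
    return chips

# A 512 x 512 image with a 256-pixel tile and a 256-pixel stride yields 4 chips.
demo = np.zeros((512, 512), dtype=np.uint8)
print(len(extract_chips(demo)))  # 4
```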


ExportTrainingDataForDeepLearning (in_raster, out_folder, in_class_data, image_chip_format, {tile_size_x}, {tile_size_y}, {stride_x}, {stride_y}, {output_nofeature_tiles}, {metadata_format}, {start_index}, {class_value_field}, {buffer_radius})
Parameter | Explanation | Data Type

The input source imagery, typically multispectral imagery.

Examples of input source imagery include multispectral satellite, drone, aerial, or National Agriculture Imagery Program (NAIP) imagery.

Raster Dataset; Raster Layer; Mosaic Layer; Image Service; MapServer; Map Server Layer; Internet Tiled Layer

Specify a folder where the output image chips and metadata will be stored.


Labeled data, in either vector or raster form.

Vector inputs should follow a training sample format as generated by the ArcGIS Pro Training Sample Manager.

Raster inputs should follow a classified raster format as generated by the Classify Raster tool.

Feature Class; Feature Layer; Raster Dataset; Raster Layer; Mosaic Layer; Image Service

Specifies the raster format for the image chip outputs.

  • TIFF: TIFF format
  • PNG: PNG format
  • JPEG: JPEG format
  • MRF: Meta Raster Format

The size of the image chips for the X dimension.


The size of the image chips for the Y dimension.


The distance to move in the X direction when creating the next image chip.

When stride is equal to tile size, there will be no overlap. When stride is equal to half the tile size, there will be 50 percent overlap.


The distance to move in the Y direction when creating the next image chip.

When stride is equal to tile size, there will be no overlap. When stride is equal to half the tile size, there will be 50 percent overlap.
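The relationship between tile size, stride, and overlap described above can be checked with a small helper (illustrative only; the tool computes this internally):

```python
def chips_along_axis(extent, tile, stride):
    """Number of full chips along one axis for a given tile size and stride."""
    return (extent - tile) // stride + 1

# When stride equals tile size, chips do not overlap.
print(chips_along_axis(1024, 256, 256))  # 4

# When stride is half the tile size, adjacent chips share 50 percent
# of their pixels, so more chips fit along the same axis.
print(chips_along_axis(1024, 256, 128))  # 7
```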


Specifies whether image chips that do not overlap labeled data will be exported.

  • ALL_TILES: All image chips, including those that do not overlap labeled data, will be exported. This is the default.
  • ONLY_TILES_WITH_FEATURES: Only image chips that overlap labeled data will be exported.

Specifies the format of the output metadata labels.

The four options for output metadata labels for the training data are KITTI rectangles, PASCAL VOC rectangles, Classified Tiles (a class map), and RCNN Masks. If your input training sample data is a feature class layer, such as a building layer or standard classification training sample file, use the KITTI or PASCAL VOC rectangles option. The output metadata is a .txt file or .xml file containing the training sample data contained in the minimum bounding rectangle. The name of the metadata file matches the input source image name. If your input training sample data is a class map, use the Classified Tiles option as your output metadata format.

  • KITTI_rectangles: The metadata follows the same format as the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) Object Detection Evaluation dataset. The KITTI dataset is a vision benchmark suite. This is the default. The label files are plain text files. All values, both numerical and string, are separated by spaces, and each row corresponds to one object.
  • PASCAL_VOC_rectangles: The metadata follows the same format as the Pattern Analysis, Statistical Modeling and Computational Learning, Visual Object Classes (PASCAL VOC) dataset. The PASCAL VOC dataset is a standardized image dataset for object class recognition. The label files are XML files and contain information about the image name, class value, and bounding boxes.
  • Classified_Tiles: This option outputs one classified image chip per input image chip. No other metadata is created for each image chip. Only the statistics output has more information on the classes, such as class names, class values, and output statistics.
  • RCNN_Masks: This option outputs image chips that have a mask on the areas where the sample exists. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It is based on a Feature Pyramid Network (FPN) and a ResNet101 backbone in the deep learning framework model.

The table below describes the fifteen values in the KITTI metadata format. Only five of the possible fifteen values are used in the tool: the class name (column 1) and the minimum bounding rectangle, made up of four image coordinate locations (columns 5 to 8). The minimum bounding rectangle encompasses the training chip used in the deep learning classifier.



  • Column 1 (Class value): The class value of the object listed in the stats.txt file.
  • Columns 5 to 8 (Bounding box): The two-dimensional bounding box of objects in the image, based on a 0-based image space coordinate index. The bounding box contains the four coordinates for the left, top, right, and bottom pixels.



For more information, see the KITTI metadata format.
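As a sketch of how the five populated values could be read back from a KITTI label row (the class name and sample coordinates below are illustrative, not real output):

```python
def parse_kitti_label(line):
    """Parse one space-separated KITTI label row.

    The tool populates only the class name (field 1) and the bounding
    box (fields 5 to 8: left, top, right, bottom); the remaining
    fields of the 15-value format are left as placeholders.
    """
    fields = line.split()
    return {
        "class_name": fields[0],
        "bbox": tuple(float(v) for v in fields[4:8]),
    }

sample = "building 0.00 0 0.00 12.0 34.0 112.0 134.0 0 0 0 0 0 0 0"
print(parse_kitti_label(sample))
```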

The following is an example of the PASCAL VOC metadata format:

<?xml version="1.0"?>
<layout>
    <part>
        <bndbox>
            ...

For more information, see PASCAL Visual Object Classes.
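Because the labels are plain XML, they can be read with any XML parser. The following sketch parses a minimal, hypothetical PASCAL VOC bounding box (real label files also carry the image name and class value alongside each box):

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical PASCAL VOC label used only for illustration.
voc_xml = """<?xml version="1.0"?>
<layout>
  <part>
    <bndbox>
      <xmin>12</xmin>
      <ymin>34</ymin>
      <xmax>112</xmax>
      <ymax>134</ymax>
    </bndbox>
  </part>
</layout>"""

root = ET.fromstring(voc_xml)
for box in root.iter("bndbox"):
    # Collect the four pixel coordinates of each bounding box.
    coords = {tag: int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")}
    print(coords)
```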


The start index for the sequence of image chips. This allows you to append more image chips to an existing sequence. The default value is 0.


The field that contains the class values. If no field is specified, the tool searches for a value or classvalue field. If the feature class does not contain a class field, the tool determines that all records belong to one class.
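The fallback behavior described above can be sketched as follows (a hypothetical helper, not the tool's code; the function name and return convention are assumptions):

```python
def resolve_class_field(field_names, requested=None):
    """Mimic the class-field lookup: use the requested field if given,
    otherwise look for a 'value' or 'classvalue' field; return None to
    signal that all records belong to a single class."""
    if requested:
        return requested
    for candidate in ("value", "classvalue"):
        for name in field_names:
            if name.lower() == candidate:
                return name
    return None

print(resolve_class_field(["OID", "Classvalue"]))  # Classvalue
print(resolve_class_field(["OID", "Shape"]))       # None
```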


The radius for point feature classes to delineate a training sample area.


Code sample

ExportTrainingDataForDeepLearning example 1 (Python window)

This example creates training samples for deep learning.

from arcpy.sa import *

ExportTrainingDataForDeepLearning("c:/test/image.tif", "c:/test/outfolder",
                                  "c:/test/training.shp", "TIFF", "256",
                                  "256", "128", "128", "ALL_TILES",
                                  "KITTI_rectangles", 0, "Classvalue", 1)
ExportTrainingDataForDeepLearning example 2 (stand-alone script)

This example creates training samples for deep learning.

# Import system modules and check out the ArcGIS Spatial Analyst extension license
import arcpy
from arcpy.sa import *
arcpy.CheckOutExtension("Spatial")

# Set local variables
inRaster = "c:/test/image.tif"
out_folder = "c:/test/outfolder"
in_training = "c:/test/training.shp"
image_chip_format = "TIFF"
tile_size_x = "256"
tile_size_y = "256"
stride_x = "128"
stride_y = "128"
output_nofeature_tiles = "ALL_TILES"
metadata_format = "KITTI_rectangles"
start_index = 0
classvalue_field = "Classvalue"
buffer_radius = "1"

# Execute
ExportTrainingDataForDeepLearning(inRaster, out_folder, in_training,
                                  image_chip_format, tile_size_x, tile_size_y,
                                  stride_x, stride_y, output_nofeature_tiles,
                                  metadata_format, start_index, classvalue_field,
                                  buffer_radius)

Licensing information

  • Basic: Requires Spatial Analyst or Image Analyst
  • Standard: Requires Spatial Analyst or Image Analyst
  • Advanced: Requires Spatial Analyst or Image Analyst

Related topics