Generalizing large datasets using partitions

Geoprocessing tools that consider multiple themes of data contextually must load all input data into memory before processing can begin. The memory limitations of these tools can be easily exceeded by large datasets or numerous input datasets. Partitioning is a way to subdivide a large amount of data into smaller, more manageable sets of features.

When the tools are run on partitioned data, each partition is processed sequentially. Features on or near partition boundaries are handled carefully to avoid discrepancies: the tool loads and considers additional data beyond each partition during processing, but modifies only the features within the current partition. The result is a seamless final output.

Generally, if there are more than about 100,000 features collectively in all the input layers, or if the features are complex with a large number of vertices, consider running the tool with partitioning. Partitioning is supported by the tools that honor the Cartographic Partitions geoprocessing environment setting.

How to enable partitioning

Partitioning is enabled for these tools by specifying a partition feature class in the Cartographic Partitions geoprocessing environment setting. This setting prompts the applicable tools to process input features sequentially, in portions, instead of all at once.
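In a Python (arcpy) workflow, for example, the setting can be applied before running a partition-enabled tool. This is a minimal sketch; the geodatabase path is a placeholder.

import arcpy

# Point the Cartographic Partitions environment at an existing polygon
# feature class. Partition-enabled tools run after this line will process
# their inputs one partition at a time.
arcpy.env.cartographicPartitions = r"C:\data\generalization.gdb\partitions"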

The partition feature class should sensibly cover the area of interest and divide the input features reasonably evenly. Partitions that are too large may still enclose more data than memory allows, while partitions that are too small diminish the contextual considerations of the tool and can degrade the quality of the results.

What to use as partitions

Partition features can come from different sources. Some workflows may include inherent logical partitions already, such as the data extents shown on a contiguous set of printed maps. Map sheets modeled as polygons often make ideal partitions. In this case, you could use the Grid Index Features tool to create a grid of rectangular polygon features. These will make reasonable partitions provided that the input data is relatively uniformly distributed across the area of interest.
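As a rough sketch (the paths and cell size are placeholders; verify the parameter names against the Grid Index Features tool reference), such a grid can be generated with arcpy:

import arcpy

# Build a rectangular polygon grid covering the extent of an input layer.
# The output path, input path, and cell dimensions are placeholder values.
arcpy.cartography.GridIndexFeatures(
    r"C:\data\generalization.gdb\grid_partitions",  # output grid polygons
    r"C:\data\generalization.gdb\roads",            # input features defining the extent
    polygon_width="10000 Meters",
    polygon_height="10000 Meters")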

In web mapping, the cache tiling scheme may make an appropriate set of partitions. Consider using the Map Server Cache Tiling Scheme To Polygons tool to create a grid of polygons representing this scheme. Similar to using map sheet extents, this is a valid workflow when input features are somewhat uniformly distributed.
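If that fits your workflow, a call along these lines can produce the polygons; the exact parameter list and keyword values vary by release, so treat this sketch as an assumption to verify against the tool reference:

import arcpy

# Convert a cache tiling scheme (an XML file saved from a map service)
# into a polygon feature class usable as partitions. The map name, paths,
# and option keywords are assumed placeholder values.
arcpy.cartography.MapServerCacheTilingSchemeToPolygons(
    "Map",                                          # map whose extent is used
    r"C:\data\tiling_scheme.xml",                   # tiling scheme file
    r"C:\data\generalization.gdb\tile_partitions",  # output polygons
    "USE_MAP_EXTENT",
    "CLIP_TO_HORIZON")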

In some workflows, a dataset may include a feature class that forms natural contiguous partitions, such as counties or ZIP Codes. Assuming that these features adequately cover and divide input features, they can be used as partitions. This is a good approach with distributions of data that vary in density. For example, ZIP Code polygons are likely to be smaller where there is a higher density of residences, so ZIP Codes may make good partitions when resolving building conflicts.
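For instance, the ZIP Code polygons are simply supplied as the partition feature class before running the tool. In this hedged sketch, the layer names, invisibility field, and all distances are hypothetical, and a reference scale is set because Resolve Building Conflicts requires one:

import arcpy

# Hypothetical layers, field, and distances; adjust for your data.
arcpy.env.referenceScale = 25000
arcpy.env.cartographicPartitions = r"C:\data\generalization.gdb\zip_codes"

# Resolve building conflicts one ZIP Code partition at a time.
arcpy.cartography.ResolveBuildingConflicts(
    in_buildings="buildings",
    invisibility_field="invisibility",
    in_barriers=[["roads", "false", "5 Meters"]],
    building_gap="10 Meters",
    minimum_size="15 Meters")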

If no suitable polygons are readily available, you can also create some specifically for partitioning. Use the Create Cartographic Partitions tool to make a contiguous set of polygons that enclose a roughly equal number of input features or vertices.
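A minimal sketch of that call, assuming two hypothetical input feature classes and a target of roughly 50,000 features per partition:

import arcpy

# Create partition polygons that each enclose approximately the same
# number of input features. Paths are placeholders; the partition method
# can be FEATURES or VERTICES.
arcpy.cartography.CreateCartographicPartitions(
    in_features=[r"C:\data\generalization.gdb\roads",
                 r"C:\data\generalization.gdb\buildings"],
    out_features=r"C:\data\generalization.gdb\partitions",
    feature_count=50000,
    partition_method="FEATURES")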

Partition requirements

  • Each partition should be small enough that the input data it encloses does not exceed the tool's capabilities. This threshold depends primarily on the number of features from all input layers and the complexity of those features; it also varies with which tool is run and how its parameters are defined. As a general guideline, use partitions that contain no more than about 50,000 input features. If partitions are defined by vertex count instead, choose a value based on the amount of available memory; although it varies by tool, 1 million vertices use roughly 0.5 GB of memory.
  • Partition features should represent a logical subdivision of input features that will be processed by the tools that honor this setting. Input features should be somewhat evenly distributed among partition features. These may be a spatially related set of features like counties or other administrative boundaries; polygons that represent individual map sheets, like those created with the Grid Index Features tool; or polygon partitions specifically created for this purpose by the Create Cartographic Partitions tool.
  • Partition features must be topologically correct. Adjacent polygon edges should match, and there can be no overlaps. Holes between partition features are acceptable, but partition features cannot be multipart polygons or polygons with holes. Polygons must have simple, nonoverlapping geometry.
  • Each partition polygon must have an area greater than zero. Null, empty, or zero-area partitions raise a warning and are ignored during processing. (A quick check for multipart and zero-area partitions is sketched after this list.)
  • The extent of the input features should be covered by the partition features.
  • Partition geometry should be as simple as possible. Complex geometries will impact tool performance when partitioning is enabled.
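A minimal pre-check for two of these requirements, multipart polygons and zero or empty areas, could look like the following sketch (the path is a placeholder):

import arcpy

partitions = r"C:\data\generalization.gdb\partitions"  # placeholder path

# Flag partitions that violate the multipart and zero-area requirements.
with arcpy.da.SearchCursor(partitions, ["OID@", "SHAPE@"]) as cursor:
    for oid, shape in cursor:
        if shape is None or shape.area <= 0:
            print(f"Partition {oid}: null or zero-area geometry")
        elif shape.isMultipart:
            print(f"Partition {oid}: multipart polygon")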

How processing works with partitioning

When partitioning is enabled (by specifying a partition feature class in the Cartographic Partitions geoprocessing environment setting), the partition-enabled tools process input data in sections, as defined by the partitions. Partitions are processed in order of their Object IDs. To process only specific areas of the map, use a layer in the map as the environment value and select just the relevant partition features before processing. If the partition feature class does not completely cover the inputs, only the areas covered by partitions are processed.
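In a script, for example, the same effect can be achieved with a feature layer and an attribute selection (the layer name, field, and values below are hypothetical):

import arcpy

# Make a layer from the partition feature class and select only the
# partitions covering the area of interest.
arcpy.management.MakeFeatureLayer(
    r"C:\data\generalization.gdb\partitions", "partitions_lyr")
arcpy.management.SelectLayerByAttribute(
    "partitions_lyr", "NEW_SELECTION", "SHEET_ID IN ('A1', 'A2')")

# Use the layer (with its selection) as the partition setting; only the
# selected partitions are processed.
arcpy.env.cartographicPartitions = "partitions_lyr"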

Even when data is partitioned, a single partition may still delineate more input data than the processing tool can hold in memory. In that case, processing for that partition fails, and processing proceeds to the next partition. Geoprocessing messages indicate which partitions were not processed. A field named STATUS is appended to the partition feature class and populated with one of the following codes describing each partition's state (a sketch for reviewing these values follows the list):

  • 0 - Not Processed
  • 1 - Being Processed
  • 2 - Successfully Processed
  • 3 - Out of Memory
  • 4 - Error
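To review the outcome afterward, the STATUS values can be read back with a search cursor (the path is a placeholder):

import arcpy

partitions = r"C:\data\generalization.gdb\partitions"  # placeholder path

# Report any partition that did not finish successfully
# (2 = Successfully Processed).
with arcpy.da.SearchCursor(partitions, ["OID@", "STATUS"]) as cursor:
    for oid, status in cursor:
        if status != 2:
            print(f"Partition {oid} finished with STATUS {status}")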

Tip:

If you need to preserve the processing state recorded in the STATUS field, add a new field to the partition feature class and calculate its values from the STATUS field before running the next partition-enabled tool, as sketched below.
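A minimal sketch of that tip, using a hypothetical STATUS_PREV field:

import arcpy

partitions = r"C:\data\generalization.gdb\partitions"  # placeholder path

# Copy the current STATUS values into a new field so they survive the
# next partition-enabled tool run, which overwrites STATUS.
arcpy.management.AddField(partitions, "STATUS_PREV", "SHORT")
arcpy.management.CalculateField(partitions, "STATUS_PREV", "!STATUS!", "PYTHON3")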

Related topics