The classic overlay tools (Buffer, Clip, Dissolve, Erase, Integrate, and Intersect) and the pairwise overlay tools (Pairwise Buffer, Pairwise Clip, Pairwise Dissolve, Pairwise Erase, Pairwise Integrate, and Pairwise Intersect) are designed to maximize the performance and accuracy of analysis when processing very large and complex datasets on a single desktop. Functional and performance differences between tools of similar functionality determine which tool you should use in your workflow. There are also considerations that affect all geoprocessing tools and that impact the accuracy of output and the performance of analysis, depending on the tool being used.
The Pairwise Overlay toolset provides a number of alternative tools to traditional overlay tools.
Pairwise Buffer and Buffer
The following compares the Pairwise Buffer and Buffer tools:
- Both tools use parallel processing. For the Pairwise Buffer tool, parallel processing is enabled by default. For the Buffer tool, it is enabled in the Parallel Processing Factor environment.
- By default, the output features of the Pairwise Buffer tool are less smooth than the output features created by the Buffer tool.
- The Pairwise Buffer tool allows you to control the smoothness of the buffer output features. See the tool documentation for the Maximum Offset Deviation parameter.
- The Buffer tool provides output buffer options such as side type and end type.
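Both tools honor the Parallel Processing Factor environment, which accepts either a number of processes or a percentage of CPU cores (for example, "80%"). The sketch below illustrates how such a value might map to a worker count; it is not Esri's internal logic, and `resolve_worker_count` is a hypothetical helper name used for illustration only.

```python
import os

def resolve_worker_count(factor, total_cores=None):
    """Map a Parallel Processing Factor-style string to a worker count.

    Illustrative only -- not Esri's internal implementation.
    An empty string leaves the decision to the tool (here: all cores);
    "0" disables parallel processing; "n" uses n processes; "n%" uses
    that percentage of the available cores.
    """
    if total_cores is None:
        total_cores = os.cpu_count() or 1
    if factor == "":
        return total_cores  # assumption: tool default varies per tool
    if factor.endswith("%"):
        pct = float(factor[:-1])
        return max(1, round(total_cores * pct / 100))
    return int(factor)

print(resolve_worker_count("50%", total_cores=8))  # 4
print(resolve_worker_count("0", total_cores=8))    # 0
```

In arcpy, the equivalent setting is made through the `arcpy.env.parallelProcessingFactor` environment before running the tool.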
Pairwise Clip and Clip
Both tools use parallel processing. For the Pairwise Clip tool, parallel processing is enabled by default. For the Clip tool, it is enabled in the Parallel Processing Factor environment.
Pairwise Dissolve and Dissolve
The following compares the Pairwise Dissolve and Dissolve tools:
- The output of these tools is similar and the tools can be used interchangeably.
- The Pairwise Dissolve tool uses parallel processing by default. The Dissolve tool does not have parallel capabilities.
Pairwise Erase and Erase
The following compares the Pairwise Erase and Erase tools:
- The output of these tools is similar and the tools can be used interchangeably.
- Both tools use parallel processing. For the Pairwise Erase tool, parallel processing is enabled by default. For the Erase tool, parallel processing is enabled in the Parallel Processing Factor environment.
Pairwise Integrate and Integrate
The following compares the Pairwise Integrate and Integrate tools:
- The Pairwise Integrate tool uses parallel processing by default.
- The internal tolerance of the Pairwise Integrate tool is slightly larger than the Integrate tool due to differences in the underlying engines that perform the integration.
Pairwise Intersect and Intersect
The following compares the Pairwise Intersect and Intersect tools:
- The output of these tools is fundamentally different from one another. The tools cannot be used interchangeably without evaluating your workflow and how it must change to account for the different outputs. For more information, see How Pairwise Intersect works.
- Both tools use parallel processing. For the Pairwise Intersect tool, parallel processing is enabled by default. For the Intersect tool, it is enabled in the Parallel Processing Factor environment.
Functional and performance considerations for all analysis tools
When deciding which of these complementary tools to use in your workflow, either tool is often acceptable. However, the information below may help you understand the differences between the tools when choosing which to use.
The primary consideration when deciding which tool to use is whether their output meets the needs of your project. Some of the comparable tools create equivalent output while others do not. Key differences in the tool output have been noted in the previous section. See each tool's documentation for full details when comparing which tool to use.
For all tools, project on the fly during analysis should be avoided. Project on the fly is used when tool inputs do not share the same spatial reference. Project on the fly is also used when setting geoprocessing environments that modify the output coordinate system (Output Coordinate System, XY Tolerance, XY Resolution, and so on). Project on the fly may lead to inaccuracies from misaligned data among layers. See Coordinate systems and projections for more information.
XY Resolution and XY Tolerance have a functional impact on the output generated by all tools. See Feature class basics for more information. A large body of work has been performed over many decades to determine the appropriate XY Resolution and XY Tolerance values that generate the most accurate result when processing data in its assigned spatial reference. It is recommended that you use the default XY Resolution and XY Tolerance for the input data's spatial reference and that you do not modify them, either during data generation or through geoprocessing environments during analysis. To avoid inaccurate analysis results, it is also recommended that you do not use the XY Tolerance tool parameter.
- Each tool can implement XY Tolerance with slight differences.
- The internal implementation of the classic overlay tools and the pairwise overlay tools is very different. While the tools share some similarities due to how geometry is stored in a geodatabase, how they process the data is fundamentally different so slight differences in the output geometries should be expected.
- The classic overlay tools can be iterative depending on the complexity of the data. This means the tolerance can be applied multiple times during analysis. The impact of modifying the XY Tolerance from its default to a larger or smaller value is multiplied. The further the XY Tolerance value used is moved from the default, the more likely issues will arise.
- During the clustering process, the classic overlay tools will snap two points together when the distance between them is less than or equal to the following:
2 * sqrt(2) * tolerance
- During the cracking process, the classic overlay tools will assume a point is on the segment if the closest distance from the point to the segment interior is less than or equal to the following:
sqrt(2) * tolerance
In this case, the segment will be split and the new endpoints will be snapped to the point.
- The pairwise tools follow the principle that their output must remain topologically clean if it were then run through a classic overlay tool. That is, when processing geometries for a topological operation, the output should not include any new segment intersections or points that should have been clustered. For this to always be true, the calculations that determine the distances used during the cracking and clustering processes have been adjusted. This adjustment can result in very small geometry differences between the outputs of the tools.
For more information on clustering, see Feature class basics.
- During the clustering process, the pairwise tools will snap two points together when the distance between them is less than or equal to the following:
1.01 * sqrt(2) * (2 * tolerance + 2 * resolution)
- During the cracking process, the pairwise tools will assume a point is on the segment if the closest distance from the point to the segment interior is less than or equal to the following:
1.01 * sqrt(2) * (tolerance + 2 * resolution)
A factor of 1.01 is used to increase the original value for stability.
- Modifying the XY Tolerance incorrectly can cause failure, inaccurate analysis, feature movement, topological errors, and even crashing.
- Modifying the XY Resolution incorrectly may cause the output geometry to no longer represent the input geometry accurately and the analysis results to be incorrect. Modifying the XY Resolution to be smaller than the default causes the size of the feature to increase. This may lead to changes in how the data is processed. For classic overlay tools, this increase in size may lead to more tiling of the data to complete the analysis within the available resources, causing an increase in the number of vertices introduced into features that cross tile boundaries. See Tiled processing of large datasets for more information.
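The snapping distances above can be computed directly from the formulas given. The sketch below restates them as small helper functions (the function names are illustrative, not Esri API) and evaluates them using the common geodatabase defaults for a projected coordinate system in meters, which are assumed here to be an XY Tolerance of 0.001 m and an XY Resolution of 0.0001 m:

```python
import math

SQRT2 = math.sqrt(2)

def classic_cluster_dist(tolerance):
    # Classic overlay: snap two points together when their distance is
    # <= 2 * sqrt(2) * tolerance
    return 2 * SQRT2 * tolerance

def classic_crack_dist(tolerance):
    # Classic overlay: treat a point as on a segment when its distance
    # to the segment interior is <= sqrt(2) * tolerance
    return SQRT2 * tolerance

def pairwise_cluster_dist(tolerance, resolution):
    # Pairwise: 1.01 * sqrt(2) * (2 * tolerance + 2 * resolution);
    # the 1.01 factor increases the value slightly for stability
    return 1.01 * SQRT2 * (2 * tolerance + 2 * resolution)

def pairwise_crack_dist(tolerance, resolution):
    # Pairwise: 1.01 * sqrt(2) * (tolerance + 2 * resolution)
    return 1.01 * SQRT2 * (tolerance + 2 * resolution)

tol, res = 0.001, 0.0001  # assumed default XY Tolerance / XY Resolution (m)
print(f"classic cluster:  {classic_cluster_dist(tol):.6f} m")
print(f"classic crack:    {classic_crack_dist(tol):.6f} m")
print(f"pairwise cluster: {pairwise_cluster_dist(tol, res):.6f} m")
print(f"pairwise crack:   {pairwise_crack_dist(tol, res):.6f} m")
```

Note that the pairwise thresholds are slightly larger than the classic ones for the same tolerance, which is consistent with the small geometry differences described above.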
Input feature geometry must be valid. All tools that process geometry are affected by invalid geometry, which can result in failure, inaccurate analysis, a frozen process, a crash, or, in the worst case, no indication of the issue. It is your responsibility to ensure all input data contains valid geometry.
- Classic overlay tools can be more sensitive to invalid geometries than the pairwise tools.
Performance of all geoprocessing tools varies based on the size and complexity of the input data. There are a few trends that may help you choose a tool that deals with the scenario more efficiently.
Areas of massive overlap
Areas of massive overlap of lines and polygons can impede the performance of the classic overlay tools.
In the case of densely packed buffer output polygons, the buffer polygons may overlap to a point where selecting a single area in the data returns a selection of tens or even hundreds of thousands of buffer features. For classic overlay tools such as Intersect or Erase, the determination of every unique polygon created from all this overlap is costly. The pairwise overlay tools may be better choices for this scenario.
In the case of densely packed intersecting lines, every point of intersection and overlap is determined up front by the classic overlay tools (Intersect, Erase, and Clip to some extent). This can be costly in severe cases. The pairwise overlay tools may be better choices for this scenario.
If your analysis relies on finding every instance of overlap (such as when using the Intersect or Union tools), you must use the classic overlay tools. However, if your tens of thousands or hundreds of thousands of input features result in an output of hundreds of millions of output features representing each unique incident of overlap, you may need to rethink your approach to make sense of the results of the overlay operation. A reevaluation of the goal of your analysis may reveal that a different scale of size and scope can help you better understand the outcome of the analysis. Breaking up the data into more reasonable areas of interest may improve the performance of classic overlay tools when the overlap of the input features is extreme.
Considerations for extremely large features
Neither the classic overlay tools nor the pairwise overlay tools can accommodate features that exceed the resources of the machine processing the data.
- These features are so large they cannot be accommodated on the machine where you are performing the analysis. They are often the result of a Raster To Polygon or Dissolve operation that was performed on a machine that had more available resources than the machine performing the analysis.
- These features may fail to draw or only partially draw.
- Using these features with a geoprocessing tool may cause an apparent processing freeze, out of memory errors, incorrect output, or even, in severe cases, a crash.
- A feature can be too large on one machine and not another. Whether a feature is considered too large depends on the amount of RAM available on the machine performing the analysis. The more RAM a machine has, the bigger a feature can be before causing issues.
- The maximum size an individual feature can be in the geodatabase is 2 GB.
- When you have features that are too large to be processed, they must be broken up using the Dice tool for the analysis to be successful. Other tools that are traditionally used to manipulate existing features most likely will fail to edit a feature with millions of vertices.
A second category of large features may cause issues during processing even though you can still perform simple tasks with them. These features contain a very large number of vertices but are not so large as to cause issues on their own; however, they are large enough to impede performance and cause failures in both the classic overlay tools and the pairwise overlay tools.
- These features draw and can be used in simple analysis without issue, but even during these simple operations, a noticeable performance degradation and increase in memory footprint may occur.
- In some cases, if these features span a large part of the area of interest, the feature often must be duplicated in memory for processing (in particular for parallel processing). This can eventually lower the amount of available resources on the machine. Severe performance degradation and failure may occur.
- These features cause performance degradation in both the classic overlay tools and the pairwise overlay tools. Pairwise overlay tools have a simpler approach to analyzing the data and generally have a smaller memory footprint, so they may accommodate these types of features more successfully.
- If the data has very large features, you may want to reevaluate your analysis goals. For example, consider whether you need millimeter accuracy for continent coastline analysis or to define bird migration boundaries.
- There are two methods to improve the analysis when the data contains large features. If you don't need the level of accuracy represented in the data, you can simplify the data (using Simplify Polygon or similar tools). To maintain the accuracy of your features, you can break up the features into smaller parts. If the feature has many parts, the Multipart To Single Part tool may decrease the feature's size enough to be successful with the analysis. If not, the Dice tool can be used to break up the feature prior to analysis. The features that are broken apart can be reassembled later in the workflow using Dissolve or Pairwise Dissolve with a unique ID assigned to each feature prior to it being broken apart.
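The last step of that workflow, reassembling diced pieces under a previously assigned unique ID, is essentially a group-by on the ID field. The sketch below illustrates only the bookkeeping in plain Python; in ArcGIS the actual geometric reassembly is done by Dissolve or Pairwise Dissolve on that field, and `regroup_parts` is a hypothetical helper name.

```python
from collections import defaultdict

def regroup_parts(parts):
    """Group diced pieces back under their original feature ID.

    `parts` is a sequence of (original_id, piece) records -- the kind of
    bookkeeping you get by writing a unique ID onto each feature before
    running Dice. A real workflow would dissolve on the ID field; here
    the pieces are simply collected per ID for illustration.
    """
    grouped = defaultdict(list)
    for original_id, piece in parts:
        grouped[original_id].append(piece)
    return dict(grouped)

diced = [(1, "piece-a"), (2, "piece-c"), (1, "piece-b"), (2, "piece-d")]
print(regroup_parts(diced))
# {1: ['piece-a', 'piece-b'], 2: ['piece-c', 'piece-d']}
```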
Spatial reference considerations and projection on the fly
If you are using a geoprocessing tool with two inputs that have different spatial references, most tools set the output coordinate system to the first input, and data from the second input with a different spatial reference is projected to this coordinate system for analysis. Dynamically changing the projection on the fly can cause performance degradation (and lead to inaccuracies from misaligned data among layers). Whenever possible, for reliable performance and output accuracy, all inputs should share the same spatial reference.