Collapse duplicate features in the data—ArcGIS Pro

Reference data can be formatted to contain duplicate features that represent the same location, but with different attributes, as a way of creating a locator that supports alternate names. This is illustrated in the data below, in which 12725 Yosemite Blvd, Waterford and 12725 CA-132, Waterford have the same geometry but different values in the FullStreetName field.

PointAddress attribute table with duplicate features for the same location with different names

The recommended method for creating a locator that supports alternate names for features is to add the alternate values to a table and use an alternate name table role that corresponds to the primary locator role. However, if duplicate features already exist in the reference data, the Create Locator tool can create the alternate values from them and exclude the duplicate geometries when it builds the locator. To remove duplicate geometries, the primary reference data must contain an ID field whose value is shared by all duplicate features at the same location. This ID field must be mapped to a Feature ID field from the locator role, such as POINT_ADDRESS_ID. Mapping the field reduces the size of the locator and removes excessive tied candidates from geocoding results.

PointAddress attribute table with POINT_ADDRESS_ID field to link duplicate features for the same location

When the primary reference data has duplicate features with different street name values and you want to specify which street name is the primary name, the reference data must contain a flag field that indicates which street name is returned as the primary name when geocoding. This field must be mapped to the Primary Street Name Indicator field from the locator role, such as PrimaryStreetFlag. If the Feature ID is mapped, the Primary Street Name Indicator field defines the preferred street name among features with the same Feature ID. If the Feature ID is not mapped, de-duplication does not occur, each street name is stored independently, and each street name from the primary reference data is marked as primary.

Feature class attribute table with Feature ID and Primary Street Name Indicator
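
The interaction between the Feature ID and the Primary Street Name Indicator can be sketched in plain Python. This is only an illustration of the rule described above, not the Create Locator implementation. The dictionary keys (feature_id, street, is_primary), the Boolean flag values, and the fallback to the first row when no row is flagged are all assumptions made for the example.

```python
from collections import defaultdict


def preferred_street_names(rows, feature_id_mapped=True):
    """Return a mapping of record key -> street name treated as primary.

    rows: dicts with the hypothetical keys 'feature_id', 'street', 'is_primary'.
    """
    if not feature_id_mapped:
        # Nothing links the duplicates, so every street name stands alone
        # and each one is treated as primary.
        return {index: row["street"] for index, row in enumerate(rows)}

    # Group the duplicate features that share a Feature ID.
    groups = defaultdict(list)
    for row in rows:
        groups[row["feature_id"]].append(row)

    preferred = {}
    for feature_id, members in groups.items():
        flagged = [m for m in members if m["is_primary"]]
        # Assumption: fall back to the first row if no row is flagged.
        chosen = flagged[0] if flagged else members[0]
        preferred[feature_id] = chosen["street"]
    return preferred


rows = [
    {"feature_id": 101, "street": "CA-132", "is_primary": False},
    {"feature_id": 101, "street": "Yosemite Blvd", "is_primary": True},
]
print(preferred_street_names(rows))
# {101: 'Yosemite Blvd'}
print(preferred_street_names(rows, feature_id_mapped=False))
# {0: 'CA-132', 1: 'Yosemite Blvd'}
```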

The Create Locator tool uses the values mapped to the Feature ID field to skip all duplicate geometries, except the first geometry that is encountered, which is stored in the locator. The alternate attribute values are created based on the matching IDs of the duplicate features.

POINT_ADDRESS_ID field assigned to the Feature ID locator role field in the Create Locator tool
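
The collapsing behavior described above (keep the first geometry for each Feature ID, skip later duplicates, and retain their attribute values as alternates) can be modeled with a short Python sketch. It is a conceptual illustration only. The record layout, the ID value 1001, and the coordinates are hypothetical, and the tool's internal storage is not documented here.

```python
def collapse_duplicates(rows):
    """Collapse rows that share a Feature ID into one stored feature.

    rows: dicts with the hypothetical keys 'feature_id', 'geometry', 'street'.
    Returns {feature_id: {'geometry': ..., 'names': [...]}}.
    """
    collapsed = {}
    for row in rows:
        feature_id = row["feature_id"]
        if feature_id not in collapsed:
            # First geometry encountered for this ID: store it.
            collapsed[feature_id] = {
                "geometry": row["geometry"],
                "names": [row["street"]],
            }
        elif row["street"] not in collapsed[feature_id]["names"]:
            # Later duplicate: skip the geometry, keep the alternate name.
            collapsed[feature_id]["names"].append(row["street"])
    return collapsed


# Illustrative records; the coordinates are placeholders, not real data.
rows = [
    {"feature_id": 1001, "geometry": (-120.76, 37.64), "street": "Yosemite Blvd"},
    {"feature_id": 1001, "geometry": (-120.76, 37.64), "street": "CA-132"},
]
result = collapse_duplicates(rows)
print(len(result))             # 1
print(result[1001]["names"])   # ['Yosemite Blvd', 'CA-132']
```

Two input rows become one stored feature with two names, which is why mapping the Feature ID both shrinks the locator and avoids tied candidates for the same location.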

Note:

If the reference data does not include the ID field, it can be added using the Find Identical tool. The Shape field can be used to find duplicates in the primary reference data, based on the assumption that duplicate features have the same geometry. When the Shape field is used with the Find Identical tool, the output table contains identical IDs for the duplicate features. The output table can then be joined with the primary reference data, and the new ID field can be assigned to the Feature ID locator role field in the Create Locator tool when building the locator. This procedure does not work in all cases. Identical geometry also occurs when two separate addresses or places of interest (POIs) share the same location, which can be problematic because such features are treated as duplicates.
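
The idea behind this workflow, and its main caveat, can be shown with a minimal Python sketch that assigns one ID per distinct geometry and joins it back to the source rows. This is not the Find Identical tool itself. The output keys IN_FID and FEAT_SEQ are written from memory of that tool's output table and should be treated as an assumption, as should the exact-match comparison (no tolerance) and the placeholder coordinates.

```python
def find_identical_by_shape(features):
    """Assign the same sequence number to features with identical geometry.

    features: iterable of (object_id, geometry) pairs, where geometry is a
    hashable value such as an (x, y) tuple. Matching is exact.
    """
    sequence_by_shape = {}
    table = []
    for object_id, shape in features:
        # A geometry seen for the first time gets the next sequence number.
        sequence = sequence_by_shape.setdefault(shape, len(sequence_by_shape) + 1)
        table.append({"IN_FID": object_id, "FEAT_SEQ": sequence})
    return table


# Placeholder data: rows 1 and 2 are true duplicates; row 3 is elsewhere.
features = [
    (1, (-120.76, 37.64)),
    (2, (-120.76, 37.64)),
    (3, (-120.70, 37.60)),
]
table = find_identical_by_shape(features)

# "Join" the new ID back to the source rows by object ID.
feature_id_by_oid = {row["IN_FID"]: row["FEAT_SEQ"] for row in table}
print(feature_id_by_oid)  # {1: 1, 2: 1, 3: 2}
```

Because the grouping relies on geometry alone, two unrelated addresses that share a point would also receive the same ID in this sketch, which is the case in which the procedure does not work.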

For example, suppose a point feature class that you want to use as primary reference data contains 13 million features, of which 10 million are unique. Mapping the Feature ID field activates the functionality in the Create Locator tool that removes duplicate geometries. The result is a locator that is reduced from 253 MB to 200 MB in size, a reduction of about 21 percent.
