Collapse duplicate features in the data

Reference data can be formatted to contain duplicate features that represent the same location, but with different attributes, as a way of creating a locator that supports alternate names. This is illustrated in the data below, where 12725 Yosemite Blvd, Waterford and 12725 CA-132, Waterford have the same geometry but different values in the FullStreetName field.

PointAddress attribute table with duplicate features for the same location with different names

The recommended method for creating a locator that supports alternate names for features is to add the alternate values to a table and use an alternate name table role that corresponds to the primary locator role. However, if the reference data already contains duplicate features, the Create Locator tool can create the alternate values from them and exclude the duplicate geometries when it builds the locator. For this to happen, the primary reference data must contain an ID field whose value is shared by the duplicate features at the same location, and this ID field must be mapped to a primary ID field from the locator role, such as POINT_ADDRESS_ID. Removing the duplicate geometries reduces the size of the locator and removes excessive tied candidates from geocoding results.

PointAddress attribute table with POINT_ADDRESS_ID field to link duplicate features for the same location

The Create Locator tool uses the values mapped to the primary ID field to skip all duplicate geometries, except for the first geometry that is encountered, which is stored in the locator. The alternate attribute values are created based on the matching IDs of the duplicate features.
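The behavior described above can be pictured with a minimal pure-Python sketch. It is not the Create Locator tool's implementation: the record layout, the `collapse_duplicates` helper, the coordinates, and the "Main St" record are all hypothetical, and only the two Waterford street names come from the example earlier in this section.

```python
# Hypothetical reference records: (primary ID, geometry, full street name).
# The coordinates are made up for illustration.
records = [
    (1, (-120.76, 37.64), "Yosemite Blvd"),
    (1, (-120.76, 37.64), "CA-132"),
    (2, (-120.75, 37.65), "Main St"),
]


def collapse_duplicates(rows):
    """Keep the first record per ID and collect other names as alternates."""
    collapsed = {}
    for join_id, geometry, name in rows:
        if join_id not in collapsed:
            # First record encountered for this ID: its geometry is stored.
            collapsed[join_id] = {
                "geometry": geometry,
                "name": name,
                "alternates": [],
            }
        else:
            # Later records with the same ID contribute only attribute values.
            entry = collapsed[join_id]
            if name != entry["name"] and name not in entry["alternates"]:
                entry["alternates"].append(name)
    return collapsed


result = collapse_duplicates(records)
for join_id, entry in result.items():
    print(join_id, entry["name"], entry["alternates"])
```

In this sketch, three input records become two stored geometries, and "CA-132" survives only as an alternate name for the feature with ID 1.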

Note:

If the reference data does not include an ID field, one can be added using the Find Identical tool. Use the Shape field to find duplicates in the primary reference data, on the assumption that duplicate features have the same geometry. The output table then contains identical IDs for the features with identical geometry; join it with the primary reference data and use the result to build the locator. This procedure will not work in all cases. Two separate addresses or places of interest (POIs) can share the same location, in which case they are identified as duplicates even though they are distinct features, which can be problematic.
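The idea behind grouping on the Shape field, and its caveat, can be sketched in plain Python. This is an illustration only, not the Find Identical tool: the `assign_location_ids` helper, the object IDs, and the coordinates are hypothetical.

```python
def assign_location_ids(features):
    """Give every feature with an identical geometry the same sequential ID."""
    ids_by_geometry = {}
    assigned = []
    for object_id, geometry in features:
        # A geometry seen for the first time gets the next sequential ID;
        # a repeated geometry reuses the ID it was given earlier.
        location_id = ids_by_geometry.setdefault(
            geometry, len(ids_by_geometry) + 1
        )
        assigned.append((object_id, location_id))
    return assigned


# Hypothetical point features: (object ID, (x, y)).
features = [
    (101, (-120.76, 37.64)),  # 12725 Yosemite Blvd
    (102, (-120.76, 37.64)),  # 12725 CA-132, same place under another name
    (103, (-120.70, 37.60)),  # a shop
    (104, (-120.70, 37.60)),  # a different business at the same point
]

location_ids = assign_location_ids(features)
for object_id, location_id in location_ids:
    print(object_id, location_id)
```

Features 101 and 102 share an ID as intended, but so do 103 and 104, which are separate places. A geometry-only comparison cannot tell these cases apart, so the resulting IDs are worth reviewing before the table is joined back to the reference data.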

For example, suppose the point feature class you want to use as primary reference data contains 13 million features, of which 10 million are unique. Mapping the primary ID field activates the functionality in the Create Locator tool that removes duplicate geometries, and the resulting locator is reduced from 253 MB to 200 MB in size.
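The proportions behind these figures can be checked with a few lines of arithmetic. The numbers are the ones quoted above; the variable names are only illustrative.

```python
total_features = 13_000_000
unique_features = 10_000_000
size_before_mb = 253
size_after_mb = 200

# Share of features that are duplicates of another feature.
duplicate_share = (total_features - unique_features) / total_features

# Relative reduction in locator size.
size_reduction = (size_before_mb - size_after_mb) / size_before_mb

print(f"Duplicate features: {duplicate_share:.1%}")      # 23.1%
print(f"Locator size reduction: {size_reduction:.1%}")   # 20.9%
```

In this example, roughly 23 percent of the features are duplicates and the locator shrinks by roughly 21 percent.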

Primary ID fields for each role

| Role | Primary ID |
| --- | --- |
| Point Address | Address Join ID |
| Parcel | Parcel Join ID |
| Street Address | Street Join ID |
| POI | Place Join ID |
| Distance Marker | Street Join ID |
| Distance Range | Street Join ID |
| Postal | Postal Join ID |
| Postal Extension | Postal Extension Join ID |
| Postal Locality | A combination of Postal Join ID and all of the mapped administrative areas' Join IDs is used as the primary ID in the Create Locator tool, so all of these should be mapped. |
| Zone | Zone Join ID |
| Block | Block Join ID |
| Sector | Sector Join ID |
| Neighborhood | Neighborhood Join ID |
| District | District Join ID |
| City | City Join ID |
| Metro Area | Metro Area Join ID |
| Subregion | Subregion Join ID |
| Region | Region Join ID |
| Territory | Territory Join ID |
| Country | Country Join ID |