
Tips for preparing reference data

Reference data is one of the key elements in building a locator, because your geocoding experience is only as good as the primary reference data on which the locator is based. Errors in the reference data cause poor matching quality. For example, if the geometry of the reference features is incorrect, the addresses that are matched to them will also be spatially incorrect. If the name of a reference feature is misspelled, correctly spelled addresses may go unmatched. There are several potential errors to keep in mind regarding your reference data. They are described below.

Incomplete geometry and address attributes

The world is constantly changing and your reference data must be updated to reflect these changes to have the best geocoding experience with the locators you create. For example, if a new housing tract is added to the city street network, the additional street segments, with their associated house number ranges, street names, and other properties, need to be added. The locator created based on the street network will not find the addresses in the new housing tract until the locator is also updated.

If the address attributes are incomplete or contain errors, such as incorrect address ranges or missing street names and ZIP Codes, matching the address to those features may return unexpected results. For locators created based on the Street Address role, features containing empty street names will cause a failure when building the locator. Thus, it is essential to review and correct the address attribute errors in the reference data.
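A pre-build review of the address attributes can be sketched in plain Python. This is a minimal illustration, not part of ArcGIS Pro: the field names (`STREET_NAME`, `ZIP`, `L_FROM`, `L_TO`, `R_FROM`, `R_TO`) and the function name are hypothetical, and the rows are assumed to have been read from the attribute table into dictionaries beforehand.

```python
from typing import Iterable, Mapping

# Hypothetical field names; substitute the ones used in your reference data.
REQUIRED_FIELDS = ("STREET_NAME", "ZIP")
RANGE_FIELDS = (("L_FROM", "L_TO"), ("R_FROM", "R_TO"))


def find_attribute_problems(rows: Iterable[Mapping[str, object]]) -> list[tuple[int, str]]:
    """Return (row index, message) pairs for rows with incomplete address attributes."""
    problems = []
    for index, row in enumerate(rows):
        # Flag required text fields that are null, empty, or whitespace only.
        for field in REQUIRED_FIELDS:
            value = row.get(field)
            if value is None or not str(value).strip():
                problems.append((index, f"{field} is empty"))
        # Flag house number ranges where only one end is populated.
        for low_field, high_field in RANGE_FIELDS:
            low, high = row.get(low_field), row.get(high_field)
            if (low is None) != (high is None):
                problems.append((index, f"{low_field}/{high_field} range is incomplete"))
    return problems
```

Rows flagged this way can be corrected in the reference data before the locator is built. The check deliberately does not flag a range whose first value is larger than its second, since the sketch makes no assumption about the direction in which a street segment was digitized.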

Learn more about updating your reference data

Spatial reference and geometry errors

Reference data, such as a street or point address feature class, is usually produced in a specific spatial reference. The coordinate system of the feature class determines how the features are georeferenced. When a locator is created, information about the spatial reference is stored in the locator, and the locations of addresses geocoded against the locator are georeferenced in the same spatial reference. Make sure that the reference data has a spatial reference defined.

Learn more about spatial references

To draw a feature on a map, the feature must contain a valid shape, or geometry. If the shape of any feature in the reference data is null or empty, the Create Locator tool will fail to build the locator. Geometry errors, such as coincident line segments that are not completely snapped to a vertex, polygons with self-intersections, or incorrect ring ordering, can also cause the locator build to fail or prevent addresses from being matched to intersections. Run tools such as Check Geometry and Repair Geometry against the reference data to find and repair geometry errors. If matches are not returned for intersection addresses because of invalid connectivity of lines and vertices in the data, the Planarize editing tool can also be used to modify the line features.
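To make two of these error types concrete, the following sketch flags null or empty shapes and line endpoints that nearly, but not exactly, coincide. It is an illustration only and not a substitute for the Check Geometry and Repair Geometry tools: the function names, the dictionary layout (feature ID mapped to a list of x,y vertices), and the tolerance value are all assumptions, and it compares endpoints only, so a line that stops just short of the middle of another segment is not detected.

```python
from itertools import combinations
from math import hypot
from typing import Optional

Point = tuple[float, float]
Line = list[Point]


def find_empty_shapes(features: dict[int, Optional[Line]]) -> list[int]:
    """Return IDs of features whose shape is null or has no vertices."""
    return [fid for fid, shape in features.items() if not shape]


def find_unsnapped_endpoints(
    features: dict[int, Optional[Line]], tolerance: float
) -> list[tuple[int, int, Point, Point]]:
    """Return endpoint pairs from different lines that are close but not identical."""
    endpoints = []
    for fid, shape in features.items():
        if shape:  # skip null or empty shapes
            endpoints.append((fid, shape[0]))
            endpoints.append((fid, shape[-1]))

    near_misses = []
    for (fid_a, a), (fid_b, b) in combinations(endpoints, 2):
        if fid_a == fid_b:
            continue
        distance = hypot(a[0] - b[0], a[1] - b[1])
        # A distance of exactly 0 means the endpoints are snapped; a small
        # nonzero distance suggests they were meant to connect but do not.
        if 0 < distance <= tolerance:
            near_misses.append((fid_a, fid_b, a, b))
    return near_misses
```

The tolerance is expressed in the units of the data's coordinate system, so a suitable value depends on the spatial reference of the reference data.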

Formatting the reference data

The following elements are used to format reference data:

Address elements

ArcGIS Pro comes with several predefined locator roles, each with specific requirements for the primary reference data that it can use to match addresses. You can use data from your organization or from other data providers, but either source will contain some common address elements that are required to build a locator. These address elements need to be broken into separate fields in the attribute table of the reference data. Common address elements and their descriptions are shown in the following table:

| Element | Description |
| --- | --- |
| Left house number range | A low number and a high number of the address range for the left side of the street, such as 100, 198 |
| Right house number range | A low number and a high number of the address range for the right side of the street, such as 101, 199 |
| Left parity | The odd, even, or mixed house number range value on the left side of the street, such as O (odd), E (even), or B (both) |
| Right parity | The odd, even, or mixed house number range value on the right side of the street, such as O (odd), E (even), or B (both) |
| Prefix direction | A direction that precedes the street name, such as the W in W. Redlands Blvd. |
| Prefix type | A street type that precedes the street name, such as Avenue in Avenue B |
| Street name | The name of the street, such as Cherry in Cherry Rd. |
| Suffix type | A street type that follows the street name, such as St. in New York St. |
| Suffix direction | A direction that follows the street name, such as NW in Bridge St. NW |
| Unit type | A subaddress unit type that follows the street name, such as Apt. in Gilman Ave., Apt. 17 |
| Unit name | A subaddress unit ID that follows the unit type, such as 2C in Orange St., Suite 2C |
| Level type | A floor subunit classification that follows the building name, such as Floor in Building C, Floor 3 |
| Level name | A floor subunit ID that follows the level type, such as 3 in Level 3 |
| Building name | A building subunit ID that follows the building type, such as 14 in Building 14 |
| Building type | A building subunit classification that follows the street name, such as Bldg. in Orchard Ct., Bldg. F |
| Neighborhood | A subsection of a city or district, such as Little Italy |
| City name | A city name, such as Olympia |
| County | A county name, such as San Bernardino |
| State | A state name or its abbreviation, such as Washington or WA |
| ZIP Code | The postal code used by the United States Postal Service, such as 98501 |
| ZIP Code extension | The postal code with the additional extension used by the United States Postal Service ZIP+4 system, such as 90210-3841 |
| Country | A three-letter ISO 3166-1 code for a country, such as USA |
| Language | A three-letter MARC language code representing the language of the address, such as ENG |
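As an illustration of how the parity elements relate to the house number ranges, the sketch below derives a parity code from the two ends of a range. The function name is hypothetical, and the rule it applies (both ends odd gives O, both ends even gives E, anything else gives B) is a common convention assumed here, not one stated in this topic.

```python
def range_parity(low: int, high: int) -> str:
    """Return 'O' if both ends are odd, 'E' if both are even, 'B' otherwise."""
    if low % 2 == high % 2:
        # Both ends share the same parity, so the side is all odd or all even.
        return "O" if low % 2 else "E"
    # The ends differ in parity, so the side holds mixed house numbers.
    return "B"
```

With the sample ranges from the table, `range_parity(100, 198)` returns `"E"` and `range_parity(101, 199)` returns `"O"`.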

Learn more about each locator role provided with ArcGIS Pro and the requirements for reference data

Country and language

Country is used to help handle local address patterns and formats, as well as identify where country-specific geocoding logic should be applied to the reference data when building a locator. If the primary reference data used to build a locator contains data for multiple countries, you will be able to perform a search of addresses or locations that are within all countries, or limit your search and exclude matches outside of a selection of countries.

Language is also used to help format the output label where more than one language can be used in a country to represent addresses, as well as to identify where language-specific geocoding logic should be applied to the reference data when building a locator. If language is included in the primary reference data used to build the locator, you will be able to search for addresses or locations within the same country or region using multiple languages. For example, North America is a multilingual region and each feature in the reference data is represented by a record for each language spoken, such as English, French, and Spanish. This means that you would be able to search for the same address or place in all languages represented in the data using a single locator.

Table showing records with multiple languages, country codes, and language codes

When building a locator with data for multiple countries or languages, country code and language code fields are required when selecting the <As defined in data> option.
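Because both fields are required in that case, it can help to confirm that every record carries a country code and a language code before building the locator. The sketch below is a minimal check under assumptions: the field names `COUNTRY` and `LANG` and the function name are hypothetical, and the only rule enforced is that each code consists of three ASCII letters, in the style of USA and ENG. It does not verify that a code is actually assigned in ISO 3166-1 or the MARC language list.

```python
from typing import Iterable, Mapping


def find_invalid_codes(
    rows: Iterable[Mapping[str, object]],
    country_field: str = "COUNTRY",  # hypothetical field name
    language_field: str = "LANG",    # hypothetical field name
) -> list[int]:
    """Return indexes of rows whose country or language code is not three letters."""

    def is_three_letter_code(value: object) -> bool:
        return (
            isinstance(value, str)
            and len(value) == 3
            and value.isascii()
            and value.isalpha()
        )

    return [
        index
        for index, row in enumerate(rows)
        if not (
            is_three_letter_code(row.get(country_field))
            and is_three_letter_code(row.get(language_field))
        )
    ]
```

A row with a missing field is reported the same way as a row with a malformed code, since `row.get` returns `None` for it.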

Specifying an extent for each feature (optional)

If the locator is created with predefined minimum and maximum x,y values for each feature in the reference data, those values are used as the extent that the map zooms to for the feature. The ArcGIS World Geocoding Service, for example, contains these predefined values.

The following four elements define the extent of a feature. You can create these fields and assign their values in your reference data. The values can be latitude-longitude coordinates or projected coordinates in the same spatial reference as the reference data. You can specify these fields when you create the locator.

| Element | Description |
| --- | --- |
| Xmin | Minimum x-coordinate value |
| Ymin | Minimum y-coordinate value |
| Xmax | Maximum x-coordinate value |
| Ymax | Maximum y-coordinate value |

If these fields are not specified, the default zoom scale defined by ArcGIS Pro is used.
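One way to populate the four extent fields is to compute them from each feature's vertices. The sketch below assumes the vertices are available as x,y pairs in the same spatial reference as the reference data. The function name and the `padding` parameter are hypothetical; padding is included because a point feature has a zero-size extent unless some margin is added.

```python
from typing import Sequence


def feature_extent(
    vertices: Sequence[tuple[float, float]], padding: float = 0.0
) -> dict[str, float]:
    """Return Xmin, Ymin, Xmax, and Ymax for a feature's vertices."""
    if not vertices:
        raise ValueError("cannot compute an extent for a feature with no vertices")
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    # Padding widens the extent by the same amount on every side.
    return {
        "Xmin": min(xs) - padding,
        "Ymin": min(ys) - padding,
        "Xmax": max(xs) + padding,
        "Ymax": max(ys) + padding,
    }
```

The padding value is in the units of the coordinate system, so a margin that suits projected coordinates in meters would be far too large for latitude-longitude values.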
