Reference data is one of the key elements in building a locator, because your geocoding experience is only as good as the primary reference data on which the locator is based. Errors in the reference data cause poor matching quality. For example, if the geometry of the reference features is incorrect, the addresses that are matched to them will also be spatially incorrect. If the name of a reference feature is misspelled, correctly spelled addresses may go unmatched. There are several potential errors to keep in mind regarding your reference data. They are described below.
Incomplete geometry and address attributes
The world is constantly changing and your reference data must be updated to reflect these changes to have the best geocoding experience with the locators you create. For example, if a new housing tract is added to the city street network, the additional street segments, with their associated house number ranges, street names, and other properties, need to be added. The locator created based on the street network will not find the addresses in the new housing tract until the locator is also updated.
If the address attributes are incomplete or contain errors, such as incorrect address ranges or missing street names and ZIP Codes, matching the address to those features may return unexpected results. For locators created based on the Street Address role, features containing empty street names will cause a failure when building the locator. Thus, it is essential to review and correct the address attribute errors in the reference data.
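A quick check of the attribute table before building the locator can catch these problems early. The following is a minimal sketch in plain Python, assuming each reference feature has been exported as a dictionary. The field names (STREET_NAME, FROM_LEFT, TO_LEFT, FROM_RIGHT, TO_RIGHT, and ZIP) are hypothetical and should be replaced with the fields in your own data.

```python
def find_attribute_problems(rows):
    """Return (row_index, message) pairs for suspicious address attributes.

    Field names are hypothetical placeholders, not names required by ArcGIS Pro.
    """
    problems = []
    range_pairs = [("FROM_LEFT", "TO_LEFT"), ("FROM_RIGHT", "TO_RIGHT")]
    for i, row in enumerate(rows):
        # An empty or whitespace-only street name is flagged.
        name = row.get("STREET_NAME")
        if name is None or not str(name).strip():
            problems.append((i, "empty street name"))
        # A range with only one of its two ends filled in is incomplete.
        for low_field, high_field in range_pairs:
            low, high = row.get(low_field), row.get(high_field)
            if (low is None) != (high is None):
                problems.append((i, f"incomplete range {low_field}/{high_field}"))
        if not row.get("ZIP"):
            problems.append((i, "missing ZIP Code"))
    return problems


sample_rows = [
    {"STREET_NAME": "Cherry", "FROM_LEFT": 100, "TO_LEFT": 198,
     "FROM_RIGHT": 101, "TO_RIGHT": 199, "ZIP": "98501"},
    {"STREET_NAME": "  ", "FROM_LEFT": 200, "TO_LEFT": None,
     "FROM_RIGHT": 201, "TO_RIGHT": 299, "ZIP": ""},
]
attribute_problems = find_attribute_problems(sample_rows)
```

The sketch only flags rows for review. Whether a flagged value is really an error still has to be decided against the source data.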
Spatial reference and geometry errors
The reference data, such as a street or point address feature class, is usually produced based on a specific spatial reference. The coordinate system adopted in the feature class determines how the features are georeferenced. When a locator is created, information about the spatial reference is stored in the locator. Locations of addresses geocoded against the locator will be georeferenced in the same spatial reference. It is important to make sure that the reference data contains a spatial reference.
To draw a feature on a map, the feature is required to contain a valid shape or geometry. If the shape of features in the reference data is null or empty, the locator will fail to build with the Create Locator tool. Geometry errors such as coincident line segments that are not completely snapped to a vertex, polygons with self-intersections, or incorrect ring ordering can also prevent addresses from being matched to intersections and can cause the locator to fail to build. It is helpful to run tools like Check Geometry and Repair Geometry against the reference data to check and repair geometry errors. The Planarize editing tool can also be used to modify line features if matches are not returned for intersection addresses due to invalid connectivity of lines and vertices in the data.
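To illustrate what a line that is not completely snapped looks like, the sketch below scans street endpoints for pairs that are very close together but not identical. It is plain Python under stated assumptions: each street is a list of (x, y) tuples in a projected coordinate system, the street names and the tolerance value are hypothetical, and only endpoint-to-endpoint gaps are checked. It is not a substitute for the Check Geometry and Repair Geometry tools.

```python
import math


def find_unsnapped_endpoints(lines, tolerance=0.5):
    """Return (id_a, id_b, gap) for endpoints closer than `tolerance` but not equal."""
    endpoints = []
    for line_id, coords in lines.items():
        endpoints.append((line_id, coords[0]))   # start vertex
        endpoints.append((line_id, coords[-1]))  # end vertex
    near_misses = []
    for a in range(len(endpoints)):
        for b in range(a + 1, len(endpoints)):
            id_a, pt_a = endpoints[a]
            id_b, pt_b = endpoints[b]
            if id_a == id_b:
                continue  # ignore the two ends of the same line
            gap = math.dist(pt_a, pt_b)
            # A gap of exactly 0 means the endpoints are snapped together.
            if 0 < gap <= tolerance:
                near_misses.append((id_a, id_b, round(gap, 3)))
    return near_misses


streets = {
    "Cherry Rd": [(0.0, 0.0), (100.0, 0.0)],
    "Bridge St": [(100.0, 0.0), (100.0, 80.0)],  # snapped to the end of Cherry Rd
    "Orange St": [(100.2, 0.0), (200.0, 0.0)],   # starts 0.2 units short of the junction
}
near_misses = find_unsnapped_endpoints(streets)
```

Here Orange St is reported against both Cherry Rd and Bridge St, which is the kind of gap that can keep an intersection address from matching.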
Formatting the reference data
The following elements are used to format reference data: address elements, country and language, and, optionally, an extent for each feature.
ArcGIS Pro comes with several predefined locator roles that have specific requirements for the primary reference data that they can use to match addresses. You can use data from your organization or from other data providers, but each will contain some common address elements that are required to build a locator. These address elements need to be broken into multiple fields within the attribute table of the reference data. Common address elements and their descriptions are shown in the following table:
Left house number range: A low number and a high number of the address range for the left side of the street, such as 100 and 198
Right house number range: A low number and a high number of the address range for the right side of the street, such as 101 and 199
The odd, even, or mixed house number range value on the left side of the street, such as O, E, or B
The odd, even, or mixed house number range value on the right side of the street, such as O, E, or B
A direction that precedes the street name, such as the W in W. Redlands Blvd.
A street type that precedes the street name, such as Avenue in Avenue B
The name of the street, such as Cherry in Cherry Rd.
A street type that follows the street name, such as St. in New York St.
A direction that follows the street name, such as NW in Bridge St. NW
A subaddress unit type that follows the street name, such as Apt. in Gilman Ave., Apt. 17
A subaddress unit ID that follows the street name, such as 2C in Orange St., Suite 2C
A floor subunit ID that follows the level type, such as 3 in Level 3
A floor subunit classification that follows the building name, such as Floor in Building C, Floor 3
A building subunit ID that follows the building type, such as 14 in Building 14
A building subunit classification that follows the street name, such as Bldg. in Orchard Ct., Bldg. F
A subsection of a city or district, such as Little Italy
A city name, such as Olympia
A county name, such as San Bernardino
A state name or its abbreviation, such as Washington or WA
The postal codes used by the United States Postal Service, such as 98501
ZIP Code extension: The postal codes with an additional extension used by the United States Postal Service (ZIP+4), such as 90210-3841
A three-letter ISO 3166-1 code for a country, such as USA
A three-letter MARC language code, representing the language of the address, such as ENG
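The parity values described in the table can be derived from a house number range. The following is a minimal sketch, assuming parity is inferred only from the low and high numbers of the range; the function name is hypothetical, and real data may record parity differently.

```python
def range_parity(low, high):
    """Classify a house number range as 'O' (odd), 'E' (even), or 'B' (mixed)."""
    low_is_even = low % 2 == 0
    high_is_even = high % 2 == 0
    if low_is_even and high_is_even:
        return "E"
    if not low_is_even and not high_is_even:
        return "O"
    # One end is odd and the other is even, so the range is mixed.
    return "B"


left_parity = range_parity(100, 198)   # left side range from the table
right_parity = range_parity(101, 199)  # right side range from the table
mixed_parity = range_parity(100, 199)
```

With the example ranges from the table, the left side (100 to 198) is even and the right side (101 to 199) is odd.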
Country and language
Country is used to help handle local address patterns and formats, as well as identify where country-specific geocoding logic should be applied to the reference data when building a locator. If the primary reference data used to build a locator contains data for multiple countries, you will be able to perform a search of addresses or locations that are within all countries, or limit your search and exclude matches outside of a selection of countries.
Language is also used to help format the output label where more than one language can be used in a country to represent addresses, as well as to identify where language-specific geocoding logic should be applied to the reference data when building a locator. If language is included in the primary reference data used to build the locator, you will be able to search for addresses or locations within the same country or region using multiple languages. For example, North America is a multilingual region and each feature in the reference data is represented by a record for each language spoken, such as English, French, and Spanish. This means that you would be able to search for the same address or place in all languages represented in the data using a single locator.
When building a locator with data for multiple countries or languages, country code and language code fields are required when selecting the <As defined in data> option.
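Because these fields drive country-specific and language-specific logic, it can help to confirm that every record carries usable codes before building the locator. The sketch below is plain Python with hypothetical field names (COUNTRY and LANG). It assumes that valid codes are three uppercase letters, like the USA and ENG examples in the table; it does not verify that a code actually exists in ISO 3166-1 or the MARC code list.

```python
def find_bad_codes(rows, country_field="COUNTRY", language_field="LANG"):
    """Return (row_index, field, value) for codes that are not three uppercase letters."""
    bad = []
    for i, row in enumerate(rows):
        for field in (country_field, language_field):
            code = row.get(field)
            looks_valid = (
                isinstance(code, str)
                and len(code) == 3
                and code.isalpha()
                and code.isupper()
            )
            if not looks_valid:
                bad.append((i, field, code))
    return bad


records = [
    {"COUNTRY": "USA", "LANG": "ENG"},
    {"COUNTRY": "CAN", "LANG": "fr"},   # two-letter, lowercase language code
    {"COUNTRY": None, "LANG": "SPA"},   # missing country code
]
bad_codes = find_bad_codes(records)
```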
Specifying an extent for each feature (optional)
If a locator is created with predefined x,y minimums and maximums for each feature from the reference data, these values will be used as the extent that the map zooms to for the feature. The ArcGIS World Geocoding Service, for example, contains these predefined values.
The following four elements define the extent of the feature. You can create these fields and assign the values in your reference data. They can be in latitude-longitude coordinates or projected values that are in the same spatial reference as the reference data. You can specify these fields when you create the locator.
Minimum x-coordinate value
Minimum y-coordinate value
Maximum x-coordinate value
Maximum y-coordinate value
If these fields are not specified, the default zoom scale defined by ArcGIS Pro is used.
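The four extent values can be computed from a feature's vertices. The following is a minimal sketch in plain Python, assuming the vertices are (x, y) tuples in the same spatial reference as the reference data. The output keys (XMIN, YMIN, XMAX, YMAX) and the optional padding parameter are hypothetical and only illustrate the idea.

```python
def feature_extent(coords, padding=0.0):
    """Return the bounding extent of a list of (x, y) vertices, optionally padded."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return {
        "XMIN": min(xs) - padding,
        "YMIN": min(ys) - padding,
        "XMAX": max(xs) + padding,
        "YMAX": max(ys) + padding,
    }


vertices = [(2.0, 5.0), (8.0, 1.0), (4.0, 9.0)]
extent = feature_extent(vertices)
padded_extent = feature_extent(vertices, padding=1.0)
```

A small padding value can be useful for point features, where the minimum and maximum values would otherwise be identical.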