Available with LocateXT license.
The ArcGIS LocateXT extension allows you to use the Extract Locations pane to search unstructured data for spatial locations and generate point features representing those locations.
Unstructured data is any text or document including, but not limited to, web pages, reports, emails, and social media content. Microsoft Office documents (Word, PowerPoint, and Excel), Adobe PDF documents, text files, and so on can all be processed. The Extract Locations pane can process many folders and files at a time, or scan an entire disk. You can also drag text from an email or a web page onto the pane to be analyzed.
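The following Python sketch shows what gathering candidate documents from several folders before scanning them could look like. It does not describe how LocateXT enumerates files. The `SUPPORTED_SUFFIXES` set and the `is_supported` and `find_documents` helpers are hypothetical, and the extension list is only an assumption based on the document types named above.

```python
from pathlib import Path

# Assumed extensions for the document types named above (Word, PowerPoint,
# Excel, PDF, plain text); the pane's actual list of supported formats may differ.
SUPPORTED_SUFFIXES = {".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx", ".pdf", ".txt"}


def is_supported(path):
    """Return True if the file extension is in the assumed list (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def find_documents(*roots):
    """Yield every supported file found recursively under each root folder."""
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file() and is_supported(path):
                yield path
```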
Each point in the output feature class has an attribute identifying the file in which the spatial location was found. Text surrounding the spatial location is extracted from the original document and stored in attributes to provide context for the location. Dates and keywords associated with the location can also be extracted. The Extract Locations pane does not automatically recognize text that represents an address as a spatial location and, therefore, cannot use a locator to produce a point for that location.
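As a rough illustration of how context can accompany each location, the sketch below records the source file, the matched text, and a fixed number of characters on either side of each match. The `LocationHit` record, its field names, the `hits_with_context` function, and the 40-character default `width` are assumptions for illustration. They do not reflect the actual fields or context length that LocateXT writes.

```python
import re
from dataclasses import dataclass


@dataclass
class LocationHit:
    source_file: str   # file in which the location text was found
    matched_text: str  # the text recognized as a location
    context: str       # surrounding text kept to explain the location


def hits_with_context(text, pattern, source_file, width=40):
    """Return a LocationHit per regex match, with `width` characters of context on each side."""
    hits = []
    for m in re.finditer(pattern, text):
        # Clamp the context window to the bounds of the text.
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        hits.append(LocationHit(source_file, m.group(0), text[start:end]))
    return hits
```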
The capabilities provided in the Extract Locations pane are also available using the Extract Locations From Document and Extract Locations From Text geoprocessing tools.
Free-form text example
If you are reviewing news articles about earthquakes in Alaska and want to map each location mentioned in an article, you can copy the sample input text below and paste it directly into the pane.
Alaska averages 100 earthquakes a day. The tectonics of the region are dominated by the interaction of the Pacific and North American plates. This interaction has accounted for three of the largest recorded earthquakes in history. The largest, measuring 9.2 on the Richter scale, occurred in the Prince William Sound (60.91°N, 147.34°W) on March 28th, 1964. The second largest Alaskan earthquake, measuring 8.7, occurred on February 4th, 1965, near the Rat Islands (51.25°N, 178.72°E). The third, measuring 8.6, occurred on March 9th, 1957, near the Andreanof Islands (51.50°N, 175.63°W).
Once the locations of the three earthquakes have been extracted from the input text, the output feature class appears in the Contents pane, and the points are visible in the active map.
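To show what recognizing these coordinates involves, here is a minimal Python sketch that handles only the decimal-degree format used in the sample (for example, 60.91°N, 147.34°W). It returns (longitude, latitude) pairs, with south and west as negative values. The `COORD_PAIR` pattern and `extract_points` function are hypothetical; LocateXT recognizes many more coordinate formats than this single one.

```python
import re

# Matches pairs such as "60.91°N, 147.34°W": decimal degrees, a degree sign,
# and a hemisphere letter. Only this one format is handled in the sketch.
COORD_PAIR = re.compile(
    r"(\d{1,2}(?:\.\d+)?)°\s*([NS])\s*,\s*(\d{1,3}(?:\.\d+)?)°\s*([EW])"
)


def extract_points(text):
    """Return (x, y) = (longitude, latitude) for each coordinate pair in text."""
    points = []
    for lat, ns, lon, ew in COORD_PAIR.findall(text):
        y = float(lat) * (-1 if ns == "S" else 1)  # south is negative
        x = float(lon) * (-1 if ew == "W" else 1)  # west is negative
        points.append((x, y))
    return points
```

Applied to the sample paragraph, the magnitudes such as 9.2 are ignored because they are not followed by a degree sign and hemisphere letter.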
By default, dates in the input text that fall within a relatively recent time period are also extracted and recorded in the attribute table of the output feature class. The dates in the input text above fall outside this default range, so they are not recorded.
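The sketch below shows one way date extraction with a date window could work. It recognizes dates written like March 28th, 1964 and keeps only those between `earliest` and `latest`. The pattern, the `extract_dates` name, and the window values are hypothetical. The actual default range is not stated here, so the window is left as a parameter.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Matches English dates written like "March 28th, 1964" or "March 9, 1957".
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2})(?:st|nd|rd|th)?,\s*(\d{4})\b"
)


def extract_dates(text, earliest, latest):
    """Return dates found in text that fall within [earliest, latest]."""
    found = []
    for month, day, year in DATE_RE.findall(text):
        try:
            d = date(int(year), MONTHS[month], int(day))
        except ValueError:  # skip impossible dates such as February 30
            continue
        if earliest <= d <= latest:
            found.append(d)
    return found
```

With a window that starts in 2000, the sample's 1957 to 1965 dates produce an empty list, which mirrors the default behavior described above.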
By default, some international coordinate and date formats are not recognized when they occur in the input text. These include coordinates whose direction abbreviations are written in a language other than English, and dates that are written in a language other than English rather than in an ISO format. You can customize the settings to recognize coordinates and dates in other languages, either instead of or in addition to English.
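The following sketch shows why the language of a direction abbreviation matters: O means west in French (ouest) and Spanish (oeste) but east in German (Ost). The per-language tables and the `signed_degrees` helper are hypothetical and do not reflect how LocateXT stores its settings. Passing several language codes mimics recognizing other languages in addition to English.

```python
# Hypothetical per-language tables mapping a hemisphere abbreviation to the
# sign applied to the coordinate value. "O" is west in French (ouest) and
# Spanish (oeste) but east in German (Ost).
HEMISPHERE_SIGNS = {
    "en": {"N": 1, "S": -1, "E": 1, "W": -1},
    "fr": {"N": 1, "S": -1, "E": 1, "O": -1},
    "es": {"N": 1, "S": -1, "E": 1, "O": -1},
    "de": {"N": 1, "S": -1, "O": 1, "W": -1},
}


def signed_degrees(value, abbreviation, languages=("en",)):
    """Apply the sign for `abbreviation`, trying each language in order."""
    for language in languages:
        sign = HEMISPHERE_SIGNS[language].get(abbreviation)
        if sign is not None:
            return sign * value
    raise ValueError(f"{abbreviation!r} is not a hemisphere abbreviation in {languages}")
```

Because the first language that defines an abbreviation wins, enabling French and German together would make O ambiguous; the sketch simply takes the first match, which is one reason the language choice is a setting rather than automatic.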
Semistructured text example
In addition to recognizing spatial coordinates, custom locations can be defined that associate a place with a spatial coordinate. For example, if the word Portland is found anywhere in a document, a point representing a location in the city can be associated with that word. Similarly, if an airport code is found, a point representing the airport's location can be associated with that code.
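A custom location can be sketched as a lookup table from a term to a point. In the sketch below, `CUSTOM_LOCATIONS` and `match_custom_locations` are hypothetical, and the coordinates for Portland, Oregon, and Portland International Airport (PDX) are approximate. Matching is case-sensitive on whole words, so, for example, Portlandia is not matched.

```python
import re

# Hypothetical custom-location table: term -> (longitude, latitude).
# Coordinates are approximate and for illustration only.
CUSTOM_LOCATIONS = {
    "Portland": (-122.68, 45.52),
    "PDX": (-122.60, 45.59),
}


def match_custom_locations(text, table=CUSTOM_LOCATIONS):
    """Return (offset, term, point) for each whole-word occurrence, in text order."""
    hits = []
    for term, point in table.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            hits.append((m.start(), term, point))
    return sorted(hits)
```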
Some documents have a certain amount of structure. If you have a folder full of travel forms, information can be extracted from them and stored in custom attributes in the output feature class's attribute table. For example, with a document containing the input text below, custom attributes can be defined that extract text following the labels Name:, Address:, and Purpose of travel:. Later, the attributes can be processed using other tools available in ArcGIS Pro.
Name: Doe, Jane
Address: 380 New York St, Redlands, CA, 92373
Purpose of travel: Meet with the team at the Esri R&D Center at 309 SW 6th Ave #600, Portland, OR, 97204.
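Extracting text that follows a label can be approximated with one regular expression per label that captures everything to the end of that line. The `LABEL_FIELDS` mapping, its field names, and `extract_label_attributes` are hypothetical; a label that is not found produces `None`.

```python
import re

# Hypothetical label-to-field mapping modeled on the travel form above.
LABEL_FIELDS = {
    "Name:": "name",
    "Address:": "address",
    "Purpose of travel:": "purpose",
}


def extract_label_attributes(text, labels=LABEL_FIELDS):
    """Capture the text after each label up to the end of its line."""
    attributes = {}
    for label, field in labels.items():
        # ^ and $ match at line boundaries because of re.MULTILINE,
        # and "." does not cross a newline, so capture stops at line end.
        m = re.search(r"^\s*" + re.escape(label) + r"[ \t]*(.*)$", text, re.MULTILINE)
        attributes[field] = m.group(1).strip() if m else None
    return attributes
```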
Once the custom locations in Portland have been extracted from the input text, the output feature class appears in the Contents pane, and the points are visible in the active map. Click features to explore the information extracted from the document. For this example, dates are extracted and stored in the attribute table. Custom attributes are also used to extract text from the end of a label to the end of the line and store that content in fields representing the name and address of the person traveling and the reason for the trip. An additional custom attribute is used to locate any keywords that exist in the document and store them in another field.
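Keyword detection can be sketched as a whole-word, case-insensitive search for each term in a list, with the matches joined into a single field value. The `find_keywords` and `keyword_field` helpers, the `; ` separator, and the keyword lists are assumptions for illustration.

```python
import re


def find_keywords(text, keywords):
    """Return each keyword found in text (whole word, case-insensitive), in list order.

    The \\b boundaries assume each keyword starts and ends with a letter or digit.
    """
    found = []
    for keyword in keywords:
        if re.search(r"\b" + re.escape(keyword) + r"\b", text, re.IGNORECASE):
            found.append(keyword)
    return found


def keyword_field(text, keywords, sep="; "):
    """Join the keywords found in text into a single attribute value."""
    return sep.join(find_keywords(text, keywords))
```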
The contents of the Address field in the output feature class's attribute table are not geocoded automatically, but they can be geocoded using other tools. Similarly, the address within the free-form text describing the purpose of the trip is not automatically identified as an address or geocoded.
If you have structured text data, such as a comma-delimited text file, where the x- and y-coordinates are stored in separate columns of the table, use the XY Table to Point tool to create point features representing these locations.
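For contrast with the extraction examples, reading structured coordinates requires no text recognition at all, because each value already sits in a named column. The sketch below parses comma-delimited text into (x, y) pairs using only the Python standard library. The column names `x` and `y` and the `rows_to_points` helper are hypothetical. In ArcGIS Pro, the XY Table To Point tool (`arcpy.management.XYTableToPoint` in Python) performs the actual conversion and writes a feature class.

```python
import csv
import io


def rows_to_points(csv_text, x_field="x", y_field="y"):
    """Read comma-delimited text and return an (x, y) tuple of floats per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(row[x_field]), float(row[y_field])) for row in reader]
```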