Extract locations from documents and text

Available with LocateXT license.

As part of the ArcGIS LocateXT extension, the Extract Locations pane in ArcGIS Pro allows you to scan documents and text for spatial coordinates and custom locations. Before you begin, open the map to which you want to add the locations that are found. Points representing the locations are stored in a feature class, which is added as a layer to the active map.

Open the Extract Locations pane

A map must be active in ArcGIS Pro to open the Extract Locations pane.

  1. Create or open a map. For example, on the Insert tab, in the Project group, click New Map.
  2. On the Map tab, in the Layer group, click the Add Data drop-down menu and click Extract Locations.

Extract locations

In the Extract Locations pane, the Extract tab allows you to specify the following:

  • The files, folders, or text that will be scanned for locations
  • The name of the map layer and output feature class that will be created or updated
  • The coordinate system of the output feature class, when one is created

Each time you extract locations from documents or text, you can choose whether to create a new feature class and add a new layer to the active map, update an existing map layer and its feature class, or overwrite an existing feature class.

Add a new layer to the map

A feature class is created to store the extracted locations. A map layer is created in the active map to display the contents of the feature class.

  1. Open the Extract Locations pane.
  2. Provide a name for the new map layer and feature class using one of the following methods:
    • Type a name for the new map layer and feature class in the Name combo box. A new feature class with this name is created in the project's default geodatabase.
    • Click the Browse button. In the New Feature Class dialog box, browse to the location where you want to create a new feature class or shapefile, type a name for the new item in the Name text box, and click Save.
      Caution:

      If you select an existing feature class instead of providing a name for a new feature class, a warning will appear in the Extract Locations pane. The existing feature class will be deleted and a new feature class with the same name will be created. Other maps may be affected.

  3. Click the Coordinate System drop-down list or the Select coordinate system button, and click the coordinate system you want to use for the output feature class.

    The coordinate system of the input features is specified independently on the Coordinates tab and in the custom locations file. The locations that are found are transformed to the output feature class's coordinate system.

  4. Click the Files and Folders tab and specify the items to scan for locations.
    • Drag files and folders from Windows Explorer onto the tab.
    • Click Browse. On the Add Files and Folders dialog box, browse to and select the appropriate files or folders and click OK. Click Add More to add files and folders to the list.
  5. Click the Text tab and specify any text to scan for locations.
    • Copy text from a document, email, or web page, and paste it on the tab.
    • Select the text to scan in a document, email, or web page. Drag it to ArcGIS Pro and onto the tab.
  6. Confirm that at least one file, folder, or block of text is specified as input. Files, folders, and text can all be scanned at the same time.
  7. Click Extract.
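Step 3 notes that the locations that are found are transformed to the output feature class's coordinate system. LocateXT performs that transformation for you; as an illustration only, the following Python sketch projects a WGS84 longitude-latitude pair, such as a decimal degrees coordinate found in a document, to Web Mercator (EPSG:3857), assuming that were the chosen output coordinate system. The function name is hypothetical, and the spherical formula is the standard Web Mercator projection, not necessarily what the ArcGIS Pro projection engine does internally.

```python
import math

# Illustrative sketch only: LocateXT transforms locations itself.
# Web Mercator (EPSG:3857) is assumed here as an example output system.
EARTH_RADIUS_M = 6378137.0      # WGS84 semi-major axis used by Web Mercator
MAX_LAT = 85.0511287798         # Web Mercator is undefined beyond this latitude


def wgs84_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Project a WGS84 longitude/latitude pair to Web Mercator x/y in meters."""
    if abs(lat_deg) > MAX_LAT:
        raise ValueError("latitude is outside the Web Mercator range")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# The 52.825°N, 169.944°W coordinate used as an example in this topic.
x, y = wgs84_to_web_mercator(-169.944, 52.825)
```

Western longitudes become negative x values and northern latitudes become positive y values.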

You can cancel the process at any time, if necessary. When the process is complete, a message appears at the bottom of the pane indicating whether it was successful.

The specified feature class is created and locations that are found are stored in the feature class as points. A map layer referencing the feature class is added to the active map. If no locations are found in the documents and text, the feature class and map layer will be empty.

Note:

If you chose to overwrite an existing feature class that was previously added to the map, a new map layer that references the new feature class is created and added to the map.

To extract locations from a different set of documents, or from text captured from a different source, click Clear All Input at the bottom of the Extract tab. All files and folders are removed from the list on the Files and Folders tab, and all text is removed from the Text tab. Then specify a new set of items to process.

Update an existing layer in the map

You can progressively add locations to an existing feature class. For example, every week you can process a new set of reports and add locations from those files to the existing set. Or, after processing a sample set of documents, when you are satisfied with the results, you can process additional documents and add those locations to the existing feature class.

  1. Open the Extract Locations pane.
  2. Click the Name drop-down list and click the existing map layer that will be updated.

    Locations extracted from the documents and text will be added to the existing feature class referenced by the map layer. The controls used to specify the coordinate system of the output feature class will be disabled.

  3. Click the Files and Folders tab and specify the items to scan for locations.
  4. Click the Text tab and specify any text to scan for locations.
  5. Click Extract.

    The Field Matching panel appears in the Extract Locations pane.

  6. Specify which field in the existing layer's attribute table stores each type of information extracted from the documents and text. The full set of fields that can be populated in the output feature class is described below.
  7. If no field in the existing feature class can store extracted information that you want to keep, click Back and select a different output layer, or create a new layer instead.
  8. When you are satisfied with the match between the existing layer's fields and the fields of information that are extracted from the documents and text, click OK.
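The Field Matching panel pairs each type of extracted information with a field in the existing layer. The following Python sketch illustrates that pairing under stated assumptions: the mapping dictionary, the function name, and the existing-layer fields LOC_TEXT and SOURCE are hypothetical, and the sample values are made up. Extracted information whose field is left unmatched has nowhere to be stored, so it is dropped.

```python
from typing import Optional

# Hypothetical illustration of field matching; not the LocateXT implementation.
# Keys are default output field names from this topic; values are fields in
# an assumed existing layer (None means "no matching field").


def apply_field_matching(extracted: dict[str, object],
                         matching: dict[str, Optional[str]]) -> dict[str, object]:
    """Return a row keyed by the existing layer's field names."""
    row: dict[str, object] = {}
    for source_field, value in extracted.items():
        target_field = matching.get(source_field)
        if target_field is not None:   # unmatched information is not stored
            row[target_field] = value
    return row


extracted = {
    "Ext_Text": "52.825°N, 169.944°W",
    "Ext_Type": "DD",
    "Filename": r"C:\Reports\week_12.docx",   # made-up path
}
matching = {"Ext_Text": "LOC_TEXT", "Ext_Type": None, "Filename": "SOURCE"}
row = apply_field_matching(extracted, matching)
```

In this sketch, the Ext_Type value is lost because the existing layer has no field for it; that is the situation step 7 describes.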

You can cancel the process at any time, if necessary. When the process is complete, a message appears at the bottom of the pane indicating whether it was successful.

If locations are found when the documents and text are scanned, those locations are added to the specified feature class. The existing map layer and its attribute table are updated to show the new locations.

Review the extracted locations

After documents and text have been scanned and the output feature class has been created, the output map layer is added to the map and selected in the Contents pane. Click a location that was found to learn more about it. The pop-up window shows the location that was extracted, which document it was extracted from, and information extracted from the document around the location that provides context. Open the layer's attribute table to compare the full range of locations that were found. As you assess the data, you may want to delete locations beyond your current scope, or export a subset of locations that represent your primary interest.
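The context shown in the pop-up comes from the text on either side of each location, which is stored in the Pre_Text and Post_Text fields. A minimal Python sketch of taking such excerpts, assuming a simple fixed-width window equal to the default 254-character field size; the function name is hypothetical, and LocateXT's actual excerpt rules may differ.

```python
# Minimal sketch, assuming a fixed-width window on each side of the match.


def context_excerpts(text: str, start: int, end: int,
                     size: int = 254) -> tuple[str, str, str]:
    """Return (pre_text, ext_text, post_text) around text[start:end]."""
    pre_text = text[max(0, start - size):start]   # up to `size` chars before
    post_text = text[end:end + size]              # up to `size` chars after
    return pre_text, text[start:end], post_text


report = "The vessel was last seen at 52.825°N, 169.944°W before the storm."
location = "52.825°N, 169.944°W"
start = report.index(location)
pre, ext, post = context_excerpts(report, start, start + len(location), size=20)
```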

The Extract Locations pane uses various default settings designed to recognize the most common locations. When you have a better understanding of the locations present in your data, you can adjust those settings on the Properties tab to extract additional locations or more focused information in the output fields.

Learn about the settings used to extract locations and attributes

Output field definitions

When a new output feature class is created to store the extracted locations, the feature class will have the following default fields and any additional fields defined by a custom attributes file.

Learn about custom attributes files

Each field is listed with its alias and data type, followed by a description.

  • Name (alias: Name; Text, 50 characters by default): The name of the file that was processed, or Text to indicate that text was processed. The size is controlled by settings on the Output tab.
  • Pre_Text (alias: Pre-Text; Text, 254 characters by default): An excerpt of the file or text preceding the location that was found. The size is controlled by settings on the Output tab.
  • Ext_Text (alias: Extracted Text; Text, 120 characters by default): The location that was found, exactly as it appears in the file or text. For example, 52.825°N, 169.944°W for a spatial coordinate, or LAX for a custom location that associates an airport code with a spatial coordinate. The size is controlled by settings on the Output tab.
  • Ext_Type (alias: Extracted Type; Text, 50 characters by default): The type of location that was found, for example, a decimal degrees (DD) coordinate. When a custom location is found, the location defined in the custom locations file that was matched is recorded. The size is controlled by settings on the Output tab.
  • Post_Text (alias: Post-Text; Text, 254 characters by default): An excerpt of the file or text following the location that was found. The size is controlled by settings on the Output tab.
  • Precision (alias: Precision (m); Long): For spatial coordinates, the precision on the ground, in meters, to which the location is accurate. For example, a decimal degrees coordinate with many decimal places is more precise and has a smaller value.

    For custom locations, the number of letters that did not match when the original text was compared to the matched location. When fuzzy matching is disabled, an exact match is required and the value is 0. When fuzzy matching is enabled and the misspelled location Redalnds is matched to Redlands, the value is 2.

  • Std_Coord (alias: Stand. Coord.; Text, 30 characters): A standardized version of the extracted location, for example, 52.825000N 169.944000W. The format of this coordinate is controlled by settings on the Output tab.
  • First_Date (alias: First Date; Date): The first date found in the file or text, if dates are extracted. Otherwise, the field contains null values. Dates are only extracted if they fall within the range specified on the Output tab, the date is not set to be skipped, and the limit on the number of dates extracted has not been reached.
  • Early_Date (alias: Earliest Date; Date): The oldest date found in the file or text, if dates are extracted. Otherwise, the field contains null values. The same extraction conditions as for First_Date apply.
  • Late_Date (alias: Latest Date; Date): The most recent date found in the file or text, if dates are extracted. Otherwise, the field contains null values. The same extraction conditions as for First_Date apply.
  • All_Dates (alias: All Dates; Text, 254 characters by default): A comma-delimited list of all dates found in the file or text, if dates are extracted. Otherwise, the field contains null values. All dates are standardized in yyyy-mm-dd format, and the same extraction conditions as for First_Date apply. If the list is too large for this field's size, it is truncated. The size is controlled by settings on the Output tab.
  • ExDateText (alias: Extracted Date Text; Text, 254 characters by default): The dates that were found, exactly as they appear in the file or text, for example, August 18, 2019 or 2/3/2020. If the comma-delimited list of dates is too large for this field's size, it is truncated. The size is controlled by settings on the Output tab.
  • Filename (alias: Filename; Text, 254 characters by default): The full path to the file that was processed, or a null value if text was processed. You can choose which files to process or skip. The size is controlled by settings on the Output tab.
  • File_Type (alias: File Type; Text, 10 characters by default): The format of the file that was processed, or a null value if text was processed. You can choose to process specific file types. The size is controlled by settings on the Output tab.
  • Modified (alias: Modified (UTC); Text, 20 characters): The date and time when the file was last modified, in yyyy-mm-dd hh:mm:ss format.
  • Scanned (alias: Scanned (UTC); Text, 20 characters): The date and time when the file was processed, in yyyy-mm-dd hh:mm:ss format.
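The Precision field described above has two meanings, and both can be approximated in a short Python sketch. For decimal degrees, the sketch assumes that one degree of latitude spans about 111,320 meters and that each additional decimal place divides the ground distance by 10; this is a rough rule of thumb, since a degree of longitude shrinks away from the equator. For custom locations, the sketch uses the Levenshtein edit distance as a stand-in for "letters that did not match" because it reproduces the Redalnds example. Neither function is confirmed to be how LocateXT computes the value, and both names are hypothetical.

```python
# Rough approximation: ground distance of one degree of latitude, in meters.
METERS_PER_DEGREE = 111_320


def dd_precision_m(coordinate_value: str) -> int:
    """Approximate precision, in meters, of a decimal degrees value like "52.825"."""
    decimals = len(coordinate_value.split(".", 1)[1]) if "." in coordinate_value else 0
    return round(METERS_PER_DEGREE / 10 ** decimals)


def mismatch_count(found: str, matched: str) -> int:
    """Levenshtein edit distance between the text found and the matched name."""
    previous = list(range(len(matched) + 1))
    for i, a in enumerate(found, start=1):
        current = [i]
        for j, b in enumerate(matched, start=1):
            current.append(min(previous[j] + 1,              # delete a
                               current[j - 1] + 1,           # insert b
                               previous[j - 1] + (a != b)))  # substitute
        previous = current
    return previous[-1]


print(dd_precision_m("52.825"), mismatch_count("Redalnds", "Redlands"))  # prints: 111 2
```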

Evaluate results

The first time you scan a document, you may not get the locations you expect. In addition to the output map layer and feature class, two log files can be created: a scan log and an invalid coordinates log. If you provide as input a document whose content you know, and the number of locations in the output feature class does not match the number you expected, the log files can help you assess the results.

After documents and text have been scanned and the output feature class has been created, a message appears at the bottom of the Extract Locations pane indicating that the process completed successfully. The message includes links to the log files, which are temporary. To keep them for future review, open the files and save them to a permanent location, such as the project's home folder. Consider including in each file name the name of the map layer or feature class with which the log file is associated.
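Saving the logs is a manual step in ArcGIS Pro. If you prefer to script it, a minimal Python sketch using only the standard library follows; it assumes you already know the temporary log's path and the project's home folder. The function name, the scan_log.txt file name, and the log text are hypothetical placeholders, and temporary folders stand in for the real locations so the example runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def keep_log(temp_log: Path, home_folder: Path, layer_name: str) -> Path:
    """Copy a temporary log file to a permanent folder.

    The copy is prefixed with the layer name so that it can be traced back
    to the output it describes.
    """
    temp_log, home_folder = Path(temp_log), Path(home_folder)
    target = home_folder / f"{layer_name}_{temp_log.name}"
    home_folder.mkdir(parents=True, exist_ok=True)
    shutil.copy2(temp_log, target)  # copy2 also preserves file timestamps
    return target


# Demonstration with temporary folders and placeholder log text.
with tempfile.TemporaryDirectory() as work:
    log = Path(work) / "scan_log.txt"
    log.write_text("placeholder log text\n")
    saved = keep_log(log, Path(work) / "home", "Weekly_Reports")
    saved_name = saved.name
    saved_text = saved.read_text()
```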

Scan log

Click the View scan log link within the message at the bottom of the Extract Locations pane to open the scan log file. For each document that is scanned, the log indicates the following information:

  • The document's file name and its location on the local or network computer
  • A message indicating that a problem was encountered when scanning the document, if applicable
  • How many potential locations were found
  • How many unique dates were found

A potential location is text found in the document's content that resembles a spatial coordinate or a custom location. When text is provided as input, a file name and location are not provided in the scan log but the rest of the information in the log file is the same.

If you expected nine locations to be extracted but only six were created in the output, the scan log can help you understand what happened. The log might indicate that only six potential locations were found based on your current settings in the Extract Locations pane. It might also indicate that more dates were found than expected; for example, a coordinate might have been interpreted as a date. Try adjusting your settings before attempting to extract locations from the document again.
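The scan log reports how many unique dates were found, and the date fields described in Output field definitions (First_Date, Early_Date, Late_Date, and All_Dates) summarize those dates. The following Python sketch derives comparable values from dates listed in the order they were found. It assumes that the range filter is inclusive, that All_Dates lists each distinct date once, separated by commas without spaces, and that truncation simply cuts the string at the field size; none of these details is confirmed by this topic, and the function name is hypothetical.

```python
from datetime import date


def summarize_dates(found: list[date], start: date = date.min,
                    end: date = date.max, size: int = 254) -> dict:
    """Derive date-field values from dates in the order they were found."""
    kept = [d for d in found if start <= d <= end]   # apply the date range
    if not kept:
        return {"unique_dates": 0, "First_Date": None, "Early_Date": None,
                "Late_Date": None, "All_Dates": None}
    distinct = list(dict.fromkeys(kept))  # drop repeats, keep first-seen order
    all_dates = ",".join(d.isoformat() for d in distinct)  # yyyy-mm-dd
    return {
        "unique_dates": len(distinct),   # comparable to the scan log count
        "First_Date": kept[0],
        "Early_Date": min(kept),
        "Late_Date": max(kept),
        "All_Dates": all_dates[:size],   # truncated if the list is too long
    }


# 2/3/2020 (read here as February 3) and August 18, 2019, then a repeat.
summary = summarize_dates([date(2020, 2, 3), date(2019, 8, 18), date(2020, 2, 3)])
```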

Invalid coordinates log

An invalid coordinates log is created if a potential location was evaluated and found to be invalid. Click View bad coordinates log to open it.

The invalid coordinates log indicates:

  • The document in which the potential location was found
  • The original text that was determined to be a potential location
  • The coordinate format that was used to evaluate the location

For example, if a latitude and longitude coordinate was found but the latitude of the coordinate is greater than 90 degrees, the coordinate is considered invalid. You may find the potential locations in the document were evaluated using a different coordinate format than you expected. Try adjusting your settings before attempting to extract locations from the document again.
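The evaluation described above can be sketched in Python for a single decimal degrees style. The regular expression recognizes only text like 52.825°N, 169.944°W, whereas LocateXT supports many coordinate formats. The sketch treats a latitude greater than 90 degrees as invalid, as stated above, and also rejects a longitude greater than 180 degrees, which is an added assumption. Valid coordinates are written in the standardized form shown for the Std_Coord field, and invalid ones produce a tab-separated entry holding the document, the original text, and the format used. The function name, the log-entry layout, and report.txt are hypothetical.

```python
import re

# Simplified pattern for one DD style only, such as "52.825°N, 169.944°W".
DD_PATTERN = re.compile(
    r"(?P<lat>\d{1,3}(?:\.\d+)?)°\s*(?P<ns>[NS]),?\s*"
    r"(?P<lon>\d{1,3}(?:\.\d+)?)°\s*(?P<ew>[EW])"
)


def evaluate(text: str, source: str = "Text") -> tuple[list[str], list[str]]:
    """Return standardized valid coordinates and invalid-log entries."""
    valid: list[str] = []
    invalid: list[str] = []
    for match in DD_PATTERN.finditer(text):
        lat, lon = float(match["lat"]), float(match["lon"])
        if lat > 90 or lon > 180:
            # Document, original text, and the format used to evaluate it.
            invalid.append(f"{source}\t{match.group(0)}\tDD")
            continue
        # Standardized form, for example 52.825000N 169.944000W.
        valid.append(f"{lat:.6f}{match['ns']} {lon:.6f}{match['ew']}")
    return valid, invalid


valid, invalid = evaluate(
    "Sighted at 52.825°N, 169.944°W; later reported at 95.5°N, 10.0°E.",
    source="report.txt",
)
```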

If you do not find the invalid coordinates log helpful, you can stop recording invalid coordinates by unchecking the Log invalid coordinates option on the Coordinates tab for each spatial coordinate format you are using.
