
Extract Locations From Document

Available with LocateXT license.

Summary

Searches documents and files for spatial information to extract into a feature class.

This tool allows you to do the following:

  • Identify and extract variations of the Decimal Degrees, Degrees Decimal Minutes, Degrees Minutes Seconds, Universal Transverse Mercator, and Military Grid Reference System coordinate formats.
  • Configure place name extraction using geospatial layers or gazetteer files.
  • Extract the text that appears immediately before and after each identified location (pretext and posttext) for each output feature.
  • Create custom extracted attributes by configuring keyword search and extraction controls.
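As a rough illustration of pretext and posttext, the following sketch finds Decimal Degrees latitude-longitude pairs written like 33.8N 77.035W and captures a fixed number of characters on each side of every match. The regular expression and the extract_with_context function are hypothetical and much simpler than the tool's own recognizers (for example, prefixed forms such as W77N38.88909 are not handled). The 254-character defaults mirror the documented default of the pre_text_length and post_text_length parameters.

```python
import re

# Hypothetical, simplified pattern for Decimal Degrees latitude-longitude text
# such as "33.8N 77.035W". This is NOT the pattern used by LocateXT.
DD_LATLON = re.compile(r"\b(\d{1,2}(?:\.\d+)?)([NS])\s+(\d{1,3}(?:\.\d+)?)([EW])\b")


def extract_with_context(text, pre_len=254, post_len=254):
    """Return one record per match, with up to pre_len/post_len characters of context."""
    records = []
    for match in DD_LATLON.finditer(text):
        lat = float(match.group(1))
        lon = float(match.group(3))
        # Southern latitudes and western longitudes are negative in decimal degrees
        if match.group(2) == "S":
            lat = -lat
        if match.group(4) == "W":
            lon = -lon
        records.append({
            "lat": lat,
            "lon": lon,
            # Text captured before the match (pretext) and after it (posttext)
            "pretext": text[max(0, match.start() - pre_len):match.start()],
            "posttext": text[match.end():match.end() + post_len],
        })
    return records


sample = "Well A was drilled at 33.8N 77.035W in May."
for rec in extract_with_context(sample, pre_len=10, post_len=7):
    print(rec)
```

Shortening pre_len and post_len in the usage line simply truncates the captured context, which is the same trade-off the pretext and posttext length parameters control.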

This tool supports all Microsoft Office documents (Word, PowerPoint, and Excel); Adobe PDF, XML, and HTML formats; and social media text.

Usage

  • The parameter default values are designed to optimize the identification of coordinates and dates. Default values can be modified for each parameter. The fewer parameters that are modified, the faster the tool will run.

  • If the tool is rerun with the same output feature class name, the existing feature class will be overwritten. To change this behavior, click the Project tab and click Options. On the Options dialog box, click the Geoprocessing tab and uncheck the Allow geoprocessing tools to overwrite existing datasets check box. When this option is unchecked, a number will be appended to the end of the name to make the feature class name unique (for example, streams_1).

  • All coordinate formats are on by default. If you'll be using custom locations only, turn off the coordinate format parameters.

  • If a Microsoft Office file or a PDF is used as the input and the output feature class contains no features, verify that the correct IFilters are installed for these file types. If the tool does not recognize the file, it will be treated as a text file.

  • As a best practice, keep custom locations files as small as possible. Custom locations files can be edited to optimize specific workflows. For more information about these file types, see Extract locations.

  • Minimize the use of fuzzy match (approximate string matching), and add multiple name and spelling variations to a custom locations file if possible. While fuzzy match will find misspellings and variations, it can create false positives.

  • A useful workflow for fuzzy match is to first run the tool without fuzzy match, then run it again with fuzzy match to find additional place names. The two outputs can then be compared to determine the best fuzzy results. This comparison helps identify spelling variations that can be added to a custom locations file.

  • If you're using a custom location list, the column selected for locations can be a column of place names or terms instead of coordinates.
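The two-run fuzzy match workflow described above can be sketched in plain Python. The sketch uses difflib from the Python standard library as a stand-in similarity measure; this is an assumption for illustration only, since the matching algorithm LocateXT uses is not described here. The fuzzy_candidates and scan helpers, the three-name gazetteer, and the token list are all hypothetical. Comparing the exact run with the fuzzy run surfaces both a genuine misspelling (Alexandira) and a false positive (the ordinary word pairs matching Paris).

```python
import difflib


def fuzzy_candidates(word, gazetteer, cutoff=0.75):
    """Return gazetteer names similar to word (case-insensitive), best match first."""
    by_lower = {name.lower(): name for name in gazetteer}
    hits = difflib.get_close_matches(word.lower(), list(by_lower), n=5, cutoff=cutoff)
    return [by_lower[hit] for hit in hits]


def scan(tokens, gazetteer, fuzzy=False, cutoff=0.75):
    """Return the set of (token, place name) pairs found in tokens."""
    found = set()
    for token in tokens:
        if fuzzy:
            matches = fuzzy_candidates(token, gazetteer, cutoff)
        else:
            # Exact (case-insensitive) matching only
            matches = [name for name in gazetteer if name.lower() == token.lower()]
        found.update((token, name) for name in matches)
    return found


# Hypothetical gazetteer and document tokens
gazetteer = ["Alexandria", "Arlington", "Paris"]
tokens = ["Alexandira", "Arlington", "pairs"]

exact_run = scan(tokens, gazetteer)               # first run: no fuzzy match
fuzzy_run = scan(tokens, gazetteer, fuzzy=True)   # second run: fuzzy match

# Matches that only the fuzzy run found; review these before adding any of
# them as spelling variations to the custom locations file
for token, name in sorted(fuzzy_run - exact_run):
    print(f"{token!r} -> {name}")
```

Raising the cutoff reduces false positives such as pairs matching Paris, at the cost of missing more real misspellings, which is why adding known variations to the custom locations file is preferred over relying on fuzzy match.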

Syntax

ExtractLocationsDocument_conversion (in_file, out_feature_class, {in_template}, {coord_dd_latlon}, {coord_dd_xydeg}, {coord_dd_xyplain}, {coord_dm_latlon}, {coord_dm_xymin}, {coord_dms_latlon}, {coord_dms_xysec}, {coord_dms_xysep}, {coord_utm}, {coord_ups_north}, {coord_ups_south}, {coord_mgrs}, {coord_mgrs_northpolar}, {coord_mgrs_southpolar}, {comma_decimal}, {coord_use_lonlat}, {in_coor_system}, {in_custom_locations}, {fuzzy_match}, {max_features_extracted}, {ignore_first_features}, {date_monthname}, {date_m_d_y}, {date_yyyymmdd}, {date_yymmdd}, {date_yyjjj}, {max_dates_extracted}, {ignore_first_dates}, {date_range_begin}, {date_range_end}, {in_custom_attributes}, {file_link}, {file_mod_datetime}, {pre_text_length}, {post_text_length}, {std_coord_fmt})
Parameter | Explanation | Data Type
in_file

The input file that will be searched for locations (coordinates or custom locations), dates, and custom attributes.

File
out_feature_class

The feature class containing the results of the tool's search, with one feature per location found. If the tool is rerun with the same output feature class name, the existing feature class will be overwritten.

Feature Class
in_template
(Optional)

The template file (*.lxttmpl) that will be used to set option defaults. When a template is specified, all settings will match the template file and all other parameter settings will be ignored. This parameter is not operational in ArcGIS Pro 2.3.

File
coord_dd_latlon
(Optional)

Specifies whether to search for coordinates stored as Decimal Degrees formatted as latitude and longitude (infrequent false positives). 33.8N 77.035W and W77N38.88909 are examples.

  • FIND_DD_LATLON: The tool will search for Decimal Degrees coordinates formatted as latitude and longitude. This is the default.
  • DONT_FIND_DD_LATLON: The tool will not search for Decimal Degrees coordinates formatted as latitude and longitude.
Boolean
coord_dd_xydeg
(Optional)

Specifies whether to search for coordinates stored as Decimal Degrees formatted as X Y with degree symbols (infrequent false positives). 38.8° -77.035° and -077d+38.88909d are examples.

  • FIND_DD_XYDEG: The tool will search for Decimal Degrees coordinates formatted as X Y with degree symbols. This is the default.
  • DONT_FIND_DD_XYDEG: The tool will not search for Decimal Degrees coordinates formatted as X Y with degree symbols.
Boolean
coord_dd_xyplain
(Optional)

Specifies whether to search for coordinates stored as Decimal Degrees formatted as X Y with no symbols (frequent false positives). 38.8 -77.035 and -077.0, +38.88909 are examples.

  • FIND_DD_XYPLAIN: The tool will search for Decimal Degrees coordinates formatted as X Y with no symbols (frequent false positives). This is the default.
  • DONT_FIND_DD_XYPLAIN: The tool will not search for Decimal Degrees coordinates formatted as X Y with no symbols.
Boolean
coord_dm_latlon
(Optional)

Specifies whether to search for coordinates stored as Degrees Decimal Minutes formatted as latitude and longitude (infrequent false positives). 3853.3N 7702.100W and W7702N3853.3458 are examples.

  • FIND_DM_LATLON: The tool will search for Degrees Decimal Minutes coordinates formatted as latitude and longitude. This is the default.
  • DONT_FIND_DM_LATLON: The tool will not search for Degrees Decimal Minutes coordinates formatted as latitude and longitude.
Boolean
coord_dm_xymin
(Optional)

Specifies whether to search for coordinates stored as Degrees Decimal Minutes formatted as X Y with minutes symbols (infrequent false positives). 3853' -7702.1' and -07702m+3853.3458m are examples.

  • FIND_DM_XYMIN: The tool will search for Degrees Decimal Minutes coordinates formatted as X Y with minutes symbols. This is the default.
  • DONT_FIND_DM_XYMIN: The tool will not search for Degrees Decimal Minutes coordinates formatted as X Y with minutes symbols.
Boolean
coord_dms_latlon
(Optional)

Specifies whether to search for coordinates stored as Degrees Minutes Seconds formatted as latitude and longitude (infrequent false positives). 385320.7N 770206.000W and W770206N385320.76 are examples.

  • FIND_DMS_LATLON: The tool will search for Degrees Minutes Seconds coordinates formatted as latitude and longitude. This is the default.
  • DONT_FIND_DMS_LATLON: The tool will not search for Degrees Minutes Seconds coordinates formatted as latitude and longitude.
Boolean
coord_dms_xysec
(Optional)

Specifies whether to search for coordinates stored as Degrees Minutes Seconds formatted as X Y with seconds symbols (infrequent false positives). 385320" -770206.0" and -0770206.0s+385320.76s are examples.

  • FIND_DMS_XYSEC: The tool will search for Degrees Minutes Seconds coordinates formatted as X Y with seconds symbols. This is the default.
  • DONT_FIND_DMS_XYSEC: The tool will not search for Degrees Minutes Seconds coordinates formatted as X Y with seconds symbols.
Boolean
coord_dms_xysep
(Optional)

Specifies whether to search for coordinates stored as Degrees Minutes Seconds formatted as X Y with separators (moderate false positives). 38:53:20 -77:2:6.0 and -077/02/06/+38/53/20.76 are examples.

  • FIND_DMS_XYSEP: The tool will search for Degrees Minutes Seconds coordinates formatted as X Y with separators. This is the default.
  • DONT_FIND_DMS_XYSEP: The tool will not search for Degrees Minutes Seconds coordinates formatted as X Y with separators.
Boolean
coord_utm
(Optional)

Specifies whether to search for Universal Transverse Mercator coordinates (infrequent false positives). 18S 323503 4306438 and 18 north 323503.25 4306438.39 are examples.

  • FIND_UTM_MAINWORLD: The tool will search for Universal Transverse Mercator coordinates. This is the default.
  • DONT_FIND_UTM_MAINWORLD: The tool will not search for Universal Transverse Mercator coordinates.
Boolean
coord_ups_north
(Optional)

Specifies whether to search for Universal Polar Stereographic coordinates in the north polar area (infrequent false positives). Y 2722399 2000000 and north 2722399 2000000 are examples.

  • FIND_UTM_NORTHPOLAR: The tool will search for Universal Polar Stereographic coordinates in the north polar area. This is the default.
  • DONT_FIND_UTM_NORTHPOLAR: The tool will not search for Universal Polar Stereographic coordinates in the north polar area.
Boolean
coord_ups_south
(Optional)

Specifies whether to search for Universal Polar Stereographic coordinates in the south polar area (infrequent false positives). A 2000000 3168892 and south 2000000 3168892 are examples.

  • FIND_UTM_SOUTHPOLAR: The tool will search for Universal Polar Stereographic coordinates in the south polar area. This is the default.
  • DONT_FIND_UTM_SOUTHPOLAR: The tool will not search for Universal Polar Stereographic coordinates in the south polar area.
Boolean
coord_mgrs
(Optional)

Specifies whether to search for Military Grid Reference System coordinates (infrequent false positives). 18S UJ 13503 06438 and 18SUJ0306 are examples.

  • FIND_MGRS_MAINWORLD: The tool will search for Military Grid Reference System coordinates. This is the default.
  • DONT_FIND_MGRS_MAINWORLD: The tool will not search for Military Grid Reference System coordinates.
Boolean
coord_mgrs_northpolar
(Optional)

Specifies whether to search for Military Grid Reference System coordinates in the north polar area (infrequent false positives). Y TG 56814 69009 and YTG5669 are examples.

  • FIND_MGRS_NORTHPOLAR: The tool will search for Military Grid Reference System coordinates in the north polar area. This is the default.
  • DONT_FIND_MGRS_NORTHPOLAR: The tool will not search for Military Grid Reference System coordinates in the north polar area.
Boolean
coord_mgrs_southpolar
(Optional)

Specifies whether to search for Military Grid Reference System coordinates in the south polar area (moderate false positives). A TN 56814 30991 and ATN5630 are examples.

  • FIND_MGRS_SOUTHPOLAR: The tool will search for Military Grid Reference System coordinates in the south polar area. This is the default.
  • DONT_FIND_MGRS_SOUTHPOLAR: The tool will not search for Military Grid Reference System coordinates in the south polar area.
Boolean
comma_decimal
(Optional)

Specifies whether a comma or a period will be used as the decimal mark when reading coordinates from the input.

  • USE_COMMA_DECIMAL_MARK: A comma will be used as the decimal point.
  • USE_DOT_DECIMAL_MARK: A period will be used as the decimal point. This is the default.
Boolean
coord_use_lonlat
(Optional)

Specifies how ambiguous X Y coordinates will be interpreted. When two numbers resemble an x,y coordinate pair, both numbers are less than 180, and no symbols or notations indicate which number represents the longitude, the order of the pair is ambiguous.

  • PREFER_LONLAT: X Y coordinates will be interpreted as longitude-latitude (x,y).
  • PREFER_LATLON: X Y coordinates will be interpreted as latitude-longitude (y,x). This is the default.
Boolean
in_coor_system
(Optional)

The coordinate system that will be used to interpret the extracted coordinates. The default is the WGS 1984 geographic coordinate system (GCS_WGS_1984).

Spatial Reference
in_custom_locations
(Optional)

The custom locations file (.lxtgaz) that will be used when searching the input file. Every place name and its individual settings will be used during the search. Each match will result in a feature in the output layer.

File
fuzzy_match
(Optional)

Specifies whether fuzzy match will be used for searching the custom locations file.

  • USE_FUZZY: Fuzzy match will be used when searching the custom locations file.
  • DONT_USE_FUZZY: Exact matching will be used when searching the custom locations file. This is the default.
Boolean
max_features_extracted
(Optional)

The maximum number of features that can be extracted. Searching will stop when the maximum number is reached. This parameter can be used to limit the search of your data. When the tool is run as a geoprocessing service, the service and server may have separate limits on the number of features allowed.

Long
ignore_first_features
(Optional)

The number of features that will be detected and ignored before the remaining features are extracted. This parameter can be used to focus the search on a specific portion of your data.

Long
date_monthname
(Optional)

Specifies whether to search for dates where the month name appears (infrequent false positives). 12 May 2003 and January 15, 1997 are examples.

  • FIND_DATE_MONTHNAME: The tool will search for dates where the month name appears. This is the default.
  • DONT_FIND_DATE_MONTHNAME: The tool will not search for dates where the month name appears.
Boolean
date_m_d_y
(Optional)

Specifies whether to search for dates where numbers are in the M/D/Y format (moderate false positives). 5/12/03 and 1-15-1997 are examples.

  • FIND_DATE_M_D_Y: The tool will search for dates where numbers are in the M/D/Y format (moderate false positives). This is the default.
  • DONT_FIND_DATE_M_D_Y: The tool will not search for dates where numbers are in the M/D/Y format.
Boolean
date_yyyymmdd
(Optional)

Specifies whether to search for dates where numbers are in the YYYYMMDD format (moderate false positives). 20030512 and 19970115 are examples.

  • FIND_DATE_YYYYMMDD: The tool will search for dates where numbers are in the YYYYMMDD format (moderate false positives). This is the default.
  • DONT_FIND_DATE_YYYYMMDD: The tool will not search for dates where numbers are in the YYYYMMDD format.
Boolean
date_yymmdd
(Optional)

Specifies whether to search for dates where numbers are in the YYMMDD format (frequent false positives). 030512 and 970115 are examples.

  • FIND_DATE_YYMMDD: The tool will search for dates where numbers are in the YYMMDD format (frequent false positives). This is the default.
  • DONT_FIND_DATE_YYMMDD: The tool will not search for dates where numbers are in the YYMMDD format.
Boolean
date_yyjjj
(Optional)

Specifies whether to search for dates where numbers are in the YYJJJ format (frequent false positives). 03132 and 97015 are examples.

  • FIND_DATE_YYJJJ: The tool will search for dates where numbers are in the YYJJJ format (frequent false positives). This is the default.
  • DONT_FIND_DATE_YYJJJ: The tool will not search for dates where numbers are in the YYJJJ format.
Boolean
max_dates_extracted
(Optional)

The maximum number of dates that will be extracted.

Long
ignore_first_dates
(Optional)

The number of dates that will be detected and ignored before extracting all other dates.

Long
date_range_begin
(Optional)

The earliest acceptable date to extract. Detected dates on or after this date will be extracted.

Date
date_range_end
(Optional)

The latest acceptable date to extract. Detected dates on or before this date will be extracted.

Date
in_custom_attributes
(Optional)

The custom attributes file (.lxtca) that will be used to search the input file. Every attribute in the file will be used during the search, but will not necessarily result in an output, depending on the input file. Each attribute in the file will result in an additional field in the output layer.

File
file_link
(Optional)

The file path that will be used as the file name in the output data when the Input File (in_file for Python) is transferred to the server. If this parameter is not specified, the path of the Input File will be used, which may be an unreachable folder on a server. This parameter has no effect when the Input File is not specified.

String
file_mod_datetime
(Optional)

The UTC date and time that will be used as the file's modified attribute in the output data when the Input File (in_file for Python) is transferred to the server. If this parameter is not specified, the current modified time of the input file will be used. This parameter has no effect when the Input File is not specified.

Date
pre_text_length
(Optional)

The maximum number of characters that will be captured before the location. The default is 254. The Pre-Text attribute field will have this length; for shapefile output, the field is limited to 254 characters.

Long
post_text_length
(Optional)

The maximum number of characters that will be captured after the location. The default is 254. The Post-Text attribute field will have this length; for shapefile output, the field is limited to 254 characters.

Long
std_coord_fmt
(Optional)

Specifies the format in which each coordinate location will be recorded in a field of the output attribute table.

  • STD_COORD_FMT_DD: The coordinate location will be recorded in Decimal Degrees format. This is the default.
  • STD_COORD_FMT_DM: The coordinate location will be recorded in Degrees Decimal Minutes format.
  • STD_COORD_FMT_DMS: The coordinate location will be recorded in Degrees Minutes Seconds format.
  • STD_COORD_FMT_UTM: The coordinate location will be recorded in Universal Transverse Mercator format.
  • STD_COORD_FMT_MGRS: The coordinate location will be recorded in Military Grid Reference System format.
String
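The packed coordinate and date formats listed in the parameter table can be decoded with simple arithmetic. The following sketch assumes the digit layouts implied by the examples: in the packed Degrees Minutes Seconds forms (385320.7N) the last four integer digits are MMSS, in the packed Degrees Decimal Minutes forms (7702.100W) the last two integer digits are MM, and YYJJJ is a two-digit year followed by a three-digit day of the year. It converts such values to decimal degrees and calendar dates and applies an inclusive date range in the spirit of date_range_begin and date_range_end. The helper names and the two-digit-year pivot of 50 are hypothetical; the tool's own century rule is not stated here.

```python
import datetime


def packed_to_dd(value, hemisphere, fmt, comma_decimal=False):
    """Convert a packed DMS ("385320.7") or DDM ("3853.3") value to decimal degrees."""
    if comma_decimal:
        # Mirrors the comma_decimal option: "3853,3" is read as "3853.3"
        value = value.replace(",", ".")
    int_part, _, frac = value.partition(".")
    frac = frac or "0"
    if fmt == "DMS":
        degrees = int(int_part[:-4])            # everything before MMSS
        minutes = int(int_part[-4:-2])
        seconds = float(int_part[-2:] + "." + frac)
    elif fmt == "DM":
        degrees = int(int_part[:-2])            # everything before MM
        minutes = float(int_part[-2:] + "." + frac)
        seconds = 0.0
    else:
        raise ValueError(f"unsupported format: {fmt}")
    dd = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative
    return -dd if hemisphere in ("S", "W") else dd


def yyjjj_to_date(token, pivot=50):
    """Convert a YYJJJ token such as "03132" to a date (hypothetical century pivot)."""
    yy, day_of_year = int(token[:2]), int(token[2:])
    year = 2000 + yy if yy < pivot else 1900 + yy
    return datetime.date(year, 1, 1) + datetime.timedelta(days=day_of_year - 1)


def in_date_range(date, begin=None, end=None):
    """Inclusive range test: keep dates on or after begin and on or before end."""
    return (begin is None or date >= begin) and (end is None or date <= end)


print(round(packed_to_dd("385320.7", "N", "DMS"), 6))
print(round(packed_to_dd("7702.100", "W", "DM"), 6))
print(yyjjj_to_date("03132"), yyjjj_to_date("97015"))
```

Decoding 7702.100W this way gives the same longitude as the Decimal Degrees example 77.035W, and the YYJJJ examples 03132 and 97015 correspond to the month-name examples 12 May 2003 and January 15, 1997.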

Code sample

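ExtractLocationsDocument example (Python window)

The following Python window script demonstrates how to use the ExtractLocationsDocument function in immediate mode.

import arcpy
# Workspace containing the input document and the output geodatabase
arcpy.env.workspace = "c:/data"
# Search wells.docx with default parameter values; one feature is written per location found
arcpy.ExtractLocationsDocument_conversion("wells.docx", "water.gdb/wells")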

Licensing information

  • Basic: Requires LocateXT
  • Standard: Requires LocateXT
  • Advanced: Requires LocateXT

Related topics