
Extract Locations From Document

Available with LocateXT license.

Summary

Analyzes documents containing unstructured or loosely structured text, such as email messages and travel forms, and extracts locations to a point feature class.

The tool analyzes and processes the input documents as follows:

  • Spatial coordinates specified in the document content are recognized, and points are created to represent those locations. The following coordinate formats are recognized: decimal degrees, degrees decimal minutes, degrees minutes seconds, Universal Transverse Mercator, and Military Grid Reference System.
  • Place names that appear in the document content and are defined in a custom location file are recognized, and points are created to represent those locations. A custom location file associates each place name with a spatial coordinate for that location.
  • Relevant information in the text is recognized, extracted from the document, and recorded in fields in the output feature class's attribute table.

This tool supports all Microsoft Office documents (Word, PowerPoint, and Excel), Adobe PDF documents, marked-up text such as XML and HTML documents, and plain-text files such as text files (.txt).

Usage

  • The parameter default values are designed to optimize the identification of coordinates and dates. Default values can be modified for each parameter. The fewer parameters that are modified, the faster the tool will run.

  • All coordinate formats are on by default. If you want to extract custom locations only and do not want to extract spatial coordinates, turn off the coordinate format parameters.

  • If an Adobe PDF document is provided as input, its content includes a spatial coordinate in a format that is turned on, and the output feature class does not contain a feature representing that coordinate, your computer may be missing a component that is required to process PDF documents.


  • When you use a custom location file to extract place names, it is best practice to include only the place names relevant to your analysis. For example, if you convert a feature class representing all places in the world to a custom location file, the tool can spend a lot of time looking for places that are unlikely to be present or that lie in parts of the world that are not of interest to your analysis.


  • When the place names you are interested in may be misspelled or have known variations, you will typically get better results by listing common misspellings and alternate place names in the custom location file than by using fuzzy matching. When fuzzy matching is turned on, a location is output if 70 percent of the characters in a place name match the input content. This can produce more false positives than listing known alternates and misspellings.

    A useful workflow is to first run the tool with fuzzy matching turned off, and then run it again with fuzzy matching turned on and compare the results. This can help you identify spelling variations that can be added to the custom location file.

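The trade-off between listing alternates and turning on fuzzy matching can be illustrated in plain Python. This is a minimal sketch, not the tool's matcher: it assumes Python's `difflib.SequenceMatcher` similarity ratio as a stand-in and uses 0.70 to mirror the 70 percent figure above; `exact_match` and `fuzzy_match` are hypothetical helper names.

```python
from difflib import SequenceMatcher

# Assumption: SequenceMatcher.ratio() stands in for the tool's fuzzy matcher,
# and 0.70 mirrors the 70 percent threshold described above.
FUZZY_THRESHOLD = 0.70


def similarity(candidate: str, place_name: str) -> float:
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, candidate.lower(), place_name.lower()).ratio()


def exact_match(candidate: str, known_names: set[str]) -> bool:
    """Exact matching: only names listed verbatim (including alternates) match."""
    return candidate in known_names


def fuzzy_match(candidate: str, place_name: str,
                threshold: float = FUZZY_THRESHOLD) -> bool:
    """Fuzzy matching: any text similar enough to the place name matches."""
    return similarity(candidate, place_name) >= threshold


# Listing the known misspelling keeps matching exact and predictable.
print(exact_match("Alexandira", {"Alexandria", "Alexandira"}))  # True
# Fuzzy matching finds the misspelling without listing it...
print(fuzzy_match("Alexandira", "Alexandria"))  # True
# ...but also pairs two unrelated short names: a false positive.
print(fuzzy_match("Mali", "Bali"))  # True
```

In this sketch, the unrelated names Mali and Bali score 0.75, which is the kind of false positive that listing known alternates avoids.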

Syntax

ExtractLocationsFromDocument(in_file, out_feature_class, {in_template}, {coord_dd_latlon}, {coord_dd_xydeg}, {coord_dd_xyplain}, {coord_dm_latlon}, {coord_dm_xymin}, {coord_dms_latlon}, {coord_dms_xysec}, {coord_dms_xysep}, {coord_utm}, {coord_ups_north}, {coord_ups_south}, {coord_mgrs}, {coord_mgrs_northpolar}, {coord_mgrs_southpolar}, {comma_decimal}, {coord_use_lonlat}, {in_coor_system}, {in_custom_locations}, {fuzzy_match}, {max_features_extracted}, {ignore_first_features}, {date_monthname}, {date_m_d_y}, {date_yyyymmdd}, {date_yymmdd}, {date_yyjjj}, {max_dates_extracted}, {ignore_first_dates}, {date_range_begin}, {date_range_end}, {in_custom_attributes}, {file_link}, {file_mod_datetime}, {pre_text_length}, {post_text_length}, {std_coord_fmt})
Parameter | Explanation | Data Type
in_file

The input file that will be searched for locations (coordinates or custom locations), dates, and custom attributes, or a folder whose files will all be searched for locations.

File
out_feature_class

The feature class containing point features that represent the locations that were found.

Feature Class
in_template
(optional)

The template file (*.lxttmpl) that determines the settings to use for each tool parameter. When a template file is provided, all values specified for other parameters will be ignored except those that determine the input content to be processed and the output feature class.

File
coord_dd_latlon
(optional)

Specifies whether to search for coordinates stored as decimal degrees formatted as latitude and longitude (infrequent false positives). 38.8N 77.035W and W77N38.88909 are examples.

  • FIND_DD_LATLON: The tool will search for decimal degrees coordinates formatted as latitude and longitude. This is the default.
  • DONT_FIND_DD_LATLON: The tool will not search for decimal degrees coordinates formatted as latitude and longitude.
Boolean
coord_dd_xydeg
(optional)

Specifies whether to search for coordinates stored as decimal degrees formatted as X Y with degree symbols (infrequent false positives). 38.8° -77.035° and -077d+38.88909d are examples.

  • FIND_DD_XYDEG: The tool will search for decimal degrees coordinates formatted as X Y with degree symbols. This is the default.
  • DONT_FIND_DD_XYDEG: The tool will not search for decimal degrees coordinates formatted as X Y with degree symbols.
Boolean
coord_dd_xyplain
(optional)

Specifies whether to search for coordinates stored as decimal degrees formatted as X Y with no symbols (frequent false positives). 38.8 -77.035 and -077.0, +38.88909 are examples.

  • FIND_DD_XYPLAIN: The tool will search for decimal degrees coordinates formatted as X Y with no symbols (frequent false positives). This is the default.
  • DONT_FIND_DD_XYPLAIN: The tool will not search for decimal degrees coordinates formatted as X Y with no symbols.
Boolean
coord_dm_latlon
(optional)

Specifies whether to search for coordinates stored as degrees decimal minutes formatted as latitude and longitude (infrequent false positives). 3853.3N 7702.100W and W7702N3853.3458 are examples.

  • FIND_DM_LATLON: The tool will search for degrees decimal minutes coordinates formatted as latitude and longitude. This is the default.
  • DONT_FIND_DM_LATLON: The tool will not search for degrees decimal minutes coordinates formatted as latitude and longitude.
Boolean
coord_dm_xymin
(optional)

Specifies whether to search for coordinates stored as degrees decimal minutes formatted as X Y with minutes symbols (infrequent false positives). 3853' -7702.1' and -07702m+3853.3458m are examples.

  • FIND_DM_XYMIN: The tool will search for degrees decimal minutes coordinates formatted as X Y with minutes symbols. This is the default.
  • DONT_FIND_DM_XYMIN: The tool will not search for degrees decimal minutes coordinates formatted as X Y with minutes symbols.
Boolean
coord_dms_latlon
(optional)

Specifies whether to search for coordinates stored as degrees minutes seconds formatted as latitude and longitude (infrequent false positives). 385320.7N 770206.000W and W770206N385320.76 are examples.

  • FIND_DMS_LATLON: The tool will search for degrees minutes seconds coordinates formatted as latitude and longitude. This is the default.
  • DONT_FIND_DMS_LATLON: The tool will not search for degrees minutes seconds coordinates formatted as latitude and longitude.
Boolean
coord_dms_xysec
(optional)

Specifies whether to search for coordinates stored as degrees minutes seconds formatted as X Y with seconds symbols (infrequent false positives). 385320" -770206.0" and -0770206.0s+385320.76s are examples.

  • FIND_DMS_XYSEC: The tool will search for degrees minutes seconds coordinates formatted as X Y with seconds symbols. This is the default.
  • DONT_FIND_DMS_XYSEC: The tool will not search for degrees minutes seconds coordinates formatted as X Y with seconds symbols.
Boolean
coord_dms_xysep
(optional)

Specifies whether to search for coordinates stored as degrees minutes seconds formatted as X Y with separators (moderate false positives). 38:53:20 -77:2:6.0 and -077/02/06/+38/53/20.76 are examples.

  • FIND_DMS_XYSEP: The tool will search for degrees minutes seconds coordinates formatted as X Y with separators. This is the default.
  • DONT_FIND_DMS_XYSEP: The tool will not search for degrees minutes seconds coordinates formatted as X Y with separators.
Boolean
coord_utm
(optional)

Specifies whether to search for Universal Transverse Mercator coordinates (infrequent false positives). 18S 323503 4306438 and 18 north 323503.25 4306438.39 are examples.

  • FIND_UTM_MAINWORLD: The tool will search for Universal Transverse Mercator coordinates. This is the default.
  • DONT_FIND_UTM_MAINWORLD: The tool will not search for Universal Transverse Mercator coordinates.
Boolean
coord_ups_north
(optional)

Specifies whether to search for Universal Polar Stereographic coordinates in the north polar area (infrequent false positives). Y 2722399 2000000 and north 2722399 2000000 are examples.

  • FIND_UTM_NORTHPOLAR: The tool will search for Universal Polar Stereographic coordinates in the north polar area. This is the default.
  • DONT_FIND_UTM_NORTHPOLAR: The tool will not search for Universal Polar Stereographic coordinates in the north polar area.
Boolean
coord_ups_south
(optional)

Specifies whether to search for Universal Polar Stereographic coordinates in the south polar area (infrequent false positives). A 2000000 3168892 and south 2000000 3168892 are examples.

  • FIND_UTM_SOUTHPOLAR: The tool will search for Universal Polar Stereographic coordinates in the south polar area. This is the default.
  • DONT_FIND_UTM_SOUTHPOLAR: The tool will not search for Universal Polar Stereographic coordinates in the south polar area.
Boolean
coord_mgrs
(optional)

Specifies whether to search for Military Grid Reference System coordinates (infrequent false positives). 18S UJ 13503 06438 and 18SUJ0306 are examples.

  • FIND_MGRS_MAINWORLD: The tool will search for Military Grid Reference System coordinates. This is the default.
  • DONT_FIND_MGRS_MAINWORLD: The tool will not search for Military Grid Reference System coordinates.
Boolean
coord_mgrs_northpolar
(optional)

Specifies whether to search for Military Grid Reference System coordinates in the north polar area (infrequent false positives). Y TG 56814 69009 and YTG5669 are examples.

  • FIND_MGRS_NORTHPOLAR: The tool will search for Military Grid Reference System coordinates in the north polar area. This is the default.
  • DONT_FIND_MGRS_NORTHPOLAR: The tool will not search for Military Grid Reference System coordinates in the north polar area.
Boolean
coord_mgrs_southpolar
(optional)

Specifies whether to search for Military Grid Reference System coordinates in the south polar area (moderate false positives). A TN 56814 30991 and ATN5630 are examples.

  • FIND_MGRS_SOUTHPOLAR: The tool will search for Military Grid Reference System coordinates in the south polar area. This is the default.
  • DONT_FIND_MGRS_SOUTHPOLAR: The tool will not search for Military Grid Reference System coordinates in the south polar area.
Boolean
comma_decimal
(optional)

Specifies whether a comma (,) will be recognized as a decimal separator. By default, content is scanned for spatial coordinates defined by numbers that use a period (.) or a mid-dot (·) as the decimal separator. If you are working with content in which spatial coordinates are defined by numbers that use a comma (,) as the decimal separator, set this parameter to use a comma as the decimal separator instead. This parameter is not set automatically based on the regional setting for your computer's operating system.

  • USE_COMMA_DECIMAL_MARK: A comma will be recognized as the decimal point.
  • USE_DOT_DECIMAL_MARK: A period or a mid-dot will be recognized as the decimal point. This is the default.
Boolean
coord_use_lonlat
(optional)

Specifies how ambiguous coordinates will be interpreted. When two numbers resemble an x,y coordinate pair, both are less than 90, and no symbols or notations indicate which number is the latitude and which is the longitude, the result is ambiguous. This parameter determines whether such numbers are interpreted as a longitude, latitude coordinate (x,y) or as a latitude, longitude coordinate (y,x).

  • PREFER_LONLAT: Ambiguous coordinates will be interpreted as longitude, latitude (x,y).
  • PREFER_LATLON: Ambiguous coordinates will be interpreted as latitude, longitude (y,x). This is the default.
Boolean
in_coor_system
(optional)

The coordinate system that will be used to interpret the spatial coordinates defined in the input. GCS-WGS-84 is the default.

Spatial Reference
in_custom_locations
(optional)

The custom location file (.lxtgaz) that will be used when scanning the input content. A point is created to represent each occurrence of each place name in the custom location file up to the limits established by other tool parameters.

File
fuzzy_match
(optional)

Specifies whether fuzzy matching will be used for searching the custom location file.

  • USE_FUZZY: Fuzzy matching will be used when searching the custom location file.
  • DONT_USE_FUZZY: Exact matching will be used when searching the custom location file. This is the default.
Boolean
max_features_extracted
(optional)

The maximum number of features that can be extracted. The tool will stop scanning the input content for locations when the maximum number is reached. When running as a geoprocessing service, the service and the server may have separate limits on the number of features allowed.

Long
ignore_first_features
(optional)

The number of features that will be detected and ignored before the remaining features are extracted. This parameter can be used to focus the search on a specific portion of the data.

Long
date_monthname
(optional)

Specifies whether to search for dates where the month name appears (infrequent false positives). 12 May 2003 and January 15, 1997 are examples.

  • FIND_DATE_MONTHNAME: The tool will search for dates where the month name appears. This is the default.
  • DONT_FIND_DATE_MONTHNAME: The tool will not search for dates where the month name appears.
Boolean
date_m_d_y
(optional)

Specifies whether to search for dates where numbers are in the M/D/Y format (moderate false positives). 5/12/03 and 1-15-1997 are examples.

  • FIND_DATE_M_D_Y: The tool will search for dates where numbers are in the M/D/Y format (moderate false positives). This is the default.
  • DONT_FIND_DATE_M_D_Y: The tool will not search for dates where numbers are in the M/D/Y format.
Boolean
date_yyyymmdd
(optional)

Specifies whether to search for dates where numbers are in the YYYYMMDD format (moderate false positives). 20030512 and 19970115 are examples.

  • FIND_DATE_YYYYMMDD: The tool will search for dates where numbers are in the YYYYMMDD format (moderate false positives). This is the default.
  • DONT_FIND_DATE_YYYYMMDD: The tool will not search for dates where numbers are in the YYYYMMDD format.
Boolean
date_yymmdd
(optional)

Specifies whether to search for dates where numbers are in the YYMMDD format (frequent false positives). 030512 and 970115 are examples.

  • FIND_DATE_YYMMDD: The tool will search for dates where numbers are in the YYMMDD format (frequent false positives). This is the default.
  • DONT_FIND_DATE_YYMMDD: The tool will not search for dates where numbers are in the YYMMDD format.
Boolean
date_yyjjj
(optional)

Specifies whether to search for dates where numbers are in the YYJJJ format (frequent false positives). 03132 and 97015 are examples.

  • FIND_DATE_YYJJJ: The tool will search for dates where numbers are in the YYJJJ format (frequent false positives). This is the default.
  • DONT_FIND_DATE_YYJJJ: The tool will not search for dates where numbers are in the YYJJJ format.
Boolean
max_dates_extracted
(optional)

The maximum number of dates that will be extracted.

Long
ignore_first_dates
(optional)

The number of dates that will be detected and ignored before extracting all other dates.

Long
date_range_begin
(optional)

The earliest acceptable date to extract. Detected dates matching this value or later will be extracted.

Date
date_range_end
(optional)

The latest acceptable date to extract. Detected dates matching this value or earlier will be extracted.

Date
in_custom_attributes
(optional)

The custom attribute file (.lxtca) that will be used to scan the input content. Fields will be created in the output feature class's attribute table for all custom attributes defined in the file. When the input content is scanned, it is examined for text associated with any of the custom attributes specified in the file. When a match is found, the corresponding text is extracted from the input content and stored in the appropriate field.

File
file_link
(optional)

The file path that will be used as the file name in the output data when the Input File (in_file in Python) is transferred to the server. If this parameter is not specified, the path of the Input File will be used, which may be an unreachable folder on a server. This parameter has no effect when the Input File is not specified.

String
file_mod_datetime
(optional)

The UTC date and time that will be used as the file's modified attribute in the output data when the Input File (in_file in Python) is transferred to the server. If this parameter is not specified, the current modified time of the input file will be used. This parameter has no effect when the Input File is not specified.

Date
pre_text_length
(optional)

Content is extracted from the input document to provide context for the location that was found. This parameter defines the maximum number of characters that will be extracted preceding the text that defines the location. The extracted text is stored in the Pre-Text field in the output feature class's attribute table, and that field is created with this length. The default is 254. Text fields in a shapefile are limited to 254 characters, so when the output is a shapefile, a larger value will be reduced to 254.

Long
post_text_length
(optional)

Content is extracted from the input document to provide context for the location that was found. This parameter defines the maximum number of characters that will be extracted following the text that defines the location. The extracted text is stored in the Post-Text field in the output feature class's attribute table, and that field is created with this length. The default is 254. Text fields in a shapefile are limited to 254 characters, so when the output is a shapefile, a larger value will be reduced to 254.

Long
std_coord_fmt
(optional)

Specifies the coordinate format that will be used to store the coordinate location. A standard representation of the spatial coordinate that defines the point feature is recorded in a field in the attribute table.

  • STD_COORD_FMT_DD: The coordinate location is recorded in decimal degrees format. This is the default.
  • STD_COORD_FMT_DM: The coordinate location is recorded in degrees decimal minutes format.
  • STD_COORD_FMT_DMS: The coordinate location is recorded in degrees minutes seconds format.
  • STD_COORD_FMT_UTM: The coordinate location is recorded in Universal Transverse Mercator format.
  • STD_COORD_FMT_MGRS: The coordinate location is recorded in Military Grid Reference System format.
String

Code sample

ExtractLocationsFromDocument example (Python window)

The following Python window script demonstrates how to use the ExtractLocationsFromDocument function in immediate mode.

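import arcpy
# Resolve relative paths against this folder
arcpy.env.workspace = "c:/data"
# Extract locations found in wells.docx into a point feature class in water.gdb
arcpy.ExtractLocationsFromDocument_conversion("wells.docx", "water.gdb/wells")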
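The decimal degrees latitude and longitude format searched for by coord_dd_latlon can be recognized with a regular expression. The following is a minimal, standalone sketch that does not call arcpy; the pattern and the `parse_dd_latlon` name are hypothetical, and the sketch handles only the number-then-hemisphere form (such as 38.8N 77.035W), not the hemisphere-first form (W77N38.88909).

```python
import re
from typing import Optional

# Hypothetical pattern: a decimal number immediately followed by a hemisphere
# letter that ends the word, as in "38.8N" or "77.035W".
DD_HEMI = re.compile(r"(\d+(?:\.\d+)?)([NSEW])\b")


def parse_dd_latlon(text: str) -> Optional[tuple[float, float]]:
    """Return (latitude, longitude) from the first N/S and E/W values found."""
    lat: Optional[float] = None
    lon: Optional[float] = None
    for value, hemisphere in DD_HEMI.findall(text):
        number = float(value)
        if hemisphere in ("S", "W"):
            number = -number  # southern and western values are negative
        if hemisphere in ("N", "S") and lat is None:
            lat = number
        elif hemisphere in ("E", "W") and lon is None:
            lon = number
        if lat is not None and lon is not None:
            return lat, lon
    return None


print(parse_dd_latlon("Rig reported at 38.8N 77.035W at 0600."))  # (38.8, -77.035)
```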
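The packed formats searched for by coord_dm_latlon and coord_dms_latlon, such as 7702.100W and 770206.000W, pack degrees, minutes, and seconds into a single number. A minimal decoding sketch follows. It assumes that the last two integer digits are minutes (DM) or that the last four are minutes and seconds (DMS), with everything before them being degrees; `packed_dm_to_dd` and `packed_dms_to_dd` are hypothetical names, not arcpy functions.

```python
def _signed(value: float, hemisphere: str) -> float:
    """Apply the sign implied by the hemisphere letter (S and W are negative)."""
    return -value if hemisphere.upper() in ("S", "W") else value


def packed_dm_to_dd(token: str, hemisphere: str) -> float:
    """Decode packed degrees decimal minutes, such as '3853.3' or '7702.100'."""
    whole, _, fraction = token.partition(".")
    degrees = int(whole[:-2])                             # all but the last two digits
    minutes = float(f"{whole[-2:]}.{fraction or '0'}")  # last two digits plus fraction
    if minutes >= 60:
        raise ValueError(f"minutes out of range in {token!r}")
    return _signed(degrees + minutes / 60, hemisphere)


def packed_dms_to_dd(token: str, hemisphere: str) -> float:
    """Decode packed degrees minutes seconds, such as '385320.7' or '770206.000'."""
    whole, _, fraction = token.partition(".")
    degrees = int(whole[:-4])                             # all but the last four digits
    minutes = int(whole[-4:-2])
    seconds = float(f"{whole[-2:]}.{fraction or '0'}")
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes or seconds out of range in {token!r}")
    return _signed(degrees + minutes / 60 + seconds / 3600, hemisphere)


print(packed_dm_to_dd("7702.100", "W"))
print(packed_dms_to_dd("770206.000", "W"))
```

Both tokens decode to the same longitude, 77.035 W (up to floating-point rounding), which also appears in the decimal degrees examples for coord_dd_latlon.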
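In the coord_utm example 18S 323503 4306438, the leading 18S appears to be a UTM grid zone designator: zone 18 plus the latitude band letter S, rather than a hemisphere. A simplified sketch of how a WGS 84 position maps to such a designator follows. It ignores the Norway and Svalbard zone exceptions, the `utm_zone_designator` name is hypothetical, and converting to the easting and northing themselves would need a projection library, which is not shown. UTM covers 80S to 84N; the polar areas beyond that are where the UPS parameters (coord_ups_north, coord_ups_south) apply.

```python
import math

# 8-degree latitude bands starting at 80S; the letters I and O are skipped,
# and the last band (X) is stretched to 84N.
LATITUDE_BANDS = "CDEFGHJKLMNPQRSTUVWX"


def utm_zone_designator(lat: float, lon: float) -> str:
    """Return a UTM grid zone designator such as '18S' (simplified sketch)."""
    if not -80.0 <= lat <= 84.0:
        raise ValueError("outside the UTM latitude range; UPS covers the polar areas")
    zone = min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)  # 6-degree zones, 1 to 60
    band_index = min(int((lat + 80.0) // 8), len(LATITUDE_BANDS) - 1)
    return f"{zone}{LATITUDE_BANDS[band_index]}"


print(utm_zone_designator(38.8, -77.035))  # 18S
```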
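Because the comma_decimal parameter is not inferred from the operating system's regional settings, the decimal separator has to be chosen explicitly. The following standalone sketch shows the effect of that choice on a single numeric token; `parse_coordinate_number` is a hypothetical helper, and accepting the mid-dot in the default mode follows the parameter description above.

```python
def parse_coordinate_number(token: str, comma_decimal: bool = False) -> float:
    """Convert one numeric token to float under the chosen decimal separator.

    comma_decimal=False mirrors USE_DOT_DECIMAL_MARK: '.' or the mid-dot is accepted.
    comma_decimal=True mirrors USE_COMMA_DECIMAL_MARK: ',' is accepted. This sketch
    leaves a period untouched in that mode; the tool's handling of it is not stated.
    """
    if comma_decimal:
        normalized = token.replace(",", ".")
    else:
        normalized = token.replace("\u00b7", ".")  # U+00B7 MIDDLE DOT
    return float(normalized)  # raises ValueError if the token is still not numeric


print(parse_coordinate_number("38\u00b78"))                    # 38.8
print(parse_coordinate_number("-77,035", comma_decimal=True))  # -77.035
```

With the default setting, a token such as 38,8 is not a number at all, which is why content that uses comma decimals needs the parameter set explicitly.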
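The coord_use_lonlat rule can be expressed as a small function. In this minimal sketch, `resolve_pair` is a hypothetical name, and treating a value beyond ±90 as the longitude regardless of the preference is an assumption of the sketch, not something the parameter description states.

```python
def resolve_pair(first: float, second: float,
                 prefer_lonlat: bool = False) -> tuple[float, float]:
    """Return (latitude, longitude) for two unlabeled numbers in text order."""
    # Assumption: a magnitude above 90 can only be a longitude, so the
    # preference is consulted only when both values are within +/-90.
    if abs(first) > 90 >= abs(second):
        return second, first
    if abs(second) > 90 >= abs(first):
        return first, second
    if prefer_lonlat:
        return second, first  # PREFER_LONLAT: the text reads x, y
    return first, second      # PREFER_LATLON (default): the text reads y, x


print(resolve_pair(38.8, 77.035))                      # (38.8, 77.035)
print(resolve_pair(38.8, 77.035, prefer_lonlat=True))  # (77.035, 38.8)
print(resolve_pair(-120.5, 45.0))                      # (45.0, -120.5)
```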
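The idea behind a custom location file (in_custom_locations), in which each place name is tied to a coordinate and every occurrence of a name becomes a point, can be sketched with an in-memory dictionary. This is not the .lxtgaz format: the dictionary, the `find_custom_locations` name, and the place names and coordinates are all hypothetical. A known misspelling is listed as its own entry, following the Usage advice about alternates.

```python
import re

# Hypothetical in-memory stand-in for a custom location file (.lxtgaz).
# Names and coordinates are illustrative only.
CUSTOM_LOCATIONS: dict[str, tuple[float, float]] = {
    "Camp Alpha": (38.80, -77.05),
    "Camp Alfa": (38.80, -77.05),   # known misspelling of Camp Alpha
    "Camp Bravo": (38.90, -77.00),
}


def find_custom_locations(text: str, gazetteer: dict[str, tuple[float, float]]):
    """Return (offset, name, lat, lon) for every occurrence of every name, in text order."""
    hits = []
    for name, (lat, lon) in gazetteer.items():
        pattern = r"\b" + re.escape(name) + r"\b"  # whole-name matches only
        for match in re.finditer(pattern, text):
            hits.append((match.start(), name, lat, lon))
    return sorted(hits)  # offsets are unique, so this orders by position


report = "Convoy left Camp Alpha, stopped at Camp Bravo, and returned to Camp Alfa."
for offset, name, lat, lon in find_custom_locations(report, CUSTOM_LOCATIONS):
    print(offset, name, lat, lon)
```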
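The combined effect of ignore_first_features and max_features_extracted (and, for dates, ignore_first_dates and max_dates_extracted) resembles skipping and then limiting a lazy stream of detections. The following minimal sketch assumes that model; `select_detections` and the simulated scan are hypothetical, and the counter shows that reading stops once the limit is reached, as the max_features_extracted description says.

```python
from itertools import islice
from typing import Iterable, Optional


def select_detections(detections: Iterable, ignore_first: int = 0,
                      max_extracted: Optional[int] = None) -> list:
    """Skip the first `ignore_first` detections, then keep at most `max_extracted`.

    Because islice pulls items lazily, scanning stops as soon as the limit is met.
    """
    stop = None if max_extracted is None else ignore_first + max_extracted
    return list(islice(detections, ignore_first, stop))


scanned = []  # records how far the simulated scan got


def simulated_scan(total: int):
    """Hypothetical stand-in for a document scan that yields detections one by one."""
    for i in range(total):
        scanned.append(i)
        yield f"location {i}"


print(select_detections(simulated_scan(100), ignore_first=2, max_extracted=3))
print(len(scanned))  # 5: items 0 to 4 were read, the remaining 95 never were
```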
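The all-digit date formats searched for by date_yyyymmdd, date_yymmdd, and date_yyjjj can be decoded with Python's `datetime.strptime`. This minimal sketch assumes Python's two-digit-year pivot, which the tool may not use, and `parse_numeric_date` is a hypothetical name. It also suggests why these formats are prone to false positives: an arbitrary five-digit number such as 12345 decodes to a valid YYJJJ date.

```python
from datetime import date, datetime
from typing import Optional

# strptime patterns for the all-digit date formats. Assumption: two-digit years
# follow Python's %y pivot (69-99 -> 1969-1999, 00-68 -> 2000-2068); the tool's
# own pivot rule is not stated on this page.
NUMERIC_DATE_PATTERNS = {
    "YYYYMMDD": "%Y%m%d",
    "YYMMDD": "%y%m%d",
    "YYJJJ": "%y%j",  # JJJ is the day of the year (001-366)
}


def parse_numeric_date(token: str, fmt: str) -> Optional[date]:
    """Return the calendar date encoded by token in the named format, or None."""
    if len(token) != len(fmt):  # each format has a fixed number of digits
        return None
    try:
        return datetime.strptime(token, NUMERIC_DATE_PATTERNS[fmt]).date()
    except ValueError:  # for example, month 13 or February 30
        return None


# The same day written three ways:
for token, fmt in [("20030512", "YYYYMMDD"), ("030512", "YYMMDD"), ("03132", "YYJJJ")]:
    print(token, fmt, parse_numeric_date(token, fmt))
```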
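The date_range_begin and date_range_end parameters act as an inclusive filter on detected dates. A minimal sketch of that check follows, using a hypothetical `within_date_range` helper and illustrative dates.

```python
from datetime import date
from typing import Optional


def within_date_range(found: date, begin: Optional[date] = None,
                      end: Optional[date] = None) -> bool:
    """Inclusive range check: dates equal to begin or end are kept."""
    if begin is not None and found < begin:
        return False
    if end is not None and found > end:
        return False
    return True


detected = [date(1997, 1, 15), date(2003, 5, 12), date(2010, 7, 4)]
kept = [d for d in detected
        if within_date_range(d, begin=date(2000, 1, 1), end=date(2003, 5, 12))]
print(kept)  # [datetime.date(2003, 5, 12)]
```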
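The pre_text_length and post_text_length parameters describe a window of context around each match. This standalone sketch takes the characters immediately before and after the matched text and applies the 254-character shapefile cap; `context_window` is a hypothetical helper, and how the tool treats word boundaries or line breaks is not modeled.

```python
SHAPEFILE_TEXT_LIMIT = 254  # maximum text field length in a shapefile


def context_window(text: str, start: int, end: int,
                   pre_text_length: int = 254, post_text_length: int = 254,
                   shapefile: bool = False) -> tuple[str, str]:
    """Return (pre_text, post_text) surrounding the match text[start:end]."""
    if shapefile:
        pre_text_length = min(pre_text_length, SHAPEFILE_TEXT_LIMIT)
        post_text_length = min(post_text_length, SHAPEFILE_TEXT_LIMIT)
    pre_text = text[max(0, start - pre_text_length):start]
    post_text = text[end:end + post_text_length]
    return pre_text, post_text


sentence = "Drill crew reached 38.8N 77.035W before noon."
found = "38.8N 77.035W"
start = sentence.index(found)
print(context_window(sentence, start, start + len(found),
                     pre_text_length=8, post_text_length=7))  # ('reached ', ' before')
```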
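The std_coord_fmt options record a standardized representation of each point. The exact string layout the tool writes is not documented on this page, so the layout below is an assumption; the sketch only shows the arithmetic for converting decimal degrees to degrees decimal minutes and degrees minutes seconds. `dd_to_dm` and `dd_to_dms` are hypothetical names.

```python
def _hemisphere(value: float, is_latitude: bool) -> str:
    """Hemisphere letter for a signed decimal degrees value."""
    if is_latitude:
        return "N" if value >= 0 else "S"
    return "E" if value >= 0 else "W"


def dd_to_dm(value: float, is_latitude: bool, decimals: int = 4) -> str:
    """Decimal degrees to degrees decimal minutes, e.g. 77° 02.1000' W."""
    total_minutes = round(abs(value) * 60, decimals)  # round before splitting
    degrees, minutes = divmod(total_minutes, 60)
    width = 3 + decimals  # two integer digits, the point, and the decimals
    return f"{int(degrees)}° {minutes:0{width}.{decimals}f}' {_hemisphere(value, is_latitude)}"


def dd_to_dms(value: float, is_latitude: bool, decimals: int = 2) -> str:
    """Decimal degrees to degrees minutes seconds, e.g. 77° 02' 06.00" W."""
    total_seconds = round(abs(value) * 3600, decimals)  # round before splitting
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    width = 3 + decimals
    return (f"{int(degrees)}° {int(minutes):02d}' "
            f'{seconds:0{width}.{decimals}f}" {_hemisphere(value, is_latitude)}')


print(dd_to_dm(-77.035, is_latitude=False))   # 77° 02.1000' W
print(dd_to_dms(-77.035, is_latitude=False))  # 77° 02' 06.00" W
```

Rounding the total minutes or seconds before splitting, rather than after, prevents outputs such as 60.00 seconds; for example, 10.9999999 is written as 11° 00' 00.00" N.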

Licensing information

  • Basic: Requires LocateXT
  • Standard: Requires LocateXT
  • Advanced: Requires LocateXT
