Label | Explanation | Data Type |
Input File | The input file that will be scanned for locations (coordinates or custom locations), dates, and custom attributes; or a folder in which all files will be scanned for locations. | File |
Output Feature Class | The feature class containing point features that represent the locations that were found. | Feature Class |
Input Template (Optional) | The template file (*.lxttmpl) that determines the settings to use for each tool parameter. When a template file is provided, all values specified for other parameters will be ignored except those that determine the input content that will be processed and the output feature class. Some settings that are available in the Extract Locations pane are only available to this tool when the settings are saved to a template file, and the template file is referenced in this parameter. These settings are as follows:
| File |
Latitude and Longitude (Optional) | Specifies whether to search for coordinates stored as decimal degrees formatted as latitude and longitude (infrequent false positives). Examples are: 38.8N 77.035W and W77N38.88909.
| Boolean |
X Y with Degree Symbols (Optional) | Specifies whether to search for coordinates stored as decimal degrees formatted as X Y with degree symbols (infrequent false positives). Examples are: 38.8° -77.035° and -077d+38.88909d.
| Boolean |
X Y with No Symbols (Optional) | Specifies whether to search for coordinates stored as decimal degrees formatted as X Y with no symbols (frequent false positives). Examples are: 38.8 -77.035 and -077.0, +38.88909.
| Boolean |
Latitude and Longitude (Optional) | Specifies whether to search for coordinates stored as degrees decimal minutes formatted as latitude and longitude (infrequent false positives). Examples are: 3853.3N 7702.100W and W7702N3853.3458.
| Boolean |
X Y with Minutes Symbols (Optional) | Specifies whether to search for coordinates stored as degrees decimal minutes formatted as X Y with minutes symbols (infrequent false positives). Examples are: 3853' -7702.1' and -07702m+3853.3458m.
| Boolean |
Latitude and Longitude (Optional) | Specifies whether to search for coordinates stored as degrees minutes seconds formatted as latitude and longitude (infrequent false positives). Examples are: 385320.7N 770206.000W and W770206N385320.76.
| Boolean |
X Y with Seconds Symbols (Optional) | Specifies whether to search for coordinates stored as degrees minutes seconds formatted as X Y with seconds symbols (infrequent false positives). Examples are: 385320" -770206.0" and -0770206.0s+385320.76s.
| Boolean |
X Y with Separators (Optional) | Specifies whether to search for coordinates stored as degrees minutes seconds formatted as X Y with separators (moderate false positives). Examples are: 38:53:20 -77:2:6.0 and -077/02/06/+38/53/20.76.
| Boolean |
Universal Transverse Mercator (Optional) | Specifies whether to search for Universal Transverse Mercator (UTM) coordinates (infrequent false positives). Examples are: 18S 323503 4306438 and 18 north 323503.25 4306438.39.
| Boolean |
UPS North Polar (Optional) | Specifies whether to search for Universal Polar Stereographic (UPS) coordinates in the north polar area (infrequent false positives). Examples are: Y 2722399 2000000 and north 2722399 2000000.
| Boolean |
UPS South Polar (Optional) | Specifies whether to search for Universal Polar Stereographic (UPS) coordinates in the south polar area (infrequent false positives). Examples are: A 2000000 3168892 and south 2000000 3168892.
| Boolean |
Military Grid Reference System (Optional) | Specifies whether to search for Military Grid Reference System (MGRS) coordinates (infrequent false positives). Examples are: 18S UJ 13503 06438 and 18SUJ0306.
| Boolean |
North Polar (Optional) | Specifies whether to search for Military Grid Reference System (MGRS) coordinates in the north polar area (infrequent false positives). Examples are: Y TG 56814 69009 and YTG5669.
| Boolean |
South Polar (Optional) | Specifies whether to search for Military Grid Reference System (MGRS) coordinates in the south polar area (moderate false positives). Examples are: A TN 56814 30991 and ATN5630.
| Boolean |
Use Comma as Decimal Separator (Optional) | Specifies whether a comma (,) will be recognized as a decimal separator. By default, content is scanned for spatial coordinates defined by numbers that use a period (.) or a middle dot (·) as the decimal separator, for example: Lat 01° 10·80’ N Long 103° 28·60’ E. If you are working with content in which spatial coordinates are defined by numbers that use a comma (,) as the decimal separator, for example: 52° 8′ 32,14″ N; 5° 24′ 56,09″ E, set this parameter to recognize a comma as the decimal separator instead. This parameter is not set automatically based on the regional settings of your computer's operating system.
| Boolean |
Interpret as Longitude, Latitude (Optional) | Specifies whether x,y coordinates will be interpreted as longitude-latitude. When numbers resemble x,y coordinates, both numbers are less than 90, and there are no symbols or notations to indicate which number represents the latitude or longitude, results can be ambiguous. Turn on this parameter to interpret such numbers as a longitude-latitude coordinate (x,y) instead of a latitude-longitude coordinate (y,x).
| Boolean |
Input Coordinate System (Optional) | The coordinate system that will be used to interpret the spatial coordinates defined in the input. GCS-WGS-84 is the default. | Spatial Reference |
Input Custom Locations (Optional) | The custom location file (.lxtgaz) that will be used when scanning the input content. A point is created to represent each occurrence of each place name in the custom location file, up to the limits established by other tool parameters. | File |
Use Fuzzy Matching (Optional) | Specifies whether fuzzy matching will be used when comparing the input content to the place names specified in the custom location file.
| Boolean |
Maximum Number of Extracted Features (Optional) | The maximum number of features that can be extracted. The tool will stop scanning the input content for locations when the maximum number is reached. When running as a geoprocessing service, the service and the server may have separate limits on the number of features allowed. | Long |
Number of First Features to Ignore (Optional) | The number of features that will be detected and ignored before all other features are extracted. This parameter can be used to focus the search on a specific portion of the data. | Long |
Month Name (Optional) | Specifies whether to search for dates in which the month name appears (infrequent false positives). Examples are: 12 May 2003 and January 15, 1997.
| Boolean |
M/D/Y and D/M/Y (Optional) | Specifies whether to search for dates in which numbers are in the M/D/Y or D/M/Y format (moderate false positives). Examples are: 5/12/03 and 1-15-1997.
| Boolean |
YYYYMMDD (Optional) | Specifies whether to search for dates in which numbers are in the YYYYMMDD format (moderate false positives). Examples are: 20030512 and 19970115.
| Boolean |
YYMMDD (Optional) | Specifies whether to search for dates in which numbers are in the YYMMDD format (frequent false positives). Examples are: 030512 and 970115.
| Boolean |
YYJJJ (Optional) | Specifies whether to search for dates in which numbers are in the YYJJJ or YYYYJJJ format, where JJJ is the day of the year (frequent false positives). Examples are: 03132 and 97015.
| Boolean |
Maximum Number of Extracted Dates (Optional) | The maximum number of dates that will be extracted. | Long |
Number of First Dates to Ignore (Optional) | The number of dates that will be detected and ignored before all other dates are extracted. | Long |
Earliest Acceptable Date (Optional) | The earliest acceptable date to extract. Detected dates matching this value or later will be extracted. | Date |
Latest Acceptable Date (Optional) | The latest acceptable date to extract. Detected dates matching this value or earlier will be extracted. | Date |
Input Custom Attributes (Optional) | The custom attribute file (.lxtca) that will be used to scan the input content. A field will be created in the output feature class's attribute table for each custom attribute defined in the file. When the input content is scanned, it is examined for text associated with each custom attribute specified in the file. When a match is found, the corresponding text is extracted from the input content and stored in the corresponding field. | File |
Input File Link Text (Optional) | The file path that will be used as the file name in the output data when the Input File parameter (in_file in Python) is transferred to the server. If this parameter is not specified, the path of the Input File will be used, which may be an unreachable folder on a server. This parameter has no effect when the Input File is not specified. | String |
Input File Date and Time (Optional) | The UTC date and time at which the input file was modified. This value will be used as the modified attribute in the output data when the Input File parameter (in_file in Python) is transferred to the server. If this parameter is not specified, the input file's current modification time will be used. This parameter has no effect when the Input File is not specified. | Date |
Pre-Text Field Length (Optional) | Content is extracted from the input document to provide context for the location that was found. This parameter defines the maximum number of characters that will be extracted preceding the text that defines the location. The extracted text is stored in the Pre-Text field in the output feature class's attribute table. The default is 254. The Pre-Text field will be created with this length. The length of a text field in a shapefile is limited to 254 characters; when the output is a shapefile, a length greater than 254 will be truncated to 254. | Long |
Post-Text Field Length (Optional) | Content is extracted from the input document to provide context for the location that was found. This parameter defines the maximum number of characters that will be extracted following the text that defines the location. The extracted text is stored in the Post-Text field in the output feature class's attribute table. The default is 254. The Post-Text field will be created with this length. The length of a text field in a shapefile is limited to 254 characters; when the output is a shapefile, a length greater than 254 will be truncated to 254. | Long |
Coordinate Format (Optional) | Specifies the coordinate format that will be used to store the coordinate location. A standard representation of the spatial coordinate that defines the point feature is recorded in a field in the attribute table.
| String |
Require Word Breaks (Optional) | Specifies whether to search for text using word breaks. A word break occurs when words (text) are bounded by whitespace or punctuation characters, as in European languages. Depending on the language of the text, this setting can produce frequent or infrequent false positives. For example, when word breaks are not required, the English text Bernard will produce a match against the text San Bernardino, which would likely be considered a false positive. However, when text is written in a language that does not use word breaks, you cannot find words if word breaks are required. For example, in the Japanese text 私は東京に飛んで (I flew to Tokyo), you would only be able to find the word Tokyo, 東京, when word breaks are not required.
| Boolean |
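To make the ambiguity behind Interpret as Longitude, Latitude concrete, here is a small stand-alone Python illustration. It is not the LocateXT parser: the function name interpret_pair is hypothetical, and the rule that a value beyond ±90 can only be a longitude is an assumption added for the example. Only the case where both numbers fall within ±90 is the one this parameter resolves.

```python
def interpret_pair(first, second, use_lonlat=False):
    """Return (latitude, longitude) for two unlabeled numbers.

    Illustration only: this is not the LocateXT parser.
    """
    if abs(first) > 90 and abs(second) > 90:
        raise ValueError("neither number can be a latitude")
    # Assumed rule: a value beyond +/-90 can only be a longitude.
    if abs(first) > 90:
        return second, first
    if abs(second) > 90:
        return first, second
    # Both values fit in the latitude range: the ambiguous case,
    # settled by the use_lonlat flag.
    return (second, first) if use_lonlat else (first, second)


print(interpret_pair(38.8, -77.035))                   # (38.8, -77.035)
print(interpret_pair(38.8, -77.035, use_lonlat=True))  # (-77.035, 38.8)
print(interpret_pair(-122.4, 37.8))                    # (37.8, -122.4)
```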
Available with LocateXT license.
Summary
Analyzes documents containing unstructured or semi-structured text, such as email messages, travel forms, and so on, and extracts locations to a point feature class.
The tool analyzes and processes the input documents as follows:
- It identifies specific spatial coordinates in the document content and generates points representing those locations. The following coordinate formats are recognized: decimal degrees, degrees decimal minutes, degrees minutes seconds, Universal Transverse Mercator, Universal Polar Stereographic, and Military Grid Reference System.
- It identifies place names defined in a custom location file within the document content and generates points representing those locations. A custom location file associates each place name with the spatial coordinates that represent that location.
- It identifies text of interest defined in a custom attribute file, extracts that information from the document, and stores it in fields in the attribute table of the output feature class.
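Packed degrees-minutes-seconds values such as 385320.7N (degrees, minutes, and seconds written without separators) can be hard to read at a glance. The following Python sketch decodes that notation into decimal degrees to show what the digits mean. The function name packed_dms_to_dd is hypothetical, and this is only an illustration of the format, not how the tool itself parses documents.

```python
import re

# Two- or three-digit degrees, two-digit minutes, seconds with an optional
# fractional part, then a hemisphere letter.
PACKED_DMS = re.compile(r"(\d{2,3})(\d{2})(\d{2}(?:\.\d+)?)([NSEW])")


def packed_dms_to_dd(token):
    """Convert packed DMS text such as '385320.7N' to decimal degrees."""
    match = PACKED_DMS.fullmatch(token)
    if match is None:
        raise ValueError(f"not packed DMS: {token!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South latitudes and west longitudes are negative.
    return -value if hemisphere in "SW" else value


print(round(packed_dms_to_dd("385320.7N"), 6))    # 38.889083
print(round(packed_dms_to_dd("770206.000W"), 6))  # -77.035
```

Because the seconds group needs exactly two integer digits, the regular expression backtracks to two-digit degrees when the value only has six digits before the decimal point.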
This tool supports all Microsoft Office documents (Word, PowerPoint, and Excel); Adobe PDF documents; marked-up text such as XML and HTML documents; and any file containing plain text, such as text files (.txt).
Usage
The parameter default values are designed to optimize the identification of coordinates and dates. Default values can be modified for each parameter. The fewer parameters that are modified, the faster the tool will run.
All coordinate formats are on by default. If you want to extract custom locations only and do not want to extract spatial coordinates, turn off the coordinate format parameters.
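Turning off the coordinate formats from Python can be sketched as follows. The snippet builds keyword arguments that set all fourteen coordinate-format parameters named in the tool signature to False and supply a custom location file. It assumes the tool accepts Python booleans for its Boolean parameters, as arcpy tools generally do; the paths and the helper name custom_locations_only_kwargs are hypothetical. The tool call runs only where arcpy can be imported.

```python
import importlib.util

# Coordinate-format parameter names taken from the tool signature.
COORD_FORMAT_PARAMS = [
    "coord_dd_latlon", "coord_dd_xydeg", "coord_dd_xyplain",
    "coord_dm_latlon", "coord_dm_xymin",
    "coord_dms_latlon", "coord_dms_xysec", "coord_dms_xysep",
    "coord_utm", "coord_ups_north", "coord_ups_south",
    "coord_mgrs", "coord_mgrs_northpolar", "coord_mgrs_southpolar",
]


def custom_locations_only_kwargs(custom_location_file):
    """Keyword arguments that turn off every coordinate format and
    search only for the place names in the custom location file."""
    kwargs = {name: False for name in COORD_FORMAT_PARAMS}
    kwargs["in_custom_locations"] = custom_location_file
    return kwargs


# Hypothetical custom location file.
kwargs = custom_locations_only_kwargs("c:/data/places.lxtgaz")

if importlib.util.find_spec("arcpy") is not None:
    import arcpy

    arcpy.conversion.ExtractLocationsFromDocument(
        "c:/data/reports", "c:/data/results.gdb/places", **kwargs)
```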
If an Adobe PDF document is provided as input and its content includes a spatial coordinate in a format that is turned on, and the output feature class does not contain a feature representing the spatial coordinate, your computer may not have a component that is required to process PDF documents.
When you use a custom location file to extract place names, it is a best practice to limit the file to the place names you need. For example, if you convert a feature class representing all places in the world to a custom location file, the tool can spend a lot of time looking for places that are unlikely to be present in the content or that are in areas of the world that are not relevant to your analysis.
When the place names in which you are interested can be misspelled or have known variations, you will typically get better results by specifying common misspellings and alternate place names in the custom location file instead of using fuzzy matching. When fuzzy matching is turned on, you will get an output location if 70 percent of the characters in a place name have a match with the input content. This can produce more false positives than if you provide known alternates and misspellings.
A useful workflow for fuzzy matching is to first run the tool with fuzzy matching turned off. Then, run the tool again with fuzzy matching turned on and compare the results. This can help you identify spelling variations that can be added to the custom location file.
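The two-pass workflow above can be sketched in Python. The runs differ only in the fuzzy_match parameter; the paths and the helper name fuzzy_only_matches are hypothetical. This topic does not name the output field that holds the matched text, so the comparison step takes plain lists of strings, and the values shown are made-up examples. The tool calls run only where arcpy can be imported.

```python
import importlib.util


def fuzzy_only_matches(exact_matches, fuzzy_matches):
    """Return matched text found only when fuzzy matching was on.

    These are candidate spelling variations to add to the custom
    location file.
    """
    return sorted(set(fuzzy_matches) - set(exact_matches))


# Hypothetical outputs: one per run, differing only in fuzzy_match.
RUNS = {
    "c:/data/results.gdb/places_exact": False,
    "c:/data/results.gdb/places_fuzzy": True,
}

if importlib.util.find_spec("arcpy") is not None:
    import arcpy

    for out_fc, fuzzy in RUNS.items():
        arcpy.conversion.ExtractLocationsFromDocument(
            "c:/data/reports", out_fc,
            in_custom_locations="c:/data/places.lxtgaz",
            fuzzy_match=fuzzy,
        )

# Example values standing in for the matched text read from each output.
print(fuzzy_only_matches(["Kabul", "Herat"], ["Kabul", "Kabol", "Herat", "Herrat"]))
# ['Herrat', 'Kabol']
```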
Parameters
arcpy.conversion.ExtractLocationsFromDocument(in_file, out_feature_class, {in_template}, {coord_dd_latlon}, {coord_dd_xydeg}, {coord_dd_xyplain}, {coord_dm_latlon}, {coord_dm_xymin}, {coord_dms_latlon}, {coord_dms_xysec}, {coord_dms_xysep}, {coord_utm}, {coord_ups_north}, {coord_ups_south}, {coord_mgrs}, {coord_mgrs_northpolar}, {coord_mgrs_southpolar}, {comma_decimal}, {coord_use_lonlat}, {in_coor_system}, {in_custom_locations}, {fuzzy_match}, {max_features_extracted}, {ignore_first_features}, {date_monthname}, {date_m_d_y}, {date_yyyymmdd}, {date_yymmdd}, {date_yyjjj}, {max_dates_extracted}, {ignore_first_dates}, {date_range_begin}, {date_range_end}, {in_custom_attributes}, {file_link}, {file_mod_datetime}, {pre_text_length}, {post_text_length}, {std_coord_fmt}, {req_word_breaks})
Name | Explanation | Data Type |
in_file | The input file that will be scanned for locations (coordinates or custom locations), dates, and custom attributes; or a folder in which all files will be scanned for locations. | File |
out_feature_class | The feature class containing point features that represent the locations that were found. | Feature Class |
in_template (Optional) | The template file (*.lxttmpl) that determines the settings to use for each tool parameter. When a template file is provided, all values specified for other parameters will be ignored except those that determine the input content that will be processed and the output feature class. Some settings that are available in the Extract Locations pane are only available to this tool when the settings are saved to a template file, and the template file is referenced in this parameter. These settings are as follows:
| File |
coord_dd_latlon (Optional) | Specifies whether to search for coordinates stored as decimal degrees formatted as latitude and longitude (infrequent false positives). Examples are: 38.8N 77.035W and W77N38.88909.
| Boolean |
coord_dd_xydeg (Optional) | Specifies whether to search for coordinates stored as decimal degrees formatted as X Y with degree symbols (infrequent false positives). Examples are: 38.8° -77.035° and -077d+38.88909d.
| Boolean |
coord_dd_xyplain (Optional) | Specifies whether to search for coordinates stored as decimal degrees formatted as X Y with no symbols (frequent false positives). Examples are: 38.8 -77.035 and -077.0, +38.88909.
| Boolean |
coord_dm_latlon (Optional) | Specifies whether to search for coordinates stored as degrees decimal minutes formatted as latitude and longitude (infrequent false positives). Examples are: 3853.3N 7702.100W and W7702N3853.3458.
| Boolean |
coord_dm_xymin (Optional) | Specifies whether to search for coordinates stored as degrees decimal minutes formatted as X Y with minutes symbols (infrequent false positives). Examples are: 3853' -7702.1' and -07702m+3853.3458m.
| Boolean |
coord_dms_latlon (Optional) | Specifies whether to search for coordinates stored as degrees minutes seconds formatted as latitude and longitude (infrequent false positives). Examples are: 385320.7N 770206.000W and W770206N385320.76.
| Boolean |
coord_dms_xysec (Optional) | Specifies whether to search for coordinates stored as degrees minutes seconds formatted as X Y with seconds symbols (infrequent false positives). Examples are: 385320" -770206.0" and -0770206.0s+385320.76s.
| Boolean |
coord_dms_xysep (Optional) | Specifies whether to search for coordinates stored as degrees minutes seconds formatted as X Y with separators (moderate false positives). Examples are: 38:53:20 -77:2:6.0 and -077/02/06/+38/53/20.76.
| Boolean |
coord_utm (Optional) | Specifies whether to search for Universal Transverse Mercator (UTM) coordinates (infrequent false positives). Examples are: 18S 323503 4306438 and 18 north 323503.25 4306438.39.
| Boolean |
coord_ups_north (Optional) | Specifies whether to search for Universal Polar Stereographic (UPS) coordinates in the north polar area (infrequent false positives). Examples are: Y 2722399 2000000 and north 2722399 2000000.
| Boolean |
coord_ups_south (Optional) | Specifies whether to search for Universal Polar Stereographic (UPS) coordinates in the south polar area (infrequent false positives). Examples are: A 2000000 3168892 and south 2000000 3168892.
| Boolean |
coord_mgrs (Optional) | Specifies whether to search for Military Grid Reference System (MGRS) coordinates (infrequent false positives). Examples are: 18S UJ 13503 06438 and 18SUJ0306.
| Boolean |
coord_mgrs_northpolar (Optional) | Specifies whether to search for Military Grid Reference System (MGRS) coordinates in the north polar area (infrequent false positives). Examples are: Y TG 56814 69009 and YTG5669.
| Boolean |
coord_mgrs_southpolar (Optional) | Specifies whether to search for Military Grid Reference System (MGRS) coordinates in the south polar area (moderate false positives). Examples are: A TN 56814 30991 and ATN5630.
| Boolean |
comma_decimal (Optional) | Specifies whether a comma (,) will be recognized as a decimal separator. By default, content is scanned for spatial coordinates defined by numbers that use a period (.) or a middle dot (·) as the decimal separator, for example: Lat 01° 10·80’ N Long 103° 28·60’ E. If you are working with content in which spatial coordinates are defined by numbers that use a comma (,) as the decimal separator, for example: 52° 8′ 32,14″ N; 5° 24′ 56,09″ E, set this parameter to recognize a comma as the decimal separator instead. This parameter is not set automatically based on the regional settings of your computer's operating system.
| Boolean |
coord_use_lonlat (Optional) | Specifies whether x,y coordinates will be interpreted as longitude-latitude. When numbers resemble x,y coordinates, both numbers are less than 90, and there are no symbols or notations to indicate which number represents the latitude or longitude, results can be ambiguous. Turn on this parameter to interpret such numbers as a longitude-latitude coordinate (x,y) instead of a latitude-longitude coordinate (y,x).
| Boolean |
in_coor_system (Optional) | The coordinate system that will be used to interpret the spatial coordinates defined in the input. GCS-WGS-84 is the default. | Spatial Reference |
in_custom_locations (Optional) | The custom location file (.lxtgaz) that will be used when scanning the input content. A point is created to represent each occurrence of each place name in the custom location file, up to the limits established by other tool parameters. | File |
fuzzy_match (Optional) | Specifies whether fuzzy matching will be used when comparing the input content to the place names specified in the custom location file.
| Boolean |
max_features_extracted (Optional) | The maximum number of features that can be extracted. The tool will stop scanning the input content for locations when the maximum number is reached. When running as a geoprocessing service, the service and the server may have separate limits on the number of features allowed. | Long |
ignore_first_features (Optional) | The number of features that will be detected and ignored before all other features are extracted. This parameter can be used to focus the search on a specific portion of the data. | Long |
date_monthname (Optional) | Specifies whether to search for dates in which the month name appears (infrequent false positives). Examples are: 12 May 2003 and January 15, 1997.
| Boolean |
date_m_d_y (Optional) | Specifies whether to search for dates in which numbers are in the M/D/Y or D/M/Y format (moderate false positives). Examples are: 5/12/03 and 1-15-1997.
| Boolean |
date_yyyymmdd (Optional) | Specifies whether to search for dates in which numbers are in the YYYYMMDD format (moderate false positives). Examples are: 20030512 and 19970115.
| Boolean |
date_yymmdd (Optional) | Specifies whether to search for dates in which numbers are in the YYMMDD format (frequent false positives). Examples are: 030512 and 970115.
| Boolean |
date_yyjjj (Optional) | Specifies whether to search for dates in which numbers are in the YYJJJ or YYYYJJJ format, where JJJ is the day of the year (frequent false positives). Examples are: 03132 and 97015.
| Boolean |
max_dates_extracted (Optional) | The maximum number of dates that will be extracted. | Long |
ignore_first_dates (Optional) | The number of dates that will be detected and ignored before all other dates are extracted. | Long |
date_range_begin (Optional) | The earliest acceptable date to extract. Detected dates matching this value or later will be extracted. | Date |
date_range_end (Optional) | The latest acceptable date to extract. Detected dates matching this value or earlier will be extracted. | Date |
in_custom_attributes (Optional) | The custom attribute file (.lxtca) that will be used to scan the input content. A field will be created in the output feature class's attribute table for each custom attribute defined in the file. When the input content is scanned, it is examined for text associated with each custom attribute specified in the file. When a match is found, the corresponding text is extracted from the input content and stored in the corresponding field. | File |
file_link (Optional) | The file path that will be used as the file name in the output data when the Input File parameter (in_file in Python) is transferred to the server. If this parameter is not specified, the path of the Input File will be used, which may be an unreachable folder on a server. This parameter has no effect when the Input File is not specified. | String |
file_mod_datetime (Optional) | The UTC date and time at which the input file was modified. This value will be used as the modified attribute in the output data when the Input File parameter (in_file in Python) is transferred to the server. If this parameter is not specified, the input file's current modification time will be used. This parameter has no effect when the Input File is not specified. | Date |
pre_text_length (Optional) | Content is extracted from the input document to provide context for the location that was found. This parameter defines the maximum number of characters that will be extracted preceding the text that defines the location. The extracted text is stored in the Pre-Text field in the output feature class's attribute table. The default is 254. The Pre-Text field will be created with this length. The length of a text field in a shapefile is limited to 254 characters; when the output is a shapefile, a length greater than 254 will be truncated to 254. | Long |
post_text_length (Optional) | Content is extracted from the input document to provide context for the location that was found. This parameter defines the maximum number of characters that will be extracted following the text that defines the location. The extracted text is stored in the Post-Text field in the output feature class's attribute table. The default is 254. The Post-Text field will be created with this length. The length of a text field in a shapefile is limited to 254 characters; when the output is a shapefile, a length greater than 254 will be truncated to 254. | Long |
std_coord_fmt (Optional) | Specifies the coordinate format that will be used to store the coordinate location. A standard representation of the spatial coordinate that defines the point feature is recorded in a field in the attribute table.
| String |
req_word_breaks (Optional) | Specifies whether to search for text using word breaks. A word break occurs when words (text) are bounded by whitespace or punctuation characters, as in European languages. Depending on the language of the text, this setting can produce frequent or infrequent false positives. For example, when word breaks are not required, the English text Bernard will produce a match against the text San Bernardino, which would likely be considered a false positive. However, when text is written in a language that does not use word breaks, you cannot find words if word breaks are required. For example, in the Japanese text 私は東京に飛んで (I flew to Tokyo), you would only be able to find the word Tokyo, 東京, when word breaks are not required.
| Boolean |
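Because pre_text_length and post_text_length are truncated to 254 for shapefile output, a script can apply that cap itself so the value it passes matches what a shapefile can store. The following sketch assumes a shapefile is recognized by its .shp extension; the helper name context_lengths and the paths are hypothetical, and the tool call runs only where arcpy can be imported.

```python
import importlib.util
import os

SHAPEFILE_TEXT_LIMIT = 254  # maximum text field length in a shapefile


def context_lengths(out_feature_class, pre_text_length=254, post_text_length=254):
    """Return (pre, post) context lengths, capped at 254 for shapefile output."""
    if os.path.splitext(out_feature_class)[1].lower() == ".shp":
        pre_text_length = min(pre_text_length, SHAPEFILE_TEXT_LIMIT)
        post_text_length = min(post_text_length, SHAPEFILE_TEXT_LIMIT)
    return pre_text_length, post_text_length


# Hypothetical shapefile output with a longer requested pre-text length.
out_fc = "c:/data/output/wells.shp"
pre, post = context_lengths(out_fc, 500, 100)

if importlib.util.find_spec("arcpy") is not None:
    import arcpy

    arcpy.conversion.ExtractLocationsFromDocument(
        "c:/data/wells.docx", out_fc,
        pre_text_length=pre, post_text_length=post)
```

For a geodatabase feature class the requested lengths pass through unchanged.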
Code sample
The following Python window script demonstrates how to use the ExtractLocationsFromDocument function in immediate mode.
import arcpy
arcpy.env.workspace = "c:/data"
# Scan wells.docx and write the locations found to the wells point feature class
arcpy.conversion.ExtractLocationsFromDocument("wells.docx", "water.gdb/wells")
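A slightly fuller sketch passes keyword arguments taken from the tool signature. It scans a folder of documents that write decimals with commas and list unlabeled coordinate pairs with longitude first, and caps the number of extracted features. The folder and output paths are hypothetical, Python booleans are assumed to be accepted for the Boolean parameters, and the call runs only where arcpy can be imported.

```python
import importlib.util

# Hypothetical input folder and output feature class.
params = {
    "in_file": "c:/data/field_reports",  # a folder: every file is scanned
    "out_feature_class": "c:/data/water.gdb/report_locations",
    "comma_decimal": True,          # content writes values like 52° 8′ 32,14″ N
    "coord_use_lonlat": True,       # ambiguous pairs are read as x,y (lon, lat)
    "max_features_extracted": 500,  # stop scanning after 500 locations
}

if importlib.util.find_spec("arcpy") is not None:
    import arcpy

    arcpy.conversion.ExtractLocationsFromDocument(**params)
```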
Environments
Special cases
Licensing information
- Basic: Requires LocateXT
- Standard: Requires LocateXT
- Advanced: Requires LocateXT
Related topics