Label | Explanation | Data Type |
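Input Folder or Table | The input can be either a folder containing the text files from which entities will be extracted, or a feature class or table with a text field that contains the text from which entities will be extracted. | Folder; Feature Layer; Table View; Feature Class |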
Output Table | The output feature class or table that will contain the extracted entities. If a locator is provided and the model extracts addresses, the feature class will be produced by geocoding the extracted addresses. | Feature Class; Table; Feature Layer |
Input Model Definition File | The trained model that will be used to extract entities from text. The model definition file can be either an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk) that is stored locally or hosted on ArcGIS Living Atlas (.dlpk_remote). To use a .dlpk file that was trained using the Mistral backbone, the backbone must be installed first. To install the Mistral backbone, see ArcGIS Mistral Backbone. The .dlpk file can also be a third-party language model. Caution: A third-party language model .dlpk file can potentially contain harmful code. Use these models only if you trust their source. | File |
Model Arguments (Optional) | Additional arguments that will be used by the model while performing inference. The supported model argument is sequence_length, which will be used to adjust the model's output. Note: When using a third-party language model, the model arguments will be updated according to the parameters specified in the .dlpk file. To learn more about defining model arguments, see the getParameterInfo section in Use third party language models with ArcGIS. | Value Table |
Batch Size (Optional) | The number of samples that will be processed at one time. The default value is 4. Increasing the batch size can improve tool performance; however, as the batch size increases, more memory is used. If an out-of-memory error occurs, use a smaller batch size. | Double |
Location Zone (Optional) | The geographic region or zone where the addresses are expected to be located. The specified text will be appended to the address extracted by the model. The locator uses the location zone information to identify the region or geographic area where the address is located to produce better results. | String |
Input Locator (Optional) | The locator that will be used to geocode addresses in the input text documents. A point is generated for each address that is geocoded successfully and stored in the output feature class. | Address Locator |
Text Field | A text field in the input feature class or table that contains the text that will be used by the model as input. This parameter is required when the Input Folder or Table parameter value is a feature class or table. | Field |
Summary
Runs a trained named entity recognizer model on text files in a folder, or on a text field in a feature class or table, and writes the extracted entities and locations (such as addresses, place or person names, dates, and monetary values) to a table. If the extracted entities contain addresses, the tool geocodes them using the specified locator and produces a feature class as output.
Usage
This tool requires that deep learning frameworks be installed. To set up your machine to use deep learning frameworks in ArcGIS Pro, see Install deep learning frameworks for ArcGIS.
This tool requires a model definition file containing trained model information. The model can be trained using the Train Entity Recognition Model tool. The Input Model Definition File parameter value can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk). The model files can be stored locally or hosted on ArcGIS Living Atlas of the World.
This tool supports models trained using transformer-based backbones and the Mistral backbone. To install the Mistral backbone, see ArcGIS Mistral Backbone.
This tool supports the use of third-party language models created using the model extensibility feature. The model extensibility feature enables entity extraction tasks using a custom deep learning model file (.dlpk) that is not created using the Train Entity Recognition Model tool. To learn more about creating a custom deep learning (.dlpk) model file, see Use third party language models with ArcGIS.
This tool can run on CPU or GPU; however, deep learning is computationally intensive and a GPU is recommended. To run this tool using GPU, set the Processor Type environment to GPU. If you have more than one GPU, specify the GPU ID environment instead.
For information about requirements for running this tool and issues you may encounter, see Deep Learning frequently asked questions.
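The processor settings described above can also be applied from Python before the tool is run. The following is a minimal sketch, assuming the Processor Type and GPU ID environments correspond to the arcpy.env.processorType and arcpy.env.gpuId properties. The configure_processor helper is hypothetical (it is not part of arcpy) and takes the environment object as an argument so that it can be tried without ArcGIS Pro installed.

```python
from types import SimpleNamespace


def configure_processor(env, use_gpu=True, gpu_id=None):
    """Set processor-related environments on an arcpy.env-like object.

    configure_processor is a hypothetical helper, not part of arcpy.
    """
    # Processor Type environment: "GPU" or "CPU"
    env.processorType = "GPU" if use_gpu else "CPU"
    # GPU ID environment: only relevant when more than one GPU is present
    if use_gpu and gpu_id is not None:
        env.gpuId = gpu_id
    return env


# SimpleNamespace stands in for arcpy.env so the sketch runs anywhere;
# in ArcGIS Pro, pass arcpy.env instead.
demo_env = configure_processor(SimpleNamespace(), use_gpu=True, gpu_id=1)
print(demo_env.processorType, demo_env.gpuId)
```

On a machine with a single GPU, gpu_id can be left as None so that only the processor type is set.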
Parameters
arcpy.geoai.ExtractEntitiesUsingDeepLearning(in_folder, out_table, in_model_definition_file, {model_arguments}, {batch_size}, {location_zone}, {in_locator}, {text_field})
Name | Explanation | Data Type |
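in_folder | The input can be either a folder containing the text files from which entities will be extracted, or a feature class or table with a text field that contains the text from which entities will be extracted. | Folder; Feature Layer; Table View; Feature Class |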
out_table | The output feature class or table that will contain the extracted entities. If a locator is provided and the model extracts addresses, the feature class will be produced by geocoding the extracted addresses. | Feature Class; Table; Feature Layer |
in_model_definition_file | The trained model that will be used to extract entities from text. The model definition file can be either an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk) that is stored locally or hosted on ArcGIS Living Atlas (.dlpk_remote). To use a .dlpk file that was trained using the Mistral backbone, the backbone must be installed first. To install the Mistral backbone, see ArcGIS Mistral Backbone. The .dlpk file can also be a third-party language model. Caution: A third-party language model .dlpk file can potentially contain harmful code. Use these models only if you trust their source. | File |
model_arguments [model_arguments,...] (Optional) | Additional arguments that will be used by the model while performing inference. The supported model argument is sequence_length, which will be used to adjust the model's output. Note: When using a third-party language model, the model arguments will be updated according to the parameters specified in the .dlpk file. To learn more about defining model arguments, see the getParameterInfo section in Use third party language models with ArcGIS. | Value Table |
batch_size (Optional) | The number of samples that will be processed at one time. The default value is 4. Increasing the batch size can improve tool performance; however, as the batch size increases, more memory is used. If an out-of-memory error occurs, use a smaller batch size. | Double |
location_zone (Optional) | The geographic region or zone where the addresses are expected to be located. The specified text will be appended to the address extracted by the model. The locator uses the location zone information to identify the region or geographic area where the address is located to produce better results. | String |
in_locator (Optional) | The locator that will be used to geocode addresses in the input text documents. A point is generated for each address that is geocoded successfully and stored in the output feature class. | Address Locator |
text_field | A text field in the input feature class or table that contains the text that will be used by the model as input. This parameter is required when the in_folder parameter value is a feature class or table. | Field |
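The parameter rules in the table above can be checked before the tool is called. The following is a minimal sketch of one way to assemble the arguments; build_extract_kwargs and its input_is_table flag are hypothetical (not part of arcpy), the paths and field name are placeholders, and passing model_arguments as a list of [name, value] rows is an assumption about how the value table is supplied from Python.

```python
def build_extract_kwargs(in_folder, out_table, in_model_definition_file,
                         input_is_table=False, text_field=None,
                         sequence_length=None, batch_size=4,
                         location_zone=None, in_locator=None):
    """Assemble keyword arguments for ExtractEntitiesUsingDeepLearning.

    build_extract_kwargs is a hypothetical helper, not part of arcpy.
    """
    # text_field is required when the input is a feature class or table
    if input_is_table and not text_field:
        raise ValueError(
            "text_field is required for feature class or table input")
    kwargs = {
        "in_folder": in_folder,
        "out_table": out_table,
        "in_model_definition_file": in_model_definition_file,
        "batch_size": batch_size,
    }
    if sequence_length is not None:
        # Assumed value table form: a list of [name, value] rows
        kwargs["model_arguments"] = [["sequence_length", str(sequence_length)]]
    if location_zone:
        kwargs["location_zone"] = location_zone
    if in_locator:
        kwargs["in_locator"] = in_locator
    if text_field:
        kwargs["text_field"] = text_field
    return kwargs


# Placeholder paths and field name, for illustration only
kwargs = build_extract_kwargs(
    "C:/data/reports.gdb/incidents",
    "C:/data/reports.gdb/ExtractedEntities",
    "C:/models/EntityRecognizer.emd",
    input_is_table=True,
    text_field="description",
    sequence_length=512,
    location_zone="California",
)
print(kwargs)
# In ArcGIS Pro: arcpy.geoai.ExtractEntitiesUsingDeepLearning(**kwargs)
```

Optional parameters that are not supplied are left out of the dictionary, so the tool falls back to its own defaults for them.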
Code sample
The following example demonstrates how to use the ExtractEntitiesUsingDeepLearning function.
# Name: ExtractEntities.py
# Description: Extract useful entities such as "Address", "Date" from text.
# Import system modules
import arcpy
import os
# Set the workspace; in_folder below is given relative to it
arcpy.env.workspace = "C:/textanalysisexamples/data"
dbpath = "C:/textanalysisexamples/Text_analysis_tools.gdb"
# Set local variables
in_folder = "test_data"  # folder of text files inside the workspace
out_table = os.path.join(dbpath, "ExtractedEntities")
pretrained_model_path_emd = "c:\\extractentities\\EntityRecognizer.emd"
# Run Extract Entities Using Deep Learning
arcpy.geoai.ExtractEntitiesUsingDeepLearning(
    in_folder, out_table, pretrained_model_path_emd)
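The batch_size guidance (use a smaller batch size if an out-of-memory error occurs) can be automated with a simple retry loop. The following is a minimal sketch under stated assumptions: run_with_batch_backoff and is_out_of_memory are hypothetical names, and how an out-of-memory condition is actually reported by the tool is not specified here, so the check is a plain text match that should be adapted to the errors observed in practice. A stand-in function replaces the tool call so the sketch runs without ArcGIS Pro.

```python
def run_with_batch_backoff(run_tool, batch_size=4, min_batch_size=1,
                           is_out_of_memory=lambda exc:
                               "memory" in str(exc).lower()):
    """Call run_tool(batch_size), halving batch_size after memory errors.

    run_with_batch_backoff and is_out_of_memory are hypothetical; how an
    out-of-memory condition is reported depends on the environment.
    """
    while True:
        try:
            return run_tool(batch_size), batch_size
        except Exception as exc:
            # Re-raise unrelated failures, or give up at the minimum size
            if not is_out_of_memory(exc) or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


# Stand-in for the tool call so the sketch runs without ArcGIS Pro.
# In ArcGIS Pro, run_tool could wrap
# arcpy.geoai.ExtractEntitiesUsingDeepLearning(..., batch_size=size).
def fake_tool(size):
    if size > 2:
        raise RuntimeError("CUDA out of memory")
    return "done"


result, used = run_with_batch_backoff(fake_tool, batch_size=8)
print(result, used)
```

Halving is only one possible backoff policy; errors that do not match the memory check are re-raised unchanged so that unrelated failures are not hidden.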
Environments
Licensing information
- Basic: No
- Standard: No
- Advanced: Yes