Process Text Using AI Model (GeoAI)

Summary

Processes text from sources such as text fields in feature classes or tables, or text files in a folder, to support use cases including text transformation, entity recognition, text classification, text generation, translation, and summarization. The tool uses custom third-party models or deep learning models trained using the Train Text Classification Model, Train Text Transformation Model, and Train Entity Recognition Model tools.

Usage

  • This tool requires that deep learning frameworks be installed. To set up your machine to use deep learning frameworks in ArcGIS Pro, see Install deep learning frameworks for ArcGIS.

  • This tool requires a model definition file containing model information. The model can be trained using the Train Text Classification Model, Train Text Transformation Model, or Train Entity Recognition Model tool. The Input Model Definition File parameter value can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk). The model files can be stored locally or hosted on ArcGIS Living Atlas of the World.

  • This tool supports the use of third-party language models created using the model extensibility feature. This feature enables tasks such as entity extraction, text classification, text summarization, and text translation using custom deep learning models that were not trained with tools supported by ArcGIS Pro. To learn more about creating a custom deep learning model file, see Use third-party language models with ArcGIS.

  • This tool can run on CPU or GPU; however, deep learning is computationally intensive and a GPU is recommended. To run this tool using GPU, set the Processor Type environment to GPU. If you have more than one GPU, specify the GPU ID environment instead.

  • This tool supports third-party language models that are hosted remotely. These models do not require locally installed deep learning frameworks or local GPU resources, because both are managed remotely.

  • For information about requirements for running this tool and issues you may encounter, see Deep Learning frequently asked questions.
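
The processor settings described in the usage notes above can also be applied from Python. The following is a minimal sketch, assuming the environment object exposes `processorType` and `gpuId` attributes in the way `arcpy.env` does. The helper name `configure_processor` is hypothetical, and a stand-in object is used so the snippet runs without ArcGIS Pro.

```python
from types import SimpleNamespace


def configure_processor(env, use_gpu=True, gpu_id=None):
    """Set processor-related environment attributes on env.

    env is expected to behave like arcpy.env (an assumption of this sketch).
    """
    env.processorType = "GPU" if use_gpu else "CPU"
    # Only pin a specific GPU when one was requested and a GPU is in use.
    if use_gpu and gpu_id is not None:
        env.gpuId = gpu_id
    return env


# Stand-in for arcpy.env so the sketch runs anywhere; in ArcGIS Pro,
# pass arcpy.env instead.
demo_env = configure_processor(SimpleNamespace(), use_gpu=True, gpu_id=0)
print(demo_env.processorType, demo_env.gpuId)
```

Passing the environment object in as an argument keeps the helper independent of ArcGIS Pro; with a real installation the call would be `configure_processor(arcpy.env, use_gpu=True, gpu_id=0)` before running the tool.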

Parameters

Label | Explanation | Data Type
Input Layer or Table

The input can be either of the following:

  • The point, line, or polygon input feature class, or table, containing the input fields. Each row in the input represents a single record.
  • A folder containing the text files.
Feature Layer; Table View; Table; Folder
Data Fields

The names of the fields from the input feature class or table that will be used for downstream natural language processing (NLP) tasks.

Field
Input Model Definition File

The trained model that will be used for NLP tasks. The model definition file can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk) that is stored locally or hosted on ArcGIS Living Atlas (.dlpk_remote).

The .dlpk file can also be a third-party language model.

Caution:

A third-party language model .dlpk file can potentially contain harmful code. Use these models only if you trust their source.

File
Output Layer or Table

The feature class or table where the output from the NLP tasks will be stored.

Feature Class; Table; Feature Layer
Model Arguments
(Optional)

Additional arguments that will be used by the model while performing inference. These can include arguments supported by third-party models, as well as additional parameters supported by the Train Text Classification Model, Train Text Transformation Model, or Train Entity Recognition Model tool.

Note:

When using a third-party language model, the model arguments will be updated according to the parameters specified in the .dlpk file. To learn more about defining model arguments, see the getParameterInfo section in Use third-party language models with ArcGIS.

Value Table
Location Zone
(Optional)

The geographic region or zone where the addresses are expected to be located. The specified text will be appended to the address extracted by the model.

The locator uses the location zone information to identify the region or geographic area where the address is located to produce better results.

Note:

This parameter is only supported for models trained using the Train Entity Recognition Model tool with a defined address entity.

String
Input Locator
(Optional)

The locator that will be used to geocode addresses in the input text documents. A point is generated for each address that is geocoded successfully and stored in the output feature class.

Note:

This parameter is only supported for models trained using the Train Entity Recognition Model tool with a defined address entity.

Address Locator

Derived Output

Label | Explanation | Data Type
Updated Table

The output feature layer or table containing the results derived from the input data.

Feature Layer; Table

arcpy.geoai.ProcessTextUsingAIModel(in_layer, data_fields, in_model_definition_file, out_layer, {model_arguments}, {location_zone}, {in_locator})
Name | Explanation | Data Type
in_layer

The input can be either of the following:

  • The point, line, or polygon input feature class, or table, containing the input fields. Each row in the input represents a single record.
  • A folder containing the text files.
Feature Layer; Table View; Table; Folder
data_fields
[data_fields,...]

The names of the fields from the input feature class or table that will be used for downstream natural language processing (NLP) tasks.

Field
in_model_definition_file

The trained model that will be used for NLP tasks. The model definition file can be an Esri model definition JSON file (.emd) or a deep learning model package (.dlpk) that is stored locally or hosted on ArcGIS Living Atlas (.dlpk_remote).

The .dlpk file can also be a third-party language model.

Caution:

A third-party language model .dlpk file can potentially contain harmful code. Use these models only if you trust their source.

File
out_layer

The feature class or table where the output from the NLP tasks will be stored.

Feature Class; Table; Feature Layer
model_arguments
[model_arguments,...]
(Optional)

Additional arguments that will be used by the model while performing inference. These can include arguments supported by third-party models, as well as additional parameters supported by the Train Text Classification Model, Train Text Transformation Model, or Train Entity Recognition Model tool.

Note:

When using a third-party language model, the model arguments will be updated according to the parameters specified in the .dlpk file. To learn more about defining model arguments, see the getParameterInfo section in Use third-party language models with ArcGIS.

Value Table
location_zone
(Optional)

The geographic region or zone where the addresses are expected to be located. The specified text will be appended to the address extracted by the model.

The locator uses the location zone information to identify the region or geographic area where the address is located to produce better results.

Note:

This parameter is only supported for models trained using the Train Entity Recognition Model tool with a defined address entity.

String
in_locator
(Optional)

The locator that will be used to geocode addresses in the input text documents. A point is generated for each address that is geocoded successfully and stored in the output feature class.

Note:

This parameter is only supported for models trained using the Train Entity Recognition Model tool with a defined address entity.

Address Locator

Derived Output

Name | Explanation | Data Type
updated_table

The output feature layer or table containing the results derived from the input data.

Feature Layer; Table
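
Before calling the function, it can help to check the model file type and assemble the `model_arguments` value. The sketch below is illustrative only. It assumes that a Value Table parameter can be passed as a list of `[name, value]` rows, and the helper names `check_model_file` and `build_model_arguments` and the argument name `sequence_length` are hypothetical. The accepted suffixes are taken from the `in_model_definition_file` description above.

```python
from pathlib import Path

# Suffixes named in the in_model_definition_file description above.
SUPPORTED_MODEL_SUFFIXES = {".emd", ".dlpk", ".dlpk_remote"}


def check_model_file(path):
    """Return the path unchanged if its suffix is a supported model type."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_MODEL_SUFFIXES:
        raise ValueError(f"Unsupported model file type: {suffix or '(none)'}")
    return path


def build_model_arguments(arguments):
    """Convert a dict into [name, value] rows for a Value Table parameter.

    Values are converted to strings (an assumption of this sketch).
    """
    return [[name, str(value)] for name, value in arguments.items()]


model_file = check_model_file("C:/processtextdata/ProcessTextUsingLLMs.emd")
# "sequence_length" is a hypothetical argument name used for illustration.
model_arguments = build_model_arguments({"sequence_length": 512})
print(model_file, model_arguments)
```

The arguments a given model accepts depend on the model itself, so the names passed to `build_model_arguments` should come from the model's documentation or, for third-party models, from its `getParameterInfo` definition.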

Code sample

ProcessTextUsingAIModel (stand-alone script)

The following example demonstrates how to use the ProcessTextUsingAIModel function.

# Name: ProcessText.py
# Description: ArcGIS geoprocessing tool that enables a broad range of advanced
# text processing tasks, with customizable outputs to meet various NLP needs.
#
# Requirements: ArcGIS Pro Advanced license

# Import system modules
import arcpy

arcpy.env.workspace = "C:/processtextexamples/data"

# Set local variables
in_table = "ProcessTextData"
pretrained_model_path_emd = "c:\\processtextdata\\ProcessTextUsingLLMs.emd"
# The field and output names below are placeholders; replace them with
# values that match your data.
data_fields = ["TextField"]
out_table = "ProcessTextOutput"

# Run Process Text Using AI Model (optional parameters are omitted)
arcpy.geoai.ProcessTextUsingAIModel(
    in_table, data_fields, pretrained_model_path_emd, out_table)
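
The function signature marks `model_arguments`, `location_zone`, and `in_locator` as optional. The following sketch shows one way to collect arguments by parameter name and leave out any optional values that were not set. It assumes the function accepts keyword arguments that match the parameter names in the signature. The helper `build_tool_kwargs`, the field name, the output name, and the location zone value are all hypothetical.

```python
def build_tool_kwargs(in_layer, data_fields, in_model_definition_file, out_layer,
                      model_arguments=None, location_zone=None, in_locator=None):
    """Collect arguments by parameter name, leaving out unset optional ones."""
    kwargs = {
        "in_layer": in_layer,
        "data_fields": data_fields,
        "in_model_definition_file": in_model_definition_file,
        "out_layer": out_layer,
    }
    optional = {
        "model_arguments": model_arguments,
        "location_zone": location_zone,
        "in_locator": in_locator,
    }
    # Keep only the optional parameters that were actually provided.
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


kwargs = build_tool_kwargs(
    "ProcessTextData", ["TextField"],
    "C:/processtextdata/ProcessTextUsingLLMs.emd", "ProcessTextOutput",
    location_zone="Madison, Wisconsin",  # hypothetical zone text
)
print(sorted(kwargs))
# In ArcGIS Pro, the call would then be (assumed usage):
# arcpy.geoai.ProcessTextUsingAIModel(**kwargs)
```

As noted in the parameter descriptions, `location_zone` and `in_locator` only apply to models trained using the Train Entity Recognition Model tool with a defined address entity, so they would be left unset for other model types.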

Licensing information

  • Basic: No
  • Standard: No
  • Advanced: Yes