Label | Explanation | Data Type |
Input Training Features | The input feature class that will be used to train the model. | Feature Layer; Table View |
Output Model | The output trained model that will be saved as a deep learning package (.dlpk file). | File |
Variable to Predict | A field from the Input Training Features parameter that contains the values that will be used to train the model. This field contains known (training) values of the variable that will be used to predict at unknown locations. | Field |
Treat Variable as Categorical (Optional) | Specifies whether the Variable to Predict parameter value will be treated as a categorical variable. | Boolean |
Explanatory Training Variables (Optional) | A list of fields representing the explanatory variables that will help predict the value or category of the Variable to Predict parameter value. Check the accompanying check box for any variables that represent classes or categories (such as land cover, presence, or absence). | Value Table |
Explanatory Training Distance Features (Optional) | The features whose distances from the input training features will be calculated automatically and added as additional explanatory variables. Distances will be calculated from each of the input explanatory training distance features to the nearest input training features. Point and polygon features are supported. If the input explanatory training distance features are polygons, the distance attributes will be calculated as the distance between the closest segments of the pair of features. | Feature Layer |
Explanatory Training Rasters (Optional) | The rasters whose cell values will be extracted and used as explanatory variables for the model. Each layer forms one explanatory variable. For each feature in the input training features, the value of the raster cell will be extracted at that exact location. Bilinear raster resampling will be used when extracting the raster value for continuous rasters. Nearest neighbor assignment will be used when extracting a raster value from categorical rasters. If the Input Training Features parameter value contains polygons and a value is provided for this parameter, one raster value for each polygon will be used in the model. Each polygon is assigned the average value for continuous rasters and the majority for categorical rasters. Ensure that the Categorical check box is checked for any raster that depicts classes or categories, such as land cover, vegetation, or soil type. | Value Table |
Total Time Limit (Minutes) (Optional) | The total time limit, in minutes, for AutoML model training. The default is 240 (4 hours). | Double |
AutoML Mode (Optional) | Specifies the goal of AutoML and how intensive the AutoML search will be. | String |
Algorithms (Optional) | Specifies the algorithms that will be used during the training. By default, all the algorithms will be used. | String |
Validation Percentage (Optional) | The percentage of input data that will be used for validation. The default value is 10. | Long |
Output Report (Optional) | The output report that will be generated as an .html file. If the path provided is not empty, the report will be created in a new folder under the provided path. The report will contain details of the various models as well as details of the hyperparameters that were used during the evaluation and the performance of each model. Hyperparameters are parameters that control the training process. They are not updated during training and include model architecture, learning rate, number of epochs, and so on. | File |
Output Importance Table (Optional) | An output table containing information about the importance of each explanatory variable (fields, distance features, and rasters) used in the model. | Table |
Output Feature Class (Optional) | The feature layer containing the values predicted by the best performing model on the training feature layer. It can be used to verify model performance by visually comparing the predicted values with the ground truth. | Feature Class |
Add Image Attachments (Optional) | Specifies whether images from the Input Training Features parameter value will be used as explanatory variables for training a multimodal or mixed data model. Training a multimodal or mixed data tabular model involves using machine learning and deep learning backbones in AutoML so that a single model learns from multiple data formats. The input data can consist of a combination of explanatory variables from a diverse set of data sources such as text descriptions, corresponding images, and any additional categorical and continuous variables. | Boolean |
Sensitive Feature Attributes (Optional) | Assesses and improves the fairness of the trained models for tabular data for classification and regression models. Set the following two components for this parameter: | Value Table |
Fairness Metric (Optional) | Specifies the fairness metric that will be used to measure fairness for classification and regression problems. The metric is used in grid searches to select the best fair model. | String |
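The Explanatory Training Rasters row above states that, for polygon inputs, each polygon receives the average value for continuous rasters and the majority value for categorical rasters. The following is a minimal sketch of that rule in plain Python, not the tool's internal implementation. The function name, the sample cell values, and the tie-breaking behavior (first value encountered wins) are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def summarize_polygon_cells(cell_values, categorical=False):
    """Reduce the raster cell values under one polygon to a single value.

    Illustrative only: mirrors the documented rule (average for continuous
    rasters, majority for categorical rasters), not the tool's internals.
    """
    if not cell_values:
        raise ValueError("polygon covers no raster cells")
    if categorical:
        # Majority: the most frequent class among the covered cells.
        # Ties go to the value seen first (an assumption of this sketch).
        return Counter(cell_values).most_common(1)[0][0]
    # Continuous: arithmetic mean of the covered cells.
    return mean(cell_values)


elevation_cells = [100.0, 110.0, 120.0]  # hypothetical continuous raster cells
land_cover_cells = [3, 3, 7, 3, 5]       # hypothetical categorical class codes

print(summarize_polygon_cells(elevation_cells))                     # 110.0
print(summarize_polygon_cells(land_cover_cells, categorical=True))  # 3
```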
Summary
Trains a deep learning model by building training pipelines and automating much of the training process. This includes exploratory data analysis, feature selection, feature engineering, model selection, hyperparameter tuning, and model training. Its outputs include performance metrics of the best model on the training data, as well as the trained deep learning model package (.dlpk) that can be used as input for the Predict Using AutoML tool to predict on a new dataset.
Usage
You must install the proper deep learning framework for Python in ArcGIS Pro.
The time it takes for the tool to produce the trained model depends on the following:
- The amount of data provided during training
- The AutoML Mode parameter value
By default, the timer for all modes is set at 240 minutes. Regardless of the amount of data used in training, the Basic option of the AutoML Mode parameter will not take the entire 240 minutes to find the optimum model. The fit process will complete as soon as the optimum model is identified.
The Advanced option will take more time due to the additional tasks of feature engineering, feature selection, and hyperparameter tuning. In addition to the new features obtained by combining multiple features from the input, the tool creates spatial features with names from zone3_id through zone7_id. These new features will be extracted from the location information in the input data and will be used to train better models. For more information about the new spatial features, see How AutoML Works.
If the training dataset is large, not all model combinations may be evaluated within 240 minutes. In such cases, the best performing model determined within 240 minutes will be considered the optimum model. You can then either use this model or rerun the tool with a higher Total Time Limit (Minutes) parameter value.
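The effect of the time limit described above can be pictured as a search that stops when the budget runs out and keeps the best model found so far. The sketch below is an illustration of that idea only and does not describe how the tool schedules its work. The candidate names, the scores, and the `evaluate` callback are hypothetical, and a lower score is assumed to be better.

```python
import time


def search_within_budget(candidates, evaluate, time_limit_minutes=240,
                         clock=time.monotonic):
    """Evaluate candidates until the time budget runs out; return the best so far.

    `candidates` and `evaluate` are hypothetical stand-ins for model
    combinations and their validation scores (lower is better).
    """
    deadline = clock() + time_limit_minutes * 60
    best_name, best_score = None, float("inf")
    for name in candidates:
        if clock() >= deadline:
            break  # budget exhausted: remaining candidates are never evaluated
        score = evaluate(name)
        if score < best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical validation scores for three model combinations.
scores = {"model_a": 0.42, "model_b": 0.31, "model_c": 0.37}
print(search_within_budget(scores, scores.get))  # ('model_b', 0.31)
```

If the budget expires before every candidate is evaluated, the returned model is only the best of those tried, which is why rerunning with a higher limit can produce a different result.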
An ArcGIS Spatial Analyst extension license is required to use rasters as explanatory variables.
The Output Report parameter value is a file in HTML format that provides a way to review the information in the working directory.
The first page in the output report includes links to each of the models evaluated and shows their performance on a validation dataset along with the time it took to train them. Based on the evaluation metric, the report shows the best performing model that was chosen.
RMSE is the default evaluation metric for regression problems, while Logloss is the default metric for classification problems. The following metrics are available in the output report:
- Classification—AUC, Logloss, F1, Accuracy, Average precision
- Regression—MSE, RMSE, MAE, R2, MAPE, Spearman coefficient, Pearson coefficient
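For reference, the two default metrics can be computed from their standard textbook definitions as in the sketch below. This is not taken from the tool's source. The report's exact computation (for example, how multiclass problems are handled) is not described here, so the binary form of Logloss and the clipping constant `eps` are assumptions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error for regression predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def binary_logloss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood for binary labels (0 or 1)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical values: one regression pair set and one binary pair set.
print(rmse([3.0, 5.0], [1.0, 5.0]))        # square root of 2
print(binary_logloss([1, 0], [0.5, 0.5]))  # natural log of 2
```

Both metrics are errors, so lower values indicate a better model.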
When you click a model combination, details about the training for that combination are displayed, including the learning curves, variable importance curves, hyperparameters used, and so on.
Example use cases for the tool include training an annual solar energy generation model based on weather factors, training a crop prediction model using related variables, and training a house value prediction model.
For information about requirements for running this tool and issues you may encounter, see Deep Learning frequently asked questions.
To use the Add Image Attachments parameter, prepare the Input Training Features parameter value for image attachments by doing the following:
- Ensure that the feature layer includes a field with image file paths for each record.
- Enable attachments for the feature layer using the Enable Attachments tool.
- Use the Add Attachments tool to specify the image path field and add it as an image attachment to the feature layer.
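The preparation steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the feature class is used as its own match table, the join field is named OBJECTID, and the image path field holds full file paths. The function name and the optional `management` argument (which allows a stand-in object to be passed for a dry run without ArcGIS Pro) are hypothetical.

```python
def prepare_image_attachments(feature_class, image_path_field,
                              join_field="OBJECTID", management=None):
    """Enable attachments, then attach the image referenced by each record.

    Assumes `image_path_field` holds full file paths and that the feature
    class serves as its own match table, joined on `join_field`.
    """
    if management is None:
        import arcpy  # requires ArcGIS Pro; imported lazily
        management = arcpy.management
    # Step 2 of the list above: enable attachments on the feature class.
    management.EnableAttachments(feature_class)
    # Step 3: add each record's image as an attachment.
    management.AddAttachments(
        feature_class,     # in_dataset
        join_field,        # in_join_field
        feature_class,     # in_match_table
        join_field,        # in_match_join_field
        image_path_field,  # in_match_path_field
    )
```

A call such as `prepare_image_attachments(in_feature, "image_path")`, where `image_path` is a hypothetical field name, would then be followed by running the tool with the Add Image Attachments parameter enabled.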
Parameters
arcpy.geoai.TrainUsingAutoML(in_features, out_model, variable_predict, {treat_variable_as_categorical}, {explanatory_variables}, {distance_features}, {explanatory_rasters}, {total_time_limit}, {autoML_mode}, {algorithms}, {validation_percent}, {out_report}, {out_importance}, {out_features}, {add_image_attachments}, {sensitive_feature}, {fairness_metric})
Name | Explanation | Data Type |
in_features | The input feature class that will be used to train the model. | Feature Layer; Table View |
out_model | The output trained model that will be saved as a deep learning package (.dlpk file). | File |
variable_predict | A field from the in_features parameter that contains the values that will be used to train the model. This field contains known (training) values of the variable that will be used to predict at unknown locations. | Field |
treat_variable_as_categorical (Optional) | Specifies whether the variable_predict parameter value will be treated as a categorical variable. | Boolean |
explanatory_variables [explanatory_variables,...] (Optional) | A list of fields representing the explanatory variables that will help predict the value or category of the variable_predict parameter value. Pass the true value ("<name_of_variable> true") for any variables that represent classes or categories (such as land cover, presence, or absence). | Value Table |
distance_features [distance_features,...] (Optional) | The features whose distances from the input training features will be calculated automatically and added as additional explanatory variables. Distances will be calculated from each of the input explanatory training distance features to the nearest input training features. Point and polygon features are supported. If the input explanatory training distance features are polygons, the distance attributes will be calculated as the distance between the closest segments of the pair of features. | Feature Layer |
explanatory_rasters [explanatory_rasters,...] (Optional) | The rasters whose cell values will be extracted and used as explanatory variables for the model. Each layer forms one explanatory variable. For each feature in the input training features, the value of the raster cell will be extracted at that exact location. Bilinear raster resampling will be used when extracting the raster value for continuous rasters. Nearest neighbor assignment will be used when extracting a raster value from categorical rasters. If the in_features parameter value contains polygons and a value is provided for this parameter, one raster value for each polygon will be used in the model. Each polygon is assigned the average value for continuous rasters and the majority for categorical rasters. Pass the true value using "<name_of_raster> true" for any raster that depicts classes or categories such as land cover, vegetation, or soil type. | Value Table |
total_time_limit (Optional) | The total time limit, in minutes, for AutoML model training. The default is 240 (4 hours). | Double |
autoML_mode (Optional) | Specifies the goal of AutoML and how intensive the AutoML search will be. | String |
algorithms [algorithms,...] (Optional) | Specifies the algorithms that will be used during the training. By default, all the algorithms will be used. | String |
validation_percent (Optional) | The percentage of input data that will be used for validation. The default value is 10. | Long |
out_report (Optional) | The output report that will be generated as an .html file. If the path provided is not empty, the report will be created in a new folder under the provided path. The report will contain details of the various models as well as details of the hyperparameters that were used during the evaluation and the performance of each model. Hyperparameters are parameters that control the training process. They are not updated during training and include model architecture, learning rate, number of epochs, and so on. | File |
out_importance (Optional) | An output table containing information about the importance of each explanatory variable (fields, distance features, and rasters) used in the model. | Table |
out_features (Optional) | The feature layer containing the values predicted by the best performing model on the training feature layer. It can be used to verify model performance by visually comparing the predicted values with the ground truth. | Feature Class |
add_image_attachments (Optional) | Specifies whether images from the in_features parameter value will be used as explanatory variables for training a multimodal or mixed data model. Training a multimodal or mixed data tabular model involves using machine learning and deep learning backbones in AutoML so that a single model learns from multiple data formats. The input data can consist of a combination of explanatory variables from a diverse set of data sources such as text descriptions, corresponding images, and any additional categorical and continuous variables. | Boolean |
sensitive_feature [sensitive_feature,...] (Optional) | Assesses and improves the fairness of the trained models for tabular data for classification and regression models. Set the following two components for this parameter: | Value Table |
fairness_metric (Optional) | Specifies the fairness metric that will be used to measure fairness for classification and regression problems. The metric is used in grid searches to select the best fair model. | String |
Code sample
This example shows how to use the TrainUsingAutoML function.
# Name: TrainUsingAutoML.py
# Description: Train a machine learning model on feature or tabular data with
# automatic hyperparameter selection.

# Import system modules
import arcpy
import os

# Set local variables
datapath = "path_to_data"
out_path = "path_to_trained_model"
in_feature = os.path.join(datapath, "train_data.gdb", "name_of_data")
out_model = os.path.join(out_path, "model.dlpk")

# Run Train Using AutoML Model
arcpy.geoai.TrainUsingAutoML(in_feature, out_model, "price", None,
                             "bathrooms #;bedrooms #;square_fee #", None, None,
                             240, "BASIC")
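The explanatory_variables argument in the sample above is a semicolon-delimited string in which each entry pairs a field name with either # or true. A small helper such as the following could assemble that string from a list of fields. The helper name and the land_cover field are hypothetical, and the sketch assumes that # marks a noncategorical variable, as the sample's usage suggests.

```python
def build_value_table(variables):
    """Join (name, is_categorical) pairs into a semicolon-delimited string.

    Categorical variables get "true"; others get "#", the placeholder
    used in the sample above.
    """
    rows = []
    for name, is_categorical in variables:
        rows.append(f"{name} {'true' if is_categorical else '#'}")
    return ";".join(rows)


explanatory = build_value_table([
    ("bathrooms", False),
    ("bedrooms", False),
    ("land_cover", True),  # hypothetical categorical field
])
print(explanatory)  # bathrooms #;bedrooms #;land_cover true
```

The resulting string can be passed as the fifth positional argument of arcpy.geoai.TrainUsingAutoML, in place of the literal string used in the sample.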
Environments
Licensing information
- Basic: No
- Standard: No
- Advanced: Yes
Related topics
- An overview of the Feature and Tabular Analysis toolset
- Find a geoprocessing tool
- An overview of the Imagery AI toolset
- Fairness in the Train Using AutoML tool
- How LightGBM algorithm works
- How Linear regression algorithm works
- How XGBoost algorithm works
- How Decision tree classification and regression algorithm works
- How Extra trees classification and regression algorithm works
- How Random trees classification and regression algorithm works