Presence-only Prediction (MaxEnt) (Spatial Statistics)

Summary

Models the presence of a phenomenon given known presence locations and explanatory variables using a maximum entropy approach (MaxEnt). The tool provides output features and rasters that include the probability of presence and can be applied to problems in which only presence is known and absence is not known.

Learn more about how Presence-only Prediction (MaxEnt) works

Illustration

Presence-only Prediction (MaxEnt) tool illustration

Usage

  • The tool works with three primary inputs to create a presence prediction model: known presence locations, a study area where presence is possible, and explanatory variables.

    • The Input Point Features parameter value is used to designate known presence locations of a phenomenon of interest.
    • The study area is characterized by background points. Background points are locations distributed across the study area where presence of the phenomenon of interest may be possible but unknown. These can be automatically created by the tool or manually included with the input point features by checking the Contains Background Points parameter.
    • The tool accepts explanatory variables in the form of rasters, fields, and distance features.

  • The tool can be run in two modes that are specified by the Contains Background Points parameter:

    • Unchecked—The tool will run with presence-only points and only accept explanatory variables from raster sources.
    • Checked—The tool will run with presence and background points and allow explanatory variable sources to include rasters, fields in the input point features, and distance features.

  • An ArcGIS Spatial Analyst extension license is required to use rasters as inputs to or outputs from the tool.

  • The tool requires at least two presence points in the input point features to create a model. If the input features contain background points, the tool also requires at least two background points to create a model.

  • The Explanatory Training Distance Features parameter is inactive when the Contains Background Points parameter is unchecked. To include distances to features as explanatory variables for presence-only data, distance rasters can be calculated using the Distance Accumulation tool, and the distance rasters can be included in the Explanatory Training Rasters parameter.

  • The spatial resolution of the Explanatory Training Rasters parameter values is important in the following ways:

    • The cell sizes have a significant impact on processing time. The higher the raster resolution, the longer the processing time.
    • The tool will use cell centroids of the rasters to generate background points when using presence-only data (the Contains Background Points parameter is unchecked). The proportion of background points to presence points impacts the model; it is recommended that you consider the cell size of the rasters and investigate the resulting background points using the Output Trained Features parameter to ensure that assumptions about the study area are appropriate for your question.
      Note:

      You can use the Resample tool to decrease the spatial resolution of explanatory training rasters.

  • The defined study area, whether from the Study Area parameter or from the locations of input point features that include background points, contributes to the model’s outcome. The extent used will determine which raster cells are used as background points. This establishes the environmental conditions that are compared with presence conditions to establish a relative occurrence rate, which affects the prediction results.

  • Use the Relative Weight of Presence to Background parameter to specify the meaning of background points. Use a value of 100 when background points represent locations with unknown presence. Use a value of 1 when background points represent locations with observed absence.

    • The value affects how the model operates and the tool’s resulting predictions. When the value is close to 100, the model penalizes each misclassified presence point 100 times more than each misclassified background point (assuming that the correct classification of background is absence) and the traditional MaxEnt approach is applied. When the value is 1, the model penalizes each presence and background point equally and is similar to logistic regression.
    • A value of 1 should be used cautiously when using presence-only mode (the Contains Background Points parameter is unchecked), since the tool generates background points that are treated as absence and weighted equally to provided presence points.

  • Sampling bias is inherent to most presence data and impacts the results of the analysis. You can use the Spatial Thinning parameter to help reduce this impact. However, while spatial thinning is a useful remediation to reduce the effects of sampling bias, it is recommended that you use data from structured surveys to further minimize the impact of sampling bias.

  • Classification diagnostics are available from geoprocessing messages and from the Classification Result Percentages chart that is provided with the resulting layer from the Output Trained Features parameter value. The chart displays a comparison of the observed and predicted classifications, and you can use it to assess how well the model classifies known points. For example, you can assess the model’s ability to predict presence by focusing on the portion of misclassified presence points in the training input point features. In use cases in which presence prediction on background points is important, you can also use the chart to view and select the background points that are predicted to have presence.

  • You can use the tool in two ways. You can focus on training and evaluating candidate models, or you can focus on predicting presence probabilities across a new dataset.

    • Training and evaluating candidate models—Run the tool without specifying outputs to evaluate the model diagnostics included in geoprocessing messages. Once the diagnostic results seem appropriate, specify an Output Trained Features parameter value and use the classification diagnostic charts to further evaluate prediction performance across the training data. The charts included in the Output Sensitivity Table and Output Response Curve Table parameter values are diagnostic metrics for the training data and will also be useful as you adjust and find an appropriate model.
    • Prediction—Specify the parameters in the Prediction Outputs parameter category to apply the model to new locations that are not part of the training data. The Input Prediction Features and the resulting Output Prediction Features parameter values represent new point locations where a prediction is needed. In addition to point features, a prediction surface can be created by specifying an Output Prediction Raster parameter value. Prediction features and prediction rasters must be used in conjunction with matched explanatory variables in the same form that was used in the training data (raster, fields, or distance features).

  • Spatial thinning can result in the training data not including all the input point features. To test the model’s performance across all points when spatial thinning is used, provide the same feature class for the Input Point Features and Input Prediction Features parameters.

  • The tool specifies coordinate systems for outputs by honoring the coordinate system of a feature dataset used in the output path. Otherwise, the tool will use the coordinate system specified in the Output Coordinate System environment. If you don't specify a feature dataset or an environment setting, the tool uses the following approaches for each output:

    • For the Output Trained Features and Output Trained Raster parameter values, the tool uses the coordinate system of the Input Point Features parameter value.
    • For the Output Prediction Features parameter value, the tool uses the coordinate system of the Input Prediction Features parameter value.
    • For the Output Prediction Raster parameter value, the tool uses the coordinate system defined by the Output Prediction Features parameter value. If the output prediction features are not specified, the tool uses the coordinate system of the first raster provided in the Match Explanatory Rasters parameter.

  • The Explanatory Variable Expansions (Basis Functions) parameter options have restrictions. The Smoothed step (Hinge) and Discrete step (Threshold) options are mutually exclusive; when one is selected the other one cannot be selected. When an explanatory variable is specified as Categorical, only the Original (Linear) option will be used.

  • When the Resampling Scheme parameter is set to Random, the tool will group the data and validate the model's performance on a subset of the grouped data. Each training group is subject to the same data requirements of the broader model: at least two presence and two background points are required. If these requirements are not fulfilled after 10 attempts, the tool will stop attempting to cross-validate and warn that cross-validation was not possible.
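
The random grouping and its retry rule can be pictured with a small sketch. This is not the tool's implementation: the function names are hypothetical, labels are assumed to be 1 for presence and 0 for background, and "training group" is read here as the points that remain after one group is held out. Only the two-point minimums and the limit of 10 attempts come from the text above.

```python
import random


def _training_is_valid(labels, groups, held_out):
    """Check the points left for training when one group is held out."""
    train = [lab for lab, g in zip(labels, groups) if g != held_out]
    return train.count(1) >= 2 and train.count(0) >= 2


def assign_random_groups(labels, number_of_groups=3, max_attempts=10, seed=None):
    """Randomly assign each point to a group, or return None if no valid
    assignment is found within max_attempts tries.

    labels: list of 1 (presence) / 0 (background) values.
    """
    rng = random.Random(seed)
    for _ in range(max_attempts):
        # Balanced group ids 0..k-1, shuffled so membership is random.
        groups = [i % number_of_groups for i in range(len(labels))]
        rng.shuffle(groups)
        if all(_training_is_valid(labels, groups, g)
               for g in range(number_of_groups)):
            return groups
    # The caller would warn that cross-validation was not possible.
    return None
```

With very few presence points, every attempt can fail the check, which mirrors the warning case described above.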

Parameters

Label | Explanation | Data Type
Input Point Features

The point features representing locations where presence of a phenomenon of interest is known to occur.

Feature Layer
Contains Background Points
(Optional)

Specifies whether the input point features contain background points.

If the input points do not contain background points, the tool will generate background points using cells in the explanatory training rasters. The tool uses background points to model the characteristics of the landscape in unknown locations and compare them to landscape characteristics in known presence locations. Therefore, background points can be considered as the study area. Generally, these are locations where presence of a phenomenon of interest is unknown. However, if any information is known about the background points, the Relative Weight of Presence to Background parameter can be used to indicate this.

  • Checked—The input point features include background points.
  • Unchecked—The input point features do not include background points. This is the default.
Boolean
Presence Indicator Field
(Optional)

The field from the input point features containing binary values that indicate each point as presence (1) or background (0). The field must be numeric (Short, Long, Float, or Double types).

Field
Explanatory Training Variables
(Optional)

A list of fields representing the explanatory variables that will help predict the probability of presence. You can specify whether each variable is categorical or numeric. Check the Categorical check box for each variable that represents a class or category (such as land cover).

Value Table
Explanatory Training Distance Features
(Optional)

A list of feature layers or feature classes that will be used to automatically create explanatory variables that represent the distance from the input point features to the nearest provided distance features. If the input explanatory training distance features are polygons or lines, the distance attributes are calculated as the distance between the closest segment and the point.

Feature Layer
Explanatory Training Rasters
(Optional)

A list of rasters that will be used to automatically create explanatory training variables in the model whose values are extracted from rasters. For each feature (presence and background points) in the input point features, the value of the raster cell will be extracted at that exact location.

Bilinear raster resampling will be used when extracting the raster value for continuous rasters. Nearest neighbor assignment will be used when extracting a raster value from categorical rasters.

You can specify whether each raster value is categorical or numeric. Check the Categorical check box for each raster that represents a class or category (such as land cover).

Value Table
Explanatory Variable Expansions (Basis Functions)
(Optional)

Specifies the basis function that will be used to transform the provided explanatory variables for use in the model. If multiple basis functions are selected, the tool will produce multiple transformed variables and attempt to use them in the model.

  • Original (Linear): A linear transformation will be applied to the input variables. This is the default.
  • Pairwise interaction (Product): A pairwise multiplication on continuous explanatory variables will be used, yielding interaction variables. This option is only available when multiple explanatory variables have been provided.
  • Smoothed step (Hinge): The continuous explanatory variable values will be converted into two segments, a static segment (composed of zeroes or ones) and a linear function segment (increasing or decreasing).
  • Discrete step (Threshold): The continuous explanatory variable values will be converted into a binary variable composed of zeroes and ones.
  • Squared (Quadratic): The square of each continuous explanatory variable value will be returned.
String
Number of Knots
(Optional)

The number of knots that will be used by the hinge and threshold explanatory variable expansions. The value controls how many thresholds are created, which are used to create multiple explanatory variable expansions using each threshold. The value must be between 2 and 50. The default is 10.

Long
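
As an illustration of how knots relate to the Smoothed step (Hinge) and Discrete step (Threshold) expansions, the sketch below builds one transformed value per knot. It rests on assumptions: the text does not say how the tool places knots, so they are spaced evenly between the minimum and maximum here, only the increasing form of the hinge is shown, and all function names are hypothetical.

```python
def knot_positions(values, number_knots=10):
    """Evenly spaced knots from min to max (the placement rule is an assumption)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (number_knots - 1)  # number_knots is at least 2
    return [lo + i * step for i in range(number_knots)]


def forward_hinge(x, knot, x_max):
    """Static segment of zeroes up to the knot, then a linear rise to 1 at x_max."""
    if x <= knot or x_max == knot:
        return 0.0
    return (x - knot) / (x_max - knot)


def threshold(x, knot):
    """Binary step: 0 below the knot, 1 at or above it."""
    return 1.0 if x >= knot else 0.0


def expand(values, number_knots=10):
    """Return {knot: (hinge values, threshold values)} for one variable."""
    x_max = max(values)
    return {
        k: ([forward_hinge(x, k, x_max) for x in values],
            [threshold(x, k) for x in values])
        for k in knot_positions(values, number_knots)
    }
```

Each additional knot adds another pair of candidate variables, which is why a larger Number of Knots value gives the model more shapes to choose from.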
Study Area
(Optional)

Specifies the type of study area that will be used to define where presence is possible when the input point features do not contain background points.

  • Convex hull: The smallest convex polygon that encloses all the presence points in the input point features will be used. This is the default.
  • Raster extent: The extent of the intersection of the explanatory training rasters will be used.
  • Polygon study area: A custom study area that is defined by a polygon feature class will be used.
String
Study Area Polygon
(Optional)

A feature class containing the polygons that define a custom study area. The input point features must be located within the custom study area covered by the polygon features. A study area can be composed of multiple polygons.

Feature Layer
Apply Spatial Thinning
(Optional)

Specifies whether spatial thinning will be applied to presence and background points before training the model.

Spatial thinning helps to reduce sampling bias by removing points and ensuring that remaining points have a minimum nearest-neighbor distance, set in the Minimum Nearest Neighbor Distance parameter. Spatial thinning is also applied to background points whether they are provided in input point features or generated by the tool.

  • Checked—Spatial thinning will be applied.
  • Unchecked—Spatial thinning will not be applied. This is the default.
Boolean
Minimum Nearest Neighbor Distance
(Optional)

The minimum distance between any two presence points or any two background points when spatial thinning is applied.

Linear Unit
Number of Iterations for Thinning
(Optional)

The number of runs that will be used to find the optimal spatial thinning solution, seeking to maintain as many presence and background points as possible while ensuring that no two presence or two background points are within the specified Minimum Nearest Neighbor Distance parameter value. The minimum possible is 1 iteration and the maximum possible is 50 iterations. The default is 10.

This parameter applies only to spatial thinning of presence and background points in the input point features. Background points generated from raster cells are thinned by resampling the raster cells to the specified Minimum Nearest Neighbor Distance parameter value, without iterating for an optimal solution.

Long
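
The role of repeated runs in spatial thinning can be sketched as follows. The text does not describe the tool's algorithm, so this uses a common randomized greedy approach as a stand-in: each run visits the points in a random order, keeps a point only if it is far enough from everything already kept, and the run that retains the most points wins. The function names are hypothetical, and the sketch would be applied separately to presence points and to background points.

```python
import math
import random


def thin_once(points, min_distance, rng):
    """One greedy pass over the points in random order."""
    order = list(points)
    rng.shuffle(order)
    kept = []
    for p in order:
        # Keep p only if it is at least min_distance from every kept point.
        if all(math.dist(p, q) >= min_distance for q in kept):
            kept.append(p)
    return kept


def thin(points, min_distance, number_of_iterations=10, seed=None):
    """Repeat the greedy pass and return the largest retained set."""
    rng = random.Random(seed)
    best = []
    for _ in range(number_of_iterations):
        candidate = thin_once(points, min_distance, rng)
        if len(candidate) > len(best):
            best = candidate
    return best
```

Because the visiting order changes between runs, more iterations give more chances to find a set that retains additional points.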
Relative Weight of Presence to Background
(Optional)

A value between 1 and 100 that specifies the relative information weight of presence points to background points. The default is 100.

A higher value indicates that presence points are the primary source of information; it is unknown whether background points represent presence or absence and background points receive lower weight in the model. A lower value indicates that background points also contribute valuable information that can be used in conjunction with presence points; there is greater confidence that background points represent absence and their information can be used in the model as absence locations.

Long
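
One way to picture the effect of this weight is a weighted loss in which each presence point counts as many times as the weight value and each background point counts once. The sketch below is an illustration only, not the tool's objective function; the function name is hypothetical, and labels are assumed to be 1 for presence and 0 for background.

```python
import math


def weighted_log_loss(labels, probabilities, relative_weight=100):
    """Weighted average negative log-likelihood.

    Presence points (label 1) receive weight relative_weight;
    background points (label 0) receive weight 1.
    Probabilities must lie strictly between 0 and 1.
    """
    total = 0.0
    weight_sum = 0.0
    for y, p in zip(labels, probabilities):
        w = relative_weight if y == 1 else 1
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum
```

With a weight of 100, a background point that receives a high predicted probability barely changes the loss, which matches treating background points as locations with unknown presence. With a weight of 1, the same point is penalized as strongly as a missed presence point, as it would be for an observed absence.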
Presence Probability Transformation (Link Function)
(Optional)

Specifies the function that will convert the unbounded outputs of the model to a number between 0 and 1. This value can be interpreted as the probability of presence at the location. Each option converts the same continuous value to a different probability.

  • C-log-log: The C-log-log link function will be used to convert the predictions to probabilities. This option is recommended when the presence and location of a phenomenon is unambiguous, for example, when modeling the presence of an immobile plant species. This is the default.
  • Logistic: The logistic link function will be used to convert predictions to probabilities. This option is recommended when the presence and location of a phenomenon is ambiguous, for example, when modeling the presence of a migratory animal species.
String
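
The two link functions can be compared directly using their standard textbook definitions. The sketch below assumes the tool uses these standard forms, which the text does not spell out; `z` stands for the unbounded model output, and the function names are hypothetical.

```python
import math


def cloglog(z):
    """Complementary log-log link: 1 - exp(-exp(z))."""
    return 1.0 - math.exp(-math.exp(z))


def logistic(z):
    """Logistic link: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# The same raw model value maps to a different probability under each link.
for z in (-2.0, 0.0, 2.0):
    print(z, round(cloglog(z), 4), round(logistic(z), 4))
```

Both functions return values between 0 and 1, but they are not interchangeable: the same raw value gives a different probability, so the Presence Probability Cutoff value may need to be reconsidered when the link function changes.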
Presence Probability Cutoff
(Optional)

A cutoff value between 0.01 and 0.99 that establishes which probabilities correspond with presence in the resulting classification. The cutoff value is used to help evaluate the model's performance using training data and known presence points. Classification diagnostics are provided in geoprocessing messages and in the output trained features.

Double
Output Trained Features
(Optional)

An output feature class that will contain all features and explanatory variables used in the training of the model.

Feature Class
Output Trained Raster
(Optional)

The output raster with cell values indicating the probability of presence using the selected link function. The default cell size is the maximum of the cell sizes of the explanatory training rasters. An output trained raster can only be created if the input point features do not contain background points.

Raster Dataset
Output Response Curve Table
(Optional)

The output table that will contain diagnostics from the training model that indicate the effect of each explanatory variable on the probability of presence after accounting for the average effects of all other explanatory variables in the model.

The table will have up to two derived charts of partial dependence plots: one set of line charts for continuous variables and one set of bar charts for categorical variables.

Table
Output Sensitivity Table
(Optional)

The output table that will contain diagnostics of training model accuracy as the presence probability cutoff changes from 0 to 1.

Table
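
The idea behind a sensitivity table can be sketched by sweeping the cutoff and recording how the classification of training points changes. The columns below are assumptions chosen for illustration and are not the fields the tool writes; labels are assumed to be 1 for presence and 0 for background, and the function name is hypothetical.

```python
def sensitivity_rows(labels, probabilities, steps=10):
    """Return one row per cutoff from 0 to 1 in equal steps."""
    presence = [p for y, p in zip(labels, probabilities) if y == 1]
    background = [p for y, p in zip(labels, probabilities) if y == 0]
    rows = []
    for i in range(steps + 1):
        cutoff = i / steps
        rows.append({
            "cutoff": cutoff,
            # Share of known presence points classified as presence.
            "presence_correct": sum(p >= cutoff for p in presence) / len(presence),
            # Share of background points classified as presence.
            "background_as_presence": sum(p >= cutoff for p in background) / len(background),
        })
    return rows
```

A low cutoff classifies nearly everything as presence, and a high cutoff classifies nearly nothing as presence. Reading across the rows helps in choosing a cutoff that balances the two.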
Input Prediction Features
(Optional)

The feature class representing locations where predictions will be made. The feature class must contain any provided explanatory variable fields that were used from the input point features.

When using spatial thinning, you can use the original input point features as input prediction features to receive a prediction for the entire dataset.

Feature Layer
Output Prediction Features
(Optional)

The output feature class that will contain the results of the prediction model applied to the input prediction features.

Feature Class
Output Prediction Raster
(Optional)

The output raster containing the prediction results at each cell of the matched explanatory rasters. The default cell size is the maximum of the cell sizes of the explanatory training rasters.

Raster Dataset
Match Explanatory Variables
(Optional)

The matching explanatory variable fields for the input point features and input prediction features.

Value Table
Match Distance Features
(Optional)

The matching distance features for the training and prediction.

Value Table
Match Explanatory Rasters
(Optional)

The matching rasters for the training and prediction.

Value Table
Allow Predictions Outside of Data Ranges
(Optional)

Specifies whether the prediction will allow extrapolation when explanatory variable values are out of the range of values used in training.

  • Checked—The prediction will allow extrapolation beyond the range of values used in training. This is the default.
  • Unchecked—The prediction will not allow extrapolation beyond the range of values used in training.
Boolean
Resampling Scheme
(Optional)

Specifies the method that will be used to perform cross validation of the prediction model. Cross validation excludes a portion of the data during training of the model and uses it to test the model's performance after it is trained.

  • None: Cross validation will not be performed. This is the default.
  • Random: The points will be randomly divided into groups, and each group will be left out once when performing cross validation. The number of groups is specified in the Number of Groups parameter.
String
Number of Groups
(Optional)

The number of groups that will be used in cross validation for the random resampling scheme. A field in the output trained features indicates the group that each point was assigned to. The default is 3. A minimum of 2 groups and a maximum of 10 groups are allowed.

Long

arcpy.stats.PresenceOnlyPrediction(input_point_features, {contains_background}, {presence_indicator_field}, {explanatory_variables}, {distance_features}, {explanatory_rasters}, {basis_expansion_functions}, {number_knots}, {study_area_type}, {study_area_polygon}, {spatial_thinning}, {thinning_distance_band}, {number_of_iterations}, {relative_weight}, {link_function}, {presence_probability_cutoff}, {output_trained_features}, {output_trained_raster}, {output_response_curve_table}, {output_sensitivity_table}, {features_to_predict}, {output_pred_features}, {output_pred_raster}, {explanatory_variable_matching}, {explanatory_distance_matching}, {explanatory_rasters_matching}, {allow_predictions_outside_of_data_ranges}, {resampling_scheme}, {number_of_groups})
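
The signature above can be exercised with keyword arguments, as in the following minimal sketch of a presence-only run. The parameter names and keyword values come from this page. The helper names, the file names in the usage comment, the thinning distance, and the cutoff value are hypothetical. The `"NUMERIC"` flag for a non-categorical raster is an assumption, since the text names only the CATEGORICAL option. The call itself requires an ArcGIS Pro Python environment and a Spatial Analyst license for raster inputs.

```python
def build_presence_only_args(points, rasters, trained_features):
    """Assemble keyword arguments for a presence-only run.

    rasters: list of [raster, flag] pairs, for example
             [["landcover.tif", "CATEGORICAL"]].
    distance_features is omitted because it is inactive in this mode.
    """
    return {
        "input_point_features": points,
        "contains_background": "PRESENCE_ONLY_POINTS",
        "explanatory_rasters": rasters,
        "basis_expansion_functions": ["LINEAR", "QUADRATIC"],
        "study_area_type": "CONVEX_HULL",
        "spatial_thinning": "THINNING",
        "thinning_distance_band": "500 Meters",  # hypothetical distance
        "number_of_iterations": 10,
        "relative_weight": 100,
        "link_function": "CLOGLOG",
        "presence_probability_cutoff": 0.5,  # hypothetical cutoff
        "output_trained_features": trained_features,
        "resampling_scheme": "RANDOM",
        "number_of_groups": 3,
    }


def run_presence_only(args):
    """Run the tool; works only where arcpy is installed."""
    import arcpy  # imported here so the helper above works without ArcGIS

    arcpy.CheckOutExtension("Spatial")  # rasters require Spatial Analyst
    return arcpy.stats.PresenceOnlyPrediction(**args)


# Usage (hypothetical data; "NUMERIC" is an assumed flag value):
# args = build_presence_only_args(
#     "presence_points.shp",
#     [["elevation.tif", "NUMERIC"], ["landcover.tif", "CATEGORICAL"]],
#     "trained_points",
# )
# run_presence_only(args)
```

Passing keyword arguments avoids counting positions in a long optional parameter list and leaves the remaining parameters at their defaults.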
Name | Explanation | Data Type
input_point_features

The point features representing locations where presence of a phenomenon of interest is known to occur.

Feature Layer
contains_background
(Optional)

Specifies whether the input point features contain background points.

If the input points do not contain background points, the tool will generate background points using cells in the explanatory training rasters. The tool uses background points to model the characteristics of the landscape in unknown locations and compare them to landscape characteristics in known presence locations. Therefore, background points can be considered as the study area. Generally, these are locations where presence of a phenomenon of interest is unknown. However, if any information is known about the background points, the relative_weight parameter can be used to indicate this.

  • PRESENCE_AND_BACKGROUND_POINTS: The input point features include background points.
  • PRESENCE_ONLY_POINTS: The input point features do not include background points. This is the default.
Boolean
presence_indicator_field
(Optional)

The field from the input point features containing binary values that indicate each point as presence (1) or background (0). The field must be numeric (Short, Long, Float, or Double types).

Field
explanatory_variables
[[Variable, Categorical],...]
(Optional)

A list of fields representing the explanatory variables that will help predict the probability of presence. You can specify whether each variable is categorical or numeric. Specify the CATEGORICAL option for each variable that represents a class or category (such as land cover).

Value Table
distance_features
[distance_features,...]
(Optional)

A list of feature layers or feature classes that will be used to automatically create explanatory variables that represent the distance from the input point features to the nearest provided distance features. If the input explanatory training distance features are polygons or lines, the distance attributes are calculated as the distance between the closest segment and the point.

Feature Layer
explanatory_rasters
[[Variable, Categorical],...]
(Optional)

A list of rasters that will be used to automatically create explanatory training variables in the model whose values are extracted from rasters. For each feature (presence and background points) in the input point features, the value of the raster cell will be extracted at that exact location.

Bilinear raster resampling will be used when extracting the raster value for continuous rasters. Nearest neighbor assignment will be used when extracting a raster value from categorical rasters.

You can specify whether each raster value is categorical or numeric. Specify the CATEGORICAL option for each raster that represents a class or category (such as land cover).

Value Table
basis_expansion_functions
[basis_expansion_functions,...]
(Optional)

Specifies the basis function that will be used to transform the provided explanatory variables for use in the model. If multiple basis functions are selected, the tool will produce multiple transformed variables and attempt to use them in the model.

  • LINEAR: A linear transformation will be applied to the input variables. This is the default.
  • PRODUCT: A pairwise multiplication on continuous explanatory variables will be used, yielding interaction variables. This option is only available when multiple explanatory variables have been provided.
  • HINGE: The continuous explanatory variable values will be converted into two segments, a static segment (composed of zeroes or ones) and a linear function segment (increasing or decreasing).
  • THRESHOLD: The continuous explanatory variable values will be converted into a binary variable composed of zeroes and ones.
  • QUADRATIC: The square of each continuous explanatory variable value will be returned.
String
number_knots
(Optional)

The number of knots that will be used by the hinge and threshold explanatory variable expansions. The value controls how many thresholds are created, which are used to create multiple explanatory variable expansions using each threshold. The value must be between 2 and 50. The default is 10.

Long
study_area_type
(Optional)

Specifies the type of study area that will be used to define where presence is possible when the input point features do not contain background points.

  • CONVEX_HULL: The smallest convex polygon that encloses all the presence points in the input point features will be used. This is the default.
  • RASTER_EXTENT: The extent of the intersection of the explanatory training rasters will be used.
  • STUDY_POLYGON: A custom study area that is defined by a polygon feature class will be used.
String
study_area_polygon
(Optional)

A feature class containing the polygons that define a custom study area. The input point features must be located within the custom study area covered by the polygon features. A study area can be composed of multiple polygons.

Feature Layer
spatial_thinning
(Optional)

Specifies whether spatial thinning will be applied to presence and background points before training the model.

Spatial thinning helps to reduce sampling bias by removing points and ensuring that remaining points have a minimum nearest-neighbor distance, set in the thinning_distance_band parameter. Spatial thinning is also applied to background points whether they are provided in input point features or generated by the tool.

  • THINNING: Spatial thinning will be applied.
  • NO_THINNING: Spatial thinning will not be applied. This is the default.
Boolean
thinning_distance_band
(Optional)

The minimum distance between any two presence points or any two background points when spatial thinning is applied.

Linear Unit
number_of_iterations
(Optional)

The number of runs that will be used to find the optimal spatial thinning solution, seeking to maintain as many presence and background points as possible while ensuring that no two presence or two background points are within the specified thinning_distance_band parameter value. The minimum possible is 1 iteration and the maximum possible is 50 iterations. The default is 10.

This parameter applies only to spatial thinning of presence and background points in the input point features. Background points generated from raster cells are thinned by resampling the raster cells to the specified thinning_distance_band parameter value, without iterating for an optimal solution.

Long
relative_weight
(Optional)

A value between 1 and 100 that specifies the relative information weight of presence points to background points. The default is 100.

A higher value indicates that presence points are the primary source of information; it is unknown whether background points represent presence or absence and background points receive lower weight in the model. A lower value indicates that background points also contribute valuable information that can be used in conjunction with presence points; there is greater confidence that background points represent absence and their information can be used in the model as absence locations.

Long
link_function
(Optional)

Specifies the function that will convert the unbounded outputs of the model to a number between 0 and 1. This value can be interpreted as the probability of presence at the location. Each option converts the same continuous value to a different probability.

  • CLOGLOG: The C-log-log link function will be used to convert the predictions to probabilities. This option is recommended when the presence and location of a phenomenon is unambiguous, for example, when modeling the presence of an immobile plant species. This is the default.
  • LOGISTIC: The logistic link function will be used to convert predictions to probabilities. This option is recommended when the presence and location of a phenomenon is ambiguous, for example, when modeling the presence of a migratory animal species.
String
presence_probability_cutoff
(Optional)

A cutoff value between 0.01 and 0.99 that establishes which probabilities correspond with presence in the resulting classification. The cutoff value is used to help evaluate the model's performance using training data and known presence points. Classification diagnostics are provided in geoprocessing messages and in the output trained features.

Double
output_trained_features
(Optional)

An output feature class that will contain all features and explanatory variables used in the training of the model.

Feature Class
output_trained_raster
(Optional)

The output raster with cell values indicating the probability of presence using the selected link function. The default cell size is the maximum of the cell sizes of the explanatory training rasters. An output trained raster can only be created if the input point features do not contain background points.

Raster Dataset
output_response_curve_table
(Optional)

The output table that will contain diagnostics from the training model that indicate the effect of each explanatory variable on the probability of presence after accounting for the average effects of all other explanatory variables in the model.

The table will have up to two derived charts of partial dependence plots: one set of line charts for continuous variables and one set of bar charts for categorical variables.

Table
output_sensitivity_table
(Optional)

The output table that will contain diagnostics of training model accuracy as the presence probability cutoff changes from 0 to 1.

Table
features_to_predict
(Optional)

The feature class representing locations where predictions will be made. The feature class must contain any provided explanatory variable fields that were used from the input point features.

When using spatial thinning, you can use the original input point features as input prediction features to receive a prediction for the entire dataset.

Feature Layer
output_pred_features
(Optional)

The output feature class that will contain the results of the prediction model applied to the input prediction features.

Feature Class
output_pred_raster
(Optional)

The output raster containing the prediction results at each cell of the matched explanatory rasters. The default cell size is the maximum of the cell sizes of the explanatory training rasters.

Raster Dataset
explanatory_variable_matching
[[Prediction, Training],...]
(Optional)

The matching explanatory variable fields for the input point features and input prediction features.

Value Table
explanatory_distance_matching
[[Prediction, Training],...]
(Optional)

The matching distance features for training and prediction.

Value Table
explanatory_rasters_matching
[[Prediction, Training],...]
(Optional)

The matching rasters for training and prediction.

Value Table
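
A matching value table pairs each prediction dataset with the training dataset it stands in for. The sketch below builds such a list in the [[Prediction, Training],...] order shown in the syntax above; the raster names are hypothetical placeholders.

```python
# Hypothetical raster names; replace with rasters in your own workspace
training_rasters = ["Elevation", "Canopy", "LandCoverClassification"]
prediction_rasters = ["Prediction_Elevation", "Prediction_Canopy",
                      "Prediction_LandCoverClassification"]

# Build [[Prediction, Training], ...] pairs, keeping both lists in the same order
explanatory_rasters_matching = [
    [prediction, training]
    for prediction, training in zip(prediction_rasters, training_rasters)
]

print(explanatory_rasters_matching)
```

Building the pairs from two parallel lists keeps the prediction and training names aligned when the list of variables changes.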
allow_predictions_outside_of_data_ranges
(Optional)
  • ALLOWED: The prediction will allow extrapolation beyond the range of values used in training. This is the default.
  • NOT_ALLOWED: The prediction will not allow extrapolation beyond the range of values used in training.
Boolean
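
Before choosing an option, it can help to check whether the prediction data would require extrapolation at all. The sketch below only detects values outside the training range, using a hypothetical helper function and hypothetical values; it does not reproduce how the tool treats such values.

```python
def values_outside_training_range(training_values, prediction_values):
    """Return prediction values below the training minimum or above the maximum."""
    low, high = min(training_values), max(training_values)
    return [v for v in prediction_values if v < low or v > high]

# Hypothetical elevation values in meters
training_elevation = [120.0, 340.0, 560.0, 910.0]
prediction_elevation = [95.0, 300.0, 880.0, 1250.0]

outside = values_outside_training_range(training_elevation, prediction_elevation)
print(outside)
```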
resampling_scheme
(Optional)

Specifies the method that will be used to perform cross validation of the prediction model. Cross validation excludes a portion of the data during training of the model and uses it to test the model's performance after it is trained.

  • NONE: Cross validation will not be performed. This is the default.
  • RANDOM: The points will be randomly divided into groups, and each group will be left out once when performing cross validation. The number of groups is specified in the number_of_groups parameter.
String
number_of_groups
(Optional)

The number of groups that will be used in cross validation for the random resampling scheme. A field in the output trained features indicates the group that each point was assigned to. The default is 3. A minimum of 2 groups and a maximum of 10 groups are allowed.

Long
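
The random resampling scheme described above can be sketched in plain Python: points are shuffled, dealt into groups, and each group is held out once. The function name, the seed, and the dealing strategy are assumptions for illustration; the tool's own assignment logic may differ.

```python
import random

def assign_random_groups(point_count, number_of_groups, seed=None):
    """Shuffle point indices and deal them into groups of near-equal size."""
    rng = random.Random(seed)
    indices = list(range(point_count))
    rng.shuffle(indices)
    groups = [0] * point_count
    for position, index in enumerate(indices):
        groups[index] = position % number_of_groups + 1  # group IDs start at 1
    return groups

# Ten hypothetical points split into three groups
groups = assign_random_groups(point_count=10, number_of_groups=3, seed=42)

# Hold each group out once: train on the rest, test on the held-out group
for held_out in range(1, 4):
    test_points = [i for i, g in enumerate(groups) if g == held_out]
    train_points = [i for i, g in enumerate(groups) if g != held_out]
    print(f"group {held_out}: test on {len(test_points)} points, train on {len(train_points)}")
```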

Code sample

PresenceOnlyPrediction example 1 (Python window)

The following Python script demonstrates how to use the PresenceOnlyPrediction function.

# Import system modules 
import arcpy 

# Call Presence-only Prediction (MaxEnt)
arcpy.stats.PresenceOnlyPrediction(
    input_point_features=r"C:\MyData.gdb\Presence_Points", 
    contains_background="PRESENCE_ONLY_POINTS",
    presence_indicator_field=None,
    explanatory_variables=None,
    distance_features=None,
    explanatory_rasters=[[r"C:\MyData.gdb\Elevation", "false"], 
                         [r"C:\MyData.gdb\Canopy", "false"], 
                         [r"C:\MyData.gdb\ClimacticWaterDeficit", "false"], 
                         [r"C:\MyData.gdb\LandCoverClassification", "true"], 
                         [r"C:\MyData.gdb\UpperSlope", "false"],
                         [r"C:\MyData.gdb\LowerSlope", "false"]], 
    basis_expansion_functions="LINEAR;QUADRATIC;PRODUCT;HINGE",
    number_knots=10,
    study_area_type="CONVEX_HULL",
    study_area_polygon=None,
    spatial_thinning="THINNING",
    thinning_distance_band="500 Meters", 
    number_of_iterations=10,
    relative_weight=100,
    link_function="CLOGLOG",
    presence_probability_cutoff=0.5,
    output_trained_features=r"C:\MyData.gdb\Out_Trained_Features",
    output_trained_raster=r"C:\MyData.gdb\Out_Trained_Raster",
    output_response_curve_table=r"C:\MyData.gdb\Out_Response_Curve_Table",
    output_sensitivity_table=r"C:\MyData.gdb\Out_Sensitivity_Table",
    features_to_predict=r"C:\MyData.gdb\In_Prediction_Features",
    output_pred_features=r"C:\MyData.gdb\Out_Prediction_Features",
    output_pred_raster=r"C:\MyData.gdb\Out_Prediction_Raster",
    explanatory_variable_matching=None,
    explanatory_distance_matching=None,
    explanatory_rasters_matching=[[r"C:\MyData.gdb\Prediction_Elevation", r"C:\MyData.gdb\Elevation"], 
                                  [r"C:\MyData.gdb\Prediction_Canopy", r"C:\MyData.gdb\Canopy"], 
                                  [r"C:\MyData.gdb\Prediction_ClimacticWaterDeficit", r"C:\MyData.gdb\ClimacticWaterDeficit"], 
                                  [r"C:\MyData.gdb\Prediction_LandCoverClassification", r"C:\MyData.gdb\LandCoverClassification"], 
                                  [r"C:\MyData.gdb\Prediction_UpperSlope", r"C:\MyData.gdb\UpperSlope"],
                                  [r"C:\MyData.gdb\Prediction_LowerSlope", r"C:\MyData.gdb\LowerSlope"]], 
    allow_predictions_outside_of_data_ranges="ALLOWED",
    resampling_scheme="RANDOM",
    number_of_groups=3)

PresenceOnlyPrediction example 2 (stand-alone script)

The following Python script demonstrates how to use the PresenceOnlyPrediction function.

# This example is a simple run of the tool using presence-only points and 
# explanatory training rasters to train an initial model. No outputs are 
# specified, as the intent is to interrogate geoprocessing messages to gain 
# an initial sense of model performance. 

# Import system modules 
import arcpy 

try: 
    # Set the workspace and overwrite properties
    arcpy.env.workspace = r"C:\MyData.gdb" 
    arcpy.env.overwriteOutput = True 
    
    # Set the input point feature parameters
    in_point_features = "presence_observations"
    contains_background = "PRESENCE_ONLY_POINTS"
    
    # Set the explanatory Training variables, using only explanatory rasters
    # Note the categorical setting for the LandCoverClassification raster
    explanatory_rasters = [["Elevation", "false"], 
                           ["Canopy", "false"], 
                           ["ClimacticWaterDeficit", "false"], 
                           ["LandCoverClassification", "true"], 
                           ["UpperSlope", "false"],
                           ["LowerSlope", "false"]]
    
    # Set basis functions, adding quadratic to use the square of each variable
    basis_functions = "LINEAR;QUADRATIC"
    number_knots = 10

    # Set the study area
    study_area_type = "CONVEX_HULL"
    study_area_polygon = None
    
    # Set cross-validation options
    resampling_scheme = "RANDOM"
    number_of_groups = 3

    # Call the tool using the parameters defined above.
    arcpy.stats.PresenceOnlyPrediction(
        input_point_features=in_point_features,
        contains_background=contains_background,
        explanatory_rasters=explanatory_rasters,
        basis_expansion_functions=basis_functions,
        study_area_type=study_area_type,
        resampling_scheme=resampling_scheme,
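        number_of_groups=number_of_groups)

except arcpy.ExecuteError:
    # Print the geoprocessing error messages if the tool fails
    print(arcpy.GetMessages(2))

PresenceOnlyPrediction example 3 (stand-alone script)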

The following Python script demonstrates how to use the PresenceOnlyPrediction function.

# This example uses presence and background points and explanatory 
# variables from rasters, fields, and distance features to train a 
# model, using additional parameters to apply basis functions, use 
# spatial thinning, perform cross-validation, and receive diagnostic 
# training outputs. 

# Import system modules 
import arcpy 

try: 
    # Set the workspace and overwrite properties
    arcpy.env.workspace = r"C:\MyData.gdb" 
    arcpy.env.overwriteOutput = True 

    ### MODEL INPUTS ###
    
    # Set the input point feature parameters
    in_point_features = "presence_observations"
    contains_background = "PRESENCE_AND_BACKGROUND_POINTS"
    presence_indicator_field = "Presence"
    
    # Set the explanatory Training variables
    explanatory_fields = [["Survey_Region", "true"], 
                          ["Temperature", "false"], 
                          ["Humidity", "false"]]
    explanatory_rasters = [["Elevation", "false"], 
                           ["Canopy", "false"], 
                           ["ClimacticWaterDeficit", "false"], 
                           ["LandCoverClassification", "true"], 
                           ["UpperSlope", "false"],
                           ["LowerSlope", "false"]]
    explanatory_dist_features = [["Streams", "false"], 
                                 ["Lakes", "false"], 
                                 ["Roads", "false"]]                           
    
    ### MODEL CONFIGURATION ###

    # Set basis functions
    basis_functions = "LINEAR;QUADRATIC;PRODUCT;HINGE"
    number_knots = 10

    # Set the study area
    study_area_type = "CONVEX_HULL"
    study_area_polygon = None

    # Set spatial thinning 
    spatial_thinning = "THINNING"
    min_nearest_neighbor_distance = "500 Meters"
    number_of_iterations = 10

    # Set the relative weight of presence to background and link function, using
    # background points as observed absence
    relative_weight = 1
    link_function = "LOGISTIC"

    # Set the presence probability cutoff
    cutoff = 0.3

    ### MODEL OUTPUTS AND VALIDATION ###

    # Set training outputs for model evaluation
    out_trained_features = "Out_Trained_Features"
    out_trained_raster = "Out_Trained_Raster"
    out_response_curve_table = "Out_Response_Curves"
    out_sensitivity_table = "Out_Sensitivity_Table"
    
    # Set cross-validation options
    resampling_scheme = "RANDOM"
    number_of_groups = 3

    # Call the tool using the parameters defined above.
    arcpy.stats.PresenceOnlyPrediction(
        input_point_features=in_point_features,
        contains_background=contains_background,
        presence_indicator_field=presence_indicator_field,
        explanatory_variables=explanatory_fields,
        explanatory_rasters=explanatory_rasters,
        distance_features=explanatory_dist_features,
        basis_expansion_functions=basis_functions,
        number_knots=number_knots,
        study_area_type=study_area_type,
        spatial_thinning=spatial_thinning,
        thinning_distance_band=min_nearest_neighbor_distance,
        number_of_iterations=number_of_iterations,
        relative_weight=relative_weight,
        link_function=link_function,
        presence_probability_cutoff=cutoff,
        output_trained_features=out_trained_features,
        output_trained_raster=out_trained_raster,
        output_response_curve_table=out_response_curve_table,
        output_sensitivity_table=out_sensitivity_table,
        resampling_scheme=resampling_scheme,
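        number_of_groups=number_of_groups)

except arcpy.ExecuteError:
    # Print the geoprocessing error messages if the tool fails
    print(arcpy.GetMessages(2))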

Environments

Special cases

Parallel Processing Factor

Parallel processing is only used when making predictions.

Random number generator

The Mersenne Twister random number generator is always used.

Licensing information

  • Basic: Limited
  • Standard: Limited
  • Advanced: Limited
