How Presence-only Prediction (MaxEnt) works

The Presence-only Prediction (MaxEnt) tool uses a maximum entropy approach (MaxEnt) to estimate the probability of presence of a phenomenon. The tool uses known occurrence points and explanatory variables in the form of fields, rasters, or distance features to provide an estimate of presence across a study area. You can use the trained model to predict presence in different data if corresponding explanatory variables are known. Unlike other methods that either assume or explicitly require defined absence locations, Presence-only Prediction can be applied to prediction problems where only the presence of the event is known.

Presence-only Prediction (MaxEnt) overview diagram

Potential applications

While common examples relate to modeling species presence for ecological and conservation purposes, presence prediction problems span a variety of domains and applications:

  • A wildlife ecologist has collected field data for observed presence locations of a plant species. They need to estimate the species’ presence in a broader study area. Using the known presence locations and providing underlying factors as rasters, the ecologist can model the species' presence and create a map of predicted locations where the species is most likely to be found.
  • A researcher wants to understand the impact that climate change will have on the habitat of a sensitive species. They model presence using known occurrence locations and a series of explanatory variables, including various climate-related factors such as temperature and precipitation. Using projected climate change raster surfaces, the researcher then models estimated species distribution as the impacts of climate change are observed in the explanatory variables, receiving an estimate of the species’ new habitat following the projected effects of climate change.
  • A flood hazard analyst wants to estimate the probability of flooding following a hurricane landfall in a study area. As a supplement to high-resolution aerial imagery during the event, the analyst uses spatially distributed physical and socioeconomic characteristics coupled with crowdsourced data to model the presence of flooding. The analyst uses this model to identify where people are most likely to need immediate emergency assistance following the hurricane (Mobley et al. 2019).
  • An epidemiologist models the emergence of new infectious diseases. They use existing known pathogen spillover locations and ecological factors, such as temperature, precipitation, land cover, normalized difference vegetation index (NDVI), and duration of sunshine as predictors in a model. The model is used to create a preliminary risk surface that reflects the suitability for emergence of new infectious diseases (Du et al. 2014).

An overview of MaxEnt

One facet of spatial analysis focuses on modeling and estimating the occurrence of an event across a geographic area. While common examples relate to modeling species presence for ecological and conservation purposes, presence prediction problems span a variety of domains and applications.

In some cases, presence data is recorded as a count of presence events in quadrat cells: each observation increments a count at its location and a variety of modeling approaches can be used to model this count, such as the Poisson method of the Generalized Linear Regression tool. In other cases, explicit presence and absence data is recorded at specified intervals in known locations, such as air quality monitoring stations recording unhealthy ozone levels. In these cases, modeling presence and absence is a binary classification problem that can benefit from a variety of methods, such as logistic regression.
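
For the explicit presence/absence case, a minimal sketch using scikit-learn is shown below. The station measurements, variable choices, and labels are hypothetical and only illustrate the binary classification framing; they are not part of the Presence-only Prediction tool.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical monitoring-station data: one row per station, columns are
# explanatory variables (for example, temperature and wind speed).
X = np.array([[28.1, 3.2], [31.5, 1.1], [25.0, 4.8], [33.2, 0.9]])
y = np.array([0, 1, 0, 1])  # 1 = unhealthy ozone level recorded, 0 = not recorded

model = LogisticRegression().fit(X, y)
print(model.predict_proba(X)[:, 1])  # estimated probability of presence at each station
```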

In the case of ecological species modeling and several other domains, where the presence of an event is often recorded but the absence of the event rarely is, the lack of explicit absence data makes it challenging to model presence and absence using binary classification methods.

MaxEnt neither assumes nor requires absence data. MaxEnt is a general-purpose method for making predictions or inferences from incomplete information (Phillips et al. 2006). Given a set of known presence locations and explanatory variables that describe the study area, MaxEnt contrasts the conditions between presence locations and the study area to estimate a presence probability surface.

At its core, MaxEnt works with three primary inputs:

  • The location of known presence points.
  • A study area.
  • Explanatory variables, or covariates, that describe the environmental factors that may relate to presence across the study area.

The study area defines a landscape where presence is possible and is often represented by a set of locations where presence is unknown. These locations are also known as background points, and the MaxEnt method uses them to contrast the conditions between presence locations and the study area to estimate a presence probability surface.

The presence probability surface can take many forms, and MaxEnt selects the form that is most like the environment it was drawn from while reducing all other assumptions (or maximizing its entropy): “It agrees with everything that is known, but carefully avoids assuming anything that is not known” (Jaynes 1990).

In addition to its modeling approach, MaxEnt includes steps for input data preparation, explanatory variable transformation, output data preparation, and model validation that make it a robust method for modeling presence-only phenomena.

Using the Presence-only Prediction (MaxEnt) tool

The Presence-only Prediction tool incorporates aspects of MaxEnt’s data preparation, modeling, variable selection, and prediction workflows. This section provides important information about each parameter to help you create more appropriate models.

Specifying known presence locations and background points

Presence-only prediction requires input data to represent known presence locations. The Input Point Features parameter is used to designate an existing dataset with these locations.

Input point features do not contain background points

If your input point features do not include background points, you can leave the Contains Background Points parameter unchecked.

Automatic creation of background points using raster cells

When the Contains Background Points parameter is unchecked, the tool automatically creates background points using the cell centroids of the coarsest of the intersecting Explanatory Training Rasters parameter values within the study area.

Automatic creation of background points using raster cells

You can use the Output Trained Features parameter to create an output that includes the background points created by the tool.
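
As a rough illustration of this idea only (not the tool's internal logic), the following sketch generates candidate background points at the cell centroids of a hypothetical raster; the origin, cell size, and row and column counts are made up.

```python
import numpy as np

# Hypothetical raster description: lower-left origin, cell size, and
# number of columns and rows of the coarsest intersecting raster.
x_min, y_min = 500000.0, 4100000.0
cell_size = 250.0
n_cols, n_rows = 4, 3

# One candidate background point at the centroid of every cell.
cols, rows = np.meshgrid(np.arange(n_cols), np.arange(n_rows))
bg_x = x_min + (cols + 0.5) * cell_size
bg_y = y_min + (rows + 0.5) * cell_size
background_points = np.column_stack([bg_x.ravel(), bg_y.ravel()])
print(background_points)
```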

Input point features contain background points

If your input point features contain background points, check the Contains Background Points parameter and use the Presence Indicator Field parameter to specify a field whose values designate each location as presence (1) or background (0).

Using background points in the Input Point Features

The proportion of background points to presence points has a significant impact on the prediction results. Whether background points are provided in your input point features or the tool creates them for you, it is recommended that you test and compare classification diagnostics for your models using different numbers of background points. You can use the Spatial Thinning parameter to reduce the number of background points in the analysis. See the Defining a study area and Reducing sample bias using spatial thinning sections for more details.

Note:

The tool requires at least two presence and two background points in the training data to create a model.

Specifying explanatory variables

In addition to known presence points and background points, the tool uses explanatory variables to create the prediction model. There are three ways to specify explanatory variables: using rasters, using fields in the input point features, and using distance features. For rasters and fields, explanatory variables can be continuous or categorical. For categorical explanatory variables, the tool requires a minimum of three data points per category.

Three types of explanatory variables: rasters, distance features, and fields

Using explanatory variables from rasters

You can use rasters to represent conditions in the landscape that may be helpful predictors of presence of an event. For example, a plant species may heavily depend on a particular elevation range; you can then use an elevation raster to associate elevation values with the plant’s presence locations in the model.

Check the Categorical box when rasters represent categorical data, such as land use cover classes.

Using explanatory variables from rasters is required when the input point features do not include background points, as each cell in the study area will be used to create a background point.

The cell sizes of Explanatory Training Rasters parameter values have a significant impact on processing time: the higher the resolution, the longer the processing time. For this reason, the tool has a limit of 100 million total cells in the area of interest. You can use the Resample tool to decrease the spatial resolution of a raster, resulting in fewer cells and faster processing time.

Using explanatory variables from fields

Use the Explanatory Training Variables parameter to specify fields whose attributes are used as explanatory variables in modeling presence of the phenomenon. This option is only available when Input Point Features include background points and the Contains Background Points parameter is checked.

Use the Categorical check box to designate if a field provided in the Explanatory Training Variables parameter is categorical.

Using explanatory variables from distance features

Use the Explanatory Training Distance Features parameter to designate features whose proximity to the input point features will be used as explanatory variables. This option is only available when input point features include background points and the Contains Background Points parameter is checked.

Distance features are used to automatically create explanatory variables by calculating a distance from the input point features to the nearest provided feature. If the Explanatory Training Distance Features parameter value is polygons or lines, the distance attributes are calculated as the distance between the closest segments of the pair of features. Distances are calculated differently for polygons and lines; see How proximity tools calculate distance for details.

The Explanatory Training Distance Features parameter is not available when input point features do not include background points, due to performance considerations. However, you can still use distance-based information with presence-only points by first running the Distance Accumulation tool to create distance rasters. Distance rasters contain cells whose values describe the distance between the cell and the nearest feature in a specified data source. Once distance rasters are created, you can provide them in the Explanatory Training Rasters parameter for presence-only input point features.
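
For point-type distance features, the underlying calculation amounts to a nearest-neighbor distance. The sketch below illustrates that idea with hypothetical coordinates; it does not reproduce the closest-segment logic used for line and polygon features or the Distance Accumulation tool.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical coordinates: observation points and point-type distance features (e.g., springs).
observation_xy = np.array([[2.0, 3.0], [8.0, 1.0], [5.0, 7.0]])
feature_xy = np.array([[1.0, 1.0], [9.0, 2.0], [4.0, 8.0]])

# Distance from each observation to its nearest feature becomes an explanatory variable.
tree = cKDTree(feature_xy)
nearest_distance, _ = tree.query(observation_xy)
print(nearest_distance)
```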

Performing data preparation on the model inputs

The tool includes data preparation steps for the provided input point features and explanatory variables. Data preparation includes variable transformation by using basis functions, specifying a study area, and reducing sampling bias by using spatial thinning.

Transforming explanatory variables using basis functions

The characteristics of the landscape are used as candidate explanatory variables in MaxEnt. In some cases, the conditions that promote presence may have complex relationships with the occurrence of the event. To help incorporate more intricate relationship forms into the model, the tool transforms (or expands) these candidate explanatory variables using basis functions.

You can select multiple basis functions in one run of the tool using the Explanatory Variable Expansions (Basis Functions) parameter, and all transformed versions of the explanatory variables are then used in the model. The best performing variables are selected by regularization, a method of variable selection that balances trade-offs between model fit and model complexity.

There are five types of basis functions, which provide different considerations when attempting to model complex phenomena.

  • Original (Linear)—Applies a linear basis function to the input variables and can be used when a transformation does not need to be applied. This is the default option.

    A sample use case is using the tool with the goal of modeling presence of a species that is known to require access to a water stream. Using the linear basis function for a variable corresponding to the distance to a stream allows the model to estimate the linear relationship between species presence and the distance to a water stream. The resulting coefficient can be used to interpret the marginal linear relationship before attempting more complex relationship forms.

    Use the Original (Linear) basis function when interpretability is a priority in the model. Since no transformation occurs, interpreting coefficients in the context of their effect on presence probability is easiest with the Linear method.

    Linear basis function

    Note:

    Categorical explanatory variables only allow the use of the Original (Linear) basis function. When both continuous and categorical explanatory variables are provided, you can choose multiple basis functions, but the categorical variables will only have the Original (Linear) basis function applied.

  • Squared (Quadratic)—Transforms each explanatory variable value by squaring it, resulting in a quadratic relationship between the explanatory variable and the presence response. In some domains, such as species distribution, species’ responses to environmental conditions are often nonlinear and unimodal (Austin 2002, 2007), and a quadratic form may best represent the relationships.

    In some cases, while a quadratic relationship may be inherent to an explanatory variable’s relationship with a response event, the sampling data in the input point features may only represent one facet of the parabolic relationship. For example, a tropical species may have a parabolic relationship with temperature: extremely cold temperatures result in low probability of presence, tropical temperatures result in high probability, and extremely hot temperatures result in low probability again. If the sampling data for this species does not include frigid temperatures, the relationship may be simply represented with a linear relationship (Merow et al. 2013).

    Quadratic basis function

  • Pairwise interaction (Product)—Performs a pairwise multiplication on explanatory variables. For example, if three variables, A, B, and C, are selected, this basis function will yield transformed variables corresponding to the results of A x B, A x C, and B x C. These transformed variables are commonly known as interaction terms and may be useful representations of complex relationships that depend on conditions among multiple variables. For example, an interaction term including both income and distance to a store may be a stronger predictor of customer patronage than if each variable was used on its own.

    While transformed explanatory variables from the Pairwise interaction (Product) method may be useful in modeling interaction between environmental conditions, model interpretability may be more difficult as interaction terms make it challenging to disentangle the effects of one explanatory variable as opposed to the other. This is most important when evaluating each explanatory variable’s coefficient and partial response plots.

    Product basis function

    Note:

    The Pairwise interaction (Product) option is only available when multiple continuous explanatory variables are chosen.

  • Discrete step (Threshold)—Converts the continuous explanatory variable into a binary explanatory variable by applying a stepwise function: values under a threshold are assigned a value of 0 and values above the threshold are assigned a value of 1.

    The Number of Knots parameter controls how many thresholds are created, which are then used to create multiple transformed binary explanatory variables using each threshold. Thresholds are applied between the minimum and maximum values in the explanatory variable to create equal length segments.

    A sample use case is running Presence-only Prediction with the goal of studying the impact of hot temperatures on occurrence (for example: above 32 degrees Celsius or not). Using the threshold basis function, the continuous temperature variable is separated into values of 1 (above 32 degrees) and 0 (below 32 degrees) and allows interpretation of each condition as it relates to presence.

    Threshold basis function

  • Smoothed step (Hinge)—Converts the continuous explanatory variable into two segments, a static segment (all zeros or ones) and a linear function (increasing or decreasing), separated by a threshold called a knot. This can be performed using a forward hinge (start with zeros between the minimum and the knot, and then apply an increasing linear function between the knot and the maximum) or a reverse hinge (start with a decreasing linear function between the minimum and the knot, and then apply all ones between the knot and the maximum).

    The Number of Knots parameter controls how many explanatory variable transformations are produced, resulting in (Number of Knots – 1) * 2 transformed explanatory variables. The reason for this formula is that the number of knots specifies the number of equal intervals that are used between the minimum and maximum values in the explanatory variable (subtracting one from the number of knots), and both forward hinge transformed variables and reverse hinge transformed variables are created (multiply by 2).

    A sample use case is running the tool with the goal of studying the impact of variation in hot temperatures (for example: keeping all values above 32 degrees Celsius and ignoring anything below). The hinge basis function would allow the variable to keep the variation above the knot (by applying a linear function for all values above 32 degrees), while reducing noise from all data below the knot (converting all values below 32 degrees into 0).

    Smoothed step (Hinge) and Discrete step (Threshold) options are mutually exclusive piecewise functions; when one is selected, the other cannot be selected. When one of these is selected, it is recommended that you test multiple runs of the model and adjust the value of the Number of Knots parameter to interpret how these thresholds help or hinder the model.

    Hinge basis function

    The tool uses multiple transformed versions of each explanatory variable when attempting to model complex conditions that promote the presence of a phenomenon. For example, a model that uses annual mean temperature to estimate the probability of presence of a desert turtle species may use different variable expansions to describe a complex relationship between temperature and desert turtle habitats.

    Partial response plot of annual mean temperature and species presence

    The above partial response plot displays the marginal response of the probability of presence as the annual mean temperature changes. Keeping all other factors the same, the probability of presence does the following:

    • Increases in a linear manner as the annual mean temperature increases between 0 and 15 degrees Celsius
    • Gradually diminishes between 15 and 21 degrees Celsius
    • Decreases rapidly for annual mean temperature values above 21 degrees Celsius

    The tool uses multiple basis functions to generate explanatory variable expansions that best represent this type of relationship, selecting the most helpful transformations through a process called regularization.
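
As a conceptual sketch only, the following code applies the expansions described above to a hypothetical temperature variable (and a hypothetical elevation variable for the product term). The knot placement and any scaling the tool uses internally may differ from this illustration.

```python
import numpy as np

# Hypothetical explanatory variables at five training locations.
temperature = np.array([2.0, 10.0, 18.0, 26.0, 34.0])
elevation = np.array([900.0, 650.0, 400.0, 300.0, 150.0])

linear = temperature                                # Original (Linear): no transformation
quadratic = temperature ** 2                        # Squared (Quadratic)
product = temperature * elevation                   # Pairwise interaction (Product)
threshold_32 = (temperature >= 32.0).astype(float)  # Discrete step (Threshold), knot at 32

# Smoothed step (Hinge), forward form with a knot at 32:
# zeros below the knot, then an increasing linear ramp up to the maximum.
t_max, knot = temperature.max(), 32.0
forward_hinge = np.where(temperature < knot, 0.0, (temperature - knot) / (t_max - knot))

for name, values in [("linear", linear), ("quadratic", quadratic), ("product", product),
                     ("threshold", threshold_32), ("forward hinge", forward_hinge)]:
    print(name, values)
```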

Regularization

MaxEnt can be prone to overfitting the training data. To reduce this problem, the method applies a form of regularization that penalizes large explanatory variable coefficients, forcing the model to focus on the most important explanatory variables (Phillips et al. 2006).

A way to conceptualize regularization is that a limited coefficient budget is shared by all explanatory variables provided to the model. As coefficients are reduced to satisfy the budget, several explanatory variables with low coefficients are reduced to zero and therefore removed from the model. The effect of this is that the model retains fewer explanatory variables, keeping only those that had high enough coefficients to survive even under a coefficient budget. With a reduced count of explanatory variables, the model is less likely to overfit and easier to interpret. Following the principle of parsimony, the simplest explanation of a phenomenon is usually the best (Phillips et al. 2006).

Regularization has the added effect of helping address multicollinearity: as related explanatory variables are added, the total coefficient value that a single variable would include is now shared among multiple correlated variables, resulting in lower coefficients for multicollinear variables. As regularization penalizes the remaining coefficient values, multicollinear variable coefficients are more likely to be reduced to zero and removed from the model.
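
The tool performs its regularization internally. Purely as an analogy, the sketch below fits an L1-penalized logistic regression on synthetic data to show how a stronger coefficient penalty tends to drive some coefficients, including those of nearly collinear variables, exactly to zero; it is not the tool's regularization procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 5] = X[:, 4] + rng.normal(scale=0.05, size=200)           # two nearly collinear columns
y = (X[:, 0] + 0.5 * X[:, 4] + rng.normal(size=200) > 0).astype(int)

# A smaller C means a stronger penalty, which typically shrinks more
# coefficients to exactly zero, removing those variables from the model.
for C in (10.0, 0.1):
    coef = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y).coef_[0]
    print(C, np.round(coef, 2))
```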

Defining a study area

A study area defines where presence is possible and must be specified when background points are not part of your input point features. You can use three options of the Study Area parameter to define your study area:

  • Convex hull—Uses the convex hull of the input point features.

    Convex hull study area

  • Raster extent—Uses the extent of the intersection of the rasters provided in the Explanatory Training Rasters parameter.

    Raster extent study area

  • Study area polygon—Uses a customized polygon feature class boundary, provided in the Study Area Polygon parameter.

    Custom polygon study area

The study area has a significant impact on the model’s outcome: the extent of the study area determines the raster cells from explanatory training rasters that will be used to create background points. Background points establish the environment conditions in which presence is possible and are contrasted with environment conditions where presence is observed. Prediction results will differ as the proportion of background points and presence points changes.

The study area establishes the extent of the training data for the model. The input point features in this scenario represent where presence was observed, and the study area represents where presence is possible (though not necessarily observed). As such, it is recommended that the study area for an analysis be guided by the survey design of the collected presence points. For example, if a presence data collection survey thoroughly inspected a 100-square-kilometer region, the bounding polygon delineating the region may be used as the study area.

In some cases, different study areas for a given set of input point features may be useful to explore different dynamics of a phenomenon (Elith et al. 2011, 51–52).
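
For the Convex hull option, the geometric idea can be illustrated with SciPy and hypothetical presence coordinates. This is a sketch of the concept only, not the tool's implementation.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Hypothetical presence coordinates; the hull vertices delineate a candidate study area boundary.
presence_xy = np.array([[1.0, 1.0], [4.0, 0.5], [6.0, 3.0], [3.5, 5.0], [1.5, 3.5], [3.0, 2.0]])
hull = ConvexHull(presence_xy)
print(presence_xy[hull.vertices])  # boundary vertices of the convex hull, in order
```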

Reducing sample bias using spatial thinning

Sampling bias occurs when the sampled locations represented in the input point features exhibit distinct spatial clusters. For example, data collection surveys are commonly taken closer to roads, paths, and other conditions that favor data collection. The effect of sampling bias is that data intended to portray presence of a phenomenon becomes conflated with data showing presence of suitable conditions for data collection. Sampling bias is inherent in most presence-only datasets and is mitigated only in the most stringent and structured survey designs.

Spatial thinning is a technique to reduce the effect of sampling bias on the model; it removes presence and background points from the training data such that there is a minimum specified distance between points. By reducing the number of points within a specified distance of each other, areas that are spatially oversampled are reduced in the training data for the model.

Spatial thinning of input points

To use spatial thinning, check the Apply Spatial Thinning parameter and provide values for the following two parameters:

  • Minimum Nearest Neighbor Distance—Determines how close any two points can be to each other.
  • Number of Iterations for Thinning—Specifies how many times to attempt to remove points to find an appropriate solution. After this number of spatial thinning runs are attempted, the run with the most points left is used in the training of the model.

Spatial thinning occurs for both presence and background points, even if background points are generated by the tool in the case of using presence-only data. The spatial thinning applied to background points occurs separately from the spatial thinning applied to presence points, which may result in a presence point being closer to a background point than the minimum nearest neighbor distance.

Separate spatial thinning applied to presence and background points

When background points are created by the tool using raster cells, spatial thinning is applied by resampling the raster to the Minimum Nearest Neighbor Distance parameter value and using the resulting raster cell centroids as the spatially thinned background points.

Spatial thinning can be a useful technique to reduce issues arising from rasters with large cell counts, as it reduces the number of background points. Regardless of the raster resolution, approximately the same number of background points will remain after thinning, according to the specified minimum nearest neighbor distance.

Spatial thinning is not applied when the minimum nearest neighbor distance value is smaller than the closest distance between any two points (whether from input point features or derived from raster cell centroids), because the data already fulfills the spatial thinning criteria.
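
A greedy sketch of the thinning idea is shown below, using hypothetical coordinates: points are removed until no pair is closer than the minimum nearest neighbor distance, the process is repeated for several random removal orders, and the run retaining the most points is kept. The tool's actual thinning procedure may differ.

```python
import numpy as np

def thin_points(xy, min_distance, n_iterations=10, seed=0):
    """Greedy spatial thinning: keep points so that no pair is closer than min_distance."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_iterations):
        order = rng.permutation(len(xy))
        kept = []
        for i in order:
            if all(np.hypot(*(xy[i] - xy[j])) >= min_distance for j in kept):
                kept.append(i)
        if best is None or len(kept) > len(best):
            best = kept  # keep the iteration that retains the most points
    return xy[np.sort(best)]

# Hypothetical point coordinates; each tight cluster is reduced to a single point.
xy = np.array([[0.0, 0.0], [0.3, 0.1], [2.0, 2.0], [2.2, 2.1], [5.0, 1.0]])
print(thin_points(xy, min_distance=1.0))
```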

Configuring the model

The tool contains various parameters to configure and adjust the model. While use of every parameter is not required to run the tool, understanding how the model works and how each parameter is used can have a significant impact on the utility of the tool for your presence modeling workflows.

Setting the Relative Weight of Presence to Background

The tool uses the Relative Weight of Presence to Background parameter to designate how background points are considered by the model.

The default value of 100 indicates that presence points in the input point features are the primary source of presence information; occurrence at each background point is unknown and they can only be used to represent landscape characteristics where presence is possible, yet unknown. A value of 1 indicates that background points are equally meaningful to presence points; since they are not presence locations but are equally meaningful, they represent known absence locations. Background points, as absence locations, can then be used equally and in conjunction with presence locations to create a binary classification model that estimates both presence and absence.

This value has a strong effect on how the model operates and on the tool’s resulting predictions. When the Relative Weight of Presence to Background value is close to 100, the model applies the traditional form of the MaxEnt method. When the value is 1, the model treats each presence and background point equally and is similar to logistic regression.

It is recommended that you rely on domain expertise when deciding on appropriate values between 1 and 100 for the Relative Weight of Presence to Background parameter, since these can be considered a representation of the prevalence of the event in the study area.

Using link functions and presence probability thresholds to interpret outputs

An intermediate output of the model (not returned by the tool) is a relative occurrence rate (ROR) for each location. This intermediate output does not represent probability of occurrence; it corresponds to the relative suitability of each location for promoting presence across the study area. To translate these raw values into values that may be interpreted as presence probabilities and predictions of presence, the values are transformed using a link function and cutoff value specified in the Presence Probability Transformation (Link Function) and Presence Probability Cutoff parameters, respectively.

While link functions are primarily used to convert the MaxEnt raw output into an interpretable probability of presence, they also have an association with how background points are considered (true background versus absence). Link functions do not directly affect the model’s underlying calculations, but the outputs of a link function have a direct impact on the results.

Two link functions are available in the Presence Probability Transformation (Link Function) parameter:

  • C-log-log—Treats background points as locations where the presence of the phenomenon is unknown. Uses the formula 1-exp(-exp(entropy + raw output)) to calculate presence probability at each location. This is the default.

  • Logistic—Treats background points as locations representing absence of the phenomenon. Because of this assumption, the Relative Weight of Presence to Background parameter should have values close to 1 when selecting this function. This link function uses the formula 1/(1+exp(-entropy- raw output)) to calculate presence probability at each location.

Presence probabilities from link functions are provided as values between 0 and 1. You can use the Presence Probability Cutoff parameter to specify a probability threshold that classifies a location as presence. By default, 0.5 is used, and a value greater than or equal to 0.5 is classified as presence. You may enter a value between 0.01 and 0.99 to set your own cutoff.
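
The two formulas and the cutoff translate directly into code. In the sketch below, the raw model output and entropy values are hypothetical placeholders; only the formulas themselves come from the descriptions above.

```python
import numpy as np

raw_output = np.array([-2.5, -1.0, 0.3])  # hypothetical raw model output at three locations
entropy = 1.7                             # hypothetical entropy of the trained model
cutoff = 0.5                              # default Presence Probability Cutoff

cloglog = 1.0 - np.exp(-np.exp(entropy + raw_output))   # C-log-log link
logistic = 1.0 / (1.0 + np.exp(-entropy - raw_output))  # Logistic link

print(np.round(cloglog, 3), cloglog >= cutoff)    # probability and presence classification
print(np.round(logistic, 3), logistic >= cutoff)
```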

The classification results using the provided cutoff value are compared to known presence points in the input point features and diagnostics are provided in geoprocessing messages and in the output trained features.

Specifying model training outputs

The tool organizes outputs into training and prediction outputs. The main distinction is that training outputs correspond to the data that was used in the training and selection of the model, and prediction outputs correspond to data that the model has not yet been exposed to.

Output trained features

Use the Output Trained Features parameter to produce a feature class containing the points used in the training of the model. This output symbolizes each trained point using a comparison between the classification from the model and the observed classification.

The symbology and legend for the output trained features

The points included in the output trained features are not necessarily the same as the points in the Input Point Features, since background points will be generated when presence-only data is used and since spatial thinning may reduce the number of points used in the training of the model.

Three charts are included with the output trained features:

  1. Classification Result Percentages—Used to assess the portion of correct predictions using the observed classification in the training features.

    Classification Result Percentages chart

  2. Count of Presence and Background by Probability Ranges—Used to compare how the model's distribution of presence probability values compares with observed presence and background classifications.

    Count of Presence and Background by Probability Ranges chart

  3. Distribution of Probability of Presence by Classifications—Used to see the distribution of presence probability ranges by classification designation.

    Distribution of Probability of Presence by Classifications chart

Output trained raster

You can use the Output Trained Raster parameter to create a raster that classifies the probability of presence at each cell in the extent of the input training data into four categories. This parameter is only available when the input point features do not include background points.

The extent of the output trained raster corresponds to the intersection of the explanatory training rasters in the study area. The default cell size is the maximum cell size of the raster inputs, which you can modify using the Cell Size environment.

Output trained raster symbology and legend

Response curve table and sensitivity table

You can use the Output Response Curve Table parameter to create a table with charts that visualize the marginal effect of each explanatory variable on predicting presence. This is also known as the partial dependence, or the partial response, of the phenomenon’s presence to each explanatory variable.

The Partial Response of Continuous Variables chart is composed of multiple charts; each chart visualizes the effect of changing values in each explanatory variable on presence probability, while keeping all other factors the same.

Partial Response of Continuous Variables chart

The Partial Response of Categorical Variables chart is a single bar chart displaying the marginal response of presence for each explanatory variable category.

Partial Response of Categorical Variables chart

The Output Sensitivity Table parameter provides a table that includes two charts:

  1. Omission Rates chart—Used to assess the portion of known presence points that were misclassified as non-presence by the model, using a range of presence probability cutoff values between zero and one.

    Omission Rates chart

  2. ROC Plot chart—Used to compare the portion of correctly classified known presence points, known as the sensitivity of the model, and the portion of background points that were classified as presence. Like the Omission Rates chart, this comparison is made across a range of presence probability cutoff values between zero and one.

    ROC Plot chart
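
The quantities plotted in these two charts can be computed from predicted probabilities, as in the sketch below. The probabilities are hypothetical; the definitions of omission rate, sensitivity, and background classified as presence follow the chart descriptions above.

```python
import numpy as np

# Hypothetical predicted presence probabilities at known presence and background points.
p_presence = np.array([0.92, 0.80, 0.61, 0.45, 0.33])
p_background = np.array([0.70, 0.55, 0.40, 0.22, 0.10, 0.05])

for cutoff in np.arange(0.1, 1.0, 0.2):
    omission = np.mean(p_presence < cutoff)                    # Omission Rates chart
    sensitivity = 1.0 - omission                               # ROC Plot y-axis
    background_as_presence = np.mean(p_background >= cutoff)   # ROC Plot x-axis
    print(f"cutoff={cutoff:.1f} omission={omission:.2f} "
          f"sensitivity={sensitivity:.2f} background_as_presence={background_as_presence:.2f}")
```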

Applying the model to predict

In addition to training models, the Presence-only Prediction tool is used to apply trained models to estimate presence at new locations using parameters found in the Prediction Options parameter category.

Configuring the tool to predict using new input prediction features

The Input Prediction Features parameter specifies locations where the tool will apply the trained model to estimate presence. The Output Prediction Features parameter indicates an output containing the results of the prediction applied to the Input Prediction Features parameter value.

For each explanatory variable used in the training of the model, you must specify a matched explanatory variable in the form of a field, a distance feature, or a raster, using the Match Explanatory Variables, Match Distance Features, and Match Explanatory Rasters parameters.

The ranges of values encountered in the prediction data may differ from ranges in values found in the training data. For example, an elevation raster for training the model may include values between 400 and 1,000 meters, but the corresponding elevation raster for the prediction locations has areas with elevations between 200 and 1,200 meters. While it is advised to maintain explanatory variable ranges in prediction locations within the ranges found in training data, the Allow Predictions Outside of Data Ranges parameter allows the model to extrapolate and provide estimates even for these locations. Use the tool’s geoprocessing messages to diagnose if any explanatory variable ranges exceeded the training data ranges.

You can also use the Output Prediction Raster parameter to create a raster containing the results of the model’s predictions applied to each cell in the extent of the intersection of the rasters provided in the Match Explanatory Rasters parameter. Using this parameter provides a prediction surface across the extent of the environmental conditions available for the prediction locations.

Output prediction raster symbology and legend

The output prediction raster differs from the output training raster in that the training raster is generated only for the extent of the training data that was used in the model, and the prediction raster is generated for the extent of the input prediction features and the intersection of their matched explanatory rasters.

Validating the model

The tool provides options to help validate and evaluate a model. It is recommended that you use these options in conjunction with the Output Response Curve Table and Output Sensitivity Table parameters to evaluate the quality and utility of a model.

Using resampling and cross-validation

The Resampling Scheme and Number of Groups parameters in the Validation Options parameter category specify whether cross-validation of the model will be applied.

If the Random resampling scheme is chosen, the tool will subset the training data into the specified number of groups.

Resampling scheme using random groups

The tool then iterates across the groups: the data for the current group becomes the validation subset, and the combined data for all remaining groups becomes the training subset.

Validation and training subsets for the first group

The tool creates a model using the training subset for the group and predicts presence for each validation feature. The results of the prediction are then compared with the known presence and background designations in the validation subset.

The tool continues this process by iterating and allowing each group to take the role of the validation subset. This process is commonly known as K-fold cross-validation, where K corresponds to the number of groups.

Cross-validation across each group

For each group, the percent of correctly classified presence features and the percent of background features classified as potential presence are recorded. The diagnostics from each group help indicate how the model will perform when estimating presence in unknown locations. These diagnostics are included in the geoprocessing messages of the tool.

Cross-validation diagnostics in geoprocessing messages

The tool requires at least two presence and two background points in the training subset for each group to create a model for cross-validation. If the tool's randomly selected groups do not result in at least two presence and two background points across each group's training subsets, the tool will attempt to recreate the groups until this requirement is met or until 10 attempts are made. If the tool is still unable to meet this requirement for cross-validation after 10 attempts using the provided data, the tool will provide a warning stating that cross-validation was not possible.
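
The following sketch illustrates the K-fold idea with scikit-learn and synthetic data, using a simple logistic model as a stand-in for the tool's model. The grouping and per-group diagnostics mirror the description above, not the tool's internal implementation.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
y = (X[:, 0] + rng.normal(scale=0.5, size=60) > 0).astype(int)  # 1 = presence, 0 = background

# Random resampling into groups; each group takes a turn as the validation subset.
folds = KFold(n_splits=3, shuffle=True, random_state=0)
for group_id, (train_idx, test_idx) in enumerate(folds.split(X), start=1):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    predicted = model.predict(X[test_idx])
    observed = y[test_idx]
    pct_presence_correct = 100 * np.mean(predicted[observed == 1] == 1)
    pct_background_correct = 100 * np.mean(predicted[observed == 0] == 0)
    print(f"group {group_id}: presence correct {pct_presence_correct:.0f}%, "
          f"background correct {pct_background_correct:.0f}%")
```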

Geoprocessing messages

An important output of the tool is the report included in the geoprocessing messages. The report includes important information about the trained model, including a table of model parameters, model comparison diagnostics, regression coefficients, a categorical summary (if any explanatory variables are categorical), a cross-validation summary (for the random resampling scheme), and explanatory variable range diagnostics for training and prediction data (if input prediction features were used).

Model Characteristics messages

The Regression Coefficients table includes each explanatory variable used in the training of the model, including its corresponding basis expansions, and the resulting coefficient. The names of the explanatory variables indicate the nature of the basis expansion; for example, a product variable composed of the product of an Elevation variable and a Climatic Water Deficit variable is named product(ELEVATION, CLIMATICWATERDEFICIT) in the regression coefficients table.

Regression Coefficients messages

The Cross-Validation Summary table includes each cross-validation group’s ID, count of observations in its training and validation subsets, percent of observed presence features predicted as presence, and percent of observed background features predicted as background.

Cross-validation diagnostics in geoprocessing messages

The Explanatory Variable Range Diagnostics table includes each provided explanatory variable (whether in the form of a field, a distance feature, or a raster), its minimum and maximum values found in the training data, and, if input prediction features are used, the minimum and maximum values found in the prediction data.

Explanatory Variable Range Diagnostics messages

Best practices and considerations

There are various best practices and considerations to keep in mind when using the tool.

Dealing with multicollinearity

While the tool’s regularization mitigates the impacts of multicollinearity in explanatory variables, it is still recommended that you identify and reduce the amount of correlated explanatory variables. Common tools to analyze multicollinearity include scatterplot matrix charts, Exploratory Regression, and Dimension Reduction.

Dealing with categorical data

The tool subsets the input training data into groups to perform cross-validation when the Random option is chosen for the Resampling Scheme parameter. In this case, any category with fewer than three data points in the resulting groups will prevent cross-validation, and a warning is provided to notify you that the resampling method could not be applied. Running the tool with a lower value for the Number of Groups parameter reduces the chance of encountering this problem by making each group larger and allowing more opportunities for each category to be represented in every group.

Using and evaluating spatial thinning

Use the Output Trained Features parameter to explore the results of spatial thinning on the Input Point Features value.

To build a model using spatial thinning and apply the model to all input point features, provide the same features into the Input Point Features and Input Prediction Features parameters.

Setting a presence probability cutoff value

To decide on an appropriate value for the Presence Probability Cutoff parameter, use the Omission Rates and ROC Plot charts.

The Omission Rates chart visualizes how different Presence Probability Cutoff parameter values result in different rates of incorrectly classified presence points, otherwise known as the omission rate. While an omission rate close to 0 is desired, it is also important not to lower the cutoff value simply for the sake of minimizing the omission rate, as this also increases how many background points are classified as potential presence.

Omission Rates chart

To evaluate how different cutoff values impact the rate of background points being classified as presence, use the ROC Plot chart. It includes a comparison between correctly classified presence points and background classified as potential presence across different presence probability cutoff values.

ROC Plot chart

The objective of an ROC Plot chart differs depending on the nature of background points. When background points represent absence and the Relative Weight of Presence to Background parameter value is 1, the chart may be used as a traditional ROC chart in which the sensitivity (correctly classified presence points) is maximized and the 1-specificity (background or absence classified as presence) is minimized. In this case, cutoff values close to the upper left corner of the chart are more appropriate. When background points represent unknown but possible occurrence, the ROC plot demonstrates how different cutoff values affect how many background locations are estimated to represent potential presence.

It is recommended that you use both charts in conjunction. Starting from the default cutoff of 0.5, select a candidate cutoff point in the Omission Rates chart and compare the corresponding entry in the ROC Plot chart.

Map with Omission Rates and ROC Plot charts

Using the output trained features charts for validation

The Classification Result Percentages chart displays a comparison of the observed and predicted classifications. You can use the chart to assess the model’s performance on known presence points, for example, by focusing on the portion of presence points that were misclassified. In use cases where presence prediction on background points is important, you can also use the chart to view and select the background points that are predicted to have presence.

Classification Result Percentages chart used to evaluate true and false positives

General model selection criteria

A workflow for model selection that may be applicable to your use cases is as follows:

  1. Evaluate the presence probability cutoff default of 0.5 and its effect on the model's ability to identify known presence locations as presence (sensitivity) by using the ROC plot's y-axis.

    Open the Omission Rates and the ROC Plot charts side by side. Select the default presence probability cutoff of 0.5 in the omission rates plot and note the resulting sensitivity on the ROC plot's y-axis.

    Omission rates plot and ROC plot signifying cutoff values corresponding sensitivity value

  2. Evaluate the presence probability cutoff default of 0.5 and its effect on the model's ability to identify known background locations as background (1-specificity) by using the ROC plot's x-axis.

    Open the Omission Rates and the ROC Plot charts side by side. Select the default presence probability cutoff of 0.5 in the omission rates plot and note the resulting (1 - specificity) value on the ROC plot's x-axis.

    When background points reflect locations with unknown presence (by using the default Relative Weight of Presence to Background parameter value of 100), this reflects the portion of background locations in the submitted training data that are estimated to correspond to potential presence.

    When background points correspond to known absence (by using the Relative Weight of Presence to Background value of 1), this reflects the portion of false positives (known absence locations that are mistakenly labeled as presence).

    Omission Rates and ROC Plot charts showing cutoff values

  3. Interpret the area under the curve (AUC) in the ROC plot, which is an evaluation diagnostic for how capable the model is at estimating known presence locations as presence and known background locations as background. The higher the area under the curve, the more appropriate the model for the presence prediction task.

    ROC plot showing area under the curve

    While the area under the curve is a helpful general evaluation diagnostic, it is important to decide whether the objective of the model is to reduce false positives (in other words, ensure that predicted presence is very likely to indeed be presence) or to reduce false negatives (in other words, ensure that predicted non-presence is very likely to indeed be absence). A balance of the two objectives is the ROC plot value closest to the upper left of the chart.

    ROC plot showing cutoff values that balance sensitivity and specificity

  4. When multiple models have similar validation diagnostics, select the simpler model. The model with fewer and simpler explanatory variables may be desirable for its interpretability and ease of explanation. Following the principle of parsimony, the simplest explanation of a phenomenon is usually the best (Phillips et al. 2006).

    Above all else, use domain expertise and a thorough understanding of the problem to guide model design, validation, and use.

Additional resources

For more information, see the following resources:

  • Aiello-Lammens, Matthew E., Robert A. Boria, Aleksandar Radosavljevic, Bruno Vilela, Robert P. Anderson. 2015. "spThin: an R package for spatial thinning of species occurrence records for use in ecological niche models." Ecography 38: 541-545.

  • Du, Zhaohui, Zhiqiang Wang, Yunxia Liu, Hao Wang, Fuzhong Xue, and Yanxun Liu. 2014. "Ecological niche modeling for predicting the potential risk areas of severe fever with thrombocytopenia syndrome." International Journal of Infectious Diseases 26: 1-8. https://doi.org/10.1016/j.ijid.2014.04.006
  • Elith, Jane, Steven J. Phillips, Trevor Hastie, Miroslav Dudík, Yung En Chee, and Colin J. Yates. 2011. "A statistical explanation of MaxEnt for ecologists." Diversity and Distributions, 17: 43-57.

  • Fithian, William, Jane Elith, Trevor Hastie, David A. Keith. 2014. "Bias Correction in Species Distribution Models: Pooling Survey and Collection Data for Multiple Species." arXiv:1403.7274v2 [stat.AP].

  • Fithian, William, Trevor Hastie. 2013. "Finite-sample equivalence in statistical models for presence-only data." The Annals of Applied Statistics, 7, no. 4 (December), 1917-1939.

  • Merow, Cory, Matthew J. Smith, and John A. Silander, Jr. 2013. "A practical guide to MaxEnt for modeling species’ distributions: what it does, and why inputs and settings matter." Ecography, 36: 1058–1069.

  • Mobley, William, Antonia Sebastian, Wesley Highfield, and Samuel D. Brody. 2019. "Estimating flood extent during Hurricane Harvey using maximum entropy to build a hazard distribution model." Journal of Flood Risk Management 12 (Suppl. 1): e12549. https://doi.org/10.1111/jfr3.12549

  • Phillips, Steven J., and Miroslav Dudík. 2008. "Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation." Ecography 31: 161-175.

  • Phillips, Steven J., Robert P. Anderson, and Robert E. Schapire. 2006. "Maximum entropy modeling of species geographic distributions." Ecological Modelling, 190: 231-259.

  • Radosavljevic, Aleksandar, Robert P. Anderson. 2014. "Making better Maxent models of species distributions: complexity, overfitting and evaluation." Journal of Biogeography 41, 629-643.