Cross Validation (Geostatistical Analyst)

Available with Geostatistical Analyst license.

Summary

Removes one data location at a time and predicts its value using the data at the remaining locations. The primary use of this tool is to compare each predicted value to the corresponding observed value in order to obtain useful information about your model parameters.
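The leave-one-out procedure described above can be sketched in plain Python. This is a minimal illustration, not the Geostatistical Analyst implementation: it assumes points are (x, y, value) tuples and uses a simple inverse distance weighted predictor (the hypothetical `idw_predict`, with an assumed power of 2) as a stand-in for whatever model the geostatistical layer actually defines.

```python
import math


def idw_predict(points, x, y, power=2.0):
    """Hypothetical stand-in predictor: inverse distance weighted estimate at (x, y)
    from (px, py, value) tuples. Not the Geostatistical Analyst implementation."""
    num = 0.0
    den = 0.0
    for px, py, value in points:
        d = math.hypot(px - x, py - y)
        if d == 0.0:
            return value  # location coincides with a data point: return its value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def leave_one_out(points, predict=idw_predict):
    """Remove each location in turn, predict it from the remaining locations,
    and return a list of (measured, predicted) pairs."""
    pairs = []
    for i, (x, y, measured) in enumerate(points):
        rest = points[:i] + points[i + 1:]  # every location except the i-th
        pairs.append((measured, predict(rest, x, y)))
    return pairs
```

For example, `leave_one_out([(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 3.0), (1.0, 1.0, 4.0)])` returns one (measured, predicted) pair per location; comparing the two members of each pair is what the statistics below summarize.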


Usage

  • When using this tool in Python, the result object contains both a feature class and a CrossValidationResult, which has the following properties:

    • Count—Total number of samples used.
    • Mean Error—The average difference between the predicted and the measured values.
      $\text{Mean error} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{Z}(s_i) - z(s_i)\right)$
      where $n$ is the count, $z(s_i)$ is the measured value at location $s_i$, and $\hat{Z}(s_i)$ is the value predicted at $s_i$ with that location removed.
    • Root Mean Square Error—Indicates how closely your model predicts the measured values. The smaller this error, the better.
      $\text{Root mean square error} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{Z}(s_i) - z(s_i)\right)^2}$
    • Average Standard Error—The average of the prediction standard errors.
      $\text{Average standard error} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\hat{\sigma}^2(s_i)}$
      where $\hat{\sigma}(s_i)$ is the prediction standard error at $s_i$.
    • Mean Standardized Error—The average of the standardized errors. This value should be close to 0.
      $\text{Mean standardized error} = \frac{1}{n}\sum_{i=1}^{n}\frac{\hat{Z}(s_i) - z(s_i)}{\hat{\sigma}(s_i)}$
    • Root Mean Square Standardized Error—This should be close to 1 if the prediction standard errors are valid. If it is greater than 1, you are underestimating the variability in your predictions; if it is less than 1, you are overestimating the variability in your predictions.
      $\text{Root mean square standardized error} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\hat{Z}(s_i) - z(s_i)}{\hat{\sigma}(s_i)}\right)^2}$
    • Percent in 90% Interval—The percentage of points that are in a 90 percent cross-validation confidence interval. This value should be close to 90.
    • Percent in 95% Interval—The percentage of points that are in a 95 percent cross-validation confidence interval. This value should be close to 95.
    • Average CRPS—The average Continuous Ranked Probability Score (CRPS) of all points. The CRPS is a diagnostic that measures the deviation between the predictive cumulative distribution function and each observed data value. Smaller values are better. This diagnostic has an advantage over the other cross-validation diagnostics because it compares the data to a full distribution rather than to single-point predictions. The calculation of this statistic involves simulation, so it cannot be written as a simple formula.

    Only the Mean Error and Root Mean Square Error results are available for IDW, Global Polynomial Interpolation, Radial Basis Functions, Diffusion Interpolation With Barriers, and Kernel Interpolation With Barriers.

    Percent in 90% Interval, Percent in 95% Interval, and Average CRPS are only available for Empirical Bayesian Kriging and EBK Regression Prediction models.

  • The fields in the optional output feature class are described in the documentation for the GA Layer To Points tool.
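Given measured values, leave-one-out predictions, and prediction standard errors, the formula-based statistics listed above can be computed as follows. This is a sketch under stated assumptions: the function name `cv_diagnostics` is hypothetical, errors are taken as predicted minus measured, and the average standard error is computed as the square root of the mean squared standard error. The dictionary keys reuse the derived-output names from the Python parameter table for readability only; this is not how the tool itself returns its results.

```python
import math


def cv_diagnostics(measured, predicted, std_errors):
    """Hypothetical helper: summary statistics from leave-one-out results.
    Assumes error = predicted - measured and nonzero standard errors."""
    n = len(measured)
    if n == 0 or not (len(predicted) == len(std_errors) == n):
        raise ValueError("inputs must be non-empty and the same length")
    errors = [p - m for m, p in zip(measured, predicted)]
    standardized = [e / s for e, s in zip(errors, std_errors)]
    return {
        "count": n,
        "mean_error": sum(errors) / n,
        "root_mean_square": math.sqrt(sum(e * e for e in errors) / n),
        # Assumption: root of the mean squared standard error
        "average_standard": math.sqrt(sum(s * s for s in std_errors) / n),
        "mean_standardized": sum(standardized) / n,
        "root_mean_square_standardized": math.sqrt(
            sum(z * z for z in standardized) / n
        ),
    }
```

Comparing `root_mean_square` with `average_standard` is a quick consistency check: when the prediction standard errors are valid, the two should be of similar size, which is the same condition that drives `root_mean_square_standardized` toward 1.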

Parameters

Label | Explanation | Data Type
Input geostatistical layer | The geostatistical layer to be analyzed. | Geostatistical Layer
Output point feature class (Optional) | Stores the cross-validation statistics at each location in the geostatistical layer. | Feature Class

Derived Output

Label | Explanation | Data Type
Count | Total number of samples used. | Long
Mean error | Mean Error—The average difference between the measured and the predicted values. | Double
Root mean square | Root Mean Square Error—Indicates how closely your model predicts the measured values. | Double
Average standard | Average Standard Error—The average of the prediction standard errors. | Double
Mean standardized | Mean Standardized Error—The average of the standardized errors. | Double
Root mean square standardized | Root Mean Square Standardized Error—This should be close to 1 if the prediction standard errors are valid. | Double
Percent in 90% Interval | Percent in 90% Interval—The percentage of points that are in a 90 percent cross-validation confidence interval. This value should be close to 90. | Double
Percent in 95% Interval | Percent in 95% Interval—The percentage of points that are in a 95 percent cross-validation confidence interval. This value should be close to 95. | Double
Average CRPS | Average CRPS—The average Continuous Ranked Probability Score (CRPS) of all points. The CRPS is a diagnostic that measures the deviation between the predictive cumulative distribution function and each observed data value. Smaller values are better. This diagnostic has an advantage over the other cross-validation diagnostics because it compares the data to a full distribution rather than to single-point predictions. The calculation of this statistic involves simulation, so it cannot be written as a simple formula. | Double
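Because the CRPS is obtained by simulation rather than from a closed formula, one generic way to estimate it is from draws of each location's predictive distribution, using the identity CRPS = E|X − y| − 0.5·E|X − X′|, where X and X′ are independent draws from the predictive distribution and y is the observed value. The sketch below applies this identity to a finite set of draws. The helper names are hypothetical, and the tool's own simulation procedure is not described here and may differ.

```python
import random


def crps_from_samples(samples, observed):
    """Hypothetical helper: CRPS of one observation given draws from its
    predictive distribution, via CRPS = E|X - y| - 0.5 * E|X - X'|."""
    m = len(samples)
    term1 = sum(abs(x - observed) for x in samples) / m
    # Averaging |a - b| over all ordered pairs (including a == b) gives the exact
    # CRPS of the empirical distribution of the draws. O(m^2), fine for small m.
    term2 = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return term1 - 0.5 * term2


def average_crps(predictive_samples, observed_values):
    """Average CRPS over all locations (one list of draws per location)."""
    scores = [crps_from_samples(s, y) for s, y in zip(predictive_samples, observed_values)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Usage: simulate a hypothetical normal predictive distribution (mean 10, sd 2)
    rng = random.Random(42)
    draws = [rng.gauss(10.0, 2.0) for _ in range(500)]
    print(crps_from_samples(draws, 11.0))
```

A predictive distribution that is both centered on the observed value and appropriately narrow yields a smaller score, which is why smaller average CRPS values indicate a better model.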

arcpy.ga.CrossValidation(in_geostat_layer, {out_point_feature_class})
Name | Explanation | Data Type
in_geostat_layer | The geostatistical layer to be analyzed. | Geostatistical Layer
out_point_feature_class (Optional) | Stores the cross-validation statistics at each location in the geostatistical layer. | Feature Class

Derived Output

Name | Explanation | Data Type
count | Total number of samples used. | Long
mean_error | Mean Error—The average difference between the measured and the predicted values. | Double
root_mean_square | Root Mean Square Error—Indicates how closely your model predicts the measured values. | Double
average_standard | Average Standard Error—The average of the prediction standard errors. | Double
mean_standardized | Mean Standardized Error—The average of the standardized errors. | Double
root_mean_square_standardized | Root Mean Square Standardized Error—This should be close to 1 if the prediction standard errors are valid. | Double
percent_in_90_interval | Percent in 90% Interval—The percentage of points that are in a 90 percent cross-validation confidence interval. This value should be close to 90. | Double
percent_in_95_interval | Percent in 95% Interval—The percentage of points that are in a 95 percent cross-validation confidence interval. This value should be close to 95. | Double
average_crps | Average CRPS—The average Continuous Ranked Probability Score (CRPS) of all points. The CRPS is a diagnostic that measures the deviation between the predictive cumulative distribution function and each observed data value. Smaller values are better. This diagnostic has an advantage over the other cross-validation diagnostics because it compares the data to a full distribution rather than to single-point predictions. The calculation of this statistic involves simulation, so it cannot be written as a simple formula. | Double
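The rules of thumb stated for these outputs (mean standardized error near 0, root mean square standardized error near 1, interval percentages near 90 and 95) can be checked mechanically. The helper below is a hypothetical sketch: it takes a dictionary keyed by the derived-output names, however those values were retrieved, skips any that are missing (for example, the interval percentages for non-EBK models), and its tolerances are arbitrary illustrative defaults rather than values recommended by the tool.

```python
def check_diagnostics(stats, tol_mean_std=0.1, tol_rmsse=0.1, tol_pct=5.0):
    """Hypothetical helper: return notes for statistics that stray from the rules
    of thumb. Missing keys are skipped; tolerances are illustrative assumptions."""
    notes = []
    ms = stats.get("mean_standardized")
    if ms is not None and abs(ms) > tol_mean_std:
        notes.append("mean_standardized is far from 0: predictions may be biased")
    rmsse = stats.get("root_mean_square_standardized")
    if rmsse is not None:
        if rmsse > 1 + tol_rmsse:
            notes.append("root_mean_square_standardized > 1: variability is underestimated")
        elif rmsse < 1 - tol_rmsse:
            notes.append("root_mean_square_standardized < 1: variability is overestimated")
    for level in (90, 95):
        pct = stats.get(f"percent_in_{level}_interval")
        if pct is not None and abs(pct - level) > tol_pct:
            notes.append(f"percent_in_{level}_interval is {pct}, expected about {level}")
    return notes
```

An empty list means every available statistic falls within the chosen tolerances; it does not by itself show that one model is better than another, for which comparing root_mean_square or average_crps across candidate layers is more direct.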

Code sample

CrossValidation example 1 (Python window)

Perform cross validation on an input geostatistical layer.

import arcpy

arcpy.env.workspace = "C:/gapyexamples/data"
# Run cross validation on a kriging layer and report the root mean square error
cvResult = arcpy.ga.CrossValidation("C:/gapyexamples/data/kriging.lyr")
print(f"Root Mean Square error = {cvResult.rootMeanSquare}")

CrossValidation example 2 (stand-alone script)

Perform cross validation on an input geostatistical layer.

# Name: CrossValidation_Example_02.py
# Description: Perform cross validation on an input geostatistical layer.
# Requirements: Geostatistical Analyst Extension

# Import system modules
import arcpy

# Check out the Geostatistical Analyst extension license
arcpy.CheckOutExtension("GeoStats")

# Set environment settings
arcpy.env.workspace = "C:/gapyexamples/data"

# Set local variables
inLayer = "C:/gapyexamples/data/kriging.lyr"

# Execute CrossValidation and report the root mean square error
cvResult = arcpy.ga.CrossValidation(inLayer)
print(f"Root Mean Square error = {cvResult.rootMeanSquare}")

Licensing information

  • Basic: Requires Geostatistical Analyst
  • Standard: Requires Geostatistical Analyst
  • Advanced: Requires Geostatistical Analyst
