Label | Explanation | Data Type |
Input geostatistical layers | The geostatistical layers representing interpolation results. Each layer will be compared and ranked. | Geostatistical Layer |
Output cross validation table | The output table containing cross validation statistics and ranks for each interpolation result. The final ranks of the interpolation results are stored in the RANK field. | Table |
Output geostatistical layer with highest rank (Optional) | The output geostatistical layer of the interpolation result with highest rank. This interpolation result will have the value 1 in the RANK field of the output cross validation table. If there are ties for the interpolation result with highest rank or all results are excluded by exclusion criteria, the layer will not be created even if a value is provided. Warning messages will be returned by the tool if this occurs. | Geostatistical Layer |
Comparison method (Optional) | Specifies the method that will be used to compare and rank the interpolation results. | String |
Criterion (Optional) | Specifies the criterion that will be used to rank the interpolation results. | String |
Criteria hierarchy (Optional) | The hierarchy of criteria that will be used for hierarchical sorting with tolerances. Provide multiple criteria in priority order with the first being most important. The interpolation results are ranked by the first criterion, and any ties are broken by the second criterion. Ties in the second criterion are broken by the third criterion, and so on. Cross validation statistics are continuous values and generally do not have exact ties, so tolerances are used to induce ties in the criteria. For each row, specify a criterion in the first column, a tolerance type (percent or absolute) in the second column, and a tolerance value in the third column. If no tolerance value is provided, no tolerance will be used; this is most useful for the final row so that there will be no ties for the interpolation result with highest rank. For each row (level of the hierarchy), the following criteria are available: For example, you can specify a Root mean square error (Accuracy) value with a 5 percent tolerance in the first row and a Mean error (Bias) value with no tolerance in the second row. These options will first rank the interpolation results by lowest root mean square error (highest prediction accuracy), and all interpolation results whose root mean square error values are within 5 percent of the most accurate result will be considered ties by prediction accuracy. Among the tying results, the result with a mean error closest to zero (lowest bias) will receive the highest rank. | Value Table |
Weighted criteria (Optional) | The multiple criteria with weights that will be used to rank interpolation results. For each row, provide a criterion and a weight. The interpolation results will be ranked independently by each of the criteria, and a weighted average of the ranks will be used to determine the final ranks of the interpolation results. | Value Table |
Exclusion criteria (Optional) | The criteria and associated values that will be used to exclude interpolation results from the comparison. Excluded results will not receive ranks and will have the value No in the Included field of the output cross validation table. | Value Table |
Available with Geostatistical Analyst license.
Summary
Compares and ranks geostatistical layers using customizable criteria based on cross validation statistics.
Interpolation results can be ranked based on a single criterion (such as highest prediction accuracy or lowest bias), weighted average ranks of multiple criteria, or hierarchical sorting of multiple criteria (in which ties by each of the criteria are broken by subsequent criteria in the hierarchy). Exclusion criteria can also be used to exclude interpolation results from the comparison that do not meet minimal quality standards. The output is a table summarizing the cross validation statistics and ranks for each interpolation result. Optionally, you can output a geostatistical layer of the interpolation result with highest rank to be used in further workflows.
Illustration
Usage
Cross validation is a leave-one-out method for evaluating interpolation results. The method sequentially removes each point in the dataset and uses all remaining points to predict the value of the excluded point. The cross validation prediction is then compared to the true value of the hidden point, and the difference between the two is the cross validation error (the error can be positive or negative). The reasoning behind cross validation is that if the interpolation result is effective at predicting the values of the hidden points, it should also be effective at predicting unknown values at new locations, which is the goal of interpolation. All criteria used by this tool are based on summary statistics of the cross validation results.
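The leave-one-out idea can be illustrated with a few lines of Python. The following is a minimal sketch using an inverse distance weighting predictor written with numpy; the point coordinates and values are made up, and this is for illustration only, not how the tool computes its cross validation statistics internally.
# Minimal leave-one-out cross validation sketch using an IDW predictor (illustrative only).
import numpy as np

def idw_predict(xy_known, z_known, xy_new, power=2):
    """Predict a value at xy_new from known points with inverse distance weighting."""
    d = np.linalg.norm(xy_known - xy_new, axis=1)
    w = 1.0 / np.maximum(d, 1e-12) ** power
    return np.sum(w * z_known) / np.sum(w)

def loo_errors(xy, z):
    """Cross validation errors: prediction minus true value for each hidden point."""
    errors = []
    for i in range(len(z)):
        keep = np.arange(len(z)) != i                 # hide point i
        pred = idw_predict(xy[keep], z[keep], xy[i])  # predict it from the remaining points
        errors.append(pred - z[i])                    # errors can be positive or negative
    return np.array(errors)

xy = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]], dtype=float)
z = np.array([1.0, 2.0, 2.0, 3.0, 2.1])
err = loo_errors(xy, z)
print(np.sqrt(np.mean(err ** 2)), np.mean(err))       # root mean square error and mean error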
While assessing interpolation results using cross validation summary statistics is a convenient and effective way to compare multiple interpolation results, it does not replace expert knowledge of the data and interactive investigation of the results. Reviewing charts and individual cross validation errors often reveals patterns in the results that are not obvious from the summary statistics. For example, there are often spatial patterns in the cross validation errors where some areas are underestimated and other areas are overestimated; patterns such as this may not be represented by summary statistics.
Learn more about using cross validation to assess interpolation results
The Comparison method parameter has three options for comparing the cross validation statistics of the interpolation results. Each option has advantages and disadvantages:
- Single criterion—A single criterion is used to compare and rank results. You can rank results by highest prediction accuracy, lowest bias, lowest worst-case error, highest standard error accuracy, or highest precision. The criterion is provided in the Criterion parameter.
  - Advantages—This option is a simple and common method for comparing interpolation results that are known to be stable and consistent. It is also useful for choosing between results that are all very similar.
  - Disadvantages—Interpolation results frequently perform well by some criteria but not others, for example, by having high prediction accuracy but also high bias. In this case, ranking by a single criterion will assign high ranks to results that are unstable or misleading. When ranking by a single criterion, it is recommended that you use various options of the Exclusion criteria parameter to ensure that unstable or misleading results are removed prior to the comparison.
- Hierarchical sorting with tolerances—Hierarchical sorting is used to compare and rank results. Multiple criteria are specified in priority order (highest priority first) in the Criteria hierarchy parameter. The interpolation results are ranked by the first criterion, and any ties are broken by the second criterion. Ties in the second criterion are broken by the third criterion, and so on. This process is modeled after Custom Sort and hierarchical sorting in spreadsheet software (sort by A, then by B, then by C, and so on). However, cross validation statistics are continuous values and generally do not have exact ties, so tolerances (percent or absolute) can be specified to create ties in each of the criteria.
  - Advantages—This option uses multiple criteria, and it takes into account the relative differences of the cross validation statistics. For example, if one interpolation result is much better than the rest by the highest priority criterion, that interpolation result will receive the highest rank regardless of the subsequent criteria in the hierarchy.
  - Disadvantages—The effectiveness of hierarchical sorting depends on the provided tolerance values. If tolerances are too small, some criteria may not be used because there are no ties to break. If tolerances are too large, there may be many ties in the rankings due to many results being within the tolerances of each other.
- Weighted average rank—The weighted average rank of multiple criteria is used to compare and rank results. Multiple criteria and associated weights are specified in the Weighted criteria parameter. The interpolation results are ranked independently by each of the criteria, and a weighted average of the ranks is used to determine the final ranks. Criteria with larger weights will have more influence on the final ranks, so they can be used to indicate preference for certain criteria over others. A minimal sketch of this computation is shown after this list.
  - Advantages—This option uses multiple criteria, allows for preferences of some criteria over others, and always uses all criteria in the comparison.
  - Disadvantages—The relative differences in the values of the cross validation statistics are ignored. For example, all root mean square error values may be within a very small tolerance of each other (indicating that all results have approximately equal prediction accuracy), but they will still be ranked 1 through N by prediction accuracy (for N interpolation results). However, the mean error values may vary by large amounts between the results (indicating the results have large differences in their biases), but they will also be ranked 1 through N by the bias criterion. The weighted average uses only the ranks of the criteria, so the relative differences in the cross validation statistics are ignored in the ranking.
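As an illustration of the Weighted average rank option described above, the following sketch ranks three hypothetical results by two criteria and averages the ranks with weights. The scores and weights are made up, scipy is used only for the minimum-rank tie handling, and this is not the tool's internal code.
# Weighted average rank sketch (illustrative only; hypothetical scores and weights).
from scipy.stats import rankdata

# scores[criterion][i] is the score of interpolation result i (smaller is better)
scores = {
    "ACCURACY": [11.0, 12.5, 10.8],
    "BIAS": [0.2, 0.05, 0.4],
}
weights = {"ACCURACY": 3, "BIAS": 1}

# Rank the results independently by each criterion
ranks = {c: rankdata(v, method="min") for c, v in scores.items()}

# The weighted average of the ranks determines the final ranks
total_weight = sum(weights.values())
weighted_avg = [sum(weights[c] * ranks[c][i] for c in scores) / total_weight
                for i in range(3)]
final_ranks = rankdata(weighted_avg, method="min")
print(weighted_avg)   # the third result has the lowest weighted average rank (1.5)
print(final_ranks)    # final ranks from the weighted averages; the third result ranks first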
The input geostatistical layers can be created in the Geostatistical Wizard or by the tools in the Interpolation toolset.
The output is a table summarizing the cross validation statistics, descriptions of the interpolation results, and rankings; it can be included in a presentation or report. Cross validation statistics will only be included in the table if they apply to at least one interpolation result. For example, if only inverse distance weighting and radial basis functions are used, the output table will not contain a field of average standard error values because these methods do not calculate standard errors. If a statistic applies to some interpolation results but not others, the value will be null for results to which the statistic does not apply. Additionally, if any of the input geostatistical layers were created using the Empirical Bayesian Kriging, EBK Regression Prediction, or Empirical Bayesian Kriging 3D tool, several cross validation statistics will be included in the table that are not used by any criteria in this tool; these are included for informational purposes and will have null values for all other interpolation methods. If weighted average rank is used, the ranks for all provided criteria and their weighted average will also be included in the table.
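For example, the final ranks can be read back from the output table with a search cursor, as in the following hedged sketch. The RANK and Included field names are described above; the table name is taken from the code samples later on this page.
# Read the final ranks back from the output cross validation table (illustrative only).
import arcpy

out_table = "outCVtable"
with arcpy.da.SearchCursor(out_table, ["RANK", "Included"]) as cursor:
    for rank, included in cursor:
        print(rank, included)   # excluded results are not ranked and have Included = "No"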
Optionally, you can use the Output geostatistical layer with highest rank parameter to create a copy of the input geostatistical layer with the highest rank. This is useful in ModelBuilder and Python scripting to automatically propagate the best interpolation result to subsequent tools.
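Because the layer is not created when there are ties for the highest rank or when all results are excluded, a script can guard downstream steps with a check such as the following sketch. The use of arcpy.Exists on the layer name is an assumption for illustration, and the layer name matches the code samples below.
# Guard downstream steps in case the highest-rank layer was not created (illustrative only).
import arcpy

best_layer = "Result With Highest Rank"
if arcpy.Exists(best_layer):
    # safe to pass best_layer to subsequent tools
    print(f"{best_layer} was created and can be used downstream.")
else:
    # review the tool's warning messages to see why no layer was produced
    print(arcpy.GetMessages(1))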
While the tool is running, geoprocessing messages and progress bar messages display the current interpolation result being calculated. After all results are calculated and compared, the ranks are printed as geoprocessing messages. The ranks are also available in the output cross validation table.
The Exploratory Interpolation tool performs the same cross validation comparisons as this tool, but it generates various interpolation results automatically from input points and a field before comparing and ranking them.
The following table lists the available criteria, the cross validation statistics that measure them, and the formulas used to assign a score to each interpolation result (smaller scores are better). Ranks for the criteria are determined by sorting the scores of each interpolation result.
Note:
For three of the criteria, the score is equal to the cross validation statistic.
Criteria | Cross validation statistic | Score formula |
Highest prediction accuracy | Root mean square error | Results are ranked by smallest root mean square error. Score = RootMeanSquareError |
Lowest bias | Mean error | Results are ranked by mean error closest to zero. Score = AbsoluteValue( MeanError ) |
Lowest worst-case error | Maximum absolute error | Results are ranked by smallest maximum absolute error. Score = MaximumAbsoluteError |
Highest standard error accuracy | Root mean square standardized error | Results are ranked by root mean square standardized error closest to one. Score = AbsoluteValue( RMSStdError - 1 ) |
Highest precision | Average standard error | Results are ranked by smallest average standard error. Score = AverageStandardError |
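The score formulas in the table can be expressed in a few lines of Python, as in the following sketch. The function and its inputs are hypothetical; the criterion keywords match those used in the code samples later on this page.
# Compute criterion scores from cross validation statistics (illustrative only; smaller is better).
def criterion_scores(rmse, mean_error, max_abs_error, rms_std_error, avg_std_error):
    return {
        "ACCURACY": rmse,                           # highest prediction accuracy
        "BIAS": abs(mean_error),                    # lowest bias
        "WORST_CASE": max_abs_error,                # lowest worst-case error
        "STANDARD_ERROR": abs(rms_std_error - 1),   # highest standard error accuracy
        "PRECISION": avg_std_error,                 # highest precision
    }

print(criterion_scores(12.5, -0.3, 40.2, 1.1, 13.0))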
If there are ties in any criteria, all tying results receive the same rank, equal to the highest of the ranks shared between them (where a higher rank means a smaller rank number). For example, ordered from best to worst, the root mean square error values (12, 14, 14, 15, 16, 16, 18) will receive ranks (1, 2, 2, 4, 5, 5, 7) by the prediction accuracy criterion. Ranks 3 and 6 are skipped due to the tying values.
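The tie handling in this example can be reproduced with scipy's minimum-rank method, as in the following sketch; this is for illustration only and is not how the tool is implemented.
# Ties share the smallest rank number, and the following ranks are skipped (illustrative only).
from scipy.stats import rankdata

rmse_values = [12, 14, 14, 15, 16, 16, 18]      # ordered best to worst
print(rankdata(rmse_values, method="min"))      # [1. 2. 2. 4. 5. 5. 7.]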
Ties can occur at various stages of the comparisons. Ties are most common when using hierarchical sorting because all results within the tolerance are considered ties to each other, and all results outside the tolerance are also considered ties to each other. Ties are also common in weighted average rank when the interpolation results have varying ranks by different criteria, which can result in equal weighted averages of the ranks. While uncommon, ties can also occur in single criteria comparisons (for example, if all points have a constant value). Ties by single criteria will also affect weighted average rank if the criteria are used in the weighted average.
In hierarchical sorting, provide the tolerances relative to the score of the criterion rather than the cross validation statistic. For the criteria where the score is equal to the statistic (highest prediction accuracy, lowest worst-case error, and highest precision), appropriate tolerance values are usually clear. For example, if the lowest root mean square error value of the interpolation results is 200, then a 10 percent tolerance will include all results with root mean square error values less than or equal to 220: 200 + (10/100) x 200 = 220. Similarly, an absolute tolerance of 15 will include all results with root mean square error values less than or equal to 215: 200 + 15 = 215.
However, for the criteria where the score is not equal to the value of the statistic (lowest bias and highest standard error accuracy), appropriate tolerance values are less clear. For the mean error statistic, bias is scored by the absolute value of the mean error. This means, for example, that mean error values -4 and 6 have a relative difference of 50 percent because they are 50 percent different in absolute value: ABS(-4) + (50/100) x ABS(-4) = ABS(6). Similarly, their absolute difference is 2: ABS(-4) + 2 = ABS(6).
For the root mean square standardized error statistic, standard error accuracy is scored by the absolute difference between the root mean square standardized error value and the ideal value of 1. This means, for example, that root mean square standardized error values 0.2 and 2.4 have a 75 percent relative difference. To understand why, comparing the values 0.2 and 2.4, the latter is 1.75 times farther away (a 75 percent increase) from the ideal value of 1 than the former (absolute differences of 0.8 and 1.4, respectively): ABS(0.2 - 1) + (75/100) x ABS(0.2 - 1) = ABS(2.4 - 1). Similarly, their absolute difference is 0.6: ABS(0.2 - 1) + 0.6 = ABS(2.4 - 1).
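The following sketch shows how a percent or absolute tolerance induces ties against the best (smallest) score, using the root mean square error example above; the function is hypothetical and not part of the tool.
# Decide whether a score ties with the best score under a tolerance (illustrative only).
def within_tolerance(score, best_score, tol_value, tol_type="PERCENT"):
    if tol_type == "PERCENT":
        return score <= best_score + (tol_value / 100.0) * best_score
    return score <= best_score + tol_value      # absolute tolerance

rmse_scores = [200.0, 212.0, 215.0, 230.0]
best = min(rmse_scores)
print([within_tolerance(s, best, 10) for s in rmse_scores])   # [True, True, True, False]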
Various criteria require all input geostatistical layers to support the standard error output type. If any geostatistical layers do not allow standard errors, various options of several parameters will become unavailable. These options are related to standard error accuracy, precision, the root mean square standardized error statistic, or the average standard error statistic. On the Geostatistical Layer contextual tab, in the Drawing group, the Display Type menu shows the supported output types of a geostatistical layer.
Learn more about which interpolation methods allow standard errors of predictions
The Minimum percent error reduction option of the Exclusion criteria parameter is particularly useful when you do not know the values or range of the points being interpolated (for example, in an automated environment). This option excludes interpolation results that are not sufficiently more accurate than a baseline nonspatial model that predicts the global average value at all locations in the map. This relative accuracy is measured by comparing the root mean square error value to the standard deviation of the values of the points being interpolated, and the root mean square error must be at least the specified percent less than the standard deviation to be included in the comparison. For example, a value of 10 means that the root mean square error must be at least 10 percent lower than the standard deviation to be included in the comparison and ranking.
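The check can be sketched as follows, assuming the root mean square error and the standard deviation of the input values are known; the function name and values are hypothetical.
# Minimum percent error reduction check (illustrative only).
def passes_error_reduction(rmse, std_dev, min_percent):
    reduction = 100.0 * (1.0 - rmse / std_dev)      # percent improvement over the nonspatial baseline
    return reduction >= min_percent

print(passes_error_reduction(rmse=85.0, std_dev=100.0, min_percent=10))   # True, 15 percent reduction
print(passes_error_reduction(rmse=95.0, std_dev=100.0, min_percent=10))   # False, 5 percent reduction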
Different disciplines have different standards for acceptable error reductions in interpolation results. In physical sciences with measurements that are densely sampled, errors often reduce by more than 90 percent. In social sciences, however, error reductions of only 10 to 20 percent are often significant to researchers.
For the comparisons to be meaningful and fair, it is recommended that you use the same points and field when creating each input geostatistical layer. If any of the layers do not share the same data source, the tool will return a warning message.
Parameters
arcpy.ga.CompareGeostatisticalLayers(in_geostat_layers, out_cv_table, {out_geostat_layer}, {comparison_method}, {criterion}, {criteria_hierarchy}, {weighted_criteria}, {exclusion_criteria})
Name | Explanation | Data Type |
in_geostat_layers [in_geostat_layer1,in_geostat_layer2,...] | The geostatistical layers representing interpolation results. Each layer will be compared and ranked. | Geostatistical Layer |
out_cv_table | The output table containing cross validation statistics and ranks for each interpolation result. The final ranks of the interpolation results are stored in the RANK field. | Table |
out_geostat_layer (Optional) | The output geostatistical layer of the interpolation result with highest rank. This interpolation result will have the value 1 in the RANK field of the output cross validation table. If there are ties for the interpolation result with highest rank or all results are excluded by exclusion criteria, the layer will not be created even if a value is provided. Warning messages will be returned by the tool if this occurs. | Geostatistical Layer |
comparison_method (Optional) | Specifies the method that will be used to compare and rank the interpolation results. | String |
criterion (Optional) | Specifies the criterion that will be used to rank the interpolation results. | String |
criteria_hierarchy [[criteria1, tol_type1, tol_val1], [criteria2, tol_type2, tol_val2],...] (Optional) | The hierarchy of criteria that will be used for hierarchical sorting with tolerances. Provide multiple criteria in priority order with the first being most important. The interpolation results are ranked by the first criterion, and any ties are broken by the second criterion. Ties in the second criterion are broken by the third criterion, and so on. Cross validation statistics are continuous values and generally do not have exact ties, so tolerances are used to induce ties in the criteria. For each row, specify a criterion in the first column, a tolerance type (percent or absolute) in the second column, and a tolerance value in the third column. If no tolerance value is provided, no tolerance will be used; this is most useful for the final row so that there will be no ties for the interpolation result with highest rank. For each row (level of the hierarchy), the following criteria are available: For example, you can specify an ACCURACY value with a 5 percent tolerance in the first row and a BIAS value with no tolerance in the second row. These options will first rank the interpolation results by lowest root mean square error (highest prediction accuracy), and all interpolation results whose root mean square error values are within 5 percent of the most accurate result will be considered ties by prediction accuracy. Among the tying results, the result with a mean error closest to zero (lowest bias) will receive the highest rank. | Value Table |
weighted_criteria [[criteria1, weight1], [criteria2, weight2],...] (Optional) | The multiple criteria with weights that will be used to rank interpolation results. For each row, provide a criterion and a weight. The interpolation results will be ranked independently by each of the criteria, and a weighted average of the ranks will be used to determine the final ranks of the interpolation results. | Value Table |
exclusion_criteria [[criteria1, value1], [criteria2, value2],...] (Optional) | The criteria and associated values that will be used to exclude interpolation results from the comparison. Excluded results will not receive ranks and will have the value No in the Included field of the output cross validation table. | Value Table |
Code sample
The following Python script demonstrates how to use the CompareGeostatisticalLayers function.
# Compare Simple kriging, EBK, and Kernel Interpolation results
# Rank results by highest prediction accuracy
# Exclude results with error reductions under 25%
import arcpy

# Geostatistical layers to compare (referenced by layer name)
myGALayers = ["Simple Kriging", "EBK", "Kernel Interpolation"]
outTable = "outCVtable"
outGALayer = "Result With Highest Rank"
compMethod = "SINGLE"
criterion = "ACCURACY"
exclCrit = [["MIN_PERC_ERROR", 25]]
arcpy.ga.CompareGeostatisticalLayers(myGALayers, outTable, outGALayer,
                                     compMethod, criterion, None, None, exclCrit)
The following stand-alone Python script demonstrates how to use the CompareGeostatisticalLayers function.
# Compare various interpolation results
# Rank results by highest weighted average rank
# Rank same results by hierarchical sorting
# Import system modules
import arcpy
# Check out the ArcGIS Geostatistical Analyst extension license
arcpy.CheckOutExtension("GeoStats")
# Allow overwriting output
arcpy.env.overwriteOutput = True
### Set shared parameters
# Set input and output locations
directory = "C:/data/"
outgdb = directory + "out.gdb/"
arcpy.env.workspace = directory
# Three interpolation results to compare
myGALayers = ["EBK", "Universal Kriging", "Kernel Interpolation"]
# Exclude results with error reductions under 25%
exclCrit = [["MIN_PERC_ERROR", 25]]
# Output geostatistical layer with highest rank
outGALayer = "Result With Highest Rank"
### Set weighted average rank parameters
# Output table of ranks and cross validation results
outTable = outgdb + "outWeightedAverageTable"
# Use weighted average rank
compMethod = "AVERAGE_RANK"
# Use all criteria with highest weight to prediction accuracy
weightedCrit = [
["ACCURACY", 3],
["BIAS", 1],
["WORST_CASE", 1],
["STANDARD_ERROR", 1],
["PRECISION", 1]
]
# Compare using weighted average rank
arcpy.ga.CompareGeostatisticalLayers(myGALayers, outTable, outGALayer,
compMethod, None, None, weightedCrit, exclCrit)
### Set hierarchical sorting parameters
# Output table of ranks and cross validation results
outTable = outgdb + "outHierSortTable"
# Use hierarchical sorting with tolerances
compMethod = "SORTING"
# Compare using highest prediction accuracy with a 10% tolerance
# Break ties by lowest bias
hierCrit = [
["ACCURACY", "PERCENT", 10],
["BIAS", "PERCENT", None]
]
# Compare using hierarchical sorting with tolerances
arcpy.ga.CompareGeostatisticalLayers(myGALayers, outTable, outGALayer,
compMethod, None, hierCrit, None, exclCrit)
Environments
Licensing information
- Basic: Requires Geostatistical Analyst
- Standard: Requires Geostatistical Analyst
- Advanced: Requires Geostatistical Analyst