Available with Geostatistical Analyst license.
Summary
EBK Regression Prediction is a geostatistical interpolation method that uses Empirical Bayesian Kriging with explanatory variable rasters that are known to affect the value of the data that you are interpolating. This approach combines kriging with regression analysis to make predictions that are more accurate than either regression or kriging can achieve on their own.
Usage
This tool only supports prediction map outputs. To create standard error, quantile, or probability maps, output a geostatistical layer and convert it to a raster (or multiple rasters) using GA Layer To Rasters.
This kriging method can handle moderately nonstationary input data.
Only Standard Circular and Smooth Circular Search neighborhoods are allowed for this interpolation method.
If any of your Input explanatory variable rasters have many NoData cells, the Output geostatistical layer may fail to visualize in the map. This is not a problem, and the calculations have been performed correctly. To visualize the output, convert your geostatistical layer to a raster using GA Layer To Rasters or GA Layer To Grid. You can also choose to output a raster directly from this tool using the Output prediction raster parameter.
If the Input dependent variable features are in a geographic coordinate system, all distances will be calculated using chordal distances. For more information on chordal distances, see the Distance calculations for data in geographic coordinates section of the What is Empirical Bayesian Kriging help topic.
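To make the idea of a chordal distance concrete, the sketch below computes the straight-line distance through the earth between two latitude/longitude points. It is only an illustration: it assumes a perfect sphere with a mean radius of 6,371 km, and the names `EARTH_RADIUS_M`, `to_cartesian`, and `chordal_distance` are hypothetical. The tool's internal calculation may differ.

```python
import math

# Assumption: spherical earth with a mean radius of 6,371 km.
EARTH_RADIUS_M = 6371000.0


def to_cartesian(lat_deg, lon_deg, radius=EARTH_RADIUS_M):
    """Convert latitude/longitude in degrees to 3D Cartesian coordinates."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        radius * math.cos(lat) * math.cos(lon),
        radius * math.cos(lat) * math.sin(lon),
        radius * math.sin(lat),
    )


def chordal_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Straight-line (through the sphere) distance between two surface points."""
    p = to_cartesian(lat1, lon1, radius)
    q = to_cartesian(lat2, lon2, radius)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

For nearby points the chordal distance is nearly the same as the distance along the surface; the two diverge as points move farther apart, with the chord reaching at most one diameter for antipodal points.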
Syntax
EBKRegressionPrediction(in_features, dependent_field, in_explanatory_rasters, out_ga_layer, {out_raster}, {out_diagnostic_feature_class}, {measurement_error_field}, {min_cumulative_variance}, {in_subset_features}, {transformation_type}, {semivariogram_model_type}, {max_local_points}, {overlap_factor}, {number_simulations}, {search_neighborhood})
Parameter | Explanation | Data Type |
in_features | The input point features containing the field that will be interpolated. | Feature Layer |
dependent_field | The field of the Input dependent variable features containing the values of the dependent variable. This is the field that will be interpolated. | Field |
in_explanatory_rasters [[in_explanatory_raster,…],...] | Input rasters representing the explanatory variables that will be used to build the regression model. These rasters should represent variables that are known to influence the values of the dependent variable. For example, when interpolating temperature data, an elevation raster should be used as an explanatory variable because temperature is influenced by elevation. You can use up to 62 explanatory rasters. | Raster Layer; Mosaic Layer |
out_ga_layer | The output geostatistical layer displaying the result of the interpolation. | Geostatistical Layer |
out_raster (Optional) | The output raster displaying the result of the interpolation. The default cell size will be the maximum of the cell sizes of the Input explanatory variable rasters. To use a different cell size, use the Cell Size environment setting. | Raster Dataset |
out_diagnostic_feature_class (Optional) | Output polygon feature class that shows the regions of each local model and contains fields with diagnostic information for the local models. For each subset, a polygon will be created that surrounds the points in the subset so you can easily identify which points were used in each subset. For example, if there are 10 local models, there will be 10 polygons in this output. The feature class will contain the following fields:
| Feature Class |
measurement_error_field (Optional) | A field that specifies the measurement error for each point in the dependent variable features. For each point, the value of this field should correspond to one standard deviation of the measured value of the point. Use this field if the measurement error values are not the same at each point. A common source of nonconstant measurement error is when the data is measured with different devices. One device might be more precise than another, which means that it will have a smaller measurement error. For example, one thermometer rounds to the nearest degree and another thermometer rounds to the nearest tenth of a degree. The variability of measurements is often provided by the manufacturer of the measuring device, or it may be known from empirical practice. Leave this parameter empty if there are no measurement error values or the measurement error values are unknown. | Field |
min_cumulative_variance (Optional) | Defines the minimum cumulative percent of variance from the principal components of the explanatory variable rasters. Before building the regression model, the principal components of the explanatory variables are calculated, and these principal components are used as explanatory variables in the regression. Each principal component captures a certain percent of the variance of the explanatory variables, and this parameter controls the minimum percent of variance that must be captured by the principal components of each local model. For example, if a value of 75 is provided, the software will use the minimum number of principal components necessary to capture at least 75 percent of the variance of the explanatory variables. Principal components are mutually uncorrelated, so using them solves the problem of multicollinearity (explanatory variables that are correlated with each other). Most of the information contained in the explanatory variables can frequently be captured in just a few principal components. By discarding the least useful principal components, the model calculation becomes more stable and efficient without significant loss of accuracy. To calculate principal components, there must be variability in the explanatory variables, so if any of your Input explanatory variable rasters contain constant values within a subset, these constant rasters will not be used to compute principal components for that subset. If all explanatory variable rasters in a subset contain constant values, the Output diagnostic feature class will report that zero principal components were used and that they captured zero percent of the variability. | Double |
in_subset_features (Optional) | Polygon features defining where the local models will be calculated. The points inside each polygon will be used for the local models. This parameter is useful when you know that the values of the dependent variable change according to known regions. For example, these polygons may represent administrative health districts where health policy changes in different districts. You can also use the Generate Subset Polygons tool to create subset polygons. The polygons created by this tool will be non-overlapping and compact. | Feature Layer |
transformation_type (Optional) | Type of transformation to be applied to the input data.
| String |
semivariogram_model_type (Optional) | The semivariogram model that will be used for the interpolation. Learn more about the semivariogram models in EBK Regression Prediction
| String |
max_local_points (Optional) | The input data will automatically be divided into subsets that do not have more than this number of points. If Subset polygon features are supplied, the value of this parameter will be ignored. | Long |
overlap_factor (Optional) | A factor representing the degree of overlap between local models (also called subsets). Each input point can fall into several subsets, and the overlap factor specifies the average number of subsets that each point will fall into. A high value of the overlap factor makes the output surface smoother, but it also increases processing time. Values must be between 1 and 5. If Subset polygon features are supplied, the value of this parameter will be ignored. | Double |
number_simulations (Optional) | The number of simulated semivariograms of each local model. Using more simulations will make the model calculations more stable, but the model will take longer to calculate. | Long |
search_neighborhood (Optional) | Defines which surrounding points will be used to control the output. Standard Circular is the default. The following Search Neighborhood classes are supported: SearchNeighborhoodStandardCircular and SearchNeighborhoodSmoothCircular. | Geostatistical Search Neighborhood |
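The selection rule described for min_cumulative_variance can be sketched in a few lines. This is a minimal illustration, not the tool's implementation: the function name `components_needed` and the variance values are hypothetical, and the tool computes the principal components itself for each local model.

```python
def components_needed(component_variances, min_cumulative_variance):
    """Return (count, percent_captured) for the fewest principal components
    whose cumulative share of variance reaches the requested minimum.

    component_variances: variance captured by each principal component
                         (hypothetical inputs for illustration).
    min_cumulative_variance: threshold in percent, for example 75.
    """
    total = sum(component_variances)
    if total <= 0:
        # No variability (for example, all rasters constant in the subset):
        # zero components capturing zero percent.
        return 0, 0.0

    # Principal components are conventionally ordered by decreasing variance.
    ordered = sorted(component_variances, reverse=True)
    captured = 0.0
    for count, variance in enumerate(ordered, start=1):
        captured += variance
        percent = 100.0 * captured / total
        if percent >= min_cumulative_variance:
            return count, percent
    return len(ordered), 100.0


# Four components capturing 50, 30, 15, and 5 percent of the variance:
print(components_needed([50, 30, 15, 5], 75))  # (2, 80.0)
```

With a threshold of 75, two components are enough because together they capture 80 percent; raising the threshold to 97.5 would require all four.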
Code sample
Interpolates a point feature class using explanatory variable rasters.
import arcpy
arcpy.EBKRegressionPrediction_ga("HousingSales_Points", "SalePrice",
    ["AREASQFEET", "NUMBATHROOMS", "NUMBEDROOMS", "TOTALROOMS"],
    "out_ga_layer", None, None, None, 95, None, "LOGEMPIRICAL",
    "EXPONENTIAL", 100, 1, 100, None)
Interpolates a point feature class using explanatory variable rasters.
# Name: EBKRegressionPrediction_Example_02.py
# Description: Interpolates housing prices using EBK Regression Prediction
# Requirements: Geostatistical Analyst Extension
# Author: Esri
# Import system modules
import arcpy
# Set environment settings
arcpy.env.workspace = "C:/gaexamples/data.gdb"
# Set local variables
inDepFeatures = "HousingSales_Points"
inDepField = "SalePrice"
inExplanRasters = ["AREASQFEET", "NUMBATHROOMS", "NUMBEDROOMS","TOTALROOMS"]
outLayer = "outEBKRP_layer"
outRaster = "outEBKRP_raster"
outDiagFeatures = "outEBKRP_features"
inDepMeField = ""
minCumVariance = 97.5
inSubsetFeatures = ""
depTransform = ""
semiVariogram = "K_BESSEL"
maxLocalPoints = 50
overlapFactor = 1
numberSimulations = 200
radius = 100000
searchNeighbourhood = arcpy.SearchNeighborhoodStandardCircular(radius)
# Check out the ArcGIS Geostatistical Analyst extension license
arcpy.CheckOutExtension("GeoStats")
# Execute EBKRegressionPrediction
arcpy.EBKRegressionPrediction_ga(inDepFeatures, inDepField, inExplanRasters,
    outLayer, outRaster, outDiagFeatures, inDepMeField, minCumVariance,
    inSubsetFeatures, depTransform, semiVariogram, maxLocalPoints,
    overlapFactor, numberSimulations, searchNeighbourhood)
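As noted under Usage, this tool writes only prediction surfaces directly, so standard error, quantile, or probability maps require converting the output geostatistical layer with GA Layer To Rasters. The sketch below shows one way that step might look. The argument order (input layer, output raster, output type) and the "PREDICTION_STANDARD_ERROR" keyword are assumptions that should be checked against the GA Layer To Rasters reference for your release; the helper names and the output raster name are hypothetical.

```python
def build_export_args(ga_layer, out_raster,
                      output_type="PREDICTION_STANDARD_ERROR"):
    """Build positional arguments for GA Layer To Rasters.

    Assumed order: input geostatistical layer, output raster,
    output surface type.
    """
    return (ga_layer, out_raster, output_type)


def export_surface(ga_layer, out_raster,
                   output_type="PREDICTION_STANDARD_ERROR"):
    """Convert a geostatistical layer to a raster of the requested type."""
    # Imported here so build_export_args can be used without ArcGIS installed.
    import arcpy

    arcpy.CheckOutExtension("GeoStats")
    try:
        arcpy.GALayerToRasters_ga(
            *build_export_args(ga_layer, out_raster, output_type))
    finally:
        # Return the license even if the conversion fails.
        arcpy.CheckInExtension("GeoStats")


# Example call, using the geostatistical layer from the script above and a
# hypothetical output raster name:
# export_surface("outEBKRP_layer", "outEBKRP_stderr")
```

Keeping the argument construction separate from the arcpy call makes it easy to review what will be passed to the tool before running it in a licensed session.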
Environments
Licensing information
- Basic: Requires Geostatistical Analyst
- Standard: Requires Geostatistical Analyst
- Advanced: Requires Geostatistical Analyst