Using areal interpolation to perform polygon-to-polygon predictions

Available with Geostatistical Analyst license.

Complexity: Beginner

Data Requirement: Use your own data

Goal: Show how to use areal interpolation to perform polygon-to-polygon predictions, and how to predict values for polygons with missing data.

Introduction

This exercise demonstrates how to use areal interpolation to take data collected at one set of polygons (the source polygons) and predict the data values for a new set of polygons (the target polygons). The data in this exercise involves obesity rates among fifth grade students in the Los Angeles area (for privacy reasons, the original data has been altered). For each school zone, every fifth grade student was sampled, and the number of obese and nonobese students was recorded (note that data is unavailable for 14 of the school zones). The goal of this exercise is to take the obesity rates collected at the school zone level and predict the obesity rates for the census block groups within the school zones. Additionally, you will predict the obesity rates in the 14 school zones that have missing data.

The graphic below shows the Los Angeles school zones symbolized by fifth grade obesity rates. Low rates are colored blue (indicating rates of under 22.5 percent), and high obesity rates are red (indicating rates greater than 44.7 percent), with green, yellow, and orange in the middle. The black polygons are the zones with missing data. On the right are the block groups in the Los Angeles area where you want to predict fifth grade obesity rates.

Los Angeles school zones (left) and block groups (right)

Areal interpolation is a two-step process. First, a prediction surface is created from the source polygons; then that prediction surface is averaged within the target polygons.
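The averaging step can be illustrated with a small, self-contained sketch. It assumes the prediction surface is available as a Python function of x and y (in ArcGIS Pro it is a geostatistical layer instead). It approximates the average within one target polygon by sampling a regular grid of cell centers and keeping those inside the polygon with a ray-casting test. The function names and the grid approximation are illustrative assumptions, not a description of how Geostatistical Analyst computes the average internally.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray-casting test: toggle on each polygon edge crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def average_surface_in_polygon(surface: Callable[[float, float], float],
                               polygon: List[Point], cell: float) -> float:
    """Approximate the mean of `surface` over `polygon` using grid cell centers (illustrative)."""
    xs = [px for px, _ in polygon]
    ys = [py for _, py in polygon]
    total, count = 0.0, 0
    x = min(xs) + cell / 2
    while x < max(xs):
        y = min(ys) + cell / 2
        while y < max(ys):
            if point_in_polygon(x, y, polygon):
                total += surface(x, y)
                count += 1
            y += cell
        x += cell
    if count == 0:
        raise ValueError("cell size too coarse: no sample points fell inside the polygon")
    return total / count


# Hypothetical surface whose value rises from west to east, averaged over a 10 x 10 square.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(average_surface_in_polygon(lambda x, y: 0.2 + 0.01 * x, square, cell=1.0))  # approximately 0.25
```

A smaller cell size gives a closer approximation at the cost of more surface evaluations.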

Create a prediction surface for obesity rates

The first step in the areal interpolation workflow is to create a prediction surface from the obesity rates collected in the school zones. Since areal interpolation requires the model to be fit interactively, the prediction surface must be created in the Geostatistical Wizard.

Open the Geostatistical Wizard

  1. Start ArcGIS Pro and verify that you have a valid license for the ArcGIS Geostatistical Analyst extension.
  2. Click the Analysis tab on the ribbon, and click the Geostatistical Wizard icon.

Choose the method and identify the input data

  1. Under Geostatistical methods, click Areal Interpolation.
  2. Next to Type, choose Rate because you are interested in predicting obesity rates (rather than, for example, population counts).
  3. Next to Source Dataset, choose child_obesity to specify the polygon feature class containing the school zone obesity rates.
  4. Next to Count Field, choose 5th_obese.

    This field contains the number of obese fifth graders.

  5. Next to Population Field, choose 5th_total.

    This field contains the total number of fifth graders.

  6. Leave the defaults for the second dataset because you will not be using a secondary variable in this exercise.

    Pane 1 of the Geostatistical Wizard for areal interpolation

  7. Click Next to begin creating the areal interpolation model.
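Because Type is set to Rate, each school zone contributes a count (5th_obese) and a population (5th_total), and the observed rate for a zone is their ratio. The sketch below computes those ratios outside ArcGIS and separates out the zones with no data, which are the ones the exercise predicts later. The zone IDs and values are hypothetical, and treating a zero population as missing is an assumption; the wizard itself only needs the two field names.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record: zone ID -> (5th_obese, 5th_total); None marks missing data.
ZoneCounts = Dict[str, Tuple[Optional[int], Optional[int]]]


def zone_rates(zones: ZoneCounts) -> Tuple[Dict[str, float], List[str]]:
    """Return observed rates (count / population) and the IDs of zones without data."""
    rates: Dict[str, float] = {}
    missing: List[str] = []
    for zone_id, (obese, total) in zones.items():
        # Assumption: a missing count, missing population, or zero population means no data.
        if obese is None or total is None or total == 0:
            missing.append(zone_id)
            continue
        if not 0 <= obese <= total:
            raise ValueError(f"zone {zone_id}: count must be between 0 and the population")
        rates[zone_id] = obese / total
    return rates, missing


rates, missing = zone_rates({"A": (12, 48), "B": (30, 60), "C": (None, None)})
print(rates)    # {'A': 0.25, 'B': 0.5}
print(missing)  # ['C']
```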

Adjust the variography

You are now viewing the variography page. In the entire areal interpolation workflow, this step takes the most time and is the most critical for obtaining accurate predictions. The goal is to change the parameters on the right so that most empirical covariances (blue crosses) fall within the confidence intervals (red bars). If the model is specified correctly, you expect about 90 percent of the empirical covariances to fall within the confidence intervals.
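The fit criterion above amounts to counting how many empirical covariances lie between the lower and upper ends of their confidence intervals. The wizard shows this graphically; the sketch below computes the same fraction from hypothetical values read off the chart, and the function name is illustrative. By the criterion above, a correctly specified model should give a value of about 0.9.

```python
from typing import Sequence


def fraction_within_intervals(empirical: Sequence[float],
                              lower: Sequence[float],
                              upper: Sequence[float]) -> float:
    """Share of empirical covariances lying inside [lower, upper] for their lag."""
    if not empirical or not (len(empirical) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and have equal lengths")
    inside = sum(lo <= c <= hi for c, lo, hi in zip(empirical, lower, upper))
    return inside / len(empirical)


# Hypothetical values read off the covariance chart, one per lag.
share = fraction_within_intervals([0.5, 0.2, -0.1, 0.9],
                                  [0.4, 0.1, 0.0, 0.3],
                                  [0.6, 0.3, 0.2, 0.8])
print(share)  # 0.5, well below the ~0.9 expected for a well-specified model
```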

You can see in the graphic below that the default model is not adequate; most of the empirical covariances do not fall within the confidence intervals. You need to do some work to make the model fit.

Pane 2 of the Geostatistical Wizard

  1. The empirical covariances become negative at a distance of approximately 12,000 meters. This suggests starting by changing Lag Size to 1000 (meters) while keeping Number of Lags at 12. (The product of these two parameters should approximately equal the distance at which the empirical covariances first become negative.)

    The covariance curve below looks better, but the model can still be improved. The large empirical covariance on the y-axis is troubling.

    Pane 2 of the Geostatistical Wizard

  2. To try to improve this result, under Model, change the model type to K-Bessel.

    This model appears to fit the data very well; most of the empirical covariances fall within the confidence intervals, and a few fall just outside the intervals. However, before you can be confident that this is a good model, you need to check the cross-validation results.

    Pane 2 of the Geostatistical Wizard

  3. Click Next to view the Searching Neighborhood pane.
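The rule of thumb in step 1 is simple arithmetic: divide the distance at which the empirical covariances first become negative by the number of lags. The helper names below are hypothetical; the sketch reproduces the 12,000 / 12 = 1,000 meter calculation and lists the resulting lag distances.

```python
from typing import List


def suggest_lag_size(zero_crossing_distance: float, number_of_lags: int) -> float:
    """Lag size whose product with number_of_lags matches the zero-crossing distance."""
    if zero_crossing_distance <= 0 or number_of_lags <= 0:
        raise ValueError("distance and number of lags must be positive")
    return zero_crossing_distance / number_of_lags


def lag_distances(lag_size: float, number_of_lags: int) -> List[float]:
    """Upper distance of each lag bin: lag_size, 2 * lag_size, ..."""
    return [lag_size * (i + 1) for i in range(number_of_lags)]


lag = suggest_lag_size(12000, 12)
print(lag)                         # 1000.0
print(lag_distances(lag, 12)[-1])  # 12000.0
```

With 12 lags of 1,000 meters, the last lag ends at 12,000 meters, matching the distance where the covariances turn negative.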

Modify the search neighborhood

The Searching Neighborhood pane displays a preview surface for the fifth grade obesity rates. Clicking a point on the preview surface shows the predicted obesity rate at that point. For example, in the graphic below, the location of the crosshair has a predicted value of 0.333177. This means the model predicts that a fifth grade student at that location has about a 33 percent chance of being obese.

Pane 3 of the Geostatistical Wizard
  1. Click Next to view the Cross validation pane.
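Because the Rate type treats the prediction as a proportion, a predicted value can also be read as an expected number of cases for a given group size. The sketch below applies that to the 0.333177 value from the preview surface; the group of 60 fifth graders is a hypothetical example, not a value from the data.

```python
def expected_count(rate: float, population: int) -> float:
    """Expected number of cases in a group of `population` people at a given rate."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a proportion between 0 and 1")
    if population < 0:
        raise ValueError("population must be non-negative")
    return rate * population


predicted_rate = 0.333177                            # value at the crosshair in the preview
print(f"{predicted_rate:.0%}")                       # 33%
print(round(expected_count(predicted_rate, 60), 1))  # 20.0 (hypothetical 60 students)
```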

Examine the cross-validation statistics

  1. Click the Normal QQ Plot tab.

    Pane 4 of the Geostatistical Wizard

    The Root-Mean-Square Standardized value is 1.1475. This is good because, ideally, this value should be close to 1. The normal QQ plot also shows that the standardized errors are close to normally distributed, because the points fall near the one-to-one line. This is the model you will use to make your prediction.

  2. Click Finish, and click OK on the Method Report dialog box.

    The prediction surface for the obesity rate is displayed in the map. Depending on your analysis, this obesity rate surface may be all you need. In that case, the workflow can end here. However, you want to predict the obesity rates of fifth grade students at the block group level, so you will continue to the second half of this areal interpolation workflow.

    Obesity rate surface for Los Angeles fifth grade students

    Note:

    The layer in the graphic above has been clipped to the area of interest, and the layer has been renamed 5th grade obesity.
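Cross-validation compares each observed value with a prediction made without it. The sketch below shows the usual definitions of the summary statistics, including Root-Mean-Square Standardized, which divides each prediction error by its standard error before averaging the squares. The input lists are hypothetical, and the function illustrates the formulas rather than reproducing the wizard's code. As a general guide, a value above 1 suggests the standard errors underestimate the actual variability of the predictions, and a value below 1 suggests they overestimate it.

```python
import math
from typing import Dict, Sequence


def cross_validation_summary(observed: Sequence[float],
                             predicted: Sequence[float],
                             std_errors: Sequence[float]) -> Dict[str, float]:
    """Summaries of cross-validation errors (predicted minus observed)."""
    n = len(observed)
    if n == 0 or not (n == len(predicted) == len(std_errors)):
        raise ValueError("inputs must be non-empty and have equal lengths")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    errors = [p - o for o, p in zip(observed, predicted)]
    # Standardized error: each error divided by its own prediction standard error.
    standardized = [e / se for e, se in zip(errors, std_errors)]
    return {
        "mean": sum(errors) / n,
        "root_mean_square": math.sqrt(sum(e * e for e in errors) / n),
        "mean_standardized": sum(standardized) / n,
        "root_mean_square_standardized": math.sqrt(sum(z * z for z in standardized) / n),
    }


# Hypothetical observed rates, cross-validation predictions, and standard errors.
stats = cross_validation_summary([0.20, 0.30, 0.40], [0.25, 0.30, 0.35], [0.05, 0.10, 0.05])
print(round(stats["root_mean_square_standardized"], 4))  # 0.8165
```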

Predict obesity rates in census block groups

Once a proper prediction surface has been created with areal interpolation, the surface can be used to predict the fifth grade obesity rates in Los Angeles block groups using the Areal Interpolation Layer To Polygons geoprocessing tool.

  1. Right-click the 5th grade obesity layer, expand the Export Layer menu, and choose To Polygons to open the Areal Interpolation Layer To Polygons tool dialog box.

    Predict to polygons

    Note:

    The Areal Interpolation Layer To Polygons tool can also be accessed from the Working With Geostatistical Layers toolset in the Geostatistical Analyst Tools toolbox.

  2. Verify that Input areal interpolation geostatistical layer is set to 5th grade obesity.
  3. Click the Input polygon features drop-down arrow and click LA_blocks to specify the polygon feature class of the Los Angeles block groups.
  4. Click the Output polygon feature class browse button, browse to the location where you want the output to be saved, and type LA_blocks_obesity as the name for the output polygon feature class.
  5. Verify that Append all fields from input features is checked because you want to carry over all the fields from the LA_blocks feature class.

    Areal Interpolation Layer To Polygons geoprocessing tool dialog box

  6. Click OK to run the tool.

    The polygon feature class containing the predictions for fifth grade obesity rates in Los Angeles block groups is added to the map. The field with the predicted obesity rates is labeled Predicted. In addition, the standard errors of the prediction are stored in a field labeled StdError.

    Predicted obesity rates for fifth grade students in Los Angeles block groups

    Note:

    The symbology in the graphic above has been imported from the school zone obesity rates to allow a fair visual comparison.

  7. You can also symbolize the block groups by the standard error of the predicted obesity rates. The standard errors are stored in the StdError field of LA_blocks_obesity. This allows you to create margins of error for the predicted obesity rates.

    Low standard errors are symbolized in lighter shades of red. Larger block groups tend to have smaller standard errors because larger areas have more information associated with them, so there is less uncertainty in the predictions.

    Standard errors for obesity rates in Los Angeles block groups
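One common way to turn the Predicted and StdError fields into a margin of error is to take the prediction plus or minus a multiple of the standard error. The sketch below assumes the prediction errors are approximately normal, uses 1.96 for an approximate 95 percent interval, and clips the interval to the 0 to 1 range because the values are rates. The function name and the normality assumption are illustrative choices, not output of the Areal Interpolation Layer To Polygons tool.

```python
from typing import Tuple


def margin_of_error(predicted: float, std_error: float,
                    z: float = 1.96) -> Tuple[float, float]:
    """Approximate interval predicted +/- z * std_error, clipped to [0, 1] for rates.

    Assumes roughly normal prediction errors; z = 1.96 gives about 95 percent.
    """
    if std_error < 0:
        raise ValueError("standard error must be non-negative")
    half_width = z * std_error
    return max(0.0, predicted - half_width), min(1.0, predicted + half_width)


# Hypothetical Predicted and StdError values for one block group.
low, high = margin_of_error(0.30, 0.05)
print(f"{low:.3f} to {high:.3f}")  # 0.202 to 0.398
```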

This completes the workflow for predicting fifth grade obesity rates in Los Angeles block groups from rates sampled in school zones.

Predict obesity rates in school zones with missing data

To predict the obesity rates in the school zones with missing data, you will use the Areal Interpolation Layer To Polygons geoprocessing tool again.

  1. Right-click the 5th grade obesity layer, expand the Export Layer menu, and choose To Polygons to open the Areal Interpolation Layer To Polygons tool dialog box.

    Predict to polygons

  2. Verify that Input areal interpolation geostatistical layer is set to 5th grade obesity.
  3. Click the Input polygon features drop-down arrow and click Missing_zones to specify the polygon feature class of the school zones with missing data.
  4. Click the Output polygon feature class browse button, browse to the location where you want the output to be saved, and type Missing_zones_obesity as the name for the output polygon feature class.
  5. Verify that Append all fields from input features is checked because you want to carry over all the fields from the Missing_zones feature class.

    Areal Interpolation Layer To Polygons geoprocessing tool dialog box

  6. Click OK to run the tool.

    The polygon feature class containing the predictions for fifth grade obesity rates in the missing Los Angeles school zones is added to the map. The field with the predicted obesity rates is labeled Predicted. In addition, the standard errors of the prediction are stored in a field labeled StdError.

    Predicted obesity rates for fifth grade students in missing school zones

    Note:

    The symbology has been imported from the obesity rates of the school zones.
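To build a single table of obesity rates for every school zone, you can combine the observed rates with the predictions for the missing zones. The sketch below assumes both datasets share a zone ID (the exercise does not name one, so the IDs here are hypothetical) and that the values from the Predicted field of Missing_zones_obesity have been read into a dictionary. It records whether each rate was observed or predicted so the two sources stay distinguishable.

```python
from typing import Dict, Optional, Tuple


def fill_missing_rates(observed: Dict[str, Optional[float]],
                       predicted: Dict[str, float]) -> Dict[str, Tuple[float, str]]:
    """Combine observed zone rates with predictions for zones whose rate is None."""
    filled: Dict[str, Tuple[float, str]] = {}
    for zone_id, rate in observed.items():
        if rate is not None:
            filled[zone_id] = (rate, "observed")
        elif zone_id in predicted:
            filled[zone_id] = (predicted[zone_id], "predicted")
        else:
            raise KeyError(f"no prediction available for missing zone {zone_id}")
    return filled


# Hypothetical zone IDs and values.
print(fill_missing_rates({"A": 0.25, "B": None}, {"B": 0.31}))
# {'A': (0.25, 'observed'), 'B': (0.31, 'predicted')}
```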

You have completed the workflow for predicting fifth grade obesity rates in Los Angeles school zones where data was missing.

You can close ArcGIS Pro without saving your results.

Data reference

  • Rosenshein, L. "The Local Nature of a National Epidemic: Childhood Overweight and the Accessibility of Healthy Food." M.S. dissertation, George Mason University, Department of Geography and GeoInformation Science, Fairfax, Virginia, USA, 2010.
