Understand data apportionment

Available with Business Analyst license.

Data apportionment allows you to use attributes available within census geographies, such as total population, to calculate information for your custom geographies, like rings or drive-time service areas. The apportionment algorithm uses a secondary layer called the apportionment layer, such as population settlement points, to calculate attributes inside a polygon. For example, using data apportionment, you can estimate the number of people impacted by a tornado or hurricane, the number of senior citizens who live within 15 minutes of a community center, or how many households are within a store's primary trade area.

Data apportionment is the aggregation of ArcGIS Business Analyst data into rings and polygons. ArcGIS Pro uses the same approach for apportioning data as the GeoEnrichment service, which employs a sophisticated geographic retrieval methodology to aggregate data for rings and other polygons. A geographic retrieval methodology determines how data is gathered and summarized or aggregated for input features. For standard geographic units, such as states, provinces, counties, or postal codes, the link between a designated area and its attribute data is a simple one-to-one relationship. For example, if an input study trade area contains a selection of ZIP Codes, the data retrieval is a simple process of gathering the data for those areas.

How data is summarized

The geographic retrieval process for ring buffers, drive-time service areas, and other nonstandard polygons is more complicated, because the input polygon may only partially intersect the geographic areas whose data needs to be aggregated.

The following diagram illustrates this case. The polygon in the center represents an input study area that is being enriched. For example, the Enrich Layer geoprocessing tool in ArcGIS Pro can calculate the total population for this area. The labeled polygons represent census geographies that contain total population values. In the United States, these can be block groups with enrichment data; in Canada, they can be Dissemination Areas.

polygon

The GeoEnrichment service employs a Weighted Centroid geographic retrieval methodology to aggregate data for rings and other polygons. The Weighted Centroid retrieval approach uses census block data to better apportion block groups that are not exclusively contained within a ring. In the United States, Canada, and many other countries or regions, census blocks are the smallest unit of census geography. These small areas are used to create all other levels of census geography. For example, in the United States, one or many blocks are aggregated to create a block group.

The Weighted Centroid method is illustrated in the following figure:

Weighted Centroid method

In the previous figure, census blocks are illustrated as black points. Using area P3 as an example, the population weight for this area is determined by summing the weights of the blocks that fall inside both P3 and the study area. The sum of these weights gives the proportion of area P3's population that is within the study area. This proportion is then used to aggregate and summarize a demographic variable such as Total Population. For example, if 90 percent of the P3 blocks' population is within the study area, and the Total Population of P3 is 100 people, you can determine that 90 people in area P3 are inside the study area.

Weighted blocks

The weight w1 of the site P1 is calculated as a sum of weights of block points belonging to the intersection of the site P1 and the target polygon T:

$$w_1 = \sum_{\beta \in P_1 \cap T} W_1(\beta)$$

Here, β is a block and W1(β) is the weight of this block in the site P1.

To summarize a demographic variable such as Total Population, the weights must be determined for all intersecting geographies. The GeoEnrichment service calculates the weight W1(β) as the ratio of the total population associated with the block β belonging to the site P1 to the sum of the total population values for all blocks belonging to the site P1:

$$W_1(\beta) = \frac{\mathrm{Pop}(\beta)}{\sum_{\beta' \in P_1} \mathrm{Pop}(\beta')}$$

Here, Pop(β) denotes the total population associated with block β.
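
The weight calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the GeoEnrichment service's implementation: the `BlockPoint` class, the `inside_target` flag (standing in for a real point-in-polygon test), and the sample values for site P3 are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BlockPoint:
    """A census block point belonging to one site (hypothetical structure)."""
    population: float    # total population associated with the block
    inside_target: bool  # True if the point falls inside the target polygon T


def site_weight(blocks: list[BlockPoint]) -> float:
    """Sum the block weights W(b) for the blocks of a site that fall inside T.

    Each block weight is the block population divided by the site's total
    block population, so the result is a share between 0 and 1.
    """
    site_total = sum(b.population for b in blocks)
    if site_total == 0:
        return 0.0  # assumption: a site with no population contributes nothing
    return sum(b.population for b in blocks if b.inside_target) / site_total


def apportion(site_value: float, blocks: list[BlockPoint]) -> float:
    """Estimate the part of a site's variable that lies inside T."""
    return site_value * site_weight(blocks)


# Hypothetical site P3: 100 people spread over four blocks, one outside T.
p3_blocks = [
    BlockPoint(40, True),
    BlockPoint(30, True),
    BlockPoint(20, True),
    BlockPoint(10, False),
]
p3_weight = site_weight(p3_blocks)
p3_inside = apportion(100, p3_blocks)
print(f"weight={p3_weight:.2f}, people inside={p3_inside:.0f}")
```

With these sample values, 90 percent of the site's block population falls inside the target polygon, which matches the P3 example given earlier.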

How data apportionment works

The Enrich Layer tools in ArcGIS Pro, ArcGIS Online, and the GeoEnrichment service use a data apportionment algorithm to redistribute demographic, business, economic, and landscape variables to input polygon features. The algorithm analyzes each polygon to be enriched relative to a point dataset and a detailed dataset of reporting unit polygons that contain attributes for the selected variables. Based on how each polygon being enriched overlays these datasets, the algorithm determines the appropriate amount of each variable to assign.

Depending on which country the enrichment polygon is located in, the granular point dataset represents one of the following:

  • Census Block Points—U.S. and Canada only. These points are initially produced as centroids from the most detailed census tabulation areas in these countries: census blocks in the U.S. and dissemination areas in Canada. In some cases, Esri has moved these points to be located within residential areas, rather than obviously industrial or other nonresidential areas. Each point contains attributes for the count of people and households living in the corresponding tabulation area.
  • Settlement Points—For most other countries, Esri produces settlement points based on a settlement likelihood model that uses Landsat 8 imagery and road intersections. Road intersections particularly help in areas where dense forest canopy obscures dwellings. Settlement points are initially produced as a dasymetric raster surface, meaning that places where people cannot or do not live have been removed. This raster surface is produced at a resolution of 75 meters, which is roughly the size of a city block. The model assigns each cell or point a settlement likelihood score, representing the likelihood of people living there.
  • Address Based Settlement Points—Switzerland and the Netherlands only. Some countries track and make available the points representing residential addresses of their citizens. Esri aggregates the count of these address points into a 75-meter resolution raster and converts that to a point dataset like settlement points.
  • Building Footprint Settlement Points—Spain AIS Group Data only. The count of building footprint centroids of residential buildings is summed to a 75-meter resolution raster to produce a dataset of settlement points.

Apportionment methodology

The illustration below shows how the purple ring-buffer polygon to be enriched relates to the dark blue settlement points and to the detailed statistical polygons, outlined in gray, that support enrichment. The process for enriching the purple ring with total population is as follows:

  1. Select the statistical polygons that are completely inside the ring polygon. These polygons are shown in white. Compute the sum of the total population variable for these polygons.
  2. Select the statistical polygons that partially intersect the ring polygon. These are shown in light green. For each of these polygons, do the following:
    1. Select all the dark blue settlement points that are inside. Using the total population variable from the statistical polygon and the sum of settlement likelihood scores, determine the ratio of people per unit of settlement score.
    2. For only the points that are inside the purple ring, compute the sum of settlement likelihood, and from that derive the number of people represented by those points.

      Settlement points

      The dark blue settlement points represent two types of information. First, a regularly spaced, 75-meter grid of points that is produced as described above. Second, because some reporting units are small enough to fall between the 75-meter grid of points, the centroids of these units are added to guarantee these areas are not omitted.
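
The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Esri's implementation: the class names, the `fully_inside` and `inside_ring` flags (standing in for real spatial tests), and the sample numbers are hypothetical, and adding the results of steps 1 and 2 together to obtain the final total is an assumption about how the partial results are combined.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SettlementPoint:
    score: float       # settlement likelihood score
    inside_ring: bool  # True if the point falls inside the polygon being enriched


@dataclass(frozen=True)
class StatPolygon:
    total_population: float
    fully_inside: bool  # True if the polygon lies completely inside the ring
    points: tuple[SettlementPoint, ...] = ()


def enrich_total_population(polygons: list[StatPolygon]) -> float:
    """Estimate total population for a ring from intersecting statistical polygons."""
    total = 0.0
    for poly in polygons:
        if poly.fully_inside:
            # Step 1: polygons completely inside the ring count in full.
            total += poly.total_population
            continue
        score_sum = sum(p.score for p in poly.points)
        if score_sum == 0:
            continue  # assumption: no settlement score means nothing to apportion
        # Step 2.1: people per unit of settlement score for this polygon.
        people_per_score = poly.total_population / score_sum
        # Step 2.2: convert the score that falls inside the ring into people.
        inside_score = sum(p.score for p in poly.points if p.inside_ring)
        total += people_per_score * inside_score
    return total


# Hypothetical data: one polygon fully inside, one partially intersecting.
polygons = [
    StatPolygon(total_population=500, fully_inside=True),
    StatPolygon(
        total_population=200,
        fully_inside=False,
        points=(
            SettlementPoint(1.0, True),
            SettlementPoint(3.0, True),
            SettlementPoint(4.0, False),
        ),
    ),
]
ring_total = enrich_total_population(polygons)
print(ring_total)
```

In this sample, the partially intersecting polygon has 25 people per unit of score, and half of its score lies inside the ring, so it contributes 100 of its 200 people.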

Variations in apportionment method

The description above applies to most countries. In the U.S. and Canada, however, the process is simpler because the points already have an attribute with the population living there. Thus, the sum of the population attribute for the points inside the enrichment polygon is all that is needed to determine the total population. The values for other variables are determined based on population or on summaries of pre-calculated means or rates.
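
As a minimal sketch of this simpler case, the following Python snippet sums the population attribute of the block points that fall inside a ring. The planar coordinates, the circular ring test, and the sample points are assumptions made for illustration; a real workflow would use a proper spatial query against the block point layer.

```python
import math


def population_in_ring(points, center, radius):
    """Sum the population of block points that fall inside a circular ring.

    points: iterable of (x, y, population) tuples in planar units (hypothetical).
    """
    cx, cy = center
    return sum(
        pop for x, y, pop in points
        if math.hypot(x - cx, y - cy) <= radius
    )


# Hypothetical block points: (x, y, population)
block_points = [
    (0.0, 0.0, 120),
    (1.0, 1.0, 80),
    (3.0, 3.0, 50),
    (6.0, 0.0, 200),  # outside a radius of 5
]
ring_population = population_in_ring(block_points, center=(0.0, 0.0), radius=5.0)
print(ring_population)
```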

The information above describes the default apportionment method, which is called BlockApportionment. When ArcGIS Pro detects a significantly large polygon, the BlockApportionment method switches to a less computationally intensive approach: as the size of the area to be enriched increases, it uses generalized block point layers in the calculation. The attribute table of the enrichment results records the name of the method used in the aggregationMethod field.

The method uses different geographies and generalized block points as the basis for apportionment. For larger polygons, the method uses progressively coarser polygon geographies and more generalized block points. For example, in the United States, instead of using the U.S. Census Bureau's block group polygons, the method uses census tract boundaries, and instead of using the most refined block points as the basis for apportionment, it uses generalized block points. This optimization is intended to provide faster performance and greater accuracy.

The thresholds at which the method changes datasets are based on the diameters of buffers.

  • In the United States, the following diameters and polygon/point datasets are used:
    • 0 to 504 miles use census block groups and block points.
    • 505 to 786 miles use census tracts and block points based on generalization level 2.
    • 787 to 866 miles use census tracts and block points based on generalization level 3.
    • 867 to 954 miles use census tracts and block points based on generalization level 4.
    • Beyond 954 miles use census tracts and block points based on generalization level 5.

Tip:

The aggregationMethod field in the Enrich Layer output shows the apportionment method, geography level, and block point layers used to apportion/enrich data.
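
The United States thresholds listed above can be expressed as a simple lookup. The function and constant names below are hypothetical, and two behaviors are assumptions: the upper bounds are treated as inclusive, and a fractional diameter between two listed ranges (for example, 504.5 miles) is assigned to the coarser tier. The source does not state how such values are handled.

```python
# (upper diameter bound in miles, polygon dataset, point dataset)
US_TIERS = [
    (504, "census block groups", "block points"),
    (786, "census tracts", "block points, generalization level 2"),
    (866, "census tracts", "block points, generalization level 3"),
    (954, "census tracts", "block points, generalization level 4"),
]
LARGEST_TIER = ("census tracts", "block points, generalization level 5")


def us_apportionment_datasets(diameter_miles):
    """Return the (polygon dataset, point dataset) pair for a buffer diameter."""
    if diameter_miles < 0:
        raise ValueError("diameter must be non-negative")
    for upper, polygons, points in US_TIERS:
        if diameter_miles <= upper:  # assumption: upper bounds are inclusive
            return polygons, points
    return LARGEST_TIER


for diameter in (10, 600, 1000):
    print(diameter, us_apportionment_datasets(diameter))
```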

Apportionment layers

An apportionment layer is a point feature layer containing a weight field that is used in Statistical Data Collections to estimate and aggregate data to other layers. When using a local dataset in Business Analyst, apportionment layers, by default, are census block centroids.

Tip:

See Create a Statistical Data Collection to learn more.

You can apply the following apportionment methods to data fields:

  • NONE—No apportionment is used.
  • GEOM—Uses the geographic area of a polygon. No block point apportionment is used.
  • POP_W—Uses weighted population from the decennial census year.
  • HH_W—Uses weighted households from the decennial census year.
  • HU_W—Uses weighted housing units from the decennial census year.
  • POP_W_CY—Uses weighted population from the current year's dataset.
  • HH_W_CY—Uses weighted households from the current year's dataset.
  • HU_W_CY—Uses weighted housing units from the current year's dataset.
  • BUS_W_CY—Uses weighted businesses from the current year's dataset.
  • Daytime Workers Population—Uses weighted workforce population locations from the current year’s dataset.
  • Daytime Residents Population—Uses weighted residential population locations from the current year’s dataset.

Note:

The list of apportionment methods is specific to United States local data. Your list is dependent on the local data installed and is derived from the block centroid point layer.
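
To illustrate why the choice of method matters, the following sketch contrasts an area-based share with a point-weighted share for the same study area. It rests on assumptions: GEOM is interpreted here as the ratio of intersected area to total polygon area, the weighted methods are interpreted as the share of point weight falling inside the study area, and rectangles stand in for real polygons. The function names and sample data are hypothetical.

```python
def rect_area(rect):
    """Area of an axis-aligned rectangle (xmin, ymin, xmax, ymax); 0 if empty."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def rect_intersection(a, b):
    """Intersection of two axis-aligned rectangles (may be empty)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def geom_share(source, study):
    """Area-based share: intersected area divided by the source polygon area."""
    return rect_area(rect_intersection(source, study)) / rect_area(source)


def weighted_share(points, study):
    """Point-weighted share: weight inside the study area over total weight."""
    xmin, ymin, xmax, ymax = study
    total = sum(w for _, _, w in points)
    inside = sum(
        w for x, y, w in points
        if xmin <= x <= xmax and ymin <= y <= ymax
    )
    return inside / total if total else 0.0


# Hypothetical source polygon with 1,000 people, mostly living in its west half.
source_polygon = (0.0, 0.0, 10.0, 10.0)
study_area = (0.0, 0.0, 5.0, 10.0)
weight_points = [(1.0, 1.0, 80), (2.0, 5.0, 10), (8.0, 8.0, 10)]  # (x, y, weight)

by_area = 1000 * geom_share(source_polygon, study_area)
by_weight = 1000 * weighted_share(weight_points, study_area)
print(by_area, by_weight)
```

In this sample, the study area covers half of the source polygon but contains 90 percent of the point weight, so the two approaches give noticeably different estimates.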

Statistical Data Collections (SDCX) allow you to customize the apportionment layer to use any point layer. This connects your custom polygons to a custom apportionment layer to refine the results beyond default methods. No locally installed dataset is required.

Examples of custom apportionment layers

International locations and nondemographic data areas are two examples of where custom apportionment layers are useful.

International location example

You can create an SDCX in Japan to analyze the historical household population (for example, the population in 1900) using data derived from research sources. You can start with Japan prefecture administrative division polygons. These are large boundaries for which a basic geometry apportionment does not return accurate results. To increase accuracy and granularity, and to obtain results specific to that time period, you can load a new point feature layer containing population settlement locations with weights for the year 1900. The weights can contain household counts for that year. By connecting the Japanese boundaries to the new apportionment layer, you can estimate the household population in any boundary, such as a 5-kilometer area around Tokyo.

Nondemographic data area example

You can create an SDCX in the oil fields of Texas, where there might be minimal human population, but you still need to accurately estimate the underground resource levels. Instead of administrative boundaries, such as block groups, you can start with a custom 2x2-mile grid layer containing aggregated locations of underground fuel sources, such as natural gas or crude oil. To increase accuracy and granularity, you can load a new point feature layer containing oil and gas well locations with monthly tallied weights for each type of natural resource. By connecting the oil field grid layer to the new apportionment layer, you can understand what the current resource levels might be like in any boundary, such as a defined area of seismic activity.

Set an apportionment layer

To set an apportionment layer, do the following:

  1. Create an SDCX using any custom boundary layer.
  2. On the SDCX Edit dialog box, on the Source tab, set Apportionment Layer to any point feature layer. For best accuracy, the point features should intersect the custom boundary. The point feature layer must contain a numeric field to be used for Apportionment Method weighting. By default, the first numeric field found is used.
  3. Optionally, on the Variables tab, change the Apportionment Method value to any numeric field.

After you make changes, update the SDCX performance index so that it reflects them. You can build the index from the Source tab. You can then select custom variables in the Custom Data node for any tool that uses the data browser, such as the Enrich Layer workflow.