Summarize Within (GeoAnalytics)

Summary

Overlays a polygon layer with another layer to summarize the number of points, length of the lines, or area of the polygons within each polygon and calculates attribute field statistics about those features within the polygons.

The following are example scenarios using Summarize Within:

  • Given watershed boundaries and land-use boundaries by land-use type, calculate total acreage of land-use type for each watershed.
  • Given county parcels and city boundaries, summarize the average value of vacant parcels within each city boundary.
  • Given counties and roads, summarize the total mileage of roads by road type within each county.

Illustration

Summarize Within
Examples of summarizing points within polygons (first row), lines within polygons (second row), and polygons within polygons (third row) are shown.

Usage

  • In simple terms, Summarize Within takes two layers, the input polygons and the input summary features, and stacks them on top of each other. After stacking these layers, you can look down through the stack and count the number of input summary features that fall within the input polygons. You can also calculate simple statistics about the attributes of the input summary features, such as sum, mean, minimum, maximum, and so on.

  • Use Summarize Within to calculate standard statistics as well as geographically weighted statistics. Standard statistics summarize the statistical values without weighting. Weighted statistics calculate values using the geographically weighted values of the proportion of lines within a polygon, or the proportion of polygons within a polygon. Weighted statistics do not apply to points within polygons.

  • You can calculate the lengths and areas of the summarized layers within each polygon using the options in the table below. Options are based on the geometry of the summarized layer.

    Input feature | Description | Option
    Points | The count of summarized points within each polygon. | None
    Lines | The length of summarized lines within each polygon. | Miles, Yards, Feet, Kilometers, Meters
    Areas | The area of summarized polygons within each polygon. | Square Miles, Square Yards, Square Feet, Square Kilometers, Square Meters, Hectares, Acres

  • You can optionally calculate standard statistics. For lines and areas, all weighted statistics will be calculated. Both the standard summary field statistics and the weighted summary field statistics are applied to data for the features in the Summarized Layer that intersect the Summary Polygons layer. The weighted summary field statistics are multiplied by a weight based on the proportion of the Summary Polygons intersecting each feature in the Summarized Layer.

  • For standard statistics, there are eight options: count, sum, mean, minimum, maximum, range, standard deviation, and variance. There are two options for string statistics: count and any. There are six weighted statistics that are calculated on numeric fields in the layer to be summarized: count, sum, mean, minimum, maximum, and range.

  • Weighted statistics are not calculated for string data. Each time Field and Statistic are specified, a row is added to the tool pane so more than one statistic can be calculated. You can view the summarized results in the result layer's table or pop-ups. By default, the count of features intersecting the Summary Polygons is always calculated.

  • With ArcGIS Enterprise 10.6.1 and later, you can calculate statistics by group using the Group By Field parameter. This creates a tabular output in addition to your summarized polygon layer.

  • The Add Minority and Majority Attributes and Add Group Percentages parameters are available when a Group By Field value is selected. The minority and majority will be the least and most dominant value from the Group By Field, respectively, where dominance is determined using the count of points, total length, or total area of each value.

  • When the Add Minority and Majority Attributes parameter is checked, two fields will be added to the result layer. The fields will list the values from the Group By Field parameter that are the minority and majority for each result feature.

  • The Add Group Percentages parameter is only available when you select Add Minority and Majority Attributes. When the Add Group Percentages parameter is checked, two fields will be added to the result layer listing the percentage of the count of points, total length, or total area that belong to the minority and majority values for each feature. A percentage field will also be added to the result table listing the percentage of the count of points, total length, or total area that belong to all values from the Group By Field parameter for each feature.

  • For weighted statistics, line layers are summarized using only the proportions of line features that are within the Summary Polygons. Standard (nonweighted) statistics summarize any line intersecting the Summary Polygons. When summarizing lines using weighted statistics, use counts and amounts (rather than rates or indices) so proportional calculations make logical sense in your analysis.

  • Weighted statistics for summarized area layers are based on the proportions of the Summary Polygons features that are within the Summarized Layer. When summarizing areas, use counts or amounts (rather than rates or indices) so proportional calculations make logical sense in your analysis.

  • The output feature layer is always a polygon layer. Only polygons that intersect a summarized layer will be returned. Other polygons will be completely removed from the result layer.

    Polygons returned with point features
    The input point and polygon features (left) and the resulting area features (right) are shown.

  • The following fields are included in the output polygon features:

    Field name | Description
    count

    The count of summarized features that intersect each polygon.

    sum_length_<linearunit>, or sum_area_<areaunit>

    The total length of lines within the polygon or total area of summarized polygon within each polygon. These values are returned when you select Add Shape Summary Attributes and are returned in the specified unit.

    <statistic>_<fieldname>

    Specified statistics will each create an attribute field named in the following format: <statistic>_<fieldname>. For example, the maximum and standard deviation of the field id is MAX_id and SD_id.

    p<statistic>_<fieldname>

    Specified weighted statistics will each create an attribute field named in the following format: p<statistic>_<fieldname>. For example, the weighted maximum of the field id is pMAX_id.

    minority_<fieldname>

    This value is returned when you create a group-by table and select Add Minority and Majority Attributes. This represents the value of the specified field that is the minority in each polygon. For example, there are five points within a polygon with a field called color and values of red, blue, blue, green, green. If you create a group by the color field, the value for the minority_color field is red.

    majority_<fieldname>

    This value is returned when you create a group-by table and select Add Minority and Majority Attributes. This represents the value or values of the specified field that are the majority in each polygon. For example, there are five points within a polygon with a field called color and values of red, blue, blue, green, green. If you create a group by the color field, the value for the majority_color field is blue;green, because blue and green are tied.

    minority_<fieldname>_percent

    This value is returned when you create a group-by table and select Add Group Percentages. This represents the percentage of the count that belongs to the minority value of the specified field in each polygon. For example, there are five points within a polygon with a field called color and values of red, blue, blue, green, green. If you create a group by the color field, the value for the minority_color_percent field is 20 (calculated as 1/5).

    majority_<fieldname>_percent

    This value is returned when you create a group-by table and select Add Group Percentages. This represents the percentages of the count for the specified field that is the majority in each polygon. For example, there are five points within a polygon with a field called color and values of red, blue, blue, green, green. If you create a group by the color field, the value for the majority_color_percent field is 40 (calculated as 2/5).

    join_id

    This value is returned when you create a group-by table. This is an ID to link features to the group-by table. Every join_id field corresponds to one or more rows in the group-by table.

    The following fields are included in the output group-by table:

    Field name | Description

    join_id

    This is an ID to link features to the polygon layer. Each polygon will have one or more features with the same ID that represent all of the group-by values. For example, there are five points within a polygon with a field called color and values of red, blue, blue, green, green. The group-by table will have three rows representing that polygon (same join ID), one for each of the colors red, blue, and green.

    count

    The count of the specified group within the joined polygon. For example, red is 1 for the selected polygon.

    <statistic>_<fieldname>

    Any specified statistic calculated for each group.

    p<statistic>_<fieldname>

    Any specified weighted statistic calculated for each group.

    percentcount

    The percentage each group contributes to the total count in the polygon. Using the above example, red contributes 1/5 = 20, blue contributes 2/5 = 40, and green contributes 2/5 = 40.

  • You can improve the performance of the Summarize Within tool by using one or more of the following tips:

    • Set the extent environment so you only analyze data of interest.
    • If you are using bins, larger bins will perform better than smaller bins. If you are unsure which size to use, start with a larger bin to prototype.
    • Use data that is local to where the analysis is being run.

  • This geoprocessing tool is powered by ArcGIS GeoAnalytics Server. Analysis is completed on your GeoAnalytics Server, and results are stored in your content in ArcGIS Enterprise.

  • When running GeoAnalytics Server tools, the analysis is completed on the GeoAnalytics Server. For optimal performance, make data available to the GeoAnalytics Server through feature layers hosted on your ArcGIS Enterprise portal or through big data file shares. Data that is not local to your GeoAnalytics Server will be moved to your GeoAnalytics Server before analysis begins. This means that it will take longer to run a tool, and in some cases, moving the data from ArcGIS Pro to your GeoAnalytics Server may fail. The threshold for failure depends on your network speeds, as well as the size and complexity of the data. Therefore, it is recommended that you always share your data or create a big data file share.

    Learn more about sharing data to your portal

    Learn more about creating a big data file share through Server Manager

  • Similar analysis can also be completed using the Summarize Within tool in the Standard Feature Analysis toolbox in ArcGIS Pro.
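
The difference between standard and weighted statistics described in the usage notes can be illustrated with a small, self-contained sketch. This is not the tool's implementation. It assumes the weight for each line is simply the fraction of its length that falls inside the summary polygon, and the field name traffic_count, the sample values, and the helper names standard_sum and weighted_sum are all hypothetical.

```python
# Hypothetical line features that intersect one summary polygon.
# "fraction_inside" is the assumed share of each line's length within the polygon.
lines = [
    {"traffic_count": 100, "fraction_inside": 1.0},
    {"traffic_count": 200, "fraction_inside": 0.25},
    {"traffic_count": 40, "fraction_inside": 0.5},
]


def standard_sum(features, field):
    """Sum the field over every intersecting feature, with no weighting."""
    return sum(f[field] for f in features)


def weighted_sum(features, field):
    """Sum the field after scaling each value by the fraction inside the polygon."""
    return sum(f[field] * f["fraction_inside"] for f in features)


standard_total = standard_sum(lines, "traffic_count")  # 100 + 200 + 40 = 340
weighted_total = weighted_sum(lines, "traffic_count")  # 100 + 50 + 20 = 170.0
print(standard_total, weighted_total)
```

The sketch also shows why counts and amounts are recommended for weighted statistics: scaling a count by the fraction of a line inside the polygon gives a meaningful share, whereas scaling a rate or index by that fraction does not.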

Syntax

arcpy.geoanalytics.SummarizeWithin(summarized_layer, output_name, polygon_or_bin, bin_type, {bin_size}, {summary_polygons}, sum_shape, {shape_units}, {standard_summary_fields}, {weighted_summary_fields}, {data_store}, {group_by_field}, {add_minority_majority}, {add_percentages})
Parameter | Explanation | Data Type
summarized_layer

The point, line, or polygon features that will be summarized by either polygons or bins.

Feature Set
output_name

The name of the output polygon feature service containing the intersecting geometries and attributes.

String
polygon_or_bin

Specifies whether summarized_layer will be summarized by polygons or bins.

  • POLYGON: The summarized layer will be aggregated into a polygon dataset.
  • BIN: The summarized layer will be aggregated into square or hexagonal bins.
String
bin_type

Specifies the bin shape that will be generated to summarize features.

  • SQUARE: bin_size represents the height of a square. This is the default.
  • HEXAGON: bin_size represents the height between two parallel sides.
String
bin_size
(Optional)

The distance interval that represents the bin size and units by which the input features will be summarized.

Linear Unit
summary_polygons
(Optional)

The polygons used to summarize the features in the input summarized layer.

Feature Set
sum_shape

Specifies whether the length of lines or area of polygons within the summary layer (polygon or bin) will be calculated. The count of points, lines, and polygons intersecting the summary shape will always be included.

  • ADD_SUMMARY: Summary shape values will be calculated. This is the default.
  • NO_SUMMARY: Summary shape values will not be calculated.
Boolean
shape_units
(Optional)

Specifies the unit to be used to calculate shape summary attributes. If the input summarized_layer is points, no shape unit is necessary, since only the count of points within each input polygon is added. If the input summary features are lines, specify a linear unit. If the input summary features are polygons, specify an areal unit.

  • METERS: The shape units will be meters.
  • KILOMETERS: The shape units will be kilometers.
  • FEET: The shape units will be feet.
  • YARDS: The shape units will be yards.
  • MILES: The shape units will be miles.
  • ACRES: The shape units will be acres.
  • HECTARES: The shape units will be hectares.
  • SQUARE_METERS: The shape units will be square meters.
  • SQUARE_KILOMETERS: The shape units will be square kilometers.
  • SQUARE_FEET: The shape units will be square feet.
  • SQUARE_YARDS: The shape units will be square yards.
  • SQUARE_MILES: The shape units will be square miles.
String
standard_summary_fields
[standard_summary_fields,...]
(Optional)

The statistics that will be calculated on specified fields.

  • COUNT: The number of nonnull values. It can be used on numeric fields or strings. The count of [null, 0, 2] is 2.
  • SUM: The sum of numeric values in a field. The sum of [null, null, 3] is 3.
  • MEAN: The mean of numeric values. The mean of [0, 2, null] is 1.
  • MIN: The minimum value of a numeric field. The minimum of [0, 2, null] is 0.
  • MAX: The maximum value of a numeric field. The maximum value of [0, 2, null] is 2.
  • STDDEV: The standard deviation of a numeric field. The standard deviation of [1] is null. The standard deviation of [null, 1, 1, 1] is null.
  • VAR: The variance of a numeric field. The variance of [1] is null. The variance of [null, 1, 1, 1] is null.
  • RANGE: The range of a numeric field. This is calculated as the minimum value subtracted from the maximum value. The range of [0, null, 1] is 1. The range of [null, 4] is 0.
  • ANY: A sample string from a field of type string.

Value Table
weighted_summary_fields
[weighted_summary_fields,...]
(Optional)

Specifies the weighted statistics that will be calculated on specified fields.

  • COUNT: The count of each field will be calculated, multiplied by the proportion of the summarized layer within the polygons.
  • SUM: The sum of weighted values in each field will be calculated, in which the weight applied is the proportion of the summarized layer within the polygons.
  • MEAN: The mean of weighted values in each field will be calculated, in which the weight applied is the proportion of the summarized layer within the polygons.
  • MIN: The minimum of weighted values in each field will be calculated, in which the weight applied is the proportion of the summarized layer within the polygons.
  • MAX: The maximum of weighted values in each field will be calculated, in which the weight applied is the proportion of the summarized layer within the polygons.
  • RANGE: The difference between MIN and MAX will be calculated.
Value Table
data_store
(Optional)

Specifies the ArcGIS Data Store where the output will be saved. The default is SPATIOTEMPORAL_DATA_STORE. All results stored in a spatiotemporal big data store will be stored in WGS84. Results stored in a relational data store will maintain their coordinate system.

  • SPATIOTEMPORAL_DATA_STORE: Output will be stored in a spatiotemporal big data store. This is the default.
  • RELATIONAL_DATA_STORE: Output will be stored in a relational data store.
String
group_by_field
(Optional)

A field from the input summary features that will be used to calculate statistics for each unique attribute value. For example, the input summary features contain point locations of businesses that store hazardous materials, and one of the fields is HazardClass, containing codes that describe the type of hazardous material stored. To calculate summaries by each unique value of HazardClass, use it as the group by field.

Field
add_minority_majority
(Optional)

Specifies whether minority (least dominant) and majority (most dominant) attribute values for each group field within each boundary will be added. If they are, two new fields are added to the output layer prefixed with Majority_ and Minority_. This parameter only applies when the group_by_field parameter is used.

  • NO_MIN_MAJ: Minority and majority fields will not be added. This is the default.
  • ADD_MIN_MAJ: Minority and majority fields will be added.
Boolean
add_percentages
(Optional)

Specifies whether percentage fields will be added. If they are, the percentage of each unique group value is calculated for each input polygon. This parameter only applies when the group_by_field and add_minority_majority parameters are used.

  • NO_PERCENT: Percentage fields will not be added. This is the default.
  • ADD_PERCENT: Percentage fields will be added.
Boolean
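
The null-handling rules given for standard_summary_fields can be checked with a minimal pure-Python sketch, in which None stands in for null. It reproduces only the documented examples for COUNT, SUM, MEAN, MIN, MAX, and RANGE. The helper names are hypothetical, and returning None for an input with no nonnull values is an assumption, since the documentation does not state that case. STDDEV and VAR are left out because their documented results cannot be reproduced without knowing the exact formula the tool uses.

```python
def non_null(values):
    """Return the values with nulls (None) removed."""
    return [v for v in values if v is not None]


def stat_count(values):
    """COUNT: the number of nonnull values."""
    return len(non_null(values))


def _apply(values, func):
    """Apply func to the nonnull values.

    Assumption: an input with no nonnull values yields None.
    """
    kept = non_null(values)
    return func(kept) if kept else None


def stat_sum(values):
    return _apply(values, sum)


def stat_mean(values):
    return _apply(values, lambda kept: sum(kept) / len(kept))


def stat_min(values):
    return _apply(values, min)


def stat_max(values):
    return _apply(values, max)


def stat_range(values):
    """RANGE: the minimum value subtracted from the maximum value."""
    return _apply(values, lambda kept: max(kept) - min(kept))


results = {
    "count": stat_count([None, 0, 2]),
    "sum": stat_sum([None, None, 3]),
    "mean": stat_mean([0, 2, None]),
    "min": stat_min([0, 2, None]),
    "max": stat_max([0, 2, None]),
    "range": stat_range([None, 4]),
}
print(results)
```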

Derived Output

Name | Explanation | Data Type
output

The summarized number of points, length of the lines, or area of the polygons within each polygon.

Feature Set
group_by_summary

When a group_by_field value is provided, the tool will output a table containing the calculated statistics for each unique group.

Record Set
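
The relationship between the group-by table and the minority, majority, and percentage fields can be sketched in plain Python using the color example from the usage notes. This is an illustration, not the tool's implementation. The function name summarize_groups and the dictionary keys are hypothetical, dominance is measured by point count only, and joining tied values with a semicolon in input order is an assumption based on the blue;green example.

```python
from collections import Counter


def summarize_groups(values):
    """Build group-by rows plus minority/majority attributes for one polygon."""
    counts = Counter(values)
    total = sum(counts.values())
    low, high = min(counts.values()), max(counts.values())
    # One row per unique group value, as in the group-by table.
    rows = [
        {"value": value, "count": n, "percentcount": 100 * n / total}
        for value, n in counts.items()
    ]
    # Assumption: tied values are joined with ";" in the order first seen.
    minority = ";".join(v for v, n in counts.items() if n == low)
    majority = ";".join(v for v, n in counts.items() if n == high)
    return {
        "rows": rows,
        "minority": minority,
        "majority": majority,
        "minority_percent": 100 * low / total,
        "majority_percent": 100 * high / total,
    }


result = summarize_groups(["red", "blue", "blue", "green", "green"])
print(result["minority"], result["majority"])
print(result["minority_percent"], result["majority_percent"])
for row in result["rows"]:
    print(row)
```

In the tool's output, the three rows for this polygon would share one join_id value that links them back to the polygon feature.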

Code sample

SummarizeWithin example (Python window)

The following Python window script demonstrates how to use the SummarizeWithin tool.

# Name: SummarizeWithin.py
# Description: Summarize river polylines by counties.
#			
# Requirements: ArcGIS GeoAnalytics Server

# Import system modules
import arcpy

# Set local variables
summarizedLayer = "https://MyGeoAnalyticsMachine.domain.com/geoanalytics/rest/services/DataStoreCatalogs/bigDataFileShares_Water/BigDataCatalogServer/Rivers"
summaryPolys = "https://MyGeoAnalyticsMachine.domain.com/geoanalytics/rest/services/DataStoreCatalogs/bigDataFileShares_Boundaries/BigDataCatalogServer/Counties"
summaryStatistics = ["Width", "MEAN"]
weightedSummaryStatistics = ["DOC", "SUM"]
outFS = 'SummarizedRivers'
dataStore = "SPATIOTEMPORAL_DATA_STORE"

# Execute SummarizeWithin
arcpy.geoanalytics.SummarizeWithin(summarizedLayer, outFS, "POLYGON", None, 
                                   None, summaryPolys,"ADD_SUMMARY", 
                                   "KILOMETERS", summaryStatistics, 
                                   weightedSummaryStatistics, dataStore)
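
The sample above does not exercise the grouping parameters. The following sketch shows one way the keyword arguments for a grouped run might be assembled, using the parameter names from the Syntax section. The helper names build_group_by_args and run_grouped_summary, the output name SummarizedRiversByType, and the field RiverType are hypothetical. The tool call is wrapped in a function that is not invoked here, because it requires an ArcGIS Pro Python environment and a GeoAnalytics Server.

```python
def build_group_by_args(summarized_layer, summary_polygons, output_name, group_by_field):
    """Collect keyword arguments for a grouped SummarizeWithin run."""
    return {
        "summarized_layer": summarized_layer,
        "output_name": output_name,
        "polygon_or_bin": "POLYGON",
        "bin_type": None,   # not used when summarizing by polygons
        "bin_size": None,   # not used when summarizing by polygons
        "summary_polygons": summary_polygons,
        "sum_shape": "ADD_SUMMARY",
        "shape_units": "KILOMETERS",
        "group_by_field": group_by_field,
        # add_percentages only applies when add_minority_majority is also used.
        "add_minority_majority": "ADD_MIN_MAJ",
        "add_percentages": "ADD_PERCENT",
    }


def run_grouped_summary(args):
    """Run the tool with the given arguments (requires arcpy and a GeoAnalytics Server)."""
    import arcpy  # imported lazily so the rest of the sketch runs without ArcGIS Pro
    return arcpy.geoanalytics.SummarizeWithin(**args)


rivers = "https://MyGeoAnalyticsMachine.domain.com/geoanalytics/rest/services/DataStoreCatalogs/bigDataFileShares_Water/BigDataCatalogServer/Rivers"
counties = "https://MyGeoAnalyticsMachine.domain.com/geoanalytics/rest/services/DataStoreCatalogs/bigDataFileShares_Boundaries/BigDataCatalogServer/Counties"

# "RiverType" is a hypothetical field on the Rivers layer.
args = build_group_by_args(rivers, counties, "SummarizedRiversByType", "RiverType")
print(args["group_by_field"], args["add_minority_majority"], args["add_percentages"])

# In an ArcGIS Pro session: run_grouped_summary(args)
```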

Environments

Output Coordinate System

The coordinate system that will be used for analysis. Analysis will be completed in the input coordinate system unless specified by this parameter. For GeoAnalytics Tools, final results will be stored in the spatiotemporal data store in WGS84.

Licensing information

  • Basic: Requires ArcGIS GeoAnalytics Server
  • Standard: Requires ArcGIS GeoAnalytics Server
  • Advanced: Requires ArcGIS GeoAnalytics Server
