Calculates summary statistics for fields in a feature class.
Usage
Summarize Attributes is a tabular analysis tool, not a spatial analysis tool. Inputs can be a tabular layer or a layer with geometry (points, lines, or polygons).
You can specify one or more fields to summarize by or summarize all features. When you summarize by fields, statistics are calculated for each unique combination of attribute values.
The output table will consist of fields containing the result of the statistical operation. A field will be created for each specified statistic type using the following naming convention: sum_<field>, max_<field>, min_<field>, range_<field>, std_<field>, count_<field>, var_<field>, and any_<field> (where <field> is the name of the input field for which the statistic is computed). The statistics will be calculated on each group separately.
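The grouping and field naming described above can be sketched in plain Python. This is a minimal illustration, not the tool's implementation: the summarize function, the sample rows, and their VO2 values are hypothetical, and the sketch assumes the summarized field contains no null values.

```python
from collections import defaultdict

def summarize(rows, group_fields, summary_fields):
    """Group rows by the unique combination of group_fields values and
    compute each requested (field, statistic) pair per group.

    Hypothetical helper for illustration only; assumes no null values.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in group_fields)
        groups[key].append(row)

    stat_funcs = {"COUNT": len, "SUM": sum, "MIN": min, "MAX": max}

    output = []
    for key, members in groups.items():
        out_row = dict(zip(group_fields, key))
        for field, stat in summary_fields:
            values = [m[field] for m in members]
            # Output fields follow the <statistic>_<field> naming convention.
            out_row[f"{stat.lower()}_{field}"] = stat_funcs[stat](values)
        output.append(out_row)
    return output

# Hypothetical sample data; the values are invented for this sketch.
rows = [
    {"Designation": "Cyclist", "Age Group": "20-29", "VO2": 60},
    {"Designation": "Cyclist", "Age Group": "30-39", "VO2": 55},
    {"Designation": "Runner", "Age Group": "20-29", "VO2": 58},
    {"Designation": "Runner", "Age Group": "20-29", "VO2": 62},
]

by_designation = summarize(
    rows, ["Designation"], [("VO2", "SUM"), ("VO2", "MAX"), ("VO2", "COUNT")]
)
for out_row in by_designation:
    print(out_row)
```

Summarizing the same rows by both Designation and Age Group would produce one output row per unique pair of values, which is three rows for this sample.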
If time is enabled on the input, you can apply time stepping to your analysis. Each time step is analyzed independently of features outside the time step. To use time stepping, your input data must be time enabled and represent an instant in time. When time stepping is applied, output features will be time intervals represented by the START_DATETIME and END_DATETIME fields.
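One way to picture time stepping is as a series of intervals that start at the reference time plus a multiple of the repeat and last for the interval duration. The sketch below works under several assumptions that the text does not confirm: the containing_steps function is hypothetical, the repeat is assumed to equal the interval when it is not given, and each step is assumed to include its start and exclude its end. It also uses fixed-length durations only, so calendar units such as months are not covered.

```python
from datetime import datetime, timedelta

def containing_steps(instant, interval, repeat=None,
                     reference=datetime(1970, 1, 1)):
    """Return the (start, end) time steps that contain the given instant.

    Hypothetical helper: steps are assumed to start at
    reference + k * repeat and to cover [start, start + interval).
    """
    if repeat is None:
        repeat = interval  # assumption: steps are contiguous by default
    if interval <= timedelta(0) or repeat <= timedelta(0):
        raise ValueError("interval and repeat must be positive durations")

    # Index of the latest step that starts at or before the instant.
    k = (instant - reference) // repeat
    steps = []
    while True:
        start = reference + k * repeat
        end = start + interval
        if end <= instant:
            break  # this step and all earlier ones end before the instant
        steps.append((start, end))
        k -= 1
    return list(reversed(steps))

instant = datetime(2021, 3, 15, 10, 30)

# Contiguous one-day steps: the instant falls in exactly one step.
daily = containing_steps(instant, timedelta(days=1))

# Two-day steps repeating every day overlap, so the instant falls in two.
overlapping = containing_steps(instant, timedelta(days=2), timedelta(days=1))

# One-hour steps repeating every day leave gaps; 10:30 falls in none.
sparse = containing_steps(instant, timedelta(hours=1), timedelta(days=1))
```

Each returned pair corresponds to the START_DATETIME and END_DATETIME values of an output time interval under these assumptions.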
You can apply this tool to spatial data, and you will get a tabular result. You can join your results to spatial data using Join Features.
The tables below illustrate the statistical calculations of a layer that is summarized using like values of fields. The VO2 field was used to calculate the numeric statistics (Count, Sum, Minimum, Maximum, Range, Mean, Standard Deviation, and Variance) for the layer. The Rating field was used to calculate the string statistics (Count and Any) for the layer.
The table above was summarized on the Designation field, and the VO2 field was used to calculate the numeric statistics (Count, Sum, Minimum, Maximum, Range, Mean, Standard Deviation, and Variance) for the layer. The Rating field was used to calculate the string statistics (Count and Any) for the layer. This results in a table with two features, representing the distinct values of Designation.
The following table represents what the first few fields look like when the layer is summarized using the Designation and Age Group fields. Statistics are calculated using the same methods as the previous example.
You can improve the performance of the Summarize Attributes tool by using the following tips:
Set the extent environment so you only analyze data of interest.
This geoprocessing tool is powered by ArcGIS GeoAnalytics Server. Analysis is completed on your GeoAnalytics Server, and results are stored in your content in ArcGIS Enterprise.
When running GeoAnalytics Server tools, the analysis is completed on the GeoAnalytics Server. For optimal performance, make data available to the GeoAnalytics Server through feature layers hosted on your ArcGIS Enterprise portal or through big data file shares. Data that is not local to your GeoAnalytics Server will be moved to your GeoAnalytics Server before analysis begins. This means that it will take longer to run a tool, and in some cases, moving the data from ArcGIS Pro to your GeoAnalytics Server may fail. The threshold for failure depends on your network speeds, as well as the size and complexity of the data. Therefore, it is recommended that you always share your data or create a big data file share.
Similar analysis can also be completed using the Summary Statistics tool in the Analysis toolbox.
Parameters
Label
Explanation
Data Type
Input Layer
The point, polyline, or polygon layer to be summarized.
Record Set
Output Name
The name of the output feature service.
String
Fields
A field or fields used to summarize similar features. For example, if you choose a single field called PropertyType with the values of commercial and residential, all of the features with the value residential will be summarized together, with summary statistics calculated, and all of the features with the value commercial will be summarized together. This example will result in two rows in the output, one for commercial and one for residential summary values.
Field
Summary Fields
(Optional)
The statistics that will be calculated on specified fields.
Value Table
Data Store
(Optional)
Specifies the ArcGIS Data Store where the output will be saved. The default is Spatiotemporal big data store. All results stored in a spatiotemporal big data store will be stored in WGS84. Results stored in a relational data store will maintain their coordinate system.
Spatiotemporal big data store —Output will be stored in a spatiotemporal big data store. This is the default.
Relational data store —Output will be stored in a relational data store.
String
Time step interval
(Optional)
A value that specifies the duration of the time step. This parameter is only available if the input points are time enabled and represent an instant in time.
Time stepping can only be applied if time is enabled on the input.
Time Unit
Time step repeat
(Optional)
A value that specifies how often the time-step interval occurs. This parameter is only available if the input points are time enabled and represent an instant in time.
Time Unit
Time step reference
(Optional)
A date that specifies the reference time with which to align the time steps. The default is January 1, 1970, at 12:00 a.m. This parameter is only available if the input points are time enabled and represent an instant in time.
Date
input_layer
The point, polyline, or polygon layer to be summarized.
Record Set
output_name
The name of the output feature service.
String
fields
[fields,...]
A field or fields used to summarize similar features. For example, if you choose a single field called PropertyType with the values of commercial and residential, all of the features with the value residential will be summarized together, with summary statistics calculated, and all of the features with the value commercial will be summarized together. This example will result in two rows in the output, one for commercial and one for residential summary values.
Field
summary_fields
[summary_fields,...]
(Optional)
The statistics that will be calculated on specified fields.
COUNT—The number of nonnull values. It can be used on numeric fields or strings. The count of [null, 0, 2] is 2.
SUM—The sum of numeric values in a field. The sum of [null, null, 3] is 3.
MEAN—The mean of numeric values. The mean of [0, 2, null] is 1.
MIN—The minimum value of a numeric field. The minimum of [0, 2, null] is 0.
MAX—The maximum value of a numeric field. The maximum value of [0, 2, null] is 2.
STDDEV—The standard deviation of a numeric field. The standard deviation of [1] is null. The standard deviation of [null, 1, 1, 1] is null.
VAR—The variance of a numeric field. The variance of [1] is null. The variance of [null, 1, 1, 1] is null.
RANGE—The range of a numeric field. This is calculated as the minimum value subtracted from the maximum value. The range of [0, null, 1] is 1. The range of [null, 4] is 0.
ANY—A sample string from a field of type string.
Value Table
data_store
(Optional)
Specifies the ArcGIS Data Store where the output will be saved. The default is SPATIOTEMPORAL_DATA_STORE. All results stored in a spatiotemporal big data store will be stored in WGS84. Results stored in a relational data store will maintain their coordinate system.
SPATIOTEMPORAL_DATA_STORE —Output will be stored in a spatiotemporal big data store. This is the default.
RELATIONAL_DATA_STORE —Output will be stored in a relational data store.
String
time_step_interval
(Optional)
A value that specifies the duration of the time step. This parameter is only available if the input points are time enabled and represent an instant in time.
Time stepping can only be applied if time is enabled on the input.
Time Unit
time_step_repeat
(Optional)
A value that specifies how often the time-step interval occurs. This parameter is only available if the input points are time enabled and represent an instant in time.
Time Unit
time_step_reference
(Optional)
A date that specifies the reference time with which to align the time steps. The default is January 1, 1970, at 12:00 a.m. This parameter is only available if the input points are time enabled and represent an instant in time.
Date
Derived Output
Name
Explanation
Data Type
output
The output table with summarized attributes.
Record Set
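The null handling described for the summary_fields statistics can be checked with a short Python sketch, using None to stand in for null. The null_aware_stats function is a hypothetical helper, not part of arcpy. STDDEV, VAR, and ANY are omitted, and returning None for every statistic except COUNT when all values are null is an assumption of this sketch.

```python
def null_aware_stats(values):
    """Compute COUNT, SUM, MEAN, MIN, MAX, and RANGE while ignoring nulls.

    Hypothetical helper for illustration; None represents a null value.
    """
    present = [v for v in values if v is not None]
    if not present:
        # Assumption: with no non-null values, only COUNT is defined.
        return {"COUNT": 0, "SUM": None, "MEAN": None,
                "MIN": None, "MAX": None, "RANGE": None}
    return {
        "COUNT": len(present),
        "SUM": sum(present),
        "MEAN": sum(present) / len(present),
        "MIN": min(present),
        "MAX": max(present),
        # Minimum value subtracted from the maximum value.
        "RANGE": max(present) - min(present),
    }

# The examples from the parameter description.
print(null_aware_stats([None, 0, 2])["COUNT"])
print(null_aware_stats([None, None, 3])["SUM"])
print(null_aware_stats([0, 2, None])["MEAN"])
print(null_aware_stats([None, 4])["RANGE"])
```

A single non-null value yields a range of 0 because the minimum and maximum are the same value.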
Code sample
SummarizeAttributes (Python window)
The following Python window script demonstrates how to use the SummarizeAttributes tool.
#-------------------------------------------------------------------------------
# Name: Summarize Attributes.py
# Description: Summarize crime data by year and beat.
#
# Requirements: ArcGIS GeoAnalytics Server
#-------------------------------------------------------------------------------

# Import system modules
import arcpy

# Set local variables
# This example uses a big data file share named "Crimes" with the dataset
# "Chicago" registered on the GeoAnalytics Server
inFeatures = "https://MyGeoAnalyticsMachine.domain.com/geoanalytics/rest/services/DataStoreCatalogs/bigDataFileShares_Crimes/BigDataCatalogServer/Chicago"
# Fields whose unique value combinations define the groups
summaryFields = ["Year", "Beat"]
# Each entry pairs a field with the statistic to calculate on it
summaryStatistics = [["Arrest", "COUNT"], ["District", "COUNT"]]
outFS = 'SummarizeCrimes'
dataStore = "SPATIOTEMPORAL_DATA_STORE"

# Execute SummarizeAttributes
arcpy.geoanalytics.SummarizeAttributes(inFeatures, outFS, summaryFields,
                                       summaryStatistics, dataStore)
Environments
Output Coordinate System
The coordinate system that will be used for analysis. Analysis will be completed in the input coordinate system unless specified by this parameter. For GeoAnalytics Tools, final results will be stored in the spatiotemporal data store in WGS84.