Summarize Attributes (GeoAnalytics)

Summary

Calculates summary statistics for fields in a feature class.

Usage

  • Summarize Attributes is a tabular analysis tool, not a spatial analysis tool. Inputs can be a tabular layer or a layer with geometry (points, lines, or polygons).

  • You can specify one or more fields to summarize by or summarize all features. When you summarize by fields, statistics are calculated for each unique combination of attribute values.

  • The output table will consist of fields containing the result of the statistical operation.

  • A field will be created for each specified statistic type using the following naming convention: sum_<field>, max_<field>, min_<field>, range_<field>, std_<field>, count_<field>, var_<field>, and any_<field> (where <field> is the name of the input field for which the statistic is computed). The statistics will be calculated on each group separately.

  • If time is enabled on the input, you can apply time stepping to your analysis. Each time step is analyzed independently of features outside the time step. To use time stepping, your input data must be time enabled and represent an instant in time. When time stepping is applied, output features will be time intervals represented by the START_DATETIME and END_DATETIME fields.

    Learn more about time stepping

  • You can apply this tool to spatial data, and you will get a tabular result. You can join your results to spatial data using Join Features.

  • The tables below illustrate the statistical calculations of a layer that is summarized using like values of fields. The VO2 field was used to calculate the numeric statistics (Count, Sum, Minimum, Maximum, Range, Mean, Standard Deviation, and Variance) for the layer. The Rating field was used to calculate the string statistics (Count and Any) for the layer.

    Input layer to be summarized

    The table above was summarized on the Designation field, and the VO2 field was used to calculate the numeric statistics (Count, Sum, Minimum, Maximum, Range, Mean, Standard Deviation, and Variance) for the layer. The Rating field was used to calculate the string statistics (Count and Any) for the layer. This results in a table with two features, representing the distinct values of Designation.

    Input layer summarized using the Designation field

    The following table represents what the first few fields look like when the layer is summarized using the Designation and Age Group fields. Statistics are calculated using the same methods as the previous example.

    Input layer summarized using the Designation and Age Group fields
  • You can improve the performance of the Summarize Attributes tool by using the following tips:

    • Set the extent environment so you only analyze data of interest.
    • Use data that is local to where the analysis is being run.

  • This geoprocessing tool is powered by ArcGIS GeoAnalytics Server. Analysis is completed on your GeoAnalytics Server, and results are stored in your content in ArcGIS Enterprise.

  • When running GeoAnalytics Server tools, the analysis is completed on the GeoAnalytics Server. For optimal performance, make data available to the GeoAnalytics Server through feature layers hosted on your ArcGIS Enterprise portal or through big data file shares. Data that is not local to your GeoAnalytics Server will be moved to your GeoAnalytics Server before analysis begins. This means that it will take longer to run a tool, and in some cases, moving the data from ArcGIS Pro to your GeoAnalytics Server may fail. The threshold for failure depends on your network speeds, as well as the size and complexity of the data. It is recommended that you always share your data or create a big data file share.

    Learn more about sharing data to your portal

    Learn more about creating a big data file share through Server Manager

  • Similar analysis can also be completed using the Summary Statistics tool in the Analysis toolbox.

Parameters

Label | Explanation | Data Type
Input Layer

The point, polyline, polygon, or tabular layer to be summarized.

Record Set
Output Name

The name of the output feature service.

String
Fields

A field or fields used to summarize similar features. For example, if you choose a single field called PropertyType with the values of commercial and residential, all of the features with the value residential will be summarized together, with summary statistics calculated, and all of the features with the value commercial will be summarized together. This example will result in two rows in the output, one for commercial summary values and one for residential summary values.

Field
Summary Fields
(Optional)

The statistics that will be calculated on specified fields.

Value Table
Data Store
(Optional)

Specifies the ArcGIS Data Store where the output will be saved. The default is Spatiotemporal big data store. All results stored in a spatiotemporal big data store will be stored in WGS84. Results stored in a relational data store will maintain their coordinate system.

  • Spatiotemporal big data store: Output will be stored in a spatiotemporal big data store. This is the default.
  • Relational data store: Output will be stored in a relational data store.
String
Time step interval
(Optional)

A value that specifies the duration of the time step. This parameter is only available if the input features are time enabled and represent an instant in time.

Time stepping can only be applied if time is enabled on the input.

Time Unit
Time step repeat
(Optional)

A value that specifies how often the time-step interval occurs. This parameter is only available if the input features are time enabled and represent an instant in time.

Time Unit
Time step reference
(Optional)

A date that specifies the reference time with which to align the time steps. The default is January 1, 1970, at 12:00 a.m. This parameter is only available if the input features are time enabled and represent an instant in time.

Date

Derived Output

LabelExplanationData Type
Output

The output table with summarized attributes.

Record Set

arcpy.geoanalytics.SummarizeAttributes(input_layer, output_name, fields, {summary_fields}, {data_store}, {time_step_interval}, {time_step_repeat}, {time_step_reference})
Name | Explanation | Data Type
input_layer

The point, polyline, polygon, or tabular layer to be summarized.

Record Set
output_name

The name of the output feature service.

String
fields
[fields,...]

A field or fields used to summarize similar features. For example, if you choose a single field called PropertyType with the values of commercial and residential, all of the features with the value residential will be summarized together, with summary statistics calculated, and all of the features with the value commercial will be summarized together. This example will result in two rows in the output, one for commercial summary values and one for residential summary values.

Field
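
The grouping described for the fields parameter can be sketched in plain Python. This is an illustration of the concept only, not how GeoAnalytics Server performs the work. The sample rows, the Value field, and the group_rows helper are hypothetical; only PropertyType and its two values come from the explanation above.

```python
from collections import defaultdict

# Hypothetical input rows. The Value field is made up for illustration.
rows = [
    {"PropertyType": "commercial", "Value": 10},
    {"PropertyType": "residential", "Value": 4},
    {"PropertyType": "residential", "Value": 6},
    {"PropertyType": "commercial", "Value": 30},
]


def group_rows(rows, fields):
    """Group rows by each unique combination of values in `fields`."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in fields)
        groups[key].append(row)
    return dict(groups)


groups = group_rows(rows, ["PropertyType"])

# One output row per group, here with a sum of the hypothetical Value field.
summary = {key: sum(r["Value"] for r in members) for key, members in groups.items()}
print(summary)
```

Passing two field names to group_rows would produce one group per unique pair of values, which mirrors summarizing by more than one field.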
summary_fields
[summary_fields,...]
(Optional)

The statistics that will be calculated on specified fields.

  • COUNT—The number of nonnull values. It can be used on numeric fields or strings. The count of [null, 0, 2] is 2.
  • SUM—The sum of numeric values in a field. The sum of [null, null, 3] is 3.
  • MEAN—The mean of numeric values. The mean of [0, 2, null] is 1.
  • MIN—The minimum value of a numeric field. The minimum of [0, 2, null] is 0.
  • MAX—The maximum value of a numeric field. The maximum value of [0, 2, null] is 2.
  • STDDEV—The standard deviation of a numeric field. The standard deviation of [1] is null. The standard deviation of [null, 1, 1, 1] is null.
  • VAR—The variance of a numeric field. The variance of [1] is null. The variance of [null, 1, 1, 1] is null.
  • RANGE—The range of a numeric field. This is calculated as the minimum value subtracted from the maximum value. The range of [0, null, 1] is 1. The range of [null, 4] is 0.
  • ANY—A sample string from a field of type string.

Value Table
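
The null handling in the examples above can be checked with a short plain-Python sketch in which None stands in for null. The summarize helper is hypothetical and is not part of arcpy. STDDEV and VAR are left out because the sketch only covers the statistics whose results follow directly from skipping null values, and the behavior when every value is null is an assumption.

```python
def summarize(values):
    """Return COUNT, SUM, MEAN, MIN, MAX, and RANGE for a list that may contain None."""
    present = [v for v in values if v is not None]  # null values are skipped
    stats = {"COUNT": len(present)}
    if present:
        stats["SUM"] = sum(present)
        stats["MEAN"] = stats["SUM"] / len(present)
        stats["MIN"] = min(present)
        stats["MAX"] = max(present)
        stats["RANGE"] = stats["MAX"] - stats["MIN"]
    else:
        # Assumption: with no nonnull values, the remaining statistics are null.
        stats.update(dict.fromkeys(["SUM", "MEAN", "MIN", "MAX", "RANGE"]))
    return stats


print(summarize([None, 0, 2])["COUNT"])
print(summarize([None, 4])["RANGE"])
```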
data_store
(Optional)

Specifies the ArcGIS Data Store where the output will be saved. The default is SPATIOTEMPORAL_DATA_STORE. All results stored in a spatiotemporal big data store will be stored in WGS84. Results stored in a relational data store will maintain their coordinate system.

  • SPATIOTEMPORAL_DATA_STORE: Output will be stored in a spatiotemporal big data store. This is the default.
  • RELATIONAL_DATA_STORE: Output will be stored in a relational data store.
String
time_step_interval
(Optional)

A value that specifies the duration of the time step. This parameter is only available if the input features are time enabled and represent an instant in time.

Time stepping can only be applied if time is enabled on the input.

Time Unit
time_step_repeat
(Optional)

A value that specifies how often the time-step interval occurs. This parameter is only available if the input features are time enabled and represent an instant in time.

Time Unit
time_step_reference
(Optional)

A date that specifies the reference time with which to align the time steps. The default is January 1, 1970, at 12:00 a.m. This parameter is only available if the input features are time enabled and represent an instant in time.

Date
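
How the three time-step parameters interact can be sketched with the standard datetime module. This is a conceptual sketch under stated assumptions, not the server's implementation: it assumes step k spans from reference + k × repeat up to (but not including) that start plus the interval, and that the repeat equals the interval when it is not given. The time_steps_containing helper is hypothetical.

```python
from datetime import datetime, timedelta

# Default reference time stated above: January 1, 1970, at 12:00 a.m.
DEFAULT_REFERENCE = datetime(1970, 1, 1)


def time_steps_containing(timestamp, interval, repeat=None, reference=DEFAULT_REFERENCE):
    """Return the (start, end) pairs of the time steps that contain `timestamp`.

    Assumption: step k covers [reference + k * repeat, reference + k * repeat + interval).
    """
    if repeat is None:
        repeat = interval  # assumption: steps are back to back by default
    steps = []
    # Index of the latest step that starts at or before the timestamp.
    k = (timestamp - reference) // repeat
    while True:
        start = reference + k * repeat
        end = start + interval
        if end <= timestamp:
            break  # this step and all earlier ones end before the timestamp
        steps.append((start, end))
        k -= 1
    return sorted(steps)


moment = datetime(2020, 3, 15, 10, 30)
for start, end in time_steps_containing(moment, timedelta(days=1)):
    print(start, end)
```

Under these assumptions, an interval longer than the repeat places a feature in more than one step, and an interval shorter than the repeat leaves gaps in which a feature falls into no step. The start and end of each step correspond to the START_DATETIME and END_DATETIME fields described in the usage notes.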

Derived Output

NameExplanationData Type
output

The output table with summarized attributes.

Record Set
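
The output field names that follow from the naming convention in the usage notes can be predicted with a small helper. This is a sketch with hypothetical names (STAT_PREFIXES, output_field_names). It covers only the prefixes listed in the usage notes, so MEAN is omitted, and it assumes the input field name is appended unchanged.

```python
# Prefixes taken from the naming convention listed in the usage notes.
STAT_PREFIXES = {
    "SUM": "sum",
    "MAX": "max",
    "MIN": "min",
    "RANGE": "range",
    "STDDEV": "std",
    "COUNT": "count",
    "VAR": "var",
    "ANY": "any",
}


def output_field_names(summary_fields):
    """Map [field, statistic] pairs to the expected output field names."""
    return [f"{STAT_PREFIXES[stat.upper()]}_{field}" for field, stat in summary_fields]


print(output_field_names([["Arrest", "COUNT"], ["District", "COUNT"]]))
```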

Code sample

SummarizeAttributes (Python window)

The following Python window script demonstrates how to use the SummarizeAttributes tool.

#-------------------------------------------------------------------------------
# Name: Summarize Attributes.py
# Description: Summarize Crime Data by year and beat.
#
# Requirements: ArcGIS GeoAnalytics Server

# Import system modules
import arcpy

# Set local variables
# This example uses a big data file share named "Crimes" with dataset "Chicago" registered on my GeoAnalytics server
inFeatures = "https://MyGeoAnalyticsMachine.domain.com/geoanalytics/rest/services/DataStoreCatalogs/bigDataFileShares_Crimes/BigDataCatalogServer/Chicago"
summaryFields = ["Year", "Beat"]
summaryStatistics = [["Arrest", "COUNT"], ["District", "COUNT"]]
outFS = 'SummarizeCrimes'
dataStore = "SPATIOTEMPORAL_DATA_STORE"

# Execute SummarizeAttributes
arcpy.geoanalytics.SummarizeAttributes(inFeatures, outFS, summaryFields, 
                                       summaryStatistics, dataStore)
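
When a script summarizes many fields, it can help to build the summary_fields value in code and catch misspelled statistic types before the tool is called. The helper below is a hypothetical convenience, not part of arcpy. It assumes the [field, statistic] pair order used in the sample above and accepts only the statistic types listed for summary_fields.

```python
# Statistic types listed for the summary_fields parameter.
VALID_STATISTICS = {"COUNT", "SUM", "MEAN", "MIN", "MAX", "STDDEV", "VAR", "RANGE", "ANY"}


def build_summary_fields(spec):
    """Turn {field: [statistic, ...]} into a list of [field, statistic] pairs."""
    pairs = []
    for field, statistics in spec.items():
        for statistic in statistics:
            name = statistic.upper()
            if name not in VALID_STATISTICS:
                raise ValueError(f"Unsupported statistic type: {statistic}")
            pairs.append([field, name])
    return pairs


summaryStatistics = build_summary_fields({"Arrest": ["COUNT"], "District": ["COUNT"]})
print(summaryStatistics)
```

The resulting list has the same shape as the summaryStatistics variable in the sample and could be passed to the tool in its place.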

Environments

Output Coordinate System

The coordinate system that will be used for analysis. Analysis will be completed in the input coordinate system unless otherwise specified by this parameter. For GeoAnalytics Tools, final results will be stored in the spatiotemporal data store in WGS84.

Licensing information

  • Basic: Requires ArcGIS GeoAnalytics Server
  • Standard: Requires ArcGIS GeoAnalytics Server
  • Advanced: Requires ArcGIS GeoAnalytics Server
