Summary Statistics (Analysis)

Summary

Calculates summary statistics for fields in a table.

Usage

  • The output will consist of fields containing the result of the statistical operations.

  • The following statistical operations are available with this tool: sum, mean, minimum, maximum, range, standard deviation, count, first, last, median, variance, unique, concatenate, and mode.

  • If the Case Fields parameter is specified, statistics will be calculated for each unique set of field values. The output table will include one record for each unique set of field values. If the Case Fields parameter is not specified, statistics will be calculated for all input records. The output table will contain a single record.

    The output's Frequency field is the count of all input records represented by the unique set of field values. If the Case Fields parameter is not specified, it will be a count of all input records.

  • A field will be created for each statistic type using the following naming convention: SUM_<field>, MEAN_<field>, MIN_<field>, MAX_<field>, RANGE_<field>, STD_<field>, COUNT_<field>, FIRST_<field>, LAST_<field>, MEDIAN_<field>, VARIANCE_<field>, UNIQUE_<field>, CONCATENATE_<field>, DBMS_SUM_<field>, DBMS_MEAN_<field>, DBMS_MIN_<field>, DBMS_MAX_<field>, DBMS_STD_<field>, DBMS_COUNT_<field>, DBMS_MEDIAN_<field>, and DBMS_VARIANCE_<field> in which <field> is the name of the input field for which the statistic is computed. The field name is truncated to 10 characters when the output table is a dBASE table.

  • Use the DBMS statistic types to process the statistics directly in the database or on the service rather than on the client. The DBMS statistic types are only supported for input geodatabase data sources, including enterprise geodatabases, feature services, cloud data warehouses, GeoPackage, and SQLite geodatabases.

    The DBMS statistic types do not support dBASE output.

  • Null values are excluded from all statistical calculations. For example, the average of 6, 4, and a null value is 5 ((6+4)/2).

  • When using layers, only the currently selected features will be used to calculate statistics.

Parameters

LabelExplanationData Type
Input Table

The input table containing the fields that will be used to calculate statistics.

Table View; Raster Layer
Output Table

The output table that will store the calculated statistics.

Table
Statistics Fields

Specifies the field or fields containing the attribute values that will be used to calculate the specified statistic. Multiple statistic and field combinations can be specified. Null values are excluded from all calculations.

Numeric attribute fields can be summarized using any statistic. Text attribute fields can be summarized using minimum, maximum, count, first, last, unique, concatenate, and mode statistics. Date, Date only, and Timestamp offset attribute fields can be summarized only with the mean, minimum, maximum, count, first, last, unique, and concatenate statistics.

The DBMS statistic types are only supported for input geodatabase data sources, including enterprise geodatabases, feature services, cloud data warehouses, and SQLite geodatabases

  • Sum—The values for the specified field will be added together.
  • Mean—The average for the specified field will be calculated.
  • Minimum—The smallest value of the specified field will be identified.
  • Maximum—The largest value of the specified field will be identified.
  • Range—The range of values (maximum minus minimum) for the specified field will be calculated.
  • Standard deviation—The standard deviation of values for the specified field will be calculated.
  • Count—The number of values in the specified field will be identified.
  • First—The specified field value of the first record in the input will be used.
  • Last—The specified field value of the last record in the input will be used.
  • Median—The median of the specified field will be calculated.
  • Variance—The variance of the specified field will be calculated.
  • Unique—The number of unique values of the specified field will be counted.
  • Concatenate—The values for the specified field will be concatenated. The values can be separated using the Concatenation Separator parameter.
  • Mode—The mode (the most common value) for the specified field will be identified. If more than one value is equally common, the lowest value will be returned.
  • DBMS Sum—The values for the specified field will be added together.
  • DBMS Mean—The average for the specified field will be calculated.
  • DBMS Minimum—The smallest value of the specified field will be identified.
  • DBMS Maximum—The largest value of the specified field will be identified.
  • DBMS Standard deviation—The standard deviation of values for the specified field will be calculated.
  • DBMS Count—The number of values in the specified field will be identified.
  • DBMS Median—The median of the specified field will be calculated.
  • DBMS Variance—The variance of the specified field will be calculated.
Value Table
Case Fields
(Optional)

The field or fields in the input that will be used to calculate statistics separately for each unique attribute value (or combination of attribute values when multiple fields are specified).

Field
Concatenation Separator
(Optional)

A character or characters that will be used to concatenate values when the Concatenation option is used for the Statistics Fields parameter. By default, the tool will concatenate values without a separator.

String

arcpy.analysis.Statistics(in_table, out_table, statistics_fields, {case_field}, {concatenation_separator})
NameExplanationData Type
in_table

The input table containing the fields that will be used to calculate statistics.

Table View; Raster Layer
out_table

The output table that will store the calculated statistics.

Table
statistics_fields
[[field, {statistic_type}],...]

Specifies the field or fields containing the attribute values that will be used to calculate the specified statistic. Multiple statistic and field combinations can be specified. Null values are excluded from all calculations.

Numeric attribute fields can be summarized using any statistic. Text attribute fields can be summarized using minimum, maximum, count, first, last, unique, concatenate, and mode statistics. Date, Date only, and Timestamp offset attribute fields can be summarized only with the mean, minimum, maximum, count, first, last, unique, and concatenate statistics.

The DBMS statistic types are only supported for input geodatabase data sources, including enterprise geodatabases, feature services, cloud data warehouses, and SQLite geodatabases

  • SUM—The values for the specified field will be added together.
  • MEAN—The average for the specified field will be calculated.
  • MIN—The smallest value of the specified field will be identified.
  • MAX—The largest value of the specified field will be identified.
  • RANGE—The range of values (maximum minus minimum) for the specified field will be calculated.
  • STD—The standard deviation of values for the specified field will be calculated.
  • COUNT—The number of values in the specified field will be identified.
  • FIRST—The specified field value of the first record in the input will be used.
  • LAST—The specified field value of the last record in the input will be used.
  • MEDIAN—The median of the specified field will be calculated.
  • VARIANCE—The variance of the specified field will be calculated.
  • UNIQUE—The number of unique values of the specified field will be counted.
  • CONCATENATE—The values for the specified field will be concatenated. The values can be separated using the concatenation_separator parameter.
  • MODE—The mode (the most common value) for the specified field will be identified. If more than one value is equally common, the lowest value will be returned.
  • DBMS_SUM—The values for the specified field will be added together.
  • DBMS_MEAN—The average for the specified field will be calculated.
  • DBMS_MIN—The smallest value of the specified field will be identified.
  • DBMS_MAX—The largest value of the specified field will be identified.
  • DBMS_STD—The standard deviation of values for the specified field will be calculated.
  • DBMS_COUNT—The number of values in the specified field will be identified.
  • DBMS_MEDIAN—The median of the specified field will be calculated.
  • DBMS_VARIANCE—The variance of the specified field will be calculated.
Value Table
case_field
[case_field,...]
(Optional)

The field or fields in the input that will be used to calculate statistics separately for each unique attribute value (or combination of attribute values when multiple fields are specified).

Field
concatenation_separator
(Optional)

A character or characters that will be used to concatenate values when the CONCATENATION option is used for the statistics_fields parameter. By default, the tool will concatenate values without a separator.

String

Code sample

Statistics example 1 (Python window)

The following Python window script demonstrates how to use the Statistics function in immediate mode.

import arcpy
arcpy.env.workspace = "C:/data/Habitat_Analysis.gdb"
arcpy.analysis.Statistics("futrds", "C:/output/output.gdb/stats", [["Shape_Length", "SUM"]], "NM")
Statistics example 2 (stand-alone script)

The following stand-alone script summarizes the vegetation by area within 150 feet of major roads.

# Description: Summarize the vegetation by area within 150 feet of major roads.

# Import system modules
import arcpy

# Set environment settings
arcpy.env.workspace = "C:/data"

# Set local variables
inRoads = "majorrds.shp"
outBuffer = "C:/output/output.gdb/buffer_out"
bufferDistance = "250 feet"
inVegetation = "Habitat_Analysis.gdb/vegtype"
outClip = "C:/output/output.gdb/clip_out"
joinField = "HOLLAND95"
joinTable = "c:/data/vegtable.dbf"
joinedField = "HABITAT"
outStatsTable = "C:/output/output.gdb/stats_out"
statsFields = [["Shape_Area", "SUM"]]

# Run Buffer to get a buffer of major roads
arcpy.analysis.Buffer(inRoads, outBuffer, bufferDistance, dissolve_option="ALL")

# Run Clip using the buffer output to get a clipped feature class of 
# vegetation
arcpy.analysis.Clip(inVegetation, outBuffer, outClip)

# Run JoinField to add the vegetation type
arcpy.management.JoinField(outClip, joinField, joinTable, joinField, joinedField)

# Run Statistics to get the area of each vegetation type within the 
# clipped buffer.
arcpy.analysis.Statistics(outClip, outStatsTable, statsFields, joinedField)
Statistics example 3 (stand-alone script)

The following stand-alone script loops through the attribute fields of a dataset and constructs the statistics_fields parameter so that the SUM statistic is calculated for every numeric field.


# Description: Script that runs the Summary Statistics tool to calculate the
#              Sum statistic for every numeric field based on a unique case 
#              field.

# Import system modules
import arcpy

# Set environment settings
arcpy.env.workspace = "C:/data/f.gdb"

# Set local variables
intable = "intable"
outtable = "sumstats"
casefield = "Name"
stats = []

# Loop through all fields in the Input Table
for field in arcpy.ListFields(intable):

    # Find the fields that have a numeric type
    if field.type in ("Double", "Integer", "Single", "SmallInteger"):
        # Add the field name and Sum statistic type to the list of fields to 
        # summarize
        stats.append([field.name, "Sum"])
# Correct formatting of stats [["Field1", "Sum"], ["Field2", "Sum"], ...]

# Run Statistics with the stats list
arcpy.analysis.Statistics(intable, outtable, stats, casefield)
Statistics example 4 (stand-alone script)

The following script uses a pandas DataFrame to access and display the tabular results of the Statistics function.

import arcpy
import pandas
import os

arcpy.env.overwriteOutput = True

in_table = r"d:\data\states.shp"
out_table = r"in_memory\stats_table"
stat_fields = [['POP1990', 'SUM'], ['POP1997', 'SUM']]

stats = arcpy.analysis.Statistics(in_table, out_table, stat_fields,
                                  case_field='SUB_REGION')

# Get a list of field names to display
field_names = [i.name for i in arcpy.ListFields(out_table) if i.type != 'OID']

# Open a cursor to extract results from stats table
cursor = arcpy.da.SearchCursor(out_table, field_names)

# Create a pandas DataFrame to display results
df = pandas.DataFrame(data=[row for row in cursor],
                      columns=field_names)

print(df)

Environments

Special cases

Time Zone

The mean statistic type on a Timestamp offset field will use the timezone offset from this environment.

Licensing information

  • Basic: Yes
  • Standard: Yes
  • Advanced: Yes

Related topics