Interact with statistics

You can evaluate the quality and distribution of values in each field in your data using data engineering. For example, the number of null values in a field may be a helpful data quality metric when identifying features with missing data. Descriptive statistics, such as the mean, standard deviation, and kurtosis, may help you understand the distribution of values in the fields as well as assess how to proceed when using a field in an analysis.

The Data Engineering view in ArcGIS Pro allows you to display descriptive statistics and metrics for fields of interest in your data in a table format that displays each field as a row and each statistic as a column. You can use this table for further data exploration and correction of issues in the data through access to setting symbology, creating charts, and running geoprocessing tools that are relevant to each metric and property for the selected field.

Select fields and calculate statistics

When you open the Data Engineering view, it contains two panels: one displays the fields in your data, and the other displays a statistics table for the fields (once they have been selected and calculated).

Learn more about the Data Engineering view

To get started, click a single field in the fields panel, press Ctrl and click to select individual, separate fields, or press Shift and click to select multiple fields. Then drag the fields to the statistics panel.

Select and drag fields to the statistics panel

Alternatively, right-click selected fields and click Add To Statistics or Add To Statistics And Calculate.

Once the fields are added, they are displayed as rows in the statistics table. Each row contains the field name, alias, and data type of the selected fields. Additionally, a series of statistics columns appear that will contain additional information about the selected fields once calculations are made.

To populate information for the statistics columns for the selected fields, click the Calculate button.

Calculate button

The statistics columns are populated with information for each field of the data.

Statistics table with statistics and charts for each field

If you have selected records, the results correspond to the selected records in the data. The number of selected features and the number of features that were used to calculate statistics are shown below the statistics table.

If you have pending edits in the feature layer or table, the pending edits are used in the calculation.

Types of statistics

In the Data Engineering view, you can calculate and display statistical and data quality metrics of each field in the data as columns in a table. Once values are calculated, right-click the statistics cells for each field to access additional functionality related to the statistics. Some of this functionality uses geoprocessing tools that modify the input data. If the data is not editable, make an editable copy of it before you begin data engineering.

Note:

How results in the statistics table are rounded depends on the magnitude of the value. At least one decimal place is shown for statistics with decimal places, but additional decimal places are only added if it maintains rounding errors under 1 percent.

StatisticDescriptionApplicable data typesMenu options

Number of Nulls

A count of the number of records containing null values in the field.

To select records containing null values, right-click cells in this column.

Note:

If the symbology of the layer is not configured to display null values, the selection may not appear on the map. Configure symbology to show values out of range to display features with nulls.

Numeric, Text, Date

  • Select (select rows containing nulls for the specified field)
  • Fill Missing Values (for feature layers only)

Chart Preview

A visual representation of the distribution of values in the field.

Histograms are displayed for numeric fields (short, long, float, double), bar charts for categorical fields (text), and line charts for date type fields.

Use the chart preview column to perform an initial exploration. To create charts for fields of interest, right-click cells in this column.

Note:

Histograms and line charts are displayed with 20 bins by default. Depending on the sparsity of the data, there may be bins that contain no data, and bins with empty values are treated as zero in the chart preview. To change the level of detail, right-click the chart preview and create a chart.

Numeric, Text, Date

Minimum

The smallest value in the field.

To select records containing the minimum value, right-click cells in this column.

Numeric, Date

Maximum

The largest value in the field.

To select records containing the maximum value, right-click cells in this column.

Numeric, Date

Mean

The mean of all values in the field.

The mean is the average value in a distribution, calculated as the sum of the values divided by the total count of values in the field. The mean is the most common measure of central tendency in a distribution.

To calculate the mean date for date fields, each date is converted to a number by calculating the difference between the date and a reference date (for example, 1900-01-01), calculated in milliseconds. The sum of all millisecond values divided by the amount of date values provides the mean date, which is rounded to the nearest second for display purposes.

Note:

The mean date may not be in the same temporal resolution (i.e., minutes, seconds, milliseconds) as the values in the field.

To select records containing values above and below the mean, right-click cells in this column.

Numeric, Date

  • Select (select rows above or below mean)

Standard Deviation

The standard deviation for values in the field.

The standard deviation is a measure of the spread of the distribution. It is calculated as the square root of variance, in which the variance is the average of the squared difference of each value from the mean of the field.

Numeric

Median

The median for all values in the field.

The median is the middle value in the sorted list of values. If there is an even number of values, the median is the mean between the two middle values in the distribution.

To select records containing values above the median and values below the median, right-click cells in this column.

Numeric, Date

  • Select (select rows above or below the median)

Count

The count of the number of nonnull values in the field.

Numeric, Text, Date

  • Select (select the rows that were part of the calculated statistics)

Number of Unique Values

The number of unique values in the field.

Numeric, Text, Date

No unique actions

Mode

The mode for all values in the field.

The mode is the most frequently occurring value in the field. In the case of ties, when the most frequently occurring value in a field corresponds to multiple values, the cell displays <Multiple Values>.

To select records containing the mode, right-click cells in this column.

Numeric, Text, Date

  • Select (select the rows with mode value. For integer, text, and date fields only)

Least Common

The least common value in the field.

In the case of ties, when the least common value in a field corresponds to multiple values, the cell displays <Multiple Values>.

To select records containing the least common value, right-click cells in this column.

Numeric, Text, Date

  • Select (select the rows with the least common value. For integer, text, and date fields only)

Outliers

The number of records with outlier values in the field.

Outliers are values that are more than 1.5 times the interquartile range above the third quartile or below the first quartile of the selected field.

To select records containing the outlier values, right-click cells in this column.

Numeric

  • Select (select the rows of outliers)

Sum

The sum of all values in the field.

Numeric

No unique actions

Range

The difference between the smallest and largest values in the field.

Numeric

No unique actions

Interquartile Range

The range between the first quartile and the third quartile values in the field.

Quartiles divide the sorted list of values into four groups containing equal numbers of values. The first quartile value is the upper limit of the first group in ascending order, and the third quartile is the upper limit of the third group.

To select records containing values within this range, right-click cells in this column.

Numeric

First Quartile

The value of the first quartile in the field.

Quartiles divide the sorted list of values into four groups containing equal numbers of values. The first quartile value is the upper limit of the first group in ascending order.

In the case of ties, the mean of all corresponding values is shown.

To select records containing values above and below the first quartile, right-click cells in this column.

Numeric, Date

Third Quartile

The value of the third quartile in the field.

Quartiles divide the sorted list of values into four groups containing equal numbers of values. The third quartile is the upper limit of the third group.

In the case of ties, the mean of all corresponding values is shown.

To select records containing values above and below the third quartile, right-click cells in this column.

Numeric, Date

Coefficient of Variation

The coefficient of variation for values in the field.

The coefficient of variation is a measure of the relative spread of the values. It is calculated as the standard deviation divided by the mean of the field.

Unlike the standard deviation, which must always be considered in the context of the range of the data, the coefficient of variation provides a way to compare data series with different ranges and means.

Numeric

Skewness

The skewness of values in the field.

Skewness measures the symmetry of the distribution. Skewness is zero (or close to zero) if the distribution is symmetrical on both sides, as seen in a normal distribution. Distributions with longer tails on the left have negative skewness, and distributions with longer tails on the right have positive skewness.

The skewness is calculated as the third moment (the average of the cubed data values) divided by the cubed standard deviation.

Numeric

Kurtosis

The kurtosis of values in the field.

Kurtosis describes the heaviness of the tails of a distribution compared to the tails of a normal distribution, helping identify the frequency of extreme values. Distributions with kurtosis less than three have lighter tails and fewer extreme values than the normal distribution, and distributions with kurtosis greater than three have heavier tails and more extreme values than the normal distribution.

The kurtosis is calculated as the fourth moment (the expected value of the data values taken to the fourth power) divided by the fourth power of the standard deviation.

Numeric

Interactive statistics table

The statistics table is interactive. Right-click cells and headers and use the toolbar to access functionality.

Interact with fields

Right-click a row header to access functionality applicable to the selected field such as the following:

  • Create Chart—Create charts using the selected field. Recommendations are provided based on the data type.
  • Clean, Construct, Integrate, and Format—Access geoprocessing tools to prepare the data. See Prepare data to learn more about these options.
  • Remove Field—Remove the field from the statistics table.

Note:

Most geoprocessing operations that modify the input data cannot be undone.

Functionality options for a row in the statistics table

Display specific data types

The statistics table toolbar includes options to designate which fields and statistics columns are displayed based on data type.

Filter fields by data type

For example, you can click the Text option to remove fields of data type text.

When you remove data types from the statistics table, the columns that are unique to the removed data type are also removed. This can make it easier to review the table for items of interest. For example, if you display only fields of type date, columns that describe distributions, such as skewness and kurtosis, are omitted, so the number of columns is reduced to only those of interest.

Sort, hide, freeze, and reorder columns

The options for the column headers allow you to sort, hide, and freeze the columns in the table.

Options for the Number of Nulls column in the statistics table

Sorting allows you to reorder the rows by the value in the calculated statistic. For example, you can sort fields by the Number of Null Values column to learn which fields may have missing data.

Note:

You can only sort if the table contains fields with a single data type. Use the display options on the toolbar to filter to a specific data type; then sort.

To hide columns, click Hide Column. This removes the column from view. To show all hidden columns, click Show all columns.

Show all columns option

Click Freeze/Unfreeze to move the column to the beginning of the statistics table and lock it in place so that the column displays as you scroll the table horizontally.

To reorder the columns, click and drag a column header to the new location.

References

Related topics