Box plot—ArcGIS Pro | Documentation

Box plots allow you to visualize and compare the distribution and central tendency of numeric values through their quartiles. Quartiles split a set of numeric values into four groups that each contain about one quarter of the values. A box plot summarizes these groups with five key values: minimum, first quartile, median, third quartile, and maximum. Box plots use a percentile calculation to determine quartile values. For example, the first quartile is equal to the 25th percentile, the median to the 50th percentile, and the third quartile to the 75th percentile.
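To make the five key values concrete, here is a minimal sketch that computes them for a list of numbers. It assumes linear-interpolation percentiles (Python's `statistics.quantiles` with `method="inclusive"`); this page does not state which interpolation ArcGIS Pro uses, so quartiles for small datasets may differ slightly from what the chart shows. The function name `five_number_summary` is hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two numbers.

    Assumption: quartiles are linear-interpolation percentiles
    (25th, 50th, 75th), as computed by method="inclusive".
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

print(five_number_summary(range(1, 10)))  # (1, 3.0, 5.0, 7.0, 9)
```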

The box portion of the diagram below illustrates the middle 50 percent of the data values, also known as the interquartile range (IQR), which spans from the first quartile to the third quartile. The median of the values is depicted as a line that divides the box into two parts. The IQR illustrates the variability in a set of values: a large IQR indicates a wide spread of values, and a small IQR indicates that most values fall near the center. Box plots also illustrate the minimum and maximum data values through whiskers, or lines, extending from the box, and, optionally, outliers as points beyond the whiskers. Outliers are defined as values that fall more than 1.5 times the IQR below the first quartile or above the third quartile.
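The IQR and the 1.5 × IQR outlier rule can be sketched as follows. As an assumption, the whiskers here end at the most extreme values that are not outliers, which is a common convention when outliers are drawn separately; the quartile interpolation is also assumed, as above. The name `box_stats` and the dictionary keys are hypothetical.

```python
from statistics import quantiles

def box_stats(values, k=1.5):
    """Compute the box, whisker ends, and outliers for a list of numbers."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                      # height of the box
    low_fence = q1 - k * iqr           # values below this are outliers
    high_fence = q3 + k * iqr          # values above this are outliers
    outliers = [v for v in data if v < low_fence or v > high_fence]
    inliers = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inliers[0],     # assumed: smallest non-outlier
        "whisker_high": inliers[-1],   # assumed: largest non-outlier
        "outliers": outliers,
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(stats["iqr"], stats["outliers"])  # 4.5 [50]
```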

Variables

Box plots are composed of an x-axis and a y-axis. The x-axis displays one box for each unique value of the Category variable or for each Numeric field variable. The y-axis measures the minimum, first quartile, median, third quartile, and maximum values in each set of numbers.

You can use box plots to visualize one or many distributions. To visualize a single distribution, add one Numeric field variable. This results in a chart with one box plot visualizing the distribution of the specified numeric attribute.

You can add more Numeric field variables to compare multiple distributions from different attribute fields in a table. For example, in a county dataset, Population2010 and Population2015 fields are added as Numeric field variables. The resulting chart displays two box plots, one visualizing the distribution of Population2010, and the other visualizing the distribution of Population2015, for all counties in the dataset.

When only a single Numeric field variable is added, you can add a Category variable as a method of comparing distributions across categories. For example, Population2010 is set as the Numeric field variable and StateName is set as the Category variable for a county dataset. The resulting chart displays a box plot for each state, visualizing the distribution of Population2010 for all counties belonging to each state.
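The two variable configurations described above differ only in how values are grouped into boxes. The sketch below uses hypothetical county records with toy values (field names borrowed from the examples) and shows one grouping per Numeric field, and one grouping per unique Category value for a single Numeric field. The function names are illustrative, not part of ArcGIS Pro.

```python
from collections import defaultdict

# Hypothetical county records with toy values.
counties = [
    {"StateName": "Ohio", "Population2010": 10, "Population2015": 12},
    {"StateName": "Ohio", "Population2010": 30, "Population2015": 33},
    {"StateName": "Utah", "Population2010": 20, "Population2015": 25},
    {"StateName": "Utah", "Population2010": 40, "Population2015": 41},
    {"StateName": "Utah", "Population2010": 60, "Population2015": 58},
]

def boxes_per_field(records, fields):
    """One box per Numeric field variable, using every record."""
    return {f: [r[f] for r in records] for f in fields}

def boxes_per_category(records, field, category):
    """One box per unique Category value, for a single Numeric field."""
    groups = defaultdict(list)
    for r in records:
        groups[r[category]].append(r[field])
    return dict(groups)

print(boxes_per_field(counties, ["Population2010", "Population2015"]))
print(boxes_per_category(counties, "Population2010", "StateName"))
```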

Multiple series

You can use multiple series box plots to compare distributions from different Numeric field variables or from different categories.

Multiple series box plots can be created by specifying a Category variable and multiple Numeric field variables, or by specifying a Split by category field.

When you use a Category variable with multiple Numeric field variables, each Numeric field variable added to the Series table creates a series. For example, in a county dataset, StateName is set as the Category variable and Population2010, Population2015, and Population2020 are set as the Numeric field variables. The resulting chart has states as categories along the x-axis, each with three series (Population2010, Population2015, and Population2020).
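A minimal sketch of this grouping, assuming records shaped like the county example: each category value gets one box per Numeric field, so the number of boxes is the number of categories times the number of fields. The records and the `series_by_field` name are hypothetical.

```python
from collections import defaultdict

def series_by_field(records, category, fields):
    """Return {category value: {field name: [values]}}; each field is a series."""
    boxes = defaultdict(lambda: {f: [] for f in fields})  # fresh dict per category
    for r in records:
        for f in fields:
            boxes[r[category]][f].append(r[f])
    return {cat: dict(series) for cat, series in boxes.items()}

# Hypothetical records with toy values.
records = [
    {"StateName": "Ohio", "Population2010": 10, "Population2015": 12, "Population2020": 13},
    {"StateName": "Ohio", "Population2010": 30, "Population2015": 33, "Population2020": 35},
    {"StateName": "Utah", "Population2010": 20, "Population2015": 25, "Population2020": 29},
]
fields = ["Population2010", "Population2015", "Population2020"]
print(series_by_field(records, "StateName", fields))
```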

Alternatively, you can add a Split by category field to further divide the data and create multiple series. For example, Population2010 is set as the Numeric field variable, StateName as the Category variable, and ElectionWinner as the Split by category field for a county dataset. The Series table is populated with each unique ElectionWinner value (Democrat and Republican). The resulting chart displays two side-by-side box plots for each state (100 box plots total): one visualizing the distribution of Population2010 for all counties in the state with an ElectionWinner value of Democrat, and one for all counties in the state with an ElectionWinner value of Republican.

You can also use a Split by category field when multiple Numeric field variables are used instead of a Category variable. For example, Population2010, Population2015, and Population2020 are set as the Numeric field variables, and ElectionWinner is set as the Split by category field for a county dataset. The resulting chart displays the three Numeric field variables along the x-axis (Population2010, Population2015, and Population2020), each with two side-by-side box plots: one showing the distribution for all counties with an ElectionWinner value of Democrat, and the other for all counties with an ElectionWinner value of Republican.
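Both Split by category configurations can be sketched as one grouping keyed by (x-axis value, split value). With a Category variable, the x-axis value is the category value. Without one, it is the Numeric field name. As an assumption that mirrors the two documented examples, the sketch requires exactly one Numeric field when a Category variable is set. The records and the `split_series` name are hypothetical.

```python
from collections import defaultdict

def split_series(records, fields, split_field, category=None):
    """Return {(x-axis value, split value): [values]}, one entry per box."""
    if category is not None and len(fields) != 1:
        raise ValueError("use one Numeric field when a Category variable is set")
    boxes = defaultdict(list)
    for r in records:
        for f in fields:
            # The x-axis shows category values if a Category is set, else field names.
            x_value = r[category] if category is not None else f
            boxes[(x_value, r[split_field])].append(r[f])
    return dict(boxes)

# Hypothetical records with toy values.
records = [
    {"StateName": "Ohio", "ElectionWinner": "Democrat", "Population2010": 10, "Population2015": 11},
    {"StateName": "Ohio", "ElectionWinner": "Republican", "Population2010": 30, "Population2015": 31},
    {"StateName": "Utah", "ElectionWinner": "Republican", "Population2010": 20, "Population2015": 22},
]
print(split_series(records, ["Population2010"], "ElectionWinner", category="StateName"))
print(split_series(records, ["Population2010", "Population2015"], "ElectionWinner"))
```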

Display multiple series

When you use a Split by category field to create multiple series, you have the following options for visualizing the results:

Side-by-side —Create side-by-side box plots, one for each series.
As mean lines —Create one box plot for each Category variable or Numeric field variable, and use lines to show the mean for each unique value in the Split by category field.

For example, Population2010 is set as the Numeric field variable, StateName as the Category variable, and ElectionWinner as the Split by category field for a county dataset. The Series table is populated with each unique ElectionWinner value (Democrat and Republican). Instead of splitting each state into one box plot per ElectionWinner value, the resulting chart displays one box plot for each state, visualizing the distribution of Population2010 for all counties within that state. The mean value of each Split by category series (Democrat and Republican) is overlaid on the box plots as a line, showing where the mean of each series falls in relation to the total distribution.
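The mean-lines option can be sketched as two groupings over the same records: all values per category form the box, and the values per category and split value are reduced to a mean for each overlaid line. The records and the `mean_lines` name are hypothetical. The arithmetic mean is used because the text describes the lines as means.

```python
from collections import defaultdict
from statistics import fmean

def mean_lines(records, field, category, split_field):
    """One box per category, plus the mean of each split-by series in it."""
    boxes = defaultdict(list)                        # category -> all values
    series = defaultdict(lambda: defaultdict(list))  # category -> split value -> values
    for r in records:
        boxes[r[category]].append(r[field])
        series[r[category]][r[split_field]].append(r[field])
    return {
        cat: {
            "values": values,
            "series_means": {s: fmean(v) for s, v in series[cat].items()},
        }
        for cat, values in boxes.items()
    }

# Hypothetical records with toy values.
records = [
    {"StateName": "Ohio", "ElectionWinner": "Democrat", "Population2010": 10},
    {"StateName": "Ohio", "ElectionWinner": "Republican", "Population2010": 30},
    {"StateName": "Utah", "ElectionWinner": "Democrat", "Population2010": 20},
    {"StateName": "Utah", "ElectionWinner": "Democrat", "Population2010": 60},
    {"StateName": "Utah", "ElectionWinner": "Republican", "Population2010": 40},
]
print(mean_lines(records, "Population2010", "StateName", "ElectionWinner"))
```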

Standardization

When you create a box plot from multiple Numeric field variables, a z-score standardization is applied by default. Standardization makes numeric variables measured in different units comparable.

For example, a box plot comparing the distributions of income (with values in the tens of thousands) and unemployment rate (values ranging between 0 and 1.0) would be difficult to read without standardization because the unemployment rate values are so much smaller than the income values.

Standardization of the attribute values uses a z-transform: the mean of a field's values is subtracted from each value, and the result is divided by the standard deviation of that field's values. The z-score standardization puts all the attributes on the same scale, allowing multiple distributions to be visualized in the same chart. To visualize the raw values instead, uncheck the Standardize values (z-score) check box in the Chart Properties pane.
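The z-transform, z = (x − mean) / standard deviation, can be sketched per field as shown below. The page does not say whether ArcGIS Pro uses the population or the sample standard deviation, so the population form (`statistics.pstdev`) is an assumption here. The income and unemployment values are made up to echo the earlier example. A field whose values are all identical has a standard deviation of zero and cannot be standardized this way.

```python
from statistics import fmean, pstdev

def z_scores(values):
    """Subtract the field's mean from each value, then divide by its standard deviation.

    Assumption: population standard deviation (pstdev).
    """
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

income = [30000, 40000, 50000]     # hypothetical values in the tens of thousands
unemployment = [0.04, 0.06, 0.08]  # hypothetical rates between 0 and 1.0
print(z_scores(income))
print(z_scores(unemployment))
```

After standardization, both made-up fields map to approximately [-1.22, 0.0, 1.22], so their boxes can share one y-axis.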

Axes

The options described in the subsections below control the axes and related settings.

X-axis label character limit

Category labels are truncated at 11 characters by default. When labels are truncated, you can hover over the label to view the full text. To display the entire label text in the chart, increase the label character limit.

Y-axis bounds

Default y-axis bounds are set based on the range of data values represented on the y-axis. Customize these values by providing a new axis bound value. You can set axis bounds to keep the scale of a chart consistent for comparison. Click the Reset button to revert the axis bound to the default value.

Grid intervals

Configure grid intervals for the y-axis using the Interval control. By default, the grid interval is calculated automatically.

Number format

You can format the way an axis displays numeric values by specifying a number format category or defining a custom format string. For example, use $#,### as a custom format string to display currency values.
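For readers more familiar with Python than with custom format strings, the sketch below approximates what the `$#,###` example is meant to produce: a dollar sign, thousands separators, and no decimal places. It uses Python's own format specification, not the ArcGIS Pro format-string syntax, and the `format_currency` name is hypothetical. Edge cases such as zero, negative values, and rounding may be rendered differently by ArcGIS Pro.

```python
def format_currency(value):
    """Approximate "$#,###": dollar sign, thousands separators, whole units."""
    return f"${value:,.0f}"

print(format_currency(1234567))  # $1,234,567
```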

Appearance

The options described in the subsections below control the chart appearance and related settings.

Titles and description

The default chart and axis titles are based on the variable names and chart type. You can edit them on the General tab in the Chart Properties pane. You can also provide a value for the Description option, a block of text that appears at the bottom of the chart window.

Visual formatting

You can configure the look of a chart by formatting text and symbol elements, or by applying a chart theme. Format properties can be configured on the Format tab in the Chart Properties pane. A chart theme can be selected on the Chart tab. Chart formatting options include the following:

Size, color, and style of the font used for axis titles, axis labels, description text, legend title, legend text, and guide labels
Color, width, and line type for grid and axis lines
Background color of the chart


Series style

Box plots match the outline and fill colors defined in the layer symbology whenever possible. When series are split in a way that does not correspond with the layer symbology, a standard color palette is applied. You can change series colors on the Series tab in the Chart Properties pane by clicking the Symbol color patch in the Series table and choosing a new color. To apply a common style to multiple series, select multiple rows in the Series table, and click the Symbol color patch for one of the selected series. Alternatively, use the Color scheme drop-down list on the Series tab to apply a palette to the series in a chart.

Sort

Box plots are automatically sorted alphabetically by their categories (x-axis ascending). You can change this using the Sort options in the Chart Properties pane. The following sort options are available for box plots:

X-axis Ascending—Categories are arranged alphabetically from left to right.
X-axis Descending—Categories are arranged in reverse alphabetical order.
Mean Ascending—Boxes are arranged by the mean statistic in ascending order.
Mean Descending—Boxes are arranged by the mean statistic in descending order.
Median Ascending—Boxes are arranged by the median statistic in ascending order.
Median Descending—Boxes are arranged by the median statistic in descending order.
Custom sort—Categories can be arranged manually in the Custom sort table.
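The built-in sort options can be sketched as different sort keys over the same set of boxes. In this sketch, the option names are hypothetical labels, "alphabetical" means Python's default string ordering (case-sensitive, not locale-aware), and ties are left in input order. ArcGIS Pro's tie-breaking and collation are not described on this page.

```python
from statistics import fmean, median

def sort_boxes(boxes, option="x_asc"):
    """Return the categories of {category: [values]} in display order."""
    keys = {
        "x_asc": (lambda c: c, False),
        "x_desc": (lambda c: c, True),
        "mean_asc": (lambda c: fmean(boxes[c]), False),
        "mean_desc": (lambda c: fmean(boxes[c]), True),
        "median_asc": (lambda c: median(boxes[c]), False),
        "median_desc": (lambda c: median(boxes[c]), True),
    }
    key, reverse = keys[option]
    return sorted(boxes, key=key, reverse=reverse)

# Hypothetical boxes: Utah has a high mean (11) but a low median (2).
boxes = {"Utah": [1, 2, 30], "Ohio": [5, 6, 7], "Iowa": [3, 4, 5]}
print(sort_boxes(boxes, "mean_asc"))    # ['Iowa', 'Ohio', 'Utah']
print(sort_boxes(boxes, "median_asc"))  # ['Utah', 'Iowa', 'Ohio']
```

The toy data shows why both statistics are offered: a single large value pulls the mean up without moving the median.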

Orientation

To draw boxes horizontally, click the Rotate chart button in the chart window.

Guides

Guide lines or ranges can be added to charts as a reference or as a way to highlight significant values. To add a new guide, go to the Guides tab in the Chart Properties pane, click the arrow on the Add guide button, and select one of the following options:

Create fixed value line or range guide—Draw a line or range guide at a fixed location. When this option is selected, provide a value for Value where you want the line to be drawn. To create a range, also provide a to value.
Create data-driven guide—Draw a data-driven guide. When this option is selected, use the Value drop-down list to select a field whose values are used to calculate the location of the guide. Select an aggregation option to specify how these values are summarized.

The guide style can be configured using the Line style or Fill color style picker, depending on the guide type. Optionally, add text to the guide by specifying a Label value, and configure the label style by clicking the text swatch adjacent to the input to open the style picker. Data-driven guides always display the guide value (based on the field values and aggregation), and this value will be appended to the end of any text provided for the Label value.
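The behavior of a data-driven guide can be sketched as "aggregate a field's values, then append the result to the label." The aggregation names below are a hypothetical subset, the space between the label and the value is an assumption, and ArcGIS Pro may round or format the displayed value differently.

```python
from statistics import fmean, median

# Hypothetical subset of aggregation options.
AGGREGATIONS = {"mean": fmean, "median": median, "min": min, "max": max, "sum": sum}

def data_driven_guide(values, aggregation="mean", label=""):
    """Return the guide position and its label text (value appended to label)."""
    value = AGGREGATIONS[aggregation](values)
    text = f"{label} {value}" if label else str(value)
    return value, text

print(data_driven_guide([10, 20, 30], "mean", "Average"))  # (20.0, 'Average 20.0')
```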

Example

Create a box plot to compare the distributions and variability of chronic health conditions by state using the following settings:

Numeric fields—% Diabetes, % Asthma, % Heart Failure
Category—State

Box plot comparing the distributions and variability of chronic health conditions by state