Box plots allow you to visualize and compare the distribution and central tendency of numeric values through their quartiles. Quartiles are a way of splitting numeric values into four equal groups based on five key values: minimum, first quartile, median, third quartile, and maximum.
The box portion of the chart illustrates the middle 50 percent of the data values, also known as the interquartile range, or IQR. The median of the values is depicted as a line inside the box; the line sits at the center of the box only when the values are spread symmetrically between the first and third quartiles. The IQR illustrates the variability in a set of values. A large IQR indicates a large spread in values, while a smaller IQR indicates most values fall near the center. Box plots also illustrate the minimum and maximum data values through whiskers extending from the box, and optionally, outliers as points extending beyond the whiskers.
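The five key values and the IQR can be illustrated with a minimal Python sketch. The function name `five_number_summary` and the sample values are hypothetical, and the quartile method used here (the standard library's "inclusive" interpolation) is an assumption; the charting software may compute quartiles slightly differently.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    ordered = sorted(values)
    # Assumption: "inclusive" interpolation; other quartile conventions exist.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Made-up sample values for illustration only.
values = [2, 4, 4, 5, 7, 9, 10, 12, 15]
minimum, q1, median, q3, maximum = five_number_summary(values)

# The box spans q1 to q3; its height is the interquartile range.
iqr = q3 - q1
print(minimum, q1, median, q3, maximum, iqr)
```

The whiskers in this sketch would extend from the box to `minimum` and `maximum`; outlier detection is not modeled here.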
Box plots are composed of an x-axis and a y-axis. The x-axis assigns one box for each Category value or Numeric field. The y-axis is used to measure the minimum, first quartile, median, third quartile, and maximum value in a set of numbers.
Box plots can be used to visualize one or many distributions. To visualize a single distribution, add one Numeric field. This results in a chart with one box plot visualizing the distribution of the chosen numeric attribute.
Additional Numeric fields can be added to compare multiple distributions from different attribute fields in a table. For example, in a county dataset, Population2010 and Population2015 are added as Numeric fields. The resulting chart will display two box plots, one visualizing the distribution of Population2010, and one visualizing the distribution of Population2015 for all counties in the dataset.
When only a single Numeric field is added, the option of adding a Category variable is available as a way of comparing distributions across different categories. For example, Population2010 is set as the Numeric field and StateName as the Category for a county dataset. The resulting chart will display a box plot for each state, visualizing the distribution of Population2010 for all counties belonging to each state.
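As a rough sketch of what a Category field does, the numeric values can be grouped by category so that each group becomes one box. The rows, their values, and the helper name `group_values` are hypothetical and only borrow the field names from the example above.

```python
from collections import defaultdict
import statistics

# Hypothetical county rows; the population values are made up.
counties = [
    {"StateName": "Ohio", "Population2010": 120},
    {"StateName": "Ohio", "Population2010": 80},
    {"StateName": "Ohio", "Population2010": 100},
    {"StateName": "Utah", "Population2010": 30},
    {"StateName": "Utah", "Population2010": 50},
]

def group_values(rows, category_field, numeric_field):
    """Collect the numeric values of each category; one list per box."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_field]].append(row[numeric_field])
    return dict(groups)

by_state = group_values(counties, "StateName", "Population2010")

# Each state's list is summarized independently, for example by its median.
medians = {state: statistics.median(vals) for state, vals in by_state.items()}
print(medians)
```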
Multiple series box plots can be used to compare distributions of different types, or by different categories.
Multiple series box plots can be created by specifying a Category field and multiple Numeric fields, or by specifying a Split by category field.
When using a Category variable with multiple Numeric fields, each Numeric field added to the series table will create a new series. For example, in a county dataset StateName is set as the Category and Population2010, Population2015, and Population2020 are set as the Numeric fields. The resulting chart will have states as categories along the x-axis, with three series each (Population2010, Population2015, and Population2020).
Alternatively, a Split by variable can be added as a way to further divide the data and create multiple series. For example, Population2010 is set as the Numeric field, StateName as the Category, and ElectionWinner as a Split by field for a county dataset. The Series table will populate with each unique ElectionWinner value (Democrat or Republican). The resulting chart will display two side-by-side box plots for each state (100 box plots in total for 50 states), one visualizing the distribution of Population2010 of all counties in each state with the ElectionWinner value of Democrat, and one for all counties in each state with the ElectionWinner value of Republican.
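One way to picture the Split by behavior is as grouping on the pair (Category value, Split by value), with each pair producing one box. The sketch below assumes that interpretation; the rows and the helper name `split_series` are hypothetical.

```python
from collections import defaultdict

# Hypothetical county rows; the values are made up for illustration.
counties = [
    {"StateName": "Ohio", "ElectionWinner": "Democrat", "Population2010": 120},
    {"StateName": "Ohio", "ElectionWinner": "Democrat", "Population2010": 80},
    {"StateName": "Ohio", "ElectionWinner": "Republican", "Population2010": 100},
    {"StateName": "Utah", "ElectionWinner": "Democrat", "Population2010": 30},
    {"StateName": "Utah", "ElectionWinner": "Republican", "Population2010": 50},
    {"StateName": "Utah", "ElectionWinner": "Republican", "Population2010": 70},
]

def split_series(rows, category_field, split_field, numeric_field):
    """Group values by (category, split value); each key is one box."""
    boxes = defaultdict(list)
    for row in rows:
        key = (row[category_field], row[split_field])
        boxes[key].append(row[numeric_field])
    return dict(boxes)

boxes = split_series(counties, "StateName", "ElectionWinner", "Population2010")

# Two states x two ElectionWinner values -> four boxes in this toy dataset.
print(len(boxes))
```

With two Split by values, the number of boxes is twice the number of categories that contain both values.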
Split by fields can also be used when multiple Numeric fields are used instead of a Category variable. For example, Population2010, Population2015, and Population2020 are set as the Numeric fields and ElectionWinner as the Split by field for a county dataset. The resulting chart will display the three Numeric fields along the x-axis (Population2010, Population2015, and Population2020), each with two side-by-side box plots: one displaying the distribution for all counties with the ElectionWinner value of Democrat, and the other for all counties with the ElectionWinner value of Republican.
Display multiple series
When a Split by field is used to create multiple series, there are two options for visualizing the results.
- Show as multiple box plots—Create side-by-side box plots, one for each series.
- Show as mean lines—Create one box plot for each Category value or Numeric field and use lines to show the mean for each unique value in the Split by field.
For example, Population2010 is set as the Numeric field, StateName as the Category, and ElectionWinner as a Split by field for a county dataset. The Series table will populate with each unique ElectionWinner value (Democrat and Republican). Instead of splitting each state into a box plot for each ElectionWinner value, the resulting chart will display one box plot for each state, visualizing the distribution of Population2010 for counties within that state. The mean value of each Split by series (Democrat and Republican) is overlaid on the box plots, showing where the mean of each series falls in relation to the total distribution.
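The mean lines option can be sketched as follows: each category keeps all of its values for a single box, and a separate mean is computed for each Split by value. This assumes the means are computed per category for each series; the rows and the helper name `mean_line_summary` are hypothetical.

```python
from collections import defaultdict
import statistics

# Hypothetical county rows; the values are made up for illustration.
counties = [
    {"StateName": "Ohio", "ElectionWinner": "Democrat", "Population2010": 120},
    {"StateName": "Ohio", "ElectionWinner": "Democrat", "Population2010": 80},
    {"StateName": "Ohio", "ElectionWinner": "Republican", "Population2010": 100},
    {"StateName": "Utah", "ElectionWinner": "Democrat", "Population2010": 30},
    {"StateName": "Utah", "ElectionWinner": "Republican", "Population2010": 50},
    {"StateName": "Utah", "ElectionWinner": "Republican", "Population2010": 70},
]

def mean_line_summary(rows, category_field, split_field, numeric_field):
    """Per category: all values (one box) plus the mean of each split series."""
    all_values = defaultdict(list)
    series_values = defaultdict(lambda: defaultdict(list))
    for row in rows:
        category, series = row[category_field], row[split_field]
        all_values[category].append(row[numeric_field])
        series_values[category][series].append(row[numeric_field])
    return {
        category: {
            "box_values": values,
            "series_means": {
                series: statistics.mean(vals)
                for series, vals in series_values[category].items()
            },
        }
        for category, values in all_values.items()
    }

summary = mean_line_summary(
    counties, "StateName", "ElectionWinner", "Population2010"
)
print(summary["Utah"]["series_means"])
```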
When a box plot is created from multiple Numeric fields, a z-score standardization is applied by default. Standardization allows for numeric variables of different units to be comparable.
For example, a box plot comparing the distributions of income (with values in the tens of thousands) and unemployment rate (values ranging between 0 and 1.0) would be difficult to read without standardization because the unemployment rate values are so much smaller than the income values.
Standardization of the attribute values involves a z-transform: the mean of all values in a field is subtracted from each value, and the result is divided by the standard deviation of all values in that field. The z-score standardization puts all the attributes on the same scale, allowing multiple distributions to be visualized in the same chart. To visualize the raw values instead, uncheck the Standardize values (z-score) check box in the Chart Properties pane.
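The z-transform can be sketched in a few lines of Python. The helper name `z_scores` and the sample income and unemployment values are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption; the source does not say which one the chart uses.

```python
import statistics

def z_scores(values):
    """Standardize values: subtract the mean, then divide by the std dev."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation.
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize values with zero variance")
    return [(value - mean) / std for value in values]

# Made-up values on very different scales.
income = [30000, 40000, 50000, 60000, 70000]
unemployment = [0.03, 0.04, 0.05, 0.06, 0.07]

income_z = z_scores(income)
unemployment_z = z_scores(unemployment)

# After standardization both fields have mean 0 and standard deviation 1,
# so their boxes can share one y-axis.
print([round(z, 3) for z in income_z])
print([round(z, 3) for z in unemployment_z])
```

Because each field is standardized separately, the y-axis then shows standard deviations from each field's own mean rather than the original units.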
Default y-axis bounds are set based on the range of data values represented on the y-axis. These values can be customized by typing in a new axis bound value. Setting axis bounds can be used as a way to keep the scale of your chart consistent for comparison. Clicking the reset icon will revert the axis bound back to the default value.
You can format the way an axis displays numeric values by specifying a number format category or by defining a custom format string. For example, $#,### can be used as a custom format string to display currency values.
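For readers more familiar with Python, the effect of a format string such as $#,### on typical values is roughly comparable to the format specification below. This is only an analogy: the helper name `format_currency` is hypothetical, and the chart's format strings follow their own rules, which may differ for edge cases such as zero or negative values.

```python
def format_currency(value):
    """Format a number with a dollar sign, thousands separators, no decimals."""
    return f"${value:,.0f}"

print(format_currency(1234567))
```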
Titles and description
Charts and axes are given default titles based on the variable names and chart type. These can be edited on the General tab in the Chart Properties pane. You can also provide a chart Description, which is a block of text that appears at the bottom of the chart window.
You can configure the look of your chart by formatting text and symbol elements, or by applying a chart theme. Format properties can be configured on the Format tab in the Chart Properties pane, or through the Chart Format context ribbon. Chart formatting options include the following:
- Size, color, and style of the font used for axis titles, axis labels, description text, legend title, legend text, and guide labels
- Color, width, and line type for grid and axis lines
- Background color of the chart
Box plots match the outline and fill colors defined in the layer symbology whenever possible. When series are split in a way that does not correspond with the layer symbology, a standard color palette will be applied. Series colors can be changed on the Series tab in the Chart Properties pane by clicking the Symbol color patch in the Series table and choosing a new color.
Boxes can be drawn horizontally by clicking the Rotate chart button in the chart window.
Guide lines or ranges can be added to charts as a reference or way to highlight significant values. To add a new guide, navigate to the Guides tab in the Chart Properties pane and click Add guide. To draw a line, enter a Value where you would like the line to draw. To create a range, enter a to value. You can optionally add text to your guide by specifying a Label.
Create a box plot to compare the distributions and variability of different chronic health conditions by state.
- Numeric fields—% Diabetes, % Asthma, % Heart Failure