Box plots allow you to visualize and compare the distribution and central tendency of numeric values through their quartiles. Quartiles are a way of splitting numeric values into four equal groups based on five key values: minimum, first quartile, median, third quartile, and maximum.
The box portion of the chart illustrates the middle 50 percent of the data values, also known as the interquartile range, or IQR. The median of the values is depicted as a line dividing the box in two. The IQR illustrates the variability in a set of values. A large IQR indicates a large spread in values, while a smaller IQR indicates most values fall near the center. Box plots also illustrate the minimum and maximum data values through whiskers extending from the box, and optionally, outliers as points extending beyond the whiskers.
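The five key values and the IQR described above can be computed directly. The following is a minimal sketch using Python's standard library, not the charting tool's own implementation. The function name five_number_summary and the sample values are hypothetical, and the "inclusive" quartile method is an assumption; other quartile conventions give slightly different first and third quartile values.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # Assumption: the "inclusive" method, which treats the data as the
    # full population. The charting tool may use a different convention.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical data values, for illustration only.
populations = [12, 15, 18, 20, 22, 25, 30, 41, 95]

minimum, q1, median, q3, maximum = five_number_summary(populations)

# The IQR is the height of the box: the spread of the middle 50 percent.
iqr = q3 - q1

print("min:", minimum, "Q1:", q1, "median:", median, "Q3:", q3, "max:", maximum)
print("IQR:", iqr)
```

In this sample the median sits closer to the first quartile than to the third, so the median line would not fall at the center of the box.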
Box plots are composed of an x-axis and a y-axis. The x-axis assigns one box for each Category or Number Field. The y-axis is used to measure the minimum, first quartile, median, third quartile, and maximum value in a set of numbers.
Box plot series can be created in one of two ways:
- Create multiple series from a split field
- Create multiple series from multiple fields
Create multiple series from a split field
The Create multiple series from a split field option allows you to use multiple Number Fields, or a single Number Field with a Category variable, to create a series. To split that series into multiple series, specify a Split by variable.
Box plots can be used to visualize one or many distributions. To visualize a single distribution, specify a numeric attribute for the Number Fields. This results in a chart with one box plot used to visualize the distribution of the chosen numeric attribute.
Additional numeric attributes can be added to the Number Fields to compare multiple distributions from different attribute fields in a table. For example, in a county dataset, Population2010 and Population2015 are set as Number Fields. The resulting chart will display two box plots, one visualizing the distribution of Population2010, and one visualizing the distribution of Population2015.
When only a single numeric attribute is selected as a Number Field, the option of adding a Category variable is available as a way of comparing distributions across different categories. For example, Population2010 is set as the Number Field and StateName as the Category for a county dataset. The resulting chart will display a box plot for each state, visualizing the distribution of Population2010 for all counties belonging to each state.
Additionally, a Split by variable can be added as a way to further divide the data and create multiple series. For example, Population2010 is set as the Number Field, StateName as the Category, and ElectionWinner as a Split by field for a county dataset. The Series table will populate with each unique ElectionWinner value (Democrat, Republican). The resulting chart will display two side-by-side box plots for each state (100 box plots total), one visualizing the distribution of Population2010 of all counties with the ElectionWinner value of Democrat, and one for all counties with the ElectionWinner value of Republican.
Split by fields can also be used when multiple Number Fields are used instead of a Category variable. For example, Population2010, Population2015, and Population2020 are set as the Number Fields and ElectionWinner as the Split by field for a county dataset. The resulting chart will display the three Number Fields along the x-axis (Population2010, Population2015, Population2020), each with two side-by-side box plots, one displaying the distribution for all counties with the ElectionWinner value of Democrat, and the other for all counties with the ElectionWinner value of Republican.
When a Split by field is used to create multiple series, there are two options for visualizing the results. Show as multiple box plots will result in a chart with side-by-side box plots, one for each series. Alternatively, Show as mean lines will create one box plot for each Category value or Number Field and use lines to show the mean for each unique value in the Split by field. For example, Population2010 is set as the Number Field, StateName as the Category, and ElectionWinner as a Split by field for a county dataset. The Series table will populate with each unique ElectionWinner value (Democrat, Republican). Instead of splitting each state into a box plot for each ElectionWinner value, the resulting chart will display one box plot for each state, visualizing the distribution of Population2010 for counties within that state. The mean value of each Split by series (Democrat, Republican) will be overlaid on the box plots, showing where the mean value of each series falls in relation to the total distribution.
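The difference between the two display options comes down to how the records are grouped. The sketch below illustrates that grouping with plain Python and is not how the charting tool is implemented. The county records, state names, and population values are hypothetical; only the field names (StateName, ElectionWinner, Population2010) come from the example above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical county records: (StateName, ElectionWinner, Population2010)
counties = [
    ("Ohio", "Democrat", 1200),
    ("Ohio", "Republican", 300),
    ("Ohio", "Republican", 500),
    ("Utah", "Democrat", 900),
    ("Utah", "Republican", 100),
    ("Utah", "Republican", 200),
]

by_category = defaultdict(list)  # values grouped by Category only
by_series = defaultdict(list)    # values grouped by Category and Split by
for state, winner, population in counties:
    by_category[state].append(population)
    by_series[(state, winner)].append(population)

# Show as multiple box plots: one distribution per (state, winner) pair,
# drawn side by side within each state.
multiple_box_plots = {key: sorted(vals) for key, vals in by_series.items()}

# Show as mean lines: one distribution per state, with one mean per
# Split by value overlaid on that state's box plot.
single_box_plots = {state: sorted(vals) for state, vals in by_category.items()}
mean_lines = {key: mean(vals) for key, vals in by_series.items()}

for (state, winner), value in sorted(mean_lines.items()):
    print(state, winner, "mean:", value)
```

With two states and two ElectionWinner values, the first option yields four distributions, while the second yields two distributions and four overlaid means.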
Create multiple series from multiple fields
The Create multiple series from multiple fields option requires a Category variable and one or more Numeric Fields. Each Numeric Field added to the Series table will create a new series. For example, StateName is set as the Category, and Population2010, Population2015, and Population2020 are set as the Numeric Fields in the Series table for a county dataset. The resulting chart will have states as categories along the x-axis, with three series each (Population2010, Population2015, and Population2020).
When a box plot is created from multiple Number Fields, a z-score standardization is applied by default. Standardization allows for numeric variables of different units to be comparable.
For example, a box plot comparing the distributions of income (with values in the tens of thousands) and unemployment rate (values ranging between 0 and 1.0) would be difficult to read without standardization because the unemployment rate values are so much smaller than the income values.
Standardization of the attribute values involves a z-transform: the mean of all values in an attribute is subtracted from each value, and the result is divided by the standard deviation of all values in that attribute.
The z-score standardization puts all the attributes on the same scale, allowing multiple distributions to be visualized in the same chart.
If you wish to visualize the raw values instead, simply uncheck the Standardize values (z-score) checkbox in the Chart Properties pane.
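The z-transform can be illustrated with a short sketch, assuming the income and unemployment rate example above. The function name z_scores and the sample values are hypothetical. The use of the population standard deviation is also an assumption, since the text does not state whether the population or sample form is applied.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Subtract the mean from each value, then divide by the standard deviation."""
    mu = mean(values)
    # Assumption: population standard deviation. A sample standard
    # deviation (statistics.stdev) would rescale the results slightly.
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize values with zero variance")
    return [(value - mu) / sigma for value in values]


# Hypothetical attribute values on very different scales.
income = [30000, 40000, 50000, 60000, 70000]
unemployment_rate = [0.03, 0.04, 0.05, 0.06, 0.07]

income_z = z_scores(income)
unemployment_z = z_scores(unemployment_rate)

print([round(z, 3) for z in income_z])
print([round(z, 3) for z in unemployment_z])
```

After standardization, each attribute has a mean of 0 and a standard deviation of 1, so the two distributions can share one y-axis even though the raw values differ by several orders of magnitude.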
Default minimum and maximum y-axis bounds are set based on the range of data values represented on the axis. These values can be customized by typing in a new desired axis bound value. Clicking the reset icon will revert the axis bound back to the default value.
You can format the way an axis will display numeric values by specifying a number format category or by defining a custom format string.
Titles and description
Charts and axes are given default titles based on the variable names and chart type. These can be edited on the General tab in the Chart Properties pane. You can also provide a chart Description, which is a block of text that appears at the bottom of the chart window.
When a chart window is active, a Chart Format context ribbon becomes available, allowing visual formatting of the chart. Chart formatting options include the following:
- Changing the size, color, and style of the font used for axis titles, axis labels, description text, and legend text
- Changing the color, width, and line type for grid and axis lines
- Changing the background color of the chart
Box plots will match the outline and fill colors defined in the layer symbology whenever possible. When series are split in a way that does not correspond with the layer symbology, a standard color palette will be applied. Series colors can be changed by clicking the Symbol color swatch in the Series table and choosing a new color.
Create a box plot to compare the distributions and variability of different chronic health conditions by state.
- Create multiple series from multiple fields
- Fields—% Diabetes, % Asthma, % Heart Failure