Scatter plot

Scatter plots visualize the relationship between two numeric variables, where one variable is displayed on the x-axis, and the other variable is displayed on the y-axis. For each record, a point is plotted where the two variables intersect in the chart. When the resulting points form a nonrandom structure, a relationship exists between the two variables.

Variables

Scatter plots require two numeric variables, one for the x-axis and one for the y-axis. Optionally, a third numeric variable can be specified to size each point in the plot proportionally.

Multiple series

Scatter plots can be displayed with multiple series by setting a Split by category field. For example, in a dataset of crime incidents, a CrimeType field can be used to split the data into multiple series. The Series table will populate with each unique crime type (Theft, Vandalism, and Arson, for example), and the resulting chart will display three scatter plot series.
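Conceptually, splitting by a category field groups the records by each unique value of that field. The following Python sketch illustrates the idea only; the records, the x and y keys, and the split_into_series helper are hypothetical and are not part of the charting software.

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
incidents = [
    {"CrimeType": "Theft", "x": 1.0, "y": 2.0},
    {"CrimeType": "Arson", "x": 2.5, "y": 0.5},
    {"CrimeType": "Theft", "x": 3.0, "y": 4.0},
    {"CrimeType": "Vandalism", "x": 0.5, "y": 1.5},
]


def split_into_series(records, split_field):
    """Group (x, y) points by the unique values of split_field."""
    series = defaultdict(list)
    for record in records:
        series[record[split_field]].append((record["x"], record["y"]))
    return dict(series)


series = split_into_series(incidents, "CrimeType")
# One series per unique crime type: Arson, Theft, Vandalism
print(sorted(series))
```

Each key in the resulting dictionary corresponds to one row in the Series table and one scatter plot series in the chart.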

Display multiple series

To configure a scatter plot with multiple series, use the Display multiple series as option on the Series tab in the Chart Properties pane. By default, multiple series are displayed with the Single chart option. In this representation, all series are drawn in the same plot area, but each series is assigned a unique color to allow comparisons between the different groups.

You can also view a scatter plot with multiple series as a grid chart (also known as small multiples) by selecting the Grid option. This option displays a matrix of smaller charts in which each mini chart shows data for only one series. Grid charts are helpful for comparing trends and patterns between different subgroups in data. You can customize the dimensions of a grid chart layout by setting the Mini charts per row numeric value. For example, setting Mini charts per row to 3 will display a maximum of three charts per row; the total number of rows in the grid is determined by the number of series in the chart. Checking the Show preview chart check box allows you to dynamically explore each mini chart in greater detail by selecting one to view in the larger preview chart.
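The relationship between the number of series, the Mini charts per row value, and the resulting number of rows can be sketched as a ceiling division. The grid_shape function below is a hypothetical helper written for illustration, not a function provided by the software.

```python
import math


def grid_shape(series_count, charts_per_row):
    """Return (rows, columns) for a grid with at most charts_per_row charts per row."""
    if charts_per_row < 1:
        raise ValueError("charts_per_row must be at least 1")
    rows = math.ceil(series_count / charts_per_row)
    columns = min(series_count, charts_per_row)
    return rows, columns


# Seven series with three mini charts per row need three rows;
# the last row holds the single remaining chart.
print(grid_shape(7, 3))
```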

Grid chart example

Tooltip Display Field

The Tooltip Display Field drop-down menu can be used to show values for a specific field in the tooltip for each scatter plot point. For example, when plotting housing_cost against crime_rate, it may be helpful to select neighborhood for the Tooltip Display Field value so that the name of the neighborhood is displayed when you hover over an individual point.

Statistics

A regression equation is calculated and the associated trend line is plotted on scatter plots. The trend line models the relationship between the two variables, with both linear (Linear) and nonlinear (Exponential, Logarithmic, Power, and Polynomial) trend line options available. The R² value quantifies how well the data fits the model, though this value can be problematic for nonlinear models because linearity is an assumption built into the R² calculation. To turn off the trend line, uncheck the Show linear trend check box in the Chart Properties pane, or toggle its visibility by clicking its item in the legend. To change the color of the trend line, click the trend line color swatch in the Chart Properties pane and choose a new color.

Learn more about regression analysis

Note:

Charts use the following formula for calculating R²:

\[
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
\]

where y_i is the actual value, \hat{y}_i is the predicted value, and \bar{y} is the mean of the actual values.
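As a rough illustration of how R² relates actual values, predicted values, and the mean of the actual values, the Python sketch below fits an ordinary least-squares line and computes R² as one minus the ratio of the residual sum of squares to the total sum of squares. This assumes the common definition of R²; the fit_line and r_squared helpers and the sample data are hypothetical and do not reproduce the software's internal implementation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(actual, predicted):
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative data that lies close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
predicted = [slope * x + intercept for x in xs]
r2 = r_squared(ys, predicted)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R²={r2:.4f}")
```

An R² value near 1 means the predicted values track the actual values closely; a value near 0 means the trend line explains little more than the mean does.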

Correlation

For linear trends, when small x-values correspond to small y-values, and large x-values correspond to large y-values (line sloping up), it indicates a positive correlation. When small x-values correspond to large y-values, and large x-values correspond to small y-values (line sloping down), it indicates a negative correlation.

Note:

A correlation between x and y does not imply that x causes y.
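One common way to quantify the direction and strength of a linear relationship is the Pearson correlation coefficient. The text above does not say which measure the software uses, so the pearson_r helper below is only an illustrative sketch with made-up data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4]
rising = [2, 4, 5, 9]    # small x with small y, large x with large y
falling = [9, 5, 4, 2]   # small x with large y, large x with small y

print(pearson_r(xs, rising) > 0)   # positive correlation, line slopes up
print(pearson_r(xs, falling) < 0)  # negative correlation, line slopes down
```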

Symbol

Several options control the chart symbolization and related settings.

Size

Scatter plot points can be uniform in size or sized proportionally by a numeric attribute. Sizing scatter plot points proportionally based on a third numeric variable adds another dimension to the visualization, creating a bubble plot.

Bubble chart example

Color

Scatter plot points can be visualized using a single color or with the colors specified in the layer's symbology. By default, scatter plots use layer colors and inherit their outline and fill colors from the source layer symbology. By symbolizing a layer with a different attribute than either of the scatter plot variables, an additional dimension can be shown on the scatter plot visualization.

Axes

Several options control the axes and related settings.

Axis bounds

Default minimum and maximum axis bounds are based on the range of data values represented on the axis. These values can be customized by providing a new axis bound value. Clicking the reset button reverts the axis bound to the default value.

Log axis

By default, scatter plot axes are displayed on a linear scale. One or both axes can be displayed on a logarithmic scale by checking the Log axis check box in the Axes section of the Chart Properties pane.

Logarithmic scales are useful when visualizing data with a large positive skew, in which the majority of data points have small values and a few data points have very large values. Changing the scale of the axis does not change the value of the data, only the way it is displayed.

Linear scales are based on addition, and logarithmic scales are based on multiplication.

On a linear scale, each increment on the axis represents the same distance in value. For example, in the axis diagram below, each increment on the axis increases by adding 10.

Linear scale axis

On a logarithmic scale, increments increase by orders of magnitude. In the axis diagram below, each increment on the axis increases by multiplying by 10.

Logarithmic scale axis

Note:

Logarithmic scales cannot display negative values or zero. If you log the axis of a variable with negative values or zero, those values will not appear on the chart.
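The effect of a base-10 logarithmic axis can be sketched in Python: equal multiplicative steps become equal distances along the axis, and values that are zero or negative have no position and are dropped. The log_axis_positions helper is hypothetical and assumes base 10, matching the multiply-by-10 example above.

```python
import math


def log_axis_positions(values):
    """Return (value, log10(value)) pairs, skipping values that are <= 0."""
    return [(v, math.log10(v)) for v in values if v > 0]


values = [-5, 0, 1, 10, 100, 1000]
positions = log_axis_positions(values)

# -5 and 0 are omitted; the remaining values land at evenly spaced positions.
for value, position in positions:
    print(value, round(position, 6))
```

The underlying data values are unchanged; only their placement along the axis differs.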

Adaptive axis bounds

When a multiseries scatter plot is displayed with the Grid option, the axis bounds can be configured with the following options:

  • Fixed—Applies the global minimum and maximum bounds to all mini charts.
  • Adaptive—Adjusts to the local minimum and maximum bounds for each mini chart.
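The difference between the two options can be illustrated with a short Python sketch. The axis_bounds helper, the mode names, and the sample values are assumptions made for illustration; they only mirror the Fixed and Adaptive behavior described above.

```python
def axis_bounds(series, mode):
    """Return {series_name: (minimum, maximum)} for one axis of each mini chart."""
    if mode == "fixed":
        # Global bounds: every mini chart shares the overall minimum and maximum.
        all_values = [v for values in series.values() for v in values]
        bounds = (min(all_values), max(all_values))
        return {name: bounds for name in series}
    if mode == "adaptive":
        # Local bounds: each mini chart uses the range of its own values.
        return {name: (min(values), max(values)) for name, values in series.items()}
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical x-values for two series.
x_values = {"Theft": [2, 8, 5], "Arson": [40, 55, 47]}

print(axis_bounds(x_values, "fixed"))     # both charts span 2 to 55
print(axis_bounds(x_values, "adaptive"))  # Theft spans 2 to 8, Arson spans 40 to 55
```

Fixed bounds make mini charts directly comparable to one another, while adaptive bounds make the pattern within each mini chart easier to see.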

Grid intervals

Grid intervals for the x-axis and y-axis can be configured using the Interval controls. The default grid intervals will be calculated automatically.

Invert axis

Either axis of a scatter plot can be inverted by checking the Invert axis check box.

Number format

You can format the way an axis will display numeric values by specifying a number format category or by defining a custom format string. For example, $#,### can be used as a custom format string to display currency values.

Appearance

Several options control the chart appearance and related settings.

Titles and description

Charts and axes are given default titles based on the variable names and chart type. These can be edited on the General tab in the Chart Properties pane. You can also provide a chart Description, which is a block of text that appears at the bottom of the chart window.

Guides

Guide lines or ranges can be added to charts as a reference or as a way to highlight significant values. To add a new guide, browse to the Guides tab in the Chart Properties pane, choose whether you want to draw a vertical or horizontal guide, and click Add guide. To draw a line, enter a Value where you want the line to be drawn. To create a range, also enter a to value. You can optionally add text to your guide by specifying a Label.

Example

The scatter plot below visualizes the relationship between diabetes and hypertension among Medicare beneficiaries. Select features in the chart to see where they are located on the map.

  • X-Axis—Diabetes rate
  • Y-Axis—Hypertension rate

Scatter plot example
