Simple Box Plot

A box plot is a high-level visual summary of a set of data points. Box plots can be drawn horizontally or vertically; in this activity we create horizontal box plots.

Annotated Boxplot

Quartiles & Five Number Summary

A box plot visually represents the "five-number summary" of the given values: the minimum, first quartile, median, third quartile, and maximum. The left of the box is located at the first quartile (Q1) of the data, and the right of the box at the third quartile (Q3). The width of the box is the inter-quartile range (IQR). The median (Q2) of the data is shown somewhere inside the box (in our visualization, the median is where the blue left part and the red right part of the box meet). The box has two whiskers: the left whisker extends from the left of the box to some data point in the bottom quartile. In our simple box plot, that data point is the minimum (Q0). The right whisker similarly extends from the right of the box to some data point in the top quartile; in our simple box plot, the maximum (Q4).

Since we need the five-number summary of a set of data points, let's first create a function, named quartiles, that computes it. Let's use the method suggested in the paper "Quartiles in Elementary Statistics":

Divide the data set into two halves, a bottom half and a top half. If n is odd, include or exclude the median in the halves so that each half has an odd number of elements. The lower and upper quartiles are then the medians of the bottom and top halves respectively.
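The rule quoted above can be sketched in Python as follows. This is only one possible reading of it, not necessarily the activity's own solution: the helper name median, the tuple return type, and the choice to average the two middle values of an even-sized list are assumptions made for illustration.

```python
def median(sorted_values: list[float]) -> float:
    """Median of an already sorted, non-empty list (hypothetical helper)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    # Even size: average the two middle values (an assumption of this sketch).
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values: list[float]) -> tuple[float, float, float, float, float]:
    """Return (Q0, Q1, Q2, Q3, Q4) following the quoted halving rule."""
    if not values:
        raise ValueError("quartiles needs at least one value")
    data = sorted(values)
    n = len(data)
    half = n // 2  # size of each half when the median is left out
    if n % 2 == 1 and half % 2 == 0:
        # n is odd and the halves would be even-sized:
        # include the median in both halves so each has an odd size.
        bottom = data[: half + 1]
        top = data[half:]
    else:
        # n is even, or n is odd and the halves are already odd-sized:
        # split without the median.
        bottom = data[:half]
        top = data[n - half:]
    return (data[0], median(bottom), median(data), median(top), data[-1])
```

With this sketch, quartiles([10, 20, 30, 40, 50]) yields (10, 20, 30, 40, 50): n is 5, so the median 30 is included in both halves, [10, 20, 30] and [30, 40, 50], whose medians are 20 and 40.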


Let's test whether our function follows the suggested method on several different lists.
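One way to organize such tests is a small table of inputs with results worked out by hand from the quoted rule. The sketch below is an assumption about how one might do this; the names CASES and failing_cases are hypothetical, and the function under test is passed in as an argument so that any implementation can be checked.

```python
from typing import Callable

Summary = tuple[float, float, float, float, float]

# Expected five-number summaries, derived by hand from the quoted rule.
CASES: list[tuple[list[float], Summary]] = [
    ([7], (7, 7, 7, 7, 7)),                         # n = 1: include the median
    ([3, 1, 2], (1, 1, 2, 3, 3)),                   # n = 3: exclude the median
    ([1, 2, 3, 4], (1, 1.5, 2.5, 3.5, 4)),          # n = 4: plain halves
    ([10, 20, 30, 40, 50], (10, 20, 30, 40, 50)),   # n = 5: include the median
    ([1, 2, 3, 4, 5, 6, 7], (1, 2, 4, 6, 7)),       # n = 7: exclude the median
]


def failing_cases(
    quartiles_fn: Callable[[list[float]], Summary],
) -> list[list[float]]:
    """Return the inputs for which quartiles_fn disagrees with CASES."""
    return [
        data
        for data, expected in CASES
        if tuple(quartiles_fn(data)) != expected
    ]
```

Calling failing_cases with a correct quartiles function should return an empty list. The odd lengths matter most: an implementation that always excludes the median would compute Q1 = 15 instead of 20 for the five-element list.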


Interface

Now let's focus on creating box plots. We would like to develop the following function:

def box_plot(
  values: list[float], axis_min: float, axis_max: float, width: float, height: float
) -> Graphic:

The parameters have the following meaning:

  • values -- the data points (numeric values) to plot.
  • axis_min -- the minimum value representable on the plot's axis.
  • axis_max -- the maximum value representable on the plot's axis.
  • width -- the total width of the visualization (onto which the values on the axis will be mapped).
  • height -- the total height of the visualization (which corresponds to the height of the box).

One can call the function as follows:

show_graphic(box_plot([10, 20, 30, 40, 50], 0, 60, 200, 40))

This produces the following visualization:

example.png

The (invisible) axis goes from 0 to 60. The data range from 10 to 50, which means that the left whisker extends to the value 10, and the right whisker to the value 50. The whole plot has a size of 200 by 40. The axis (values 0 to 60) is mapped to the width of the plot (0 to 200).
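The mapping from axis values to horizontal positions is a linear scaling. A minimal sketch follows; the function name to_x is hypothetical and not part of the activity's interface.

```python
def to_x(value: float, axis_min: float, axis_max: float, width: float) -> float:
    """Map a value on the axis to a horizontal position between 0 and width."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    # Fraction of the axis covered by the value, scaled to the plot width.
    return (value - axis_min) / (axis_max - axis_min) * width
```

For the example above, to_x(30, 0, 60, 200) places the median at position 100, the middle of the plot, while the minimum 10 lands one sixth of the way across, at roughly 33.3.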

The fact that we can specify the extent of the axis (with axis_min and axis_max) seems like overkill when creating a single boxplot. However, it allows us to create multiple box plots for different data sets (with different minimum and maximum values), and to place them above each other while sharing the same axis.

show_graphic(above(
  box_plot([10, 20, 30, 40, 50], 0, 60, 200, 40),
  box_plot([5, 15, 22, 27, 35], 0, 60, 200, 40)
))

This produces the following visualization:

example2.png

Decomposition

Terminology

The term boxplot is unfortunately used for two different things: it may mean a single box with its two whiskers, or it may mean an entire group of such whiskered boxes on a common axis.

Let's use boxplot to refer to a single box with whiskers, and boxplot group to refer to multiple whiskered boxes on a common axis.

Decomposing a Boxplot

A horizontal box plot consists of a left whisker, the left part of the box, the right part of the box, and the right whisker.

A whisker is composed of a horizontal and a vertical rectangle.

So far this decomposition describes what we can see. However, our boxplots also contain invisible pieces on their left and right: there is a gap representing the range below Q0 (from axis_min to the minimum value), and a gap representing the range above Q4 (from the maximum value to axis_max).
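Taken together, the decomposition yields six horizontal segments, two invisible and four visible. The sketch below computes their widths from a five-number summary; it is an illustration under assumptions (the function name segment_widths and the segment labels are hypothetical) and deliberately leaves the drawing itself to the graphics library.

```python
def segment_widths(
    summary: tuple[float, float, float, float, float],
    axis_min: float,
    axis_max: float,
    width: float,
) -> list[tuple[str, float]]:
    """Return (segment name, width) pairs, ordered from left to right."""
    q0, q1, q2, q3, q4 = summary
    if not axis_min <= q0 or not q4 <= axis_max or axis_max <= axis_min:
        raise ValueError("the axis must be non-empty and contain all values")
    scale = width / (axis_max - axis_min)  # plot units per axis unit
    bounds = [axis_min, q0, q1, q2, q3, q4, axis_max]
    names = [
        "left gap",       # invisible: axis_min to Q0
        "left whisker",   # Q0 to Q1
        "left box",       # Q1 to Q2
        "right box",      # Q2 to Q3
        "right whisker",  # Q3 to Q4
        "right gap",      # invisible: Q4 to axis_max
    ]
    # Each segment spans two neighbouring bounds on the axis.
    return [
        (name, (high - low) * scale)
        for name, low, high in zip(names, bounds, bounds[1:])
    ]
```

Because the segments tile the axis from axis_min to axis_max, their widths add up to the total width of the plot (up to floating-point rounding). Placing six rectangles of these widths beside each other, with the two gaps drawn in a transparent color, would give a boxplot that lines up with others sharing the same axis.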

Implementation

Now let's implement a function to compose a boxplot.


What You Learned

You learned to create boxplots, one of the most useful visual summaries of numerical data sets. Groups of boxplots with a shared axis are particularly helpful when comparing multiple data sets, because they make it easy to compare their central tendencies (via the locations of their boxes and medians) and their dispersions (via the sizes of their boxes, the IQRs).


This activity has been created by LuCE Research Lab and is licensed under CC BY-SA 4.0.

PyTamaro is a project created by the Lugano Computing Education Research Lab at the Software Institute of USI