
A box plot is a high-level visual summary of a set of data points. Box plots can be drawn horizontally or vertically; in this activity we create horizontal box plots.

A box plot visually represents the "five-number summary" of the given values:
the minimum, first quartile, median, third quartile, and maximum.
The left of the box is located at the **first quartile** (Q1) of the data,
and the right of the box at the **third quartile** (Q3).
The width of the box is the **inter-quartile range** (IQR).
The **median** (Q2) of the data is shown somewhere inside the box
(in our visualization, the median is
where the blue left part and the red right part of the box meet).
The box has two whiskers:
the left whisker extends from the left of the box
to some data point in the bottom quartile.
In our simple box plot, that data point is the minimum (Q0).
The right whisker similarly extends from the right of the box
to some data point in the top quartile;
in our simple box plot, the maximum (Q4).

Given that we need the five-number summary of a set of data points,
let's first create a function, named `quartiles`, to do just that.
Let's use the *suggested method* from this paper on
Quartiles in Elementary Statistics.

Divide the data set into two halves, a bottom half and a top half. If n is odd, include or exclude the median in the halves so that each half has an odd number of elements. The lower and upper quartiles are then the medians of the bottom and top halves respectively.
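
A minimal sketch of how `quartiles` could follow this rule is shown here. It is not necessarily the implementation used in this activity: the helper `median`, the tuple return type, and the error raised for an empty list are assumptions made for illustration.

```python
def median(sorted_values: list[float]) -> float:
    """Median of an already sorted, non-empty list (hypothetical helper)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    # Even length: average the two middle elements.
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values: list[float]) -> tuple[float, float, float, float, float]:
    """Return (Q0, Q1, Q2, Q3, Q4) using the halving rule described above."""
    if not values:
        raise ValueError("quartiles needs at least one value")  # assumption
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 0:
        # Even n: the two halves split cleanly, there is no middle element.
        bottom, top = data[:half], data[half:]
    elif half % 2 == 0:
        # n = 4k + 1: including the median makes each half odd-sized.
        bottom, top = data[:half + 1], data[half:]
    else:
        # n = 4k + 3: excluding the median makes each half odd-sized.
        bottom, top = data[:half], data[half + 1:]
    return (data[0], median(bottom), median(data), median(top), data[-1])


print(quartiles([10, 20, 30, 40, 50]))
```

For `[10, 20, 30, 40, 50]` there are five elements, so the median 30 is included in both halves, giving Q1 = 20 and Q3 = 40.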


Let's test whether our function also behaves as the suggested method specifies on other lists.


Now let's focus on creating box plots. We would like to develop the following function:

```
def box_plot(
    values: list[float], axis_min: float, axis_max: float, width: float, height: float
) -> Graphic:
```

The parameters have the following meaning:

- **values**: the data points (numeric values) to plot.
- **axis_min**: the minimum value representable on the plot's axis.
- **axis_max**: the maximum value representable on the plot's axis.
- **width**: the total width of the visualization (onto which the values on the axis will be mapped).
- **height**: the total height of the visualization (which corresponds to the height of the bar).

One can call the function as follows:

`show_graphic(box_plot([10, 20, 30, 40, 50], 0, 60, 200, 40))`

This produces the following visualization:

The (invisible) axis goes from 0 to 60. The data range from 10 to 50, which means that the left whisker extends to the value 10, and the right whisker to the value 50. The whole plot has a size of 200 by 40. The axis (values 0 to 60) is mapped to the width of the plot (0 to 200).
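
The mapping from axis values to horizontal positions is a linear rescaling. The sketch below illustrates it; the helper name `value_to_x` is hypothetical and is not part of the activity or of PyTamaro.

```python
def value_to_x(value: float, axis_min: float, axis_max: float, width: float) -> float:
    """Map a value on the axis to a horizontal position between 0 and width."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    # Fraction of the axis covered up to `value`, scaled to the plot width.
    return (value - axis_min) / (axis_max - axis_min) * width


# With the axis 0..60 mapped onto a width of 200:
print(value_to_x(10, 0, 60, 200))  # where the left whisker ends
print(value_to_x(50, 0, 60, 200))  # where the right whisker ends
```

With these numbers, the whiskers end roughly 33.3 and 166.7 units from the left edge of the plot.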

The fact that we can specify the extent of the axis (with `axis_min` and `axis_max`)
seems like overkill when creating a single boxplot.
However, it allows us to create multiple box plots for different data sets
(with different minimum and maximum values),
and to place them above each other while sharing the same axis.

```
show_graphic(above(
    box_plot([10, 20, 30, 40, 50], 0, 60, 200, 40),
    box_plot([5, 15, 22, 27, 35], 0, 60, 200, 40)
))
```

This produces the following visualization:

The term *boxplot* is unfortunately used for two different things:
it may mean a single box with its two whiskers,
or it may mean an entire group of such whiskered boxes on a common axis.

Let's use *boxplot* to refer to a single box with whiskers,
and *boxplot group* to refer to multiple whiskered boxes on a common axis.

A horizontal box plot consists of a left whisker, the left part of the box, the right part of the box, and the right whisker.

A whisker is composed of a horizontal and a vertical rectangle.

So far this decomposition describes what we can see.
However, our boxplots also contain invisible pieces on their left and right:
there is a gap representing the range below Q0 (from `axis_min` to the minimum value),
and a gap representing the range above Q4 (from the maximum value to `axis_max`).
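
To make this decomposition concrete, here is a sketch that computes the widths of the six side-by-side pieces (two invisible gaps, two whiskers, and two box parts) from a five-number summary. The function name `segment_widths` and the dictionary keys are hypothetical; the activity's own `box_plot` may organize this differently.

```python
def segment_widths(
    summary: tuple[float, float, float, float, float],
    axis_min: float,
    axis_max: float,
    width: float,
) -> dict[str, float]:
    """Widths of the six horizontal pieces of a box plot, from left to right.

    Assumes axis_min <= Q0 and Q4 <= axis_max; otherwise a gap would
    come out with a negative width.
    """
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    q0, q1, q2, q3, q4 = summary
    scale = width / (axis_max - axis_min)  # plot units per axis unit
    bounds = [axis_min, q0, q1, q2, q3, q4, axis_max]
    names = [
        "left_gap", "left_whisker", "left_box",
        "right_box", "right_whisker", "right_gap",
    ]
    # Each piece spans two consecutive boundaries on the axis.
    return {
        name: (right - left) * scale
        for name, left, right in zip(names, bounds, bounds[1:])
    }


for name, piece_width in segment_widths((10, 20, 30, 40, 50), 0, 60, 200).items():
    print(name, piece_width)
```

Because the pieces span consecutive boundaries from `axis_min` to `axis_max`, their widths add up to the total width of the plot (up to floating-point rounding).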

Now let's implement a function to compose a boxplot.


You learned to create boxplots, one of the most useful visual summaries of numerical data sets. Groups of boxplots with a shared axis are particularly helpful when comparing multiple data sets, because they make it easy to compare their central tendencies (via the locations of their boxes and medians) and their dispersions (via the sizes of their boxes, the IQRs).

This activity has been created by LuCE Research Lab and is licensed under CC BY-SA 4.0.

Simple Box Plot

PyTamaro is a project created by the Lugano Computing Education Research Lab at the Software Institute of USI
