> How to use graphical variables effectively.
Otho Mantegazza | Dataviz for Scientists | Part 3.1
When you draw a data visualization on a two-dimensional screen, you map values from data variables to graphical variables.
In terms of the grammar of graphics, the graphical variables are the aesthetics that you map your data to.
The x and y planar variables are largely perceived as a quantitative linear space. And they are great for representing quantitative and qualitative data.
The planar variables are the x and y coordinates on your planar screen. They readily translate to the x and y positions in your graph if you use Cartesian coordinates, or to a transformed version of them if you use more elaborate coordinate systems.
You can place qualitative variables on both the x and the y axis. In this case the x might stop being the independent variable, and the y might stop being the response.
Often there is no clear hypothetical cause-and-effect relationship between two variables. In that case you can swap the x and the y freely.
The retinal variables are all the other graphical variables, those that cannot be interpreted directly as a position on the screen’s x and y.
The most important retinal variables are colour hue, colour value, shape, orientation, size, area and texture.
Each one has its own peculiarities and its own rules about how it can be used best.
Colours can be mapped both to categorical and continuous variables.
With some caveats, colours form a multidimensional space.
If you find it hard to plan colours, don’t worry: colours are complex for everyone.
On a screen, colours are usually defined by a hexadecimal string made of three two-digit values, each encoding one of 256 levels of red, green and blue.
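As a small illustration of the hexadecimal notation, the sketch below uses base R's `col2rgb()` and `rgb()` to convert between a hex string and its red, green and blue levels. The colour `#f44702` is only an example value; any hex string would do.

```r
# A hex colour string packs three two-digit hexadecimal numbers,
# "#RRGGBB", each ranging from 00 (0) to FF (255).
hex <- "#f44702"

# col2rgb() returns a 3 x 1 integer matrix with rows red, green, blue;
# taking the first column gives a named vector of 0-255 levels
channels <- col2rgb(hex)[, 1]
print(channels)

# rgb() goes the other way, from 0-255 levels back to a hex string
hex_again <- rgb(
  channels[["red"]], channels[["green"]], channels[["blue"]],
  maxColorValue = 255
)
print(hex_again)
```

Each channel has 256 possible levels, so this notation can describe 256^3 (about 16.8 million) colours.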
Colours are perceived non-linearly, and the models of how people perceive colours are constantly updated. A widely used model is CIECAM02.
What should interest you is:
You can use colours to encode categorical variables.
If the categorical variable is not ordered, you should vary the colour hue, possibly with small changes in saturation and lightness as well.
Always check that your colour palette is accessible to colour-blind people.
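For the unordered categorical case, one way to get colours that differ mainly in hue is to pick evenly spaced hues in HCL space with base R's `hcl()`. This is a minimal sketch: the helper name `categorical_hues()` and its default chroma and luminance values are assumptions made for illustration, not a recommended palette.

```r
# Hypothetical helper: n colours that differ in hue, with chroma
# (colourfulness) and luminance (lightness) held constant.
categorical_hues <- function(n, chroma = 100, luminance = 65) {
  # Spread n hues evenly around the colour wheel (in degrees),
  # dropping the last point so that the first hue is not repeated
  hues <- seq(15, 375, length.out = n + 1)[seq_len(n)]
  hcl(h = hues, c = chroma, l = luminance)
}

palette_4 <- categorical_hues(4)
print(palette_4)
```

A palette built this way still has to pass the colour-blindness check: equal lightness means the colours are told apart by hue alone.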
Using colours to encode continuous variables is somewhat easier.
When you check that your palettes are friendly to colour-blind people, you can also detect unwanted perception patterns.
You can use Firefox accessibility tools to simulate colour blind perception.
In categorical palettes, you should be able to distinguish colours, even in small plotting characters.
Check that the colours are still different when plotted in black and white. Otherwise, consider using an additional graphical variable to encode the information.
A continuous colour palette should be perceived linearly and unambiguously throughout its range. Check that this also holds for colour-blind readers and in black and white.
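The black-and-white check can be approximated numerically. The sketch below computes a rough brightness value (luma) for each colour with base R; the helper name `luma()` and the choice of the Rec. 601 weights are assumptions for illustration, and the result is only an approximation of how a greyscale print would look.

```r
# Approximate brightness (luma) of colours on a 0-255 scale,
# using the Rec. 601 weights for red, green and blue.
# This is a rough approximation, not a colour appearance model.
luma <- function(colours) {
  rgb_levels <- col2rgb(colours)      # 3 x n matrix of 0-255 levels
  weights <- c(0.299, 0.587, 0.114)   # red, green, blue
  as.vector(weights %*% rgb_levels)
}

# Pure red and a dark green have very different hues,
# but almost the same luma; yellow is much brighter
candidates <- c("#FF0000", "#008000", "#FFFF00")
round(luma(candidates))   # roughly 76, 75 and 226
```

Colours with nearly identical luma, like the red and the green here, are likely to merge in a black-and-white print, which is when an additional graphical variable helps.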
You should handle ordered categorical variables as if they were quantitative, not categorical.
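As a sketch of this advice, the example below fills bars by an ordered factor using a sequential palette, so that the order of the categories is visible in the colours. It assumes ggplot2 is installed; the choice of `diamonds$cut` and of the viridis palette is only illustrative.

```r
library(ggplot2)

# diamonds$cut is an ordered factor:
# Fair < Good < Very Good < Premium < Ideal
p_ordered <- ggplot(diamonds) +
  aes(x = cut, fill = cut) +
  geom_bar() +
  # A sequential (viridis) palette makes the order visible in the colours
  scale_fill_viridis_d()

p_ordered
```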
Remember, data visualization is processed intuitively by the readers.
Colours have meaning. Don’t represent ice coverage in red, and don’t represent the warming of the ocean in blue. Ask yourself if your colour palette is appropriate for the topic your data are about.
There are plenty of colour palettes available in R, so it’s unlikely that you’ll have to design your own.
It’s more likely that you’ll have to be able to choose a good one.
The palette gallery from the paletteer package is a great place to start.
Also, check the blog posts presenting cubehelix, viridis and batlow for an introduction to perceptually uniform colour maps.
Check the climate stripes with a tool that simulates colour-blind vision. Are the climate stripes colour-blind friendly? Justify your answer.
Like any other graphical variable, colours in ggplot2 are controlled with a family of scale functions, such as the scale_fill_*() and scale_colour_*() functions.
Check the documentation in the ggplot2 book.
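library(tidyverse)  # ggplot2, dplyr, tidyr and the %>% pipe

# 2D histogram of price against carat: the fill colour of each bin
# encodes the number of diamonds that fall in it
diamonds %>%
  ggplot() +
  aes(x = carat,
      y = price) +
  geom_bin_2d(
    binwidth = c(0.01, 50)
  ) +
  scale_x_continuous(
    expand = expansion(0, 0),
    limits = c(.2, .75)
  ) +
  scale_y_continuous(
    expand = expansion(0, 0),
    limits = c(0, 2500)
  ) +
  # Continuous viridis palette ('G' is the "mako" option), reversed
  scale_fill_viridis_c(
    direction = -1,
    option = 'G'
  ) +
  guides(
    fill = guide_colourbar(
      barwidth = 13,
      barheight = 1)
  )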
You can use point shapes to encode categorical information.
Shapes are simple and easy to understand.
They can’t be used to represent quantitative data. They could be used for ordered categorical data, but I’d advise against this practice.
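A minimal sketch of mapping a categorical variable to point shape, assuming ggplot2 is installed. The `mpg` dataset and the choice of `drv` are only illustrative.

```r
library(ggplot2)

# mpg$drv is an unordered categorical variable with three levels
# (front-wheel, rear-wheel and four-wheel drive)
p_shape <- ggplot(mpg) +
  aes(x = displ, y = hwy, shape = drv) +
  geom_point(size = 2)

p_shape
```

With only three levels the shapes stay easy to tell apart; the more levels you add, the harder this becomes.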
Linear size and area are often mapped to quantitative values, such as absolute measurements and percentages.
Size is often used together with colour to display a quantitative value stratified by a qualitative factor.
When we use a barplot, we map the data to the length of the bars, not to their area.
Because all bars have the same width, their area is still directly proportional to the data values.
Conceptually, though, if we were mapping the data to the bars’ area instead of their length, we could not represent negative values.
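# Pie chart: a single stacked bar, filled to 100%, drawn in polar coordinates.
# base_size is assumed to be defined earlier in the slides' setup.
diamonds %>%
  filter(color == "J") %>%
  ggplot() +
  aes(x = color,
      fill = clarity) +
  geom_bar(position = 'fill') +
  # Map the y axis (the stacked proportions) to the angle
  coord_polar(theta = "y") +
  scale_y_reverse(
    expand = expansion(
      mult = c(0, 0)
    )
  ) +
  theme_void(
    base_size = base_size
  ) +
  theme(
    legend.position = "bottom",
    plot.margin = margin(10, 5, 5, 5)
  )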
Pie charts have a bad reputation for not being nuanced analytical graphs.
But pie charts are effective at representing percentages, and they outscore bar charts when the number of slices is high.
Can you describe the pie chart from the previous page in terms of the grammar of graphics?
You can encode a continuous variable in the area of the circles in a scatterplot, or in the area of the plotting character of your choice.
Our perception is not as good at comparing areas as it is at comparing positions or lengths, so use this retinal variable sparingly.
You can map data to the radius of the circles or directly to their area. Mapping data to the radius might be perceptually better, although neither choice is optimal.
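To see how much the two mappings differ, here is a small arithmetic sketch in base R. It uses strict proportionality and ignores the rescaling to an output range that ggplot2 applies in `scale_size()` (area) and `scale_radius()` (radius), so it illustrates the principle rather than the exact sizes ggplot2 draws.

```r
values <- c(1, 2, 4)

# Option 1: radius proportional to the value, so area grows with value^2
radius_direct <- values
area_from_radius <- pi * radius_direct^2

# Option 2: area proportional to the value, so radius grows with sqrt(value)
area_direct <- values
radius_from_area <- sqrt(area_direct / pi)

# The largest value is 4 times the smallest. How do the circles compare?
area_ratio_option_1 <- area_from_radius[3] / area_from_radius[1]
radius_ratio_option_2 <- radius_from_area[3] / radius_from_area[1]

area_ratio_option_1     # 16: the area exaggerates the difference
radius_ratio_option_2   # 2: the radius understates the difference
```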
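library(ggrepel)  # provides geom_text_repel()

# Variant 1: scale_size() maps sleep_rem to the AREA of the points.
# base_size and size_scale are assumed to be defined in the slides' setup.
msleep %>%
  drop_na(
    bodywt, brainwt,
    name, sleep_rem
  ) %>%
  ggplot() +
  aes(x = bodywt,
      y = brainwt,
      size = sleep_rem) +
  geom_point(
    alpha = .8
  ) +
  # Label a random 8% of the animals
  geom_text_repel(
    data = . %>%
      sample_frac(.08),
    aes(label = name),
    size = base_size/size_scale,
    min.segment.length = 0,
    direction = 'y',
    nudge_y = 1.2,
    hjust = 1,
    colour = '#f44702'
  ) +
  scale_x_log10() +
  scale_y_log10() +
  scale_size(
    limits = c(0, NA)
  )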
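# Variant 2: same plot, but scale_radius() maps sleep_rem to the RADIUS
# of the points. Requires ggrepel; base_size and size_scale are assumed
# to be defined in the slides' setup.
msleep %>%
  drop_na(
    bodywt, brainwt,
    name, sleep_rem
  ) %>%
  ggplot() +
  aes(x = bodywt,
      y = brainwt,
      size = sleep_rem) +
  geom_point(
    alpha = .8
  ) +
  # Label a random 8% of the animals
  geom_text_repel(
    data = . %>%
      sample_frac(.08),
    aes(label = name),
    size = base_size/size_scale,
    min.segment.length = 0,
    direction = 'y',
    nudge_y = 1.2,
    hjust = 1,
    colour = '#f44702'
  ) +
  scale_x_log10() +
  scale_y_log10() +
  scale_radius(
    limits = c(0, NA)
  )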
A light-coloured area is often used to encode a measurement of uncertainty or dispersion of the data.
For example, the confidence interval of a regression model, or the prediction of how a natural phenomenon will evolve in space and time.
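The regression case can be sketched as follows, assuming ggplot2 is installed. The `mtcars` data, the linear model and the 95% level are illustrative choices, not part of the original slides.

```r
library(ggplot2)

# Fit a simple linear regression on the built-in mtcars data
model <- lm(mpg ~ wt, data = mtcars)

# Predict over a grid of weights, with a 95% confidence interval;
# predict() returns the columns fit, lwr and upr
grid <- data.frame(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50)
)
pred <- cbind(
  grid,
  predict(model, newdata = grid, interval = "confidence")
)

p_ribbon <- ggplot(pred) +
  aes(x = wt) +
  # Light, semi-transparent area: the confidence interval
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.2) +
  # Solid line: the fitted values
  geom_line(aes(y = fit)) +
  # Raw observations, taken from the original data
  geom_point(data = mtcars, aes(y = mpg))

p_ribbon
```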
How to represent uncertainty intuitively is an active field of research.
You can check Dr. Lace Padilla’s work for an overview of the best practices and the latest findings.
The orientation of plotting characters is used to show the direction of vector quantities, such as wind or other types of movement on a map.
It is often combined with the length of the plotting characters, to show both intensity and direction.
In ggplot2 there is no way to control the orientation of a point’s plotting character directly, so you’ll have to use segments, calculating their start and end points from the data.
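The end-point calculation is plain trigonometry. Below is a minimal base R sketch; the helper name `segment_end()` is hypothetical, and it assumes angles in degrees measured counter-clockwise from the positive x axis. Whether that matches how a given dataset records direction is something to check.

```r
# Hypothetical helper: end point of a segment that starts at (x, y),
# points in direction `angle_deg` and has length `len`.
segment_end <- function(x, y, angle_deg, len = 1) {
  angle_rad <- angle_deg * pi / 180   # cos() and sin() expect radians
  data.frame(
    xend = x + len * cos(angle_rad),
    yend = y + len * sin(angle_rad)
  )
}

# Three segments starting at the origin: pointing right, up and left
ends <- segment_end(x = 0, y = 0, angle_deg = c(0, 90, 180))
print(round(ends, 3))
```

Passing a data column as `len` makes the segment length proportional to that variable, which is how intensity and direction can be shown together.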
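# Wind data from the course materials (path relative to the project folder).
# coord_map() needs the mapproj package to be installed.
load('data/wind.data.RData')
wind <- wind.data
rm(wind.data)

# Direction only: every arrow has the same length
wind %>%
  filter(lon > -7) %>%
  ggplot() +
  aes(
    x = lon,
    y = lat,
    # End point of each segment, from the direction in degrees
    xend = lon + cos(dir*pi/180)/5,
    yend = lat + sin(dir*pi/180)/5
  ) +
  geom_segment(
    arrow = arrow(
      length = unit(1, 'mm')
    )
  ) +
  scale_x_continuous(
    expand = expansion(
      mult = c(.01, .01))
  ) +
  scale_y_continuous(
    expand = expansion(
      mult = c(.01, .01))
  ) +
  coord_map() +
  theme(
    plot.margin = margin(
      0, 0, 0, 0
    )
  )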
# Direction and intensity: the arrow length is proportional to wind speed.
# Uses the `wind` data frame loaded from data/wind.data.RData.
wind %>%
  filter(lon > -7) %>%
  ggplot() +
  aes(
    x = lon,
    y = lat,
    # End point of each segment, scaled by speed
    xend = lon + speed*cos(dir*pi/180)/20,
    yend = lat + speed*sin(dir*pi/180)/20
  ) +
  geom_segment(
    arrow = arrow(
      length = unit(1, 'mm')
    )
  ) +
  scale_x_continuous(
    expand = expansion(
      mult = c(.01, .01))
  ) +
  scale_y_continuous(
    expand = expansion(
      mult = c(.01, .01))
  ) +
  coord_map() +
  theme(
    plot.margin = margin(
      0, 0, 0, 0
    )
  )
Texture is often used to encode categorical data in various types of lines.
It can also be used to fill shapes in a semi-quantitative way. This use has fallen into disuse, but it can be a good choice for printer-friendly visualizations.
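For the line case, ggplot2 exposes texture through the `linetype` aesthetic. A minimal sketch, assuming ggplot2 is installed; the `economics_long` dataset is only an illustrative choice.

```r
library(ggplot2)

# economics_long ships with ggplot2: five monthly US economic indicators
# in long format, with values rescaled to the 0-1 range in `value01`
p_lines <- ggplot(economics_long) +
  aes(x = date, y = value01, linetype = variable) +
  geom_line()

p_lines
```

Because the categories are told apart by line texture alone, the plot stays readable when printed in black and white.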
ggplot2 does not support filling shapes with patterns natively, but you can do it with the package ggpattern.
The example on the side is instead from Textures.js, a JavaScript library for textured web graphics.
On 2021-02-16, TidyTuesday published a challenge to remember the work of W.E.B. Du Bois.
On this GitHub page you can find the data for 10 of Du Bois’ renowned charts.
Redesign or just redraw one or more of Du Bois’ graphs. Feel free to modify them as much as you want, but explain your design choices and how they improve the original graph.
On 2023-08-22, TidyTuesday published a challenge on UNHCR’s migration data.
You can find data and further information in this GitHub repository.
Explore those data, find a message that you would like to communicate, and design a graph to convey that message. Explain your stylistic choices and how they help you convey your message.