> Show the data. Declutter. Foster intuition. Know your audience.
\ Otho Mantegazza _ Dataviz for Scientists _ Part 3.2
When you explore your data, your goal should be to produce as many graphs as possible, quickly, to gain insights.
When you want to communicate results to others, your goal should be to make as few graphs as possible, each conveying a message to your audience in a clear and informative way.
As a scientist, you will most likely be communicating very complex results to a highly educated and informed audience.
You should use graphs to deliver a message to your audience.
When you do, show as much of the detail in the data as you can. This gives your audience valuable information.
If you have to hide your data behind a statistical transformation, use one which is as simple as possible.
# Packages assumed by the examples in this section
library(tidyverse)
library(palmerpenguins)

# Showing the mean and a confidence interval
# is generally accepted, but it still hides
# most of the information.
# Note: mean_cl_normal() needs the Hmisc package installed.
penguins %>%
  drop_na(sex) %>%
  ggplot() +
  aes(x = sex,
      y = body_mass_g) +
  stat_summary(
    fun = mean,
    geom = 'bar',
    colour = 'black',
    fill = 'grey80',
    linewidth = 1  # called `size` before ggplot2 3.4
  ) +
  stat_summary(
    fun.data = mean_cl_normal,
    geom = 'errorbar',
    colour = 'black',
    linewidth = 1,
    width = .5
  ) +
  scale_y_continuous(
    expand = expansion(
      mult = c(0, .05)
    )
  )
# A boxplot shows intuitive and
# robust statistics.
# When we switch to this visual
# model, ggplot automatically cuts
# the axis to highlight relative
# comparisons.
penguins %>%
  drop_na() %>%
  ggplot() +
  aes(x = sex,
      y = body_mass_g) +
  geom_boxplot(
    colour = 'black',
    fill = 'grey80',
    linewidth = 1  # called `size` before ggplot2 3.4
  ) +
  scale_y_continuous(
    expand = expansion(
      mult = c(0, .05)
    )
  )
# Show every observation as a jittered point,
# one panel per species, and mark the
# median of each group with a black dash
penguins %>%
  drop_na() %>%
  ggplot() +
  aes(x = sex,
      y = body_mass_g,
      colour = species,
      shape = species) +
  geom_jitter(
    shape = 1,
    size = 3,
    stroke = 1
  ) +
  stat_summary(
    fun = median,
    geom = 'point',
    shape = '_',
    size = 20,
    colour = 'black'
  ) +
  facet_grid(
    cols = vars(species)
  ) +
  scale_y_continuous(
    expand = expansion(
      mult = c(0, .05)
    )
  )
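One caveat when showing raw points: by default `geom_jitter()` displaces them along both axes, so the plotted body mass no longer matches the measured value exactly. Below is a minimal sketch, assuming you want to spread the points horizontally only and keep the jitter reproducible. The `width` and `seed` values are arbitrary choices for illustration, and `penguins_sexed` and `p_jitter` are names introduced here.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the `penguins` data

# Keep only the penguins with a recorded sex
penguins_sexed <- subset(penguins, !is.na(sex))

p_jitter <- ggplot(penguins_sexed) +
  aes(x = sex,
      y = body_mass_g) +
  geom_point(
    shape = 1,
    size = 3,
    stroke = 1,
    # height = 0 leaves the y values untouched;
    # seed makes the horizontal spread reproducible
    position = position_jitter(
      width = .2,
      height = 0,
      seed = 1
    )
  )

p_jitter
```

With `height = 0`, every point sits exactly at its measured body mass, so readers can trust the vertical position.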
“Show the data” is a good mindset for making informative plots.
However, for the sake of clarity and simplicity, you might decide to hide some of the raw data behind a statistical transformation.
It’s up to you to find the balance between detail and simplicity that suits your audience.
# Show every diamond as a faint point,
# then add a red point that shows a robust
# summary of the data: the median
# of price over binned carat
diamonds %>%
  filter(carat < 3) %>%
  ggplot() +
  aes(x = carat,
      y = price) +
  geom_point(shape = 1,
             alpha = .05) +
  stat_summary(
    aes(
      # bin carat by rounding to one decimal
      x = carat %>% round(1)
    ),
    fun = median,
    colour = '#f44702',
    geom = 'point',
    size = 3
  )
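To check what the coloured points stand for, the same summary can be computed as a table. This is a sketch assuming the dplyr package. The names `binned_medians`, `carat_bin`, `median_price` and `n` are introduced here for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the `diamonds` data

binned_medians <- diamonds %>%
  filter(carat < 3) %>%
  # same binning as in the plot: round carat to one decimal
  group_by(carat_bin = round(carat, 1)) %>%
  summarise(
    median_price = median(price),
    n = n()
  )

binned_medians
```

The `n` column shows how many diamonds sit behind each summary point, which the plot alone does not reveal.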
Your graph should always deliver a message.
While delivering this message, you should provide as much information as possible, showing the data.
If your graph gets too crowded, feel free to hide some of the data behind a statistical transformation.
Your audience processes graphs intuitively, so be sure not to mislead them.
Keep your message in front.
One of the main ideas behind decluttering a graph is the “data-ink ratio”, a concept developed by Edward Tufte.
According to it, most of the ink in your graph should be used to represent data.
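As a rough illustration of this idea in ggplot2, the sketch below starts from a default boxplot and removes ink that carries no data: the grey panel background, the minor grid lines, and the vertical grid lines. Which elements to drop is a judgment call. The choices here, and the names `p_default` and `p_decluttered`, are assumptions made for the example.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the `penguins` data

# Keep only the penguins with a recorded sex
penguins_sexed <- subset(penguins, !is.na(sex))

# Default look: grey panel and a full grid
p_default <- ggplot(penguins_sexed) +
  aes(x = sex,
      y = body_mass_g) +
  geom_boxplot()

# Decluttered look: less non-data ink, clearer label
p_decluttered <- p_default +
  labs(x = NULL,
       y = 'Body mass (g)') +
  theme_minimal(base_size = 14) +
  theme(
    # vertical grid lines add nothing on a categorical axis
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank()
  )

p_decluttered
```

Printing `p_default` and `p_decluttered` side by side makes it easy to judge whether anything useful was lost.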
User research is a design practice that studies how users interact with tools or products such as websites. It asks what users expect from a tool and how they interact with it.
User research draws on methods from behavioral psychology and ethnography, such as interviews, surveys, and tests.
When you communicate your data-driven results, graphs and charts are your products, and your audience members are their users.
Probably, you’ll collaborate with people from many different fields and disciplines.
Each field has its own way to represent specific types of data.
People in each field expect to see data represented in the way they are used to.
In a few words, people don’t like change.
We’ve seen that each data-driven discipline has a set of specific and elaborate visual models that are used often and repeatedly.
You might think that they are great or that they are bad, but they are the plots being used.
If you try to change them, you’ll encounter opposition, because any change to a visual model that readers are used to processing intuitively places a mental burden on them.
Data visualization is often processed intuitively. This is its power: it lets us process complex information with very little cognitive burden.
If you depart too much from what people expect, you will place a burden on them. They might have a hard time understanding the message you are trying to convey, or they might apply the wrong mental model to it and be misled to wrong conclusions.
Change things gradually and with caution.
When you challenge the status quo by applying a new visual model to a known problem, test your changes.
They might be an improvement for you but misleading to others.
Your resources will probably fall short of what is needed for a full cognitive test. Nevertheless, you can ask your colleagues, coworkers and friends to go through your graphs and give you feedback about what they understand and what they don’t.
There are also fundamental perception experiments that tell us how accurately we tend to read different types of visual information.
If you want to study this topic, a great place to start is Kennedy Elliott’s blog post “39 studies about human perception in 30 minutes”.
Retrieve your graphs from the exercises at the end of section 5 - Better Graphs Part 1.
If you feel it is needed, take 30 minutes to improve them, based on what you have learned in this lesson.
Afterwards, go to your colleagues and ask them to read your graphs.
Your colleagues should read your graph and:
Let your colleagues ask you questions, and note down the difficulties they face. This is precious information on how to improve your graphs.
Remember, you are NOT testing your colleagues’ skills, you are testing the readability of your graphs.