A Grammar for Graphics

A collection of terms and concepts to declare data visualization systematically

Otho Mantegazza, Dataviz for Scientists, Part 2.2

Can you Describe a Graph?

If we find a way to describe graphs systematically, then we can design and develop them more easily.

Most technical graphs can be declared with a system of rules called the “Grammar of Graphics”.

This system of rules is the basis for many data visualization packages, such as ggplot2, Seaborn and Altair.

A Grammar for Graphics

The “Grammar of Graphics” was developed by Leland Wilkinson.

It was later extended by Hadley Wickham, who started encoding it in the R package ggplot2.

Recently, a new version of Seaborn added an API in the style of ggplot2, making the layered grammar of graphics available in Python as well.

The layered grammar of graphics defines graphics as composed of:

  • A default dataset and set of mappings from variables to aesthetics.
  • One or more layers, with each layer having one geometric object, one statistical transformation, one position adjustment, and optionally, one dataset and set of aesthetic mappings.
  • One scale for each aesthetic mapping used.
  • A coordinate system.
  • The facet specification.
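To make the list concrete, here is a minimal sketch that spells out every component explicitly in ggplot2, using the Palmer Penguins data introduced later in this deck. In everyday code most of these pieces are left to defaults; the choice of variables, the object name `explicit_plot`, and the use of `facet_null()` (no faceting) are only for illustration.

```r
library(ggplot2)
library(palmerpenguins)

explicit_plot <-
  # Default dataset and default mappings from variables to aesthetics
  ggplot(
    data = penguins,
    mapping = aes(x = bill_length_mm, y = bill_depth_mm)
  ) +
  # One layer: geometric object, statistical transformation, position adjustment
  layer(
    geom = "point",
    stat = "identity",
    position = "identity"
  ) +
  # One scale for each aesthetic mapping used
  scale_x_continuous() +
  scale_y_continuous() +
  # A coordinate system
  coord_cartesian() +
  # The facet specification (here: a single panel)
  facet_null()

explicit_plot
```

Replacing `layer()` with `geom_point()` and dropping the scale, coordinate and facet lines should give the same plot, since those are the defaults that ggplot2 fills in.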

Aesthetics

The word aesthetic is derived from the Ancient Greek αἰσθητικός (aisthētikós, “perceptive, sensitive, pertaining to sensory perception”), which in turn comes from αἰσθάνομαι (aisthánomai, “I perceive, sense, learn”) and is related to αἴσθησις (aísthēsis, “perception, sensation”). [Wikipedia]

Let’s Describe Graphs

Let’s describe three historical graphs in terms of the Grammar of Graphics.

  1. How are data mapped to aesthetics?
  2. What statistical transformation is applied?
  3. Which geometric object is used?
  4. What is the coordinate system?
  5. Are the data split in facets?

Describe the weather history by Robert Plot.

  • Aesthetics Mapping:
    • x: atmospheric pressure
    • y: day of the month
  • Statistical Transformation:
    • none / identity
  • Geometric Object:
    • stepped line
  • Coordinate System:
    • cartesian
  • Facets:
    • by month

Describe this semigraph by Lambert.

  • Aesthetics Mapping:
    • x: …
    • y: …
  • Statistical Transformation:
  • Geometric Object:
  • Coordinate System:
  • Facets:

Describe the radial histogram by Nightingale.

[previous page]

  • Aesthetics Mapping:
    • x: …
    • y: …
  • Statistical Transformation:
  • Geometric Object:
  • Coordinate System:
  • Facets:

Let’s Describe Graphs

Just one more.

Let’s challenge ourselves a bit more.

Now describe the web-based data visualization on the next page. It is a weather map taken from the beautiful app Windy.

Can you do it with the Grammar of Graphics as before? How many layers of information can you notice?

Describe the weather map by the app Windy.

[previous page]

  • Aesthetics Mapping:
    • x: …
    • y: …
  • Statistical Transformation:
  • Geometric Object:
  • Coordinate System:
  • Facets:

For the main data visualization, how many layers of information do you notice?

GGPLOT2

ggplot2 is one of the main tools for declaring graphics in R.

It is based on the grammar of graphics.

It can be used both for exploratory analysis and for publication-ready graphs.

Packages

# Main tidyverse packages,
# including ggplot2

library(tidyverse)

# The Palmer Penguins dataset
# that we are going to use for practice

library(palmerpenguins)

Learn more about Palmer Penguins.

A Scatterplot…

A default dataset…

penguins %>% 
  ggplot()

A set of mappings from variables to aesthetics…

penguins %>%  
  ggplot() +
  aes(
      x = bill_length_mm,
      y = bill_depth_mm
    )

One or more layers, each with a geometric object that represents the aesthetic mappings.

penguins %>%  
  ggplot() +
  aes(
      x = bill_length_mm,
      y = bill_depth_mm
    ) + 
  geom_point()

More variables mapped to aesthetics and represented in the geometric object.

penguins %>% drop_na() %>%
  ggplot(
    aes(x = bill_length_mm,
        y = bill_depth_mm,
        colour = species,
        shape = sex)
  ) + 
  geom_point()

A layer with a different geometric object and a statistical transformation.

penguins %>% drop_na() %>%
  ggplot(
    aes(x = bill_length_mm,
        y = bill_depth_mm,
        colour = species,
        shape = sex)
  ) + 
  geom_point() +
  geom_smooth()

The same layer, with a linear model as the statistical transformation.

penguins %>% drop_na() %>%
  ggplot(
    aes(x = bill_length_mm,
        y = bill_depth_mm,
        colour = species,
        shape = sex)
  ) + 
  geom_point() +
  geom_smooth(method = 'lm')
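
One way to see the statistical transformation at work is to inspect the data that each layer actually draws. The sketch below uses `layer_data()` from ggplot2 for that; the object names are illustrative, and `formula = y ~ x` is written out only to make the default explicit.

```r
library(ggplot2)
library(palmerpenguins)
library(tidyr)

smooth_plot <- ggplot(
  drop_na(penguins),
  aes(x = bill_length_mm, y = bill_depth_mm, colour = species, shape = sex)
) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Layer 1 (points) uses the identity transformation: one row per penguin
points_data <- layer_data(smooth_plot, 1)

# Layer 2 (smooth) holds fitted values and a confidence band for each group
smooth_data <- layer_data(smooth_plot, 2)
head(smooth_data[, c("x", "y", "ymin", "ymax", "group")])
```

The points layer keeps the observations as they are, while the smooth layer replaces them with values computed by the linear model.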

A facet specification.

penguins %>% drop_na() %>%
  ggplot(
    aes(x = bill_length_mm,
        y = bill_depth_mm,
        colour = species,
        shape = sex)
  ) + 
  geom_point() +
  geom_smooth(method = 'lm') +
  facet_wrap(facets = 'sex',
             ncol = 1)

A Histogram

A default dataset…

penguins %>% 
  ggplot()

A set of mappings from variables to aesthetics…

penguins %>% 
  ggplot() +
  aes(x = bill_length_mm)

A layer including geometric objects and a statistical transformation.

penguins %>% 
  ggplot() +
  aes(x = bill_length_mm) +
  geom_histogram()
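
The statistical transformation behind a histogram is binning: values are counted per interval before the bars are drawn. A minimal sketch of that idea, assuming 30 bins (the value ggplot2 falls back to when none is given) and illustrative object names:

```r
library(ggplot2)
library(palmerpenguins)

# geom_histogram(): bar geometry combined with the "bin" statistic
hist_geom <- ggplot(penguins, aes(x = bill_length_mm)) +
  geom_histogram(bins = 30)

# The same layer declared from the statistic's side
hist_stat <- ggplot(penguins, aes(x = bill_length_mm)) +
  stat_bin(geom = "bar", bins = 30)

# Inspect the binned counts that the bars are drawn from
binned <- layer_data(hist_geom, 1)
head(binned[, c("xmin", "xmax", "count")])
```

Setting `bins` explicitly also silences the message that ggplot2 prints when it has to pick the number of bins itself.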

More aesthetic mappings.

penguins %>% 
  ggplot() +
  aes(x = bill_length_mm,
      fill = species) +
  geom_histogram() 

The facet specification.

penguins %>% 
  ggplot() +
  aes(x = bill_length_mm,
      fill = species) +
  geom_histogram() +
  facet_wrap(facets = 'species',
             ncol = 1)

Remove a legend guide that’s no longer needed…

penguins %>% 
  ggplot() +
  aes(x = bill_length_mm,
      fill = species) +
  geom_histogram() +
  facet_wrap(facets = 'species',
             ncol = 1) +
  guides(fill = 'none')

Remap the fill to another variable.

penguins %>% 
  ggplot() +
  aes(x = bill_length_mm,
      fill = sex) +
  geom_histogram() +
  facet_wrap(facets = 'species',
             ncol = 1) 

A Horizontal Stacked Bar Chart

A default dataset…

penguins %>% 
  ggplot()

A set of mappings from variables to aesthetics…

penguins %>% 
  ggplot() +
  aes(y = species)

A layer including geometric objects and a statistical transformation.

penguins %>% drop_na() %>% 
  ggplot() +
  aes(y = species) +
  geom_bar()

More aesthetic mappings.

penguins %>% drop_na() %>% 
  ggplot() +
  aes(y = species,
      fill = sex) +
  geom_bar()

A facet specification.

penguins %>% drop_na() %>% 
  ggplot() +
  aes(y = species,
      fill = sex) +
  geom_bar() +
  facet_grid(rows = 'island')

Remove empty bars by freeing the y scale and the panel height in each facet.

penguins %>% drop_na() %>% 
  ggplot() +
  aes(y = species,
      fill = sex) +
  geom_bar() +
  facet_grid(rows = 'island',
             scales = 'free_y',
             space = 'free')

Change the position adjustment.

penguins %>% drop_na() %>% 
  ggplot() +
  aes(y = species,
      fill = sex) +
  geom_bar(position = 'dodge') +
  facet_grid(rows = 'island',
             scales = 'free_y',
             space = 'free')
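
The position adjustment decides how bars that share a category are arranged. A small sketch comparing three built-in adjustments on the same mapping (facets are left out to keep it short; the object names are illustrative):

```r
library(ggplot2)
library(palmerpenguins)
library(tidyr)

bar_base <- ggplot(drop_na(penguins), aes(y = species, fill = sex))

# "stack" (the default for geom_bar): bars are placed end to end
bars_stacked <- bar_base + geom_bar(position = "stack")

# "dodge": bars are placed side by side
bars_dodged <- bar_base + geom_bar(position = "dodge")

# "fill": bars are stacked and rescaled to proportions within each species
bars_filled <- bar_base + geom_bar(position = "fill")

bars_filled
```

Stacking emphasizes totals per species, dodging makes the two sexes easier to compare directly, and filling shows proportions at the cost of hiding absolute counts.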

Exercise

Learn about the visual models available in ggplot2 and use them to explore the Palmer Penguins dataset.

For each visual model that you use:

  • Describe it in terms of the Grammar of Graphics.
  • Explain what it shows about the data: which patterns it highlights and what impression it gives of them.