Dataviz for Scientists

Introduction

Communication is essential for any kind of scientist. Your research findings might contain the most important results, but if you don’t manage to communicate them to others, they might be like a tree falling in a forest with no one around. Do they make a sound?

A large share of today’s research results depend on data, and communicating data effectively is a skill with many components. Representing data graphically is a big part of it.

In this workshop you will learn how to communicate data graphically, and how to use literate programming to make your graphics available to others, together with text and analytical code.

For this three-day workshop, we will use R and JavaScript to analyze and visualize data.

With Quarto, a modern literate programming tool dedicated to scientific communication, we will put it all together, so that you can communicate your data in beautifully formatted outputs that are easy to read, but also reproducible and transparent for the analytical mind.

In the extensively hands-on sessions we will focus on real-world data, and if you would like to, you are welcome to bring your own data to the workshop.

All the software, programming languages, and resources used in this workshop are open source and open access. This way, participants will have full control over the tools that they use and will be able to access them after the class is over, free from unfavorable commercial licenses. All the tools are cutting edge in both industry and academia.

Slides

Below you can find the links to the slides.

Part 1: Welcome to R

Day 1.

  1. Introduction
  2. Meet R
  3. Use Data in R
  4. Missing Values
  5. Load New Data Into R
  6. Clean and Tidy Data

Part 2: Intro to Data Visualization

Day 2, morning.

  1. Instant Knowledge
  2. Quick History of Dataviz
  3. The Grammar of Graphics
  4. Exploratory Data Analysis

Part 3: Better Graphs

Day 2, afternoon.

  1. Better Graphs - Part I
  2. Better Graphs - Part II

Part 4: Scientific Publishing

Day 3, morning.

  1. Quarto: Open Tools for Scientific Publishing

Part 5: Web Development

Day 3, afternoon.

  1. Why Learn Web Development?
  2. Data Visualization in JavaScript

Resources

Besides the slides, you can consult any of these open access books on the topics of data analysis, statistics, programming and data visualization. The authors of those books made them open access, so they can be consulted online anytime.

Packages

There are some packages that we are going to use in most exercises. To make sure that you have them ready, install them by running this at the R console:

install.packages(
  c('tidyverse', 
    'palmerpenguins', 
    'here', 
    'janitor', 
    'paletteer')
)
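
After installing, it can be handy to confirm that nothing failed silently. The snippet below is a minimal sketch that compares the package list against what is installed in your R library. The helper name find_missing() is hypothetical and is not part of any package; it only wraps base R functions.

```r
# The same packages listed in the install.packages() call
pkgs <- c("tidyverse", "palmerpenguins", "here", "janitor", "paletteer")

# find_missing() is a hypothetical helper defined here for illustration:
# it returns the packages in `pkgs` that are not in `installed`
find_missing <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

not_installed <- find_missing(pkgs)

# Report the result without installing anything
if (length(not_installed) == 0) {
  message("All workshop packages are installed.")
} else {
  message("Still missing: ", paste(not_installed, collapse = ", "))
}
```

If any package is reported as missing, you can rerun install.packages() for that package only.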

At the beginning of each of your scripts, you can load them by writing:

library(tidyverse)
library(palmerpenguins)
library(here)
library(janitor)
library(paletteer)
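
As a quick check that the packages work together, here is a small sketch that cleans the column names of the penguins_raw dataset and draws a scatterplot. The choice of variables and of the "RColorBrewer::Dark2" palette is an assumption made for illustration, not something prescribed by the course material. The here package is left out because this sketch does not read or write any file.

```r
library(tidyverse)
library(palmerpenguins)
library(janitor)
library(paletteer)

# penguins_raw ships with palmerpenguins and has messy column names,
# such as "Flipper Length (mm)"; clean_names() turns them into snake_case
penguins_clean <- penguins_raw %>%
  clean_names() %>%
  drop_na(flipper_length_mm, body_mass_g)

# Scatterplot of body mass against flipper length, coloured by species
p <- ggplot(
  penguins_clean,
  aes(x = flipper_length_mm, y = body_mass_g, colour = species)
) +
  geom_point() +
  scale_colour_paletteer_d("RColorBrewer::Dark2") +
  labs(
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    colour = "Species"
  )

p
```

If the plot shows up without errors, the packages are loaded correctly and you are ready for the exercises.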

Source Code

The source code for these slides is on GitHub at https://github.com/othomantegazza/dataviz-for-scientists-slides

License

This work is licensed by Otho Mantegazza under the CC BY-NC-SA 4.0 license. For more information about the non-original bits and pieces, please check the LICENSE file in this course’s GitHub repo.

Acknowledgements

Big thanks to Giorgia Ditano for the help and support in reviewing the course material.