Day 25 of viz with me

Data viz Beginner ggplot rainclouds geom_rain ggrain DataViz Challenge

Rainclouds!!!!

Soundarya Soundararajan
2024-10-25

Welcome to Day 25 (aka Day 2 of the Final Week) of the ‘Viz with Me in R’ Series!

Goals: Today, I’m excited to introduce raincloud plots!

You might remember how we previously explored ways to visualize data spread and sample size using combinations like box + jitter or box + violin plots. Raincloud plots go a step further: they show the distribution, the summary statistics, and the raw data points all in one plot.

Initially, ggplot2 had no dedicated geom for raincloud plots, and we had to rely on creative workarounds such as layering several geoms by hand. Luckily, the ggrain package now provides the geom_rain() function (Allen et al. 2021). It’s been a game-changer for plotting these kinds of visuals.

Now, let’s hop onto the rainclouds with a practical example:

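library(palmerpenguins)  # provides the penguins dataset
library(tidyverse)       # loads ggplot2, tidyr, and the %>% pipe
library(ggrain)          # provides geom_rain()

penguins %>%
  drop_na(body_mass_g) %>%  # two penguins have no recorded body mass
  ggplot(aes(species, body_mass_g)) +
  geom_rain() +
  hrbrthemes::theme_ipsum()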

How do they look? Can you interpret this plot?

When you look at a raincloud plot, you’re seeing a combination of several elements:

  1. A half-violin plot, which shows the estimated density of the data, or in other words, where along the range of body masses the values for each species are concentrated.

  2. A box plot, which gives you a quick summary of the data distribution in terms of quartiles, highlighting the median and any potential outliers.

  3. Individual data points, often plotted as jittered points, allowing you to visually assess the sample size and variability without overlapping data points.

In the case of our Palmer Penguins dataset, this raincloud plot provides:

  1. The “cloud” (the half-violin), which shows how body masses are distributed for each penguin species.

  2. Box plots, which give a snapshot of the key statistics: median, interquartile range (IQR), and any outliers.

  3. Raw data points (the “rain”), which help you see each individual penguin’s body mass, giving you a direct sense of how many samples were collected for each species.
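If you want to check the numbers behind the box plots, here is a minimal sketch that computes the sample size, quartiles, and IQR per species with dplyr. The object name penguin_summary and the column names are my own choices for illustration, not part of the original example.

```r
library(palmerpenguins)
library(dplyr)

# Summary statistics per species, after dropping rows with no body mass
penguin_summary <- penguins %>%
  filter(!is.na(body_mass_g)) %>%
  group_by(species) %>%
  summarise(
    n      = n(),                          # how many penguins were measured
    q1     = quantile(body_mass_g, 0.25),  # lower quartile
    median = median(body_mass_g),          # middle value
    q3     = quantile(body_mass_g, 0.75),  # upper quartile
    iqr    = IQR(body_mass_g)              # q3 minus q1
  )

penguin_summary
```

The n column should correspond to the number of raw points in each species' “rain”, and the median to the line inside each box.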

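You can now compare the species side by side. For example, Gentoo penguins sit clearly higher on the body mass axis than Adelie and Chinstrap penguins, whose distributions largely overlap.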

The beauty of a raincloud plot is that it blends all of this information into one visual, making it easier to grasp the shape and spread of the data while also showing individual data points and summary statistics.
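Each of the three elements can also be styled separately. The sketch below assumes that geom_rain() accepts per-layer argument lists named point.args, boxplot.args, and violin.args, as described in the ggrain vignette. The specific values (transparency, colours, the flipped axes) are only illustrative choices of mine, so please adjust them to taste.

```r
library(palmerpenguins)
library(ggplot2)
library(ggrain)

# Keep only penguins with a recorded body mass
penguins_complete <- subset(penguins, !is.na(body_mass_g))

p <- ggplot(penguins_complete, aes(species, body_mass_g, fill = species)) +
  geom_rain(
    point.args   = list(alpha = 0.3),                           # fainter "rain"
    boxplot.args = list(colour = "black", outlier.shape = NA),  # raw points already show outliers
    violin.args  = list(alpha = 0.5)                            # semi-transparent "cloud"
  ) +
  coord_flip() +           # horizontal layout: clouds on top, rain falling below
  guides(fill = "none") +  # the axis already names the species
  labs(x = NULL, y = "Body mass (g)")

p
```

Flipping the coordinates gives the classic look where the rain falls below the cloud.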

Did you also notice hrbrthemes::theme_ipsum()? The :: operator calls a function through its package’s namespace, which lets you use the function without attaching the package with library(). This is a handy trick to avoid conflicts when you have similarly named functions in different packages. If I am using hrbrthemes::theme_ipsum(), it means I am using the theme_ipsum() function from the hrbrthemes package (Rudis 2024), and I don’t have to call library(hrbrthemes) explicitly. The package still needs to be installed, though.
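To see why this matters, here is a small sketch with a well-known name clash: both stats and dplyr have a function called filter(), and they do completely different things. The variable names are mine, and the example assumes dplyr and palmerpenguins are installed.

```r
# No library() calls here: every function is reached through its namespace

x <- c(1, 2, 3, 4, 5)

# stats::filter() applies a linear filter, here a centred 3-point moving average
moving_avg <- stats::filter(x, rep(1 / 3, 3))

# dplyr::filter() keeps the rows of a data frame that match a condition
heavy_penguins <- dplyr::filter(palmerpenguins::penguins, body_mass_g > 6000)

moving_avg
heavy_penguins
```

With the package name in front, there is no doubt about which filter() you mean, whatever order your packages were loaded in.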

That’s it for today! Raincloud plots are a fantastic addition to your data visualization toolkit, especially when you want to show distributions, summary statistics, and individual data points all in one go. Give them a try and see how they can enhance your data storytelling!

Rainclouds Photo by eberhard grossgasteiger

References

Allen, Micah, Davide Poggiali, Kirstie Whitaker, Tom Rhys Marshall, Jordy van Langen, and Rogier A. Kievit. 2021. “Raincloud Plots: A Multi-Platform Tool for Robust Data Visualization [Version 2; Peer Review: 2 Approved].” Wellcome Open Research 4. https://doi.org/10.12688/wellcomeopenres.15191.2.
Rudis, Bob. 2024. “Hrbrthemes: Additional Themes, Theme Components and Utilities for ’Ggplot2’.” https://CRAN.R-project.org/package=hrbrthemes.

Citation

For attribution, please cite this work as

Soundararajan (2024, Oct. 25). My R Space: Day 25 of viz with me. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2024-10-25-day-25-of-viz-with-me/

BibTeX citation

@misc{soundararajan2024day,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Day 25 of viz with me},
  url = {https://github.com/soundarya24/SoundBlog/posts/2024-10-25-day-25-of-viz-with-me/},
  year = {2024}
}