Rainclouds!!!!
Welcome to Day 25 (aka Day 2 of the Final Week) of the ‘Viz with Me in R’ Series!
Goals: Today, I’m excited to introduce raincloud plots!
You might remember how we previously explored ways to visualize data spread and sample size using combinations like box + jitter or box + violin plots. Raincloud plots take this one step further: they show the shape of the distribution, the summary statistics, and the raw data points all in one plot!
Initially, ggplot2 had no dedicated geom for raincloud plots, so we had to rely on creative workarounds. Luckily, the ggrain package (Allen et al. 2021) now provides the geom_rain() function, which draws the whole raincloud with a single call. It’s been a game-changer for plotting these kinds of visuals.
Now, let’s hop onto the rainclouds with a practical example:
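Here is a minimal sketch of such a plot, assuming the ggplot2, ggrain, palmerpenguins and hrbrthemes packages are installed. The columns species and body_mass_g come from the Palmer Penguins data; the alpha value, axis labels, guides(fill = "none") and coord_flip() are my own choices for illustration and may differ from the original figure.

```r
library(ggplot2)
library(ggrain)  # provides geom_rain()

# Keep the two plotted columns and drop rows with a missing body mass.
# palmerpenguins:: is used explicitly so we get the palmerpenguins version of the data.
penguins_df <- na.omit(palmerpenguins::penguins[, c("species", "body_mass_g")])

p <- ggplot(penguins_df, aes(x = species, y = body_mass_g, fill = species)) +
  geom_rain(alpha = 0.5) +      # half-violin + box plot + jittered points in one call
  coord_flip() +                # lay the clouds out horizontally
  guides(fill = "none") +       # species is already shown on the axis
  labs(x = "Species", y = "Body mass (g)") +
  hrbrthemes::theme_ipsum()

p
```

Because geom_rain() bundles the three layers, the rest of the call is ordinary ggplot2: scales, labels and themes are added exactly as for any other plot.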
How do they look? Can you interpret this plot?
When you look at a raincloud plot, you’re seeing a combination of several elements:
A half-violin plot, which shows the estimated density of the data; in other words, how the values are spread across the range of body masses for each species.
A box plot, which gives you a quick summary of the data distribution in terms of quartiles, highlighting the median and any potential outliers.
Individual data points, usually jittered so that they overlap less, which lets you assess the sample size and variability directly.
In the case of our Palmer Penguins dataset, this raincloud plot provides:
The “cloud” (half-violin), showing how body masses are distributed for each penguin species.
The box plots, giving a snapshot of the key statistics: median, interquartile range (IQR), and any outliers.
The “rain” (raw data points), showing each individual penguin’s body mass and giving a direct sense of how many penguins were measured for each species.
You can now compare the species:
Do you notice how some species show more spread in body mass, indicating greater variability?
Are there any species with tighter, more clustered distributions, suggesting more consistency in their body masses?
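To put numbers behind these questions, here is a small sketch that computes, for each species, the sample size, median, IQR, and the number of points beyond the 1.5 × IQR whiskers (the default rule used by ggplot2's geom_boxplot()). It assumes the palmerpenguins package is installed; the helper name summarise_mass() is hypothetical.

```r
# Species and body mass, without the rows where body mass is missing
mass <- palmerpenguins::penguins[, c("species", "body_mass_g")]
mass <- mass[!is.na(mass$body_mass_g), ]

# Hypothetical helper: the statistics a raincloud plot encodes visually
summarise_mass <- function(x) {
  q   <- quantile(x, c(0.25, 0.75))
  iqr <- q[[2]] - q[[1]]
  data.frame(
    n        = length(x),                    # the "rain": how many points
    median   = median(x),                    # the line inside the box
    IQR      = iqr,                          # the length of the box
    outliers = sum(x < q[[1]] - 1.5 * iqr |  # points beyond the whiskers
                   x > q[[2]] + 1.5 * iqr)
  )
}

# One row per species; row names come from the species levels
by_species <- do.call(rbind, lapply(split(mass$body_mass_g, mass$species), summarise_mass))
by_species
```

A species with a larger IQR will show a longer box and a cloud stretched over a wider range of body masses, which is exactly the extra variability the plot lets you spot by eye.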
The beauty of a raincloud plot is that it blends all of this information into one visual, making it easier to grasp the shape and spread of the data while also showing individual data points and summary statistics.
Did you also notice hrbrthemes::theme_ipsum()? The :: is the namespace operator: it lets you call a function from a package’s namespace without attaching the whole package with library(). This is a handy trick to avoid conflicts when different packages export functions with the same name. When I write hrbrthemes::theme_ipsum(), I am using the theme_ipsum() function from the hrbrthemes package (Rudis 2024), and I don’t have to call library(hrbrthemes) first. The package still needs to be installed, though.
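A small sketch of what this means in practice, assuming hrbrthemes is installed but has not been attached with library() in the current session. Calling a function through :: loads the package’s namespace, but does not put the package on the search path, so its functions are still not visible by their bare names.

```r
# Assumes hrbrthemes is installed but NOT attached via library()
th <- hrbrthemes::theme_ipsum()      # works: theme_ipsum() is looked up in the namespace

isNamespaceLoaded("hrbrthemes")      # TRUE: :: loaded the namespace
"package:hrbrthemes" %in% search()   # FALSE: the package was never attached

# The same operator resolves name clashes, e.g. stats::filter() vs dplyr::filter():
# here we explicitly ask for the moving-average filter from stats
smoothed <- stats::filter(1:5, rep(1 / 3, 3))
smoothed
```

Writing the package name out is slightly more verbose, but it makes every call unambiguous, which is why it is a common habit in scripts that mix many packages.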
That’s it for today! Raincloud plots are a fantastic addition to your data visualization toolkit, especially when you want to show distributions, summary statistics, and individual data points all in one go. Give them a try and see how they can enhance your data storytelling!
For attribution, please cite this work as
Soundararajan (2024, Oct. 25). My R Space: Day 25 of viz with me. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2024-10-25-day-25-of-viz-with-me/
BibTeX citation
@misc{soundararajan2024day, author = {Soundararajan, Soundarya}, title = {My R Space: Day 25 of viz with me}, url = {https://github.com/soundarya24/SoundBlog/posts/2024-10-25-day-25-of-viz-with-me/}, year = {2024} }