Day 24 of viz with me

Data viz Beginner ggplot alpha geom_jitter violin geom_violin jitter geom_boxplot boxplot DataViz Challenge

Boxes+jitter & Boxes+violins

Soundarya Soundararajan
2024-10-24

Welcome to Day 24 of “Viz with Me in R”!

Today is extra special because it’s my birthday! 🎉

To celebrate, we’re going to take our data visualizations a step further by adding violins and jitter to our boxplots. This will allow you to explore what works best for your data and help you decide what you might like to use in your own projects.

Goals for Today:

  1. Combine two types of plots, violins and box plots, by layering them on top of each other (not side by side).

  2. Add jitter to box plots to show individual data points alongside summary statistics.

Yesterday, we explored violin plots to show distributions. Today, let’s see how combining violins and box plots gives a more complete picture by showing both the shape of the distribution and its summary statistics.

Let’s jump right in!

Violin and Box Plot Combo

We’ll first layer the violin and box plots to create a single plot that highlights the distribution of body mass across penguin species.

library(palmerpenguins)
library(tidyverse)

penguins %>%
  ggplot(aes(species, body_mass_g)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +  # Add the boxplot on top of the violin
  theme_minimal()

Here, the violin plot shows the distribution shape, while the box plot adds summary statistics like the median and interquartile range (IQR). You can adjust the width of the box plot to make sure it doesn’t overpower the violin.
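If you want to play with how the two layers balance each other, here is a minimal sketch of one variation. The fill colour, the transparency, the narrower box width, and the axis labels are my own illustrative choices, not part of the plot above, and the helper name penguins_complete is just a made-up name for the data with missing body masses removed.

```r
library(palmerpenguins)
library(ggplot2)

# Assumption: drop rows with a missing body mass up front,
# so neither layer has to warn about removed values
penguins_complete <- subset(penguins, !is.na(body_mass_g))

p_combo <- ggplot(penguins_complete, aes(species, body_mass_g)) +
  # Light, semi-transparent fill keeps the violin in the background
  geom_violin(aes(fill = species), alpha = 0.4, show.legend = FALSE) +
  # A narrow box sits inside the violin without hiding its shape
  geom_boxplot(width = 0.15) +
  labs(x = "Species", y = "Body mass (g)") +
  theme_minimal()

p_combo
```

Try a few values for width and alpha; a wider box or a more opaque violin shifts the emphasis from one layer to the other.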

Adding Jitter to Box Plot

Next, let’s add jitter to a box plot to show individual data points while keeping the summary statistics clear. Jitter helps reveal points that would otherwise sit on top of each other.

penguins %>%
  ggplot(aes(species, body_mass_g)) +
  geom_jitter(height = 0.05, 
              width = 0.05, 
              alpha = 0.3) +  # Add jitter for individual points
  geom_boxplot(width = 0.3) +  # Boxplot for summary
  theme_minimal()

The jitter adds some randomness to the points’ positions, making it easier to see overlapping points, while the box plot still gives an overview of the key statistics. Adjusting the height, width, and alpha parameters allows you to fine-tune how the jitter appears on your plot.
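Because jitter is random, the points land in slightly different places every time you render the plot. As a sketch of one way to tune this, the version below passes the settings through position_jitter(), which accepts a seed so the layout stays the same between runs. The seed value, the wider horizontal jitter, and height = 0 (so the body mass values are not shifted vertically) are my own assumptions for illustration, as is the helper name penguins_complete.

```r
library(palmerpenguins)
library(ggplot2)

# Assumption: work with complete body mass values only
penguins_complete <- subset(penguins, !is.na(body_mass_g))

p_jitter <- ggplot(penguins_complete, aes(species, body_mass_g)) +
  geom_point(
    # width spreads points sideways; height = 0 leaves body mass untouched;
    # seed (an arbitrary value) makes the random offsets repeatable
    position = position_jitter(width = 0.1, height = 0, seed = 24),
    alpha = 0.3
  ) +
  geom_boxplot(width = 0.3) +
  theme_minimal()

p_jitter
```

Note that geom_jitter() is a shortcut for geom_point() with a jitter position, so the two approaches draw the same kind of layer.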

Do you see the boxes covering the jittered points? This is because ggplot2 draws layers in the order they are added: the jittered points are plotted first, and then the box plot is drawn on top.

I want the jittered points to be on top of the box plot. Let’s fix that!

penguins %>%
  ggplot(aes(species, body_mass_g)) +
  geom_boxplot(width = 0.3) + 
  geom_jitter(height = 0.05, 
              width = 0.05, 
              alpha = 0.3) +
  theme_minimal()
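One thing to watch for with this combination: geom_boxplot() draws its own outlier points, and the jitter layer draws every observation, so any outlier can show up twice. A minimal sketch of one possible fix, assuming you are happy to let the jitter layer carry all the individual points, is to switch off the box plot's outlier markers with outlier.shape = NA. The helper name penguins_complete is again just illustrative.

```r
library(palmerpenguins)
library(ggplot2)

# Assumption: work with complete body mass values only
penguins_complete <- subset(penguins, !is.na(body_mass_g))

p_clean <- ggplot(penguins_complete, aes(species, body_mass_g)) +
  # outlier.shape = NA hides the box plot's own outlier points
  geom_boxplot(width = 0.3, outlier.shape = NA) +
  # The jitter layer still shows every observation, outliers included
  geom_jitter(height = 0.05,
              width = 0.05,
              alpha = 0.3) +
  theme_minimal()

p_clean
```

The whiskers are unchanged by this setting; only the separately drawn outlier points disappear from the box plot layer.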

Key Takeaways:

  1. Violin and box plot combos are great for displaying both distribution and statistical summaries in a single plot.

  2. Adding jitter to a box plot lets you see the individual data points without losing the clarity of the box plot’s summary.

Try these out and see which style you prefer! Feel free to experiment with the aesthetics and see what works best for your data.

I hope you enjoyed today’s birthday special! 🎂 I’ll see you tomorrow for more data visualization fun!

Until then, happy plotting!

Citation

For attribution, please cite this work as

Soundararajan (2024, Oct. 24). My R Space: Day 24 of viz with me. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2024-10-24-day-24-of-viz-with-me/

BibTeX citation

@misc{soundararajan2024day,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Day 24 of viz with me},
  url = {https://github.com/soundarya24/SoundBlog/posts/2024-10-24-day-24-of-viz-with-me/},
  year = {2024}
}