My R Space: Day 22 of viz with me

Soundarya Soundararajan


Data viz Beginner ggplot violin geom_violin DataViz Challenge

Exploring bar plot alternatives.

Author: Soundarya Soundararajan

Published: Oct. 21, 2024

Citation: Soundararajan, 2024

Welcome to Day 22 of the viz with me series!

By now, I believe you’re feeling confident about creating plots in ggplot2. Today, we’ll dive into something new: the violin plot, a great alternative to bar plots.

Goals for Today:

1. Understand what a violin plot is.
2. Learn how to create one in ggplot2.

Why Consider Alternatives to Bar Plots?

There’s been a lot of debate on this topic, but it comes down to two main reasons:

Bar plots don’t capture the spread of the data. They show only a summary statistic (like the mean) and hide the distribution behind it.
Bar plots don’t reveal sample size. The height of the bar represents a single value, but it says nothing about how many observations contribute to that value.

In the coming days, we’ll explore a few more alternatives to bar plots. If you’re fond of bar plots, these alternatives are worth considering, depending on your data visualization needs.
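To make these two points concrete, here is a small sketch (assuming dplyr, which tidyverse loads) of the numbers behind a bar plot of mean sepal length per species. A bar would display only mean_length; the sd_length and n columns are exactly what it leaves out.

```r
library(dplyr)

# One row per species: the single value a bar would show (the mean),
# plus the spread (standard deviation) and sample size (n) the bar hides
bar_summary <- iris %>%
  group_by(Species) %>%
  summarise(mean_length = mean(Sepal.Length),
            sd_length = sd(Sepal.Length),
            n = n())

bar_summary
```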

Why Violin Plots?

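A violin plot draws each group’s data as a smoothed density curve, mirrored on both sides of a vertical axis: it is wide where many observations fall and narrow where few do. That is why violin plots are a good alternative: they capture the shape and spread of the data, and once you add quantile lines, they also show its central tendency (the median) and quartiles.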
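If you want to peek at what one violin is made of, base R’s density() computes the same kind of kernel density estimate. This is a sketch, assuming (to the best of my knowledge) that geom_violin() defaults to a Gaussian kernel with the "nrd0" bandwidth rule; ggplot2 also trims the curve to the data range, so the shapes will differ slightly at the ends.

```r
# Sepal lengths for a single species
setosa <- iris$Sepal.Length[iris$Species == "setosa"]

# Gaussian kernel density estimate with the "nrd0" bandwidth rule
d <- density(setosa, bw = "nrd0", kernel = "gaussian")

# Sepal length where the estimated density peaks,
# i.e. roughly where the setosa violin is widest
peak <- d$x[which.max(d$y)]
peak

# One half of the violin, drawn with base graphics
plot(d, main = "Density of setosa sepal lengths")
```

Mirror that curve around a vertical line and you have the setosa violin.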

Here’s how you can create a violin plot in ggplot2 using the iris dataset:

library(tidyverse)

iris %>%
  ggplot(aes(x = Species, y = Sepal.Length)) +
  geom_violin() + # one mirrored density per species
  labs(x = "Species", 
       y = "Sepal Length (cm)", 
       title = "Distribution of Sepal Lengths by Species", 
       caption = "Data: Iris dataset") +
  theme_classic(base_size = 20) # base_size belongs to the complete theme, not theme()

But I told you that violin plots can show summary statistics too, right? Let’s add them to our plot by drawing the quartiles:

iris %>%
  ggplot(aes(x = Species, y = Sepal.Length)) +
  # Lines at the 25th, 50th (median) and 75th percentiles
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75)) +
  labs(x = "Species", 
       y = "Sepal Length (cm)", 
       title = "Distribution of Sepal Lengths by Species", 
       caption = "Data: Iris dataset") +
  theme_classic(base_size = 20) # base_size belongs to the complete theme, not theme()
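To see what the three lines mark, you can compute the quartiles directly. A sketch with dplyr; note that, as far as I know, geom_violin() places its quantile lines using the estimated density rather than the raw values, so the lines may sit slightly away from these numbers.

```r
library(dplyr)

# Raw-data quartiles of sepal length for each species
quartiles <- iris %>%
  group_by(Species) %>%
  summarise(q25 = quantile(Sepal.Length, 0.25),
            median = median(Sepal.Length),
            q75 = quantile(Sepal.Length, 0.75))

quartiles
```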

So what do violin plots offer that bar plots don’t?

They show the distribution of the data, which is a great way to understand its spread. Do you see how the violins bulge in some places? Wide sections mark values where many data points are concentrated; narrow sections mean fewer data points fall in that region. Compare virginica and setosa at a sepal length of around 5 cm: setosa bulges there, while virginica is thin. You get the idea, right?

We also get to see the range of the data, which a bar plot cannot show. Here you can see virginica stretching from about 5 to 8 cm, whereas setosa is confined to roughly 4 to 6 cm. Versicolor sits somewhere in between.
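If you’d like to check what your eyes are reading off the violins, a quick dplyr sketch can compute each species’ range and count the flowers near 5 cm. The 4.75 to 5.25 cm window is an arbitrary choice for illustration.

```r
library(dplyr)

spread <- iris %>%
  group_by(Species) %>%
  summarise(min_length = min(Sepal.Length),
            max_length = max(Sepal.Length),
            # Flowers with a sepal length close to 5 cm (4.75 to 5.25 cm)
            near_5 = sum(Sepal.Length >= 4.75 & Sepal.Length <= 5.25))

spread
```

Many setosa flowers and very few virginica flowers land in that window, which is exactly why one violin bulges at 5 cm and the other is thin.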

But What About Sample Size?

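While violin plots are great for visualizing the distribution, they don’t tell us anything about sample size: a violin drawn from a handful of observations can look just as full and smooth as one drawn from hundreds. There are better ways to show this, and we’ll explore some of those options tomorrow.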
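Until then, one quick stopgap is to write the count under each violin. This is only a sketch, not necessarily the approach covered in the next post; the label position (an arbitrary 0.3 cm below the smallest sepal length) is my own choice.

```r
library(dplyr)
library(ggplot2)

# Count observations per species and build an "n = ..." label for each
n_labels <- iris %>%
  count(Species) %>%
  mutate(label = paste0("n = ", n),
         # Arbitrary placement: just below the smallest sepal length
         y = min(iris$Sepal.Length) - 0.3)

p <- iris %>%
  ggplot(aes(x = Species, y = Sepal.Length)) +
  geom_violin() +
  # Write the sample size under each violin
  geom_text(data = n_labels, aes(x = Species, y = y, label = label)) +
  labs(x = "Species",
       y = "Sepal Length (cm)",
       title = "Distribution of Sepal Lengths by Species",
       caption = "Data: Iris dataset") +
  theme_classic(base_size = 20)

p
```

In iris every species has 50 flowers, so all three labels read n = 50; the labels pay off when your groups are unbalanced.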

See You Tomorrow!

Tomorrow, we’ll look at a way to better visualize sample size alongside your data. Until then, happy coding!


Citation

For attribution, please cite this work as

Soundararajan (2024, Oct. 22). My R Space: Day 22 of viz with me. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2024-10-22-day-22-of-viz-with-me/

BibTeX citation

@misc{soundararajan2024day,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Day 22 of viz with me},
  url = {https://github.com/soundarya24/SoundBlog/posts/2024-10-22-day-22-of-viz-with-me/},
  year = {2024}
}
