Exploring bar plot alternatives.
Welcome to Day 22 of the viz with me series!
By now, I believe you’re feeling confident about creating plots in ggplot2.
Today, we’ll dive into something new: the violin plot—a great alternative to bar plots.
Goals for Today:
1. Understand what a violin plot is.
2. Learn how to create one in ggplot2.
There’s been a lot of debate about whether bar plots are a good way to present data, and I’d like to summarize the two main arguments against them:
Bar plots don’t capture the spread of the data. They show only summary statistics (like the mean), missing out on the distribution of the data.
Bar plots don’t reveal sample size. The height of the bar represents a value, but it doesn’t provide any information on how much data contributes to that value.
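To make these two points concrete, here is a minimal sketch using two made-up groups (hypothetical values, not from any real dataset). Both groups have the same mean, so a bar plot of means would draw two bars of identical height, even though the groups differ in spread and in sample size.

```r
# Two hypothetical groups, invented purely for illustration
narrow <- c(4.5, 5, 5.5)                      # few points, tightly clustered
wide   <- c(1, 3, 5, 7, 9, 2, 8, 4, 6, 5)     # more points, widely spread

# A bar plot of means would only ever show the `mean` column
summary_tbl <- data.frame(
  group = c("narrow", "wide"),
  mean  = c(mean(narrow), mean(wide)),
  sd    = c(sd(narrow), sd(wide)),
  n     = c(length(narrow), length(wide))
)
summary_tbl
```

The `sd` and `n` columns hold exactly the information that the bar heights leave out.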
In the coming days, we’ll explore a few more alternatives to bar plots. If you’re fond of using them, it’s worth considering these alternatives based on your data visualization needs.
Violin plots are a good alternative because they show the full spread of the data and, once quantile lines are added, its central tendency as well. This makes them useful for showing the distribution while also giving insight into summary statistics.
Here’s how you can create a violin plot in ggplot2 using the iris dataset:
library(tidyverse)

iris %>%
  ggplot(aes(x = Species, y = Sepal.Length)) +
  geom_violin() +  # one violin per species
  labs(x = "Species",
       y = "Sepal Length",
       title = "Distribution of Sepal Lengths by Species",
       caption = "Data: Iris dataset") +
  # base_size is an argument of theme_classic(), not of theme()
  theme_classic(base_size = 20)
But I told you that violin plots can show summary statistics too, right? Let’s add that to our plot. Here’s how you can do it:
iris %>%
  ggplot(aes(x = Species, y = Sepal.Length)) +
  # Draw horizontal lines at the 25%, 50% (median), and 75% quantiles
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75)) +
  labs(x = "Species",
       y = "Sepal Length",
       title = "Distribution of Sepal Lengths by Species",
       caption = "Data: Iris dataset") +
  # base_size is an argument of theme_classic(), not of theme()
  theme_classic(base_size = 20)
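If you want numbers to compare the quantile lines against, a small sketch in base R can compute the sample quartiles for each species. One caveat: the ggplot2 documentation describes `draw_quantiles` as marking quantiles of the density estimate, so the lines on the plot may sit slightly off these sample values. The object name `quartiles` is just an illustrative choice.

```r
# Sample quartiles of Sepal.Length for each species (base R only)
quartiles <- sapply(
  split(iris$Sepal.Length, iris$Species),  # one numeric vector per species
  quantile,
  probs = c(0.25, 0.5, 0.75)
)

# Rows are the quartiles, columns are the species
round(quartiles, 2)
```

The middle row holds the medians, which should fall close to the middle line drawn inside each violin.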
So what do violins offer that bar plots don’t?
They show the distribution of the data, which is a great way to understand its spread. Do you see how the bulged areas of each violin mark the regions where the data are dense? Conversely, a narrow region means fewer data points fall there. Compare virginica around 5 with setosa around 5: virginica’s violin is thin at that value, while setosa’s is close to its widest. You get the idea, right?

We also get to see the range of the data, which is not possible with bar plots. Here you can see virginica extending from about 5 to 8, whereas setosa is confined to roughly 4 to 6. Versicolor is somewhere in between.
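To check the ranges read off the plot, the observed minimum and maximum for each species can be computed directly. This is a small base R sketch; the name `ranges` is an illustrative choice. By default `geom_violin()` trims each violin to the range of the data, so the tips of the violins should match these values.

```r
# Observed minimum and maximum sepal length per species
ranges <- sapply(split(iris$Sepal.Length, iris$Species), range)
rownames(ranges) <- c("min", "max")
ranges
```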
While violin plots are great for visualizing the distribution, they don’t inherently tell us about sample size. There are better ways to do this, and we’ll explore some of those options tomorrow.
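In the meantime, a quick way to see the sample sizes behind each violin is simply to count the rows per group. This is a base R sketch, not a visualization technique; `n_per_species` is an illustrative name.

```r
# Number of observations behind each violin
n_per_species <- table(iris$Species)
n_per_species
```

In iris every species has the same number of flowers, so sample size happens not to distort the comparison here, but that is rarely true of real data.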
See You Tomorrow! Tomorrow, we’ll look at a way to better visualize sample size alongside your data. Until then, happy coding!
For attribution, please cite this work as
Soundararajan (2024, Oct. 22). My R Space: Day 22 of viz with me. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2024-10-22-day-22-of-viz-with-me/
BibTeX citation
@misc{soundararajan2024day,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Day 22 of viz with me},
  url = {https://github.com/soundarya24/SoundBlog/posts/2024-10-22-day-22-of-viz-with-me/},
  year = {2024}
}