Combining violin plots with jittering to reveal the full spread of data points
Welcome to Day 23!
Yesterday, I mentioned how important it is to visualize the data points on a plot.
Goals for today: we’ll achieve that by combining a violin plot with a little twist: we’ll use geom_jitter instead of geom_point. We will also touch on the alpha parameter, which controls the transparency of the points.
Why geom_jitter?
When using geom_point, data points with the same or nearly the same values are drawn on top of each other, which can obscure individual samples. By using geom_jitter, we add a small random offset to each point so that far more of them are individually visible, giving a clearer picture of the distribution.
A violin plot shows where the data are concentrated, through how thick or thin the shape is at each value, but it does not show how many samples there are. Adding jittered data points reveals the actual number of samples more effectively.
Let’s dive into the code:
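# Assumed setup: packages providing ggplot(), %>% and the penguins data
library(ggplot2)
library(dplyr)
library(palmerpenguins)

penguins %>%
  ggplot(aes(species, body_mass_g)) +
  geom_violin() +
  geom_point() + # Simply add a new geom to the existing violin plot
  theme_minimal()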
penguins %>%
  ggplot(aes(species, body_mass_g)) +
  geom_violin() +
  geom_jitter(height = 0.03, width = 0.03) + # height and width set the amount of jitter
  theme_minimal()
The amount of jitter is controlled by the height and width parameters. You can adjust these values to spread out the data points as needed.
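As a rough sketch of how these two parameters behave: according to the ggplot2 documentation, the jitter is added in both the positive and negative directions, so the total spread is twice the value given, and a value that is omitted defaults to 40% of the resolution of the data. Since body mass is recorded in grams, a height of 0.03 moves points vertically by a practically invisible amount, so most of the visible spread comes from width. The example below is not part of the original code; the width of 0.1, the height of 0, and the seed value are illustrative assumptions. It uses position_jitter(), which also accepts a seed so the random offsets stay the same each time the plot is redrawn.

```r
library(ggplot2)
library(palmerpenguins)

# position_jitter() takes the same width/height settings as geom_jitter(),
# plus a seed so the random offsets are reproducible.
# width = 0.1, height = 0 and seed = 23 are illustrative choices only.
jitter_pos <- position_jitter(width = 0.1, height = 0, seed = 23)

p <- ggplot(penguins, aes(species, body_mass_g)) +
  geom_violin(na.rm = TRUE) +
  geom_point(position = jitter_pos, na.rm = TRUE) +
  theme_minimal()

p
```

Setting height to 0 keeps every point at its measured body mass and only spreads the points sideways, which may be preferable when the exact y values matter.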
If we use geom_point without jitter, many data points may overlap, making it harder to identify individual samples. By adding jitter, we spread the points slightly along the x-axis, making each data point easier to see and providing a rough idea of how many samples there are.
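To get a feel for how much overlap there can be, one option is to count how many penguins share exactly the same species and body mass, since geom_point() would draw each such group as a single dot. This is a minimal sketch, not part of the original post; the object names (overlaps, distinct_dots, total_birds) are made up for illustration.

```r
library(dplyr)
library(palmerpenguins)

# Count how many penguins share the exact same (species, body mass) pair.
# Every group with n_birds > 1 collapses into one dot under geom_point().
overlaps <- penguins %>%
  filter(!is.na(body_mass_g)) %>%
  count(species, body_mass_g, name = "n_birds")

# Number of distinct dots geom_point() can show vs. the number of birds.
distinct_dots <- nrow(overlaps)
total_birds <- sum(overlaps$n_birds)

distinct_dots
total_birds
```

The gap between the two numbers is the number of samples hidden behind another point when no jitter is applied.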
alpha
The plot looks good, but we can make it even better. Do the points look too dark to you? The intuitive fix would be to change the color of the points, but I will take this opportunity to introduce you to the alpha parameter instead.
penguins %>%
  ggplot(aes(species, body_mass_g)) +
  geom_violin() +
  geom_jitter(
    height = 0.03,
    width = 0.03,
    alpha = 0.8
  ) +
  labs(title = "Using alpha 0.8") +
  theme_minimal()
penguins %>%
  ggplot(aes(species, body_mass_g)) +
  geom_violin() +
  geom_jitter(
    height = 0.03,
    width = 0.03,
    alpha = 0.5
  ) +
  labs(title = "Using alpha 0.5") +
  theme_minimal()
penguins %>%
  ggplot(aes(species, body_mass_g)) +
  geom_violin() +
  geom_jitter(
    height = 0.03,
    width = 0.03,
    alpha = 0.2
  ) +
  labs(title = "Using alpha 0.2") +
  theme_minimal()
The alpha parameter controls the transparency of the points. It ranges from 0 (fully transparent) to 1 (fully opaque), so lower values make the points fainter. Because transparent points add up where they overlap, denser regions still appear darker.
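One way to reason about which alpha to pick is to estimate how dark a spot becomes when several points land on top of each other. The sketch below assumes standard alpha blending of identically colored points, where n overlapping points give a combined opacity of 1 - (1 - alpha)^n; actual rendering may differ slightly between graphics devices. The helper stacked_opacity() is a hypothetical name used only for this illustration.

```r
# Approximate opacity where n points of the same color overlap,
# assuming standard alpha blending: 1 - (1 - alpha)^n.
stacked_opacity <- function(alpha, n) {
  1 - (1 - alpha)^n
}

alphas <- c(0.8, 0.5, 0.2) # the three alpha values used in the plots above
n_overlapping <- 1:4

# Rows are alpha values, columns are the number of overlapping points.
opacity_table <- outer(alphas, n_overlapping, stacked_opacity)
dimnames(opacity_table) <- list(
  alpha = as.character(alphas),
  n_points = as.character(n_overlapping)
)

round(opacity_table, 2)
```

Under this assumption, a high alpha saturates after only two or three overlapping points, while a low alpha keeps darkening over many more, so lower values tend to be more informative in crowded regions.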
Remember to state in the legend or caption of the figure that jitter has been applied, so that readers know the plotted positions are not the exact data values.
Here is a sample of how I would mention jitter in a figure legend:
The individual data points are shown using jittering to prevent overlap, ensuring all samples are visible. Jittering shifts the points slightly along both the x-axis (species) and y-axis (body mass) to better depict the number of data points and avoid overlapping, while maintaining the overall structure of the data.
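If the note should travel with the figure itself, one option is to attach it as a plot caption through the caption argument of labs(). This is only a sketch: the caption wording below is a shortened, hypothetical version of the sample text, and jitter_note is an illustrative variable name.

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical caption text, shortened from the sample wording above.
jitter_note <- paste(
  "Points are jittered slightly along both axes to reduce overlap;",
  "plotted positions are therefore approximate."
)

p <- ggplot(penguins, aes(species, body_mass_g)) +
  geom_violin(na.rm = TRUE) +
  geom_jitter(height = 0.03, width = 0.03, alpha = 0.5, na.rm = TRUE) +
  labs(caption = jitter_note) + # caption is drawn below the plot panel
  theme_minimal()

p
```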
I will leave you with this thought for today. Have fun exploring the alpha parameter and how it can enhance your plots!
For attribution, please cite this work as
Soundararajan (2024, Oct. 23). My R Space: Day 23 of viz with me. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2024-10-23-day-23-of-viz-with-me/
BibTeX citation
@misc{soundararajan2024day,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Day 23 of viz with me},
  url = {https://github.com/soundarya24/SoundBlog/posts/2024-10-23-day-23-of-viz-with-me/},
  year = {2024}
}