Day 23 of viz with me

Data viz Beginner ggplot alpha geom_jitter violin geom_violin jitter DataViz Challenge

Combining violin plots with jittering to reveal full spread of data points

Soundarya Soundararajan
2024-10-23

Welcome to Day 23!

Yesterday, I mentioned how important it is to visualize the data points on a plot.

Goals for today: We'll do exactly that by overlaying the individual data points on a violin plot, with a little twist: we'll use geom_jitter instead of geom_point. We will also touch on the alpha parameter, which controls the transparency of the points.

Why geom_jitter?

When using geom_point, data points with the same or similar values are drawn on top of each other, which can obscure individual samples. geom_jitter adds a small amount of random noise to each point's position, so the points spread out slightly and all of them are visible, giving a clearer picture of the distribution.

A violin plot shows where the data are concentrated (it is wide where values are common and narrow where they are rare), but it does not show how many samples lie behind the shape. Adding jittered data points reveals the actual number of samples.

Let’s dive into the code:

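library(tidyverse)
library(palmerpenguins) # assumption: `penguins` comes from the palmerpenguins package

penguins %>%
  ggplot(aes(species, body_mass_g)) +
  geom_violin() +
  geom_point() + # Simply add a new geom on top of the existing violin plot
  theme_minimal()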

penguins %>%
  ggplot(aes(species, body_mass_g)) +
  geom_violin() +
  geom_jitter(height = 0.03, width = 0.03) + # See how the jitter is controlled
  theme_minimal()

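Notice how the jitter is controlled through the height and width parameters. Each one sets the maximum shift in that direction, in the units of the axis, and the shift is applied in both the positive and negative directions, so the total spread is twice the value given. On the x-axis, neighbouring species sit 1 unit apart; on the y-axis, the unit here is grams, so height = 0.03 barely moves the points. You can adjust these values to spread out the data points as needed.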
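As a minimal sketch of these two parameters, assuming the penguins data comes from the palmerpenguins package (the post does not show where it is loaded from): setting height = 0 keeps every body mass at its measured value, so only the horizontal position is randomized. The width of 0.1 and the seed are illustrative choices of mine, not values from the plots above; the seed simply makes the jitter reproducible between runs.

```r
library(ggplot2)
library(palmerpenguins) # assumed source of the `penguins` data

# Drop rows with a missing body mass so ggplot2 has nothing to warn about
penguins_complete <- subset(penguins, !is.na(body_mass_g))

# Jitter sideways only: width is in x-axis units (species are 1 unit apart),
# and height = 0 leaves the body mass values untouched.
p_jitter <- ggplot(penguins_complete, aes(species, body_mass_g)) +
  geom_violin() +
  geom_jitter(
    # width and seed are illustrative values, not taken from the post
    position = position_jitter(width = 0.1, height = 0, seed = 23)
  ) +
  theme_minimal()

p_jitter
```

Passing the settings through position_jitter() rather than through width and height directly is what allows a seed to be set; the two styles should not be mixed in one geom_jitter() call.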

What Happens When We Use geom_point Instead?

If we use geom_point without jitter, many data points may overlap, making it harder to identify individual samples. By adding jitter, we spread the points slightly along the x-axis, making each data point visible and providing a rough idea of how many samples there are.
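To get a feel for how much overlap geom_point would hide, one option is to count the penguins that share both a species and an identical body mass, since those are drawn exactly on top of each other. This is a rough sketch, again assuming the data comes from the palmerpenguins package; the names overlaps and n_points are my own.

```r
library(dplyr)
library(palmerpenguins) # assumed source of the `penguins` data

# Points with the same species and the same body mass land on the
# same spot when plotted with geom_point().
overlaps <- penguins %>%
  filter(!is.na(body_mass_g)) %>%
  count(species, body_mass_g, name = "n_points") %>%
  filter(n_points > 1) %>%
  arrange(desc(n_points))

overlaps
```

Each row of the result is one position on the plot where a single visible dot would stand for several penguins.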

Improving the Plot with alpha

The plot looks good, but we can make it even better. Do the points look too dark to you? The intuitive fix is to change their color, but I'll take this opportunity to introduce the alpha parameter instead.

penguins %>%
  ggplot(aes(species, body_mass_g)) +
  geom_violin() +
  geom_jitter(
    height = 0.03,
    width = 0.03,
    alpha = 0.8
  ) +
  labs(title = "Using alpha 0.8") +
  theme_minimal()

penguins %>%
  ggplot(aes(species, body_mass_g)) +
  geom_violin() +
  geom_jitter(
    height = 0.03,
    width = 0.03,
    alpha = 0.5
  ) +
  labs(title = "Using alpha 0.5") +
  theme_minimal()

penguins %>%
  ggplot(aes(species, body_mass_g)) +
  geom_violin() +
  geom_jitter(
    height = 0.03,
    width = 0.03,
    alpha = 0.2
  ) +
  labs(title = "Using alpha 0.2") +
  theme_minimal()

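The alpha parameter controls the transparency of the points. It ranges from 0 (fully transparent) to 1 (fully opaque), so lower values make the points fainter. A side benefit is that where semi-transparent points overlap, the area looks darker, which hints at how densely the samples are packed.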
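One way to reason about the choice of alpha: when partly transparent points are drawn on top of each other, their opacities combine, so crowded regions look darker. Assuming the graphics device uses standard alpha blending, k identical points stacked on one spot have a combined opacity of 1 - (1 - alpha)^k. The helper name stacked_opacity below is hypothetical and only there for illustration.

```r
# Combined opacity of k identical points stacked on one spot,
# assuming standard "over" alpha blending by the graphics device.
stacked_opacity <- function(alpha, k) {
  1 - (1 - alpha)^k
}

# Compare the three alpha values used above for 1 to 5 stacked points.
# Rows are the number of stacked points, columns are the alpha values.
sapply(c(0.8, 0.5, 0.2), stacked_opacity, k = 1:5)
```

With a high alpha, two or three overlapping points are already close to solid, so dense and sparse regions look alike; a lower alpha leaves more room for the darkness to track the number of overlapping points.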

Remember to state in the legend or caption of the figure that jitter has been applied. This keeps you transparent about how the data are being represented.

Here is a sample of how I would mention jitter in a figure legend:

The individual data points are shown using jittering so that all samples are visible. Jittering shifts the points slightly along both the x-axis (species) and the y-axis (body mass) to reduce overlap and better depict the number of data points, while maintaining the overall structure of the data.
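If you would rather attach such a note to the plot itself, one possible approach is the caption argument of labs(). This is only a sketch: the wording of the note is mine, and the data is again assumed to come from the palmerpenguins package.

```r
library(ggplot2)
library(palmerpenguins) # assumed source of the `penguins` data

# Illustrative wording; adapt it to the jitter settings you actually use
jitter_note <- "Points are jittered slightly along both axes (width and height = 0.03) to reduce overlap."

p_caption <- ggplot(
  subset(penguins, !is.na(body_mass_g)), # drop rows with a missing body mass
  aes(species, body_mass_g)
) +
  geom_violin() +
  geom_jitter(height = 0.03, width = 0.03, alpha = 0.5) +
  labs(caption = jitter_note) + # the note appears below the plot
  theme_minimal()

p_caption
```

Keeping the note in the plot means it travels with the figure wherever the image is reused.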

I will leave you with this thought for today. Have fun exploring the alpha parameter and how it can enhance your plots!

Citation

For attribution, please cite this work as

Soundararajan (2024, Oct. 23). My R Space: Day 23 of viz with me. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2024-10-23-day-23-of-viz-with-me/

BibTeX citation

@misc{soundararajan2024day,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Day 23 of viz with me},
  url = {https://github.com/soundarya24/SoundBlog/posts/2024-10-23-day-23-of-viz-with-me/},
  year = {2024}
}