Day 15 of viz with me

Data viz Beginner ggplot exercise boxplot geom_boxplot DataViz Challenge

Solution for yesterday’s exercise and a few twists.

Soundarya Soundararajan
2024-10-15

Solution to Yesterday’s Exercise

As we step into Week 3, here’s the solution to yesterday’s exercise.

library(palmerpenguins)
library(tidyverse)

penguins %>%
  ggplot(aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  labs(x = "Species", y = "Body Mass (g)") +
  theme_minimal()

Time to Practice Another Boxplot!

Now, let’s try a similar boxplot but split by sex. Here’s what I’ll do:

penguins %>%
  ggplot(aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot() +
  labs(x = "Species", y = "Body Mass (g)") +
  theme_minimal()

There are 2 things to note here:

  1. I’ve added fill inside the aes() function to color the boxes by sex.

  2. Did you notice the NA values in the plot? Look at the legend and you will spot them. Do you want to get rid of them before proceeding?
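To see where the NA entry in the legend comes from, one quick check is to count the rows for each value of sex. This is only a minimal sketch; the name sex_counts is an illustrative choice and not part of the original exercise.

```r
library(palmerpenguins)
library(dplyr)

# Count penguins per sex; missing values show up as their own NA row
# (sex_counts is a hypothetical name used only for this sketch)
sex_counts <- penguins %>%
  count(sex)

sex_counts
```

If an NA row appears in the result, those are the penguins that ggplot draws as a separate NA group.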

Handling Missing Data

To handle this, we can refer back to our earlier post on filtering for a specific country in the Gapminder dataset. It’s a similar approach, but with a twist!

Last time, we used filter(country == "India") because we wanted to keep only India out of all the countries. Now we want the opposite: to filter out the rows where the sex variable is missing (NA). Here’s how to do it using filter():

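penguins %>%
  # NA is a missing value, not the text "NA", so test for it with is.na()
  filter(!is.na(sex)) %>% # note the exclamation mark: keep rows where sex is NOT missing
  ggplot(aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot() +
  labs(x = "Species", y = "Body Mass (g)") +
  theme_minimal()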

Figure: Cartoon depicting the differences in filtering.
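It may help to see how R treats a missing value when it is compared with the text "NA". The sketch below uses a tiny made-up vector (sex_example is a hypothetical name, not data from the post) and base R only.

```r
# A tiny made-up vector with one missing value (illustrative only)
sex_example <- c("male", NA, "female")

# Comparing with the text "NA" does not find the missing value;
# the comparison itself returns NA for that element
compared_to_text <- sex_example == "NA"   # FALSE NA FALSE

# is.na() is the dedicated test for missing values
is_missing <- is.na(sex_example)          # FALSE TRUE FALSE

# Keep only the non-missing values
sex_example[!is_missing]                  # "male" "female"
```

Because filter() keeps only the rows where the condition is TRUE, rows where the condition evaluates to NA are dropped as well. Testing with is.na() states the intent directly instead of relying on that behavior.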

Alternatively, we can use the drop_na() function from tidyr (loaded with the tidyverse), which removes the rows with missing values in the columns you name, without having to write a condition:

penguins %>%
  drop_na(sex) %>%
  ggplot(aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot() +
  labs(x = "Species", y = "Body Mass (g)") +
  theme_minimal()
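If you want to convince yourself that the two approaches remove the same rows, one way is to compare the row counts before plotting. This is a sketch under the assumption that both are applied to the sex column only; with_filter and with_drop_na are hypothetical names.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)

# Approach 1: filter() with is.na()
with_filter <- penguins %>%
  filter(!is.na(sex))

# Approach 2: drop_na() on the same column
with_drop_na <- penguins %>%
  drop_na(sex)

# Compare the number of rows before and after removing missing values
nrow(penguins)
nrow(with_filter)
nrow(with_drop_na)
```

Both filtered data frames should have the same number of rows, and fewer than the original.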

Advanced Tip

Want to customize the appearance further? Try adjusting the width of the boxes by adding width = 0.3 inside geom_boxplot(). Here is the default plot first, followed by the narrower version:

penguins %>%
  drop_na(sex) %>%
  ggplot(aes(x = sex, y = body_mass_g)) +
  geom_boxplot() +
  labs(x = "Sex", y = "Body Mass (g)") +
  theme_minimal()

penguins %>%
  drop_na(sex) %>%
  ggplot(aes(x = sex, y = body_mass_g)) +
  geom_boxplot(width = 0.3) +
  labs(x = "Sex", y = "Body Mass (g)") +
  theme_minimal()

This gives you more control over the visual spacing and proportions in the plot. Give it a try!
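The same argument can be combined with the earlier plot split by sex. The sketch below is one possible variation, not the only way to do it: the value 0.5 is an arbitrary choice for illustration, and p_narrow is a hypothetical name.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)
library(ggplot2)

# Boxplot by species, split by sex, with narrower boxes
# (width = 0.5 is an arbitrary value chosen for this sketch)
p_narrow <- penguins %>%
  drop_na(sex) %>%
  ggplot(aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot(width = 0.5) +
  labs(x = "Species", y = "Body Mass (g)", fill = "Sex") +
  theme_minimal()

p_narrow
```

Try a few different values and see which spacing reads best to you.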

I shall see you tomorrow!

Citation

For attribution, please cite this work as

Soundararajan (2024, Oct. 15). My R Space: Day 15 of viz with me. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2024-10-15-day-15-of-viz-with-me/

BibTeX citation

@misc{soundararajan2024day,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Day 15 of viz with me},
  url = {https://github.com/soundarya24/SoundBlog/posts/2024-10-15-day-15-of-viz-with-me/},
  year = {2024}
}