Day 9 of viz with me

Data viz Beginner dplyr ggplot count DataViz Challenge

Learn to count categorical data.

Soundarya Soundararajan
2024-10-09

Welcome to Day 9 of “Viz with Me”!

Yesterday, we learnt how to create a line plot.

Goal for today: to take our first step towards creating bar plots in R!

For this, we will return to the trusty penguins dataset. First, make sure to load the necessary libraries:
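The chunk that loads the packages does not appear in this rendering. A minimal sketch, assuming penguins comes from the palmerpenguins package (the column names in the glimpse output below match it) and that dplyr alone is enough for today’s glimpse() and count() calls:

```r
# Assumption: both packages are already installed, e.g. with
# install.packages(c("dplyr", "palmerpenguins"))
library(dplyr)           # provides glimpse(), count() and the %>% pipe
library(palmerpenguins)  # provides the penguins dataset
```

Loading the whole tidyverse with library(tidyverse) instead of dplyr should work just as well, since dplyr is one of its core packages.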

We will draw bars to represent the number of penguins in each species. However, if you check the dataset, this information is not readily available in a summarized format: each row is one penguin.

Let’s take a quick look at the data using glimpse:

glimpse(penguins)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Ad…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39…
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19…
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 46…
$ sex               <fct> male, female, female, NA, female, male, fe…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, …

Now, let’s count the number of penguins in each species. This tells us how many penguins fall into each species category before we create the bar plot:

penguins %>%
  count(species)
# A tibble: 3 × 2
  species       n
  <fct>     <int>
1 Adelie      152
2 Chinstrap    68
3 Gentoo      124

How would the code speak?

The count() function returns the number of rows for each level of the categorical variable we specify, which in this case is species. The dataset has three species of penguins, so the output has three rows, one count per species. count() is part of the same dplyr package (Wickham et al. 2023) as filter() and glimpse().
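To see what count() is doing, here is a sketch of a longer route that should give the same table: group the rows by species, then count the rows in each group with n(). The object names short_way, long_way and sorted_counts are only illustrative, and the setup assumes dplyr and palmerpenguins are installed.

```r
library(dplyr)
library(palmerpenguins)

# The short way, as used in this post
short_way <- penguins %>%
  count(species)

# The longer equivalent: group by species, then count rows per group
long_way <- penguins %>%
  group_by(species) %>%
  summarise(n = n())

# Compare the two results column by column
identical(short_way$species, long_way$species)
identical(short_way$n, long_way$n)

# sort = TRUE lists the species with the most penguins first
sorted_counts <- penguins %>%
  count(species, sort = TRUE)
sorted_counts
```

For a single categorical variable, count() is simply the shorter way to write the group-then-summarise pattern.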

Once we have the counts, all that is left is to construct the bar plot, which we’ll cover tomorrow.

Get familiar with the output of this code, as we’ll pipe it directly into our bar plot tomorrow. The output contains two columns: species, which is our variable of interest, and n, which gives the number of penguins for each species. The label <fct> under species indicates a factor (categorical variable), while <int> under n represents an integer (the counts).
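One way to get familiar with that output is to store it in an object and inspect its columns. This is only a sketch: the name species_counts is an illustration, and it assumes dplyr and palmerpenguins are installed.

```r
library(dplyr)
library(palmerpenguins)

# Store the counts so we can inspect them
species_counts <- penguins %>%
  count(species)

names(species_counts)          # the two column names: species and n
class(species_counts$species)  # the <fct> column is a factor
class(species_counts$n)        # the <int> column is an integer
sum(species_counts$n)          # the counts add up to the 344 rows of penguins
```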

Can you guess which geom_ function we’ll use to create the bars?

Jump to tomorrow’s post for the bar plots!

Credit: Photo by Kevin Malik

References

Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. “Dplyr: A Grammar of Data Manipulation.” https://CRAN.R-project.org/package=dplyr.

Citation

For attribution, please cite this work as

Soundararajan (2024, Oct. 9). My R Space: Day 9 of viz with me. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2024-10-09-day-9-of-viz-with-me/

BibTeX citation

@misc{soundararajan2024day,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Day 9 of viz with me},
  url = {https://github.com/soundarya24/SoundBlog/posts/2024-10-09-day-9-of-viz-with-me/},
  year = {2024}
}