Add sample sizes to plots

ggplot Beginner bar plots

Adding sample sizes to your ggplot.

Soundarya Soundararajan
08-06-2021

Did Reviewer 2 ask you to add sample sizes to your plots? We have all been there. Return your figures in style with these simple steps for adding sample sizes to a ggplot.

Let’s use the built-in iris dataset to draw a bar plot and label it with sample sizes.

Load libraries

library(tidyverse) # includes ggplot2, which we need for plotting
library(EnvStats) # provides stat_n_text() for adding sample sizes
library(ggpomological) # provides theme_pomological()

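If you do not have these packages installed, install them first with install.packages("package name"), then load them with the library() calls above. To my knowledge, ggpomological is not on CRAN and is installed from its GitHub repository instead, for example with remotes::install_github("gadenbuie/ggpomological").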
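To see which of these packages are actually missing before installing anything, a small base R check can help. This is only a sketch: `missing_packages()` is a hypothetical helper name of my own, not a function from any package.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
# requireNamespace() returns TRUE/FALSE and, as a side effect, loads the
# namespace of each package it finds (without attaching it).
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("tidyverse", "EnvStats", "ggpomological")
to_install <- missing_packages(needed)
print(to_install) # character(0) means everything is already installed
```

Anything listed in `to_install` still needs to be installed before the library() calls will work.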

Data

names(iris) # check the variable names
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
[5] "Species"     
table(iris$Species) # I check every categorical variable this way before plotting

    setosa versicolor  virginica 
        50         50         50 

Let’s plot!

iris %>% # your dataset here
  ggplot( # call ggplot() to start the plot
    aes( # aes() maps variables to the x and y axes
      x = Species,
      y = Sepal.Length
    )
  ) +
  geom_col() # draw the bars
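One thing worth knowing about this plot: geom_col() draws the values exactly as given, and with 50 rows per species it stacks them. Each bar's height is therefore the sum of Sepal.Length for that species, not a mean. A quick base R check of those totals (the name `totals` is just for illustration):

```r
# Sum Sepal.Length within each species. These are the bar heights that
# geom_col() produces when it stacks the 50 rows of each species.
totals <- tapply(iris$Sepal.Length, iris$Species, sum)
print(totals)
#     setosa versicolor  virginica 
#      250.3      296.8      329.4 
```

This is why the y axis runs to roughly 330 even though no single sepal is longer than 8 cm.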

We need colors!

Color plot

iris %>%
  ggplot(aes(x = Species, y = Sepal.Length, 
             color = Species)) + # let's color the bars by Species
  geom_col()

The color aesthetic only changes the outlines of the bars. To color the inside of the bars, we map Species to fill instead.

Fill plot

iris %>%
  ggplot(aes(x = Species, y = Sepal.Length, 
             fill = Species)) + #this fills the bars
  geom_col()

We have just three species, and they are already labelled on the x-axis, so we do not need the legend. Let’s remove it.

Remove legends

iris %>%
  ggplot(aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_col() +
  theme(legend.position = "none") 

Add sample size

iris %>%
  ggplot(aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_col() +
  theme(legend.position = "none") +
  stat_n_text() #adds sample size
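If you would rather not depend on EnvStats, a similar label can be built with ggplot2 alone. This is a sketch of an alternative, not a description of how stat_n_text() works internally: it counts the rows per species in base R and draws the counts with geom_text(). The names `n_labels`, `label`, and `p`, and the label height of 20, are my own choices.

```r
# Count observations per species and build an "n = 50" style label.
counts <- table(iris$Species)
n_labels <- data.frame(
  Species = factor(names(counts), levels = levels(iris$Species)),
  n = as.integer(counts),
  y = 20 # assumed label height, near the bottom of each bar
)
n_labels$label <- paste0("n = ", n_labels$n)

# Build the plot only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
    geom_col() +
    geom_text(
      data = n_labels,
      aes(x = Species, y = y, label = label),
      inherit.aes = FALSE # the label data has no Sepal.Length column
    ) +
    theme(legend.position = "none")
  print(p)
}
```

The trade-off is a few more lines of code in exchange for one less package and full control over the label text.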

We achieved what we wanted, but we can do better. Let’s change the theme and improve how the sample size is displayed.

Change theme and customize sample size

iris %>%
  ggplot(aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_col() +
  theme_light() + # I like this theme
  theme(legend.position = "none") +
  stat_n_text(
    y.pos = 20, # where on the y-axis the sample size should be drawn
    color = "black", # choose any color
    text.box = TRUE # draws a box around the n
  )

Wonderful. What if you want the sample sizes at a different height for each bar? Pass a vector of positions, built with c(), to y.pos, with one value per bar.

Final plot

iris %>%
  ggplot(aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_col() +
  theme_pomological() +
  scale_fill_manual(values=c( "#E87F4D","#286B7B","#E484A9"))+
  theme(legend.position = "none") +
  stat_n_text(
    y.pos = c(270, 315, 345), # one position per bar, just above the top of each
    color = "black",
    text.box = TRUE
  )
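Rather than reading the three positions off the plot, you can derive them from the bar heights. The sketch below assumes the bars are the stacked totals that geom_col() draws in the plots above, and places each label a fixed distance above its bar. The names `totals`, `offset`, and `y_pos` are illustrative, and the offset of 15 is an arbitrary choice.

```r
# Bar heights: geom_col() stacks the rows, so each bar is a per-species sum.
totals <- tapply(iris$Sepal.Length, iris$Species, sum)

# Place each label a fixed distance above the top of its bar.
offset <- 15 # arbitrary gap, in y-axis units
y_pos <- as.vector(totals) + offset
print(y_pos)

# y_pos is in the same order as levels(iris$Species), so it can be passed
# on directly: stat_n_text(y.pos = y_pos, color = "black", text.box = TRUE)
```

This keeps the labels in the right place if the data change, as long as the bars still show totals.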

Citation

For attribution, please cite this work as

Soundararajan (2021, Aug. 6). My R Space: Add sample sizes to plots. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2021-05-20-basics-of-bar-plots/

BibTeX citation

@misc{soundararajan2021add,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Add sample sizes to plots},
  url = {https://github.com/soundarya24/SoundBlog/posts/2021-05-20-basics-of-bar-plots/},
  year = {2021}
}