Grouped scatter plots

ggplot Beginner scatterplots

Approaching grouped scatterplots one step at a time.

Soundarya Soundararajan
08-11-2021

Libraries

library(tidyverse)
library(ggpubr) # to add correlation coefficients and p-values to plots

In a hurry? Cut to the chase: the final ggplot output is in the last section, "Grouped scatter plots".

We are working with the built-in ChickWeight dataset today.

We are using three variables from this dataset: Time, the number of days since the chick's birth; weight, the chick's body weight in grams; and Diet, the experimental diet the chick was fed. We would like to check:

– To what extent are time since birth and the weight of the chick correlated?
– Does this vary between diets (especially diets 1 and 3)?

We will use a scatter plot for the first question and color the points by diet (1 and 3) to answer the second.
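Before plotting, it can help to glance at the three variables. The following is a minimal sketch using only base R; the object name chicks is my own and not part of the original post.

```r
# ChickWeight ships with base R (package 'datasets'), so nothing needs downloading.
# as.data.frame() drops the extra grouped-data classes and leaves a plain data frame.
chicks <- as.data.frame(ChickWeight)

# Number of rows and columns
dim(chicks)

# How many observations were recorded under each diet
table(chicks$Diet)

# Range and quartiles of the two numeric variables we plot
summary(chicks[, c("Time", "weight")])
```

The table of diets shows that the groups are not equally sized, which is worth keeping in mind when comparing them later.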

Base R method

# base scatter plot, one color per diet
plot(ChickWeight$Time, ChickWeight$weight,
     col = factor(ChickWeight$Diet),
     pch = 16,
     main = "Relationship between time since birth and chick weight")
# add a legend: one entry per diet, using the same palette order as the points
legend("topleft",
       legend = levels(factor(ChickWeight$Diet)),
       pch = 16,
       col = seq_along(levels(factor(ChickWeight$Diet))))

With all four diets plotted on top of each other, I really cannot infer anything from this plot.

Let's try the easier ggplot way.

ggplot way

Initiate the scatterplot

ChickWeight %>% 
  ggplot(aes(x=Time, y=weight))+
  geom_point()

Weight seems to increase with time since birth, which is to be expected.

Adding axes titles

ChickWeight %>% 
  ggplot(aes(x=Time, y=weight))+
  geom_point()+
  labs(x="Time since birth (in days)",
       y="Weight of the chick (in grams)")

How about adding a regression line to confirm the direction of association?

Add regression line

ChickWeight %>% 
  ggplot(aes(x=Time, y=weight))+
  geom_point()+
  geom_smooth(method = "lm")+
  labs(x="Time since birth (in days)",
       y="Weight of the chick (in grams)")+
  theme_classic()
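The straight line drawn by geom_smooth(method = "lm") is, by default, an ordinary least-squares fit of y on x. As a rough check, the same fit can be reproduced outside the plot with lm(); the name fit below is my own choice.

```r
# Fit the same simple linear regression that geom_smooth(method = "lm") draws
fit <- lm(weight ~ Time, data = ChickWeight)

# Intercept and slope; the slope is the average weight change (grams) per day
coef(fit)

# Share of the variance in weight explained by Time
summary(fit)$r.squared
```

A positive slope here confirms the direction of the association seen in the plot.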

Adding the correlation coefficient (R) and its p-value will step up the game. We achieve that with stat_cor() from ggpubr.

ChickWeight %>% 
  ggplot(aes(x=Time, y=weight))+
  geom_point()+
  geom_smooth(method = "lm")+
  ggpubr::stat_cor(r.digits = 3)+
  labs(x="Time since birth (in days)",
       y="Weight of the chick (in grams)")+
  theme_classic()
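stat_cor() uses the Pearson correlation unless told otherwise, so the numbers printed on the plot should be reproducible with base R's cor.test(). This is only a sketch for cross-checking; the name ct is my own.

```r
# Pearson correlation between time since birth and weight, with its p-value
ct <- cor.test(ChickWeight$Time, ChickWeight$weight, method = "pearson")

# Correlation coefficient, rounded to 3 digits like r.digits = 3 in stat_cor()
round(unname(ct$estimate), 3)

# p-value of the test that the true correlation is zero
ct$p.value
```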

How about coloring the chicks by diet? There are 4 diets in this dataset; let's choose just 2 so that the plot is not crowded.

Grouped scatter plots

dietselect <- c(1, 3) # I am choosing the 1st and 3rd diets
ChickWeight %>% 
  filter(Diet %in% dietselect) %>% # %in% keeps rows matching any of several values
  ggplot(aes(x=Time, y=weight, color=Diet))+
  geom_point()+
  geom_smooth(method = "lm")+
  ggpubr::stat_cor(r.digits = 3)+
  labs(x="Time since birth (in days)",
       y="Weight of the chick (in grams)")+
  scale_color_manual(values=c("#4F788D", "#E87F4D"))+
  theme_classic()
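The per-diet correlations and regression lines in this plot can also be computed directly. The sketch below uses base R only; the names chicks, two_diets, by_diet, r_by_diet and slope_by_diet are my own and do not appear in the original code.

```r
# Plain data frame copy of the built-in dataset
chicks <- as.data.frame(ChickWeight)

# Keep diets 1 and 3 only, then drop the now-unused factor levels 2 and 4
two_diets <- droplevels(subset(chicks, Diet %in% c(1, 3)))

# One data frame per diet
by_diet <- split(two_diets, two_diets$Diet)

# Pearson correlation between Time and weight within each diet
r_by_diet <- sapply(by_diet, function(d) cor(d$Time, d$weight))

# Slope of weight on Time (grams per day) within each diet
slope_by_diet <- sapply(by_diet, function(d) {
  coef(lm(weight ~ Time, data = d))[["Time"]]
})

round(r_by_diet, 3)
round(slope_by_diet, 2)
```

Comparing the two slopes gives a number to go with the visual impression of which line is steeper.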

We see that chicks on diet 3 gained weight faster than those on diet 1: the regression line for diet 3 is steeper.

Happy scattering until I see you with the next post!



Citation

For attribution, please cite this work as

Soundararajan (2021, Aug. 11). My R Space: Grouped scatter plots. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2021-08-11-grouped-scatter-plots/

BibTeX citation

@misc{soundararajan2021grouped,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Grouped scatter plots},
  url = {https://github.com/soundarya24/SoundBlog/posts/2021-08-11-grouped-scatter-plots/},
  year = {2021}
}