Approaching grouped scatterplots one step at a time.
Cut to the chase and take me to the final ggplot output.
We are working with the ChickWeight dataset today.
We are using 3 variables from this dataset:
– Time: the number of days since the chick's birth
– weight: the chick's body weight in grams
– Diet: the experimental diet the chick was fed
We would like to check:
– To what extent are time since birth and the chick's weight correlated?
– Does this vary by diet (especially diets 1 and 3)?
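Before plotting, it can help to confirm what these three variables look like. The following is a minimal sketch using only base R and the built-in data; the object name `chick` is my own choice for illustration and does not come from the original post.

```r
# ChickWeight ships with R (datasets package), so nothing needs downloading.
# as.data.frame() drops the extra grouped-data classes and leaves a plain data frame.
chick <- as.data.frame(ChickWeight)[, c("Time", "weight", "Diet")]

str(chick)          # Time and weight are numeric, Diet is a factor
levels(chick$Diet)  # the diet labels available in the data
table(chick$Diet)   # number of observations recorded under each diet
```

Knowing that Diet is a factor matters later, because ggplot then maps it to discrete colors rather than a continuous gradient.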
We will use a scatter plot for the first question and color the points by diet (1 and 3) to answer the second.
I really cannot infer anything from this plot.
Let's go the easier ggplot way.
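library(dplyr)    # provides %>% and filter()
library(ggplot2)  # provides ggplot(), geom_point(), etc.
ChickWeight %>%
  ggplot(aes(x = Time, y = weight)) +
  geom_point()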
Weight seems to increase with time since birth, which is to be expected. Next, let's give the axes descriptive labels.
ChickWeight %>%
  ggplot(aes(x = Time, y = weight)) +
  geom_point() +
  labs(x = "Time since birth (in days)",
       y = "Weight of the chick (in grams)")
How about adding a regression line to confirm the direction of association?
ChickWeight %>%
  ggplot(aes(x = Time, y = weight)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Time since birth (in days)",
       y = "Weight of the chick (in grams)") +
  theme_classic()
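To see the numbers behind the regression line, the same kind of model can be fitted directly. This is a sketch in base R, assuming the default `y ~ x` formula that `geom_smooth(method = "lm")` uses; the names `fit` and `slope` are only illustrative.

```r
# Fit weight as a linear function of time, the same form of model
# that geom_smooth(method = "lm") draws by default.
fit <- lm(weight ~ Time, data = ChickWeight)

coef(fit)                     # intercept and slope
slope <- coef(fit)[["Time"]]  # estimated grams gained per day
slope > 0                     # TRUE means weight rises with time since birth
```

A positive slope agrees with the upward direction of the line in the plot.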
Adding the correlation coefficient (R) and its p-value to the plot steps up the game. We can do that with stat_cor() from the ggpubr package.
ChickWeight %>%
  ggplot(aes(x = Time, y = weight)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggpubr::stat_cor(r.digits = 3) +
  labs(x = "Time since birth (in days)",
       y = "Weight of the chick (in grams)") +
  theme_classic()
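The label on the plot can be cross-checked outside ggplot. Here is a rough sketch with base R's `cor.test()`, assuming the Pearson method (which is the `stat_cor()` default); `ct` is just an illustrative name, and the rounding may differ slightly from how ggpubr formats its label.

```r
# Pearson correlation between time since birth and weight, across all diets.
ct <- cor.test(ChickWeight$Time, ChickWeight$weight, method = "pearson")

round(unname(ct$estimate), 3)  # correlation coefficient R
ct$p.value                     # p-value of the test that R equals 0
```

If these values are close to what the plot shows, the annotation is doing what we expect.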
How about coloring the chicks by diet? There are 4 diets in this dataset; let's choose just 2 so that the plot is not crowded.
dietselect <- c(1, 3)  # I am choosing the 1st and 3rd diets
ChickWeight %>%
  filter(Diet %in% dietselect) %>%  # %in% keeps rows matching any of several values
  ggplot(aes(x = Time, y = weight, color = Diet)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggpubr::stat_cor(r.digits = 3) +
  labs(x = "Time since birth (in days)",
       y = "Weight of the chick (in grams)") +
  scale_color_manual(values = c("#4F788D", "#E87F4D")) +
  theme_classic()
We see that chicks on diet 3 gained more weight over time than those on diet 1.
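The comparison between the two diets can also be checked numerically. Below is a minimal base R sketch that fits one linear model per diet and compares the slopes; the names `two_diets`, `by_diet`, `slopes` and `cors` are my own, and a separate model per diet is a simplification that ignores repeated measurements on the same chick.

```r
dietselect <- c(1, 3)

# Keep only the two diets of interest and drop the unused factor levels.
two_diets <- as.data.frame(ChickWeight)
two_diets <- two_diets[two_diets$Diet %in% dietselect, ]
two_diets$Diet <- droplevels(two_diets$Diet)

# Split the rows by diet, then compute a slope and a correlation for each group.
by_diet <- split(two_diets, two_diets$Diet)
slopes <- sapply(by_diet, function(d) coef(lm(weight ~ Time, data = d))[["Time"]])
cors   <- sapply(by_diet, function(d) cor(d$Time, d$weight))

slopes  # grams gained per day under each diet
cors    # Pearson correlation between Time and weight under each diet
```

A larger slope for diet 3 than for diet 1 is what the steeper orange line in the plot reflects.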
Happy scattering until I see you with the next post!
For attribution, please cite this work as
Soundararajan (2021, Aug. 11). My R Space: Grouped scatter plots. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2021-08-11-grouped-scatter-plots/
BibTeX citation
@misc{soundararajan2021grouped,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Grouped scatter plots},
  url = {https://github.com/soundarya24/SoundBlog/posts/2021-08-11-grouped-scatter-plots/},
  year = {2021}
}