Let us see how to draw boxplots with individual data points depicted on them.
I chose to work with the iris and mtcars datasets because they are easy to access. Today we will work on mtcars again to depict the differences in mileage between automatic and manual transmission types. To check the distribution of these variables, consult the density plots discussed in an earlier post.
There are three ways to achieve this combination of dot and box plots.
Let us load the libraries and create a new dataframe to work with.
Cut to the chase and take me to the final plot from ggplot.
To make am a factor variable, we recode it and store the result in a dataframe df.
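The setup code itself is not shown here, so the following is only a sketch of what it might look like. It assumes dplyr, ggplot2 and ggpubr are the libraries meant, and the labels "Automatic" and "Manual" are my own choice (mtcars codes am as 0 = automatic, 1 = manual).

```r
library(dplyr)    # %>% and mutate()
library(ggplot2)  # ggplot(), geom_boxplot(), geom_jitter(), ggsave()
library(ggpubr)   # stat_compare_means(), used further down

# mtcars stores transmission as numeric: 0 = automatic, 1 = manual
df <- mtcars %>%
  mutate(am = factor(am,
                     levels = c(0, 1),
                     labels = c("Automatic", "Manual")))  # labels are an assumption

# quick check that am is now a factor with two levels
str(df$am)
```

With am as a factor, both base R and ggplot treat it as a grouping variable instead of a number.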
boxplot(mpg ~ am,   # note the formula
        data = df)  # note we tell R what the dataframe is
In the formula, the y variable goes on the left-hand side of ~ and the grouping variable on the right.
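To see what the formula does, here is a small sketch that uses the same mpg ~ am formula outside of a plot. It works on the raw mtcars data (where am is still 0/1), and the object name groups is just for illustration.

```r
# Median mileage per transmission group, using the same formula as boxplot()
aggregate(mpg ~ am, data = mtcars, FUN = median)

# The same grouping done by hand: split mpg by am, then summarise each piece
groups <- split(mtcars$mpg, mtcars$am)
sapply(groups, median)
```

Both calls give one median per transmission type, which is the thick line drawn inside each box.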
Alright, let's add the dots.
boxplot(mpg ~ am, data = df, xlab = "Transmission type", ylab = "Mileage")
# follow the boxplot with this stripchart
stripchart(mpg ~ am, data = df,
           vertical = TRUE,    # otherwise the dots will be horizontal
           method = "jitter",  # to avoid overlapping
           pch = 16,           # gives dark full dots
           add = TRUE          # this draws the dots on top of the existing boxplot
)
# a basic boxplot
df %>%
  ggplot(aes(x = am, y = mpg)) +
  geom_boxplot()

# narrower, semi-transparent boxes, filled by transmission type
df %>%
  ggplot(aes(x = am, y = mpg, fill = am)) +
  geom_boxplot(width = 0.3, alpha = 0.4)
Adding some custom colors
df %>%
  ggplot(aes(x = am, y = mpg, fill = am)) +
  geom_boxplot(width = 0.3, alpha = 0.8) +
  scale_fill_manual(values = c("#639EAA", "#8D2D54"))

# overlay the individual observations as points
df %>%
  ggplot(aes(x = am, y = mpg, fill = am)) +
  geom_boxplot(width = 0.3, alpha = 0.8) +
  geom_point(alpha = 0.6) +
  scale_fill_manual(values = c("#639EAA", "#8D2D54"))
The individual points can overlap; to avoid this, we jitter them with geom_jitter().
df %>%
  ggplot(aes(x = am, y = mpg, fill = am)) +
  geom_boxplot(width = 0.3, alpha = 0.8) +
  geom_jitter(alpha = 0.6) +
  scale_fill_manual(values = c("#639EAA", "#8D2D54"))
Oooh, too much jitter. Let's tame it.
df %>%
  ggplot(aes(x = am, y = mpg, fill = am)) +
  geom_boxplot(width = 0.3, alpha = 0.8) +
  geom_jitter(width = 0.03, height = 0.03, alpha = 0.4) +
  scale_fill_manual(values = c("#639EAA", "#8D2D54"))
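Jitter is random, so the dots land in slightly different places every time the plot is drawn. A possible variation on the jitter settings above, offered as a sketch and not as part of the original walkthrough: fix the random seed through position_jitter(), set height = 0 so the mileage values are not shifted vertically, and hide the boxplot's own outlier marks so those cars are not drawn twice. The seed value 42 and the name p are arbitrary.

```r
library(ggplot2)

# rebuild the dataframe so this block runs on its own (labels are an assumption)
df <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))

p <- ggplot(df, aes(x = am, y = mpg, fill = am)) +
  geom_boxplot(width = 0.3, alpha = 0.8,
               outlier.shape = NA) +   # outliers already appear as jittered dots
  geom_jitter(position = position_jitter(width = 0.03,  # small sideways spread
                                         height = 0,    # keep true mpg values
                                         seed = 42),    # same layout on every run
              alpha = 0.4) +
  scale_fill_manual(values = c("#639EAA", "#8D2D54"))

p
```

Whether to keep a little vertical jitter is a matter of taste; with height = 0 each dot sits exactly at its observed mileage.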
# clean theme, no legend (the x axis already names the groups), proper axis labels
df %>%
  ggplot(aes(x = am, y = mpg, fill = am)) +
  geom_boxplot(width = 0.3, alpha = 0.8) +
  geom_jitter(width = 0.03, height = 0.03, alpha = 0.4) +
  scale_fill_manual(values = c("#639EAA", "#8D2D54")) +
  theme_classic() +
  theme(legend.position = "none") +
  labs(x = "Transmission type",
       y = "Miles/(US) gallon")
df %>%
  ggplot(aes(x = am, y = mpg, fill = am)) +
  geom_boxplot(width = 0.3, alpha = 0.8) +
  geom_jitter(width = 0.03, height = 0.03, alpha = 0.4) +
  scale_fill_manual(values = c("#639EAA", "#8D2D54")) +
  stat_compare_means() + # make sure ggpubr is loaded for this to work
  theme_classic() +
  theme(legend.position = "none") +
  labs(x = "Transmission type",
       y = "Miles/(US) gallon")
By default, the name of the test and its p-value are shown. We can modify this, for example to show significance stars at a position of our choosing.
df %>%
  ggplot(aes(x = am, y = mpg, fill = am)) +
  geom_boxplot(width = 0.3, alpha = 0.8) +
  geom_jitter(width = 0.03, height = 0.03, alpha = 0.4) +
  scale_fill_manual(values = c("#639EAA", "#8D2D54")) +
  stat_compare_means(label = "p.signif", label.x = 1.5, label.y = 25) +
  theme_classic() +
  theme(legend.position = "none") +
  labs(x = "Transmission type",
       y = "Miles/(US) gallon")
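It helps to know where the number on the plot comes from. As far as I know from the ggpubr documentation, stat_compare_means() uses a Wilcoxon test by default when there are two groups, and accepts method = "t.test" if you prefer a t-test. Here is a small sketch of running the same kind of test directly in base R; the name res is arbitrary, and exact = FALSE is set only because mpg has tied values.

```r
# Wilcoxon rank-sum test of mileage by transmission type
res <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)

res$p.value   # the p-value of the test
res$method    # the name of the test that was run
```

If the value printed here and the label on your plot disagree, check which method the plot is using.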
You can save the file wherever you want through the point-and-click interface.
Or you can save it the R way, with code. If you stored the plot as an object, save that object; if not, save your last plot right after you finish making it.
ggsave(filename = "demo.tiff",
       plot = last_plot() # make sure your last plot is the one you want to save
)
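For the other case, where the plot is stored as an object, a sketch could look like this. The object name final_plot, the file name demo.png and the size and resolution values are all just examples; adjust them to what your journal or slide deck needs.

```r
library(ggplot2)

# rebuild the dataframe so this block runs on its own (labels are an assumption)
df <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))

# store the plot in an object instead of only printing it
final_plot <- ggplot(df, aes(x = am, y = mpg, fill = am)) +
  geom_boxplot(width = 0.3, alpha = 0.8) +
  theme_classic()

# save that object with an explicit size and resolution
ggsave(filename = "demo.png",
       plot = final_plot,
       width = 5, height = 4, units = "in",
       dpi = 300)
```

Passing the object explicitly means it does not matter which plot you happened to draw last.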
This is an easier approach, but it differs slightly from the ggplot way. Consult this page from sthda.com for a detailed walkthrough.
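The code for this approach is not shown here. Assuming it refers to the ggpubr package (the one that provides stat_compare_means()), a minimal sketch with its ggboxplot() function might look like the following. The name p and the factor labels are my own choices.

```r
library(ggpubr)

# rebuild the dataframe so this block runs on its own (labels are an assumption)
df <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))

# ggboxplot() takes column names as strings; add = "jitter" overlays the dots
p <- ggboxplot(df, x = "am", y = "mpg",
               fill = "am",
               palette = c("#639EAA", "#8D2D54"),
               add = "jitter",
               xlab = "Transmission type",
               ylab = "Miles/(US) gallon") +
  stat_compare_means()

p
```

The result is still a ggplot object, so you can keep adding layers and themes to it with +.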
Try all three ways; my suggestion is to stick with whichever resonates with you.
Good luck!
For attribution, please cite this work as
Soundararajan (2021, Aug. 9). My R Space: Dot and boxplots. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2021-08-09-dot-and-boxplots/
BibTeX citation
@misc{soundararajan2021dot,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Dot and boxplots},
  url = {https://github.com/soundarya24/SoundBlog/posts/2021-08-09-dot-and-boxplots/},
  year = {2021}
}