A detailed walkthrough of drawing density plots.
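I prefer density plots because they depict distributions better than histograms do. A density plot is essentially a smoothed version of a histogram: instead of counting observations in fixed bins, it estimates a continuous curve (a kernel density estimate) whose total area is 1.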
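Here is a minimal sketch of the "smoothed histogram" idea. It draws a histogram of mileage on the density scale and overlays a density curve. The bin count of 10 is an arbitrary choice for illustration. Base R's `density()` computes the same kind of kernel estimate that `geom_density()` plots.

```r
library(ggplot2)

# Histogram rescaled so the bar areas sum to 1, with a kernel density
# estimate drawn on top so both share the same y-axis
p_hist <- ggplot(mtcars, aes(mpg)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10,
                 fill = "grey85", colour = "white") +
  geom_density()
print(p_hist)

# The underlying estimate, computed directly with base R
d <- density(mtcars$mpg)

# The area under the curve is (approximately) 1,
# just like a histogram drawn on the density scale
area <- sum(d$y) * diff(d$x[1:2])
area
```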
For today's demonstration, we will work with the mtcars dataset. You only need to know two of its variables: mpg, which is miles per (US) gallon (aka mileage), and am, which is the transmission type, coded 0 for automatic and 1 for manual.
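This quick check confirms how am is coded and how many cars fall into each group. It also shows the average mileage for each transmission code. It uses only base R.

```r
# Number of cars per transmission code (0 = automatic, 1 = manual)
am_counts <- table(mtcars$am)
am_counts

# Average mileage per transmission code
mpg_by_am <- tapply(mtcars$mpg, mtcars$am, mean)
round(mpg_by_am, 2)
```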
Let's draw a density plot of mileage and color it by transmission type.
```r
library(tidyverse) # loads ggplot2, dplyr (with %>%) and stringr, among others

mtcars %>% # your data here
  ggplot(aes(mpg, # inside aesthetics, add your variable of interest
    fill = am # we want to fill the plots by transmission type
  )) +
  geom_density() # :-)
```
Okay and not okay. What is the problem with this figure? We did not get the colored plot we expected. But why? To color by group, the grouping variable has to be categorical, which in R means a factor. A numeric variable is treated as continuous, so ggplot2 does not split the data into groups and draws a single density for all cars. Let's check how the am variable is coded.
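You can check this grouping behaviour directly. The sketch below uses ggplot2's `layer_data()` to count how many groups the density layer was computed for. It does this once with the numeric am and once with am converted to a factor. The `suppressWarnings()` call is there because recent ggplot2 versions warn that the continuous fill aesthetic was dropped.

```r
library(ggplot2)

p_numeric <- ggplot(mtcars, aes(mpg, fill = am)) + geom_density()
p_factor  <- ggplot(mtcars, aes(mpg, fill = factor(am))) + geom_density()

# layer_data() returns the computed data behind the first layer;
# its 'group' column shows how the rows were split before estimating
n_groups_numeric <- length(unique(suppressWarnings(layer_data(p_numeric))$group))
n_groups_factor  <- length(unique(layer_data(p_factor)$group))

n_groups_numeric # a single density for all cars
n_groups_factor  # one density per transmission type
```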
```r
str(mtcars)
```

```
'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
```
Ah, the am variable is stored as a numeric variable. Let's convert it to a factor right inside aes().
```r
mtcars %>%
  ggplot(aes(mpg,
    # note that we code am variable as a factor
    fill = factor(am)
  )) +
  geom_density()
```
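Now the plots are colored by transmission type. However, where the curves overlap, the one drawn on top hides the one beneath it, so I prefer lighter, semi-transparent fills. Setting the alpha argument inside a geom adds that transparency; it ranges from 0 (fully transparent) to 1 (opaque).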
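Here is a sketch of that step. It reuses the factor(am) fill from the plot above and adds alpha = 0.4, the value used in the later plots. The exact value is a matter of taste.

```r
library(ggplot2)
library(dplyr)

p_alpha <- mtcars %>%
  ggplot(aes(mpg, fill = factor(am))) +
  geom_density(alpha = 0.4) # 40% opaque, so overlapping curves stay visible
print(p_alpha)
```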
The groups, and therefore the legend, are labeled 0 and 1 because that is how am is coded. It would be better to spell out what 0 and 1 mean.
To do this, let's recode am with the mutate() command and store the result under a new name, df.
```r
df <- mtcars %>%
  mutate(
    am = # am variable should be mutated (changed) to..
      factor(am, # a factor variable..
        levels = c(0, 1), # which has these levels
        labels = c("Automatic", "Manual") # add corresponding names to the levels
      )
  )
```
```r
# to check
str(df$am)
```

```
 Factor w/ 2 levels "Automatic","Manual": 2 2 2 1 1 1 1 1 1 1 ...
```
Yup, it is a factor with 2 levels now. Awesome!
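Labels are matched to levels by position, so a swapped order would silently mislabel the groups. A cross-tabulation of the original codes against the new labels guards against that. To keep this snippet self-contained, it rebuilds df the same way as above.

```r
library(dplyr)

df <- mtcars %>%
  mutate(am = factor(am, levels = c(0, 1), labels = c("Automatic", "Manual")))

# Rows: original numeric codes; columns: new labels.
# Every car should sit on the diagonal (0 -> Automatic, 1 -> Manual).
check <- table(original = mtcars$am, recoded = df$am)
check
```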
```r
df %>%
  ggplot(aes(mpg, fill = am)) +
  geom_density(alpha = 0.4) +
  theme_classic() +
  labs( # I am adding title and axes labels using this command
    x = "Miles/(US) gallon",
    y = "Density",
    title = "Density plots of mileage by transmission type"
  )
```
The legend title still needs some work; am is not intuitive.
```r
df %>%
  ggplot(aes(mpg, fill = am)) +
  geom_density(alpha = 0.4) +
  theme_classic() +
  labs(
    x = "Miles/(US) gallon",
    y = "Density",
    title = "Density plots of mileage by transmission type",
    fill = "Transmission type"
  )
```
To change the legend title, I used fill inside labs(), because the legend describes the fill aesthetic (the colors that fill the plots).
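labs() is not the only place to set this title. Naming the fill scale gives the same legend title. This sketch rebuilds df so it runs on its own.

```r
library(ggplot2)
library(dplyr)

df <- mtcars %>%
  mutate(am = factor(am, levels = c(0, 1), labels = c("Automatic", "Manual")))

p_named <- df %>%
  ggplot(aes(mpg, fill = am)) +
  geom_density(alpha = 0.4) +
  # the scale's name becomes the legend title, like labs(fill = ...)
  scale_fill_discrete(name = "Transmission type")
print(p_named)
```

labs() is handy when you are setting several labels at once. The scale name is convenient when you are already customizing that scale, for example its colors.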
This is pretty good, and we could stop here. But if you want to step it up, try adding the sample sizes to the density plots.
```r
# sample sizes by transmission group
df %>%
  group_by(am) %>%
  summarize(samplesize = n())
```

```
# A tibble: 2 × 2
  am        samplesize
  <fct>          <int>
1 Automatic         19
2 Manual            13
```
Next, we store this summary in an object called fortext, add the x and y positions where each label should sit, and use it to annotate the plots.
```r
# store the sample sizes together with label positions;
# the x and y values are hypothetical, picked by eye, so adjust them to taste
fortext <- df %>%
  group_by(am) %>%
  summarize(samplesize = n()) %>%
  mutate(
    x = c(12, 31),    # hypothetical label positions on the mpg axis
    y = c(0.11, 0.06) # hypothetical label positions on the density axis
  )

df %>%
  ggplot(aes(mpg, fill = am)) +
  geom_density(alpha = 0.8) +
  geom_text( # we call this to annotate
    data = fortext, # using the data we just created
    aes(
      y = y, # x and y we input manually
      x = x,
      label = am, # asking to label the transmission types
      color = am
    ),
    fontface = "bold" # a fixed setting, so it goes outside aes()
  ) +
  scale_fill_manual(values = c("#868E74", "#AD5988")) +
  # adding sample sizes
  geom_text(
    data = fortext,
    aes(
      x = x,
      y = y - 0.005, # a bit lower than the transmission type label
      color = am,
      label = str_glue("n = {samplesize}") # I want "n = " before the sample sizes
    ),
    fontface = "bold"
  ) +
  scale_color_manual(values = c("#868E74", "#AD5988")) +
  theme_classic() +
  labs(
    x = "Miles/(US) gallon",
    y = "Density",
    title = "Density plots of mileage by transmission type"
  ) +
  theme(legend.position = "none") # no longer required since we have labeled them in the plots
```
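The label positions x and y above are chosen by hand. A sketch of one alternative is to derive them from the data and place each label at the peak of its group's density. The helper peak_position() is hypothetical and not part of any package. It relies on density() using the same defaults as geom_density() (Gaussian kernel, bw = "nrd0"), so the peaks should sit at, or very close to, the top of each plotted curve.

```r
library(dplyr)

df <- mtcars %>%
  mutate(am = factor(am, levels = c(0, 1), labels = c("Automatic", "Manual")))

# Hypothetical helper: x and y of the highest point of a kernel density estimate
peak_position <- function(values) {
  d <- density(values)
  i <- which.max(d$y)
  data.frame(x = d$x[i], y = d$y[i])
}

fortext_auto <- df %>%
  group_by(am) %>%
  summarize(
    samplesize = n(),
    peak_position(mpg) # an unnamed data frame result is unpacked into x and y
  )
fortext_auto
```

fortext_auto has the same columns as fortext, so it could be passed as data to the two geom_text() layers. You may still want to nudge the labels, for example by adding a little to y so they sit above the curves rather than on them.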
Oh, how I love it!
I would probably have done it this way if this were my first density plot!
See you in the next one!
For attribution, please cite this work as
Soundararajan (2021, Aug. 8). My R Space: Density Plots. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2021-08-05-density-plots/
BibTeX citation
@misc{soundararajan2021density, author = {Soundararajan, Soundarya}, title = {My R Space: Density Plots}, url = {https://github.com/soundarya24/SoundBlog/posts/2021-08-05-density-plots/}, year = {2021} }