How to create variables using the function case_when()?
I run into this task quite often during exploratory data analysis: renaming or regrouping the values of a variable. For this, case_when() from dplyr (part of the tidyverse) comes in handy.
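Before turning to a data frame, it may help to see the basic shape of a case_when() call on a plain vector. Each argument is a two-sided formula, condition ~ value, and the result has one element per input element. The vector x below is made up purely for illustration:

```r
library(dplyr)

# A made-up numeric vector; case_when() does not need a data frame
x <- c(-2, 0, 3)

# Each argument is a formula: condition on the left, value to return on the right
sign_label <- case_when(
  x < 0  ~ "negative",
  x == 0 ~ "zero",
  x > 0  ~ "positive"
)

sign_label # returns c("negative", "zero", "positive")
```

Inside mutate(), the same formulas simply refer to columns of the data frame instead of a free-standing vector.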
```r
head(mtcars)
```

```
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```
```r
table(mtcars$cyl) # counts the cars with 4, 6 and 8 cylinders
```

```
 4  6  8
11  7 14
```
We are using the mtcars dataset for this. To keep the original data untouched, we copy it into a new data frame, df_cars, and work on the copy.

```r
df_cars <- mtcars
```
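Why does working on the copy protect the original? R follows copy-on-modify semantics: after an assignment like the one above, both names point to the same data, and R makes a separate copy as soon as one of them is modified. A small sketch to check this; df_copy and the value 0 are arbitrary, hypothetical choices:

```r
df_copy <- mtcars   # both names refer to the same data for now
df_copy$cyl <- 0    # modifying df_copy makes R copy it first

unique(df_copy$cyl) # the copy now holds only 0
unique(mtcars$cyl)  # the original still holds 6, 4 and 8
```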
We aim to create a new variable, cyl_new, using the mutate() function.
```r
library(dplyr) # provides mutate(), case_when() and the %>% pipe

df_cars <- df_cars %>% # the data frame to work on
  mutate(cyl_new = case_when( # the name of the new variable goes here
    cyl == 4 ~ "Four",  # if cyl is 4, label it "Four"
    cyl == 6 ~ "Six",   # and so on
    cyl == 8 ~ "Eight"
  ))
```
Let us check whether the job is done.
```r
table(df_cars$cyl_new)
```

```
Eight  Four   Six
   14    11     7
```
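The three conditions above cover every value of cyl in mtcars. When a row matches none of the conditions, case_when() returns NA for it. The sketch below leaves out the rule for eight cylinders on purpose to show this (df_partial and cyl_partial are hypothetical names):

```r
library(dplyr)

df_partial <- mtcars %>%
  mutate(cyl_partial = case_when(
    cyl == 4 ~ "Four",
    cyl == 6 ~ "Six"   # deliberately no rule for cyl == 8
  ))

# useNA = "ifany" makes table() report the NA count as well
table(df_partial$cyl_partial, useNA = "ifany")
```

Without useNA, table() drops missing values silently and would list only Four (11) and Six (7); with it, the 14 eight-cylinder cars show up as <NA>.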
Yes! Now let's look at another way of using case_when(). First, let me simulate some data.
```r
subjects <- c("a", "b", "c", "d", "e")
marks <- c(70, 90, 20, 10, 96)
df <- data.frame(subjects, marks)
df
```

```
  subjects marks
1        a    70
2        b    90
3        c    20
4        d    10
5        e    96
```
With this new data, our aim is to mark every subject with marks above 60 as "Pass".
```r
df_new <- df %>%
  mutate(rank = case_when(
    marks > 60 ~ "Pass", # marks above 60
    TRUE ~ "Fail"        # every row not matched above
  ))
```
Here we told dplyr to code all marks above 60 as "Pass". The last line, TRUE ~ "Fail", is a catch-all: its condition is always true, so every row not already matched is coded as "Fail". Let's check:
```r
df_new
```

```
  subjects marks rank
1        a    70 Pass
2        b    90 Pass
3        c    20 Fail
4        d    10 Fail
5        e    96 Pass
```
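Because case_when() evaluates the conditions from top to bottom and, for each row, uses the first one that is TRUE, narrower conditions must come before broader ones. The sketch below adds a hypothetical "Distinction" band for marks of 90 or more to the same simulated data (grade and df_grades are illustrative names):

```r
library(dplyr)

subjects <- c("a", "b", "c", "d", "e")
marks <- c(70, 90, 20, 10, 96)
df <- data.frame(subjects, marks)

df_grades <- df %>%
  mutate(grade = case_when(
    marks >= 90 ~ "Distinction", # checked first
    marks > 60  ~ "Pass",        # only reached when marks < 90
    TRUE        ~ "Fail"         # everything else, including exactly 60
  ))

df_grades$grade # returns c("Pass", "Distinction", "Fail", "Fail", "Distinction")
```

If marks > 60 were listed first, the scores of 90 and 96 would match it and be labelled "Pass", so the Distinction rule would never be reached for them.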
–Happy mutating–
For attribution, please cite this work as
Soundararajan (2021, Aug. 10). My R Space: The curious case of case_when. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2021-01-04-casewhen/
BibTeX citation
@misc{soundararajan2021the, author = {Soundararajan, Soundarya}, title = {My R Space: The curious case of case_when}, url = {https://github.com/soundarya24/SoundBlog/posts/2021-01-04-casewhen/}, year = {2021} }