The curious case of case_when

dplyr Beginner Tidying

How do you create new variables with the case_when function?

Soundarya Soundararajan
Aug. 10, 2021

First things first: why?

During exploratory data analysis I often need to rename or regroup the values of a variable. For this job, case_when from dplyr (part of the tidyverse) comes in handy.

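Libraries and data

library(dplyr) # provides mutate(), case_when() and the %>% pipe
head(mtcars) # first six rows of the built-in mtcars dataset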

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
table(mtcars$cyl) # counts how many cars have each number of cylinders

 4  6  8 
11  7 14 

We are using the built-in mtcars dataset. We copy it into a new object, df_cars, so that the original data stay untouched, and work on the copy from here on.

df_cars <- mtcars

How to create a new variable based on an existing variable?

We will create a new variable, cyl_new, with the mutate function.

df_cars <- df_cars %>% # start from df_cars and pipe it into mutate()
  mutate(cyl_new = case_when( # cyl_new is the name of the new column
    cyl == 4 ~ "Four", # if cyl is 4, label it "Four"
    cyl == 6 ~ "Six", # and so on
    cyl == 8 ~ "Eight" # a value matched by no condition would become NA
  ))
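Since the motivation was also to regroup values, here is a minimal sketch, assuming dplyr is installed, of collapsing several cylinder counts into one group by putting %in% on the left-hand side of a formula. The data frame name df_sizes, the column name engine_size and the labels "Small"/"Large" are hypothetical, chosen only for illustration.

```r
library(dplyr) # provides mutate(), case_when() and the %>% pipe

# Regroup cyl into two hypothetical size classes
df_sizes <- mtcars %>%
  mutate(engine_size = case_when(
    cyl %in% c(4, 6) ~ "Small", # either 4 or 6 cylinders
    cyl == 8 ~ "Large"          # 8 cylinders
  ))

table(df_sizes$engine_size)
```

With the counts shown earlier (11 four-cylinder, 7 six-cylinder and 14 eight-cylinder cars), this should give 18 "Small" and 14 "Large".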

Check

Let us check whether the job is done.

table(df_cars$cyl_new) 

Eight  Four   Six 
   14    11     7 
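A slightly stricter check, sketched here under the assumption that dplyr is installed, cross-tabulates the old and new columns so you can see that every value of cyl landed under exactly one label, and then counts missing values, since rows that match no condition in case_when come out as NA.

```r
library(dplyr)

df_cars <- mtcars %>%
  mutate(cyl_new = case_when(
    cyl == 4 ~ "Four",
    cyl == 6 ~ "Six",
    cyl == 8 ~ "Eight"
  ))

# Rows are the original cyl values, columns the new labels;
# each row should have a non-zero count in a single column
table(df_cars$cyl, df_cars$cyl_new)

# table() drops NA by default, so count unmatched rows separately;
# 0 means every car received a label
sum(is.na(df_cars$cyl_new))
```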

Yes! Now let’s look at another way of using case_when. First, let me simulate some data.

subjects <- c("a", "b", "c", "d", "e")
marks <- c(70, 90, 20, 10, 96)
df <- data.frame(subjects, marks)
df
  subjects marks
1        a    70
2        b    90
3        c    20
4        d    10
5        e    96

With this new data, our aim is to mark everyone who scored above 60 as "Pass" and everyone else as "Fail".

df_new <- df %>% 
  mutate(rank = case_when(
    marks > 60 ~ "Pass", # strictly above 60
    TRUE ~ "Fail" # catch-all for every row not matched above
  ))

We told dplyr to code every mark above 60 as "Pass". The last line, TRUE ~ "Fail", acts as a catch-all: any row that did not match an earlier condition is coded as "Fail". Let’s check:

df_new
  subjects marks rank
1        a    70 Pass
2        b    90 Pass
3        c    20 Fail
4        d    10 Fail
5        e    96 Pass
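case_when checks its conditions from top to bottom and uses the first one that is TRUE for each row, so when thresholds overlap the stricter one has to come first. A minimal sketch, assuming dplyr is installed; the "Distinction" label, its cut-off of 90 and the column name grade are hypothetical.

```r
library(dplyr)

subjects <- c("a", "b", "c", "d", "e")
marks <- c(70, 90, 20, 10, 96)
df <- data.frame(subjects, marks)

# The first matching condition wins, so the stricter
# (hypothetical) "Distinction" rule must sit above "Pass"
df_grades <- df %>%
  mutate(grade = case_when(
    marks >= 90 ~ "Distinction",
    marks > 60 ~ "Pass",
    TRUE ~ "Fail" # catch-all for every remaining row
  ))

df_grades
```

If the first two conditions were swapped, marks > 60 would already match 90 and 96, and nobody would be labelled "Distinction". In dplyr 1.1.0 and later, the catch-all can also be written with the .default argument, for example case_when(marks > 60 ~ "Pass", .default = "Fail").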

–Happy mutating–

Citation

For attribution, please cite this work as

Soundararajan (2021, Aug. 10). My R Space: The curious case of case_when. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2021-01-04-casewhen/

BibTeX citation

@misc{soundararajan2021the,
  author = {Soundararajan, Soundarya},
  title = {My R Space: The curious case of case_when},
  url = {https://github.com/soundarya24/SoundBlog/posts/2021-01-04-casewhen/},
  year = {2021}
}