A friend of mine wanted to quickly write a loop in R. Here is how to approach it.
Necessity is the mother of codes.
I learned most of my R while trying to solve my own analysis requirements. Today a friend asked me to help her write a loop for the following task.
She says there are many rows, so she doesn’t want to do this the traditional way; she wants to use R. She has nine variables. Each one has to be multiplied by a number, and then all those values need to be summed up to generate the final variable. Let’s break these requirements down to understand the loop we will be writing.
Each of the 9 variables has to be multiplied by some number (the numbers could be identical or different).
All those multiplied new variables have to be summed up.
A new variable has to be created with the final summed values.
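As a rough preview of where this is heading, the three requirements could be sketched as below. This is only a sketch under assumptions: the column names v1 to v9, the data frame name scores, and the multipliers in weights are all made up for illustration and do not come from my friend's data.

```r
# Hypothetical data: nine numeric variables (v1 to v9) with three rows each
scores <- data.frame(matrix(1:27, nrow = 3))
names(scores) <- paste0("v", 1:9)

# Hypothetical multipliers, one per variable (they may repeat or differ)
weights <- c(2, 4, 1, 3, 2, 2, 5, 1, 4)

# Start the final variable at zero, then add one multiplied variable per pass
scores$final <- 0
for (j in seq_along(weights)) {
  scores$final <- scores$final + scores[[j]] * weights[j]
}

scores$final
```

Here the loop runs over the nine variables rather than over the rows, because each pass multiplies a whole column at once.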
Alright, let’s get into action.
I am creating a small dataset here for demonstration. The condition for the loop is the same for any number of variables. If you need to import your data, you will do it at this step.
subject <- c("maths", "science", "english")
first_term_marks <- c(20,30,30)
second_term_marks <- c(15,20,25)
df <- data.frame(subject,first_term_marks,second_term_marks)
I created a data frame with three variables: two numeric and one nominal.
df # this is the full data
subject first_term_marks second_term_marks
1 maths 20 15
2 science 30 20
3 english 30 25
nrow(df) # this is the number of rows in the data frame
[1] 3
Now, let’s loop :-)
A simple loop goes like this:
for (i in 1:10) { # I have specified the numbers from 1 to 10
  x <- i * 2 # this is our multiplication; see that we have stored it in object x
  print(x) # we are asking R to print our x value
}
[1] 2
[1] 4
[1] 6
[1] 8
[1] 10
[1] 12
[1] 14
[1] 16
[1] 18
[1] 20
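The loop above only prints each value. A small variation, sketched here with a hypothetical object named doubled, keeps the values so they can be used later, which is the same idea as storing results in a data frame.

```r
# 'doubled' is a name made up for this sketch; it holds one result per pass
doubled <- numeric(10) # numeric vector with 10 slots, all zero to begin with
for (i in 1:10) {
  doubled[i] <- i * 2 # store the result in slot i instead of printing it
}
doubled
```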
But we want all her rows to be included, not just 1 to 10, so we use nrow(df) instead.
for (i in 1:nrow(df)) { # i runs from 1 to the number of rows in the data frame df
  df$marksnew_firstterm[i] <- df$first_term_marks[i] * 2 # multiply row i's first term marks by 2
  df$marksnew_secondterm[i] <- df$second_term_marks[i] * 4 # multiply row i's second term marks by 4
  df$total[i] <- df$marksnew_firstterm[i] + df$marksnew_secondterm[i] # total of both multiplied marks for row i
}
See that I have stored all the calculated values as new variables in our data frame. So when we print our data frame now,
df
subject first_term_marks second_term_marks marksnew_firstterm
1 maths 20 15 40
2 science 30 20 60
3 english 30 25 60
marksnew_secondterm total
1 60 100
2 80 140
3 100 160
This has all the calculated items. If you want only the final output in the data frame and not the intermediate ones, tweak the code this way.
subject <- c("maths", "science", "english")
first_term_marks <- c(20,30,30)
second_term_marks <- c(15,20,25)
df_new <- data.frame(subject,first_term_marks,second_term_marks)
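for (i in 1:nrow(df_new)) {
  marksnew_firstterm <- df_new$first_term_marks[i] * 2 # plain object, not stored in df_new
  marksnew_secondterm <- df_new$second_term_marks[i] * 4 # plain object, not stored in df_new
  df_new$total[i] <- marksnew_firstterm + marksnew_secondterm # only the total goes into df_new
}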
Now when we print the data frame,
df_new
subject first_term_marks second_term_marks total
1 maths 20 15 100
2 science 30 20 140
3 english 30 25 160
You see that it has only the total marks, and the intermediate values are not added. We did not make those intermediate variables part of the data frame: we skipped the df_new$ prefix and used only marksnew_firstterm and marksnew_secondterm. Writing df_new$variablename creates the variable as part of the data frame df_new.
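Since arithmetic on R vectors works element by element, the same totals can also be obtained without a loop. The sketch below rebuilds the demonstration data under the hypothetical name df_vec so that it runs on its own; the multipliers 2 and 4 are the ones used above.

```r
# Rebuild the demonstration data (df_vec is a name chosen for this sketch)
df_vec <- data.frame(
  subject = c("maths", "science", "english"),
  first_term_marks = c(20, 30, 30),
  second_term_marks = c(15, 20, 25)
)

# One vectorised statement handles every row at once
df_vec$total <- df_vec$first_term_marks * 2 + df_vec$second_term_marks * 4
df_vec
```

The loop version is still handy for seeing what happens row by row, or when each row needs its own logic.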
Happy looping!!
For attribution, please cite this work as
Soundararajan (2022, Jan. 7). My R Space: Loops in R. Retrieved from https://github.com/soundarya24/SoundBlog/posts/2022-01-07-loops-in-r/
BibTeX citation
@misc{soundararajan2022loops,
  author = {Soundararajan, Soundarya},
  title = {My R Space: Loops in R},
  url = {https://github.com/soundarya24/SoundBlog/posts/2022-01-07-loops-in-r/},
  year = {2022}
}