R Programming Fundamentals
For loop
For loops are often used to go through every column or row of a dataframe in R.
Say, for example, that we're interested in the mean of each numeric column of the built-in iris dataset. Four of the five columns are numeric; the exception is Species, a factor that records the species of each iris. We could type mean(iris$Sepal.Length) four times, changing the variable name each time. However, a far more efficient way to complete this exercise is to use a for loop.
If we simply want to print the means to the console, we could use a for loop as follows:
for(i in seq_along(iris)){
  print(mean(iris[[i]]))
}
The output will be as follows:
[Screenshot: console output showing the printed means, followed by a warning message for column 5]
We'll come back to the output, especially that warning message, in a second. First, let's break down the components of the for loop. The general syntax is as follows:
for(i in sequence){
  some_action
}
In this particular for loop, we chose i as our iterator variable. A for loop in R automatically advances this variable: on each pass through the loop, i is set to the next value in the sequence, which here means it increases by one each time. You might have noticed that once the loop finished, i was added to the global environment as a value, 5L (the L means it's an integer, so this is the number 5). The iterator always remains in the environment after a loop concludes.
It is displayed on the screen, as shown in the following screenshot:
[Screenshot: the global environment listing i with the value 5L]
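As a small sketch of this behaviour, the code below first checks what i holds after an empty loop over the columns of iris, and then loops over a character vector to show that the iterator takes each element of whatever it is given. The names fruits and visited are made up for illustration.

```r
# An empty loop over the columns of iris; we only care about i afterwards
for (i in seq_along(iris)) {
}
print(i)              # the iterator keeps the last value it took
print(is.integer(i))  # seq_along() produces integers

# Hypothetical example vector: the iterator can walk over any vector
fruits <- c("apple", "banana", "cherry")
visited <- character(0)

for (fruit in fruits) {
  # fruit holds the current element itself, not a counter
  visited <- c(visited, toupper(fruit))
}

print(visited)
print(fruit)          # still defined once the loop has finished
```

Because the iterator stays in the environment, a loop will silently overwrite any existing variable that happens to share its name.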
The R function seq_along() is very helpful in for loops, because it generates the sequence 1, 2, and so on, up to the number of columns of the dataframe (if that's the input) or, more generally, up to the number of items contained in whatever is passed to it.
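To make this concrete, here is a short sketch of what seq_along() returns for a few kinds of input. The objects sizes and mixed are invented for the example; only iris comes from the text.

```r
# On a data frame, seq_along() counts columns
col_positions <- seq_along(iris)
print(col_positions)

# On an atomic vector, it counts elements
sizes <- c(small = 1.5, medium = 2.5, large = 4)
print(seq_along(sizes))

# On a list, it counts top-level items, whatever their individual lengths
mixed <- list(1:10, "a", c(TRUE, FALSE))
print(seq_along(mixed))
```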
We also chose to print the mean of each column in this particular for loop. Accessing the columns is done using indexing, so when i = 1, iris[[i]] is the Sepal.Length variable, which is column 1, and so on. We got a warning (and NA instead of a mean) for column 5, the Species variable, because it isn't numeric. Species doesn't have a mean, because it's a factor.
This is actually a great example of where we can combine for loops with an if statement. Take a look at the following code:
for(i in seq_along(iris)){
  if(class(iris[[i]]) == "numeric"){
    print(mean(iris[[i]]))
  }
}
The if statement here prints the mean of an iris column only if the class of that column is numeric, which makes sense, since only numeric columns have means. The output is now only as follows:
[Screenshot: console output showing only the four means]
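If the means are needed later rather than just printed, one option is to store them as the loop runs. The following is a sketch rather than the only approach: col_means is a name chosen here for illustration, and is.numeric() is used in place of the class() comparison because it also accepts integer columns.

```r
# Pre-allocate a named numeric vector with one slot per column of iris
col_means <- rep(NA_real_, ncol(iris))
names(col_means) <- names(iris)

for (i in seq_along(iris)) {
  # is.numeric() is TRUE for both double and integer columns
  if (is.numeric(iris[[i]])) {
    col_means[i] <- mean(iris[[i]])
  }
}

# Species keeps its initial NA because it is a factor, not a numeric column
print(col_means)
```

Pre-allocating the result and filling it by position avoids growing the vector on every iteration, which matters more as the number of columns increases.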
If we're really feeling fancy, we could have even added an else statement with a different message for when the class of a column isn't numeric, such as in this loop:
for(i in seq_along(iris)){
  if(class(iris[[i]]) == "numeric"){
    print(mean(iris[[i]]))
  }else{
    print(paste("Variable", i, "isn't numeric"))
  }
}
The output is as follows:
[Screenshot: console output showing the four means, followed by a message that variable 5 isn't numeric]
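The message in this loop identifies the column only by its position. A possible variation, sketched here with the hypothetical helper variables col_name, msg, and messages, looks up the column name with names(iris)[i] so that the output is easier to read:

```r
messages <- character(0)  # collected only so the results can be inspected later

for (i in seq_along(iris)) {
  col_name <- names(iris)[i]
  if (class(iris[[i]]) == "numeric") {
    # Round to two decimals to keep the message short
    msg <- paste("Mean of", col_name, "is", round(mean(iris[[i]]), 2))
  } else {
    msg <- paste(col_name, "isn't numeric")
  }
  print(msg)
  messages <- c(messages, msg)
}
```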
seq_along() returns a sequence of numbers, which keeps for loops straightforward. However, if you build the sequence from any other function, the syntax of the for statement changes slightly. The following code will print every row of the Sepal.Width column in iris:
for(i in 1:nrow(iris)){
  print(iris[i, "Sepal.Width"])
}
You have to explicitly use 1:nrow(iris) in the for statement, or this loop will not run as intended. nrow() simply returns a single number, the number of rows of iris, so for(i in nrow(iris)) would run only once, with i equal to 150. seq_along(), by contrast, returns the entire sequence of column positions, as shown below:
nrow(iris)
[1] 150
seq_along(iris)
[1] 1 2 3 4 5
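To see the difference inside a loop, the sketch below counts how many iterations each form produces. The helper count_iterations() and the variable names are invented for this example. seq_len(nrow(iris)) does not appear in the text above; it is included as a commonly recommended alternative to 1:nrow(iris), because it yields an empty sequence for a data frame with zero rows.

```r
# Hypothetical helper: run a for loop over index and count the passes
count_iterations <- function(index) {
  n <- 0
  for (i in index) n <- n + 1
  n
}

# nrow(iris) alone is a single value, so the loop body runs once
single <- count_iterations(nrow(iris))
# 1:nrow(iris) and seq_len(nrow(iris)) both cover every row
colon <- count_iterations(1:nrow(iris))
safe <- count_iterations(seq_len(nrow(iris)))
print(c(single = single, colon = colon, safe = safe))

# With zero rows, 1:0 counts down through 1 and 0, while seq_len(0) is empty
empty_df <- iris[0, ]
empty_colon <- count_iterations(1:nrow(empty_df))
empty_safe <- count_iterations(seq_len(nrow(empty_df)))
print(c(empty_colon = empty_colon, empty_safe = empty_safe))
```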