The for loop
In most computer programming languages you can write a for loop.
In R we try not to do this (although we can).
Computing multiple powers.
Write a for loop that computes e^1, e^2, …, e^100 and prints them.
Or, better yet, store them in a vector (this is a little harder).
Solution
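e <- 2.71828  # rounded approximation of Euler's number
vector_o_numbers <- vector(mode = "numeric", length = 100)
for (exponent in 1:100) {
  print(e^exponent)                            # printed values use the rounded e
  vector_o_numbers[exponent] <- exp(exponent)  # stored values use the exact exp()
}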
[1] 2.71828
[1] 7.389046
[1] 20.0855
[1] 54.598
[1] 148.4127
[1] 403.4272
[1] 1096.628
[1] 2980.942
[1] 8103.035
[1] 22026.32
[1] 59873.7
[1] 162753.5
[1] 442409.5
[1] 1202593
[1] 3268984
[1] 8886015
[1] 24154677
[1] 65659174
[1] 178480020
[1] 485158668
[1] 1318797105
[1] 3584859796
[1] 9744652685
[1] 26488694502
[1] 72003688490
[1] 195726186349
[1] 532038577830
[1] 1.44623e+12
[1] 3.931258e+12
[1] 1.068626e+13
[1] 2.904824e+13
[1] 7.896126e+13
[1] 2.146388e+14
[1] 5.834484e+14
[1] 1.585976e+15
[1] 4.311127e+15
[1] 1.171885e+16
[1] 3.185512e+16
[1] 8.659113e+16
[1] 2.353789e+17
[1] 6.398258e+17
[1] 1.739226e+18
[1] 4.727703e+18
[1] 1.285122e+19
[1] 3.493321e+19
[1] 9.495826e+19
[1] 2.581231e+20
[1] 7.016509e+20
[1] 1.907284e+21
[1] 5.184531e+21
[1] 1.409301e+22
[1] 3.830874e+22
[1] 1.041339e+23
[1] 2.83065e+23
[1] 7.694501e+23
[1] 2.091581e+24
[1] 5.685502e+24
[1] 1.545479e+25
[1] 4.201044e+25
[1] 1.141961e+26
[1] 3.104171e+26
[1] 8.438005e+26
[1] 2.293686e+27
[1] 6.234881e+27
[1] 1.694815e+28
[1] 4.606982e+28
[1] 1.252307e+29
[1] 3.40412e+29
[1] 9.253352e+29
[1] 2.51532e+30
[1] 6.837345e+30
[1] 1.858582e+31
[1] 5.052146e+31
[1] 1.373315e+32
[1] 3.733054e+32
[1] 1.014749e+33
[1] 2.758371e+33
[1] 7.498024e+33
[1] 2.038173e+34
[1] 5.540324e+34
[1] 1.506015e+35
[1] 4.093771e+35
[1] 1.112802e+36
[1] 3.024906e+36
[1] 8.222543e+36
[1] 2.235117e+37
[1] 6.075675e+37
[1] 1.651538e+38
[1] 4.489344e+38
[1] 1.220329e+39
[1] 3.317197e+39
[1] 9.01707e+39
[1] 2.451092e+40
[1] 6.662755e+40
[1] 1.811123e+41
[1] 4.92314e+41
[1] 1.338247e+42
[1] 3.637731e+42
[1] 9.888372e+42
[1] 2.687936e+43
R is a vectorized language
This means most created objects are vectors.
Traditionally the word “hi” is a character variable or “string”.
R makes it a vector with a string in it.
word <- "hi"
is.vector(word)
two_words <- c(word, word)
is.vector(two_words)
We can take advantage of this vectorized nature.
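As a small illustration of the point above, the following sketch checks the length and class of the same two objects. It uses only base R; the object names are the ones from the slide.

```r
word <- "hi"
two_words <- c(word, word)

length(word)              # 1: a single string is a character vector of length one
length(two_words)         # 2
class(word)               # "character"
identical(word, word[1])  # TRUE: the first element of the vector is the vector itself
```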
Many functions are vectorized
This means they take a vector as an input and give a vector as an output.
# Here I recompute the for loop with 100 powers of e in it.
faster_vec_o_numbers <- exp(1:100)
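To check that the vectorized call really does the same work as the loop, here is a minimal sketch that builds both results and compares them. The names loop_result and vectorized_result are made up for this example.

```r
# Loop version: fill a preallocated vector one element at a time.
loop_result <- numeric(100)
for (exponent in 1:100) {
  loop_result[exponent] <- exp(exponent)
}

# Vectorized version: one call on the whole vector.
vectorized_result <- exp(1:100)

all.equal(loop_result, vectorized_result)  # TRUE
```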
Some vectorized operations and functions
Basic operations (+, -, *, /, ^, %/%, %%)
Base R functions: nchar(), is.na(), round()
Comparison operators >, <, ==
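A few of these in action, as a quick sketch with made-up vectors x and y. Each call works element by element and returns a vector as long as its input.

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30)

x + y                       # 11 22 33
y %/% x                     # 10 10 10 (integer division)
x > 2                       # FALSE FALSE TRUE
nchar(c("a", "bb", "ccc"))  # 1 2 3
is.na(c(1, NA, 3))          # FALSE TRUE FALSE
```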
Make a vectorized function.
The conversion between Celsius and Fahrenheit is given by this formula: \[ F = C \times \frac{9}{5}+32\]
Make a function that does this conversion.
Use the function to compute all temps for C between 0 and 100.
Solution
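C_to_F <- function(temperature_C) {
  temperature_F <- temperature_C * 9 / 5 + 32
  # return(temperature_F)
}
fahrenheit_temps <- C_to_F(0:100)
Note: An R function returns the value of the last expression it evaluates, so we do not need to explicitly return anything. Because the last expression here is an assignment, the value is returned invisibly: it can be stored, but it is not printed when the function is called on its own.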
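A variant worth trying is sketched below. The name C_to_F_visible is hypothetical and not part of the original solution. It ends with the bare expression instead of an assignment, so the result also prints when the function is called directly at the console.

```r
# Hypothetical variant: the last expression is the computed value itself.
C_to_F_visible <- function(temperature_C) {
  temperature_C * 9 / 5 + 32
}

# The function is vectorized because *, / and + are vectorized.
C_to_F_visible(c(0, 37, 100))  # 32.0 98.6 212.0
```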
Aggregating functions
These are functions that take a vector and give one output.
Examples: mean(), sd(), median(), sum(), and dplyr's n(), which counts the rows in the current group.
They are not vectorized functions.
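The difference shows up in the length of the result. A minimal sketch with a made-up vector x:

```r
x <- c(2, 4, 4, 6)

mean(x)    # 4
median(x)  # 4
sum(x)     # 16

length(mean(x))  # 1: an aggregating function collapses the vector to one value
length(sqrt(x))  # 4: a vectorized function returns one value per element
```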
Vectorize existing functions
We can vectorize existing functions with helper functions in R.
Two general types are across() and map().
across() and map()
These functions allow us to perform the same operation on every column of a data frame or every element of a vector or list.
map() comes from the purrr package.
There are many different map()s.
They apply a function to every element.
across() comes from the dplyr package.
This is a helper used with mutate and summarize.
It can also apply a function to every selected column.
Example: iris
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
A for loop
There are four numeric columns in iris and I’d like to calculate the mean of each of them.
This feels bad. I’m using vectors in a for loop.
for (column in 1:4) {
  print(mean(iris[[column]], na.rm = TRUE))
}
[1] 5.843333
[1] 3.057333
[1] 3.758
[1] 1.199333
Average with across
Let’s mimic the for loop above.
iris %>%
  summarise(
    across(.cols = is.numeric,
           # .cols = everything()
           .fns = mean))
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.843333 3.057333 3.758 1.199333
Again with group_by()
Let’s find the average of each column that ends with .Length, by species.
iris %>%
  group_by(Species) %>%
  summarise(
    across(.cols = ends_with("Length"),
           .fns = mean))
# A tibble: 3 × 3
Species Sepal.Length Petal.Length
<fct> <dbl> <dbl>
1 setosa 5.01 1.46
2 versicolor 5.94 4.26
3 virginica 6.59 5.55
Let’s consider rounding
We’ll look at 5 methods of rounding.
Rounding with across()
Let’s say we want to round every value.
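# Note: since dplyr 1.1.0, summarise() warns when a group returns more than
# one row; reframe() is the replacement for this pattern.
iris %>%
  group_by(Species) %>%
  summarise(across(
    .cols = Sepal.Length:Petal.Width,
    .fns = round))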
# A tibble: 150 × 5
# Groups: Species [3]
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 5 4 1 0
2 setosa 5 3 1 0
3 setosa 5 3 1 0
4 setosa 5 3 2 0
5 setosa 5 4 1 0
6 setosa 5 4 2 0
7 setosa 5 3 1 0
8 setosa 5 3 2 0
9 setosa 4 3 1 0
10 setosa 5 3 2 0
# ℹ 140 more rows
Rounding without across() but with the tidyverse
iris %>%
  group_by(Species) %>%
  mutate(Sepal.Length = round(Sepal.Length),
         Sepal.Width = round(Sepal.Width),
         Petal.Length = round(Petal.Length),
         Petal.Width = round(Petal.Width))
# A tibble: 150 × 5
# Groups: Species [3]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5 4 1 0 setosa
2 5 3 1 0 setosa
3 5 3 1 0 setosa
4 5 3 2 0 setosa
5 5 4 1 0 setosa
6 5 4 2 0 setosa
7 5 3 1 0 setosa
8 5 3 2 0 setosa
9 4 3 1 0 setosa
10 5 3 2 0 setosa
# ℹ 140 more rows
Rounding old school R
This is not taking advantage of the vectors.
# one at a time
iris$Sepal.Length <- round(iris$Sepal.Length)
iris$Sepal.Width <- round(iris$Sepal.Width)
iris$Petal.Length <- round(iris$Petal.Length)
iris$Petal.Width <- round(iris$Petal.Width)
Rounding w/ vectors
Round all at the same time because round() is vectorized.
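The slide gives no code for this method, so here is one possible sketch in base R. It works on a copy named rounded_iris (a made-up name) so the built-in dataset is left alone, and it relies on round() accepting a data frame whose columns are all numeric.

```r
# Work on a copy so the built-in dataset is not modified.
rounded_iris <- datasets::iris

# Logical vector marking the numeric columns (everything except Species).
numeric_cols <- sapply(rounded_iris, is.numeric)

# One call rounds every numeric column at once.
rounded_iris[numeric_cols] <- round(rounded_iris[numeric_cols])

head(rounded_iris)
```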
Rounding example 5
# Remove the modified copy of iris made above so the built-in dataset is used again.
rm(iris)
iris %>%
  group_by(Species) %>%
  reframe(across(
    .cols = starts_with("Sepal"),
    .fns = ~ round(., digits = 2)))
# A tibble: 150 × 3
Species Sepal.Length Sepal.Width
<fct> <dbl> <dbl>
1 setosa 5.1 3.5
2 setosa 4.9 3
3 setosa 4.7 3.2
4 setosa 4.6 3.1
5 setosa 5 3.6
6 setosa 5.4 3.9
7 setosa 4.6 3.4
8 setosa 5 3.4
9 setosa 4.4 2.9
10 setosa 4.9 3.1
# ℹ 140 more rows
What .fns?
Any function that accepts the column's data type can be used, whether it aggregates or is vectorized.
mean, median, sd, round
Here’s a nice list; just drop the parentheses when passing a function to .fns.
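For instance, swapping in a different aggregating function only means changing the name given to .fns. The sketch below uses median purely as an illustration and assumes dplyr is installed.

```r
library(dplyr)

# One row per species, one median per numeric column.
species_medians <- datasets::iris |>
  group_by(Species) |>
  summarise(across(where(is.numeric), .fns = median))

species_medians
```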
where()
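Passing a bare predicate such as is.numeric to .cols produces a deprecation warning, and averaging a non-numeric column such as Species produces a warning and NA. We can drop the warning by wrapping the predicate in where(), which selects only the numeric variables.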
iris |>
summarise (
across (
where (is.numeric),
.fns = mean
)
)
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.843333 3.057333 3.758 1.199333
map()
map() is for iteration.
It performs some operation on a data frame, vector or list.
map() returns a list.
$Sepal.Length
[1] 5.843333
$Sepal.Width
[1] 3.057333
$Petal.Length
[1] 3.758
$Petal.Width
[1] 1.199333
$Species
[1] NA
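The call that produced this output is not shown on the slide. A call along the following lines gives a list of the same shape, assuming purrr is installed; column_means is a made-up name.

```r
library(purrr)

# A data frame is a list of columns, so map() applies mean() to each column.
# Species is a factor, so its mean is NA and R emits a warning.
column_means <- map(datasets::iris, mean)

column_means$Sepal.Length  # 5.843333
column_means$Species       # NA
```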
Different map()s
map() returns a list.
map_dfc() returns a data frame, binding the results as columns.
map_dfr() returns a data frame, binding the results as rows.
map_dfc()
iris |>
map_dfc (.f = mean)
# A tibble: 1 × 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <dbl>
1 5.84 3.06 3.76 1.20 NA
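To avoid the NA for Species, one option is to keep only the numeric columns before mapping. The sketch below does that with base R subsetting and uses map_dbl(), another member of the map() family, which returns a named numeric vector instead of a list. The name numeric_iris is made up for this example.

```r
library(purrr)

# Keep only the numeric columns of the built-in dataset.
numeric_iris <- datasets::iris[sapply(datasets::iris, is.numeric)]

# map_dbl() returns one double per column, named after the column.
numeric_means <- map_dbl(numeric_iris, mean)
numeric_means
```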
Conclusion
Try not to use a for loop unless you really need to.