Iteration

Schwab

The for loop

In most computer programming languages you can write a for loop.

In R we try not to do this (although we can).

Computing multiple powers.

Write a for loop that computes e^1, e^2, …, e^100 and prints them.

Or better yet stores them as a vector (this is a little harder).

Solution

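e <- 2.71828  # rounded approximation of e, used only for printing
vector_o_numbers <- vector(mode = "numeric", length = 100)

for (exponent in 1:100) {
  print(e^exponent)  # approximate, because e is rounded above
  vector_o_numbers[exponent] <- exp(exponent)  # exp() uses the full-precision e
}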
[1] 2.71828
[1] 7.389046
[1] 20.0855
[1] 54.598
[1] 148.4127
[1] 403.4272
[1] 1096.628
[1] 2980.942
[1] 8103.035
[1] 22026.32
[1] 59873.7
[1] 162753.5
[1] 442409.5
[1] 1202593
[1] 3268984
[1] 8886015
[1] 24154677
[1] 65659174
[1] 178480020
[1] 485158668
[1] 1318797105
[1] 3584859796
[1] 9744652685
[1] 26488694502
[1] 72003688490
[1] 195726186349
[1] 532038577830
[1] 1.44623e+12
[1] 3.931258e+12
[1] 1.068626e+13
[1] 2.904824e+13
[1] 7.896126e+13
[1] 2.146388e+14
[1] 5.834484e+14
[1] 1.585976e+15
[1] 4.311127e+15
[1] 1.171885e+16
[1] 3.185512e+16
[1] 8.659113e+16
[1] 2.353789e+17
[1] 6.398258e+17
[1] 1.739226e+18
[1] 4.727703e+18
[1] 1.285122e+19
[1] 3.493321e+19
[1] 9.495826e+19
[1] 2.581231e+20
[1] 7.016509e+20
[1] 1.907284e+21
[1] 5.184531e+21
[1] 1.409301e+22
[1] 3.830874e+22
[1] 1.041339e+23
[1] 2.83065e+23
[1] 7.694501e+23
[1] 2.091581e+24
[1] 5.685502e+24
[1] 1.545479e+25
[1] 4.201044e+25
[1] 1.141961e+26
[1] 3.104171e+26
[1] 8.438005e+26
[1] 2.293686e+27
[1] 6.234881e+27
[1] 1.694815e+28
[1] 4.606982e+28
[1] 1.252307e+29
[1] 3.40412e+29
[1] 9.253352e+29
[1] 2.51532e+30
[1] 6.837345e+30
[1] 1.858582e+31
[1] 5.052146e+31
[1] 1.373315e+32
[1] 3.733054e+32
[1] 1.014749e+33
[1] 2.758371e+33
[1] 7.498024e+33
[1] 2.038173e+34
[1] 5.540324e+34
[1] 1.506015e+35
[1] 4.093771e+35
[1] 1.112802e+36
[1] 3.024906e+36
[1] 8.222543e+36
[1] 2.235117e+37
[1] 6.075675e+37
[1] 1.651538e+38
[1] 4.489344e+38
[1] 1.220329e+39
[1] 3.317197e+39
[1] 9.01707e+39
[1] 2.451092e+40
[1] 6.662755e+40
[1] 1.811123e+41
[1] 4.92314e+41
[1] 1.338247e+42
[1] 3.637731e+42
[1] 9.888372e+42
[1] 2.687936e+43

R is a vectorized language

This means most created objects are vectors.

In many languages the word “hi” would be a single character variable, or “string”.

In R it is a character vector of length one that contains that string.

word <- "hi"
is.vector(word)
[1] TRUE
length(word)
[1] 1
two_words <- c(word , word)
is.vector(two_words)
[1] TRUE
is.vector(two_words[1])
[1] TRUE

We can take advantage of this vectorized nature.

Many functions are vectorized

This means they take a vector as input, apply the operation to each element, and return a vector as output.

# Here I recompute the for loop with 100 powers of e in it. 

faster_vec_o_numbers <- exp(1:100)
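As a quick check, the loop and the vectorized call should give the same numbers. This is a minimal sketch, assuming the loop stores exp(exponent) as in the solution above; the names loop_result and vectorized_result are illustrative and not part of the original slides.

```r
# Loop version: preallocate, then fill one element at a time
loop_result <- vector(mode = "numeric", length = 100)
for (exponent in 1:100) {
  loop_result[exponent] <- exp(exponent)
}

# Vectorized version: one call handles all 100 exponents
vectorized_result <- exp(1:100)

# Compare the two results element by element
all.equal(loop_result, vectorized_result)
length(vectorized_result)
```

Both approaches produce a numeric vector of length 100; the vectorized call simply gets there in one line.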

Some vectorized operations and functions

  • Basic operations (+, -, *, /, ^, %%, %/%)

  • Base R functions: nchar(), is.na(), round()

  • Comparison operators >, <, ==
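A few of these can be tried directly on short vectors. The sketch below uses made-up values (lengths_in_cm and words are illustrative names) to show that each call returns one result per input element.

```r
lengths_in_cm <- c(2.5, NA, 10)
words <- c("hi", "hello", "R")

lengths_in_cm * 2      # arithmetic on every element: 5 NA 20
lengths_in_cm > 3      # comparison on every element: FALSE NA TRUE
is.na(lengths_in_cm)   # FALSE TRUE FALSE
nchar(words)           # 2 5 1
7 %/% c(1, 2, 3)       # integer division: 7 3 2
```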

Make a vectorized function.

The conversion between Celsius and Fahrenheit is given by this formula: \[ F = C \times \frac{9}{5}+32\]

  1. Make a function that does this conversion.

  2. Use the function to compute all temps for C between 0 and 100.

Solution

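C_to_F <- function(temperature_C) {
  temperature_F <- temperature_C * 9 / 5 + 32
  temperature_F  # last expression is returned; return(temperature_F) also works
}

# Fahrenheit equivalents of 0, 1, ..., 100 degrees Celsius
fahrenheit_temps <- C_to_F(0:100)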

Note: An R function returns the value of its last evaluated expression, so an explicit return() is optional.
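A short usage sketch helps confirm the conversion against known reference points. The function is redefined here so the block runs on its own; the test temperatures are chosen for illustration.

```r
# Same conversion as in the solution, with the formula as the last expression
C_to_F <- function(temperature_C) {
  temperature_C * 9 / 5 + 32
}

C_to_F(0)            # freezing point of water: 32
C_to_F(100)          # boiling point of water: 212
C_to_F(c(-40, 37))   # a vector in, a vector out: -40.0 98.6
```

Because the arithmetic operators are vectorized, the function is vectorized without any extra work.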

Aggregating functions

These are functions that take a vector and give one output.

Examples: mean(), sd(), median(), sum(), and dplyr’s n()

They are not vectorized functions.
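The difference shows up in the length of the result. A small sketch with made-up numbers (values is an illustrative name):

```r
values <- c(1, 4, 9, 16)

sqrt(values)           # vectorized: 1 2 3 4
length(sqrt(values))   # 4, one result per element

mean(values)           # aggregating: 7.5
sum(values)            # 30
length(mean(values))   # 1, one result for the whole vector
```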

Vectorize existing functions

We can vectorize existing functions with helper functions in R.

Two general types are across() and map()

across() and map()

library(tidyverse)

These functions allow us to perform the same operation on many columns or elements at once.

map() comes from the purrr package.

  • many different map()s
  • apply a function to every element

across() comes from the dplyr package.

  • this is a helper used with mutate and summarize
  • applies a function to every selected column.

Example: iris

head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

A for loop

There are four columns in iris and I’d like to calculate the mean of each of them.

This feels bad. I’m using vectors in a for loop.

for (column in 1:4){
  print(mean(iris[[column]], na.rm = TRUE))
}
[1] 5.843333
[1] 3.057333
[1] 3.758
[1] 1.199333

Average with across

Let’s mimic the for loop above.

iris %>%
  summarise(
    across(.cols = is.numeric, 
           #.cols = everything()
           .fns = mean))
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.843333    3.057333        3.758    1.199333

Again with group_by()

Let’s find the average of each column that ends with .Length by species.

iris %>%
  group_by(Species) %>%
  summarise(
    across(.cols = ends_with("Length"), 
           .fns =  mean))
# A tibble: 3 × 3
  Species    Sepal.Length Petal.Length
  <fct>             <dbl>        <dbl>
1 setosa             5.01         1.46
2 versicolor         5.94         4.26
3 virginica          6.59         5.55

Let’s consider rounding

We’ll look at 5 methods of rounding.

Rounding with across()

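Let’s say we want to round every value. Because round() returns one value per row rather than one per group, dplyr 1.1.0 and later warn about this use of summarise() and recommend reframe() or mutate() instead.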

iris %>%
  group_by(Species) %>%
  summarise(across(
    .cols = Sepal.Length:Petal.Width, 
    .fns = round))
# A tibble: 150 × 5
# Groups:   Species [3]
   Species Sepal.Length Sepal.Width Petal.Length Petal.Width
   <fct>          <dbl>       <dbl>        <dbl>       <dbl>
 1 setosa             5           4            1           0
 2 setosa             5           3            1           0
 3 setosa             5           3            1           0
 4 setosa             5           3            2           0
 5 setosa             5           4            1           0
 6 setosa             5           4            2           0
 7 setosa             5           3            1           0
 8 setosa             5           3            2           0
 9 setosa             4           3            1           0
10 setosa             5           3            2           0
# ℹ 140 more rows

Rounding without across()

but still using the tidyverse

iris %>%
  group_by(Species) %>%
  mutate(Sepal.Length = round(Sepal.Length),
         Sepal.Width = round(Sepal.Width),
         Petal.Length = round(Petal.Length),
         Petal.Width = round(Petal.Width))
# A tibble: 150 × 5
# Groups:   Species [3]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1            5           4            1           0 setosa 
 2            5           3            1           0 setosa 
 3            5           3            1           0 setosa 
 4            5           3            2           0 setosa 
 5            5           4            1           0 setosa 
 6            5           4            2           0 setosa 
 7            5           3            1           0 setosa 
 8            5           3            2           0 setosa 
 9            4           3            1           0 setosa 
10            5           3            2           0 setosa 
# ℹ 140 more rows

Rounding old school R

This rounds one column at a time, so it does not take full advantage of vectorization.

# one at a time
iris$Sepal.Length <- round(iris$Sepal.Length)
iris$Sepal.Width <- round(iris$Sepal.Width)
iris$Petal.Length <- round(iris$Petal.Length)
iris$Petal.Width <- round(iris$Petal.Width)

Rounding w/ vectors

Round all at the same time because round() is vectorized.

iris <- round(iris[1:4])
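Note that round(iris[1:4]) returns only the four numeric columns, so the assignment above drops Species. One way to round in place while keeping every column is to assign back into the same column subset. This is a sketch that works on a copy; iris_rounded is an illustrative name.

```r
# Work on a copy so the built-in iris data stays untouched
iris_rounded <- datasets::iris

# Replace just the four numeric columns; Species is left alone
iris_rounded[1:4] <- round(iris_rounded[1:4])

names(iris_rounded)
head(iris_rounded$Sepal.Length)
```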

Rounding example 5

# Remove our modified copy so that the built-in iris data set is visible again.
rm(iris)

iris %>%
  group_by(Species) %>%
  reframe(across(
    .cols = starts_with("Sepal"), 
    .fns = ~ round(. , digits = 2)))
# A tibble: 150 × 3
   Species Sepal.Length Sepal.Width
   <fct>          <dbl>       <dbl>
 1 setosa           5.1         3.5
 2 setosa           4.9         3  
 3 setosa           4.7         3.2
 4 setosa           4.6         3.1
 5 setosa           5           3.6
 6 setosa           5.4         3.9
 7 setosa           4.6         3.4
 8 setosa           5           3.4
 9 setosa           4.4         2.9
10 setosa           4.9         3.1
# ℹ 140 more rows

What .fns?

Any function that accepts the column’s data type can be used, whether it aggregates or is vectorized.

mean, median, sd, round

Here’s a nice list; just drop the parentheses.
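.fns can also take a named list when more than one summary is wanted. The sketch below assumes dplyr is installed and uses the built-in iris data; species_stats is an illustrative name. With a named list, across() names each output column by joining the column name and the function name with an underscore, for example Sepal.Length_mean.

```r
library(dplyr)

# Mean and standard deviation of every numeric column, by species
species_stats <- datasets::iris %>%
  group_by(Species) %>%
  summarise(
    across(.cols = where(is.numeric),
           .fns = list(mean = mean, sd = sd)))

names(species_stats)
```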

where()

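Passing a bare predicate such as is.numeric to .cols, or averaging the non-numeric Species column, produces a warning. We can avoid it by selecting only the numeric variables with where().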

iris |>
  summarise(
    across(
      where(is.numeric), 
      .fns = mean
    )
  )
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.843333    3.057333        3.758    1.199333

map() for iteration.

It applies a function to each element of a vector or list, or to each column of a data frame.

map() returns a list.

iris |>
  map(.f = mean)
$Sepal.Length
[1] 5.843333

$Sepal.Width
[1] 3.057333

$Petal.Length
[1] 3.758

$Petal.Width
[1] 1.199333

$Species
[1] NA

Different map()s

map() returns a list.

map_dfc() returns a data frame, combining the results as columns.

map_dfr() returns a data frame, combining the results as rows.

map_dfc()

iris |>
  map_dfc(.f = mean)
# A tibble: 1 × 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl>   <dbl>
1         5.84        3.06         3.76        1.20      NA
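The NA for Species appears because mean() is not defined for a factor. When every result is a single number, map_dbl() is another member of the map() family that returns a named numeric vector instead of a list. This is a sketch, assuming purrr is installed and R 4.1 or later for the native pipe; column_means is an illustrative name.

```r
library(purrr)

# Keep only the four numeric columns so mean() never sees the Species factor
column_means <- datasets::iris[1:4] |>
  map_dbl(.f = mean)

column_means
```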

Conclusion

Try not to use a for loop unless you really need to.