iteration

Schwab

The for loop

In most computer programming languages you can write a for loop.

In R we try to avoid for loops when a vectorized alternative exists (although for loops work).

Ways to make a vector in R

  • 1:100 makes a vector of the numbers 1 through 100.

  • c("a","b") makes a vector with “a” and “b” in it.

  • vector() also makes a vector; vector(mode = "numeric", length = 100) makes 100 zeros to fill in later.

  • df$column extracts a data frame column as a vector.

Remember, a vector holds values that are all of the same type.
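A minimal sketch of the constructors listed above (the object names are illustrative only). It also shows what "same type" means in practice: c() silently converts mixed inputs to one common type.

```r
# Four ways to get a vector
one_to_hundred <- 1:100                               # integer sequence
two_letters <- c("a", "b")                            # character vector
empty_numbers <- vector(mode = "numeric", length = 3) # preallocated: 0 0 0
sepal_lengths <- iris$Sepal.Length                    # a data frame column

# All elements share one type, so mixing types forces coercion
mixed <- c(1, "a", TRUE)
class(mixed)  # "character": the number and the logical became strings
```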

Example for() loop

# letters: R has the lowercase alphabet built in as a vector of 26 strings.

for (letter in letters){
  print(letter)
}
[1] "a"
[1] "b"
[1] "c"
[1] "d"
[1] "e"
[1] "f"
[1] "g"
[1] "h"
[1] "i"
[1] "j"
[1] "k"
[1] "l"
[1] "m"
[1] "n"
[1] "o"
[1] "p"
[1] "q"
[1] "r"
[1] "s"
[1] "t"
[1] "u"
[1] "v"
[1] "w"
[1] "x"
[1] "y"
[1] "z"
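When the loop body also needs the position of each element, a common pattern (an addition here, not part of the original exercise) is to loop over the indices with seq_along() and look each value up by index. The name labels is hypothetical.

```r
labels <- character(length(letters))  # preallocate one slot per letter

for (i in seq_along(letters)) {       # i runs 1, 2, ..., 26
  labels[i] <- paste0(i, ": ", letters[i])
}

head(labels, 3)  # "1: a" "2: b" "3: c"
```

seq_along(x) is safer than 1:length(x): for an empty vector, 1:length(x) gives c(1, 0), while seq_along(x) gives an empty sequence, so the loop body never runs.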

Computing multiple powers.

Write a for loop that computes e^1, e^2,… e^100 and prints them.

Better yet, store them in a vector (this is a little harder).

Here’s a start:

e <- 2.71828 # an approximation of e; exp(1) gives the full-precision value
vector_o_numbers <- vector(mode = "numeric", length = 100) # preallocate an empty vector so the loop runs faster

# write for loop below. 

Solution

e <- 2.71828
vector_o_numbers <- vector(mode = "numeric", length = 100) # preallocate an empty vector so the loop runs faster

for(exponent in 1:100){
  print(e^exponent) # print the number
  vector_o_numbers[exponent] <- e^exponent # store the same number we printed
  # no return() is needed: the loop fills vector_o_numbers in place
}
[1] 2.71828
[1] 7.389046
[1] 20.0855
[1] 54.598
[1] 148.4127
[1] 403.4272
[1] 1096.628
[1] 2980.942
[1] 8103.035
[1] 22026.32
[1] 59873.7
[1] 162753.5
[1] 442409.5
[1] 1202593
[1] 3268984
[1] 8886015
[1] 24154677
[1] 65659174
[1] 178480020
[1] 485158668
[1] 1318797105
[1] 3584859796
[1] 9744652685
[1] 26488694502
[1] 72003688490
[1] 195726186349
[1] 532038577830
[1] 1.44623e+12
[1] 3.931258e+12
[1] 1.068626e+13
[1] 2.904824e+13
[1] 7.896126e+13
[1] 2.146388e+14
[1] 5.834484e+14
[1] 1.585976e+15
[1] 4.311127e+15
[1] 1.171885e+16
[1] 3.185512e+16
[1] 8.659113e+16
[1] 2.353789e+17
[1] 6.398258e+17
[1] 1.739226e+18
[1] 4.727703e+18
[1] 1.285122e+19
[1] 3.493321e+19
[1] 9.495826e+19
[1] 2.581231e+20
[1] 7.016509e+20
[1] 1.907284e+21
[1] 5.184531e+21
[1] 1.409301e+22
[1] 3.830874e+22
[1] 1.041339e+23
[1] 2.83065e+23
[1] 7.694501e+23
[1] 2.091581e+24
[1] 5.685502e+24
[1] 1.545479e+25
[1] 4.201044e+25
[1] 1.141961e+26
[1] 3.104171e+26
[1] 8.438005e+26
[1] 2.293686e+27
[1] 6.234881e+27
[1] 1.694815e+28
[1] 4.606982e+28
[1] 1.252307e+29
[1] 3.40412e+29
[1] 9.253352e+29
[1] 2.51532e+30
[1] 6.837345e+30
[1] 1.858582e+31
[1] 5.052146e+31
[1] 1.373315e+32
[1] 3.733054e+32
[1] 1.014749e+33
[1] 2.758371e+33
[1] 7.498024e+33
[1] 2.038173e+34
[1] 5.540324e+34
[1] 1.506015e+35
[1] 4.093771e+35
[1] 1.112802e+36
[1] 3.024906e+36
[1] 8.222543e+36
[1] 2.235117e+37
[1] 6.075675e+37
[1] 1.651538e+38
[1] 4.489344e+38
[1] 1.220329e+39
[1] 3.317197e+39
[1] 9.01707e+39
[1] 2.451092e+40
[1] 6.662755e+40
[1] 1.811123e+41
[1] 4.92314e+41
[1] 1.338247e+42
[1] 3.637731e+42
[1] 9.888372e+42
[1] 2.687936e+43
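The printed values use e = 2.71828, a rounded constant, while R's exp() uses full precision, so e^exponent and exp(exponent) drift apart as the exponent grows. A quick check (written vectorized for brevity; the variable names are illustrative):

```r
e_approx <- 2.71828
powers_approx <- e_approx^(1:100)  # what the loop printed
powers_exact  <- exp(1:100)        # full-precision e^1 ... e^100

# Relative error grows with the exponent (roughly 7e-5 at e^100)
relative_error <- (powers_exact - powers_approx) / powers_exact
signif(relative_error[c(1, 100)], 2)
```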

R is a vectorized language

This means most objects you create are vectors; even a single value is a vector of length 1.

In many languages, the word “hi” would be a single character variable, or “string”.

R stores it as a character vector containing one string.

word <- "hi"
is.vector(word)
[1] TRUE
length(word)
[1] 1
two_words <- c(word , word)
is.vector(two_words)
[1] TRUE
is.vector(two_words[1])
[1] TRUE

We can take advantage of this vectorized nature.

Many functions are vectorized

This means they take a vector as input and return a vector as output, applying the operation to each element.

# Recompute the 100 powers of e, this time without a for loop.
faster_vec_o_numbers <- exp(1:100)
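To confirm that the single vectorized call computes the same numbers as a loop, here is a small self-contained check that uses exp() in both versions so the results are comparable:

```r
loop_result <- numeric(100)  # preallocated, as in the loop version
for (exponent in 1:100) {
  loop_result[exponent] <- exp(exponent)
}

vectorized_result <- exp(1:100)  # one call, no loop

all.equal(loop_result, vectorized_result)  # TRUE
```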

Checking the time

If you are interested, you can check which is faster with system.time().

print("With a for loop the time is:")
[1] "With a for loop the time is:"
system.time(for(exponent in 1:100){vector_o_numbers[exponent] <- exp(exponent)})
   user  system elapsed 
  0.001   0.000   0.002 
print("Using vectorized operations the time is: ")
[1] "Using vectorized operations the time is: "
system.time( faster_vec_o_numbers <- exp(1:100))
   user  system elapsed 
      0       0       0 
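With only 100 elements both timings are close to zero, so the comparison says little. A sketch with a larger, arbitrary size (one million elements, using sqrt() so the values do not overflow to Inf the way exp() would) usually shows a clearer gap; exact timings depend on your machine.

```r
n <- 1e6
loop_result <- numeric(n)  # preallocate

loop_time <- system.time(
  for (i in seq_len(n)) loop_result[i] <- sqrt(i)
)
vectorized_time <- system.time(
  vectorized_result <- sqrt(seq_len(n))
)

loop_time["elapsed"]
vectorized_time["elapsed"]
all.equal(loop_result, vectorized_result)  # same values either way
```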

Some vectorized operations and functions

  • Basic operations (+, -, *, /, ^, %/%, %%)

  • Base R functions: nchar(), is.na(), round() (but not is.numeric(), which returns a single TRUE or FALSE for the whole vector)

  • Comparison operators >, <, ==
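A short sketch of the operations listed above, each applied element by element to a small example vector x (an illustrative choice), with is.numeric() shown for contrast:

```r
x <- c(1, 5, 10)

x * 2                    # 2 10 20  (arithmetic)
x %/% 3                  # 0 1 3    (integer division)
x %% 3                   # 1 2 1    (remainder)
x > 4                    # FALSE TRUE TRUE
nchar(c("hi", "hello"))  # 2 5
is.na(c(1, NA, 3))       # FALSE TRUE FALSE

# Not vectorized: one answer for the whole vector
is.numeric(x)            # TRUE
```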

Make a vectorized function.

The conversion between Celsius and Fahrenheit is given by this formula: \[ F = C \times \frac{9}{5}+32\]

  1. Make a function that does this conversion.

  2. Use the function to compute the Fahrenheit temperature for every whole-number Celsius value from 0 to 100.

Solution

C_to_F <- function(temperature_C){
  temperature_F <- temperature_C * 9/5 + 32
  temperature_F # the last expression evaluated is returned, so return() is optional
}

fahrenheit_temps <- C_to_F(0:100)
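A few spot checks make it easy to trust a conversion function like this. The sketch below restates the same formula so it runs on its own; the test temperatures are arbitrary but have well-known answers.

```r
C_to_F <- function(temperature_C) {
  temperature_C * 9 / 5 + 32  # last expression is the return value
}

C_to_F(c(0, 37, 100))    # 32.0  98.6 212.0
C_to_F(-40)              # -40: the two scales meet here
length(C_to_F(0:100))    # 101, one output per input
```

Because * and + are vectorized, C_to_F() is vectorized too, with no loop inside it.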

Aggregating functions

These are functions that take a vector and return a single value.

Examples: mean(), sd(), median(), sum(), and dplyr’s n() (which counts rows inside summarise()).

They are not vectorized functions.
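The difference shows up in the length of the result. A minimal comparison on an illustrative vector:

```r
x <- 1:10

sqrt(x)          # vectorized: 10 values out
length(sqrt(x))  # 10

mean(x)          # aggregating: 5.5
sum(x)           # 55
median(x)        # 5.5
```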

Vectorize existing functions

We can vectorize existing functions with helper functions in R.

Two general ways of doing this are across() and map().

across() and map()

library(tidyverse)

These functions allow us to perform the same operation across multiple columns (or, with map(), across every element of a vector or list).

map() comes from the purrr package.

  • many different map()s
    • map_df() will be useful for us.
  • apply a function to every element

across() comes from the dplyr package.

  • this is a helper used with mutate() and summarise()
  • can also apply a function to every selected column.

Example: iris

head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Average with across

There are four numeric columns in iris, and I’d like to calculate the mean of each of them.

iris %>%
  summarise(
    across(.cols = is.numeric, # bare predicate: works, but tidyselect warns; wrap it in where()
           #.cols = everything()
           .fns = mean))
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.843333    3.057333        3.758    1.199333

Again with group_by()

Let’s find the average of each column that ends with .Length by species.

iris %>%
  group_by(Species) %>%
  summarise(
    across(.cols = ends_with("Length"), 
           .fns =  mean))
# A tibble: 3 × 3
  Species    Sepal.Length Petal.Length
  <fct>             <dbl>        <dbl>
1 setosa             5.01         1.46
2 versicolor         5.94         4.26
3 virginica          6.59         5.55

Let’s consider rounding

We’ll look at 5 methods of rounding.

I’ll also show you a couple of ways to select columns.

Rounding with across()

Let’s say we want to round every value. round() returns one value per row rather than one per group, so it belongs in mutate() rather than summarise().

iris %>%
  group_by(Species) %>%
  mutate(
    across(
      .cols = Sepal.Length:Petal.Width, # R makes a vector of column names
      .fns = round)
    )
# A tibble: 150 × 5
# Groups:   Species [3]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1            5           4            1           0 setosa 
 2            5           3            1           0 setosa 
 3            5           3            1           0 setosa 
 4            5           3            2           0 setosa 
 5            5           4            1           0 setosa 
 6            5           4            2           0 setosa 
 7            5           3            1           0 setosa 
 8            5           3            2           0 setosa 
 9            4           3            1           0 setosa 
10            5           3            2           0 setosa 
# ℹ 140 more rows

Rounding without across()

and the tidyverse

iris %>%
  group_by(Species) %>%
  mutate(
    Sepal.Length = round(Sepal.Length),
    Sepal.Width = round(Sepal.Width),
    Petal.Length = round(Petal.Length),
    Petal.Width = round(Petal.Width)
    )
# A tibble: 150 × 5
# Groups:   Species [3]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1            5           4            1           0 setosa 
 2            5           3            1           0 setosa 
 3            5           3            1           0 setosa 
 4            5           3            2           0 setosa 
 5            5           4            1           0 setosa 
 6            5           4            2           0 setosa 
 7            5           3            1           0 setosa 
 8            5           3            2           0 setosa 
 9            4           3            1           0 setosa 
10            5           3            2           0 setosa 
# ℹ 140 more rows

Rounding old school R

# one at a time
iris$Sepal.Length <- round(iris$Sepal.Length)
iris$Sepal.Width <- round(iris$Sepal.Width)
iris$Petal.Length <- round(iris$Petal.Length)
iris$Petal.Width <- round(iris$Petal.Width)

Rounding old school w/ vectors

Round all four numeric columns at the same time because round() is vectorized.

iris[1:4] <- round(iris[1:4]) # assign back into the same columns so Species is kept

Rounding example 5

rm(iris) # This removes the modified copy of iris we made, so the built-in data set is used again.

iris %>%
  group_by(Species) %>%
  reframe( # reframe() (dplyr 1.1.0+) allows several rows per group; summarise() warns here
    across(
      .cols = starts_with("Sepal"),
      .fns = ~ round(. , digits = 2))
    ) # The ~ creates an anonymous function; the . stands for the current column.
# A tibble: 150 × 3
   Species Sepal.Length Sepal.Width
   <fct>          <dbl>       <dbl>
 1 setosa           5.1         3.5
 2 setosa           4.9         3  
 3 setosa           4.7         3.2
 4 setosa           4.6         3.1
 5 setosa           5           3.6
 6 setosa           5.4         3.9
 7 setosa           4.6         3.4
 8 setosa           5           3.4
 9 setosa           4.4         2.9
10 setosa           4.9         3.1
# ℹ 140 more rows
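The ~ formula is purrr-style shorthand. R 4.1 and later also has a compact anonymous-function syntax, \(x), which across() accepts as well. A sketch using mutate() instead, which keeps every row and column and rounds only the Sepal columns (the object name rounded is illustrative):

```r
library(dplyr)

rounded <- iris |>
  group_by(Species) |>
  mutate(
    across(
      .cols = starts_with("Sepal"),
      .fns = \(x) round(x, digits = 2)  # same as ~ round(., digits = 2)
    )
  )

dim(rounded)  # 150 5: mutate() keeps all rows and columns
```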

Which .fns?

Any function that works on the column’s data type can be used: aggregating functions in summarise(), and vectorized functions such as round() in mutate().

mean, median, sd, round

Here’s a nice list; just drop the parentheses.
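.fns also accepts a named list of functions. across() then creates one column per column-function pair, named {.col}_{.fn} by default. A sketch computing the mean and standard deviation of each numeric column by species (iris_summary is an illustrative name):

```r
library(dplyr)

iris_summary <- iris |>
  group_by(Species) |>
  summarise(
    across(
      .cols = where(is.numeric),
      .fns = list(mean = mean, sd = sd)  # list names become column suffixes
    )
  )

names(iris_summary)
# "Species" "Sepal.Length_mean" "Sepal.Length_sd" "Sepal.Width_mean" ...
```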

where()

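Passing a bare predicate (.cols = is.numeric) triggers a tidyselect deprecation warning. Wrapping it in where() selects only the numeric columns without the warning. (|> is R’s native pipe, available since R 4.1; here it works like %>%.)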

iris |>
  summarise(
    across(
      where(is.numeric), 
      .fns = mean
    )
  )
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.843333    3.057333        3.758    1.199333

map() for iteration.

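It applies a function to each element of a vector or list. A data frame is a list of columns, so map() on a data frame works column by column.

map() returns a list.

map_df() returns a data frame. Species is a factor, so mean() returns NA for it (with a warning).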

iris |>
  map_df(.f = mean)
# A tibble: 1 × 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl>   <dbl>
1         5.84        3.06         3.76        1.20      NA

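Different map()s

map() returns a list.

map_dfc() combines the results as columns of a data frame.

map_dfr() combines the results as rows of a data frame (map_df() is another name for it). In purrr 1.0.0 and later these are superseded by map() followed by list_cbind() or list_rbind(), but they still work.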

map_dfc()

iris |>
  map_dfc(.f = mean)
# A tibble: 1 × 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl>   <dbl>
1         5.84        3.06         3.76        1.20      NA
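The choice of map() variant decides the shape of the result. A sketch comparing map() with map_dbl() (a purrr variant that returns a numeric vector), after dropping Species so mean() sees only numbers; the object names are illustrative:

```r
library(purrr)

numeric_iris <- iris[1:4]  # drop Species, so no NA or warning

means_list <- map(numeric_iris, mean)      # a list, one element per column
means_vec  <- map_dbl(numeric_iris, mean)  # a named numeric vector

class(means_list)   # "list"
round(means_vec, 2) # 5.84 3.06 3.76 1.20
```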

Conclusion

Prefer vectorized functions, across(), or map() over a for loop unless you really need one.