iteration

Schwab

The for loop

In most computer programming languages you can write a for loop.

In R we try not to do this (although we can).

Ways to make a vector in R

1:100 makes a vector of the numbers 1 through 100 in R.
c("a","b") makes a vector with “a” and “b” in it.
vector() also makes a vector.
df$column makes a vector of a data frame column.

Remember that a vector holds values that all share the same type.
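As a small sketch of the "same type" rule: when values of different types are combined with c(), R coerces them all to one common type. The variable names below are only for illustration.

```r
# Each of these creates a vector.
numbers <- 1:5                                    # integer vector
words   <- c("a", "b")                            # character vector
empty   <- vector(mode = "numeric", length = 3)   # three zeros

# Mixing types forces every value to one common type (here, character).
mixed <- c(1, "a", TRUE)

class(numbers)  # "integer"
class(words)    # "character"
class(mixed)    # "character"
```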

Example for() loop

#letters:  R has the alphabet built in. 

for (letter in letters){
  print(letter)
}

[1] "a"
[1] "b"
[1] "c"
[1] "d"
[1] "e"
[1] "f"
[1] "g"
[1] "h"
[1] "i"
[1] "j"
[1] "k"
[1] "l"
[1] "m"
[1] "n"
[1] "o"
[1] "p"
[1] "q"
[1] "r"
[1] "s"
[1] "t"
[1] "u"
[1] "v"
[1] "w"
[1] "x"
[1] "y"
[1] "z"

Computing multiple powers.

Write a for loop that computes e^1, e^2,… e^100 and prints them.

Or better yet stores them as a vector (this is a little harder).

Here’s a start:

e = 2.71828 # define e
vector_o_numbers <- vector(mode = "numeric", length = 100) #Making the loop fast by making an empty vector first.

# write for loop below.

Solution

e = 2.71828
vector_o_numbers <- vector(mode = "numeric", length = 100) #Making the loop fast by making an empty vector first.

for(exponent in 1:100){
  print(e^exponent) # print the number, using our approximate e
  vector_o_numbers[exponent] <- exp(exponent) # store the number; exp() uses R's full-precision e
  # a for loop does not return a value, so we keep the results by assigning into the vector.
}

[1] 2.71828
[1] 7.389046
[1] 20.0855
[1] 54.598
[1] 148.4127
[1] 403.4272
[1] 1096.628
[1] 2980.942
[1] 8103.035
[1] 22026.32
[1] 59873.7
[1] 162753.5
[1] 442409.5
[1] 1202593
[1] 3268984
[1] 8886015
[1] 24154677
[1] 65659174
[1] 178480020
[1] 485158668
[1] 1318797105
[1] 3584859796
[1] 9744652685
[1] 26488694502
[1] 72003688490
[1] 195726186349
[1] 532038577830
[1] 1.44623e+12
[1] 3.931258e+12
[1] 1.068626e+13
[1] 2.904824e+13
[1] 7.896126e+13
[1] 2.146388e+14
[1] 5.834484e+14
[1] 1.585976e+15
[1] 4.311127e+15
[1] 1.171885e+16
[1] 3.185512e+16
[1] 8.659113e+16
[1] 2.353789e+17
[1] 6.398258e+17
[1] 1.739226e+18
[1] 4.727703e+18
[1] 1.285122e+19
[1] 3.493321e+19
[1] 9.495826e+19
[1] 2.581231e+20
[1] 7.016509e+20
[1] 1.907284e+21
[1] 5.184531e+21
[1] 1.409301e+22
[1] 3.830874e+22
[1] 1.041339e+23
[1] 2.83065e+23
[1] 7.694501e+23
[1] 2.091581e+24
[1] 5.685502e+24
[1] 1.545479e+25
[1] 4.201044e+25
[1] 1.141961e+26
[1] 3.104171e+26
[1] 8.438005e+26
[1] 2.293686e+27
[1] 6.234881e+27
[1] 1.694815e+28
[1] 4.606982e+28
[1] 1.252307e+29
[1] 3.40412e+29
[1] 9.253352e+29
[1] 2.51532e+30
[1] 6.837345e+30
[1] 1.858582e+31
[1] 5.052146e+31
[1] 1.373315e+32
[1] 3.733054e+32
[1] 1.014749e+33
[1] 2.758371e+33
[1] 7.498024e+33
[1] 2.038173e+34
[1] 5.540324e+34
[1] 1.506015e+35
[1] 4.093771e+35
[1] 1.112802e+36
[1] 3.024906e+36
[1] 8.222543e+36
[1] 2.235117e+37
[1] 6.075675e+37
[1] 1.651538e+38
[1] 4.489344e+38
[1] 1.220329e+39
[1] 3.317197e+39
[1] 9.01707e+39
[1] 2.451092e+40
[1] 6.662755e+40
[1] 1.811123e+41
[1] 4.92314e+41
[1] 1.338247e+42
[1] 3.637731e+42
[1] 9.888372e+42
[1] 2.687936e+43

R is a vectorized language

This means most objects you create are vectors.

In many languages the word “hi” would be a character variable or “string”.

R makes it a vector of length one with a string in it.

word <- "hi"
is.vector(word)

[1] TRUE

length(word)

[1] 1

two_words <- c(word , word)
is.vector(two_words)

[1] TRUE

is.vector(two_words[1])

[1] TRUE

We can take advantage of this vectorized nature.

Many functions are vectorized

This means they take a vector as input and give a vector of the same length as output, applying the operation to every element.

# Here I recompute the for loop with 100 powers of e in it. 
faster_vec_o_numbers <- exp(1:100)
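To confirm that the vectorized call really does the same work as the loop, here is a minimal self-contained check. The names loop_result and vectorized_result are made up for this sketch.

```r
# Loop version: fill a preallocated vector one element at a time.
loop_result <- vector(mode = "numeric", length = 100)
for (exponent in 1:100) {
  loop_result[exponent] <- exp(exponent)
}

# Vectorized version: one call handles all 100 inputs.
vectorized_result <- exp(1:100)

# Both approaches give the same 100 numbers.
all.equal(loop_result, vectorized_result)  # TRUE
length(vectorized_result)                  # 100
```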

Checking the time

If you are interested, you can check which is faster with system.time().

print("With a for loop the time is:")

[1] "With a for loop the time is:"

system.time(for(exponent in 1:100){vector_o_numbers[exponent] <- exp(exponent)})

   user  system elapsed 
  0.001   0.000   0.002

print("Using vectorized operations the time is: ")

[1] "Using vectorized operations the time is: "

system.time( faster_vec_o_numbers <- exp(1:100))

   user  system elapsed 
      0       0       0
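With only 100 elements both versions finish almost instantly, so the difference is hard to see. The sketch below assumes a larger input (one million values) and uses sqrt() instead of exp() so the results stay finite. Actual timings depend on the machine, so none are shown here.

```r
n <- 1e6

# Loop version with a preallocated result vector.
loop_result <- vector(mode = "numeric", length = n)
loop_time <- system.time(
  for (i in seq_len(n)) {
    loop_result[i] <- sqrt(i)
  }
)

# Vectorized version: a single call over the whole input.
vectorized_time <- system.time(
  vectorized_result <- sqrt(seq_len(n))
)

# Compare the elapsed seconds for each approach.
loop_time["elapsed"]
vectorized_time["elapsed"]
```

On most machines the vectorized call should be noticeably faster, because the looping happens inside compiled code rather than in R.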

Some vectorized operations and functions

Basic operations: +, -, *, /, ^, %%, %/%
Base R functions: nchar(), is.na(), sqrt()
Comparison operators: >, <, ==
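A few of these in action, as a quick illustration. Each call below receives a whole vector and returns one result per element. The example vectors are invented for this sketch.

```r
values <- c(10, 3, 8)

values * 2      # 20  6 16
values %/% 3    # 3 1 2   (integer division)
values %% 3     # 1 0 2   (remainder)
values > 5      # TRUE FALSE  TRUE

nchar(c("apple", "fig"))   # 5 3
is.na(c("apple", NA))      # FALSE  TRUE
```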

Make a vectorized function.

The conversion between Celsius and Fahrenheit is given by this formula: \[ F = C \times \frac{9}{5}+32\]

Make a function that does this conversion.
Use the function to compute all temps for C between 0 and 100.

Solution

C_to_F <- function(temperature_C){
  temperature_F <- temperature_C * 9/5 + 32
  return(temperature_F) # return explicitly so the result also prints when called at the console
}

fahrenheit_temps <- C_to_F(0:100)
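One way to sanity-check a conversion function like this is to try a few temperatures whose Fahrenheit values are well known. The sketch below redefines the function so it runs on its own.

```r
# Same formula as above: F = C * 9/5 + 32.
C_to_F <- function(temperature_C) {
  temperature_C * 9 / 5 + 32
}

# Freezing point, body temperature, boiling point.
C_to_F(c(0, 37, 100))   # 32.0  98.6 212.0

# Because the arithmetic is vectorized, 101 inputs give 101 outputs.
length(C_to_F(0:100))   # 101
```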

Aggregating functions

These are functions that take a vector and give one output.

Examples: mean(), sd(), median(), sum(), n()

They are not vectorized functions.
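The difference shows up in the length of the result. In this small illustration, a vectorized function returns one value per input, while an aggregating function collapses the input to a single value.

```r
x <- c(4, 9, 16)

# Vectorized: three inputs, three outputs.
sqrt(x)           # 2 3 4
length(sqrt(x))   # 3

# Aggregating: three inputs, one output.
sum(x)            # 29
length(sum(x))    # 1
```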

Vectorize existing functions

We can vectorize existing functions with helper functions in R

Two general ways of doing this are across() and map().

`across()` and `map()`

library(tidyverse)

These functions allow us to perform the same operation across multiple columns or elements.

map() comes from the purrr package.

- There are many different map()s.
- map_df() will be useful for us.
- Each one applies a function to every element.

across() comes from the dplyr package.

- It is a helper used with mutate() and summarise().
- It can also apply a function to every column it selects.
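Before turning to a real dataset, here is a minimal sketch of both helpers on toy inputs. It assumes dplyr and purrr are installed. The data frame small_df and its values are invented for the illustration.

```r
library(dplyr)
library(purrr)

# map() applies a function to each element and returns a list.
squares_list <- map(1:3, function(x) x^2)

# map_dbl() does the same but returns a numeric vector.
squares_vec <- map_dbl(1:3, function(x) x^2)   # 1 4 9

# across() applies a function to each selected column inside mutate().
small_df <- tibble(a = c(1.26, 2.51), b = c(3.14, 4.49))
rounded_df <- small_df |>
  mutate(across(.cols = everything(), .fns = round))

rounded_df$a   # 1 3
rounded_df$b   # 3 4
```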

Example: `iris`

head(iris)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Average with across

There are four numeric columns in iris and I’d like to calculate the mean of each of them.

iris %>%
  summarise(
    across(.cols = is.numeric, 
           #.cols = everything()
           .fns = mean))

  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.843333    3.057333        3.758    1.199333

Again with group_by()

Let’s find the average of each column that ends with .Length by species.

iris %>%
  group_by(Species) %>%
  summarise(
    across(.cols = ends_with("Length"), 
           .fns =  mean))

# A tibble: 3 × 3
  Species    Sepal.Length Petal.Length
  <fct>             <dbl>        <dbl>
1 setosa             5.01         1.46
2 versicolor         5.94         4.26
3 virginica          6.59         5.55

Let’s consider rounding

We’ll look at 5 methods of rounding.

I’ll also show you a couple of ways to select columns.

Rounding with `across()`

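Let’s say we want to round every value. Note that round() returns one value per row, and summarise() returning more than one row per group is deprecated as of dplyr 1.1.0; mutate() or reframe() is the current choice for results like these.

iris %>%
  group_by(Species) %>%
  summarise(
    across(
      .cols = Sepal.Length:Petal.Width, # selects every column from Sepal.Length through Petal.Width
      .fns = round)
    )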

# A tibble: 150 × 5
# Groups:   Species [3]
   Species Sepal.Length Sepal.Width Petal.Length Petal.Width
   <fct>          <dbl>       <dbl>        <dbl>       <dbl>
 1 setosa             5           4            1           0
 2 setosa             5           3            1           0
 3 setosa             5           3            1           0
 4 setosa             5           3            2           0
 5 setosa             5           4            1           0
 6 setosa             5           4            2           0
 7 setosa             5           3            1           0
 8 setosa             5           3            2           0
 9 setosa             4           3            1           0
10 setosa             5           3            2           0
# ℹ 140 more rows

Rounding without `across()`

This version still uses the tidyverse, with one mutate() line per column.

iris %>%
  group_by(Species) %>%
  mutate(
    Sepal.Length = round(Sepal.Length),
    Sepal.Width = round(Sepal.Width),
    Petal.Length = round(Petal.Length),
    Petal.Width = round(Petal.Width)
    )

# A tibble: 150 × 5
# Groups:   Species [3]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1            5           4            1           0 setosa 
 2            5           3            1           0 setosa 
 3            5           3            1           0 setosa 
 4            5           3            2           0 setosa 
 5            5           4            1           0 setosa 
 6            5           4            2           0 setosa 
 7            5           3            1           0 setosa 
 8            5           3            2           0 setosa 
 9            4           3            1           0 setosa 
10            5           3            2           0 setosa 
# ℹ 140 more rows

Rounding old school R

# one at a time
iris$Sepal.Length <- round(iris$Sepal.Length)
iris$Sepal.Width <- round(iris$Sepal.Width)
iris$Petal.Length <- round(iris$Petal.Length)
iris$Petal.Width <- round(iris$Petal.Width)

Rounding old school w/ vectors

Round all at the same time because round() is vectorized.

iris <- round(iris[1:4]) # keeps only the four numeric columns; Species is dropped
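If the Species column should survive, one possible base R approach is to round only the numeric columns of a copy. This is a sketch, not the only way to do it. The names iris_copy and numeric_cols are made up here, and datasets::iris is used so the built-in data is read even if iris has been overwritten.

```r
# Work on a copy so the built-in dataset is left untouched.
iris_copy <- datasets::iris

# Logical vector marking which columns are numeric.
numeric_cols <- sapply(iris_copy, is.numeric)

# round() is vectorized over the whole block of numeric columns.
iris_copy[numeric_cols] <- round(iris_copy[numeric_cols])

head(iris_copy, 3)
```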

Rounding example 5

rm(iris) # This removes our modified copy, so iris refers to the built-in dataset again.

iris %>%
  group_by(Species) %>%
  summarise(
    across( 
      .cols = starts_with("Sepal"), 
      .fns = ~ round(. , digits = 2))
    ) # The ~ introduces a custom round function; the . refers to the column being rounded.

# A tibble: 150 × 3
# Groups:   Species [3]
   Species Sepal.Length Sepal.Width
   <fct>          <dbl>       <dbl>
 1 setosa           5.1         3.5
 2 setosa           4.9         3  
 3 setosa           4.7         3.2
 4 setosa           4.6         3.1
 5 setosa           5           3.6
 6 setosa           5.4         3.9
 7 setosa           4.6         3.4
 8 setosa           5           3.4
 9 setosa           4.4         2.9
10 setosa           4.9         3.1
# ℹ 140 more rows
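The ~ shorthand is one of several equivalent ways to write a custom function for .fns. The sketch below compares three spellings. It uses mutate() rather than summarise() because round() returns one value per row, and it rounds to whole numbers because iris is already recorded to one decimal place (so digits = 2 changes nothing). The object names are illustrative.

```r
library(dplyr)

sepal_cols <- datasets::iris |>
  select(starts_with("Sepal"))

# purrr-style formula: .x (or .) stands for the current column.
formula_style <- sepal_cols |>
  mutate(across(everything(), ~ round(.x, digits = 0)))

# Ordinary anonymous function.
anonymous_style <- sepal_cols |>
  mutate(across(everything(), function(x) round(x, digits = 0)))

# Backslash shorthand for an anonymous function (R 4.1 or later).
shorthand_style <- sepal_cols |>
  mutate(across(everything(), \(x) round(x, digits = 0)))

all.equal(formula_style, anonymous_style)
all.equal(anonymous_style, shorthand_style)
```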

What .fns?

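Any function that accepts the column’s data type can be used.

Aggregating functions such as mean, median, and sd return one value per column (or per group); vectorized functions such as round return one value per row.

Here’s a nice list; just drop the parentheses.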

where()

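Passing is.numeric directly to .cols, as in the first across() example, triggers a deprecation warning. We can drop the warning by wrapping it in where(), which says we only want the numeric variables.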

iris |>
  summarise(
    across(
      where(is.numeric), 
      .fns = mean
    )
  )

  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.843333    3.057333        3.758    1.199333

`map()` for iteration.

It applies a function to each element of a vector or list, or to each column of a data frame.

map() returns a list.

map_df() returns a data frame.

iris |>
  map_df(.f = mean)

# A tibble: 1 × 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl>   <dbl>
1         5.84        3.06         3.76        1.20      NA
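The NA under Species appears because mean() is not defined for a factor, so R returns NA with a warning. One way to avoid this, sketched below under the assumption that dplyr and purrr are loaded, is to keep only the numeric columns before mapping. map_dbl() is used here so the result is a named numeric vector.

```r
library(dplyr)
library(purrr)

numeric_means <- datasets::iris |>
  select(where(is.numeric)) |>   # drop the Species factor first
  map_dbl(mean)                  # one mean per remaining column

numeric_means
```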

Different map()s

map() returns a list.

map_dfc() returns a data frame built by binding the results together as columns.

map_dfr() returns a data frame built by binding the results together as rows.

`map_dfc()`

iris |>
  map_dfc(.f = mean)

# A tibble: 1 × 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl>   <dbl>
1         5.84        3.06         3.76        1.20      NA
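For contrast, here is a sketch of map_dfr(), which stacks each result as a row. It assumes dplyr and purrr are installed. The column name mean_sepal_length is invented for the example. Recent purrr versions mark map_dfr() and map_dfc() as superseded in favor of map() followed by list_rbind() or list_cbind(), though they still work.

```r
library(dplyr)
library(purrr)

# Split iris into a named list with one data frame per species.
species_list <- split(datasets::iris, datasets::iris$Species)

# Each call returns a one-row tibble; map_dfr() binds them by row.
sepal_means <- map_dfr(
  species_list,
  function(one_species) {
    tibble(mean_sepal_length = mean(one_species$Sepal.Length))
  },
  .id = "Species"   # store the list names in a Species column
)

sepal_means
```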

Conclusion

Try not to use a for loop unless you really need to.
