[1] "a"
[1] "b"
[1] "c"
[1] "d"
[1] "e"
[1] "f"
[1] "g"
[1] "h"
[1] "i"
[1] "j"
[1] "k"
[1] "l"
[1] "m"
[1] "n"
[1] "o"
[1] "p"
[1] "q"
[1] "r"
[1] "s"
[1] "t"
[1] "u"
[1] "v"
[1] "w"
[1] "x"
[1] "y"
[1] "z"
In most computer programming languages you can write a for loop.
In R we try not to do this (although we can).
1:100 makes a vector of the numbers 1 through 100 in R.
c("a","b") makes a vector with “a” and “b” in it.
vector() also makes a vector.
df$column makes a vector from a data frame column.
Remember vectors are sets of values of the same type.
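As a minimal sketch of the ways of making a vector listed above, and of the same-type rule, the following uses the built-in iris data frame in place of df. The variable names are illustrative and not from the original.

```r
# Four ways to end up with a vector.
one_to_hundred <- 1:100                                 # integers 1, 2, ..., 100
letters_ab     <- c("a", "b")                           # character vector
empty_numeric  <- vector(mode = "numeric", length = 3)  # three zeros
sepal_length   <- iris$Sepal.Length                     # a data frame column

# All values in a vector share one type. Mixing types makes R
# convert everything to a common type (here, character).
mixed <- c(1, "a", TRUE)
class(mixed)
```

Checking class() on each of the other objects is a quick way to see which type R chose.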
Write a for loop that computes e^1, e^2,… e^100 and prints them.
Or better yet stores them as a vector (this is a little harder).
Here’s a start:
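e <- 2.71828 # an approximation of e; exp(1) gives the full-precision value
vector_o_numbers <- vector(mode = "numeric", length = 100) # Preallocate the result vector so the loop stays fast.
for (exponent in 1:100) {
  print(e^exponent) # print the number (uses the approximate e defined above)
  vector_o_numbers[exponent] <- exp(exponent) # store the number (uses R's built-in exp(), so it is slightly more precise)
  # A for loop is not a function, so there is nothing to return(); results are kept by assignment.
}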
[1] 2.71828
[1] 7.389046
[1] 20.0855
[1] 54.598
[1] 148.4127
[1] 403.4272
[1] 1096.628
[1] 2980.942
[1] 8103.035
[1] 22026.32
[1] 59873.7
[1] 162753.5
[1] 442409.5
[1] 1202593
[1] 3268984
[1] 8886015
[1] 24154677
[1] 65659174
[1] 178480020
[1] 485158668
[1] 1318797105
[1] 3584859796
[1] 9744652685
[1] 26488694502
[1] 72003688490
[1] 195726186349
[1] 532038577830
[1] 1.44623e+12
[1] 3.931258e+12
[1] 1.068626e+13
[1] 2.904824e+13
[1] 7.896126e+13
[1] 2.146388e+14
[1] 5.834484e+14
[1] 1.585976e+15
[1] 4.311127e+15
[1] 1.171885e+16
[1] 3.185512e+16
[1] 8.659113e+16
[1] 2.353789e+17
[1] 6.398258e+17
[1] 1.739226e+18
[1] 4.727703e+18
[1] 1.285122e+19
[1] 3.493321e+19
[1] 9.495826e+19
[1] 2.581231e+20
[1] 7.016509e+20
[1] 1.907284e+21
[1] 5.184531e+21
[1] 1.409301e+22
[1] 3.830874e+22
[1] 1.041339e+23
[1] 2.83065e+23
[1] 7.694501e+23
[1] 2.091581e+24
[1] 5.685502e+24
[1] 1.545479e+25
[1] 4.201044e+25
[1] 1.141961e+26
[1] 3.104171e+26
[1] 8.438005e+26
[1] 2.293686e+27
[1] 6.234881e+27
[1] 1.694815e+28
[1] 4.606982e+28
[1] 1.252307e+29
[1] 3.40412e+29
[1] 9.253352e+29
[1] 2.51532e+30
[1] 6.837345e+30
[1] 1.858582e+31
[1] 5.052146e+31
[1] 1.373315e+32
[1] 3.733054e+32
[1] 1.014749e+33
[1] 2.758371e+33
[1] 7.498024e+33
[1] 2.038173e+34
[1] 5.540324e+34
[1] 1.506015e+35
[1] 4.093771e+35
[1] 1.112802e+36
[1] 3.024906e+36
[1] 8.222543e+36
[1] 2.235117e+37
[1] 6.075675e+37
[1] 1.651538e+38
[1] 4.489344e+38
[1] 1.220329e+39
[1] 3.317197e+39
[1] 9.01707e+39
[1] 2.451092e+40
[1] 6.662755e+40
[1] 1.811123e+41
[1] 4.92314e+41
[1] 1.338247e+42
[1] 3.637731e+42
[1] 9.888372e+42
[1] 2.687936e+43
R is a vectorized language, which means most objects you create are vectors.
Traditionally the word “hi” is a character variable or “string”.
R makes it a vector with a string in it.
[1] TRUE
[1] 1
[1] TRUE
[1] TRUE
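The code that produced the output above is not shown. As a hedged sketch, checks like the following illustrate the same point. They are not necessarily the calls the author ran, and the name greeting is made up for illustration.

```r
greeting <- "hi"

is.vector(greeting)           # is a lone string a vector?
length(greeting)              # how many elements does it have?
is.character(greeting)        # is its type character?
identical(greeting, c("hi"))  # is it the same object that c() builds?
```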
We can take advantage of this vectorized nature.
Vectorized functions take a vector as input and give a vector as output.
If you are interested, you can check which approach is faster with system.time().
[1] "With a for loop the time is:"
user system elapsed
0.001 0.000 0.002
[1] "Using vectorized operations the time is: "
user system elapsed
0 0 0
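A comparison along these lines could be set up as follows. This is a sketch, not the author's original timing code, and the exact times will differ from machine to machine. With only 100 elements both versions finish almost instantly, so a larger n shows the gap more clearly.

```r
n <- 100

# Loop version: preallocate, then fill one element at a time.
loop_time <- system.time({
  loop_result <- vector(mode = "numeric", length = n)
  for (exponent in 1:n) {
    loop_result[exponent] <- exp(exponent)
  }
})

# Vectorized version: exp() takes the whole vector at once.
vectorized_time <- system.time({
  vectorized_result <- exp(1:n)
})

print("With a for loop the time is:")
print(loop_time)
print("Using vectorized operations the time is:")
print(vectorized_time)

# Both approaches should give the same numbers.
all.equal(loop_result, vectorized_result)
```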
Basic operations (+, -, *, /, %%, %/%)
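Base R functions: nchar(), is.na()
(is.numeric() is a type check rather than an element-wise function: it returns a single TRUE or FALSE for the whole vector.)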
Comparison operators >, <, ==
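A short sketch of these three kinds of vectorized operation, using made-up vectors x and words. Each call returns one result per input element.

```r
x <- c(10, 25, NA, 7)

x + 1     # arithmetic is applied to every element
x %/% 5   # integer division, element by element
x > 8     # one TRUE, FALSE, or NA per element
is.na(x)  # flags the missing value only

words <- c("loop", "vector", "R")
nchar(words)  # number of characters in each string
```

Note that the NA in x stays NA through arithmetic and comparison; only is.na() turns it into a definite TRUE.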
The conversion between Celsius and Fahrenheit is given by this formula: \[ F = C \times \frac{9}{5}+32\]
Make a function that does this conversion.
Use the function to compute all temps for C between 0 and 100.
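One possible solution, sketched under the assumption that whole-degree steps from 0 to 100 are wanted. The function name celsius_to_fahrenheit is illustrative. Because the arithmetic operators are vectorized, the function works on a whole vector without a loop.

```r
# Convert Celsius to Fahrenheit: F = C * 9/5 + 32
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

celsius    <- 0:100
fahrenheit <- celsius_to_fahrenheit(celsius)

head(fahrenheit)
tail(fahrenheit)
```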
Aggregating functions take a vector and give one output.
Examples: mean(), sd(), median(), sum(), n() (from dplyr)
They are not vectorized functions.
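To make the contrast with vectorized functions concrete, here is a small sketch with a made-up vector temps. Each aggregating call collapses five inputs into a single value. n() is left out because it only works inside dplyr verbs such as summarise().

```r
temps <- c(12, 15, 9, 20, 14)

mean(temps)    # one number for the whole vector
median(temps)
sd(temps)
sum(temps)

# Five values in, one value out.
length(temps)
length(mean(temps))
```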
We can vectorize existing functions with helper functions in R.
Two general ways of doing this are across() and map().
across() and map()
These functions allow us to perform the same operation across multiple columns.
map() comes from the purrr package.
across() comes from the dplyr package.
iris
There are four numeric columns in iris and I’d like to calculate the mean of each of them.
Let’s find the average of each column that ends with .Length by species.
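A sketch of that calculation, assuming dplyr is installed and iris is the unmodified built-in dataset. The result name length_means is illustrative. ends_with() is one of the tidyselect helpers that across() accepts for choosing columns.

```r
library(dplyr)

# Mean of every column whose name ends with "Length", one row per species.
length_means <- iris %>%
  group_by(Species) %>%
  summarise(
    across(
      .cols = ends_with("Length"),
      .fns = mean)
  )

length_means
```

Because mean() is an aggregating function, summarise() returns exactly one row per group here.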
We’ll look at 5 methods of rounding.
I’ll also show you a couple of ways to select columns.
across()
Let’s say we want to round every value.
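iris %>%
  group_by(Species) %>%
  summarise(
    across(
      .cols = Sepal.Length:Petal.Width, # selects every column from Sepal.Length through Petal.Width
      .fns = round)
  )
# round() returns one value per row, so this summarise() returns all 150 rows.
# dplyr 1.1.0 and later warn about that and recommend reframe() instead; mutate() also works here.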
# A tibble: 150 × 5
# Groups: Species [3]
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 5 4 1 0
2 setosa 5 3 1 0
3 setosa 5 3 1 0
4 setosa 5 3 2 0
5 setosa 5 4 1 0
6 setosa 5 4 2 0
7 setosa 5 3 1 0
8 setosa 5 3 2 0
9 setosa 4 3 1 0
10 setosa 5 3 2 0
# ℹ 140 more rows
across() and the tidyverse
iris %>%
  group_by(Species) %>%
  mutate(
    Sepal.Length = round(Sepal.Length),
    Sepal.Width = round(Sepal.Width),
    Petal.Length = round(Petal.Length),
    Petal.Width = round(Petal.Width)
  )
# A tibble: 150 × 5
# Groups: Species [3]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5 4 1 0 setosa
2 5 3 1 0 setosa
3 5 3 1 0 setosa
4 5 3 2 0 setosa
5 5 4 1 0 setosa
6 5 4 2 0 setosa
7 5 3 1 0 setosa
8 5 3 2 0 setosa
9 4 3 1 0 setosa
10 5 3 2 0 setosa
# ℹ 140 more rows
Round all at the same time because round() is vectorized.
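The four repeated round() calls can be collapsed into a single across() call inside mutate(). This is a sketch assuming dplyr is installed. The name rounded_iris is illustrative, and where(is.numeric) is just one way to pick the four measurement columns.

```r
library(dplyr)

# mutate() keeps all 150 rows; across() applies round() to each numeric column.
rounded_iris <- iris %>%
  mutate(
    across(
      .cols = where(is.numeric),
      .fns = round)
  )

head(rounded_iris)
```

Grouping by Species is not needed for this step, since rounding a value does not depend on which group it belongs to.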
rm(iris) # This removes the previous iris df that we made.
iris %>%
  group_by(Species) %>%
  summarise(
    across(
      .cols = starts_with("Sepal"),
      .fns = ~ round(., digits = 2))
  ) # The ~ creates an anonymous function; the . stands for the column being processed.
# A tibble: 150 × 3
# Groups: Species [3]
Species Sepal.Length Sepal.Width
<fct> <dbl> <dbl>
1 setosa 5.1 3.5
2 setosa 4.9 3
3 setosa 4.7 3.2
4 setosa 4.6 3.1
5 setosa 5 3.6
6 setosa 5.4 3.9
7 setosa 4.6 3.4
8 setosa 5 3.4
9 setosa 4.4 2.9
10 setosa 4.9 3.1
# ℹ 140 more rows
Any function that accepts the column’s data type can be used: aggregating functions such as mean, median, and sd, or vectorized functions such as round.
Here’s a nice list; just drop the parentheses.
We can drop the warning by specifying we only want the numeric variables.
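A sketch of restricting across() to numeric columns, assuming dplyr is installed. The name species_means is illustrative. where(is.numeric) keeps only the columns for which is.numeric() returns TRUE, so a non-numeric column is never handed to mean().

```r
library(dplyr)

# Mean of every numeric column, one row per species.
species_means <- iris %>%
  group_by(Species) %>%
  summarise(
    across(
      .cols = where(is.numeric),
      .fns = mean)
  )

species_means
```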
map() is for iteration. It performs some operation on each element of a data frame, vector, or list.
map() returns a list.
map_df() returns a data frame.
map_dfc() returns a data frame built by binding the results together as columns.
map_dfr() returns a data frame built by binding the results together as rows.
map_dfc()
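The original example for map_dfc() is not shown. As a hedged sketch, assuming purrr and dplyr are installed, the column means of iris could be computed like this. The names numeric_iris, col_means_list, and col_means_df are illustrative. Recent purrr releases mark map_dfc() and map_dfr() as superseded, although they still work.

```r
library(purrr)

# Keep only the four numeric measurement columns.
numeric_iris <- iris[, 1:4]

# map() treats the data frame as a list of columns and returns a list.
col_means_list <- map(numeric_iris, mean)

# map_dfc() does the same work but binds the results as columns
# of a one-row data frame.
col_means_df <- map_dfc(numeric_iris, mean)

col_means_list
col_means_df
```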
Try not to use a for loop unless you really need to.