Tidy data and pivots

Schwab

Tidydata

Tidy data problems

Tidy data is not always in a human readable format.

It is easy for computers to read but tricky for people.

Data is in a format that is easy for people for tricky for computers.

pivot_*()

When you want to make data readable for machines you’ll often pivot_longer().

pivot_longer()

When you want to make it more human readable you’ll pivot wider.

pivot_wider()

These are new functions we should check the docs.

`pivot_longer()`

This data is not tidy how can we make it longer for computers?

Code

gs4_deauth()
hiv_key <- "1kWH_xdJDM4SMfT_Kzpkk-1yuxWChfurZuWYjfmv51EA"
hiv <- read_sheet(hiv_key) |>
  rename(Country = 1) |>
  filter(
    Country %in% c("United States", "France", "South Africa")
  ) |>
  select(Country, `1979`, `1989`, `1999`, `2009`) |>
  unnest(cols = c(`2009`)) |>
  mutate(across(matches("[0-9]"), as.double))
hiv

# A tibble: 3 × 5
  Country        `1979` `1989` `1999` `2009`
  <chr>           <dbl>  <dbl>  <dbl>  <dbl>
1 France        NA          NA    0.3    0.4
2 South Africa  NA          NA   14.8   17.2
3 United States  0.0318     NA    0.5    0.6

How you think we could do that with pivot_longer()

`pivot_longer()`

Solution to previous slide

Code

hiv |>
  pivot_longer(cols = `1979`:`2009`, names_to = "year", values_to = "hiv_rate")

# A tibble: 12 × 3
   Country       year  hiv_rate
   <chr>         <chr>    <dbl>
 1 France        1979   NA     
 2 France        1989   NA     
 3 France        1999    0.3   
 4 France        2009    0.4   
 5 South Africa  1979   NA     
 6 South Africa  1989   NA     
 7 South Africa  1999   14.8   
 8 South Africa  2009   17.2   
 9 United States 1979    0.0318
10 United States 1989   NA     
11 United States 1999    0.5   
12 United States 2009    0.6

`pivot_wider()`

This data is tidy how could we make it wider for humans?

library(babynames)
babynames |>
     filter(name %in% c("Sue", "Robin", "Leslie")) |>
     group_by(name, sex) |>
     summarize(total = sum(n))

# A tibble: 6 × 3
# Groups:   name [3]
  name   sex    total
  <chr>  <chr>  <int>
1 Leslie F     266474
2 Leslie M     112689
3 Robin  F     289395
4 Robin  M      44616
5 Sue    F     144465
6 Sue    M        519

How you think we could do that with pivot_wider()

`pivot_wider()`

Solution to the previous slide.

Code

library(babynames)
babynames |>
     filter(name %in% c("Sue", "Robin", "Leslie")) |>
     group_by(name, sex) |>
     summarize(total = sum(n))|>
     pivot_wider(names_from = sex,values_from = total)

# A tibble: 3 × 3
# Groups:   name [3]
  name        F      M
  <chr>   <int>  <int>
1 Leslie 266474 112689
2 Robin  289395  44616
3 Sue    144465    519

fish_encounters

This data is tidy try to make it wider.

fish_encounters

# A tibble: 114 × 3
   fish  station  seen
   <fct> <fct>   <int>
 1 4842  Release     1
 2 4842  I80_1       1
 3 4842  Lisbon      1
 4 4842  Rstr        1
 5 4842  Base_TD     1
 6 4842  BCE         1
 7 4842  BCW         1
 8 4842  BCE2        1
 9 4842  BCW2        1
10 4842  MAE         1
# ℹ 104 more rows

fish_encounters 2

The solution to the previous slide.

fish_encounters |>
  pivot_wider(names_from = station,values_from = "seen")

# A tibble: 19 × 12
   fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE   MAW
   <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int> <int>
 1 4842        1     1      1     1       1     1     1     1     1     1     1
 2 4843        1     1      1     1       1     1     1     1     1     1     1
 3 4844        1     1      1     1       1     1     1     1     1     1     1
 4 4845        1     1      1     1       1    NA    NA    NA    NA    NA    NA
 5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA
 6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA    NA
 7 4849        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
 8 4850        1     1     NA     1       1     1     1    NA    NA    NA    NA
 9 4851        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
10 4854        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
11 4855        1     1      1     1       1    NA    NA    NA    NA    NA    NA
12 4857        1     1      1     1       1     1     1     1     1    NA    NA
13 4858        1     1      1     1       1     1     1     1     1     1     1
14 4859        1     1      1     1       1    NA    NA    NA    NA    NA    NA
15 4861        1     1      1     1       1     1     1     1     1     1     1
16 4862        1     1      1     1       1     1     1     1     1    NA    NA
17 4863        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
18 4864        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
19 4865        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA

Bonus: the separate() function

If your values have a common symbol and they should be put into different column that can be done with separate().

We’ll see some examples in the next several problems.

Not Tidy -> Tidy

Are these tables tidy, do we need to pivot wider or longer?

table1,
table2,
table3,
table4a,
table4b,
table5

solutions posted in video

Reading

Read Chapter 6.

Futher reading on tidy data