Tidy data and pivots

Schwab

Tidydata

First Job

You need to get data into machine readable format. What do you do?

pivot_longer()

pivot_wider()

What do these functions do?

Some built in tables

pivot_longer()

This data is not tidy how can we make it longer for computers?

# A tibble: 3 × 5
  Country        `1979` `1989` `1999` `2009`
  <chr>           <dbl>  <dbl>  <dbl>  <dbl>
1 France        NA          NA    0.3    0.4
2 South Africa  NA          NA   14.8   17.2
3 United States  0.0318     NA    0.5    0.6

Write down how you think we could do that with pivot_longer()

pivot_wider()

This data is tidy how could we make it wider for humans?

# A tibble: 6 × 3
# Groups:   name [3]
  name   sex    total
  <chr>  <chr>  <int>
1 Leslie F     266474
2 Leslie M     112689
3 Robin  F     289395
4 Robin  M      44616
5 Sue    F     144465
6 Sue    M        519

Write down how you think we could do that with pivot_wider()

pivot_longer()

Let’s try our code on the examples above. Pivot this longer.

library(tidyverse)
library(mdsr)
library(googlesheets4)
gs4_deauth()
hiv_key <- "1kWH_xdJDM4SMfT_Kzpkk-1yuxWChfurZuWYjfmv51EA"
hiv <- read_sheet(hiv_key) |>
  rename(Country = 1) |>
  filter(
    Country %in% c("United States", "France", "South Africa")
  ) |>
  select(Country, `1979`, `1989`, `1999`, `2009`) |>
  unnest(cols = c(`2009`)) |>
  mutate(across(matches("[0-9]"), as.double))
hiv
# A tibble: 3 × 5
  Country        `1979` `1989` `1999` `2009`
  <chr>           <dbl>  <dbl>  <dbl>  <dbl>
1 France        NA          NA    0.3    0.4
2 South Africa  NA          NA   14.8   17.2
3 United States  0.0318     NA    0.5    0.6

pivot_wider()

library(babynames)
babynames |>
     filter(name %in% c("Sue", "Robin", "Leslie")) |>
     group_by(name, sex) |>
     summarize(total = sum(n))
# A tibble: 6 × 3
# Groups:   name [3]
  name   sex    total
  <chr>  <chr>  <int>
1 Leslie F     266474
2 Leslie M     112689
3 Robin  F     289395
4 Robin  M      44616
5 Sue    F     144465
6 Sue    M        519

Not Tidy -> Tidy

This table is not tidy can you make it so?

# A tibble: 3 × 3
  country         `1999`     `2000`
  <chr>            <dbl>      <dbl>
1 Afghanistan   19987071   20595360
2 Brazil       172006362  174504898
3 China       1272915272 1280428583

More practice

This dataframe is tidy but can you make it easier for a human to read?

# A tibble: 19 × 12
   fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE   MAW
   <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int> <int>
 1 4842        1     1      1     1       1     1     1     1     1     1     1
 2 4843        1     1      1     1       1     1     1     1     1     1     1
 3 4844        1     1      1     1       1     1     1     1     1     1     1
 4 4845        1     1      1     1       1    NA    NA    NA    NA    NA    NA
 5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA
 6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA    NA
 7 4849        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
 8 4850        1     1     NA     1       1     1     1    NA    NA    NA    NA
 9 4851        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
10 4854        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
11 4855        1     1      1     1       1    NA    NA    NA    NA    NA    NA
12 4857        1     1      1     1       1     1     1     1     1    NA    NA
13 4858        1     1      1     1       1     1     1     1     1     1     1
14 4859        1     1      1     1       1    NA    NA    NA    NA    NA    NA
15 4861        1     1      1     1       1     1     1     1     1     1     1
16 4862        1     1      1     1       1     1     1     1     1    NA    NA
17 4863        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
18 4864        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
19 4865        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA

consider some tables

Are these tables tidy, do we need to pivot wider or longer?

Reading

Read Chapter 6.

Futher reading on tidy data