Tidy data and pivots

Author

Schwab

Tidydata

I have heard about

pivot_longer()

pivot_wider()

What do these functions do?

Some built in tables

pivot_longer()

This data is not tidy how can we make it longer for computers?

✔ Reading from "indicator hiv estimated prevalence% 15-49".
✔ Range 'Data'.
# A tibble: 3 × 5
  Country        `1979` `1989` `1999` `2009`
  <chr>           <dbl>  <dbl>  <dbl>  <dbl>
1 France        NA          NA    0.3    0.4
2 South Africa  NA          NA   14.8   17.2
3 United States  0.0318     NA    0.5    0.6

Write down how you think we could do that with pivot_longer()

pivot_wider()

This data is tidy how could we make it wider for humans?

`summarise()` has grouped output by 'name'. You can override using the
`.groups` argument.
# A tibble: 6 × 3
# Groups:   name [3]
  name   sex    total
  <chr>  <chr>  <int>
1 Leslie F     266474
2 Leslie M     112689
3 Robin  F     289395
4 Robin  M      44616
5 Sue    F     144465
6 Sue    M        519

Write down how you think we could do that with pivot_wider()

pivot_longer()

Let’s try our code on the examples above. Pivot this longer.

library(tidyverse)
library(mdsr)
library(googlesheets4)
gs4_deauth()
hiv_key <- "1kWH_xdJDM4SMfT_Kzpkk-1yuxWChfurZuWYjfmv51EA"
hiv <- read_sheet(hiv_key) |>
  rename(Country = 1) |>
  filter(
    Country %in% c("United States", "France", "South Africa")
  ) |>
  select(Country, `1979`, `1989`, `1999`, `2009`) |>
  unnest(cols = c(`2009`)) |>
  mutate(across(matches("[0-9]"), as.double))
✔ Reading from "indicator hiv estimated prevalence% 15-49".
✔ Range 'Data'.
hiv
# A tibble: 3 × 5
  Country        `1979` `1989` `1999` `2009`
  <chr>           <dbl>  <dbl>  <dbl>  <dbl>
1 France        NA          NA    0.3    0.4
2 South Africa  NA          NA   14.8   17.2
3 United States  0.0318     NA    0.5    0.6

pivot_wider()

library(babynames)
babynames |>
     filter(name %in% c("Sue", "Robin", "Leslie")) |>
     group_by(name, sex) |>
     summarize(total = sum(n))
`summarise()` has grouped output by 'name'. You can override using the
`.groups` argument.
# A tibble: 6 × 3
# Groups:   name [3]
  name   sex    total
  <chr>  <chr>  <int>
1 Leslie F     266474
2 Leslie M     112689
3 Robin  F     289395
4 Robin  M      44616
5 Sue    F     144465
6 Sue    M        519

Try this one

This table is not tidy can you make it so?

#|echo: true
#|
table4b 
# A tibble: 3 × 3
  country         `1999`     `2000`
  <chr>            <dbl>      <dbl>
1 Afghanistan   19987071   20595360
2 Brazil       172006362  174504898
3 China       1272915272 1280428583

More practice

This dataframe is tidy but can you make it easier for a human to read?

fish_encounters |>
  pivot_wider(names_from = station,values_from = "seen")
# A tibble: 19 × 12
   fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE   MAW
   <fct>   <int> <int>  <int> <int>   <int> <int> <int> <int> <int> <int> <int>
 1 4842        1     1      1     1       1     1     1     1     1     1     1
 2 4843        1     1      1     1       1     1     1     1     1     1     1
 3 4844        1     1      1     1       1     1     1     1     1     1     1
 4 4845        1     1      1     1       1    NA    NA    NA    NA    NA    NA
 5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA
 6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA    NA
 7 4849        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
 8 4850        1     1     NA     1       1     1     1    NA    NA    NA    NA
 9 4851        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
10 4854        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
11 4855        1     1      1     1       1    NA    NA    NA    NA    NA    NA
12 4857        1     1      1     1       1     1     1     1     1    NA    NA
13 4858        1     1      1     1       1     1     1     1     1     1     1
14 4859        1     1      1     1       1    NA    NA    NA    NA    NA    NA
15 4861        1     1      1     1       1     1     1     1     1     1     1
16 4862        1     1      1     1       1     1     1     1     1    NA    NA
17 4863        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
18 4864        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA
19 4865        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA

tidy, clean, join.

world_bank_pop is loaded with the tidyverse.

  1. Make world_bank_pop tidy.

  2. Use clean_names() to clean up country_info.

  3. Join world_bank_pop with the country_info data (loaded below) to add names of countries. (Hint: You only need to country_codes and the english version of the name.)

country_info <- read_csv("https://raw.githubusercontent.com/datasets/country-codes/master/data/country-codes.csv")
Rows: 250 Columns: 56
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (49): FIFA, Dial, ISO3166-1-Alpha-3, MARC, is_independent, ISO3166-1-num...
dbl  (4): Intermediate Region Code, Sub-region Code, Region Code, Geoname ID
num  (2): GAUL, ISO4217-currency_minor_unit
lgl  (1): Global Code

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

consider some tables

Are these tables tidy, do we need to pivot wider or longer?

Reading

Read Chapter 6.

Futher reading on tidy data