The Verbs

Nic Schwab

dplyr lives in the tidyverse

SDS 100 Lab 4

You’ve learned about these verbs:

select()
- subsets columns
filter()
- subsets rows
mutate()
- creates new variables (or columns)
arrange() / arrange(desc())
- sorts rows (ascending / descending)

We will also use:

group_by()
- to group the data before summarizing
summarize()
- to summarize data
rename()
- to rename columns
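
A minimal sketch of the row and column verbs in action. The `pets` tibble is hypothetical toy data made up for illustration; it is not from the lab or the text.

```r
library(dplyr)

# Hypothetical toy data (assumed for illustration only)
pets <- tibble(
  name   = c("Ada", "Bo", "Cy", "Di"),
  type   = c("cat", "dog", "cat", "dog"),
  weight = c(4, 20, 5, 12)
)

pets %>% select(name, weight)               # keep only two columns
pets %>% filter(type == "cat")              # keep only the rows for cats
pets %>% mutate(weight_lb = weight * 2.2)   # add a new column
pets %>% arrange(desc(weight))              # sort rows, heaviest first
pets %>% rename(species = type)             # rename type to species
```

Each line takes `pets` as its first argument (through the pipe) and returns a new tibble.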

Recall SQL Queries

SQL	dplyr
SELECT	select(), mutate(), summarise()
FROM	the data frame (first argument, .data)
WHERE	filter()
ORDER BY	arrange()
LIMIT	slice_head()
GROUP BY	group_by()
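
To make the mapping concrete, here is one way a SQL query might translate into a dplyr pipeline. The query itself is an invented example, and it assumes the `mpg` data frame from the ggplot2 package is available.

```r
library(dplyr)

mpg <- ggplot2::mpg  # assumes ggplot2 is installed

# SQL version (illustrative only):
#   SELECT manufacturer, model, hwy
#   FROM mpg
#   WHERE year = 2008
#   ORDER BY hwy DESC
#   LIMIT 5;

top5 <- mpg %>%                        # FROM
  filter(year == 2008) %>%             # WHERE
  select(manufacturer, model, hwy) %>% # SELECT
  arrange(desc(hwy)) %>%               # ORDER BY ... DESC
  slice_head(n = 5)                    # LIMIT

top5
```

Note that the dplyr steps run in the order they are written, so `filter()` comes before `select()` here because `year` has to exist when the rows are filtered.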

The idea

You can get good at a few functions and do a lot.

The first argument is a data frame.

It's a special kind called a tibble.

The output of the wrangling functions is a data frame.

When we wrangle we are not altering the original data.

It still exists.
You can start over.
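
A small sketch of this idea, using a hypothetical `scores` tibble made up for illustration: the wrangled result is a new object, and the original is untouched unless you assign over it.

```r
library(dplyr)

# Hypothetical toy data (assumed for illustration only)
scores <- tibble(
  student = c("A", "B", "C"),
  points  = c(8, 5, 9)
)

# filter() returns a new tibble; scores itself is not changed
high <- scores %>% filter(points > 6)

nrow(scores)  # 3, the original still has every row
nrow(high)    # 2
```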

What is a tibble?

Object of class tbl
It's a little different from a base R data frame.
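
One way to see the relationship is to compare classes. The sketch below builds a small made-up data frame and converts it with `as_tibble()`.

```r
library(tibble)

df <- data.frame(x = 1:3)  # a base R data frame
tb <- as_tibble(df)        # the same data as a tibble

class(df)          # "data.frame"
class(tb)          # "tbl_df" "tbl" "data.frame"
is.data.frame(tb)  # TRUE, a tibble is still a data frame
```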

Group by example

Live code with the mpg data frame.
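
If you want something to start from, here is a possible sketch of a grouped summary with `mpg`. It is not the code from class; the choice of `class` and `hwy` and the summary column names are assumptions made for illustration.

```r
library(dplyr)

mpg <- ggplot2::mpg  # assumes ggplot2 is installed

by_class <- mpg %>%
  group_by(class) %>%           # one group per vehicle class
  summarize(
    n_cars   = n(),             # number of rows in each group
    mean_hwy = mean(hwy)        # average highway mileage per group
  ) %>%
  arrange(desc(mean_hwy))       # best average mileage first

by_class
```

`summarize()` collapses each group to a single row, so the result has one row per class.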

Open R or Smith’s RStudio Server

Practice wrangling

In class: MDSR Problems 1 and 2

 # Copy this code into a chunk in R to make the Random_subset data frame from Problems 1 and 2.
 # Then use the verbs we've discussed to make the subsets shown in the text.
 Random_subset <-  tibble::tribble(
     ~year,~sex,   ~name,         ~n, ~prop,
      2003, "M",     "Bilal",        146, 0.0000695,
      1999, "F",     "Terria",        23, 0.0000118,
      2010, "F",     "Naziyah",       45, 0.0000230,
      1989, "F",     "Shawana",       41, 0.0000206,
      1989, "F",     "Jessi",        210, 0.000105,
      1928, "M",     "Tillman",       43, 0.0000377,
      1981, "F",     "Leslee",        83, 0.0000464,
      1981, "F",     "Sherise",       27, 0.0000151,
      1920, "F",     "Marquerite",    26, 0.0000209,
      1941, "M",     "Lorraine",      24, 0.0000191
   )