Local vs. Global variables
x and df are global. They are referred to as environment variables.
df$x and df$y are local to the data frame. They are often referred to as data variables.
The dollar sign operator, $, selects a variable from a data frame.
We don’t use it often, because the tidyverse usually makes it unnecessary.
We can select a variable by its bare name with dplyr from the tidyverse.
This makes things easier to read and less redundant.
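As a minimal sketch of the two styles, assume x is 10 in the global environment and df has a single row with x = 5 and y = 2 (these values are assumptions chosen to match the 10 and 5 used in these notes):

```r
library(dplyr)

# Assumed setup: an environment variable x and a data frame df
# that also contains a column called x.
x  <- 10
df <- data.frame(x = 5, y = 2)

x              # the environment variable: 10
df$x           # the data variable, selected with $: 5

# Tidyverse equivalents: the column is referred to by its bare name.
select(df, x)  # a one-column data frame containing x
pull(df, x)    # the column as a plain vector: 5
```

Both select() and pull() look inside df for x, so the global x is never used here.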
To print out 5 we need to tell R where to find it: in the x column of df, not in the global x.
my_printer() looks through the environment first.
So my_printer(x) prints 10, not 5.
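The body of my_printer() is not shown in these notes, so the version below is a hypothetical stand-in that simply prints its argument. It uses the same assumed values as before (global x = 10, df$x = 5):

```r
# Assumed setup, matching the values discussed in these notes.
x  <- 10
df <- data.frame(x = 5, y = 2)

# Hypothetical definition: a plain base R function that prints its argument.
my_printer <- function(value) {
  print(value)
}

my_printer(x)     # finds x in the global environment: prints 10
my_printer(df$x)  # we tell R to look inside df: prints 5
```

A base R function has no idea that df exists, so the only way to reach the column is to name the data frame explicitly.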
The tidyverse uses a process called data masking to blur the lines between environment and data variables.
This is why we like it so much.
However, it makes programming with the tidyverse functions more challenging.
The code below works as expected because tidy evaluation is working under the hood to select x from df.
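A sketch of such a top-level call, assuming the same setup as before (global x = 10, df$x = 5); the column name mean_x is only illustrative:

```r
library(dplyr)

# Assumed setup, matching the values discussed in these notes.
x  <- 10
df <- data.frame(x = 5, y = 2)

# summarize() looks for x inside df first, so the global x is ignored.
result <- df %>%
  summarize(mean_x = mean(x))

result$mean_x  # 5, the value from the data frame
```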
We get into problems when using tidyverse functions within homemade functions:
We’re still getting 10 in our output! It should be 5, because that is the value from the data frame.
my_mean_maker() goes to the global environment because it doesn’t recognize that it should be looking in the data frame.
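The original definition of my_mean_maker() is not shown, so the body below is an assumed reconstruction of the failing pattern: the column is passed as a bare name and used directly inside summarize().

```r
library(dplyr)

# Assumed setup, matching the values discussed in these notes.
x  <- 10
df <- data.frame(x = 5, y = 2)

# Hypothetical (broken) definition: var is used as an ordinary argument.
my_mean_maker <- function(data, var) {
  data %>%
    summarize(mean = mean(var))
}

broken <- my_mean_maker(df, x)
broken$mean  # 10: the global x was used, not df$x
```

Inside the function, summarize() searches df for a column named var, finds none, and falls back to the function argument, whose value is the global x.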
We can check whether a function uses tidy evaluation under the hood by checking the documentation of its arguments.
We are looking for the words “data-masking” or “tidy-select”.
If neither term is present, we do not need to inject the data (aka unmask the data).
We need to inject the data into the summarize function. We can do this by wrapping the function argument in {{ }}.
This tells R to look for the variable within the dataframe, not the global environment.
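A minimal sketch of the fix, using the same assumed setup (global x = 10, df$x = 5) and an assumed function body for my_mean_maker(). The only change from the failing pattern is the {{ }} around the argument:

```r
library(dplyr)

# Assumed setup, matching the values discussed in these notes.
x  <- 10
df <- data.frame(x = 5, y = 2)

# Hypothetical definition: {{ var }} passes the column name through
# to summarize(), which then looks it up inside data.
my_mean_maker <- function(data, var) {
  data %>%
    summarize(mean = mean({{ var }}))
}

fixed <- my_mean_maker(df, x)
fixed$mean  # 5: the value from the data frame

my_mean_maker(df, y)$mean  # 2: works for any column of df
```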
Make a function that will calculate the mean, median, and standard deviation of any variable from a data frame using tidyverse functions.
Use filter(!is.na()) or na.rm = TRUE to remove missing values.
Test your function on the height variable of the starwars data.
Make a function that makes a bar graph of a categorical variable.
Test your function on the eye_color variable of the starwars data.