Pretend: We are studying how New Yorkers feel about a mandatory 14-day quarantine for Ebola exposure.
We ask: do you favor a “mandatory 14-day quarantine for anyone who has come in contact with an Ebola patient?”
Our question: How do New Yorkers feel about quarantine for Ebola exposure after the COVID-19 pandemic?
We have past data from 2014 to compare against. It is called ebola_survey,
and it is our best guess at how New Yorkers felt about mandatory quarantine before the pandemic. The proportion in favor will be our assumed \(p\).
# A tibble: 2 × 2
quarantine n
<fct> <int>
1 against 188
2 favor 854
\(p = 854/1042 \approx 0.82\). This is a reasonable estimate for \(p\) because we have no other data to the contrary.
We ask 1,000 people whether they favor a mandatory 14-day quarantine for anyone who has come in contact with an Ebola patient.
Here are the results.
# A tibble: 1 × 2
against favor
<dbl> <dbl>
1 486 514
So \(\hat{p} = 514/1000 = 0.514\), which seems very different from \(p = 0.82\).
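As a quick check of the arithmetic, here is a short Python sketch (the course itself uses R; Python is used here only for illustration) that computes both proportions from the counts in the two tibbles above. The variable names are our own.

```python
# Counts copied from the two tibbles above.
favor_2014, against_2014 = 854, 188
favor_now, against_now = 514, 486

# Assumed population proportion p, taken from the 2014 survey.
p0 = favor_2014 / (favor_2014 + against_2014)
# Sample proportion p-hat from the current sample of 1,000 people.
p_hat = favor_now / (favor_now + against_now)

print(f"p     = {p0:.4f}")
print(f"p_hat = {p_hat:.3f}")
```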
\(\hat{p}\): the sample statistic, a proportion in this case.
\(p\): the assumed population parameter, also a proportion.
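If the sentiment of New Yorkers in 2014 is unchanged today (that is, if \(p\) is still 0.82), what is the probability of getting a sample proportion at least as far from 0.82 as \(\hat{p} = 0.514\)? This probability is the p-value.
Is it likely or unlikely?
We call the result likely if the p-value is greater than the significance level \(\alpha = 0.05\).
If the p-value is less than 0.05, we reject \(H_0\).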
Hypothesis testing with Proportions
\[ H_0: p = 0.82 \\ H_A: p \ne 0.82 \]
Null and alternative hypotheses.
We assume the null hypothesis is true and build a theoretical sampling distribution from it.
Fixing \(p\) at a specific hypothesized value, \(p_0 = 0.82\), is what allows us to build that sampling distribution.
If we look at a proportion and the scenario satisfies certain conditions, then the distribution of sample proportions approximately follows a bell-shaped curve called the normal distribution.
This is the point of the recent R assignment: if we could sample repeatedly from a population, the statistics from those samples would form an approximately normal distribution.
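The repeated-sampling idea can be sketched in Python (used here only for illustration; the assignment itself is in R). We assume a population where the true proportion in favor is 0.82 and draw 2,000 samples of 1,000 people each. The simulated \(\hat{p}\) values should center near 0.82, have a spread close to the standard error, and place roughly 68% and 95% of their values within 1 and 2 standard errors, as a normal curve would.

```python
import random
import statistics

# Assumed values for illustration: the 2014 proportion and the current n.
p, n, reps = 0.82, 1000, 2000
random.seed(1)

# Each replication: n independent respondents, each "in favor" with probability p.
p_hats = []
for _ in range(reps):
    successes = sum(random.random() < p for _ in range(n))
    p_hats.append(successes / n)

se = (p * (1 - p) / n) ** 0.5
# Share of simulated p-hats within 1 and 2 standard errors of p.
within_1 = sum(abs(x - p) <= se for x in p_hats) / reps
within_2 = sum(abs(x - p) <= 2 * se for x in p_hats) / reps

print(f"mean of p-hats: {statistics.mean(p_hats):.3f}")
print(f"sd of p-hats:   {statistics.stdev(p_hats):.4f} (theory: {se:.4f})")
print(f"within 1 SE: {within_1:.2f}, within 2 SE: {within_2:.2f}")
```

Because \(\hat{p}\) can only take values in steps of 0.001, the shares will not hit 68% and 95% exactly, but they land close.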
Conditions for proportions:
Success-failure condition:
at least 10 expected successes, \(n p_0 \ge 10\)
at least 10 expected failures, \(n(1-p_0) \ge 10\)
Large, independent sample (\(n > 30\))
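A minimal Python sketch of the numeric conditions listed above. The helper name conditions_met is hypothetical, and independence is not something the function can check: it depends on how the sample was collected.

```python
def conditions_met(p0, n):
    """Check the numeric conditions for a one-proportion test.

    Success-failure: at least 10 expected successes and 10 expected failures.
    Sample size: n > 30. Independence must be judged from the study design.
    """
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    return expected_successes >= 10 and expected_failures >= 10 and n > 30

# Ebola example: 1000 * 0.82 = 820 expected successes, 180 expected failures.
print(conditions_met(0.82, 1000))
# College example: 203 * 0.5 = 101.5 of each.
print(conditions_met(0.5, 203))
# A tiny sample fails: 15 * 0.5 = 7.5 expected successes.
print(conditions_met(0.5, 15))
```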
\[ \hat{p} \sim N(p_0, SE) \]
With \[SE= \sqrt{\frac{p_0(1-p_0)}{n}}\]
Note that \(n\) is the sample size from our current study (\(n = 1000\) here), not from the 2014 survey.
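Plugging the Ebola numbers into the standard error formula, as a short Python sketch (illustration only; the variable names are our own). The z-score measures how many standard errors the observed \(\hat{p}\) lies from \(p_0\).

```python
import math

p0 = 0.82      # assumed population proportion from the 2014 survey
n = 1000       # size of the current sample
p_hat = 0.514  # observed sample proportion

# Standard error of p-hat under the null hypothesis.
se = math.sqrt(p0 * (1 - p0) / n)
# Distance of p-hat from p0, in standard errors.
z = (p_hat - p0) / se
print(f"SE = {se:.4f}, z = {z:.1f}")
```

A \(\hat{p}\) more than 25 standard errors below \(p_0\) is far out in the tail of the normal curve.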
What is the probability of getting a \(\hat{p}\) at least as far from 0.82 as the 0.514 in our sample?
The p-value is essentially zero, so it is extremely unlikely that we would have gotten \(\hat{p} = 0.514\) if the true distribution were centered at 0.82. The distribution is therefore likely not centered at 0.82, and we reject the null hypothesis that \(p = 0.82\).
So we have strong evidence that the true proportion of New Yorkers who favor a mandatory 14-day quarantine is not 0.82, but some other number.
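The p-value is not literally zero; it is just far too small to matter. A Python sketch of the two-sided calculation (illustration only; normal_cdf is our own helper built on the standard library's math.erfc, so no statistics package is needed):

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

p0, n, p_hat = 0.82, 1000, 0.514
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

# H_A: p != 0.82 is two-sided, so we add the probability in both tails.
p_value = 2 * normal_cdf(-abs(z))
print(p_value)          # on the order of 1e-140
print(p_value < 0.05)   # True: reject H_0
```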
Have a majority of college students had more than 1 exclusive_relationship?
# A tibble: 3 × 2
`num > 1` n
<lgl> <int>
1 FALSE 51
2 TRUE 152
3 NA 15
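The sample size for the test excludes the 15 missing answers. A short Python sketch (illustration only) of where \(n = 203\) and \(\hat{p} \approx 0.749\) come from:

```python
# Counts copied from the tibble above (num > 1: FALSE, TRUE, NA).
more_than_one, not_more_than_one, missing = 152, 51, 15

# Missing answers carry no information about the question, so drop them.
n = more_than_one + not_more_than_one
p_hat = more_than_one / n
print(n, round(p_hat, 3))
```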
Check Conditions.
Do the test with math.
Do the test with simulation.
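For the simulation approach, one possible sketch in Python (illustration only; the seed and the 10,000 replications are arbitrary choices): simulate many samples of 203 students in a world where \(p = 0.5\), and count how often the simulated \(\hat{p}\) is at least as large as the observed \(152/203\).

```python
import random

random.seed(2024)
p0, n = 0.5, 203
observed = 152 / 203   # p-hat from the tibble above
reps = 10_000

# Null world: each student independently answers "yes" with probability p0.
extreme = 0
for _ in range(reps):
    sim_p_hat = sum(random.random() < p0 for _ in range(n)) / n
    if sim_p_hat >= observed:   # one-sided, matching H_A: p > 0.5
        extreme += 1

p_value = extreme / reps
print(p_value)
```

If no simulated sample reaches 0.749, the estimated p-value is 0, which should be read as "smaller than about 1 in 10,000" rather than exactly zero.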
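We are testing against 50%. We have already checked the conditions for the mathematical test: with \(n = 203\) and \(p_0 = 0.5\), we expect \(203 \times 0.5 = 101.5\) successes and 101.5 failures, both well above 10. Now we state the hypotheses: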
\[ H_0: p = 0.5 \\ H_A: p > 0.5 \]
\[ \alpha = 0.05 \]
We have \(p_0 = 0.5\). Next we need to find the standard error.
\[ SE = \sqrt{\frac{p_0(1-p_0)}{n}}=\sqrt{\frac{(0.5)(0.5)}{203}} = 0.03509312 \]
\[ \hat{p} \sim N(p_0 = 0.5,\ SE = 0.035) \]
If this null distribution is correct, what is the probability that we would find \(\hat{p} = 152/203 \approx 0.749\) or larger in our sample?
[1] 6.763449e-13
We get \(6.763449 \times 10^{-13}\), which is far smaller than \(\alpha = 0.05\). We reject the null hypothesis and conclude that there is strong evidence that a majority of students have been in more than one exclusive relationship.
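A Python sketch of the calculation behind the [1] 6.763449e-13 output (illustration only). We assume the slide took the upper tail of the normal distribution, as R's pnorm(..., lower.tail = FALSE) does, with the unrounded \(\hat{p} = 152/203\); upper_tail is a hypothetical helper built on math.erfc.

```python
import math

def upper_tail(x, mean, sd):
    """P(X >= x) for X ~ N(mean, sd), analogous to R's
    pnorm(x, mean, sd, lower.tail = FALSE)."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

p0, n = 0.5, 203
p_hat = 152 / 203                      # about 0.749
se = math.sqrt(p0 * (1 - p0) / n)      # about 0.0351

p_value = upper_tail(p_hat, p0, se)    # one-sided: H_A is p > 0.5
print(f"{p_value:.6e}")
```

The result should agree with the printed R value. Using the rounded 0.749 instead of 152/203 gives a slightly smaller number, which suggests the slide used the unrounded proportion.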
We could still have made a type 1 error: rejecting the null hypothesis when it is actually true.
More on decision errors later.