Read chapter 17 on inference with two proportions.
Sometimes we want to compare two proportions directly.
Consider a difference of proportions.
Confidence Interval
Hypothesis Test
Is there a difference between the two proportions?
The following is from chapter 17 in the text:
The difference \(\hat{p}_1−\hat{p}_2\) can be modeled using a normal distribution when:
Independence Extended: The data are independent within and between the two groups.
Success-failure condition. The success-failure condition holds for both groups, where we check successes and failures in each group separately.
\(\hat{p}_1−\hat{p}_2 \sim N\left(p_1−p_2,\sqrt{\frac{\hat{p}_1(1−\hat{p}_1)}{n_1} +\frac{\hat{p}_2(1−\hat{p}_2)}{n_2}}\right)\), approximately
mean = \(p_1−p_2\), estimated by \(\hat{p}_1−\hat{p}_2\)
Standard Error = \(\sqrt{\frac{\hat{p}_1(1−\hat{p}_1)}{n_1} +\frac{\hat{p}_2(1−\hat{p}_2)}{n_2}}\)
We want to try to capture the true difference in proportions.
\[ \hat{p}_1−\hat{p}_2 \pm z_{\frac{\alpha}{2}} \times SE \]
With a 95% confidence level, we expect about 95% of the CIs built this way from repeated samples to capture the true difference.
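The "95% of the CIs" idea can be illustrated with a small simulation. This is only a sketch: the true proportions (0.25 and 0.35), the group sizes (200 each), the number of repetitions, and the function name `simulate_coverage` are all assumptions chosen for illustration, not values from the text.

```python
import math
import random
from statistics import NormalDist


def simulate_coverage(p1, p2, n1, n2, reps=2000, conf=0.95, seed=1):
    """Estimate how often the interval for p1 - p2 captures the true difference."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    true_diff = p1 - p2
    hits = 0
    for _ in range(reps):
        # Draw one sample per group and compute the sample proportions.
        phat1 = sum(rng.random() < p1 for _ in range(n1)) / n1
        phat2 = sum(rng.random() < p2 for _ in range(n2)) / n2
        # Unpooled standard error, as used for the confidence interval.
        se = math.sqrt(phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2)
        diff = phat1 - phat2
        if diff - z_star * se <= true_diff <= diff + z_star * se:
            hits += 1
    return hits / reps


# Hypothetical true proportions and sample sizes, for illustration only.
coverage = simulate_coverage(p1=0.25, p2=0.35, n1=200, n2=200)
print(f"Share of intervals capturing the true difference: {coverage:.3f}")
```

The printed share should land near 0.95, though it will vary a little with the seed and the assumed inputs.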
Check conditions first:
Independence Extended
Success-failure condition
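The success-failure check is simple enough to sketch in code. The cutoff of 10 is the commonly used rule and is an assumption here, as is the helper name `success_failure_ok`; the counts are the ones used in this example (11 of 50 and 14 of 40). Independence cannot be checked this way; it depends on how the data were collected.

```python
def success_failure_ok(successes, n, minimum=10):
    """Return True when both successes and failures reach the minimum count."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


# Counts from the example in these notes: 11 of 50 and 14 of 40.
group1_ok = success_failure_ok(11, 50)  # 11 successes, 39 failures
group2_ok = success_failure_ok(14, 40)  # 14 successes, 26 failures
print(group1_ok, group2_ok)
```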
\(\hat{p}_1 = \frac{11}{50} = 0.22\)
\(\hat{p}_2 = \frac{14}{40} = 0.35\)
\(\hat{p}_2 - \hat{p}_1 = 0.35 - 0.22 = 0.13\)
\(SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \approx 0.095\)
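\(z_{\frac{\alpha}{2}} = 1.96\)
The sampling distribution of \(\hat{p}_2−\hat{p}_1\) is approximately normal with standard error 0.095; centering it at the observed difference gives \(N(0.13,0.095)\).
\[ 0.13 \pm 1.96 \times 0.095 \approx (-0.06, 0.32) \]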
We are 95% confident that the true difference in the proportion of people who survived under the two treatments (group 2 minus group 1) is between -0.06 and 0.32. Note that zero is in this interval, so it is plausible that there is no difference in the survival rates.
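The interval above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and the difference is taken as group 2 minus group 1 to match the 0.13 used in these notes.

```python
import math
from statistics import NormalDist

# Counts from the example: 11 of 50 in group 1, 14 of 40 in group 2.
x1, n1 = 11, 50
x2, n2 = 14, 40

phat1 = x1 / n1
phat2 = x2 / n2
diff = phat2 - phat1  # group 2 minus group 1

# Unpooled standard error for the confidence interval.
se = math.sqrt(phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2)

# Critical value for a 95% interval (about 1.96).
z_star = NormalDist().inv_cdf(0.975)

lower = diff - z_star * se
upper = diff + z_star * se
print(f"difference = {diff:.2f}, SE = {se:.3f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```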
We’ll use the same data as above.
\[ H_0: p_1 = p_2 \\ H_a:p_1 \ne p_2 \]
\[ \alpha = 0.05 \]
Check conditions:
Independence Extended
Success-failure condition
There is one difference when doing the hypothesis test. We use a pooled statistic when calculating the standard error.
The book uses \(\widehat{p}_{pool} = \frac{\text{total successes from both groups}}{\text{total from both groups}}\)
If the conditions hold when checked with the pooled proportion, then
\(\hat{p}_1−\hat{p}_2 \sim N\left({p}_1−{p}_2,\sqrt{\hat{p}_{pool}(1−\hat{p}_{pool})\left(\frac{1}{n_1} +\frac{1}{n_2}\right)}\right)\)
\(\widehat{p}_{pool} = \frac{11+14}{50+40} = \frac{25}{90} \approx 0.278\)
\(SE = \sqrt{\widehat{p}_{pool}(1-\widehat{p}_{pool})(1/n_1+1/n_2)} \approx 0.095\)
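Note that the distribution is different from the one used for the confidence interval, because in a hypothesis test we assume the null hypothesis is true, so the center is 0.
Under \(H_0\): \(\hat{p}_2−\hat{p}_1 \sim N(0,0.095)\)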
If the distribution is centered around the null parameter (e.g., 0, because we assume there is no difference), how likely is it that we would get a difference at least as extreme as 0.13?
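One way to answer that question is to compute a z-score and a two-sided p-value under the null distribution. The sketch below uses the same counts as above and only the Python standard library; the cutoff of 10 for the pooled success-failure check is the commonly used rule and is an assumption here.

```python
import math
from statistics import NormalDist

# Counts from the example: 11 of 50 in group 1, 14 of 40 in group 2.
x1, n1 = 11, 50
x2, n2 = 14, 40

# Pooled proportion: total successes over total observations.
p_pool = (x1 + x2) / (n1 + n2)

# Success-failure check with the pooled proportion (assumed cutoff: 10).
expected_counts = [n1 * p_pool, n1 * (1 - p_pool), n2 * p_pool, n2 * (1 - p_pool)]
pooled_ok = all(count >= 10 for count in expected_counts)

# Pooled standard error, used only for the hypothesis test.
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# Observed difference and its z-score under H0 (true difference = 0).
diff = x2 / n2 - x1 / n1
z = (diff - 0) / se_pool

# Two-sided p-value: probability of a difference at least this extreme.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"pooled check ok: {pooled_ok}")
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

With these counts the two-sided p-value comes out to roughly 0.17.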
Our p-value is greater than 0.05, so we fail to reject the null hypothesis. The data do not provide convincing evidence of a difference in survival rates between the treatment and control groups. This is not the same as showing that there is no difference.
Note: It is possible we have made a Type 2 error, failing to reject the null hypothesis when it is actually false.