prop.test()
There is also a prop.test() function that tests a hypothesis about proportions directly from the data using a mathematical formula (as opposed to bootstrapping).
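As a minimal sketch of calling prop.test() for two groups, the block below uses made-up counts (45 of 100 and 30 of 100); they are illustrative only and do not come from these notes. Setting correct = FALSE is a choice made here so that the result matches the textbook two-proportion z-test without a continuity correction.

```r
# Hypothetical counts for illustration: successes and trials in each group
successes <- c(45, 30)
trials    <- c(100, 100)

# Formula-based two-sided test of H0: p1 = p2 (no bootstrapping involved).
# correct = FALSE turns off the Yates continuity correction.
result <- prop.test(x = successes, n = trials,
                    alternative = "two.sided", correct = FALSE)

result$estimate   # sample proportion in each group
result$statistic  # X-squared statistic
result$p.value    # p-value from the normal/chi-squared approximation
result$conf.int   # 95% confidence interval for p1 - p2
```

With these counts the reported X-squared is 4.8, which is the square of the pooled two-proportion z statistic (about 2.19).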
Another hypothesis test.
Conditions
Distributions
New Standard Error
New df
Independence Extended
Normality (no extreme outliers)
with no computer
t-distribution with the smaller of \(n_1-1\) and \(n_2-1\) degrees of freedom.
with R
t-distribution; let R find the degrees of freedom with t.test().
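A sketch comparing the two degrees-of-freedom choices. The notes don't name the car dataset, so R's built-in mtcars (mpg split by am, where 0 = automatic and 1 = manual) is used here as an assumed stand-in. The Welch-Satterthwaite formula below is what t.test() uses by default; the "no computer" rule is more conservative (fewer df, so a larger \(t^*\)).

```r
# Assumed example data: mtcars mpg by transmission type
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
n1 <- length(auto)
n2 <- length(manual)

# "No computer" rule: the smaller of n1 - 1 and n2 - 1
df_conservative <- min(n1 - 1, n2 - 1)

# Welch-Satterthwaite approximation, as used by t.test() by default
v1 <- var(auto) / n1
v2 <- var(manual) / n2
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

df_conservative
df_welch
t.test(mpg ~ am, data = mtcars)$parameter  # should equal df_welch
```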
\[ SE = \sqrt{\frac{\sigma^2_1}{n_1}+\frac{\sigma^2_2}{n_2}}\\ \text{or}\\ SE = \sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}} \] We rarely know \(\sigma_1\) and \(\sigma_2\), so the second standard error, built from the sample standard deviations \(s_1\) and \(s_2\), is the one usually used.
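The second formula can be computed by hand and checked against R. This sketch again assumes mtcars mpg by transmission as stand-in data; the stderr component of the t.test() result is the same quantity.

```r
# Assumed example data: mtcars mpg by transmission (0 = automatic, 1 = manual)
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# sigma is unknown, so plug in the sample standard deviations s1 and s2
se <- sqrt(sd(auto)^2 / length(auto) + sd(manual)^2 / length(manual))
se
```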
\[ \text{Test Statistic} = \frac{\bar{x}_1 - \bar{x}_2 - 0}{SE} \]
(It has the same form as the test statistic for a difference of proportions, but with means.)
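A sketch of the test statistic computed by hand, assuming mtcars mpg by transmission as the example. The null value of \(\mu_1 - \mu_2\) is 0, and group 1 is taken to be automatic (am = 0) because that is the first level t.test() uses.

```r
# Assumed example data: mtcars mpg by transmission
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
se <- sqrt(var(auto) / length(auto) + var(manual) / length(manual))

# (observed difference - null difference) / SE
t_stat <- (mean(auto) - mean(manual) - 0) / se
t_stat

# R's Welch t statistic for the same comparison
t.test(mpg ~ am, data = mtcars)$statistic
```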
Check independence extended.
No extreme outliers.
\(\bar{x}_1 - \bar{x}_2 \pm t^* SE\)
We are 99% confident that the true difference in mean mpg is between 3.6 and 10.8.
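A sketch of \(\bar{x}_1 - \bar{x}_2 \pm t^* SE\) at 99% confidence. The dataset behind the 3.6 to 10.8 interval isn't named in these notes, so this uses mtcars as an assumed stand-in and its interval need not match. It computes \(t^*\) both ways: with the conservative df and with the Welch df that R uses.

```r
# Assumed example data: mtcars mpg by transmission
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
n1 <- length(auto); n2 <- length(manual)
v1 <- var(auto) / n1; v2 <- var(manual) / n2
se <- sqrt(v1 + v2)

conf <- 0.99
obs_diff <- mean(manual) - mean(auto)  # manual minus automatic, so it is positive

# t* from the conservative df and from the Welch df
t_cons  <- qt(1 - (1 - conf) / 2, df = min(n1 - 1, n2 - 1))
df_w    <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
t_welch <- qt(1 - (1 - conf) / 2, df = df_w)

ci_cons  <- obs_diff + c(-1, 1) * t_cons * se
ci_welch <- obs_diff + c(-1, 1) * t_welch * se
ci_cons   # wider interval
ci_welch  # matches t.test(), up to the sign flip from group order
```

t.test(mpg ~ am, data = mtcars, conf.level = 0.99) reports the interval for automatic minus manual, so its endpoints are the negatives of ci_welch in reverse order.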
Conditions are checked and values are input.
Notation:
\[ H_o: \mu_1 = \mu_2\\ H_a: \mu_1 \ne \mu_2\\ \alpha = 0.05 \]
Test statistic:
\[ T = \frac{\bar{x}_1-\bar{x}_2 - 0 }{SE} \]
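In R, t.test() computes this statistic, the Welch df, and the p-value in one call. This sketch assumes mtcars mpg by transmission as the fuel-economy data and compares the p-value with \(\alpha = 0.05\).

```r
alpha <- 0.05

# Assumed data: mtcars, mpg by transmission (am: 0 = automatic, 1 = manual)
result <- t.test(mpg ~ am, data = mtcars, alternative = "two.sided")

result$p.value
if (result$p.value < alpha) {
  message("Reject H0: mean mpg appears to differ between automatic and manual cars")
} else {
  message("Fail to reject H0")
}
```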
With a p-value close to zero, we have strong evidence to reject the null hypothesis in favor of the alternative. The average fuel economy of automatic and manual cars appears to differ.
It is possible we have made a Type I error (rejecting a null hypothesis that is actually true).
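A small simulation can show what a Type I error rate means. In this sketch both groups are drawn from the same normal population (the mean 20, sd 5, and sample sizes are arbitrary choices), so \(H_0\) is true and every rejection is a Type I error; over many repetitions the rejection rate should land near \(\alpha\).

```r
set.seed(1)
alpha  <- 0.05
n_sims <- 2000

# H0 is true: both samples come from the same population
reject <- replicate(n_sims, {
  g1 <- rnorm(15, mean = 20, sd = 5)
  g2 <- rnorm(12, mean = 20, sd = 5)
  t.test(g1, g2)$p.value < alpha  # Welch test, TRUE means we (wrongly) reject
})

mean(reject)  # estimated Type I error rate, close to alpha
```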
Is there a difference in the absenteeism of students from New South Wales based on reported gender?
absenteeism
Check conditions and make a boxplot.
Write the hypotheses.
Use t.test(y ~ x, alternative = "two.sided", data = )
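A sketch of these steps in R. It assumes MASS::quine contains the same 146 New South Wales students as absenteeism (columns Days, Sex, Eth); treat that as an assumption. With the absenteeism data itself, the same calls would use the lowercase names days and sex.

```r
# Assumption: MASS::quine holds the same absenteeism data as `absenteeism`
data(quine, package = "MASS")

# 1. Check conditions: side-by-side boxplots of days absent by sex
boxplot(Days ~ Sex, data = quine, ylab = "Days absent")

# 2. Hypotheses: H0: mu_F = mu_M  vs  Ha: mu_F != mu_M, alpha = 0.05

# 3. Welch two-sample t-test (the formula is response ~ group)
res <- t.test(Days ~ Sex, data = quine, alternative = "two.sided")
res
```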
We should first read about the absenteeism data: ?absenteeism
Independence between and within groups.
Check for outliers:
Those outliers look quite extreme, so we should probably not run this test with a mathematical model such as t.test().
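To list the points a boxplot flags rather than eyeballing them, boxplot.stats() returns them in its out component. This sketch assumes MASS::quine as a stand-in for absenteeism (an assumption, with columns Days and Sex).

```r
# Assumption: MASS::quine holds the same absenteeism data
data(quine, package = "MASS")

# Values beyond 1.5 * IQR from the box edges, the rule boxplot() uses
outliers_by_sex <- tapply(quine$Days, quine$Sex,
                          function(d) boxplot.stats(d)$out)
outliers_by_sex
```

boxplot.stats() uses hinges from fivenum(), which can differ slightly from the quartiles that quantile() reports, so borderline points may be classified differently by other rules.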
\[ H_o: \mu_1 = \mu_2\\ H_a: \mu_1 \ne \mu_2\\ \alpha = 0.05 \]
Welch Two Sample t-test
data: absenteeism$days by absenteeism$sex
t = -1.0058, df = 136.35, p-value = 0.3163
alternative hypothesis: true difference in means between group F and group M is not equal to 0
95 percent confidence interval:
-8.096135 2.637044
sample estimates:
mean in group F mean in group M
15.22500 17.95455
Is there a difference in the absenteeism of students from New South Wales based on ethnicity?
Independence
Outliers still seem to be a problem.
\[ H_o: \mu_1 = \mu_2\\ H_a: \mu_1 \ne \mu_2\\ \alpha = 0.05 \]
Welch Two Sample t-test
data: absenteeism$days by absenteeism$eth
t = 3.4358, df = 126.85, p-value = 0.0007991
alternative hypothesis: true difference in means between group A and group N is not equal to 0
95 percent confidence interval:
3.837747 14.262384
sample estimates:
mean in group A mean in group N
21.23188 12.18182
The observed difference in mean days absent is \(21.2 - 12.2 \approx 9\).
# A tibble: 1 × 1
p_value
<dbl>
1 0.0004
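The p-value above appears to come from a simulation-based test; the tibble format suggests the infer package, though the code isn't shown. A base-R permutation test is one way to get a comparable result. This sketch assumes MASS::quine as a stand-in for absenteeism (columns Days and Eth); since it depends on random shuffles and the number of repetitions, its p-value will differ somewhat from 0.0004.

```r
# Assumption: MASS::quine holds the same absenteeism data
data(quine, package = "MASS")
set.seed(2024)

# Observed difference in mean days absent: group A minus group N
obs_diff <- mean(quine$Days[quine$Eth == "A"]) - mean(quine$Days[quine$Eth == "N"])

# Under H0, Eth labels are exchangeable, so shuffle them and recompute the difference
n_reps <- 5000
perm_diffs <- replicate(n_reps, {
  shuffled <- sample(quine$Eth)
  mean(quine$Days[shuffled == "A"]) - mean(quine$Days[shuffled == "N"])
})

# Two-sided p-value: share of shuffled differences at least as extreme as observed
p_value <- mean(abs(perm_diffs) >= abs(obs_diff))
obs_diff
p_value
```

Because the null distribution is built by shuffling labels rather than from a t-distribution, this approach doesn't rely on the normality condition that the outliers call into question.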