```
Call:
lm(formula = calories ~ fat, data = starbucks)

Residuals:
     Min       1Q   Median       3Q      Max
-132.599  -44.130    3.469   54.868  126.134

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  183.734     17.277   10.63  < 2e-16 ***
fat           11.267      1.117   10.09 1.32e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 69.1 on 75 degrees of freedom
Multiple R-squared:  0.5756,    Adjusted R-squared:  0.5699
F-statistic: 101.7 on 1 and 75 DF,  p-value: 1.32e-15
```
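As a sanity check on the summary above, each t value is just the estimate divided by its standard error. A quick sketch (in Python, plugging in the numbers from the starbucks output):

```python
# t value = Estimate / Std. Error, using the coefficients
# printed in the starbucks summary above
t_intercept = 183.734 / 17.277
t_fat = 11.267 / 1.117
print(round(t_intercept, 2), round(t_fat, 2))  # 10.63 10.09
```

Both values match the `t value` column R prints.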
Make a model

To decide whether the model is reasonable, we run a hypothesis test to see whether the slope is a number other than zero.
Recall: \(y = \beta_0 + \beta_1 x\)

\(\beta_1\) is the slope.

If the slope is zero, there is no linear relationship between y and x.
- Linearity: the data have to be linear
- Independence: the data have to be independent
- Nearly normal residuals
- Constant or equal variability
Independence problems: watch out for data collected over time, where consecutive observations may be related.
\[ y=\beta_0+\beta_1 x + e \]
\(\beta_0\) = intercept
\(\beta_1\) = slope
x is the predictor variable
y is the response variable
e is the error
If there is no relationship, the slope is 0.

If there is a relationship, the slope is not zero.
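To make the model \(y = \beta_0 + \beta_1 x + e\) concrete, here is a small simulation (a Python sketch; the intercept, slope, and noise level are made up for illustration) that generates data from the model and checks that the least-squares slope lands near the true \(\beta_1\):

```python
import random

random.seed(42)  # reproducible

beta0, beta1 = 180.0, 11.0   # hypothetical "true" intercept and slope
x = [float(i) for i in range(1, 51)]
# e ~ Normal(0, 5): the error term scattering points around the line
y = [beta0 + beta1 * xi + random.gauss(0, 5) for xi in x]

# least-squares estimate of the slope
xbar = sum(x) / len(x)
ybar = sum(y) / len(y)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
print(abs(b1 - beta1) < 0.5)  # estimate close to the true slope -> True
```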
We test:
\[ H_0: \beta_1 = 0 \\ H_1: \beta_1 \ne 0 \]
(We could also do tests on \(r\), the correlation coefficient, or on \(\beta_0\).)
The slopes computed from an infinite number of samples would follow a Student's t distribution, so we do a t test with this test statistic:

\(T = \dfrac{\hat{\beta}_1 - 0}{\text{SE}(\hat{\beta}_1)}\)

with \(df = n - 2\).

We'll let R calculate \(\text{SE}(\hat{\beta}_1)\) and \(\hat{\beta}_1\).
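Although R does this for us, the computation is short enough to sketch by hand. Below is a pure-Python version on a tiny made-up data set, using the standard least-squares formulas for \(\hat{\beta}_1\), its standard error, and the t statistic:

```python
import math

# tiny illustrative data set (made up)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# least-squares slope and intercept
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# residual standard error on n - 2 degrees of freedom
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(r ** 2 for r in resid) / (n - 2))

# SE of the slope, then the t statistic for H0: beta1 = 0
se_b1 = s / math.sqrt(sxx)
t = (b1 - 0) / se_b1
df = n - 2

print(round(b1, 2), df)  # 1.96 3
print(round(t, 1))       # t statistic for the slope
```

This mirrors the `Estimate`, `Std. Error`, and `t value` columns of R's coefficient table; R additionally converts `t` into the `Pr(>|t|)` p-value.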
\[ H_0: \beta_1 = 0 \\ H_1: \beta_1 \ne 0 \]
```
Call:
lm(formula = boys ~ girls, data = arbuthnot)

Residuals:
    Min      1Q  Median      3Q     Max
-363.30 -110.25   -6.42   97.44  356.63

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 184.77244   59.76823   3.091  0.00274 **
girls         1.03391    0.01038  99.578  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 148.8 on 80 degrees of freedom
Multiple R-squared:  0.992,     Adjusted R-squared:  0.9919
F-statistic: 9916 on 1 and 80 DF,  p-value: < 2.2e-16
```
\(r\) - correlation coefficient
\(r^2\) - coefficient of determination
\(r^2\) is the proportion of the variability in the response variable that is explained by the explanatory variable.
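\(r^2\) can be computed two equivalent ways: square the correlation coefficient, or take \(1 - SS_{res}/SS_{tot}\). A quick check of the equivalence (Python, made-up data):

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)          # correlation coefficient

# fit the line, then compare unexplained vs total variability
b1 = sxy / sxx
b0 = ybar - b1 * xbar
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r2 = 1 - ss_res / syy                   # coefficient of determination

print(abs(r ** 2 - r2) < 1e-12)  # the two definitions agree -> True
```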