Linear Regression part 2

Schwab

Cats

Cat’s hearts

Cats’ hearts

You stole Meow Heart

A big heart may cause problems

Finding the size of a cat’s heart is hard.

Estimate it with a model.

Data from the 40s

This data cites this source, which you may be inclined to look over.

Code
library(tidyverse)

#install.packages("MASS")
#library(MASS)
cats <- read_csv("cats.csv")

Look at data

?cats
No documentation for 'cats' in specified packages and libraries:
you could try '??cats'
tail(cats,6)
# A tibble: 6 × 3
  Sex     Bwt   Hwt
  <chr> <dbl> <dbl>
1 M       3.6  15  
2 M       3.7  11  
3 M       3.8  14.8
4 M       3.8  16.8
5 M       3.9  14.4
6 M       3.9  20.5

Which variable is which?

Which should be the predictor and response variables?

Plot

Describe Trend

  • Shape

  • Direction

  • Strength

Add a regression line

and some labels.

ggplot(data = cats, aes(x = Bwt, y = Hwt)) +
  geom_point(aes(color = Sex)) +
  # Least-squares line, drawn without the confidence band
  geom_smooth(method = "lm", se = FALSE) +
  xlab("Body Weight in kg") +
  ylab("Heart Weight in Grams")

Find correlation coefficient

summarize(.data= cats , r = cor(Bwt, Hwt, use="complete.obs"))
# A tibble: 1 × 1
      r
  <dbl>
1 0.804

Find the coefficients

#install.packages("broom")
library(broom)
cat_model <- lm(Hwt ~ Bwt, data = cats)
tidy(cat_model)
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)   -0.357     0.692    -0.515 6.07e- 1
2 Bwt            4.03      0.250    16.1   6.97e-34

The Equation

Intercept is the y-intercept of our linear model (\(\beta_0 = -0.36\)).

The number next to Bwt is the slope (\(\beta_1 = 4.03\)).

The equation is: \[ \hat{y} = \beta_0 + \beta_1 x \\ \text{or} \\ \widehat{Hwt} = -0.36 + 4.03 \times Bwt \]

Summary

Stolen from Prof Kurtz Garcia at Smith College

Interpretation

Slope: For every kilogram increase in body weight we expect a 4.03 gram increase in the heart weight.

Intercept: If we found a cat that weighed nothing, its heart would weigh -0.36 grams?

More Complete outputs

# Alternative to tidy()
summary(cat_model)

Call:
lm(formula = Hwt ~ Bwt, data = cats)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5694 -0.9634 -0.0921  1.0426  5.1238 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.3567     0.6923  -0.515    0.607    
Bwt           4.0341     0.2503  16.119   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.452 on 142 degrees of freedom
Multiple R-squared:  0.6466,    Adjusted R-squared:  0.6441 
F-statistic: 259.8 on 1 and 142 DF,  p-value: < 2.2e-16
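
For a simple linear regression with a single predictor, the Multiple R-squared in this output is the square of the correlation coefficient. The sketch below checks that, assuming the rounded value r = 0.804 printed earlier; it is hard-coded so the snippet runs without cats.csv, which means the result only matches the reported 0.6466 up to rounding.

```r
# Correlation coefficient as printed by cor(Bwt, Hwt), rounded to 3 decimals
r <- 0.804

# With one predictor, R-squared equals r squared
r_squared <- r^2
round(r_squared, 3)
```

About 65% of the variation in heart weight is accounted for by the linear relationship with body weight.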

You try:

  • If a cat weighs 3.5 kg, how large a heart would we expect?

  • Can we estimate the heart size for a 5 kg cat?

    • Caution against extrapolation.

Solution:

The equation for the line is

\(\widehat{Hwt} = -0.36 + 4.03 \times Bwt\)

Plugging in 3.5, we get

\(\widehat{Hwt} = -0.36 + 4.03 \times 3.5 \\ \widehat{Hwt} = 13.745 \approx 13.7\)
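
The same plug-in step can be sketched in R. The coefficients below are the rounded values from the tidy() output, and predict_hwt() is a hypothetical helper written for this example, not a function from any package.

```r
# Rounded coefficients from the fitted line
b0 <- -0.36  # intercept, in grams
b1 <- 4.03   # slope, in grams per kilogram of body weight

# Hypothetical helper: predicted heart weight (g) for a body weight (kg)
predict_hwt <- function(bwt) {
  b0 + b1 * bwt
}

predict_hwt(3.5)
```

Because the coefficients are rounded, this gives 13.745, slightly below the 13.76 that predict() returns from the unrounded model.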

Residuals

A residual is the difference between an observed value and the value the model predicts for it.

\[e = y - \hat{y}\]

You try: calculate a residual

There is a cat in this data that has a heart weight of 11.7 grams and a body weight of 3.5 kilograms. Calculate the residual for this cat.

Solution:

The residual is calculated as \[e = y - \hat{y}\]

We previously found the value of \(\hat{y} = 13.7\) and we are told \(y=11.7\). So we subtract the predicted value from the observed value to get

\[e = y - \hat{y}\\ e = 11.7-13.7\\ e = -2\]
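
The subtraction can be written out in R as a quick check. The variable names here are chosen for illustration, and the prediction is the rounded 13.7 used in the hand calculation.

```r
observed  <- 11.7  # heart weight in grams, given in the problem
predicted <- 13.7  # rounded model prediction for a 3.5 kg cat

# Residual = observed minus predicted
residual <- observed - predicted
residual
```

The result is about -2. A negative residual means the point sits below the regression line, so the model overestimates this cat's heart weight.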

Solution with R functions

predict(object =  cat_model, 
        newdata = data.frame(Bwt = 3.5)
        )
       1 
13.76256 
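
R can also compute every residual at once. The sketch below is self-contained, so it refits a model on only the six rows printed by tail(cats, 6) rather than the full data; its coefficients and residuals will therefore differ from cat_model. The names cats_tail and tail_model are made up for this example.

```r
# Only the six rows shown by tail(cats, 6), not the full data set
cats_tail <- data.frame(
  Sex = rep("M", 6),
  Bwt = c(3.6, 3.7, 3.8, 3.8, 3.9, 3.9),
  Hwt = c(15, 11, 14.8, 16.8, 14.4, 20.5)
)

tail_model <- lm(Hwt ~ Bwt, data = cats_tail)

# Predicted heart weight for each row (no newdata means fitted values)
fitted_hwt <- predict(tail_model)

# Residuals by hand: observed minus predicted
manual_resid <- cats_tail$Hwt - fitted_hwt

# residuals() returns the same values directly
all.equal(unname(manual_resid), unname(residuals(tail_model)))
```

With the full data loaded, the same calls on cat_model would give the residual for every cat, including the 3.5 kg cat from the exercise.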

Citations

Street cats from https://petlifesa.com/wp-content/uploads/2019/08/SA0057-Petlifesa-health-conditions-diseases-heart-disease-facts-about-your-cats-heart-Header-FA.jpg

You stole Meow Heart https://drbillspetnutrition.com/feline-heart-conditions/