class: language-r layout: true --- # Lab 5 — November 1 ```r library(tidyverse) ``` <PRE class="fansi fansi-message"><CODE>## -- <span style='font-weight: bold;'>Attaching packages</span> --------------------------------------- tidyverse 1.3.1 -- </CODE></PRE><PRE class="fansi fansi-message"><CODE>## <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>ggplot2</span> 3.3.6 <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>purrr </span> 0.3.4 ## <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>tibble </span> 3.1.6 <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>dplyr </span> 1.0.9 ## <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>tidyr </span> 1.2.0 <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>stringr</span> 1.4.0 ## <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>readr </span> 2.1.2 <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>forcats</span> 0.5.1 </CODE></PRE><PRE class="fansi fansi-message"><CODE>## -- <span style='font-weight: bold;'>Conflicts</span> ------------------------------------------ tidyverse_conflicts() -- ## <span style='color: #BB0000;'>x</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>filter()</span> masks <span style='color: #0000BB;'>stats</span>::filter() ## <span style='color: #BB0000;'>x</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>lag()</span> masks <span style='color: #0000BB;'>stats</span>::lag() </CODE></PRE> ```r library(broom) ``` --- # Reusing the data from last time ```r bears <- read_csv("./data/bears.csv") %>% rename(active_hunters = `Active Hunters`) %>% filter(WMU != "Total") %>% filter(WMU %in% c("82A","83","84","73","74","75","76")) bears ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 63 x 4</span> ## WMU Year active_hunters Harvest ## <span style='color: #555555; font-style: italic;'><chr></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'> 1</span> 73 <span style='text-decoration: underline;'>2</span>012 58 4 ## <span style='color: #555555;'> 2</span> 73 <span style='text-decoration: underline;'>2</span>013 44 0 ## <span style='color: #555555;'> 3</span> 73 <span style='text-decoration: underline;'>2</span>014 52 2 ## <span style='color: #555555;'> 4</span> 73 <span style='text-decoration: underline;'>2</span>015 79 7 ## <span style='color: #555555;'> 5</span> 73 <span style='text-decoration: underline;'>2</span>016 111 15 ## <span style='color: #555555;'> 6</span> 73 <span style='text-decoration: underline;'>2</span>017 109 7 ## <span style='color: #555555;'> 7</span> 73 <span style='text-decoration: underline;'>2</span>018 99 9 ## <span style='color: #555555;'> 8</span> 73 <span style='text-decoration: underline;'>2</span>019 144 21 ## <span style='color: #555555;'> 9</span> 73 <span style='text-decoration: underline;'>2</span>020 148 13 ## <span style='color: #555555;'>10</span> 74 <span style='text-decoration: underline;'>2</span>012 74 9 ## <span style='color: #555555;'># ... with 53 more rows</span> </CODE></PRE> --- # Reusing the same linear model from last time ```r bears_lm <- lm(Harvest ~ active_hunters, data=bears) summary(bears_lm) ``` ``` ## ## Call: ## lm(formula = Harvest ~ active_hunters, data = bears) ## ## Residuals: ## Min 1Q Median 3Q Max ## -20.2036 -2.8621 0.0816 3.3201 25.6144 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -5.048595 1.560276 -3.236 0.00196 ** ## active_hunters 0.188743 0.009188 20.542 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 7.749 on 61 degrees of freedom ## Multiple R-squared: 0.8737, Adjusted R-squared: 0.8716 ## F-statistic: 422 on 1 and 61 DF, p-value: < 2.2e-16 ``` --- # Creating a 95% confidence interval for the mean response We are interested in a 95% confidence interval for the mean harvest when the number of active hunters equals 150. Using the functions built into R, we can easily obtain this interval. ```r predict( bears_lm, newdata = data.frame(active_hunters=150), interval = "confidence" ) ``` ``` ## fit lwr upr ## 1 23.26288 21.28423 25.24152 ``` --- # Creating the interval manually The formula for a 95% confidence interval for the mean response is: `$$\widehat{Y} \,\pm\, t_{1-\frac{\alpha}{2},\,\text{df}}\;\cdot\;\widehat{\sigma}\;\cdot\;\sqrt{\frac{1}{n} \,+\, \frac{(X_{*} - \overline{X})^{2}}{\sum_{i}(X_{i} - \overline{X})^{2}}}$$` Let's collect all these values and store them in individual columns in a tibble. --- # More on broom - We can collect the required values by starting from the output of `broom::augment()` and `broom::glance()` - `broom::augment()` augments a model by returning the data used to fit the model and diagnostic information ```r augment(bears_lm) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 63 x 8</span> ## Harvest active_hunters .fitted .resid .hat .sigma .cooksd .std.resid ## <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'> 1</span> 4 58 5.90 -<span style='color: #BB0000;'>1.90</span> 0.023<span style='text-decoration: underline;'>7</span> 7.81 0.000<span style='text-decoration: underline;'>745</span> -<span style='color: #BB0000;'>0.248</span> ## <span style='color: #555555;'> 2</span> 0 44 3.26 -<span style='color: #BB0000;'>3.26</span> 0.026<span style='text-decoration: underline;'>9</span> 7.80 0.002<span style='text-decoration: underline;'>51</span> -<span style='color: #BB0000;'>0.426</span> ## <span style='color: #555555;'> 3</span> 2 52 4.77 -<span style='color: #BB0000;'>2.77</span> 0.025<span style='text-decoration: underline;'>0</span> 7.81 0.001<span style='text-decoration: underline;'>67</span> -<span style='color: #BB0000;'>0.361</span> ## <span style='color: #555555;'> 4</span> 7 79 9.86 -<span style='color: #BB0000;'>2.86</span> 0.019<span style='text-decoration: underline;'>9</span> 7.80 0.001<span style='text-decoration: underline;'>41</span> -<span style='color: #BB0000;'>0.373</span> ## <span style='color: #555555;'> 5</span> 15 111 15.9 -<span style='color: #BB0000;'>0.902</span> 0.016<span style='text-decoration: underline;'>5</span> 7.81 0.000<span style='text-decoration: underline;'>116</span> -<span style='color: #BB0000;'>0.117</span> ## <span style='color: #555555;'> 6</span> 7 109 15.5 -<span style='color: #BB0000;'>8.52</span> 0.016<span style='text-decoration: underline;'>6</span> 7.73 0.010<span style='text-decoration: underline;'>4</span> -<span style='color: #BB0000;'>1.11</span> ## <span style='color: #555555;'> 7</span> 9 99 13.6 -<span style='color: #BB0000;'>4.64</span> 0.017<span style='text-decoration: underline;'>4</span> 7.79 0.003<span style='text-decoration: underline;'>24</span> -<span style='color: #BB0000;'>0.604</span> ## <span style='color: #555555;'> 8</span> 21 144 22.1 -<span style='color: #BB0000;'>1.13</span> 0.016<span style='text-decoration: underline;'>1</span> 7.81 0.000<span style='text-decoration: underline;'>177</span> -<span style='color: #BB0000;'>0.147</span> ## <span style='color: #555555;'> 9</span> 13 148 22.9 -<span style='color: #BB0000;'>9.89</span> 0.016<span style='text-decoration: underline;'>2</span> 7.71 0.013<span style='text-decoration: underline;'>6</span> -<span style='color: #BB0000;'>1.29</span> ## <span style='color: #555555;'>10</span> 9 74 8.92 0.081<span style='text-decoration: underline;'>6</span> 0.020<span style='text-decoration: underline;'>7</span> 7.81 0.000<span style='text-decoration: underline;'>001</span>20 0.010<span style='text-decoration: underline;'>6</span> ## <span style='color: #555555;'># ... with 53 more rows</span> </CODE></PRE> --- # More on broom - `broom::glance()` returns general information on the model and goodness of fit measures ```r glance(bears_lm) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 1 x 12</span> ## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs ## <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><int></span> <span style='color: #555555; font-style: italic;'><int></span> ## <span style='color: #555555;'>1</span> 0.874 0.872 7.75 422. 4.26<span style='color: #555555;'>e</span><span style='color: #BB0000;'>-29</span> 1 -<span style='color: #BB0000;'>217.</span> 441. 447. <span style='text-decoration: underline;'>3</span>663. 61 63 </CODE></PRE> --- # Collecting the values ```r augment(bears_lm) %>% summarise( xbar = mean(active_hunters), Sxx = sum((active_hunters - xbar)^2) ) %>% bind_cols( glance(bears_lm) %>% select(sigma, df.residual, nobs) ) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 1 x 5</span> ## xbar Sxx sigma df.residual nobs ## <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><int></span> <span style='color: #555555; font-style: italic;'><int></span> ## <span style='color: #555555;'>1</span> 132. <span style='text-decoration: underline;'>711</span>260. 7.75 61 63 </CODE></PRE> --- # Collecting the values ```r augment(bears_lm) %>% summarise( xbar = mean(active_hunters), Sxx = sum((active_hunters - xbar)^2) ) %>% bind_cols( glance(bears_lm) %>% select(sigma, df.residual, nobs) ) %>% mutate( alpha = 0.05, beta0 = coef(bears_lm)[1], beta1 = coef(bears_lm)[2], xstar = 150, yhat = beta0 + beta1 * xstar, tval = qt(1 - alpha/2, df=df.residual) ) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 1 x 11</span> ## xbar Sxx sigma df.residual nobs alpha beta0 beta1 xstar yhat tval ## <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><int></span> <span style='color: #555555; font-style: italic;'><int></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'>1</span> 132. <span style='text-decoration: underline;'>711</span>260. 7.75 61 63 0.05 -<span style='color: #BB0000;'>5.05</span> 0.189 150 23.3 2.00 </CODE></PRE> --- # Collecting the values Finally, create a variable to store these values. ```r values <- augment(bears_lm) %>% summarise( xbar = mean(active_hunters), Sxx = sum((active_hunters - xbar)^2) ) %>% bind_cols( glance(bears_lm) %>% select(sigma, df.residual, nobs) ) %>% mutate( alpha = 0.05, beta0 = coef(bears_lm)[1], beta1 = coef(bears_lm)[2], xstar = 150, yhat = beta0 + beta1 * xstar, tval = qt(1 - alpha/2, df=df.residual) ) ``` --- # Calculating the interval ```r values %>% summarise( fit = beta0 + beta1 * xstar, lower = fit - tval * sigma * sqrt((1 / nobs) + (xstar - xbar)^2 / Sxx), upper = fit + tval * sigma * sqrt((1 / nobs) + (xstar - xbar)^2 / Sxx) ) %>% as.matrix() # only to see more digits ``` ``` ## fit lower upper ## [1,] 23.26288 21.28423 25.24152 ``` Compare them to our results from `predict()`: ```r predict( bears_lm, newdata = data.frame(active_hunters=150), interval = "confidence" ) ``` ``` ## fit lwr upr ## 1 23.26288 21.28423 25.24152 ``` --- # Creating a 90% prediction interval for response value of a new observation We are interested in a 90% prediction interval for the response value of a new observation, where the number of active hunters equals 55. Using the functions built into R, we can easily obtain this interval. ```r predict( bears_lm, newdata = data.frame(active_hunters=55), interval = "prediction", level = 0.90 ) ``` ``` ## fit lwr upr ## 1 5.332277 -7.766788 18.43134 ``` --- # Creating the interval manually The formula for a 90% confidence interval for the response value of a new observation is: `$$\widehat{Y} \,\pm\, t_{1-\frac{\alpha}{2},\,\text{df}}\;\cdot\;\widehat{\sigma}\;\cdot\;\sqrt{1 + \frac{1}{n} \,+\, \frac{(X_{*} - \overline{X})^{2}}{\sum_{i}(X_{i} - \overline{X})^{2}}}$$` We can reuse the values that we had previously collected and update: - `alpha` - `xstar` - `tval` --- # Updating our values ```r values2 <- values %>% mutate( alpha = 0.10, xstar = 55, tval = qt(1 - alpha/2, df=df.residual) ) values2 ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 1 x 11</span> ## xbar Sxx sigma df.residual nobs alpha beta0 beta1 xstar yhat tval ## <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><int></span> <span style='color: #555555; font-style: italic;'><int></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'>1</span> 132. <span style='text-decoration: underline;'>711</span>260. 7.75 61 63 0.1 -<span style='color: #BB0000;'>5.05</span> 0.189 55 23.3 1.67 </CODE></PRE> --- # Calculating the interval The formula differs only by a `\(1 +\)` term under the square root. ```r values2 %>% summarise( fit = beta0 + beta1 * xstar, lower = fit - tval * sigma * sqrt(1 + (1 / nobs) + (xstar - xbar)^2 / Sxx), upper = fit + tval * sigma * sqrt(1 + (1 / nobs) + (xstar - xbar)^2 / Sxx) ) %>% as.matrix() # only to see more digits ``` ``` ## fit lower upper ## [1,] 5.332277 -7.766788 18.43134 ``` Compare them to our results from `predict()`: ```r predict( bears_lm, newdata = data.frame(active_hunters=55), interval = "prediction", level = 0.90 ) ``` ``` ## fit lwr upr ## 1 5.332277 -7.766788 18.43134 ``` --- # Extracting the design matrix We can extract the design matrix, `\(X\)`, from our linear model using the `model.matrix()` function. ```r X <- model.matrix(bears_lm) head(X) ``` ``` ## (Intercept) active_hunters ## 1 1 58 ## 2 1 44 ## 3 1 52 ## 4 1 79 ## 5 1 111 ## 6 1 109 ``` --- # Obtaining `\(X^{T}X\)` ```r t(X) %*% X ``` ``` ## (Intercept) active_hunters ## (Intercept) 63 8345 ## active_hunters 8345 1816641 ``` --- # Obtaining the hat matrix The hat matrix is defined as `$$H \,=\, X\,(X^{T}\,X)^{-1}\,X^{T}$$` ```r H <- X %*% solve(t(X) %*% X) %*% t(X) H[1:6, 1:6] ``` ``` ## 1 2 3 4 5 6 ## 1 0.02366811 0.02513375 0.02429624 0.02146967 0.01811965 0.01832903 ## 2 0.02513375 0.02687494 0.02587997 0.02252195 0.01854206 0.01879081 ## 3 0.02429624 0.02587997 0.02497498 0.02192064 0.01830069 0.01852693 ## 4 0.02146967 0.02252195 0.02192064 0.01989125 0.01748603 0.01763636 ## 5 0.01811965 0.01854206 0.01830069 0.01748603 0.01652052 0.01658087 ## 6 0.01832903 0.01879081 0.01852693 0.01763636 0.01658087 0.01664684 ``` --- # Obtaining the variance-covariance matrix of parameter estimates The variance-covariance matrix of the parameter estimates is obtained as `$$\textbf{Var}(\widehat{\beta}) \,=\, \widehat{\sigma}^{2}(X^{T}\,X)^{-1}$$` We already have the value of `\(\widehat{\sigma}\)` from our previous collection of values, so we just need to remember to square it. ```r var_beta <- values$sigma^2 * solve(t(X) %*% X) var_beta ``` ``` ## (Intercept) active_hunters ## (Intercept) 2.43446196 -1.118305e-02 ## active_hunters -0.01118305 8.442565e-05 ``` - The diagonal elements are the variances - The off-diagonal elements are the covariances --- # Standard errors Since the variances were along the diagonal, if we square root them we will get the standard errors. We can extract the diagonal elements of a matrix using `diag()`. ```r se_beta <- sqrt(diag(var_beta)) se_beta ``` ``` ## (Intercept) active_hunters ## 1.560276244 0.009188343 ``` Compare with the standard errors computed from `summary()` or `broom::tidy()`: ```r tidy(bears_lm) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 2 x 5</span> ## term estimate std.error statistic p.value ## <span style='color: #555555; font-style: italic;'><chr></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'>1</span> (Intercept) -<span style='color: #BB0000;'>5.05</span> 1.56 -<span style='color: #BB0000;'>3.24</span> 1.96<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 3</span> ## <span style='color: #555555;'>2</span> active_hunters 0.189 0.009<span style='text-decoration: underline;'>19</span> 20.5 4.26<span style='color: #555555;'>e</span><span style='color: #BB0000;'>-29</span> </CODE></PRE> They're the same (as they should be)!