class: language-r layout: true --- # Lab 4 — October 18 ```r library(tidyverse) ``` <PRE class="fansi fansi-message"><CODE>## -- <span style='font-weight: bold;'>Attaching packages</span> --------------------------------------- tidyverse 1.3.1 -- </CODE></PRE><PRE class="fansi fansi-message"><CODE>## <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>ggplot2</span> 3.3.6 <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>purrr </span> 0.3.4 ## <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>tibble </span> 3.1.6 <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>dplyr </span> 1.0.9 ## <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>tidyr </span> 1.2.0 <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>stringr</span> 1.4.0 ## <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>readr </span> 2.1.2 <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>forcats</span> 0.5.1 </CODE></PRE><PRE class="fansi fansi-message"><CODE>## -- <span style='font-weight: bold;'>Conflicts</span> ------------------------------------------ tidyverse_conflicts() -- ## <span style='color: #BB0000;'>x</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>filter()</span> masks <span style='color: #0000BB;'>stats</span>::filter() ## <span style='color: #BB0000;'>x</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>lag()</span> masks <span style='color: #0000BB;'>stats</span>::lag() </CODE></PRE> ```r library(fastDummies) library(broom) theme_set(theme_bw()) ``` --- # Black bear hunting activity and harvests From the [data set info page](https://data.ontario.ca/en/dataset/black-bear-hunting-activity-and-harvests): > This data breaks down estimated hunter and harvest numbers by: - wildlife management unit (WMU) - calendar year > Harvest and active hunter numbers are estimates based on replies received from a sample of hunters and are therefore subject to statistical error. - A WMU is essentially a code that identifies a hunting region --- # Obtain the data set ```r if (!dir.exists("./data")) { dir.create("./data") } download.file( "https://data.ontario.ca/dataset/c14b7d2b-7c42-4727-92c6-99f140ed3a57/resource/7dd6328e-74cc-4291-a041-2345cf7c6186/download/black_bear_2021.csv", destfile="./data/bears.csv" ) ``` ```r bears_full <- read_csv("./data/bears.csv") ``` <PRE class="fansi fansi-message"><CODE>## <span style='font-weight: bold;'>Rows: </span><span style='color: #0000BB;'>801</span> <span style='font-weight: bold;'>Columns: </span><span style='color: #0000BB;'>4</span> ## <span style='color: #00BBBB;'>--</span> <span style='font-weight: bold;'>Column specification</span> <span style='color: #00BBBB;'>--------------------------------------------------------</span> ## <span style='font-weight: bold;'>Delimiter:</span> "," ## <span style='color: #BB0000;'>chr</span> (1): WMU ## <span style='color: #00BB00;'>dbl</span> (3): Year, Active Hunters, Harvest ## ## <span style='color: #00BBBB;'>i</span> Use `spec()` to retrieve the full column specification for this data. ## <span style='color: #00BBBB;'>i</span> Specify the column types or set `show_col_types = FALSE` to quiet this message. </CODE></PRE> --- # Inspect the data ```r head(bears_full) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 6 x 4</span> ## WMU Year `Active Hunters` Harvest ## <span style='color: #555555; font-style: italic;'><chr></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'>1</span> 01A <span style='text-decoration: underline;'>2</span>012 1 0 ## <span style='color: #555555;'>2</span> 01A <span style='text-decoration: underline;'>2</span>013 6 0 ## <span style='color: #555555;'>3</span> 01A <span style='text-decoration: underline;'>2</span>014 6 2 ## <span style='color: #555555;'>4</span> 01A <span style='text-decoration: underline;'>2</span>015 8 2 ## <span style='color: #555555;'>5</span> 01A <span style='text-decoration: underline;'>2</span>016 0 0 ## <span style='color: #555555;'>6</span> 01A <span style='text-decoration: underline;'>2</span>017 6 0 </CODE></PRE> ```r bears_full %>% distinct(WMU) %>% pull(WMU) ``` ``` ## [1] "01A" "01C" "01D" "2" "3" "4" "5" "6" "07A" ## [10] "07B" "8" "09A" "09B" "10" "11A" "11B" "12A" "12B" ## [19] "13" "14" "15A" "15B" "16A" "16B" "16C" "17" "18A" ## [28] "18B" "19" "21A" "21B" "22" "23" "24" "25" "26" ## [37] "27" "28" "29" "30" "31" "32" "33" "34" "35" ## [46] "36" "37" "38" "39" "40" "41" "42" "43A" "43B" ## [55] "44" "45" "46" "47" "48" "49" "50" "53" "54" ## [64] "55A" "55B" "56" "57" "58" "59" "60" "61" "62" ## [73] "63" "64" "66" "67" "68" "69A" "69B" "71" "72" ## [82] "73" "74" "75" "76" "82A" "83" "84" "Total" ``` --- # Inspect the data ```r tail(bears_full) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 6 x 4</span> ## WMU Year `Active Hunters` Harvest ## <span style='color: #555555; font-style: italic;'><chr></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'>1</span> Total <span style='text-decoration: underline;'>2</span>015 <span style='text-decoration: underline;'>26</span>293 <span style='text-decoration: underline;'>6</span>662 ## <span style='color: #555555;'>2</span> Total <span style='text-decoration: underline;'>2</span>016 <span style='text-decoration: underline;'>31</span>480 <span style='text-decoration: underline;'>8</span>152 ## <span style='color: #555555;'>3</span> Total <span style='text-decoration: underline;'>2</span>017 <span style='text-decoration: underline;'>28</span>718 <span style='text-decoration: underline;'>6</span>497 ## <span style='color: #555555;'>4</span> Total <span style='text-decoration: underline;'>2</span>018 <span style='text-decoration: underline;'>27</span>138 <span style='text-decoration: underline;'>5</span>869 ## <span style='color: #555555;'>5</span> Total <span style='text-decoration: underline;'>2</span>019 <span style='text-decoration: underline;'>28</span>250 <span style='text-decoration: underline;'>5</span>301 ## <span style='color: #555555;'>6</span> Total <span style='text-decoration: underline;'>2</span>020 <span style='text-decoration: underline;'>27</span>291 <span style='text-decoration: underline;'>4</span>269 </CODE></PRE> - It looks like the end of the data set contains annual summaries - We should separate this from the raw data that we will use for analysis --- # Split the data; rename column ```r bears <- bears_full %>% filter(WMU != "Total") %>% rename(active_hunters = `Active Hunters`) ``` --- # Re-inspect the data ```r bears %>% distinct(WMU) %>% pull(WMU) ``` ``` ## [1] "01A" "01C" "01D" "2" "3" "4" "5" "6" "07A" "07B" "8" "09A" ## [13] "09B" "10" "11A" "11B" "12A" "12B" "13" "14" "15A" "15B" "16A" "16B" ## [25] "16C" "17" "18A" "18B" "19" "21A" "21B" "22" "23" "24" "25" "26" ## [37] "27" "28" "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" ## [49] "39" "40" "41" "42" "43A" "43B" "44" "45" "46" "47" "48" "49" ## [61] "50" "53" "54" "55A" "55B" "56" "57" "58" "59" "60" "61" "62" ## [73] "63" "64" "66" "67" "68" "69A" "69B" "71" "72" "73" "74" "75" ## [85] "76" "82A" "83" "84" ``` There are 88 different WMUs! --- # Re-inspect the data ```r bears %>% distinct(Year) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 9 x 1</span> ## Year ## <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'>1</span> <span style='text-decoration: underline;'>2</span>012 ## <span style='color: #555555;'>2</span> <span style='text-decoration: underline;'>2</span>013 ## <span style='color: #555555;'>3</span> <span style='text-decoration: underline;'>2</span>014 ## <span style='color: #555555;'>4</span> <span style='text-decoration: underline;'>2</span>015 ## <span style='color: #555555;'>5</span> <span style='text-decoration: underline;'>2</span>016 ## <span style='color: #555555;'>6</span> <span style='text-decoration: underline;'>2</span>017 ## <span style='color: #555555;'>7</span> <span style='text-decoration: underline;'>2</span>018 ## <span style='color: #555555;'>8</span> <span style='text-decoration: underline;'>2</span>019 ## <span style='color: #555555;'>9</span> <span style='text-decoration: underline;'>2</span>020 </CODE></PRE> The data ranges from 2012 to 2020. --- # Scatterplots .pull-left[ A basic scatterplot: ```r ggplot(bears, aes(x=active_hunters, y=Harvest))+ geom_point()+ labs(x="Active hunters") ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" /> ] --- # Scatterplots .pull-left[ Scatterplot colouring by `WMU`: ```r ggplot(bears, aes(x=active_hunters, y=Harvest, colour=WMU))+ geom_point(show.legend=FALSE)+ labs(x="Active hunters") ``` - You can disable the legend of one or more geom layers by setting `show.legend=FALSE`, rather than using `theme(legend.position="none")` - This still isnt a great plot as we saw earlier that we actually have 88 different levels for the `WMU` variable... - Facetting by the WMU won't help either because we'll have 88 subplots ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" /> ] --- # Subsetting the data Let us consider only the observations where the WMU ended in "A" or "B". ```r bearsAB <- bears %>% filter(str_ends(WMU, pattern="A|B")) ``` - `str_ends()` is a function that checks whether a string ends in a pattern - We can use this function with `filter()` because it returns a logical vector (`TRUE`, `FALSE`) Check our work: ```r bearsAB %>% distinct(WMU) %>% pull(WMU) ``` ``` ## [1] "01A" "07A" "07B" "09A" "09B" "11A" "11B" "12A" "12B" "15A" "15B" "16A" ## [13] "16B" "18A" "18B" "21A" "21B" "43A" "43B" "55A" "55B" "69A" "69B" "82A" ``` - We still have 24 WMUs! --- # Remake the scatterplot .pull-left[ ```r ggplot(bearsAB, aes(x=active_hunters, y=Harvest, colour=WMU))+ geom_point(show.legend=FALSE)+ labs(x="Active hunters") ``` - I wouldn't say that this plot is much of an improvement over the last... - Facetting by WMU still won't be useful since we'll end up with 24 subplots, which is still a lot ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-17-1.svg" style="display: block; margin: auto;" /> ] --- # Fit a multiple linear regression model Suppose we now wish to fit the model of form `$$Y \,=\, \beta_{0} \,+\, \beta_{1}X_{1} \,+\, \beta_{2}X_{2_{1}} \,+\, \beta_{3}X_{2_{2}} \,+\, \ldots + \beta_{24}X_{2_{23}} \,+\, \varepsilon$$` where: - `\(Y\)` is the harvest - `\(X_{1}\)` is the number of active hunters - `\(X_{2_{j}}\)` is an indicator for the `\(j^{\text{th}}\)` WMU - WMU has 24 levels, of which, the last 23 levels will require indicator variables We will first do this with matrices, then with the `lm` function. --- # The matrices We want to go from the equation on the previous page to something like this: `$$\begin{bmatrix}Y_{1}\\ Y_{2}\\ \vdots\\ Y_{n} \end{bmatrix} \,=\, \begin{bmatrix}1 &X_{1,\,1} &X_{2_{1},\,1} &X_{2_{2},\,1} &\cdots &X_{2_{23},\,1}\\ 1 &X_{1,\,2} &X_{2_{1},\,2} &X_{2_{2},\,2} &\cdots &X_{2_{23},\,2}\\ \vdots &\vdots &\vdots &\vdots &\ddots &\vdots\\ 1 &X_{1,\,n} &X_{2_{1},\,n} &X_{2_{2},\,n} &\cdots &X_{2_{23},\,n} \end{bmatrix} \begin{bmatrix}\beta_{0}\\ \beta_{1}\\ \beta_{2_{1}}\\ \beta_{2_{2}}\\ \vdots\\ \beta_{2_{23}} \end{bmatrix} \,+\, \begin{bmatrix}\varepsilon_{1}\\ \varepsilon_{2}\\ \vdots\\ \varepsilon_{n} \end{bmatrix}$$` where: - `\(X_{1,\,i}\)` is the number of active hunters for observation `\(i\)` - `\(X_{2_{j},\,i}\)` is an indicator for the `\(j^{\text{th}}\)` WMU for observation `\(i\)` - WMU has 24 levels so `\(24 - 1 = 23\)` of these levels are included in the matrix (so that `\(X\)` will be of full rank) --- # Creating the response matrix .pull-left[ ```r Y <- bearsAB %>% select(Harvest) %>% as.matrix() head(Y) ``` - All we need to do is extract the response variable (`Harvest`) and convert it to a matrix ] .pull-right[ ``` ## Harvest ## [1,] 0 ## [2,] 0 ## [3,] 2 ## [4,] 2 ## [5,] 0 ## [6,] 0 ``` ] --- # Creating the design matrix .pull-left[ ```r bearsAB %>% select(active_hunters, WMU) %>% add_column(intercept=1, .before=1) ``` - We start by selecting the columns that we need: `active_hunters` and `WMU` - Then we add a column to the front of the data to denote the intercept. - In general, if a name-value pair of length 1 is supplied to a tidyverse-related function that involves adding rows/columns to data (e.g. `add_column`, `mutate`), the value is repeated. This is **only** applicable when the length is 1. ] .pull-right[ <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 216 x 3</span> ## intercept active_hunters WMU ## <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><chr></span> ## <span style='color: #555555;'> 1</span> 1 1 01A ## <span style='color: #555555;'> 2</span> 1 6 01A ## <span style='color: #555555;'> 3</span> 1 6 01A ## <span style='color: #555555;'> 4</span> 1 8 01A ## <span style='color: #555555;'> 5</span> 1 0 01A ## <span style='color: #555555;'> 6</span> 1 6 01A ## <span style='color: #555555;'> 7</span> 1 8 01A ## <span style='color: #555555;'> 8</span> 1 17 01A ## <span style='color: #555555;'> 9</span> 1 22 01A ## <span style='color: #555555;'>10</span> 1 13 07A ## <span style='color: #555555;'># ... with 206 more rows</span> </CODE></PRE> ] --- # Aside: factor levels - The WMU variable is currently of type character - In order to incorporate it into our model, we need to convert it to a factor - Factor levels are assigned in numerical/alphabetical order ```r factor(c("B", "A", "C")) ``` ``` ## [1] B A C ## Levels: A B C ``` ```r bearsAB %>% pull(WMU) %>% factor() %>% head() ``` ``` ## [1] 01A 01A 01A 01A 01A 01A ## 24 Levels: 01A 07A 07B 09A 09B 11A 11B 12A 12B 15A 15B 16A 16B 18A 18B ... 82A ``` - When WMU is converted to a factor, the first level is `01A` - As such, this level will be excluded from our set of indicator variables that we need to create - `01A` will be the baseline of our resulting model --- # Creating the design matrix .pull-left[ ```r bearsAB %>% select(active_hunters, WMU) %>% add_column(intercept=1, .before=1) %>% dummy_cols(remove_first_dummy=TRUE, remove_selected_columns=TRUE) ``` - We use `fastDummies::dummy_cols` to create our indicator variables - By default, it creates indicators using all character/factor variables in the data set - We set `remove_first_dummy` to `TRUE` so that the design matrix will be of full rank - We also set `remove_selected_columns` to `TRUE` to drop WMU after creating its indicators ] .pull-right[ <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 216 x 25</span> ## intercept active_hunters WMU_07A WMU_07B WMU_09A WMU_09B WMU_11A WMU_11B ## <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><int></span> <span style='color: #555555; font-style: italic;'><int></span> <span style='color: #555555; font-style: italic;'><int></span> <span style='color: #555555; font-style: italic;'><int></span> <span style='color: #555555; font-style: italic;'><int></span> <span style='color: #555555; font-style: italic;'><int></span> ## <span style='color: #555555;'> 1</span> 1 1 0 0 0 0 0 0 ## <span style='color: #555555;'> 2</span> 1 6 0 0 0 0 0 0 ## <span style='color: #555555;'> 3</span> 1 6 0 0 0 0 0 0 ## <span style='color: #555555;'> 4</span> 1 8 0 0 0 0 0 0 ## <span style='color: #555555;'> 5</span> 1 0 0 0 0 0 0 0 ## <span style='color: #555555;'> 6</span> 1 6 0 0 0 0 0 0 ## <span style='color: #555555;'> 7</span> 1 8 0 0 0 0 0 0 ## <span style='color: #555555;'> 8</span> 1 17 0 0 0 0 0 0 ## <span style='color: #555555;'> 9</span> 1 22 0 0 0 0 0 0 ## <span style='color: #555555;'>10</span> 1 13 1 0 0 0 0 0 ## <span style='color: #555555;'># ... with 206 more rows, and 17 more variables: WMU_12A <int>, WMU_12B <int>,</span> ## <span style='color: #555555;'># WMU_15A <int>, WMU_15B <int>, WMU_16A <int>, WMU_16B <int>, WMU_18A <int>,</span> ## <span style='color: #555555;'># WMU_18B <int>, WMU_21A <int>, WMU_21B <int>, WMU_43A <int>, WMU_43B <int>,</span> ## <span style='color: #555555;'># WMU_55A <int>, WMU_55B <int>, WMU_69A <int>, WMU_69B <int>, WMU_82A <int></span> </CODE></PRE> ] --- # Creating the design matrix .pull-left[ ```r X <- bearsAB %>% select(active_hunters, WMU) %>% add_column(intercept=1, .before=1) %>% dummy_cols(remove_first_dummy=TRUE, remove_selected_columns=TRUE) %>% as.matrix() X[1:6, 1:6] ``` - Finally, we convert the data into a matrix and assign it to `X` - We can take a peek at the first 6 rows and columns ] .pull-right[ ``` ## intercept active_hunters WMU_07A WMU_07B WMU_09A WMU_09B ## [1,] 1 1 0 0 0 0 ## [2,] 1 6 0 0 0 0 ## [3,] 1 6 0 0 0 0 ## [4,] 1 8 0 0 0 0 ## [5,] 1 0 0 0 0 0 ## [6,] 1 6 0 0 0 0 ``` ] --- # Matrix functions needed to obtain the coefficient estimates `t()` is the transpose. ```r A <- matrix(c(1,2,3,4), nrow=2, ncol=2, byrow=TRUE) A ``` ``` ## [,1] [,2] ## [1,] 1 2 ## [2,] 3 4 ``` ```r t(A) ``` ``` ## [,1] [,2] ## [1,] 1 3 ## [2,] 2 4 ``` --- # Matrix functions needed to obtain the coefficient estimates `solve()` returns the inverse of a matrix. ```r A ``` ``` ## [,1] [,2] ## [1,] 1 2 ## [2,] 3 4 ``` ```r solve(A) ``` ``` ## [,1] [,2] ## [1,] -2.0 1.0 ## [2,] 1.5 -0.5 ``` `%*%` represents matrix multiplication. We can check that `$$A \,*\, A^{-1} \,=\, \mathbb{I}_{2}$$` ```r A %*% solve(A) ``` --- # Obtaining the coefficient estimates with matrices `$$\widehat{\boldsymbol{\beta}} \,=\, (X^{T}X)^{-1}X^{T}Y$$` ```r beta <- solve(t(X) %*% X) %*% t(X) %*% Y head(beta) ``` ``` ## Harvest ## intercept -2.752779 ## active_hunters 0.402365 ## WMU_07A 3.844856 ## WMU_07B 14.750428 ## WMU_09A 13.086960 ## WMU_09B 12.205280 ``` --- # Obtaining the coefficient estimates with matrices Converting it back into a tibble: ```r beta_tibble <- beta %>% as_tibble(rownames="term") %>% rename(estimate = Harvest) beta_tibble ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 25 x 2</span> ## term estimate ## <span style='color: #555555; font-style: italic;'><chr></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'> 1</span> intercept -<span style='color: #BB0000;'>2.75</span> ## <span style='color: #555555;'> 2</span> active_hunters 0.402 ## <span style='color: #555555;'> 3</span> WMU_07A 3.84 ## <span style='color: #555555;'> 4</span> WMU_07B 14.8 ## <span style='color: #555555;'> 5</span> WMU_09A 13.1 ## <span style='color: #555555;'> 6</span> WMU_09B 12.2 ## <span style='color: #555555;'> 7</span> WMU_11A 4.46 ## <span style='color: #555555;'> 8</span> WMU_11B -<span style='color: #BB0000;'>7.34</span> ## <span style='color: #555555;'> 9</span> WMU_12A -<span style='color: #BB0000;'>1.10</span> ## <span style='color: #555555;'>10</span> WMU_12B -<span style='color: #BB0000;'>7.21</span> ## <span style='color: #555555;'># ... with 15 more rows</span> </CODE></PRE> --- # Obtaining the coefficient estimates with `lm` ```r bearsAB_lm <- lm(Harvest ~ active_hunters + WMU, data=bearsAB) ``` I'm using `broom::tidy.lm` to extract only the coefficient table from the `summary` output because the output from `summary` is going to be huge and won't fit on my slides. ```r bearsAB_lm_coef <- bearsAB_lm %>% tidy() %>% select(term, estimate) bearsAB_lm_coef ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 25 x 2</span> ## term estimate ## <span style='color: #555555; font-style: italic;'><chr></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'> 1</span> (Intercept) -<span style='color: #BB0000;'>2.75</span> ## <span style='color: #555555;'> 2</span> active_hunters 0.402 ## <span style='color: #555555;'> 3</span> WMU07A 3.84 ## <span style='color: #555555;'> 4</span> WMU07B 14.8 ## <span style='color: #555555;'> 5</span> WMU09A 13.1 ## <span style='color: #555555;'> 6</span> WMU09B 12.2 ## <span style='color: #555555;'> 7</span> WMU11A 4.46 ## <span style='color: #555555;'> 8</span> WMU11B -<span style='color: #BB0000;'>7.34</span> ## <span style='color: #555555;'> 9</span> WMU12A -<span style='color: #BB0000;'>1.10</span> ## <span style='color: #555555;'>10</span> WMU12B -<span style='color: #BB0000;'>7.21</span> ## <span style='color: #555555;'># ... with 15 more rows</span> </CODE></PRE> --- # Comparing our estimates .pull-left[ ```r list( with_matrices = beta_tibble, with_lm = bearsAB_lm_coef ) ``` They are identical! ] .pull-right[ <PRE class="fansi fansi-output"><CODE>## $with_matrices ## <span style='color: #555555;'># A tibble: 25 x 2</span> ## term estimate ## <span style='color: #555555; font-style: italic;'><chr></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'> 1</span> intercept -<span style='color: #BB0000;'>2.75</span> ## <span style='color: #555555;'> 2</span> active_hunters 0.402 ## <span style='color: #555555;'> 3</span> WMU_07A 3.84 ## <span style='color: #555555;'> 4</span> WMU_07B 14.8 ## <span style='color: #555555;'> 5</span> WMU_09A 13.1 ## <span style='color: #555555;'> 6</span> WMU_09B 12.2 ## <span style='color: #555555;'> 7</span> WMU_11A 4.46 ## <span style='color: #555555;'> 8</span> WMU_11B -<span style='color: #BB0000;'>7.34</span> ## <span style='color: #555555;'> 9</span> WMU_12A -<span style='color: #BB0000;'>1.10</span> ## <span style='color: #555555;'>10</span> WMU_12B -<span style='color: #BB0000;'>7.21</span> ## <span style='color: #555555;'># ... with 15 more rows</span> ## ## $with_lm ## <span style='color: #555555;'># A tibble: 25 x 2</span> ## term estimate ## <span style='color: #555555; font-style: italic;'><chr></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'> 1</span> (Intercept) -<span style='color: #BB0000;'>2.75</span> ## <span style='color: #555555;'> 2</span> active_hunters 0.402 ## <span style='color: #555555;'> 3</span> WMU07A 3.84 ## <span style='color: #555555;'> 4</span> WMU07B 14.8 ## <span style='color: #555555;'> 5</span> WMU09A 13.1 ## <span style='color: #555555;'> 6</span> WMU09B 12.2 ## <span style='color: #555555;'> 7</span> WMU11A 4.46 ## <span style='color: #555555;'> 8</span> WMU11B -<span style='color: #BB0000;'>7.34</span> ## <span style='color: #555555;'> 9</span> WMU12A -<span style='color: #BB0000;'>1.10</span> ## <span style='color: #555555;'>10</span> WMU12B -<span style='color: #BB0000;'>7.21</span> ## <span style='color: #555555;'># ... with 15 more rows</span> </CODE></PRE> ] --- # WMUs ending in B Now instead of WMUs ending in A or B, let us consider only the WMUs that end in B. ```r bearsB <- bears %>% filter(str_ends(WMU, pattern="B")) bearsB ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 99 x 4</span> ## WMU Year active_hunters Harvest ## <span style='color: #555555; font-style: italic;'><chr></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'> 1</span> 07B <span style='text-decoration: underline;'>2</span>012 198 121 ## <span style='color: #555555;'> 2</span> 07B <span style='text-decoration: underline;'>2</span>013 148 66 ## <span style='color: #555555;'> 3</span> 07B <span style='text-decoration: underline;'>2</span>014 131 50 ## <span style='color: #555555;'> 4</span> 07B <span style='text-decoration: underline;'>2</span>015 148 56 ## <span style='color: #555555;'> 5</span> 07B <span style='text-decoration: underline;'>2</span>016 172 73 ## <span style='color: #555555;'> 6</span> 07B <span style='text-decoration: underline;'>2</span>017 148 81 ## <span style='color: #555555;'> 7</span> 07B <span style='text-decoration: underline;'>2</span>018 157 87 ## <span style='color: #555555;'> 8</span> 07B <span style='text-decoration: underline;'>2</span>019 189 102 ## <span style='color: #555555;'> 9</span> 07B <span style='text-decoration: underline;'>2</span>020 71 20 ## <span style='color: #555555;'>10</span> 09B <span style='text-decoration: underline;'>2</span>012 105 58 ## <span style='color: #555555;'># ... with 89 more rows</span> </CODE></PRE> --- # Fit the new model Since the question doesn't mention anything about WMUs, WMU is not included in the model. ```r bearsB_lm <- lm(Harvest ~ active_hunters, data=bearsB) summary(bearsB_lm) ``` ``` ## ## Call: ## lm(formula = Harvest ~ active_hunters, data = bearsB) ## ## Residuals: ## Min 1Q Median 3Q Max ## -81.066 -14.013 -3.305 14.675 65.839 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 1.80206 3.67144 0.491 0.625 ## active_hunters 0.26949 0.01492 18.056 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 24.58 on 97 degrees of freedom ## Multiple R-squared: 0.7707, Adjusted R-squared: 0.7683 ## F-statistic: 326 on 1 and 97 DF, p-value: < 2.2e-16 ``` --- # Make an 80% prediction interval The formula for a `\(100(1 - \alpha)\%\)` prediction interval for `\(X \,=\, x_{*}\)` is: `$$\widehat{Y} \,\pm\, t_{1 - \frac{\alpha}{2},\, n-2}\;\cdot\;\widehat{\sigma}\;\cdot\;\sqrt{\left(1 \,+\, \frac{1}{n} \,+\, \frac{(X_{*} - \overline{X})^{2}}{S_{xx}}\right)}$$` where `$$P\left(T \,<\, t_{1 - \frac{\alpha}{2},\, n-2}\right) \,=\, 1 - \frac{\alpha}{2}$$` --- # Obtaining the values The value of `\(\widehat{Y}\)` when `\(X_{*} \,=\, 4\)`: ```r yhat <- bearsB_lm %>% augment(newdata = data.frame(active_hunters=4)) %>% pull(.fitted) yhat ``` ``` ## [1] 2.880025 ``` --- # Obtaining the values We have `\(n = 99\)` and `\(\alpha = 0.20\)`. ```r n <- 99 alpha <- 0.20 ``` The quantile value: ```r tval <- qt(1 - alpha/2, df=n-2) tval ``` ``` ## [1] 1.29034 ``` --- # Obtaining the values The value of `\(\widehat{\sigma}\)` can be read off the summary output under "Residual standard error". ```r summary(bearsB_lm) ``` ``` ## ## Call: ## lm(formula = Harvest ~ active_hunters, data = bearsB) ## ## Residuals: ## Min 1Q Median 3Q Max ## -81.066 -14.013 -3.305 14.675 65.839 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 1.80206 3.67144 0.491 0.625 ## active_hunters 0.26949 0.01492 18.056 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 24.58 on 97 degrees of freedom ## Multiple R-squared: 0.7707, Adjusted R-squared: 0.7683 ## F-statistic: 326 on 1 and 97 DF, p-value: < 2.2e-16 ``` --- # Obtaining the values Alternatively, we can extract the value of `\(\widehat{\sigma}\)` using `broom::glance.lm`: ```r glance(bearsB_lm) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 1 x 12</span> ## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC ## <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'>1</span> 0.771 0.768 24.6 326. 8.75<span style='color: #555555;'>e</span><span style='color: #BB0000;'>-33</span> 1 -<span style='color: #BB0000;'>456.</span> 919. 927. ## <span style='color: #555555;'># ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int></span> </CODE></PRE> ```r sigma <- bearsB_lm %>% glance() %>% pull(sigma) sigma ``` ``` ## [1] 24.583 ``` --- # Obtaining the values The remaining terms in the standard error can be obtained using: ```r xstar <- 4 xbar <- bearsB %>% pull(active_hunters) %>% mean() Sxx <- bearsB %>% pull(active_hunters) %>% var() %>% magrittr::multiply_by(n-1) big_term <- sqrt(1 + (1/n) + (xstar - xbar)^2 / Sxx) ``` --- # Calculating the interval ```r interval <- tibble( fit = yhat, lower = yhat - tval * sigma * big_term, upper = yhat + tval * sigma * big_term ) interval %>% as.matrix() # only to see more digits ``` ``` ## fit lower upper ## [1,] 2.880025 -29.18389 34.94394 ``` --- # Check our work ```r interval ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 1 x 3</span> ## fit lower upper ## <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'>1</span> 2.88 -<span style='color: #BB0000;'>29.2</span> 34.9 </CODE></PRE> ```r predict( bearsB_lm, newdata = data.frame(active_hunters=4), interval = "prediction", level = 0.80 ) ``` ``` ## fit lwr upr ## 1 2.880025 -29.18389 34.94394 ``` They're the same!