class: language-r layout: true --- # Lab 7 — November 15 ```r library(tidyverse) ``` <PRE class="fansi fansi-message"><CODE>## -- <span style='font-weight: bold;'>Attaching packages</span> --------------------------------------- tidyverse 1.3.1 -- </CODE></PRE><PRE class="fansi fansi-message"><CODE>## <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>ggplot2</span> 3.3.6 <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>purrr </span> 0.3.4 ## <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>tibble </span> 3.1.6 <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>dplyr </span> 1.0.9 ## <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>tidyr </span> 1.2.0 <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>stringr</span> 1.4.0 ## <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>readr </span> 2.1.2 <span style='color: #00BB00;'>v</span> <span style='color: #0000BB;'>forcats</span> 0.5.1 </CODE></PRE><PRE class="fansi fansi-message"><CODE>## -- <span style='font-weight: bold;'>Conflicts</span> ------------------------------------------ tidyverse_conflicts() -- ## <span style='color: #BB0000;'>x</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>filter()</span> masks <span style='color: #0000BB;'>stats</span>::filter() ## <span style='color: #BB0000;'>x</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>lag()</span> masks <span style='color: #0000BB;'>stats</span>::lag() </CODE></PRE> ```r library(GGally) library(broom) theme_set(theme_bw()) ``` --- # Additional packages — easystats [easystats](https://easystats.github.io/easystats/) is a collection of packages, like [tidyverse](https://www.tidyverse.org/packages/). [easystats](https://easystats.github.io/easystats/) is not currently on CRAN but can be installed using: ```r install.packages("easystats", repos="https://easystats.r-universe.dev") ``` However, its component packages and dependencies can be installed from CRAN normally: ```r install.packages("performance") install.packages("see") install.packages("patchwork") ``` Once installed, ```r library(performance) # or # library(easystats) ``` --- # Additional packages — tidymodels [tidymodels](https://www.tidymodels.org/packages/) is an extension of the [tidyverse](https://www.tidyverse.org/packages/) that focuses on modelling. (Recall that [broom](https://broom.tidymodels.org/) is actually a package from [tidymodels](https://www.tidymodels.org/packages/), not [tidyverse](https://www.tidyverse.org/packages/)!) You can install [tidymodels](https://www.tidymodels.org/packages/) using the usual methods. You can also install just its component packages. In this lab, I introduce two additional packages from the [tidymodels](https://www.tidymodels.org/packages/) collection: - [rsample](https://rsample.tidymodels.org/): for producing data partitions and data resamples - [yardstick](https://yardstick.tidymodels.org/): for calculating model performance metrics ```r library(rsample) library(yardstick) ``` --- # Load the data ```r t7 <- read_csv("./data/tutorial7.csv") ``` <PRE class="fansi fansi-message"><CODE>## <span style='font-weight: bold;'>Rows: </span><span style='color: #0000BB;'>1024</span> <span style='font-weight: bold;'>Columns: </span><span style='color: #0000BB;'>7</span> ## <span style='color: #00BBBB;'>--</span> <span style='font-weight: bold;'>Column specification</span> <span style='color: #00BBBB;'>--------------------------------------------------------</span> ## <span style='font-weight: bold;'>Delimiter:</span> "," ## <span style='color: #00BB00;'>dbl</span> (7): Y, X1, X2, X3, X4, X5, X6 ## ## <span style='color: #00BBBB;'>i</span> Use `spec()` to retrieve the full column specification for this data. ## <span style='color: #00BBBB;'>i</span> Specify the column types or set `show_col_types = FALSE` to quiet this message. </CODE></PRE> ```r t7 ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 1,024 x 7</span> ## Y X1 X2 X3 X4 X5 X6 ## <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'> 1</span> -<span style='color: #BB0000;'>4</span><span style='color: #BB0000; text-decoration: underline;'>078</span><span style='color: #BB0000;'>113.</span> 0.072<span style='text-decoration: underline;'>7</span> 2.91 1.35 0.224 357. 1 ## <span style='color: #555555;'> 2</span> -<span style='color: #BB0000;'>3</span><span style='color: #BB0000; text-decoration: underline;'>100</span><span style='color: #BB0000;'>124.</span> 0.181 8.08 2.57 0.848 311. 1 ## <span style='color: #555555;'> 3</span> -<span style='color: #BB0000;'>2</span><span style='color: #BB0000; text-decoration: underline;'>282</span><span style='color: #BB0000;'>073.</span> 0.659 115. 77.4 1.06 269. 0 ## <span style='color: #555555;'> 4</span> -<span style='color: #BB0000;'>1</span><span style='color: #BB0000; text-decoration: underline;'>993</span><span style='color: #BB0000;'>606.</span> 0.807 47.4 42.3 1.35 251. 0 ## <span style='color: #555555;'> 5</span> -<span style='color: #BB0000;'>1</span><span style='color: #BB0000; text-decoration: underline;'>569</span><span style='color: #BB0000;'>783.</span> 1.24 0.785 -<span style='color: #BB0000;'>1.61</span> 1.36 222. 1 ## <span style='color: #555555;'> 6</span> -<span style='color: #BB0000;'>1</span><span style='color: #BB0000; text-decoration: underline;'>467</span><span style='color: #BB0000;'>620.</span> 2.46 103. 258. 1.55 216. 1 ## <span style='color: #555555;'> 7</span> -<span style='color: #BB0000;'>1</span><span style='color: #BB0000; text-decoration: underline;'>248</span><span style='color: #BB0000;'>135.</span> 2.50 22.4 67.7 1.59 198. 1 ## <span style='color: #555555;'> 8</span> -<span style='color: #BB0000;'>1</span><span style='color: #BB0000; text-decoration: underline;'>223</span><span style='color: #BB0000;'>299.</span> 2.56 104. 262. 1.60 198. 0 ## <span style='color: #555555;'> 9</span> -<span style='color: #BB0000;'>1</span><span style='color: #BB0000; text-decoration: underline;'>221</span><span style='color: #BB0000;'>486.</span> 2.74 89.7 254. 1.95 197. 1 ## <span style='color: #555555;'>10</span> -<span style='color: #BB0000;'>1</span><span style='color: #BB0000; text-decoration: underline;'>178</span><span style='color: #BB0000;'>091.</span> 3.29 81.8 269. 2.30 194. 0 ## <span style='color: #555555;'># ... with 1,014 more rows</span> </CODE></PRE> --- # Make a scatterplot matrix ```r ggpairs(t7) ``` <img src="index_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" /> --- # Fit a linear model using all main effects ```r main_model <- lm(Y ~ ., data=t7) summary(main_model) ``` ``` ## ## Call: ## lm(formula = Y ~ ., data = t7) ## ## Residuals: ## Min 1Q Median 3Q Max ## -1665731 -72925 22740 95651 308517 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 3.273e+05 3.501e+04 9.350 < 2e-16 *** ## X1 3.291e+04 3.657e+03 8.999 < 2e-16 *** ## X2 -2.506e+02 3.055e+02 -0.821 0.41212 ## X3 6.365e+00 2.014e+00 3.160 0.00162 ** ## X4 -3.312e+04 3.666e+03 -9.033 < 2e-16 *** ## X5 -7.394e+03 1.300e+02 -56.895 < 2e-16 *** ## X6 -1.696e+05 1.148e+04 -14.771 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 179700 on 1017 degrees of freedom ## Multiple R-squared: 0.8604, Adjusted R-squared: 0.8596 ## F-statistic: 1045 on 6 and 1017 DF, p-value: < 2.2e-16 ``` --- # Report the AIC of the fitted model ```r AIC(main_model) ``` ``` ## [1] 27693.49 ``` AIC is also reported by `broom::glance.lm()`: ```r glance(main_model) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 1 x 12</span> ## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs ## <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><int></span> <span style='color: #555555; font-style: italic;'><int></span> ## <span style='color: #555555;'>1</span> 0.860 0.860 <span style='text-decoration: underline;'>179</span>672. <span style='text-decoration: underline;'>1</span>045. 0 6 -<span style='color: #BB0000; text-decoration: underline;'>13</span><span style='color: #BB0000;'>839.</span> <span style='text-decoration: underline;'>27</span>693. <span style='text-decoration: underline;'>27</span>733. 3.28<span style='color: #555555;'>e</span>13 <span style='text-decoration: underline;'>1</span>017 <span style='text-decoration: underline;'>1</span>024 </CODE></PRE> --- # Check model assumptions via diagnostic plots ```r par(mfrow = c(2, 2)) # Create 2x2 grid plot(main_model, which=c(1, 2, 3, 5)) ``` <img src="index_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" /> --- # Check model assumptions via diagnostic plots `performance::check_model()` makes it easier to read and interpret the diagnostic plots. - Plots are created using [ggplot2](https://ggplot2.tidyverse.org/) - Plots are labelled to tell you what to look for ```r check_model(main_model, check=c("vif", "qq", "normality", "linearity", "homogeneity", "outliers")) ``` --- <img src="index_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" /> --- # Adding residuals to the scatterplot matrix ```r main_model %>% augment() %>% select(Y, contains("X"), .resid) %>% ggpairs() ``` <img src="index_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" /> --- # Consider adding interaction effects We wish to consider interactions between all the main effects. For example, consider the interaction between variables `X1` and `X2`. ```r model_X1X2 <- lm(Y ~ . + X1:X2, data=t7) anova(main_model, model_X1X2) ``` ``` ## Analysis of Variance Table ## ## Model 1: Y ~ X1 + X2 + X3 + X4 + X5 + X6 ## Model 2: Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X1:X2 ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 1017 3.2831e+13 ## 2 1016 3.2800e+13 1 3.0461e+10 0.9435 0.3316 ``` This interaction is not significant so we will not add it to our model. It remains to check all other two-way interactions. We do not want to build all these models manually and pass each one into `anova()` to compare with our main model. --- # Consider adding interaction effects .pull-left[ Instead, we will use the `add1()` function which adds single variables to the main model and gives the result of the usual `\(F\)`-test from comparing two models with `anova()`. ```r add1( main_model, ~ .^2, test = "F" ) ``` ] .pull-right[ ``` ## Single term additions ## ## Model: ## Y ~ X1 + X2 + X3 + X4 + X5 + X6 ## Df Sum of Sq RSS AIC F value Pr(>F) ## <none> 3.2831e+13 24786 ## X1:X2 1 3.0461e+10 3.2800e+13 24787 0.9435 0.33160 ## X1:X3 1 7.8963e+10 3.2752e+13 24785 2.4495 0.11787 ## X1:X4 1 2.7107e+12 3.0120e+13 24699 91.4374 < 2.2e-16 *** ## X1:X5 1 4.7530e+12 2.8078e+13 24627 171.9883 < 2.2e-16 *** ## X1:X6 1 4.0214e+12 2.8809e+13 24654 141.8184 < 2.2e-16 *** ## X2:X3 1 1.0099e+12 3.1821e+13 24756 32.2458 1.773e-08 *** ## X2:X4 1 5.3209e+10 3.2778e+13 24786 1.6493 0.19935 ## X2:X5 1 1.2772e+12 3.1554e+13 24747 41.1238 2.185e-10 *** ## X2:X6 1 2.6245e+12 3.0206e+13 24702 88.2748 < 2.2e-16 *** ## X3:X4 1 9.8325e+10 3.2733e+13 24784 3.0520 0.08094 . ## X3:X5 1 7.3403e+12 2.5491e+13 24528 292.5696 < 2.2e-16 *** ## X3:X6 1 4.6462e+12 2.8185e+13 24631 167.4842 < 2.2e-16 *** ## X4:X5 1 4.8446e+12 2.7986e+13 24624 175.8765 < 2.2e-16 *** ## X4:X6 1 4.0482e+12 2.8783e+13 24653 142.8979 < 2.2e-16 *** ## X5:X6 1 4.7672e+12 2.8064e+13 24627 172.5890 < 2.2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] --- count: false # Consider adding interaction effects .pull-left[ Using a significance level of `\(\alpha = 0.05\)`, the interactions that are not significant are - `X1` with `X2` - `X1` with `X3` - `X2` with `X4` - `X3` with `X4` For this example, we will be adding all significant interactions into our main model. ] .pull-right[ ``` ## Single term additions ## ## Model: ## Y ~ X1 + X2 + X3 + X4 + X5 + X6 ## Df Sum of Sq RSS AIC F value Pr(>F) ## <none> 3.2831e+13 24786 ## X1:X2 1 3.0461e+10 3.2800e+13 24787 0.9435 0.33160 ## X1:X3 1 7.8963e+10 3.2752e+13 24785 2.4495 0.11787 ## X1:X4 1 2.7107e+12 3.0120e+13 24699 91.4374 < 2.2e-16 *** ## X1:X5 1 4.7530e+12 2.8078e+13 24627 171.9883 < 2.2e-16 *** ## X1:X6 1 4.0214e+12 2.8809e+13 24654 141.8184 < 2.2e-16 *** ## X2:X3 1 1.0099e+12 3.1821e+13 24756 32.2458 1.773e-08 *** ## X2:X4 1 5.3209e+10 3.2778e+13 24786 1.6493 0.19935 ## X2:X5 1 1.2772e+12 3.1554e+13 24747 41.1238 2.185e-10 *** ## X2:X6 1 2.6245e+12 3.0206e+13 24702 88.2748 < 2.2e-16 *** ## X3:X4 1 9.8325e+10 3.2733e+13 24784 3.0520 0.08094 . ## X3:X5 1 7.3403e+12 2.5491e+13 24528 292.5696 < 2.2e-16 *** ## X3:X6 1 4.6462e+12 2.8185e+13 24631 167.4842 < 2.2e-16 *** ## X4:X5 1 4.8446e+12 2.7986e+13 24624 175.8765 < 2.2e-16 *** ## X4:X6 1 4.0482e+12 2.8783e+13 24653 142.8979 < 2.2e-16 *** ## X5:X6 1 4.7672e+12 2.8064e+13 24627 172.5890 < 2.2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] --- # Consider quadratic effects .pull-left[ Let us also consider quadratic effects, obtained by squaring the main predictors (can also be thought of as an interaction with oneself). We will not consider the square of `X6` since `X6` is a binary predictor (0-1) and squaring it would result in a variable that is identical. ```r add1( main_model, ~ . + I(X1^2) + I(X2^2) + I(X3^2) + I(X4^2) + I(X5^2), test = "F" ) ``` We also need to wrap the squaring by `I()` to mean literal exponentiation as `^` has a different meaning when used in a formula. See `?formula` for details. ] .pull-right[ ``` ## Single term additions ## ## Model: ## Y ~ X1 + X2 + X3 + X4 + X5 + X6 ## Df Sum of Sq RSS AIC F value Pr(>F) ## <none> 3.2831e+13 24786 ## I(X1^2) 1 2.9689e+12 2.9862e+13 24690 101.012 < 2.2e-16 *** ## I(X2^2) 1 6.6248e+11 3.2168e+13 24767 20.924 5.369e-06 *** ## I(X3^2) 1 1.2554e+12 3.1575e+13 24748 40.395 3.128e-10 *** ## I(X4^2) 1 2.4475e+12 3.0383e+13 24708 81.844 < 2.2e-16 *** ## I(X5^2) 1 2.9071e+13 3.7596e+12 22568 7856.178 < 2.2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] --- count: false # Consider quadratic effects .pull-left[ Using a significance level of `\(\alpha = 0.05\)`, all quadratic terms are significant. For this example, we will be adding all quadratic terms into our main model. ] .pull-right[ ``` ## Single term additions ## ## Model: ## Y ~ X1 + X2 + X3 + X4 + X5 + X6 ## Df Sum of Sq RSS AIC F value Pr(>F) ## <none> 3.2831e+13 24786 ## I(X1^2) 1 2.9689e+12 2.9862e+13 24690 101.012 < 2.2e-16 *** ## I(X2^2) 1 6.6248e+11 3.2168e+13 24767 20.924 5.369e-06 *** ## I(X3^2) 1 1.2554e+12 3.1575e+13 24748 40.395 3.128e-10 *** ## I(X4^2) 1 2.4475e+12 3.0383e+13 24708 81.844 < 2.2e-16 *** ## I(X5^2) 1 2.9071e+13 3.7596e+12 22568 7856.178 < 2.2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] --- # Fit the new model .pull-left[ ```r new_formula <- Y ~ .^2 - X1:X2 - X1:X3 - X2:X4 - X3:X4 + I(X1^2) + I(X2^2) + I(X3^2) + I(X4^2) + I(X5^2) new_model <- lm(new_formula, data=t7) tidy(new_model) %>% print(n=23) ``` Let's use `tidy()` to print just the coefficient table. ] .pull-right[ <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 23 x 5</span> ## term estimate std.error statistic p.value ## <span style='color: #555555; font-style: italic;'><chr></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'> 1</span> (Intercept) 177. 64.5 2.75 6.15<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 3</span> ## <span style='color: #555555;'> 2</span> X1 128. 4.36 29.3 1.25<span style='color: #555555;'>e</span><span style='color: #BB0000;'>-136</span> ## <span style='color: #555555;'> 3</span> X2 32.8 0.907 36.2 8.09<span style='color: #555555;'>e</span><span style='color: #BB0000;'>-184</span> ## <span style='color: #555555;'> 4</span> X3 7.00 0.007<span style='text-decoration: underline;'>77</span> 901. 0 <span style='color: #555555;'> </span> ## <span style='color: #555555;'> 5</span> X4 -<span style='color: #BB0000;'>49.6</span> 4.28 -<span style='color: #BB0000;'>11.6</span> 2.61<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 29</span> ## <span style='color: #555555;'> 6</span> X5 2.69 0.479 5.62 2.54<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 8</span> ## <span style='color: #555555;'> 7</span> X6 8.49 31.3 0.271 7.86<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 1</span> ## <span style='color: #555555;'> 8</span> I(X1^2) 0.744 0.802 0.927 3.54<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 1</span> ## <span style='color: #555555;'> 9</span> I(X2^2) -<span style='color: #BB0000;'>0.003</span><span style='color: #BB0000; text-decoration: underline;'>81</span> 0.005<span style='text-decoration: underline;'>09</span> -<span style='color: #BB0000;'>0.748</span> 4.55<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 1</span> ## <span style='color: #555555;'>10</span> I(X3^2) -<span style='color: #BB0000;'>0.000</span><span style='color: #BB0000; text-decoration: underline;'>000</span><span style='color: #BB0000;'>192</span> 0.000<span style='text-decoration: underline;'>000</span>166 -<span style='color: #BB0000;'>1.16</span> 2.46<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 1</span> ## <span style='color: #555555;'>11</span> I(X4^2) 0.282 0.804 0.351 7.26<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 1</span> ## <span style='color: #555555;'>12</span> I(X5^2) -<span style='color: #BB0000;'>32.0</span> 0.000<span style='text-decoration: underline;'>936</span> -<span style='color: #BB0000; text-decoration: underline;'>34</span><span style='color: #BB0000;'>168.</span> 0 <span style='color: #555555;'> </span> ## <span style='color: #555555;'>13</span> X1:X4 -<span style='color: #BB0000;'>1.02</span> 1.60 -<span style='color: #BB0000;'>0.636</span> 5.25<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 1</span> ## <span style='color: #555555;'>14</span> X1:X5 0.076<span style='text-decoration: underline;'>7</span> 0.047<span style='text-decoration: underline;'>1</span> 1.63 1.04<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 1</span> ## <span style='color: #555555;'>15</span> X1:X6 1.56 3.40 0.458 6.47<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 1</span> ## <span style='color: #555555;'>16</span> X2:X3 0.000<span style='text-decoration: underline;'>059</span>8 0.000<span style='text-decoration: underline;'>059</span>7 1.00 3.17<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 1</span> ## <span style='color: #555555;'>17</span> X2:X5 1.01 0.002<span style='text-decoration: underline;'>80</span> 359. 0 <span style='color: #555555;'> </span> ## <span style='color: #555555;'>18</span> X2:X6 -<span style='color: #BB0000;'>0.040</span><span style='color: #BB0000; text-decoration: underline;'>6</span> 0.275 -<span style='color: #BB0000;'>0.147</span> 8.83<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 1</span> ## <span style='color: #555555;'>19</span> X3:X5 -<span style='color: #BB0000;'>0.000</span><span style='color: #BB0000; text-decoration: underline;'>024</span><span style='color: #BB0000;'>0</span> 0.000<span style='text-decoration: underline;'>022</span>4 -<span style='color: #BB0000;'>1.07</span> 2.84<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 1</span> ## <span style='color: #555555;'>20</span> X3:X6 -<span style='color: #BB0000;'>8.00</span> 0.001<span style='text-decoration: underline;'>80</span> -<span style='color: #BB0000; text-decoration: underline;'>4</span><span style='color: #BB0000;'>436.</span> 0 <span style='color: #555555;'> </span> ## <span style='color: #555555;'>21</span> X4:X5 -<span style='color: #BB0000;'>0.076</span><span style='color: #BB0000; text-decoration: underline;'>0</span> 0.047<span style='text-decoration: underline;'>2</span> -<span style='color: #BB0000;'>1.61</span> 1.08<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 1</span> ## <span style='color: #555555;'>22</span> X4:X6 -<span style='color: #BB0000;'>1.40</span> 3.41 -<span style='color: #BB0000;'>0.411</span> 6.81<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 1</span> ## <span style='color: #555555;'>23</span> X5:X6 -<span style='color: #BB0000;'>0.042</span><span style='color: #BB0000; text-decoration: underline;'>1</span> 0.119 -<span style='color: #BB0000;'>0.355</span> 7.23<span style='color: #555555;'>e</span><span style='color: #BB0000;'>- 1</span> </CODE></PRE> ] --- # Produce 95% confidence intervals for the betas .pull-left[ We can obtain confidence intervals for the betas using `confint()` or as a tibble using `tidy()` and setting `conf.int=TRUE` ```r # confint(new_model) tidy(new_model, conf.int=TRUE) %>% select(term, contains("conf")) %>% print(n=23) ``` ] .pull-right[ <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 23 x 3</span> ## term conf.low conf.high ## <span style='color: #555555; font-style: italic;'><chr></span> <span style='color: #555555; font-style: italic;'><dbl></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'> 1</span> (Intercept) 50.5 303. ## <span style='color: #555555;'> 2</span> X1 119. 136. ## <span style='color: #555555;'> 3</span> X2 31.0 34.6 ## <span style='color: #555555;'> 4</span> X3 6.99 7.02 ## <span style='color: #555555;'> 5</span> X4 -<span style='color: #BB0000;'>58.0</span> -<span style='color: #BB0000;'>41.2</span> ## <span style='color: #555555;'> 6</span> X5 1.75 3.63 ## <span style='color: #555555;'> 7</span> X6 -<span style='color: #BB0000;'>52.9</span> 69.9 ## <span style='color: #555555;'> 8</span> I(X1^2) -<span style='color: #BB0000;'>0.830</span> 2.32 ## <span style='color: #555555;'> 9</span> I(X2^2) -<span style='color: #BB0000;'>0.013</span><span style='color: #BB0000; text-decoration: underline;'>8</span> 0.006<span style='text-decoration: underline;'>19</span> ## <span style='color: #555555;'>10</span> I(X3^2) -<span style='color: #BB0000;'>0.000</span><span style='color: #BB0000; text-decoration: underline;'>000</span><span style='color: #BB0000;'>518</span> 0.000<span style='text-decoration: underline;'>000</span>133 ## <span style='color: #555555;'>11</span> I(X4^2) -<span style='color: #BB0000;'>1.29</span> 1.86 ## <span style='color: #555555;'>12</span> I(X5^2) -<span style='color: #BB0000;'>32.0</span> -<span style='color: #BB0000;'>32.0</span> ## <span style='color: #555555;'>13</span> X1:X4 -<span style='color: #BB0000;'>4.17</span> 2.13 ## <span style='color: #555555;'>14</span> X1:X5 -<span style='color: #BB0000;'>0.015</span><span style='color: #BB0000; text-decoration: underline;'>8</span> 0.169 ## <span style='color: #555555;'>15</span> X1:X6 -<span style='color: #BB0000;'>5.11</span> 8.23 ## <span style='color: #555555;'>16</span> X2:X3 -<span style='color: #BB0000;'>0.000</span><span style='color: #BB0000; text-decoration: underline;'>057</span><span style='color: #BB0000;'>4</span> 0.000<span style='text-decoration: underline;'>177</span> ## <span style='color: #555555;'>17</span> X2:X5 1.00 1.01 ## <span style='color: #555555;'>18</span> X2:X6 -<span style='color: #BB0000;'>0.581</span> 0.500 ## <span style='color: #555555;'>19</span> X3:X5 -<span style='color: #BB0000;'>0.000</span><span style='color: #BB0000; text-decoration: underline;'>067</span><span style='color: #BB0000;'>9</span> 0.000<span style='text-decoration: underline;'>019</span>9 ## <span style='color: #555555;'>20</span> X3:X6 -<span style='color: #BB0000;'>8.00</span> -<span style='color: #BB0000;'>8.00</span> ## <span style='color: #555555;'>21</span> X4:X5 -<span style='color: #BB0000;'>0.169</span> 0.016<span style='text-decoration: underline;'>6</span> ## <span style='color: #555555;'>22</span> X4:X6 -<span style='color: #BB0000;'>8.09</span> 5.29 ## <span style='color: #555555;'>23</span> X5:X6 -<span style='color: #BB0000;'>0.275</span> 0.191 </CODE></PRE> ] --- # Creating the variables beforehand An advantage of creating the variables beforehand is that it simplifies the formula specification and your model output may look a bit cleaner. Consider the simple example: ```r demo_data1 <- t7 %>% select(Y, X3) %>% mutate(X3sq = X3^2) demo_model1 <- lm(Y ~ ., data=demo_data1) summary(demo_model1) ``` ``` ## ## Call: ## lm(formula = Y ~ ., data = demo_data1) ## ## Residuals: ## Min 1Q Median 3Q Max ## -3736905 -51045 85325 169189 971743 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -3.413e+05 2.515e+04 -13.57 <2e-16 *** ## X3 3.388e+01 2.506e+00 13.52 <2e-16 *** ## X3sq -9.323e-04 4.874e-05 -19.13 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 381900 on 1021 degrees of freedom ## Multiple R-squared: 0.367, Adjusted R-squared: 0.3658 ## F-statistic: 296 on 2 and 1021 DF, p-value: < 2.2e-16 ``` --- # Creating the variables beforehand The disadvantage occurs when making predictions. When new variables are created beforehand, their relationships are not retained. To make a prediction with the model where we squared `X3` beforehand, we need to supply a value for `X3` and `X3sq`. ```r predict( demo_model1, newdata = data.frame(X3=100, X3sq=100^2) ) ``` ``` ## 1 ## -337875.1 ``` Note that we could have used `broom::augment()` to obtain our predictions if we wanted our output to be a tibble. --- # Creating the variables beforehand However, if we create the variables in the formula specification, the relationships between the variables are retained. When predicting, we only need to supply a value for `X3`. The value of `X3^2` is automatically calculated by squaring the value of `X3`. ```r demo_model2 <- lm(Y ~ X3 + I(X3^2), data=demo_data1) predict( demo_model2, newdata = data.frame(X3=100) ) ``` ``` ## 1 ## -337875.1 ``` Note that we could have used `broom::augment()` to obtain our predictions if we wanted our output to be a tibble. --- # Creating training/testing splits Oftentimes, we may want to train our model on a portion of the available data, and evaluate its performance on the remaining unseen data. This can be accomplished in a tidy manner using the [rsample](https://rsample.tidymodels.org/) and [yardstick](https://yardstick.tidymodels.org/) packages from the [tidymodels](https://www.tidymodels.org/packages/) collection. An example workflow: ```r set.seed(100) # For reproducibility # Create an 80/20 split t7_split <- t7 %>% initial_split(prop=0.8) t7_train <- t7_split %>% training() t7_test <- t7_split %>% testing() main_model <- lm(Y ~ ., data=t7_train) main_model %>% augment(newdata = t7_test) %>% metrics(truth=Y, estimate=.fitted) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #555555;'># A tibble: 3 x 3</span> ## .metric .estimator .estimate ## <span style='color: #555555; font-style: italic;'><chr></span> <span style='color: #555555; font-style: italic;'><chr></span> <span style='color: #555555; font-style: italic;'><dbl></span> ## <span style='color: #555555;'>1</span> rmse standard <span style='text-decoration: underline;'>197</span>221. ## <span style='color: #555555;'>2</span> rsq standard 0.860 ## <span style='color: #555555;'>3</span> mae standard <span style='text-decoration: underline;'>126</span>751. </CODE></PRE>