class: language-r
layout: true

---

# Lab 3 — October 4

```r
library(tidyverse)
```

```
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
```

```
## v ggplot2 3.3.6     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.9
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
```

```
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
```

```r
library(lubridate)
```

```
## 
## Attaching package: 'lubridate'
```

```
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
```

---

# Soft-drink production data

- From 1950 to 1977, soft-drink production was reported on a quarterly basis
- From 1976 to 1995, soft-drink production was reported on a monthly basis
- We want to expand the quarterly production data by converting our monthly data into quarterly data, and appending it to the end of the original quarterly data
- The final result should be a data set that ranges from 1950 to 1995 with production values being reported each quarter

---

# Obtain quarterly soft-drinks data set

```r
if (!dir.exists("./data")) {
  dir.create("./data")
}
download.file("https://www150.statcan.gc.ca/n1/tbl/csv/16100100-eng.zip", destfile="./data/qdrinks.zip")
unzip("./data/qdrinks.zip", exdir="./data")
```

```r
file.rename("./data/16100100.csv", "./data/qdrinks.csv")
```

```r
qdrinks <- read_csv("./data/qdrinks.csv")
```

```
## Rows: 112 Columns: 15
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (6): REF_DATE, GEO, Standard Classification of Goods (SCG), UOM, SCALAR_...
## dbl (5): UOM_ID, SCALAR_ID, COORDINATE, VALUE, DECIMALS
## lgl (4): DGUID, STATUS, SYMBOL, TERMINATED
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

---

# Inspect quarterly drinks data

```r
qdrinks %>%
  select(REF_DATE, VALUE)
```

```
## # A tibble: 112 x 2
##    REF_DATE VALUE
##    <chr>    <dbl>
##  1 1950-01  19349
##  2 1950-04  29730
##  3 1950-07  31721
##  4 1950-10  20045
##  5 1951-01  17398
##  6 1951-04  25893
##  7 1951-07  28477
##  8 1951-10  19923
##  9 1952-01  19248
## 10 1952-04  26232
## # ... with 102 more rows
```

- The `REF_DATE` column is in the format yyyy-mm
- We want to convert it to a full date format, yyyy-mm-dd
- The lubridate package contains functions to help us parse text into dates and datetimes
- Since our values are in the format yyyy-mm, we use the `ym()` function

---

# Modify quarterly drinks data

```r
quarterly <- qdrinks %>%
  select(REF_DATE, VALUE) %>%
  rename(date = REF_DATE, quarterly_value = VALUE) %>%
  mutate(date = ym(date))
quarterly
```

```
## # A tibble: 112 x 2
##    date       quarterly_value
##    <date>               <dbl>
##  1 1950-01-01           19349
##  2 1950-04-01           29730
##  3 1950-07-01           31721
##  4 1950-10-01           20045
##  5 1951-01-01           17398
##  6 1951-04-01           25893
##  7 1951-07-01           28477
##  8 1951-10-01           19923
##  9 1952-01-01           19248
## 10 1952-04-01           26232
## # ... with 102 more rows
```

---

# Obtain monthly soft-drinks data set

```r
download.file("https://www150.statcan.gc.ca/n1/tbl/csv/16100099-eng.zip", destfile="./data/mdrinks.zip")
unzip("./data/mdrinks.zip", exdir="./data")
```

```r
file.rename("./data/16100099.csv", "./data/mdrinks.csv")
```

```r
mdrinks <- read_csv("./data/mdrinks.csv")
```

```
## Rows: 240 Columns: 15
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (6): REF_DATE, GEO, Standard Classification of Goods (SCG), UOM, SCALAR_...
## dbl (5): UOM_ID, SCALAR_ID, COORDINATE, VALUE, DECIMALS
## lgl (4): DGUID, STATUS, SYMBOL, TERMINATED
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

---

# Inspect monthly drinks data

```r
mdrinks %>%
  select(REF_DATE, VALUE)
```

```
## # A tibble: 240 x 2
##    REF_DATE VALUE
##    <chr>    <dbl>
##  1 1976-01  20680
##  2 1976-02  23392
##  3 1976-03  21553
##  4 1976-04  24304
##  5 1976-05  27791
##  6 1976-06  32838
##  7 1976-07  32475
##  8 1976-08  32503
##  9 1976-09  28990
## 10 1976-10  23988
## # ... with 230 more rows
```

- The `REF_DATE` column is in the format yyyy-mm
- We want to convert it to a full date format, yyyy-mm-dd
- The lubridate package contains functions to help us parse text into dates and datetimes
- Since our values are in the format yyyy-mm, we use the `ym()` function

---

# Modify monthly drinks data

```r
monthly <- mdrinks %>%
  select(REF_DATE, VALUE) %>%
  rename(date = REF_DATE, monthly_value = VALUE) %>%
  mutate(date = ym(date))
monthly
```

```
## # A tibble: 240 x 2
##    date       monthly_value
##    <date>             <dbl>
##  1 1976-01-01         20680
##  2 1976-02-01         23392
##  3 1976-03-01         21553
##  4 1976-04-01         24304
##  5 1976-05-01         27791
##  6 1976-06-01         32838
##  7 1976-07-01         32475
##  8 1976-08-01         32503
##  9 1976-09-01         28990
## 10 1976-10-01         23988
## # ... with 230 more rows
```

---

# Convert monthly data to quarterly data

Starting with the `monthly` data set, create a new variable called `qdate` which classifies the `date` by its corresponding quarter.

```r
monthly %>%
  mutate(qdate = quarter(date, with_year = TRUE))
```

```
## # A tibble: 240 x 3
##    date       monthly_value qdate
##    <date>             <dbl> <dbl>
##  1 1976-01-01         20680 1976.
##  2 1976-02-01         23392 1976.
##  3 1976-03-01         21553 1976.
##  4 1976-04-01         24304 1976.
##  5 1976-05-01         27791 1976.
##  6 1976-06-01         32838 1976.
##  7 1976-07-01         32475 1976.
##  8 1976-08-01         32503 1976.
##  9 1976-09-01         28990 1976.
## 10 1976-10-01         23988 1976.
## # ... with 230 more rows
```

---
count: false

# Convert monthly data to quarterly data

Group the rows by their resulting quarters and sum the monthly production values by group, resulting in quarterly production values.

```r
monthly %>%
  mutate(qdate = quarter(date, with_year = TRUE)) %>%
  group_by(qdate) %>%
  summarise(quarterly_value = sum(monthly_value))
```

```
## # A tibble: 80 x 2
##    qdate quarterly_value
##    <dbl>           <dbl>
##  1 1976.           65625
##  2 1976.           84933
##  3 1976.           93968
##  4 1976.           78708
##  5 1977.           73865
##  6 1977.           92938
##  7 1977.           91812
##  8 1977.           76821
##  9 1978.           71130
## 10 1978.           92965
## # ... with 70 more rows
```

---
count: false

# Convert monthly data to quarterly data

```r
monthly %>%
  mutate(qdate = quarter(date, with_year = TRUE)) %>%
  group_by(qdate) %>%
  summarise(quarterly_value = sum(monthly_value))
```

```
## # A tibble: 80 x 2
##    qdate quarterly_value
##    <dbl>           <dbl>
##  1 1976.           65625
##  2 1976.           84933
##  3 1976.           93968
##  4 1976.           78708
##  5 1977.           73865
##  6 1977.           92938
##  7 1977.           91812
##  8 1977.           76821
##  9 1978.           71130
## 10 1978.           92965
## # ... with 70 more rows
```

We now have one row for each quarter for each year. Note that after summarising, the last grouping is automatically dropped. Since we only grouped on the values of one variable, the resulting data is fully ungrouped. Therefore, we do not need to call `ungroup()`.

---
count: false

# Convert monthly data to quarterly data

Finally, we convert our quarterly dates (yyyy.q) back to full dates (yyyy-mm-dd).

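As a small illustration of this round trip, here is a sketch on three made-up dates (not the lab data). It assumes the same `quarter(..., with_year = TRUE)` call used in these slides; the names `d`, `q` and `back` are introduced only for this example, and `floor_date()` appears purely as a cross-check.

```r
library(lubridate)

# Three made-up dates, used only for illustration
d <- ymd(c("1976-01-01", "1976-05-01", "1976-12-01"))

# Year and quarter encoded as a number of the form yyyy.q
q <- quarter(d, with_year = TRUE)
q

# yq() parses the yyyy.q values back to the first day of each quarter
back <- yq(q)
back

# Cross-check: round each date down to the start of its quarter
floor_date(d, unit = "quarter")
```

Both approaches should give the first day of the quarter that contains each date.
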
```r
monthly_to_quarterly <- monthly %>%
  mutate(qdate = quarter(date, with_year = TRUE)) %>%
  group_by(qdate) %>%
  summarise(quarterly_value = sum(monthly_value)) %>%
  mutate(date = yq(qdate)) %>%
  select(date, quarterly_value)
monthly_to_quarterly
```

```
## # A tibble: 80 x 2
##    date       quarterly_value
##    <date>               <dbl>
##  1 1976-01-01           65625
##  2 1976-04-01           84933
##  3 1976-07-01           93968
##  4 1976-10-01           78708
##  5 1977-01-01           73865
##  6 1977-04-01           92938
##  7 1977-07-01           91812
##  8 1977-10-01           76821
##  9 1978-01-01           71130
## 10 1978-04-01           92965
## # ... with 70 more rows
```

---

# Append the converted data to the quarterly data

Since we are appending our converted data to the end of the original quarterly data, we need to make sure that the dates don't overlap (otherwise some dates will have multiple production values).

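To see why the overlap matters, here is a minimal sketch on two made-up tables. The names `old_q`, `new_q`, `overlap` and `appended`, and all of the values, are hypothetical and stand in for the lab data only loosely.

```r
library(dplyr)

# Toy stand-ins for the two quarterly tables (values are made up)
old_q <- tibble(
  date = as.Date(c("1977-04-01", "1977-07-01", "1977-10-01")),
  quarterly_value = c(10, 20, 30)
)
new_q <- tibble(
  date = as.Date(c("1977-07-01", "1977-10-01", "1978-01-01")),
  quarterly_value = c(21, 31, 41)
)

# Dates present in both tables would appear twice after bind_rows()
overlap <- semi_join(new_q, old_q, by = "date")
overlap

# Keeping only dates after the last date of the first table removes the overlap
appended <- bind_rows(old_q, filter(new_q, date > max(old_q$date)))
appended

# Every date now appears exactly once
anyDuplicated(appended$date) == 0
```

In this toy case two dates overlap, and after filtering the appended table has four rows with no repeated dates.
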
```r
quarterly_end_date <- quarterly %>%
  slice_max(date) %>%
  pull(date)

monthly_to_quarterly <- monthly_to_quarterly %>%
  filter(date > quarterly_end_date)
```

```r
full_quarterly <- bind_rows(quarterly, monthly_to_quarterly)
full_quarterly
```

```
## # A tibble: 184 x 2
##    date       quarterly_value
##    <date>               <dbl>
##  1 1950-01-01           19349
##  2 1950-04-01           29730
##  3 1950-07-01           31721
##  4 1950-10-01           20045
##  5 1951-01-01           17398
##  6 1951-04-01           25893
##  7 1951-07-01           28477
##  8 1951-10-01           19923
##  9 1952-01-01           19248
## 10 1952-04-01           26232
## # ... with 174 more rows
```

---

# Check our work

Our data should range from 1950 to 1995 (46 years), with four quarters for each year. Therefore, we should have 46 * 4 = 184 rows.

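The expected row count can also be derived from the dates themselves. This is a minimal sketch, assuming the series should contain every quarter from the first quarter of 1950 to the last quarter of 1995. The name `expected` is introduced here for illustration, and the comparison only runs if `full_quarterly` from the earlier slides is in the workspace.

```r
# Expected quarter start dates from 1950-01-01 through 1995-10-01
expected <- seq(as.Date("1950-01-01"), as.Date("1995-10-01"), by = "quarter")

# Number of quarters in the expected range: 46 * 4 = 184
length(expected)

# If the combined table exists, check that every expected quarter is present
if (exists("full_quarterly")) {
  print(all(expected %in% full_quarterly$date))
}
```

A result of `FALSE` in the last step would point to a missing quarter rather than just a wrong row count.
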
```r
full_quarterly %>%
  slice_min(date)
```

```
## # A tibble: 1 x 2
##   date       quarterly_value
##   <date>               <dbl>
## 1 1950-01-01           19349
```

```r
full_quarterly %>%
  slice_max(date)
```

```
## # A tibble: 1 x 2
##   date       quarterly_value
##   <date>               <dbl>
## 1 1995-10-01          110997
```

```r
full_quarterly %>%
  distinct(date) %>%
  nrow()
```

```
## [1] 184
```

---

# Fit a simple linear regression model

We want to create a simple linear regression model with `quarterly_value` as the response and `date` as the predictor.

```r
production <- lm(quarterly_value ~ date, data=full_quarterly)
```

---

# What are the estimated coefficients?

We can get the coefficients by using either:

```r
coef(production)
```

```
##  (Intercept)         date 
## 68275.298497     7.541153
```

```r
summary(production)
```

```
## 
## Call:
## lm(formula = quarterly_value ~ date, data = full_quarterly)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -50506  -9466  -1279   8010  46612 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.828e+04  1.089e+03   62.72   <2e-16 ***
## date        7.541e+00  2.194e-01   34.38   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14430 on 182 degrees of freedom
## Multiple R-squared:  0.8665,    Adjusted R-squared:  0.8658 
## F-statistic:  1182 on 1 and 182 DF,  p-value: < 2.2e-16
```

---
count: false

# What are the estimated coefficients?

.pull-left[

```r
summary(production)
```

```
## 
## Call:
## lm(formula = quarterly_value ~ date, data = full_quarterly)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -50506  -9466  -1279   8010  46612 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.828e+04  1.089e+03   62.72   <2e-16 ***
## date        7.541e+00  2.194e-01   34.38   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14430 on 182 degrees of freedom
## Multiple R-squared:  0.8665,    Adjusted R-squared:  0.8658 
## F-statistic:  1182 on 1 and 182 DF,  p-value: < 2.2e-16
```

]

.pull-right[

- `\(\widehat{\beta}_{0} = 68275\)`
- `\(\widehat{\beta}_{1} = 7.541\)`

The equation of the fitted line is:

`$$\widehat{y}_{i} \,=\, 68275 \,+\, 7.541x_{i}$$`

]

---

# What are the standard errors of our estimates?

.pull-left[

```r
summary(production)
```

```
## 
## Call:
## lm(formula = quarterly_value ~ date, data = full_quarterly)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -50506  -9466  -1279   8010  46612 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.828e+04  1.089e+03   62.72   <2e-16 ***
## date        7.541e+00  2.194e-01   34.38   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14430 on 182 degrees of freedom
## Multiple R-squared:  0.8665,    Adjusted R-squared:  0.8658 
## F-statistic:  1182 on 1 and 182 DF,  p-value: < 2.2e-16
```

]

.pull-right[

- The standard error of `\(\widehat{\beta}_{0}\)` is `\(1.089 \times 10^{3}\)`
- The standard error of `\(\widehat{\beta}_{1}\)` is `\(0.219\)`

]

---

# What are the degrees of freedom of our model?

.pull-left[

```r
summary(production)
```

```
## 
## Call:
## lm(formula = quarterly_value ~ date, data = full_quarterly)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -50506  -9466  -1279   8010  46612 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.828e+04  1.089e+03   62.72   <2e-16 ***
## date        7.541e+00  2.194e-01   34.38   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14430 on 182 degrees of freedom
## Multiple R-squared:  0.8665,    Adjusted R-squared:  0.8658 
## F-statistic:  1182 on 1 and 182 DF,  p-value: < 2.2e-16
```

]

.pull-right[

- The error (residual) degrees of freedom are 182
- They are computed as `\(n-p-1\)`, where `\(n\)` is the number of observations in our data and `\(p\)` is the number of non-intercept parameters estimated
- There are 184 data points and one non-intercept parameter estimated, so:

`$$184 - 1 - 1 = 182$$`

]

---

# Confidence intervals for our estimates

We can obtain confidence intervals for our estimates by passing our model to the `confint()` function. By default, the confidence level is set to 95%.

```r
confint(production)
```

```
##                    2.5 %       97.5 %
## (Intercept) 66127.423443 70423.173551
## date            7.108313     7.973994
```

The 95% confidence interval for `\(\beta_{0}\)` is `\((66127,\, 70423)\)`.

The 95% confidence interval for `\(\beta_{1}\)` is `\((7.108,\, 7.974)\)`.
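
The intervals above can be reproduced by hand from the reported estimates, standard errors and the 182 error degrees of freedom. The sketch below types in the rounded numbers from the slides, so it should agree with `confint()` only up to rounding. The names `est`, `se`, `t_crit` and `ci` are introduced here for illustration. It also shows that R stores a `Date` as the number of days since 1970-01-01, which is the scale `lm()` uses for `date`, so the slope is a change per day. The 365.25 days per year used for rescaling is an approximation.

```r
# Estimates and standard errors as reported on the slides
est <- c("(Intercept)" = 68275.298497, date = 7.541153)
se  <- c("(Intercept)" = 1.089e+03,    date = 2.194e-01)

# Critical value of the t distribution with n - p - 1 = 182 degrees of freedom
t_crit <- qt(0.975, df = 182)

# 95% interval: estimate plus or minus t_crit standard errors
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
ci

# A Date is stored as days since 1970-01-01, so this prints 1
as.numeric(as.Date("1970-01-02"))

# Approximate slope per year instead of per day
unname(est["date"]) * 365.25
```

Read this way, the intercept is the fitted value at 1970-01-01 rather than at the start of the series.
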