Data on the production and sales of different types of greenhouse flowers and plants have been collected annually since 2007 for both Canada as a whole and its individual provinces (the territories are not included).
More information on this record can be found here.
The data and accompanying metadata can be downloaded together as a .zip file from here.
For these demos, I will be storing all data files in a folder called data, with subfolders for each topic.
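# Create the target folder (and any missing parent folders) if needed
if (!dir.exists("./data/flowers")) {
  dir.create("./data/flowers", recursive = TRUE)
}
# Download the zipped table and metadata
download.file(
  "https://www150.statcan.gc.ca/n1/tbl/csv/32100246-eng.zip",
  destfile = "./data/flowers/flowers.zip"
)
# Extract the archive into the same folder
unzip(
  "./data/flowers/flowers.zip",
  exdir = "./data/flowers"
)
# List the extracted CSV files
list.files("./data/flowers", pattern = "csv")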
## [1] "32100246.csv" "32100246_MetaData.csv"
## Rows: 1,540
## Columns: 16
## $ REF_DATE <dbl> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2~
## $ GEO <chr> "Canada", "Canada", "Canada", "Canada", "Canada",~
## $ DGUID <chr> "2016A000011124", "2016A000011124", "2016A0000111~
## $ `Flowers and plants` <chr> "Total potted plants [1151431 + 1151432]", "Total~
## $ Output <chr> "Production (number)", "Sales", "Production (numb~
## $ UOM <chr> "Number", "Dollars", "Number", "Dollars", "Number~
## $ UOM_ID <dbl> 223, 81, 223, 81, 223, 81, 223, 81, 223, 81, 223,~
## $ SCALAR_FACTOR <chr> "units", "units", "units", "units", "units", "uni~
## $ SCALAR_ID <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ VECTOR <chr> "v52221087", "v52221098", "v52221109", "v52221120~
## $ COORDINATE <chr> "1.1.1", "1.1.2", "1.2.1", "1.2.2", "1.3.1", "1.3~
## $ VALUE <dbl> 137778939, 665908000, 139391139, 131074000, 18666~
## $ STATUS <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ SYMBOL <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ TERMINATED <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ DECIMALS <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
The variables of interest are:
Variable | Description |
---|---|
REF_DATE | Year of record |
GEO | Country or province |
VALUE | Number of plants produced or monetary value of plants sold |
Output | Whether VALUE corresponds to production (number) or sales (dollars) |
Flowers and plants | Flower and plant types |
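The code that follows refers to columns named year, location, type, output, and value, so the raw columns need to be renamed at some point. The snippet below is only a sketch of one way to do that, using `select()` from dplyr to keep and rename the variables of interest in one step. The toy table `flowers_raw`, the name `flowers_demo`, and the values are assumptions made for illustration; the real table would be read from the downloaded CSV file.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the raw table; in practice it would be read from
# "./data/flowers/32100246.csv" (for example with readr::read_csv()).
flowers_raw <- tibble(
  REF_DATE = c(2007, 2007),
  GEO = c("Canada", "Canada"),
  `Flowers and plants` = c("Total cut flowers", "Total cut flowers"),
  Output = c("Production (number)", "Sales"),
  VALUE = c(100, 250)  # made-up values, for illustration only
)

# select() keeps only the listed columns and renames them in one step.
flowers_demo <- flowers_raw %>%
  select(
    year = REF_DATE,
    location = GEO,
    type = `Flowers and plants`,
    output = Output,
    value = VALUE
  )

names(flowers_demo)
```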
flower_sales_can <- flowers %>%
  filter(location == "Canada", output == "Sales")
ggplot(flower_sales_can, aes(x = year, y = value, colour = type)) +
  geom_line()
We can overwrite the existing flowers data with these changes since we'll want to use the cleaned data again later. We'll remove the word Total and the space that follows it, the space before the square brackets, and the square brackets along with all of their contents. Finally, we will capitalise the first letter of each flower type.
flowers <- flowers %>%
  mutate(
    type = str_remove(type, pattern = "Total\\s"),
    type = str_remove(type, pattern = "\\s\\[.*\\]"),
    type = str_to_sentence(type)
  )
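To see what each step does, the same three calls can be applied to a single raw label. This is a small illustration only; the label is copied from the column listing shown earlier, and the names `raw_label` and `step1` to `step3` are made up for the example.

```r
library(stringr)

# One raw label, as it appears in the `Flowers and plants` column.
raw_label <- "Total potted plants [1151431 + 1151432]"

# Drop the word "Total" and the space that follows it.
step1 <- str_remove(raw_label, pattern = "Total\\s")
# Drop the space before the brackets, the brackets, and their contents.
step2 <- str_remove(step1, pattern = "\\s\\[.*\\]")
# Capitalise the first letter.
step3 <- str_to_sentence(step2)

step3
## [1] "Potted plants"
```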
We can check our work using:
flowers %>%
  distinct(type)
## # A tibble: 5 x 1
## type
## <chr>
## 1 Potted plants
## 2 Cuttings
## 3 Cut flowers
## 4 Ornamental bedding plants
## 5 Vegetable bedding plants
Now, to re-obtain the data we originally had:
flower_sales_can <- flowers %>%
  filter(location == "Canada", output == "Sales")
We can avoid the scientific notation on the y-axis by simply scaling the values.
flower_sales_can <- flower_sales_can %>%
  mutate(value = value / 1e6)
Now our y-axis will have the unit of millions of dollars rather than single dollars.
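Because the numbers on the axis no longer say what unit they are in, it can help to state the unit in the axis title. The following is a minimal sketch on toy data; `sales_demo`, `p_sales`, the axis title wording, and the values are assumptions for illustration, not the real sales figures.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Toy sales figures in dollars (made-up values, for illustration only).
sales_demo <- tibble(
  year = c(2019, 2020, 2019, 2020),
  type = c("Cut flowers", "Cut flowers", "Cuttings", "Cuttings"),
  value = c(120e6, 135e6, 40e6, 45e6)
) %>%
  mutate(value = value / 1e6)  # dollars -> millions of dollars

# State the new unit in the y-axis title.
p_sales <- ggplot(sales_demo, aes(x = year, y = value, colour = type)) +
  geom_line() +
  labs(y = "Sales (millions of dollars)")
```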
We can remove the legend entirely and just label the lines directly on the plot using gghighlight(). gghighlight() highlights five levels by default. Coincidentally, we have exactly five lines!
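A rough sketch of this idea on toy data is shown below. It assumes the gghighlight package is installed and that calling `gghighlight()` without a predicate keeps every line highlighted and labels it directly; `lines_demo`, `p_labelled`, and the values are made up for illustration.

```r
library(ggplot2)
library(gghighlight)
library(tibble)

# Toy data with two lines (made-up values, for illustration only).
lines_demo <- tibble(
  year = c(2018, 2019, 2020, 2018, 2019, 2020),
  type = c("Cut flowers", "Cut flowers", "Cut flowers",
           "Cuttings", "Cuttings", "Cuttings"),
  value = c(110, 120, 135, 38, 40, 45)
)

# Adding gghighlight() to a line plot replaces the legend with direct labels.
p_labelled <- ggplot(lines_demo, aes(x = year, y = value, colour = type)) +
  geom_line() +
  gghighlight()
```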
flower_prod_can <- flowers %>%
  filter(year == 2020, location == "Canada", output == "Production (number)")
ggplot(flower_prod_can, aes(x = type, y = value)) +
  geom_col()
flower_prod_can <- flower_prod_can %>%
  mutate(type = fct_reorder(type, value, .desc = TRUE))
We can avoid the scientific notation on the y-axis by once again scaling the values.
flower_prod_can <- flower_prod_can %>%
  mutate(value = value / 1e6)
Now our y-axis will have the unit of millions of units rather than single units.
When making plots for webpages, we can make them as wide as we want. For print, we may be limited to the printing margins. However, in reducing the width of the plot,
The solution here is to flip the axes: put the plant types on the y-axis and their counts on the x-axis.
If we are making a horizontal bar plot and we want the values to be sorted from largest to smallest, we would use the default .desc=FALSE in fct_reorder().
flower_prod_can <- flower_prod_can %>%
  mutate(type = fct_reorder(type, value))
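The reason the default works for horizontal bars is that ggplot2 draws the first factor level at the bottom of a discrete y-axis, so ascending order places the largest value at the top. The toy example below compares the level order produced by the two settings; the vectors `type_demo` and `value_demo` and their values are made up for illustration.

```r
library(forcats)

# Toy flower types and values (made-up, for illustration only).
type_demo <- c("Cut flowers", "Cuttings", "Potted plants")
value_demo <- c(2, 3, 1)

# Descending: largest first, which suits a vertical bar plot read left to right.
levels_desc <- levels(fct_reorder(type_demo, value_demo, .desc = TRUE))
levels_desc
## [1] "Cuttings"      "Cut flowers"   "Potted plants"

# Default (ascending): smallest first, so the largest ends up at the top
# of a horizontal bar plot.
levels_asc <- levels(fct_reorder(type_demo, value_demo))
levels_asc
## [1] "Potted plants" "Cut flowers"   "Cuttings"
```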
Since ggplot2 v3.3.0, we no longer need to build a vertical bar plot and add coord_flip() to it. We can now build a horizontal bar plot by supplying the values to the x-aesthetic and the flower types to the y-aesthetic.
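A minimal sketch of such a horizontal bar plot, on toy data, might look like the following. The names `prod_demo` and `p_bars`, the axis title, and the values are assumptions for illustration; with the real data, `flower_prod_can` would take the place of `prod_demo`.

```r
library(dplyr)
library(forcats)
library(ggplot2)
library(tibble)

# Toy production counts (made-up values, for illustration only).
prod_demo <- tibble(
  type = c("Cut flowers", "Cuttings", "Potted plants"),
  value = c(2e6, 3e6, 1e6)
) %>%
  mutate(
    value = value / 1e6,             # units -> millions of units
    type = fct_reorder(type, value)  # ascending, so the largest bar is on top
  )

# Values go on the x-aesthetic and flower types on the y-aesthetic;
# no coord_flip() is needed.
p_bars <- ggplot(prod_demo, aes(x = value, y = type)) +
  geom_col() +
  labs(x = "Production (millions of units)", y = NULL)
```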