Stat 4604: Lab 1

Adam Shen

October 5, 2022

Slide navigation

  • Type ? to bring up the presentation menu

  • Press o for the presentation overview (and jump to a specific slide)

  • To export slides to PDF, press e to activate PDF export mode, then print to PDF (note that some features may not function properly when exported to a PDF)

  • To copy a code block, hover over the top-right corner of the code block and click the clipboard icon

Contact

Name: Adam Shen

Email: firstnamelastname@cmail.carleton.ca

MTC Hours: Wednesdays 2pm to 4pm

Resources for learning R

If you do not feel very comfortable with the R language, I recommend working through Hands on Programming with R, a free online book written by the folks over at RStudio.

  • Chapters 2 through 7 for R basics, such as writing functions, R objects, subsetting

  • Chapter 9 for conditional logic

  • Chapter 11 for loops

Distribution of the day

Today, we will be working with a Pareto distribution, whose pdf is given by:

f(x) = \frac{\alpha 3^{\alpha}}{x^{\alpha + 1}}, \quad x > 3, \quad \alpha > 1,

and zero otherwise.

Task: Find the cdf and the quantile function.

Solution (pdf)

Write a function for the pdf

Note: shape refers to \alpha.

dpareto <- function(x, shape, scale=3) {
  if (scale <= 0) {
    stop("Value of `scale` must be greater than 0.")
  }
}

Write a function for the pdf

Note: shape refers to \alpha.

dpareto <- function(x, shape, scale=3) {
  if (scale <= 0) {
    stop("Value of `scale` must be greater than 0.")
  }
  
  if (shape <= 1) {
    stop("Value of `shape` must be greater than 1.")
  }
}

Write a function for the pdf

Note: shape refers to \alpha.

dpareto <- function(x, shape, scale=3) {
  if (scale <= 0) {
    stop("Value of `scale` must be greater than 0.")
  }
  
  if (shape <= 1) {
    stop("Value of `shape` must be greater than 1.")
  }
  
  ifelse(x <= scale, 0, (shape * scale^(shape)) / (x^(shape + 1)))
}

Here, I use the ifelse function because I want dpareto to be vectorized over x, i.e. if the shape criteria has been satisfied, I can supply a vector of values for x and the function should return a vector of the same length.

You do not need a loop for this!

Test dpareto

dpareto(x=4, shape=0.5)
Error in dpareto(x = 4, shape = 0.5): Value of `shape` must be greater than 1.
dpareto(x=seq(2.5, 3.5, 0.1), shape=3)
 [1] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.8770781
 [8] 0.7724762 0.6830135 0.6061350 0.5397751

Write a function for the cdf

Note: shape refers to \alpha.

ppareto <- function(q, shape, scale=3) {
  if (scale <= 0) {
    stop("Value of `scale` must be greater than 0.")
  }
}

Write a function for the cdf

Note: shape refers to \alpha.

ppareto <- function(q, shape, scale=3) {
  if (scale <= 0) {
    stop("Value of `scale` must be greater than 0.")
  }
  
  if (shape <= 1) {
    stop("Value of `shape` must be greater than 1.")
  }
}

Write a function for the cdf

Note: shape refers to \alpha.

ppareto <- function(q, shape, scale=3) {
  if (scale <= 0) {
    stop("Value of `scale` must be greater than 0.")
  }
  
  if (shape <= 1) {
    stop("Value of `shape` must be greater than 1.")
  }
  
  ifelse(q <= scale, 0, 1 - (scale / q)^shape)
}

Test ppareto

ppareto(q=4, shape=0.5)
Error in ppareto(q = 4, shape = 0.5): Value of `shape` must be greater than 1.
ppareto(q=seq(2.5, 3.5, 0.1), shape=3)
 [1] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 [7] 0.09368601 0.17602539 0.24868520 0.31304702 0.37026239

Write a function for the quantile function

Note: shape refers to \alpha.

qpareto <- function(p, shape, scale=3) {
  if (scale <= 0) {
    stop("Value of `scale` must be greater than 0.")
  }
}

Write a function for the quantile function

Note: shape refers to \alpha.

qpareto <- function(p, shape, scale=3) {
  if (scale <= 0) {
    stop("Value of `scale` must be greater than 0.")
  }
  
  if (shape <= 1) {
    stop("Value of `shape` must be greater than 1.")
  }
}

Write a function for the quantile function

Note: shape refers to \alpha.

qpareto <- function(p, shape, scale=3) {
  if (scale <= 0) {
    stop("Value of `scale` must be greater than 0.")
  }
  
  if (shape <= 1) {
    stop("Value of `shape` must be greater than 1.")
  }
  
  if (any(p <= 0 | p >= 1)) {
    stop("Values of `p` must be between 0 and 1.")
  }
}

Here, we need to use any because the condition p <= 0 | p >= 1 returns a vector greater than length 1. If we supply a logical condition with length greater than 1 to if, it will only use the first element.

any evaluates to TRUE if the condition inside contains at least one TRUE. Otherwise, it returns FALSE.

In the above, I have chosen to make p = 0 result in an error since F(q) = 0 for any q \leq 3. p = 1 will result in division by zero, and therefore should also raise an error.

Write a function for the quantile function

Note: shape refers to \alpha.

qpareto <- function(p, shape, scale=3) {
  if (scale <= 0) {
    stop("Value of `scale` must be greater than 0.")
  }
  
  if (shape <= 1) {
    stop("Value of `shape` must be greater than 1.")
  }
  
  if (any(p <= 0 | p >= 1)) {
    stop("Values of `p` must be between 0 and 1.")
  }
  
  scale / (1 - p)^(1 / shape)
}

Test qpareto

qpareto(p=0.5, shape=0.5)
Error in qpareto(p = 0.5, shape = 0.5): Value of `shape` must be greater than 1.
qpareto(p=-3, shape=3)
Error in qpareto(p = -3, shape = 3): Values of `p` must be between 0 and 1.
qpareto(p=c(0.25, 0.5, 0.75), shape=3)
[1] 3.301927 3.779763 4.762203
ppareto(q=6, shape=3)
[1] 0.875
ppareto(q=6, shape=3) |>
  qpareto(shape=3)
[1] 6

Inverse-transform method (outline)

We would like to sample from the given Pareto distribution using the inverse-transform method.

Outline of steps:

  1. Generate random deviates from Uniform(0,1).

  2. Pass random deviates to quantile function. Done!

Inverse-transform method (with symbols)

Let X_{1}, X_{2}, \ldots, X_{n} be an iid sample from the Pareto distribution (with scale parameter equal to 3). To generate this sample with the inverse-transform method:

  1. For i=1, \ldots, n, generate U_{i} where U_{i} \sim \text{Unif}(0,1).

  2. X_{i} \,=\, Q(U_{i}) \,=\, \frac{3}{(1-U_{i})^{1/\alpha}}

As seen in class, if U_{i} \sim \text{Unif}(0,1), then 1-U_{i} \sim \text{Unif}(0,1).

Therefore, it is also equivalent to perform:

X_{i} \,=\, Q(1-U_{i}) \,=\, \frac{3}{U_{i}^{1/\alpha}}

Write a Pareto sampler

rpareto <- function(n, shape, scale=3) {
  if (scale <= 0) {
    stop("Value of `scale` must be greater than 0.")
  }
}

Write a Pareto sampler

rpareto <- function(n, shape, scale=3) {
  if (scale <= 0) {
    stop("Value of `scale` must be greater than 0.")
  }
  
  if (shape <= 1) {
    stop("Value of `shape` must be greater than 1.")
  }
}

Write a Pareto sampler

rpareto <- function(n, shape, scale=3) {
  if (scale <= 0) {
    stop("Value of `scale` must be greater than 0.")
  }
  
  if (shape <= 1) {
    stop("Value of `shape` must be greater than 1.")
  }
  
  u <- runif(n)
  qpareto(u, shape=shape, scale=scale)
}

Test rpareto

rpareto(n=1, shape=0.5)
Error in rpareto(n = 1, shape = 0.5): Value of `shape` must be greater than 1.
set.seed(99)

rpareto(n=5, shape=3)
[1]  4.021037  3.123257  4.405686 15.332172  3.872294
set.seed(99)

rpareto(n=5, shape=3) |>
  ppareto(shape=3)
[1] 0.5847119 0.1137817 0.6842647 0.9925088 0.5349936

Check our work

Create population mean and variance functions

pop_mean <- function(shape, scale=3) {
  if (shape <= 1) {
    stop("Value of `shape` must be greater than 1 to avoid division by zero.")
  }
  
  (scale * shape) / (shape - 1)
}

pop_var <- function(shape, scale=3) {
  if (shape <= 2) {
    stop("Value of `shape` must be greater than 2 to avoid division by zero.")
  }
  
  (scale^2 * shape) / ((shape - 1)^2 * (shape - 2))
}

Generate sample

set.seed(25)

my_sample <- rpareto(n=10000, shape=3)

Compare values

pop_mean(shape=3)
[1] 4.5
mean(my_sample)
[1] 4.534096
pop_var(shape=3)
[1] 6.75
var(my_sample)
[1] 11.63279
qpareto(p=c(0.25, 0.5, 0.75), shape=3)
[1] 3.301927 3.779763 4.762203
quantile(my_sample, probs=c(0.25, 0.5, 0.75))
     25%      50%      75% 
3.304984 3.778027 4.758152 

Make a plot

x_vals <- seq(3.01, 250, 0.01)

hist(my_sample, breaks=20, freq=FALSE, main="Histogram of Pareto(3,3) sample", sub="Red line is the true density", xlab="x")
points(x=x_vals, y=dpareto(x_vals, shape=3), type="l", col="darkred")