class: center, middle, inverse, title-slide

.title[
# Continuous RV, Normal (Gaussian) Distribution
]
.subtitle[
## STA 032: Gateway to data science, Lecture 17
]
.author[
### Jingwei Xiong
]
.date[
### May 10, 2023
]

---

<style type="text/css">
.tiny .remark-code { font-size: 60%; }
.small .remark-code { font-size: 80%; }
</style>

## Recall: Continuous random variables

- We saw continuous random variables earlier and how they differ from discrete random variables

<img src="img/density.png" width="40%" />

- A probability distribution for a discrete random variable is called a **probability mass function**; for a continuous random variable it is a **probability density function**
- For a continuous random variable, the probability that the random variable takes on any exact value is zero. Instead, we think about probabilities over ranges of values.
- `\(P(a \leq X \leq b)\)` is the area under the density function between `\(a\)` and `\(b\)`.

---

## Normal Distribution

- The **normal distribution** is an example of a continuous distribution
- It is a very important distribution and one of the primary inferential tools in statistics
- Many natural phenomena approximately follow the normal distribution, such as weight, height, blood pressure, and annual rainfall
- The normal distribution is commonly called the *Gaussian distribution* after [Carl Friedrich Gauss](https://en.wikipedia.org/wiki/Carl_Friedrich_Gauss), who wrote down the equations governing it in the early 1800s.
- It is also sometimes referred to as a *bell curve*, although there are other distributions that are symmetric and shaped like a bell

---

## Illustration: Shoe sizes

- Shoe sizes are a nice setting for thinking about the Gaussian distribution
- Mickle et al. (2010, *Footwear Science*) showed the following bimodal distribution of shoe sizes in the US.

<img src="img/bimodalshoes.png" width="80%" />

Note that standard shoe sizes are discrete.

---

## Illustration: Shoe sizes

- Let `\(X\)` represent shoe size for wearers of men's shoes
- Here is a (hypothetical) probability distribution of shoe sizes of wearers of men's shoes.

<img src="lecture17_files/figure-html/unnamed-chunk-5-1.png" width="504" />

---

## Illustration: Shoe sizes

If we want to know the probability that a customer coming into a store wants a men's shoe size smaller than 9, we just add up the heights of the bars for shoe sizes 8.5 and smaller. We can do this for shoe sizes in any range and tabulate the full discrete distribution of shoe sizes.

<img src="lecture17_files/figure-html/unnamed-chunk-6-1.png" width="504" />

---

## Smaller Shoes

.pull-left[
```
 size probability
  5.5      0.0001
  6.0      0.0006
  6.5      0.0012
  7.0      0.0032
  7.5      0.0081
  8.0      0.0180
  8.5      0.0334
  9.0      0.0556
  9.5      0.0805
 10.0      0.1072
 10.5      0.1202
```
]
.pull-right[
```
 size probability
 11.0      0.1326
 11.5      0.1247
 12.0      0.1109
 12.5      0.0807
 13.0      0.0550
 13.5      0.0345
 14.0      0.0182
 14.5      0.0086
 15.0      0.0050
 15.5      0.0012
 16.0      0.0004
```
]

The probability of a random men's shoe wearer having a shoe size less than 9 in this population is 0.0646. What is the probability of shoe size 10-11.5?
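--

One way to answer this directly from the table (a quick sketch, assuming the range 10-11.5 is inclusive; the vector below copies the four relevant entries):

```r
# sum the tabulated probabilities for sizes 10, 10.5, 11, 11.5
sum(c(0.1072, 0.1202, 0.1326, 0.1247))
```

```
[1] 0.4847
```

Note this treats each listed size as a point mass; the continuous model introduced later spreads these masses out, so it gives a somewhat different value for the same question.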
---

## Moving to Continuous Distributions

- Now suppose we could get *really* well-fitting shoes, using quarter sizes (9, 9.25, 9.5, 9.75, ...) or even tenth sizes (9, 9.1, 9.2, ...), or shoes specifically made to fit your feet perfectly.
- As the number of sizes increases, the bars become narrower, and the graph approaches a smooth curve.
- We will use these smooth curves to describe the probability distributions of continuous random variables (e.g. a shoe size could be 9.50032)

.pull-left[
<img src="lecture17_files/figure-html/normal-1.png" width="90%" />
]
.pull-right[
This is a *probability density function*.
]

---

## Moving to Continuous Distributions

- The probability density function can be used to get the probability of any range of continuous shoe sizes we would like to investigate.

<img src="lecture17_files/figure-html/unnamed-chunk-7-1.png" width="504" />

For example, we can calculate the probability that a continuous shoe size is less than 9 (the shaded area).

---

## Moving to Continuous Distributions

<img src="lecture17_files/figure-html/unnamed-chunk-8-1.png" width="40%" />

- How do we find this area of interest?
- Calculus!

`$$P(a \leq X \leq b)=\text{area between } a \text{ and } b \text{ below the curve}=\int_a^b f(x)dx$$`

where `\(f(x)\)` represents the density curve

- In this course, we will use R
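---

## Moving to Continuous Distributions

R can compute such an area numerically. A minimal sketch (the `integrate()` call passes `mean` and `sd` through to `dnorm()`, the normal density function introduced on the next slides; the values 11 and 1.5 anticipate the shoe-size model used later):

```r
# area under a N(11, 1.5^2) density curve between sizes 10 and 11.5,
# obtained by numerically integrating the density function
integrate(dnorm, lower = 10, upper = 11.5, mean = 11, sd = 1.5)
# value: about 0.378, matching pnorm(11.5, 11, 1.5) - pnorm(10, 11, 1.5)
```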
---

## Normal Distribution

- The normal distribution is a **symmetric, bell-shaped** distribution
- It is characterized by the mean, `\(\mu\)`, and the standard deviation, `\(\sigma\)` (or variance, `\(\sigma^2\)`)
- For the normal distribution, the **density function** is given by

`$$f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}}$$`

- The notation `\(N(\mu,\sigma^2)\)` is often used.
- The normal distribution with mean 0 and standard deviation 1 is called the **standard normal distribution**. It is commonly denoted `\(Z \sim N(0, 1)\)`.

---

## Probability density function for Normal Distribution

- Like `dbinom()` and `dpois()`, `dnorm()` in R gives us the probability density function
- Here instead of `\(P(X = x)\)`, it is the **value of the probability density function**, `\(f(x)\)` on the previous slide, at the values that we input
- `dnorm()` has arguments `x`, `mean` and `sd`, where `mean` and `sd` are the mean and standard deviation of the normal distribution that we want
- **Remember that `\(P(X = x) = 0\)` for a continuous random variable**; the value that `dnorm()` gives us is not a probability but the height of the density function

---

## Probability density function for Normal Distribution

```r
dnorm(x = -3:3, mean = 0, sd = 1)
```

```
[1] 0.004431848 0.053990967 0.241970725 0.398942280 0.241970725 0.053990967
[7] 0.004431848
```

.small[
```r
data.frame(x = c(-3, 3)) %>%
  ggplot(aes(x)) +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1)) +
  labs(title = "Probability distribution of N(0, 1)", y = "f(x)")
```

<img src="lecture17_files/figure-html/unnamed-chunk-10-1.png" width="504" />
]

---

## Normal Distribution varying mean

- Which of the three distributions have means 0, 1, and 4?

<img src="lecture17_files/figure-html/unnamed-chunk-11-1.png" width="504" />

---

## Normal Distribution varying standard deviation

- Which has standard deviation 1, 2, and 4?

<img src="lecture17_files/figure-html/unnamed-chunk-12-1.png" width="504" />

---

## Calculating probabilities for the normal distribution

- We saw `pbinom()`, which gave us `\(P(X \leq x)\)` for the binomial distribution
- Similarly, `pnorm()` gives us `\(P(X \leq x)\)` for the normal distribution. The arguments are
  - `q`, the vector of quantiles ( `\(x\)` in `\(P(X \leq x)\)` ); note that you can input multiple values at once, hence "vector"
  - `mean`, the mean `\(\mu\)` (default value 0)
  - `sd`, the standard deviation `\(\sigma\)` (default value 1)

```r
pnorm(0)
```

```
[1] 0.5
```

---

## Calculating probabilities for our shoes example

Going back to our shoe size example, assume that men's shoe sizes follow a normal distribution with mean 11 and standard deviation 1.5, i.e., `\(N(\mu = 11,\sigma^2 = 1.5^2)\)`

What is the probability of shoe sizes less than 9?

```r
pnorm(9, mean = 11, sd = 1.5)
```

```
[1] 0.09121122
```

What is the probability of shoe sizes greater than 9?

```r
1 - pnorm(9, mean = 11, sd = 1.5)
```

```
[1] 0.9087888
```

---

## Calculating probabilities for our shoes example

What is the probability of shoe sizes less than 13?

```r
pnorm(13, mean = 11, sd = 1.5)
```

```
[1] 0.9087888
```

What is the probability of shoe size 10-11.5?

--

```r
pnorm(11.5, mean = 11, sd = 1.5) - pnorm(10, mean = 11, sd = 1.5)
```

```
[1] 0.3780661
```

---

## Probabilities between two values

<img src="img/stdnorm6.png" width="517" />

To get the probability that a random wearer of men's shoes would wear a size between 10 and 11.5, we take `pnorm(11.5, mean = 11, sd = 1.5) - pnorm(10, mean = 11, sd = 1.5)` to get the value 0.3780661.

---

## Sampling from the Normal distribution in R

- Just like with the Bernoulli, binomial and Poisson distributions, we can simulate random draws from the normal distribution using the `rnorm()` function
- `rnorm()` has the arguments `n, mean, sd`, where `n` is the number of draws from the distribution, `mean` is the mean and `sd` is the standard deviation.

```r
set.seed(0) # so results are reproducible
normalDraws <- rnorm(n = 100, mean = 0, sd = 1)
head(normalDraws, 20)
```

```
 [1]  1.262954285 -0.326233361  1.329799263  1.272429321  0.414641434
 [6] -1.539950042 -0.928567035 -0.294720447 -0.005767173  2.404653389
[11]  0.763593461 -0.799009249 -1.147657009 -0.289461574 -0.299215118
[16] -0.411510833  0.252223448 -0.891921127  0.435683299 -1.237538422
```

---

## Frequency distribution varying mean and sd

.small[
```r
set.seed(0) # so results are reproducible
normal1 <- rnorm(n = 5000, mean = 3, sd = 2)
normal2 <- rnorm(n = 5000, mean = 3, sd = 10)
normal3 <- rnorm(n = 5000, mean = 11, sd = 1.5) # shoe size distribution
```
]

<img src="lecture17_files/figure-html/unnamed-chunk-21-1.png" width="60%" />

---

## Frequency distribution varying mean and sd

.small[
```r
set.seed(0) # so results are reproducible
normal1 <- rnorm(n = 5000, mean = 3, sd = 2)
normal2 <- rnorm(n = 5000, mean = 3, sd = 10)
normal3 <- rnorm(n = 5000, mean = 11, sd = 1.5)
```
]

<img src="lecture17_files/figure-html/unnamed-chunk-23-1.png" width="60%" />

---

## More about the standard normal distribution

- Recall: standard normal distribution `\(Z \sim N(0, 1)\)`
- A normally distributed random variable can be expressed as a standard normal by **subtracting the mean and dividing by the standard deviation**; this process is called **standardization**
  - `\(Y \sim N(\mu, \sigma^2)\)`
  - `\(Z = \frac{Y - \mu}{\sigma}\)`
  - `\(E\left(\frac{Y - \mu}{\sigma}\right) = \frac{1}{\sigma}[E(Y) - \mu] = 0\)`
  - `\(Var\left(\frac{Y - \mu}{\sigma}\right) = \frac{1}{\sigma^2}[Var(Y)] = \frac{1}{\sigma^2}[\sigma^2] = 1\)`
- What we are essentially doing is **moving the location** (mean moves to 0) and **changing the scale** (standard deviation becomes 1)
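---

## More about the standard normal distribution

We can check the standardization claim by simulation. A minimal sketch (the sample size of 100,000 draws is an arbitrary choice):

```r
set.seed(0) # so results are reproducible
y <- rnorm(n = 100000, mean = 11, sd = 1.5)
z <- (y - 11) / 1.5 # subtract the mean, divide by the standard deviation
c(mean(z), sd(z))   # both should be very close to 0 and 1
```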
---

## More about the standard normal distribution

- Earlier, we were interested in the probability of shoe sizes smaller than 13, and we calculated it using

.small[
```r
pnorm(13, mean = 11, sd = 1.5)
```

```
[1] 0.9087888
```
]

- Let `\(Y\)` be the random variable denoting men's shoe sizes. Then `\(Y \sim N(11, 1.5^2)\)`.

.tiny[
$$
`\begin{aligned}
P(Y \leq 13) &= P\left(\frac{Y - \mu_y}{\sigma_y} \leq \frac{13 - \mu_y}{\sigma_y} \right) \\
&=P\left( Z \leq \frac{13-11}{1.5} \right) \\
&=P(Z \leq \frac{2}{1.5})
\end{aligned}`
$$
]

.small[
```r
pnorm(2/1.5, mean = 0, sd = 1)
```

```
[1] 0.9087888
```
]

---

## z-score

.tiny[
$$
`\begin{aligned}
P(Y \leq 13) &= P\left(\frac{Y - \mu_y}{\sigma_y} \leq \frac{13 - \mu_y}{\sigma_y} \right) \\
&=P\left( Z \leq \frac{13-11}{1.5} \right) \\
&=P(Z \leq \frac{2}{1.5})
\end{aligned}`
$$
]

- The standardized value on the right-hand side, `\(\frac{13-11}{1.5}\)`, is known more generally as a z-score, where `\(z = \frac{x - \mu}{\sigma} = \frac{\text{value} - \text{mean}}{\text{standard deviation}}\)`
- The z-score is the **number of standard deviations above (positive z-scores) or below (negative z-scores) the mean**. To see this:
  - `\(x - \mu\)` is the difference from the mean, e.g., shoe size 13 is 2 above the mean
  - Dividing this difference by `\(\sigma\)` gives us the number of standard deviations above the mean, e.g., with our shoe size distribution having a standard deviation of 1.5, shoe size 13 is `\(\frac{2}{1.5} = 1.33\)` standard deviations above the mean

---

## z-score

.tiny[
$$
`\begin{aligned}
P(Y \leq 13) &= P\left(\frac{Y - \mu_y}{\sigma_y} \leq \frac{13 - \mu_y}{\sigma_y} \right) \\
&=P\left( Z \leq \frac{13-11}{1.5} \right) \\
&=P(Z \leq \frac{2}{1.5})
\end{aligned}`
$$
]

- The *relative* positions of values in the original and standardized distributions stay the same, i.e., `\(P(Y \leq 13) = P(Z \leq \frac{2}{1.5})\)`
- Our size 13 (or the value `\(\frac{2}{1.5}\)` standard deviations above the mean) remains in the same position relative to the rest of the distribution.

---

## Standardizing in R

Consider the samples we drew earlier from `\(N(11, 1.5^2)\)`

.small[
```r
set.seed(0) # so results are reproducible
normal3 <- rnorm(n = 5000, mean = 11, sd = 1.5) # shoe size distribution
standardizedNormal3 <- (normal3 - 11)/1.5
```
]

<img src="lecture17_files/figure-html/unnamed-chunk-27-1.png" width="60%" />

---

## Standardizing in R

<img src="lecture17_files/figure-html/unnamed-chunk-28-1.png" width="60%" />

.small[
```r
sum(normal3 <= 13)/length(normal3)
```

```
[1] 0.9072
```

```r
sum(standardizedNormal3 <= 2/1.5)/length(standardizedNormal3)
```

```
[1] 0.9072
```
]

---

## More about the standard normal distribution

- We saw earlier that `\(P(Z \leq 0) = .5\)`. This is because the standard normal distribution is symmetric with mean 0.

```r
pnorm(0) # default values of mean = 0 and sd = 1
```

```
[1] 0.5
```

- Tail probabilities of the standard normal distribution
  - The symmetry of the normal distribution allows us to calculate the probability of values falling in the tails
  - For any `\(z\)`-score, `\(P(Z \leq -z) = P(Z \geq z)\)`, as verified below

<img src="img/stdnorm5.png" width="20%" />
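A quick check of this identity in R (a minimal sketch; z = 1.5 is an arbitrary example):

```r
pnorm(-1.5)    # P(Z <= -1.5), the left tail
```

```
[1] 0.0668072
```

```r
1 - pnorm(1.5) # P(Z >= 1.5), the right tail
```

```
[1] 0.0668072
```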
---

## Quantiles for the normal distribution

- Quantiles are cut points dividing the range of a probability distribution into continuous intervals
- Recall: quartiles (four groups) and percentiles (100 groups)
- `\(P(X \leq q) = p\)`, where `\(q\)` is the quantile (think of the value on the horizontal axis), e.g., `\(P(Z \leq 0) = .5\)`
- Recall: `pnorm(q, mean, sd)` for `\(P(X\leq x)\)`, or `\(P(Z \leq z)\)` for the standard normal. `pnorm()` returns the probability, `p`

```r
pnorm(q = 0, mean = 0, sd = 1)
```

```
[1] 0.5
```

- `qnorm(p, mean, sd)` for the quantile, e.g., `\(P(X \leq \ ?) = p\)`. `qnorm()` returns the quantile, `q`

```r
qnorm(p = .5, mean = 0, sd = 1)
```

```
[1] 0
```

---

## Important reference points for the normal distribution

- For the standard normal, z-scores (quantiles in R) corresponding to particular probabilities (critical values) are often written as `\(z_p\)`, where `\(p\)` denotes the probability in the **right tail**, e.g., `\(z_{.5} = 0\)`
- The z-scores corresponding to probabilities of 0.025 (2.5%) in the left and right tails are important reference points. Specifically, `\(z_{.025} \approx 1.96\)`
- In R, `qnorm(.025, lower.tail = FALSE)` returns the z-score corresponding to a probability of .025 in the right tail, `\(z_{.025}\)`, i.e., 2.5% probability in the right tail, so we should get 1.96. Equivalently, `qnorm(.975)` (97.5% probability in the left tail) returns the same value.

.pull-left[
```r
qnorm(.025, lower.tail = FALSE)
```

```
[1] 1.959964
```

```r
qnorm(.975)
```

```
[1] 1.959964
```
]
.pull-right[
<img src="img/stdnorm5.png" width="70%" />
]

---

## Important reference points for the normal distribution

.pull-left[
```r
pnorm(1.96)
```

```
[1] 0.9750021
```

```r
pnorm(1.96, lower.tail = FALSE)
```

```
[1] 0.0249979
```

<img src="img/stdnorm1.png" width="78%" />
]
.pull-right[
```r
pnorm(-1.96)
```

```
[1] 0.0249979
```

```r
pnorm(-1.96, lower.tail = FALSE)
```

```
[1] 0.9750021
```

<img src="img/stdnorm2.png" width="70%" />
]

---

## Standard normal table

- A **standard normal table** allows us to look up probabilities for the standard normal distribution.
- It tells us how much area is under the normal curve to the *left* of the specified value (the lower tail area). Sometimes the table instead shows the complement of this probability (the upper or *right* tail area).

<img src="img/cumprob.png" width="50%" />

- With modern computing, we don't need to rely on these tables to get the desired probabilities, but you often find them in the back of statistics textbooks.

---

## Standard normal table

<img src="img/normaltable.png" width="616" />

---

## Standard normal table

.pull-left[
What is the probability of a shoe size bigger than 13 (z-score 1.33)?

<img src="img/normalcurveupper.png" width="100%" />
]

--

.pull-right[
.small[
```r
pnorm(13, mean = 11, sd = 1.5, lower.tail = FALSE)
```

```
[1] 0.09121122
```

```r
pnorm(2/1.5, lower.tail = FALSE)
```

```
[1] 0.09121122
```

```r
1 - pnorm(2/1.5)
```

```
[1] 0.09121122
```
]
]

---

## Sum of independent normal random variables

- Important property: any linear combination of normal random variables is a normal random variable, with expectation and variance given by the formulas for the expected value and variance of linear combinations (Lecture 15)
- Recall: a linear combination of two random variables, `\(X\)` and `\(Y\)`, is of the form `\(aX+bY\)`, where `\(a\)` and `\(b\)` are constants
- Recall:
  - `\(E(aX + bY) = aE(X) + bE(Y)\)`
  - For a linear combination of **independent** random variables, `\(Var(aX + bY) = a^2 Var(X) + b^2 Var(Y)\)`
- Hence if `\(X \sim N(\mu_x, \sigma_x^2)\)` and `\(Y \sim N(\mu_y, \sigma_y^2)\)` are independent, `\(W = X + Y \sim N(\mu_x + \mu_y, \sigma_x^2 + \sigma_y^2)\)`
- Extends to more than two random variables in the linear combination. Note also that `\(b\)` can be negative, e.g., `\(E(X - Y) = E(X) - E(Y)\)` and `\(Var(X - Y) = Var(X) + Var(Y)\)`.
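---

## Sum of independent normal random variables

A simulation sketch of this property (the parameters and sample size below are arbitrary choices for illustration): if `\(X \sim N(1, 2^2)\)` and `\(Y \sim N(3, 1^2)\)` are independent, then `\(X + Y \sim N(4, 2^2 + 1^2)\)`, so its standard deviation should be `\(\sqrt{5} \approx 2.24\)`.

```r
set.seed(0) # so results are reproducible
x <- rnorm(n = 100000, mean = 1, sd = 2) # X ~ N(1, 2^2)
y <- rnorm(n = 100000, mean = 3, sd = 1) # Y ~ N(3, 1^2), independent of X
w <- x + y
c(mean(w), sd(w)) # should be close to 4 and sqrt(5) = 2.236
```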
---

## Summary: Distributions in R

- For each distribution, R has a family of commands, starting with the letters `d`, `p`, `q` and `r`
  - `d` for the density
  - `p` for the cumulative probability up to the input value, `\(P(X \leq x)\)`. Think of `\(P(X \leq q) = p\)`
  - `q` for the quantile, e.g., `\(P(X \leq \ ?) = p\)`
  - `r` for a random sample from the distribution

---

## Summary

- Common probability distributions: Normal
- Theoretical properties: probability density function, parameters, mean and variance, effect of varying parameters
- R functions:
  - `dnorm()` for densities
  - `pnorm()` for `\(P(X\leq x)\)`
  - `qnorm()` for quantiles
  - `rnorm()` for random samples
- Standard normal distribution
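---

## Summary: Distributions in R

The four functions side by side for the standard normal (a minimal sketch; the random draws reproduce the first three `set.seed(0)` values shown earlier):

```r
dnorm(0)   # density: height of f(x) at 0
```

```
[1] 0.3989423
```

```r
pnorm(0)   # cumulative probability: P(Z <= 0)
```

```
[1] 0.5
```

```r
qnorm(.5)  # quantile: the value q with P(Z <= q) = .5
```

```
[1] 0
```

```r
set.seed(0) # so results are reproducible
rnorm(3)    # three random draws
```

```
[1]  1.2629543 -0.3262334  1.3297993
```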