class: center, middle, inverse, title-slide .title[ # Introduction to Probability ] .subtitle[ ##
STA 032: Gateway to data science Lecture 13 ] .author[ ### Jingwei Xiong ] .date[ ### May 1, 2023 ] --- <style type="text/css"> .tiny .remark-code { font-size: 60%; } .small .remark-code { font-size: 80%; } </style> ## Recap -- - Introduction to probability - Events, sample space - Probability rules - Complement rule: `\(P(A) + P(A^c) = 1\)` - Additive rule: `\(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)` - Permutations and combinations --- ## Today - Conditional probability - Marginal and joint probability - Independence - Bayes' Theorem --- ## Conditional Probability - Often we wish to know the probability that an event will occur given that another event has occurred. - E.g., instead of the **marginal** probability of contracting COVID (regardless of vaccination status), we may wish to know: - The probability that someone will contract COVID *given that* they have been vaccinated - The probability that someone will contract COVID *given that* they have not been vaccinated. - These are examples of **conditional probability**. - Our earlier example: `\(A\)` is the event that a person smokes; `\(B\)` is the event that a person identifies as female - The conditional probability that someone is a smoker (event `\(A\)`) given that they identify as female (event `\(B\)`) is denoted `\(P(A|B)\)`, which we read as "the probability of A given B." --- ## Simple Example - Suppose we have a small population containing 3 female non-smokers, 1 female smoker, 4 non-female non-smokers, and 4 non-female smokers. .pull-left[ - Each individual's characteristics: .small[ ``` Female Smoker 1 1 1 2 0 1 3 0 1 4 0 1 5 0 1 6 1 0 7 1 0 8 1 0 9 0 0 10 0 0 11 0 0 12 0 0 ``` ] ] .pull-right[ .small[ | Female | Non-female --|-- | -- Smoke | 1 | 4 Does not smoke |3 | 4 ] ] --- ## Conditional probability .pull-left[ - Each individual's characteristics: .small[ ``` Female Smoker 1 1 1 2 0 1 3 0 1 4 0 1 5 0 1 6 1 0 7 1 0 8 1 0 9 0 0 10 0 0 11 0 0 12 0 0 ``` ] ] .pull-right[ .small[ | Female | Non-female --|-- | -- Smoke | 1 | 4 Does not smoke |3 | 4 ] ] - Conditional probability that someone is a smoker given that they identify as female = `\(P(\texttt{smoker}|\texttt{female}) = \frac{1}{3+1}\)` - Looking at each individual's characteristics: only look at rows where `Female == 1`. Looking at the table: only look at the `Female` column. - The conditional probability is calculated by changing the denominator to correspond to our smaller population of interest. --- ## Conditional probability: How to do this in R? - We want to condition on the `Female` column, and find the proportion of smokers and non-smokers.
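- One way to get the counts first (a minimal sketch, assuming the 12 individuals above are stored in a data frame; `smokeData` is a hypothetical name):

```r
# Rebuild the 2x2 count table from the individual-level 0/1 indicators
smokeData <- data.frame(
  Female = c(1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0),
  Smoker = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
)
table(Smoker = smokeData$Smoker, Female = smokeData$Female)
```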
- Recall: **column proportions** ```r outTable <- matrix(c(1, 3, 4, 4), nrow = 2) rownames(outTable) <- c("Smoker", "Non-smoker") colnames(outTable) <- c("Female", "Non-female") prop.table(outTable, margin = 2) ``` ``` Female Non-female Smoker 0.25 0.5 Non-smoker 0.75 0.5 ``` --- ## Marginal probability - Each individual's characteristics: .small[ ``` Female Smoker 1 1 1 2 0 1 3 0 1 4 0 1 5 0 1 6 1 0 7 1 0 8 1 0 9 0 0 10 0 0 11 0 0 12 0 0 ``` ] - **Marginal probability**: "unconditional" probability; based on a single variable - One way to think about marginal probability: "throwing away/ignoring the other variable" - `\(P(\texttt{female}) = \frac{4}{12}\)` (just look at female column) - `\(P(\texttt{smoker}) = \frac{5}{12}\)` (just look at smoker column) --- ## Marginal probability - Another way to think about marginal probability: sum over the other variables - Might hear people say "marginalize over the other variable" - Recall: row and column totals in R .small[ ```r outTableTotals <- outTable %>% cbind(rowTotal = rowSums(outTable)) outTableTotals <- outTableTotals %>% rbind(columnTotal = colSums(outTableTotals)) outTableTotals ``` ``` Female Non-female rowTotal Smoker 1 4 5 Non-smoker 3 4 7 columnTotal 4 8 12 ``` ] - Column totals sum over smoking status; row totals sum over gender identity - `\(P(\texttt{female}) = \frac{4}{12}\)` (sum over smoking status; look at column totals) - `\(P(\texttt{smoker}) = \frac{5}{12}\)` (sum over gender identity; look at row totals) --- ## Joint probability - Joint probability is the probability when we are considering outcomes for **two or more** variables or processes - E.g.: - Flipping a coin *and* rolling a die - Identifying as female *and* smoking - `\(P(\texttt{smoker}\)` and `\(\texttt{female})\)`, `\(P(\texttt{smoker}, \texttt{female})\)`, `\(P(A \cap B)\)` --- ## Joint probability - Get joint probabilities from table by dividing by the total count for the entire table ```r outTable / sum(outTable) ``` ``` Female Non-female Smoker 0.08333333 0.3333333 Non-smoker 0.25000000 0.3333333 ``` - This is a **probability distribution**; should sum to 1 - The probabilities we saw in the previous class were joint and marginal probabilities --- ## Recall Our Vaccine Hesitancy Example |Ethnicity|Vaccine Hesitant | Not Hesitant | |:------|------:|-------:| | White British or Irish | 1362 | 7368 | | Other white background | 71 | 199 | | Mixed | 55 | 115 | | Asian or Asian British - Indian | 37 | 143 | | Asian or Asian British - Pakistani/Bangladeshi | 85 | 115 | | Asian or Asian British - other | 15 | 95 | | Black or Black British | 136 | 54 | | Other Ethnic Group or Not Specified | 31 | 119 | --- ## Three Probabilities Define events A=vaccine hesitant and B=Asian or Asian British-Indian. Calculate the following probabilities for a randomly-selected person drawn from the population of 10,000. 
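- In R, one way to compute these is to enter the counts by hand (a sketch; the bullets below give the same values as fractions):

```r
# Counts from the vaccine hesitancy table (rows in the same order as the slide)
hesitant    <- c(1362, 71, 55, 37, 85, 15, 136, 31)
notHesitant <- c(7368, 199, 115, 143, 115, 95, 54, 119)
total <- sum(hesitant) + sum(notHesitant)   # 10000 people in all
sum(hesitant) / total                       # marginal: P(A)
37 / total                                  # joint: P(A and B)
37 / (37 + 143)                             # conditional: P(A | B)
```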
- **Marginal probability** of vaccine hesitancy, `\(P(A) = \frac{1362 + 71 + 55 + 37 + 85 + 15 + 136 + 31}{10000} = \frac{1792}{10000}\)` - **Joint probability** of vaccine hesitancy and Indian ethnicity, `\(P(A \cap B) = \frac{37}{10000}\)` - **Conditional probability** of vaccine hesitancy given a person is of Indian ethnicity, `\(P(A \mid B) = \frac{37}{37+143}\)` --- ## Rules for Conditional Probability - More formally, we define **conditional probability** as `\(P(A|B)=\frac{P(A \cap B)}{P(B)}\)` - Recall: Conditional probability that someone is a smoker given that they identify as female = `\(P(\texttt{smoker}|\texttt{female}) = \frac{1}{3+1}\)` - Calculated by changing the denominator to correspond to our smaller population of interest - `\(= \frac{\# (\texttt{smoker and female})}{\# \texttt{female}}\)` - According to the new formula: `\(P(\texttt{smoker}|\texttt{female}) = \frac{P(\texttt{smoker} \cap \texttt{female})}{P(\texttt{female})} = \frac{\# \texttt{smoker and female}/\texttt{total}}{\# \texttt{female}/\texttt{total}}\)` - This works because the total cancels out --- ## Rules for Conditional Probability - `\(P(A|B)=\frac{P(A \cap B)}{P(B)}\)` - Manipulating this formula, we get the **general multiplication rule**: `\(P(A \cap B) = P(B)P(A|B)\)`. - **Sum of conditional probabilities**: Let `\(A_1, ..., A_k\)` be all the disjoint outcomes of a process. Then `\(P(A_1|B) + ... + P(A_k| B) = 1\)` - This is the same idea as the complement rule from last class: `\(P(A) + P(A^c) = 1\)`. - When there are just two events, `\(A\)` and `\(A^c\)`, we have `\(P(A|B) + P(A^c| B) = 1\)`, so `\(P(A|B) = 1 - P(A^c| B)\)`. --- ## Rules for Conditional Probability - One more helpful rule is the **law of total probability**: - `\(P(B)=P(B \mid A)P(A) + P(B \mid {A}^c)P({A}^c) = P(B \cap A)+P(B \cap {A}^c)\)` - Translates to the statement that the probability that B occurs is equal to the sum of the probabilities that B occurs with A and that B occurs without A - Extending this to `\(A_1, ..., A_k\)`, where `\(A_1, ..., A_k\)` are all the disjoint outcomes of a process: `\(P(B) = P(B \cap A_1)+ ... + P(B \cap {A_k})\)` --- ## Independence - Events `\(A\)` and `\(B\)` are **independent** if knowing the outcome of one provides no useful information about the outcome of the other - E.g.: Flipping a coin and rolling a die are two independent processes - Knowing the coin was heads does not help determine the outcome of a die roll - Seeing someone with an umbrella and the day being rainy are not independent - If we see someone with an umbrella, it is more likely to be a rainy day --- ## Independence - What is the probability of flipping heads and rolling a 1 on a die? - Probability distribution | 1 | 2 | 3 | 4 | 5 | 6 --|--|-- |-- |-- |-- |-- Heads | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` Tails | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` - Intuition: 1/2 of the time we get heads, and 1/6 of those times we roll a 1 - Probabilities can therefore be multiplied --- ## Independence - Multiplication rule for independent processes: - If A and B represent events from two different and independent processes, then the probability that both A and B occur can be calculated as the product of their separate probabilities: `\(P(A \cap B) = P(A) \times P(B)\)`. - This is an "if and only if" relationship: `\(P(A \cap B) = P(A) \times P(B)\)` holds exactly when `\(A\)` and `\(B\)` are independent. - If there are `\(k\)` events `\(A_1, ..., A_k\)` from `\(k\)` independent processes, then the probability they all occur is `\(P(A_1) \times P(A_2) \times ... \times P(A_k)\)`.
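--- ## Independence: A Quick Simulation Check - A minimal simulation sketch of the multiplication rule for the coin-and-die example (the simulation size `n` is an arbitrary choice):

```r
# Simulate many independent coin flips and die rolls, then compare the
# empirical joint proportion to 1/2 * 1/6 = 1/12
set.seed(1)                                # for reproducibility
n    <- 100000
coin <- sample(c("heads", "tails"), n, replace = TRUE)
die  <- sample(1:6, n, replace = TRUE)
mean(coin == "heads" & die == 1)           # should be close to 1/12, about 0.083
```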
--- ## Independence - Recall this probability distribution: | 1 | 2 | 3 | 4 | 5 | 6 --|--|-- |-- |-- |-- |-- Heads | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` Tails | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` | `\(\frac{1}{12}\)` - `\(P(\texttt{roll 1} | \texttt{heads}) = \frac{P(\texttt{1 and heads})}{P(\texttt{heads})} = \frac{1/12}{1/2} = 1/6\)` - `\(P(\texttt{roll 1}) = 1/6\)` - Events `\(A\)` and `\(B\)` are **independent** if and only if `\(P(A \mid B)=P(A)\)` and `\(P(B \mid A)=P(B)\)`. --- ## Revisiting Independence - Independence can be checked using conditional probability. - Intuitively, `\(A\)` being independent of `\(B\)` means that conditioning on `\(B\)` does not change the probability of `\(A\)`. - Events `\(A\)` and `\(B\)` are **independent** if and only if `\(P(A \mid B)=P(A)\)` and `\(P(B \mid A)=P(B)\)`. - This comes from the multiplication rule `\(P(A \cap B)=P(A)\times P(B)\)` and the definition of conditional probability, `\(P(A|B)=\frac{P(A \cap B)}{P(B)}\)` - `\(P(A|B)=\frac{P(A \cap B)}{P(B)} = \frac{P(A) P(B)}{P(B)} = P(A)\)` - `\(P(B|A)=\frac{P(A \cap B)}{P(A)} = \frac{P(A) P(B)}{P(A)} = P(B)\)` --- ## Checking Independence Are vaccine hesitancy and Indian ethnicity independent in our population? - **Marginal probability** of vaccine hesitancy, `\(P(A) = \frac{1362 + 71 + 55 + 37 + 85 + 15 + 136 + 31}{10000} = \frac{1792}{10000} = .1792\)` - **Conditional probability** of vaccine hesitancy given a person is of Indian ethnicity, `\(P(A \mid B) = \frac{37}{37+143} = .206\)` - `\(P(A) \neq P(A \mid B)\)`, so they are not independent. --- ## Independent vs Disjoint Events - For **independent events** `\(A\)` and `\(B\)`, `\(P(A \mid B)=P(A)\)` and `\(P(B \mid A)=P(B)\)`, so knowing one event occurred tells us *nothing* about the chances the other event will occur. - For two **disjoint** or **mutually exclusive** events, knowing that one event has occurred tells us that the other event definitely has not occurred, i.e., `\(P(A \cap B)=0\)`. - Disjoint events (with nonzero probabilities) are therefore *not* independent! --- ## Bayes' Theorem - Often we know `\(P(B|A)\)` when we really want `\(P(A|B)\)` - For example, imagine a hypothetical scenario of a 40-year-old woman with a positive screening mammogram. Let the event `\(A\)` be having cancer and `\(B\)` be the positive mammogram screening result. - We want to know `\(P(A|B)\)`, which is the probability of having cancer given a positive screening result. - `\(P(B | A)\)` depends on how good the screening tool is (called **sensitivity**, the probability of a positive result given that a person has cancer) - We will need to know some properties of the screening test, and `\(P(A)\)`, the prevalence of breast cancer among 40-year-old women. - Using Bayes' Theorem is sometimes described as "updating our beliefs": without any information on the woman's test result, the probability of cancer is just `\(P(A)\)`; with the test result we can calculate `\(P(A|B)\)` --- ## Bayes' Theorem **Bayes' theorem** says that `\(P(A \mid B) =\frac{P(A \cap B)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^c)P(A^c)}\)`. - The last equality is by the law of total probability <img src="img/bayes.webp" width="50%" /> Credit: Matt Buck, Flickr, CC BY-SA 2.0
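--- ## Bayes' Theorem: A Quick Check in R - A small sanity-check sketch using the smoker/non-smoker counts from earlier (here `\(A\)` = smoker and `\(B\)` = female; `outTable` is the 2x2 count table defined above):

```r
# Conditional probability computed directly from the counts
outTable["Smoker", "Female"] / sum(outTable[, "Female"])                    # 1/4

# The same probability via Bayes' theorem
pA    <- sum(outTable["Smoker", ]) / sum(outTable)                          # P(A) = 5/12
pB_A  <- outTable["Smoker", "Female"] / sum(outTable["Smoker", ])           # P(B | A) = 1/5
pB_Ac <- outTable["Non-smoker", "Female"] / sum(outTable["Non-smoker", ])   # P(B | A^c) = 3/7
pB_A * pA / (pB_A * pA + pB_Ac * (1 - pA))                                  # also 1/4
```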
--- ## Bayes' Theorem `\(P(A \mid B) =\frac{P(A \cap B)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^c)P(A^c)}\)` - `\(A\)` = cancer, `\(B\)` = positive screening result. - `\(P(B | A)\)` is the probability of a positive screening result, given that a person has cancer. This is called the **sensitivity** of the test. Say it is .85. - We also need to know `\(P(A)\)`, the prevalence of breast cancer among 40-year-old women. Say this is .01. - The last ingredient is `\(P(B|A^c)\)`, which is the probability of a positive screening result given that a person does not have cancer. Say it is .1. --- ## Bayes' Theorem in Action `\(P(A \mid B) =\frac{P(A \cap B)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^c)P(A^c)}\)` - `\(P(B | A) = .85\)` - `\(P(A) = .01\)` - `\(P(B|A^c) = .1\)` - Then `\(P(A \mid B) =\frac{.85*.01}{.85*.01 + .1*(1 - .01)} = 0.079\)` --- ## Behind the scenes: Hypothetical 10,000 Consider a hypothetical population of 10,000 40-year-old women. .pull-left[ We have 1. The prevalence of breast cancer among 40-year-old women, `\(P(A)=.01\)`. 2. The sensitivity of a screening mammogram for diagnosing cancer, `\(P(B | A)=.85\)`. 3. The probability of a positive screening result given that a person does not have cancer, `\(P(B|A^c) = .1\)`. ] .pull-right[ | | Cancer (A) | No Cancer | Total | |---|---:|---:|---:| | Mammo + (B) | | | | | Mammo - | | | | | Total | | | 10000 | ] --- ## Behind the scenes: Hypothetical 10,000 .pull-left[ We have 1. **The prevalence of breast cancer among 40-year-old women, `\(P(A)=.01\)`.** 2. The sensitivity of a screening mammogram for diagnosing cancer, `\(P(B | A)=.85\)`. 3. The probability of a positive screening result given that a person does not have cancer, `\(P(B|A^c) = .1\)`. ] .pull-right[ Item 1 says the prevalence in this group is 1%, so we expect to have `\(10000\times0.01=100\)` cases and `\(10000\times0.99=9900\)` cancer-free women. | | Cancer (A) | No Cancer | Total | |---|---:|---:|---:| | Mammo + (B) | | | | | Mammo - | | | | | Total | 100 | 9900 | 10000 | ] --- ## Behind the scenes: Hypothetical 10,000 .pull-left[ We have 1. The prevalence of breast cancer among 40-year-old women, `\(P(A)=.01\)`. 2. **The sensitivity of a screening mammogram for diagnosing cancer, `\(P(B | A)=.85\)`.** 3. The probability of a positive screening result given that a person does not have cancer, `\(P(B|A^c) = .1\)`. ] .pull-right[ In the group of 100 women with cancer, the mammogram should pick up `\(100\times0.85=85\)` of them, and miss the remaining `\(100-85=15\)`. | | Cancer (A) | No Cancer | Total | |---|---:|---:|---:| | Mammo + (B) | 85 | | | | Mammo - | 15 | | | | Total | 100 | 9900 | 10000 | ] --- ## Behind the scenes: Hypothetical 10,000 .pull-left[ We have 1. The prevalence of breast cancer among 40-year-old women, `\(P(A)=.01\)`. 2. The sensitivity of a screening mammogram for diagnosing cancer, `\(P(B | A)=.85\)`. 3. **The probability of a positive screening result given that a person does not have cancer, `\(P(B|A^c) = .1\)`.** ] .pull-right[ In the group of 9900 women without cancer, the mammogram should correctly identify `\(9900\times0.90=8910\)` of them as being cancer-free, and it will mistakenly identify `\(9900-8910=990\)` as having cancer.
| | Cancer (A) | No Cancer | Total | |---|---:|---:|---:| | Mammo + (B) | 85 | 990 | | | Mammo - | 15 | 8910 | | | Total | 100 | 9900 | 10000 | ] --- ## Behind the scenes: Hypothetical 10,000 .pull-left[ - Now we complete the table by filling in the row totals. - At this point, it's easy to calculate the conditional probability of cancer given a positive mammogram as `\(\frac{85}{1075}=0.079\)`. - This entire computation is equivalent to doing `\(P(A \mid B) =\frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^c)P(A^c)}\)` ] .pull-right[ | | Cancer (A) | No Cancer | Total | |---|---:|---:|---:| | Mammo + (B) | 85 | 990 | 1075 | | Mammo - | 15 | 8910 | 8925 | | Total | 100 | 9900 | 10000 | ] --- ## Summary -- - Conditional probability - General multiplication rule: `\(P(A \cap B) = P(B)P(A|B)\)` - Sum of conditional probabilities: `\(P(A_1|B) + ... + P(A_k| B) = 1\)` - Law of total probability: `\(P(B) = P(B \cap A_1)+ ... + P(B \cap {A_k}) = P(B \mid A_1)P(A_1) + ... + P(B \mid {A_k})P({A_k})\)` - Marginal and joint probability - Revisiting independence - `\(P(A \mid B)=P(A)\)` and `\(P(B \mid A)=P(B)\)` - Bayes' Theorem - `\(P(A \mid B) =\frac{P(A \cap B)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^c)P(A^c)}\)`
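--- ## Appendix: The Hypothetical 10,000 in R - A short sketch reproducing the hypothetical-10,000 calculation from the screening example (the numbers .01, .85, and .1 are the assumed values from the earlier slides):

```r
# Assumed inputs from the screening example
prev <- 0.01    # P(A), prevalence of cancer
sens <- 0.85    # P(B | A), sensitivity of the test
fpr  <- 0.10    # P(B | A^c), positive result without cancer

n <- 10000                          # hypothetical population size
truePos  <- n * prev * sens         # 85 women: cancer and positive mammogram
falsePos <- n * (1 - prev) * fpr    # 990 women: no cancer but positive mammogram
truePos / (truePos + falsePos)      # P(A | B) = 85 / 1075, about 0.079
```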