class: center, middle, inverse, title-slide

.title[
# Week 8: Causal Inference
]

---

## Schedule

<br>

### Previously

`\(Y = X\beta + \varepsilon\)`, and how to estimate the best `\(\beta\)` using calculus.

<br>

### Today

How to combine the linear model with theory to make credible causal inferences.

???

Last week we learned the basics of the linear model, which is this all-purpose technology we use to describe the relationship between two variables. We learned calculus so that we could optimize, finding the value of `\(\beta\)` that was the best fit to the data.

But we ended last week on a note of disappointment, because the linear model alone can't get us where we want to go. At its core, it's still just describing observed correlations.

Today, we explore how to combine that linear model engine with deep substantive knowledge of your topic to make credible causal inferences.

---

class: center, middle

## The Fundamental Problem of Causal Inference

---

## The Fundamental Problem of Causal Inference

- I have some treatment `\((X)\)` and some outcome `\((Y)\)`. I want to know if `\(X\)` causes `\(Y\)`: that is, whether changing the value of `\(X\)` would result in a different value of `\(Y\)`.

- The **Fundamental Problem of Causal Inference** is that every observation is either treated or not treated. I can't see what would happen in the alternate universe where an untreated observation is treated.

---

## The Potential Outcomes Framework

| Alive w/o Hospital `\((Y_0)\)` | Alive w/ Hospital `\((Y_1)\)` | Hospital `\((X)\)` | Alive `\((Y)\)` |
|-------|-------|-----|-----|
| 0 | 1 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 1 | 1 |
| 0 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 |

--

The big problem here is that the **potential outcomes** and the **treatment** are correlated. Sick people are more likely to go to the hospital!

???

We'd like to know `\(Y_1 - Y_0\)`, the treated outcome minus the untreated outcome. If we knew all the potential outcomes, then it would be clear that hospitals don't cause death. No one in this table has a worse outcome because of the hospital.

But we can't do that, because we only ever see one of the potential outcomes. The best we can do is estimate the **average treatment effect**.

---

## Causal Diagrams Help Make Sense of All This

<img src="08-causation_files/figure-html/unnamed-chunk-1-1.png" width="600" style="display: block; margin: auto;" />

The relationship between hospitals and death is **confounded** by the severity of a person's illness. Unless we can hold severity constant (aka **condition on** or **control for** severity), we can't confidently say whether hospitals are good for your health.

---

# Drawing Causal Diagrams

Today, we're going to show how causal diagrams can help us think through the problem of identifying causal effects. These causal diagrams -- also known as **DAGs** (directed acyclic graphs) -- are a way of representing the relationships between a web of variables.

### Some Useful R Packages

```r
# for working with DAGs
library(dagitty)

# for visualizing DAGs
library(ggdag)
```

---

## Three Shapes to Recognize

In this lecture, we'll introduce three ways in which a statistical relationship between X and Y can be **confounded**.

<br>

![](img/three-elemental-confounds.png)
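---

## Simulating the Hospital Example

The hospital example is an instance of the first shape: severity of illness drives both hospital visits and survival. Here's a minimal sketch (the probabilities, sample size, and seed are invented for illustration) showing how the naive comparison makes hospitals look deadly, and how conditioning on severity reveals that they help:

```r
library(tidyverse)

set.seed(8)
n <- 10000

hospital_sim <- tibble(
  # 1 = seriously ill, 0 = relatively healthy
  severe = rbinom(n, size = 1, prob = 0.3),
  # the seriously ill are far more likely to go to the hospital
  hospital = rbinom(n, size = 1, prob = if_else(severe == 1, 0.9, 0.1)),
  # hospitals *raise* the chance of survival at every severity level
  alive = rbinom(n, size = 1,
                 prob = if_else(severe == 1,
                                0.4 + 0.3 * hospital,
                                0.95 + 0.04 * hospital))
)

# naive comparison: hospital patients die more often
hospital_sim |>
  group_by(hospital) |>
  summarize(survival_rate = mean(alive))

# conditioning on severity: hospitals help within each group
hospital_sim |>
  group_by(severe, hospital) |>
  summarize(survival_rate = mean(alive))
```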
---

class: center, middle

## Shape 1: The Fork

---

## Forks

```r
fork <- dagify(Y ~ Gender,
               X ~ Gender)

ggdag_classic(fork) + theme_dag()
```

<img src="08-causation_files/figure-html/unnamed-chunk-3-1.png" width="500px" style="display: block; margin: auto;" />

---

## Forks

- The relationship between `\(X\)` and `\(Y\)` is **confounded** by the presence of the backdoor path through gender.

<img src="08-causation_files/figure-html/unnamed-chunk-4-1.png" width="400px" style="display: block; margin: auto;" />

---

## Forks

Simulate some example data:

```r
# number of observations
n <- 1000

# create a data frame of simulated data
simulated_data <- tibble(
  # randomly assign each observation as female or male
  female = sample(c(0, 1), size = n, replace = TRUE),
  # X is caused by gender + some random value (epsilon)
  X = 2 * female + rnorm(n, 0, 1),
  # Y is caused by gender + some random value (epsilon)
  Y = 3 * female + rnorm(n, 0, 1)
)
```

--

Notice that `\(X\)` is not causally related to `\(Y\)` at all. Any observed relationship between the two is driven by gender.

---

## Forks

```r
ggplot(data = simulated_data,
       mapping = aes(x = X, y = Y)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE)
```

<img src="08-causation_files/figure-html/unnamed-chunk-6-1.png" width="600" style="display: block; margin: auto;" />

---

## Forks

- Every backdoor path is an *alternative explanation* for the observed correlation between `\(X\)` and `\(Y\)`.

<img src="08-causation_files/figure-html/unnamed-chunk-7-1.png" width="500px" style="display: block; margin: auto;" />

- To account for that alternative explanation, we must find a way to *condition on gender*. If gender is held constant, then it cannot explain the relationship between `\(X\)` and `\(Y\)`.

---

## Forks

Here's what conditioning on gender looks like graphically...

```r
ggplot(data = simulated_data,
       mapping = aes(x = X, y = Y)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  facet_wrap(~female)
```

<img src="08-causation_files/figure-html/unnamed-chunk-8-1.png" width="500px" style="display: block; margin: auto;" />

---

## Forks

And with a linear model...

```r
lm(Y ~ X, data = simulated_data)
```

```
Call:
lm(formula = Y ~ X, data = simulated_data)

Coefficients:
(Intercept)            X  
     0.7830       0.7045  
```

```r
lm(Y ~ X, data = filter(simulated_data, female == 0))
```

```
Call:
lm(formula = Y ~ X, data = filter(simulated_data, female == 0))

Coefficients:
(Intercept)            X  
    0.01854      0.01380  
```

---

## Exercise

In groups, take five minutes to draw as many examples of forks as you can.

---

class: center, middle

## Backdoor Paths

---

## Backdoor Paths

Backdoor paths are essentially giant forks: they start with a variable that causes the treatment and end with a variable that causes the outcome.

--

```r
gnarly_dag <- dagify(X ~ Z,
                     Z ~ Q,
                     Q ~ W,
                     U ~ W,
                     Y ~ U,
                     Y ~ X)

ggdag_classic(gnarly_dag) + theme_dag()
```

<img src="08-causation_files/figure-html/unnamed-chunk-10-1.png" width="300px" style="display: block; margin: auto;" />

--

Conveniently, you can shut down a backdoor path by conditioning on *any* variable along it. Conditioning on `\(Z\)` or `\(Q\)` or `\(W\)` or `\(U\)` would close the backdoor path.

---

## Backdoor Paths

<img src="08-causation_files/figure-html/unnamed-chunk-11-1.png" width="400px" style="display: block; margin: auto;" />

The `adjustmentSets()` function from `dagitty` is useful as your DAGs get more and more complex.
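---

## Backdoor Paths: A Simulation Check

Here's a minimal sketch (the linear effects and coefficients are invented for illustration) simulating the gnarly DAG above. Adjusting for any one variable along the backdoor path recovers the true effect of `\(X\)` on `\(Y\)`:

```r
set.seed(8)
n <- 10000

W <- rnorm(n)
Q <- 2 * W + rnorm(n)
Z <- 2 * Q + rnorm(n)
U <- 2 * W + rnorm(n)
X <- 2 * Z + rnorm(n)
Y <- 1 * X + 3 * U + rnorm(n)  # the true effect of X on Y is 1

# unadjusted: biased by the backdoor path X <- Z <- Q <- W -> U -> Y
lm(Y ~ X)

# adjusting for any one variable along that path gives a slope near 1
lm(Y ~ X + Z)
lm(Y ~ X + W)
```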
```r
adjustmentSets(gnarly_dag, exposure = 'X', outcome = 'Y')
```

```
{ U }
{ W }
{ Q }
{ Z }
```

---

## Exercise

What variables can you condition on to close the backdoor paths between `\(X\)` and `\(Y\)`?

<img src="08-causation_files/figure-html/unnamed-chunk-13-1.png" width="600" style="display: block; margin: auto;" />

---

## Closing backdoor paths is like...

<br>
<br>

.center[
![](img/sherlock.gif)

*"Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth."*
]

---

class: center, middle

But...why not just condition on *everything*? It would definitely close all the backdoor paths, right?

---

### The Kitchen Sink Approach to Causal Inference

Why not just condition on every variable that could possibly influence your outcome?

<img src="img/piled-up-dishes-in-kitchen-sink.jpg" width="500px" style="display: block; margin: auto;" />

--

This approach has several major flaws, and you should avoid taking it. To better understand these flaws, let's explore the other shapes we might find in a causal diagram...

???

The kitchen sink approach is intuitively appealing. Control for everything that could possibly influence the outcome. But this approach has some major flaws, and you should never do it. Chief among its flaws is **post-treatment bias**.

---

class: center, middle

## Shape 2: The Pipe (aka the Mediator)

---

## Mediators

```r
dagify(M ~ X,
       Y ~ M) |>
  ggdag_classic() + theme_dag()
```

<img src="08-causation_files/figure-html/unnamed-chunk-15-1.png" width="400px" style="display: block; margin: auto;" />

--

`\(X\)` causes `\(Y\)` through its influence on `\(M\)`. Put another way, `\(M\)` is the *mechanism* through which `\(X\)` causes `\(Y\)`.

---

## Mediators

Here, we'll simulate some data with that `\(X \rightarrow M \rightarrow Y\)` structure.

```r
# simulated data
simulated_data <- tibble(
  # X is drawn randomly
  X = rnorm(n, 0, 1),
  # M is caused by X
  M = as.numeric(X > rnorm(n, 0, 0.5)),
  # Y is caused by M
  Y = 3 * M + rnorm(n, 0, 1)
)
```

---

## Mediators

Plotting that data shows that `\(X\)` is strongly correlated with `\(Y\)`...

```r
ggplot(data = simulated_data,
       mapping = aes(x = X, y = Y)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE)
```

<img src="08-causation_files/figure-html/unnamed-chunk-17-1.png" width="400px" style="display: block; margin: auto;" />

---

## Mediators

...unless you condition on `\(M\)`.

```r
ggplot(data = simulated_data,
       mapping = aes(x = X, y = Y)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  facet_wrap(~M)
```

<img src="08-causation_files/figure-html/unnamed-chunk-18-1.png" width="400px" style="display: block; margin: auto;" />

---

## Mediators

```r
lm(Y ~ X, data = simulated_data)
```

```
Call:
lm(formula = Y ~ X, data = simulated_data)

Coefficients:
(Intercept)            X  
      1.448        1.141  
```

```r
lm(Y ~ X, data = filter(simulated_data, M == 1))
```

```
Call:
lm(formula = Y ~ X, data = filter(simulated_data, M == 1))

Coefficients:
(Intercept)            X  
     2.8581       0.1642  
```

---

## Mediators

Conditioning on a mediator closes a true causal path between treatment and outcome. The resulting error is called **post-treatment bias**.

![](img/montgomery-nyhan-torres.png)

---

## Mediators

But sometimes you may want to condition on a mediator -- for example, when you're performing a **mediation analysis** (Baron & Kenny, 1986). If conditioning on `\(M\)` weakens the association between `\(X\)` and `\(Y\)`, that lends support to your theory that `\(M\)` is a mediator!

<img src="08-causation_files/figure-html/unnamed-chunk-20-1.png" width="600" style="display: block; margin: auto;" />

--

The point is to never condition on a post-treatment variable unless you *want* to shut down that causal pathway.
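---

## Mediators

Filtering the data is one way to condition on `\(M\)`; adding it to the regression is another. A minimal sketch using the `simulated_data` from the `\(X \rightarrow M \rightarrow Y\)` simulation above (the expected pattern is described in the comments, since the exact numbers depend on the random draw):

```r
# naive regression: X appears strongly related to Y
lm(Y ~ X, data = simulated_data)

# including the mediator as a covariate conditions on it, so the
# coefficient on X should collapse toward zero: the association
# runs almost entirely through M
lm(Y ~ X + M, data = simulated_data)
```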
---

## Exercise

In groups, take five minutes to draw as many examples of mediators as you can.

---

class: center, middle

## Simpson's Paradox

---

## Simpson's Paradox

Sometimes, conditioning on a third variable can completely reverse the relationship between `\(X\)` and `\(Y\)`.

--

### Classic Example (Charig et al., 1986):

<img src="img/stones.png" width="500px" style="display: block; margin: auto;" />

Which is the better treatment?

---

## Simpson's Paradox

<img src="08-causation_files/figure-html/unnamed-chunk-22-1.png" width="600" style="display: block; margin: auto;" />

---

## Simpson's Paradox

Relatedly, it seems like hospitals cause death unless you condition on illness...

<img src="08-causation_files/figure-html/unnamed-chunk-23-1.png" width="600" style="display: block; margin: auto;" />

---

## Simpson's Paradox

Another classic example: in 1973 a lawsuit was brought against UC Berkeley for discriminating against women in graduate admissions. That year, only 35% of female applicants were admitted, versus 43% of male applicants.

--

But when you look at the breakdown by department...

![](img/bickel.jpg)

--

So, was Berkeley discriminatory or not? Does being female cause your chances of admission to decrease?

---

## Simpson's Paradox: Exercise

Should you condition on Trump approval?

```r
ces |>
  group_by(educ) |>
  summarize(pct_much_worse = sum(
    national_economy == 'Gotten much worse') / n() * 100)
```

```
# A tibble: 6 × 2
  educ                 pct_much_worse
  <fct>                         <dbl>
1 No HS                          32.8
2 High school graduate           33.1
3 Some college                   40.2
4 2-year                         39.0
5 4-year                         46.7
6 Post-grad                      54.8
```

---

## Simpson's Paradox: Exercise

Should you condition on Trump approval?

```r
ces |>
  filter(trump_approval %in% c('Strongly approve', 'Somewhat approve')) |>
  group_by(educ) |>
  summarize(pct_much_worse = sum(
    national_economy == 'Gotten much worse') / n() * 100)
```

```
# A tibble: 6 × 2
  educ                 pct_much_worse
  <fct>                         <dbl>
1 No HS                         17.0 
2 High school graduate          13.1 
3 Some college                  10.9 
4 2-year                         9.83
5 4-year                        11.1 
6 Post-grad                     10.6 
```

---

class: center, middle

## Colliders

---

## Colliders

`\(X\)` does not cause `\(Y\)`, but `\(X\)` and `\(Y\)` both cause `\(Z\)`.

```r
collider_dag <- dagify(Z ~ X + Y)

ggdag_classic(collider_dag) + theme_dag()
```

<img src="08-causation_files/figure-html/unnamed-chunk-25-1.png" width="400px" style="display: block; margin: auto;" />

The path between `\(X\)` and `\(Y\)` is naturally closed. Unless you condition on `\(Z\)`...

---

## Colliders: Example

<img src="08-causation_files/figure-html/unnamed-chunk-26-1.png" width="600" style="display: block; margin: auto;" />

---

## Colliders: Example

Now select only those with combined scores greater than 1200 and admit them to UGA...

```r
collider_dag <- dagify(Admitted ~ Verbal + Math)

ggdag_classic(collider_dag) + theme_dag()
```

<img src="08-causation_files/figure-html/unnamed-chunk-27-1.png" width="400px" style="display: block; margin: auto;" />

```r
admitted <- if_else(verbal + math > 1200, 1, 0)
```

---

## Colliders: Example

<img src="08-causation_files/figure-html/unnamed-chunk-28-1.png" width="600" style="display: block; margin: auto;" />

---

## Collider Bias

Just like conditioning on a mediator, conditioning on a collider is a form of **post-treatment bias**.

- Conditioning on a mediator closes a true causal path between `\(X\)` and `\(Y\)`.

- Conditioning on a collider opens up a non-causal path between `\(X\)` and `\(Y\)`.

--

### Also called:

- Selection-distortion effect
- Berkson's Paradox
- Survivorship bias
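---

## Colliders: A Simulation Check

Here's a minimal sketch of the admissions example (the score distributions and sample size are invented for illustration). Verbal and math scores are simulated independently, yet conditioning on admission manufactures a negative correlation between them:

```r
library(tidyverse)

set.seed(8)
n <- 5000

verbal <- rnorm(n, mean = 600, sd = 100)
math <- rnorm(n, mean = 600, sd = 100)

# admit anyone whose combined score clears 1200
admitted <- if_else(verbal + math > 1200, 1, 0)

# in the full applicant pool: essentially zero correlation
cor(verbal, math)

# among admitted students: a clear negative correlation appears
cor(verbal[admitted == 1], math[admitted == 1])
```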
---

## Collider Examples

Once you start looking for collider bias, it pops up everywhere...

- [Hollywood Ruins All The Best Books](https://www.youtube.com/watch?v=FUD8h9JpEVQ)

- [Restaurants with terrible ambience often have the best food](https://www.amazon.com/Economist-Gets-Lunch-Everyday-Foodies/dp/B00B1KZ8JG)

- ["Why are handsome men such jerks?"](https://slate.com/human-interest/2014/06/berksons-fallacy-why-are-handsome-men-such-jerks.html)

![](img/anzia-berry.png)

???

Take a few minutes and discuss how these observations could have arisen from a collider.

---

## Exercise

There's a dataset in the repository at `data/causal-inference/dag-data.csv`. This is the DAG I used to generate it:

<img src="08-causation_files/figure-html/unnamed-chunk-29-1.png" width="500px" style="display: block; margin: auto;" />

Using `dagitty` or pen-and-paper, identify which variable(s) you can condition on to recover the true causal effect of `\(X\)` on `\(Y\)`. Then estimate it in `R`. Compare your estimate with the confounded estimate you get from `lm(Y ~ X)`.

---

## Further Reading

<img src="img/book-of-why.jpg" width="300px" style="display: block; margin: auto;" />

---

## Looking Ahead

<br>

Broadly speaking, there are two ways you can make credible causal claims:

1. Close all the backdoor paths between your treatment and outcome.

2. Find a *front door path*.

<br>

We'll tackle the first one next week, and the second one the week after that.
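---

## Appendix: A Starting Point for the Exercise

A minimal sketch of the workflow for the `dag-data.csv` exercise. The variable `Z` below is a placeholder, not the answer; swap in whichever variable(s) you decide close the backdoor paths:

```r
library(tidyverse)

dag_data <- read_csv('data/causal-inference/dag-data.csv')

# the confounded estimate
lm(Y ~ X, data = dag_data)

# the adjusted estimate (replace Z with your chosen adjustment set)
lm(Y ~ X + Z, data = dag_data)
```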