class: center, middle, inverse, title-slide

.title[
# Week 8: Causal Inference
]

---

## Schedule

<br>

### Previously

`\(Y = X\beta + \varepsilon\)`, and how to estimate the best `\(\beta\)` using calculus.

<br>

### Today

How to combine the linear model with theory to make credible causal inferences.

???

Last week we learned the basics of the linear model, which is this all-purpose technology we use to describe the relationship between two variables. We learned calculus so that we could optimize, finding the value of `\(\beta\)` that was the best fit to the data.

But we ended last week on a note of disappointment, because the linear model alone can't get us where we want to go. At its core, it's still just describing observed correlations.

Today, we explore how to combine that linear model engine with deep substantive knowledge of your topic to make credible causal inferences.

---

class: center, middle

## The Fundamental Problem of Causal Inference

---

## The Fundamental Problem of Causal Inference

- I have some treatment `\((X)\)` and some outcome `\((Y)\)`. I want to know if `\(X\)` causes `\(Y\)`: that is, whether changing the value of `\(X\)` would result in a different value of `\(Y\)`.

- The **Fundamental Problem of Causal Inference** is that every observation is either treated or not treated. I can't see what would happen in the alternate universe where an untreated observation is treated.

---

## The Potential Outcomes Framework

| Alive w/o Hospital `\((Y_0)\)` | Alive w/ Hospital `\((Y_1)\)` | Hospital `\((X)\)` | Alive `\((Y)\)` |
|-------|-------|-----|-----|
| 0 | 1 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 1 | 1 |
| 0 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 |

--

The big problem here is that the **potential outcomes** and the **treatment** are correlated. Sick people are more likely to go to the hospital!

???

We'd like to know `\(Y_1 - Y_0\)`, the treated outcome minus the untreated outcome. If we knew all the potential outcomes, then it would be clear that hospitals don't cause death. No one in this table has a worse outcome because of the hospital.

But we can't do that, because we only ever see one of the potential outcomes. The best we can do is estimate the **average treatment effect**.

---

## Causal Diagrams Help Make Sense of All This

<img src="08-causation_files/figure-html/unnamed-chunk-1-1.png" width="600" style="display: block; margin: auto;" />

The relationship between hospitals and death is **confounded** by the severity of a person's illness. Unless we can hold severity constant (aka **condition on** or **control for** severity), we can't confidently say whether hospitals are good for your health.

---

# Drawing Causal Diagrams

Today, we're going to show how causal diagrams can help us think through the problem of identifying causal effects. These causal diagrams -- also known as **DAGs** (directed acyclic graphs) -- are a way of representing the relationships between a web of variables.

### Some Useful R Packages

```r
# for working with DAGs
library(dagitty)

# for visualizing DAGs
library(ggdag)
```

---

## Three Shapes to Recognize

In this lecture, we'll introduce three ways in which a statistical relationship between X and Y can be **confounded**.

<br>

![](img/three-elemental-confounds.png)
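---

## Simulating the Hospital Example

The hospital example is an instance of the first shape: severity of illness drives both hospital visits and survival. Here's a minimal sketch (the probabilities, sample size, and seed are invented for illustration) showing how the naive comparison makes hospitals look deadly, and how conditioning on severity reveals that they help:

```r
library(tidyverse)

set.seed(8)
n <- 10000

hospital_sim <- tibble(
  # 1 = seriously ill, 0 = relatively healthy
  severe = rbinom(n, size = 1, prob = 0.3),
  # the seriously ill are far more likely to go to the hospital
  hospital = rbinom(n, size = 1, prob = if_else(severe == 1, 0.9, 0.1)),
  # hospitals *raise* the chance of survival at every severity level
  alive = rbinom(n, size = 1,
                 prob = if_else(severe == 1,
                                0.4 + 0.3 * hospital,
                                0.95 + 0.04 * hospital))
)

# naive comparison: hospital patients die more often
hospital_sim |>
  group_by(hospital) |>
  summarize(survival_rate = mean(alive))

# conditioning on severity: hospitals help within each group
hospital_sim |>
  group_by(severe, hospital) |>
  summarize(survival_rate = mean(alive))
```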
---

class: center, middle

## Shape 1: The Fork

---

## Forks

```r
fork <- dagify(Y ~ Gender,
               X ~ Gender)

ggdag_classic(fork) + theme_dag()
```

<img src="08-causation_files/figure-html/unnamed-chunk-3-1.png" width="500px" style="display: block; margin: auto;" />

---

## Forks

- The relationship between `\(X\)` and `\(Y\)` is **confounded** by the presence of the backdoor path through gender.

<img src="08-causation_files/figure-html/unnamed-chunk-4-1.png" width="400px" style="display: block; margin: auto;" />

---

## Forks

Simulate some example data:

```r
# number of observations
n <- 1000

# create a data frame of simulated data
simulated_data <- tibble(
  # randomly assign each observation as female or male
  female = sample(c(0, 1), size = n, replace = TRUE),
  # X is caused by gender + some random value (epsilon)
  X = 2 * female + rnorm(n, 0, 1),
  # Y is caused by gender + some random value (epsilon)
  Y = 3 * female + rnorm(n, 0, 1)
)
```

--

Notice that `\(X\)` is not causally related to `\(Y\)` at all. Any observed relationship between the two is driven by gender.

---

## Forks

```r
ggplot(data = simulated_data,
       mapping = aes(x = X, y = Y)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE)
```

<img src="08-causation_files/figure-html/unnamed-chunk-6-1.png" width="600" style="display: block; margin: auto;" />

---

## Forks

- Every backdoor path is an *alternative explanation* for the observed correlation between `\(X\)` and `\(Y\)`.

<img src="08-causation_files/figure-html/unnamed-chunk-7-1.png" width="500px" style="display: block; margin: auto;" />

- To account for that alternative explanation, we must find a way to *condition on gender*. If gender is held constant, then it cannot explain the relationship between `\(X\)` and `\(Y\)`.

---

## Forks

Here's what conditioning on gender looks like graphically...

```r
ggplot(data = simulated_data,
       mapping = aes(x = X, y = Y)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  facet_wrap(~female)
```

<img src="08-causation_files/figure-html/unnamed-chunk-8-1.png" width="500px" style="display: block; margin: auto;" />

---

## Forks

And with a linear model...

```r
lm(Y ~ X, data = simulated_data)
```

```
Call:
lm(formula = Y ~ X, data = simulated_data)

Coefficients:
(Intercept)            X  
     0.7830       0.7045  
```

```r
lm(Y ~ X, data = filter(simulated_data, female == 0))
```

```
Call:
lm(formula = Y ~ X, data = filter(simulated_data, female == 0))

Coefficients:
(Intercept)            X  
    0.01854      0.01380  
```

---

## Exercise

In groups, take five minutes to draw as many examples of forks as you can.

---

class: center, middle

## Backdoor Paths

---

## Backdoor Paths

Backdoor paths are essentially giant forks: they start with a variable that causes the treatment and end with a variable that causes the outcome.

--

```r
gnarly_dag <- dagify(X ~ Z,
                     Z ~ Q,
                     Q ~ W,
                     U ~ W,
                     Y ~ U,
                     Y ~ X)

ggdag_classic(gnarly_dag) + theme_dag()
```

<img src="08-causation_files/figure-html/unnamed-chunk-10-1.png" width="300px" style="display: block; margin: auto;" />

--

Conveniently, you can shut down a backdoor path by conditioning on *any* variable along it. Conditioning on `\(Z\)` or `\(Q\)` or `\(W\)` or `\(U\)` would close the backdoor path.

---

## Backdoor Paths

<img src="08-causation_files/figure-html/unnamed-chunk-11-1.png" width="400px" style="display: block; margin: auto;" />

The `adjustmentSets()` function from `dagitty` is useful as your DAGs get more and more complex.
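---

## Backdoor Paths: A Simulation Check

Here's a minimal sketch (the linear effects and coefficients are invented for illustration) simulating the gnarly DAG above. Adjusting for any one variable along the backdoor path recovers the true effect of `\(X\)` on `\(Y\)`:

```r
set.seed(8)
n <- 10000

W <- rnorm(n)
Q <- 2 * W + rnorm(n)
Z <- 2 * Q + rnorm(n)
U <- 2 * W + rnorm(n)
X <- 2 * Z + rnorm(n)
Y <- 1 * X + 3 * U + rnorm(n)  # the true effect of X on Y is 1

# unadjusted: biased by the backdoor path X <- Z <- Q <- W -> U -> Y
lm(Y ~ X)

# adjusting for any one variable along that path gives a slope near 1
lm(Y ~ X + Z)
lm(Y ~ X + W)
```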
```r
adjustmentSets(gnarly_dag, exposure = 'X', outcome = 'Y')
```

```
{ U }
{ W }
{ Q }
{ Z }
```

---

## Exercise

What variables can you condition on to close the backdoor paths between `\(X\)` and `\(Y\)`?

<img src="08-causation_files/figure-html/unnamed-chunk-13-1.png" width="600" style="display: block; margin: auto;" />

---

## Closing backdoor paths is like...

<br>
<br>

.center[
![](img/sherlock.gif)

*"Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth."*
]

---

class: center, middle

But...why not just condition on *everything*? It would definitely close all the backdoor paths, right?

---

### The Kitchen Sink Approach to Causal Inference

Why not just condition on every variable that could possibly influence your outcome?

<img src="img/piled-up-dishes-in-kitchen-sink.jpg" width="500px" style="display: block; margin: auto;" />

--

This approach has several major flaws, and you should avoid taking it. To better understand these flaws, let's explore the other shapes we might find in a causal diagram...

???

The kitchen sink approach is intuitively appealing. Control for everything that could possibly influence the outcome. But this approach has some major flaws, and you should never do it. Chief among its flaws is **post-treatment bias**.

---

class: center, middle

## Shape 2: The Pipe (aka the Mediator)

---

## Mediators

```r
dagify(M ~ X,
       Y ~ M) |>
  ggdag_classic() + theme_dag()
```

<img src="08-causation_files/figure-html/unnamed-chunk-15-1.png" width="400px" style="display: block; margin: auto;" />

--

`\(X\)` causes `\(Y\)` through its influence on `\(M\)`. Put another way, `\(M\)` is the *mechanism* through which `\(X\)` causes `\(Y\)`.

---

## Mediators

Here, we'll simulate some data with that `\(X \rightarrow M \rightarrow Y\)` structure.

```r
# simulated data
simulated_data <- tibble(
  # X is drawn randomly
  X = rnorm(n, 0, 1),
  # M is caused by X
  M = as.numeric(X > rnorm(n, 0, 0.5)),
  # Y is caused by M
  Y = 3 * M + rnorm(n, 0, 1)
)
```

---

## Mediators

Plotting that data shows that `\(X\)` is strongly correlated with `\(Y\)`...

```r
ggplot(data = simulated_data,
       mapping = aes(x = X, y = Y)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE)
```

<img src="08-causation_files/figure-html/unnamed-chunk-17-1.png" width="400px" style="display: block; margin: auto;" />

---

## Mediators

...unless you condition on `\(M\)`.

```r
ggplot(data = simulated_data,
       mapping = aes(x = X, y = Y)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  facet_wrap(~M)
```

<img src="08-causation_files/figure-html/unnamed-chunk-18-1.png" width="400px" style="display: block; margin: auto;" />

---

## Mediators

```r
lm(Y ~ X, data = simulated_data)
```

```
Call:
lm(formula = Y ~ X, data = simulated_data)

Coefficients:
(Intercept)            X  
      1.448        1.141  
```

```r
lm(Y ~ X, data = filter(simulated_data, M == 1))
```

```
Call:
lm(formula = Y ~ X, data = filter(simulated_data, M == 1))

Coefficients:
(Intercept)            X  
     2.8581       0.1642  
```

---

## Mediators

Conditioning on a mediator closes a true causal path between treatment and outcome. The resulting error is called **post-treatment bias**.

![](img/montgomery-nyhan-torres.png)

---

## Mediators

But sometimes you may want to condition on a mediator -- for example, when you're performing a **mediation analysis** (Baron & Kenny, 1986). If conditioning on `\(M\)` weakens the association between `\(X\)` and `\(Y\)`, that lends support to your theory that `\(M\)` is a mediator!

<img src="08-causation_files/figure-html/unnamed-chunk-20-1.png" width="600" style="display: block; margin: auto;" />

--

The point is to never condition on a post-treatment variable unless you *want* to shut down that causal pathway.
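---

## Mediators

Filtering the data is one way to condition on `\(M\)`; adding it to the regression is another. A minimal sketch using the `simulated_data` from the `\(X \rightarrow M \rightarrow Y\)` simulation above (the expected pattern is described in the comments, since the exact numbers depend on the random draw):

```r
# naive regression: X appears strongly related to Y
lm(Y ~ X, data = simulated_data)

# including the mediator as a covariate conditions on it, so the
# coefficient on X should collapse toward zero: the association
# runs almost entirely through M
lm(Y ~ X + M, data = simulated_data)
```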
---

## Exercise

In groups, take five minutes to draw as many examples of mediators as you can.

---

class: center, middle

## Simpson's Paradox

---

## Simpson's Paradox

Sometimes, conditioning on a third variable can completely reverse the relationship between `\(X\)` and `\(Y\)`.

--

### Classic Example (Charig et al., 1986):

<img src="img/stones.png" width="500px" style="display: block; margin: auto;" />

Which is the better treatment?

---

## Simpson's Paradox

<img src="08-causation_files/figure-html/unnamed-chunk-22-1.png" width="600" style="display: block; margin: auto;" />

---

## Simpson's Paradox

Relatedly, it seems like hospitals cause death unless you condition on illness...

<img src="08-causation_files/figure-html/unnamed-chunk-23-1.png" width="600" style="display: block; margin: auto;" />

---

## Simpson's Paradox

Another classic example: in 1973 a lawsuit was brought against UC Berkeley for discriminating against women in graduate admissions. That year, only 35% of female applicants were admitted, versus 43% of male applicants.

--

But when you look at the breakdown by department...

![](img/bickel.jpg)

--

So, was Berkeley discriminatory or not? Does being female cause your chances of admission to decrease?

---

## Simpson's Paradox: Exercise

Should you condition on Trump approval?

```r
ces |>
  group_by(educ) |>
  summarize(pct_much_worse = sum(
    national_economy == 'Gotten much worse') / n() * 100)
```

```
# A tibble: 6 × 2
  educ                 pct_much_worse
  <fct>                         <dbl>
1 No HS                          32.8
2 High school graduate           33.1
3 Some college                   40.2
4 2-year                         39.0
5 4-year                         46.7
6 Post-grad                      54.8
```

---

## Simpson's Paradox: Exercise

Should you condition on Trump approval?

```r
ces |>
  filter(trump_approval %in% c('Strongly approve', 'Somewhat approve')) |>
  group_by(educ) |>
  summarize(pct_much_worse = sum(
    national_economy == 'Gotten much worse') / n() * 100)
```

```
# A tibble: 6 × 2
  educ                 pct_much_worse
  <fct>                         <dbl>
1 No HS                         17.0 
2 High school graduate          13.1 
3 Some college                  10.9 
4 2-year                         9.83
5 4-year                        11.1 
6 Post-grad                     10.6 
```

---

class: center, middle

## Colliders

---

## Colliders

`\(X\)` does not cause `\(Y\)`, but `\(X\)` and `\(Y\)` both cause `\(Z\)`.

```r
collider_dag <- dagify(Z ~ X + Y)

ggdag_classic(collider_dag) + theme_dag()
```

<img src="08-causation_files/figure-html/unnamed-chunk-25-1.png" width="400px" style="display: block; margin: auto;" />

The path between `\(X\)` and `\(Y\)` is naturally closed. Unless you condition on `\(Z\)`...

---

## Colliders: Example

<img src="08-causation_files/figure-html/unnamed-chunk-26-1.png" width="600" style="display: block; margin: auto;" />

---

## Colliders: Example

Now select only those with combined scores greater than 1200 and admit them to UGA...

```r
collider_dag <- dagify(Admitted ~ Verbal + Math)

ggdag_classic(collider_dag) + theme_dag()
```

<img src="08-causation_files/figure-html/unnamed-chunk-27-1.png" width="400px" style="display: block; margin: auto;" />

```r
admitted <- if_else(verbal + math > 1200, 1, 0)
```

---

## Colliders: Example

<img src="08-causation_files/figure-html/unnamed-chunk-28-1.png" width="600" style="display: block; margin: auto;" />

---

## Collider Bias

Just like conditioning on a mediator, conditioning on a collider is a form of **post-treatment bias**.

- Conditioning on a mediator closes a true causal path between `\(X\)` and `\(Y\)`.

- Conditioning on a collider opens up a non-causal path between `\(X\)` and `\(Y\)`.

--

### Also called:

- Selection-distortion effect
- Berkson's Paradox
- Survivorship bias
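---

## Colliders: A Simulation Check

Here's a minimal sketch of the admissions example (the score distributions and sample size are invented for illustration). Verbal and math scores are simulated independently, yet conditioning on admission manufactures a negative correlation between them:

```r
library(tidyverse)

set.seed(8)
n <- 5000

verbal <- rnorm(n, mean = 600, sd = 100)
math <- rnorm(n, mean = 600, sd = 100)

# admit anyone whose combined score clears 1200
admitted <- if_else(verbal + math > 1200, 1, 0)

# in the full applicant pool: essentially zero correlation
cor(verbal, math)

# among admitted students: a clear negative correlation appears
cor(verbal[admitted == 1], math[admitted == 1])
```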
---

## Collider Examples

Once you start looking for collider bias, it pops up everywhere...

- [Hollywood Ruins All The Best Books](https://www.youtube.com/watch?v=FUD8h9JpEVQ)

- [Restaurants with terrible ambience often have the best food](https://www.amazon.com/Economist-Gets-Lunch-Everyday-Foodies/dp/B00B1KZ8JG)

- ["Why are handsome men such jerks?"](https://slate.com/human-interest/2014/06/berksons-fallacy-why-are-handsome-men-such-jerks.html)

![](img/anzia-berry.png)

???

Take a few minutes and discuss how these observations could have arisen from a collider.

---

## Exercise

There's a dataset in the repository at `data/causal-inference/dag-data.csv`. This is the DAG I used to generate it:

<img src="08-causation_files/figure-html/unnamed-chunk-29-1.png" width="500px" style="display: block; margin: auto;" />

Using `dagitty` or pen-and-paper, identify which variable(s) you can condition on to recover the true causal effect of `\(X\)` on `\(Y\)`. Then estimate it in `R`. Compare your estimate with the confounded estimate you get from `lm(Y ~ X)`.

---

## Further Reading

<img src="img/book-of-why.jpg" width="300px" style="display: block; margin: auto;" />

---

## Looking Ahead

<br>

Broadly speaking, there are two ways you can make credible causal claims:

1. Close all the backdoor paths between your treatment and outcome.

2. Find a *front door path*.

<br>

We'll tackle the first one next week, and the second one the week after that.
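---

## Appendix: A Starting Point for the Exercise

A minimal sketch of the workflow for the `dag-data.csv` exercise. The variable `Z` below is a placeholder, not the answer; swap in whichever variable(s) you decide close the backdoor paths:

```r
library(tidyverse)

dag_data <- read_csv('data/causal-inference/dag-data.csv')

# the confounded estimate
lm(Y ~ X, data = dag_data)

# the adjusted estimate (replace Z with your chosen adjustment set)
lm(Y ~ X + Z, data = dag_data)
```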