class: center, middle, inverse, title-slide

.title[
# Week 9: Closing Back Door Paths
]
.subtitle[
## Linear Regression
]

---

## Road Map

--

### Weeks 1-6: Exploration

- Working with data
- Visualization
- Summary Statistics

--

### Weeks 7-10: Causality

- Linear model
- Causal diagrams
- **Closing back door paths**
- Finding front door paths

--

### Weeks 11+: Uncertainty

- Sampling distributions
- Confidence intervals
- Hypothesis testing

---

## Causal Inference

<br>

Broadly speaking, there are two ways you can make credible causal claims:

1. Close all the back door paths between your treatment and outcome.
2. Find a *front door path*.

--

This week, we tackle number 1. Next week, number 2.

---

class: center, middle

## Does democracy reduce political corruption?

???

This will be our motivating research question over the next few weeks.

---

## Let's Play With This Dataset

```r
d <- read_csv('data/week-09/corruption-data.csv')
```

If you'd like more practice with data wrangling, check out the script I used to put this dataset together, `R/week-09/cleanup-data.R`.

---

## Look At The Data

```r
ggplot(data = d, mapping = aes(x = democracy, y = cpi_score)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE)
```

<img src="09-regression_files/figure-html/unnamed-chunk-3-1.png" width="500px" style="display: block; margin: auto;" />

---

## Look At The Data

Notice a neat trick I employed here. If `democracy` is a 0-1 variable, then the slope of the linear model gives you the difference in means between the two groups.

<img src="09-regression_files/figure-html/unnamed-chunk-4-1.png" width="500px" style="display: block; margin: auto;" />

(Actually, you proved this in your calculus problem set. The value that minimizes the sum of squared errors is the mean.)

---

## Fit A Linear Model

The difference in corruption score between democracies and autocracies -- without conditioning on anything -- is:

```r
lm(cpi_score ~ democracy, data = d)
```

```
Call:
lm(formula = cpi_score ~ democracy, data = d)

Coefficients:
(Intercept)    democracy  
      32.09        15.61  
```

--

The average autocracy scores 32 on the Corruption Perceptions Index. The average democracy scores about 15.6 points higher.

--

Democracies are less corrupt!

---

class: center, middle

## But wait...

---

## A Back Door Path

```r
library(dagitty)
library(ggdag)

dag <- dagify(Corruption ~ Democracy + GDP,
              Democracy ~ GDP)

ggdag_classic(dag) +
  theme_dag()
```

<img src="09-regression_files/figure-html/unnamed-chunk-6-1.png" width="600" style="display: block; margin: auto;" />

---

## 3D View

[Here](https://joeornstein.github.io/pols-7012/week-09.html#d-plots)
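---

## Which Variables Close The Back Door?

You don't have to eyeball the DAG. Here's a sketch, reusing the `dag` object we just built: the `dagitty` package can list the adjustment sets that close every back door path between treatment and outcome.

```r
library(dagitty)

# which variables do we need to condition on to block
# every back door path from Democracy to Corruption?
adjustmentSets(dag, exposure = 'Democracy', outcome = 'Corruption')
```

In this DAG there's only one back door path -- through GDP -- so conditioning on GDP does the trick.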
---

## How Do You Condition On A Continuous Variable?

--

Every country has its own unique GDP per capita! If you tried to just isolate the countries with national income of, say, $2,770 per person, it would...just be Tanzania. Can't do a statistical comparison with a sample size of 1.

---

## How Do You Condition On A Continuous Variable?

You could try something like this...

```r
d |>
  mutate(income_group = case_when(gdp_per_capita < 10000 ~ 'Low Income',
                                  gdp_per_capita < 40000 ~ 'Middle Income',
                                  TRUE ~ 'High Income')) |>
  ggplot(mapping = aes(x = democracy, y = cpi_score)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  facet_wrap(~income_group)
```

---

## How Do You Condition On A Continuous Variable?

You could try something like this...

<img src="09-regression_files/figure-html/unnamed-chunk-8-1.png" width="700px" style="display: block; margin: auto;" />

--

But there's still a lot of variation in GDP within each of those subplots. We haven't completely conditioned on GDP -- so the back door path is still open!

---

## How Do You Condition On A Continuous Variable?

**Our Approach**: Add more variables to the linear model, drawing a "[plane of best fit](https://joeornstein.github.io/pols-7012/week-09.html#d-plots)".

---

class: center, middle

## What's Going On Under The Hood

---

## Multivariable Linear Regression

The "plane of best fit" is described by a linear model with multiple `\(X\)` variables on the right hand side.

--

`$$\text{corruption} = \alpha + \beta_1 \text{democracy} + \beta_2 \text{GDP} + \varepsilon$$`

This model says: "Corruption probably depends on both democracy **and** national income. Richer, democratic countries will tend to have less corruption." We'd like to estimate the slope of both relationships simultaneously!

--

**Vector Representation**:

`$$\underbrace{\begin{bmatrix} 19 \\ 36 \\ 36 \\ \vdots \\ 24 \end{bmatrix}}_\text{corruption} = \underbrace{\alpha \times \begin{bmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}}_\alpha + \underbrace{\beta_1 \times \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{\beta_1 \text{democracy}} + \underbrace{\beta_2 \times \begin{bmatrix} 2,156 \\ 14,648 \\ 12,019 \\ \vdots \\ 2,961 \end{bmatrix}}_{\beta_2 \text{GDP}} + \underbrace{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_n \end{bmatrix}}_\varepsilon$$`

---

## Multivariable Linear Regression

`$$\text{corruption} = \alpha + \beta_1 \text{democracy} + \beta_2 \text{GDP} + \varepsilon$$`

<img src="09-regression_files/figure-html/2d-1.png" width="600" style="display: block; margin: auto;" />

---

## Multivariable Linear Regression

The challenge is to simultaneously estimate `\(\alpha\)`, `\(\beta_1\)`, and `\(\beta_2\)`.

`$$\text{corruption} = \alpha + \beta_1 \text{democracy} + \beta_2 \text{GDP} + \varepsilon$$`

--

We've come as far as we can with scalar algebra. It's time you learned **matrix algebra**.

---

class: center, middle

# Matrix Algebra

---

## Matrix Algebra

A **matrix** is a bunch of vectors squished together.

--

.pull-left[
`\(\text{democracy} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}\)`
]

.pull-right[
`\(\text{GDP} = \begin{bmatrix} 2,156 \\ 14,648 \\ 12,019 \\ \vdots \\ 2,961 \end{bmatrix}\)`
]

--

<br>

`$$X = \begin{bmatrix} 0 & 2,156 \\ 1 & 14,648 \\ 0 & 12,019 \\ \vdots & \vdots \\ 0 & 2,961 \end{bmatrix}$$`

--

We've been calling this a **dataframe**.

---

## Matrix Algebra

The **dimension** of a matrix refers to the number of rows and columns.
An `\(m \times n\)` matrix has `\(m\)` rows and `\(n\)` columns.

--

```r
dim(d)
```

```
[1] 180   6
```

There are 180 rows and 6 columns in the corruption dataset.

---

## Matrix Algebra

**Adding** and **subtracting** matrices is straightforward. Just add and subtract elementwise.

.pull-left[
`\(A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 4 & 4 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 2 & 1 \\ 4 & 4 \\ 8 & 5 \end{bmatrix}\)`
]

--

`$$A + B = \begin{bmatrix} 3 & 3 \\ 6 & 7 \\ 12 & 9 \end{bmatrix}$$`

--

**Multiplying** and **dividing** is where it gets tricky.

- You can only multiply *some* matrices together (they must be **conformable**)
- And matrix division isn't really a thing. Instead, we multiply by the matrix's **inverse**.

---

class: center, middle

## Matrix Multiplication

---

## Matrix Multiplication

--

First, let's introduce the **dot product** of two vectors.

`$$a \cdot b = \sum a_i b_i$$`

--

If `\(a = [3,1,2]\)` and `\(b = [1,2,3]\)`, then the dot product of `\(a\)` and `\(b\)` equals:

`$$a \cdot b = 3 \times 1 + 1 \times 2 + 2 \times 3 = 11$$`

--

In `R`, a dot product can be computed like so:

```r
A <- c(3,1,2)
B <- c(1,2,3)

# dot product
sum(A*B)
```

```
[1] 11
```

---

## Matrix Multiplication

**Exercise:** Take the dot product of `\(a\)` and `\(b\)`.

`\(a = [1,4,5]\)` and `\(b = [3,2,1]\)`

--

**Answer:**

`$$a \cdot b = 1 \times 3 + 4 \times 2 + 5 \times 1 = 16$$`

```r
A <- c(1,4,5)
B <- c(3,2,1)

# dot product
sum(A*B)
```

```
[1] 16
```

---

## Matrix Multiplication

When you multiply two matrices, you take a series of dot products.

.pull-left[
`\(A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 2 & 1 \\ 4 & 4 \end{bmatrix}\)`
]

--

Each entry in `\(AB\)` is the dot product of a row in `\(A\)` and a column in `\(B\)`.

`$$AB = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 4 & 4 \end{bmatrix} = \begin{bmatrix} & \\ & \end{bmatrix}$$`

---

## Matrix Multiplication

When you multiply two matrices, you take a series of dot products.

.pull-left[
`\(A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 2 & 1 \\ 4 & 4 \end{bmatrix}\)`
]

Each entry in `\(AB\)` is the dot product of a row in `\(A\)` and a column in `\(B\)`.

`$$AB = \begin{bmatrix} \color{red} 1 & \color{red} 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} \color{red} 2 & 1 \\ \color{red} 4 & 4 \end{bmatrix} = \begin{bmatrix} \color{red} {10} & \\ & \end{bmatrix}$$`

---

## Matrix Multiplication

When you multiply two matrices, you take a series of dot products.

.pull-left[
`\(A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 2 & 1 \\ 4 & 4 \end{bmatrix}\)`
]

Each entry in `\(AB\)` is the dot product of a row in `\(A\)` and a column in `\(B\)`.

`$$AB = \begin{bmatrix} \color{red} 1 & \color{red} 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 2 & \color{red} 1 \\ 4 & \color{red} 4 \end{bmatrix} = \begin{bmatrix} 10 & \color{red} 9 \\ & \end{bmatrix}$$`
---

## Matrix Multiplication

When you multiply two matrices, you take a series of dot products.

.pull-left[
`\(A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 2 & 1 \\ 4 & 4 \end{bmatrix}\)`
]

Each entry in `\(AB\)` is the dot product of a row in `\(A\)` and a column in `\(B\)`.

`$$AB = \begin{bmatrix} 1 & 2 \\ \color{red} 2 & \color{red} 3 \end{bmatrix} \begin{bmatrix} \color{red} 2 & 1 \\ \color{red} 4 & 4 \end{bmatrix} = \begin{bmatrix} 10 & 9 \\ \color{red} {16} & \end{bmatrix}$$`

---

## Matrix Multiplication

When you multiply two matrices, you take a series of dot products.

.pull-left[
`\(A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 2 & 1 \\ 4 & 4 \end{bmatrix}\)`
]

Each entry in `\(AB\)` is the dot product of a row in `\(A\)` and a column in `\(B\)`.

`$$AB = \begin{bmatrix} 1 & 2 \\ \color{red} 2 & \color{red} 3 \end{bmatrix} \begin{bmatrix} 2 & \color{red} 1 \\ 4 & \color{red} 4 \end{bmatrix} = \begin{bmatrix} 10 & 9 \\ 16 & \color{red}{14} \end{bmatrix}$$`

--

<br>

This is all very strange and confusing if you've never seen it before, but we'll soon see that it makes representing our multivariable linear regression problem a whole lot easier.

???

To get the entry in the first row, first column of `\(AB\)`, take the dot product of:

- The first row of `\(A\)`, and
- The first column of `\(B\)`

Then do that for every row in `\(A\)` and every column in `\(B\)`.

`$$AB = \begin{bmatrix} 1 \times 2 + 2 \times 4 & 1 \times 1 + 2 \times 4 \\ 2 \times 2 + 3 \times 4 & 2 \times 1 + 3 \times 4 \end{bmatrix} = \begin{bmatrix} 10 & 9 \\ 16 & 14 \end{bmatrix}$$`

---

## Matrix Multiplication

You can multiply matrices in `R` with the `%*%` operator.

```r
A <- cbind(c(1,2), c(2,3))
A
```

```
     [,1] [,2]
[1,]    1    2
[2,]    2    3
```

```r
B <- cbind(c(2,4), c(1,4))
B
```

```
     [,1] [,2]
[1,]    2    1
[2,]    4    4
```

--

```r
A %*% B
```

```
     [,1] [,2]
[1,]   10    9
[2,]   16   14
```

---

## Matrix Multiplication

**Exercise:** Try multiplying these two matrices.

.pull-left[
`\(A = \begin{bmatrix} 4 & 1 \\ 1 & 2 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 5 & 5 \\ 2 & 1 \end{bmatrix}\)`
]

--

**Answer:**

`$$AB = \begin{bmatrix} 4 \times 5 + 1 \times 2 & 4 \times 5 + 1 \times 1 \\ 1 \times 5 + 2 \times 2 & 1 \times 5 + 2 \times 1 \end{bmatrix} = \begin{bmatrix} 22 & 21 \\ 9 & 7 \end{bmatrix}$$`

```r
A <- cbind(c(4,1), c(1,2))
B <- cbind(c(5,2), c(5,1))

A %*% B
```

```
     [,1] [,2]
[1,]   22   21
[2,]    9    7
```

---

## Matrix Multiplication

This process -- taking the dot product of rows and columns -- means that you can only multiply two matrices `\(AB\)` if the row vectors of `\(A\)` are the same length as the column vectors in `\(B\)`.

--

.pull-left[
`\(A = \begin{bmatrix} 1 & 2 \\ 4 & 3 \\ 1 & 8 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 7 & 2 \\ 4 & 3 \\ 1 & 2 \end{bmatrix}\)`
]

These matrices are not **conformable**! You can't take the dot product of the rows and columns.

--

### Formally

You can only multiply `\(AB\)` if the dimension of `\(A\)` is `\(m \times k\)` and the dimension of `\(B\)` is `\(k \times n\)`. The result is an `\(m \times n\)` matrix.

???

- In other words, you can only multiply `\(AB\)` if the dimension of `\(A\)` is `\(m \times k\)` and the dimension of `\(B\)` is `\(k \times n\)`.
- If this condition holds, then the two matrices are **conformable**.

---

## Matrix Multiplication

**Exercise**: Which can you multiply: `\(AB\)` or `\(BA\)`?

.pull-left[
`\(A = \begin{bmatrix} 3 & 2 \\ 1 & 2 \\ 2 & 2 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 4 & 5 & 1 \\ 5 & 4 & 1 \end{bmatrix}\)`
]

.pull-left[
```r
A <- cbind(c(3,1,2), c(2,2,2))
A
```

```
     [,1] [,2]
[1,]    3    2
[2,]    1    2
[3,]    2    2
```
]

.pull-right[
```r
B <- cbind(c(4,5), c(5,4), c(1,1))
B
```

```
     [,1] [,2] [,3]
[1,]    4    5    1
[2,]    5    4    1
```
]

---

## Matrix Multiplication

**Answer:** Both!

```r
A %*% B
```

```
     [,1] [,2] [,3]
[1,]   22   23    5
[2,]   14   13    3
[3,]   18   18    4
```

```r
B %*% A
```

```
     [,1] [,2]
[1,]   19   20
[2,]   21   20
```

--

But now try `A %*% A`. I can't do it here because `R` gets so mad it won't even render my slides.

---

## Matrix Multiplication

To make matrices conformable for multiplication, sometimes you may need to take the **transpose** of a matrix. The transpose just takes the rows and turns them into columns.

.pull-left[
`$$A = \begin{bmatrix} 4 & 1 \\ 1 & 2 \\ 3 & 3 \end{bmatrix}$$`
]

.pull-right[
`$$A' = \begin{bmatrix} 4 & 1 & 3 \\ 1 & 2 & 3 \end{bmatrix}$$`
]

```r
# define A to match the matrix above
A <- cbind(c(4,1,3), c(1,2,3))

t(A)
```

```
     [,1] [,2] [,3]
[1,]    4    1    3
[2,]    1    2    3
```

```r
t(A) %*% A
```

```
     [,1] [,2]
[1,]   26   15
[2,]   15   14
```

---

## Matrix Multiplication

Multiplying a vector by its transpose is the same as taking the dot product with itself:

`$$a = \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}$$`

`$$a \cdot a = a'a = 1 \times 1 + 3 \times 3 + 4 \times 4 = 26$$`

--

Hey, it's the **sum of squares**! That could be useful for something...

--

```r
a <- c(1,3,4)

sum(a*a)
```

```
[1] 26
```

```r
t(a) %*% a
```

```
     [,1]
[1,]   26
```

---

class: center, middle

## Matrix Inversion

---

## Matrix Inversion

Before I can teach you how to **divide** matrices, I need to tell you about a very special matrix, called the **identity matrix**.

--

Remember how any number times 1 just equals the original number?

`$$a \times 1 = a$$`

This is called the **identity property**. It's what makes `\(1\)` a very special number.

--

<br>

The **identity matrix** `\((I)\)` is basically the `\(1\)` of matrices.

`$$AI = A$$`

You multiply any matrix by `\(I\)` and you get the same matrix back.

---

## Matrix Inversion

The identity matrix `\(I_n\)` is an `\(n \times n\)` matrix with ones on the diagonal and zeroes everywhere else.

`$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$`

--

**Exercise:** Try multiplying this matrix with the following matrix `\(A\)`.

`$$A = \begin{bmatrix} 2 & 1 & 5 \\ -2 & 8 & 100 \\ 7 & 42 & -2 \end{bmatrix}$$`

---

## Matrix Inversion

**Answer:**

```r
diag(3) # Create the identity matrix in R with the `diag()` function
```

```
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
```

```r
A <- rbind(c(2, 1, 5),
           c(-2, 8, 100),
           c(7, 42, -2))

# multiply AI
A %*% diag(3)
```

```
     [,1] [,2] [,3]
[1,]    2    1    5
[2,]   -2    8  100
[3,]    7   42   -2
```

---

## Matrix Inversion

Hey, remember how dividing `\(\frac{a}{b}\)` is the same as multiplying `\(a \times \frac{1}{b}\)`?

--

- `\(\frac{1}{b} = b^{-1}\)` is called the **inverse** (or reciprocal) of `\(b\)`.

--

<br>

**Exercise:** What do you get when you multiply a number by its inverse?

--

- **Answer:** `\(a \times \frac{1}{a} = a^1a^{-1} = a^0 = 1\)`

--

<br>

There is an equivalent concept in matrix algebra, called the **matrix inverse**.

`$$AA^{-1} = I$$`

---

## Matrix Inversion

Good news! It's the 21st century. No one is going to make you solve for matrix inverses by hand.
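(Worth seeing once, though: the `\(2 \times 2\)` case has a tidy closed form,

`$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$`

provided `\(ad - bc \neq 0\)`. Keep that condition in the denominator in mind -- it's about to matter.)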
--

There is literally a function in R called `solve()` which will do it for you.

```r
solve(A)
```

```
            [,1]         [,2]         [,3]
[1,]  0.49976292 -0.025130394 -0.007112376
[2,] -0.08250356  0.004623044  0.024893314
[3,]  0.01659554  0.009127549 -0.002133713
```

```r
A %*% solve(A) |> round()
```

```
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
```

---

## Matrix Inversion

Just like there are some matrices you can't multiply together (**non-conformable** matrices), there are some matrices you can't invert (**non-invertible** or **singular** matrices).

--

`$$A = \begin{bmatrix} 1 & 2 & 5 \\ 2 & 4 & 100 \\ 3 & 6 & -42 \end{bmatrix}$$`

Try to `solve()` this matrix in `R`. Again, I can't show it to you here because `R` yells at me...

```r
A <- rbind(c(1,2,5),
           c(2,4,100),
           c(3,6,-42))

# solve(A)
```

--

Notice that the second column is two times the first column.

---

## Matrix Inversion

This matrix is also singular. Notice that the sum of columns 2 and 3 is a multiple of column 1. (In matrix algebra-speak, the columns are not **linearly independent**.)

`$$B = \begin{bmatrix} 1 & 1 & 2 \\ 2 & 4 & 2 \\ 3 & 2 & 7 \end{bmatrix}$$`

```r
B <- rbind(c(1,1,2),
           c(2,4,2),
           c(3,2,7))

# solve(B)
```

???

I bring this up, dear reader, because there will come a day when you will want to invert a matrix and you will not be able to, because your columns are perfectly correlated with each other! (It's called multicollinearity, and I'll let Mollie tell you all about it.)

---

## Matrix Inversion

Now that we know how to multiply by an inverse, we have what we need to perform matrix algebra.

**Exercise:** Solve this equation for `\(A\)`.

`$$AB = C$$`

--

**Answer:** Multiply both sides by `\(B^{-1}\)`

`$$ABB^{-1} = CB^{-1}$$`

`$$AI = CB^{-1}$$`

<br>

`$$A = CB^{-1}$$`

---

## Matrix Inversion

Watch out for conformability! With matrices, it matters whether you multiply on the right or the left.

**Exercise:** Solve for `\(B\)`.

`$$AB = C$$`

--

**Answer:**

`$$A^{-1}AB = A^{-1}C$$`

`$$IB = A^{-1}C$$`

<br>

`$$B = A^{-1}C$$`

--

`$$B \neq CA^{-1}$$`

---

class: center, middle

## Back to Multivariable Linear Regression

---

## Multivariable Linear Regression

This is the regression problem we wanted to solve:

`$$\underbrace{\begin{bmatrix} 19 \\ 36 \\ 36 \\ \vdots \\ 24 \end{bmatrix}}_\text{corruption} = \underbrace{\alpha \times \begin{bmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}}_\alpha + \underbrace{\beta_1 \times \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{\beta_1 \text{democracy}} + \underbrace{\beta_2 \times \begin{bmatrix} 2,156 \\ 14,648 \\ 12,019 \\ \vdots \\ 2,961 \end{bmatrix}}_{\beta_2 \text{GDP}} + \underbrace{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_n \end{bmatrix}}_\varepsilon$$`

--

Notice that we can restate it as a matrix multiplication problem:

`$$\underbrace{\begin{bmatrix} 19 \\ 36 \\ 36 \\ \vdots \\ 24 \end{bmatrix}}_\text{corruption} = \underbrace{\begin{bmatrix} 1 & 0 & 2,156 \\ 1 & 1 & 14,648 \\ 1 & 0 & 12,019 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 2,961 \end{bmatrix}}_X \underbrace{\begin{bmatrix} \alpha \\ \beta_1 \\ \beta_2 \end{bmatrix}}_\beta + \underbrace{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_n \end{bmatrix}}_\varepsilon = X\beta + \varepsilon$$`

???

How do you simultaneously estimate `\(\alpha\)`, `\(\beta_1\)`, and `\(\beta_2\)`?

---

## Multivariable Linear Regression

`\(X\beta\)` is an `\(n \times 1\)` vector of predicted values, and `\(\varepsilon\)` is an `\(n \times 1\)` vector of errors.
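(A quick conformability check: `\(X\)` is `\(n \times 3\)` and `\(\beta\)` is `\(3 \times 1\)`, so `\(X\beta\)` is `\((n \times 3)(3 \times 1) = n \times 1\)` -- exactly the shape of the outcome vector.)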
`$$Y = X\beta + \varepsilon$$`

--

Just like before, we want to minimize the sum of squared errors:

`$$\varepsilon \cdot \varepsilon = \varepsilon'\varepsilon = (Y - X\beta)'(Y-X\beta)$$`

--

Minimizing this expression follows the same three steps we used with scalar calculus. Just be careful with matrix multiplication and division. Start by expanding the expression:

`$$f(X,Y,\beta) = (Y - X\beta)'(Y-X\beta) = Y'Y - 2(X\beta)'Y + (X\beta)'X\beta$$`

---

## Estimating The Regression Parameters

**Step 1: Take the derivative with respect to `\(\beta\)`**

`$$f(X,Y,\beta) = Y'Y - 2(X\beta)'Y + (X\beta)'X\beta$$`

`$$\frac{\partial f}{\partial \beta} = -2X'Y + 2 X'X \beta$$`

--

**Step 2: Set the derivative equal to zero**

`$$-2X'Y + 2 X'X \beta = 0$$`

--

**Step 3: Solve for `\(\beta\)`**

`$$2 X'X \beta = 2 X'Y$$`

`$$\hat{\beta} = (X'X)^{-1} X'Y$$`

---

## The Ordinary Least Squares (OLS) Estimator

<br> <br> <br>

### `$$\hat{\beta} = (X'X)^{-1} X'Y$$`

---

## Now We Can Estimate...

```r
# drop the missing values
d <- filter(d, complete.cases(d))

# create the Y vector
Y <- d$cpi_score

# create the X matrix
X <- d |>
  select(democracy, gdp_per_capita) |>
  mutate(intercept = 1) |>
  as.matrix()

head(X)
```

```
     democracy gdp_per_capita intercept
[1,]         1       62134.03         1
[2,]         1       44814.31         1
[3,]         1       53159.14         1
[4,]         0      101649.07         1
[5,]         1       56668.32         1
[6,]         1       72372.24         1
```

---

## Now We Can Estimate...

The vector of estimates that minimizes the sum of squared errors equals `\((X'X)^{-1}(X'Y)\)`:

```r
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% Y

beta_hat
```

```
                       [,1]
democracy       8.873370295
gdp_per_capita  0.000623775
intercept      23.672945358
```

--

```r
lm(cpi_score ~ democracy + gdp_per_capita, data = d)
```

```
Call:
lm(formula = cpi_score ~ democracy + gdp_per_capita, data = d)

Coefficients:
   (Intercept)       democracy  gdp_per_capita  
     2.367e+01       8.873e+00       6.238e-04  
```

---

## What's Going On Here?

Previously, we showed that the line of best fit between `cpi_score` and `democracy` had this slope:

```r
lm1 <- lm(cpi_score ~ democracy, data = d)

coef(lm1)
```

```
(Intercept)   democracy 
     34.000      13.875 
```

(These estimates differ a bit from the earlier slide because we just dropped the rows with missing values.)

--

Now the slope is about a third smaller...

```r
lm2 <- lm(cpi_score ~ democracy + gdp_per_capita, data = d)

coef(lm2)
```

```
   (Intercept)      democracy gdp_per_capita 
  23.672945358    8.873370295    0.000623775 
```

...suggesting that the observed relationship between democracy and corruption was partially confounded by national income!

---

class: center, middle

# Looking Forward

---

## Limitations

There are some important limitations to the approach we just took. In order for your causal inference to be credible, you need to believe two major things:

--

1. The relationship between your predictors and your outcome really is *linear* (or close to it).
2. You've identified *all* of the significant back door paths and closed them off.

--

<br>

If you're worried about number 1, there are some diagnostic tests you can run (more on that in POLS 7014) or you could try something like **matching** methods (Chapter 14 of *The Effect*).

--

<br>

If you're worried about number 2, then next week will be awesome for you!

---

## Next Week: Walking Through The Front Door

```r
dagify(X ~ Z + U,
       Y ~ X + U) |>
  ggdag_classic() +
  theme_dag()
```

<img src="09-regression_files/figure-html/unnamed-chunk-10-1.png" width="600" style="display: block; margin: auto;" />