class: center, middle, inverse, title-slide

.title[
# Week 9: Closing Back Door Paths
]
.subtitle[
## Linear Regression
]

---

## Road Map

--

### Weeks 1-6: Exploration

- Working with data
- Visualization
- Summary Statistics

--

### Weeks 7-10: Causality

- Linear model
- Causal diagrams
- **Closing back door paths**
- Finding front door paths

--

### Weeks 11+: Uncertainty

- Sampling distributions
- Confidence intervals
- Hypothesis testing

---

## Causal Inference

<br>

Broadly speaking, there are two ways you can make credible causal claims:

1. Close all the back door paths between your treatment and outcome.
2. Find a *front door path*.

--

This week, we tackle number 1. Next week, number 2.

---

class: center, middle

## Does democracy reduce political corruption?

???

This will be our motivating research question over the next few weeks.

---

## Let's Play With This Dataset

```r
d <- read_csv('data/week-09/corruption-data.csv')
```

If you'd like more practice with data wrangling, check out the script I used to put this dataset together, `R/week-09/cleanup-data.R`.

---

## Look At The Data

```r
ggplot(data = d, mapping = aes(x = democracy, y = cpi_score)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE)
```

<img src="09-regression_files/figure-html/unnamed-chunk-3-1.png" width="500px" style="display: block; margin: auto;" />

---

## Look At The Data

Notice a neat trick I employed here. If `democracy` is a 0-1 variable, then the slope of the linear model gives you the difference in means between the two groups.

<img src="09-regression_files/figure-html/unnamed-chunk-4-1.png" width="500px" style="display: block; margin: auto;" />

(Actually, you proved this in your calculus problem set. The value that minimizes the sum of squared errors is the mean.)

---

## Fit A Linear Model

The difference in corruption score between democracies and autocracies -- without conditioning on anything -- is:

```r
lm(cpi_score ~ democracy, data = d)
```

```
Call:
lm(formula = cpi_score ~ democracy, data = d)

Coefficients:
(Intercept)    democracy  
      32.09        15.61  
```

--

The average autocracy scores 32 on the Corruption Perceptions Index. The average democracy scores about 15.6 points higher.

--

Democracies are less corrupt!

---

class: center, middle

## But wait...

---

## A Back Door Path

```r
library(dagitty)
library(ggdag)

dag <- dagify(Corruption ~ Democracy + GDP,
              Democracy ~ GDP)

ggdag_classic(dag) +
  theme_dag()
```

<img src="09-regression_files/figure-html/unnamed-chunk-6-1.png" width="600" style="display: block; margin: auto;" />

---

## 3D View

[Here](https://joeornstein.github.io/pols-7012/week-09.html#d-plots)
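---

## Which Variables Close The Back Door?

You don't have to eyeball the DAG. Here's a sketch, reusing the `dag` object we just built: the `dagitty` package can list the adjustment sets that close every back door path between treatment and outcome.

```r
library(dagitty)

# which variables do we need to condition on to block
# every back door path from Democracy to Corruption?
adjustmentSets(dag, exposure = 'Democracy', outcome = 'Corruption')
```

In this DAG there's only one back door path -- through GDP -- so conditioning on GDP does the trick.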
---

## How Do You Condition On A Continuous Variable?

--

Every country has its own unique GDP per capita! If you tried to just isolate the countries with national income of, say, $2,770 per person, it would...just be Tanzania. Can't do a statistical comparison with a sample size of 1.

---

## How Do You Condition On A Continuous Variable?

You could try something like this...

```r
d |>
  mutate(income_group = case_when(gdp_per_capita < 10000 ~ 'Low Income',
                                  gdp_per_capita < 40000 ~ 'Middle Income',
                                  TRUE ~ 'High Income')) |>
  ggplot(mapping = aes(x = democracy, y = cpi_score)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  facet_wrap(~income_group)
```

---

## How Do You Condition On A Continuous Variable?

You could try something like this...

<img src="09-regression_files/figure-html/unnamed-chunk-8-1.png" width="700px" style="display: block; margin: auto;" />

--

But there's still a lot of variation in GDP within each of those subplots. We haven't completely conditioned on GDP -- so the back door path is still open!

---

## How Do You Condition On A Continuous Variable?

**Our Approach**: Add more variables to the linear model, drawing a "[plane of best fit](https://joeornstein.github.io/pols-7012/week-09.html#d-plots)".

---

class: center, middle

## What's Going On Under The Hood

---

## Multivariable Linear Regression

The "plane of best fit" is described by a linear model with multiple `\(X\)` variables on the right hand side.

--

`$$\text{corruption} = \alpha + \beta_1 \text{democracy} + \beta_2 \text{GDP} + \varepsilon$$`

This model says: "Corruption probably depends on both democracy **and** national income. Richer, democratic countries will tend to have less corruption." We'd like to estimate the slope of both relationships simultaneously!

--

**Vector Representation**:

`$$\underbrace{\begin{bmatrix} 19 \\ 36 \\ 36 \\ \vdots \\ 24 \end{bmatrix}}_\text{corruption} = \underbrace{\alpha \times \begin{bmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}}_\alpha + \underbrace{\beta_1 \times \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{\beta_1 \text{democracy}} + \underbrace{\beta_2 \times \begin{bmatrix} 2,156 \\ 14,648 \\ 12,019 \\ \vdots \\ 2,961 \end{bmatrix}}_{\beta_2 \text{GDP}} + \underbrace{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_n \end{bmatrix}}_\varepsilon$$`

---

## Multivariable Linear Regression

`$$\text{corruption} = \alpha + \beta_1 \text{democracy} + \beta_2 \text{GDP} + \varepsilon$$`

<img src="09-regression_files/figure-html/2d-1.png" width="600" style="display: block; margin: auto;" />

---

## Multivariable Linear Regression

The challenge is to simultaneously estimate `\(\alpha\)`, `\(\beta_1\)`, and `\(\beta_2\)`.

`$$\text{corruption} = \alpha + \beta_1 \text{democracy} + \beta_2 \text{GDP} + \varepsilon$$`

--

We've come as far as we can with scalar algebra. It's time you learned **matrix algebra**.

---

class: center, middle

# Matrix Algebra

---

## Matrix Algebra

A **matrix** is a bunch of vectors squished together.

--

.pull-left[
`\(\text{democracy} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}\)`
]

.pull-right[
`\(\text{GDP} = \begin{bmatrix} 2,156 \\ 14,648 \\ 12,019 \\ \vdots \\ 2,961 \end{bmatrix}\)`
]

--

<br>

`$$X = \begin{bmatrix} 0 & 2,156 \\ 1 & 14,648 \\ 0 & 12,019 \\ \vdots & \vdots \\ 0 & 2,961 \end{bmatrix}$$`

--

We've been calling this a **dataframe**.

---

## Matrix Algebra

The **dimension** of a matrix refers to the number of rows and columns.
An `\(m \times n\)` matrix has `\(m\)` rows and `\(n\)` columns.

--

```r
dim(d)
```

```
[1] 180   6
```

There are 180 rows and 6 columns in the corruption dataset.

---

## Matrix Algebra

**Adding** and **subtracting** matrices is straightforward. Just add and subtract elementwise.

.pull-left[
`\(A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 4 & 4 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 2 & 1 \\ 4 & 4 \\ 8 & 5 \end{bmatrix}\)`
]

--

`$$A + B = \begin{bmatrix} 3 & 3 \\ 6 & 7 \\ 12 & 9 \end{bmatrix}$$`

--

**Multiplying** and **dividing** is where it gets tricky.

- You can only multiply *some* matrices together (they must be **conformable**)
- And matrix division isn't really a thing. Instead, we multiply by the matrix's **inverse**.

---

class: center, middle

## Matrix Multiplication

---

## Matrix Multiplication

--

First, let's introduce the **dot product** of two vectors.

`$$a \cdot b = \sum a_i b_i$$`

--

If `\(a = [3,1,2]\)` and `\(b = [1,2,3]\)`, then the dot product of `\(a\)` and `\(b\)` equals:

`$$a \cdot b = 3 \times 1 + 1 \times 2 + 2 \times 3 = 11$$`

--

In `R`, a dot product can be computed like so:

```r
A <- c(3,1,2)
B <- c(1,2,3)

# dot product
sum(A*B)
```

```
[1] 11
```

---

## Matrix Multiplication

**Exercise:** Take the dot product of `\(a\)` and `\(b\)`.

`\(a = [1,4,5]\)` and `\(b = [3,2,1]\)`

--

**Answer:**

`$$a \cdot b = 1 \times 3 + 4 \times 2 + 5 \times 1 = 16$$`

```r
A <- c(1,4,5)
B <- c(3,2,1)

# dot product
sum(A*B)
```

```
[1] 16
```

---

## Matrix Multiplication

When you multiply two matrices, you take a series of dot products.

.pull-left[
`\(A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 2 & 1 \\ 4 & 4 \end{bmatrix}\)`
]

--

Each entry in `\(AB\)` is the dot product of a row in `\(A\)` and a column in `\(B\)`.

`$$AB = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 4 & 4 \end{bmatrix} = \begin{bmatrix} & \\ & \end{bmatrix}$$`

---

## Matrix Multiplication

When you multiply two matrices, you take a series of dot products.

.pull-left[
`\(A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 2 & 1 \\ 4 & 4 \end{bmatrix}\)`
]

Each entry in `\(AB\)` is the dot product of a row in `\(A\)` and a column in `\(B\)`.

`$$AB = \begin{bmatrix} \color{red} 1 & \color{red} 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} \color{red} 2 & 1 \\ \color{red} 4 & 4 \end{bmatrix} = \begin{bmatrix} \color{red} {10} & \\ & \end{bmatrix}$$`

---

## Matrix Multiplication

When you multiply two matrices, you take a series of dot products.

.pull-left[
`\(A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 2 & 1 \\ 4 & 4 \end{bmatrix}\)`
]

Each entry in `\(AB\)` is the dot product of a row in `\(A\)` and a column in `\(B\)`.

`$$AB = \begin{bmatrix} \color{red} 1 & \color{red} 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 2 & \color{red} 1 \\ 4 & \color{red} 4 \end{bmatrix} = \begin{bmatrix} 10 & \color{red} 9 \\ & \end{bmatrix}$$`
---

## Matrix Multiplication

When you multiply two matrices, you take a series of dot products.

.pull-left[
`\(A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 2 & 1 \\ 4 & 4 \end{bmatrix}\)`
]

Each entry in `\(AB\)` is the dot product of a row in `\(A\)` and a column in `\(B\)`.

`$$AB = \begin{bmatrix} 1 & 2 \\ \color{red} 2 & \color{red} 3 \end{bmatrix} \begin{bmatrix} \color{red} 2 & 1 \\ \color{red} 4 & 4 \end{bmatrix} = \begin{bmatrix} 10 & 9 \\ \color{red} {16} & \end{bmatrix}$$`

---

## Matrix Multiplication

When you multiply two matrices, you take a series of dot products.

.pull-left[
`\(A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 2 & 1 \\ 4 & 4 \end{bmatrix}\)`
]

Each entry in `\(AB\)` is the dot product of a row in `\(A\)` and a column in `\(B\)`.

`$$AB = \begin{bmatrix} 1 & 2 \\ \color{red} 2 & \color{red} 3 \end{bmatrix} \begin{bmatrix} 2 & \color{red} 1 \\ 4 & \color{red} 4 \end{bmatrix} = \begin{bmatrix} 10 & 9 \\ 16 & \color{red}{14} \end{bmatrix}$$`

--

<br>

This is all very strange and confusing if you've never seen it before, but we'll soon see that it makes representing our multivariable linear regression problem a whole lot easier.

???

To get the entry in the first row, first column of `\(AB\)`, take the dot product of:

- The first row of `\(A\)`, and
- The first column of `\(B\)`

Then do that for every row in `\(A\)` and every column in `\(B\)`.

`$$AB = \begin{bmatrix} 1 \times 2 + 2 \times 4 & 1 \times 1 + 2 \times 4 \\ 2 \times 2 + 3 \times 4 & 2 \times 1 + 3 \times 4 \end{bmatrix} = \begin{bmatrix} 10 & 9 \\ 16 & 14 \end{bmatrix}$$`

---

## Matrix Multiplication

You can multiply matrices in `R` with the `%*%` operator.

```r
A <- cbind(c(1,2), c(2,3))
A
```

```
     [,1] [,2]
[1,]    1    2
[2,]    2    3
```

```r
B <- cbind(c(2,4), c(1,4))
B
```

```
     [,1] [,2]
[1,]    2    1
[2,]    4    4
```

--

```r
A %*% B
```

```
     [,1] [,2]
[1,]   10    9
[2,]   16   14
```

---

## Matrix Multiplication

**Exercise:** Try multiplying these two matrices.

.pull-left[
`\(A = \begin{bmatrix} 4 & 1 \\ 1 & 2 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 5 & 5 \\ 2 & 1 \end{bmatrix}\)`
]

--

**Answer:**

`$$AB = \begin{bmatrix} 4 \times 5 + 1 \times 2 & 4 \times 5 + 1 \times 1 \\ 1 \times 5 + 2 \times 2 & 1 \times 5 + 2 \times 1 \end{bmatrix} = \begin{bmatrix} 22 & 21 \\ 9 & 7 \end{bmatrix}$$`

```r
A <- cbind(c(4,1), c(1,2))
B <- cbind(c(5,2), c(5,1))

A %*% B
```

```
     [,1] [,2]
[1,]   22   21
[2,]    9    7
```

---

## Matrix Multiplication

This process -- taking the dot product of rows and columns -- means that you can only multiply two matrices `\(AB\)` if the row vectors of `\(A\)` are the same length as the column vectors in `\(B\)`.

--

.pull-left[
`\(A = \begin{bmatrix} 1 & 2 \\ 4 & 3 \\ 1 & 8 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 7 & 2 \\ 4 & 3 \\ 1 & 2 \end{bmatrix}\)`
]

These matrices are not **conformable**! You can't take the dot product of the rows and columns.

--

### Formally

You can only multiply `\(AB\)` if the dimension of `\(A\)` is `\(m \times k\)` and the dimension of `\(B\)` is `\(k \times n\)`. The result is an `\(m \times n\)` matrix.

???

- In other words, you can only multiply `\(AB\)` if the dimension of `\(A\)` is `\(m \times k\)` and the dimension of `\(B\)` is `\(k \times n\)`.
- If this condition holds, then the two matrices are **conformable**.

---

## Matrix Multiplication

**Exercise**: Which can you multiply: `\(AB\)` or `\(BA\)`?

.pull-left[
`\(A = \begin{bmatrix} 3 & 2 \\ 1 & 2 \\ 2 & 2 \end{bmatrix}\)`
]

.pull-right[
`\(B = \begin{bmatrix} 4 & 5 & 1 \\ 5 & 4 & 1 \end{bmatrix}\)`
]

.pull-left[
```r
A <- cbind(c(3,1,2), c(2,2,2))
A
```

```
     [,1] [,2]
[1,]    3    2
[2,]    1    2
[3,]    2    2
```
]

.pull-right[
```r
B <- cbind(c(4,5), c(5,4), c(1,1))
B
```

```
     [,1] [,2] [,3]
[1,]    4    5    1
[2,]    5    4    1
```
]

---

## Matrix Multiplication

**Answer:** Both!

```r
A %*% B
```

```
     [,1] [,2] [,3]
[1,]   22   23    5
[2,]   14   13    3
[3,]   18   18    4
```

```r
B %*% A
```

```
     [,1] [,2]
[1,]   19   20
[2,]   21   20
```

--

But now try `A %*% A`. I can't do it here because `R` gets so mad it won't even render my slides.

---

## Matrix Multiplication

To make matrices conformable for multiplication, sometimes you may need to take the **transpose** of a matrix. The transpose just takes the rows and turns them into columns.

.pull-left[
`$$A = \begin{bmatrix} 4 & 1 \\ 1 & 2 \\ 3 & 3 \end{bmatrix}$$`
]

.pull-right[
`$$A' = \begin{bmatrix} 4 & 1 & 3 \\ 1 & 2 & 3 \end{bmatrix}$$`
]

```r
# define A to match the matrix above
A <- cbind(c(4,1,3), c(1,2,3))

t(A)
```

```
     [,1] [,2] [,3]
[1,]    4    1    3
[2,]    1    2    3
```

```r
t(A) %*% A
```

```
     [,1] [,2]
[1,]   26   15
[2,]   15   14
```

---

## Matrix Multiplication

Multiplying a vector by its transpose is the same as taking the dot product with itself:

`$$a = \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}$$`

`$$a \cdot a = a'a = 1 \times 1 + 3 \times 3 + 4 \times 4 = 26$$`

--

Hey, it's the **sum of squares**! That could be useful for something...

--

```r
a <- c(1,3,4)

sum(a*a)
```

```
[1] 26
```

```r
t(a) %*% a
```

```
     [,1]
[1,]   26
```

---

class: center, middle

## Matrix Inversion

---

## Matrix Inversion

Before I can teach you how to **divide** matrices, I need to tell you about a very special matrix, called the **identity matrix**.

--

Remember how any number times 1 just equals the original number?

`$$a \times 1 = a$$`

This is called the **identity property**. It's what makes `\(1\)` a very special number.

--

<br>

The **identity matrix** `\((I)\)` is basically the `\(1\)` of matrices.

`$$AI = A$$`

You multiply any matrix by `\(I\)` and you get the same matrix back.

---

## Matrix Inversion

The identity matrix `\(I_n\)` is an `\(n \times n\)` matrix with ones on the diagonal and zeroes everywhere else.

`$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$`

--

**Exercise:** Try multiplying this matrix with the following matrix `\(A\)`.

`$$A = \begin{bmatrix} 2 & 1 & 5 \\ -2 & 8 & 100 \\ 7 & 42 & -2 \end{bmatrix}$$`

---

## Matrix Inversion

**Answer:**

```r
diag(3) # Create the identity matrix in R with the `diag()` function
```

```
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
```

```r
A <- rbind(c(2, 1, 5),
           c(-2, 8, 100),
           c(7, 42, -2))

# multiply AI
A %*% diag(3)
```

```
     [,1] [,2] [,3]
[1,]    2    1    5
[2,]   -2    8  100
[3,]    7   42   -2
```

---

## Matrix Inversion

Hey, remember how dividing `\(\frac{a}{b}\)` is the same as multiplying `\(a \times \frac{1}{b}\)`?

--

- `\(\frac{1}{b} = b^{-1}\)` is called the **inverse** (or reciprocal) of `\(b\)`.

--

<br>

**Exercise:** What do you get when you multiply a number by its inverse?

--

- **Answer:** `\(a \times \frac{1}{a} = a^1a^{-1} = a^0 = 1\)`

--

<br>

There is an equivalent concept in matrix algebra, called the **matrix inverse**.

`$$AA^{-1} = I$$`

---

## Matrix Inversion

Good news! It's the 21st century. No one is going to make you solve for matrix inverses by hand.
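(Worth seeing once, though: the `\(2 \times 2\)` case has a tidy closed form,

`$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$`

provided `\(ad - bc \neq 0\)`. Keep that condition in the denominator in mind -- it's about to matter.)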
--

There is literally a function in R called `solve()` which will do it for you.

```r
solve(A)
```

```
            [,1]         [,2]         [,3]
[1,]  0.49976292 -0.025130394 -0.007112376
[2,] -0.08250356  0.004623044  0.024893314
[3,]  0.01659554  0.009127549 -0.002133713
```

```r
A %*% solve(A) |> round()
```

```
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
```

---

## Matrix Inversion

Just like there are some matrices you can't multiply together (**non-conformable** matrices), there are some matrices you can't invert (**non-invertible** or **singular** matrices).

--

`$$A = \begin{bmatrix} 1 & 2 & 5 \\ 2 & 4 & 100 \\ 3 & 6 & -42 \end{bmatrix}$$`

Try to `solve()` this matrix in `R`. Again, I can't show it to you here because `R` yells at me...

```r
A <- rbind(c(1,2,5),
           c(2,4,100),
           c(3,6,-42))

# solve(A)
```

--

Notice that the second column is two times the first column.

---

## Matrix Inversion

This matrix is also singular. Notice that the sum of columns 2 and 3 is a multiple of column 1. (In matrix algebra-speak, the columns are not **linearly independent**.)

`$$B = \begin{bmatrix} 1 & 1 & 2 \\ 2 & 4 & 2 \\ 3 & 2 & 7 \end{bmatrix}$$`

```r
B <- rbind(c(1,1,2),
           c(2,4,2),
           c(3,2,7))

# solve(B)
```

???

I bring this up, dear reader, because there will come a day when you will want to invert a matrix and you will not be able to, because your columns are perfectly correlated with each other! (It's called multicollinearity, and I'll let Mollie tell you all about it.)

---

## Matrix Inversion

Now that we know how to multiply by an inverse, we have what we need to perform matrix algebra.

**Exercise:** Solve this equation for `\(A\)`.

`$$AB = C$$`

--

**Answer:** Multiply both sides by `\(B^{-1}\)`

`$$ABB^{-1} = CB^{-1}$$`

`$$AI = CB^{-1}$$`

<br>

`$$A = CB^{-1}$$`

---

## Matrix Inversion

Watch out for conformability! With matrices, it matters whether you multiply on the right or the left.

**Exercise:** Solve for `\(B\)`.

`$$AB = C$$`

--

**Answer:**

`$$A^{-1}AB = A^{-1}C$$`

`$$IB = A^{-1}C$$`

<br>

`$$B = A^{-1}C$$`

--

`$$B \neq CA^{-1}$$`

---

class: center, middle

## Back to Multivariable Linear Regression

---

## Multivariable Linear Regression

This is the regression problem we wanted to solve:

`$$\underbrace{\begin{bmatrix} 19 \\ 36 \\ 36 \\ \vdots \\ 24 \end{bmatrix}}_\text{corruption} = \underbrace{\alpha \times \begin{bmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}}_\alpha + \underbrace{\beta_1 \times \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{\beta_1 \text{democracy}} + \underbrace{\beta_2 \times \begin{bmatrix} 2,156 \\ 14,648 \\ 12,019 \\ \vdots \\ 2,961 \end{bmatrix}}_{\beta_2 \text{GDP}} + \underbrace{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_n \end{bmatrix}}_\varepsilon$$`

--

Notice that we can restate it as a matrix multiplication problem:

`$$\underbrace{\begin{bmatrix} 19 \\ 36 \\ 36 \\ \vdots \\ 24 \end{bmatrix}}_\text{corruption} = \underbrace{\begin{bmatrix} 1 & 0 & 2,156 \\ 1 & 1 & 14,648 \\ 1 & 0 & 12,019 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 2,961 \end{bmatrix}}_X \underbrace{\begin{bmatrix} \alpha \\ \beta_1 \\ \beta_2 \end{bmatrix}}_\beta + \underbrace{\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_n \end{bmatrix}}_\varepsilon = X\beta + \varepsilon$$`

???

How do you simultaneously estimate `\(\alpha\)`, `\(\beta_1\)`, and `\(\beta_2\)`?

---

## Multivariable Linear Regression

`\(X\beta\)` is an `\(n \times 1\)` vector of predicted values, and `\(\varepsilon\)` is an `\(n \times 1\)` vector of errors.
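(A quick conformability check: `\(X\)` is `\(n \times 3\)` and `\(\beta\)` is `\(3 \times 1\)`, so `\(X\beta\)` is `\((n \times 3)(3 \times 1) = n \times 1\)` -- exactly the shape of the outcome vector.)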
`$$Y = X\beta + \varepsilon$$`

--

Just like before, we want to minimize the sum of squared errors:

`$$\varepsilon \cdot \varepsilon = \varepsilon'\varepsilon = (Y - X\beta)'(Y-X\beta)$$`

--

Minimizing this expression follows the same three steps we used with scalar calculus. Just be careful with matrix multiplication and division. Start by expanding the expression:

`$$f(X,Y,\beta) = (Y - X\beta)'(Y-X\beta) = Y'Y - 2(X\beta)'Y + (X\beta)'X\beta$$`

---

## Estimating The Regression Parameters

**Step 1: Take the derivative with respect to `\(\beta\)`**

`$$f(X,Y,\beta) = Y'Y - 2(X\beta)'Y + (X\beta)'X\beta$$`

`$$\frac{\partial f}{\partial \beta} = -2X'Y + 2 X'X \beta$$`

--

**Step 2: Set the derivative equal to zero**

`$$-2X'Y + 2 X'X \beta = 0$$`

--

**Step 3: Solve for `\(\beta\)`**

`$$2 X'X \beta = 2 X'Y$$`

`$$\hat{\beta} = (X'X)^{-1} X'Y$$`

---

## The Ordinary Least Squares (OLS) Estimator

<br> <br> <br>

### `$$\hat{\beta} = (X'X)^{-1} X'Y$$`

---

## Now We Can Estimate...

```r
# drop the missing values
d <- filter(d, complete.cases(d))

# create the Y vector
Y <- d$cpi_score

# create the X matrix
X <- d |>
  select(democracy, gdp_per_capita) |>
  mutate(intercept = 1) |>
  as.matrix()

head(X)
```

```
     democracy gdp_per_capita intercept
[1,]         1       62134.03         1
[2,]         1       44814.31         1
[3,]         1       53159.14         1
[4,]         0      101649.07         1
[5,]         1       56668.32         1
[6,]         1       72372.24         1
```

---

## Now We Can Estimate...

The vector of estimates that minimizes the sum of squared errors equals `\((X'X)^{-1}(X'Y)\)`:

```r
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% Y

beta_hat
```

```
                       [,1]
democracy       8.873370295
gdp_per_capita  0.000623775
intercept      23.672945358
```

--

```r
lm(cpi_score ~ democracy + gdp_per_capita, data = d)
```

```
Call:
lm(formula = cpi_score ~ democracy + gdp_per_capita, data = d)

Coefficients:
   (Intercept)       democracy  gdp_per_capita  
     2.367e+01       8.873e+00       6.238e-04  
```

---

## What's Going On Here?

Previously, we showed that the line of best fit between `cpi_score` and `democracy` had this slope:

```r
lm1 <- lm(cpi_score ~ democracy, data = d)

coef(lm1)
```

```
(Intercept)   democracy 
     34.000      13.875 
```

(These estimates differ a bit from the earlier slide because we just dropped the rows with missing values.)

--

Now the slope is about a third smaller...

```r
lm2 <- lm(cpi_score ~ democracy + gdp_per_capita, data = d)

coef(lm2)
```

```
   (Intercept)      democracy gdp_per_capita 
  23.672945358    8.873370295    0.000623775 
```

...suggesting that the observed relationship between democracy and corruption was partially confounded by national income!

---

class: center, middle

# Looking Forward

---

## Limitations

There are some important limitations to the approach we just took. In order for your causal inference to be credible, you need to believe two major things:

--

1. The relationship between your predictors and your outcome really is *linear* (or close to it).
2. You've identified *all* of the significant back door paths and closed them off.

--

<br>

If you're worried about number 1, there are some diagnostic tests you can run (more on that in POLS 7014) or you could try something like **matching** methods (Chapter 14 of *The Effect*).

--

<br>

If you're worried about number 2, then next week will be awesome for you!

---

## Next Week: Walking Through The Front Door

```r
dagify(X ~ Z + U,
       Y ~ X + U) |>
  ggdag_classic() +
  theme_dag()
```

<img src="09-regression_files/figure-html/unnamed-chunk-10-1.png" width="600" style="display: block; margin: auto;" />