Week 5: The Linear Model
This week we introduce the linear model, the workhorse of empirical social science. By the end of this week, you will be able to:
- Fit a linear model in R, interpret the results, and use the fitted model to make predictions about unobserved data points.
- Determine when and how to compute logarithmic transformations of continuous variables.
- Use differential calculus to optimize functions, and explain the link between optimization and the least-squares estimator.
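As a rough preview of the first two objectives, here is a minimal sketch using R's built-in `cars` dataset (stopping distance versus speed). The dataset and the prediction points are stand-ins chosen for illustration. They are not part of this week's assignment.

```r
# Built-in example data: stopping distance (dist) and speed for 50 cars.
data(cars)

# Fit a linear model of stopping distance on speed.
fit <- lm(dist ~ speed, data = cars)

# Estimated intercept and slope. The slope is the expected change in
# dist associated with a one-unit increase in speed.
coef(fit)

# Predict stopping distance at speeds chosen purely for illustration.
new_points <- data.frame(speed = c(12.5, 30))
predictions <- predict(fit, newdata = new_points)
predictions

# Log-transform both variables (all values are strictly positive here)
# and refit. The slope now describes approximate percent changes in dist
# per one percent change in speed.
fit_log <- lm(log(dist) ~ log(speed), data = cars)
coef(fit_log)
```

Note that the second prediction point lies outside the observed range of `speed`, so it is an extrapolation and should be read with more caution than the first.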
Reading
- DAFSS Chapter 4
Problem Set
For this problem set, you’ll be analyzing the florida.csv dataset, which reports the total number of votes received by US presidential candidates in each Florida county during the 1996 and 2000 presidential elections.[1] Please submit your responses as a knitted R script or Quarto document.
- Who received the most votes in Florida in the 2000 presidential election, and what was the size of their margin over the second-place candidate?
- How many votes did the Reform Party (third-party) candidates receive in Florida in the 1996 and 2000 elections? In which county did each candidate receive the most votes?
- Fit a linear model predicting the number of Reform Party votes we would expect in each county in the year 2000, as a function of the number of Reform Party votes cast in the year 1996. Report and interpret the estimated slope.
- Report and interpret the \(R^2\) value from the fitted linear model.
- Plot the raw data and the fitted linear model. Does anything stand out in the chart?
- Remove the county with the largest prediction error, and fit a new linear model with that county omitted. What is the \(R^2\) value of this new model?
- How many votes would you have predicted the Reform candidate to receive in that county in 2000, based on the second linear model you fit?
- Bonus. What’s going on here? If you’re not familiar with the context of this election, you may need to do a little research. Write up your conclusions in a brief report (a few paragraphs), and include it with the problem set rendered as a Quarto PDF.
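The fit, inspect, and refit workflow that several of these questions follow can be sketched on simulated data. Everything below is hypothetical: the data frame `sim`, its columns `x` and `y`, and the planted outlier are invented for illustration, and none of it reflects the contents of florida.csv.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated stand-in data: 30 hypothetical units with one planted outlier.
n <- 30
sim <- data.frame(x = runif(n, min = 100, max = 5000))
sim$y <- 50 + 0.3 * sim$x + rnorm(n, sd = 60)
sim$y[10] <- sim$y[10] + 3000  # plant an outlier in row 10

# Fit to all rows and record R^2.
fit_all <- lm(y ~ x, data = sim)
r2_all <- summary(fit_all)$r.squared

# Find the row with the largest absolute residual (prediction error).
worst <- which.max(abs(residuals(fit_all)))

# Refit without that row and record the new R^2.
fit_trim <- lm(y ~ x, data = sim[-worst, ])
r2_trim <- summary(fit_trim)$r.squared

# Predict the omitted row from the model that never saw it.
pred_worst <- predict(fit_trim, newdata = sim[worst, , drop = FALSE])

c(r2_all = r2_all, r2_trim = r2_trim)
c(observed = sim$y[worst], predicted = unname(pred_worst))

# Raw data with both fitted lines (dashed: all rows, solid: outlier removed).
plot(sim$x, sim$y, xlab = "x", ylab = "y")
abline(fit_all, lty = 2)
abline(fit_trim)
```

Comparing the observed value with the prediction from the trimmed model gives a sense of how far the omitted point sits from the pattern in the rest of the data.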
Class Notes
In class, we will discuss differential calculus and how it can be applied to the problem of fitting a linear model to data.
Access the slides here.
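As a sketch of how calculus connects to least squares, consider a model with a single predictor. The notation here (\(\alpha\) for the intercept, \(\beta\) for the slope, \(S\) for the sum of squared errors) is an assumption made for illustration and may differ from the notation used in the slides or in DAFSS. The least-squares estimator chooses the intercept and slope that minimize the sum of squared prediction errors:

\[
S(\alpha, \beta) = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2
\]

Setting both partial derivatives equal to zero gives the first-order conditions:

\[
\begin{aligned}
\frac{\partial S}{\partial \alpha} &= -2 \sum_{i=1}^{n} (y_i - \alpha - \beta x_i) = 0 \\
\frac{\partial S}{\partial \beta} &= -2 \sum_{i=1}^{n} x_i (y_i - \alpha - \beta x_i) = 0
\end{aligned}
\]

Solving these two equations yields the familiar estimates, where \(\bar{x}\) and \(\bar{y}\) denote sample means:

\[
\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
\]

Because \(S\) is a sum of squares and therefore convex in \((\alpha, \beta)\), this stationary point is a minimum, provided the \(x_i\) are not all identical.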
Additional Resources
For more on the linear model:
Huntington-Klein (2021), chapter 4 (free version available here)
Lindeløv (2019) explains how nearly every statistical test is just the linear model in disguise.
For more calculus resources:
Horst (2020) explains derivatives with adorable illustrations
The Essence of Calculus (3Blue1Brown)
Strogatz (2012), Chapters 17 and 18 (ebook available through the UGA library here)
Moore and Siegel (2013), Chapter 5 (ebook available through the UGA library here)
For more on the 2000 presidential election in Florida:
- Wand et al. (2001)
References
Footnotes
[1] Results reported before the manual recount in the 2000 election.