Week 5: The Linear Model

Author

Joe Ornstein

This week we introduce the linear model, the workhorse of empirical social science. By the end of this week, you will be able to:

Reading

  • DAFSS Chapter 4

Problem Set

For this problem set, you’ll be analyzing the florida.csv dataset, which reports the total number of votes received by US presidential candidates in each Florida county during the 1996 and 2000 presidential elections.¹ Please submit your responses as a knitted R script or Quarto document.

  1. Who received the most votes in Florida in the 2000 presidential election, and what was the size of their margin over the second-place candidate?
  2. How many votes did the third-party (Reform Party) candidates receive in Florida in the 1996 and 2000 elections? In which county did these candidates receive the most votes?
  3. Fit a linear model predicting the number of Reform Party votes we would expect in each county in the year 2000, as a function of the number of Reform Party votes cast in the year 1996. Report and interpret the estimated slope.
  4. Report and interpret the \(R^2\) value from the fitted linear model.
  5. Plot the raw data and the fitted linear model. Does anything stand out in the chart?
  6. Remove the county with the largest prediction error, and fit a new linear model with that county omitted. What is the \(R^2\) value of this new model?
  7. How many votes would you have predicted the Reform candidate to receive in that county in 2000, based on the second linear model you fit?
  8. Bonus. What’s going on here? If you’re not familiar with the context of this election, you may need to do a little research. Write up your conclusions in a brief report (a few paragraphs), and include it with the problem set rendered as a Quarto PDF.

Class Notes

In class, we will discuss differential calculus and how it can be applied to the problem of fitting a linear model to data.
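As a preview, here is a minimal sketch of how calculus yields the best-fitting line for a model with one predictor. The notation is an assumption made for illustration (intercept \(a\), slope \(b\), data points \((x_i, y_i)\) for \(i = 1, \dots, n\)); the slides may use different symbols. We choose \(a\) and \(b\) to minimize the sum of squared prediction errors:

\[
S(a, b) = \sum_{i=1}^{n} \left( y_i - a - b x_i \right)^2
\]

At the minimum, both partial derivatives equal zero:

\[
\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} \left( y_i - a - b x_i \right) = 0,
\qquad
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} x_i \left( y_i - a - b x_i \right) = 0
\]

Solving these two equations together, and assuming the \(x_i\) are not all identical, gives the estimates:

\[
\hat{a} = \bar{y} - \hat{b}\,\bar{x},
\qquad
\hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\]

Here \(\bar{x}\) and \(\bar{y}\) are the sample means. The first equation says the fitted line passes through the point \((\bar{x}, \bar{y})\); the second says the slope is the covariance of \(x\) and \(y\) divided by the variance of \(x\).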

Access the slides here.

Additional Resources

For more on the linear model:

  • Huntington-Klein (2021), chapter 4 (free version available here)

  • Lindeløv (2019) explains how nearly every statistical test is just the linear model in disguise.

For more calculus resources:

  • Horst (2020) explains derivatives with adorable illustrations

  • The Essence of Calculus (3Blue1Brown)

  • Strogatz (2012), Chapters 17 and 18 (ebook available through the UGA library here)

  • Moore and Siegel (2013), Chapter 5 (ebook available through the UGA library here)

For more on the 2000 presidential election in Florida:

  • Wand et al. (2001)

References

Horst, Allison. 2020. “Derivatives Series.” https://allisonhorst.com/derivatives-series.
Huntington-Klein, Nick. 2021. The Effect: An Introduction to Research Design and Causality. Chapman & Hall/CRC. https://theeffectbook.net/index.html.
Lindeløv, Jonas Kristoffer. 2019. “Common Statistical Tests Are Linear Models (or: How to Teach Stats).” https://lindeloev.github.io/tests-as-linear/.
Moore, Will H., and David A. Siegel. 2013. A Mathematics Course for Political and Social Research. Princeton, NJ: Princeton University Press.
Strogatz, Steven H. 2012. The Joy of x: A Guided Tour of Math, from One to Infinity. Boston: Houghton Mifflin Harcourt.
Wand, Jonathan N., Kenneth W. Shotts, Jasjeet S. Sekhon, Walter R. Mebane, Michael C. Herron, and Henry E. Brady. 2001. “The Butterfly Did It: The Aberrant Vote for Buchanan in Palm Beach County, Florida.” The American Political Science Review 95 (4): 793–810. https://www.jstor.org/stable/3117714.

Footnotes

  1. Results reported before the manual recount in the 2000 election.