Week 3: Experiments
This week we introduce the Fundamental Problem of Causal Inference, and discuss why randomized experiments are generally considered the gold standard approach to that problem. By the end of this week you will be able to:
Describe the potential outcomes framework
Compute a difference-in-means using
R
Write scripts that perform other essential data analysis tasks, like subsetting and creating new variables.
Reading
- DAFSS Chapter 2
Problem Set
Please submit the responses to the following exercises as a PDF knitted from an R
script or Quarto document.
- Load the voting.csv dataset from our previous problem set. What percent of registered voters who received the social pressure message turned out to vote in the 2006 election? What percent of those who did not receive the social pressure message turned out?
- Estimate the average treatment effect of the social pressure message on the probability of voting. Briefly explain why we can or cannot interpret this estimate as a causal effect.
- What was the average birth year of respondents who received the social pressure message? What about those who did not receive the social pressure message? Explain this result.
- Create a new variable called
young
, equal to 1 if the respondent was less than 25 years old in 2006 and 0 otherwise. - Was the average effect of the social pressure message larger or smaller for young respondents than for the rest of the sample? Interpret this result.
- Bonus. Estimate the average effect of the social pressure message for each birth decade cohort (e.g. everyone born in the 1960s, 1970s, 1980s, etc.). Do any of your estimates stand out as odd? What is the most sensible way to interpret this result?
Additional Resources
- Pearl and Mackenzie (2019)
References
Pearl, Judea, and Dana Mackenzie. 2019. The book of why: the new science of cause and effect. Penguin science. London: Penguin Books.