class: center, middle, inverse, title-slide

# POLS 7012
## Introduction to Political Methodology

---

### How do we learn about the world?

--

We tell each other stories!

--

<img src="img/daniel-tiger-1.png" width="500" style="display: block; margin: auto;" />

--

.pull-left[
<img src="img/daniel-tiger-2.jfif" width="300" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="img/daniel-tiger-4.jpg" width="300" style="display: block; margin: auto;" />
]

---

### How do we learn about the world?

--

- We tell stories to communicate lessons about how the world works.

--

- Storytelling may be one of the few things shared by every culture.

--

- But there are limits...

--

- What lessons can we actually learn from the Daniel Tiger story?

--

<br>
There are three things we cannot learn from individual stories:

--

1. Variation - "What happened to other children besides Daniel Tiger when they went to the hospital?"

--

2. Counterfactuals - "What if Daniel Tiger hadn't gone to the hospital?"

--

3. Generalizability - "Was Daniel's story just a fluke?"

---

## Science is storytelling, but with rules.

--

- Instead of telling just one story, we tell *multiple* stories.

--

- This allows us to look at variation in outcomes, assess counterfactuals, and draw general conclusions more confidently.

--

- Inevitably, this means abstracting away details.

--

- *Quantitative Research*: Lots of stories, less detail

--

- *Qualitative Research*: Fewer stories, more detail

--

- In POLS 7012 we learn the tools of quantitative research, to tell lots of stories simultaneously...

--

- ...in a way that yields insights about variation, causality, and uncertainty.

---

class: center, middle, inverse

# Part 1: Getting Your Stories Straight
## (Measurement and Description)

---

## Part 1: Getting Your Stories Straight

--

- When we analyze data, we're telling many different stories at a high level of abstraction.
--

| Name | Species | Age | Hospitalized | Pajama Choice |
|-------|-------|-----|-----|-----|
| Daniel | Tiger | 4 | Yes | Ducks |
| Katerina | Kitty Cat | 4 | No | Ballerinas |
| Ms. Elaina | Human | 5 | Yes | Ducks |
| O | Owl | 4 | Yes | Books |
| Jodi | Platypus | 4 | No | Books |

This is a **dataset**.

--

- In Part 1 of the course, we'll learn how to collect, tidy, and describe datasets.

---

## Part 1: Getting Your Stories Straight

```r
dataset
```

```
# A tibble: 100 x 3
   temperature hospitalized died
         <dbl> <chr>        <chr>
 1       103.  Yes          No
 2        98.9 No           No
 3       101.  No           Yes
 4       101.  Yes          Yes
 5       101.  Yes          No
 6        99.8 No           No
 7       103.  Yes          Yes
 8        99.8 No           No
 9       104.  Yes          Yes
10        99.9 No           No
# ... with 90 more rows
```

--

It's difficult to make sense of 100 stories all at once.

--

- We need to compute *statistics*, numbers that communicate some feature of the dataset.

---

## Part 1: Getting Your Stories Straight

```r
dataset |>
  group_by(hospitalized) |>
  summarize(count = n(),
            number_died = sum(died == 'Yes'),
            pct_dead = number_died / count * 100)
```

```
# A tibble: 2 x 4
  hospitalized count number_died pct_dead
  <chr>        <int>       <int>    <dbl>
1 No              52          12     23.1
2 Yes             48          16     33.3
```

--

- Each of these is a *descriptive statistic*: a number that tells us something about the dataset.

--

- For example, people in this dataset who went to the hospital were about 10 percentage points more likely to die (33.3% vs. 23.1%).

--

- Wait. What?

---

class: center, middle, inverse

# Part 2: What If?
## (Counterfactuals and Causality)

---

## Part 2: Counterfactuals and Causality

```
# A tibble: 2 x 4
  hospitalized count number_died pct_dead
  <chr>        <int>       <int>    <dbl>
1 No              52          12     23.1
2 Yes             48          16     33.3
```

The death rate in the hospitalized group was about 10 percentage points higher than in the non-hospitalized group.

--

- But what conclusions can we draw from that?

--

- In Part 2 of the course, we think carefully about when patterns in a dataset imply a causal relationship and when they don't.

---

## Part 2: Counterfactuals and Causality

- You've likely guessed the problem.
--

- People who go to the hospital tend to be sicker than people who don't.

--

```r
dataset |>
  group_by(hospitalized) |>
  summarize(avg_temp = mean(temperature))
```

```
# A tibble: 2 x 2
  hospitalized avg_temp
  <chr>           <dbl>
1 No               98.6
2 Yes             102.
```

--

- Comparing hospitalized vs. non-hospitalized people may be interesting, but it doesn't reveal the effect of hospitals on mortality.

--

- We need an "all else equal" comparison.

---

## Part 2: Counterfactuals and Causality

In POLS 7012, we learn a few techniques for making the right comparisons.

--

### 1. "Controlling" for other variables

--

```r
dataset |>
  filter(temperature > 100) |>
  group_by(hospitalized) |>
  summarize(count = n(),
            number_died = sum(died == 'Yes'),
            pct_dead = number_died / count * 100)
```

```
# A tibble: 2 x 4
  hospitalized count number_died pct_dead
  <chr>        <int>       <int>    <dbl>
1 No              10           8     80
2 Yes             44          15     34.1
```

---

## Part 2: Counterfactuals and Causality

In POLS 7012, we learn a few techniques for making the right comparisons.

### 2. Experiments

--

- Great, because randomly assigning treatment ensures that, on average, the two groups don't differ in anything except the treatment.

--

- But...probably unethical in this case.

---

## Part 2: Counterfactuals and Causality

In POLS 7012, we learn a few techniques for making the right comparisons.

### 3. Discontinuity Designs

<div class="figure" style="text-align: center">
<img src="img/card-2009.png" alt="Card et al. (2009)" width="500" />
<p class="caption">Card et al. (2009)</p>
</div>

---

class: center, middle, inverse

# Part 3: How Many Stories Is Enough?
## (Uncertainty and Statistical Inference)

---

## Part 3: Uncertainty

- In the final few weeks of class, we discuss how to measure **uncertainty**.

--

- How certain can we be that the patterns we observe in our data aren't just a random fluke?

--

- Will the findings from our sample *generalize* to a larger population?

--

- To answer these questions, we need probability theory.
--

```r
dataset |>
  filter(temperature > 100) |>
  group_by(hospitalized) |>
  summarize(count = n(),
            number_died = sum(died == 'Yes'),
            pct_dead = number_died / count * 100)
```

```
# A tibble: 2 x 4
  hospitalized count number_died pct_dead
  <chr>        <int>       <int>    <dbl>
1 No              10           8     80
2 Yes             44          15     34.1
```

--

That looks like a big difference, but...maybe we just drew a weird sample?

---

## Part 3: Uncertainty

What's the chance that 8 out of 10 people would die if their mortality rate were the same as the hospitalized group's (34.1%)?

--

<img src="01-stories_files/figure-html/unnamed-chunk-12-1.png" width="600" style="display: block; margin: auto;" />

---

class: center, middle, inverse

# In Summary

---

## In Summary

This semester, we'll learn to:

--

- work confidently with data

--

- organize our work in code so that it's transparent and reproducible

--

- design research to credibly identify causation (not just correlation)

--

- build basic statistical models to quantify the uncertainty of our conclusions

--

## Today

--

1. Download R and RStudio

--

2. Become familiar with some programming basics

--

3. Analyze our first dataset!
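---

## Part 3: Uncertainty

Here is the Part 3 question again: what's the chance that 8 out of 10 people would die at a 34.1% mortality rate? A few lines of R can answer it. This is a minimal sketch and not necessarily the code behind the figure. It assumes each of the 10 people dies independently with the same probability, 0.341, so the number of deaths follows a binomial distribution. `dbinom()` and `pbinom()` are base R functions.

```r
# Assumption: each of the 10 feverish, non-hospitalized people dies
# independently with the same probability as the feverish, hospitalized
# group (34.1%), so the number of deaths is Binomial(n = 10, p = 0.341).
n <- 10      # people in the non-hospitalized group with temperature > 100
p <- 0.341   # death rate in the hospitalized group with temperature > 100

# Probability of each possible number of deaths, 0 through 10
probs <- dbinom(0:n, size = n, prob = p)

# Probability of exactly 8 deaths
p_exactly_8 <- dbinom(8, size = n, prob = p)

# Probability of 8 or more deaths: P(X >= 8) = P(X > 7)
p_at_least_8 <- pbinom(7, size = n, prob = p, lower.tail = FALSE)

round(c(exactly_8 = p_exactly_8, at_least_8 = p_at_least_8), 4)

# Bar chart of the whole distribution of possible death counts
barplot(probs, names.arg = 0:n,
        xlab = "Number of deaths out of 10",
        ylab = "Probability")
```

Both probabilities come out well under 1%. If the true rate really were 34.1%, a sample this extreme would be very unlikely. That conclusion only holds under the independence and common-probability assumptions above.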