# POLS 7012
## Introduction to Political Methodology

---

### How do we learn about the world?

We tell each other stories!

.pull-left[
<img src="img/daniel-tiger-2.jfif" width="300" style="display: block; margin: auto;" />
]
--

.pull-right[
<img src="img/daniel-tiger-4.jpg" width="300" style="display: block; margin: auto;" />
]

---

### How do we learn about the world?

- We tell stories to communicate lessons about how the world works.

- Storytelling may be one of the few things shared by every culture.

- But there are limits...

- What lessons can we actually learn from the Daniel Tiger story?
  
--

<br>

There are three things we cannot learn about from individual stories:

1. Variation
  
    - "What happened to other children besides Daniel Tiger when they went to the hospital?"
  
--

2. Counterfactuals
  
    - "What if Daniel Tiger hadn't gone to the hospital?"
  
--

3. Generalizability 
  
    - "Was Daniel's story just a fluke?"

---

## Science is storytelling, but with rules.

- Instead of telling just one story, we tell *multiple* stories.

- This allows us to look at variation in outcomes, assess counterfactuals, and more confidently draw general conclusions.

- Inevitably, this means abstracting away details.

- *Quantitative Research*: Lots of stories, less detail

- *Qualitative Research*: Fewer stories, more detail
  
--

- In POLS 7012, we learn the tools of quantitative research for telling lots of stories simultaneously...

- ...in a way that yields insights about variation, causality, and uncertainty.

---

# Part 1: Getting Your Stories Straight

## (Measurement and Description)

---

## Part 1: Getting Your Stories Straight

- When we analyze data, we're telling many different stories at a high level of abstraction.

| Name | Species | Age | Hospitalized | Pajama Choice |
|-------|-------|-----|-----|-----|
| Daniel     | Tiger     | 4   | Yes   | Ducks |
| Katerina     | Kitty Cat     | 4   | No   | Ballerinas  | 
| Ms. Elaina     | Human     | 5   | Yes   | Ducks |
| O     | Owl     | 4  | Yes   | Books |
| Jodi     | Platypus     | 4   | No | Books   |

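This is a **dataset**: each row is one story (an *observation*), and each column records one detail about it (a *variable*).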

- In Part 1 of the course, we'll learn how to collect, tidy, and describe datasets.
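As a minimal sketch, here is how the Daniel Tiger table above could be stored in R as a data frame, with one row per character and one column per variable. The object and column names are our own (hypothetical) choices, and the sketch uses base R so it runs without extra packages.

```r
# Store the Daniel Tiger table as a data frame:
# one row per character (story), one column per variable.
daniel_tiger <- data.frame(
  name         = c("Daniel", "Katerina", "Ms. Elaina", "O", "Jodi"),
  species      = c("Tiger", "Kitty Cat", "Human", "Owl", "Platypus"),
  age          = c(4, 4, 5, 4, 4),
  hospitalized = c("Yes", "No", "Yes", "Yes", "No"),
  pajamas      = c("Ducks", "Ballerinas", "Ducks", "Books", "Books")
)

# A descriptive statistic: the share of characters who were hospitalized.
share_hospitalized <- mean(daniel_tiger$hospitalized == "Yes")
share_hospitalized  # 0.6 (3 of 5)

# Another one: the average age.
mean(daniel_tiger$age)  # 4.2
```

Taking the `mean()` of a TRUE/FALSE comparison is a common R idiom for computing a proportion, because TRUE counts as 1 and FALSE as 0.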

---

## Part 1: Getting Your Stories Straight

```r
dataset
```

```
# A tibble: 100 x 3
   temperature hospitalized died 
         <dbl> <chr>        <chr>
 1       103.  Yes          No   
 2        98.9 No           No   
 3       101.  No           Yes  
 4       101.  Yes          Yes  
 5       101.  Yes          No   
 6        99.8 No           No   
 7       103.  Yes          Yes  
 8        99.8 No           No   
 9       104.  Yes          Yes  
10        99.9 No           No   
# ... with 90 more rows
```

It's difficult to make sense of 100 stories all at once.

- We need to compute *statistics*, numbers that communicate some feature of the dataset.

---

## Part 1: Getting Your Stories Straight

```r
library(dplyr)

dataset |> 
  group_by(hospitalized) |>                        # split into two groups
  summarize(count = n(),                           # people per group
            number_died = sum(died == 'Yes'),      # deaths per group
            pct_dead = number_died / count * 100)  # death rate (%)
```

```
# A tibble: 2 x 4
  hospitalized count number_died pct_dead
  <chr>        <int>       <int>    <dbl>
1 No              52          12     23.1
2 Yes             48          16     33.3
```

- Each of these is a *descriptive statistic*: a number that tells us something about the dataset.

- For example, people in this dataset are about 10 percentage points more likely to die (33.3% vs. 23.1%) if they go to the hospital.

- Wait. What?

---

# Part 2: What If?

## (Counterfactuals and Causality)

---

## Part 2: Counterfactuals and Causality

```
# A tibble: 2 x 4
  hospitalized count number_died pct_dead
  <chr>        <int>       <int>    <dbl>
1 No              52          12     23.1
2 Yes             48          16     33.3
```

The death rate in the hospitalized group was about 10 percentage points higher than in the non-hospitalized group (33.3% vs. 23.1%).
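The gap in the table above is a difference in *percentage points* (33.3 minus 23.1), which is not the same thing as a 10% relative increase. A quick sketch of both calculations, using the counts from the table:

```r
# Deaths and group sizes from the summary table
died_yes <- 16; n_yes <- 48   # hospitalized
died_no  <- 12; n_no  <- 52   # not hospitalized

pct_yes <- died_yes / n_yes * 100   # about 33.3
pct_no  <- died_no  / n_no  * 100   # about 23.1

# Absolute difference, in percentage points
pct_yes - pct_no                    # about 10.3

# Relative difference: the hospitalized death rate is 13/9 (about 1.44) times as high
pct_yes / pct_no
```

In relative terms, the hospitalized group's death rate is roughly 44% higher, so it matters which kind of "more" we report.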

- But what conclusions can we draw from that?

- In Part 2 of the course, we think carefully about when patterns in a dataset imply a causal relationship and when they don't.

---

## Part 2: Counterfactuals and Causality

- You've likely guessed the problem.

- People who go to the hospital tend to be sicker than people who don't go to the hospital.

```r
dataset |> 
  group_by(hospitalized) |> 
  summarize(avg_temp = mean(temperature))
```

```
# A tibble: 2 x 2
  hospitalized avg_temp
  <chr>           <dbl>
1 No               98.6
2 Yes             102. 
```

- Comparing hospitalized vs. non-hospitalized people may be interesting, but it doesn't reveal the effect of hospitals on mortality.

- We need an "all else equal" comparison.
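To see how this can happen, here is a small simulation with made-up numbers (not the course dataset). We assume, hypothetically, that being sick makes people both more likely to go to the hospital and more likely to die, while the hospital itself *lowers* the chance of dying by 20 percentage points. The naive comparison still makes hospitals look harmful; comparing people who are equally sick recovers the assumed benefit.

```r
set.seed(7012)
n <- 10000

# Hypothetical data-generating process (all numbers are assumptions):
sick         <- rbinom(n, 1, 0.5)                          # half the population is sick
hospitalized <- rbinom(n, 1, ifelse(sick == 1, 0.9, 0.1))  # sick people usually go
p_die        <- 0.25 + 0.50 * sick - 0.20 * hospitalized   # hospital lowers risk by 0.20
died         <- rbinom(n, 1, p_die)

# Naive comparison: death rate by hospitalization status
naive <- tapply(died, hospitalized, mean)
naive["1"] - naive["0"]   # positive: hospitals look harmful

# "All else equal": compare within groups that are equally sick
by_sickness <- tapply(died, list(sick = sick, hospitalized = hospitalized), mean)
by_sickness[, "1"] - by_sickness[, "0"]   # negative in both groups, about -0.20
```

The sign flips because sickness (the *confounder*) drives both who goes to the hospital and who dies.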

---

## Part 2: Counterfactuals and Causality

In POLS 7012, we learn a few techniques for making the right comparisons.

### 1. "Controlling for" other variables

```r
dataset |> 
  filter(temperature > 100) |>   # keep only patients with a fever above 100
  group_by(hospitalized) |> 
  summarize(count = n(), 
            number_died = sum(died == 'Yes'), 
            pct_dead = number_died / count * 100)
```

```
# A tibble: 2 x 4
  hospitalized count number_died pct_dead
  <chr>        <int>       <int>    <dbl>
1 No              10           8     80  
2 Yes             44          15     34.1
```

---

## Part 2: Counterfactuals and Causality

In POLS 7012, we learn a few techniques for making the right comparisons.

### 2. Experiments

- Great, because randomly assigning treatment ensures that, on average, the two groups differ only in whether they were treated.
  
--

- But...probably unethical in this case: we can't randomly keep sick people out of the hospital.
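A minimal sketch of why random assignment produces comparable groups, assuming a made-up population of 100 patients with normally distributed temperatures (all numbers are hypothetical). We shuffle 50 "Yes" and 50 "No" labels, then contrast that with letting the sickest patients "choose" the hospital.

```r
set.seed(7012)

# Hypothetical patients: temperatures in degrees Fahrenheit
temperature <- rnorm(100, mean = 100, sd = 2)

# Random assignment: shuffle 50 "Yes" and 50 "No" labels
treated <- sample(rep(c("Yes", "No"), each = 50))

# Assignment ignores temperature, so the group averages should be close
tapply(temperature, treated, mean)

# Compare: patients above 101 degrees go to the hospital
chosen <- ifelse(temperature > 101, "Yes", "No")
tapply(temperature, chosen, mean)   # the "Yes" group is much hotter
```

Under random assignment, any remaining gap between the groups is due to chance alone, which is exactly the kind of uncertainty Part 3 deals with.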

---

## Part 2: Counterfactuals and Causality

In POLS 7012, we learn a few techniques for making the right comparisons.

### 3. Discontinuity Designs

---

# Part 3: How Many Stories Is Enough?

## (Uncertainty and Statistical Inference)

---

## Part 3: Uncertainty

- In the final few weeks of class, we discuss how to measure **uncertainty**.

- How certain can we be that the patterns we observe in our data aren't just a random fluke?

- Will the findings from our sample *generalize* to a larger population?

- To answer these questions, we need probability theory.

```
# A tibble: 2 x 4
  hospitalized count number_died pct_dead
  <chr>        <int>       <int>    <dbl>
1 No              10           8     80  
2 Yes             44          15     34.1
```

That looks like a big difference, but...maybe we just drew a weird sample?

---

## Part 3: Uncertainty

What's the chance that 8 out of 10 non-hospitalized patients would die if their true mortality rate were the same as the hospitalized group's (34.1%)?
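In the table above, 8 of the 10 non-hospitalized patients with a fever died. One way to start answering the question: treat each of those 10 patients as an independent coin flip that comes up "died" with probability 0.341 (an assumption for illustration), and ask how often we'd see that many deaths. R's built-in binomial functions `dbinom()` and `pbinom()` do the arithmetic.

```r
p <- 0.341   # hypothetical death rate, borrowed from the hospitalized group
n <- 10      # non-hospitalized patients with a fever

# Probability of exactly 8 deaths out of 10
dbinom(8, size = n, prob = p)

# Probability of 8 or more deaths: P(X >= 8) = 1 - P(X <= 7)
pbinom(7, size = n, prob = p, lower.tail = FALSE)
```

Both probabilities come out well under 1%, so a sample this lopsided would be surprising if the two groups really shared the same death rate.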

---

# In Summary

---

## In Summary

This semester, we'll learn to:

- work confidently with data

- organize our work in code so that it's transparent and reproducible

- design research to credibly identify causation (not just correlation)

- build basic statistical models to quantify the uncertainty of our conclusions

---

## Today

1. Download and install R and RStudio

2. Become familiar with some programming basics

3. Analyze our first dataset!