Bell Curves

POLS 3220: How to Predict the Future

Today’s Agenda

Introduce the bell curve probability distribution
- AKA the normal distribution or the Gaussian distribution
Understand the conditions that create bell curves (the Central Limit Theorem)
Explore some useful features of bell curves for making predictions

In an upcoming Congressional election:
- 60% of voters plan to vote for the Republican candidate.
- 40% of voters plan to vote for the Democratic candidate.
A polling firm randomly calls voters in the Congressional district and asks who they plan to vote for.

Draw a probability tree representing the first two voters contacted by the polling firm. What is the probability of every possible poll result?

Because we don’t care about the order of responses (just the counts), we can combine some outcomes:
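As a rough sketch of that tree (the variable names here are illustrative, not part of the lecture), the four ordered branches can be enumerated and then merged by count:

```python
from collections import Counter
from itertools import product

# Vote shares from the example above.
P_VOTE = {"R": 0.6, "D": 0.4}

# Every ordered pair of responses is one branch of the probability tree.
ordered = {}
for first, second in product(P_VOTE, repeat=2):
    ordered[first + second] = P_VOTE[first] * P_VOTE[second]

# Combine branches that have the same number of R responses.
by_count = Counter()
for outcome, prob in ordered.items():
    by_count[outcome.count("R")] += prob

for r_count in sorted(by_count):
    print(f"{r_count} R, {2 - r_count} D: {by_count[r_count]:.2f}")
```

The RD and DR branches each have probability 0.24, which is why the combined "one of each" outcome is the most likely one.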

A poll with two responses is pretty worthless.
- 48% of the time you get an equal number of R’s and D’s
- 16% of the time, you get all D’s
- 36% of the time, you get all R’s
Two big problems:
- The poll is biased. Result is more likely to be wrong in one direction than the other.
- It also has high variance. Result is, on average, very far from the truth.

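Over the next few slides, we’ll show that increasing the size of the poll \((n)\) does three things: it removes the bias, shrinks the variance, and pulls the distribution of results into a bell curve shape.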

We can plot these compound probabilities on a bar chart.

I would never make you do this by hand.
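Here is one way a computer might do it for a poll of five voters. This is a minimal sketch: the function name `poll_probability` is made up for illustration, and it assumes each respondent independently says R with probability 0.6.

```python
from math import comb

def poll_probability(r_count, n, p_r=0.6):
    """Probability that exactly r_count of n respondents say R."""
    ways = comb(n, r_count)  # number of orderings with this many R's
    return ways * p_r**r_count * (1 - p_r)**(n - r_count)

n = 5
probs = {r: poll_probability(r, n) for r in range(n + 1)}
for r, prob in probs.items():
    print(f"{r}R,{n - r}D: {prob:.4f}")
```

Each bar on the chart is one of these values: the number of orderings that produce the count, times the probability of any single such ordering.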

Intuition Check: Why is the probability of 3R,2D so big and the probability of 0R,5D so small?

With a large enough sample, the poll results are unbiased. Centered on the truth, and equally likely to be too high or too low.
Variance shrinks with poll size.
- In a poll of 50 voters, there’s a strong chance you get a result off by 10 percentage points and call the election incorrectly.
- In a poll with 500 voters, it’s practically impossible that the result will be off by more than 10 percentage points.
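To put rough numbers on this, the sketch below adds up the exact probabilities of every poll result that misses the true 60% R share by 10 points or more. The function name and its parameters are hypothetical, and it assumes the same idealized setup of independent random calls.

```python
from math import comb

def prob_off_by_at_least(n, points=10, r_pct=60):
    """Chance the poll's R share misses the true share by >= `points` points."""
    p = r_pct / 100
    total = 0.0
    for k in range(n + 1):
        # Compare in whole percentage points to avoid float rounding at the edge.
        if abs(100 * k - r_pct * n) >= points * n:
            total += comb(n, k) * p**k * (1 - p)**(n - k)
    return total

for n in (50, 500):
    print(f"n = {n}: {prob_off_by_at_least(n):.6f}")
```

Running it for both poll sizes shows how sharply the chance of a 10-point miss falls as \(n\) grows.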

As poll size gets larger, the shape of the errors takes on that gorgeous bell curve shape.
This is one of the most foundational ideas in all of statistics.

Central Limit Theorem: If an outcome is the sum of a large number of independent random events, then it will approximately follow a bell curve.
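A quick simulation can make this concrete. The sketch below uses dice rather than voters (an illustrative choice, as are the seed and the constants): each outcome is the sum of 50 independent die rolls, and we check how many outcomes land within two standard deviations of the mean, which is about 95% on a bell curve.

```python
import random
from math import sqrt

random.seed(3220)  # arbitrary seed so the run is repeatable

N_DICE = 50        # independent random events per outcome (illustrative)
N_TRIALS = 10_000  # number of simulated outcomes

# Each outcome is the sum of many independent die rolls.
sums = [sum(random.randint(1, 6) for _ in range(N_DICE)) for _ in range(N_TRIALS)]

# Theoretical mean and standard deviation of the sum of N_DICE fair dice.
mean = 3.5 * N_DICE
sd = sqrt(N_DICE * 35 / 12)

# On a bell curve, roughly 95% of outcomes fall within 2 standard deviations.
share_within_2sd = sum(abs(s - mean) <= 2 * sd for s in sums) / N_TRIALS
print(f"Share within 2 sd of the mean: {share_within_2sd:.3f}")
```

A single die roll is flat, not bell-shaped, so any bell curve in the sums comes from the adding, not from the individual events.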

Human height is the sum of a large number of independent genetic and environmental factors, so…

Standardized test scores are the sum of a large number of independent question scores, so…

College football scores are the sum of a large number of independent successes / failures to get the ball to the other end of the field, so…

When outcomes fall on a bell curve, it makes prediction a lot easier.
That’s because outcomes are very unlikely to stray far from their expected values.

95% of poll results will be one of the red bars.

Define the margin of error as the range within which you’re 95% sure your polling error will fall.
The back-of-the-envelope approximation of a poll’s margin of error is \(\frac{100\%}{\sqrt{n}}\).
- So, for a poll with 100 respondents, margin of error is roughly \(\frac{100\%}{\sqrt{100}} = 10\%\).
- Practice: what’s the margin of error for a poll with 400 respondents?
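The rule of thumb is easy to turn into code. In the sketch below, `rough_margin_of_error` is the approximation above; `textbook_margin_of_error` is an addition not taken from these slides, the standard 95% formula for a proportion, shown only to suggest where the shortcut comes from. Both function names are made up for illustration.

```python
from math import sqrt

def rough_margin_of_error(n):
    """Back-of-the-envelope margin of error, in percentage points."""
    return 100 / sqrt(n)

def textbook_margin_of_error(n, p=0.5, z=1.96):
    """Standard 95% margin of error for a proportion, in percentage points."""
    return 100 * z * sqrt(p * (1 - p) / n)

for n in (100, 2500, 10_000):
    print(f"n = {n}: rough {rough_margin_of_error(n):.1f}%, "
          f"textbook {textbook_margin_of_error(n):.1f}%")
```

With \(p = 0.5\), the textbook formula works out to \(\frac{98\%}{\sqrt{n}}\), so rounding up to \(\frac{100\%}{\sqrt{n}}\) is slightly conservative and much easier to do in your head.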

Your outcome will approximately follow a bell curve if it is the sum of a large number of independent random events (Central Limit Theorem).
If the theorem holds, it’s great for making predictions, because bell curves are easy to work with.
- In a few weeks, we’ll talk about the ways in which real-world polls fall short of this idealized model.
Next Time: What happens when that independence assumption is violated?