POLS 3220: How to Predict the Future
Draw a probability tree representing the first two voters contacted by the polling firm. What is the probability of every possible poll result?
Because we don’t care about the order of responses (just the counts), we can combine some outcomes:
A poll with two responses is pretty worthless.
48% of the time you get an equal number of R’s and D’s
16% of the time, you get all D’s
36% of the time, you get all R’s
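A minimal Python sketch of the two-voter tree. It assumes each voter independently answers R with probability 0.6, which is the value implied by the 36% / 48% / 16% figures above. The variable names are illustrative only.

```python
from itertools import product

# Assumption: each voter independently answers R with probability 0.6
# (implied by 36% = 0.6 * 0.6 in the figures above).
P = {"R": 0.6, "D": 0.4}

# Every branch of the tree: RR, RD, DR, DD.
branches = {a + b: P[a] * P[b] for a, b in product("RD", repeat=2)}

# Order does not matter, so combine branches with the same number of R's.
by_count = {}
for path, prob in branches.items():
    n_r = path.count("R")
    by_count[n_r] = by_count.get(n_r, 0.0) + prob

for n_r in (2, 1, 0):
    print(f"{n_r}R,{2 - n_r}D: {by_count[n_r]:.0%}")
```

The RD and DR branches each have probability 24%, and adding them gives the 48% chance of a split result.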
Two big problems:
The poll is biased: the result is more likely to be wrong in one direction than the other.
It also has high variance: the result is, on average, very far from the truth.
Over the next few slides, we’ll show that increasing the size of the poll \((n)\) does three things:
We can plot these compound probabilities on a bar chart.
I would never make you do this by hand.
Intuition Check: Why is the probability of 3R,2D so big and the probability of 0R,5D so small?
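So that nobody has to do it by hand, here is a rough Python sketch of the same calculation for a five-voter poll. It assumes the same 0.6 chance of an R response as the two-voter example; `poll_probabilities` is a made-up helper name, not something from the course materials.

```python
from math import comb

def poll_probabilities(n, p_r=0.6):
    """Probability of each count of R responses (0..n) in a poll of n voters."""
    # comb(n, k) counts the tree branches with exactly k R's;
    # each such branch has probability p_r**k * (1 - p_r)**(n - k).
    return [comb(n, k) * p_r**k * (1 - p_r)**(n - k) for k in range(n + 1)]

n = 5
probs = poll_probabilities(n)
for k, prob in enumerate(probs):
    bar = "#" * round(prob * 100)  # crude text bar chart
    print(f"{k}R,{n - k}D  {prob:6.1%}  {bar}")
```

One way into the intuition check: compare `comb(5, 3)` with `comb(5, 0)`, then compare the probability of a single branch in each case.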
With a large enough sample, the poll results are unbiased: centered on the truth, and equally likely to be too high or too low.
Variance shrinks with poll size.
As the poll gets larger, the distribution of errors takes on that gorgeous bell-curve shape.
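The first two effects can be checked with exact arithmetic. The sketch below, under the same assumed 0.6 chance of an R response, reports how often a poll of size n overshoots or undershoots the truth and how far off it is on average. The poll sizes and the helper name `error_summary` are arbitrary choices for illustration.

```python
from math import comb

P_R = 0.6  # assumed true share of R voters, as in the two-voter example

def error_summary(n, p_r=P_R):
    """Return (P(too high), P(too low), average absolute error) for a poll of size n."""
    too_high = too_low = avg_abs_error = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * p_r**k * (1 - p_r)**(n - k)
        error = k / n - p_r  # poll's R share minus the truth
        if error > 1e-12:
            too_high += prob
        elif error < -1e-12:
            too_low += prob
        avg_abs_error += abs(error) * prob
    return too_high, too_low, avg_abs_error

for n in (2, 25, 100, 400):
    high, low, avg = error_summary(n)
    print(f"n={n:>3}  too high {high:.2f}  too low {low:.2f}  avg error {avg:.3f}")
```

For n = 2 the poll is too high 36% of the time and too low 64% of the time. As n grows, the two chances move closer together and the average error shrinks.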
This is one of the most foundational ideas in all of statistics.
If an outcome is the sum of a large number of independent random events, then it will fall (approximately) on a bell curve.
Human height is the sum of a large number of independent genetic and environmental factors, so…
Standardized test scores are the sum of a large number of independent question scores, so…
College football scores are the sum of a large number of independent successes / failures to get the ball to the other end of the field, so…
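A small simulation can make the idea concrete. This sketch uses die rolls as a stand-in for "independent random events"; the choice of 50 rolls per trial, 10,000 trials, and the random seed are arbitrary assumptions.

```python
import random
from statistics import mean, stdev

random.seed(3220)  # arbitrary seed so the run is repeatable

def simulate_sums(n_events, n_trials=10_000):
    """Each trial adds up n_events independent die rolls."""
    return [sum(random.randint(1, 6) for _ in range(n_events))
            for _ in range(n_trials)]

sums = simulate_sums(n_events=50)
center, spread = mean(sums), stdev(sums)

# On a bell curve, about 95% of outcomes land within 2 standard deviations.
within_2sd = sum(abs(s - center) <= 2 * spread for s in sums) / len(sums)
print(f"mean {center:.1f}, sd {spread:.1f}, share within 2 sd: {within_2sd:.1%}")
```

A single die roll is flat, not bell-shaped, yet the sum of many rolls should put roughly 95% of trials within two standard deviations of the mean.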
When outcomes fall on a bell curve, it makes prediction a lot easier.
That’s because outcomes are very unlikely to stray far from their expected values.
95% of poll results will be one of the red bars.
Define the margin of error as the range within which you’re 95% sure your polling error will fall.
The back-of-the-envelope approximation of a poll’s margin of error is \(\frac{100\%}{\sqrt{n}}\).
So, for a poll with 100 respondents, the margin of error is roughly \(\frac{100\%}{\sqrt{100}} = 10\%\).
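One common way to motivate the rule of thumb: the standard deviation of a poll share is \(\sqrt{p(1-p)/n}\), which is at most \(0.5/\sqrt{n}\) (when \(p = 0.5\)), and two standard deviations of that is \(1/\sqrt{n}\). The sketch below checks how often a poll actually lands within that margin, assuming the same 0.6 chance of an R response used earlier. The function names and poll sizes are illustrative.

```python
from math import comb, sqrt

def rule_of_thumb_margin(n):
    """Back-of-the-envelope margin of error, as a fraction (0.10 means 10%)."""
    return 1 / sqrt(n)

def coverage(n, p_r=0.6):
    """Exact probability that the poll's R share lands within the margin of the truth."""
    margin = rule_of_thumb_margin(n)
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p_r) <= margin + 1e-12:  # small tolerance for float rounding
            total += comb(n, k) * p_r**k * (1 - p_r)**(n - k)
    return total

# Sizes kept under 1000 so comb(n, k) still converts to a float without overflow.
for n in (25, 100, 900):
    print(f"n={n:>3}  margin ±{rule_of_thumb_margin(n):.1%}  coverage {coverage(n):.1%}")
```

Because the rule of thumb uses the worst case \(p = 0.5\), the coverage at \(p = 0.6\) should come out at or a little above 95%.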
Practice: what’s the margin of error for a poll with 400 respondents?
Your outcome will fall on a bell curve if it is the sum of a large number of independent random events (Central Limit Theorem).
If the theorem holds, it’s great for making predictions, because bell curves are easy to work with.
Next Time: What happens when that independence assumption is violated?