POLS 3220: How to Predict the Future
I’m going to write a sequence of three numbers on the board.
Your goal is to guess what “rule” these numbers follow (e.g. “all the numbers must be positive”).
You can shout any sequence of three numbers at me, and I’ll tell you if it satisfies the rule.
When you think you know the rule, write it down on a slip of paper and bring it up front.
What did we learn about ourselves?
“So convenient a thing it is to be a reasonable creature, since it enables one to find or make a reason for everything one has a mind to do.”
- Benjamin Franklin
This tendency to rationalize makes playing our opening game difficult.
The most common strategy is to fixate on a particular rule, then keep guessing number sequences that satisfy it.
But the optimal strategy is to guess sequences that narrow down the set of possible rules.
I’m skeptical that asking these questions is a great way to measure actively open-minded thinking (AOMT).
Not many people disagree with the statement: “A person should always consider new information.”
What is “information”, anyway?
Let’s briefly dip our toes into information theory, a branch of mathematics that deals with, well, information.
Intuitively, information is something that shifts our beliefs.
The more surprising a piece of information is, the more it should shift our beliefs.
In a deep sense, information = surprise (Shannon 1948).
How surprised would you be to learn that an event happened?
Depends on how probable you thought the event was!
If you were 100% sure an event would happen, you wouldn’t be surprised at all to learn that it happened.
If you thought there was a 75% chance of it happening, you still wouldn’t be very surprised.
If you thought there was a 50-50 chance, you’d be kinda surprised.
If you thought there was only a 25% chance, now this would be surprising information.
If you thought there was only a 5% chance, you’ve reached what statisticians would call “statistical significance”.
You’ve observed an event you really didn’t expect to see!
Maybe your theory was wrong!
If you thought there was a 1-in-1,000 chance, now it’s downright shocking!
1-in-a-million chance?
This is, like, winning the lottery level of surprise.
Heart-attack-inducing surprise.
Notice the shape of the curve we’re drawing here. Surprise climbs slowly at first, then shoots toward infinity as \(P(x)\) approaches 0.
This looks an awful lot like the charts I was showing you in the lecture on Long Tails.
What I’m arguing is that the amount of information revealed by an event is a logarithmic function of how probable you thought it was.
\[ I(x) = -\log_2 P(x) \]
This function tells you, in essence, how much you learn from observing an event.
A unit of information is called a bit: the surprise of a 50-50 coin flip, since \(-\log_2(\frac{1}{2}) = 1\).
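To make the scale concrete, here’s a minimal Python sketch (mine, not from the lecture materials) that computes the surprise, in bits, of each probability from the examples above:

```python
import math

def surprise_bits(p: float) -> float:
    """Shannon surprise (self-information) of an event with probability p, in bits."""
    return -math.log2(p)

# The probabilities from the examples above.
for p in [1.0, 0.75, 0.5, 0.25, 0.05, 1 / 1_000, 1 / 1_000_000]:
    print(f"P(x) = {p:.6f}  ->  I(x) = {surprise_bits(p):5.2f} bits")
```

A sure thing carries 0 bits, the 5% event about 4.3, and the 1-in-a-million event about 20.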
This leads us to a related concept, called entropy (“average surprise”).
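Written out (this is Shannon’s standard definition), entropy is each outcome’s surprise weighted by its probability:

\[ H(X) = \sum_x P(x)\, I(x) = -\sum_x P(x) \log_2 P(x) \]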
An event with probability 100% has zero entropy.
An event with probability \(\frac{1}{1,000,000}\) has almost no entropy either: the surprise if it happens would be enormous, but you almost never incur it.
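A quick sanity check of both claims, again as a hypothetical sketch: for a yes/no event, entropy peaks at one full bit when the odds are 50-50 and collapses toward zero at either extreme.

```python
import math

def entropy_bits(p: float) -> float:
    """Entropy of a yes/no event with probability p, in bits (taking 0 * log 0 = 0)."""
    return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)

for p in [1.0, 0.5, 1 / 1_000_000]:
    print(f"p = {p}  ->  H = {entropy_bits(p):.6f} bits")
```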
When seeking out information, your goal should be to decrease entropy as much as possible.
This idea will be particularly useful after the midterm, when we start discussing machine learning.
But we can also apply it to the game we played at the start of class.
I observe that “2-4-8” satisfies the rule.
A lot of potential rules here:
1. the numbers increase
2. the second number is bigger than the first
3. the third number is bigger than the second
4. all even
5. all positive
6. they sum to 14
If I think these six rules are equally likely, then my entropy is \(-\log_2(\frac{1}{6}) \approx 2.6\) bits.
The move that decreases entropy the most is one that splits the set of hypotheses in half, like “8-4-2”: it satisfies rules 4-6 and violates rules 1-3. Whichever answer I get, three rules are eliminated, and entropy drops to \(-\log_2(\frac{1}{3}) \approx 1.6\) bits.
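Here’s a minimal Python sketch of that calculation (my encoding, not the lecture’s; in particular, the predicates for rules 2 and 3 are my reading of them). It scores a guess by the entropy you expect to be left with after hearing yes or no, under a uniform prior over the six rules:

```python
import math

# The six candidate rules above, written as predicates on a guess (a, b, c).
RULES = {
    "numbers increase":         lambda a, b, c: a < b < c,
    "second bigger than first": lambda a, b, c: b > a,
    "third bigger than second": lambda a, b, c: c > b,
    "all even":                 lambda a, b, c: a % 2 == 0 and b % 2 == 0 and c % 2 == 0,
    "all positive":             lambda a, b, c: a > 0 and b > 0 and c > 0,
    "sums to 14":               lambda a, b, c: a + b + c == 14,
}

def expected_remaining_entropy(guess):
    """Expected entropy (bits) over the rules after hearing yes/no,
    assuming a uniform prior over which rule is the true one."""
    n_yes = sum(rule(*guess) for rule in RULES.values())
    n_no = len(RULES) - n_yes
    return sum((n / len(RULES)) * math.log2(n) for n in (n_yes, n_no) if n > 0)

print(expected_remaining_entropy((8, 4, 2)))   # splits 3/3 -> log2(3), about 1.6 bits
print(expected_remaining_entropy((4, 8, 16)))  # splits 5/1 -> about 1.9 bits
```

The confirmatory guess “4-8-16” barely helps: five of the six rules agree with it, so on average you expect to be left with roughly 1.9 bits of uncertainty.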
Here’s a final perspective on what’s happening with the “wisdom of crowds”.
Individuals are prone to confirmation bias. We tend to look for information that confirms our theories (“4-8-16”).
But in a large enough crowd of people, you’ll end up with a diverse set of theories.
If everyone looks for information that confirms their theory, then the group ends up finding a bunch of information that collectively decreases entropy.
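To see how this can play out, here’s a toy simulation. Everything in it is hypothetical: I’ve chosen the hidden rule (“numbers increase”) and the guesses myself. Six people each believe a different rule from our list, and each tests a sequence designed to confirm their own pet theory; pooling the answers still pins down the truth:

```python
import math

# Same six rules as in the previous sketch.
RULES = {
    "numbers increase":         lambda a, b, c: a < b < c,
    "second bigger than first": lambda a, b, c: b > a,
    "third bigger than second": lambda a, b, c: c > b,
    "all even":                 lambda a, b, c: a % 2 == 0 and b % 2 == 0 and c % 2 == 0,
    "all positive":             lambda a, b, c: a > 0 and b > 0 and c > 0,
    "sums to 14":               lambda a, b, c: a + b + c == 14,
}

true_rule = RULES["numbers increase"]  # hypothetical choice of hidden rule

# One confirmatory guess per believer, each satisfying that believer's own theory.
guesses = [(1, 2, 3), (5, 9, 2), (9, 1, 7), (6, 2, 4), (7, 3, 1), (10, 3, 1)]

candidates = dict(RULES)
for guess in guesses:
    answer = true_rule(*guess)
    # Keep only the rules whose prediction matches the oracle's answer.
    candidates = {name: r for name, r in candidates.items() if r(*guess) == answer}
    print(f"{guess} -> {'yes' if answer else 'no'}; {len(candidates)} rule(s) left, "
          f"entropy = {math.log2(len(candidates)):.2f} bits")
```

In this run the pooled evidence drives entropy to zero after just three guesses, even though no individual was trying to falsify anything.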