It shouldn't be "X% Trump, Y% Kamala," it should be "X% Trump, Y% Kamala, Z% irreducible uncertainty."
What would this irreducible uncertainty mean for an event with a binary outcome? I think Silver already accounts for increasing uncertainty as he propagates his current prediction into the future (what he calls forecast vs. nowcast).
Error bars would make sense around the expected vote percentage. Of course the probability distribution over vote percentages becomes broader as you look into the future, and perhaps he does show that to paying customers. But in the end you still have to integrate over that when the layman asks for the probabilities of who wins the election. And that still amounts to two numbers that sum to 100%.
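A rough sketch of that integration step (all numbers are hypothetical, and a normal vote-share distribution is only an illustrative assumption, not Silver's actual model):

```python
# Minimal sketch: a forecast of the two-party vote share as a distribution whose
# spread grows with time to the election, integrated into one win probability.
import numpy as np

rng = np.random.default_rng(0)

mean_share = 0.51   # hypothetical expected two-party vote share for candidate A
base_sd = 0.015     # hypothetical "nowcast" uncertainty
drift_sd = 0.005    # hypothetical extra uncertainty per month until election day
months_out = 4

sd = np.sqrt(base_sd**2 + months_out * drift_sd**2)
samples = rng.normal(mean_share, sd, size=100_000)

p_a_wins = np.mean(samples > 0.5)   # integrate the distribution over "A wins"
print(f"A: {p_a_wins:.0%}, B: {1 - p_a_wins:.0%}")  # still two numbers summing to 100%
```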
Evaluating a predictor's performance seems straightforward to me via the usual log-likelihood score. Record the final outcome and take the log of the predictor's probability for that outcome. That score can then be summed over multiple different elections, if you like. (Not sure though if I'd call that scoring rule particularly frequentist.)
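To make the scoring concrete, a minimal sketch in Python (all forecasts and outcomes are made up):

```python
# Log score: for each election, take the log of the probability the forecaster
# assigned to the outcome that actually happened, then sum across elections.
import math

# (predicted probabilities, realized outcome) per election
forecasts = [
    ({"A": 0.71, "B": 0.29}, "A"),
    ({"A": 0.55, "B": 0.45}, "B"),
    ({"A": 0.90, "B": 0.10}, "A"),
]

score = sum(math.log(probs[outcome]) for probs, outcome in forecasts)
print(score)  # higher (closer to 0) is better; compare forecasters on the same elections
```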
I can only find a German news article on it, but I would love to see the whole interview (subtitled). However, I doubt such a video exists.
The first result on youtube when searching for their names is the full three-hour video with English subtitles.
Nope, I got 10/20.
Sure, in this trivial sense it is not.
I've heard this same argument from other Taleb readers and got the impression that "irreducible uncertainty" means something entirely different from just a "none of the above" outcome.
If it is just that, then the same arguments apply: When you make the final probabilistic prediction, you integrate over the uncertainty, resulting in three numbers that add up to 100%. After recording the actual outcome, take the log of your predicted probability for this outcome, and that's your performance score.
Of course, if Silver rounds down the probability of "none of the above" to 0% for convenience and it still occurs, he'd technically incur a score of -Inf, which should tank his credibility forever. But I find that a boring technicality.
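For completeness, the same scoring with the third outcome included, and the degenerate -Inf case (made-up numbers):

```python
# Three-outcome log score. Rounding "none of the above" down to 0% yields a
# score of -inf if that outcome actually occurs.
import math

probs = {"Trump": 0.48, "Kamala": 0.52, "other": 0.0}

def log_score(probs, outcome):
    p = probs[outcome]
    return math.log(p) if p > 0 else float("-inf")

print(log_score(probs, "Kamala"))  # ordinary finite score
print(log_score(probs, "other"))   # -inf: the boring technicality mentioned above
```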
From your linked article:
Taleb is wrong here. Under this standard, no reasonable predictor could survive any real-world application, as it would have to be trashed after its first mistake. And those that did survive would be hopelessly overfitted to past data.
The article got it precisely the wrong way round. Classical models such as logistic regression are trained and evaluated using their full probabilistic predictions.
The prediction is only thresholded to a deterministic choice when it feeds into a human decision, where it is weighted appropriately for the costs of false positives vs. false negatives etc. (which you could not do if it were a deterministic prediction in the first place).
How's that bad? I'd call that perfectly rational behaviour.
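A minimal sketch of that workflow, on synthetic data and assuming scikit-learn (the costs are hypothetical):

```python
# The model is fitted and evaluated on its probabilistic output, and only
# thresholded when a concrete decision has to be made, with the threshold set
# by the relative costs of false positives and false negatives.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2_000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
p = model.predict_proba(X_test)[:, 1]

print("log loss:", log_loss(y_test, p))  # evaluation uses the probabilities themselves

# Decision step: with false positives costing c_fp and false negatives c_fn,
# the expected-cost-minimizing rule is: predict positive when p > c_fp / (c_fp + c_fn).
c_fp, c_fn = 1.0, 5.0   # hypothetical costs
decision = p > c_fp / (c_fp + c_fn)
print("positives flagged:", decision.sum())
```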