It shouldn't be "X% Trump, Y% Kamala," it should be "X% Trump, Y% Kamala, Z% irreducible uncertainty."
What would this irreducible uncertainty mean for an event with a binary outcome? I think Silver already accounts for increasing uncertainty as he propagates his current prediction into the future (what he calls forecast vs. nowcast).
Error bars would make sense around the expected vote percentage. Of course the probability distribution over vote percentages becomes broader as you look into the future, and perhaps he does show that to paying customers. But in the end you still have to integrate over that when the layman asks for the probabilities of who wins the election. And that still amounts to two numbers that sum to 100%.
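A rough sketch of that integration step (all numbers are hypothetical, and a normal vote-share distribution is only an illustrative assumption, not Silver's actual model):

```python
# Minimal sketch: a forecast of the two-party vote share as a distribution whose
# spread grows with time to the election, integrated into one win probability.
import numpy as np

rng = np.random.default_rng(0)

mean_share = 0.51   # hypothetical expected two-party vote share for candidate A
base_sd = 0.015     # hypothetical "nowcast" uncertainty
drift_sd = 0.005    # hypothetical extra uncertainty per month until election day
months_out = 4

sd = np.sqrt(base_sd**2 + months_out * drift_sd**2)
samples = rng.normal(mean_share, sd, size=100_000)

p_a_wins = np.mean(samples > 0.5)   # integrate the distribution over "A wins"
print(f"A: {p_a_wins:.0%}, B: {1 - p_a_wins:.0%}")  # still two numbers summing to 100%
```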
Evaluating a predictor's performance seems straightforward to me via the usual log-likelihood score. Record the final outcome and take the log of the predictor's probability for that outcome. That score can then be summed over multiple different elections, if you like. (Not sure though if I'd call that scoring rule particularly frequentist.)
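To make the scoring concrete, a minimal sketch in Python (all forecasts and outcomes are made up):

```python
# Log score: for each election, take the log of the probability the forecaster
# assigned to the outcome that actually happened, then sum across elections.
import math

# (predicted probabilities, realized outcome) per election
forecasts = [
    ({"A": 0.71, "B": 0.29}, "A"),
    ({"A": 0.55, "B": 0.45}, "B"),
    ({"A": 0.90, "B": 0.10}, "A"),
]

score = sum(math.log(probs[outcome]) for probs, outcome in forecasts)
print(score)  # higher (closer to 0) is better; compare forecasters on the same elections
```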
I can only find a German news article on it, but I would love to see the whole interview (subtitled). However, I doubt such a video exists.
The first result on youtube when searching for their names is the full three-hour video with English subtitles.
Nope, I got 10/20.
Sure, in this trivial sense it is not.
I've heard this same argument from other Taleb readers and got the impression that "irreducible uncertainty" means something entirely different from just a "none of the above" outcome.
If it is just that, then the same arguments apply: When you make the final probabilistic prediction, you integrate over the uncertainty, resulting in three numbers that add up to 100%. After recording the actual outcome, take the log of your predicted probability for this outcome, and that's your performance score.
Of course, if Silver rounds down the probability of "none of the above" to 0% for convenience and it still occurs, he'd technically incur a score of -Inf, which should tank his credibility forever. But I find that a boring technicality.
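For completeness, the same scoring with the third outcome included, and the degenerate -Inf case (made-up numbers):

```python
# Three-outcome log score. Rounding "none of the above" down to 0% yields a
# score of -inf if that outcome actually occurs.
import math

probs = {"Trump": 0.48, "Kamala": 0.52, "other": 0.0}

def log_score(probs, outcome):
    p = probs[outcome]
    return math.log(p) if p > 0 else float("-inf")

print(log_score(probs, "Kamala"))  # ordinary finite score
print(log_score(probs, "other"))   # -inf: the boring technicality mentioned above
```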
From your linked article:
Taleb is wrong here. Under this standard, no reasonable predictor could survive any real-world application, as it would have to be trashed after its first mistake. And those that did survive would be hopelessly overfitted to past data.
The article got it precisely the wrong way round. Classical models such as logistic regression are trained and evaluated using their full probabilistic predictions.
The prediction is only thresholded to a deterministic choice when it feeds into a human decision, where it is weighted appropriately for the costs of false positives vs. false negatives etc. (which you could not do if it were a deterministic prediction in the first place).
How's that bad? I'd call that perfectly rational behaviour.
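A minimal sketch of that workflow, on synthetic data and assuming scikit-learn (the costs are hypothetical):

```python
# The model is fitted and evaluated on its probabilistic output, and only
# thresholded when a concrete decision has to be made, with the threshold set
# by the relative costs of false positives and false negatives.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2_000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
p = model.predict_proba(X_test)[:, 1]

print("log loss:", log_loss(y_test, p))  # evaluation uses the probabilities themselves

# Decision step: with false positives costing c_fp and false negatives c_fn,
# the expected-cost-minimizing rule is: predict positive when p > c_fp / (c_fp + c_fn).
c_fp, c_fn = 1.0, 5.0   # hypothetical costs
decision = p > c_fp / (c_fp + c_fn)
print("positives flagged:", decision.sum())
```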