Key excerpt (but it's worth reading the full thing):
But the real value-add of the model is not just in calculating who’s ahead in the polling average. Rather, it’s in understanding the uncertainties in the data: how accurate polls are in practice, and how these errors are correlated between the states. The final margins on Tuesday were actually quite close to the polling averages in the swing states, though less so in blue states, as I’ll discuss in a moment. But this was more or less a textbook illustration of the normal-sized polling error that we frequently wrote about [paid only; basically says that the polling errors could be correlated between states]. When polls miss low on Trump in one key state, they probably also will in most or all of the others.
In fact, because polling errors are highly correlated between states — and because Trump was ahead in 5 of the 7 swing states anyway — a Trump sweep of the swing states was actually our most common scenario, occurring in 20 percent of simulations. Following the same logic, the second most common outcome, happening 14 percent of the time, was a Harris swing state sweep.
[Interactive table]
Relatedly, the final Electoral College tally will be 312 electoral votes for Trump and 226 for Harris. And Trump @ 312 was by far the most common outcome in our simulations, occurring 6 percent of the time. In fact, Trump 312/Harris 226 is the huge spike you see in our electoral vote distribution chart:
[Interactive graph]
The difference between 20 percent (the share of times Trump won all 7 swing states) and 6 percent (his getting exactly 312 electoral votes) is because sometimes, Trump winning all the swing states was part of a complete landslide where he penetrated further into blue territory. Conditional on winning all 7 swing states, for instance, Trump had a 22 percent chance of also winning New Mexico, a 21 percent chance at Minnesota, 19 percent in New Hampshire, 16 percent in Maine, 11 percent in Nebraska’s 2nd Congressional District, and 10 percent in Virginia. Trump won more than 312 electoral votes in 16 percent of our simulations.
But on Tuesday, there weren’t any upsets in the other states. So not only did Trump win with exactly 312 electoral votes, he also won with the exact map that occurred most often in our simulations, counting all 50 states, the District of Columbia and the congressional districts in Nebraska and Maine.
I don't know of an intuitive test for whether a forecast of a non-repeating event was well-reasoned (see also the lively debate over the performance of prediction markets), but this is Silver's initial defense of his 50-50 forecast. I'm unconvinced - if the modal outcome of the model was the actual result of the election, does that vindicate its internal correlations, indict its confidence in its output, both, neither...? But it's not irreconcilable that the modal outcome being the real result vindicates the model's internal correlations AND that its certainty was limited by the quality of the available data, so this hasn't lowered my opinion of Silver either.
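For what it's worth, here's a minimal sketch of the correlation point -- with made-up margins and error sizes, not Silver's actual model -- showing why correlated polling errors concentrate so much probability on one side sweeping all seven swing states:

```python
# Toy simulation (hypothetical margins and error sizes, not Silver's model):
# when polling errors share a large correlated component, "one side sweeps all
# seven swing states" ends up as the single most common scenario.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical final polling margins (Trump minus Harris, in points) for 7 swing states.
poll_margins = np.array([2.0, 1.5, 1.0, 0.5, 0.3, -0.2, -1.0])

n_sims = 100_000
shared = rng.normal(0.0, 3.0, size=(n_sims, 1))                 # correlated national error
local = rng.normal(0.0, 2.0, size=(n_sims, len(poll_margins)))  # state-specific error
outcomes = poll_margins + shared + local

print(f"Trump sweeps all 7:  {(outcomes > 0).all(axis=1).mean():.0%}")
print(f"Harris sweeps all 7: {(outcomes < 0).all(axis=1).mean():.0%}")
```

Drop the shared error term and the sweep probabilities collapse toward the product of the individual state win probabilities, which is why the correlation assumption does most of the work here.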
Notes -
The polls did better this time than 2016 and 2020. At least, in general.
The controversy about polls starts in 2016. I think this is worth emphasizing, because there are still arguments floating around that the polls in 2016 were fine, and thus every subsequent argument about polls is really a proxy war over 2016. Eight years later we're still talking about Trump, still discussing how the polls over- or under-estimate him, still discussing how the polls do or don't measure white rural voters.
In 2016 the polls were entirely wrong. For months they predicted Hillary winning in a blowout, sometimes by 10+ points. (I remember sitting in class listening to a friend excitedly gossip about Texas flipping blue.) Toward election day itself the polls converged, but still comfortably for Hillary. And when Trump won, the argument that the results were technically within the margin of error missed entirely that whole states were modeled vastly incorrectly. The blue wall states of Pennsylvania, Wisconsin, and Michigan were not supposed to have gone red. Florida was supposed to have been close. States that had once been swing states were not even close. (To me, this was the smoking gun that Trump had a real chance in 2016: Iowa and Ohio were solidly predicted for Trump from the very beginning, and no one offered any introspection on what that implied as a general swing.)
2020 was not much better. Without getting into claims about fraud and states: Biden was also supposed to win by larger margins than many states in fact showed. There were still lots of specific misses (like Florida shifting hard red). And again a series of justifications that polling did just fine because, technically, everything was inside some margin of error.
2024 is actually much better. AtlasIntel and Polymarket both called what happened almost exactly. Rasmussen was fairly accurate (after taking a break in 2020, if I remember correctly). There's also a lot of slop: Selzer's reputation is destroyed (though people may forget all about it by 2028), the RCP national average was off by a few points, and Ipsos, NPR, Morning Consult, and the Times were all wrong. Well, maybe that's not much better than 2020 -- but mixed in with all the bad data were predictors who got everything exactly right.
So Nate Silver's problem is that his method is junk. He takes some averages and models them out. The problem is that a lot of the data he relies on is bad. A lot of the polling industry is still wrong. And unless Silver is willing to stake a lot of expertise on highly specific questions about counties and polls, he can't offer all that much insight.
I’m more sympathetic to the pollsters than I am to Nate. The pollster’s job is to poll people using a reasonable methodology and report the data, not to make predictions. They can’t just arbitrarily add Trump +3 because they think their samples didn’t capture enough Trump voters.
Nate’s job is explicitly to build a model that predicts things. He can legitimately adjust for things like industry polling bias. He doesn’t because he’s bad at his job.
Don't the pollsters have some degree of freedom, though, since they sample based on demographics rather than purely at random? Presumably they use this to perform adjustments. I also assume they poll each respondent's likelihood of voting rather than just making that number up.
They try, but fundamentally, IMO, it’s a good idea to separate data collection from model building.
What's wrong with his method? How could he have improved it?
It relies on there not being a consistent (statistical, not political, although in this case it's probably both) bias in the inputs; i.e., the polls.
As I recall, Silver actually rates Atlas (who absolutely nailed every swing state) pretty highly, unlike (say) RCP -- but I don't think his pollster confidence correction really amounts to anything huge. In the end he's basically aggregating ~all the polls (he does throw some out), and if the polls are wrong, so is his model.
Based on the polls, his model was probably correct that the election was roughly a coin toss -- but his aggregation ended up favouring Kamala by roughly 2-3 points (ED: vs actual results) in all the swing states, which is badly wrong and not in fact inside the error margin of an aggregation of a bunch of ±3% polls.
So his statewise model is probably pretty good -- I missed the flashy toolkit he had where you could choose results for some states and see the likely shifts in others this time around -- and I'll bet that if you plugged Atlas' polls alone into the model, it would have had Trump at something like 80%. But he didn't do that. He relied on a bunch of polls that he himself noted showed obvious herding towards 50%, plus the (cope) hypothesis that the pollsters might have corrected their anti-Trump lean and be herding towards 50/50 because... they were too scared to predict a Kamala win, or something.
I guess the ballsy thing for Silver to do would have been, upon noting the herding, to toss out all of the polls showing signs of this, and see what was left.
This would have (probably) had a negative impact on his subscriptions though -- so whether it was greed or his personal anti-Trump inclination, he apparently doesn't really live on The River anymore after all.
Nate's whole schtick is having a fixed model that he pre-commits to ahead of time. He wants to avoid as many judgment calls as he can. It gives the air of scientific objectivity. You can follow someone else who makes judgment calls as the race progresses, but will they be more accurate over time?
One way to rank forecasters would be to assign them an error score for each prediction, weighted by a superlinear factor of how far off their odds were (say, (100% - probability assigned to the winner)^2, as sketched below). So Nate would get a small penalty for winding up at 51% for Kamala before the election, compared to someone who guessed 90% for Kamala. Who would have the best score over multiple cycles?
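A minimal sketch of that scoring rule (essentially a Brier-style penalty; the forecasters and probabilities below are made up):

```python
# Charge each forecaster (1 - p)^2, where p is the probability they assigned
# to the actual winner. Lower cumulative penalty over many races = better record.
def miss_penalty(p_winner: float) -> float:
    return (1.0 - p_winner) ** 2

# Hypothetical forecasters for one race that Trump won:
forecasters = {
    "50/50 model (Trump 50%)": 0.50,
    "confident Harris call (Trump 10%)": 0.10,
    "confident Trump call (Trump 90%)": 0.90,
}
for name, p in forecasters.items():
    print(f"{name}: penalty = {miss_penalty(p):.3f}")
# 0.250, 0.810, 0.010 -- being confidently wrong costs far more than hedging.
```

The squared term is what makes it superlinear: a confident miss hurts much more than a hedge, but a string of 50% calls will still lose over many cycles to someone who is confidently right.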
Sure, I respect his stance on the 'precommit vs tinkering' spectrum -- but you don't get to precommit to a model that turns out to be wrong and try to spin it as being right all along.
If he updates his model along the lines of throwing out polls showing evidence of tinkering, maybe he can be right next time -- but this time he was not.
It is within the margin of error because his model allows for a systematic correlated error across all polls. He just doesn't make any assumptions about the direction of that error. What some people are suggesting he do is assume the direction and size of this error based on very little evidence.
That's not what I'm talking about -- his inputs to the model are an aggregation of polls; he shows you them (for swing states) on the "Silver Bulletin Election Forecast" page.
Since each of these is an aggregation of 5-6 polls, each with a sampling error in the area of +/-3%, the sampling error on Silver's aggregation should be on the order of +/-1%. The fact that they all ended up more like +3D means that these polls are bad, and if he can't make the correction (due to lack of information, or lack of willingness to call out political bias) he shouldn't be using them.
He even had a framework for this! There was a whole post where he identified the worst herders -- removing them from his model would have been trivial, but he didn't do it. That left model inputs biased by roughly +3D -- which is the strongest argument that his 'coin flip' EC forecast was in fact a bad prediction. How could it be a good prediction with such inaccurate input data?
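A quick sketch of the underlying statistics (illustrative numbers, not the actual Silver Bulletin inputs): averaging polls only beats down independent sampling noise; a shared lean passes straight through to the average.

```python
# Averaging 6 polls shrinks random sampling error to roughly +/-1 point,
# but a shared 3-point lean in every poll survives the averaging untouched.
import numpy as np

rng = np.random.default_rng(1)
n_polls, n_trials = 6, 50_000
per_poll_se = 1.5  # roughly +/-3 points at 95% for a single poll

# Independent sampling error only: the average of 6 polls is much tighter.
independent = rng.normal(0.0, per_poll_se, size=(n_trials, n_polls)).mean(axis=1)
print(f"independent errors: 95% of averages within +/-{1.96 * independent.std():.1f} pts")

# Same noise plus a shared 3-point lean in every poll: averaging does nothing to it.
biased = (rng.normal(0.0, per_poll_se, size=(n_trials, n_polls)) + 3.0).mean(axis=1)
print(f"shared 3-pt lean:   averages still off by {biased.mean():.1f} pts on average")
```

So a ~+3D miss across every swing-state average isn't the kind of error that more polls of the same kind would have fixed.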