
The State of Forecasting: Dynamics, Challenges, Hopes

forecasting.substack.com

In short…

  • Forecasting platforms and prediction markets are partially making the pie bigger together, and partially undercutting each other.
  • The forecasting ecosystem adjusted after the loss of plentiful FTX money.
  • Dustin Moskovitz’s foundation (Open Philanthropy) is increasing their presence in the forecasting space, but my sense is that chasing its funding can sometimes be a bad move.
  • As AI systems improve, they become more relevant for judgmental forecasting practice.
  • Betting with real money is still frowned upon by the US powers that be–but the US isn’t willing to institute the oversight regime that would keep people from making bets over the internet in practice.
  • Forecasting hasn’t taken over the world yet, but I’m hoping that as people try out different iterations, someone will find a formula to produce lots of value in a way that scales.

The final issue is that if forecasting becomes common and good, it will alter the very things it is trying to predict. Does predicting something make it true when we trust predictions at a 99.9% confidence level? Is there then a rebound effect where they become worthless and you need a meta meta meta meta meta prediction market to determine the accuracy of the prediction market you're trusting to verify the accuracy of the prediction market that you're using to make the initial prediction?

Nah, I think the issue that precedes and largely supersedes this is the oracle question. Do people trust that whatever entity is reporting the final results is doing so accurately and isn't fudging numbers to give an edge to its allies or to cover up some other outcome that TPTB are trying to disguise?

Do we trust that ambiguous results will be resolved in good faith and correctly more often than not?

Who do we actually rely on to be the final arbiter of 'truth' such that these markets can continue to settle reliably, when there's an incentive to capture such institutions and divert them from the purpose of accurate reporting?

In other words I personally doubt we'll ever reach 99.9% confidence in prediction markets if only because we can't reach that confidence in the platforms hosting the markets or the entities producing the results which are deemed as 'truth,' and I don't believe these are easily tractable issues.

This is a big problem on Manifold Markets and on Polymarket. On Manifold, there are a lot of market creators who write ambiguous resolution criteria or they even change the resolution criteria after people have placed bets. The resolution criteria often describe something quite different than what the question is literally asking. For example, there was a question that straightforwardly asked whether Israel blew up a certain hospital in Gaza, and then when it turned out the hospital hadn't been blown up at all and that the bomb had exploded in the parking lot, the question was changed to whether Israel was responsible for the explosion.

It has become common to resolve in favour of some nebulous, undefined "spirit" of the question, rather than the actual meaning of the question that was asked. A lot of markets become mainly bets on how the creator will decide to resolve it rather than on what the question is purportedly about.

On Polymarket, the resolution mechanism for disputed questions relies on a Keynesian beauty contest that has settled on an equilibrium where everyone assumes the simplest and stupidest possible interpretation, and now people are even contesting uncontroversial resolutions in order to take advantage of this broken system. There will be a question that resolves and everyone agrees that it was resolved correctly, but then the resolution will be contested and everyone knows the vote will go in favour of some hypothetical interpretation that would only work if everyone was retarded, so they vote that way. No one agrees with the interpretation, but everyone is incentivized to vote how they think everyone else will vote. And everyone knows the winning vote is expected to be the one that doesn't involve reading the full resolution description and doesn't involve using any sort of complex thought.
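The incentive described above can be put in numbers. This is a toy model only, not Polymarket's actual dispute mechanism: the function name `expected_payoff`, the reward and penalty sizes, and the 80% figure are all assumptions made for illustration. It shows that if a voter is paid for matching the majority, the belief about how others will vote drives the payoff, not the belief about which reading is correct.

```python
def expected_payoff(p_majority: float, reward: float = 1.0, penalty: float = 1.0) -> float:
    """Expected payoff of a vote that lands with the majority with probability p_majority.

    Assumption: matching the majority earns `reward`, missing it costs `penalty`.
    """
    return p_majority * reward - (1.0 - p_majority) * penalty


# Hypothetical belief: 80% chance the crowd settles on the simplest reading.
p_crowd_picks_simple = 0.8

# Voting for the simple reading wins whenever the crowd picks it.
vote_simple = expected_payoff(p_crowd_picks_simple)
# Voting for the careful reading wins only when the crowd does not.
vote_careful = expected_payoff(1.0 - p_crowd_picks_simple)

print(f"simple reading:  {vote_simple:+.2f}")
print(f"careful reading: {vote_careful:+.2f}")
```

Nothing in the payoff refers to which reading is right, which is why the equilibrium can drift away from what anyone actually believes.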

A lot of markets become mainly bets on how the creator will decide to resolve it rather than on what the question is purportedly about.

Yes. I've seen problems arise even with fairly 'objective' markets because even if you can measure a given phenomenon with precision, people might still mistrust the sensor doing the measuring. The market asks "what will be the high temperature in Miami on [date]" and we have to consider whose thermometer? Is it calibrated correctly? Are there any conditions that might throw it into an unexpected/error state?

So now the question is somewhat less about climate conditions and more about the quirks of the measurement system.

In theory you could solve this by attaching a reputation market to the system, so that a given resolution source can have their 'trustworthiness' rating impacted if enough people suspect they're fudging numbers or intentionally writing ambiguous questions/resolution criteria.

But that's just yet another system that is susceptible to gaming.

Augur had a seemingly solid system for avoiding this, but probably couldn't handle the volume, being dependent on Ethereum.

I am literally a practicing attorney and I have had my mind blown at some of the rules-lawyering/munchkin behavior that has come out of the space.

Ironically this perhaps goes to show why sports betting is so popular, because sports rules are uniformly understood, well-defined, and the bets are set on easily determinable outcomes like "Who won" and "what was the score", outcomes which are rarely ever walked back after the fact.


I speculate that we'll see some kind of AI-based solution arise and different markets will become popular with different segments of the population based on the quirks of how, say, Kalshi's AI resolves questions vs. Polymarket's vs. Manifold's.

In this case prediction markets might not actually 'solve' the issue of people having different reality bubbles, but at least there'll be some competition.

Augur had a seemingly solid system

This is not what I recall. Invalid markets resolved to 50/50, so you had users, chiefly someone who went by the moniker of Poyo, create markets that appeared to be legit but e.g., had the wrong date, so that people would bet & he'd win money when they resolved 50/50
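The arithmetic behind that exploit is short. A minimal sketch, assuming binary shares that pay 1.0 on a win and 0.5 each when the market is ruled invalid; the prices, share counts, and function names are hypothetical and not taken from Augur.

```python
def payout_per_share(outcome: str, side: str) -> float:
    """Payout of one share of `side` ("YES" or "NO") given the resolution.

    Assumption: an invalid market pays 0.5 to both sides.
    """
    if outcome == "invalid":
        return 0.5
    return 1.0 if outcome == side else 0.0


def profit(price: float, shares: float, outcome: str, side: str) -> float:
    """Profit from buying `shares` of `side` at `price` per share."""
    return shares * (payout_per_share(outcome, side) - price)


# Hypothetical setup: the question looks like a near-certain YES, so NO trades at 0.10.
# The creator knows the date is wrong and buys the cheap side.
creator_profit = profit(price=0.10, shares=100, outcome="invalid", side="NO")
# A bettor who took the question at face value paid 0.90 per YES share.
bettor_profit = profit(price=0.90, shares=100, outcome="invalid", side="YES")

print(f"creator: {creator_profit:+.2f}, bettor: {bettor_profit:+.2f}")
```

Whoever holds the cheap side of a market that gets ruled invalid is paid 0.5 per share, so the further the price sits from 0.5, the more a deliberately broken question is worth to its creator.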

Yes, and the Augur 2.0 solution was to add in an option for people to bet on whether a market was invalid in the market itself.

I have a lot of lawyers in my family, one of whom is close to me and the main part of his job is to write legal documents in clear, precise, unambiguous language, so I'm used to thinking about language and rules in a certain way (I also have a STEM background, where things have precise definitions). I've been blown away by how bad otherwise intelligent people are at writing and interpreting resolution criteria. They throw out basic principles which I would have thought were necessary for there to be any hope of being able to decide these things in a consistent and predictable manner. I even explained one of the resolution disputes on Polymarket to these family members, one that was ambiguous due to a blatant self-contradiction in the resolution criteria, and they said it should definitely be resolved one way, but it ended up being resolved the other way (essentially on the principle that most people wouldn't read that far into the description of the resolution criteria).

One possible solution is that you have people pay to have questions answered, and as part of that payment, they pay people to act as oracles who have good reputations. So the incentive is to decide things in a way that most closely matches what the question asker intended and also most closely matches what bettors think the question is about so that they are willing to bet on it, since this improves the market's accuracy.

One possible solution is that you have people pay to have questions answered, and as part of that payment, they pay people to act as oracles who have good reputations.

Yeah, this was part of how Augur's system worked. Reward people who end up on the 'right' side of a final resolution question consistently AND anyone who is answering the question has to stake some portion of their reputation on the outcome they're judging. Eventually 'bad actors' (who are either malicious or are too stupid to reliably interpret contracts) lose out and the correct/consistent oracles accumulate more wealth so they can have more influence over future resolutions.

It helped the system settle into an equilibrium where it was usually not worthwhile to try to exploit an apparent ambiguity, since you know that wealthier oracles will ignore said ambiguity and you'll lose money directly by trying to challenge them.
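A stripped-down sketch of that stake-and-redistribute idea follows. It is a toy model and not Augur's actual protocol, which had more machinery than this: the single voting round, the `resolve` function, and all names and numbers are assumptions for illustration.

```python
from collections import defaultdict


def resolve(reputation: dict[str, float],
            reports: dict[str, tuple[str, float]]) -> tuple[str, dict[str, float]]:
    """Resolve one question by stake-weighted vote.

    `reports` maps reporter -> (outcome, stake). Assumption: the outcome with
    the most stake wins, losers forfeit their stake, and the forfeited pool is
    split among winners in proportion to what each staked.
    """
    totals: dict[str, float] = defaultdict(float)
    for name, (outcome, stake) in reports.items():
        if stake <= 0 or stake > reputation[name]:
            raise ValueError(f"{name} cannot stake {stake}")
        totals[outcome] += stake

    winner = max(totals, key=totals.get)  # ties go to the first outcome seen
    losing_pool = sum(stake for outcome, stake in reports.values() if outcome != winner)

    updated = dict(reputation)
    for name, (outcome, stake) in reports.items():
        if outcome == winner:
            updated[name] += losing_pool * stake / totals[winner]
        else:
            updated[name] -= stake
    return winner, updated


# Hypothetical reporters: two established ones and one trying to exploit an ambiguity.
reputation = {"alice": 100.0, "bob": 100.0, "mallory": 20.0}
reports = {"alice": ("NO", 30.0), "bob": ("NO", 10.0), "mallory": ("YES", 20.0)}

winner, updated = resolve(reputation, reports)
print(winner, updated)
```

Total reputation is conserved in this sketch, so every challenge that fails moves weight toward the reporters who were already on the winning side. That is the "wealthier oracles" dynamic, and also the reason such a scheme depends on the early reputation holders being honest.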

I've been blown away by how bad otherwise intelligent people are at writing and interpreting resolution criteria.

Yep. There are plenty of bright-line rules for resolving ambiguity in legal contracts, and it can be permissible to pull in outside evidence to interpret them, but you have to think about the ENTIRE document in a systematic way. You can't just glance over it and interpret it based on vibes.

And glancing at things and going with your gut is how so, so many humans operate.

The problem is there's always a tradeoff when you try to get as precise as possible with your wording: it makes it harder for laypeople to easily understand what the terms say (and less likely that they read it all) and, paradoxically, it can open up a greater attack surface because there are more places where ambiguities can arise.

This is where I imagine LLMs would have a role. If they are given a set of 'rules' by which all contracts are to be interpreted, they can explain the contracts they read to laypeople, and everyone agrees that the AI's interpretation is final, then you at least make it more challenging to play games with the wording.

Do people trust that whatever entity is reporting the final results is doing so accurately

  1. Scoring rules exist
  2. Deceivers outcompete nondeceivers
  3. But yeah, you can't use a prediction marketplace to decide on something that's more valuable than the value of the whole prediction marketplace. That's one of the issues with Robin Hanson's futarchy.
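On the first point, one reading is that proper scoring rules reward a forecaster for reporting what they actually believe. A minimal sketch using the Brier and logarithmic scores (lower is better for both); the 70% belief and the grid of candidate reports are hypothetical values chosen for illustration.

```python
import math


def brier(p: float, outcome: int) -> float:
    """Brier score for forecast probability p of an event; outcome is 1 or 0."""
    return (p - outcome) ** 2


def log_score(p: float, outcome: int) -> float:
    """Negative log likelihood of the realised outcome under forecast p."""
    return -math.log(p if outcome == 1 else 1.0 - p)


def expected_score(score, report: float, belief: float) -> float:
    """Expected score of reporting `report` when the true chance is `belief`."""
    return belief * score(report, 1) + (1.0 - belief) * score(report, 0)


belief = 0.7  # hypothetical: what the forecaster really thinks
candidates = [i / 100 for i in range(1, 100)]  # possible reports, 0.01 to 0.99

best_brier = min(candidates, key=lambda r: expected_score(brier, r, belief))
best_log = min(candidates, key=lambda r: expected_score(log_score, r, belief))

print(best_brier, best_log)
```

Under both rules the expected score is minimised by reporting the true belief, which is what makes a scoring rule "proper". This covers the honesty of forecasters only. It does not touch the separate question of whether the entity reporting the outcome is honest.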

doubt we'll ever reach 99.9% confidence in prediction markets

I mean, in practice you don't need 99.9, you need better than alternatives in at least some cases.

in practice you don't need 99.9, you need better than alternatives in at least some cases.

Agreed. And thus I strongly support prediction markets as a concept for making personal decisions, hedging risks, and predicting important events.

Just noticing that centralized prediction markets are yet another sort of institution that can be captured and/or sabotaged if they become important to guiding/controlling society.

Would really hope we have robust competition between them to ensure no player ever becomes fully dominant in the space.

Well, suppose you walk out your door and a little box on your wrist literally tells you what you'll do next, even if you change your mind, or more importantly, what others will do. If it is always right, you'll trust it. Then that might be good enough to get people to 99.9%, yah? It is possible, yet not likely.