
Friday Fun Thread for December 27, 2024

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that); this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.


I just want to add a little bit from Zvi's latest:

Process for a Tier 4 problem:

  1. 1 week crafting a robust problem concept, which “converts” research insights into a closed-answer problem.
  2. 3 weeks of collaborative research. Presentations among related teams for feedback.
  3. 2 weeks for the final submission.

We’re seeking mathematicians who can craft these next-level challenges. If you have research-grade ideas that transcend T3 difficulty, please email elliot@epoch.ai with your CV and a brief note on your interests.

We’ll also hire some red-teamers, tasked with finding clever ways a model can circumvent a problem’s intended difficulty, and some reviewers to check for mathematical correctness of final submissions. Contact me if you think you’re suitable for either such role.

As AI keeps improving, we need benchmarks that reflect genuine mathematical depth. Tier 4 is our next (and possibly final) step in that direction.

Tier 5 could presumably be ‘ask a bunch of problems we actually have no idea how to solve, and that might not have solutions, but that would be super cool’, since anything on a benchmark inevitably gets solved.

The abilities are impressive, and I actually wouldn't be surprised if it's able to perform admirably on Tier 4 "closed-answer" problems, especially as these models get better and better at using rigorous back-end engines. But notice what they're expecting: teams of top-tier mathematicians spending significant time crafting "closed-answer" problems. That is probably where the bottleneck is, and Zvi's offhand comment is in the same vein.

One possible end state is that these algorithms become an extremely useful 'calculator-on-steroids' that, like calculators, programming languages, and other automated theorem proving tools before them, supercharges mathematical productivity under the guidance and direction of intuitive humans trying to push forward human understanding of human-relevant, human-interesting subject domains.

Another possible end state is that the algorithms get 'smart' enough to acquire all that human context, human intuition, and sense of human-relevance and human-interestingness, and can actually drop in and replace human mathematicians.

A third possible end state is that a society of super-advanced AIs goes off and creates its own math, which humans can somehow tell is objectively good, but which they have to work and struggle to understand bits and pieces of (see also the computer chess championship).

I really don't have any first principles to tell me which of these end states we'll land in. It feels like a 'wait, watch, and see' situation.
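To give a toy flavor of that first, 'calculator-on-steroids' end state: in a proof assistant, the human supplies the statement and the key idea, and tactic automation grinds out the verification. A minimal Lean 4 sketch, assuming Mathlib is available (these are standard tactics, not anything Epoch-specific):

```lean
import Mathlib.Tactic

-- The human states the inequality and hints at the key fact (0 ≤ (a - b)^2);
-- the nlinarith tactic does the algebraic grinding.
example (a b : ℝ) : a ^ 2 + b ^ 2 ≥ 2 * a * b := by
  nlinarith [sq_nonneg (a - b)]

-- A pure "calculator" step: norm_num verifies the arithmetic outright.
example : (2 : ℕ) ^ 10 = 1024 := by norm_num
```

The division of labor is the point: the interesting part (which statement to prove, and why it matters) stays human, while the drudgery is automated.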

I would put the last option as the most likely over a time frame greater than a decade or two, but the first two options can serve as intermediate stages, though I don't expect either to last more than a few years. My reasoning is largely that, much like chess, when the reward signal is highly legible it becomes far easier to optimize for, and diminishing returns != nil returns, and probably still positive-expected-value (PEV) returns.
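To make the legibility point concrete: a closed-answer benchmark reduces grading to an exact-match check, which is exactly the kind of clean optimization target chess engines had. A minimal Python sketch of such a grading loop (the problems and the model callable are hypothetical stand-ins, not Epoch's actual harness):

```python
# Hypothetical closed-answer grading loop; not Epoch's actual harness.
# The reward signal is maximally legible: each attempt is exactly right or wrong.

problems = [
    # (prompt, expected closed-form answer)
    ("Compute the number of primes below 100.", 25),
    ("What is 2^10?", 1024),
]

def grade(model_answer: str, expected: int) -> bool:
    """Exact-match check: no partial credit, no human judgment."""
    try:
        return int(model_answer.strip()) == expected
    except ValueError:
        return False

def score(model) -> float:
    """Fraction of problems solved; 'model' is any callable prompt -> str."""
    correct = sum(grade(model(prompt), answer) for prompt, answer in problems)
    return correct / len(problems)
```

Contrast that with Tier 5-style open problems: there is no `expected` value to compare against, so the signal stops being legible and the optimization story gets much murkier.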

But you're right, the only way to find out is to strap in for the ride. We live in interesting times.