
Friday Fun Thread for December 27, 2024

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), nor is it for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.

You might have already read it, but I find Terence Tao's impression of a similar model, o1, illuminating:

https://mathstodon.xyz/@tao/113132502735585408

The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, (static simulation of a) graduate student. However, this was an improvement over previous models, whose capability was closer to an actually incompetent (static simulation of a) graduate student. It may only take one or two further iterations of improved capability (and integration with other tools, such as computer algebra packages and proof assistants) until the level of "(static simulation of a) competent graduate student" is reached, at which point I could see this tool being of significant use in research level tasks.

In the context of AI capabilities, going from ~0% success on a problem set to, say, 30% correct is the difficult, hard-to-predict step. Going from 30% to 80%, on the other hand, seems nigh inevitable.

I would absolutely expect that in a mere handful of years we're going to get self-directed Competent Mathematician levels of performance, with "intuition" and a sense of mathematical elegance. We've gone from "high schooler who's heard of advanced mathematical ideas but fumbles when asked to implement them" to "mediocre grad student" (and mediocre in the eyes of Tao!).

But when you're unmoored from such design, when it feels like you're just taking shots in the dark down possibly pointless paths, I'm honestly not sure what the role of the automated theorem prover is going to be. If you haven't hit on the correct, tidy problem statement, and it just comes back with question marks, then what? If it just says, "Nope, I can't do it with the information you've given me," then what? Is it going to have the intuition to be able to add, "...but ya know, if we add this very reasonable thing, which is actually in line with the context of what you're going for and contributes rather than detracts from the elegance, then we can say..."?

In this context, the existence of automated theorem provers (ATPs) allows models to be rigorously evaluated against ground-truth signals through reinforcement learning. We get an objective function that unambiguously tells us whether the model has correctly solved a problem, without the now-extreme difficulty of having humans usefully grade responses. This allows synthetic data to be used with much more confidence, and it enables a degree of automation: you can permute and modify questions to generate harder ones, and when a solution is found, feed it back in as training data. This is suspected to be why recent thinking models have shown large improvements in maths and coding while stagnating on what you'd think are simpler tasks like writing or poetry (because at a certain point the limitation becomes the human graders, who have no ground truth to go off when asked whether one bit of prose is better than another).
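To make that loop concrete, here's a minimal sketch of verifier-graded data generation. Everything here is illustrative: `model.propose_proof` is a hypothetical API, and the verifier simply shells out to the Lean proof checker, whose exit code serves as the unambiguous reward signal.

```python
import subprocess
import tempfile
from pathlib import Path

def verify_with_lean(proof_source: str) -> bool:
    """Ground-truth grading: the proof checker either accepts or rejects.
    No human grader is involved, so the signal doesn't drift or saturate."""
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(proof_source)
        path = f.name
    # Exit code 0 means Lean elaborated the file without errors.
    result = subprocess.run(["lean", path], capture_output=True)
    Path(path).unlink()
    return result.returncode == 0

def collect_verified_solutions(model, problems, attempts_per_problem=8):
    """Sample candidate proofs and keep only the verified ones as training data."""
    dataset = []
    for problem in problems:
        for _ in range(attempts_per_problem):
            candidate = model.propose_proof(problem)  # hypothetical model API
            if verify_with_lean(candidate):
                dataset.append((problem, candidate))
                break  # one verified solution per problem suffices here
    return dataset
```

The verified pairs can then go back into fine-tuning, and the permute-and-modify step is exactly where the automation pays off: harder variants that get solved become new training data for free.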

Totally agreed that having rigorous engines able to provide synthetic training data will massively help progress. But my sense is that the data they can generate is still of the type "this works," "this doesn't work," or "here is a counterexample." That can still be massively useful, but it may still run into the context/problem-definition/"elegance" concerns. Given that the back ends are getting good enough to provide the yes/no/counterexample results, I think it's highly likely that LLMs will become solidly good at translating human problem statements into rigorous problem statements for the back end to evaluate, which will be a huge help to the usefulness of those systems... but the jury is still out in my mind as to what extent they'll be able to go further and add appropriate context. It's a lot harder to find data, or synthetic data, for that part.
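As a toy illustration of what "translating a human problem statement into a rigorous one" might look like, the informal claim "the sum of two even numbers is even" could be autoformalized into a Lean statement like the one below (my own example, not the output of any actual system), which the back end can then attempt to prove or reject:

```lean
import Mathlib

-- Informal claim: "the sum of two even numbers is even."
-- One plausible formalization for the back end to evaluate:
theorem even_add_even (a b : Nat) (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  obtain ⟨x, hx⟩ := ha
  obtain ⟨y, hy⟩ := hb
  exact ⟨x + y, by omega⟩
```

The part the jury is still out on is everything above the proof itself: picking the right formal rendering when the informal statement is ambiguous, and knowing which extra hypotheses are "reasonable" to add when the literal translation turns out to be unprovable.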