Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.
What is this place?
This website is a place for people who want to move past shady thinking and test their ideas in a
court of people who don't all share the same biases. Our goal is to
optimize for light, not heat; this is a group effort, and all commenters are asked to do their part.
The weekly Culture War threads host the most
controversial topics and are the most visible aspect of The Motte. However, many other topics are
appropriate here. We encourage people to post anything related to science, politics, or philosophy;
if in doubt, post!
Check out The Vault for an archive of old quality posts.
You are encouraged to crosspost these elsewhere.
Why are you called The Motte?
A motte is a stone keep on a raised earthwork common in early medieval fortifications. More pertinently,
it's an element in a rhetorical move called a "Motte-and-Bailey",
originally identified by
philosopher Nicholas Shackel. It describes the tendency in discourse for people to move from a controversial
but high value claim to a defensible but less exciting one upon any resistance to the former. He likens
this to the medieval fortification, where desirable land (the bailey) is abandoned when in danger for
the more easily defended motte. In Shackel's words, "The Motte represents the defensible but undesired
propositions to which one retreats when hard pressed."
On The Motte, always attempt to remain inside your defensible territory, even if you are not being pressed.
New post guidelines
If you're posting something that isn't related to the culture war, we encourage you to post a thread for it.
A submission statement is highly appreciated, but isn't necessary for text posts or links to largely-text posts
such as blogs or news articles; if we're unsure of the value of your post, we might remove it until you add a
submission statement. A submission statement is required for non-text sources (videos, podcasts, images).
Culture war posts go in the culture war thread; all links must either include a submission statement or
significant commentary. Bare links without those will be removed.
If in doubt, please post it!
Rules
- Courtesy
- Content
- Engagement
- When disagreeing with someone, state your objections explicitly.
- Proactively provide evidence in proportion to how partisan and inflammatory your claim might be.
- Accept temporary bans as a time-out, and don't attempt to rejoin the conversation until it's lifted.
- Don't attempt to build consensus or enforce ideological conformity.
- Write like everyone is reading and you want them to be included in the discussion.
- The Wildcard Rule
- The Metarule
As the old saying goes, "Context is that which is scarce." I know, I know, it's all the rage to try to shove as much context into the latest LLM as you can. People are even suggesting organization design based around the idea. It's really exciting to see automated theorem provers starting to become possible. The best ones still use rigorous back-end engines rather than just pure inscrutable giant matrices. They can definitely speed up some things. But the hard part is not necessarily solving problems. Don't get me wrong, it's a super useful skill; I'm over the moon that I have multiple collaborators who are genuinely better than me at solving certain types of problems. They're a form of automated theorem prover from my perspective. No, the hard part is figuring out what question to ask. It has to be a worthwhile question. It has to be possible. It has to have some hope of leading to "elegance", even if some of the intermediate questions along the way seem like they're getting more and more inelegant.
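To make the "rigorous back-end engine" point concrete, here's a toy Lean 4 sketch (the theorem and proof are just an illustrative example, not drawn from any particular system):

```lean
-- The kernel either certifies this proof or rejects it outright; there
-- is no "confident-sounding but subtly wrong" middle ground, which is
-- the whole appeal of a rigorous back end over a bare language model.
theorem add_comm_toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The inscrutable-matrix part can propose the proof term, but only the kernel's verdict counts.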
Homework questions... and even the contest questions that folks are so fond of benchmarking with... have been extremely designed. Give your cousin a couple more years of doing actual research, and he'll tell you all about how much he loves his homework problems. Not necessarily because they're "easy". They might still be very hard. But they're the kind of hard that is designed to be hard... but designed to work. Designed to be possible. Designed to have a neat and tidy answer in at most a page or two (for most classes; I have seen some assigned problems that brutally extended into 5-6 pages' worth of necessary calculation). But when you're unmoored from such design, feeling like you might just be taking shots in the dark, going down possibly completely pointless paths, I'm honestly not sure what the role of the automated theorem prover is going to be. If you haven't hit on the correct, tidy problem statement, and it just comes back with question marks, then what? If it just says, "Nope, I can't do it with the information you've given me," then what? Is it going to have the intuition to be able to add, "...but ya know, if we add this very reasonable thing, which is actually in line with the context of what you're going for and contributes rather than detracts from the elegance, then we can say..."? Or is it going to be like an extremely skilled grad student level problem solver, who you can very quickly ping to get intermediate results, counterexamples, etc. that help you along the way? Hopefully, it won't come back with a confident-sounding answer every time that you then have to spend the next few days closely examining for an error. (This will be better mitigated the more they're tied into rigorous back-ends.) I don't know; nobody really knows yet. But it's gonna be fun.
You might have already read it, but I find Terence Tao's impression of a similar model, o1, illuminating:
https://mathstodon.xyz/@tao/113132502735585408
In the context of AI capabilities, going from ~0% success to being, say, 30% correct on a problem set is difficult and hard to predict. Going from 30% to 80%, on the other hand, seems nigh inevitable.
I would absolutely expect that in a mere handful of years we're going to get self-directed Competent Mathematician levels of performance, with "intuition" and a sense of mathematical elegance. We've gone from "high schooler who's heard of advanced mathematical ideas but fumbles when asked to implement them" to "mediocre grad student" (and mediocre in the eyes of Tao!).
In this context, the existence of ATPs allows models to be rigorously evaluated against ground-truth signals through reinforcement learning. We have an objective function that unambiguously tells us whether the model has correctly solved a problem, without the now-extreme difficulty of having humans usefully grade responses. This allows synthetic data to be used with much more confidence, and a degree of automation: you can permute and modify questions to develop more difficult ones, and then, when a solution is found, use that as training data. This is suspected to be why recent thinking models have shown large improvements in maths and coding while stagnating on what you'd think are simpler tasks like writing or poetry (because at a certain point the limitation becomes the human graders, who have no ground truth to go off when asked whether one bit of prose is better than another).
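The verify-then-reward loop described above can be sketched in a few lines. This is a minimal illustration, not any lab's actual pipeline: `check_proof` here is a stand-in for a real proof checker, and all names are hypothetical.

```python
# Minimal sketch of RL-style reward from a ground-truth verifier.
# `check_proof` is a placeholder: a real system would call out to a
# rigorous kernel (e.g. a proof assistant) instead of string-matching.

def check_proof(problem: str, candidate: str) -> bool:
    """Stand-in verifier: accept the candidate iff it equals the answer
    recorded after '=' in the problem string."""
    return candidate.strip() == problem.split("=")[-1].strip()

def reward(problem: str, candidate: str) -> float:
    # Binary ground-truth signal: no human grader in the loop.
    return 1.0 if check_proof(problem, candidate) else 0.0

# Accepted solutions can be recycled as synthetic training data.
training_data = []
problem, candidate = "2 + 2 = 4", "4"
if reward(problem, candidate) == 1.0:
    training_data.append((problem, candidate))
```

The point is the shape of the signal: unambiguous accept/reject, which makes both RL and synthetic-data bootstrapping tractable in a way that subjective grading isn't.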
Totally agreed that having rigorous engines able to provide synthetic training data will massively help progress. But my sense is that the data they can generate is still of the type "This works," "This doesn't work," or "Here is a counterexample." That can still be massively useful, but may still run into the context/problem definition/"elegance" concerns. Given that the back ends are getting good enough to provide the yes/no/counterexample results, I think it's highly likely that LLMs will become solidly good at translating human problem statements into rigorous problem statements for the back end to evaluate, which will be a huge help to the usefulness of those systems... but the jury is still out in my mind as to what extent they'll be able to go further and add appropriate context. It's a lot harder to find data or synthetic data for that part.
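The "translate, then verify" division of labor above can be sketched as follows. Both functions are placeholder stubs, not real APIs; the point is only that the fuzzy translator and the rigorous checker play different roles.

```python
# Hypothetical "translate, then verify" pipeline. `ask_llm` and
# `kernel_check` are illustrative stubs, not real library calls.

def ask_llm(informal: str) -> str:
    # Stand-in for a model that autoformalizes an informal statement.
    # Here it just returns a canned formalization.
    return "theorem add_zero (n : Nat) : n + 0 = n := rfl"

def kernel_check(formal: str) -> str:
    # Stand-in for a rigorous back end. Real kernels return only
    # accepted / rejected / counterexample -- never "elegance" advice.
    return "accepted" if formal.startswith("theorem") else "rejected"

informal = "Adding zero to a natural number leaves it unchanged."
verdict = kernel_check(ask_llm(informal))
```

Whether the translator can go beyond this and supply missing context is exactly the open question.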