This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
You're probably right. The technique of using the model's own outputs, and its assessment of those outputs, to further tune itself probably caps out at a fairly small amount of gain, if you don't ground the model in further interactions with the physical world.
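For what it's worth, the shape of that loop is easy to caricature in a few lines. This is a toy with made-up stand-ins (a one-number "policy", a noisy `self_score` judge), not any real training setup, but it shows the cap being described: improvement stalls once the model's own noisy judgment can no longer tell better outputs from worse ones.

```python
import random

TARGET = 0.0  # stands in for what grounding in the world would actually reward

def generate(center, spread, n=16):
    # Sample n candidate "responses" around the current policy.
    return [random.gauss(center, spread) for _ in range(n)]

def self_score(x, judge_noise=0.3):
    # The model grading its own output: a noisy view of true quality.
    return -abs(x - TARGET) + random.gauss(0, judge_noise)

def self_tune(center=5.0, spread=2.0, rounds=20):
    for _ in range(rounds):
        best = max(generate(center, spread), key=self_score)
        center = 0.5 * center + 0.5 * best  # "fine-tune" toward the top-rated sample
        spread = max(0.3, spread * 0.9)     # sharpen the sampling distribution
    return center

print(self_tune())  # approaches TARGET, but only down to the judge's noise floor
```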
Oh, yeah; I'd expect that sort of "self-play" to lift peak model performance from "average human responses" to "the best human responses recognizable by an average human". And the economic effects of getting there, even if it caps out there, might be astounding. But frankly, if we just end up having to retool 90% of the economy, I'd be glad to see it when the alternatives include extinction-level scenarios.
I think the most "real" thing I can imagine high-speed self-play for is math (theorem proving). Adding an inference step to a computer-verifiable proof seems like as basic an output as choosing a move in a board game, and coming up with "interesting theorems" seems like it could be done automatically (some function combining "how short is it to state" with "how long is the shortest proof"?). An AI training to play that "game" might eventually come up with something more useful than a high Elo rating.
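To make that scoring idea concrete, here is one guess at the "some function": the `interestingness()` ratio and the hand-filled proof lengths below are purely illustrative, where a real system would pull proof lengths from a verifier kernel (Lean, Metamath, etc.) rather than a dict.

```python
def interestingness(statement: str, shortest_proof_len: int) -> float:
    # Short to state but long to prove = interesting. The exact
    # functional form (a plain ratio here) is a guess.
    return shortest_proof_len / max(1, len(statement))

# Toy stand-ins for (conjecture, length of shortest known proof):
candidates = {
    "a+b = b+a": 4,
    "sum_{k<=n} k = n(n+1)/2": 12,
    "x^n + y^n = z^n has no solution in positive integers for n>2": 4000,
}

for stmt, proof_len in sorted(candidates.items(),
                              key=lambda kv: -interestingness(kv[0], kv[1])):
    print(f"{interestingness(stmt, proof_len):8.1f}  {stmt}")
```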
Pretty sure self-play would be viable for leetcode-style or Project Euler-style programming problems too, if you give it access to an interpreter. Or, more generally, any task where recognizing a good output can be done by a less capable language model than it takes to generate an output that good.
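The verifier half of that is straightforward to sketch. Everything here is illustrative (the `passes()` helper and the `solve` convention are made up), and `exec()` on untrusted model output would need real sandboxing in practice, but this is the shape of "recognizing a good output" once you have an interpreter:

```python
def passes(candidate_src: str, tests: list[tuple[tuple, object]],
           fn_name: str = "solve") -> bool:
    # Run a candidate solution against test cases; keep it only if all pass.
    ns: dict = {}
    try:
        exec(candidate_src, ns)  # the "access to an interpreter" step
        return all(ns[fn_name](*args) == want for args, want in tests)
    except Exception:
        return False

# Project Euler #1 style: sum of multiples of 3 or 5 below n.
tests = [((10,), 23), ((100,), 2318)]
good = "def solve(n): return sum(k for k in range(n) if k % 3 == 0 or k % 5 == 0)"
bad = "def solve(n): return sum(range(n))"
print(passes(good, tests), passes(bad, tests))  # True False
```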
Recognizing good output is half the problem; generating an enormous array of problems is important too. With complex board games every non-deterministic opponent (including every iteration of an AI during training) is a fount of problems; with math the new problems practically generate themselves as conjectures based on previous problems' intermediate definitions. I don't see how to come up with millions of independent programming problems automatically. That might just be a failure of my imagination (or more charitably, I just haven't thought about the idea for long enough), though.