This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
If it's not a surprise, why didn't anyone else do it? Meta has had a giant cluster of H100s for a long time, but none of their models reached R1's level. Same for Mistral. I don't think following a GPT-from-scratch lecture is going to get you there. More likely there is a lot of data cleaning and operational work needed to even get close, and DeepSeek seems to be no slouch on the ML side either.
I'm not convinced that they have any left to make. OpenAI's last big "wow" moment was the release of GPT-4. While they've made incremental improvements since, we haven't seen anything like the release of R1, where people got excited enough to share model output and gossip about how it could be done. OpenAI's improvements show up mainly in benchmark results, and in some cases on benchmarks they funded and have special access to.
It must be frustrating to work at OpenAI. It's possible that o1's reasoning methods are much more advanced than R1's, but who can tell? In the end, those who publish and release results will get the credit.
Please forgive my uninformed speculation, but is it possible that DeepSeek leveraged existing AIs to generate synthetic training data on the cheap?
Gathering training data must be incredibly expensive to do from scratch.
If DeepSeek used synthetic data, then it would seem to put a ceiling on their ability, but they might be able to easily catch up to existing models for less money. Edit: I've learned more about this and I think this is not true, at least for reasoning tasks.
Why? It depends on how you generate the synthetic data. For chess and Go, none of the prior data was relevant at all.
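To make that concrete: in the chess/Go setting, the entire training set can be generated by the program playing itself. Here's a toy sketch (tic-tac-toe with random play, nothing like AlphaZero's real pipeline, just to show that zero human data is involved):

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def self_play_game():
    # Play one game with both sides choosing random legal moves.
    board, player, history = [" "] * 9, "X", []
    while winner(board) is None and " " in board:
        move = random.choice([i for i, s in enumerate(board) if s == " "])
        history.append((tuple(board), move))
        board[move] = player
        player = "O" if player == "X" else "X"
    return history, winner(board)

# Every (state, move, outcome) example below was produced by self-play;
# no human games were needed anywhere in the pipeline.
dataset = []
for _ in range(1000):
    history, result = self_play_game()
    dataset.extend((state, move, result) for state, move in history)
print(f"{len(dataset)} synthetic training examples from pure self-play")
```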
https://x.com/ptrschmdtnlsn/status/1882480473332736418
According to this guy, they're doing reinforcement learning on self-play.
You get a base model, do chain-of-thought prompting to make it smarter, then distill that into a slightly better base model which produces slightly better results with chain of thought... And away we go!
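A toy sketch of that loop, with stand-in functions for the sampling, verification, and fine-tuning steps (this is just the shape of the idea, not DeepSeek's actual recipe):

```python
import random

def sample_cot(skill, problem):
    """Stand-in for sampling a chain-of-thought trace; correct with prob `skill`."""
    return problem, random.random() < skill

def verify(trace):
    """Stand-in for an automatic checker (e.g. a math-answer verifier)."""
    return trace[1]

def finetune(skill, n_good, n_total):
    """Stand-in for distilling the verified traces back into the base model."""
    return min(1.0, skill + 0.2 * n_good / n_total)

skill = 0.2
for round_ in range(6):
    traces = [sample_cot(skill, p) for p in range(500)]
    good = [t for t in traces if verify(t)]
    skill = finetune(skill, len(good), len(traces))
    print(f"round {round_}: kept {len(good)}/500 traces, skill -> {skill:.2f}")
```

Each round samples from a slightly better model, so the fraction of usable traces grows round over round; that's the "away we go" part.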
Well that was terrifying.
Reading the Twitter thread though, it seems that this might not actually be what's happening here.
Thanks for reading more of the thread, I didn't see that part!
This is very common. For a long time, practically every open model was a distilled knockoff trained on synthetic data, mostly from OpenAI. It's been so common that people are familiar with the marks it leaves on a model. Such models are worse than the model they're distilled from, typically less flexible out of distribution (e.g. at obeying unusual system prompts), and have an even more intense "sloppy" vibe to them. People have long since gotten bored with these knockoff models. Before DeepSeek, I'd even say that's all anyone expected from Chinese models.
It also doesn't match what we're seeing from R1 at all, though. One of the reasons R1 is so impressive is that its slop level is much lower, its creativity is way higher, and it doesn't sound like any of the existing AI models. Even Claude feels straitjacketed in comparison, let alone OpenAI's models.
I wouldn't be surprised if they did use synthetic data, but whatever training method they're using seems to do a great job of hiding it. Which is amazing in itself. It could have something to do with the reinforcement learning phase that they do. But regardless, it's definitely not as simple as training on data from OpenAI, because people have been doing that forever.
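For reference, the plain knockoff recipe everyone has been running looks roughly like this; every name below is a placeholder stub, not any vendor's real API:

```python
def supervised_finetune(student, pairs):
    """Stub trainer: a real one would run SFT on the (prompt, completion) pairs."""
    return student

def distill(student, teacher, prompts):
    # The student imitates whatever the teacher says, stylistic tics included,
    # which is exactly the "marks" such training leaves on a model.
    pairs = [(p, teacher(p)) for p in prompts]
    return supervised_finetune(student, pairs)

# Usage sketch: in practice `teacher` would be calls to a stronger model's API.
knockoff = distill(student="base-model",
                   teacher=lambda p: f"answer to {p}",
                   prompts=["q1", "q2", "q3"])
```

Whatever R1's pipeline adds on top of this, its output doesn't carry the usual fingerprints, which is the surprising part.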
This is probably a taste of the recursive self-improvement we've been promised by foomers. It's now known that one of the reasons Anthropic held back on releasing Opus is that they were using it themselves to train Sonnet 3.5 New.
Everyone's gotta be doing it.
It's not recursive; it just gets a smaller model closer in performance to a bigger model. You still need the bigger model to push the frontier out.
There is the potential for a kind of recursive growth once you have access to some kind of external verifier. A model of a certain size performs a search; the external verifier gives it back a reward signal for good searches; and the model learns and gets better at the search, allowing the process to begin anew. E.g. AlphaZero.
Where it gets murkier in my head is whether LLMs can act as their own verifiers, even with arbitrary compute. As a proof of concept, humans can think a long time to come up with a novel insight and learn it, but it still seems we learn best when there is some kind of objective/external feedback signal.
Learn best, certainly, but when it comes to scaling compute, all it needs is to be able to learn by itself at all. I'm sure an AI intelligence improvement cycle would go even faster if it had an even smarter AI to give feedback, but for recursive improvement all that is necessary is even a small increase, compounded over and over and over again.
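The arithmetic behind that point, with a purely made-up 2% gain per cycle:

```python
# Purely illustrative numbers: even a small per-cycle gain compounds fast.
gain_per_cycle = 1.02   # assumed 2% capability gain per self-training cycle
cycles = 100
print(f"after {cycles} cycles: {gain_per_cycle ** cycles:.2f}x")  # ~7.24x
```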
Sure, but presumably it cuts other ways too. Do we think current models can be used to train next-generation models?
I don't see how. It doesn't seem likely to me that the student can surpass the master in this way. You could imagine doing RL if you had a model that was good at rating text output (like what was done with chess), but I don't know how feasible that is.
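The RL-with-a-rater idea would look something like this in outline (stand-in functions again; whether a text rater good enough to push past the teacher exists is exactly the open question):

```python
import random

def sample_outputs(prompt, n=8):
    """Stand-in: draw n candidate outputs from the current policy."""
    return [f"{prompt} -> candidate {i}" for i in range(n)]

def rate(text):
    """Stand-in for the rating model; for chess this was the game result."""
    return random.random()

def rl_iteration(prompts):
    # Best-of-n / rejection-sampling flavor of RL: keep the highest-rated
    # sample per prompt, then fine-tune the policy toward those winners.
    winners = [max(sample_outputs(p), key=rate) for p in prompts]
    return winners  # stand-in for the fine-tuning step

print(rl_iteration(["prompt A", "prompt B"]))
```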
I wasn't claiming that. Just trying to support the claim that they were more open in the past. I doubt any novel AI technique discovered in the future will even have that.
Counting out the most absurdly well-resourced AI lab with a history of breakthrough success seems fairly bold.