@DaseindustriesLtd comments on "Culture War Roundup for the week of February 17, 2025

Culture War Roundup for the week of February 17, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

Shaming.
Attempting to 'build consensus' or enforce ideological conformity.
Making sweeping generalizations to vilify a group you dislike.
Recruiting for a cause.
Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
Be as precise and charitable as you can. Don't paraphrase unflatteringly.
Don't imply that someone said something they did not say, even if you think it follows from what they said.
Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Jump in the discussion.

No email address required.

DaseindustriesLtd late version of a small language model 4mo ago

Regardless of whether transformers are a dead-end or not, the current approach isn't doing new science or algo design. Its throwing more and more compute at the problem

Fetishizing algorithmic design is, I think, a sign of mediocre understanding of ML, being enthralled by cleverness. Data engineering carves more interesting structure into weighs.

Context

YoungAchamian DaseindustriesLtd 4mo ago

Senpai comes down from his castle in the clouds to debate me... What a time to be alive.

This isn't data engineering either. Its low-level Cuda compiler writing and server orchestration. Goodfellow, Hinton, Schimdhuber, LeCun, et al. are definitely designing architectures, they aren't doing any more data engineering than normal MLEs, like me do... They clearly enjoy designing "algos", and the world clearly respects them greatly for that expertise. Also calling it "algo" design is incredibly reductive. Afterall everyone knows LLMs were discovered when Hinton invented the Boltzmann machine decades ago. This Transformer is just a paltry, fetish, "algo". The Data Engineering in Boltzmann machines was just too primitive!!!

But obviously you must have a far deeper understanding of ML/AI, why don't you quite your day job and start an AGI company? Put your clever prose and subtle insults to work on something more real. Maybe you can compete with ScaleAI, they do data engineering. Definitely the top AI research company.

DaseindustriesLtd late version of a small language model YoungAchamian 4mo ago · Edited 4mo ago

I see you took this pretty personally.

All I have to say is that top AI research companies (not ScaleAI) are already doing data engineering (expansively understood to include training signal source) and this is the most well-guarded part of the stack, everything else they share more willingly. Data curation, curricula, and yes, human annotation are a giant chunk of what they do. I've seen Anthropic RLHF data, it's very labor intensive and it instantly becomes clear why Sonnet is so much better than its competitors.

They clearly enjoy designing "algos", and the world clearly respects them greatly for that expertise.

Really glad for them and the world.

Past glory is no evidence of current correctness, however. LeCun with his «AR-LLMs suck» has made himself a lolcow, so has Schimidhuber. Hochreiter has spent the last few years trying to one-up the Transformer and fell to the usual «untuned baseline» issue, miserably. Meta keeps churning out papers on architectures; they got spooked by DeepSeek V3 which architecture section opens with «The basic architecture of DeepSeek-V3 is still within the Transformer (Vaswani et al., 2017) framework» and decided to rework the whole Llama 4 stack. Tri Dao did incredibly hard work with Mamba 1/2 and where is Mamba? In models that fall apart on any long context eval more rigorous than NIAH. Google published Griffin/Hawk because it's not valuable enough to hide. What has Hinton done recently, Forward-Forward? Friston tried his hand at this with EBMs and seems to have degraded into pure grift. Shazeer's last works are just «transformers but less attention» and it works fine. What's Goodfellow up to? More fundamental architecture search is becoming the domain of mentally ill 17yo twitter anons.

The most significant real advances in it are driven by what you also condescendingly dismiss – «low-level Cuda compiler writing and server orchestration», or rather hardware-aware Transformer redesigns for greater scalability and unit economics, see DeepSeek's NSA paper.

This Transformer is just a paltry, fetish, "algo".

Transformer training is easy to parallelize and it's expressive enough. Incentives to find anything substantially better increase by OOM year on year, so does the compute and labor spent on it, to no discernible result. I think it's time to let go of faulty analogies and accept the most likely reality.

That tends to happen when you insult people out of the blue Dase. This:

a sign of mediocre understanding of ML

Is called being an asshole. I do ML for a living, insinuating my competence is mediocre because we disagree intellectually is poor taste. There are ways to have this discussion intellectually without resorting to being a douche. The last AI thread you commented on you were a prick to everyone who disagreed with you, up and down the thread. I have no desire to put up with your shit. Call it taking it personally or giving what was given. It's up to you if you want to be an adult and have conversation or be a bratty child.

not ScaleAI

This was heavy sarcasm on my part. ScaleAI did OpenAIs data engineering but I don't think that makes them a top AI company. data engineering is needed and important! But it's not revolutionary. Data engineering, is the same as its always been.

«low-level Cuda compiler writing and server orchestration»

This is why arguing with laymen is annoying. Low level is not "condensation" it is the technical term for "low on the compute stack" or "closer to the compiler". It's very important, the theories I have heard is that it's one of Deepseek's great winning points for why they were able to train their LLM much cheaper than everyone else. They were willing to go even more low level than the Cuda and write their own firmware-level orchestration code.

see DeepSeek's NSA paper.

We agree.

lolcow, so has Schimidhuber.

Schmidhuber has always been a lolcow, he's an inside joke. Any other ML person knows his schtick and finds it funny and doesn't take him seriously. I included it as a humorous inside joke.

Past glory is no evidence of current correctness, however. LeCun with his «AR-LLMs suck» has made himself a lolcow,

At the same time someone who has actually contributed to the field, who is in the arena, infinitely outweighs a nobody posting hot takes on an obscure forum. Regardless of my humor on Schmidhuber, even he, far out weights me as a titan in the ML field because he has contributed groundbreaking research. Where does that leave you? Jeering in the audience like its a sporting match?

What has Hinton done recently

Won a nobel prize.

is still within the Transformer (Vaswani et al., 2017) framework

Don't quote the old magic to me I was there when it was written. You seem to be labouring on the delusion that LLMs == ML/AI. LLMs are a subset of ML/AI. The current hottest topic definitely! The this "Algo Fetish" you say goes far beyond just LLMs. The research on model architectures has lead us to the encoder/decoder, then self attention, then Transformers. It's not a fetish and its not mediocre because it's not going to stop at transformers. Maybe you've forgotten a fundamental tenet of the Scientific Theory, but experiments fail. It sounds like you've just listed a bunch of experiments that failed. Should we give up and go on praising on the altar of bronze because no one has figured out how to forge iron? Seems like you are asking us to praise ignorance over discovery?

The problem is transformer's don't work on everything, and the whole field isn't just LLMs. That's the reality.

I'm not sure I believe AGI will come from transformers. If you want to have this as a separate discussion. You can let me know, nicely and we can talk about it.

Likewise my entire point, before you jumped into insult me, is that the Big Names in ML/AI are "fetishy algo freaks" They shockingly don't want to do non "mediocre algo butt sniffing" work. And Data Engineering isn't new, it isn't revolutionary, it's great, it works well, but it doesn't require some 1% ML researcher to pull it off. It requires a solid engineering team, some technical know-how, and a willingness to get your hands dirty. But no one is going to get famous doing it. It's an engineering task not a research task. And since research tasks are what people pay the ludicrously big bucks for at tech companies the engineers at xAI aren't being paid some massive king-sized salary...

As an exercise, can you tell me THE engineer at Deepseek who proposed or wrote their Parallel Thread Execution(PTX) code with a citation?

DaseindustriesLtd late version of a small language model YoungAchamian 4mo ago

Once again I notice that I am usually right in being rude to people, as their responses demonstrate precisely the character flaws inferred. This is a low-content post in defense of wounded ego, with snappy Marvel-esque one-liners («Won a Nobel prize») and dunks optimized for the audience but next to no interest in engagement on the object level. Yes, ML != LLMs, so what? Are you not talking about Altman and Elon who both clearly put their chips on LLMs? «That was a joke», yeah I get your jokes. Here's one you missed:

data engineering is needed and important! But it's not revolutionary. Data engineering, is the same as its always been.

It's not the same, though, that's the thing. Returning back to my point that has upset you –

– I meant concretely that this is why leading companies now prioritize creation of training signal sources, that is: datasets themselves (filtered web corpora, enriched and paraphrased data, purely synthetic data, even entirely non-lingual data with properties that induce interesting behaviors), curricula of datasets, model merging and distillation methods, training environments and reward shaping – over basic architecture research, in terms of non-compute spend and researcher hours; under the (rational, I believe) assumption that this has higher ROI for the ultimate goal of reaching "AGI", and that its fruit will be readily applicable to whatever future algorithmic progress may yield. This goes far beyond ScaleAI's efforts in harnessing Pinoy Intelligence to annotate samples for RLHF and you have not even bothered to address any of this. If you think names of Old Titans are a valid argument, I present Hutter as someone who Gets It, gets that what you have a sufficiently general architecture to approximate is at our stage more interesting in terms of eventual structure than how you achieve this potential generality.

This older paper is a neat illustration too. Sohl-Dickstein and Metz have done a little bit of work in non-LLM algo design if you recall, maybe you'll recognize at least them as half-decent scientists.

Now, as regards poor taste in intellectual disagreements, let's revisit this:

Regardless of whether transformers are a dead-end or not, the current approach isn't doing new science or algo design. Its throwing more and more compute at the problem and then doing the Deepseek approach of finetuning the assembly level gpu instructions to exploit the compute even better so you can throw more compute at it. I doubt, Hinton, Goodfellow, LeCunn, Schimdhubber et al. have any desire to do that. Maybe if xAI did something revolutionary like leave the LLM space or introduce a non-MoE-Transformer model for AGI, then talent of that caliber might want to work there. Currently they exist so Elon can piss all over Altman.

My rudeness was not unprovoked; it was informed by the bolded parts. I saw it as a hubristic, elitist, oblivious, tone-deaf insult towards people – scientists – actually moving the field forward today, rather than 8 or 28 years ago, and I do not care that it's slightly obfuscated or that you lack self-awareness to recognize the douche in the mirror but are eager to chimp out at it as you currently do.

yes thanks for clarification, that's exactly as I understood you.

I claim that to the extent that «talent of that caliber» shares your conceit that design of clever new algorithmic primitives for ANNs is «exciting new science» whereas data work remains and will remain ScaleAI tier «mere data engineering, same as always», this talent is behind the times, too set in their ways, and is resting on its laurels; indeed this is the same high-level philosophical error or prizing manual structure design over simplicity, generality and scalability that keeps repeating on every revolution in AI, and that Sutton has famously exposed. They are free to work on whatever excites them, publish cute papers for fellow affocionados where they beat untuned mainstream baselines, or just leave the frontlines altogether, and even loudly assert that they have superior taste if they so choose, which in my view is just irrational fetishism plus inflamed ego; I think taste is to be calibrated to actual promise of directions. But indeed, what do I know. You are free to share their presumptions. New scientific talent will figure it out.

Seems like you are asking us to praise ignorance over discovery?

To me it seems like the opposite, we just disagree on what qualifies as discovery or science at all, due to differences in taste.

Egoists gonna be egoists.

Zhean Xu probably. But I think everyone on (Chenggang Zhao and Shangyan Zhou and Liyue Zhang and Chengqi Deng and Zhean Xu and Yuxuan Liu and Kuai Yu and Jiashi Li and Liang Zhao) list could ask for a megabuck total comp in a frontier lab now, and expect affirmative response.

-1

YoungAchamian DaseindustriesLtd 4mo ago · Edited 4mo ago

(Edit) After some thought, I decided to tone done my dismissive vitriol and maybe offer a more constructive response.

Despite what you might think I don't have unlimited free time/brain power to engage in high-effort debate with random people online, I'm a shape-rotator, not a word-cell. Particularly since debating people online rarely leads to any information exchange or substantive opinion change. As such I apply a heuristic when having a discussion online on whether my interlocutor is worth it. Needless antagonism, unfounded arrogance, pithy insults and pettiness are the typical markers that its not. People who don't engage charitably and treat discussion as some sort of mal-social debate team competition, where anything goes, doubly so.

Dase you tripped up all of the above. To my chagrin, I snapped back which was unbefitting of my expectations for myself. If you want people to engage with you substantively, with high information density conversation, you have to give them a reason to put the effort in. If you write only for extreme heat with unproportionate amounts of light then no one reasonable is going to engage with you. Maybe that is to your taste, who am I to judge pigs that want to roll in the mud. Regardless I have better uses of my time than getting into the stie with you.

Food for thought: ML != LLMs, if your comment here:

was changed to this:

Fetishizing algorithmic design is, I think, a sign of mediocre understanding of LLMs, being enthralled by cleverness. Data engineering carves more interesting structure into weighs.

Then it is a far more applicable to the evidence you have provided and honestly I think the topic you actually care about. I might even agree, however the original doesn't align with the reality of ML as a field across ALL domains. But who knows, maybe my attempt at being charitable here will go nowhere, you'll double down on being an ass, and I'll update my weights with finality on the pointlessness of engaging with you in the future.

Have a good one.

That's fine, I don't feel entitled to your time at all. I also can't predict what might trigger you, just like you cannot predict what would trigger me, nor does it seems like you would care.

The discussion was originally about labs overwhelmingly focused on LLMs and competing for top talent in all of ML industry so partially that was just me speaking loosely.

I do in fact agree with heads of those labs and most star researchers they've got that LLMs strikingly similar to what was found in 201 7 will suffice for the shortest, even if not the globally optimal route to “AGI” (it's an economic concept now anyway, apparently). But it is fair that in terms of basic research there are bigger, greener pastures of intellectual inquiry, and who knows - maybe we will even find something more general and scalable than a well-designed Transformer there. Then again, my view is that taste is to be calibrated to the best current estimate of the promise of available directions, and in conjunction with the above this leads me to a strong opinion on people who dismiss work around Transformers, chiefly work on training signal sources that I've covered above, as “not new science”. Fuck it, it is science, even if a bit of a different discipline. You don't own the concept, what is this infuriatingly infantile dick-measuring?

It's not so much that I hold non-LLM, non-Transformer-centric algo design work in contempt as I am irritated by their own smug, egocentric condescension towards what I see as the royal road. Contrarianism, especially defensive contrarianism, is often obnoxious.

What is this place?

Why are you called The Motte?

New post guidelines

Rules

Recommended Posts And Communities

Recommended Realtime Chats