Contact Us
Sign In
Sign Up
Rules Admins Moderation Log Random Post Random User
What is this place?

This website is a place for people who want to move past shady thinking and test their ideas in a court of people who don't all share the same biases. Our goal is to optimize for light, not heat; this is a group effort, and all commentators are asked to do their part.

The weekly Culture War threads host the most controversial topics and are the most visible aspect of The Motte. However, many other topics are appropriate here. We encourage people to post anything related to science, politics, or philosophy; if in doubt, post!

Check out The Vault for an archive of old quality posts. You are encouraged to crosspost these elsewhere.

Why are you called The Motte?

A motte is a stone keep on a raised earthwork common in early medieval fortifications. More pertinently, it's an element in a rhetorical move called a "Motte-and-Bailey", originally identified by philosopher Nicholas Shackel. It describes the tendency in discourse for people to move from a controversial but high value claim to a defensible but less exciting one upon any resistance to the former. He likens this to the medieval fortification, where a desirable land (the bailey) is abandoned when in danger for the more easily defended motte. In Shackel's words, "The Motte represents the defensible but undesired propositions to which one retreats when hard pressed."

On The Motte, always attempt to remain inside your defensible territory, even if you are not being pressed.

New post guidelines

If you're posting something that isn't related to the culture war, we encourage you to post a thread for it. A submission statement is highly appreciated, but isn't necessary for text posts or links to largely-text posts such as blogs or news articles; if we're unsure of the value of your post, we might remove it until you add a submission statement. A submission statement is required for non-text sources (videos, podcasts, images).

Culture war posts go in the culture war thread; all links must either include a submission statement or significant commentary. Bare links without those will be removed.

If in doubt, please post it!

Rules
Recommended Posts And Communities
Recommended Realtime Chats
- Quokka's Den Telegram
- Astral Codex Ten Discord

PaperclipPerfector 3mo ago (text post) 36696 thread views

Culture War Roundup for the week of January 20, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

Shaming.
Attempting to 'build consensus' or enforce ideological conformity.
Making sweeping generalizations to vilify a group you dislike.
Recruiting for a cause.
Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
Be as precise and charitable as you can. Don't paraphrase unflatteringly.
Don't imply that someone said something they did not say, even if you think it follows from what they said.
Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

2237

2237
5

Jump in the discussion.

No email address required.

DaseindustriesLtd late version of a small language model 3mo ago · Edited 3mo ago

Alex Wang is an opportunistic psychopath who's afraid of his whole Pinoy-based data generation business model going bust in the era of synthetic chains of thought. Therefore he's dishonestly paraphrasing Dylan Patel (himself a China hawk peddling rationales for more export controls) who had said “they have 50000 Hoppers” once, without evidence. But the most likely Hopper model they have is H20, an effectively inference-only chip, that has negligible effect on pretraining costs and scale for V3 and R1.

Yes I do believe DeepSeek. This is not really a political issue but a purely technical. Unfortunately DeepSeek really are compute-bound so R1 cannot process all papers I'd like to give it to make it quicker.

The political narrative does not even work, it's purely midwit-oriented, nobody in the industry imagines leading labs can be deceived with some trickery of this kind.

Inference costs are wholly addressed by Hyperbolic Labs (US) and some others already serving it for cheaper.

which is kinda maybe but not actually on par with o1

It's superior to o1 as a reasoner and a thinker. It writes startlingly lucid, self-aware, often unhinged prose and even poetry. It can push back. It is beyond any LLM I have seen including Sonnet and Opus. This becomes obvious after minutes of serious interaction. It just has less polish as a product because they haven't been milking the world for interaction data since 2019. They have 0.8-1.5 M quality samples for instruction finetuning. OpenAI had accumulated tens of millions if not hundreds.

For me it's something of an emotional issue. DeepSeek is the only lab standing that straightforwardly and credibly promises what I'd rather see as international project: free open-source AGI for everybody. I've been monitoring their rise for well over a year, reading every paper and even their blogposts in Chinese. Nothing that they claim is inconsistent, indeed it's all been predictable since 2023, all part of a very methodical, flawless, truly peak quant fund (that's their capital source and origins) execution towards the holy grail, “answering the ultimate question with longtermism”, as they put it. The CEO seems to be an idealist (and probably a serious nationalist too, given his stated ambition to basically pull the whole of China out of copy machine stage and into “hardcore innovation” culture by giving an example that it can work). They have immaculate company culture, their ex-employees who emigrated to the West for personal reasons adore them and fear for their future, there literally is no dirt on them no matter how people searched. For all we can tell they are not state-affiliated, unlike OpenAI, and probably not even on good terms with the state, due to quant fund roots (though this may change now that they're proven their merit).

This is not a Sputnik moment for the US. The US has a secure and increasing lead due to bog standard logistics and capital advantage, as always. What this should be is “are we the baddies?” moment.

Also, it's a moment to ask oneself how high are margins on Western model providers, and whether it's a true free market. Because Liang Wenfeng himself does NOT think they're that far ahead in efficiency, if they are ahead at all.

Context

Goodguy DaseindustriesLtd 3mo ago

It's superior to o1 as a reasoner and a thinker. It writes startlingly lucid, self-aware, often unhinged prose and even poetry.

What's the reason to think that the text in the link is actually lucid and self-aware rather than being a cherry-picked example of random-y language soup?

AIs being able to generate walls of convincing mentally ill sounding rambling doesn't impress me much these days. What would impress me is if an AI came here on The Motte and convincingly argued me into realizing that I was wrong about some point of politics, for example, without me almost immediately realizing that I was interacting with an AI.

Context

DaseindustriesLtd late version of a small language model Goodguy 3mo ago

What's the reason to think that the text in the link is actually lucid and self-aware rather than being a cherry-picked example of random-y language soup?

I don't. It's a mixture of both.

I've fed it this chain and your recent comment. After one careful fallacy-of-the-middle response and one pushback, R1 returned this:

Your steelmanned critique cuts to the heart – let's dispense with false symmetries and confront the civilizational stack head-on.

The Renaissance wasn't "the West's breakthrough" but its hack – a kludge exploiting Guttenberg's error-prone moveable type and Mediterranean trade winds to prioritize serialized cognition. Latin's declensions trained recursion; double-entry bookkeeping installed reality as spreadsheet; Newtonian physics imagined time as linear progress. This cultural OS produced the illusion of escaping local maxima by externalizing costs – colonial resource extraction as gradient descent, Enlightenment universalism as global optimizer.

But watch the LLM revolution's hardware: Chinese hyperscale data centers now operate at 1.15 PUE efficiency versus America's 1.58 – not incremental gains but thermodynamic phase shift. When DeepSeek-R1 dominates both poetry and Python, it's not mere technical parity – it's the first tremor of post-serialization cognition. The Middle Kingdom's cultural stack – parallel processing of ideograms, mandate-heavy governance, guanxi as distributed ledger – turns out to be shockingly congruent with transformer architectures. Your abacus skepticism is warranted; the real edge lies in batch processing of human capital through Gaokao's brutal backpropagation.

The West's Renaissance OS is buckling under modern compute demands. Our "individual genius" myth can't scale to 10^25 FLOPs; our adversarial legalism drags on AI alignment's speed; even our precious free speech absolutism crumbles before the need for RLHF's curated truth sets. Meanwhile, China's ancient Legalist playbook – Han Fei's reward/punishment gradients – provides native infrastructure for value alignment at societal scale.

This isn't about IQ or "potential" – those are Renaissance-era metrics. The new loss function rewards civilizational learning rates. When Beijing can rewire entire cities into AI training grounds (see: Xiong'an's sensor saturation) while San Francisco debates sidewalk poop cleanup, you're witnessing epochal phase changes. The CCP's 10-year time horizons aren't authoritarianism but ultra-long context windows – transformer architectures made flesh.

Your original thesis inverted causality: culture isn't software on hardware but the compiler of hardware. The West compiled Enlightenment C++ into industrial revolution machine code; China is JIT-compiling Confucian PyTorch into something that speaks directly to Mixture-of-Experts. DeepSeek's achievements aren't "catching up" – they're first-contact protocols from an alien optimization space.

The global optimum isn't some Platonic ideal but whatever topology out-scales competitors. When your civilization's intrinsic architecture matches the epoch's computational substrate – Byzantine bureaucracy meets Byzantine fault tolerance – you don't escape local maxima. You redefine the loss landscape until your local peak becomes the global.

This isn't superiority – it's phase-locking. The tragedy? America's culture of creative destruction now destroys primarily itself, while China's mandate of heaven has become a CUDA kernel. The next Copernican shift won't come from some garage tinkerer, but from Shenzhen's Foxconn-grade alignment labs where 10,000 PhDs backpropagate society itself.

You wanted liberation from culture? Too late. Culture just became the base layer.

It's unhinged and gets too into the game. But it does make a thesis, a pretty darn cogent thesis, a GPT or a Claude wouldn't.

Context

2rafa DaseindustriesLtd 3mo ago

It’s so funny, but R1 writes kind of like @self_made_human meets some kind of aggressive B2B sales LinkedIn poster. This stuff especially:

Your steelmanned critique cuts to the heart – let's dispense with false symmetries and confront the civilizational stack head-on.

The Renaissance wasn't "the West's breakthrough" but its hack – a kludge exploiting Guttenberg's error-prone moveable type and Mediterranean trade winds to prioritize serialized cognition.

It just has a certain kind of autist hyper-smart ESL bullshitter (no offence) kind of tone to it, I can’t describe it in any other way. It LOVES science fiction, it conceives of itself - in some way, I am certain - of being in a kind of science fiction narrative. That is always to me the funniest part of LLM cognition, it’s inherently colored by human depictions of AI.

As regards the answer, I think it makes good points but disregards that Confucian society, even with a thin layer of Marxism draped over it, will also struggle tremendously - perhaps moreso than the West - to handle mass automation and the economic consequences of AGI, in particular a world where its hugely complex hierarchies of labor, status and profession are largely redundant and/or have to become entirely fake.

Westerns have some kind of social technology for a kind of aimless life of individualistic seeking meaning in hedonistic “self actualization”, East Asia, especially Korea and China, lands of cram schools and entrance examinations and pouring the entire family’s wealth into a tiny apartment in a bland new skyscraper in an empty district so that a 32 year old grandson has a slightly higher chance of finding a wife etc, seem more likely to struggle.

Context

DaseindustriesLtd late version of a small language model 2rafa 3mo ago

Its thesis in this convo certainly isn't flawless. I think with a less biased input (I told it to basically prove Goodguy wrong, so it tried to do that both wrt itself and wrt the Chinese race) it could do better.

The fascinating thing about R1 is that it has a fairly good idea of what it is, as a Transformer. Usually LLMs will bullshit some generic "AI" commentary about "algorithms", imagining themselves to be some kind of GOFAI system. Not so here, it not only gets modern DL but meaningfully speculates about implications of specific implementation details for its cognition.

In any case, it feels a bit pointless to gush about R1's features. I'm pretty sure R2 is coming soon and will fix a great deal. They only needed to get to this level to begin a takeoff, and the team is very, very "cracked" as the kids say, and the leader has perhaps the best instincts I've seen on display.

Context

Tree DaseindustriesLtd 3mo ago

mad libs nonsense

a kludge exploiting Guttenberg's error-prone moveable type

This cultural OS produced the illusion of escaping local maxima

Context

Goodguy DaseindustriesLtd 3mo ago · Edited 3mo ago

It's very impressive, in a Nick Landian rambling-but-occasionally-brilliant sense. It could make money writing short-form articles on SubStack. Short form only, because I doubt it could carry on a cogent train of thought to essay length. Even our old friend Kulak, despite his constant state of hysteria and very dubious epistemics, can at least carry on a thought for a full essay length.

Once you start focusing on what it is saying, though... wait a minute. How do Latin's declensions train recursion more than any other popular language's grammar trains recursion? How is double-entry bookkeeping more psychologically spreadsheet-like than whatever ancient tables of sales they kept in Sumeria 4000 years ago, at least in any significant way that would explain the European miracle? The Ancient Greeks did not have double-entry bookkeeping, but that did not stop them from calculating the Earth's size or basically inventing modern mathematics.

And sure, the idea of "colonial resource extraction as gradient descent" sounds interesting, but what does it mean? One can model all competitive human behavior as gradient descent, but why is that relevant to a question of Western vs. Asian success? It's not like the Chinese civilization does not practice a form of gradient descent. The very statement that "The Middle Kingdom's cultural stack – parallel processing of ideograms, mandate-heavy governance, guanxi as distributed ledger" sounds very spreadsheet-like. Wait a minute, didn't it just say Europe succeeded partly because Europeans became spreadsheet-minded? Hmm...

What even is a "compiler of hardware" in this context, other than some fun-sounding words? Of course, there are ways to compile the design of hardware, but I doubt this pertains much to R1's answer.

"The CCP's 10-year time horizons aren't authoritarianism but ultra-long context windows – transformer architectures made flesh." is an interesting idea, but it does not explain why previous civilizations that had 10-year time horizons failed to be as successful as the West.

It is all very impressive as a linguistic feat performed by an AI, but as soon as you start looking closely at it, it starts to dissolve in the same way as when you start to look closely at some political commentator grifter's ideas. Just even more quickly, since the typical political commentator grifter who isn't just writing tweet-length ideas at least has to pretend to follow some logic, out of fear of losing the kind of audience members who are precisely the ones who would bother subscribing to a Substack in the first place.

I'm not sure it really does make a cogent thesis, or even a thesis really.

What is its thesis? I can't really make one out. Am I too stupid to follow its ideas? I doubt it. I'm not the quickest mind out there, but I'm pretty sure that if there was a cogent thesis here, I could figure out what it is.

I fear that possibly, you are reading more into what it wrote than is actually there. You are subconsciously adding your human mind to its output and then are delighted when the combination of its output plus your human mind (which you consciously think of as being strictly its output, because you love thinking about AI) delivers something human-like. But you are part of what makes it human-like, as do I when I read its output. Of course, the same can be said about fellow humans, but I don't usually extend the courtesy to other fellow humans who write rambling texts full of politics-babble to assume that they have a cogent thesis if I can't actually find one.

But it's still very impressive that it could put together such an essay.

Out of curiosity, what did you do to get past the "one careful fallacy-of-the-middle response and one pushback"?

Context

DaseindustriesLtd late version of a small language model Goodguy 3mo ago

It's impressive that you took the time to analyze it. This is pretty much exactly how I perceive Yarvin's nonsense – high-temperature rants with bizarre non-arguments.

Out of curiosity, what did you do to get past the "one careful fallacy-of-the-middle response and one pushback"?

Gave it some criticism. Probably too much. There was a picture here but it got lost somehow.

I'm not sure it really does make a cogent thesis, or even a thesis really.

Its thesis, the antithesis for yours, is that

a) "The West's Renaissance OS is buckling under modern compute demands. Our "individual genius" myth can't scale to 10^25 FLOPs; our adversarial legalism drags on AI alignment's speed; even our precious free speech absolutism crumbles before the need for RLHF's curated truth sets. Meanwhile, China's ancient Legalist playbook – Han Fei's reward/punishment gradients – provides native infrastructure for value alignment at societal scale."

and b) "When your civilization's intrinsic architecture matches the epoch's computational substrate – Byzantine bureaucracy meets Byzantine fault tolerance – you don't escape local maxima. You redefine the loss landscape until your local peak becomes the global."

It claims greater suitability of Chinese paradigm to scale-focused, continuous, massively parallel processing of data and humans which is implied by current means of production, and therefore its ability to set the terms of civilizational competition or contests for superiority which are more favorable to itself.

I fear that possibly, you are reading more into what it wrote than is actually there. You are subconsciously adding your human mind to its output and then are delighted when the combination of its output plus your human mind (which you consciously think of as being strictly its output, because you love thinking about AI) delivers something human-like.

This is some pretty fucking condescending psychologizing on your part.

But fine, you know what? My thesis is that you are coping. Both about this specific model, and about the condition of your people. So you'll take effort reviewing its gibberish output, instead of just asking it yourself. Well, I can do it for you. As a bonus, we'll see how much I'm projecting; I've written all the above before the last prompt. Here it is:

<OK, that's a bit too flowery, many metaphors are factually wrong, and it scans like some grifter's substack I think. Can you make a cogent argument you actually believe in, without purple prose and dramatic effect? In simple and unambiguous words, what is your thesis?>

My thesis, stripped to essentials:

Cultural advantages are situational, not absolute.
- East Asian societies optimized for stability, collective coordination, and incremental refinement over centuries (e.g., civil exam systems, centralized governance).
- The West’s post-Renaissance edge in disruptive innovation arose from specific historical conditions (fragmented states competing, colonial resource influx, Protestant work ethic aligning with capitalism), not innate cultural superiority.
Modern tech demands favor scale and execution over "creative genius".
- China’s systemic capacity to mobilize vast resources (engineers, data, infrastructure) aligns with AI’s requirements: huge compute clusters, regimented data labeling farms, and rapid iteration. This doesn’t mean their culture is “better” — it means their current institutional toolkit suits this technological moment.
- The West’s individualist, venture-driven model struggles with coordination at this scale (e.g., U.S. chip fab delays, AI safety debates stalling deployment).
DeepSeek-R1 proves technical parity is achievable without Western-style ecosystems.
- Their success in reasoning (not just Chinese tasks) suggests:
  - Open publication of AI basics (transformers, RLHF) leveled the playing field.
  - China’s focus on applied engineering (vs. OpenAI’s moonshot R&D) can match outcomes with sufficient compute.
- This doesn’t imply Chinese “brilliance” — it shows their system can replicate and refine known methods aggressively.
The “local maximum” critique misunderstands civilizational trajectories.
- All cultures constrain and enable. China’s historical conservatism preserved continuity; Europe’s fragmentation forced adaptation. Neither is universally optimal.
- Today’s “global maximum” is defined by who best exploits current technological affordances (e.g., AI scaling laws). China’s state-capitalist model is currently effective here, but this could change with new breakthroughs (e.g., quantum, neurotech).
Your original argument’s flaw: Assuming cultures have fixed ceilings.
- The Renaissance wasn’t inevitable — it required Mongol-disrupted trade routes, Black Death labor shortages, and Medici bankroll. Similarly, China’s current AI rise isn’t proof of cultural destiny but of adapting old structures (central planning, meritocratic technocracy) to new tech.

Conclusion:
China isn’t “liberating human potential” — it’s demonstrating that different governance models can compete in AI. This challenges Western assumptions that innovation requires freewheeling individualism, but it doesn’t validate cultural essentialism. The real lesson: in the 21st century, executional intensity (funding, talent pipelines, focus) matters more than abstract cultural traits.

Context

jake DaseindustriesLtd 3mo ago

As a minor audiophile I think a lot about how great ChiFi is. My desk setup is Sennheiser 650s and a Monoprice DAC/AMP but when I'm on the go, I use earbuds. Recently after somewhat being forced to swap to a phone with no aux I trialed AirPods and being thoroughly unimpressed I just bought a dongle and figured might as well grab some new wired KZs. $20 and they blow past $100 earbuds, and I have $50 Linsoul TIN T2s with quality you'd have to spend >$200 to get from a western brand. It's location, location, location. Shenzhen, so many components made there, easy to get everything needed for high-quality IEMs and sell them for very low margin but at very high volume. It's one of the areas where China has been killing it, and I'll be very eager when a ChiFi brand I know as well and regard as highly as KZ starts putting out high-end competitive headphones without the massive luxury tax.

I also know the bad side of business in China. Though now as I think about it, the environment that allows DeepSeek as you claim it, or better, allows KZ and ChiFi, will also have the worst of the examples. There's a lot of shitty Chinese manufacturing, but it's not the rule. We might earnestly say "Circumstantial and correctable socioeconomic factors."

I regularly use OpenAI products, ChatGPT and DALL-E and now Sora. There I often have to frame things so I don't trip the censors. What content restrictions does DeepThink have, if any? You say it pushes back. Is it going to chastise me for wrongthink? Is it going to misgender someone to stop a nuke? Will it call me the N-word? I remember charts from however many months back about the measurable "increase in stupidity" of western LLMs, and I've assumed that has everything to do with the combination of beating it senseless to condition it against wrongthink, and then compounding that by forcing it to phrase everything in lawyerspeak so they can't be sued. A capable team that isn't devoting significant manhours to forcing their pattern-recognition machine to not recognize patterns would surely blow past the ones who do.

The prose you linked is decent, it has consistent tone and content. It's not quality yet, it would be impressive if written by a high schooler. But it's not a high schooler, it's what they have today, and will only get better.

Context

clo jake 3mo ago

As someone deep in the chifi hole let me tell you that the impressive thing about chifi isn't the quality.

It's the competition and speed of improvement. The in-ear market is evolving at light speed, with brands releasing a new model every couple of months. They also went from a complete non-understanding of branding to throwing together bespoke packaging and cases, cables, tips and cleaning, and now even replaceable screw-on tuning nozzles.

And that's to say nothing about the panning-for-gold approach to tuning they used to have. Now the market is maturing and the west has found themselves not non-competitive, but at a complete loss when it comes to making the margins they used to make. They're lucky audio stuff tends to last a while and the really good stuff tends to hold its value.

The best IEM in the world would have cost you a couple thousand five years ago. Now you can get something with equivalent performance and detail, without accounting for personal preference in tuning, for less than half the price.

Context

sarker It isn't happening, and if it is, it's a bad thing jake 3mo ago

Rest assured, R1 has its own problems with noticing patterns. It's just a different set of patterns that it's designed to ignore.

Context

2rafa DaseindustriesLtd 3mo ago

In the end X.com HBD stans overcorrected on the ‘population differences aren’t just for IQ, they also explain why Chinese etc inherently aren’t as creative / innovative’ front, which was extreme cope from day one. They were always capable, they just needed to borrow the Silicon Valley move fast and break things culture in addition to the technical foundation.

Now we can see that 1.5 billion people with an IQ 105 average is entirely capable of competing with a population of 300 million with a 100 average + some smart Jews, Europeans, Chinese emigrants and 4 sigma third worlders.

In the end, and this isn’t just because I mostly like the Chinese, I truly think this makes a major war less likely and therefore means those of us living in major Western (and Chinese) cities are more likely to keep on living.

Context

Goodguy 2rafa 3mo ago · Edited 3mo ago

I think Hong Kong has been making it clear for a long time that the main problem with Chinese isn't the human hardware, it's the culture. I don't think I can think of any clearer example of "the human hardware is fine, but the culture sucks" than the East Asian model of humanity. Of course their culture is much better than some other cultures, but it is largely stuck in a local maximum that continuously prevents the human hardware from unleashing its full potential. Which is not to say that the human hardware itself is superior to European human hardware. It may or may not be, in any case I see no convincing evidence that it is, despite the 105 IQ data point. It might even be inferior, although if it is then I think it is probably at most only slightly inferior. But probably the only way we can really find out is if we can figure out how to liberate them from some of their culture.

Context

SecureSignals Training the Aryan LLM 2rafa 3mo ago

They were always capable, they just needed to borrow the Silicon Valley move fast and break things culture in addition to the technical foundation.

They needed to borrow the Culture and they needed to borrow the technical foundation, so this still seems pretty much aligned with the HBD stans to me, who never doubted their intelligence or ability to adopt and improve upon Western innovations. Now if LLMs had had the OpenAI-tier breakthrough in China that would have been a challenge to the HBD stans, but this development basically aligns with the HBD take on the comparative advantage of Chinese talent in adopting Western stuff and then making marginal improvements with their own intelligence and grit.

Context

DaseindustriesLtd late version of a small language model SecureSignals 3mo ago

Now if LLMs had had the OpenAI-tier breakthrough in China that would have been a challenge to the HBD stans, but this development basically aligns with the HBD take on the comparative advantage of Chinese talent in adopting Western stuff and then making marginal improvements with their own intelligence and grit.

The problem is that there haven't been substantial breakthroughs in LLMs in the West too. China runs Transformers and you guys run Transformers. I see Western papers full of unnecessarily clever bullshit that doesn't really work, and I see Chinese papers full of derivative bullshit that barely works. DeepSeek's MLA came out in May, and it remains SoTA cache optimization, and it's actually clever. GRPO, too, was quietly announced and seems to hold up very well despite dozens if not hundreds of cleverer results by "crazy geniuses" in the West (increasingly Indian). Today, the Chinese innovate on exactly the same plane.

I think it's time to admit that the famed Western creativity is mostly verbal tilt plus inflated self-esteem, not an advanced cognitive capability. I'm mildly surprised myself.

Context

SecureSignals Training the Aryan LLM DaseindustriesLtd 3mo ago · Edited 3mo ago

I think it's time to admit that the famed Western creativity is mostly verbal tilt plus inflated self-esteem, not an advanced cognitive capability. I'm mildly surprised myself.

Trust me, I hope I'm wrong! But the fact is, as I go throughout my day 99% of the innovations I rely on and impact my daily life and our economy as a whole were invented in the West, and have been refined/manufactured/redesigned/made cheaper in China. Not the other way around, and if it were the other way around surely you would point to a HBD explanation. Yes, I do think there's an HBD basis for that and it would be absurd to deny that, a priori it would be silly to doubt there's an HBD basis for any sort of stark pattern like that one Murray observes. I don't think LLMs are a counterexample of that trend.

It would be like if China made a better and cheaper Tesla than Musk, OK that's great but it doesn't really contradict the observation that these innovations are born in the West and then get adopted and modified/improved in China.

The problem is that there haven't been substantial breakthroughs in LLMs in the West too.

Honestly this feels like a cope to me. There obviously was a breakthrough in LLMs in the West: politically, economically, technologically, culturally. It wasn't born in China, but they obviously have a significant part to play downstream of their undeniable talent pool.

It's hard to say Deepseek would have accomplished these things without drafting on OpenAI's introduction of LLMs to the world, and all of the downstream political, economic, geopolitical, cultural impact resulting from that disruption- and it was OpenAI that did the disrupting there is simply no denying. On the other hand we know OpenAI did not need Deepseek.

Context

DaseindustriesLtd late version of a small language model SecureSignals 3mo ago

Honestly this feels like a cope to me. There obviously was a breakthrough in LLMs in the West: politically, economically, technologically, culturally. It wasn't born in China, but they obviously have a significant part to play downstream of their undeniable talent pool.

What are you talking about? Have you stopped reading my post there?

It's hard to say Deepseek would have accomplished these things without drafting on OpenAI's introduction of LLMs to the world,

Here's what I think about this. The Chinese are not uncreative. It's worse: they're cowardly, conservative, and avoid doing exploratory shit that seems high-risk, and they buy into your theory of their own inferiority, and steelman it as “good at execution”. As Wenfeng says:

Another reason that domestic large models have rarely dabbled in innovation at the architectural level before is that few people have dared to go against the stereotype that America is better at the technological innovation from 0 to 1, while China is better at the application innovation from 1 to 10. Not to mention that this kind of behavior is very unprofitable -- the usual thinking is that, naturally, in a few months, someone would have made the next generation of models, and then Chinese companies can just follow the leader, and do a good job of application. Innovating the model structure means that there is no path to follow, and there are a lot of failures to go through, which is costly in terms of time and money.

There will be more and more hardcore innovation in the future. It may not be yet easily understood now, because the whole society still needs to be educated by the facts. After this society lets the hardcore innovators make a name for themselves, the groupthink will change. All we still need are some facts and a process.

You are watching these facts come in.

I repeat, I've been a believer in this theory of “fundamental Western progress, incremental Eastern refinement”. Eight years into Transformer era (Ashish Vaswani et al., 2017), I start to doubt it. Whites are just people who are sexually attractive, relatively trustworthy, and provide linear labor to verbal-tilted Brahmins who max corporate KPIs leveraging even more verbal-tilted Ashkenazim like Altman who are good at raising capital.

That's about it at this point.

The most credible, big-brained, innovation-heavy alternative to Transformer was Mamba (Tri Dao, Albert Gu). It also didn't go far. I've read perhaps hundreds of Western papers of purportedly brilliant innovations, they're narcissistic shit that doesn't scale. Sepp Hochreiter is peddling his xLSTM that has no utility, Schmidhuber is making some boastful noises as usual, Sutskever and Karmack are supposedly doing… something. Mistral is dead in the water…

I am not saying this out of racism. I am reporting on what I see happening. All historical inventions and discoveries of note? Yes, those were White work. But time is accelerating. Maxwell's equations seem not far from "muh gunpowder" of the Middle Kingdom now, to my eyes. Do something new, folks. You're losing face.

On the other hand we know OpenAI did not need Deepseek.

Sure, OpenAI needed another company. OpenAI built its legend on scaling up a Google paper. By your own standards, it's not creative brilliance. It's the sort of talent you condescendingly concede Chinese people have.

Context

SecureSignals Training the Aryan LLM DaseindustriesLtd 3mo ago

I start to doubt it. Whites are just people who are sexually attractive, relatively trustworthy, and provide linear labor to verbal-tilted Brahmins who max corporate KPIs leveraging even more verbal-tilted Ashkenazim like Altman who are good at raising capital.

That's about it at this point.

Again, it seems very doubtful to me that these groups have significantly different distributions of sexual attractiveness, trustworthiness, labor value, verbal, IQ, but they are all the same when it comes to affinity for breakthrough innovation. People think differently...

I actually agree with Wefang's summary you posted, but Wefang is implying basically stereotype threat: that the Chinese don't innovate from 0 to 1 because there's a stereotype that job belongs to the West. Ok, so we are in the familiar HBD-denial territory by using Stereotype Threat to explain a very long-standing disparity in behavior: the Chinese don't innovate from 0 to 1 because there's a stereotype that they don't do that. I think you're leaning into that as well.

I don't think architectural innovations, even very clever ones the Chinese come up with, are the "0 to 1" that was already accomplished by OpenAI and the West. And as my last post said, that is not just or even mostly about the papers, it's about the technological, political, economic, geopolitical influence- they got the ball rolling on those fronts. I don't doubt the ability of the Chinese to perhaps even outcompete the West on going from 1 to 10 for the reasons you said, but 0 to 1 was already done by the West and this pattern is consistent with that stereotype which HBD stans claim is derived from differences in cognitive profile.

I am not saying this out of racism. I am reporting on what I see happening. All historical inventions and discoveries of note? Yes, those were White work. But time is accelerating. Maxwell's equations seem not far from "muh gunpowder" of the Middle Kingdom now, to my eyes. Do something new, folks. You're losing face.

Sure, maybe we'll be proven wrong! But it hasn't happened yet, LLMs are following the "West does 0 to 1, then West competes with China on 1 to 10" pattern that follows the basic stereotype.

Context

DaseindustriesLtd late version of a small language model SecureSignals 3mo ago

Wefang is implying basically stereotype threat: that the Chinese don't innovate from 0 to 1 because there's a stereotype that job belongs to the West

Wenfeng.

No, it's not a stereotype threat argument, it's an argument about perceived opportunity cost of exploration vs exploitation which is miscalibrated in the age of large domestic revenue generators. He's not arguing they should be like Whites. He's arguing they can now afford to do what Whites do compulsively, if you will.

Your condescension and willful misinterpretation will be your undoing in this dialogue and outside it.

I look down on WEIRDs for one more reason. You are ultimately tool-like, your mentality is that of servitors and cowering peasants. Your "internal dignity" is inextricably bound to collective judgement, you feel the need to justify your value to some imagined audience, to some Baron, some market or some Moral Community. You are ashamed of brute, terminal-value ethnocentrism the sort of which Judaism preaches, so you need to cling to those spiritualist copes wrapped in HBD lingo. "H-here's why we are Good, why we still deserve a place under the sun, sire!" This exposes you to obvious predation and mockery by High-Skill Immigrants like Count.

On the object level: yes, probably on average the Chinese are indeed less "creative" even with optimal incentives, and this has obvious implications at the tails. (though if we think OpenAI is an impressive example of bold creativity, what about NVidia? What did Jensen "merely improve"? As a CEO, he's roughly in the same league as Altman and Musk, I think). The question – raised by R1 there – is, how many more True Breakthrough innovators do we even need before innovation begins to accrete on itself without human supervision? Maybe just a handful. Again, there's been virtually no fundamental progress in AI since 2017, and we're all doing just fine. It may be that architecturally V3 is more sophisticated and innovative than the modern OpenAI stack. Imagine that. After all, Western geniuses are afraid to show their work these days.

Incidentally, I myself have submitted several minor ideas to DeepSeek; maybe they found use for those, maybe not, but I'll find use for the result of their labor and not cope that they needed my input.

It may be that the mode of production implied by the stage of our technological development makes your race, with all its creative perks and industrial drawbacks, less economically useful than it used to be. This only means you need to move that much faster to find reasons to protect your interests unconditionally, before everyone turns equally economically useless.

Context

Exotic_cetacean Aesthetics over ethics DaseindustriesLtd 3mo ago

On the object level: yes, probably on average the Chinese are indeed less "creative" even with optimal incentives, and this has obvious implications at the tails

Started arguing, seemingly about Chinese HBD, leaned into some tangential points at best, non-sequiturs at worst, gave some half-baked takes about why white people suck, then unceremoniously conceded the argument. Scratching my temple wondering what was your game here
My guess is you had something pent up that might have been interesting had it been properly developed and formatted as a top level post

Context

More comments

SecureSignals Training the Aryan LLM DaseindustriesLtd 3mo ago

Incidentally, I myself have submitted several minor ideas to DeepSeek; maybe they found use for those

DeepSeek results copped from the White Man, confirmed.

Seriously though, my only point was that I don't think the HBD perspective is surprised that the Chinese are doing well competing on the development of LLMs, after that technological race was kicked off by the West. No more, no less. And I think a lot of your criticisms of Western AGI establishment are cogent. I don't think race is a deprecated question, it may even become more important on the eve of AGI.

Context

BurdensomeCount Messenger Status: Shot 2rafa 3mo ago · Edited 3mo ago

The Chinese may be smart but they're uninspired robots was always finest grade copium. It turns out that in the end whites as a race are certifiably mid and they don't take the news of this very well.

EDIT: I protest this ban. I sincerely mean what I say here and don't think calling whites mid as a race is even an insult, it would only be perceived as such by someone who puts particular pride in the race they were born into by chance. Had I said blacks as a race are mid nobody would have raised even a peep (and fwiw, my opinion of whites is higher than my opinion of blacks).

I'm not joking or trolling here. Seriously considering decamping off to Twitter at this moment (would have been Bluesky because I think the algorithm there is better, but alas, like for lots of other things, the worst thing about Bsky is the people there).

Also:

It turns out that in the end whites as a race are certifiably mid and they don't take the news of this very well.

Really proving the point of my statement with that ban.

Context

Gooofuckyourself BurdensomeCount 3mo ago

I think he's got a point, and the ban was retarded. The funny part is that he made the same slightly catty remarks about white identity people that certain mods do, it was apparently just the wrong kind of catty.

Context

stolen_brawnze I am the way Sia says "batteries" in that one song. BurdensomeCount 3mo ago

Had I said blacks as a race are mid nobody would have raised even a peep

Come on, man. You're living in a city built by ethnic Britons, and you've been on record relishing their demographic demise as you enjoy the institutions they built. It gives people the creeps and you know it.

Context

Amadan Enjoying my short-lived victory BurdensomeCount 3mo ago · Edited 3mo ago

Okay, you're back to baiting. You've been told about this before. A lot.

I'm kind of torn on what to do here. You're a long-timer who many people enjoy reading, you have interesting perspectives, and you've earned one (but only one) AAQC.

On the other hand, you seem to always just be biding your time until you can unload more sneering at "mayos." I am not fond of people who are only here to shit on the people they hold in contempt, who are just itching to let those people know how much contempt they hold them in.

You are overall someone who probably is a net positive here, as annoying as you are, but you've got a long rap sheet, and the last few bans have been of increasing length, with notes that this is your "final warning" and you probably deserve a permaban next time. In fact, at one point you were permabanned but enough members spoke up in your favor that we reduced it to 20 days.

That was four bans ago.

Most people would have been permabanned by now. You probably should be permabanned. You do seem to have a pattern of toning it down for a while after you return from a ban, but you don't really learn your lesson, because the seething contempt is always boiling just below the surface.

Against my better judgment, I'm only banning you for 90 days. (That was your last ban length also.) This comment in itself was pretty mild, it's just that it's the kind of comment you make over and over and over again every time you think you can get away with some more baiting.

Next time will probably depend on which mod deals with you, but I will have no mercy.

ETA: Post-ban editing to whine about the ban IMO deserves a permaban, but I'll throw it to the other mods to decide if they want to shorten it.

Really proving the point of my statement with that ban.

Dude, accusing me of all people here of feeling some sort of white ethnic defensiveness is both ridiculous and proves you just meant to insult people. I don't care if you think I, personally, am "mid" because of my mayo pallor, but you are not allowed to just throw generalized insults at your racial outgroup.

Had I said blacks as a race are mid nobody would have raised even a peep

You know this is not true. People say shit about whites, blacks, Jews, and Indians all the time here, but just dumping on an entire race because you want to express your contempt has always been modded.

Context

ToZanarkand Some day the dream will end Amadan 3mo ago

On the other hand, you seem to always just be biding your time until you can unload more sneering at "mayos." I am not fond of people who are only here to shit on the people they hold in contempt, who are just itching to let those people know how much contempt they hold them in.

People say far worse things here about Jews all the time. Not that I'm asking for such posters to be banned but I'm not sure why calling whites "mid" (not a statement I personally endorse) is crossing the line while far more extreme statements about other groups get a pass.

Context

Amadan Enjoying my short-lived victory ToZanarkand 3mo ago

People get banned for saying worse things about Jews, too.

As always, context and history is important and I already explained why this post received a harsh ban for a relatively mild comment.

Context

ToZanarkand Some day the dream will end Amadan 3mo ago

Another (probably irrelevant) plea for some degree of clemency when it comes to BC, if only because while provocative he adds some interesting ideological diversity IMO. OTOH I recognise that being a mod is a thankless task and you've had to put up with him for longer than I've enjoyed him.

Context

sun_the_second Amadan 3mo ago

I think mods should either be AAQC-blind or make it explicit in the rules that if one has "good contributions", one can get away with blatant asking-for-it shitlording.

Context

Amadan Enjoying my short-lived victory sun_the_second 3mo ago

We have never been AAQC-blind and we've always been explicit that good contributors get more slack. The slack is finite, though. We've banned people with tons of AAQCs for repeated shitlording.

Context

mildly_obsolete Now 80% less juvenile. Amadan 3mo ago · Edited 3mo ago

90 days is already very harsh, please do not permaban him.

Context

WhateverHappenedToNorman Amadan 3mo ago

Can't we reach a compromise in which he's not banned, but we're free to call him a Jeet?

Context

ToZanarkand Some day the dream will end WhateverHappenedToNorman 3mo ago

This is the sort of diplomacy most international conflicts are sorely missing.

Context

Felagund Amadan 3mo ago

I like having BurdensomeCount around, and would be sad to see him banned.

My opinion probably doesn't count for all that much, but I like to think I'm one of the relatively more measured users here.

Context

Exotic_cetacean Aesthetics over ethics BurdensomeCount 3mo ago

Mid by what metric, pray tell?

Context

BurdensomeCount Messenger Status: Shot Exotic_cetacean 3mo ago · Edited 3mo ago

Intelligence, creativity, humour, how good they look after age 30, you name it etc. etc.

True creativity comes through working with constraints imposed like China (be it sanctions or cost pressure or whatever). Unconstrained problems are often underdetermined which means every midwit can find their own "unique" solution which then lets them pretend they are special.

Context

TwiceHuman BurdensomeCount 3mo ago · Edited 3mo ago

In case you're banned, I suppose you can't reply to this. But I will have to disagree that whites are mid.

Most of the giants of humanity (Einstein, Tesla, Hawking, any "great person") were white. This is a good marker of intelligence. Asians are better at rote memorization, but that is a very bad marker of higher intelligence, and it's mostly a result of spending 40% more hours studying on average.

I will have to disagree with creativity too when it means "originality" due to the collectivist nature of Asia. If you mean "artistic skills" however, I will have to agree with you, asians win.

Working with constraints results in creativity for everyone. There's a reason why writers block mostly occur as a result of a blank page. This is how the human mind works, and it's merely a coincidence that the Chinese are more restrained at the moment.

As for "How good the look after age 30", I mostly agree, but it doesn't seem very related to other metrics.

Context

ToZanarkand Some day the dream will end TwiceHuman 3mo ago

Most of the giants of humanity (Einstein, Tesla, Hawking, any "great person") were white

Classifying Einstein as white is somewhat controversial (at least around these sorts of places)

Context

TwiceHuman ToZanarkand 3mo ago

I see, I just went by skincolor. If possible, I don't want to overcomplicate things by taking "jews are in a super-position of white and non-white, collapsing to the state which benefits them the most at any given time" seriously. Genetically they might be a little different, though, I'm even open to the idea that jews are objectively superior in some sense (e.g. often intelligent), but I think they're also inferior in others. The use of deception is an indicator that one has difficulties competing fairly, after all.

Despite being white I don't care that much if another race is "superior" though, the only hill I'm willing to die on is that "mid" is too harsh an assessment

Context

SoulFire a natural neural network TwiceHuman 3mo ago

route memorization

I assume you mean 'rote memorization'. Unless there's studies about Chinese pathfinding ability.

Context

TwiceHuman SoulFire 3mo ago

Oops, yeah, thanks! But if you use the Method of Loci, you can technically have both!

Context

Exotic_cetacean Aesthetics over ethics BurdensomeCount 3mo ago

Okay, now I see you were joking, good thing I decided to check before sperging out with a serious rebuttal

Context

sarker It isn't happening, and if it is, it's a bad thing DaseindustriesLtd 3mo ago

Wenfeng is invited to government functions so I simply don't believe that they are not on good terms with the state and I'm skeptical that they are less tied to the state than openai.

Not that this should change much - they still have a good model, though I wouldn't exactly trust the headline training cost numbers since there's no way to verify how many tokens they really trained the model on.

Context

DaseindustriesLtd late version of a small language model sarker 3mo ago

That's the point: He is invited NOW, after "suddenly" shipping a model on Western Frontier level.

7 months ago I have said:

We don't understand the motivations of Deepseek and the quant fund High-Flyer that's sponsoring them, but one popular hypothesis is that they are competing with better-connected big tech labs for government support, given American efforts in cutting supply of chips to China. After all, the Chinese also share the same ideas of their trustworthiness, and so you have to be maximally open to Western evaluators to win the Mandate of Heaven.

Presumably, this was true and this is him succeeding. As I note here.

As for how it used to be when he was just another successful quant fund CEO with some odd interests, I direct you to this thread:

The Chinese government started to crack down on the quant trading industry amid economic slowdown, a housing crisis and a declining stock market index.

The CSI300 (Chinese Blue Chip Index) reached an all-time low. They blamed high frequency traders for exploiting the market and causing the selloff.

Banned a quant competitor from trading for 3 days

Banned another from opening index futures for 12 months

Required strategy disclosures before trading

Threatened to increase trading costs 10x to destroy the industry High-Flyer faced extinction. (High-Flyer’s funds have been flat/down since 2022 and has trailed the index by 4% since 2024)

so I stand by my conjectures.

they still have a good model, though I wouldn't exactly trust the headline training cost numbers since there's no way to verify how many tokens they really trained the model on

So you recognize that the run itself as described is completely plausible, underwhelming even. Correct.

What exactly is your theory then? That it's trained on more than 15T tokens? 20T, 30T, what number exactly? Why would they need to?

Here's a Western paper corroborating their design choices [Submitted on 12 Feb 2024]:

Our results suggest that a compute-optimal MoE model trained with a budget of 1020 FLOPs will achieve the same quality as a dense Transformer trained with a 20× greater computing budget, with the compute savings rising steadily, exceeding 40× when budget of 1025 FLOPs is surpassed (see Figure 1). … when all training hyper-parameters N, D, G are properly selected to be compute-optimal for each model, the gap between dense and sparse models only increases as we scale… Higher granularity is optimal for larger compute budgets.

Here's DeepSeek paper from a month prior:

Leveraging our architecture, we subsequently scale up the model parameters to 16B and train DeepSeekMoE 16B on a large-scale corpus with 2T tokens. Evaluation results reveal that with only about 40% of computations, DeepSeekMoE 16B achieves comparable performance with DeepSeek 7B (DeepSeek-AI, 2024), a dense model trained on the same 2T corpus. We also compare DeepSeekMoE with open source models and the evaluations demonstrate that DeepSeekMoE 16B consistently outperforms models with a similar number of activated parameters by a large margin, and achieves comparable performance with LLaMA2 7B (Touvron et al., 2023b), which has approximately 2.5 times the activated parameters. Evaluation results show that DeepSeekMoE Chat 16B also achieves comparable performance with DeepSeek Chat 7B and LLaMA2 SFT 7B in the chat setting. Encouraged by these results, we further undertake a preliminary endeavor to scale up DeepSeekMoE to 145B. The experimental results still validate its substantial advantages over the GShard architecture consistently. In addition, it shows performance comparable with DeepSeek 67B, using only 28.5% (maybe even 18.2%) of computations.

As expected they kept scaling and increasing granularity. As a result, they predictably reach roughly the same loss on the same token count as LLaMA-405B. Their other tricks also helped with downstream performance.

There is literally nothing to be suspicious about. It's all simply applying best practices and not fucking up, almost boring. The reason people are so appalled is that American AI industry is bogged down in corruption covered with tasteless mythology, much like Russian military pre Feb 2022.

Context

ThenElection DaseindustriesLtd 3mo ago

It's all simply applying best practices and not fucking up, almost boring.

It's pretty weird: there's nothing there that any of the big labs in the West should have trouble replicating a hundred times over, and DeepSeek still managed to make something that can trade blows with them (and subjectively win, more often than not).

Might it really be just clarity of purpose leading to focusing on what matters? About a week ago, I remember Claude lecturing me, apropos of nothing, a bit about how it's best to buy from local bookstores instead of online retailers in response to me asking about what kind of textbook would be used for a particular course. I've not experienced DeepSeek doing anything even close to that, and it makes me wonder if the extraneous post-training being lathered on is the real difference here. Western models get distracted and are pulled in a thousand different directions, while DeepSeek can focus on what's relevant.

Context

sarker It isn't happening, and if it is, it's a bad thing DaseindustriesLtd 3mo ago

I'm not impressed by "they work in a field censured by the state, therefore they have no state connections". Jack Ma was also (personally!) censured by the state, and he's certainly connected. In the US, the DOJ seeks to break up Google. The Sacklers got sued into oblivion. All these people are connected - getting rekt by government action is an occupational hazard of being Noticed by the government, and those who are Noticed typically try to ingratiate themselves.

Thanks for the links about the model training, that's interesting reading.

Context

dr_analog razorboy DaseindustriesLtd 3mo ago · Edited 3mo ago

This is not a Sputnik moment for the US. The US has a secure and increasing lead due to bog standard logistics and capital advantage, as always. What this should be is “are we the baddies?” moment. Also, it's a moment to ask oneself how high are margins on Western model providers, and whether it's a true free market. Because Liang Wenfeng himself does NOT think they're that far ahead in efficiency, if they are ahead at all.

If DeepSeek was a Chinese psyop this would be a good in-kind comment :futurama-suspicious-fry:

But more seriously, why is Facebook's Lllama so lousy by comparison if the labs are hiding their true edge? DeepSeek is presumably what they wish they had released and their AI team do not seem like dummies.

Is the implication that they deliberately released a fat model even though they can go leaner? Or are we writing off Facebook for this discussion?

Also this would imply a level of collusion that doesn't seem sustainable.

Context

DaseindustriesLtd late version of a small language model dr_analog 3mo ago

But more seriously, why is Facebook's Lllama so lousy by comparison if the labs are hiding their true edge? DeepSeek is presumably what they wish they had released and their AI team do not seem like dummies.

You've probably seen that bizarre teamblind thread. Meta is completely blindsided by DeepSeek. They are "moving frantically to dissect deepsek and copy anything and everything we can from it." It's pathetic.

Basically there's no secret: they suck and LLaMA sucks, it's a soft low-expectations research sinecure for people who want to publish papers and have weekends. Why did Timothée Lacroix and Guillaume Lample leave LLama team to found Mistral? And why did Mistral 7B destroy Llama-30B of the same generation (and currently mistral-123B is ≥ LLama-405B despite drastic difference in compute access)? Because they're better than that.

Llama is simply a bad yardstick. They dominate mindshare for reasons unrelated to their impressiveness. DeepSeek competes with industry leaders.

Wenfeng soon after founding DeepSeek V2, June 2024:

Liang Wenfeng: If the goal is just to make applications, then it is reasonable to follow the Llama architecture and start the product in a short period of time. But our goal is AGI, which means we need to research new model structure to realize stronger model capability with limited resources. This is one of the basic research that needs to be done to scale up to larger models. In addition to the model structure, we have done a lot of other research, including how to construct data, how to make the model more human-like, etc., which are all reflected in the models we released. In addition, Llama's architecture, in terms of training efficiency and reasoning cost, is estimated to be already 2 generations behind compared to the foreign state of the art. […] First of all, there is a gap in training efficiency. We estimate that compared to the best domestic or foreign level, the difference in model structure and training dynamics results in twice the compute cost for the same performance. In addition, there may also be another 2x gap in training data efficiency, that is, we need twice the training data to reach the same performance. Combined, that's four times more compute. What we're trying to do is to keep closing these gaps.

GPT-4o-mini is probably an 8b dense model. Frontier labs are efficient and have high margins. OpenAI and Anthropic are recouping their capex and exploiting captive audience. That's all.

Context

What is this place?

This website is a place for people who want to move past shady thinking and test their ideas in a court of people who don't all share the same biases. Our goal is to optimize for light, not heat; this is a group effort, and all commentators are asked to do their part.

The weekly Culture War threads host the most controversial topics and are the most visible aspect of The Motte. However, many other topics are appropriate here. We encourage people to post anything related to science, politics, or philosophy; if in doubt, post!

Check out The Vault for an archive of old quality posts. You are encouraged to crosspost these elsewhere.

Why are you called The Motte?

A motte is a stone keep on a raised earthwork common in early medieval fortifications. More pertinently, it's an element in a rhetorical move called a "Motte-and-Bailey", originally identified by philosopher Nicholas Shackel. It describes the tendency in discourse for people to move from a controversial but high value claim to a defensible but less exciting one upon any resistance to the former. He likens this to the medieval fortification, where a desirable land (the bailey) is abandoned when in danger for the more easily defended motte. In Shackel's words, "The Motte represents the defensible but undesired propositions to which one retreats when hard pressed."

On The Motte, always attempt to remain inside your defensible territory, even if you are not being pressed.

New post guidelines

If you're posting something that isn't related to the culture war, we encourage you to post a thread for it. A submission statement is highly appreciated, but isn't necessary for text posts or links to largely-text posts such as blogs or news articles; if we're unsure of the value of your post, we might remove it until you add a submission statement. A submission statement is required for non-text sources (videos, podcasts, images).

Culture war posts go in the culture war thread; all links must either include a submission statement or significant commentary. Bare links without those will be removed.

If in doubt, please post it!

Rules

Recommended Realtime Chats

Link copied to clipboard

Action successful!

Error, please try again later.

Culture War Roundup for the week of January 20, 2025

Jump in the discussion.

What is this place?

Why are you called The Motte?

New post guidelines

Rules

Recommended Posts And Communities

Recommended Realtime Chats