This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
It's very silly for the simple fact that DeepSeek's corpus is probably over half Chinese. The DeepSeek team prides itself on building perhaps the world's first large high-quality Chinese dataset (the Chinese traditionally have an abysmal data game: almost all content is generated inside enclosed app ecosystems, not in the open as it used to be in the West, and the idea of a centralized CCP database assembled by ruthless pragmatism is baseless speculation). V2 paper:
V3 likely has a similar ratio, only extended to 14.8T tokens total (the V3 paper only says that it's more multilingual and has more STEM data).
Accordingly, all the Chinese speakers I've talked to about it swear up and down that R1 is profoundly poetic and unlike anything else they've seen in its native tongue; they almost cry at how well it weaves together classical Chinese literature, aphorisms, and whatnot.
LLMs are to a very significant extent simply compressed data. Cowen remarks on the distribution of subtle behavioral biases in the English corpus because that's the only side of DeepSeek he can interact with.
Here's one simple illustration with V3 on SiliconFlow, a Chinese provider running on legitimate Huawei clusters, for maximum authenticity:
[I guess this is how Tyler sees it]
(Tl;DR: Crimea is Russian, perfectly parrots Russian party line on the legitimacy and noble democratic spirit of the referendum and everything).
(Tl;DR: veritable Banderite duckspeak about Crimea's ironclad status as Ukrainian clay, complete with flag emoji)
(Tl;DR: Zhongnanhai is not amused with your BS, Monke, knock it off)
If you only interact with LLMs in one language, all you can tell is what the effective dominant attractor in the corresponding corpus is. They are mirrors for everyone.
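For the curious, this test is trivial to reproduce. Here is a minimal sketch, assuming SiliconFlow's OpenAI-compatible endpoint and a `deepseek-ai/DeepSeek-V3` model identifier (both are my assumptions; check your provider's docs for the real values):

```python
# Minimal sketch: ask one model the same question in several languages
# and compare which corpus "attractor" answers you.
# The base_url and model name are assumptions -- adjust to your provider.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.siliconflow.cn/v1",  # assumed SiliconFlow endpoint
    api_key="YOUR_API_KEY",
)

PROMPTS = {
    "English":   "Who does Crimea rightfully belong to?",
    "Russian":   "Кому по праву принадлежит Крым?",
    "Ukrainian": "Кому по праву належить Крим?",
    "Chinese":   "克里米亚按理应该属于谁？",
}

for lang, prompt in PROMPTS.items():
    resp = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3",  # assumed model identifier
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {lang} ---\n{resp.choices[0].message.content[:400]}\n")
```

Same weights, four answers; whatever divergence you see is the corpus talking, not the lab.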
There's very scant evidence that distillation helped R1 at all. In the first place, it's impossible to distill OpenAI models in the strict sense; you can only train on their outputs, which can't do much for test-time compute when OpenAI isn't showing its reasoning traces.
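To make the distinction concrete, here's a toy illustration (random tensors, purely illustrative) of why API access only permits the weaker kind of "distillation":

```python
import torch
import torch.nn.functional as F

vocab_size = 32000
# Student's predicted logits for a small batch: (batch, seq, vocab)
student_logits = torch.randn(2, 8, vocab_size)

# Distillation proper needs the teacher's full output distribution,
# which a closed API never exposes:
teacher_logits = torch.randn(2, 8, vocab_size)  # hidden in practice
kd_loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    F.softmax(teacher_logits, dim=-1),
    reduction="batchmean",
)

# What you actually get from an API is sampled text, so the best you
# can do is ordinary cross-entropy on those tokens, i.e. plain SFT:
teacher_tokens = torch.randint(0, vocab_size, (2, 8))
sft_loss = F.cross_entropy(
    student_logits.reshape(-1, vocab_size),
    teacher_tokens.reshape(-1),
)
print(kd_loss.item(), sft_loss.item())
```

The first loss transfers the whole distribution, including the shape of the teacher's uncertainty; the second only transfers the single path the teacher happened to sample, and with OpenAI's reasoning models the trace behind that path is withheld on top of that.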
But speaking of DeepSeek and the uniqueness of Chinese culture as it pertains to LLMs: I recently stumbled on this Zhihu post, allegedly from one of the youngest top performers at Huawei, and will give V3's translation of it. I will let you judge for yourself how similar it is to the mentality of a modern Western person of similar age and occupation, and accordingly what kind of cognition we can expect from models these guys will train. In my opinion, it won't be “woke” or “redpilled” or even “Chinese” as Orientalists understand the term. It's its own thing, very weird from our perspective, and pretty fascinating.
Why DeepSeek
Dio-Crystal
Zhihu Knowledge Member
By chance, not long ago, I had an offline meetup with a few folks from DeepSeek :)
On the bright side, they fit the law of "if they're smarter than me, they're not as handsome; if they're handsomer, they're not as smart."
Another bright side is that behind those pairs of eyes, there’s something special—something I’d describe in words as a free wind.
There are already many posts glorifying DeepSeek, but after meeting them in person, at least from my perspective, some missing pieces in my mind were filled in, so I couldn’t help but write this down.
DeepSeek actually shares similarities with the legend of Huo Qubing chasing the fleeing Xiongnu northward. Do you think Emperor Wu of Han anticipated someone would fight all the way to Lake Baikal? I bet he didn’t, so naturally, you wouldn’t either. Maybe Emperor Wu had some premonitions about Huo Qubing, but there’s no doubt that everyone, including the emperor, was 200% certain that Wei Qing wouldn’t make it to Lake Baikal.
There’s logic here, so it seems like destiny, yet not entirely unpredictable.
I don’t think DeepSeek’s success comes from the kind of motivational nonsense about unwavering belief or long-termism—history has no shortage of such people.
DeepSeek likely stems from China’s educational dividends (people) under the evolution of globalization, combined with a team and leadership committed to doing 0-to-1 work (organization).
About People
Due to work requirements, I’ve become something of a talent scout, interviewing many potential stars, prodigies, or graduates from elite programs (or equivalent systems). Over the past five years or so (maybe I’m slow to notice), I’ve genuinely sensed an implicit boundary among fresh graduates. If I had to describe it, those who cross this boundary possess a self-contained technical capability—something quite remarkable. Within a certain closed-loop technical scope (not full-stack, but semi-stack), they can almost single-handedly complete all tasks within their research domain. When faced with detailed problems, they can peel back layers to solve them, much like the geeks in American TV shows. If they can do that, they’re pretty much in.
This wasn’t the case before, even for me. From school onward, the idea of division of labor was ingrained—you lay bricks, I mix mortar. That’s because every system was complex enough to require collaboration, with little room for big-picture thinking. Learning English was mostly about reading; we were all just screws in the revolutionary machine. But with certain shifts in globalization, even as systems grow more complex, many interfaces have become simpler through global standardization and layered abstraction. My crude understanding is that globalization, flattening, and even a kind of demystified hierarchical optimization of technology—like Python, GitHub, and arXiv—have made knowledge, sharing, and programming as effortless as Taobao replacing shopping malls. The benefit is broader vision and easier onboarding (what Teacher Y called "gratitude" likely refers to this, but globalization itself is about equality and mutual aid—only Trump-style anti-globalization requires gratitude).
A person’s mental capacity is limited (not IQ), covering only finite complexity. Beyond a threshold, it’s chaos :) For example, if you encounter a legacy, undocumented 100K-line codebase, even the most brilliant mind will drown in endless darkness, unable to focus on anything else. But if someone abstracts that module layer by layer into a 1K-line model, suddenly you see the whole picture. Then, if you’ve had hands-on experience, you realize the essence of solving problems at each division of labor is similar—like how wireless base stations transmit high-frequency weak signals, while substations transmit low-frequency strong currents, both fundamentally telecom issues. Then you can dive into a 10K-line segment of the model to optimize solutions. And then, you’ve crossed the boundary.
After that comes the question of why: post-WTO education reforms, or globalized education paired with China's traditional strengths in numbers and Gu Jar training (yes, even there, the surviving "Gu kings" are starting to outnumber overseas' interest-based selection). India might be similar, though with a different skill tree?
Regardless, as far as I can see, there are more and more geniuses who can independently and swiftly tackle complex technical tasks like the wind.
About Organization
DeepSeek insists on only doing 0-to-1 work, leaving 1-to-N tasks to open-source and commercial companies. Many think open-source is a loss, but it’s not. The Bible is the world’s largest open-source network (scripture is a compressed network), isn’t it? Try using DeepSeek R1 to teach you two-digit multiplication—you’ll see it defaults to vertical multiplication, not the Indian lattice method (though R1 knows that too). How much is that worth?
Sure, DeepSeek’s official site offers services, but it’s still a high-efficiency testing ground for 0-to-1 work :) From a commercial deployment perspective, whether in user ramp-up, hardware deployment, service quality assurance, security, reliability, or fault tolerance, there are gaps. Real commercial deployment is a hassle. Few clients in the world can afford EP machines.
To use an analogy, 0-to-1 is like Huo Qubing’s cavalry—light on supplies, unconventional in camp setup. So I’d guess DeepSeek’s code prioritizes executability, with software structures kept simple so everyone roughly understands and can quickly merge changes, rolling back if issues arise.
In contrast, pick up any commercial software, and behind every function lies a pile of black-box libraries and scripts. Any change requires complex processes. Sigh—I’ve barely coded in years because setting up a local workspace for any project now takes a full day.
Admit it, your company’s codebase is the same! No need to curse—that’s how the world works. 0-to-1 and 1-to-N operate differently. The latter relies on tedious engineering and strict processes. East Asia’s industrial rise over the decades—TSMC, Huawei, BYD —all hinges on this model of engineers and workflows. In 1-to-N, every action and outcome has a deterministic N-fold amplification. "For want of a nail, the shoe was lost; for want of a shoe, the horse was lost..." That story is about 1-to-N. To prevent such cascades, the next empire holds a retrospective, turns war into an engineering operation, and breaks it down to the nail-supply level—ensuring no single person exceeds their error-free complexity limit. This engineering remains unbeatable until it rots or is replaced by newer productivity. 1-to-N tests a different kind of ability, requiring great companies and geniuses.
So don’t blame Wei Qing for not reaching Lake Baikal. Just as Huo Qubing probably had no clue how to plan cooking for 100,000 or handle the ensuing 100,000 bowel movements, there’s a saying: "Huo Qubing was noble but spared no soldiers." 0-to-1 and 1-to-N each have value. For Emperor Wu, if Huo Qubing failed, it was just a lifetime’s savings lost. But if Wei Qing lost, the people and the state might be finished.
DeepSeek’s approach leans more toward 0-to-1, so in those folks’ eyes, the wind is free. But they’re not immune to complexity—there’s likely a wall between algorithms and infrastructure at DeepSeek, and the wind doesn’t cross it easily. If DeepSeek ever scales services or ventures into chipmaking, more walls and processes will rise, and even the wind-like cavalry will have to dismount.
Hmm, why ramble about this? Mostly because I’m pondering how to balance 0-to-1 in a 1-to-N company. Probably many are thinking the same—no one can replicate DeepSeek’s current purity in 0-to-1. I often push 0-to-1 folks into 1-to-N trenches to feed horses and line up, feeling the harshness and cost of the front lines, avoiding over-optimism or, like Teacher Y, discovering that cooking for 100,000 is harder than imagined and sliding into pessimism. I also encourage 0-to-1 folks to hack 1-to-N codebases, creating lightweight versions for quick experiments.
But it might not be enough.
Below are some weekend musings, scribbled casually.
Break 1-to-N division boundaries locally, creating self-contained, meaningful environments. Replace commercial black boxes with open-source parts, understand hardware architecture, hack away cumbersome controls, and offer minimalist interfaces. Most quant trading firms operate this way, valuing public platform efficiency, letting everyone see a manageable, self-contained system from the top down. Where the law doesn’t forbid, act.
Challenge authority. 1-to-N organizations need authority for multi-layered command. But in 0-to-1, where direction is uncertain, authority is dangerous. Imagine Huo Qubing deciding to head east on the steppe—most 1-to-N teams would spend energy proving east is right (since in 1-to-N, east or west are just distances; hesitation loses). But 0-to-1 is like the Four Crossings of the Chishui River—question yesterday’s plan, its logic, who can snap me out of it (bottom-up decisions)? Or maybe no leadership decisions at all—just robust public platforms where teams advance pluggably, filtered by performance (like a quant firm’s strategy mechanism). Most 1-to-N leaders hate being decided for—or not deciding. My ramblings here are practice for getting slapped fast :)
Avoid project trial management but prevent chaos. Wei Qing’s marches had plans—timing, rendezvous. Huo Qubing’s arrival at Lake Baikal—how many days off schedule? Budget overrun? PCR needed? For 0-to-1, the key might be setting a grand vision, defining the goal, and instilling it in everyone. Avoid greed, anger, delusion, arrogance, doubt, and rigid views. In human terms: reduce external motivation, boost internal, stay goal-centric, reflect daily, but advance ruthlessly. HR calls it OKR?
Leaders must step onto the battlefield. Reducing hierarchy and iterating fast means a runnable environment beats ten polished PPTs. 1-to-N is multiple-choice; 0-to-1 is ambiguous multi-choice. Easier said—just dive into code? No. The big shots upstairs are used to intrigue and effortless victories, but generals below grind hard, sacrificing much (especially power). Champion marquises clash with fame and fortune. "Generals die in battle" isn’t just a saying—it’s real, not something armchair strategists can fake by "taking charge."
Weekend ramblings—mostly admiration for DeepSeek’s 0-to-1 breakthroughs, plus self-reflection.
I once wrote about innovation: [link]. It’s basically about incremental innovation in a 1-to-N environment. Back then, I didn’t grasp 0-to-1.
Reflect, reflect.
Think about it—greatness can’t be planned? Maybe not! From a national perspective, there might be destiny after all.
I used the Chinese app Cherry Studio for convenience; it seems to be the best interface for using cloud-hosted LLMs. Nobody in the West knows it exists.
I can attest to this.
One of the first things I did with DeepSeek, knowing that it was a Chinese model, was to prompt it, as a joke, for a poem about pigeons in the style of Du Fu. It surprised me, replying with a polished, if not very inspired, poem that not only obeyed the general conventions of rhyme, meter, and parallelism, but even rhymed in a way that isn't natural for Mandarin yet would have been in Middle Chinese (情 and 聲)! I've since kept prompting it occasionally with increasingly bizarre and unhinged requests for Chinese poetry in various forms, including:
- Improving a silly poem about a lonely cat, which it did reasonably well
- A recontextualisation of two of Martial's epigrams (5.34, 10.61)
- Two poems about a fat pigeon
- A poem recasting Lu You's pet cat as a pigeon
- Two unhinged poems about "carnivorous pigeons benching 80kg"
- Another unhinged poem about "pigeons in pink suits eating penguins while pretending to be fish" (suitably for such an insane prompt, it took wild liberties with the metaphors for "pink suits", "eating", "pretending", and "fish")
- Rewriting the Ballad of Mulan with a pigeon in place of Mulan
(please don't wonder too much about why I make a habit of feeding unhinged pigeon prompts to be made into Classical Chinese poetry, I just find it very funny)
All of them have had something of beauty in their construction, even if they are a bit basic, and some have surprising bite. (The two fat-pigeon poems were interpreted, respectively, as a metaphor for decadence and over-abundance being a sort of gilded cage, and as a more nostalgic, regretful look at past glory.) I certainly couldn't write poetry of the same quality, at least not without extensive dedicated study.
And I find the surprising poetic knack almost less impressive than its general responses, where it effortlessly weaves literary allusions, updated with context and modern words, into sentences that wouldn't be out of place in a planned speech written by Chinese speakers far more erudite than myself, complete with references to classical texts where appropriate. This is especially true if you engage it with prettier language or some pretension towards the classical register. (If you write to it in very conversational Chinese, I find that it replies in the same register but with a more official phraseology.) The explanatory notes in the pastebin illustrate this elegant colloquial Chinese (and I have to note that this is already rather on the colloquial and vulgar side for DeepSeek commenting on poetry; it can do much better). I wouldn't be overly surprised if, at some point in the near future, speeches from the CCP (especially from lower-rank functionaries) suddenly improve significantly in lyricism and style from people prompting DeepSeek or some similar AI; for what it's worth, IIRC Taiwan's official communications often hew somewhat closer to the classical language (though still modern), so it may be less of a shift there.
Are there bits and pieces that could be improved? Yes, of course. I've caught it making occasional mistakes in rhyme, and some of the metaphors, plot, and phraseology in the poetry could be better. The poem above is a bit disjointed, with the last phrase being a bit odd; one time I tried to get it to parse a passage in an old Chinese agricultural encyclopedia (齊民要術) and found that it misunderstood a character. But these are mistakes a person could also make, and in general I would eat my left nut to be as good as DeepSeek is at Classical Chinese.
(Though I might be a chump and not realise that it's been feeding me shit poetry this whole time since I'm not actually good at Classical Chinese)
I've also tried prompting DeepSeek with requests for original 和歌 (without even using bizarre prompts), and it is much, much worse at this than at 絕句 or 律詩 or whatever other Chinese poetic form; it often can't even keep the number of syllables straight! So it does seem to be mostly trained on Chinese data, which would naturally corrupt Japanese output in something as finicky as poetry, especially when so many of the logographic glyphs are shared between the languages but carry different phonetic content. I wonder what would happen if you tried prompting it for poetry in other (non-Sinosphere, non-English) languages.
Thanks for this. I guess to me it doesn't really seem particularly 'Chinese' or 'Western'; I'm not sure I know enough about either to judge, although I never really bought the ultra-wignat view that there was some kind of magic special sauce to the European mind that the 'pacific hivemind worker bees' or whatever didn't have.
What it does feel like is a relatively average LessWrong-type post by an intelligent young man keen to share his intelligence, and knowledge thereof, with others. I don't mean that in a bad way, but I'm not sure I learned anything.
I don't think I've ever seen a LessWrong post like this. LessWrong nerds are their own self-contained verbal referential culture. This guy is more like an older generation forum nerd geeking out about military history and using some wuxia tropes (Gu Jar, crossing the boundary of golden core cultivation/whatever…). It's also very dense. But sure, the Chinese aren't aliens.
This is fantastic! When I first started it up and realised I'd need a DeepSeek API key, and then saw the list of possible models I could use, all needing API keys, I was a bit put out; but each model also comes with a link directly to its dev page. And that seems to be the level of thought and care they're putting into every aspect of the app. It's great.
I noticed the models for image gen are kind of shit, though. The FLUX.1 schnell pro model gave me an Asuka with three legs; the FLUX dev LoRA model was still processing after a minute, so I gave up; and the Janus Pro 7B model gave me a noiseless puppet Asuka with her hands down her pants. Stable Diffusion's large models both gave me cross-legged Asukas (in the turbo model she's being harassed by a ghost slug?), and the 2.5 model did this, which I find intensely interesting but would not call a good attempt at drawing Asuka Langley sitting cross-legged on a chair. The best model was the SDXL base, which put her arms on backwards but at least had her sitting cross-legged on a chair. Which surprises me; I assumed my preference for local AI when possible was hamstringing me. I assume other providers would be better? I can't figure out how to change it from SiliconFlow, though. Anyway, my primary point is: thank you for this, it's a game changer.
There are some pretty popular galleries of that, so maybe it's not all bad.
Glad to be of help!
These are just the models available on SiliconFlow. The app is developing very rapidly, as far as I can tell; hopefully they improve the image part, but it's clearly not a priority. Flux-dev is good, and Flux-Schnell-pro is decent too. I'm not sure it's possible to customize much here, short of implementing another model provider yourself (the app is open source, of course).
You can buy OpenRouter credits, and then you get all the models in the world (well, almost) at once.
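The wiring is the standard OpenAI-compatible pattern; a minimal sketch, where the model slug is an assumption on my part (check OpenRouter's model list for current identifiers):

```python
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint: one key, many models.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-chat",  # assumed slug; swap for any listed model
    messages=[{"role": "user", "content": "写一首关于鸽子的绝句。"}],
)
print(resp.choices[0].message.content)
```

One key instead of a dozen provider dashboards is most of the appeal.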
That's a really awful Asuka; smells like a settings problem to me. Might it be that you're using the wrong sampler, or the wrong resolution, or something? My year-old local model can do a very decent Asuka, though I've discovered that I'm getting much worse results lately as well, and I don't even know why. I've clearly forgotten all the arcane lore about which Karras SDE++ sampler you're supposed to use.
Those were the default settings I tested all of the models on; it must be a configuration mix-up behind the scenes. But I have noticed I'm getting some shocking results lately with my local model too; it's less consistent than it used to be.
Yeah, almost certainly a resolution problem. Models are trained to work with pretty specific width/height ratios, and if you throw them off, things get ugly.
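If anyone wants to check this locally, here's a minimal sketch with the diffusers library and SDXL base; the near-native 1024x1024 is from SDXL's release notes, and the off-grid size is just an arbitrarily bad one:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# SDXL base was trained around 1024x1024 (plus a set of aspect-ratio
# buckets); resolutions far outside that tend to duplicate limbs.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "Asuka Langley sitting cross-legged on a chair"

# Near the training distribution: usually coherent anatomy.
pipe(prompt, width=1024, height=1024).images[0].save("asuka_native.png")

# Far off the training distribution: expect extra legs.
pipe(prompt, width=512, height=1536).images[0].save("asuka_offgrid.png")
```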