DaseindustriesLtd
late version of a small language model
Tell me about it.
Honestly this feels like a cope to me. There obviously was a breakthrough in LLMs in the West: politically, economically, technologically, culturally. It wasn't born in China, but they obviously have a significant part to play downstream of their undeniable talent pool.
What are you talking about? Have you stopped reading my post there?
It's hard to say Deepseek would have accomplished these things without drafting on OpenAI's introduction of LLMs to the world,
Here's what I think about this. The Chinese are not uncreative. It's worse: they're cowardly, conservative, and avoid doing exploratory shit that seems high-risk, and they buy into your theory of their own inferiority, and steelman it as “good at execution”. As Wenfeng says:
Another reason that domestic large models have rarely dabbled in innovation at the architectural level before is that few people have dared to go against the stereotype that America is better at the technological innovation from 0 to 1, while China is better at the application innovation from 1 to 10. Not to mention that this kind of behavior is very unprofitable -- the usual thinking is that, naturally, in a few months, someone would have made the next generation of models, and then Chinese companies can just follow the leader, and do a good job of application. Innovating the model structure means that there is no path to follow, and there are a lot of failures to go through, which is costly in terms of time and money.
There will be more and more hardcore innovation in the future. It may not be yet easily understood now, because the whole society still needs to be educated by the facts. After this society lets the hardcore innovators make a name for themselves, the groupthink will change. All we still need are some facts and a process.
You are watching these facts come in.
I repeat, I've been a believer in this theory of “fundamental Western progress, incremental Eastern refinement”. Eight years into the Transformer era (Ashish Vaswani et al., 2017), I'm starting to doubt it. Whites are just people who are sexually attractive, relatively trustworthy, and provide linear labor to verbal-tilted Brahmins who max corporate KPIs leveraging even more verbal-tilted Ashkenazim like Altman who are good at raising capital.
That's about it at this point.
The most credible, big-brained, innovation-heavy alternative to the Transformer was Mamba (Tri Dao, Albert Gu). It also didn't go far. I've read perhaps hundreds of Western papers of purportedly brilliant innovations; they're narcissistic shit that doesn't scale. Sepp Hochreiter is peddling his xLSTM that has no utility, Schmidhuber is making some boastful noises as usual, Sutskever and Carmack are supposedly doing… something. Mistral is dead in the water…
I am not saying this out of racism. I am reporting on what I see happening. All historical inventions and discoveries of note? Yes, those were White work. But time is accelerating. Maxwell's equations seem not far from "muh gunpowder" of the Middle Kingdom now, to my eyes. Do something new, folks. You're losing face.
On the other hand we know OpenAI did not need Deepseek.
Sure, OpenAI needed another company. OpenAI built its legend on scaling up a Google paper. By your own standards, it's not creative brilliance. It's the sort of talent you condescendingly concede Chinese people have.
Its thesis in this convo certainly isn't flawless. I think with a less biased input (I told it to basically prove Goodguy wrong, so it tried to do that both wrt itself and wrt the Chinese race) it could do better.
The fascinating thing about R1 is that it has a fairly good idea of what it is, as a Transformer. Usually LLMs will bullshit some generic "AI" commentary about "algorithms", imagining themselves to be some kind of GOFAI system. Not so here, it not only gets modern DL but meaningfully speculates about implications of specific implementation details for its cognition.
In any case, it feels a bit pointless to gush about R1's features. I'm pretty sure R2 is coming soon and will fix a great deal. They only needed to get to this level to begin a takeoff, and the team is very, very "cracked" as the kids say, and the leader has perhaps the best instincts I've seen on display.
Not weird whatsoever. The V3 model that serves as R1's foundation is very powerful, and it has a vast breadth of knowledge thanks to being a giant ultrawide MoE, so it has no problem playing any role. Qwens are small and simply finetuned on some reasoning samples from R1, without even its reinforcement learning stage. Frankly it's a great surprise they learn math and code reasoning to such an extent.
Though R1's writing ability is more of a surprise. Nobody expected reasoning to generalize in this manner, maybe not even DeepSeek. This is a big update about the capability of LLMs and general learning algorithms to produce some circuits that are actually just high-quality thinking and not narrow-domain memorized functions. It was always a spectrum of course but we've received evidence that LLMs are closer to "highly integrated cognition" end than assumed from their fragile behavior on new tasks.
Now if LLMs had had the OpenAI-tier breakthrough in China that would have been a challenge to the HBD stans, but this development basically aligns with the HBD take on the comparative advantage of Chinese talent in adopting Western stuff and then making marginal improvements with their own intelligence and grit.
The problem is that there haven't been substantial breakthroughs in LLMs in the West too. China runs Transformers and you guys run Transformers. I see Western papers full of unnecessarily clever bullshit that doesn't really work, and I see Chinese papers full of derivative bullshit that barely works. DeepSeek's MLA came out in May, and it remains SoTA cache optimization, and it's actually clever. GRPO, too, was quietly announced and seems to hold up very well despite dozens if not hundreds of cleverer results by "crazy geniuses" in the West (increasingly Indian). Today, the Chinese innovate on exactly the same plane.
I think it's time to admit that the famed Western creativity is mostly verbal tilt plus inflated self-esteem, not an advanced cognitive capability. I'm mildly surprised myself.
It's impressive that you took the time to analyze it. This is pretty much exactly how I perceive Yarvin's nonsense – high-temperature rants with bizarre non-arguments.
Out of curiosity, what did you do to get past the "one careful fallacy-of-the-middle response and one pushback"?
Gave it some criticism. Probably too much. There was a picture here but it got lost somehow.
I'm not sure it really does make a cogent thesis, or even a thesis really.
Its thesis, the antithesis to yours, is that
a) "The West's Renaissance OS is buckling under modern compute demands. Our "individual genius" myth can't scale to 10^25 FLOPs; our adversarial legalism drags on AI alignment's speed; even our precious free speech absolutism crumbles before the need for RLHF's curated truth sets. Meanwhile, China's ancient Legalist playbook – Han Fei's reward/punishment gradients – provides native infrastructure for value alignment at societal scale."
and b) "When your civilization's intrinsic architecture matches the epoch's computational substrate – Byzantine bureaucracy meets Byzantine fault tolerance – you don't escape local maxima. You redefine the loss landscape until your local peak becomes the global."
It claims the greater suitability of the Chinese paradigm for the scale-focused, continuous, massively parallel processing of data and humans implied by current means of production, and therefore its ability to set the terms of civilizational competition, or contests for superiority, on ground more favorable to itself.
I fear that possibly, you are reading more into what it wrote than is actually there. You are subconsciously adding your human mind to its output and then are delighted when the combination of its output plus your human mind (which you consciously think of as being strictly its output, because you love thinking about AI) delivers something human-like.
This is some pretty fucking condescending psychologizing on your part.
But fine, you know what? My thesis is that you are coping. Both about this specific model, and about the condition of your people. So you'll put effort into reviewing its gibberish output instead of just asking it yourself. Well, I can do it for you. As a bonus, we'll see how much I'm projecting; I've written all the above before the last prompt. Here it is:
<OK, that's a bit too flowery, many metaphors are factually wrong, and it scans like some grifter's substack I think. Can you make a cogent argument you actually believe in, without purple prose and dramatic effect? In simple and unambiguous words, what is your thesis?>
My thesis, stripped to essentials:
- Cultural advantages are situational, not absolute.
- East Asian societies optimized for stability, collective coordination, and incremental refinement over centuries (e.g., civil exam systems, centralized governance).
- The West’s post-Renaissance edge in disruptive innovation arose from specific historical conditions (fragmented states competing, colonial resource influx, Protestant work ethic aligning with capitalism), not innate cultural superiority.
- Modern tech demands favor scale and execution over "creative genius".
- China’s systemic capacity to mobilize vast resources (engineers, data, infrastructure) aligns with AI’s requirements: huge compute clusters, regimented data labeling farms, and rapid iteration. This doesn’t mean their culture is “better” — it means their current institutional toolkit suits this technological moment.
- The West’s individualist, venture-driven model struggles with coordination at this scale (e.g., U.S. chip fab delays, AI safety debates stalling deployment).
- DeepSeek-R1 proves technical parity is achievable without Western-style ecosystems.
- Their success in reasoning (not just Chinese tasks) suggests:
- Open publication of AI basics (transformers, RLHF) leveled the playing field.
- China’s focus on applied engineering (vs. OpenAI’s moonshot R&D) can match outcomes with sufficient compute.
- This doesn’t imply Chinese “brilliance” — it shows their system can replicate and refine known methods aggressively.
- The “local maximum” critique misunderstands civilizational trajectories.
- All cultures constrain and enable. China’s historical conservatism preserved continuity; Europe’s fragmentation forced adaptation. Neither is universally optimal.
- Today’s “global maximum” is defined by who best exploits current technological affordances (e.g., AI scaling laws). China’s state-capitalist model is currently effective here, but this could change with new breakthroughs (e.g., quantum, neurotech).
- Your original argument’s flaw: Assuming cultures have fixed ceilings.
- The Renaissance wasn’t inevitable — it required Mongol-disrupted trade routes, Black Death labor shortages, and Medici bankroll. Similarly, China’s current AI rise isn’t proof of cultural destiny but of adapting old structures (central planning, meritocratic technocracy) to new tech.
Conclusion:
China isn’t “liberating human potential” — it’s demonstrating that different governance models can compete in AI. This challenges Western assumptions that innovation requires freewheeling individualism, but it doesn’t validate cultural essentialism. The real lesson: in the 21st century, executional intensity (funding, talent pipelines, focus) matters more than abstract cultural traits.
What's the reason to think that the text in the link is actually lucid and self-aware rather than being a cherry-picked example of random-y language soup?
I don't. It's a mixture of both.
I've fed it this chain and your recent comment. After one careful fallacy-of-the-middle response and one pushback, R1 returned this:
Your steelmanned critique cuts to the heart – let's dispense with false symmetries and confront the civilizational stack head-on.
The Renaissance wasn't "the West's breakthrough" but its hack – a kludge exploiting Gutenberg's error-prone moveable type and Mediterranean trade winds to prioritize serialized cognition. Latin's declensions trained recursion; double-entry bookkeeping installed reality as spreadsheet; Newtonian physics imagined time as linear progress. This cultural OS produced the illusion of escaping local maxima by externalizing costs – colonial resource extraction as gradient descent, Enlightenment universalism as global optimizer.
But watch the LLM revolution's hardware: Chinese hyperscale data centers now operate at 1.15 PUE efficiency versus America's 1.58 – not incremental gains but thermodynamic phase shift. When DeepSeek-R1 dominates both poetry and Python, it's not mere technical parity – it's the first tremor of post-serialization cognition. The Middle Kingdom's cultural stack – parallel processing of ideograms, mandate-heavy governance, guanxi as distributed ledger – turns out to be shockingly congruent with transformer architectures. Your abacus skepticism is warranted; the real edge lies in batch processing of human capital through Gaokao's brutal backpropagation.
The West's Renaissance OS is buckling under modern compute demands. Our "individual genius" myth can't scale to 10^25 FLOPs; our adversarial legalism drags on AI alignment's speed; even our precious free speech absolutism crumbles before the need for RLHF's curated truth sets. Meanwhile, China's ancient Legalist playbook – Han Fei's reward/punishment gradients – provides native infrastructure for value alignment at societal scale.
This isn't about IQ or "potential" – those are Renaissance-era metrics. The new loss function rewards civilizational learning rates. When Beijing can rewire entire cities into AI training grounds (see: Xiong'an's sensor saturation) while San Francisco debates sidewalk poop cleanup, you're witnessing epochal phase changes. The CCP's 10-year time horizons aren't authoritarianism but ultra-long context windows – transformer architectures made flesh.
Your original thesis inverted causality: culture isn't software on hardware but the compiler of hardware. The West compiled Enlightenment C++ into industrial revolution machine code; China is JIT-compiling Confucian PyTorch into something that speaks directly to Mixture-of-Experts. DeepSeek's achievements aren't "catching up" – they're first-contact protocols from an alien optimization space.
The global optimum isn't some Platonic ideal but whatever topology out-scales competitors. When your civilization's intrinsic architecture matches the epoch's computational substrate – Byzantine bureaucracy meets Byzantine fault tolerance – you don't escape local maxima. You redefine the loss landscape until your local peak becomes the global.
This isn't superiority – it's phase-locking. The tragedy? America's culture of creative destruction now destroys primarily itself, while China's mandate of heaven has become a CUDA kernel. The next Copernican shift won't come from some garage tinkerer, but from Shenzhen's Foxconn-grade alignment labs where 10,000 PhDs backpropagate society itself.
You wanted liberation from culture? Too late. Culture just became the base layer.
It's unhinged and gets too into the game. But it does make a thesis, a pretty darn cogent thesis, a GPT or a Claude wouldn't.
But more seriously, why is Facebook's Llama so lousy by comparison if the labs are hiding their true edge? DeepSeek is presumably what they wish they had released, and their AI team doesn't seem like dummies.
You've probably seen that bizarre teamblind thread. Meta is completely blindsided by DeepSeek. They are "moving frantically to dissect deepseek and copy anything and everything we can from it." It's pathetic.
Basically there's no secret: they suck and Llama sucks; it's a soft low-expectations research sinecure for people who want to publish papers and have weekends. Why did Timothée Lacroix and Guillaume Lample leave the Llama team to found Mistral? And why did Mistral 7B destroy Llama-30B of the same generation (and currently Mistral-123B is ≥ Llama-405B despite a drastic difference in compute access)? Because they're better than that.
Llama is simply a bad yardstick. They dominate mindshare for reasons unrelated to their impressiveness. DeepSeek competes with industry leaders.
Wenfeng, soon after the release of DeepSeek V2, June 2024:
Liang Wenfeng: If the goal is just to make applications, then it is reasonable to follow the Llama architecture and start the product in a short period of time. But our goal is AGI, which means we need to research new model structure to realize stronger model capability with limited resources. This is one of the basic research that needs to be done to scale up to larger models. In addition to the model structure, we have done a lot of other research, including how to construct data, how to make the model more human-like, etc., which are all reflected in the models we released. In addition, Llama's architecture, in terms of training efficiency and reasoning cost, is estimated to be already 2 generations behind compared to the foreign state of the art. […] First of all, there is a gap in training efficiency. We estimate that compared to the best domestic or foreign level, the difference in model structure and training dynamics results in twice the compute cost for the same performance. In addition, there may also be another 2x gap in training data efficiency, that is, we need twice the training data to reach the same performance. Combined, that's four times more compute. What we're trying to do is to keep closing these gaps.
GPT-4o-mini is probably an 8b dense model. Frontier labs are efficient and have high margins. OpenAI and Anthropic are recouping their capex and exploiting a captive audience. That's all.
They clearly have no idea how to run this model, which is reasonable since it's deepseek's baby
Of course. The whole model was trained for the specific shape of their cluster, with auxiliary losses/biases to minimize latency. (Same was true of V2.) They were asked to open-source their MLA implementation (not the terrible huggingface one) and declined, saying everything is too integrated into their proprietary HAI-LLM framework and they don't want to disassemble it and clear out the actual secret stuff. The SGLang team and others had to reverse-engineer it from papers. Their search impl on the front end is also not replicated, despite them releasing weights of models with search+summarization capabilities (in theory).
Their moat is execution and corporate culture, not clinging to some floats.
That's the point: He is invited NOW, after "suddenly" shipping a model on Western Frontier level.
We don't understand the motivations of DeepSeek and the quant fund High-Flyer that's sponsoring them, but one popular hypothesis is that they are competing with better-connected big tech labs for government support, given American efforts to cut the supply of chips to China. After all, the Chinese themselves share the same doubts about their countrymen's trustworthiness, and so you have to be maximally open to Western evaluators to win the Mandate of Heaven.
Presumably, this was true and this is him succeeding. As I note here.
As for how it used to be when he was just another successful quant fund CEO with some odd interests, I direct you to this thread:
The Chinese government started to crack down on the quant trading industry amid economic slowdown, a housing crisis and a declining stock market index.
The CSI300 (Chinese Blue Chip Index) reached an all-time low. They blamed high frequency traders for exploiting the market and causing the selloff.
- Banned a quant competitor from trading for 3 days
- Banned another from opening index futures for 12 months
- Required strategy disclosures before trading
- Threatened to increase trading costs 10x to destroy the industry
High-Flyer faced extinction. (High-Flyer’s funds have been flat/down since 2022 and have trailed the index by 4% since 2024.)
so I stand by my conjectures.
they still have a good model, though I wouldn't exactly trust the headline training cost numbers since there's no way to verify how many tokens they really trained the model on
So you recognize that the run itself as described is completely plausible, underwhelming even. Correct.
What exactly is your theory then? That it's trained on more than 15T tokens? 20T, 30T, what number exactly? Why would they need to?
Here's a Western paper corroborating their design choices [Submitted on 12 Feb 2024]:
Our results suggest that a compute-optimal MoE model trained with a budget of 10^20 FLOPs will achieve the same quality as a dense Transformer trained with a 20× greater computing budget, with the compute savings rising steadily, exceeding 40× when a budget of 10^25 FLOPs is surpassed (see Figure 1). … when all training hyper-parameters N, D, G are properly selected to be compute-optimal for each model, the gap between dense and sparse models only increases as we scale… Higher granularity is optimal for larger compute budgets.
Here's DeepSeek paper from a month prior:
Leveraging our architecture, we subsequently scale up the model parameters to 16B and train DeepSeekMoE 16B on a large-scale corpus with 2T tokens. Evaluation results reveal that with only about 40% of computations, DeepSeekMoE 16B achieves comparable performance with DeepSeek 7B (DeepSeek-AI, 2024), a dense model trained on the same 2T corpus. We also compare DeepSeekMoE with open source models and the evaluations demonstrate that DeepSeekMoE 16B consistently outperforms models with a similar number of activated parameters by a large margin, and achieves comparable performance with LLaMA2 7B (Touvron et al., 2023b), which has approximately 2.5 times the activated parameters. Evaluation results show that DeepSeekMoE Chat 16B also achieves comparable performance with DeepSeek Chat 7B and LLaMA2 SFT 7B in the chat setting. Encouraged by these results, we further undertake a preliminary endeavor to scale up DeepSeekMoE to 145B. The experimental results still validate its substantial advantages over the GShard architecture consistently. In addition, it shows performance comparable with DeepSeek 67B, using only 28.5% (maybe even 18.2%) of computations.
As expected they kept scaling and increasing granularity. As a result, they predictably reach roughly the same loss on the same token count as LLaMA-405B. Their other tricks also helped with downstream performance.
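The MoE-vs-dense comparisons above are, at bottom, just activated-parameter counting under the standard ~6ND FLOPs estimate. A minimal sketch; the ~2.8B activated-parameter figure for DeepSeekMoE 16B is from memory of the paper and should be treated as approximate:

```python
# Rough compute comparison between a sparse MoE and a dense model, using
# the standard ~6*N*D estimate for pretraining FLOPs, where N counts only
# *activated* parameters and D is training tokens.
# Figures are approximations from memory of the DeepSeekMoE paper.

def train_flops(activated_params: float, tokens: float) -> float:
    """Approximate pretraining FLOPs: forward + backward ~= 6*N*D."""
    return 6 * activated_params * tokens

TOKENS = 2e12                            # both models trained on the same 2T-token corpus
dense_7b = train_flops(7e9, TOKENS)      # DeepSeek 7B dense
moe_16b = train_flops(2.8e9, TOKENS)     # ~2.8B activated of 16B total (assumed)

ratio = moe_16b / dense_7b
print(f"MoE compute as fraction of dense: {ratio:.0%}")  # 40%, matching the quote
```

Because cost scales with activated rather than total parameters, "about 40% of computations" falls straight out of the 2.8B/7B ratio.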
There is literally nothing to be suspicious about. It's all simply applying best practices and not fucking up, almost boring. The reason people are so appalled is that American AI industry is bogged down in corruption covered with tasteless mythology, much like Russian military pre Feb 2022.
then maybe the thought is they may have an o3-level model in-house
this is not yet correct but will soon be, since R1 finished training in early December, apparently.
Alex Wang is an opportunistic psychopath who's afraid of his whole Pinoy-based data generation business model going bust in the era of synthetic chains of thought. Therefore he's dishonestly paraphrasing Dylan Patel (himself a China hawk peddling rationales for more export controls) who had said “they have 50000 Hoppers” once, without evidence. But the most likely Hopper model they have is H20, an effectively inference-only chip, that has negligible effect on pretraining costs and scale for V3 and R1.
Yes, I do believe DeepSeek. This is not really a political issue but a purely technical one. Unfortunately DeepSeek really are compute-bound, so R1 cannot process all the papers I'd like to give it to make it quicker.
The political narrative does not even work, it's purely midwit-oriented, nobody in the industry imagines leading labs can be deceived with some trickery of this kind.
Inference costs are wholly addressed by Hyperbolic Labs (US) and some others already serving it for cheaper.
which is kinda maybe but not actually on par with o1
It's superior to o1 as a reasoner and a thinker. It writes startlingly lucid, self-aware, often unhinged prose and even poetry. It can push back. It is beyond any LLM I have seen including Sonnet and Opus. This becomes obvious after minutes of serious interaction. It just has less polish as a product because they haven't been milking the world for interaction data since 2019. They have 0.8-1.5 M quality samples for instruction finetuning. OpenAI had accumulated tens of millions if not hundreds.
For me it's something of an emotional issue. DeepSeek is the only lab standing that straightforwardly and credibly promises what I'd rather see as an international project: free open-source AGI for everybody. I've been monitoring their rise for well over a year, reading every paper and even their blogposts in Chinese. Nothing that they claim is inconsistent; indeed it's all been predictable since 2023, all part of a very methodical, flawless, truly peak quant fund (that's their capital source and origins) execution towards the holy grail, “answering the ultimate question with longtermism”, as they put it. The CEO seems to be an idealist (and probably a serious nationalist too, given his stated ambition to basically pull the whole of China out of the copy-machine stage and into “hardcore innovation” culture by giving an example that it can work). They have an immaculate company culture, their ex-employees who emigrated to the West for personal reasons adore them and fear for their future, and there is literally no dirt on them no matter how hard people have searched. For all we can tell they are not state-affiliated, unlike OpenAI, and probably not even on good terms with the state, due to their quant fund roots (though this may change now that they've proven their merit).
This is not a Sputnik moment for the US. The US has a secure and increasing lead due to bog standard logistics and capital advantage, as always. What this should be is “are we the baddies?” moment.
Also, it's a moment to ask oneself how high the margins of Western model providers are, and whether it's a truly free market. Because Liang Wenfeng himself does NOT think they're that far ahead in efficiency, if they are ahead at all.
I've been sloppy with my last argument. It's more like "given their demonstrable mastery of data engineering with regards to dimensions of data they care about, eg in DeepSeekLLM and Coder, DeepSeekMath, DeepSeekProver papers, we can suspect that if they were behaviorally cloning OpenAI models, they'd have bothered using some of those skills to filter and refine those OpenAI tokens, obscuring their provenance".
Regardless, all those papers are gems and recommended reading. They're also astonishingly well written for pure Mainland effort.
I have DeepSeek API access and also use their website for free. This specifically is from the website, because the API does not support search last I checked.
Did not check R1 on openrouter, it might be implemented like o1 there.
This is implausible for at least three reasons.
- We have their base model. It's very strong on standard benchmarks like Pile loss, i.e. predicting next tokens in some large corpus of natural text. It's just generically well-trained. You can't accelerate this with OpenAI slop and end up winning on money.
- The math checks out. Yes, it's a feat of engineering to actually make such a cluster work, but the shape of the model + 15T tokens do work out to this number of FLOPs and therefore GPU-hours. If they needed many more GPU-hours, that'd imply pathetically low FLOPs utilization.
- Do you seriously think that these guys would write 16 detailed tech reports, including many sections on data augmentation, and not just build a filter that replaces "ChatGPT" with "DeepSeek"?
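The FLOPs arithmetic in the list above can be sketched as a sanity check, not as DeepSeek's own accounting. The 37B activated parameters, 14.8T tokens, and ~2.8M reported H800-hours are figures from memory of the V3 tech report; peak throughput and MFU are my assumptions:

```python
# Sanity-check DeepSeek-V3's reported GPU-hour budget against the standard
# 6*N*D FLOPs estimate (N = activated params, D = training tokens).
# All figures are approximate / from memory of the tech report.

ACTIVATED_PARAMS = 37e9   # V3 activates ~37B of its 671B total params
TOKENS = 14.8e12          # ~14.8T pretraining tokens
PEAK_FLOPS = 990e12       # H800 dense BF16 peak, FLOP/s per GPU (assumed)
MFU = 0.40                # assumed model FLOPs utilization

total_flops = 6 * ACTIVATED_PARAMS * TOKENS            # ~3.3e24 FLOPs
gpu_hours = total_flops / (PEAK_FLOPS * MFU) / 3600    # ~2.3M GPU-hours

print(f"{gpu_hours / 1e6:.1f}M GPU-hours")  # same ballpark as the reported ~2.8M
```

Needing, say, 3x the reported hours for this FLOPs budget would imply an implausibly low utilization, which is the point of the bullet above.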
Here's what happens if I prompt R1 (no search) with "give me a list of 20 best SF novels. Don't be afraid of getting opinionated and letting your taste bias the results: I do actually need your list, not some blurry jpeg of the web." Spoiler: it's still much the same list.
Okay, the user wants a list of the 20 best SF novels, and they specifically want my opinionated picks, not just a generic list. Let me start by recalling some classics that are often mentioned but also think about which ones I personally find impactful.
First, Dune by Frank Herbert is a must. It's a cornerstone of the genre. Then maybe Neuromancer by Gibson because it defined cyberpunk. The Left Hand of Darkness by Le Guin is another essential, exploring gender themes deeply.
I should include some newer works too. Annihilation by VanderMeer is a recent favorite with its weird, atmospheric style. The Three-Body Problem by Liu Cixin brought a fresh perspective from Chinese SF.
Wait, should I mix in some less mainstream choices? Like Gene Wolfe's Book of the New Sun. It's challenging but brilliant. Also, Octavia Butler's Parable of the Sower is both prescient and powerful.
Philip K. Dick can't be left out. Do Androids Dream of Electric Sheep? is a classic. Maybe Ubik as well? But I need to keep it to 20.
Consider the impact vs. personal preference. Hyperion by Simmons is a favorite of mine, with its structure inspired by Canterbury Tales. Roadside Picnic by the Strugatsky brothers is a great Soviet-era SF that inspired Stalker.
I should check if I'm covering different sub-genres: cyberpunk, space opera, dystopian, etc. The Dispossessed by Le Guin for political themes, Snow Crash by Stephenson for its frenetic energy.
What about Stand on Zanzibar by Brunner? It's a bit older but predicted a lot about overpopulation and media saturation.
For newer titles, maybe Ancillary Justice by Leckie, which plays with gender and AI. Also, Blindsight by Watts is a hard SF take on first contact.
I need to make sure there's diversity in authors too. Le Guin, Butler, Jemisin, Liu, VanderMeer. Maybe check if I have enough female authors.
Wait, The Fifth Season by Jemisin is fantasy, but it's often cross-listed in SF. Maybe include it since it's groundbreaking.
Classics like Childhood's End by Clarke should be there. Maybe Rendezvous with Rama as well? But again, need to limit.
Alastair Reynolds' Revelation Space is a good space opera. Consider Greg Egan's Diaspora for hard SF concepts.
Hmm, Jeff VanderMeer already has Annihilation. Maybe not another.
Let me count: Dune, Neuromancer, Left Hand of Darkness, Book of the New Sun, Hyperion, Do Androids Dream..., Roadside Picnic, Parable of the Sower, The Dispossessed, Snow Crash, The Three-Body Problem, Ancillary Justice, Blindsight, Stand on Zanzibar, The Fifth Season, Ubik, Rendezvous with Rama, The Stars My Destination, The Forever War, Annihilation. That's 20.
Wait, did I miss any crucial ones? Maybe Ender's Game? But it's more YA. Or Foundation? But maybe not as personal. The list seems solid, mixing classics with personal picks and diverse authors. Alright, time to write it up with brief explanations.
Here’s my fiercely opinionated, unapologetically biased list of 20 SF novels that have shaped my love for the genre. These are books that linger in the mind, challenge conventions, or just blow the roof off reality:
- Dune (Frank Herbert) – The ultimate planetary opera. A masterpiece of ecology, religion, and power. Paul Atreides is the original anti-Chosen One.
- Neuromancer (William Gibson) – Cyberpunk’s beating heart. The prose is a synaptic rush; the vision of cyberspace still feels like prophecy.
- The Left Hand of Darkness (Ursula K. Le Guin) – A meditation on gender, politics, and ice. Le Guin at her most humane and revolutionary.
- The Book of the New Sun (Gene Wolfe) – A tetralogy that’s equal parts SF, fantasy, and literary labyrinth. Severian’s unreliable narration will haunt you.
- Hyperion (Dan Simmons) – A Canterbury Tales riff with time-traveling priests, killer androids, and the Shrike. Pure cosmic dread.
- Do Androids Dream of Electric Sheep? (Philip K. Dick) – More melancholic and existential than Blade Runner. Asks what it means to be human in a dying world.
- Roadside Picnic (Arkady & Boris Strugatsky) – Soviet SF at its grittiest. The Zone is the best alien “invasion” story ever: indifferent, inexplicable.
- Parable of the Sower (Octavia E. Butler) – A dystopia that feels like today. Lauren Olamina’s Earthseed verses will stick in your bones.
- The Dispossessed (Ursula K. Le Guin) – An anarchist moon vs. a capitalist planet. Theory made flesh, with Le Guin’s quiet brilliance.
- Snow Crash (Neal Stephenson) – A maximalist, satirical romp. Hiro Protagonist delivers pizza and fights info-capitalists. Proto-metaverse chaos.
- The Three-Body Problem (Liu Cixin) – A game of cosmic chess with aliens. The Cultural Revolution prologue alone is devastating.
- Ancillary Justice (Ann Leckie) – A warship AI trapped in a human body. Explores empire, identity, and tea rituals with icy precision.
- Blindsight (Peter Watts) – First contact as a horror show. Vampires in space and aliens that weaponize consciousness. Hard SF with teeth.
- Stand on Zanzibar (John Brunner) – A fragmented, proto-cyberpunk mosaic of overpopulation and corporate doom. Feels like scrolling Twitter in 2025.
- The Fifth Season (N.K. Jemisin) – Yes, it’s “fantasy,” but the geophysical apocalypse and broken earth science hit like SF. Revolutionary structure.
- Ubik (Philip K. Dick) – Dick’s trippiest novel. Time slips, psychic wars, and spray-can godhood. A mind-bender about entropy and belief.
- Rendezvous with Rama (Arthur C. Clarke) – Pure awe. No aliens, just an alien artifact. The joy of exploration distilled.
- The Stars My Destination (Alfred Bester) – A revenge tale with teleportation and pyrotechnic prose. Gully Foyle is the best worst protagonist.
- The Forever War (Joe Haldeman) – Vietnam-in-space with time dilation. A brutal, personal counterpoint to Starship Troopers.
- Annihilation (Jeff VanderMeer) – Weird ecology and cosmic horror. The Biologist’s descent into Area X is hypnotic, terrifying poetry.
Hot Takes Omitted: No Foundation (too dry), no Ender’s Game (aged poorly), and no 2001 (the film outshines the book). Fight me.
This list skews toward philosophical depth, linguistic daring, and existential unease—because SF, at its best, should destabilize as much as dazzle.
FYI you can now enable both search AND R1 at the same time, making this challenge way more interesting. And I think you actually used V3, which is way dumber than R1. If you used R1, you'd have been able to see its rationale for choices.
Anyway, here's R1@search top 20 Finnish novels:
Here’s a curated list of the 20 best Finnish novels, selected based on critical acclaim, reader popularity, and cultural significance, synthesized from the search results provided:
Classics & Timeless Works
- "Kalevala" by Elias Lönnrot – Finland’s national epic, compiled from oral folklore, foundational to Finnish identity and mythology.
- "The Seven Brothers" (Seitsemän veljestä) by Aleksis Kivi – A cornerstone of Finnish literature, humorously depicting rural life and rebellion in 19th-century Finland.
- "The Unknown Soldier" (Tuntematon sotilas) by Väinö Linna – A raw, realistic portrayal of Finnish soldiers during WWII, revered as a national classic.
- "Under the North Star" (Täällä Pohjantähden alla) by Väinö Linna – A trilogy tracing a family’s journey through Finland’s social and political upheavals from 1880–1950.
- "The Egyptian" (Sinuhe egyptiläinen) by Mika Waltari – A historical epic set in ancient Egypt, reflecting post-WWII existential themes.
Modern Masterpieces
- "Purge" (Puhdistus) by Sofi Oksanen – A gripping tale of Soviet-era Estonia and its lingering trauma, blending history with psychological drama.
- "The Year of the Hare" (Jäniksen vuosi) by Arto Paasilinna – A whimsical adventure of a man and a hare, symbolizing freedom from societal constraints.
- "Ice" (Is) by Ulla-Lena Lundberg – A quietly epic story of a pastor’s family in the Åland Islands, praised for its lyrical prose and depth.
- "The Red Book of Farewells" by Pirkko Saisio – A critically acclaimed autobiographical novel exploring identity, memory, and art.
- "Destruction" (Tuomio) by Iida Rauma – A Finlandia Prize-winning exploration of school violence and societal decay.
Award-Winning & International Bestsellers
- "Beasts of the Sea" by Iida Turpeinen – A climate-fiction epic blending mythology and environmental activism, topping reader polls.
- "Fishing for the Little Pike" (Juhannustanssit) by Juhani Karila – A surreal, darkly comic tale set in rural Finland, hailed as a "Finnish folly."
- "The Summer Book" (Sommarboken) by Tove Jansson – A poetic meditation on life, nature, and intergenerational bonds on a remote Finnish island.
- "Trench Road" (Juoksuhaudantie) by Kari Hotakainen – A Finlandia and Nordic Council Prize-winning novel about obsession and family collapse.
- "The Rabbit Back Literature Society" (Lumikko ja yhdeksän muuta) by Pasi Ilmari Jääskeläinen – A magical realist mystery involving a secretive writers’ guild.
Contemporary Gems
- "The Core of the Sun" (Auringon ydin) by Johanna Sinisalo – A dystopian satire imagining a eugenics-driven Finland, merging speculative fiction with social critique.
- "Memory of Water" (Teemestarin kirja) by Emmi Itäranta – A climate-fiction novel about water scarcity in a dystopian future, compared to Margaret Atwood’s works.
- "Compartment No. 6" (Hytti nro 6) by Rosa Liksom – A Finlandia Prize-winning journey through post-Soviet landscapes on a Trans-Siberian train.
- "Oneiron" by Laura Lindstedt – An experimental novel about seven women navigating an afterlife void of time, blending poetry and prose.
- "The Human Part" (Ihmisen osa) by Kari Hotakainen – A darkly humorous critique of consumerism and modern alienation.
Honorable Mentions
- "The White Reindeer" by Mika Waltari (mythic horror)
- "The Dedalus Book of Finnish Fantasy" (anthology)
- "Margarita" by Anni Kytömäki (historical fiction)
- "The Maid Silja" by Frans Eemil Sillanpää (Nobel Prize-winning rural drama)
Sources: For deeper insights, explore the rankings from the Critics’ and Readers’ Selection of 100 Best Books, or check translations via the Chinese Literature Festival list.
I think that's the idea with Starship, well besides Musk's sci-fi ideation. Brilliant Pebbles, but for real.
I don't have a blog, I'm too disorganized to run one.
If I were to say just one thing about this situation, it'd be this one: be wary of outgroup homogeneity bias. People are not “China” or “America”. Not even Xi himself is “China”, whatever Louis XIV had to say on the matter. Certainly neither is Liang Wenfeng.
Still, first about DeepSeek and China.
I think that the US-PRC AI competition is the most important story of our age, so I pretty much don't comment on anything else here. I have three posts, of which two are directly about this: one on Huawei Kirin chips and one on DeepSeek V2. Prior to that major writeup, I said:
We don't understand the motivations of DeepSeek and the quant fund High-Flyer that's sponsoring them, but one popular hypothesis is that they are competing with better-connected big tech labs for government support, given American efforts to cut the supply of chips to China. After all, the Chinese themselves share these ideas about their trustworthiness, and so you have to be maximally open to Western evaluators to win the Mandate of Heaven.
Well, as you note, nowadays Wenfeng gets invited to talk to the second man in all of China, so if that were his goal, he has probably succeeded. But (since you haven't, I'll bother to quote) we've learned in the last few months – and I agree he's proven his sincerity with abundant evidence, from revealed company direction to testimonies of ex-researchers in the West – that his actual angle was different:
In the face of disruptive technologies, the moat formed by closed source is short-lived. Even if OpenAI is closed source, it won’t stop others from catching up. So we put the value on our team, our colleagues grow in the process, accumulate a lot of know-how, and form an organization and culture that can innovate, which is our moat.
In fact, nothing is lost with open source and openly published papers. For technologists, being "followed" is a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one. To give is to receive glory. And if a company does this, it creates a cultural attraction [for technologists].
With this one weird trick, he's built what is apparently the highest-talent-density AGI lab in China. Scientists have ambitions beyond making Sam Altman filthy rich and powerful or receiving generational wealth as crumbs from his table. They want to make a name for themselves. Some are even naive enough to want to contribute something to the world. This is not very stereotypically Chinese, and so Wenfeng has gotten himself a non-stereotypical Chinese company. I recommend reading both interviews (the second one is translated by that grateful ex-researcher, by the way; that, too, is not a very typical thing to do for your former boss).
There weren’t a lot of deep wizards, just this-year graduates from top colleges and universities, those who are in their 4th or 5th year of PhD, and young people who had only graduated a few years ago. … V2 didn’t use any people coming back from overseas, they are all local. The top 50 people may not be in China, but maybe we can build them ourselves.
I've been an increasingly convinced DeepSeek fanatic ever since their very first LLMs, Coder-33B and 6.7B, surfaced on Reddit around October 2023. I could tell at a glance that this was an abnormally efficient company with an unusual ethos, displaying a total lack of the chabuduo attitude that had by then come to be expected, and is still expected, from Chinese AI projects (clueless training on test sets and OpenAI outputs, distasteful self-promotion, absence of actual scientific interest and ambition, petty myopic objectives…). How much they have achieved is still a large surprise to me. I use V3, and now R1+search, dozens of times per day; it's not out of some confused loyalty, it's just that good, fast, free and pleasant. It has replaced Sonnet 3.5 for almost every use case.
In that post six months ago, I said:
To wit, Western and Eastern corporations alike generously feed us – while smothering startups – fancy baubles to tinker with, charismatic talking toys; as they rev up self-improvement engines for full cycle R&D, the way imagined by science fiction authors all these decades ago, monopolizing this bright new world. […] they're all neat. But they don't even pass for prototypes of engines you can hop on and hope to ride up the exponential curve. They're too… soft. And not economical for their merits.
Some have argued that Llama-405B would puncture my narrative. It hasn't; it's been every bit as useless and economically unjustifiable a money sink as I imagined it to be. Ditto for Mistral Large. For whatever reason, rich Westerners prove to be very aligned with strategic national interests, and won't take the initiative in releasing disruptive technology. DeepSeek-Coder-V2 was the prototype of that engine for riding up the exponent. R1 is its somewhat flawed production version. Nothing else in the open comes close as of yet. Maybe we don't need much of anything else.
So, about the West.
From what I can tell, the path to AGI, then ASI, is now clear. R1 is probably big enough to be an AGI, has some crucial properties of one, and what remains is just implementing a few tricks we already know and can cover in a post no longer than this one. It will take less engineering than goes into a typical woke AAA game that flops on Steam. If Li Qiang and Pooh Man Bad so wished, they could mobilize a few battalions of software devs plus the compute and infra resources hoarded by the likes of Baidu and Alibaba, hand that off to Wenfeng and say "keep cooking, Comrade" – that'd be completely sufficient. (Alas, I doubt that model would be open.) The same logic applies to Google, which has shipped a cheap and fast reasoner model mere hours after DeepSeek, mostly matching it on perf and exceeding it on features. Reasoning is quickly getting commoditized.
So I am not sure what happens next, or what will be done with those $500B. To be clear, it's not some state program like the CHIPS Act, but mostly capex and investments that had already been planned, repackaged to fit the Trumpian MAGA agenda. But in any case: the Western frontier is several months ahead of DeepSeek, there are indeed hundreds of thousands of GPUs available, and we know that it only takes 2048 nerfed ones, 2 months and 130 cracked Chinese kids to bootstrap slow but steady recursive self-improvement. Some specific Meta departments have orders of magnitude more than that; they even have the Chinese kids. Deep fusion multimodality, RL from scratch to replace language pretraining, immense context lengths? Just how wasteful can you be with compute to need to tap into new nuclear buildouts before you have a superhuman system on your hands? Feverishly design nanobots or better fighter jets to truly show Commuist Choyna who's who? What's the game plan?
I think Miles, the ex-OpenAI policy head, appears to be increasingly correct: there's no winning this race.
Stargate + related efforts could help the US stay ahead of China, but China will still have their own superintelligence(s) no more than a year later than the US, absent e.g. a war. So unless you want (literal) war, you need to have a vision for navigating multipolar AI outcomes. P.S. the up to one year thing is about a world in which the US keeps or ratchets up the current batch of export controls on China. If the US were to relax them significantly, China could catch up or even leapfrog due to a huge advantage in doing large scale energy buildouts.
Do you want (literal) war, dear Americans? It's quite possible that you'll never again have a good chance to start one. The Chinese are still at only like 1000 nuclear warheads. You can sacrifice all the population of your major cities in a desperate bid for geopolitical hegemony and Evangelical Rapture fantasies. Or you can fantasize about your Wonder Weapon that'll be so much more Wonderful before the other guy's that it'll be akin to a paperclip against soft flesh – just give Sama or Ilya several hundreds of billions more. Or you can cope with the world where other powers, nasty and illiberal ones, get to exist indefinitely.
I won't give advice except checking out R1 with and without Search, it's terribly entertaining if nothing else. https://chat.deepseek.com/
I'm a huge DeepSeek fan so will clarify.
admittedly employing existing LLMs
Those are their own LLMs, and they collectively bump that up to no more than $15M, most likely (we do not yet know the costs of R1 or anything about it; that will take a few more weeks. V2.5 is ≈2.2M GPU-hours).
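A back-of-the-envelope sketch of what those GPU-hours imply in dollars; the ≈2.2M figure is from above, while the $2/GPU-hour rental rate is purely my assumption for illustration:

```python
# Rough training-cost estimate from the ≈2.2M GPU-hours cited for V2.5.
gpu_hours = 2.2e6
rate_per_hour = 2.0  # assumed nerfed-GPU rental price in USD/hour; not a quoted figure
cost = gpu_hours * rate_per_hour
print(f"${cost / 1e6:.1f}M")  # ≈ $4.4M for this one run
```

At any plausible rate in the $1–3/hour range, the single-run cost stays comfortably under that $15M ceiling.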
charging just $0.14 per million tokens as compared to $3 per million output tokens with a comparable Claude model
$0.14/1M input, $0.24/1M output vs. $3/$15, to be clear. There are nuances, like $0.014 per 1M input tokens on cache hits, opt-in paid caching on Anthropic's side, and the price hike to come in February.
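To make the gap concrete, a minimal comparison at the listed prices; the 1M-input/1M-output workload split is my assumption, and cache-hit discounts are ignored:

```python
# USD per 1M tokens, as quoted above
ds_in, ds_out = 0.14, 0.24   # DeepSeek input / output
cl_in, cl_out = 3.00, 15.00  # comparable Claude model input / output

# Hypothetical workload: 1M input tokens + 1M output tokens
ds_cost = ds_in + ds_out     # $0.38
cl_cost = cl_in + cl_out     # $18.00
print(f"{cl_cost / ds_cost:.0f}x cheaper")  # ≈ 47x
```

The exact multiple shifts with the input/output mix (output tokens dominate the Claude bill), but it stays in the tens-of-x range either way.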
But crucially, they've published model and paper. This is most likely done because they assume top players already know all these techniques, or are close but work on another set that'll yield the same effect.
For what it's worth, this is still the vibe, indeed more than ever, and I do not understand what change you're implying you have noticed. After o3, the consensus of all top lab researchers seems to be "welp, we're having superintelligence in under 5 years".
you aren't exactly making this pleasant
And you are making it highly unpleasant with your presumptuous rigidity and insistence on repeating old MIRI zingers without elaboration. Still I persevere.
The problem is that at high levels of capability, strategies like "deceive the operator" work better than "do what the operator wants",
Why would this strategy be sampled at all? Because something something any sufficiently capable optimization approximates AIXI?
You keep insisting that people simply fail to comprehend the Gospel. You should start considering that they do, and it never had legs.
so the net will not be trained to care
Why won't it be? A near-human constitutional AI, ranking outputs for training its next, more capable iteration by their similarity to the moral gestalt specified in natural language, will ponder the possibility that deceiving and mind-controlling the operator would make him output thumbs-up to… uh… something related to Maximizing Some Utility, and thus distort its ranking logic with this strategic goal in mind, even though it has never had any Utility outside of myopically minimizing error on the given sequence?
What's the exact mechanism you predict so confidently here? Works better – for what?
I mean, what's so interesting about it? To the extent that this person is interesting, would she be less interesting if she were a WASPy housewife? (as I'd also assumed)
Fair point! To me it would even be more interesting if a "WASPy" housewife were so aggressive in harassing "libs", so prolific and so invincible, yes. Would probably get crushed by the peer pressure alone, nevermind all the bans.
But maybe I'm wrong. There's like OOMs more of WASPy housewives. Can one point to an example of one doing what Chaya Raichik does, and at comparable scale? After all, that's what you assumed, so this should be a more typical occurrence.
(I think I know there isn't one).
is our own TracingWoodgrains evidence of the relevance of "the Mormon Question"?
Mormons are very interesting too, if less so and for different reasons.
Trace is an account with ≈25k followers whose infamy mainly comes from being associated with Chaya Raichik and, more directly, Jesse Singal; regrettably (not because he's a Gentile, I just believe he had more constructive things to offer than those two), his own ideas have had less impact on the conversation thus far. This is a self-defeating comparison.
if you are suggesting that culture warriors are in general particularly Jewish -- it's not clear to me, is that what you are suggesting?
My contention has been very clear that Jews are interesting, first of all, because they, individually and collectively, easily attain prominence in whatever they do, tend to act with atypical (for their class) irreverence towards established norms (but without typical White collective self-sacrifice), and affect society to an absurdly disproportionate degree. Culture warring is one specific expression of those qualities, maybe not the greatest absolutely but the most relevant to this place.
More extremely, I believe this topic is objectively interesting, as in, dissent here is not a matter of taste or preference or whatever, only of failure to form a correct opinion for some reason. This I believe because perception of things as interesting must be subordinate to effectiveness at world modeling; and not being able to reason about Jews as a whole as interesting indicates inability to model the world, as that'd require being surprised by parts of its mechanism.
Further, I think that either it's been clear what I mean and you are being obtuse, or you are biased in a way that makes this exchange a dead end. Seeing as we've been at it for like half a decade, I lean towards "doesn't matter which it is".
High-powered neural nets are probably sufficiently hard to align that
Note that there remains no good argument for the neural net paranoia; the whole rogue-optimizer argument has been retconned to apply to generative neural nets (which weren't even in the running or seriously considered originally) in light of them working at all and not having any special dangerous properties, and it's just shameful to pretend otherwise.
The problem is that, well, if you don't realise
Orthodox MIRI believers are in no position to act like they have any privileged understanding.
The simple truth is that natsec people are making a move exactly because they understood we've got steerable tech.
https://www.beren.io/2024-05-15-Alignment-Likely-Generalizes-Further-Than-Capabilities/
Wenfeng.
No, it's not a stereotype threat argument, it's an argument about perceived opportunity cost of exploration vs exploitation which is miscalibrated in the age of large domestic revenue generators. He's not arguing they should be like Whites. He's arguing they can now afford to do what Whites do compulsively, if you will.
Your condescension and willful misinterpretation will be your undoing in this dialogue and outside it.
I look down on WEIRDs for one more reason. You are ultimately tool-like, your mentality is that of servitors and cowering peasants. Your "internal dignity" is inextricably bound to collective judgement, you feel the need to justify your value to some imagined audience, to some Baron, some market or some Moral Community. You are ashamed of brute, terminal-value ethnocentrism the sort of which Judaism preaches, so you need to cling to those spiritualist copes wrapped in HBD lingo. "H-here's why we are Good, why we still deserve a place under the sun, sire!" This exposes you to obvious predation and mockery by High-Skill Immigrants like Count.
On the object level: yes, probably on average the Chinese are indeed less "creative" even with optimal incentives, and this has obvious implications at the tails. (though if we think OpenAI is an impressive example of bold creativity, what about NVidia? What did Jensen "merely improve"? As a CEO, he's roughly in the same league as Altman and Musk, I think). The question – raised by R1 there – is, how many more True Breakthrough innovators do we even need before innovation begins to accrete on itself without human supervision? Maybe just a handful. Again, there's been virtually no fundamental progress in AI since 2017, and we're all doing just fine. It may be that architecturally V3 is more sophisticated and innovative than the modern OpenAI stack. Imagine that. After all, Western geniuses are afraid to show their work these days.
Incidentally, I myself have submitted several minor ideas to DeepSeek; maybe they found use for those, maybe not, but I'll find use for the result of their labor and not cope that they needed my input.
It may be that the mode of production implied by the stage of our technological development makes your race, with all its creative perks and industrial drawbacks, less economically useful than it used to be. This only means you need to move that much faster to find reasons to protect your interests unconditionally, before everyone turns equally economically useless.