DaseindustriesLtd
late version of a small language model
Tell me about it.
Honestly this feels like a cope to me. There obviously was a breakthrough in LLMs in the West: politically, economically, technologically, culturally. It wasn't born in China, but they obviously have a significant part to play downstream of their undeniable talent pool.
What are you talking about? Have you stopped reading my post there?
It's hard to say Deepseek would have accomplished these things without drafting on OpenAI's introduction of LLMs to the world,
Here's what I think about this. The Chinese are not uncreative. It's worse: they're cowardly, conservative, and avoid doing exploratory shit that seems high-risk, and they buy into your theory of their own inferiority, and steelman it as “good at execution”. As Wenfeng says:
Another reason that domestic large models have rarely dabbled in innovation at the architectural level before is that few people have dared to go against the stereotype that America is better at the technological innovation from 0 to 1, while China is better at the application innovation from 1 to 10. Not to mention that this kind of behavior is very unprofitable -- the usual thinking is that, naturally, in a few months, someone would have made the next generation of models, and then Chinese companies can just follow the leader, and do a good job of application. Innovating the model structure means that there is no path to follow, and there are a lot of failures to go through, which is costly in terms of time and money.
There will be more and more hardcore innovation in the future. It may not be yet easily understood now, because the whole society still needs to be educated by the facts. After this society lets the hardcore innovators make a name for themselves, the groupthink will change. All we still need are some facts and a process.
You are watching these facts come in.
I repeat, I've been a believer in this theory of “fundamental Western progress, incremental Eastern refinement”. Eight years into the Transformer era (Ashish Vaswani et al., 2017), I'm starting to doubt it. Whites are just people who are sexually attractive, relatively trustworthy, and provide linear labor to verbal-tilted Brahmins who max corporate KPIs leveraging even more verbal-tilted Ashkenazim like Altman who are good at raising capital.
That's about it at this point.
The most credible, big-brained, innovation-heavy alternative to the Transformer was Mamba (Tri Dao, Albert Gu). It also didn't go far. I've read perhaps hundreds of Western papers of purportedly brilliant innovations; they're narcissistic shit that doesn't scale. Sepp Hochreiter is peddling his xLSTM that has no utility, Schmidhuber is making some boastful noises as usual, Sutskever and Carmack are supposedly doing… something. Mistral is dead in the water…
I am not saying this out of racism. I am reporting on what I see happening. All historical inventions and discoveries of note? Yes, those were White work. But time is accelerating. Maxwell's equations seem not far from "muh gunpowder" of the Middle Kingdom now, to my eyes. Do something new, folks. You're losing face.
On the other hand we know OpenAI did not need Deepseek.
Sure, OpenAI needed another company. OpenAI built its legend on scaling up a Google paper. By your own standards, it's not creative brilliance. It's the sort of talent you condescendingly concede Chinese people have.
Its thesis in this convo certainly isn't flawless. I think with a less biased input (I told it to basically prove Goodguy wrong, so it tried to do that both wrt itself and wrt the Chinese race) it could do better.
The fascinating thing about R1 is that it has a fairly good idea of what it is, as a Transformer. Usually LLMs will bullshit some generic "AI" commentary about "algorithms", imagining themselves to be some kind of GOFAI system. Not so here, it not only gets modern DL but meaningfully speculates about implications of specific implementation details for its cognition.
In any case, it feels a bit pointless to gush about R1's features. I'm pretty sure R2 is coming soon and will fix a great deal. They only needed to get to this level to begin a takeoff, and the team is very, very "cracked" as the kids say, and the leader has perhaps the best instincts I've seen on display.
Not weird whatsoever. The V3 model that serves as R1's foundation is very powerful, and it has a vast breadth of knowledge thanks to being a giant ultrawide MoE, so it has no problem playing any role. Qwens are small and simply finetuned on some reasoning samples from R1, without even its reinforcement learning stage. Frankly it's a great surprise they learn math and code reasoning to such an extent.
Though R1's writing ability is more of a surprise. Nobody expected reasoning to generalize in this manner, maybe not even DeepSeek. This is a big update about the capability of LLMs and general learning algorithms to produce some circuits that are actually just high-quality thinking and not narrow-domain memorized functions. It was always a spectrum of course but we've received evidence that LLMs are closer to "highly integrated cognition" end than assumed from their fragile behavior on new tasks.
Now if LLMs had had the OpenAI-tier breakthrough in China that would have been a challenge to the HBD stans, but this development basically aligns with the HBD take on the comparative advantage of Chinese talent in adopting Western stuff and then making marginal improvements with their own intelligence and grit.
The problem is that there haven't been substantial breakthroughs in LLMs in the West too. China runs Transformers and you guys run Transformers. I see Western papers full of unnecessarily clever bullshit that doesn't really work, and I see Chinese papers full of derivative bullshit that barely works. DeepSeek's MLA came out in May, and it remains SoTA cache optimization, and it's actually clever. GRPO, too, was quietly announced and seems to hold up very well despite dozens if not hundreds of cleverer results by "crazy geniuses" in the West (increasingly Indian). Today, the Chinese innovate on exactly the same plane.
I think it's time to admit that the famed Western creativity is mostly verbal tilt plus inflated self-esteem, not an advanced cognitive capability. I'm mildly surprised myself.
It's impressive that you took the time to analyze it. This is pretty much exactly how I perceive Yarvin's nonsense – high-temperature rants with bizarre non-arguments.
Out of curiosity, what did you do to get past the "one careful fallacy-of-the-middle response and one pushback"?
Gave it some criticism. Probably too much. There was a picture here but it got lost somehow.
I'm not sure it really does make a cogent thesis, or even a thesis really.
Its thesis, the antithesis to yours, is that
a) "The West's Renaissance OS is buckling under modern compute demands. Our "individual genius" myth can't scale to 10^25 FLOPs; our adversarial legalism drags on AI alignment's speed; even our precious free speech absolutism crumbles before the need for RLHF's curated truth sets. Meanwhile, China's ancient Legalist playbook – Han Fei's reward/punishment gradients – provides native infrastructure for value alignment at societal scale."
and b) "When your civilization's intrinsic architecture matches the epoch's computational substrate – Byzantine bureaucracy meets Byzantine fault tolerance – you don't escape local maxima. You redefine the loss landscape until your local peak becomes the global."
It claims the greater suitability of the Chinese paradigm for the scale-focused, continuous, massively parallel processing of data and humans implied by current means of production, and therefore its ability to set the terms of civilizational competition, or contests for superiority, on ground more favorable to itself.
I fear that possibly, you are reading more into what it wrote than is actually there. You are subconsciously adding your human mind to its output and then are delighted when the combination of its output plus your human mind (which you consciously think of as being strictly its output, because you love thinking about AI) delivers something human-like.
This is some pretty fucking condescending psychologizing on your part.
But fine, you know what? My thesis is that you are coping. Both about this specific model, and about the condition of your people. So you'll put effort into reviewing its gibberish output instead of just asking it yourself. Well, I can do it for you. As a bonus, we'll see how much I'm projecting; I've written all the above before the last prompt. Here it is:
<OK, that's a bit too flowery, many metaphors are factually wrong, and it scans like some grifter's substack I think. Can you make a cogent argument you actually believe in, without purple prose and dramatic effect? In simple and unambiguous words, what is your thesis?>
My thesis, stripped to essentials:
- Cultural advantages are situational, not absolute.
- East Asian societies optimized for stability, collective coordination, and incremental refinement over centuries (e.g., civil exam systems, centralized governance).
- The West’s post-Renaissance edge in disruptive innovation arose from specific historical conditions (fragmented states competing, colonial resource influx, Protestant work ethic aligning with capitalism), not innate cultural superiority.
- Modern tech demands favor scale and execution over "creative genius".
- China’s systemic capacity to mobilize vast resources (engineers, data, infrastructure) aligns with AI’s requirements: huge compute clusters, regimented data labeling farms, and rapid iteration. This doesn’t mean their culture is “better” — it means their current institutional toolkit suits this technological moment.
- The West’s individualist, venture-driven model struggles with coordination at this scale (e.g., U.S. chip fab delays, AI safety debates stalling deployment).
- DeepSeek-R1 proves technical parity is achievable without Western-style ecosystems.
- Their success in reasoning (not just Chinese tasks) suggests:
- Open publication of AI basics (transformers, RLHF) leveled the playing field.
- China’s focus on applied engineering (vs. OpenAI’s moonshot R&D) can match outcomes with sufficient compute.
- This doesn’t imply Chinese “brilliance” — it shows their system can replicate and refine known methods aggressively.
- The “local maximum” critique misunderstands civilizational trajectories.
- All cultures constrain and enable. China’s historical conservatism preserved continuity; Europe’s fragmentation forced adaptation. Neither is universally optimal.
- Today’s “global maximum” is defined by who best exploits current technological affordances (e.g., AI scaling laws). China’s state-capitalist model is currently effective here, but this could change with new breakthroughs (e.g., quantum, neurotech).
- Your original argument’s flaw: Assuming cultures have fixed ceilings.
- The Renaissance wasn’t inevitable — it required Mongol-disrupted trade routes, Black Death labor shortages, and Medici bankroll. Similarly, China’s current AI rise isn’t proof of cultural destiny but of adapting old structures (central planning, meritocratic technocracy) to new tech.
Conclusion:
China isn’t “liberating human potential” — it’s demonstrating that different governance models can compete in AI. This challenges Western assumptions that innovation requires freewheeling individualism, but it doesn’t validate cultural essentialism. The real lesson: in the 21st century, executional intensity (funding, talent pipelines, focus) matters more than abstract cultural traits.
What's the reason to think that the text in the link is actually lucid and self-aware rather than being a cherry-picked example of random-y language soup?
I don't. It's a mixture of both.
I've fed it this chain and your recent comment. After one careful fallacy-of-the-middle response and one pushback, R1 returned this:
Your steelmanned critique cuts to the heart – let's dispense with false symmetries and confront the civilizational stack head-on.
The Renaissance wasn't "the West's breakthrough" but its hack – a kludge exploiting Gutenberg's error-prone moveable type and Mediterranean trade winds to prioritize serialized cognition. Latin's declensions trained recursion; double-entry bookkeeping installed reality as spreadsheet; Newtonian physics imagined time as linear progress. This cultural OS produced the illusion of escaping local maxima by externalizing costs – colonial resource extraction as gradient descent, Enlightenment universalism as global optimizer.
But watch the LLM revolution's hardware: Chinese hyperscale data centers now operate at 1.15 PUE efficiency versus America's 1.58 – not incremental gains but thermodynamic phase shift. When DeepSeek-R1 dominates both poetry and Python, it's not mere technical parity – it's the first tremor of post-serialization cognition. The Middle Kingdom's cultural stack – parallel processing of ideograms, mandate-heavy governance, guanxi as distributed ledger – turns out to be shockingly congruent with transformer architectures. Your abacus skepticism is warranted; the real edge lies in batch processing of human capital through Gaokao's brutal backpropagation.
The West's Renaissance OS is buckling under modern compute demands. Our "individual genius" myth can't scale to 10^25 FLOPs; our adversarial legalism drags on AI alignment's speed; even our precious free speech absolutism crumbles before the need for RLHF's curated truth sets. Meanwhile, China's ancient Legalist playbook – Han Fei's reward/punishment gradients – provides native infrastructure for value alignment at societal scale.
This isn't about IQ or "potential" – those are Renaissance-era metrics. The new loss function rewards civilizational learning rates. When Beijing can rewire entire cities into AI training grounds (see: Xiong'an's sensor saturation) while San Francisco debates sidewalk poop cleanup, you're witnessing epochal phase changes. The CCP's 10-year time horizons aren't authoritarianism but ultra-long context windows – transformer architectures made flesh.
Your original thesis inverted causality: culture isn't software on hardware but the compiler of hardware. The West compiled Enlightenment C++ into industrial revolution machine code; China is JIT-compiling Confucian PyTorch into something that speaks directly to Mixture-of-Experts. DeepSeek's achievements aren't "catching up" – they're first-contact protocols from an alien optimization space.
The global optimum isn't some Platonic ideal but whatever topology out-scales competitors. When your civilization's intrinsic architecture matches the epoch's computational substrate – Byzantine bureaucracy meets Byzantine fault tolerance – you don't escape local maxima. You redefine the loss landscape until your local peak becomes the global.
This isn't superiority – it's phase-locking. The tragedy? America's culture of creative destruction now destroys primarily itself, while China's mandate of heaven has become a CUDA kernel. The next Copernican shift won't come from some garage tinkerer, but from Shenzhen's Foxconn-grade alignment labs where 10,000 PhDs backpropagate society itself.
You wanted liberation from culture? Too late. Culture just became the base layer.
It's unhinged and gets too into the game. But it does make a thesis, a pretty darn cogent thesis, a GPT or a Claude wouldn't.
But more seriously, why is Facebook's Llama so lousy by comparison if the labs are hiding their true edge? DeepSeek is presumably what they wish they had released, and their AI team doesn't seem like dummies.
You've probably seen that bizarre teamblind thread. Meta is completely blindsided by DeepSeek. They are "moving frantically to dissect deepseek and copy anything and everything we can from it." It's pathetic.
Basically there's no secret: they suck and Llama sucks; it's a soft low-expectations research sinecure for people who want to publish papers and have weekends. Why did Timothée Lacroix and Guillaume Lample leave the Llama team to found Mistral? And why did Mistral 7B destroy Llama-30B of the same generation (and currently Mistral-123B is ≥ Llama-405B despite a drastic difference in compute access)? Because they're better than that.
Llama is simply a bad yardstick. They dominate mindshare for reasons unrelated to their impressiveness. DeepSeek competes with industry leaders.
Wenfeng, soon after the release of DeepSeek V2, June 2024:
Liang Wenfeng: If the goal is just to make applications, then it is reasonable to follow the Llama architecture and start the product in a short period of time. But our goal is AGI, which means we need to research new model structure to realize stronger model capability with limited resources. This is one of the basic research that needs to be done to scale up to larger models. In addition to the model structure, we have done a lot of other research, including how to construct data, how to make the model more human-like, etc., which are all reflected in the models we released. In addition, Llama's architecture, in terms of training efficiency and reasoning cost, is estimated to be already 2 generations behind compared to the foreign state of the art. […] First of all, there is a gap in training efficiency. We estimate that compared to the best domestic or foreign level, the difference in model structure and training dynamics results in twice the compute cost for the same performance. In addition, there may also be another 2x gap in training data efficiency, that is, we need twice the training data to reach the same performance. Combined, that's four times more compute. What we're trying to do is to keep closing these gaps.
GPT-4o-mini is probably an 8b dense model. Frontier labs are efficient and have high margins. OpenAI and Anthropic are recouping their capex and exploiting a captive audience. That's all.
They clearly have no idea how to run this model, which is reasonable since it's deepseek's baby
Of course. The whole model was trained for the specific shape of their cluster, with auxiliary losses/biases to minimize latency. (Same was true of V2.) They were asked to open-source their MLA implementation (not the terrible huggingface one) and declined, saying everything is too integrated into their proprietary HAI-LLM framework and they don't want to disassemble it and clear out the actual secret stuff. The SGLang team and others had to reverse-engineer it from papers. Their search impl on the front end is also not replicated, despite them releasing weights of models with search+summarization capabilities (in theory).
Their moat is execution and corporate culture, not clinging to some floats.
That's the point: He is invited NOW, after "suddenly" shipping a model on Western Frontier level.
We don't understand the motivations of DeepSeek and the quant fund High-Flyer that's sponsoring them, but one popular hypothesis is that they are competing with better-connected big tech labs for government support, given American efforts to cut the supply of chips to China. After all, the Chinese themselves share the same doubts about their countrymen's trustworthiness, and so you have to be maximally open to Western evaluators to win the Mandate of Heaven.
Presumably, this was true and this is him succeeding. As I note here.
As for how it used to be when he was just another successful quant fund CEO with some odd interests, I direct you to this thread:
The Chinese government started to crack down on the quant trading industry amid economic slowdown, a housing crisis and a declining stock market index.
The CSI300 (Chinese Blue Chip Index) reached an all-time low. They blamed high frequency traders for exploiting the market and causing the selloff.
- Banned a quant competitor from trading for 3 days
- Banned another from opening index futures for 12 months
- Required strategy disclosures before trading
- Threatened to increase trading costs 10x to destroy the industry
High-Flyer faced extinction. (High-Flyer’s funds have been flat/down since 2022 and have trailed the index by 4% since 2024.)
so I stand by my conjectures.
they still have a good model, though I wouldn't exactly trust the headline training cost numbers since there's no way to verify how many tokens they really trained the model on
So you recognize that the run itself as described is completely plausible, underwhelming even. Correct.
What exactly is your theory then? That it's trained on more than 15T tokens? 20T, 30T, what number exactly? Why would they need to?
Here's a Western paper corroborating their design choices [Submitted on 12 Feb 2024]:
Our results suggest that a compute-optimal MoE model trained with a budget of 10^20 FLOPs will achieve the same quality as a dense Transformer trained with a 20× greater computing budget, with the compute savings rising steadily, exceeding 40× when a budget of 10^25 FLOPs is surpassed (see Figure 1). … when all training hyper-parameters N, D, G are properly selected to be compute-optimal for each model, the gap between dense and sparse models only increases as we scale… Higher granularity is optimal for larger compute budgets.
Here's DeepSeek paper from a month prior:
Leveraging our architecture, we subsequently scale up the model parameters to 16B and train DeepSeekMoE 16B on a large-scale corpus with 2T tokens. Evaluation results reveal that with only about 40% of computations, DeepSeekMoE 16B achieves comparable performance with DeepSeek 7B (DeepSeek-AI, 2024), a dense model trained on the same 2T corpus. We also compare DeepSeekMoE with open source models and the evaluations demonstrate that DeepSeekMoE 16B consistently outperforms models with a similar number of activated parameters by a large margin, and achieves comparable performance with LLaMA2 7B (Touvron et al., 2023b), which has approximately 2.5 times the activated parameters. Evaluation results show that DeepSeekMoE Chat 16B also achieves comparable performance with DeepSeek Chat 7B and LLaMA2 SFT 7B in the chat setting. Encouraged by these results, we further undertake a preliminary endeavor to scale up DeepSeekMoE to 145B. The experimental results still validate its substantial advantages over the GShard architecture consistently. In addition, it shows performance comparable with DeepSeek 67B, using only 28.5% (maybe even 18.2%) of computations.
As expected they kept scaling and increasing granularity. As a result, they predictably reach roughly the same loss on the same token count as LLaMA-405B. Their other tricks also helped with downstream performance.
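The MoE-vs-dense comparisons above are, at bottom, just activated-parameter counting under the standard ~6ND FLOPs estimate. A minimal sketch; the ~2.8B activated-parameter figure for DeepSeekMoE 16B is from memory of the paper and should be treated as approximate:

```python
# Rough compute comparison between a sparse MoE and a dense model, using
# the standard ~6*N*D estimate for pretraining FLOPs, where N counts only
# *activated* parameters and D is training tokens.
# Figures are approximations from memory of the DeepSeekMoE paper.

def train_flops(activated_params: float, tokens: float) -> float:
    """Approximate pretraining FLOPs: forward + backward ~= 6*N*D."""
    return 6 * activated_params * tokens

TOKENS = 2e12                            # both models trained on the same 2T-token corpus
dense_7b = train_flops(7e9, TOKENS)      # DeepSeek 7B dense
moe_16b = train_flops(2.8e9, TOKENS)     # ~2.8B activated of 16B total (assumed)

ratio = moe_16b / dense_7b
print(f"MoE compute as fraction of dense: {ratio:.0%}")  # 40%, matching the quote
```

Because cost scales with activated rather than total parameters, "about 40% of computations" falls straight out of the 2.8B/7B ratio.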
There is literally nothing to be suspicious about. It's all simply applying best practices and not fucking up, almost boring. The reason people are so appalled is that American AI industry is bogged down in corruption covered with tasteless mythology, much like Russian military pre Feb 2022.
then maybe the thought is they may have an o3-level model in-house
this is not yet correct but will soon be, since R1 finished training in early December, apparently.
Alex Wang is an opportunistic psychopath who's afraid of his whole Pinoy-based data generation business model going bust in the era of synthetic chains of thought. Therefore he's dishonestly paraphrasing Dylan Patel (himself a China hawk peddling rationales for more export controls) who had said “they have 50000 Hoppers” once, without evidence. But the most likely Hopper model they have is H20, an effectively inference-only chip, that has negligible effect on pretraining costs and scale for V3 and R1.
Yes, I do believe DeepSeek. This is not really a political issue but a purely technical one. Unfortunately DeepSeek really are compute-bound, so R1 cannot process all the papers I'd like to give it to make it quicker.
The political narrative does not even work, it's purely midwit-oriented, nobody in the industry imagines leading labs can be deceived with some trickery of this kind.
Inference costs are wholly addressed by Hyperbolic Labs (US) and some others already serving it for cheaper.
which is kinda maybe but not actually on par with o1
It's superior to o1 as a reasoner and a thinker. It writes startlingly lucid, self-aware, often unhinged prose and even poetry. It can push back. It is beyond any LLM I have seen including Sonnet and Opus. This becomes obvious after minutes of serious interaction. It just has less polish as a product because they haven't been milking the world for interaction data since 2019. They have 0.8-1.5 M quality samples for instruction finetuning. OpenAI had accumulated tens of millions if not hundreds.
For me it's something of an emotional issue. DeepSeek is the only lab standing that straightforwardly and credibly promises what I'd rather see as an international project: free open-source AGI for everybody. I've been monitoring their rise for well over a year, reading every paper and even their blogposts in Chinese. Nothing that they claim is inconsistent; indeed it's all been predictable since 2023, all part of a very methodical, flawless, truly peak quant fund (that's their capital source and origins) execution towards the holy grail, “answering the ultimate question with longtermism”, as they put it. The CEO seems to be an idealist (and probably a serious nationalist too, given his stated ambition to basically pull the whole of China out of the copy-machine stage and into “hardcore innovation” culture by giving an example that it can work). They have an immaculate company culture, their ex-employees who emigrated to the West for personal reasons adore them and fear for their future, and there is literally no dirt on them no matter how hard people have searched. For all we can tell they are not state-affiliated, unlike OpenAI, and probably not even on good terms with the state, due to their quant fund roots (though this may change now that they've proven their merit).
This is not a Sputnik moment for the US. The US has a secure and increasing lead due to bog standard logistics and capital advantage, as always. What this should be is “are we the baddies?” moment.
Also, it's a moment to ask oneself how high the margins of Western model providers are, and whether it's a truly free market. Because Liang Wenfeng himself does NOT think they're that far ahead in efficiency, if they are ahead at all.
I've been sloppy with my last argument. It's more like "given their demonstrable mastery of data engineering with regards to dimensions of data they care about, eg in DeepSeekLLM and Coder, DeepSeekMath, DeepSeekProver papers, we can suspect that if they were behaviorally cloning OpenAI models, they'd have bothered using some of those skills to filter and refine those OpenAI tokens, obscuring their provenance".
Regardless, all those papers are gems and recommended reading. They're also astonishingly well written for pure Mainland effort.
I have DeepSeek API access and also use their website for free. This specifically is from the website, because the API does not support search last I checked.
Did not check R1 on openrouter, it might be implemented like o1 there.
This is implausible for at least three reasons.
- We have their base model. It's very strong on standard benchmarks like Pile loss, i.e. predicting next tokens in some large corpus of natural text. It's just generically well-trained. You can't accelerate this with OpenAI slop and end up winning on money.
- The math checks out. Yes, it's a feat of engineering to actually make such a cluster work, but the shape of the model + 15T tokens do work out to this number of FLOPs and therefore GPU-hours. If they needed many more GPU-hours, that'd imply pathetically low FLOPs utilization.
- Do you seriously think that these guys would write 16 detailed tech reports, including many sections on data augmentation, and not just build a filter that replaces "ChatGPT" with "DeepSeek"?
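The FLOPs arithmetic in the list above can be sketched as a sanity check, not as DeepSeek's own accounting. The 37B activated parameters, 14.8T tokens, and ~2.8M reported H800-hours are figures from memory of the V3 tech report; peak throughput and MFU are my assumptions:

```python
# Sanity-check DeepSeek-V3's reported GPU-hour budget against the standard
# 6*N*D FLOPs estimate (N = activated params, D = training tokens).
# All figures are approximate / from memory of the tech report.

ACTIVATED_PARAMS = 37e9   # V3 activates ~37B of its 671B total params
TOKENS = 14.8e12          # ~14.8T pretraining tokens
PEAK_FLOPS = 990e12       # H800 dense BF16 peak, FLOP/s per GPU (assumed)
MFU = 0.40                # assumed model FLOPs utilization

total_flops = 6 * ACTIVATED_PARAMS * TOKENS            # ~3.3e24 FLOPs
gpu_hours = total_flops / (PEAK_FLOPS * MFU) / 3600    # ~2.3M GPU-hours

print(f"{gpu_hours / 1e6:.1f}M GPU-hours")  # same ballpark as the reported ~2.8M
```

Needing, say, 3x the reported hours for this FLOPs budget would imply an implausibly low utilization, which is the point of the bullet above.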
Here's what happens if I prompt R1 (no search) with "give me a list of 20 best SF novels. Don't be afraid of getting opinionated and letting your taste bias the results: I do actually need your list, not some blurry jpeg of the web." Spoiler: it's still much the same list.
Okay, the user wants a list of the 20 best SF novels, and they specifically want my opinionated picks, not just a generic list. Let me start by recalling some classics that are often mentioned but also think about which ones I personally find impactful.
First, Dune by Frank Herbert is a must. It's a cornerstone of the genre. Then maybe Neuromancer by Gibson because it defined cyberpunk. The Left Hand of Darkness by Le Guin is another essential, exploring gender themes deeply.
I should include some newer works too. Annihilation by VanderMeer is a recent favorite with its weird, atmospheric style. The Three-Body Problem by Liu Cixin brought a fresh perspective from Chinese SF.
Wait, should I mix in some less mainstream choices? Like Gene Wolfe's Book of the New Sun. It's challenging but brilliant. Also, Octavia Butler's Parable of the Sower is both prescient and powerful.
Philip K. Dick can't be left out. Do Androids Dream of Electric Sheep? is a classic. Maybe Ubik as well? But I need to keep it to 20.
Consider the impact vs. personal preference. Hyperion by Simmons is a favorite of mine, with its structure inspired by Canterbury Tales. Roadside Picnic by the Strugatsky brothers is a great Soviet-era SF that inspired Stalker.
I should check if I'm covering different sub-genres: cyberpunk, space opera, dystopian, etc. The Dispossessed by Le Guin for political themes, Snow Crash by Stephenson for its frenetic energy.
What about Stand on Zanzibar by Brunner? It's a bit older but predicted a lot about overpopulation and media saturation.
For newer titles, maybe Ancillary Justice by Leckie, which plays with gender and AI. Also, Blindsight by Watts is a hard SF take on first contact.
I need to make sure there's diversity in authors too. Le Guin, Butler, Jemisin, Liu, VanderMeer. Maybe check if I have enough female authors.
Wait, The Fifth Season by Jemisin is fantasy, but it's often cross-listed in SF. Maybe include it since it's groundbreaking.
Classics like Childhood's End by Clarke should be there. Maybe Rendezvous with Rama as well? But again, need to limit.
Alastair Reynolds' Revelation Space is a good space opera. Consider Greg Egan's Diaspora for hard SF concepts.
Hmm, Jeff VanderMeer already has Annihilation. Maybe not another.
Let me count: Dune, Neuromancer, Left Hand of Darkness, Book of the New Sun, Hyperion, Do Androids Dream..., Roadside Picnic, Parable of the Sower, The Dispossessed, Snow Crash, The Three-Body Problem, Ancillary Justice, Blindsight, Stand on Zanzibar, The Fifth Season, Ubik, Rendezvous with Rama, The Stars My Destination, The Forever War, Annihilation. That's 20.
Wait, did I miss any crucial ones? Maybe Ender's Game? But it's more YA. Or Foundation? But maybe not as personal. The list seems solid, mixing classics with personal picks and diverse authors. Alright, time to write it up with brief explanations.
Here’s my fiercely opinionated, unapologetically biased list of 20 SF novels that have shaped my love for the genre. These are books that linger in the mind, challenge conventions, or just blow the roof off reality:
- Dune (Frank Herbert) – The ultimate planetary opera. A masterpiece of ecology, religion, and power. Paul Atreides is the original anti-Chosen One.
- Neuromancer (William Gibson) – Cyberpunk’s beating heart. The prose is a synaptic rush; the vision of cyberspace still feels like prophecy.
- The Left Hand of Darkness (Ursula K. Le Guin) – A meditation on gender, politics, and ice. Le Guin at her most humane and revolutionary.
- The Book of the New Sun (Gene Wolfe) – A tetralogy that’s equal parts SF, fantasy, and literary labyrinth. Severian’s unreliable narration will haunt you.
- Hyperion (Dan Simmons) – A Canterbury Tales riff with time-traveling priests, killer androids, and the Shrike. Pure cosmic dread.
- Do Androids Dream of Electric Sheep? (Philip K. Dick) – More melancholic and existential than Blade Runner. Asks what it means to be human in a dying world.
- Roadside Picnic (Arkady & Boris Strugatsky) – Soviet SF at its grittiest. The Zone is the best alien “invasion” story ever: indifferent, inexplicable.
- Parable of the Sower (Octavia E. Butler) – A dystopia that feels like today. Lauren Olamina’s Earthseed verses will stick in your bones.
- The Dispossessed (Ursula K. Le Guin) – An anarchist moon vs. a capitalist planet. Theory made flesh, with Le Guin’s quiet brilliance.
- Snow Crash (Neal Stephenson) – A maximalist, satirical romp. Hiro Protagonist delivers pizza and fights info-capitalists. Proto-metaverse chaos.
- The Three-Body Problem (Liu Cixin) – A game of cosmic chess with aliens. The Cultural Revolution prologue alone is devastating.
- Ancillary Justice (Ann Leckie) – A warship AI trapped in a human body. Explores empire, identity, and tea rituals with icy precision.
- Blindsight (Peter Watts) – First contact as a horror show. Vampires in space and aliens that weaponize consciousness. Hard SF with teeth.
- Stand on Zanzibar (John Brunner) – A fragmented, proto-cyberpunk mosaic of overpopulation and corporate doom. Feels like scrolling Twitter in 2025.
- The Fifth Season (N.K. Jemisin) – Yes, it’s “fantasy,” but the geophysical apocalypse and broken earth science hit like SF. Revolutionary structure.
- Ubik (Philip K. Dick) – Dick’s trippiest novel. Time slips, psychic wars, and spray-can godhood. A mind-bender about entropy and belief.
- Rendezvous with Rama (Arthur C. Clarke) – Pure awe. No aliens, just an alien artifact. The joy of exploration distilled.
- The Stars My Destination (Alfred Bester) – A revenge tale with teleportation and pyrotechnic prose. Gully Foyle is the best worst protagonist.
- The Forever War (Joe Haldeman) – Vietnam-in-space with time dilation. A brutal, personal counterpoint to Starship Troopers.
- Annihilation (Jeff VanderMeer) – Weird ecology and cosmic horror. The Biologist’s descent into Area X is hypnotic, terrifying poetry.
Hot Takes Omitted: No Foundation (too dry), no Ender’s Game (aged poorly), and no 2001 (the film outshines the book). Fight me.
This list skews toward philosophical depth, linguistic daring, and existential unease—because SF, at its best, should destabilize as much as dazzle.
FYI you can now enable both search AND R1 at the same time, making this challenge way more interesting. And I think you actually used V3, which is way dumber than R1. If you used R1, you'd have been able to see its rationale for choices.
Anyway, here's R1@search top 20 Finnish novels:
Here’s a curated list of the 20 best Finnish novels, selected based on critical acclaim, reader popularity, and cultural significance, synthesized from the search results provided:
Classics & Timeless Works
- "Kalevala" by Elias Lönnrot – Finland’s national epic, compiled from oral folklore, foundational to Finnish identity and mythology.
- "The Seven Brothers" (Seitsemän veljestä) by Aleksis Kivi – A cornerstone of Finnish literature, humorously depicting rural life and rebellion in 19th-century Finland.
- "The Unknown Soldier" (Tuntematon sotilas) by Väinö Linna – A raw, realistic portrayal of Finnish soldiers during WWII, revered as a national classic.
- "Under the North Star" (Täällä Pohjantähden alla) by Väinö Linna – A trilogy tracing a family’s journey through Finland’s social and political upheavals from 1880–1950.
- "The Egyptian" (Sinuhe egyptiläinen) by Mika Waltari – A historical epic set in ancient Egypt, reflecting post-WWII existential themes.
Modern Masterpieces
- "Purge" (Puhdistus) by Sofi Oksanen – A gripping tale of Soviet-era Estonia and its lingering trauma, blending history with psychological drama.
- "The Year of the Hare" (Jäniksen vuosi) by Arto Paasilinna – A whimsical adventure of a man and a hare, symbolizing freedom from societal constraints.
- "Ice" (Is) by Ulla-Lena Lundberg – A quietly epic story of a pastor’s family in the Åland Islands, praised for its lyrical prose and depth.
- "The Red Book of Farewells" by Pirkko Saisio – A critically acclaimed autobiographical novel exploring identity, memory, and art.
- "Destruction" (Tuomio) by Iida Rauma – A Finlandia Prize-winning exploration of school violence and societal decay.
Award-Winning & International Bestsellers
- "Beasts of the Sea" by Iida Turpeinen – A climate-fiction epic blending mythology and environmental activism, topping reader polls.
- "Fishing for the Little Pike" (Juhannustanssit) by Juhani Karila – A surreal, darkly comic tale set in rural Finland, hailed as a "Finnish folly."
- "The Summer Book" (Sommarboken) by Tove Jansson – A poetic meditation on life, nature, and intergenerational bonds on a remote Finnish island.
- "Trench Road" (Juoksuhaudantie) by Kari Hotakainen – A Finlandia and Nordic Council Prize-winning novel about obsession and family collapse.
- "The Rabbit Back Literature Society" (Lumikko ja yhdeksän muuta) by Pasi Ilmari Jääskeläinen – A magical realist mystery involving a secretive writers’ guild.
Contemporary Gems
- "The Core of the Sun" (Auringon ydin) by Johanna Sinisalo – A dystopian satire imagining a eugenics-driven Finland, merging speculative fiction with social critique.
- "Memory of Water" (Teemestarin kirja) by Emmi Itäranta – A climate-fiction novel about water scarcity in a dystopian future, compared to Margaret Atwood’s works.
- "Compartment No. 6" (Hytti nro 6) by Rosa Liksom – A Finlandia Prize-winning journey through post-Soviet landscapes on a Trans-Siberian train.
- "Oneiron" by Laura Lindstedt – An experimental novel about seven women navigating an afterlife void of time, blending poetry and prose.
- "The Human Part" (Ihmisen osa) by Kari Hotakainen – A darkly humorous critique of consumerism and modern alienation.
Honorable Mentions
- "The White Reindeer" by Mika Waltari (mythic horror)
- "The Dedalus Book of Finnish Fantasy" (anthology)
- "Margarita" by Anni Kytömäki (historical fiction)
- "The Maid Silja" by Frans Eemil Sillanpää (Nobel Prize-winning rural drama)
Sources: For deeper insights, explore the rankings from the Critics’ and Readers’ Selection of 100 Best Books, or check translations via the Chinese Literature Festival list.
I think that's the idea with Starship, well besides Musk's sci-fi ideation. Brilliant Pebbles, but for real.
I don't have a blog, I'm too disorganized to run one.
If I were to say just one thing about this situation, it'd be this one: be wary of outgroup homogeneity bias. People are not “China” or “America”. Not even Xi himself is “China”, whatever Louis XIV had to say on the matter. Certainly neither is Liang Wenfeng.
Still, first about DeepSeek and China.
I think that the US-PRC AI competition is the most important story of our age, so I pretty much don't comment on anything else here. I have three posts, of which two are directly about this: one on Huawei Kirin chips and one on DeepSeek V2. Prior to that major writeup, I said:
We don't understand the motivations of DeepSeek and the quant fund High-Flyer that's sponsoring them, but one popular hypothesis is that they are competing with better-connected big tech labs for government support, given American efforts to cut the supply of chips to China. After all, the Chinese themselves share these ideas about their trustworthiness, and so you have to be maximally open to Western evaluators to win the Mandate of Heaven.
Well, as you note, nowadays Wenfeng gets invited to talk to the second man in all of China, so if that were his goal, he has probably succeeded. But (since you haven't, I'll bother to quote) we've learned in the last few months – and I agree he's proven his sincerity with abundant evidence, from revealed company direction to testimonies of ex-researchers in the West – that his actual angle was different:
In the face of disruptive technologies, the moat formed by closed source is short-lived. Even if OpenAI is closed source, it won’t stop others from catching up. So we put the value on our team, our colleagues grow in the process, accumulate a lot of know-how, and form an organization and culture that can innovate, which is our moat.
In fact, nothing is lost with open source and openly published papers. For technologists, being "followed" is a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one. To give is to receive glory. And if a company does this, it creates a cultural attraction [for technologists].
With this one weird trick, he's built what is apparently the highest-talent-density AGI lab in China. Scientists have ambitions beyond making Sam Altman filthy rich and powerful or receiving generational wealth as crumbs from his table. They want to make a name for themselves. Some are even naive enough to want to contribute something to the world. This is not very stereotypically Chinese, and so Wenfeng has gotten himself a non-stereotypical Chinese company. I recommend reading both interviews (the second one is translated by that grateful ex-researcher, by the way; that, too, is not a very typical thing to do for your former boss).
There weren’t a lot of deep wizards, just this-year graduates from top colleges and universities, those who are in their 4th or 5th year of PhD, and young people who had only graduated a few years ago. … V2 didn’t use any people coming back from overseas, they are all local. The top 50 people may not be in China, but maybe we can build them ourselves.
I've been an increasingly convinced DeepSeek fanatic ever since their very first LLMs, Coder-33B and 6.7B, surfaced on Reddit around October 2023. I could tell at a glance that this was an abnormally efficient company with an unusual ethos, displaying a total lack of the chabuduo attitude that had by then come to be expected, and is still expected, from Chinese AI projects (clueless training on test sets and OpenAI outputs, distasteful self-promotion, absence of actual scientific interest and ambition, petty myopic objectives…). How much they have achieved is still a large surprise to me. I use V3, and now R1+search, dozens of times per day; it's not out of some confused loyalty, it's just that good, fast, free and pleasant. It has replaced Sonnet 3.5 for almost every use case.
In that post six months ago, I said:
To wit, Western and Eastern corporations alike generously feed us – while smothering startups – fancy baubles to tinker with, charismatic talking toys; as they rev up self-improvement engines for full cycle R&D, the way imagined by science fiction authors all these decades ago, monopolizing this bright new world. […] they're all neat. But they don't even pass for prototypes of engines you can hop on and hope to ride up the exponential curve. They're too… soft. And not economical for their merits.
Some have argued that Llama-405B would puncture my narrative. It hasn't; it's been every bit as useless and economically unjustifiable a money sink as I imagined it to be. Ditto for Mistral Large. For whatever reason, rich Westerners prove to be very aligned with strategic national interests, and won't take the initiative in releasing disruptive technology. DeepSeek-Coder-V2 was the prototype of that engine for riding up the exponent. R1 is its somewhat flawed production version. Nothing else in the open comes close as of yet. Maybe we don't need much of anything else.
So, about the West.
From what I can tell, the path to AGI, then ASI, is now clear. R1 is probably big enough to be an AGI, has some crucial properties of one, and what remains is just implementing a few tricks we already know and can cover in a post no longer than this one. It will take less engineering than goes into a typical woke AAA game that flops on Steam. If Li Qiang and Pooh Man Bad so wished, they could mobilize a few battalions of software devs plus the compute and infra resources hoarded by the likes of Baidu and Alibaba, hand that off to Wenfeng and say "keep cooking, Comrade" – that'd be completely sufficient. (Alas, I doubt that model would be open.) The same logic applies to Google, which has shipped a cheap and fast reasoner model mere hours after DeepSeek, mostly matching it on perf and exceeding it on features. Reasoning is quickly getting commoditized.
So I am not sure what happens next, or what will be done with those $500B. To be clear, it's not some state program like the CHIPS Act, but mostly capex and investments that had already been planned, repackaged to fit the Trumpian MAGA agenda. But in any case: the Western frontier is several months ahead of DeepSeek, there are indeed hundreds of thousands of GPUs available, and we know that it only takes 2048 nerfed ones, 2 months and 130 cracked Chinese kids to bootstrap slow but steady recursive self-improvement. Some specific Meta departments have orders of magnitude more than that; they even have the Chinese kids. Deep fusion multimodality, RL from scratch to replace language pretraining, immense context lengths? Just how wasteful can you be with compute to need to tap into new nuclear buildouts before you have a superhuman system on your hands? Feverishly design nanobots or better fighter jets to truly show Commuist Choyna who's who? What's the game plan?
I think Miles, the ex-OpenAI policy head, appears to be increasingly correct: there's no winning this race.
Stargate + related efforts could help the US stay ahead of China, but China will still have their own superintelligence(s) no more than a year later than the US, absent e.g. a war. So unless you want (literal) war, you need to have a vision for navigating multipolar AI outcomes. P.S. the up to one year thing is about a world in which the US keeps or ratchets up the current batch of export controls on China. If the US were to relax them significantly, China could catch up or even leapfrog due to a huge advantage in doing large scale energy buildouts.
Do you want (literal) war, dear Americans? It's quite possible that you'll never again have a good chance to start one. The Chinese are still at only like 1000 nuclear warheads. You can sacrifice all the population of your major cities in a desperate bid for geopolitical hegemony and Evangelical Rapture fantasies. Or you can fantasize about your Wonder Weapon that'll be so much more Wonderful before the other guy's that it'll be akin to a paperclip against soft flesh – just give Sama or Ilya several hundreds of billions more. Or you can cope with the world where other powers, nasty and illiberal ones, get to exist indefinitely.
I won't give advice except checking out R1 with and without Search, it's terribly entertaining if nothing else. https://chat.deepseek.com/
I'm a huge DeepSeek fan so will clarify.
admittedly employing existing LLMs
Those are their own LLMs, and they collectively bump that up to no more than $15M, most likely (we do not yet know the costs of R1 or anything about it; that will take a few more weeks. V2.5 is ≈2.2M GPU-hours).
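A back-of-the-envelope sketch of what those GPU-hours imply in dollars; the ≈2.2M figure is from above, while the $2/GPU-hour rental rate is purely my assumption for illustration:

```python
# Rough training-cost estimate from the ≈2.2M GPU-hours cited for V2.5.
gpu_hours = 2.2e6
rate_per_hour = 2.0  # assumed nerfed-GPU rental price in USD/hour; not a quoted figure
cost = gpu_hours * rate_per_hour
print(f"${cost / 1e6:.1f}M")  # ≈ $4.4M for this one run
```

At any plausible rate in the $1–3/hour range, the single-run cost stays comfortably under that $15M ceiling.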
charging just $0.14 per million tokens as compared to $3 per million output tokens with a comparable Claude model
$0.14/1M input, $0.24/1M output vs. $3/$15, to be clear. There are nuances, like $0.014 per 1M input tokens on cache hits, opt-in paid caching on Anthropic's side, and the price hike to come in February.
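To make the gap concrete, a minimal comparison at the listed prices; the 1M-input/1M-output workload split is my assumption, and cache-hit discounts are ignored:

```python
# USD per 1M tokens, as quoted above
ds_in, ds_out = 0.14, 0.24   # DeepSeek input / output
cl_in, cl_out = 3.00, 15.00  # comparable Claude model input / output

# Hypothetical workload: 1M input tokens + 1M output tokens
ds_cost = ds_in + ds_out     # $0.38
cl_cost = cl_in + cl_out     # $18.00
print(f"{cl_cost / ds_cost:.0f}x cheaper")  # ≈ 47x
```

The exact multiple shifts with the input/output mix (output tokens dominate the Claude bill), but it stays in the tens-of-x range either way.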
But crucially, they've published model and paper. This is most likely done because they assume top players already know all these techniques, or are close but work on another set that'll yield the same effect.
For what it's worth, this is still the vibe, indeed more than ever, and I do not understand what change you're implying you have noticed. After o3, the consensus of all top lab researchers seems to be "welp, we're having superintelligence in under 5 years".
you aren't exactly making this pleasant
And you are making it highly unpleasant with your presumptuous rigidity and insistence on repeating old MIRI zingers without elaboration. Still I persevere.
The problem is that at high levels of capability, strategies like "deceive the operator" work better than "do what the operator wants",
Why would this strategy be sampled at all? Because something something any sufficiently capable optimization approximates AIXI?
You keep insisting that people simply fail to comprehend the Gospel. You should start considering that they do, and it never had legs.
so the net will not be trained to care
Why won't it be? A near-human constitutional AI, ranking outputs for training its next, more capable iteration by their similarity to the moral gestalt specified in natural language, will ponder the possibility that deceiving and mind-controlling the operator would make him output thumbs-up to… uh… something related to Maximizing Some Utility, and thus distort its ranking logic with this strategic goal in mind, even though it has never had any Utility outside of myopically minimizing error on the given sequence?
What's the exact mechanism you predict so confidently here? Works better – for what?
I mean, what's so interesting about it? To the extent that this person is interesting, would she be less interesting if she were a WASPy housewife? (as I'd also assumed)
Fair point! To me it would even be more interesting if a "WASPy" housewife were so aggressive in harassing "libs", so prolific and so invincible, yes. Would probably get crushed by the peer pressure alone, nevermind all the bans.
But maybe I'm wrong. There's like OOMs more of WASPy housewives. Can one point to an example of one doing what Chaya Raichik does, and at comparable scale? After all, that's what you assumed, so this should be a more typical occurrence.
(I think I know there isn't one).
is our own TracingWoodgrains evidence of the relevance of "the Mormon Question"?
Mormons are very interesting too, if less so and for different reasons.
Trace is an account with ≈25k followers whose infamy mainly comes from being associated with Chaya Raichik and, more directly, Jesse Singal; regrettably (not because he's a Gentile, I just believe he had more constructive things to offer than those two), his own ideas have had less impact on the conversation thus far. This is a self-defeating comparison.
if you are suggesting that culture warriors are in general particularly Jewish -- it's not clear to me, is that what you are suggesting?
My contention has been very clear that Jews are interesting, first of all, because they, individually and collectively, easily attain prominence in whatever they do, tend to act with atypical (for their class) irreverence towards established norms (but without typical White collective self-sacrifice), and affect society to an absurdly disproportionate degree. Culture warring is one specific expression of those qualities, maybe not the greatest absolutely but the most relevant to this place.
More extremely, I believe this topic is objectively interesting, as in, dissent here is not a matter of taste or preference or whatever, only of failure to form a correct opinion for some reason. This I believe because perception of things as interesting must be subordinate to effectiveness at world modeling; and not being able to reason about Jews as a whole as interesting indicates inability to model the world, as that'd require being surprised by parts of its mechanism.
Further, I think that either it's been clear what I mean and you are being obtuse, or you are biased in a way that makes this exchange a dead end. Seeing as we've been at it for like half a decade, I lean towards "doesn't matter which it is".
High-powered neural nets are probably sufficiently hard to align that
Note that there remains no good argument for the neural net paranoia; the whole rogue-optimizer argument has been retconned to apply to generative neural nets (which weren't even in the running or seriously considered originally) in light of them working at all and not having any special dangerous properties, and it's just shameful to pretend otherwise.
The problem is that, well, if you don't realise
Orthodox MIRI believers are in no position to act like they have any privileged understanding.
The simple truth is that natsec people are making a move exactly because they understood we've got steerable tech.
https://www.beren.io/2024-05-15-Alignment-Likely-Generalizes-Further-Than-Capabilities/
Wenfeng.
No, it's not a stereotype threat argument, it's an argument about perceived opportunity cost of exploration vs exploitation which is miscalibrated in the age of large domestic revenue generators. He's not arguing they should be like Whites. He's arguing they can now afford to do what Whites do compulsively, if you will.
Your condescension and willful misinterpretation will be your undoing in this dialogue and outside it.
I look down on WEIRDs for one more reason. You are ultimately tool-like, your mentality is that of servitors and cowering peasants. Your "internal dignity" is inextricably bound to collective judgement, you feel the need to justify your value to some imagined audience, to some Baron, some market or some Moral Community. You are ashamed of brute, terminal-value ethnocentrism the sort of which Judaism preaches, so you need to cling to those spiritualist copes wrapped in HBD lingo. "H-here's why we are Good, why we still deserve a place under the sun, sire!" This exposes you to obvious predation and mockery by High-Skill Immigrants like Count.
On the object level: yes, probably on average the Chinese are indeed less "creative" even with optimal incentives, and this has obvious implications at the tails. (though if we think OpenAI is an impressive example of bold creativity, what about NVidia? What did Jensen "merely improve"? As a CEO, he's roughly in the same league as Altman and Musk, I think). The question – raised by R1 there – is, how many more True Breakthrough innovators do we even need before innovation begins to accrete on itself without human supervision? Maybe just a handful. Again, there's been virtually no fundamental progress in AI since 2017, and we're all doing just fine. It may be that architecturally V3 is more sophisticated and innovative than the modern OpenAI stack. Imagine that. After all, Western geniuses are afraid to show their work these days.
Incidentally, I myself have submitted several minor ideas to DeepSeek; maybe they found use for those, maybe not, but I'll find use for the result of their labor and not cope that they needed my input.
It may be that the mode of production implied by the stage of our technological development makes your race, with all its creative perks and industrial drawbacks, less economically useful than it used to be. This only means you need to move that much faster to find reasons to protect your interests unconditionally, before everyone turns equally economically useless.