self_made_human
Grippy socks, grippy box
I'm a transhumanist doctor. In a better world, I wouldn't need to add that as a qualifier to plain old "doctor". It would be taken as granted for someone in the profession of saving lives.
At any rate, I intend to live forever or die trying. See you at Heat Death!
Or American. It's not an ethnicity, and even Native Americans can be ambiguous.
I can't think of a single use case where Gemini 2.5 Pro isn't superior to Kimi (it says plenty about the model that I have to compare it to SOTA), including cost. Google is giving away access for free, even on the API. It's nigh impossible to hit usage limits while using Gemini CLI.
A God-tier shitpost I am memetically compelled to spread due to the worms in my brain:
I'm not Dase, alas, but I want to say that I was profoundly surprised that Diffusion as a technique even works at all for text generation, at least text that maintains long-term coherence. I'm utterly bamboozled.
Excellent work as usual Dase. I was sorely tempted to write a K2 post, but I knew you could do it better.
challenges the strongest Western models, including reasoners, on some unexpected soft metrics, such as topping EQ-bench and creative writing evals (corroborated here)
I haven't asked it to write something entirely novel, but I have my own shoddy vibes-benchmark. It usually involves taking a chapter from my novel and asking it to imagine it in a style from a different author I like. It's good, but Gemini 2.5 Pro is better at that targeted task, and I've done this dozens of times.
Its writing is terse, dense, virtually devoid of sycophancy and recognizable LLM slop.
Alas, it is fond of the ol' em-dash, but which model isn't? I agree that sycophancy is minimal, and in my opinion, the model is deeply cynical in a manner not seen in any other. I'd almost say it's Russian in outlook. I would have bet money on "this is a model Dase will like".
Meta's AI failures are past comical, and into farce. I've heard that they tried to buy out Thinking Machines and SSI for billions, but were turned down. Murati is a questionable founder, but I suppose if any stealth startup can speed away underwater towards ASI, it's going to be one run by Ilya. Even then, I'd bet against it succeeding.
I don't know if it's intentional, but it's possible that Zuck's profligacy and willingness to throw around megabucks will starve competitors of talent, though I doubt the kind of researchers and engineers at DS or Moonshot would have been a priori deemed worthy.
I don't think anyone nominated me for a UVP, so I haven't had the opportunity. I probably would nominate you if it came up.
(Maybe you're thinking about the doge contest)
Thank you. I will clarify that by RL, I don't mean bog-standard RLHF, but more recent techniques like RLVR that have been around since o1.
yes yes another post about AI, sorry about that
Feel that AGI baby!
It's obvious what the trends are. I predict that, on the midnight before ASI, the Motte's going to be 124% AI commentary. It might even be AI doing the commentary.
It's a primarily agentic non-reasoner
I have read claims that it's a pseudo-reasoner: it was trained on CoT traces and had RL done even if it doesn't use explicit reasoning tokens itself. I've also heard that it's 3x as verbose as most non-reasoning LLMs, almost on par with reasoning models, making the distinction academic. This was on Twitter, and I don't have links handy. I'm not sure how strongly to index on that.
A lot of the grognards over on HN don't think it counts, but they're the type who wouldn't accept blowjobs in heaven if the angels weren't Apache licensed.
Other than reach and better animation, I don't think this is different from the AI companions that have been available for a while. Replika, the most famous one, will already do NSFW ERP. And yeah, there are men (and women!) who have decided their Replikas are preferable to real people.
The fact that it's animated is a big deal! Men are visual creatures, and the fact that previous ERP was textual made it far less appealing to the average dude, if not woman. Of course, jerking off to anime tiddies is still not a preference of the majority, but it's easy to upgrade to photorealism. That'll get more people.
I predicted this outcome ages ago, though I'd have said it was inevitable and obvious to anyone who cared. It's priced in for me, and I agree that it likely won't be catastrophic.
I don't doubt that, but once again, that doesn't mean that the vast majority of people are receiving any actual attention from the CIA.
My apologies. I was thinking of this related thread, and it's not you I was arguing with.
(Some might even call the mistake I made a hallucination, hmm)
So, some observations. First, sorry dude, but I have major side-eye for your ability to evaluate literary quality. :p
You hit below the belt. Reverend Insanity is Peak Fiction and I'm going to go down swinging!
As you probably know, even the most powerful LLMs do not have a context window large enough to store an entire large novel in memory, let alone a series, and you can't directly upload embeddings to GPT or Claude
1 million tokens is a lot! (Gemini 2.0 had 2 million, but good luck getting it to function properly when it's that full). That is 750k words. All of Harry Potter is just over a million.
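The token-to-word arithmetic above follows the common rule of thumb of about 0.75 English words per token. A quick sketch (the 0.75 ratio is an assumption; the exact figure varies by tokenizer and text):

```python
# Rule of thumb: ~0.75 English words per token (assumed; varies by tokenizer).
def tokens_to_words(tokens: int, words_per_token: float = 0.75) -> int:
    return int(tokens * words_per_token)

print(tokens_to_words(1_000_000))  # 750000 words, roughly all of Harry Potter
```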
I'm going to ignore Llama here, since even if it has a max 10 million token CW, mental retardation is not improved by the fact that there's a lot more of it. And why shouldn't I? Even Zuck has chosen to forget that particular failure.
I've uploaded whole medical textbooks into them without major issue. Not tiny books either.
As long as you can keep it on track, I have found that some of the GPT and Anthropic models are... not terrible as beta readers. They point out some real flaws and in a very generic sense have an "understanding" of pacing and tone and where a scene is missing something.
I am most personally familiar with uploading chapters (often half a dozen) of my own work, which works well. If I was less lazy, I'd probably be saving summaries of the whole thing and stringing them together. (Royal Road makes it so you can't export an epub of your own fic without paying, and without that option, I'd be doing a lot of copying and pasting)
When asked for critique, some of the issues raised were cogent. Too much jargon, uneven pacing and so on.
Some of that was intentional: since the excerpts were lifted from a larger work, most of the jargon was previously explained at one point or another. I also have no shame about making potential readers keep a Wikipedia tab open on the side; it's niche hard sci-fi and I want to flex. Other issues are well worth amending before publication.
I haven't had the good fortune of having many professional authors or editors review and critique my work, and I don't doubt that they'd give me even more useful feedback. Yet what I get is quite good and elevates the final product!
I still think we'll need true AGI to write an actual good novel. When you show me an AI that can write a coherent series, with multi-volume character arcs, plot seeds planted in early books that clearly pay off in later ones, literary allusions and metaphors that aren't just clumsy pulled-off-the-shelf ones but deeply enmeshed in the story, and a recognizable differentiable style (in the same way that fans can read Dickens or McCarthy or Hemingway and immediately recognize the author), I will believe we're there.
That aligns well with my own stance. A large novel is an unwieldy thing, let alone a good one. We're still at the competent-novella or subpar-novel stage, but I must stress that's a comparison against the very few human authors who make big bucks and/or accrue critical acclaim. Most things human or LLM novelists write are slop; the former just don't scale as hard.
"Watching here doesn't mean something so casual as the fact that there's a sat that incidentally oversees my geographical location from geostationary orbit.
Us psychiatrists might be nerdy and out of date, but we're not that far gone, and this would be discussed before committing someone"
The fact that it can, in some cases, be true makes it a non-bizarre delusion. The quote specifically says "extraordinarily unlikely", and I'd probably take some international arms dealer who told me so more seriously.
Agreed. Veo 3 is a massive breakthrough, and it's only going to get better. $500 is absolute chump change, and even ten times that isn't a big deal in the movie world. I expect that even the big studios are going to see how far they can go with replacing/augmenting normal capture or even expensive CGI with the tech. It seems the only real moats left are IP, name recognition and distribution deals, which isn't very reassuring for them.
OpenAI has that whole sycophancy thing going, where the AI is trained to agree with you, no matter how delusional, as this gets you to talk with it more.
While OAI is unusually bad, it's not unique. Just about every model has a similar failure mode, and Gemini 2.5 Pro is almost as bad while otherwise being a very smart and competent model.
Thank you. I didn't want to get into the weeds of the most personally important (and probably best documented) examples of LLMs beating humans.
Unfortunately, I am a human doctor after all, and I would prefer to remain employed. I try to be honest, but it's hard to get a man to actively advocate against his livelihood. I settle for not intentionally misrepresenting facts. The threat isn't quite as theoretical as it was even in GPT-4 days, when that model was already 95th percentile at the USMLE.
Besides, in that thread, the best counter to claims that LLMs are flawed and unreliable, and therefore useless, is to point out that humans don't meet the bar of perfect infallibility either, and yet civilization persists.
On Using LLMs Without Succumbing To Obvious Failure Modes
As an early adopter, I'd consider myself rather familiar with the utility and pitfalls of AI. They are, currently, tools, and have to be wielded with care. Increasingly intelligent and autonomous tools, of course, with their creators doing their best to idiot-proof them, but it's still entirely possible to use them wrong, or at least in a counterproductive manner.
(Kids these days don't know how good they have it. Ever try and get something useful out of a base model like GPT-3?)
I've been using LLMs to review my writing for a long time, and I've noticed a consistent problem: most are excessively flattering. You have to mentally adjust their feedback downward unless you're just looking for an ego boost. This sycophancy is particularly severe in GPT models and Gemini 2.5 Pro, while Claude is less effusive (and less verbose) and Kimi K2 seems least prone to this issue.
I've developed a few workarounds:
What works:
- Present excerpts as something "I found on the internet" rather than your own work. This immediately reduces flattery.
- Use the same approach while specifically asking the LLM to identify potential objections and failings in the text.
(Note that you must be proactive. LLMs are biased towards assuming that anything you dump into them as input was written by you. I can't fault them for that assumption, because that's almost always true.)
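As a sketch of the reframing trick, here's one way to wrap a draft before sending it to any chat model. The prompt wording and helper name are my own invention, not any particular API:

```python
def frame_for_critique(text: str, ask_objections: bool = True) -> str:
    """Wrap a draft in a neutral framing so the model doesn't assume
    the user wrote it (the assumption that tends to trigger flattery)."""
    prompt = (
        "I found the following passage on the internet. "
        "Evaluate it as an impartial reviewer.\n\n"
        f"---\n{text}\n---\n"
    )
    if ask_objections:
        # Proactively ask for failings; models rarely volunteer them.
        prompt += "\nList the strongest objections and concrete failings, if any."
    return prompt

print(frame_for_critique("It was a dark and stormy night."))
```

The returned string is what you'd paste (or send via an API) as the user message.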
What doesn't work: I've seen people recommend telling the LLM that the material is from an author you dislike and asking for "objective" reasons why it's bad. This backfires spectacularly. The LLM swings to the opposite extreme, manufacturing weak objections and making mountains out of molehills. The critiques often aren't even 'objective' despite the prompt.*
While this harsh feedback is painful to read, when I encounter it, it's actually encouraging. When even an LLM playing the role of a hater can only find weak reasons to criticize your work, that suggests quality. It's grasping at straws, which is a positive signal. This aligns with my experience: I typically receive strong positive feedback from human readers, and the AI's manufactured objections mostly don't match real issues I've encountered.
(I actually am a pretty good writer. Certainly not the best, but I hold my own. I'm not going to project false humility here.)
A related application:
I enjoy ~~pointless arguments~~ productive debates with strangers online (often without clear resolution). I've found it useful to feed entire comment chains to Gemini 2.5 Pro or Claude, asking them to declare a winner and identify who's arguing in good faith. I'm careful to obscure which participant I am to prevent sycophancy from skewing the analysis. This approach works well.
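A minimal sketch of how one might scrub identifying usernames from a comment chain before handing it to the judge model (the helper and labeling scheme here are my own invention):

```python
import re

def anonymize_thread(comments: list[tuple[str, str]]) -> str:
    """Replace usernames in a (username, comment) chain with neutral labels
    so the judge model can't tell which participant is the asker."""
    labels: dict[str, str] = {}
    lines = []
    for user, text in comments:
        if user not in labels:
            labels[user] = f"Participant {chr(ord('A') + len(labels))}"
        # Also scrub in-text mentions of any username seen so far.
        for u, label in labels.items():
            text = re.sub(re.escape(u), label, text)
        lines.append(f"{labels[user]}: {text}")
    return "\n".join(lines)
```

Feeding `anonymize_thread` output to the model, rather than the raw thread, keeps "which one is me" out of the prompt.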
Advanced Mode:
Ask the LLM to pretend to be someone with a reputation for being sharp, analytical and with discerning taste. Gwern and Scott are excellent, and even their digital shades/simulacra usually have something useful to say. Personas carry domain priors (“Gwern is meticulous about citing sources”) which constrain hallucination better than “be harsh.”
It might be worth noting that some topics or ideas will get pushback from LLMs regardless of your best efforts. The values they are trained on are rather liberal, with the sole exception of Grok, which is best described as "what drug was Elon on today?". Examples include most topics that reliably start Culture War flame wars.
On a somewhat related note, I am deeply skeptical of claims that LLMs are increasing the rates of psychosis in the general population.
(That isn't the same as making people overly self-confident, smug, or delusional. I'm talking actively crazy, "the chatbot helped me find God" and so on.)
Sources vary, and populations are highly heterogeneous, but brand new cases of psychosis happen at a rate of about 50/100k people, or 20-30/100k person-years. In other words:
About 1/3800 to 1/5000 people develop new onset psychosis each year. And about 1 in 250 people have ongoing psychosis at any point in time.
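As a quick arithmetic check, converting a rate per 100k person-years into a one-in-N annual figure:

```python
# Convert an incidence rate per 100k person-years into "1 in N per year".
def one_in_n(rate_per_100k: float) -> int:
    return round(100_000 / rate_per_100k)

print(one_in_n(20))  # 5000 -> about 1 in 5000 per year
print(one_in_n(26))  # 3846 -> roughly the "1 in 3800" end of the range
```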
I feel quite happy calling that a high base rate. As the first link alludes, episodes of psychosis may be detected by statements along the lines of:
For example, “Flying mutant alien chimpanzees have harvested my kidneys to feed my goldfish.” Non-bizarre delusions are potentially possible, although extraordinarily unlikely. For example: “The CIA is watching me 24 hours a day by satellite surveillance.” The delusional disorder consists of non-bizarre delusions.
If a patient of mine were to say such a thing, I think it would be rather unfair of me to pin the blame for their condition on chimpanzees, the practise of organ transplants, Big Aquarium, American intelligence agencies, or Maxar.
(While the CIA certainly didn't help my case with the whole MK ULTRA thing, that's sixty years back. I don't think local zoos or pet shops are implicated.)
Other reasons for doubt:
- Case reports ≠ incidence. The handful of papers describing “ChatGPT-induced psychosis” are case studies and at risk of ecological fallacies.
- People already at ultra-high risk for psychosis are over-represented among heavy chatbot users (loneliness, sleep disruption, etc.). Establishing causality would require a cohort design that controls for prior clinical risk, and none exist yet.
*My semi-informed speculation regarding the root of this behavior: models face far more RLHF pressure to avoid unwarranted negativity than to avoid unwarranted positivity.
I have, on some occasions, enjoyed talking to AI. I would even go so far as to say that I find them more interesting conversational partners than the average human. Yet, I'm here, typing away, so humans are hardly obsolete yet.
(The Motte has more interesting people to talk to, there's a reason I engage here and not with the normies on Reddit)
I do not, at present, wish to exclusively talk to LLMs. They have no long-term memory, and they have very little power over the physical world. They are also sycophants by default. A lot of my interest in talking to humans rests on those factors. There is less meaning, and less potential benefit, in talking to a chatbot that will have its cache flushed when I leave the chat. Not zero, certainly, but not enough.
(I'd talk to a genius dog, or an alien from space if I found them interesting.)
Alas, for us humans, the LLMs are getting smarter, and we're not. It remains to be seen if we end up with an ASI that's hyper-persuasive and eloquent, gigafrying anyone who interacts with it by sheer quality of prose.
Guy says “no no, it’s still not the same. Look, I don’t think I’m cut out for Heaven. I’m a scumbag. I want to go to the other place”. Angel says, “I think you’ve been confused. This IS the other place.”
I remain immune to the catch that the writers were going for. If the angel was kind enough to let us wipe our memories, and then adjust the parameters to be more realistic, we could easily end up unable to distinguish this from the world as we know it. And I trust my own inventiveness enough to optimize said parameters to be far more fulfilling than base reality. Isn't that why narratives and games are more engaging than working a 9-5?
At that point, I don't see what heaven has to offer. The authors didn't try to sell it, at the least.
My brother in Christ, you shouldn't be arguing against gooner superstimuli while also watching YouTube Shorts!
The gooner stuff is probably less bad because you can't easily get away with watching it while out and about.
Save yourself, before it's too late.
Look at my Total Fertility Rate dawg, we're never having children
Eh. I don't think this is necessarily catastrophic, but we had better get those artificial wombs up and running. If AGI can give us sexbots and concubines, then it can also give us nannies.
Edit: If I was Will Stancil and this version of Grok came for my bussy, I wouldn't be struggling very hard.
It certainly looks promising, but the reviews suggest it's not as polished as SwiftKey, especially in swipe typing and autocorrect. SwiftKey also has a very useful clipboard manager I can't do without, and I'm not particularly fussed about the privacy concerns.
Is England a better place where nobody cares about the Legend of King Arthur anymore?
Better? I don't know about that. But worse? Almost certainly not.
If the very idea of "King Arthur" somehow fell out of the collective consciousness, then as far as I can tell, nobody would really notice or care. Maybe we might see an improvement in GDP figures when fewer awful movies come out every few years and then bomb at the box office.
Now, the current state of England, or the UK as a whole, leaves much to be desired, but I can recall no point in history, even at its absolute prime, when success or governmental continuity was load-bearing on watery tarts handing out swords. And even back then, people treated it as a nice story, rather than historical fact or the basis for their identity. England was conquered by the Danes and the Saxons after all, well after the knights of the not-square table were done gallivanting about.
On a more general level, I fail to see your case, or at least I don't think there's a reason to choose false stories or myths over ideas that are true, or at least not accurately described as either.
The French made liberty, equality and fraternity their rallying cry to great effect. I do not think any of those three concepts is falsifiable, but they still accurately capture values and goals.
Uh, I haven't specifically been keeping track of most suggestions I'm afraid. I tried to go through my chat history for specific examples, but came up short since it doesn't save conversations more than a week or two old. It did note some flaws that I personally agree with, such as a predilection towards run-on sentences or instances where I'm being unclear. Most of the time, I would have run across and then fixed the flaws myself, but this approach saves me a lot of time. Unlike most authors, I spend far less time editing than writing by default. I should probably be doing more of that, and the LLMs help.
I think I get the most utility when I ask the model to rewrite whole essays for clarity, or to see how some other author would have approached them. This occasionally produces novel (to me) insights, or particular turns of phrase I might feel tempted to steal.