
Culture War Roundup for the week of March 24, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


I've tried the reasoning models. They fail just as much (just tried Gemini 2.5 too and it did even worse). The purpose was to illustrate an example of how they fail. To showcase their poor reliability. I did not say they won't get better. They will, just not as much as you think. You can't just take 2 datapoints and extrapolate forever.

And I don't get your example, wouldn't the NICE CKS be in the dataset many times over? Maybe my point wasn't clear. These tools are amazing as search engines as long as the user using them is responsible and able to validate the responses. It does not mean they are thinking very well. Which means they will have a hard time doing things not in the dataset. These models are not a pathway to AGI. They might be a part of it, but it's gonna need something else. And that/those parts might be discovered tomorrow, or in 50 years.

And I don't see why reality will smack me in the face. I'm already using these as much as possible since they are great tools. But I don't expect my work to look very different in 2030 compared to now. Since programming does not feel very different today compared to 2015. The main problem has always been to make the program not collapse under its own weight, by simplifying it as much as possible. Typing the code has never been relevant. Thanks for the comment btw, it made me try out programming with gemini 2.5 and it's pretty good.

I mean, I assume both of us are operating on far more than 2 data points. I just think that if you open with an example of a model failing at a rather inconsequential task, I'm entitled to respond with an example of it succeeding at a task that could be more important.

My impression of LLMs is that in the domains I personally care about:

  1. Medicine.
  2. Creative fiction.
  3. Getting them to explain random things I have no business knowing. Why do I want to understand lambda calculus or the Church-Turing thesis? I don't know. I finally know why Y Combinator has that name.

They've been great at 1 and 3 for a while, since GPT-4. 2? It's only circa Claude 3.5 Sonnet that I've been reasonably happy with their creative output, occasionally very impressed.

Number 3 encompasses a whole heap of topics. Back in the day, I'd spot check far more frequently. These days, if something looks iffy, I'll shop around with different SOTA models and see if they've got a consensus or critique that makes sense to me. This almost never fails me.
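A rough sketch of that shopping-around habit, assuming you have already collected one short answer per model by hand. The model names, the `consensus` helper and the two-thirds threshold are all hypothetical, and exact string matching is a crude stand-in for actually reading the answers:

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase, collapse whitespace and drop a trailing period."""
    return " ".join(answer.lower().split()).rstrip(".")


def consensus(answers: dict[str, str], min_share: float = 0.66):
    """Return (majority answer or None, share of models agreeing with it).

    `answers` maps a model label to the short answer it gave.
    `min_share` is an arbitrary cut-off chosen for illustration.
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalize(a) for a in answers.values())
    top, votes = counts.most_common(1)[0]
    share = votes / len(answers)
    return (top if share >= min_share else None), share


# Hypothetical answers pasted in from three different models.
answers = {
    "model_a": "Beta reduction.",
    "model_b": "beta  reduction",
    "model_c": "Alpha conversion",
}
majority, share = consensus(answers)
print(majority, round(share, 2))
```

If no answer clears the threshold, the function returns `None`, which is the cue to go and check a primary source instead.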

And I don't get your example, wouldn't the NICE CKS be in the dataset many times over?

Almost certainly. But does that really matter to the end user? I don't know if the RS wiki has anti-scraping measures, but there are tons of random nuggets of RS build and item guides all over the internet. Memorization isn't the only reason that models are good: they think, or do something so indistinguishable from the output of human thought that it doesn't matter.

If you met a person who was secretly GPT-4.5 in disguise, you would be rather unlikely to be able to tell at all that they weren't a normal human, not unless you went about suspicious from the start. (Don't ask me how this thought experiment would work, assume a human who just reads lines off AR lenses I guess).

These tools are amazing as search engines as long as the user using them is responsible and able to validate the responses. It does not mean they are thinking very well. Which means they will have a hard time doing things not in the dataset. These models are not a pathway to AGI. They might be a part of it, but it's gonna need something else. And that/those parts might be discovered tomorrow, or in 50 years.

This is a far more reasonable take in my opinion, if you'd said this at the start I'd have been far more agreeable.

I have minor disagreements nonetheless:

  1. 99% of the time or more, what current models say in my field of expertise (medicine) is correct when I check it. Some people claim to experience severe Gell-Mann amnesia when using AI models, and that has not really been my experience.
  2. This means that unless it's mission critical, the average user can usually get by with taking answers at face value. If it's something important, then checking is still worthwhile.
  3. Are current models AGI? Who even knows what AGI means these days. By most definitions before 2015, they count. It's valid to argue that this reveals a weakness of those previous definitions, but I think that at the absolute bare minimum these are proto-AGI. I expect an LLM to be smarter and more knowledgeable and generally flexible than the average human. I can't ask a random person on the street what beta reduction is and expect an answer unless I'm on the campus of a uni with a CS course. That the same entity can also give good medical advice? Holy shit.
  4. Are the current building blocks necessary or sufficient for ASI? Something so smart that even skeptics have to admit defeat (Gary Marcus is retarded, so he doesn't count)? Maybe. Existing ML models can theoretically approximate any computable function, but something like the Transformer architecture has real world limitations.

And I don't see why reality will smack me in the face. I'm already using these as much as possible since they are great tools. But I don't expect my work to look very different in 2030 compared to now. Since programming does not feel very different today compared to 2015.

Well, if you're using the tools regularly and paying for them, you'll note improvements if and when they come. I expect reality to smack me in the face too, in the sense that even if I expect all kinds of AI related shenanigans, seeing a brick wall coming at my car doesn't matter all that much when I don't control the brakes.

For a short span of time, I was seriously considering switching careers from medicine to ML. I did MIT OCW programs, managed to solve one Leetcode medium, and then realized that AI was getting better at coding faster than I would. (And that there are a million Indian coders already, that was a factor). I'm not saying I'm a programmer, but I have at least a superficial understanding.

I distinctly remember what a difference GPT-4 made. GPT-3.5 was tripped up by even simple problems and hallucinated all the time. 4 was usually reliable, and I would wonder how I'd ever learned to code before it.

I have little reason to write code these days, but I can see myself vibe-coding. Despite your claim that you don't feel that programming has changed since 2015, there is no end of talented programmers like Karpathy or Carmack who would disagree.

Thanks for the comment btw, it made me try out programming with gemini 2.5 and it's pretty good.

You're welcome. It's probably the best LLM for code at the moment. That title changes hands every other week, but it's true for now.

99% of the time or more, what current models say in my field of expertise (medicine) is correct when I check it. Some people claim to experience severe Gell-Mann amnesia when using AI models, and that has not really been my experience.

  1. Okay, can we get people to start using 'delusions' or 'confabulations' instead of 'hallucinations'? This always irks me.

  2. I know we've bickered about this in the past but I think you have to be very cautious about what decision support tools and LLMs are doing in practical medicine at this time - fact recall is not most of the problem or difficulty.

The average person here could use UpToDate to answer many types of clinical questions, even without the clinical context that you, I, and ChatGPT have.

That's not the hard part of medicine. The hard part is managing volume (which AI tools can do better than people) and vagary (which they are shit at). Patients reporting symptoms incorrectly, complex comorbidity, a Physical Exam, these sorts of things are HARD.

Furthermore the research base in medicine is ass, and deciding if you want a decision support tool to use the research base or not is not a simple question.

On the topic of hallucinations/confabulations from LLMs in medicine:

https://x.com/emollick/status/1899562684405670394

This should scare you. It certainly scares me. The paper in question has no end of big names in it. Sigh, what happened to loyalty to your professional brethren? I might praise LLMs, but I'm not conducting the studies that put us out of work.

The average person here could use UpToDate to answer many types of clinical questions, even without the clinical context that you, I, and ChatGPT have.

I expect that without medical education, and only googling things, the average person might get by fine for the majority of complaints, but the moment it gets complex (as in the medical presentation isn't textbook), they have a rate of error that mostly justifies deferring to a medical professional.

I don't think this is true when LLMs are involved. When presented with the same data as a human clinician, they're good enough to be the kind of doctor who wouldn't lose their license. The primary obstacles, as I see them, lie in legality, collecting the data, and the fact that the system is not set up for a user that has no arms and legs.

I expect that when compared to a telemedicine setup, an LLM would do just as well, or too close to call.

That's not the hard part of medicine. The hard part is managing volume (which AI tools can do better than people) and vagary (which they are shit at). Patients reporting symptoms incorrectly, complex comorbidity, a Physical Exam, these sorts of things are HARD.

I disagree that they can't handle vagary. They seem epistemically well calibrated, consider horses before zebras, and are perfectly capable of asking clarifying questions. If a user lies, human doctors are often shit out of luck. In a psych setting, I'd be forced to go off previous records and seek collateral histories.

Complex comorbidities? I haven't run into a scenario where an LLM gave me a grossly incorrect answer. It's been a while since I was an ICU doc, that was GPT-3 days, but I don't think they'd have bungled the management of any case that comes to mind.

Physical exams? Big issue, but if existing medical systems often use non-doctor AHPs to triage, then LLMs can often slot into the position of the senior clinician. I wouldn't trust the average psych consultant to find anything but the rather obvious physical abnormalities. They spend blissful decades avoiding PRs or palpating livers. In other specialities, such as for internists, that's certainly different.

I don't think an LLM could replace me out of the box. I think a system that included an LLM, with additional human support, could, and for significant cost-savings.

Where I currently work, we're more bed-constrained than anything, and that's true for a lot of in-patient psych work. My workload is 90% paperwork versus interacting with patients. My boss, probably 50%. He's actually doing more real work, at least in terms of care provided.

Current setup:

3-4 resident or intern doctors. 1 in-patient cons. 1 outpatient cons. 4 nurses a ward. 4-5 HCAs per ward. Two wards total, and about 16-20 patients.

?number of AHPs like mental health nurses and social workers triaging out in the community. 2 ward clerks. A secretary or two, and a bunch of people whose roles are still inscrutable to me.

Today, if you gave me the money and computers that weren't locked down, I could probably get rid of half the doctors, and one of the clerks. I could probably knock off a consultant, but at significant risk of degrading service to unacceptable levels.

We're rather underemployed as-is, and this is a sleepy district hospital, so I'm considering the case where it's not.

You would need at least one trainee or intern doctor who remembered clinical medicine. A trainee 2 years ahead of me would be effectively autonomous, and could replace a cons barring the legal authority the latter holds. If you need token human oversight for prescribing and authorizing detention, then keep a cons and have him see the truly difficult cases.

I don't think even the ridiculous amount of electronic paperwork we have would rack up more than $20 a day for LLM queries.

I estimate this would represent about £292,910 in savings from not needing to employ those people, without degrading service. I think I'm grossly over-estimating LLM query costs; asking one (how kind of it) suggests a more realistic $5 a day.
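For a sense of scale, here is the back-of-envelope arithmetic behind those daily figures. The £292,910 savings number is taken as stated, not derived here. The 365-day year and the exchange rate are assumptions picked purely for illustration:

```python
DAYS_PER_YEAR = 365          # assumption: queries run every day of the year
USD_PER_GBP = 1.30           # assumption: illustrative exchange rate only
SAVINGS_GBP = 292_910        # figure quoted above, taken as given


def annual_cost_usd(usd_per_day: float) -> float:
    """Annualize a flat daily LLM query bill."""
    return usd_per_day * DAYS_PER_YEAR


pessimistic = annual_cost_usd(20)   # the $20-a-day ceiling
realistic = annual_cost_usd(5)      # the $5-a-day estimate

savings_usd = SAVINGS_GBP * USD_PER_GBP
share = pessimistic / savings_usd   # query bill as a fraction of savings

print(f"${pessimistic:,.0f} vs ${realistic:,.0f} per year")
print(f"pessimistic bill is {share:.1%} of the claimed savings")
```

Even at the $20 ceiling and this assumed rate, the query bill comes to under 2% of the claimed savings, so the estimate is not sensitive to the per-day cost.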

This is far from a hyperoptimized setup. A lot of the social workers spend a good fraction of their time doing paperwork and admin. Easy savings there, have the rest go out and glad-hand.

I re-iterate that this is something I'm quite sure could be done today. At a certain point, it would stop making sense to train new psychiatrists at all, and that day might be now (not a 100% confidence claim). In 2 years? 5?

Do keep in mind how terrible most medical research is, and that includes research into our replacements. This isn't from lack of effort but from the various systems, pressures, and ethics at play.

How do you simulate a real patient encounter when testing an LLM? Well, maybe you write a vignette (okay, that's artificial and not a good example). Maybe you sanitize the data inputs and have a physician translate them for the LLM. Well shit, that's not good either.

Do you have the patient directly talk to the LLM and have someone else feed in lab results? Okay maybe getting closer but let's see evidence they are actually doing that.

All in the setting of people who are very motivated to show that the tool works well and are therefore biased in research publication (not to mention all the people who run similar experiments and find that it doesn't work but can't get published!).

You see this all the time in microdosing, weed, and psychedelic research. The quality is ass.

Also keep in mind that a good physician is a manager also - you are picking up the slack on everyone else's job, calling family, coordinating communication for a variety of people, and doing things like actually convincing the patient to follow recommendations.

I haven't seen any papers on an LLM's attempts to get someone to take their 'beetus medication vs. those of a living, breathing person.

Also Psych will be up there with the proceduralists among the last to be replaced.

Also also other white collar jobs will go first.

Do you have the patient directly talk to the LLM and have someone else feed in lab results? Okay maybe getting closer but let's see evidence they are actually doing that.

I expect this would work. You could have the AI be something like GPT-4o Advanced Voice for the audio communication. You could record video and feed it into the LLM. This is something you can do now with Gemini, I'm not sure about ChatGPT.

You could, alternatively, have a human (cheaper than the doctor) handle the fussy bits. Ask the questions the AI wants asked, while there's a continuous processing loop in the background.

No promises, but I could try recording a video of myself pretending to be a patient and see how it fares.

All in the setting of people who are very motivated to show that the tool works well and are therefore biased in research publication (not to mention all the people who run similar experiments and find that it doesn't work but can't get published!).

I mean, quite a few of the authors are doctors, and I presume they'd also have a stake in us being gainfully employed.

Also keep in mind that a good physician is a manager also - you are picking up the slack on everyone else's job, calling family, coordinating communication for a variety of people, and doing things like actually convincing the patient to follow recommendations.

I'd take orders from an LLM, if I was being paid to. This doesn't represent the bulk of a doctor's work, so if you keep a fraction of them around... People are already being conditioned to take what LLMs say seriously. They can be convinced to take them more seriously, especially if vouched for.

I haven't seen any papers on an LLM's attempts to get someone to take their 'beetus medication vs. those of a living, breathing person.

That specific topic? Me neither. But there are plenty of studies of the ability of LLMs to persuade humans, and the very short answer is that they're not bad.

I mean, quite a few of the authors are doctors, and I presume they'd also have a stake in us being gainfully employed.

Nah most of us Get Too Excited About Making A Difference.

Sidebar: I was watching "In Good Company" at lunch today (a podcast in which the manager of Norway's sovereign wealth fund interviews the most successful people in the world), and the CEO of Goldman asked Nicolai about the best features in leaders. Empathy was one of them! And this was noted in the context of LLMs taking over other parts of the job for many things!

Empathy and leadership are core to being a physician (at least in the U.S.) and if two of the world's most successful people are going to emphasize the importance of that I'm going to imagine we will be well positioned lol.

Nah most of us Get Too Excited About Making A Difference.

That's why I'm quite candid about my opinions here, it doesn't make a difference what I tell people on a niche underwater basket weaving forum.

Empathy and leadership are core to being a physician (at least in the U.S.) and if two of the world's most successful people are going to emphasize the importance of that I'm going to imagine we will be well positioned lol.

I looked up studies about LLMs and empathy, including in medical settings and vs human doctors, and there are plenty. Can't vouch for them.

But I had a quasi-transformative experience that involved one today (in a significant role), and I might write that up and tag you.

Please do!