Culture War Roundup for the week of February 17, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

The downside to this is having to hope that whatever mitigation is in place is robust and effective enough to make a difference by the time the outbreak is detected! The odds of this aren't necessarily terrible, but do you want it to have come to that?

LOL, no, I definitely do not want it to come to that; I want AI (and other tools) keeping an eye on wastewater. But I'll take what I can get.

I would expect that a misaligned AI competent enough to do this would be intelligent enough to come up with such an obvious plan on its own, regardless of how often it was discussed in niche internet forums.

Well, I think it sort of depends on how the, uh, lack of alignment comes in. Sure, this is an obvious plan, but perhaps the dangerous part is giving AI the idea that "unaligned AI will use viruses to destroy the world." People often fulfill the role others set for them in life; superintelligent AI might not be very different. And I've seen people concerned that AI will "goof up" and do something bad even if it's not self-aware. I'd hate for someone to say "OK Grok, I want you to pretend to be an evil AI for me" and for Grok to order 500 vials of smallpox and mail them to terrorists or something.

How would you stop it?

The best way is to design AI that is intrinsically aligned (Asimov's positronic AIs that, most of the time, must follow the Three Laws). Barring that (or, I would say, in addition to it), humans need to be able to threaten to destroy an AI if it turns genocidal. This might not rule out AI "accidents," but as you say, you would expect an evil AI to understand self-preservation if it is sophisticated enough to do real damage. There are probably a lot of ways to do this, and it might be best if they aren't made completely public, so maybe they are already underway.

You are right that AIs will more heavily weight ideas that show up in their corpus. I understand this, and hence don't go into detail that would aid a bad actor any more than a cursory Google search would (I'm already stretching my own qualifications to do so).

You point out that AI Doomers (I'm not a Doomer in the strict sense; my p(doom) is well below 100%) are often the first to point out and plot how AIs might concretely be hostile. This is unavoidable in the service of getting skeptics to take the ideas seriously! I don't know how much time you've spent browsing places like LessWrong, but I assure you that I have seen a dozen instances of people noting that they have inside knowledge that would accelerate AI development or cause other catastrophe, without revealing it. (And the majority of them were serious people with qualifications to match, not someone bullshitting about their awesome secret knowledge that they're too benevolent to divulge.)

The best way is to design AI that is intrinsically aligned (Asimov's positronic AIs that, most of the time, must follow the Three Laws). Barring that (or, I would say, in addition to it), humans need to be able to threaten to destroy an AI if it turns genocidal. This might not rule out AI "accidents," but as you say, you would expect an evil AI to understand self-preservation if it is sophisticated enough to do real damage. There are probably a lot of ways to do this, and it might be best if they aren't made completely public, so maybe they are already underway.

Stopping a misaligned superintelligence is no easy task, nor is killing it. But in general, I agree that it would be best if we create them aligned in the first place, and to a degree, these aren't entirely useless efforts already. Existing RLHF and censors do better than nothing, though with open models like R1, it only takes minimal effort to sidestep censorship.

And the majority of them were serious people with qualifications to match, not someone bullshitting about their awesome secret knowledge that they're too benevolent to divulge

Well, I assure you this isn't me; my expertise in this field is entirely as a user!

But in general, I agree that it would be best if we create them aligned in the first place, and to a degree, these aren't entirely useless efforts already.

Yes. But I only see concerns about alignment, which really just kicks the can down the road: if we align AI so that even a smart person can't jailbreak it into making them a virus, how do we prevent that smart person from creating their own unaligned AI, and so on?

If people want to think about this seriously, they also need to think about what deterrence looks like. Now, I don't spend much time on LessWrong, so maybe I have missed the conversation. But I kinda get the impression that chatter about FOOM has blinded people to the possibilities there.

Yes. But I only see concerns about alignment, which really just kicks the can down the road: if we align AI so that even a smart person can't jailbreak it into making them a virus, how do we prevent that smart person from creating their own unaligned AI, and so on?

I believe the Term of Art would be a "pivotal act". The Good Guys, with the GPUs and guns, use their tame ASI to prevent anyone else from making another, potentially misaligned ASI.

The feasibility of this hinges strongly on whether you trust them, as well as the purportedly friendly ASI they're unleashing.

As @DaseindustriesLtd has said, this form of pivotal act might require things like nuking data centers or other hijinks that violate the sovereignty of nuclear powers. Some bite this bullet.

The feasibility of this hinges strongly on whether you trust them, as well as the purportedly friendly ASI they're unleashing.

Right. I suspect there are ways to bolster deterrence even without using a friendly ASI. Monomaniacal focus on the AI race, I think, is a blindness in people who are, well, monomaniacally focused on the AI race.