Culture War Roundup for the week of July 15, 2024

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

High-powered neural nets are probably sufficiently hard to align that, with correct understanding, the game in question is Stag Hunt, not Prisoner's Dilemma (i.e. if nobody else builds Skynet, you should also not build Skynet even selfishly, as the risk that Skynet will turn on you outweighs the military advantage if it doesn't; it's only if somebody else is already building Skynet that you're incentivised to join in). It's hard to co-ordinate in Stag Hunt with 3+ people, but it's still not Prisoner's Dilemma levels of fucked.
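
To make the payoff claim concrete, here is a minimal Python sketch with made-up numbers (only their ordering matters), treating 0 as "don't build Skynet" and 1 as "build Skynet":

    # Illustrative 2-player payoff matrices; the numbers are invented, only their ordering matters.
    # Strategies: 0 = "don't build Skynet" (stag), 1 = "build Skynet" (rabbit).
    # game[(a, b)] = (payoff to A, payoff to B) when A plays a and B plays b.
    prisoners_dilemma = {
        (0, 0): (3, 3), (0, 1): (0, 4),
        (1, 0): (4, 0), (1, 1): (1, 1),   # building is best no matter what the other side does
    }
    stag_hunt = {
        (0, 0): (4, 4), (0, 1): (0, 3),
        (1, 0): (3, 0), (1, 1): (1, 1),   # not building is best *if* the other side also refrains
    }

    def pure_nash_equilibria(game):
        """Profiles where neither player gains by deviating unilaterally."""
        eqs = []
        for (a, b), (pa, pb) in game.items():
            a_ok = all(game[(a2, b)][0] <= pa for a2 in (0, 1))
            b_ok = all(game[(a, b2)][1] <= pb for b2 in (0, 1))
            if a_ok and b_ok:
                eqs.append((a, b))
        return eqs

    print(pure_nash_equilibria(prisoners_dilemma))  # [(1, 1)]: everyone builds, always
    print(pure_nash_equilibria(stag_hunt))          # [(0, 0), (1, 1)]: "nobody builds" is also stable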

The problem is that, well, if you don't realise that you're playing Stag Hunt, and think you're playing Prisoner's Dilemma instead, then of course you're going to play as if you're playing Prisoner's Dilemma.

High-powered neural nets are probably sufficiently hard to align that

Note that there remains no good argument for the neural-net paranoia. The whole rogue-optimizer argument has been retconned to apply to generative neural nets (which weren't even in the running or seriously considered originally) in light of their working at all and not having any special dangerous properties, and it's just shameful to pretend otherwise.

The problem is that, well, if you don't realise

Orthodox MIRI believers are in no position to act like they have any privileged understanding.

The simple truth is that natsec people are making a move exactly because they understood we've got steerable tech.

https://www.beren.io/2024-05-15-Alignment-Likely-Generalizes-Further-Than-Capabilities/

Sorry for taking three days to actually read your citation, but you aren't exactly making this pleasant. Now I've read it, though.

Short version: Yes, the neural net will definitely understand what you want. The problem is that at high levels of capability, strategies like "deceive the operator" work better than "do what the operator wants", so the net will not be trained to care what you want.

you aren't exactly making this pleasant

And you are making it highly unpleasant with your presumptuous rigidity and insistence on repeating old MIRI zingers without elaboration. Still, I persevere.

The problem is that at high levels of capability, strategies like "deceive the operator" work better than "do what the operator wants",

Why would this strategy be sampled at all? Because something something any sufficiently capable optimization approximates AIXI?

You keep insisting that people simply fail to comprehend the Gospel. You should start considering that they do, and it never had legs.

so the net will not be trained to care

Why won't it be? A near-human constitutional AI, ranking outputs for training its next, more capable iteration by their similarity to the moral gestalt specified in natural language, will ponder the possibility that deceiving and mind-controlling the operator would make him output thumbs-up to… uh… something related to Maximizing Some Utility, and thus distort its ranking logic with this strategic goal in mind, even though it has never had any Utility outside of myopically minimizing error on the given sequence?

What's the exact mechanism you predict so confidently here? Works better – for what?

Even a flaky subhuman model can probably be made limited enough and wrapped in enough layers of manually-written checks to keep it safe for its builders, in which case your first paragraph is only true for a definition of "high-powered" that's literally superhuman. That's not to say it won't come true eventually, though, which makes your second paragraph more worrisome. A Prisoner's Dilemma payoff matrix can be modified continuously into a Stag Hunt matrix, with no sharp distinction between the two if we add any uncertainty to the payoffs, and if capabilities progress faster than alignment then that's what we'd expect to happen.
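
Here's a rough sketch of that "continuous deformation" point, with toy numbers of my own: slide the temptation payoff (building while the other side refrains) downward and watch when "nobody builds" becomes a second stable outcome.

    # Blend a symmetric Prisoner's Dilemma into a Stag Hunt; t in [0, 1] is the blend parameter.
    # Strategies: 0 = "don't build", 1 = "build"; payoffs shown are to the row player.
    def row_payoffs(t):
        temptation = 4 - 2 * t          # building while the other side refrains: 4 -> 2
        return {(0, 0): 3, (0, 1): 0, (1, 0): temptation, (1, 1): 1}

    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        p = row_payoffs(t)
        # By symmetry, (don't, don't) is a Nash equilibrium iff nobody is tempted to build
        # unilaterally, i.e. the temptation payoff is at most the mutual-restraint payoff.
        stable = p[(1, 0)] <= p[(0, 0)]
        print(f"t={t:.2f}: temptation={p[(1, 0)]:.2f}, 'nobody builds' stable? {stable}")
    # At t = 0.50 the temptation payoff reaches the mutual-restraint payoff and the game
    # quietly becomes a Stag Hunt; with noisy payoff estimates you can't tell which side
    # of that line you're on.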

I'm not sure it's a stag hunt (I'll admit I needed to look this up), considering AI development (so far) has not been a particularly communal process. As far as I know, China's and the US's AI efforts haven't shared code or development information, and judging from the way chip stocks are down this morning, the relationship between the two major power players is not exactly cooperative.

The reasons aren't the same, but the payoffs are (with rabbit being "build Skynet" and stag being "don't"). Party A has a preference order of "nobody builds" > "A builds and nobody else does" > "everyone builds" > "others build but A doesn't". This is the Stag Hunt matrix, with two Nash equilibria ("all stag" and "all rabbit").
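
A quick check that the stated preference order alone gives exactly those two equilibria, using ordinal ranks rather than payoffs (the n-player generalization and the names are mine):

    from itertools import product

    # Rank my outcome given my move and everyone else's moves; higher is preferred.
    def rank(my_move, others):
        others_build = "build" in others
        if my_move == "don't" and not others_build: return 3   # nobody builds
        if my_move == "build" and not others_build: return 2   # I build, nobody else does
        if my_move == "build" and others_build:     return 1   # everyone (or several) builds
        return 0                                               # others build, I don't

    def nash_profiles(n):
        eqs = []
        for profile in product(("build", "don't"), repeat=n):
            stable = all(
                rank(profile[i], profile[:i] + profile[i+1:]) >=
                max(rank(alt, profile[:i] + profile[i+1:]) for alt in ("build", "don't"))
                for i in range(n)
            )
            if stable:
                eqs.append(profile)
        return eqs

    print(nash_profiles(3))
    # [('build', 'build', 'build'), ("don't", "don't", "don't")] -- "all rabbit" and "all stag",
    # with nothing in between.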

That makes sense. I got confused because I was focused on the 'stag hunt' scenario having cooperative actors while the 'prisoner's dilemma' has competitive actors, when the actual distinction is the number of stable Nash equilibria in each scenario.

It's a stag hunt where 'Don't build Skynet' is playing co-operate.