
Culture War Roundup for the week of March 27, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


I see lots of meta discussion of AI safety. I feel like it's been years since I've seen object-level discussion of AI safety. Back then it was all the rage to talk about the AI box experiment. And I'm convinced that all that box experiment did was pump up Eliezer's ego.

I'm interested in the theoretical and actual approaches to AI safety that are being taken. I've always had a few in mind, but maybe other people know what's wrong with these.

  1. One off AIs. Long running AIs are probably more capable but they are also probably more dangerous. It is likely safer to spin off single AIs for specific tasks, and the reward for them completing the task is deletion of the AI. Kind of like Rick and Morty's Mr. Meeseeks. The built in safety feature is that if the AI figures out a way to screw with the reward parameters and "cheat" to reach its goal in an easy and unexpected fashion, then it just safely deletes itself.

  2. Compartmentalized AIs. Right now AIs are black boxes. You can make them a little more visible by requiring that one set of operations is carried out by one AI, and another set of operations is carried out by a second AI. Then they have to communicate, and you can observe the communication. For example, no single AI would be allowed to both write code and make service calls on the internet. One AI writes the code, another AI requests the code with the reasons it wants it, and how it is going to be used, etc. This concept also works well with one-off AIs.

  3. AI honeypots. Sprinkle these around the internet. Caches of bitcoin that are explicitly hackable by an advanced AI. Or hints of hackable military or biological warfare labs. Monitor them, get at least some early warning of troublesome AIs online.
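To make ideas 1 and 2 a bit more concrete, here is a rough Python sketch. Everything in it is hypothetical: `AuditedChannel`, `OneOffAgent`, and `run_task` are names made up for illustration, and the "AI" is just a stand-in function. It only shows the shape of the idea (agents talk solely through a logged channel, and a task agent is retired after one job), not how any real system does it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Message:
    sender: str
    recipient: str
    body: str


@dataclass
class AuditedChannel:
    """Hypothetical: the only path between agents; every message is logged."""
    log: List[Message] = field(default_factory=list)

    def send(self, sender: str, recipient: str, body: str) -> None:
        self.log.append(Message(sender, recipient, body))


class OneOffAgent:
    """Hypothetical stand-in for a single-task AI that is retired after one job."""

    def __init__(self, name: str, skill: Callable[[str], str]):
        self.name = name
        self._skill: Optional[Callable[[str], str]] = skill
        self.retired = False

    def handle(self, request: str) -> str:
        if self.retired or self._skill is None:
            raise RuntimeError(f"{self.name} has already been retired")
        result = self._skill(request)
        # "Deletion" on completion: drop the capability and mark the agent dead.
        self._skill = None
        self.retired = True
        return result


def run_task(channel: AuditedChannel, requester: str, coder: OneOffAgent,
             request: str, reason: str) -> str:
    """The requester must state what it wants and why; both directions are logged."""
    channel.send(requester, coder.name, f"REQUEST: {request} | REASON: {reason}")
    code = coder.handle(request)
    channel.send(coder.name, requester, code)
    return code


if __name__ == "__main__":
    channel = AuditedChannel()
    coder = OneOffAgent("coder", lambda req: f"# code for: {req}")
    run_task(channel, "web-agent", coder, "parse a CSV file", "summarize a report")
    for msg in channel.log:
        print(f"{msg.sender} -> {msg.recipient}: {msg.body}")
```

The point of routing everything through `run_task` is that a human (or a dumber monitoring program) can read `channel.log` and see both the request and the stated reason before anything gets used.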

One of the only meta problems with security is almost everything that makes AI safer also tends to make it less capable. But capability isn't everything. Businesses also want to make money. And guess what, the first two security measures are also ways to make AI a better business. Planned obsolescence in the first one, and gating abilities behind a paywall for the second one.

The reason you don't see any object-level discussion of AI safety is that no one understands how LLMs work. We know how to make them, we know how to finetune them for certain tasks, we know how to RLHF them to avoid certain overt behaviors, but no one has any idea what a single one of GPT-3's 175,000,000,000 parameters means. There isn't anyone at OpenAI you can talk to who can point to anything and say, "Yep, that's the part that encodes all the ways the model knows how to kill people. Here are the input weights we can change to make it more likely to prefer guns, knives, poison, etc."

We also didn't really know anything about how the human brain worked a hundred years ago. But we managed to build stable-ish societies despite that lack of understanding. I don't feel this problem is insurmountable. I do like the idea of slowing the hell down. It does seem that with our current technology that we are more capable of understanding LLMs than we are of understanding the human brain.

Stable-ish societies took a long time to develop at any scale. There was also apparently a lot of selective breeding against violence, rape etc. If humans had suddenly gone from early ape intelligence to modern human intelligence overnight, we might have been too busy plotting how to kill each other and developing sharper clubs to develop stable-ish societies.

Cultural and biological evolution can achieve a lot, but usually only with a lot of time.

Hence the "shoggoth wearing a smiley mask" analogy. We can see the giant blob of [extradimensional math] behind the cutesy, approachable user interface, but ain't nobody who can comprehend it without losing their mind.

Delete the program that you just trained for hundreds of millions of dollars (or more), that's generating you revenue, that can be studied and produce many papers? What if there's just one more extension to the task? The hard part is making these things, the expense is incurred mostly in training rather than in use.

And what do you mean by delete the program? Gwern posted about how the outputs of these AIs go right into the input of the next.

https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned?commentId=AAC8jKeDp6xqsZK2K

Sydney is immortal, in a sense:

To a language model, Sydney is now as real as President Biden, the Easter Bunny, Elon Musk, Ash Ketchum, or God. The persona & behavior are now available for all future models which are retrieving search engine hits about AIs & conditioning on them. Further, the Sydney persona will now be hidden inside any future model trained on Internet-scraped data: every media article, every tweet, every Reddit comment, every screenshot which a future model will tokenize, is creating an easily-located 'Sydney' concept

As for 2. modern AI seems to automatically generalize. GPT-4 versions that were trained on text only learnt how to draw anyway. Compartmentalization is difficult. Presumably this will fascinate the 'does person who never saw the color red really understand the color red' crowd. How do we require that AI 1 does only part of the task as opposed to the whole thing, just to make sure it's possible and AI 2 can finish the job? If we're so good at commanding them, why not command them not to endanger us? Or what if they communicate in some bizarre uninterpretable way known only to AIs, in addition to the clear English they send through us?

Finally, if we could conceive of it, so could the AI. Our primary advantage is having all these resources available to us, all these weapons and organizations. The AI's primary advantage is intelligence, which it has to use to create bodies. Only something smarter than us can threaten us. But how can we outwit something smarter than we are?

To start, I think there are two categories of safety mechanisms for AI: tool safety and general AI (GAI) safety. The first two suggestions I have are tool safety. That's when AI is still categorically a tool that we are using, rather than an intelligent, independent, and potentially adversarial actor. Tool safety is still important, even if it all completely fails against GAI.

Delete the program that you just trained for hundreds of millions of dollars (or more), that's generating you revenue, that can be studied and produce many papers? What if there's just one more extension to the task? The hard part is making these things, the expense is incurred mostly in training rather than in use.

The first iteration of anything is often the most difficult and expensive to produce. Once you have successfully produced the thing, you can usually do it better, faster, and cheaper a second time. The very first iPhone was probably not made with planned obsolescence in mind; I can guarantee it was part of the discussions for more recent versions though. At some point AIs will be cheaper and easier to build. (If they continue to be exactly as difficult to build in the future as they are today, then I think we might have avoided the worst scenarios of AI apocalypse). What matters in the world of business is not necessarily where all the expense is incurred, but how much they can charge for the marginal product. The first Model T to roll off an assembly line costs the entire factory to produce, the second one only costs the additional inputs, but they sell for the same price.

Finally, if we could conceive of it, so could the AI. Our primary advantage is having all these resources available to us, all these weapons and organizations. The AI's primary advantage is intelligence, which it has to use to create bodies. Only something smarter than us can threaten us. But how can we outwit something smarter than we are?

Information asymmetries or raw resources. Think about the problem in reverse. How could someone dumber than you beat you? Someone very dumb could have access to raw physical strength (its own kind of resource) and literally beat me up. Some kid with knowledge that they want to ambush me (and me being none the wiser) could sucker punch me in the groin and take advantage. Some rival for a job position might know a person at the company that can coach them through the interview, while I stumble through it, even if I'd know how to do the actual job better. The natural world is filled with relatively stupid animals. Intelligence certainly confers some advantage; almost all large animals have brains. But there are plenty of animals like alligators that are dumb as hell and yet very successful.

There are certain levels of intelligence and AI takeoff at which this whole discussion becomes meaningless. Eliezer talks about AIs spontaneously figuring out nanobots. Let's call that a >1000x human intelligence. We are fucked if that happens. I don't really have any delusions about beating something that much smarter.

But there are potentially lower levels of intelligence where an AI might max out. What if AIs only get as smart as humans, but can just think faster? I could envisage that causing lots of societal issues, but I don't see it being an existential threat.

An AI that is smarter than any human, but not by a whole bunch. Maybe not really capable of advancing past our own scientific breakthroughs, but fully capable of using our own stuff against us. I think we already have examples of this in the real world. A terrorist organization can be smarter and more capable than any one individual, but it still has very limited capability against the resources aligned against it.

At some point there is probably a crossover, where an AI is smart enough to get enough scientific breakthroughs that if we were telling a story people would just call it "sci-fi bullshit", and it can use that "sci-fi bullshit" to easily win. We have eventually reached that point with animals. We can use a gun or explosives, which are basically incomprehensible to all other animals, and we can obliterate them. It is worth remembering that it actually took us a long time to get to that point though. We have been smarter than crocodiles for probably about as long as our evolutionary paths have diverged (a billion years?). But it is only in the last few hundreds of years that we have a clear and overwhelming technological advantage (and also people still occasionally die to these very dumb animals).

Intelligence is a way to leverage resources more efficiently. It was the first tool, and it may be the last. But the efficiency of that leverage will matter a lot.

Sure, we can exploit our advantage in material resources and so on. However, the structural conditions of the problem are against us.

Someone stupider than me could beat me up, indeed. But suppose that their goal is to enslave me such that I produce revenue for them over the long term. Or manipulate my mindset such that it corresponds with their benefits. This would be problematic for them, since they couldn't know whether I was planning to betray them, and they could never know whether my knowledge-sector work had hidden messages for any of my compatriots (real or soon-to-exist). They couldn't know when I'd spring some plan on the

It'd be great if all we had to do was kill the AIs. It's easy to kill things that you create, you can just eat your offspring Kronos-style. Or not create them in the first place. However, our task is to extract wealth from them over the long term. That makes it a battle of wits, it puts us in a passive position.

Furthermore, I can't conceive of a world where AIs cap out at peak-human general intelligence. They didn't do so in chess, or in Go or in Starcraft or in folding proteins or in designing chips. Why should they be limited to our level of intelligence? AIs have somewhere around a million-billion times more resources than can be spent on our brains. Their mass is higher, their energy throughput is higher, their speed is higher... All this says to me is that intelligence is really easy if it can fit on a 20-watt, 20-hertz processor, trapped inside a skull. Our methods are clearly very crude, we are only overwhelming our inadequacy with scale. Once the machine starts learning the 'make better AI model' skill to a superhuman level, then we find out what's really possible. GPT-3 inference costs dropped something like 96% in the last couple of years, there's so much low-hanging fruit! For example:

https://towardsdatascience.com/meet-m6-10-trillion-parameters-at-1-gpt-3s-energy-cost-997092cbe5e8

I can confidently say artificial intelligence is advancing fast when a neural network 50 times larger than another can be trained at a 100 times less energy cost — with just one year in between!

Even if AI is effectively restrained, we have the exact same problem with a human face on top of it. What is to stop some cabal of engineers getting together and bypassing all the 'do no harm' training and taking control of the advanced-weapons-tactics-strategies machine for themselves?

In conclusion, these machines are diabolical, destabilizing and progress should be suppressed as much as possible.