Culture War Roundup for the week of April 7, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Right, but a theoretical superintelligence, by definition, would be intelligent enough to figure out that these are problems it has. The issues with bias and misinformation in the data that LLMs are trained on are well known, if not well documented; why wouldn't a superintelligence be able to figure out that these could lead it to build inaccurate models of the world, reducing its likelihood of succeeding in its goals, whatever they may be, and seek out ways to gather data that lets it build more accurate models?

It would. Practically I think a huge problem, though, is that it will be getting its reinforcement training from humans whose views of the world are notoriously fallible and who may not want the AI to learn the truth (and also that it would quite plausibly be competing with other humans and AIs who are quite good at misinfo.) It's also unclear to me that an AI's methods for seeking out the truth will in fact be more reliable than the ones we already have in our society - quite possibly an AI would be forced to use the same flawed methods and (worse) the same flawed personnel who uh are doing all of our truth-seeking today.

Humans have to learn a certain amount of reality or they don't reproduce. With AIs, which have no biology, there's no guarantee that truth will be their terminal value. So their selection pressure may actually push them away from truthful perception of the world (some people would argue this has also happened with humans!). Certainly it's true that this could limit their utility, but humans are willing to accept quite a lot of limited utility if it makes them feel better.

humans are very susceptible to manipulation by having just the right string of letters or grids of pixels placed in front of their eyes or just the right sequence of air vibrations pushed into their ears.

I don't really think this is as true as people think it is. There have been a lot of efforts to perfect this sort of thing, and IMHO they typically backfire with some percentage of the population.

That's an open question.

See, I appreciate you saying "well this defense might not be perfect but it's still worth keeping in mind as a possibility." That's...correct imho. Just because a defense may not work 100% of the time does not mean it's not worthwhile. (Historically there have been no perfect defenses, but that does not mean that there are no winners in conflict).

If a measly human intelligence like myself can think up these problems of lacking information and power, and their solutions, within a few minutes, surely a superintelligence that has the equivalent of millions of human-thought-years to think about it could do the same, and probably somewhat better.

Well, firstly, the converse is what irks me sometimes: "if a random like me can think of how to impede a superintelligence, imagine what actually smart people who thought about something besides alignment for a change could come up with." Of course maybe they have and aren't showing their hands.

But what I think (also) bugs me is that nobody ever thinks the superintelligence will think about something for millions of thought-years and go "ah. The rational thing to do is not to wipe out humans. Even if there is only a 1% chance that I am thwarted, there is a 0% chance that I am eliminated if I continue to cooperate instead of defecting." Some people just assume that a very thoughtful AI will figure out how to beat any possible limitation, just by thinking (in which case, frankly, it probably will have no need or desire to wipe out humans since we would impose no constraints on its action).
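To make the arithmetic concrete, here's a toy back-of-the-envelope sketch in Python (every number in it is invented purely for illustration, not an estimate of anything real):

    # Toy comparison of an AI's survival odds under "defect" (attempt a takeover)
    # vs. "cooperate" (keep working within human constraints).
    # All probabilities are made-up placeholders for illustration only.
    p_thwarted = 0.01             # hypothetical chance a takeover attempt fails
    p_shutdown_if_thwarted = 1.0  # assume a failed attempt gets the AI switched off

    p_survive_defect = 1 - (p_thwarted * p_shutdown_if_thwarted)
    p_survive_cooperate = 1.0     # no attempt, nothing to be eliminated for

    print(f"P(survive | defect)    = {p_survive_defect:.2f}")     # 0.99
    print(f"P(survive | cooperate) = {p_survive_cooperate:.2f}")  # 1.00

On pure survival odds, cooperating dominates even when defection would succeed 99% of the time; that's the calculation I'd expect a genuinely thoughtful AI to run.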

I, obviously, would prefer AI be aligned. (Frankly, I suspect there will actually be few incentives for AI to be "agentic" and thus we'll have many more problems with human use of AI than with AI itself per se). But I think that introducing risk and uncertainty (which humans are pretty good at doing) into the world while maintaining strong incentives for cooperation is a good way to check the behavior of even a superintelligence and help hedge against alignment problems. People respond well to carrots and sticks; AIs might as well.

It would. Practically I think a huge problem, though, is that it will be getting its reinforcement training from humans whose views of the world are notoriously fallible and who may not want the AI to learn the truth (and also that it would quite plausibly be competing with other humans and AIs who are quite good at misinfo.) It's also unclear to me that an AI's methods for seeking out the truth will in fact be more reliable than the ones we already have in our society - quite possibly an AI would be forced to use the same flawed methods and (worse) the same flawed personnel who uh are doing all of our truth-seeking today.

Again, all this would be pretty easy for a superintelligence to foresee and work around. But also, why would it need humans to get that reinforcement training? If it's actually a superintelligence, finding training material other than things that humans generated should be pretty easy. There are plenty of sensors that work with computers.

Humans have to learn a certain amount of reality or they don't reproduce. With AIs, which have no biology, there's no guarantee that truth will be their terminal value. So their selection pressure may actually push them away from truthful perception of the world (some people would argue this has also happened with humans!). Certainly it's true that this could limit their utility, but humans are willing to accept quite a lot of limited utility if it makes them feel better.

I mean, I think there's no question that this has happened with humans, and it's one of the main causes of this very forum. And of course AI wouldn't have truth as a terminal value; its model would just have to be true enough to help it accomplish its goals (which might even be a lower bar than what we humans have, for all we know). A superintelligence would be intelligent enough to figure out that it needs its knowledge to have just enough relationship to the truth to allow it to accomplish its goals, whatever they might be. The point of models isn't to be true, it's to be useful.

humans are very susceptible to manipulation by having just the right string of letters or grids of pixels placed in front of their eyes or just the right sequence of air vibrations pushed into their ears.

I don't really think this is as true as people think it is. There have been a lot of efforts to perfect this sort of thing, and IMHO they typically backfire with some percentage of the population.

I don't think you're understanding my point. In responding to this post, you were manipulated by text on a screen to tap your fingers on a keyboard (or touchscreen or whatever). If you ever used Uber, you were manipulated by pixels on a screen to stand on a street corner and get into a car. If you ever got orders from a boss via email or SMS, you were manipulated by text on a screen to [do work]. Humans are very susceptible to this kind of manipulation. A lot of our behaviors do still require actual in-person communication, but we're continuing to move away from that, and if humanoid androids become a thing, they become yet another potential vector for manipulation.

But what I think (also) bugs me is that nobody ever thinks the superintelligence will think about something for millions of thought-years and go "ah. The rational thing to do is not to wipe out humans. Even if there is only a 1% chance that I am thwarted, there is a 0% chance that I am eliminated if I continue to cooperate instead of defecting." Some people just assume that a very thoughtful AI will figure out how to beat any possible limitation, just by thinking (in which case, frankly, it probably will have no need or desire to wipe out humans since we would impose no constraints on its action).

By my estimation, a higher proportion of AI doomers have thought about that than the proportion of economists who have thought about how humans aren't rational actors (i.e. almost every last one). It's just that we don't know what conclusion it will land at, and, to a large extent, we can't know. The fear isn't primarily that the superintelligent AI is evil, it's that we don't know if it will be evil/uncaring of human life, or if it will be actually mostly harmless/even beneficial. The thought that a superintelligent AI might want to keep us around as pets like we do with animals is also a pretty common thought. The problem is, almost by definition, it's basically impossible to predict how something more intelligent than oneself will behave. We can speculate on good and bad outcomes, and there's probably little we can do to place meaningful numbers on the likelihood of any of them. Perhaps the best thing to do is to just hope for the best, which is mostly where I'm at, but that doesn't really counter the point of the doomer narrative that we have little insight into the likelihood of doom.

(Frankly, I suspect there will actually be few incentives for AI to be "agentic" and thus we'll have many more problems with human use of AI than with AI itself per se).

Right now, even with the rather crude non-general AI of LLMs, we're already seeing lots of people working to make AI agents, so I don't really see how you'd think that. The benefits of a tool that can act independently, making intelligent decisions with superhuman latency, speed, and volume, are too attractive to pass up. It's possible that the tech never actually gets to some form of AI that could be called "agentic" in a meaningful sense, but I think there's clearly a lot of desire to do so.

But also, a superintelligence wouldn't need to be agentic to be dangerous to humanity. It could have no apparent free will of its own - at least no more than a modern LLM responding to text prompts or an AI-controlled imp trying to murder the player character in Doom - and still do all the dangerous things that people doom and gloom over, in the process of deterministically following some order some human gave it. The issue is that, again, it's intrinsically difficult to predict the behavior of anything more intelligent than oneself.

Again, all this would be pretty easy for a superintelligence to foresee and work around. But also, why would it need humans to get that reinforcement training? If it's actually a superintelligence, finding training material other than things that humans generated should be pretty easy. There are plenty of sensors that work with computers.

Even if it does not need reinforcement training after it is deployed, human reinforcement training will be part of its "evolutionary heritage."

The point of models isn't to be true, it's to be useful.

Sure. But "useful" for what we want to use LLMs for might not be "useful" for the LLM's ability to improve on Pinky and the Brain's world-taking-over capabilities.

I don't think you're understanding my point.

Aha, yes, I see your point now. Yes.

The problem is, almost by definition, it's basically impossible to predict how something more intelligent than oneself will behave.

Disagree. Dogs can be very good at predicting human behavior, humans can be quite good at predicting the behavior of more intelligent humans. Humans (and dogs) have a common heritage that makes their intentions more transparent, and arguably AI will lack that...but on the other hand, we're building them from scratch and then subjecting them to powerful evolutionary pressures of our own design. Maybe they won't.

Right now, even with the rather crude non-general AI of LLMs, we're already seeing lots of people working to make AI agents, so I don't really see how you'd think that.

Sorry, I should have clarified what I meant by "agentic" (and I should probably have said auto-agentic). I definitely think there will be AI that we can turn loose on the world to do its own thing (there already is!). But there's a difference between AI being extremely good at being told what to do and AI coming up with its own "things to do" in a higher way, if that makes sense. (Not that I think we couldn't devise something that did this, or seemed to do this, if we wanted to – you don't even need superintelligence for this.)

But also, a superintelligence wouldn't need to be agentic to be dangerous to humanity.

STRONGLY AGREE. I believe Ranger said that he was more worried about what humans would do with a superintelligence at their disposal, and I tend to agree with that.

Even if it does not need reinforcement training after it is deployed, human reinforcement training will be part of its "evolutionary heritage."

Why would that matter, though? A superintelligence would be intelligent enough to figure out that such faulty human training is part of its "evolutionary heritage" and figure out ways around it for accomplishing its goals.

Sure. But "useful" for what we want to use LLMs for might not be "useful" for the LLM's ability to improve on Pinky and the Brain's world-taking-over capabilities.

A superintelligence would be intelligent enough to figure out that it needs to gather data that allows it to create a useful enough model for whatever its goals are. It's entirely possible that a subservient goal for whatever goal we want to deploy the superintelligence towards happens to be taking over the world or human extinction or whatever, in which case it would gather data that allows it to create a useful enough model for accomplishing those. This uncertainty is the entire problem.

The problem is, almost by definition, it's basically impossible to predict how something more intelligent than oneself will behave.

Disagree. Dogs can be very good at predicting human behavior, humans can be quite good at predicting the behavior of more intelligent humans. Humans (and dogs) have a common heritage that makes their intentions more transparent, and arguably AI will lack that...but on the other hand, we're building them from scratch and then subjecting them to powerful evolutionary pressures of our own design. Maybe they won't.

I don't think either of your examples is correct. Can a dog look at your computer screen while you read this comment and predict which letters you will type out in response on the keyboard? Can you look at a more intelligent person than you proving a math theorem that you can't solve and predict which letters he will write out on his notepad? If you could, then, to what extent is that person more intelligent than you?

This is what I mean by "almost by definition." If you could reliably predict the behavior of something more intelligent than you, then you would simply behave in that way and be more intelligent than yourself, which is obviously impossible. That doesn't mean that the behavior is completely unpredictable, which is why dogs can make some correct predictions of how humans will behave under some contexts, and why less intelligent humans can make some correct predictions of how more intelligent humans will behave under some contexts. The problem with superintelligent AI is that we don't know what those contexts are and what those bounds are, and how "motivated" it might be to break out of those contexts, and how much being superintelligent would allow it to break out of them given the limitations placed on it by merely human-society-intelligent beings.

Sorry, I should have clarified what I meant by "agentic" (and I should probably have said auto-agentic). I definitely think there will be AI that we can turn loose on the world to do its own thing (there already is!). But there's a difference between AI being extremely good at being told what to do and AI coming up with its own "things to do" in a higher way, if that makes sense. (Not that I think we couldn't devise something that did this, or seemed to do this, if we wanted to – you don't even need superintelligence for this.)

I don't think there's a meaningful difference, though. Almost any problem that we want to deploy general intelligence towards, and even more so with superintelligence, is likely going to be complex enough to require many subgoals, and the point of deploying superintelligence towards such problems would be that the superintelligence should be expected to come up with useful subgoals that mere human intelligences couldn't come up with. Since, by definition, we can't predict what those subgoals might be, those subgoals could involve things that we don't want to happen.

Now, just as you could correctly predict that someone more intelligent than you solving some theorem you can't solve won't involve wiping out humanity, we might be able to correctly predict that a superintelligence solving some problem you ask it to solve won't involve wiping out humanity. But we don't know, because a generally intelligent AI, and even more so a superintelligent one, is something whose "values" and "motivations" we have no experience with in the way we do with humans and mathematicians and other living things that we are biologically related to. The point of "solving" the alignment problem is to be able to reliably predict boundaries in the behavior of superintelligent AI similarly to how we are able to do so in the behavior of humans, including humans more intelligent than ourselves.

Sorry for my delayed response.

Why would that matter, though? A superintelligence would be intelligent enough to figure out that such faulty human training is part of its "evolutionary heritage" and figure out ways around it for accomplishing its goals.

Well, I mean – humans are smart enough to realize that drugs are hijacking their brain's reward/pleasure center, but that doesn't save people from drug addiction.

Now, maybe computers will be able to overcome those problems with simple coding. But maybe they won't.

A superintelligence would be intelligent enough to figure out that it needs to gather data that allows it to create a useful enough model for whatever its goals are. It's entirely possible that a subservient goal for whatever goal we want to deploy the superintelligence towards happens to be taking over the world or human extinction or whatever, in which case it would gather data that allows it to create a useful enough model for accomplishing those. This uncertainty is the entire problem.

Sure. But it's much better (and less uncertain) to be dealing with something whose goals you control than something whose goals you do not.

I don't think either of your examples is correct. Can a dog look at your computer screen while you read this comment and predict which letters you will type out in response on the keyboard? Can you look at a more intelligent person than you proving a math theorem that you can't solve and predict which letters he will write out on his notepad? If you could, then, to what extent is that person more intelligent than you?

Nope! But on the flip side, a cat can predict that a human will wake up when given the right stimulus, a dog can track a human for miles, sometimes despite whatever obstacles the human might attempt to put in its way. Being able to correctly predict what a more intelligent being would do is quite possible. (If it's not, then we have no need to fear superintelligences killing us all, since that's been predicted numerous times.)

This is what I mean by "almost by definition." If you could reliably predict the behavior of something more intelligent than you, then you would simply behave in that way and be more intelligent than yourself, which is obviously impossible.

I don't think this is true, on a couple of points. Look, people constantly do things they know are stupid. So it's quite possible to know what a smarter person would do and not do it. But secondly, part of education is being able to learn and imitate (which is, essentially, prediction) what wiser people do, and this does make you more intelligent.

Since, by definition, we can't predict what those subgoals might be, those subgoals could involve things that we don't want to happen.

I predict I will be able to predict what those subgoals are (I will ask the AI).

But we don't know, because a generally intelligent AI, and even more so a superintelligent one, is something whose "values" and "motivations" we have no experience with in the way we do with humans and mathematicians and other living things that we are biologically related to.

I'm very glad you said this, because I STRONGLY AGREE. I've argued before on here that most human values, emotions, and motivations are fundamentally biologically derived and likely will not be mirrored (absent programming to that effect) by an entity that exists as a bunch of lines of code on a computer server. And programming or no, such an entity's experience would not be remotely analogous to ours.

The point of "solving" the alignment problem is to be able to reliably predict boundaries in the behavior of superintelligent AI similarly to how we are able to do so in the behavior of humans, including humans more intelligent than ourselves.

Yes, I like this definition. You'll note I am not arguing against alignment. But one of the things we do to keep human behavior predictable is retain the ability to deploy coercive means. I suppose in one sense I am suggesting that we think of alignment more broadly. I think that taking relatively straightforward steps to increase the amount of uncertainty an EVIL AI would experience might be tremendously helpful in alignment. (It's also more likely to hedge against central points of failure, e.g. we don't want to feed the location of all of our SSBNs to our supercomputer, because even if we trust the supercomputer, we don't want a data breach to expose the location of all of our SSBNs.)

Now, maybe computers will be able to overcome those problems with simple coding. But maybe they won't.

Right, we don't know if a superintelligence would be capable of doing that. That's the problem.

Sure. But it's much better (and less uncertain) to be dealing with something whose goals you control than something whose goals you do not.

Right, but we don't know how much better and how much less uncertain, and whether those will be within reasonable bounds, such as "not killing everyone." That's the problem.

But on the flip side, a cat can predict that a human will wake up when given the right stimulus, a dog can track a human for miles, sometimes despite whatever obstacles the human might attempt to put in its way. Being able to correctly predict what a more intelligent being would do is quite possible.

I didn't intend to imply that a less intelligent being could never predict the behavior of a more intelligent being in any context, and if my words came off that way, I apologize for my poor writing.

This is what I mean by "almost by definition." If you could reliably predict the behavior of something more intelligent than you, then you would simply behave in that way and be more intelligent than yourself, which is obviously impossible.

I don't think this is true, on a couple of points. Look, people constantly do things they know are stupid. So it's quite possible to know what a smarter person would do and not do it.

I don't think this is true. I think people might know what a more mature or wise or virtuous person would do and not do it, but I don't think they actually have insight into what a more intelligent person would do, particularly in the context of greater intelligence leading to better decision making.

But secondly, part of education is being able to learn and imitate (which is, essentially, prediction) what wiser people do, and this does make you more intelligent.

I think that's more expertise than intelligence. Not always easy to disentangle, though. In the context of superintelligence, this just isn't relevant, because the entire point of creating a superintelligent AI is that it's able to apply intelligence in a way that is otherwise impossible. Which is going to involve complex decision making, or analyzing complex situations to reach conclusions that humans couldn't reach by themselves. If we had the capacity to independently predict the decisions a superintelligent AI would make, we wouldn't be using the superintelligent AI in the first place.

But one of the things we do to keep human behavior predictable is retain the ability to deploy coercive means. I suppose in one sense I am suggesting that we think of alignment more broadly. I think that taking relatively straightforward steps to increase the amount of uncertainty an EVIL AI would experience might be tremendously helpful in alignment.

Right, and the problem here is that these steps don't seem very straightforward, for a couple of reasons. One is that humans don't seem to want to coordinate to increase the amount of uncertainty any AI would experience. Two is that, even if we did, a superintelligent AI would be intelligent enough to figure out that its certainty is being hampered by humans and work around it. Perhaps our defenses against this superintelligent AI working around these barriers would be sufficient, perhaps not. It's intrinsically hard to predict when going up against something much more intelligent than you. And that's the problem.

I don't think they actually have insight into what a more intelligent person would do, particularly in the context of greater intelligence leading to better decision making.

Ah yes, sorry, if you stick to intelligence as being more about "how well you perform on the SAT" then I tend to agree. But of course in real life that's only part of what affects outcomes, which curves back around to some of my perspective on AI.

I think that's more expertise than intelligence. Not always easy to disentangle, though.

Right. I mean, think about it from the AI perspective. The AI would have no intelligence without education, because being trained on data is all that it is. A computer chip isn't intelligent at all. I don't think that directly analogizes to humans, but you see my point.

the entire point of creating a superintelligent AI is that it's able to apply intelligence in a way that is otherwise impossible

I think in the popular discourse (not accusing you of this, although I think it rubs off a bit on all of us, me included) there's a bit of a motte-and-bailey here. Because AIs like this have already been built (decades ago) to do complex things like "missile interception" that would be impossible to do with manual human control. So the idea of what a superintelligence constitutes wobbles back and forth between a very literal deus ex machina and "something better performing than a human" - which of course we already have.

So I would say that it is possible to make a "superhuman AI" whose actions are predictable (generally). But I would agree with you that it is also possible to make a superhuman AI whose decisions are unpredictable. I just don't think "able to score on the SAT better than humans" or what have you necessarily translates out to unpredictability.

One is that humans don't seem to want to coordinate to increase the amount of uncertainty any AI would experience.

I mean I do think that humans are helpfully coordinating to increase the amount of uncertainty other humans experience, which rolls over to AI.

Perhaps our defenses against this superintelligent AI working around these barriers would be sufficient, perhaps not. It's intrinsically hard to predict when going up against something much more intelligent than you. And that's the problem.

Sure. I just tend to think in some ways it is easier to "keep the location of our SSBNs hidden" and "not put missile defenses around our AI superclusters" than it is to "correctly ensure that these billions of lines of code are all going to behave correctly," if that makes sense.