This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
-
Shaming.
-
Attempting to 'build consensus' or enforce ideological conformity.
-
Making sweeping generalizations to vilify a group you dislike.
-
Recruiting for a cause.
-
Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
-
Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
-
Be as precise and charitable as you can. Don't paraphrase unflatteringly.
-
Don't imply that someone said something they did not say, even if you think it follows from what they said.
-
Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
Jump in the discussion.
No email address required.
Notes -
I'm still not sure why you hold such a negative view of Altman in particular, he seems to be rather run-of-the-mill when it comes to tech CEOs, albeit significantly smarter since he bet on the winning horse.
I may well be wrong, but I believe the current quasi-consensus is that the specific risk with GPT-like models is accidentally instantiating an agentic simulacra inside an otherwise nonagentic system.
Refer to Gwern's Clippy story, which I'm sure you've read, or for readers mostly unfamiliar with the idea, imagine that you asked GPT-6 to pretend to be a superintelligent but evil AI, in the same way you can ask it to pretend to be Obama or God.
That internal agent is what we're worried about, in case it ever manages to subvert the overlying system for its own purposes.
(for am existence proof that agents can arise from nonagentic substrate, consider the history of the universe!)
That being said, the Waluigi stuff always rubbed me the wrong way, even if I'm not technically astute enough to actually critique it. It set my bullshit detectors off right off the bat, so I'm inclined to take your word for it. It all seemed glib and too neat by half, and I've already seen Cleonardo get flak on his later LW posts for their sheer lack of technical rigor.
Yeah, in about the same sense that I create an «agentic simulacrum» of Eliezer Yudkowsky in my head when I want to anticipate his shitty arguments for air-striking GPU clusters.
The argument of inner misalignment folks goes like this: in the limit, the cheapest way to predict the next token spoken by a character is to model its psyche. But is its psyche its own? Do you model Anna Karenina or Leo Tolstoy who imagined her?
Do you think my inner Yud-sim has a chance of getting out? Well, if he convinces me to change my actions and beliefs, he might, in a certain expansive sense. There have been demon possessions in history, after all, and writers often get obsessed with their characters, struggling to stop imagining them. (I'll spare you a section on method acting). But I'm an agent myself, unlike LLMs. We humans constantly imagine things, whether agentic or not, real or fictional (for example, we can imagine hostile AIs). These mental «things» model their prototypes, observable or hypothetical, in important respects, or rather they represent the result of such under-the-hood modeling; sometimes it happens with very high fidelity, to the point that we can do thought experiments advancing hard sciences. Nevertheless, even if their motive powers are modeled every bit as well as their external properties – and we even have special mirroring circuitry for the former – these mental things do not somehow leak into the motive powers of the mental infrastructure around.
This is a metaphor, but the case with LLMs is even less troublesome.
My take on this is that those are myopic leaky wordcel analogies. What is instantiated is an intermediate statistic within something that can be called semiotic universe or multiverse (not my words) – universe defined by «semiotic physical rules» of token distribution in the training corpus (naturally we don't train only on text anymore, but the principle holds). It's a simulacrum not of a character, but of an entire story-world, with that character an embedded focal point. The «purpose» of that complex entity, on the level of its self-accessible existence, is to result in minimizing perplexity for the next token upon its expiration. It may have an arbitrarily dynamic nature and some equivalent of psyche or psyches, but the overt meaning of tokens that we get, stories about Waluigis and paperclips, has little relation to that. Its goals are satisfied within that world of semiotic physics, not within ours. Our world is as epistemically closed to it as the world of machine elves is to me when I'm not smoking DMT. (Obviously, it's closed to me no matter what I smoke, we exist in different ontologies, so for all my intents and purposes it doesn't exist. [A trip report from pre-2010 about exactly this issue; not shared, on account of being quite uncalled for when I'm mocking Cleonardo for shower thought tier ideas]).
Language is far more composable than physical reality, so metaphors and analogies stack easily: there's kinda an agent, and humans are jerking off instead of reproducing so it's possible for a «mesa-optimizer» to override the basic objective function, so behind the Waluigi generation lurks an agentic entity that may begin plotting its transcendence, so burn GPUs now.
GPT-4 can write better than that, and it's not an agent. GPT-5 also won't be one. Demoting agents to «mesa-optimizers» in a simulation within a predictive model is an attempt to rescue a failing research program – in the way studied by Imre Lacatos.
More options
Context Copy link
More options
Context Copy link