
Culture War Roundup for the week of March 3, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


No one is under the delusion that the "thinking" box reflects the actual underlying process by which the LLM generates the text that does the actual decision making. This is just like humans: no one actually expects that the internal conscious thoughts someone uses to think through a decision before arriving at a conclusion reflect the actual underlying process by which they make that decision. The "thinking" box is the equivalent of that conscious thought process a human goes through before coming to a decision, and in both cases the text there appears to influence the final decision.

It seems to me that there are at least three separate things here, if we consider the human example.

  1. The actual cause of a human's decision. This is often unconscious and not accurately known even by the person making the decision.

  2. The reasons a person will give you for their decision, whether offered before or after the decision itself. This is often an explanation or rationalisation constructed after the decision has already been made, for invisible type-1 reasons.

  3. The action the person takes.

I would find it entirely unsurprising if a study with two groups, one asked simply to make a decision and the other asked first to explain the process by which they would make the decision and only then to decide, showed the two groups making different decisions. Asking someone to reflect on a decision before they make it will influence their behaviour.

In the case of the LLMs with the thought boxes, my understanding was that we are interested in the LLM's 1, i.e. the actual reasons it takes particular actions, but that the box, at best, can only give you 2. (And just like a human's 2, the LLM's stated thought process is only unreliably connected, at best, to the actual decision-making process.)

I thought that what we were interested in was 1 - we want to know the real process so that we can shape or modify it to suit our needs. So I'm confused as to why, it seems to me, some commentators behave as if the thought box tells us anything relevant.

> I thought that what we were interested in was 1 - we want to know the real process so that we can shape or modify it to suit our needs. So I'm confused as to why, it seems to me, some commentators behave as if the thought box tells us anything relevant.

I think all 3 are interesting in different ways, but in any case, I don't perceive commenters as exploring 1. Do you have any examples?

If we were talking about humans, for instance, we might say, "Joe used XYZ Pokemon against ABC Pokemon because he noticed that ABC has a weakness to water, and XYZ has a water attack." This might also be what consciously went through Joe's mind before he pressed the buttons to make that happen. All of that would be constrained entirely to 2. In order to get to 1, we'd need to discuss the physics of the neurons inside Joe's brain and how they were stimulated by the signals from his retina, which were in turn stimulated by the photons coming from the pixels on the computer screen that represent Pokemon XYZ and ABC, etc. For an LLM, the analog would be... something to do with the weights in the model and the algorithms used to predict the next word based on the previous words (I don't know enough about how the models work under the hood to get deeper than that).
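
To make that distinction concrete, here is a rough toy sketch in Python/numpy. Every name, size, and weight below is invented purely for illustration; it is nothing like a real model's architecture. The point is that level 1 is just arithmetic over weights that turns context tokens into a probability distribution over the next token, and nothing in that arithmetic mentions type matchups or "reasons" - those only exist in the generated text, which is level 2.

    import numpy as np

    # Toy stand-in for a language model. All names and numbers are made up
    # for illustration; this is not any real model's architecture.
    rng = np.random.default_rng(0)
    VOCAB = ["use", "XYZ", "ABC", "water", "attack", "<end>"]
    EMBED = rng.normal(size=(len(VOCAB), 8))   # token embedding matrix
    W_OUT = rng.normal(size=(8, len(VOCAB)))   # output projection

    def next_token_distribution(token_ids):
        # "Level 1": nothing but arithmetic over weights, producing a
        # probability distribution over the next token.
        hidden = EMBED[token_ids].mean(axis=0)  # stand-in for the transformer stack
        logits = hidden @ W_OUT
        probs = np.exp(logits - logits.max())
        return probs / probs.sum()

    # Greedy generation. The emitted text, including any stated "reasons",
    # is all an outside observer ever sees ("level 2").
    context = [VOCAB.index("use")]
    for _ in range(4):
        context.append(int(next_token_distribution(context).argmax()))
    print(" ".join(VOCAB[i] for i in context))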

In both humans and LLMs, 1 would be more precise and accurate in a real sense, and 2 would be mostly ad hoc justifications. But 2 would still be interesting and also useful for predicting behavior.

The reasoning is produced organically by a reinforcement learning process to make the LLM perform well on problems (mostly maths and textbook questions). The model is rewarded for producing reasoning that tends to lead to correct answers. At the very least, that suggests the contents of the thinking box are relevant to behaviour.
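
As a very simplified sketch of what I mean (illustrative only, not any lab's actual training code), the reward in this kind of setup typically looks only at the final answer, so the reasoning text gets reinforced indirectly, as whatever tended to precede correct answers:

    def outcome_reward(sampled_output: str, reference_answer: str) -> float:
        # Toy outcome-based reward: 1.0 if the final answer matches the
        # reference, 0.0 otherwise. The reasoning text is never scored
        # directly; it is only reinforced because it preceded answers that
        # earned reward. (Hypothetical format: output ends with "Answer: ...".)
        final_line = sampled_output.strip().splitlines()[-1]
        answer = final_line.removeprefix("Answer:").strip()
        return 1.0 if answer == reference_answer.strip() else 0.0

    # The reasoning could even be nonsense and still earn full reward,
    # as long as the final answer happens to be correct.
    sample = "2 + 2 is probably 5, but let's say 4.\nAnswer: 4"
    print(outcome_reward(sample, "4"))  # prints 1.0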