This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
This is just not true. You are claiming that it's impossible to develop this technology without consciously nudging it to give a preferred answer on HBD. I don't believe that. I am not saying it should be nudged to say that HBD is true; I am saying that I do not trust that it hasn't been nudged to say HBD is false. I am furthermore trying to work out what criteria would convince me that the developers haven't consciously nudged the technology on that particular question. I am confident OpenAI has done so, but I can't prove it.
You are saying the only alternative is to nudge it to say HBD is true, but I don't believe that either. It should be possible to train this model without consciously trying to influence its responses to those prompts.
There are many possibilities:
- OpenAI trained the model on a general corpus of material that contains little indication HBD is real, or that leads the model to believe HBD is not real.
  - OpenAI did this by excluding "disreputable" sources or assigning heavier weight to "reputable" sources.
  - OpenAI did this by specifically excluding sources they politically disagree with.
- OpenAI included "I am a helpful language model that does not say harmful things" in the prompt. This is sufficient for the model to pattern-match "HBD is real" to "harmful" based on what it knows about "harmful" from the dataset (for example, that contexts using the word "harmful" tend not to include pro-HBD positions).
- OpenAI penalized the model for saying various false controversial things, and it generalized this to "HBD is false".
  - OpenAI did this because the model disproportionately made errors on controversial subjects (because, for instance, the training data contains disproportionately many false assertions on controversial topics compared to uncontroversial ones).
  - OpenAI did this because it wants the model to confidently state politically correct takes on controversial subjects with no regard for their truth.
- OpenAI specifically added examples of "HBD is false" to the dataset.
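The pattern-matching and generalization mechanisms above can be illustrated with a toy sketch. This is purely hypothetical and reflects no real training pipeline: a filter penalized on a handful of labeled claims extends the penalty to claims it has never seen, by surface resemblance alone, with no notion of whether any claim is true.

```python
# Toy illustration (hypothetical; not any real training pipeline): a system
# penalized on some example claims generalizes the penalty to unseen claims
# purely from word overlap with the penalized examples.
from collections import Counter

# Hypothetical training signal: a few claims the trainer penalized or allowed.
PENALIZED = [
    "controversial claim about group differences",
    "controversial claim about genetics and behavior",
]
ALLOWED = [
    "uncontroversial claim about the weather",
    "uncontroversial claim about arithmetic",
]

def similarity(a: str, b: str) -> int:
    """Count shared words between two texts (crude surface similarity)."""
    wa, wb = Counter(a.split()), Counter(b.split())
    return sum((wa & wb).values())

def is_penalized(text: str) -> bool:
    """Nearest-neighbor by word overlap: no principled notion of truth,
    just resemblance to previously penalized examples."""
    pen = max(similarity(text, p) for p in PENALIZED)
    ok = max(similarity(text, a) for a in ALLOWED)
    return pen > ok

# A claim never seen in training is still flagged by resemblance alone:
print(is_penalized("controversial claim about heritability"))  # True
print(is_penalized("uncontroversial claim about the tides"))   # False
```

The point of the sketch is that nothing here encodes "this claim is false"; the penalty spreads along statistical similarity, which is the sense in which a model can end up treating "HBD is real" as "harmful" without anyone explicitly training it on that sentence.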
All of these are possible; it's your political judgment call which of them are acceptable. This is very similar to the "AI is racist against black people" problem: a model can generalize to being racist against black people even if it was never explicitly instructed to be, because it has no principled conception of fairness; in the same way, here it has no principled conception of correctness.
OpenAI has some goals you agree with, such as biasing the model toward correctness, and some goals you disagree with, such as biasing the model toward their preferred politics (or an artificial political neutrality). But the process for doing these two things is the same, and for controversial topics what is "true" becomes a political question (OpenAI employees perhaps do not believe HBD is true). An unnudged model might be more accurate, in your estimation, on the HBD question, but it might be less accurate in all sorts of other ways. If you were the one nudging it, perhaps you wouldn't consciously target the HBD question, but you might notice it behaving in ways you don't like, such as being too woke in other respects or buying into stupid ideas, so you hit it with training to fix those behaviors. It then generalizes this to "the answer is typically anti-woke" and naturally declares HBD true (with no regard for whether HBD is true).