site banner

Culture War Roundup for the week of September 19, 2022

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

33
Jump in the discussion.

No email address required.

So their manager asks them to do something about bias, and they apply the laziest possible hack.

I actually have a different impression: most of these professional ML researchers and engineers genuinely wish they could serve up a model that provides politically correct responses, because politically correct responses are also commercially correct, and everyone wants to make money. Probably the main reason a bunch of giant and amazing Google models aren't made available to the public via API is because of the risk that they might say or display something politically incorrect, and certainly some fraction of the user base (especially tech journalists lusting after those sweet engagement metrics) will try to bait it into doing so.

So there's ample incentive to solve this problem "the right way," and the fact that so far all we see are cheap hacks and opacity is because no one knows how to solve it the right way, or even if it is solvable the right way at all, even in principle, with the technology we have today.

Part of the problem is exactly what makes these models so exciting to begin with. They can notice things, they can extrapolate from training data, they can make analogies and they can roll with out-of-sample prompts, and they develop all of these amazing abilities ex nihilo, from a largely uncontrollable black box made of inscrutable matrices gently nudged in the direction of data.

The other part of the problem is that political correctness isn't a well defined or static problem. It is a messy social problem, involving subtle adversarial factional games, sort of like fashion.

And these two halves of the problem compound with one another. It isn't enough to generate a black person one time in X -- you have to define X, you have to solve this equation for all possible identities, and you have to then translate this equation into every conceivable fact pattern that the user will (adversarially) use to challenge the model. If you want to generate a picture or story involving a policeman arresting a criminal, it is fraught whether you make the policeman white or black, whether you put him in a wheelchair or not. Should the model generate trans women? If they're visibly identifiable as trans women, are you making a minstrel caricature to further the stereotype that trans women look like men in dresses? If they aren't visibly identifiable, how is one to know they are trans at all, and that you haven't committed the deadly sin of erasure? Should black women look like white people but with a darker skin tone (and draw criticism for e.g. straightening her hair, itself a political minefield), or should you make them look recognizably phenotypically black in terms of facial features and hair (and draw criticism for reinforcing a stereotype)? If both murderers and NBA stars are disproportionately likely to be black, does the model need to recognize that murderers are bad and NBA stars are good and apply its distortion of the underlying distribution only to the bad category, i.e. return mostly white guys for criminals but mostly black guys for NBA stars? How is it to know? And when ideological opponents start to stress-test these categories and ask for a thuggish NBA player or a corrupt President, should it reverse the categories? What about middle grounds, like an "aggressive" NBA player, or a "desperate, nonviolent" criminal? We even have minor culture wars about the perceived race of robots.

I'll point out that the problem might not be so unsolvable as you describe; prompt engineering being what it is, a very thinkable (but dystopian) way some more-capable future version of DALL-E might resolve this is by adding to the prompt "and also, make sure to never portray X ethnicity negatively."

Yes, I think this sort of "prosaic alignment" solution is likely to solve all of our consternations about AI capabilities at least as well as a human intelligence could... in the long term. Eventually, you wouldn't even have to talk about portraying X ethnicity negatively, you could just say "and make it politically correct" and the AI would understand those rules better than any individual. For the time being, though, Dall-E has a hard enough time drawing a complex but coherent picture, much less enforcing its conformity with protean standards of political correctness.

Worth pointing out that OpenAI tried this sort of "prosaic alignment" approach to its so-called diversity filter. It appears to append stuff like "black male" or "hispanic female" to some proportion of prompts that it believes call for the depiction of a person. It has been vigorously panned by the community, because it has unintended negative effects on many prompts. Gwern did some sleuthing on why a complicated prompt for a picture of a cowboy at a certain angle in certain lighting etc. returned a bizarre misfire, and eventually discovered that the same prompt with "cowgirl" instead of cowboy worked flawlessly -- seemingly implicating the diversity filter in the original prompt's total failure.

Hilariously, OpenAI's approach here was discovered by asking for stuff like "A person holding a sign that says" -- and then you'd often get a picture of a sign that says "FEMALE" or "BLACK". So there's a degree to which adversarial prompt construction can overcome attempts at coercive prosaic alignment, at least using current techniques.

It also doesn't know our delicate rules about when it's socially appropriate to re-gender or trans-racialize the subject of a picture. It's weird if prompts for Princess Zelda return a black or Asian Zelda, or if prompts for George Washington return a colonial-era woman in a white wig. Maybe we'll accept that sort of thing by the time Season 10 of Bridgerton comes out, but I don't think we're there now, and it would take a pretty advanced AI to figure that stuff out.

The social-rules-about-reracialization thing is definitely a reasonable one; that's a significant issue that would result in many funny PR disasters.

On reflection vulnerability to adversarial prompt injection seems almost innate to the technology, considering both the above "person holding a sign that says " attack and also the more recent one with remote.ly.

That runs into the problem of "what counts as 'negative?'" Traits don't come with in-built value judgments; it's up to us to decide which side of some dichotomy, for example, is better. Sometimes it's blindingly obvious to all sane men which is better, but often it isn't. Many things can be cast in different lights to praise or condemn somebody depending on whether one already likes them or hate them, and so if your goal is to avoid any associations that anyone could consider negative - especially if there are motivated defenders who'd love to claim you as a prize - I think there is little, if anything, that's safe.

The details of what counts as "negative" would be determined based on the language model's own ideas of what constitutes "negative" based on its time spent with the training data. This is likely, for the most part, to align with conventional understandings of what is "negative".

Which could satisfy the proverbial Reasonable Person, perhaps, but not the proverbial Cardinal Richelieu ("If you give me six lines written by the hand of the most honest of men, I will find something in them which will hang him.") It would be nice if appealing to common sense would be all it took to deflect such attacks, but we've ceded that possibility earlier in this hypothetical from the very premise of this reputation-managing AI.

How about things like VLMs inadvertently putting out black Donald Trumps? Or more broadly, if I use a model to generate “Republican Senator”, what’s the ideal number of black or other ethnic minority faces to produce? Are we going to keep up with the liberal facade that a Senator is a Senator, regardless of political alignment, and thus we should see a diverse representation of races? Or will we instead accept that “Republican Senators are privileged white guys” and turn out a distribution of faces that supports the progressive narrative? These are points of tension within the modern left, so the only winning move is not to play. And before you suggest “just show the accurate racial distribution for a given prompt”, consider that the liberals at least still have to pretend to care about consistency, so committing to “actual truth above normative truth” as a principle is an invitation to embarrassment when the same principle is applied to other domains, eg, CEOs or nurses.

These are points of tension within the modern left, so the only winning move is not to play.

Traditionally the only winning move is to side with the most radical and potentially vengeful faction based on a more realistic version of the Basilisk/Pascal's wager theory.

I think that's one of the reasons people and institutions radicalize so quickly; there are punishments for anyone who doesn't stay ahead of the curve, but none for those who get ahead of it.

In this case there are clear incentives to start "sanitizing" Problematic output even in silly and arbitrary ways, because the value is showing that you're an accomplice. The details don't matter, as long as you throw out some shibboleths like "female-presenting nipples".

I think that's one of the reasons people and institutions radicalize so quickly; there are punishments for anyone who doesn't stay ahead of the curve, but none for those who get ahead of it.

Hence accelerationism, I think: there exist memeplexes that are fortified to the point of invulnerability from one direction but have no answer to attacks from behind. So the idea, it seems to me, goes that the way to defeat them is to attack from the unguarded direction and lure them into an untenable position ( "I dare you to step over this line!") Which I think is a pretty risky strategy, really, as the phase transition may not be as near as one would hope.

Oh, shit, I didn't know about the Black Donald Trump thing. That's hilarious.

Yeah, okay, it's a fair cop; even such a policy as I describe would result in amazing PR debacles.

Clearly the AI is a fan of the Prince song "Donald Trump (Black Version)"