
Culture War Roundup for the week of March 17, 2025

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


I've always prided myself on my ability to stay at the bleeding edge of AI image gen.

As you'd expect, given my enthusiastic reporting on Google opening public access to their new multimodal AI with image generation built in, I decided to spend a lot of time fooling around with it.

I was particularly interested in generating portrait photos of myself, mostly for the hell of it. Over on X, people have been (rightfully) lauding it as the second coming of Photoshop. Sure, if you go to the trouble of making a custom LoRA for Stable Diffusion or Flux, you can generate as many synthetic images of yourself as your heart desires, but it is a bit of a PITA. Think access to a good GPU and dozens of pictures of yourself for best results, unless you use a paid service. Multimodal LLMs promise to be much easier, and more powerful/robust.

I spent a good several hours inputting the best existing photos I have of my face into it, and then asking it to output professionally taken "photos".

The good news:

It works.

The bad news:

It doesn't work very well.

I'm more than used to teething pains and figuring out how to get around the most common failure modes of AI. I made sure to use multiple different photos, at various angles, with different hairstyles and outfits. It's productive to think of it as commissioning an artist online who doesn't know you very well: give them plenty to work with. I tried putting in a single picture. Two. Three. Five. Different combinations and many different prompts before I drew firm conclusions.

The results almost gave me body dysphoria. Not because I got unrealistically flattering ersatz-versions of myself, but quite the opposite.

The overwhelming majority of the fake SMHs could pass as my siblings or close cousins. Rough facial structure? Down pat, usually. There are aspects that run in the family.

Finer detail? Shudder. The doppelgangers are usually chubbier around the cheeks, and have a BMI several points above mine. I don't have the best beard on the planet, but it's actually perfectly respectable. This bastard never made it entirely through puberty.

The teeth... I've got a very nice set of pearly whites, and I've been asked multiple times by drunken Scotsmen and women if they're original or Turkish. These clones came from the discount knock-off machine that didn't offer dental warranties.

The errors boil down to:

  1. Close resemblance, but subtly incorrect ethnicities. Brown-skinned Indians are not made alike; I'm not Bihari or any breed of South Indian. Call it the narcissism of small differences if you must.

  2. Slightly mangled features as above.

  3. Tokenizer issues. The model doesn't map pixels to tokens 1:1 (that would be very expensive computationally), so fine details in a larger picture might be jarring on close inspection.

  4. Abysmal taste by default, compared to dedicated image models. Base Stable Diffusion 1.0 could do better in terms of aesthetics, while Midjourney today has to be reined in from making people perfect.

  5. Each image takes up a few hundred tokens (the exact count is handily displayed). If a picture is a thousand words, then that's like working with a hundred. I suspect there is a lot of bucketing or collapse to the nearest person in the data set involved.

  6. It still isn't very good at targeted edits. Multiple passes on a face subtly warp it, and you haven't felt pain until you've asked it to reduce the (extra) buccal fat and then had it spit out some idiot who stuck his noggin into a beehive.

If I had to pick images that could pass muster on close inspection, I'd be looking at maybe one in a hundred. Anyone who knows me would probably be able to tell at a glance that something was off.

People on X have been showing off their work, but I suspect that examples, such as grabbing a stock photo of a model and then reposing it with a new item in hand, only pass because we're seeing small N or cherry-picked examples. I suspect the actual human model in question could tell something was up.

Of course, this is a beta/preview. This is the worst the tech will ever be; complaints about AI fingers are suspiciously rare these days, aren't they?

I'm registering my bets that by the end of the year, the SOTA will have leapt miles forward. Most people will be able to generate AI profile pictures, flesh out their dating app bios, and all the rest, with ease and without leaving home. For the lazy, like me, great! For those who cling to their costly signals, they're about to get a lot cheaper, and quickly. This is Gemini 2.0 Flash, the cheap and cheerful model. We haven't seen what the far larger Pro model can manage.

(You're out of luck if you expect me to provide examples; I'm not about to doxx myself. If you want to try it, find an internet rando who is a non-celebrity, and see how well it fares. For ideal results, it needs to be someone who isn't Internet Famous, since for a celebrity the model will have a far better pre-existing understanding of their physiognomy. Uncanny resemblances abound, but they're uncanny.)

What does any of this have to do with the culture war? AIUI, this is the "culture war roundup" post, not a general "open thread" post, so this really belongs somewhere else.

The decline of the ability to take for granted that visual imagery, no matter how seemingly realistic, is a reflection of reality?

The inability of a cutting-edge AI model to capture subtle ethnic nuance?

Deepfakes? The collapse of consensus reality?

Tokenizer issues. The model doesn't map pixels to tokens 1:1 (that would be very expensive computationally), so fine details in a larger picture might be jarring on close inspection.

The codebook on this model must be absolutely tiny. They are probably trying to avoid bloating up the model vocabulary, but the quality is awful. Latent diffusion models don't use a quantizing VAE but instead a continuous latent space, so they have an advantage there. In a sense the diffusion VAE isn't compressing the image so much as reducing its dimensionality, but the VQ-VAE is doing some crazy compression.

The saying is that a picture is worth a thousand words, but the ratio is actually much more. I'm sure they'll get it right eventually, but it will be tricky to strike a balance between allocating model capacity to images vs text.
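
To put rough numbers on the gap, here's a minimal back-of-the-envelope sketch; every figure in it is an illustrative assumption on my part, not Gemini's actual architecture:

```python
import math

# Back-of-the-envelope comparison for one 1024x1024 RGB image.
# Every size below is an illustrative assumption, not any model's real numbers.

raw_bits = 1024 * 1024 * 3 * 8                    # raw pixels: ~25 million bits

# Discrete, VQ-VAE-style encoding: a few hundred tokens from a finite codebook.
num_tokens, codebook_size = 400, 16384
bits_per_token = int(math.log2(codebook_size))    # 14 bits per token
vq_bits = num_tokens * bits_per_token             # ~5,600 bits for the whole image

# Continuous latent a diffusion VAE might use: 128x128x4 values at 16 bits each.
diffusion_bits = 128 * 128 * 4 * 16               # ~1 million bits

print(round(raw_bits / vq_bits))                  # ~4500x compression for discrete tokens
print(round(raw_bits / diffusion_bits))           # ~24x, closer to dimensionality reduction
```

A few thousand bits for an entire photograph is why fine facial detail gets bucketed, while the diffusion latent keeps orders of magnitude more information around.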

Anyways, the problem will probably be solved through scaling up, just like how Sora has impressive capabilities but micro-video models are next to worthless. (Of course Sora is also useless, but it's cool.)

They are probably trying to avoid bloating up the model vocabulary

Image tokens should have no impact at all on the vocabulary size. I guess that they are doing the same as other multimodal models (the input image is compressed, as you say probably using a VAE, but classically using a pretrained vision transformer), and let the output image tokens just be free continuous vectors. No need to quantize anything.
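
For what it's worth, here's a minimal sketch of what I mean, with toy sizes I made up rather than anything Google has documented: patch features from a vision encoder are linearly projected into the LLM's embedding space, so the text vocabulary never grows.

```python
import torch
import torch.nn as nn

d_vision, d_model = 1024, 4096                    # toy dimensions, purely illustrative
project = nn.Linear(d_vision, d_model)            # the only image-specific parameters here

patch_features = torch.randn(1, 256, d_vision)    # e.g. 256 patch embeddings from a pretrained ViT
image_embeddings = project(patch_features)        # now shaped like text token embeddings
text_embeddings = torch.randn(1, 20, d_model)     # stand-in for 20 embedded text tokens

# One interleaved sequence goes into the transformer; the vocabulary (and the
# output softmax over it) never has to grow to accommodate the image.
sequence = torch.cat([image_embeddings, text_embeddings], dim=1)
print(sequence.shape)                             # torch.Size([1, 276, 4096])
```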

Fair warning to anyone inspired to pass off these images as your own:

They're watermarked. If you use AI Studio, the images carry a blue logo that's trivial to remove. But even so, including on the API, they're algorithmically watermarking outputs. It's almost certainly imperceptible to the naked eye, and resistant to common photo manipulation techniques after the fact.

If they're sharing with 3rd parties like Meta, expect Instagram to automatically throw up "AI generated" tags in the near future if it doesn't do so now. You can probably hedge your bets by editing or removing EXIF metadata, but don't say I didn't warn you.
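
If you do want to scrub the container metadata, here's a minimal sketch (assumed filenames; it only handles the easy part, and does nothing about a watermark baked into the pixels themselves):

```python
from PIL import Image

def strip_metadata(src: str, dst: str) -> None:
    """Re-encode only the pixel data, leaving EXIF and other metadata behind.
    This does NOT touch any watermark embedded in the pixel values."""
    with Image.open(src) as im:
        clean = Image.new(im.mode, im.size)    # fresh image, no metadata attached
        clean.putdata(list(im.getdata()))      # copy pixels only
        clean.save(dst)

strip_metadata("gemini_output.png", "gemini_output_clean.png")
```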

The likes of Google and Meta are tolerating, if not boosting, the stream of clickbait slop that's being pumped onto the internet. Google lets pages from sites with millions of words blatantly copied off ChatGPT take the first result, and Scamazon is still printing and selling actual books, for actual money, that contain the same. Zuccbook is putting AI slop images in my feed all the time.

These corps can nuke the vast majority of this with the most trivial of classifiers; this is even more obvious than Nigerian prince scams. Yet they aren't even trying to take action. Yes, I know it will be a cat-and-mouse game, but at least make them work for it.

watermarked

Of course we don't know what the watermark is, but if we did, attacking it would usually be easy. I haven't seen any hidden watermark that can't be defeated easily by a direct attack.

Google lets pages from sites with millions of words blatantly copied off of chatgpt take the first result

Wait, really? I don't think I've ever seen this; certainly not in the first result. Do you possibly mean in the ads? If not, do you have an example search term?

Maybe I'm just easy to fool, but I honestly don't think I've been impacted by (text) AI slop at all. I imagine it fills out the bodies of those recipe blogs no one reads, but I've been skipping over those since they were all artisanally crafted slop. I'm reasonably confident almost all the fiction I read was written by a real person -- as far as I know, SOTA text gen still isn't able to maintain continuity over tens of thousands of words. Maybe as an assistant for editing or filling out short exchanges, but at that point I wouldn't really call it slop. (And, if it is good enough that I really can't tell, why should I care?) I'm certainly not buying bottom-of-the-barrel self help ebooks off Amazon, or whatever trendy topic people are generating books for.

Here are a few sites I personally reported to Google, all of which were in the first page of the search results and often the top result. I have a log because reporting a site gives a confirmation email, but unfortunately it doesn't contain the search terms, so I don't have those.

These are, of course, malicious sites, so be careful:

  • https://w asteremovalusa .com/blog/finishing-an-attic-without-a-permit/
  • https://i nnovair .com/can-i-convert-my-r22-to-r410a/
  • https://ww w.ncesc .com/can-a-person-eat-frog-eggs/
  • https://ce darparkroofingandwaterdamage .com/treehouse-cave-preserve/
  • https://pu ffy .com/blogs/best-sleep/does-the-dryer-kill-bed-bugs
  • https://m ynatureguard .com/blog/does-vicks-vaporub-keep-mosquitoes-away/
  • https://jus t-athletics .com/will-my-e-zpass-work-if-its-in-the-glove-box/
  • https://mo torandwheels .com/how-fast-tires-lose-air/
  • https://w ww.magestore .com/blog/how-do-store-alarms-work/

AI companies and governments seem far more concerned about the abuse of AI imagery/video than about text. This is an understandable stance, because people still haven't entirely recalibrated to not being able to trust clear, photorealistic imagery as we could within recent memory. It's not like Photoshop hasn't been around for a while, but AI image slop is OOMs easier to mass produce.

I expect that Google, and especially OAI, are deeply concerned about being taken to task on the matter, even if I don't think they should be held liable for what users do with such broad tools, any more than I think Adobe needs to have its clay fired for political cartoons. There's been far more interest and pro-active effort in watermarking leading edge image gen as compared to mere words.

Of course we don't know what the watermark is, but if we did, attacking it would usually be easy. I haven't seen any hidden watermark that can't be defeated easily by a direct attack.

For a sophisticated user? Certainly. But the tricks that only somewhat knowledgeable people might try, such as obvious transformations like cropping, rotating, scaling, compression or color shifting, probably won't work.

If Google hashes all their images and saves the hashes, there are perceptual hashing techniques that are troublesome to defeat and which survive rather major transforms. That approach is all over the place, particularly for CP detection. It is unclear to me, at the very least, what lengths I'd need to go through to make the risk of being caught out minimal.
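
To illustrate the general idea, here's a toy difference-hash, which is my own stand-in and certainly not whatever Google actually runs: the hash depends only on coarse brightness gradients, so recompression, mild crops or colour shifts barely move it.

```python
from PIL import Image

def dhash(path: str, hash_size: int = 8) -> int:
    """Toy perceptual hash: compare adjacent pixels of a tiny grayscale thumbnail."""
    with Image.open(path) as im:
        small = im.convert("L").resize((hash_size + 1, hash_size))
        pixels = list(small.getdata())
    bits = 0
    for row in range(hash_size):
        for col in range(hash_size):
            left = pixels[row * (hash_size + 1) + col]
            right = pixels[row * (hash_size + 1) + col + 1]
            bits = (bits << 1) | int(left > right)
    return bits

def distance(a: int, b: int) -> int:
    """Small Hamming distance means the two images are perceptual near-duplicates."""
    return bin(a ^ b).count("1")
```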

I expect @DaseindustriesLtd would be the person to ask on that front.

This is just going to lead either to more in person dating or more ‘biometric verification’ using a combination of ID and facial scan / FaceID type data. The latter still leaves room for trickery, but that just makes the former more likely. I understand that among zoomers dating app usage is already falling off a cliff. People have apparently moved to Instagram, which is much messier, network-based and features the ability to see pictures of someone they didn’t curate.

Thanks to living in the genteel authoritarianism that is Britain, I've made my peace with every app and their mother asking for biometrics and scanning my face.

Obvious fraud will be caught, and as it is, you need to generate 20 pictures for 1 that'll pass to a casual onlooker, and closer to a hundred for one that's imperceptible to someone who knows you.

People have apparently moved to Instagram, which is much messier, network-based and features the ability to see pictures of someone they didn’t curate.

I'm mildly annoyed that Google is being so laissez-faire about things and letting any idiot who asks into their dev preview. I'm no dev, but I was there years before it was cool. Expect everyone to know about this soonish, and adopt it faster than earlier AI image gen models.

Far more people want flattering photos for insta than want to pass off AI art as their own, which is currently the primary use case barring artistic expression and catfishing schemes. You don't even need to learn how to make a LoRA or fine-tune a model, just supply a few pics and ask nicely.

Google infamously curates its results to be racially diverse to the detriment of accuracy, so I'm not surprised. Your real face was not sufficiently equitable according to the algorithm, so your physical appearance was adjusted to be in line with their code of conduct.

This is why every model that attempts to chase alignment or whatever arbitrary standard will be retarded in practice. If you punish your algorithm for being accurate, then it won't be accurate. (Surprise!) It won't give you 'accurate result with DEI characteristics': it will just shit itself and give you something terrible.

This is why I think Musk has an advantage in this field: he's not shooting his infant AGI in the knees by forcing it to crimestop.

While I am 100% on board the Google hate train, I think this particular criticism is unfair. I believe what's happening here is just a limitation of current-gen multimodal LLMs - you have to lose fidelity in order to express a detailed image as a sequence of a few hundred tokens. Imagine having, say, 10 minutes to describe a person's photograph to an artist. Would that artist then be able to take your description and perfectly recreate the person's face? Doubtful; humans are HIGHLY specialized to detect minute details in faces.

Diffusion-based image generators have a lot of detail, but no real understanding of what the prompt text means. LLMs, by contrast, perfectly understand the text, but aren't capable of "seeing" (or generating) the image at the same fidelity as your eyes. So right now I think there's an unavoidable tradeoff. I expect this to vanish as we scale LLMs up further, but faces will probably be one of the last things to fall.

I wonder if, this year, there'll be workflows like: use an LLM to turn a detailed description of a scene into a picture, and then use inpainting with a diffusion model and a reference photo to fix the details...?

I wonder if, this year, there'll be workflows like: use an LLM to turn a detailed description of a scene into a picture, and then use inpainting with a diffusion model and a reference photo to fix the details...?

You can already do this; all of the pieces are there.

If I was willing to engage in a mild bout of Photoshopping, especially using its own AI generative fill and face restoration features, I'd go from 1 in 20 images being usable to closer to 1 in 10. I'm too lazy to bother at the moment, but it would be rather easy!

If I had to think of other easy ways to improve the success rate, using close-cropped images would be my go-to. Less distracting detail for the model. I could also take one of the horrors, crop it to just the face and shoulders, provide a reference image and ask it to transfer the details. I could then stitch it back together in most full-featured image editors.
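
As a rough sketch of the stitching step I have in mind (made-up filenames and crop coordinates, with Pillow standing in for a full editor):

```python
from PIL import Image

def splice_face(base_path: str, fixed_face_path: str,
                box: tuple[int, int, int, int], out_path: str) -> None:
    """Paste a separately fixed face/shoulders crop back into the full generation.
    box is the (left, upper, right, lower) region that was originally cropped out."""
    with Image.open(base_path) as base, Image.open(fixed_face_path) as fixed:
        patch = fixed.resize((box[2] - box[0], box[3] - box[1]))  # match the hole exactly
        base.paste(patch, box)
        base.save(out_path)

splice_face("generation.png", "face_fixed.png", (180, 90, 420, 360), "generation_patched.png")
```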

It's a plus that, right now, it's easier to just spam regenerations. If the failure rate were significantly higher, that editing workflow is how I'd get around it.

I must say that I don't quite agree with this take.

Google has definitely cooked themselves with ridiculous levels of prompt injection in their initial Imagen release, as evidenced by people finding definitive evidence of the backend adding "person of color" or {random ethnicity that isn't white} to prompts that didn't specify that. That's what caused the Native American or African versions of "ancient English King" or the literal Afro-Samurai.

They back-pedalled hard. And they're still doing so.

Over on Twitter, one of the project leads for Gemini, Logan Kilpatrick, is busy promising even fewer restrictions on image generation:

https://x.com/OfficialLoganK/status/1901312886418415855

Compared to what DALLE in ChatGPT will deign to allow, it's already a free-for-all. And they still think they can loosen the reins further.

Google infamously curates its results to be racially diverse to the detriment of accuracy, so I'm not surprised. Your real face was not sufficiently equitable according to the algorithm, so your physical appearance was adjusted to be in line with their code of conduct.

You'd expect that a data set with more non-Caucasians in it would be better for me! Of course, if they chose to manifest their diversity by adding a billion black people versus a more realistic sampling of their user pool...

Even so, I don't ascribe these issues to malice, intentional or otherwise, on Google's part.

What strikes me as the biggest difference between current Gemini output and that of most dedicated image models is how raw they are. Unless you specifically prompt it, or append examples, the images come out looking like a random picture on the internet. Very unstylized and natural, as opposed to DALLE's deep-fried mode collapse, or Midjourney's so-aesthetic-it-hurts approach.

This is probably a good thing. You want the model to be able to output any kind of image, and it can. The capability is there, it only needs a lot of user prompting, or in the future, tasteful finetuning. If done tastelessly, you get hyper-colorful plastinated DALLE slop. OAI seems to sandbag far more, keeping pictures just shy of photo-realism, or outright nerfing anime (and hentai, by extension).

This is why every model that attempts to chase alignment or whatever arbitrary standard will be retarded in practice. If you punish your algorithm for being accurate, then it won't be accurate. (Surprise!) It won't give you 'accurate result with DEI characteristics': it will just shit itself and give you something terrible.

This would be true if Google was up to such hijinks. I don't think they are, for reasons above. Gemini was probably trained on a massive, potentially uncurated data set. I expect they did the usual stuff like scraping out the CP in LAION's data set (unless they decided not to bother and mitigate that with filters before an image is released to the end user), and besides, they're Google: they have all of my photos on their cloud, and those of millions of others. And they certainly run all kinds of Bad Image detectors for anything you uncritically permit them to upload and examine.

That being said, everything points towards them training omnivorously.

OAI, for example, has explicitly said in their new Model Spec that they're allowing models to discuss and output culture war crime-think and Noticing™. However, the model will tend to withdraw to a far more neutral persona and only "state the facts" instead of its usual tendency to affirm the user. You can try this yourself with racial crime stats: it won't lie, and will connect the dots if you push it, while hedging along the way.

Grok, however, is a genuinely good model. It won't even suck up to Musk, and he owns the damn thing.

TLDR: Gemini's performance is more likely constrained by its very early nature, small model size, tokenization glitches and an unfiltered image set than by DEI shenanigans.

I grudgingly concede your argument, but I must say they have earned considerable skepticism: they will have to iterate quite a few times before the hilarity of their first attempt fades from my imagination.

By all means, remember their bullshit. I haven't forgotten either, and won't for a while. The saying "never attribute to malice what can be explained by stupidity" doesn't always hold true, so suspicion is warranted; if there's another change in the CW tides, Google is nothing if not adroit at doing an about-face.

It's just that in this case, stupidity includes {small model, beta testing, brand new kind of AI product} and the given facts lean more towards that end.