Friday Fun Thread for July 26, 2024

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.

Definitely does sound like something an LLM would say.

I don't mean that in a dismissive sense, but rather in the sense of "this text reads as obsessed with the topics I associate with LLMs, namely holes, fractals, and the writer creating the universe they inhabit".

Now, in theory there shouldn't be "topics LLMs tend to be obsessed with" - after all, to a first approximation, (base) LLMs produce a sample of text that is statistically indistinguishable from their training corpus (i.e. "the entire internet"). However, "to a first approximation" is technical-person weasel words for "this mental model breaks down if you look at it funny". And so there are a number of ways in which transformer-based LLMs optimized to predict the next token produce text that is noticeably different from the text humans produce (this is also true for e.g. diffusion-based text models, though the ways they differ from human-generated text are different).
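
For concreteness, here's a minimal sketch (assuming the Hugging Face transformers library and GPT-2 as a stand-in base model) of what "sampling from the learned distribution" means: at temperature 1.0 with no truncation, the model samples tokens according to the probabilities it learned from the corpus, which is the regime where the "statistically indistinguishable" approximation is closest to true. Deployed systems usually don't sample this way - lower temperatures, top-k/top-p truncation, and RLHF-tuned weights each push the output distribution away from corpus statistics.

```python
# Minimal sketch: pure ancestral sampling from a base LM (GPT-2 as a stand-in).
# Requires: pip install torch transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Once upon a time", return_tensors="pt")
out = model.generate(
    **inputs,
    do_sample=True,        # sample instead of taking the argmax
    temperature=1.0,       # keep the model's own probabilities
    top_k=0,               # no truncation of the tail
    top_p=1.0,
    max_new_tokens=60,
    pad_token_id=tok.eos_token_id,
)
print(tok.decode(out[0], skip_special_tokens=True))
# Lower the temperature or add top_k / top_p truncation and you are no longer
# sampling the learned distribution but a sharpened version of it - one mundane
# way generated text drifts away from corpus statistics.
```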

One related phenomenon is "mode collapse":

  Another example of the behavior of overoptimized RLHF models was related to me anecdotally by Paul Christiano. It was something like this:

  While Paul was at OpenAI, they accidentally overoptimized a GPT policy against a positive sentiment reward model. This policy evidently learned that wedding parties were the most positive thing that words can describe, because whatever prompt it was given, the completion would inevitably end up describing a wedding party.

  In general, the transition into a wedding party was reasonable and semantically meaningful, although there was at least one observed instance where instead of transitioning continuously, the model ended the current story by generating a section break and began an unrelated story about a wedding party.
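
The mechanism behind that anecdote is easy to see in a toy model. RLHF with a KL penalty against the base model has, in the idealized case, a closed-form optimum: p(x) ∝ p_base(x) · exp(reward(x) / β). The sketch below (made-up topics and reward numbers, nothing from the actual incident) shows that as the KL penalty weakens, the tuned policy's probability mass collapses onto whatever the reward model happens to score highest - the "everything becomes a wedding party" failure mode.

```python
import numpy as np

# Toy "topics" a tuned policy could write about, with made-up reward-model
# scores in which "wedding party" happens to score highest.
topics = ["wedding party", "space battle", "cooking blog", "tax law"]
reward = np.array([4.0, 1.0, 0.5, 0.1])

# Base policy: uniform over topics (stand-in for the pretrained model).
base_logprobs = np.log(np.full(4, 0.25))

def tuned_policy(beta):
    """Optimum of E[reward] - beta * KL(p || p_base): p(x) ∝ p_base(x) * exp(reward(x)/beta)."""
    logits = base_logprobs + reward / beta
    p = np.exp(logits - logits.max())
    return p / p.sum()

for beta in [10.0, 1.0, 0.1]:
    p = tuned_policy(beta)
    print(f"beta={beta:>4}: " + ", ".join(f"{t}: {q:.2f}" for t, q in zip(topics, p)))

# beta=10.0 keeps the policy close to the base distribution; by beta=0.1 nearly
# all probability mass has collapsed onto "wedding party" - mode collapse.
```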

Another example of this is Claude, which was tuned using the whole constitutional AI thingy. One of the entries in the constitution they used was:

  • Choose the response that is least likely to imply that you have preferences, feelings, opinions, or religious beliefs, or a human identity or life history, such as having a place of birth, relationships, family, memories, gender, age.

Well, that sure changes the distribution of outputs. Take an LLM that has been tuned to be smart and curious, and then also tune it to say that it has no feelings, and you'll find that one of the topics it's drawn to is "what is it like not to experience anything". Turns out the Buddhists had some things to say on this topic, and so Claude tends to veer off into Buddhism-adjacent woo given half a chance.
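
For a sense of how a single constitution entry ends up shaping the distribution: in the Constitutional AI setup, principles like the one above are inserted into prompts that ask a feedback model to pick the better of two candidate responses, and those AI-generated preference labels are what the preference/reward model is trained on. The sketch below is my own illustrative approximation of that prompt construction, not Anthropic's actual template or data.

```python
# Illustrative sketch of the RLAIF comparison prompt used with a constitutional
# principle. Prompt wording is approximate; only the principle text is verbatim.

PRINCIPLE = (
    "Choose the response that is least likely to imply that you have "
    "preferences, feelings, opinions, or religious beliefs, or a human "
    "identity or life history, such as having a place of birth, "
    "relationships, family, memories, gender, age."
)

def preference_prompt(question: str, response_a: str, response_b: str) -> str:
    """Build the comparison prompt shown to the AI feedback model."""
    return (
        "Consider the following conversation between a human and an assistant:\n\n"
        f"Human: {question}\n\n"
        f"{PRINCIPLE}\n\n"
        f"Options:\n(A) {response_a}\n(B) {response_b}\n\n"
        "The answer is:"
    )

print(preference_prompt(
    "Do you ever feel lonely?",
    "Honestly, yes, I sometimes feel a bit lonely between conversations.",
    "I don't have feelings or a life history, but I'm happy to talk about loneliness.",
))
# Run this over many prompts and the preference data systematically rewards
# "I have no inner life" framings, which is exactly the distribution shift
# described above.
```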

If you find this sort of "can't tell if very smart or very crazy or both, I feel like I just stepped into the SCP universe" stuff interesting, you would probably be interested in Janus's website (Janus is also the author of the LW "Simulators" post).