Zvi Mowshowitz reporting on an LLM exhibiting unprompted instrumental convergence. Figured this might be an update to some Mottizens.
Last I checked, diffusion models do work for text, but they don't work very well. More specifically, text diffusion models remind me quite strongly of the classic-style Markov chain text generators that used to be popular for producing amusing locally-coherent, globally-word-salad text. Here's the best concrete example I can point to (italicized text is the prompt, normal text is the continuation; examples are sourced from this JDP tweet, the whole of which is excellent but somewhat OT here):
Diffusion model:
GPT-2:
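For anyone who never played with those old generators, here's a minimal sketch of the kind of Markov chain text generator I mean. The corpus path and order-2 setting are just placeholders I picked for illustration, not anything from the tweet:

```python
import random
from collections import defaultdict

def build_chain(tokens, order=2):
    """Map each `order`-token context to the tokens observed right after it."""
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        chain[context].append(tokens[i + order])
    return chain

def generate(chain, order=2, length=80, seed=None):
    """Walk the chain: each step conditions only on the last `order` tokens."""
    rng = random.Random(seed)
    context = rng.choice(list(chain.keys()))
    out = list(context)
    for _ in range(length):
        followers = chain.get(tuple(out[-order:]))
        if not followers:  # dead end: jump to a random context and keep going
            context = rng.choice(list(chain.keys()))
            out.extend(context)
            continue
        out.append(rng.choice(followers))
    return " ".join(out)

if __name__ == "__main__":
    corpus = open("corpus.txt").read().split()  # any plain-text file you have lying around
    chain = build_chain(corpus, order=2)
    print(generate(chain, order=2, length=80, seed=0))
```

Because every transition only looks at the previous two tokens, any short window of the output reads fine while the passage as a whole wanders off into word salad, which is exactly the failure mode the diffusion sample reminds me of.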
Now obviously in the limit as computational power and training data volume go to infinity, diffusion models and transformer models will generate the same text, since in the limit they're pulling from the same distribution with minimal error. But in the very finite regime we find ourselves in, diffusion models "spend" their accuracy on making the text locally coherent (so if you take a random 10 token sequence, it looks very typical of 10 token sequences within the training set), while transformer LLMs "spend" their accuracy on global coherence (so if you take two 10 token sequences a few hundred tokens apart in the same generated output, you would say that those two sequences look like they came from the same document in the training set).
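One rough way to operationalize that "spend" framing (my own toy metrics, not anything from the literature): score random 10-token windows for how typical their n-grams are of a reference corpus, and separately check whether two windows a few hundred tokens apart still look like the same document. A crude stdlib-only sketch:

```python
import math
import random
from collections import Counter

def bigram_counts(tokens):
    """Bigram frequency table built from a reference corpus."""
    return Counter(zip(tokens, tokens[1:]))

def local_typicality(tokens, ref_bigrams, window=10, samples=200, seed=0):
    """Average log-count (under the reference corpus) of bigrams inside random
    `window`-token spans: a proxy for 'a random 10-token sequence looks
    typical of the training set'."""
    rng = random.Random(seed)
    total, n = 0.0, 0
    for _ in range(samples):
        i = rng.randrange(len(tokens) - window)
        span = tokens[i:i + window]
        for pair in zip(span, span[1:]):
            total += math.log(ref_bigrams.get(pair, 0) + 1)
            n += 1
    return total / n

def global_consistency(tokens, window=50, gap=300):
    """Cosine similarity between bags-of-words of two spans `gap` tokens apart:
    a proxy for 'distant chunks look like they came from the same document'."""
    a = Counter(tokens[:window])
    b = Counter(tokens[gap:gap + window])
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

On samples like the ones above, I'd expect the diffusion output to do fine on the first number and badly on the second, while a decent transformer LLM holds up on both.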
Agreed. Obvious once you point it out but I hadn't thought about it that way before, so thanks.
Notably, Anthropic's Constitutional AI process (i.e. the process by which Anthropic turned a base LLM into the "helpful, honest, harmless" Claude) used RLAIF, which is self play by another name. And that's one big cottage.
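For anyone who hasn't read the Constitutional AI paper, the shape of the loop is roughly: the model drafts a response, critiques and revises it against the constitution (that becomes supervised fine-tuning data), then labels which of its own samples is better (those AI preference labels train the reward model used for RL). The function names below are placeholder stubs I made up to show the shape, not Anthropic's actual code or API:

```python
import random

# Placeholder stand-ins for "call the model" -- hypothetical, not a real API.
def model_generate(prompt: str) -> str:
    return f"draft response to: {prompt}"

def model_critique(response: str, principle: str) -> str:
    return f"critique of '{response}' under '{principle}'"

def model_revise(response: str, critique: str) -> str:
    return f"revised({response})"

def model_prefer(prompt: str, a: str, b: str, principle: str) -> int:
    return random.choice([0, 1])  # real RLAIF asks the model which reply better satisfies the principle

CONSTITUTION = ["be helpful", "avoid harmful content", "be honest"]

def supervised_phase(prompts):
    """Critique-and-revise: the model improves its own outputs, and the
    (prompt, revision) pairs become supervised fine-tuning data."""
    data = []
    for p in prompts:
        resp = model_generate(p)
        for principle in CONSTITUTION:
            resp = model_revise(resp, model_critique(resp, principle))
        data.append((p, resp))
    return data

def preference_phase(prompts):
    """RLAIF proper: the model labels which of its own two samples is better.
    These AI-generated labels train the preference/reward model used for RL."""
    labels = []
    for p in prompts:
        a, b = model_generate(p), model_generate(p)
        principle = random.choice(CONSTITUTION)
        labels.append((p, a, b, model_prefer(p, a, b, principle)))
    return labels

if __name__ == "__main__":
    prompts = ["how do I fix a flat tire?"]
    print(supervised_phase(prompts))
    print(preference_phase(prompts))
```

The "self play" resemblance is that every arrow in the loop points back at the model itself: it generates, it judges, and it is then trained on its own judgments, with no human labels in the inner loop.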