Small-Scale Question Sunday for April 2, 2023

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.

I stumbled upon this post https://www.lesswrong.com/posts/cgqh99SHsCv3jJYDS/we-found-an-neuron-in-gpt-2 where the authors explain that they found a particular "neuron" whose activations are highly correlated with the network outputting the article "an" versus "a" (they also found a bunch of other interesting neurons). This got me thinking: people often say that LLMs generate text sequentially, one word at a time, but is that actually true?

I mean, in the literal sense it's definitely true: at each step a GPT looks at the preceding text (up to a certain distance) and produces the next token (a word or a part of a word). But there's a lot of interesting stuff happening in between, and, as the "an" issue suggests, this literal interpretation might be obscuring something very important.
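
For concreteness, here's roughly what that literal loop looks like; this is just a sketch using the Hugging Face transformers library with the small public "gpt2" checkpoint and greedy decoding, all of which are arbitrary choices for illustration:

```python
# A minimal sketch of the literal one-token-at-a-time loop, using the small
# public "gpt2" checkpoint via Hugging Face transformers (greedy decoding).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

input_ids = tokenizer("The answer is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(10):
        logits = model(input_ids).logits            # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()            # most likely next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```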

Suppose I ask a GPT to solve a logical puzzle with three possible answers: "apple", "banana", and "cucumber". It seems more or less obvious that by the time the GPT outputs "The answer is an ", it already knows what the answer actually is. It doesn't choose between "a" and "an" randomly and then fit the next word to match the article; it chooses the next word somewhere in its bowels and then outputs the article.
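
One crude way to check this would be to look at the next-token distribution right before the article: if the model has already settled on "apple", the probability of " an" should already dominate " a". Something like this, with the prompt wording made up for illustration:

```python
# Sketch: inspect P(" an") vs P(" a") right before the article. If the model has
# already "decided" on apple, " an" should win. Prompt wording is illustrative.
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = ("Which of an apple, a banana, and a cucumber is round? "
          "The answer is")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    probs = F.softmax(model(input_ids).logits[0, -1], dim=-1)

id_a = tokenizer.encode(" a")[0]   # " a" and " an" are single tokens in GPT-2's BPE
id_an = tokenizer.encode(" an")[0]
print("P(' a')  =", probs[id_a].item())
print("P(' an') =", probs[id_an].item())
```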

I'm not sure how to make this argument more formal (and force it to provide more insight than the plain "it autocompletes one word at a time" framing). Maybe it could be dressed up in statistics: suppose we actually ask the GPT to choose one of those three plants at random. Since two of the three take "a" and only "apple" takes "an", we should see it output "a" about 2/3rds of the time, which tells us something.
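
That statistical version could look something like the sketch below; again, the prompt wording and sample count are arbitrary:

```python
# Sketch of the statistical version: sample the token right after "I pick" many
# times and count how often it's " a" vs " an".
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "Pick one of apple, banana, or cucumber at random. I pick"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    probs = F.softmax(model(input_ids).logits[0, -1], dim=-1)

id_a, id_an = tokenizer.encode(" a")[0], tokenizer.encode(" an")[0]
counts = {"a": 0, "an": 0}
for _ in range(1000):
    tok = torch.multinomial(probs, num_samples=1).item()
    if tok == id_a:
        counts["a"] += 1
    elif tok == id_an:
        counts["an"] += 1

# If the choice were really uniform over the three, "a" should beat "an" ~2:1.
print(counts)
```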

Or maybe there could be a way to capture a partial state somehow. Like, when we feed the GPT this: "Which of an apple, a banana, and a cucumber is not long?", it already knows the answer somewhere in its bowels, so when we append "Answer without using an article:" or "Answer in Esperanto:" only a subset of the neurons should change their activation values. Or maybe it's even possible to discover a set of neurons that activate in a particular pattern when the GPT might want to output "apple" at some point in the future.
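
A crude version of that experiment might just compare the hidden states for the two suffixes layer by layer; which layer matters and whether cosine similarity is the right measure is anyone's guess, but roughly:

```python
# Sketch of the "partial state" idea: run the same question with two different
# suffixes and see how far the per-layer hidden states at the last position move.
# Layer-by-layer cosine similarity is just one arbitrary way to look at this.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

question = "Which of an apple, a banana, and a cucumber is not long? "

def last_position_states(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    # one (1, seq_len, d_model) tensor per layer; keep the final position of each
    return [h[0, -1] for h in out.hidden_states]

h1 = last_position_states(question + "Answer without using an article:")
h2 = last_position_states(question + "Answer in Esperanto:")

for layer, (a, b) in enumerate(zip(h1, h2)):
    sim = torch.cosine_similarity(a, b, dim=0).item()
    print(f"layer {layer:2d}: cosine similarity {sim:.3f}")
```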

Anyway, I hope I've justified my thesis that "it generates text one word at a time" oversimplifies the situation to the point of producing wrong intuitions, such as the intuition that when a GPT chooses between "a" and "an" it doesn't yet know which word will follow. While it does output words one at a time, it must have a significant lookahead state internally (which, by the way, it regenerates every time it needs to output a single word).

See also https://www.lesswrong.com/posts/nmxzr2zsjNtjaHh7x/actually-othello-gpt-has-a-linear-emergent-world

The headline result is that Othello-GPT learns an emergent world representation - despite never being explicitly given the state of the board, and just being tasked to predict the next move, it learns to compute the state of the board at each move.
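
For anyone curious how a claim like that gets tested: the rough shape of the probing methodology is to collect the model's internal activations at each move and train a small classifier to read off the board state. Schematically, with random placeholder arrays rather than real Othello data:

```python
# Rough shape of the probing methodology: train a linear classifier per board
# square to predict that square's state from the model's internal activations.
# The arrays here are random placeholders, not real Othello-GPT data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

n_moves, d_model = 10_000, 512                         # hypothetical sizes
activations = np.random.randn(n_moves, d_model)        # internal activation vectors
square_state = np.random.randint(0, 3, size=n_moves)   # 0=empty, 1=mine, 2=theirs

X_tr, X_te, y_tr, y_te = train_test_split(activations, square_state, test_size=0.2)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# High held-out accuracy for a purely linear probe is what supports the claim
# that the board state is explicitly represented, not merely recoverable in theory.
print("held-out probe accuracy:", probe.score(X_te, y_te))
```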

IMO, LLMs are "just" trying to predict the next token, the same way humans are "just" trying to pass on our genes. That doesn't preclude LLMs having an internal world model, and I suspect they actually do.

Strongly agree that it somehow internally represents state about very distant parts of its answer. I've never tried interacting with it in German, but German's Satzklammer or separable prefix verbs offer even more extreme examples of this kind of long-distance grammatical agreement, which could be used to test or demonstrate that the AI is thinking ahead.