Like many people, I've been arguing about the nature of LLMs a lot over the last few years. There is a particular set of arguments that I found myself having to recreate from scratch over and over again in different contexts, so I finally put it together in a larger post, and this is that post.
The crux of it is that I think both the maximalist and minimalist claims about what LLMs can do and are doing are simultaneously true, and not in conflict with one another. A mind made out of text can vary along two axes: the quantity of text it has absorbed, which here I call "coverage," and the degree to which that text has been unified into a coherent model, which here I call "integration." At the extremes of that spectrum, a search engine is high coverage but low integration, while an individual person is low coverage but high integration; LLMs sit somewhere between the two. Most importantly, every point on that spectrum is useful for different kinds of tasks.
I'm hoping this will be a more useful way of thinking about LLMs than the ways people have typically talked about them so far.
Notes -
There are many possible ways to deal with uncertainty, and it is widely recognized as an important goal; see, for example (a simple baseline is sketched after this list):
Epistemic Neural Networks
Teaching Models to Express Their Uncertainty in Words
BayesFormer: Transformer with Uncertainty Estimation
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
and more.
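As a point of comparison for the papers above, here is a minimal sketch of the crudest uncertainty signal available: the model's own token probabilities, summarized as mean predictive entropy. The model name is an illustrative assumption, and the cited approaches go well beyond this baseline.

```python
# Minimal sketch: mean token entropy as a crude uncertainty signal.
# The choice of "gpt2" is illustrative only; the papers listed above
# propose much richer ways of estimating and expressing uncertainty.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def mean_token_entropy(text: str) -> float:
    """Average predictive entropy (in nats) over the sequence; higher means less certain."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # (1, seq_len, vocab_size)
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # (1, seq_len)
    return entropy.mean().item()

# Familiar text should score lower (more certain) than gibberish.
print(mean_token_entropy("Paris is the capital of France."))
print(mean_token_entropy("Zxqv flarm the capital of blorpt."))
```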
In principle, I think it's not a big scientific challenge, because we can elicit latent knowledge and thereby probe the model's "internal belief" regarding its output; this can be used as a signal during training. For now this is approached more crudely, aiming only to improve average truthfulness (see the work already cited by @faul_sname).
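To make the idea concrete, here is a minimal sketch of a linear "truthfulness probe" over hidden states, in the spirit of eliciting a model's internal belief about a statement. The model name, layer choice, and tiny labelled dataset are illustrative assumptions, not a real training setup.

```python
# Minimal sketch: fit a linear probe on hidden states to estimate whether the model
# "believes" a statement is true. All specifics here (gpt2, last layer, four examples)
# are illustrative assumptions; a real probe would use a large, diverse labelled set.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def last_token_hidden(text: str, layer: int = -1) -> torch.Tensor:
    """Hidden state of the final token at the chosen layer, used as the probe's feature."""
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    return out.hidden_states[layer][0, -1]

# Tiny illustrative dataset of (statement, is_true) pairs.
data = [
    ("The capital of France is Paris.", 1),
    ("The capital of France is Rome.", 0),
    ("Water boils at 100 degrees Celsius at sea level.", 1),
    ("Water boils at 10 degrees Celsius at sea level.", 0),
]
X = torch.stack([last_token_hidden(s) for s, _ in data]).numpy()
y = [label for _, label in data]

probe = LogisticRegression(max_iter=1000).fit(X, y)

# The probe's probability could then serve as an "internal belief" signal,
# e.g. as an auxiliary reward or a filter during training.
test = "The capital of Germany is Berlin."
print(probe.predict_proba(last_token_hidden(test).numpy().reshape(1, -1)))
```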
I also expect a lot from TART-derived approaches.