
The Many Ways that Digital Minds Can Know

moultano.wordpress.com

Like many people, I've been arguing about the nature of LLMs a lot over the last few years. There is a particular set of arguments that I found myself having to recreate from scratch over and over again in different contexts, so I finally put it together in a larger post, and this is that post.

The crux of it is that I think both the maximalist and minimalist claims about what LLMs can do and are doing are simultaneously true, and not in conflict with one another. A mind made out of text can vary along two axes: the quantity of text it has absorbed, which here I call "coverage," and the degree to which that text has been unified into a coherent model, which here I call "integration." As extreme points on that spectrum, a search engine is high coverage but low integration, while an individual person is low coverage but high integration; LLMs sit somewhere between the two. Most importantly, every point on that spectrum is useful for different kinds of tasks.

I'm hoping this will be a more useful way of thinking about LLMs than the ways people have typically talked about them so far.


There are many possible ways to deal with uncertainty; this is widely recognized as an important goal.

and more.

In principle, I think it's not a big scientific challenge, because we can elicit latent knowledge and thereby probe the model's "internal belief" about its own output; that can then be used as a signal during training. For now this is approached more crudely, just to improve average truthfulness (already cited by @faul_sname):

[We] begin by operationalizing what it means for a network to “know” the right answer to a question, even if it doesn’t produce that answer. We define this as the difference between generation accuracy (measured by a model’s output) and probe accuracy (selecting an answer using a classifier with a model’s intermediate activations as input). Using the LLaMa 7B model, applied to the TruthfulQA benchmark from Lin et al. (2021)–a difficult, adversarially designed test for truthful behavior–we observe a large 40% difference between probe accuracy and generation accuracy. This statistic points to a major gap between what information is present at intermediate layers and what appears in the output.
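The probe-versus-generation gap described in the quote can be illustrated with a toy sketch. Nothing here touches a real model: the "activations," the misaligned "output head," the dimensions, and the probe are all synthetic stand-ins, chosen only to show how a linear probe on intermediate activations can recover an answer that the output direction misreads.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all synthetic): 1,000 true/false questions with 64-dim
# "intermediate activations" that encode the right answer along one direction.
n, d = 1000, 64
labels = rng.integers(0, 2, size=n)
truth_dir = rng.normal(size=d)
truth_dir /= np.linalg.norm(truth_dir)
acts = rng.normal(size=(n, d)) + 2.0 * np.outer(2 * labels - 1, truth_dir)

# "Generation": the output head reads a badly misaligned direction, so the
# model often answers wrongly even though the information is present.
misread = rng.normal(size=d)
noisy_dir = truth_dir + 3.0 * misread / np.linalg.norm(misread)
gen_acc = ((acts @ noisy_dir > 0).astype(int) == labels).mean()

# "Probe": a linear classifier fit directly on the activations
# (plain logistic regression by gradient ascent, train/test split).
train, test = slice(0, 500), slice(500, None)
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-acts[train] @ w))
    w += 0.1 * acts[train].T @ (labels[train] - p) / 500
probe_acc = ((acts[test] @ w > 0).astype(int) == labels[test]).mean()

print(f"generation accuracy: {gen_acc:.2f}")
print(f"probe accuracy:      {probe_acc:.2f}")
```

On this synthetic data the probe lands well above the "generation" accuracy, which is the shape of the gap the paper reports; the real result, of course, comes from probing LLaMa 7B on TruthfulQA, not from anything like this construction.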

I also expect a lot from TART-derived approaches:

Tart comprises of two components: a generic task-agnostic reasoning module, and embeddings from the base LLM. The reasoning module is trained using only synthetic data (Gaussian logistic regression problems), agnostic of the auto-regressively trained language model, with the objective of learning to perform probabilistic inference (Section 4.1). This learned transformer module is then composed with the base LLM, without any training, by simply aggregating the output embedding and using those as an input along with the class label (Section 4.2). Together, these components make Tart task-agnostic, boost performance quality by improving reasoning, and make the approach data-scalable by aggregating input embeddings into a single vector.