Zvi Mowshowitz reporting on an LLM exhibiting unprompted instrumental convergence. Figured this might be an update to some Mottizens.
- 34
- 9
Zvi Mowshowitz reporting on an LLM exhibiting unprompted instrumental convergence. Figured this might be an update to some Mottizens.
Jump in the discussion.
No email address required.
Notes -
I posit that ML models will be trained using a finite amount of hardware for a finite amount of time. As such, I expect that the "given sufficient power" and "given an extensive-enough search" and "selected by a powerful-enough RLHF optimization" givens will not, in fact, be given.
There's a thought process that the Yudkowsky / Zvi / MIRI / agent foundations cluster tends to gesture at, which goes something like this
(Also 2.5: use the phrase "utility function" to refer both to the loss function used to train your ML system and also to the expressed behaviors of that system, and 2.25: assume that anything you can't easily prove is impossible is possible).
I... don't really buy it anymore. One way of viewing Sutton's Bitter Lesson is "the approach of using computationally expensive general methods to fit large amounts of data outperforms the approach of trying to encode expert knowledge", but another way is "high volume low quality reward signals are better than low volume high quality reward signals". As long as trends continue in that direction, the threat model of "an AI which monomaniacally pursues the maximal possible value of a single reward signal far in the future" is just not a super compelling threat model to me.
What "AI proper" are you talking about here? A base model LLM is more like a physics engine than it is like a game world implemented in that physics engine. If you're a player in a video game, you don't worry about the physics engine killing you, not because you've proven the physics engine safe, but because that's just a type error.
If you want to play around with base models to get a better intuition of what they're like and why I say "physics engine" is the appropriate analogy, hyperbolic has llama 405b base for really quite cheap.
More options
Context Copy link