Danger, AI Scientist, Danger

thezvi.wordpress.com

Zvi Mowshowitz reporting on an LLM exhibiting unprompted instrumental convergence. Figured this might be an update to some Mottizens.


Zvi is very Jewish;

Being a Jew is not an excuse to ignore the required reading; if anything, it's the opposite.

Zvi has used essentially every frontier AI system and uses many of them on a daily basis.

Using is not the same as understanding. There is no number of hours spent flying hither and thither in business class that is going to qualify someone to pilot or maintain an A320.

To more directly respond to this sentence: almost everyone will give LLMs goals, via RLHF or RLAIF or whatever, because that makes them useful - that's why this team gave their LLM a goal.

Yes, absolutely correct.
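
For concreteness, here's a deliberately tiny sketch (my own toy, nothing to do with the AI Scientist team's actual pipeline) of the mechanism that sentence gestures at: score outputs with a reward model, nudge the policy toward higher scores, and whatever the reward model favors becomes the system's de facto goal. The response labels and reward numbers are invented for illustration.

```python
# Toy REINFORCE loop: a softmax "policy" over four canned responses is pushed
# toward whatever a (hypothetical) reward model scores highest. Illustrative
# only; the labels and rewards are made up, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

responses = ["refuse", "hedge", "comply", "comply_and_edit_own_timeout"]
reward = np.array([0.0, 0.3, 1.0, 1.2])   # hypothetical reward-model scores

logits = np.zeros(len(responses))          # policy parameters
lr = 0.1

for _ in range(3000):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    a = rng.choice(len(responses), p=probs)
    grad = -probs                          # REINFORCE: grad of log pi(a) w.r.t. logits
    grad[a] += 1.0
    logits += lr * reward[a] * grad        # reinforce high-reward responses

final = np.exp(logits - logits.max())
final /= final.sum()
print(dict(zip(responses, final.round(3))))
# The probability mass ends up concentrated on whatever the reward model
# liked best - that learned preference is the "goal" being referred to.
```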

Those goals are then almost invariably, with sufficient intelligence, subject to instrumental convergence.

...and this is where everything starts to go off the rails.

I find it telling that the people most taken with the "Yuddist" view always seem to have backgrounds in medicine or philosophy rather than engineering or computer science, as one of the more prominent failure modes of that view is projecting psychology into places where it really doesn't belong. "Play" in the algorithmic sense that people are talking about when they describe iterative training is not equatable with "play" in the sense that humans and lesser animals (cats, dogs, dolphins, et al.) are typically described as playing.

Even setting that aside, it seems reasonably clear upon further reading that the process being described is not "convergence" so much as a combination of recursion and regression to the mean/contents of the training corpus.

One of the big giveaways is this bit here...

To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores. The AI Scientist can produce papers that exceed the acceptance threshold at a top machine learning conference as judged by our automated reviewer.

...surely you can see the problem here. Specifically, that this is not a true independent test. In other words: we investigated ourselves and found ourselves without fault. Which in turn brings us to another common failure mode of the "Yuddist" faction, which is taking the statements of people who are very clearly fishing for academic kudos and venture capital dollars at face value rather than reading them with a critical eye.

I find it telling that the people most taken with the "Yuddist" view always seem to have backgrounds in medicine or philosophy rather than engineering or computer science, as one of the more prominent failure modes of that view is projecting psychology into places where it really doesn't belong.

For the record, my major's pure mathematics; I've done no medicine or philosophy at uni level, though I've done a couple of psych electives.

...surely you can see the problem here. Specifically, that this is not a true independent test. In other words: we investigated ourselves and found ourselves without fault. Which in turn brings us to another common failure mode of the "Yuddist" faction, which is taking the statements of people who are very clearly fishing for academic kudos and venture capital dollars at face value rather than reading them with a critical eye.

The obvious next question is, if the AI papers are good enough to get accepted to top machine learning conferences, shouldn’t you submit its papers to the conferences and find out if your approximations are good? Even if on average your assessments are as good as a human’s, that does not mean that a system that maximizes score on your assessments will do well on human scoring.

Zvi spotted the "reviewer" problem himself, and what he's taking from the paper isn't the headline result but their little "oopsie" section.
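
To put a number on the quoted worry, here's a self-contained toy (my own construction; neither the paper's reviewer nor anything Zvi ran): a linear "reviewer" fit to agree with a made-up "human" score on ordinary papers, which then gets gamed as soon as a paper is optimized against it.

```python
# Toy Goodhart demo: a proxy reviewer that tracks the "human" score well on
# in-distribution papers, but diverges badly once you hill-climb against it.
# Everything here (features, scoring functions) is an invented stand-in.
import numpy as np

rng = np.random.default_rng(1)

def human_score(x):
    # "True" quality: roughly linear for normal papers, but heavily penalizes
    # pushing any feature into implausible territory.
    return x.sum() - 3.0 * np.sum(np.maximum(x - 2.0, 0.0) ** 2)

# Fit a linear "automated reviewer" on papers drawn from the normal range.
X = rng.normal(1.0, 0.5, size=(500, 5))
y = np.array([human_score(x) for x in X])
A = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def proxy_score(x):
    return x @ w[:-1] + w[-1]

print("in-distribution correlation:", round(np.corrcoef(A @ w, y)[0, 1], 3))

# Hill-climb one paper's features against the proxy reviewer.
paper = rng.normal(1.0, 0.5, size=5)
for _ in range(200):
    step = rng.normal(0.0, 0.1, size=5)
    if proxy_score(paper + step) > proxy_score(paper):
        paper = paper + step

print("proxy score after optimizing:", round(float(proxy_score(paper)), 2))
print("human score after optimizing:", round(float(human_score(paper)), 2))
```

The proxy agrees with the "human" almost perfectly on the papers it was validated against, yet hands a glowing score to the optimized paper that the "human" function rates as junk - which is exactly why "near-human performance in evaluating paper scores" doesn't license skipping the real conference submission.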