This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
You are probably one of the few people who decreased their p(doom) from 2021 to after the ChatGPT revolution in 2022. I updated the probability upwards for the following reasons:
The increase in capability from simply adding compute and scaling the old 2017 transformer architecture surprised many. If a moderate breakthrough in hardware can move capabilities that much, we are closer to AGI than we thought. It no longer seems sensible to plan on timelines stretching to 2100; we may have decades, possibly only years, to do what needs to be done. That is definitely bad news for alignment, time-wise.
The nature of LLMs makes them a terrible candidate for AGI. The technology is inscrutable; the explainability of these models is terrible. Nobody knows why they do what they do, and nobody could predict how much compute was needed for qualitative jumps like the one between ChatGPT and GPT-4. This makes the models notoriously tough to align even on basic things, like hardening them against exfiltration of training data. If an AI can answer when the president of France was born, maybe it also knows what was in the email a company's CEO sent on January 1st, 2019 - if such data was used to train the model. The fact that the most likely candidate for AGI is, as Yudkowsky said, just some "giant matrices of trillions of inscrutable floating-point numbers" is terrifying - there may be a googolplex of viable matrices like that, and we do not know which subset of them could be considered aligned, nor how to get there. We are just adding compute and marveling that the thing growing in our petri dish keeps getting more capable; we do not manage that growth in any meaningful sense. In the past we had different machine-learning approaches specific to each domain, and people reasonably thought we might have to design specific algorithms to integrate those separate subdomains. But no: we found out that just throwing compute at the very simple game of "predict the next word in the text" is enough to gain multimodality and make the output more general, expanding into domains like computer-generated graphics and speech recognition that were previously separate fields. And to give a concrete example rather than just speaking broadly: we now know that an LLM can discern the self-reported race of people from images of their bones, beyond the current limited ability of medical doctors, who can do so only from a few features such as the skull. Nobody knows why or how the model does it; it just can, and we move on.
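To make the "predict the next word in the text" objective concrete, here is a minimal sketch in PyTorch; the `model` here is a hypothetical causal language model that returns logits, not any particular architecture:

```python
import torch.nn.functional as F

def next_token_loss(model, tokens):
    """Cross-entropy loss for predicting each token from its prefix.

    `tokens` is a [batch, seq_len] tensor of token ids; `model` is a
    placeholder for any causal LM mapping it to logits of shape
    [batch, seq_len, vocab_size].
    """
    logits = model(tokens[:, :-1])            # predictions from each prefix
    targets = tokens[:, 1:]                   # the actual next tokens
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten to [batch*(T-1), vocab]
        targets.reshape(-1),                  # flatten to [batch*(T-1)]
    )
```

That loss, scaled up over enough data and compute, is the whole training signal being discussed.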
One last point on the above: the old news of a top-notch Go-playing AI model being beaten by one simple trick. For years people thought the model "knew" how to play Go in the normal sense; the whole community thought it had been "aligned" with the task of at least beating humans at a game with a very simple ruleset. Then it was shown that the model achieved its results by learning different concepts entirely - it probably learned a completely different "game", and winning at Go for years was just a side effect. It never learned a primitive concept that even amateurs at the game can grasp. The "alignment" of the model with the basic rules of Go was a lie. I cannot imagine how we are supposed to align an LLM many orders of magnitude more complicated, one that has to grasp all the rules of reality, and be confident that we get exactly what we want, that there will be no security hole, and that some random teenager will not start the apocalypse with some primitive prompt or strategy even after the whole scientific community has celebrated the safety of this particular "giant matrix of trillions of inscrutable floating-point numbers".
We now have the new Q-Star breakthrough at OpenAI. At least according to some speculation, what it achieves is that compute can be used not only to train the model but to automate the evaluation of answers to questions. Think of it as on-the-fly training in which the most promising answers generated by a larger static model are selected in an LLM-powered chain-of-thought loop. This approach can reportedly boost the capabilities of the model by orders of magnitude, temporarily, at the expense of more compute focused on a specific question or prompt. If true, this means there is another lever: you can literally throw money at potentially productive questions like "how do we make LLMs more effective" and let the LLM provide answers. We may be closer to an intelligence explosion than we thought last year.
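To pin down the speculated mechanism, here is a minimal sketch of the general best-of-n pattern it describes; the `generate` and `score` functions are hypothetical stand-ins, and this is not a claim about OpenAI's actual method, only the shape of the "spend more compute per question" idea:

```python
def answer_with_extra_compute(generate, score, prompt, n_candidates=16):
    """Sample several candidate answers and keep the one a scorer rates best.

    `generate(prompt)` returns one sampled answer; `score(prompt, answer)`
    returns a numeric quality estimate. Raising `n_candidates` trades
    inference-time compute for (hopefully) better answers.
    """
    candidates = [generate(prompt) for _ in range(n_candidates)]
    return max(candidates, key=lambda answer: score(prompt, answer))
```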
All in all, I do not see p(doom) decreasing in any meaningful way - quite the contrary.
This just reveals the incredible badness of MIRI-inspired AI safety/risk theory, I think.
The "many" were just sheltered and ignorant, with their obsolete, childish dreams of GOFAI. Amodei and Sutskever saw this, argued this, and won on this. Legg predicted AGI possibly by 2028 decades ago, based purely on Moore's law.
You are talking as if there were any better conceivable architecture. LLMs are, in fact, unexpectedly transparent for anything humanlike in their performance – if only because they operate on tokens, we can inspect their attention maps and routinely invent easier ways to steer them (look at the number of «how is this different from x» questions). Their substrate-level «inscrutability» (overhyped too) is the same as with any DL artifact, and we know it couldn't have been any other way, because GOFAI was dead in the water. Your ivory-tower standard of mechanistic understanding is misguided – we know «why they do what they do» because they faithfully approximate the training data and are absolutely a product of their dataset, to the extent that all clever inductive biases and architectural innovations are as dust before a good data cleaning. The magic of GPT-4 is not due to summoning a bigger genie by piling more compute on the heap, but mostly due to pretraining on tons of proprietary data; and anyway, how could you have strong expectations for the ChatGPT-GPT-4 gap without having insight into the inputs for either?
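As a small, concrete example of that kind of inspectability, attention maps can be pulled straight out of a public model with the Hugging Face transformers library; GPT-2 is used here only because it is small and freely available:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped [batch, num_heads, seq_len, seq_len].
print(len(outputs.attentions), outputs.attentions[0].shape)
```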
Again overhyped.
What makes LLMs «tough to align» against adversarial attacks by humans is not their inscrutability but the fact that they are dumb text processors with no ego and no «intent» beyond approximating the most probable continuation of a text prompt. This is in fact the most parsimonious explanation of what they do. Stop anthropomorphising them even as you demonize them.
This is wordcelism or, more specifically, a counting argument, and it was used in the past by Chomsky to rule out the possibility of statistical machines learning natural language. You know how that went. (Relatedly, Yud, who has always been easy to drive to religious ecstasy or terror with Big Numbers, was surprised by GPT-4, which completely discredits him as an AI analyst in my book.) Pope puts it this way:
This is why deep learning works at all, for capabilities too! Specifying rules of language is not more intractable than specifying «alignment»!
I suppose you've been misinformed: LLMs do not learn multimodal capabilities from text alone. In general it's just more in-context learning developed over a bigger dataset of token sequences. That people believe the sequences for different modalities are essentially different, and are amazed, does not make them actually different.
This is quite a hilarious exemplar of motivated thinking. A model trained on self-play diverges from the ground truth - news at 11! Maybe, instead of fretting about the misalignment, you could see this as an issue of overrated «capabilities»? How can you even distinguish the two? How far do you think an intelligence explosion, recursive self-improvement, etc. will get if self-play stumbles into fragile local minima on a 19x19 grid with a clear reward signal? Back in AlphaZero's day, Yud was so terrified of this self-play superpower, which confirmed his worst fears of FOOM:
– and now we see that it develops massive blind spots which would lead to trivial failures in reality. But you want to be scared, so you construe this as a matter of «learning a different game». Tails you win, heads I lose.
I look forward to this intellectual tradition being relegated to the dustbin of history.
Hmm, I would say that most of those specific concerns were already priced in for me by 2021, which is why I already had such a high p(doom) at the time.
What specific means of exfiltration are you talking about? If you mean the recent claims that getting it to endlessly repeat a string will make it "leak" training data or recent interactions with other users, then in the case of ChatGPT:
A) It's cost- and time-prohibitive to do so.
B) It's possible that the bug is in a JSON parser, or that the plausible-seeming outputs are just hallucinations.
If there's another way of getting it to leak training data, I can't recall one.
I've read more commentary on Q*, and the current consensus seems to be that it's not that big of a deal. I would have to look up the specific arguments, but they came from reputable sources.