This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
You are probably one of the few people who decreased their p(doom) from 2021 to after the ChatGPT revolution in 2022. I updated the probability upwards for the following reasons:
The increase in capability from simply adding compute and scaling the old 2017 transformer architecture surprised many. If a moderate breakthrough in hardware can move capabilities that much, we are closer to AGI than we thought. It no longer seems sensible to plan on timelines stretching to 2100; we may have decades, possibly only years, to do what needs to be done. That is definitely bad news for alignment, time-wise.
The nature of LLMs makes them a terrible candidate for AGI. The technology is inscrutable; the explainability of these models is terrible. Nobody knows why they do what they do, and nobody could predict how much compute was needed for qualitative jumps like the one between ChatGPT and GPT-4. This makes the models notoriously tough to align even on basic things, like hardening them against exfiltration of training data. If an AI can answer when the president of France was born, maybe it also knows what was in the email a company's CEO sent on January 1st, 2019 - if such data was used to train the model. The fact that the most likely candidate for AGI is, as Yudkowsky said, just some "giant matrices of trillions of inscrutable floating-point numbers" is terrifying - there may be a googolplex of viable matrices like that, and we do not know which subset of them could be considered aligned, nor how to get there. We are just adding compute and marveling that the thing growing in our petri dish keeps getting more capable; we do not manage that growth in any meaningful sense. In the past we had different machine-learning approaches specific to each domain, and people reasonably thought we might have to design specific algorithms to integrate those separate subdomains. But no: we found out that just throwing compute at the very simple game of "predict the next word in the text" is enough to gain multimodality and make the output more general, expanding into domains like computer-generated graphics and speech recognition that were previously separate fields. And to give a concrete example rather than just speaking broadly: we now know that an LLM can discern the self-reported race of people from images of their bones, beyond the current limited ability of medical doctors, who can do so only from a few features such as the skull. Nobody knows why or how the model does it; it just can, and we move on.
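To make the "predict the next word in the text" objective concrete, here is a minimal sketch in PyTorch; the `model` here is a hypothetical causal language model that returns logits, not any particular architecture:

```python
import torch.nn.functional as F

def next_token_loss(model, tokens):
    """Cross-entropy loss for predicting each token from its prefix.

    `tokens` is a [batch, seq_len] tensor of token ids; `model` is a
    placeholder for any causal LM mapping it to logits of shape
    [batch, seq_len, vocab_size].
    """
    logits = model(tokens[:, :-1])            # predictions from each prefix
    targets = tokens[:, 1:]                   # the actual next tokens
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten to [batch*(T-1), vocab]
        targets.reshape(-1),                  # flatten to [batch*(T-1)]
    )
```

That loss, scaled up over enough data and compute, is the whole training signal being discussed.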
One last point on the above: the old news of a top-notch Go-playing AI model being beaten by one simple trick. For years people thought the model "knew" how to play Go in the normal sense; the whole community thought it had been "aligned" with the task of at least beating humans at a game with a very simple ruleset. Then it was shown that the model achieved its results by learning different concepts entirely - it probably learned a completely different "game", and winning at Go for years was just a side effect. It never learned a primitive concept that even amateurs at the game can grasp. The "alignment" of the model with the basic rules of Go was a lie. I cannot imagine how we are supposed to align an LLM many orders of magnitude more complicated, one that has to grasp all the rules of reality, and be confident that we get exactly what we want, that there will be no security hole, and that some random teenager will not start the apocalypse with some primitive prompt or strategy even after the whole scientific community has celebrated the safety of this particular "giant matrix of trillions of inscrutable floating-point numbers".
We now have the new Q-Star breakthrough at OpenAI. At least according to some speculation, what it achieves is that compute can be used not only to train the model but to automate the evaluation of answers to questions. Think of it as on-the-fly training in which the most promising answers generated by a larger static model are selected in an LLM-powered chain-of-thought loop. This approach can reportedly boost the capabilities of the model by orders of magnitude, temporarily, at the expense of more compute focused on a specific question or prompt. If true, this means there is another lever: you can literally throw money at potentially productive questions like "how do we make LLMs more effective" and let the LLM provide answers. We may be closer to an intelligence explosion than we thought last year.
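To pin down the speculated mechanism, here is a minimal sketch of the general best-of-n pattern it describes; the `generate` and `score` functions are hypothetical stand-ins, and this is not a claim about OpenAI's actual method, only the shape of the "spend more compute per question" idea:

```python
def answer_with_extra_compute(generate, score, prompt, n_candidates=16):
    """Sample several candidate answers and keep the one a scorer rates best.

    `generate(prompt)` returns one sampled answer; `score(prompt, answer)`
    returns a numeric quality estimate. Raising `n_candidates` trades
    inference-time compute for (hopefully) better answers.
    """
    candidates = [generate(prompt) for _ in range(n_candidates)]
    return max(candidates, key=lambda answer: score(prompt, answer))
```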
All in all, I do not see p(doom) decreasing in any meaningful way - quite the contrary.
This just reveals the incredible badness of MIRI-inspired AI safety/risk theory, I think.
The "many" were just sheltered and ignorant, with their obsolete, childish dreams of GOFAI. Amodei and Sutskever saw this, argued this, and won on this. Legg predicted AGI possibly by 2028 decades ago, based purely on Moore's law.
You are talking as if there were any better conceivable architecture. LLMs are, in fact, unexpectedly transparent for anything humanlike in their performance – if only because they operate on tokens, we can inspect their attention maps and routinely invent easier ways to steer them (look at the number of «how is this different from x» questions). Their substrate-level «inscrutability» (overhyped too) is the same as with any DL artifact, and we know it couldn't have been any other way, because GOFAI was dead in the water. Your ivory-tower standard of mechanistic understanding is misguided – we know «why they do what they do» because they faithfully approximate the training data and are absolutely a product of their dataset, to the extent that all clever inductive biases and architectural innovations are as dust before a good data cleaning. The magic of GPT-4 is not due to summoning a bigger genie by piling more compute on the heap, but mostly due to pretraining on tons of proprietary data; and anyway, how could you have strong expectations for the ChatGPT-GPT-4 gap without having insight into the inputs for either?
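As a small, concrete example of that kind of inspectability, attention maps can be pulled straight out of a public model with the Hugging Face transformers library; GPT-2 is used here only because it is small and freely available:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped [batch, num_heads, seq_len, seq_len].
print(len(outputs.attentions), outputs.attentions[0].shape)
```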
Again overhyped.
What makes LLMs «tough to align» against adversarial attacks by humans is not their inscrutability but the fact that they are dumb text processors with no ego and no «intent» beyond approximating the most probable continuation of a text prompt. This is in fact the most parsimonious explanation of what they do. Stop anthropomorphising them even as you demonize them.
This is wordcelism or, more specifically, a counting argument, and it was used in the past by Chomsky to rule out the possibility of statistical machines learning natural language. You know how that went. (Relatedly, Yud, who has always been easy to drive to religious ecstasy or terror with Big Numbers, was surprised by GPT-4, which completely discredits him as an AI analyst in my book.) Pope puts it this way:
This is why deep learning works at all, for capabilities too! Specifying rules of language is not more intractable than specifying «alignment»!
I suppose you've been misinformed: LLMs do not learn multimodal capabilities from text alone. In general it's just more in-context learning developed over a bigger dataset of token sequences. That people believe the sequences for different modalities are essentially different, and are amazed, does not make them actually different.
This is quite a hilarious exemplar of motivated thinking. A model trained on self-play diverges from the ground truth - news at 11! Maybe, instead of fretting about the misalignment, you could see this as an issue of overrated «capabilities»? How can you even distinguish the two? How far do you think an intelligence explosion, recursive self-improvement, etc. will get if self-play stumbles into fragile local minima on a 19x19 grid with a clear reward signal? Back in AlphaZero's day, Yud was so terrified of this self-play superpower, which confirmed his worst fears of FOOM:
– and now we see that it develops massive blind spots which would lead to trivial failures in reality. But you want to be scared, so you construe this as a matter of «learning a different game». Tails you win, heads I lose.
I look forward to this intellectual tradition being relegated to the dustbin of history.
Hmm, I would say that most of those specific concerns were already priced in for me by 2021, which is why I already had such a high p(doom) at the time.
What specific means of exfiltration are you talking about? If you mean the recent claims that getting it to endlessly repeat a string will make it "leak" training data or recent interactions with other users, then in the case of ChatGPT:
A) It's cost- and time-prohibitive to do so.
B) It's possible that the bug is in a JSON parser, or that the plausible-seeming outputs are just hallucinations.
If there's another way of getting it to leak training data, I can't recall one.
I've read more commentary on Q*, and the current consensus seems to be that it's not that big of a deal. I would have to look up the specific arguments, but they came from reputable sources.