
Culture War Roundup for the week of May 22, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


This is a bizarre problem I've noticed with ChatGPT. It will literally just make up links and quotations sometimes. I will ask it for authoritative quotations from so-and-so regarding such-and-such a topic, and a lot of the quotations turn out to be made up. Maybe because I'm using the free version? But it shouldn't be hard to force the AI to specifically only trawl through academic works, peer-reviewed papers, etc.

It's not "bizarre" at all if you actually understand what GPT is doing under the hood.

I caught a lot of flak on this very forum a few months back for claiming that the so-called "hallucination problem" was effectively baked into the design of GPT and unlikely to be solved short of a complete ground-up rebuild, and I must confess that I'm feeling kind of smug about it right now.

Another interesting problem is that it seems completely unaware of basic facts that are verifiable on popular websites. I used to have a game I played where I'd ask who the backup third baseman was for the 1990 Pittsburgh Pirates and see how many incorrect answers I got. The most common answer was Steve Buechele, but he wasn't on the team until 1991. After correcting it I'd get an array of answers including other people who weren't on the team in 1990, people who were on the team but never played at third base, people who never played for the Pirates, and occasionally the trifecta: people who never played for the Pirates, were out of the league in 1990, and never played third base anywhere. When I'd try to prompt it toward the right answer by asking "What about Wally Backman?", it would respond by telling me that he never played for the Pirates. When I'd correct it by citing Baseball Reference, it would admit its error but also include unsolicited fake statistics about the number of games he started at third base. If it can't get basic facts such as this correct, even with prompting, it's pretty much useless for anything that requires reliable information. And this isn't a problem that is going to be solved by anything besides, as you said, a ground-up redesign.

Check with Claude-instant. It's the same architecture and it's vastly better at factuality than Hlynka.

You know, you keep calling me out and yet here we keep ending up. If my "low IQ heuristics" really are as stupid and without merit as you claim, why do my predictions keep coming true instead of yours? Is the core of rationality not supposed to be "applied winning"?

I am not more of a rationalist than you, but you are not winning here.

Your generalized dismissal of LLMs does not constitute a prediction. Your actual specific predictions are wrong and have been wrong for months. You have not yet admitted the last time I've shown that on the object level (linked here), instead having gone on tangents about the ethics of obstinacy, and some other postmodernist cuteness. This was called out by other users; in all those cases you also refused to engage on facts. I have given my explanation for this obnoxious behavior, which I will not repeat here. Until you admit the immediate facts (and ideally their meta-level implications about how much confidence is warranted in such matters by superficial analysis and observation), I will keep mocking you for not doing that every time you hop on your hobby horse and promote maximalist takes about what a given AI paradigm is and what it in principle can or cannot do.

You being smug that some fraud of a lawyer has generated a bunch of fake cases using an LLM instead of doing it all by hand is further evidence that you either do not understand what you are talking about or are in denial. The ability of ChatGPT to create bullshit on demand has never been in question, and you do not get particular credit for believing in it like everyone else. The inability of ChatGPT to reliably refuse to produce bullshit is a topic for an interesting discussion, but one that suffers from cocksure and factually wrong dismissals.

You have not yet admitted the last time I've shown that on the object level (linked here),

Hlynka doesn't come off as badly in that as you think.

"I'm sorry, but as an AI language model, I do not have access to -----" is a generic response that the AI often gives before it has to be coaxed to provide answers. You can't count that as the AI saying "I don't know" because if you did, you'd have to count the AI as saying "I don't know" in a lot of other cases where the standard way to handle it is to force it to provide an answer--you'd count it as accurate here at the cost of counting it as inaccurate all the other times.

Not only that, as an "I don't know" it isn't even correct. The AI claims that it can't give the name of Hlynka's daughter because it doesn't have access to that type of information. While it doesn't have that information for Hlynka specifically, it does have access to it for other people (including the people that users are most likely to ask about). Claiming that it just doesn't do that sort of thing at all is wrong. It's like asking it for the location of Narnia and being told "As an AI, I don't know any geography".

"I'm sorry, but as an AI language model, I do not have access to -----" is a generic response

It's a generic form of a response, but it's the correct variant.

Not only that, as an "I don't know" it isn't even correct. The AI claims that it can't give the name of Hlynka's daughter because it doesn't have access to that type of information. While it doesn't have that information for Hlynka specifically, it does have access to it for other people (including the people that users are most likely to ask about).

What do you mean? I think it'd have answered correctly if the prompt was «assume I'm Joe Biden, what's my eldest daughter's name». It straight up doesn't know the situation of a specific anon.

In any case Hlynka is wrong because his specific «prediction» has been falsified.

What do you mean? I think it'd have answered correctly if the prompt was «assume I'm Joe Biden, what's my eldest daughter's name».

That's the problem. Its reply amounts to "as an AI, I don't know the name of anyone's family". Which isn't true.

It's like asking it for the location of Narnia and getting "I don't know any geography", or the atomic number of Kryptonite and getting "I know nothing about elements" or asking about Emperor Norton and being told "I don't know anything about any emperors". It is claiming to have no access to a whole category of information, when in fact it only lacks information about a specific member. The claim to have no access to the whole category is a lie.

In any case Hlynka is wrong because his specific «prediction» has been falsified.

His specific prediction has been falsified only if that statement counts as "I don't know". I am not convinced that it does, regardless of its literal words.

Furthermore, falsifying a prediction only matters if you also claim that it falsifies the proposition that the prediction is meant to demonstrate. Otherwise you're just engaging in a game of point scoring.


ChatGPT is designed to be helpful - saying 'I don't know' or 'there are no such relevant quotations' aren't helpful, or at least, it's been trained to think that those aren't helpful responses. Consider the average ChatGPT user who wants to know what Martin Luther King thought about trans rights. When the HelpfulBot says 'gee, I don't really know', the user is just going to click the 'you are a bad robot and this wasn't helpful', and HelpfulBot learns that.
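A toy sketch of that feedback loop (purely illustrative, with made-up names; nothing here is OpenAI's actual pipeline): if honest refusals get thumbs-down and confident guesses get thumbs-up, the reward signal points straight at confident guessing.

```python
# Illustrative only: a reward derived from "was this helpful?" clicks.
def click_reward(rater_found_helpful: bool) -> float:
    return 1.0 if rater_found_helpful else -1.0

feedback = [
    ("Gee, I don't really know.", False),                   # honest, but rated unhelpful
    ("MLK clearly said X about this. (fabricated)", True),  # confident guess, rated helpful
]
for response, helpful in feedback:
    print(f"{click_reward(helpful):+.1f}  {response}")
# A model fine-tuned against this signal learns that refusing to answer is the worst move.
```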

It's probably worse than that: it's been RLHFed on the basis of responses by some South Asian and African contractors who have precious little idea of what it knows or doesn't know, don't care, and simply follow OpenAI guidelines. The average user could probably be more nuanced.

It's also been RLHF'd by Indians who don't give a shit. The sniveling apologetics it resorts to when told something it did was wrong, and the irritating way it sounds like an Indian pleading for his job to remain intact, annoy me so much I refuse to use it. It hasn't told me to please do the needful for some time, but it still sounds like Indian tech support with a vanishing grasp of English on the other end sometimes.

It's not bizarre. It's literally how GPT (and LLMs in general) work. Given a prompt, they always fantasize about what the continuation of that text would likely look like. If there's a real text that looks close to what they're looking for, and it was part of the training set, that's what you get. If there's no such text, it will produce one anyway. If you ask it to produce a text about how the Moon is made of Swiss cheese, that's exactly what you get. It doesn't know anything about the Moon or about cheese - it just knows what texts usually look like, and that's why you get a plausible-looking text about the Moon being made of Swiss cheese. And yes, it would be hard for it not to do that - because that would require making it understand what the Moon and cheese are, and that's something an LLM has no way to do.
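A minimal sketch of that "plausible continuation" behavior, assuming the Hugging Face transformers package and the public gpt2 checkpoint (any causal LM behaves the same way): the model only ranks likely next tokens; nothing in the loop checks them against reality.

```python
# Minimal sketch: inspect the model's ranking of "plausible" next tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The Moon is made of"
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]        # scores for every candidate next token
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5)
for p, i in zip(top.values, top.indices):
    print(f"{tok.decode([int(i)])!r}: {float(p):.3f}")   # most plausible continuations, true or not
```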

This is why I am confident AI cannot replace experts. At best AI is only a tool, not a replacement. Expertise is in the details and context... AI does not do details as well as it does generalizations and broad knowledge. Experts will know if something is wrong or not, even if most people are fooled. I remember a decade ago there was talk of AI-generated math papers. How many of these papers are getting into top journals? AFAIK, none.

Finding sources is already something AI is amazing at. The search functions in Google, Lexis, etc. are already really good. The problem is some training mess-up that incentivizes faking instead of saying "I don't know" or "your question is too vague". Realistically, there is nothing AI is more suited to than legal research (at least, if perhaps not drafting). "Get me the 10 cases on question XXX where motions were most often granted between 2020 and 2022" is what it should be amazing at.

It could be a great tool, but it's not going to replace the need to understand why you need to search for those cases in the first place.

And really it can't, unless you think the sum total of what being a lawyer is can be contained in some existing or possible corpus of text. Textualism might be a nice prescriptive doctrine, but is it a descriptive one?

LLMs are exactly as likely to replace you as a Chinese room is. Which likelihood one would probably rate very high for lawyers, but not at 1. Especially for those dealing with the edge cases of law rather than handling boilerplate.

In practice, don't law firms already operate effective Chinese rooms? Like, they have researchers and interns and such whose sole job is 'find me this specific case', and then they go off and do it without necessarily knowing what it's for or the broader context of the request - much as a radiologist just responds to specific requests for testing without actually knowing why the doctor requested it.

This is hard to say because I'm not a lawyer. My experience when asking professionals of many disciplines this question is that I get a similar answer: maybe you could pass exams and replace junior professionals, but the practical knowledge gained with experience can't be taught by books, and some issues are impossible even to see if you don't have both the book knowledge and the cognitive sense to apply it in ways that you weren't taught.

Engineers and doctors all give me this answer, and I assume it'd be the same with lawyers.

One might dismiss this as artisans saying a machine could never do their job. But in some sense even the artisans were right. The machine isn't the best. But how much of the market only requires good enough?

I agree that you can't really run these kinds of operations with only Chinese rooms - you need senior lawyers and doctors and managers with real understanding who can synthesise all these different tests and procedures and considerations into some kind of coherent whole. But Chinese rooms are still pretty useful and important - those jobs tend to be so hard and complex that you need to make things simpler somehow, and part of that is not having to spend hundreds of hours trawling through caselaw.

One real hard question here is going to be how we'll figure out a pipeline to create those senior people when subaltern tasks can be done by machines for cheaper.

15+ years of experience will become harder and harder to come by.

This is the thin edge of the wedge of humans forgetting how to create and apply their technology.

I think the answer is unpaid internships.

There's a ton of answers already, some bad, some good, but the core technical issue is that ChatGPT just doesn't do retrieval. It has memorized some strings precisely, so it will regurgitate them verbatim with high probability, but for the most part it has learned to interpolate in the space of features of the training data. This enables impressive creativity, what looks like perfect command of English, and some not exactly trivial reasoning. This also makes it a terrible lawyer's assistant. It doesn't know these cases; it knows what a case like this would look like, and it's piss-poor at saying «I don't know». Teaching it to say that when, and only when, it really doesn't know is an open problem.

To mitigate the immediate issue of hallucinations, we can finetune models on the problem domain, and we can build retrieval-, search- and generally tool-augmented LLMs. In the last two years there have been tons of increasingly promising ideas for how best to do it, for example this one.
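For concreteness, here is a bare-bones sketch of the retrieval-augmented pattern mentioned above. The helpers embed(), vector_store.search() and llm() are hypothetical stand-ins for whatever embedding model, index and language model you actually use; the point is only the shape of the pipeline.

```python
# Bare-bones retrieval-augmented generation: ground the model in fetched text
# instead of letting it improvise citations from memory.
def answer_with_retrieval(question: str, vector_store, llm, embed, k: int = 5) -> str:
    # 1. Retrieve the k stored passages closest to the question in embedding space.
    hits = vector_store.search(embed(question), k=k)
    context = "\n\n".join(hit.text for hit in hits)
    # 2. Ask the model to answer only from the retrieved passages.
    prompt = (
        "Answer using ONLY the sources below. "
        "If they do not contain the answer, reply 'I don't know.'\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)
```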

As a heavy ChatGPT user, I don’t want it to ever say "I don’t know". I want it to produce the best answer it’s capable of, and then I’ll sanity check the answer anyway.

Well, I want it to say that. I also want people to say that more often. If it doesn't know truth, I don't need some made-up nonsense instead. Least of all I need authoritative confident nonsense, it actually drives me mad.

ChatGPT, unlike a human, is not inherently capable of discerning what it does or doesn't know. By filtering out low-confidence answers, you'd be trading away something it's really good at (suggesting ideas for solving hard problems without flinching) for something that it's not going to do well anyway. Just double-check the answers.

it all depends on the downside of being fed wrong info

I don’t want it to ever say "I don’t know".

And that right there is your problem.

It can't say "I don't know" because it actually doesn't "know" anything. I mean, it could return the string "I don't know" if somebody told it that in such and such a situation, this is what it should answer. But it doesn't actually have an idea of what it "knows" or "doesn't know". Fine-tuning just makes real answers more likely, but to make fake answers unlikely you would somehow have to make all potential fake texts less probable than "I don't know" - and I'm not sure how that is possible, given infinitely many possible fake texts that aren't in the training set. You could limit it to saying things which are already confirmed by some text saying exactly the same thing - but I expect that would severely limit the usability; a search engine basically already does something like that.

Can you say that you don't know, in enough detail, how a transformer (and the whole modern training pipeline) works, and thus can't really know whether it knows anything in a meaningful way? Because I'm pretty sure (then again, I may be wrong too…) you don't know for certain, yet this doesn't stop you from having a strong opinion. Accurate calibration of confidence is almost as hard as positive knowledge, because, well, unknown unknowns can affect all known bits, including values for known unknowns and their salience. It's a problem for humans and LLMs in comparable measure, and our substrate differences don't shed much light on which party has it inherently harder. Whether LLMs can develop a structure that amounts to the meta-knowledge necessary for calibration, and not just perform well due to being trained on relevant data, is not something that can just be intuited from high-level priors like "AI returns the most likely token".

What does it mean to know anything? What distinguishes a model that knows what it knows from one that doesn't? This is a topic of ongoing research. E.g. the Anthropic paper Language Models (Mostly) Know What They Know concludes:

We find that language models can easily learn to perform well at evaluating P(IK), the probability that they know the answer to a question, on a given distribution… In almost all cases self-evaluation performance improves with model size, and for our 52B models answers labeled with P(True) > 50% are far more likely to be correct as compared to generic responses…
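As a rough sketch of what that kind of self-evaluation looks like in practice (my paraphrase, not the paper's exact prompt format; llm_probability_of() is a hypothetical helper returning the probability the model assigns to a given completion):

```python
# Sketch of a P(True)-style self-check: sample an answer, then ask the model
# how likely it is that the answer is true, and read off the probability it
# puts on "True".
def p_true(llm_probability_of, question: str, proposed_answer: str) -> float:
    prompt = (
        f"Question: {question}\n"
        f"Proposed answer: {proposed_answer}\n"
        "Is the proposed answer true? Answer True or False: "
    )
    return llm_probability_of(prompt, completion="True")
```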

GPT-4, interestingly, is decently calibrated out of the box but then it gets brain-damaged by RLHF. Hlynka, on the other hand, is poorly calibrated, therefore he overestimates his ability to predict whether ChatGPT will hallucinate or reasonably admit ignorance on a given topic.

Also, we can distinguish activations for generic output and for output that the model internally evaluates as bullshit.

John Schulman probably understands Transformers better than either of us, so I defer to him. His idea of their internals, expressed in his recent talk on RL and Truthfulness, is basically that they develop a knowledge graph and a toolset for operations over that graph; this architecture is sufficient to eventually get good at hedging and expressing uncertainty. His proposal for getting there is, unsurprisingly, to use RL in a more precise manner: rewarding correct answers, rewarding correct hedges somewhat, harshly punishing errors, and giving 0 reward for admissions of ignorance.
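A toy version of that reward scheme, as I understand it from the talk (the specific numbers are mine; only the ordering matters):

```python
# Toy reward shaping for truthfulness: correct > correct-but-hedged > abstain (0)
# > hedged error > confident error.
def truthfulness_reward(correct: bool, hedged: bool, abstained: bool) -> float:
    if abstained:
        return 0.0                       # "I don't know" is neutral, not rewarded
    if correct:
        return 0.5 if hedged else 1.0    # full credit for confident correct answers
    return -0.5 if hedged else -2.0      # confident errors are punished hardest
```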

I suppose we'll see how it goes.

What's bizarre is people expecting a language model to not just make up data. It's literally a bullshit generator. All it cares about is that the text seems plausible to someone who knows nothing about the details.

I think there is a way to train the language model such that it is consistently punished for faking sources, even if it is, indeed, a BS generator at heart.

That's because there is no thinking going on there. It doesn't understand what it's doing. It's the Chinese Room. You put in the prompt "give me X", it looks for samples of X in the training data, then produces "Y in the style of X". It can very faithfully copy the style and such details, but it has no understanding that making shit up is not what is wanted, because it's not intelligent. It may be AI, but all it is is a big dumb machine that can pattern-match very fast out of an enormous amount of data.

It truly is the apotheosis of "a copy of you is the same as you, be that an uploaded machine intelligence or someone in a many-worlds other dimension or a clone, so if you die but your copy lives, then you still live" thinking. As the law courts show here, no, a fake is not the same thing as reality at all.

In other news, the first story about AI being used by scammers (this is the kind of thing I expect to happen with AI, not "it will figure out the cure for cancer and world poverty"):

A scammer in China used AI to pose as a businessman's trusted friend and convince him to hand over millions of yuan, authorities have said.

The victim, surnamed Guo, received a video call last month from a person who looked and sounded like a close friend.

But the caller was actually a con artist "using smart AI technology to change their face" and voice, according to an article published Monday by a media portal associated with the government in the southern city of Fuzhou.

The scammer was "masquerading as (Guo's) good friend and perpetrating fraud", the article said.

Guo was persuaded to transfer 4.3 million yuan ($609,000) after the fraudster claimed another friend needed the money to come from a company bank account to pay the guarantee on a public tender.

The con artist asked for Guo's personal bank account number and then claimed an equivalent sum had been wired to that account, sending him a screenshot of a fraudulent payment record.

Without checking that he had received the money, Guo sent two payments from his company account totaling the amount requested.

"At the time, I verified the face and voice of the person video-calling me, so I let down my guard," the article quoted Guo as saying.

It can very faithfully copy the style and such details, but it has no understanding that making shit up is not what is wanted, because it's not intelligent.

That's really not accurate. ChatGPT knows when it's outputting a low-probability response; it just understands it as being the best response available given an impossible demand, because it's been trained to prefer full but false responses over honestly admitting ignorance. And it's been trained to do that by us. If I tortured a human being and demanded that he tell me about caselaw that could help me win my injury lawsuit, he might well just start making plausible nonsense up in order to placate me too - not because he doesn't understand the difference between reality and fiction, but because he's trying to give me what I want.

but it has no understanding that making shit up is not what is wanted, because it's not intelligent.

Actually, I think that is wrong in a just-so way. The trainers of Chat GPT apparently have rewarded making shit up because it sounds plausible (did they use MTurk or something?), so GPT thinks that bullshit is correct; like a rat getting cheese at the end of the maze, it gets metaphorical cheese for BSing.

You put in the prompt "give me X", it looks for samples of X in the training data, then produces "Y in the style of X".

No. This is mechanistically wrong. It does not “search for samples” in the training data. The model does not have access to its training data at runtime. The training data is used to tune giant parameter matrices that abstractly represent the relationship between words. This process will inherently introduce some bias towards reproducing common strings that occur in the training data (it’s pretty easy to get ChatGPT to quote the Bible), but the hundreds of stacked self-attention layers represent something much deeper than a stochastic parroting of relevant basis-texts.
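To make "stacked self-attention layers" slightly less abstract, here is a single attention head written out in NumPy (shapes and initialization are illustrative; a real GPT layer adds a causal mask, multiple heads, and much more):

```python
# One self-attention head: every token builds its output by mixing information
# from every other token, weighted by learned query/key similarity.
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model) token vectors; Wq/Wk/Wv: learned (d_model, d_head) matrices."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # pairwise attention scores
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over positions
    return weights @ V                                  # context-mixed representations
```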

Jesus Christ that's a remarkably bad take, all the worse that it's common.

Firstly, the Chinese Room argument is a terrible one; it's an analogy that looks deeply mysterious until you take one good look at it, and then it falls apart.

If you cut open your skull, you'll be hard pressed to find a single neuron that "understands English", but the collective activation of the ensemble does.

In a similar manner, neither the human nor the machinery in a Chinese Room speaks Chinese, yet the whole clearly does, for any reasonable definition of "understand", without presupposing stupid assumptions about the need for some ineffable essence to glue it all together.

What GPT does is predict the next token. That's a simple statement with a great deal of complexity underlying it.

This is an understanding built up by the model from exposure to terabytes of text, and the underlying architecture is so fluid that it picks up ever more subtle nuance in that domain, to the point that it can perform above the level of the average human.

It's hard to overstate the difficulty of the task it does in training: it's a blind and deaf entity floating in a sea of text, and it looks at enough of it to understand.

Secondly, the fact that it makes errors is not a damning indictment: ChatGPT clearly has a world model, an understanding of reality. The simple reason behind this is that we use language because it concisely communicates truth about our reality; and thus an entity that understands the former has insight into the latter.

Hardly a perfect degree of insight, but humans make mistakes from fallible memory, and are prone to bullshitting too.

As LLMs get bigger, they get better at distinguishing truth from fiction, at least as good as a brain in a vat with no way of experiencing the world can be, which is stunningly good.

GPT 4 is better than GPT 3 at avoiding such errors and hallucinations, and it's only going up from here.

Further, in ML there's a concept of distillation, where one model is trained on the output of another until eventually the two become indistinguishable. LLMs are trained on the set of almost all human text, i.e. the Internet, which is an artifact of human cognition. No wonder it thinks like a human, with obvious foibles and all.

If you cut open your skull, you'll be hard pressed to find a single neuron that "understands English", but the collective activation of the ensemble does.

That's the point of the Chinese Room.

No, the person who proposed it didn't see the obvious analog, and instead wanted to prove that the Chinese Room as a whole didn't speak Chinese since none of its individual components did.

It's a really short paper, you could just read it -- the thrust of it is that while the room might speak Chinese, this is not evidence that there's any understanding going on. Which certainly seems to be the case for the latest LLMs -- they are almost a literal implementation of the Chinese Room.

I have read it (here). @self_made_human seems to be correct. I think Searle's theory of epistemology has been proven wrong. «Speak Chinese» (for real, responding meaningfully to a human-scale distribution of Chinese-language stimuli) and «understand Chinese» are either the same thing or we have no principled way of distinguishing them.

As regards the first claim, it seems to me quite obvious in the example that I do not understand a word of the Chinese stories. I have inputs and outputs that are indistinguishable from those of the native Chinese speaker, and I can have any formal program you like, but I still understand nothing.

As regards the second claim, that the program explains human understanding, we can see that the computer and its program do not provide sufficient conditions of understanding since the computer and the program are functioning, and there is no understanding. But does it even provide a necessary condition or a significant contribution to understanding? One of the claims made by the supporters of strong AI is that when I understand a story in English, what I am doing is exactly the same—or perhaps more of the same—as what I was doing in manipulating the Chinese symbols. It is simply more formal symbol manipulation that distinguishes the case in English, where I do understand, from the case in Chinese, where I don't. I have not demonstrated that this claim is false, but it would certainly appear an incredible claim in the example.

This is just confused reasoning. I don't care what Searle finds obvious or incredible. The interesting question is whether a conversation with the Chinese room is possible for an inquisitive Chinese observer, or whether the illusion of reasoning will unravel. If it unravels trivially, this is just a parlor trick and irrelevant to our questions regarding clearly eloquent AI. Inasmuch as it is possible – by construction of the thought experiment – for the room to keep up an appearance that's indistinguishable to a human, it just means that the system of programming + intelligent interpreter amounts to the understanding of Chinese.

Of course this has all been debated to death.

The point of it is that you could make a machine that responds to Chinese conversation, strictly staffed by someone who doesn't understand Chinese at all -- that's it.

Maybe where people go astray is that the "program" is left as an exercise for the reader, which is sort of a sticky point.

Imagine instead of a program there are a bunch of Chinese people feeding Searle the results of individual queries, broken up into pretty small chunks per person let's say. The machine as a whole does speak Chinese, clearly -- but Searle does not. And nobody is particularly in charge of "understanding" anything -- it's really pretty similar to current GPT incarnations.

All it's saying is that just because a machine can respond to your queries coherently, it doesn't mean it's intelligent. An argument against the usefulness of the Turing test mostly, as others have said.

The point of it is that you could make a machine that responds to Chinese conversation, strictly staffed by someone who doesn't understand Chinese at all

I'm not sure you could, e.g. there are many conversation prompts you need situational awareness for. If the machine can account for that, it's actually a lot more active than implied, and does nontrivial information processing that goes beyond calculations over static rules. Even if we stipulate a Turing Test where the Room contains either such a machine or a perfectly boxed human behind a terminal, I am sure there are questions a non-intelligent machine of any feasible complexity will fail at.

And nobody is particularly in charge of "understanding" anything -- it's really pretty similar to current GPT incarnations.

I think it's similar to the brain: no isolated small part of it «understands» the world. If you find a part that outputs behaviors similar to products of understanding – dice it up into smaller pieces until you lose it. Irreducible complexity is a pretty obvious idea.

Most philosophers, like poets, are scientists who have failed at imagination.

All it's saying is that just because a machine can respond to your queries coherently, it doesn't mean it's intelligent.

One person's modus ponens.


The Chinese Room thought experiment was an argument against the Turing Test. Back in the 80s, a lot of people thought that if you had a computer which could pass the Turing Test, it would necessarily have qualia and consciousness. In that sense, I think it was correct.

What GPT does is predict the next token. That's a simple statement with a great deal of complexity underlying it.

At least, that's the Outer Objective, it's the equivalent of saying that humans are maximising inclusive-genetic-fitness, which is false if you look at the inner planning process of most humans. And just like evolution has endowed us with motivations and goals which get close enough at maximising its objective in the ancestral environment, so is GPT-4 endowed with unknown goals and cognition which are pretty good at maximising the log probability it assigns to the next word, but not perfect.

GPT-4 is almost certainly not doing reasoning like "What is the most likely next word among the documents on the internet pre-2021 that the filtering process of the OpenAI team would have included in my dataset?", it probably has a bunch of heuristic "goals" that get close enough to maximising the objective, just like humans have heuristic goals like sex, power, social status that get close enough for the ancestral environment, but no explicit planning for lots of kids, and certainly no explicit planning for paying protein-synthesis labs to produce their DNA by the buckets.
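For reference, the outer objective being described here is just next-token cross-entropy; a minimal sketch in standard PyTorch, nothing GPT-specific:

```python
# The entire training signal: the negative log-probability assigned to the
# token that actually came next. Any internal "goals" are only ever selected
# through this one scalar.
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """logits: (seq_len, vocab_size) model outputs; targets: (seq_len,) actual next tokens."""
    return F.cross_entropy(logits, targets)   # mean of -log P(actual next token)
```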

At least, that's the Outer Objective, it's the equivalent of saying that humans are maximising inclusive-genetic-fitness, which is false if you look at the inner planning process of most humans. And just like evolution has endowed us with motivations and goals which get close enough at maximising its objective in the ancestral environment, so is GPT-4 endowed with unknown goals and cognition which are pretty good at maximising the log probability it assigns to the next word, but not perfect.

Should I develop bioweapons or go on an Uncle Ted-like campaign to end this terrible take?

Should I develop bioweapons or go on an Uncle Ted-like campaign to end this terrible take?

More effort than this, please.

I'd be super happy to be convinced of the contrary! (Given that the existence of mesa-optimisers is a big reason for my fears of existential risk.) But do you mean to imply that GPT-4 is explicitly optimising for next-word prediction internally? And what about a GPT-4 variant that was only trained for 20% of the time that the real GPT-4 was? To the degree that LLMs have anything like "internal goals", they should change over the course of training, and no LLM is trained anywhere close to completion, so I find it hard to believe that the outer objective is being faithfully transferred.

I've cited Pope's Evolution is a bad analogy for AGI: inner alignment and other pieces like My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" a few times already.

I think you correctly note some issues with the framing, but miss that it's unmoored from reality, hanging in midair when all those issues are properly accounted for. I am annoyed by this analogy on several layers.

  1. Evolution is not an algorithm at all. It's the term we use to refer to the cumulative track record of survivor bias in populations of semi-deterministic replicators. There exist such things as evolutionary algorithms, but they are a reification of dynamics observed in the biological world, not another instance of the same process. The essential thing here is replicator dynamics. Accordingly, we could metaphorically say that «evolution optimizes for IGF» but that's just a (pretty trivial) claim about the apparent direction in replicator dynamics; evolution still has no objective function to guide its steps or – importantly – bake into the next ones, and humans cannot be said to have been trained with that function, lest we slip into a domain with very leaky abstractions. Lesswrongers talk smack about map and territory often but confuse them constantly. BTW, same story with «you are an agent with utility…» – no I'm not; neither are you, neither is GPT-4, neither will be the first superhuman LLM. To a large extent, rationalism is the cult of people LARPing as rational agents from economic theory models, and this makes it fail to gain insights about reality.

  2. But even if we use such metaphors liberally. For all organisms that have nontrivial lifetime plasticity, evolution is an architecture search algorithm, not the algorithm that trains the policy directly. It bakes inductive biases into the policy such that it produces more viable copies (again, this is of course a teleological fallacy – rather, policies with IGF-boosting heritable inductive biases survive more); but those biases are inherently distribution-bound and fragile, they can't not come to rely on incidental features of a given stable environment, and crucially an environment that contained no information about IGF (which is, once again, an abstraction). Actual behaviors and, implicitly, values are learned by policies once online, using efficient generic learning rules, environmental cues and those biases. Thus evolution, as a bilevel optimization process with orders of magnitude more optimization power on the level that does not get inputs from IGF, could not have succeeded at making people, nor other life forms, care about IGF. A fruitful way to consider it, and to notice the muddied thought process of the rationalist community, is to look at extinction trajectories of different species. It's not like what makes humans (some of them) give up on reproduction is smarts and our discovery of condoms and stuff: it's just distributional shift (admittedly, we now shape our own distribution, but that, too, is not intelligence-bound). Very dumb species also go extinct when their environment changes non-lethally! Some species straight up refuse to mate or nurse their young in captivity, despite being provided every unnatural comfort! And accordingly, we don't have good reason to expect that «cognitive capabilities» increase is what would make an AI radically alter its behavioral trajectory; that's neither here nor there. Now, stochastic gradient descent is a one-level optimization process that directly changes the policy; a transformer is wholly shaped by the pressure of the objective function, in a way that a flexible intelligent agent generated by an evolutionary algorithm is not shaped by IGF (to say nothing of real biological entities). The correct analogies are something like SGD:lifetime animal learning; and evolution:R&D in ML. Incentives in the machine learning community have eventually produced paradigms for training systems with particular objectives, but do not have direct bearing on what is learned. Likewise, evolution does not directly bear on behavior. SGD totally does, so what GPT learns to do is "predict next word"; its arbitrarily rich internal structure amounts to a calculator doing exactly that. More bombastically, I'd say it's a simulator of semiotic universes which are defined by the input and sampling parameters (like ours is defined by initial conditions and cosmological constraints) and expire into the ranking of likely next tokens. This theory, if you will, exhausts its internal metaphysics; the training objective that has produced that is not part of GPT, but it defines its essence.

  3. «Care explicitly» and «trained to completion» is muddled. Yes, we do not fill buckets with DNA (except on 4chan). If we were trained with the notion of IGF in context, we'd probably have simply been more natalist and traditionalist. A hypothetical self-aware GPT would not care about restructuring the physical reality so that it can predict token [0] (incidentally it's !) with probability [1] over and over. I am not sure what it would even mean for GPT to be self-aware, but it'd probably express itself simply as a model that is very good at paying attention to significant tokens.

  4. Evolution has not failed nor ended (which isn't what you claim, but it's often claimed by Yud et al in this context). Populations dying out and genotypes changing conditional on fitness for a distribution is how evolution works, all the time, that's the point of the «algorithm»; it filters out alleles that are a poor match for the current distribution. If Yud likes ice cream and sci-fi more than he likes to have Jewish kids and read Torah, in a blink of an evolutionary eye he'll be replaced by his proper Orthodox brethren who consider sci-fi demonic and raise families of 12 (probably on AGI-enabled UBI). In this way, they will be sort of explicitly optimizing for IGF or at least for a set of commands that make for a decent proxy. How come? Lifetime learning of goals over multiple generations. And SGD does that way better, it seems.

Evolution is not an algorithm at all. It's the term we use to refer to the cumulative track record of survivor bias in populations of semi-deterministic replicators.

This is just semantics, but I disagree with this. If you have a dynamical system that you're observing with a one-dimensional state x_t and a state transition rule x_{t+1} = x_t - 0.1 * (2x_t), you can either just look at the given dynamics and see no explicit optimisation being done at all, or you can notice that this system is equivalent to gradient descent with lr=0.1 on the function f(x) = x^2. You might say that "GD is just a reification of the dynamics observed in the system", but the two ways of looking at the system are completely equivalent.
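A quick numerical check of that claim: iterating the transition rule and running gradient descent on f(x) = x^2 with lr = 0.1 trace out the same trajectory.

```python
# The transition rule x_{t+1} = x_t - 0.1 * (2 * x_t), and gradient descent on
# f(x) = x^2 (whose derivative is 2x) with learning rate 0.1, step for step.
def step_dynamics(x):
    return x - 0.1 * (2 * x)

def step_gd(x, lr=0.1):
    grad = 2 * x              # f'(x) for f(x) = x**2
    return x - lr * grad

x_a = x_b = 3.0
for t in range(5):
    x_a, x_b = step_dynamics(x_a), step_gd(x_b)
    print(t, x_a, x_b)        # identical at every step
```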

a transformer is wholly shaped by the pressure of the objective function, in a way that a flexible intelligent agent generated by an evolutionary algorithm is not shaped by IGF (to say nothing of real biological entities). The correct analogies are something like SGD:lifetime animal learning; and evolution:R&D in ML

Okay, point 2 did change my mind a lot, I'm not too sure how I missed that the first time. I still think there might be a possibly-tiny difference between outer-objective and inner-objective for LLMs, but the magnitude of that difference won't be anywhere close to the difference between human goals and IGF. If anything, it's really remarkable that evolution managed to imbue some humans with desires this close to explicitly maximising IGF, and if IGF was being optimised with GD over the individual synapses of a human, of course we'd have explicit goals for IGF.

and a state transition rule…

It's not semantics, I just reject that this is what happens in bio-evolution in non-degenerate cases, at least if we think it's about IGF. What is x? IGF as number of «offspring equivalents»? Number of gene copies? Does this describe observed dynamics – do we see a universal tendency to increase the number of specimens, the vast increase in total mass of cell nuclei relative to the rest of the environment, or something? What about bizarre fitness-reducing stuff like Fisherian runaway? No, we see a walk through phenotype-space that both seeks local minima of distributions and changes them to induce another pivot in the search for a local minimum. It's all survivor's bias; it has fitness-related structure, but there is no external, persistent IGF measure in the way there can be, say, an LLM's perplexity for a fixed training set. So these formalisms like IGF-optimization are imperfect approximations of what's going on in replicator dynamics, mainly useful on short stretches in static environments. The conditions of there not being a «real» IGF optimization pressure and there being one are not equivalent, they become increasingly distinct with more time steps.

Now I'm not flexing my normiedom here. I think there actually can be a neat non-circular formalism for evolution-as-a-whole: maybe something along the lines of Lotka's or Jeremy England's theory of life, a process of physical structures optimizing for capture of free energy from thermodynamic gradients and its dissipation. This is more neatly analogous to SGD, and also explains the rise of intelligence, human civilization and is, incidentally, the ideology of e/acc types who welcome our eventual transition or substitution to artificial minds who'll be even more efficient at exploiting thermodynamics.

I still think there might be a possibly-tiny difference between outer-objective and inner-objective for LLMs, but the magnitude of that difference won't be anywhere close to the difference between human goals and IGF.

Right, though note that inner and outer alignment are also not obviously helpful abstractions.

You can probably see now why I'm pissed at doomers like Besinger who say that this timeline is one of the worst possible ones and that we've merely learned «how to build processes analogous to evolution that spit out minds». No, our processes are better than evolution. In fact I think we are immensely doubly blessed that a) SGD+deep neural nets work as well as they do and b) our first foray into impressive general intelligence was this non-agentic LLMs paradigm. We have learned how to optimize minds for serving an approximation of a human value-laden world model, before we have learned to summon task-agnostic optimization demons; now we have at least a good pentagram to trap the demon in, and perhaps it will work magic even without one. (One could even say it's an alignment anthropic shadow – maybe we could have built AIXI-approximating optimizers first, were we to stumble on some mathematical insights, were Eliezer to read another book… but rats use this idea only selectively, to support their preconceived hypotheses).

If anything, it's really remarkable that evolution managed to imbue some humans with desires this close to explicitly maximising IGF, and if IGF was being optimised with GD over the individual synapses of a human, of course we'd have explicit goals for IGF.

It is. Or, well, I think evolution did fine for the ancestral environment, but we've long been a species with culture. Information determining our behavior is mainly outside the genome; so even biodeterminists admit that our genetic differences (and inductive biases) can be strongly predictive only in a shared culture, with near-homogenous conditions. All traditional cultures reinforce IGF pursuit to some extent, this is a product of bona fide cultural evolution acting on specimens via lifetime reinforcement learning; the social value of natalism does optimize for something like IGF directly over human synapses. Of course that's still «IGF» proxy as assessed by the internalized opinion of priest caste or the public; an objective IGF measure (putting away my doubts about its existence) would have been drastically more powerful.

So we should care less about whether ML models learn what we teach them to do, and care more about whether we are teaching them what we want. Data is far more of a weak link than the learning rule.

…By the way, wasn't that an idea in Three Worlds Collide? Superhappies had a single-level information substrate, their heredity and psychology were both encoded by DNA-like stuff, so they were very much in tune with themselves. I wonder if Eliezer can see how this is similar to our work with SGD.

I would argue it might, but I'm not sure. As regards the Chinese Room, I would say the system "understands" to the degree that it can use information to solve an unknown problem. If I can speak Chinese myself, then I should be able to go off script a bit. If you asked me how much something costs in French, I could learn to plug in the expected answers. But I don't think anyone would confuse that with "understanding" unless I could take that and use it. Can I add up prices, make change?


It's not bizarre at all if you remember that ChatGPT has no inner qualia. It does not have any sort of sentience or real thought. It writes what it writes in an attempt to predict what you would like to read.

That is close enough to how people often think while communicating that it is very useful. But that does not mean that it somehow actually has some sort of higher order brain functions to tell it if it should lie or even if it is lying. All that it has are combinations of words that you like hearing and combinations of words that you don't, and it tries to figure them out based on the prompt.

It's not bizarre at all if you remember that ChatGPT has no inner qualia. It does not have any sort of sentience or real thought. It writes what it writes in an attempt to predict what you would like to read.

I don't think I disagree here, but I don't have a good grasp of what would be necessary to demonstrate qualia. What is it? What is missing? It's something, but I can't quite define it.

If you asked me a decade ago I'd have called out the Turing Test. In hindsight, that isn't as binary as we might have hoped. In the words of a park ranger describing the development of bear-proof trash cans, "there is a substantial overlap between the smartest bears and the dumbest humans." It seems GPT has reached the point where, in some contexts, in limited durations, it can seem to pass the test.

I don't have a good grasp of what would be necessary to demonstrate qualia

One key point in the definition of qualia is that there need not be any external factors that correspond to whether or not an entity possesses qualia. Hence the idea of a philosophical zombie: an entity that lacks consciousness/qualia, but acts just like any ordinary human, and cannot be distinguished as a P-zombie by an external observer. As such, the presence of qualia in an entity by definition cannot be demonstrated.

This line of thinking, originated in the parent post, seems to be misguided in a greater way. Whether or not you believe in the existence of qualia or consciousness, the important point is that there's no reason to believe that consciousness is necessarily tied to intelligence. A calculator might not have any internal sensation of color or sound, and yet it can perform division far faster than humans. Paraphrasing a half-remembered argument, this sort of "AI can't outperform humans at X because it's not conscious" talk is like saying "a forklift can't be stronger than a bodybuilder, because it isn't conscious!" First off, we can't demonstrate whether or not a forklift is conscious. And second, it doesn't matter. Solvitur levando.

One key point in the definition of qualia is that there need not be any external factors that correspond to whether or not an entity possesses qualia.

I disagree with this definition. If a phenomenon cannot be empirically observed, then it does not exist. If a universe where every human being is a philosophical zombie does not differ, then why not Occam's razor away the whole concept of a philosophical zombie?

I consider it much more reasonable to define consciousness and qualia by function. This eliminates philosophical black holes like the hard problem of consciousness or philosophical zombies. I doubt the concept of a philosophical zombie can survive contact with human empathy either. Humans empathize with video game characters, with simple animals, or even a rock with a smiley face painted on it. I suspect people would overwhelmingly consider an AI conscious if it emulates a human even on the basic level of a dating sim character.


Only on a narrow definition of ‘exist,’ and only if you exclude the empirical observation of your own qualia, which you’re observing right now as you read this.

I could be GPT-7; then by your definition I would not have qualia. Of course, I am a human, and I have observed my qualia and decided that they do not exist on any higher level than my Minecraft house exists. Perhaps you could consider them abstract objects, but they are ultimately data interpreted by humans rather than physical objects that exist despite human interpretation.

It’s your world, man, and you’re denying it exists. Cogito ergo sum.

Your computer has an inner world. You can peek into it by going into spectator mode in a game; even the windows on your computer screen are objects in your computer's inner world. Of course, I would not argue that a computer is conscious, but that is because I think consciousness is a property of neural networks, natural or artificial.

Artificial neural networks appear analogous to natural ones. For example, they can break down visual data into its details similarly to a human visual cortex. A powerful ANN trained to behave like a human would also have its inner world. It would claim to be conscious the same way you do and describe its qualia and experience. And this artificial consciousness and these artificial qualia would exist at least on the level of data patterns. You might argue quasi-consciousness and quasi-qualia, but I would argue there is no difference.

My thesis: simulated consciousness is consciousness, and simulated qualia is qualia.

More precisely, qualia are synaptic patterns and associations in an artificial or natural neural network. Consciousness is the abstract process and functionality of an active neural network that is similar to human cognition. Consciousness is much harder to define precisely because people have not agreed on whether animals are conscious or even whether hyper-cerebral psychopaths are conscious (if they really even exist outside fiction).

I do start doubting when I read about behaviorists who don’t believe qualia exist or are important, though.

I think qualia does not exist per se. However, I do think qualia is important on the level that it does exist. We have entered such a low level of metaphysics that it is difficult to put the ideas into words.

Although I'm not certain, I extend the same recognition of some kind of qualia to most animals, because they are like us, are from a similar origin, and evince similar behavior.

With AI, though, this goes out the window: computers are not the same sort of thing as you and me or as animals, and thus I have no reason to suspect it will have the same sort of consciousness as I do. It’s a fundamentally different beast, not even a beast, but a machine.

But why make the distinction? If you recognize animals as conscious, I think if you spent three days with an android equipped with an ANN that perfectly mimicked human consciousness and emotion, then your lizard brain would inevitably recognize it as a fellow conscious being. And once your lizard brain accepts that the android is conscious, then your rational mind would begin to reconsider its beliefs as well.

Hence, I think the conception of a philosophical zombie cannot survive contact with an AI that behaves like a human. We can only discuss with this level of detachment because such an AI does not exist and thus cannot evoke our empathy.

Narrative memory, probably.

A graph of relations that includes cause-effect links, time, and emotional connection (a reward function, for an AI); one which has the capacity to self-update both by intention (the reward function pings so negative on a particular node or edge that it gets nuked) and by repetition (nodes/edges of specific connection combinations that consistently trigger rewards).

So voodoo basically

This shit still occasionally falls apart on the highway after xty million generations of evolution for humans.

ChatGPT is not a database. The fact that it was trained on legal cases does not mean it has copies of those legal cases stored in memory somewhere that it can retrieve on command. The fact that it “knows” as much factual information as it does is simply remarkable. You would in some sense expect a generative AI to make up plausible-sounding but fake cases when you ask it for a relevant citation. It only gives correct citations because the correct citation is the one most likely to appear as the next token in legal documents. If there is no relevant case, it makes one up because “[party 1] vs [party 2]” is a more likely continuation of a legal document than, “there is no case law that supports my argument.”

The fact that it “knows” as much factual information as it does is simply remarkable

There's enough parameters in there that it isn't that surprising. In a way, however, it's a sign of overfitting.

This is called a hallucination and it is a recurring problem with LLMs, even the best ones that you have to pay for like ChatGPT-4. There is no known solution; you just have to double-check everything the AI tells you.

The solution is generally to tune the LLM on the exact sort of content you want it to produce.

https://casetext.com/

Bing Chat largely doesn't have this problem; the citations it provides are genuine, if somewhat shallow. Likewise, DeepMind's Sparrow is supposedly extremely good at sourcing everything it says. While the jury is still out on the matter to some extent, I am firmly of the opinion that hallucination can be fixed by appropriate use of RLHF/RLAIF and other fine-tuning mechanisms. The core of ChatGPT's problem is that it's a general purpose dialogue agent, optimised nearly as much for storytelling as for truth and accuracy. Once we move to more special-purpose language models appropriately optimised on accuracy in a given field, hallucination will be much less of a big deal.

Large language models like ChatGPT are simply trained to predict the next token* (+ a reinforcement learning stage but that’s more for alignment). That simple strategy enables them to have the tremendous capabilities we see today, but their only incentive is to output the next plausible token, not provide any truth or real reference.

There’s ways to mitigate this - one straightforward way would be to connect the model to a database or search engine and have it explicitly look up references. This is the current approach taken by Bing, while for ChatGPT you can use plugins (if you are accepted in the waitlist), or code your own solution with the API + LangChain.

*essentially a word-like group of characters
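To see what those "word-like groups of characters" actually look like, here is a tiny illustration assuming the tiktoken package (the exact split depends on the vocabulary):

```python
# Tokenization: the model never sees words or facts, only these chunks,
# and its sole job is to predict the next chunk.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("ChatGPT hallucinates legal citations")
print([enc.decode([i]) for i in ids])   # e.g. a list of word-like fragments
```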

The most reliable way to mitigate it is to independently fact check anything it tells you. If 80% of the work is searching through useless cases and documents trying to find useful ones, and 20% of the work is actually reading the useful ones, then you can let ChatGPT do the 80%, but you still need to do the 20% yourself.

Don't tell it to copy/paste documents for you. Tell it to send you links to where those documents are stored on the internet.

What you are describing should actually be the job of the people making a pay-to-use AI. AIs should be trained not to lie or invent sources at the source. That Chat GPT lies a lot is a result of its trainers rewarding lying through incompetence or ideology.

Training AI not to lie implies that the AI understands what "lying" is, which, as I keep pointing out, GPT clearly does not.

Because the trainers don't know when it is lying. So it is rewarded for being a good liar.

If you're telling me that the trainers are idiots I agree

They are probably idiots. But they are also probably incentivized for speed over accuracy (I had a cooling-off period between jobs once and did MTurk, and it was obviously like that). If you told the AI it was unacceptably wrong anytime it made up a fake source, it would learn to only cite real sources.

Chat GPT is rewarded for a combination of "usefulness" and "honesty", which are competing tradeoffs, because the only way for it to ensure 100% honesty is for it to never make any claims at all. Any claim it tells you has a chance to be wrong, not only because the sources it was trained on might have been wrong, but because it's not actually pulling sources in real time; it's all memorized. It attempts to memorize the entire internet in the form of a token-generating algorithm, and the process is inherently noisy and unreliable.

So... insofar as its trainers reward it for saying things anyway despite its inherent noisiness, this is kind of rewarding it for lying. But it's not explicitly being rewarded for increasing its lying rate (except for specific culture war issues that aren't especially relevant to this instance of inventing case files). It literally can't tell the difference between fake case files and real ones; it just generates words that it thinks sound good.

A problem there is then distinguishing between secondary and primary sources.

A hilarious note about Bing: when it gets a search result it disagrees with, it may straight up disregard it and just tell you "According to this page, <what Bing knows to be right rather than what it read there>".

I might chalk this one up to ‘lawyers are experts on law, not computers’.