
Culture War Roundup for the week of March 27, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


a lot haven't taken the time to investigate what I think are the two philosophical domains most important to current LLMs and "baby" AIs: epistemology and linguistics.

How many fancy linguistic theories have been thrown into the dustbin of history by brute-forcing a neural network on digital libraries? Look how linguists like Noam Chomsky and Emily M Bender cover their ears and squeal in pure terror as their life’s work is falsified before their very eyes.

Can someone spell out how this falsification works? Do we actually understand how LLMs parse things? Or if you don't think they parse, then does anyone know what the hell they do instead?

As far as I know, the argument goes something like, attention mechanism, context matters, yada yada. Which doesn't really cut it.

Falsifying a theory does not, in fact, require proposing an alternative theory – only showing how the theory's predictions do not come to pass, and predictions of generative linguists have absolutely failed.

Though on a broad level I'd say that, yes, we do know how LLMs "parse things", which is why we can build them. They are a successful and very informative application of a specific theory about language acquisition (and more generally statistical learning, see Chomsky vs Norvig debate, the Bitter Lesson, etc.)

How is the specification of the attention mechanism, informed by decades of research in NLP, less a proof of understanding than Chomskyites' purely speculative hot takes like merge? It's not like we randomly sampled through the space of architectures until something clicked and a shoggoth was summoned (even if Yud believes this is how it goes). This progress in machine learning is research and gaining understanding in the classical scientific manner, even if it's often looked down upon, whereas the sort of "understanding" and "interpretability" that linguists and safetyists require is Talmudic verbal magic, conveniently compact and "elegant" by the standards of a comic book, where every symbol of the incantation can be resolved into human-parseable logic in each activation. Sorry, Grothendieck didn't get it and neither will we. That's okay. That's how science works.

Consider the article linked here

https://www.themotte.org/post/421/culture-war-roundup-for-the-week/79642?context=8#context

Thanks.

I'd like to say I'm going to read and absorb your links, but we'll see if I get time.

Thanks for the link; I had missed that post.

I'm not sure I understand the argument, though. Clearly LLMs don't have Chomsky's concept of what universal grammar looks like hard-coded into them, but that seems like a pretty weak proof that humans don't either. To me, that argument sounds like "now that we've built an airplane, we know wing-flapping is not relevant to flight in birds". It's pretty basic math to show multi-layer perceptrons (better known as neural nets) can approximate any computable function, and yet there are interesting things to say about the structure of plenty of computable functions.

One way to see the distinction is to look at the difference in response between humans and LLMs on nonsense inputs. For instance this YouTube video about glitch tokens mentions the sentence "profit usageDuel creeping Eating Yankees USA USA USA USA" which GPT-3 highly confidently predicts will continue "USA". A human is going to predict the sentence is not grammatical and the speaker is possibly having a stroke and needs medical attention.

which GPT-3 highly confidently predicts will continue "USA"

GPT-3 is a naive token predictor, while humans have situational awareness and social cognition; crudely analogizing, they always interpret any text string with the frame like «you are a person called $name, located in $place, it is $time etc. etc.; the $entity is producing [text], what do you make of it and how do you respond?». We don't run this script explicitly, but then again this is what our life is about, we can't not keep its values in context. LLMs «live» in the text-world, or rather are text-worlds; persistent humanlike contexts have to be finetuned or prompt-engineered into them to yield humanlike reactions.

It's perfectly cromulent to infer that the next token will also be "USA". A language model finetuned on dialogue assistance, which provides it with some facsimile of the above human qualities, will respond differently. E.g.:


> what is the continuation of "profit usageDuel creeping Eating Yankees USA USA USA USA"?

The sentence after this one, which would be something like 'Yankies are delicious' or perhaps even more likely to have been written by a bot.  

None of this is terribly relevant to the crux of Chomsky's linguistic theory and its failure.

Once again I recommend at least skimming the article. Sometimes people who write academic papers summarizing decades of research actually anticipate common-sensical comebacks (not always). And in the age of GPT-4 I don't feel like summarizing it.

There are weak and strong versions of the Chomskian thesis between which linguists oscillate depending on convenience – motte and bailey, as it happens. The motte is that LLMs may work in some relevant sense, but humans have innate linguistic priors or especial predisposition to learn «natural» languages; it's basically true, I think.

The half-bailey is that natural languages have certain nontrivial universal properties Chomsky describes, and there exist very specific genetically encoded operators and schemas for parsing and generating semantically coherent natural language utterances.

The full bailey is that a) those adaptations have emerged basically instantly, undergone an evolutionary step change in the relatively recent history of our species; and b) that language is essentially not learnable for any system without the corresponding inductive biases, irrespective of the compute and data we throw at the problem (or at least not learnable at economically feasible scale); and c) that the best that can be achieved with statistical learning not biased by those adaptations is some stochastic parroting.

This bailey, in turn, inflates the prior for the half-bailey from «a bold hypothesis, Cotton» to «very likely true!» and allows it to be smuggled back into the motte, e.g. by claiming that humans can't learn statistically because it'd be computationally intractable and produce gibberish. Look at what Chomsky explicitly says in his NYT opinion:

But ChatGPT and similar programs are, by design, unlimited in what they can “learn” (which is to say, memorize); they are incapable of distinguishing the possible from the impossible. Unlike humans, for example, who are endowed with a universal grammar that limits the languages we can learn to those with a certain kind of almost mathematical elegance, these programs learn humanly possible and humanly impossible languages with equal facility. Whereas humans are limited in the kinds of explanations we can rationally conjecture, machine learning systems can learn both that the earth is flat and that the earth is round. They trade merely in probabilities that change over time.

For this reason, the predictions of machine learning systems will always be superficial and dubious. Because these programs cannot explain the rules of English syntax, for example, they may well predict, incorrectly, that “John is too stubborn to talk to” means that John is so stubborn that he will not talk to someone or other (rather than that he is too stubborn to be reasoned with). Why would a machine learning program predict something so odd? Because it might analogize the pattern it inferred from sentences such as “John ate an apple” and “John ate,” in which the latter does mean that John ate something or other. The program might well predict that because “John is too stubborn to talk to Bill” is similar to “John ate an apple,” “John is too stubborn to talk to” should be similar to “John ate.” The correct explanations of language are complicated and cannot be learned just by marinating in big data.

Given the amorality, faux science and linguistic incompetence of these systems, we can only laugh or cry at their popularity.

Naturally the John sentence is a contrived problem, a colossal dumbing down in comparison to industrial and research benchmarks like Winogrande… and crucially it's bullshit, as anyone who's played around with SoTA models can understand. People have instantly checked it. GPT-3.5 can understand the sentence perfectly well. GPT-4 can fucking parse its morphology on the level of a linguistics undergrad, and output a renderable scheme.

And for the hell of it, here's something from my GPT4All-7B (a 4.2 GB file that can run on a potato-tier system), model file hash 963fe3761f03526b78f4ecd67834223d. Even Chomsky can reproduce it, if he so chooses and asks some student with a laptop to help out (hi Noam):


main: seed = 1680209684

llama_model_load: loading model from 'gpt4all-lora-unfiltered-quantized.bin' - please wait ...

[…]

sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000


[…]

> How do you understand the statement "John is too stubborn to talk to"?

The statement suggests John's unwillingness or inability to communicate with others, which could be due to his being overly opinionated and not open-minded. It may also indicate that he has a tendency towards closed-minded thinking and is unable/unwilling to consider other perspectives on issues important to him.

> What do you mean by "unwillingness or inability to communicate with others" in this context?

The statement suggests that John has a tendency towards closed-minded thinking and is unable/unwilling to consider other perspectives on issues important to him.

> Could it mean, rather, that John is so stubborn that he will not talk to someone or other?

Yes! That's a possibility as well. It could also be interpreted in the context of this conversation/discussion about communication and openness towards others.

> In your estimation, which version is more plausible?

Based on my research, I believe that both versions are possible but it's hard to tell without any additional information or contextual clues about the conversation/discussion in question.

> In general, how should we understand the statement "X is too stubborn to talk to"?

The phrase could be interpreted as a criticism of X's unwillingness to communicate or engage with others on an equal level. It may also suggest that they are inflexible and uncompromising in their beliefs, which can lead them to become isolated from the rest of society due to their refusal to listen or consider other perspectives.

Another run, same seed:


> Bill says; "John is too stubborn to talk to". What does he mean?

It means John refuses or fails to comply with requests, demands, orders etc., and will not be persuaded by any argument.

> Provide one alternative explanation.

Another possible interpretation of Bill's statement is that he believes it would take too much effort or time to convince John about something, so there may already have been a previous disagreement between them on this topic and they are now at an impasse where neither party will budge.

It fails hard in many scenarios, but the point stands. Those are not cherrypicked examples.

Again, Chomsky dismissed GPT-3.5-175B as linguistically incompetent. This is something 25 times smaller, finetuned on a set of GPT-3.5 generations by 4chan edgelords yesterday, with some mediocre sampler. Rather than merely beating his challenge, it helps us notice how Chomsky is similar to John, I believe.

A proper investigation would not ask an LLM trick questions like in a Russian prison, but would generate a large set of possible phrasings, run them with different seeds, and test whether LLMs are indeed statistically significantly worse than humans at parsing such utterances correctly. But that's science. Chomsky is a public intellectual – a priest and a guru; science is beneath him.
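For concreteness, here is a rough sketch of what that kind of investigation could look like in Python. Everything in it is an assumption made for illustration: the template lists, the `query_and_grade` function (a simulated stand-in that does not call any real model and returns placeholder outcomes, not measurements), and the `human_accuracy` baseline, which would have to come from an actual human study. The statistic is a plain one-sided exact binomial test, chosen only because it needs nothing beyond the standard library.

```python
import itertools
import math
import random

# Hypothetical template pieces; a real study would use a far larger, vetted set.
SUBJECTS = ["John", "Mary", "The manager"]
ADJECTIVES = ["stubborn", "angry", "proud"]
TEMPLATES = [
    'How do you understand the statement "{s} is too {a} to talk to"?',
    'Bill says: "{s} is too {a} to talk to". What does he mean?',
]


def make_prompts():
    """Expand every template with every subject/adjective pair."""
    return [
        t.format(s=s, a=a)
        for t, s, a in itertools.product(TEMPLATES, SUBJECTS, ADJECTIVES)
    ]


def query_and_grade(prompt, seed):
    """PLACEHOLDER, not a real model call.

    A real version would sample the model with this seed and have the answer
    graded (by humans or a rubric) as the correct reading or not. Here the
    outcome is simulated so that the script runs end to end.
    """
    rng = random.Random(f"{seed}:{prompt}")
    return rng.random() < 0.85  # arbitrary simulated success rate


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), clamped against float round-off."""
    total = sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))
    return min(1.0, total)


def run_experiment(seeds, human_accuracy):
    """Return (trials, correct, one-sided p-value).

    Null hypothesis: the model is at least as accurate as the assumed human
    baseline. A small p-value is evidence that the model is worse.
    """
    outcomes = [query_and_grade(p, s) for p in make_prompts() for s in seeds]
    n, k = len(outcomes), sum(outcomes)
    return n, k, binom_cdf(k, n, human_accuracy)


if __name__ == "__main__":
    # human_accuracy=0.95 is an assumed figure, not a measured one.
    n, k, p_value = run_experiment(seeds=range(20), human_accuracy=0.95)
    print(f"{k}/{n} graded correct; one-sided p-value: {p_value:.4g}")
```

Treating every prompt and seed as an independent trial is a simplification; a more careful design would account for correlation between runs of the same phrasing.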

Like, come on, this is a slam dunk as far as empirical science is concerned. How can a civil discussion be had with those hacks until they update on the crushing immensity of evidence and cease their ignorant elitist pooh-poohing of a vastly superior paradigm?

You have it backwards. It's not that LLM proponents claim birds don't need to flap their wings (although they do argue that flapping is indeed not necessary in the general sense). It's Chomsky who says that whatever planes do is not meaningfully flying, because birds have special bird-flight-ness of almost mathematical elegance, which is not reducible to normal biomechanics and aerodynamics, which he can't show or reproduce, but which he can write hundreds of papers about.

Well, GPT-4 can churn out not-even-wrong deepities fast enough to drown his whole field, and this couldn't come a moment too soon.

The correct explanations of language are complicated and cannot be learned just by marinating in big data.

[...]

You have it backwards. It's not that LLM proponents claim birds don't need to flap their wings (although they do argue that flapping is indeed not necessary in the general sense). It's Chomsky who says that whatever planes do is not meaningfully flying, because birds have special bird-flight-ness of almost mathematical elegance, which is not reducible to normal biomechanics and aerodynamics, which he can't show or reproduce, but which he can write hundreds of papers about.

Thank you for the in-depth explanation; I was misunderstanding the claim. I agree that Chomsky's claims as you describe them are utter nonsense and display either a complete failure to comprehend complexity theory and theoretical machine learning or a non-scientific belief in dualism. And I'm pretty sure Chomsky understands complexity theory, given there's a core concept in it literally named for him.