This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
How many fancy linguistic theories have been thrown into the dustbin of history by brute-forcing a neural network on digital libraries? Look how linguists like Noam Chomsky and Emily M Bender cover their ears and squeal in pure terror as their life’s work is falsified before their very eyes.
Can someone spell out how this falsification works? Do we actually understand how LLMs parse things? Or if you don't think they parse, then does anyone know what the hell they do instead?
As far as I know, the argument goes something like, attention mechanism, context matters, yada yada. Which doesn't really cut it.
Falsifying a theory does not, in fact, require proposing an alternative theory – only showing how the theory's predictions do not come to pass, and predictions of generative linguists have absolutely failed.
Though on a broad level I'd say that, yes, we do know how LLMs "parse things", which is why we can build them. They are a successful and very informative application of a specific theory about language acquisition (and more generally statistical learning, see Chomsky vs Norvig debate, the Bitter Lesson, etc.)
How is the specification of the attention mechanism, informed by decades of research in NLP, less a proof of understanding than Chomskyites' purely speculative hot takes like Merge? It's not like we randomly sampled through the space of architectures until something clicked and a shoggoth was summoned (even if Yud believes this is how it goes). This progress in machine learning is research and gaining understanding in the classical scientific manner, even if it's often looked down upon, whereas the sort of "understanding" and "interpretability" that linguists and safetyists require is Talmudic verbal magic, conveniently compact and "elegant" by the standards of a comic book, where every symbol of the incantation can be resolved into human-parseable logic in each activation. Sorry, Grothendieck didn't get it and neither will we. That's okay. That's how science works.
Consider the article linked here
https://www.themotte.org/post/421/culture-war-roundup-for-the-week/79642?context=8#context
Thanks.
I'd like to say I'm going to read and absorb your links, but we'll see if I get time.
Thanks for the link; I had missed that post.
I'm not sure I understand the argument, though. Clearly LLMs don't have Chomsky's concept of what universal grammar looks like hard-coded into them, but that seems like a pretty weak proof that humans don't either. To me, that argument sounds like "now that we've built an airplane, we know wing-flapping is not relevant to flight in birds". It's pretty basic math to show multi-layer perceptrons (better known as neural nets) can approximate any computable function, and yet there are interesting things to say about the structure of plenty of computable functions.
One way to see the distinction is to look at the difference in response between humans and LLMs on nonsense inputs. For instance this YouTube video about glitch tokens mentions the sentence "profit usageDuel creeping Eating Yankees USA USA USA USA" which GPT-3 highly confidently predicts will continue "USA". A human is going to predict the sentence is not grammatical and the speaker is possibly having a stroke and needs medical attention.
GPT-3 is a naive token predictor, while humans have situational awareness and social cognition; crudely analogizing, humans always interpret any text string through a frame like «you are a person called $name, located in $place, it is $time etc. etc.; the $entity is producing [text], what do you make of it and how do you respond?». We don't run this script explicitly, but then again this is what our life is about; we can't not keep its values in context. LLMs «live» in the text-world, or rather are text-worlds; persistent humanlike contexts have to be finetuned or prompt-engineered into them to yield humanlike reactions.
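To make "naive token predictor" a bit more concrete, here is a toy sketch. It is emphatically not how GPT-3 works: it assumes whitespace-separated words instead of GPT-3's real subword tokens, and it only counts which word followed which inside the prompt itself. The function name `predict_next` is made up for illustration. Even something this crude continues the glitch sentence the same way.

```python
from collections import Counter, defaultdict

def predict_next(tokens):
    """Return the most frequent successor of the last token, or None if it has none."""
    if not tokens:
        return None
    # successors[a][b] counts how often token b directly followed token a.
    successors = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        successors[current][following] += 1
    counts = successors[tokens[-1]]
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Assumption: naive whitespace tokenization, not a real tokenizer.
prompt = "profit usageDuel creeping Eating Yankees USA USA USA USA".split()
print(predict_next(prompt))  # prints "USA"
```

With no frame beyond the text itself, repeating the pattern is the statistically obvious continuation; nothing in the counts says "this speaker may need a doctor".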
It's perfectly cromulent to infer that the next token will also be "USA". A language model finetuned on dialogue assistance, which provides it with some facsimile of the above human qualities, will respond differently. E.g.:
None of this is terribly relevant to the crux of Chomsky's linguistic theory and its failure.
Once again I recommend at least skimming the article. Sometimes people who write academic papers summarizing decades of research actually anticipate common-sensical comebacks (not always). And in the age of GPT-4 I don't feel like summarizing it.
There are weak and strong versions of the Chomskian thesis between which linguists oscillate depending on convenience – motte and bailey, as it happens. The motte is that LLMs may work in some relevant sense, but humans have innate linguistic priors or especial predisposition to learn «natural» languages; it's basically true, I think.
The half-bailey is that natural languages have certain nontrivial universal properties Chomsky describes, and there exist very specific genetically encoded operators and schemas for parsing and generating semantically coherent natural language utterances.
The full bailey is that a) those adaptations have emerged basically instantly, undergone an evolutionary step change in the relatively recent history of our species; and b) that language is essentially not learnable for any system without the corresponding inductive biases, irrespective of the compute and data we throw at the problem (or at least not learnable at economically feasible scale); and c) that the best that can be achieved with statistical learning not biased by those adaptations is some stochastic parroting.
This bailey, in turn, inflates the prior for the half-bailey from «a bold hypothesis, Cotton», to «very likely true!» and allows smuggling it back into the motte, e.g. claiming that humans can't learn statistically because it'd be computationally intractable and produce gibberish. Look at what Chomsky explicitly says in his NYT opinion:
Naturally the John sentence is a contrived problem, a colossal dumbing down in comparison to industrial and research benchmarks like Winogrande… and crucially it's bullshit, as anyone who's played around with SoTA models can understand. People have instantly checked it. GPT-3.5 can understand the sentence perfectly well. GPT-4 can fucking parse its morphology on the level of a linguistics undergrad, and output a renderable scheme.
And for the hell of it, here's something from my GPT4All-7B (a 4.2 GB file that can run on a potato-tier system), model file hash 963fe3761f03526b78f4ecd67834223d. Even Chomsky can reproduce it, if he so chooses and asks some student with a laptop to help out (hi Noam):
Another run, same seed:
It fails hard in many scenarios, but the point stands. Those are not cherrypicked examples.
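For anyone who wants to reproduce this, checking a downloaded model file against the hash quoted above could look roughly like the sketch below. Two assumptions on my part: the 32-hex-digit string is an MD5 digest (the length fits, but the algorithm isn't stated), and the function names and any file path you pass in are placeholders, not part of GPT4All.

```python
import hashlib

# Hash quoted in the post; treating it as MD5 is an assumption based on its length.
EXPECTED_MD5 = "963fe3761f03526b78f4ecd67834223d"

def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a multi-gigabyte model never sits in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_expected` on wherever you saved the model file tells you whether you're running the same weights; a mismatch just means a different file, not necessarily a broken one.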
Again: Chomsky dismissed GPT-3.5-175B as linguistically incompetent. This is something 25 times smaller, finetuned on a set of GPT-3.5 generations by 4chan edgelords yesterday, with some mediocre sampler. Rather than merely beating his challenge, it helps us notice how Chomsky is similar to John, I believe.
A proper investigation would not ask an LLM trick questions like in a Russian prison, but would generate a large set of possible phrasings, run them with different seeds, and conclude whether LLMs are indeed statistically significantly worse than humans at parsing such utterances correctly. But that's science. Chomsky is a public intellectual – a priest and a guru; science is beneath him.
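A rough sketch of what that protocol could look like, under stated assumptions: `answer_fn` is a hypothetical callable that runs one phrasing under one seed and returns whether the reading was correct (model call and scoring are left to whoever runs it), the human counts would have to come from an actual study, and a pooled two-proportion z-test is just one reasonable choice of test, not the only one.

```python
import math

def count_correct(answer_fn, phrasings, seeds):
    """Run every phrasing under every seed; return (number correct, number of trials)."""
    correct = 0
    for phrasing in phrasings:
        for seed in seeds:
            correct += bool(answer_fn(phrasing, seed))
    return correct, len(phrasings) * len(seeds)

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """One-sided pooled z-test of H1: rate_a < rate_b. Returns (z, p_value)."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Both groups all-correct or all-wrong: no evidence either way.
        return 0.0, 0.5
    z = (rate_a - rate_b) / std_err
    # Standard normal CDF at z, i.e. P(Z <= z).
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value
```

Feed the model's `(correct, trials)` in as group A and the human baseline as group B; a small p-value would support "LLMs are worse at this", and anything else means the claim hasn't been shown. The normal approximation needs reasonably large counts in both groups, so for small samples an exact test would be the safer pick.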
Like, come on, this is a slam dunk as far as empirical science is concerned. How can a civil discussion be had with those hacks until they update on the crushing immensity of evidence and cease their ignorant elitist pooh-poohing of a vastly superior paradigm?
You have it backwards. It's not that LLM proponents claim birds don't need to flap their wings (although they do argue that flapping is indeed not necessary in the general sense). It's Chomsky who says that whatever planes do is not meaningfully flying, because birds have special bird-flight-ness of almost mathematical elegance, which is not reducible to normal biomechanics and aerodynamics, which he can't show or reproduce, but which he can write hundreds of papers about.
Well, GPT-4 can churn out not-even-wrong deepities fast enough to drown his whole field, and this couldn't come a moment too soon.
Thank you for the in-depth explanation; I was misunderstanding the claim. I agree that Chomsky's claims as you describe them are utter nonsense and display either a complete failure to comprehend complexity theory and theoretical machine learning or a non-scientific belief in dualism. And I'm pretty sure Chomsky understands complexity theory, given there's a core concept in it literally named for him.