This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
That's unfair: it really struggles to comprehend that words are composed of individual letters. It's blind to the alphabet by the nature of what it can see and learn from. I asked it to generate anagrams and it was absolutely hopeless at it. It gave me nonsense like 'overwrite is an anagram of obverse'. When I really coaxed it for an anagram of obverse and observe, it gave me rubbish like 'oversbe' and 'beovers', but recognized they weren't words. It couldn't come up with 'verbose', which was really ironic seeing as it was incredibly verbose in its descriptions.
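As a point of comparison, the anagram check itself is trivial for anything that can see individual letters. A minimal Python sketch (the function name `is_anagram` is just for illustration) compares letter counts:

```python
from collections import Counter

def is_anagram(a: str, b: str) -> bool:
    """Two different words are anagrams if they use exactly the same
    letters the same number of times (case-insensitive)."""
    a, b = a.lower(), b.lower()
    return a != b and Counter(a) == Counter(b)

print(is_anagram("obverse", "observe"))    # True
print(is_anagram("obverse", "verbose"))    # True
print(is_anagram("overwrite", "obverse"))  # False
```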
It also could not answer a question about perfect numbers: it could not find the pattern linking 8128, 496 and 28 no matter how I coaxed it. I doubt a human would've made that error after being prodded and poked toward the answer.
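For reference, the pattern: 28, 496 and 8128 are the second, third and fourth perfect numbers, meaning each equals the sum of its proper divisors, and each has Euclid's form 2^(p-1)(2^p - 1) with 2^p - 1 prime (p = 3, 5, 7). A short Python sketch that checks both properties, with illustrative helper names:

```python
def divisor_sum(n: int) -> int:
    """Sum of the proper divisors of n (every divisor except n itself)."""
    total = 1 if n > 1 else 0
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:  # avoid double-counting a square root
                total += n // d
        d += 1
    return total

def euclid_form(p: int) -> int:
    """2^(p-1) * (2^p - 1), which is perfect whenever 2^p - 1 is prime."""
    return 2 ** (p - 1) * (2 ** p - 1)

for n in (28, 496, 8128):
    print(n, divisor_sum(n) == n)  # each line ends in True

print([euclid_form(p) for p in (2, 3, 5, 7)])  # [6, 28, 496, 8128]
```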
And that's without mentioning the other 50 University Challenge questions it got right, or the fairly creative and reasonable ideas it came up with for redesigning vehicles. Teething problems like the perfect number issue and your time zone issue could surely be solved by increasing the power a thousandfold, which is likely what GPT-4 will be doing. The anagram problem or the 'word ending in i' problem requires a different kind of data processing, but it's not really that important. You don't need to be able to identify anagrams to be functionally intelligent and achieve things in the real world.
If we have an intelligence that's 95-98% human-level, with superhuman speed and knowledge, we're not that far from AGI.
We've been seeing, and can predict, that plenty of tasks can be done by machines at the 95-98% level but no better, because that 95-98% is the low-hanging fruit and the remaining few percent require much more intelligence. (Self-driving cars are one example, but the pattern has been known for far longer than that.)
My experience with it is different. I've never seen it answer any question intelligently. It can fool me into thinking it's intelligent by being extremely verbose and pivoting from the question to some generic, pat answer that is vaguely on topic. There is something fundamental missing.
What examples did you find of it behaving unintelligently? I think they just programmed it to be verbose after so many episodes of people asking it to elaborate in the previous configuration. I agree that it just doesn't know when to shut up.
For example:
Now the better answer would've been 'blank', since clean verse isn't really a thing. But clean is pretty good. I think clean verse could be a thing. That's a fairly intelligent answer.
It's pretty good at maths too:
It also got this question right:
I think it's generally intelligent, just with a few weird weaknesses, like perfect numbers and a couple of other things I jotted down; it also got confused by the wording of some more complicated questions.
No, it isn't.
People tend to interpret this kind of thing as if it was produced by an intelligent creature. After all, it's in proper grammar, and is phrased in a way that seems to resemble thoughts. It's hard to think of it as just being a text processor.
But it is. You shouldn't be making charitable interpretations of errors made by machines. "Clean verse" in this context is a mistake. It doesn't become not-a-mistake by saying "well, it's pretty good even though it's clearly the wrong answer". If a human said that, you'd probably say "oh, he was thinking of 'blank verse'", but the computer isn't a human, and wasn't thinking of anything; it shouldn't get partial credit for that.
But it literally justified 'clean verse' as verse that didn't have profanity in it. There's a clear relationship with meaning, it created a plausible phrase. If someone used the phrase 'clean verse' in context, it's unobjectionable and the meaning comes through.
If the machine said 'Australopithecus verse' or 'sabot-discarding verse' or 'rhinoceros verse' then I'd have a serious problem with it. It's not clearly a wrong answer if I had to check that it's not a real term. Maths questions in exams are graded on how many parts of the question you get right. Even if you get the wrong answer, as long as part of your working is right you can still get some marks. I would give the machine 2/3 for its answer; it's a good attempt.
Now, the University Challenge format doesn't give half-marks, you're either right or wrong. Even so, there's being wrong and being spectacularly wrong. At one point they had an appallingly bad set of human teams. They made catastrophic, ridiculous errors.
https://youtube.com/watch?v=VLD3MtSXv5s?list=PLkjGBrjEcmjUBZSXKv5eCCrdlhP5WcRTR&t=433
IBM! They answer IBM! IBM is certainly not the correct answer, it's not even a mathematician. If that answer came from a machine you'd surely call it fundamentally flawed and inhumanly stupid, yet it came from a team of four (highly credentialed) people. Quality of thought should be graded on results, not on the kind of processing machinery that's used to produce it.
If you're grading the machine on quality of thought, it should get zero because it has no thoughts.
This also applies to giving it partial credit for wrong answers because it was "thinking" along the right lines, or something like that.
The machine can judge, solve problems and reason. Therefore it thinks. I have tested this experimentally.
Wrongness of answers is not an all or nothing affair, even in artificially simple questions like this. Partial credit for wrong answers is standard practice.
No, it definitely can't reason. It can't reason itself out of a wet paper bag unless it had read some blogspam about the top 5 ways to get yourself out of a wet paper bag.
How is this not reasoning? I got the questions from something that says it's from 2022 so it shouldn't have seen them before. I imagine you'd say those are too easy or it might have seen them before anyway, so I made up my own harder question.
Here you can see that it identifies the rule correctly and tries to apply it, but jumbles up the logic and arithmetic. It's an inhuman failure: no person would write 172+3=37 and mean 17x2+3=37 or 81+2=82. Nevertheless, there's reasoning ability within the machine. It's not very good at maths, and its reasoning isn't great, but it's definitely there. It's like the alien equivalent of a mediocre high-school student; apparently it scored in the 52nd percentile on an SAT test.
https://twitter.com/teddynpc/status/1598767389390573569
There's a structural and idiosyncratic reason for this, which has to do with how text is processed before it goes into and after it comes out of the model. Basically, it processes "tokens," which are chunks of words. Training text is "tokenized" before being fed in, and the model itself outputs tokens, which are converted back to text before being printed. The specific tokenization scheme it uses (as far as we know, based on prior iterations of OpenAI's large language models) is byte-pair encoding (BPE), which has been optimized so that common words are a single token, then common chunks of words are tokens, and then (way down the list) individual letters are tokens. BPE is designed to minimize the size of the tokenized training data within a fixed vocabulary of possible tokens (about 50k; GPT-2 and GPT-3 use 50,257). So the trained model has no way of knowing that the word "overwrite" starts with the letter 'o': most likely it sees one token for "over" and another for "write", the encoding doesn't need to resort to tokens for the individual letters, and the model does not know that the "over" token has any special relationship to the "o," "v," etc. tokens. Gwern hypothesizes that BPE is why it also cannot rhyme, and I assume he's right, although the specific mechanism by which BPE creates that deficiency isn't as clear to me.
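To make the mechanism concrete, here is a toy sketch of BPE training in Python, assuming a tiny made-up corpus in which "over" and "write" are common. It is not OpenAI's tokenizer: real GPT tokenizers work on bytes and are trained on vastly more text, and the helper names here are illustrative. It shows how frequent letter pairs get merged until "overwrite" comes out as two chunks, with no single-letter tokens left in the encoding.

```python
from collections import Counter

def apply_merge(symbols, pair):
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged

def learn_bpe(corpus, num_merges):
    """Toy BPE training: start from single letters and repeatedly merge
    the most frequent adjacent pair across the whole corpus."""
    words = [list(w) for w in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in words:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        words = [apply_merge(symbols, best) for symbols in words]
    return merges

def tokenize(word, merges):
    """Encode a word by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols

# Hypothetical toy corpus, chosen so "over" and "write" become tokens.
corpus = ["over"] * 5 + ["write"] * 5 + ["overwrite"]
merges = learn_bpe(corpus, num_merges=7)
tokens = tokenize("overwrite", merges)
print(tokens)  # ['over', 'write']

# The model only receives integer IDs; nothing in ID 3 ('over')
# relates it to ID 2 ('o').
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens) | set("overwrite")))}
print([vocab[t] for t in tokens])  # [3, 8]
```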
If [way], [may], and [day] are entirely different tokens (they are in GPT-2, at least), it can't learn that they rhyme from their structure (the way we would), only from those specific tokens being used in other rhymes, I think?
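A toy illustration of that point, assuming a hypothetical word-level vocabulary where each word is one opaque integer ID (the IDs below are invented, not GPT-2's real ones): the shared "-ay" ending is visible when comparing letters, but comparing the token sequences finds no overlap at all.

```python
# Hypothetical vocabulary: made-up IDs, one opaque integer per word.
VOCAB = {"way": 1017, "may": 4820, "day": 2213, "dog": 388}

def as_tokens(word):
    """What a model with whole-word tokens receives: a single integer."""
    return [VOCAB[word]]

def as_chars(word):
    """What a letter-level view would receive."""
    return list(word)

def shared_suffix(a, b):
    """Length of the common ending of two sequences."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

for w in ("may", "day", "dog"):
    by_letters = shared_suffix(as_chars("way"), as_chars(w))
    by_tokens = shared_suffix(as_tokens("way"), as_tokens(w))
    print(f"way/{w}: letters share {by_letters}, tokens share {by_tokens}")
```

Any rhyme knowledge in such a setup has to come from seeing those particular IDs paired in rhyming text, as the comment suggests.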
Can't rhyme? A poet friend of mine was lamenting that it rhymes too much!
A whisper on air,
A silent thought
That lingers there.
Though none can hear
It still exists,
A hushed connection
Of souls that share.
Glorpth yoozled down the blizbevy tree,
His muttazels aflutter with glee.
Beside him stood the zibbablock,
Gloating and ready with a shock.
But the brave Glorpth didn't falter,
He battled the beast with a clatter.
He thrashed and he slashed 'til the night was done,
And the Jabberwock was won.
I specifically asked for three nonsense words for this one...
Through the dark woods I heard a howl,
Flittering through like a silken owl.
The slonky creatures stirred below,
As they rose and began to flow.
Crossing through the gnelpy grove
I saw what I could not move.
It was covered in gloopy scales
And had a thousand hissy tails.
It was the jabberwocky, fierce and wild
Ready to attack with a baleful smile.
I ran and ran and never looked back
The terror of its presence slack.
I guess it can! Maybe it finally has enough training to overcome whatever impediment hobbled previous incarnations' attempts.
That's a shoddy non-rhyme, but if we accept Blake's "eye / symmetry"...