This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
GPT-4 has been released! Looks like the cat is finally out of the bag. The CW implications of large language models are obvious and have been discussed here, so I figured I would drop a few fun facts.
Also, here's a peek at LessWrong freaking out.
The full technical report gives some fascinating information. Here are some highlights:
- GPT-4 can pass a bar exam and score a 5 on several AP exams.
- GPT-4 is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations.
- GPT-4 can accept images as inputs and generate captions, classifications, and analyses.
- GPT-4 is capable of handling over 25,000 words of text, allowing for use cases like long form content creation, extended conversations, and document search and analysis.
Of all of these, passing the bar exam is the one that sticks out. We'll have to see how much it still hallucinates, but this is clearly a high-water mark, at least for the legal profession.
I'll go ahead and stake out a perhaps dramatic but, I believe, warranted claim: the culture war is about to get ugly. Creating ads, propaganda, and bots to argue politics has never been easier. Whichever side moves first on scaling and implementing these language models to persuade humans to their camp will own the future.
…after a bunch of lawyers rewrite the questions and have it repeat the test multiple times with different settings and questions.
That's what you'll find if you read the paper this claim is based on, and it significantly diminishes the impressiveness of the results. A model that only gets results when led around carefully by a skilled human is more like a fancy search engine than the rosy picture of a near-human independent operator that the press releases paint.
Having questions rewritten by another person is almost certainly not allowed in the bar exam - the idea that someone who can understand legal principles and jargon can't comprehend a three-part question is laughable. And taking multiple tries at the same exam to get a better score is definitely out.
In my opinion, a reasonable claim that GPT can pass a bar exam would require demonstrating that their chosen parameters generalize to other bar exams, and the model would need to be able to answer exam questions without having them re-formatted.
Right now this claim looks like false advertising.
P.S. Did you know that the bar exam results were marked by the study authors? Or that all four authors of the study work in one of two companies planning to deliver products applying GPT to law?
Thanks for digging this up - casts some serious doubt on how good it is. OpenAI is good at building hype at least.
On one hand it's important to put it into perspective. On the other... I seem to remember similar arguments being made when Kasparov lost to Deep Blue.
I guess we have another 20 years before lawyers consider it unfair to use an AI assist when arguing a case.
While I do lean towards the skeptical side about how far AI capabilities are going to get long-term, the main goal here was to deflate a bit the exaggerated OpenAI claim about current performance, which seems to have been cautiously taken at face value so far. Like some others in this thread I found the claim a bit unbelievable, and I had some time to dig into where it came from.
GPT might get good enough to compete with lawyers in the future, but the study doesn't prove that it's there now. In fact, things like needing the exam adjusted to provide each question separately strongly indicate the opposite.
I'm not familiar with them, maybe you could give some examples?
Yes, hold on a second, just let me get out my collection of dead-tree computer magazines from 1996.
deleted
Sounds like stand-up comedian is still available.
I suspect that the world's oldest profession may become the world's final profession as well.
At the moment, paid, meaningful work is pretty much a white privilege. The Last Psychiatrist wrote about it in '13.
Yeah, a future where we're all going to be like blacks - hardly employable, grudgingly tolerated pets of the powers that be - is quite a likely one.
Ya think the FDA will allow you to experiment with, I dunno, brain interfaces or embedding AI systems into yourself so you could get off the dole?
Why, that'd be unsafe and irresponsible.
Maybe in countries that don't prosecute people for selling raw milk.
For a section of the white population; those who have the marketable skills that are in demand (see the current hand-wringing over the tech company lay-offs, and others writing reassuring articles that it's okay, it's just trimming the fat of all the hiring during the pandemic and the useless people, but real engineers are always in demand).
That Last Psychiatrist piece annoys me, probably because I'm from the section of society where people are welders, and it's a good, respectable trade:
No, dickhead, people who do hard manual labour don't decide that "welp, I'm 45, I'd prefer to laze about for the next thirty years so I'll pretend I can't work anymore". Mostly they keep working until they can't work, because their body really does wear out. And very few of them do nice, tidy, 9-5 hours. If somewhere in your house springs a leak, do you decide "Can't call a plumber, it's after 5 pm so they've clocked off!" No, you'll try ringing round and see if you can get a guy to call out sometime, anytime soon (because often they're very busy, being in demand for all kinds of jobs).
Do some people game the system? Yes, of course. Just like educated white guys who like to pretend they're the Hunter Thompson of mental health professionals and write their hot takes. I don't know any welders who retired at 45, I wonder if Mr Big Brain here even does know anyone who works with their hands? I know plenty of people who worked themselves into the ground, or had legitimate health reasons for retiring early. Man, I had no idea that living off social welfare was so easy, I could have done it when I was 45 and lived high on the hog! Allegedly!
I agree about the "never worked" cohort, but again - that's complicated. There are some people who never worked because they're incapable of holding down a job, because they're too weird or they have untreated mental problems (not even outright crazy, never diagnosed with a disorder as a kid and left to sink or swim through school and life) or some other reason. Again, yes of course there are people gaming the system with fake illnesses and scamming their way through life. Humans do that kind of thing.
Excellent advice - were it not for the snide bit earlier about the welders (and plumbers, presumably, and other tradespeople) who scammed the system by pretending they could no longer work the 9-5 job and decided to live off the public purse so they could laze about.
Honestly, we're approaching a future where the vast majority of people are nothing more than a deadweight drag on humanity. I wonder how long before we (humanity as a whole) decide to deprecate them, by which I mean discourage their reproduction until they slowly disappear. Perhaps modernity is already a preliminary strategy to do so, but if so, at the moment it's most effective on the very people we want to reproduce more, so tweaking is needed.
That decision has effectively been made. Populations capable of maintaining technological modernity aren't breeding.
I don't think 'we, humanity as a whole, acting together' - like universalist, democratic humanism - works to 'deprecate the vast majority of humanity'. It'd have to be some small group doing that.
It's, arguably, good - some people clearly are more useful, or have better lives, or w/e than others, and whether you want 'the greatest good for the most' or just 'power', somehow replacing normal people with better people advances that cause. And it's very "natural"; that's what evolution does: every time you want to marry an attractive girl instead of an ugly one, or someone of "good character" instead of poor character, you're 'discouraging the reproduction of those with bad genes' in your own genes' reproductive interests. On the other hand, this may happen in the form of gene editing - and "normal people giving their children better lives with gene technology" and "preventing the useless people from reproducing" end up having the same observable long-term effects.
The problem is the winnowing down - as technological society progresses, the requirements to be the kind of person who is "more useful" or "has a better life" become more and more specialised and stratified. So the number of such people reduces down and down.
So now we're talking not "100 fewer peasants, 50 more factory workers", we're talking "tens of millions out of hundreds of millions" (if we even get that high) or "millions out of billions".
If only the 'useful, productive, creative' people should replace 'normal people', or those are the ones not a 'deadweight drag', then we are talking about the vast majority of humanity being deadweight that - what? should be euthanised? allowed to die of illness, hunger and other conditions of the developing world? Even within the developed world, how many people out of the entire population of the United States would you say are "the better people"? This is a general query, I'm not aiming it at either commenter above.
If I take some economic data for 2021, then the result seems to be - get rid of the whites and the blacks, leave the USA to the Asians alone, since by household earnings at the highest rate recorded here, they are plainly the "better people":
Let's see what that would look like, population-wise:
So reducing down the population to something over 20,000,000. But well worth it, because that leaves just the better people and not the deadweight!
No idea why you want to keep just Asians regardless of their skills - a high-skill white/black is better than a low-skill Asian. And we don't need to starve anyone. In fact we can deprecate them with kindness: at puberty (and later should they choose it), offer to hook them up to electrodes that stimulate the pleasure centres of their brain/simulate a massive open world they get to virtually rule over, and leave them in bliss until they die of natural causes without leaving heirs (something like Nozick's experience machine).
In fact you can make this offer open to every single citizen and the people who refuse it will be enriched for better humans, just repeat this for a few generations and the dregs will have weeded themselves out.
Of course gene editing is also good and should be done alongside such a program.
I'm personally not at all afraid it'll break society; that would be one of the better outcomes. Personally I'm much more disturbed it may bend it into something unrecognizable before it breaks, at least in my lifetime.
There are very few outcomes of AI that seem positive. A curse on all who went into this field.
Maybe those protesting Stanford law students should be very concerned; now there looks to be a choice coming between "can do basic legal work" AI and taking on a clerk who last week was in a screaming frenzy yelling at a judge - who is that judge likely to pick? 😁
All this sounds really impressive, but I don't know how concerned, if at all, to be. I haven't been playing around with DALL-E or Sydney or any of the versions because I am simply not interested in getting a dumb machine to tell me lies. If the thing really is moving out of the realm of "dumb machine", what should I anticipate using it for? Or will the end result be something that the ordinary joe won't get anywhere near (the 'better search engines' thing was just a blip) and it'll be straight into playing the stock market to make money for the big finance firms?
If some people in the AI ethics crowd have their way, the clerk might be less disruptively progressive, so maybe I'd go with them.
The focus on not providing "disallowed content" worries me. As Orwell pointed out, some weapons increase the power of the masses (like rifles) and others increase the power of the elites (like bombers, nuclear weapons).
It'll do everything everywhere all at once. Expect search to include it by default in the next year or so. A spicier take: anywhere, especially online, that you interact with a stranger will become an interaction with an AI you are familiar with somehow. There will be continuity from the entity that you interact with on social media to the entity that secures your credit card. This entity will have a name and you will trust it.
Chance would be a fine thing, because I don't trust nobody 😁
Some anonymous fuckface pops up online and is all "Hey there Far, it's me your good old buddy Bill! Remember me from those scented candles you liked?"? They can take a hike, I don't know anybody named Bill and I don't have good old buddies, and hard sell just makes me dig my heels in and balk like a mule.
Do you trust your bank with your money? Part of what will make you trust it is that it won't be doing such hacky shit.
I trust my bank to receive my wages paid in and to let me withdraw money, and only because I have to trust it that much. I don't trust it or any entity popping up claiming to be a real person trying to get me to take out loans, invest in this, have you considered your retirement funds that, and all the rest of the shit they already try to sell me.
So an AI generated "Hi this is Bob, your personal financial adviser" will get treated the exact same way I already treat 'Bob' the real human from the call-centre who cold-calls me and tries to sell me shit. I don't care how friendly AI Bob sounds or how well it imitates "oh but you hurt my feelings by refusing to purchase this" - I have more sympathy for call-centre Bob who I know works on commission and gets dog's abuse yelled at him by supervisors if he doesn't make the set targets of sales from calls, but I'm still not buying.
I don't see why this would be the case.
There's already more anti-leftist material available than anyone could reasonably consume in a lifetime, and yet rightists continue to lose more and more ground to leftists with each passing year. Why would a few more petabytes of propaganda added to the already existing ocean of material move the needle? It doesn't seem like the raw volume of material is the issue.
So your thesis would have to rely on the assumption that GPT-4 is capable of generating uniquely persuasive propaganda, more persuasive than the existing human-generated material, and I haven't seen any evidence for this. If you were to, say, have GPT-4 write an essay arguing for some view that is unpopular among mottizens and post it here, I doubt it would actually change anyone's mind.
I don't think this is the case, but that's because sneaky and even crudely disguised dissident speech is actually so similar to official, allowed speech that I don't think the simplistic kinds of modelling/guardrails used for GPT chat will be able to tell the difference. The Daily Stormer frequently puts out headlines which wouldn't be out of place in The Forward, and there's so little consistency in non-problematic goodthink that preventing false positives will be impossible. How, exactly, is the AI going to tell the difference between a comment from a brave trans witch talking about how all rich white men need to die and an antisemitic screed talking about how we need to kill all the bankers/grabblers? To a large degree the difference between problematic speech and wholesome regime-defender speech depends on the identity of the speaker, and the moment you make that explicit the game is up and the notion that we aren't just dealing with tribalistic conflict-theorists evaporates into thin air.
I assumed that in this case we're dealing with sneaky, disguised dissident speech, which means that it would be coming from people who are not a priori assumed to be shitlords. When the system has access to all the data you're describing, you don't actually need the AI - you just round up all the people who go to LightningSpeedEnthusiast.com or read the DailyThunderbolt, and then all the people who start having increases in anxiety after the first purge. If you've already got access to those capabilities, then you don't need the AI. If you don't, then the AI is going to have big problems. That's not to say it won't be attempted - I can already see a project like this being a hilariously embarrassing boondoggle.
Using the conversion model here, based on its GRE scores:
GPT-3.5 Verbal IQ = 118
GPT-3.5 Quantitative IQ = 101
GPT-4 Verbal IQ = 144
GPT-4 Quantitative IQ = 130
It's inconsistent, though. It completely bombed the AMC 10/12.
I'd say the AMC is harder than the GRE where it counts (fluid g).
Of course the AMC is harder, but it did about as poorly as random guessing on the AMC 10. That's much worse than "quantitative IQ 130" level.
Less study material in its dataset I should think -- you would do well on the GRE too if you memorized every study guide on the internet.
Then that suggests that its GRE score is not an effective measure of its "IQ".
I have a vested interest in the GRE being an effective measure because it would make me rather high IQ lol.
So as a defense of the GRE, I will let it be known that ETS (the GRE's test maker) hires a suspiciously large number of psychometrics PhDs - the last time I checked, more than half the job openings were for psychometricians - and they know exactly what they are doing (making a socially acceptable IQ test). If it's bad at that, it's probably not for lack of trying.
And the fact that a language model has an advantage on an "IQ test", probably because of plentiful training data, doesn't imply that humans are prone to that failure (success) mode of the test - not because of a difference in kind but in magnitude: no human reads literal billions of tokens.
I don't know what the implications/inferences are. It's certainly interesting to me that an LLM can do non-first-order quantitative reasoning questions at all. It suggests that there is an overlap of language and quantitative reasoning in whatever space GPT is pulling its inferences from; it might even be universal.
Sure -- same goes for the tests that it did score well on though.
Ed: sorry, misread -- yes, that's exactly what I'm suggesting. All of the tests they are giving it are testing its ability to memorize study material from its training set -- which may be useful for some things, but is in no sense "intelligence", particularly not "intelligence" as in "Artificial General Intelligence".
Ideally one could test this by writing a test of pretty low difficulty level (say tenth grade non-advanced math) but with questions framed in a completely different way from the AP/SAT type stuff in the dataset. Then compare results with an actual tenth grader.
There's an updated version of that here, on which the score is even higher. There is probably plenty of online info on standardized tests and whatnot in its training set, though, if it comes from scraping the internet, so I doubt you can infer a ton about how "smart" it is in general from these. The revised conversion yields Verbal IQ = 146, Quant IQ = 135.5, overall IQ = 145.4.
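For anyone curious how exam results get turned into "IQ" figures like these, here is a minimal sketch of a percentile-to-IQ mapping, assuming section percentiles and a normal IQ scale with mean 100 and SD 15. The linked conversion presumably uses its own lookup tables, so treat this as illustrative only:

```python
from statistics import NormalDist

def percentile_to_iq(percentile: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Map a percentile rank (0-100) onto an IQ-style scale via the inverse normal CDF."""
    z = NormalDist().inv_cdf(percentile / 100.0)
    return mean + sd * z

# Hypothetical example: a section score reported at the 99th percentile
print(round(percentile_to_iq(99.0), 1))  # roughly 135 under these assumptions
```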
This has been an annoying aspect of LLM AI hype. There are plenty of indicators that something is going on, but many of these test results are not among them. If you train a model on the question sets and answer keys for repeatable, mechanically gradable exams like the SAT, GRE, or bar exam, then it should be expected that it will perform well on them.
What would be really nice is if whenever the AI produced content it had to also tell you the minimal edit distance between that content and some content (or a direct combination of contents) in its training set. That way you could have a good measure of how much original content it was actually producing vs. how much it was just paraphrasing its training set. Or at least it would be useful to have extensive data on the average edit distance between a response and some item in the corpus.
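To make the proposal concrete, here is a minimal sketch of one common choice of metric, a character-level Levenshtein distance. This is only one of many possible edit-distance definitions, and the replies below debate whether it is the right one:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]

# Hypothetical near-copy: only a handful of edits separate the two strings
print(levenshtein("The model passed the bar exam.",
                  "The model has passed a bar exam."))
```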
Your proposed method doesn't work - even if you just turned a query into an embedding, picked the closest text in its dataset, and then ran that text through Google Translate and back a few times to obfuscate word choice, order, and other things like that, it'd change enough that literal edit distance would still be very high.
An analogy to image models: here's a claimed example of taking inspiration from a particular photo in a training set. It's really not that close.
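A toy illustration of that objection, using made-up example strings and the standard library's SequenceMatcher as a stand-in for a character-level comparison: a rewrite that preserves the meaning still scores far below what a near-verbatim copy would, so a purely string-level metric would flag it as "original".

```python
from difflib import SequenceMatcher

original   = "GPT-4 can pass a bar exam and score a 5 on several AP exams."
paraphrase = "Several AP tests were aced by the model, which also cleared the bar."

# ratio() is a character-level similarity in [0, 1]; near-verbatim copies score
# close to 1.0, while this meaning-preserving rewrite scores much lower.
print(round(SequenceMatcher(None, original, paraphrase).ratio(), 2))
```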
Not really, no. Edit distance is relative to which operations count as primitive "edits" and the "cost" of each use of that operation. There are specific forms of edit distance of which what you are saying is true, but you could also have an edit distance where "run it through Google Translate" is a primitive edit operation. Obviously, you would have to pick the operations to fit the specific model, e.g. what external resources it has access to.
Okay, I assumed you meant character-level edit distance, because that's what the article you linked was exclusively about. But without that, 'edit distance' isn't really a useful term, as we don't know what a 'primitive operation' is in the context of an LLM, because we do not know much, if anything, about what they actually do internally.
Based on what I've seen, 'most similar example in training set' doesn't capture the extent to which LLMs memorize things. Even if they are memorizing a lot, it's memorization in a very complicated way - otherwise the 'write a story about X in the style of Y', where X and Y hadn't ever been done before, just wouldn't work.
How is this memorizing? Or this? Like, there's some extent to which it's memorizing things more than humans do, certainly. But it's a very vague sense, and positing a metric like edit distance that fully captures that sense just restates the problem, because we don't know what that metric is.
That isn’t true, the formal definition doesn’t restrict what operations there are (if you disagree then you should quote it and tell me where), it just requires that they be operations on strings. Whatever else using Google Translate is, it’s an operation on strings.
I’m not suggesting that we base primitive operations on what the LLM does internally. Your initial example wasn’t about what the LLM does internally, it was about “what if the LLM ran things through Google Translate a few times to trick you.” I’m saying you could supplement a more basic measure with additional operations to capture when the model has access to external programs like Google Translate.
I never said that the AI was just "memorizing" things, that's an obvious strawman. All I said was that edit distance would help you tell how much the AI was paraphrasing from its training set, which is entirely compatible with it paraphrasing very little. You seem to have misinterpreted me as saying that an edit distance would show the AI was paraphrasing a lot, when all I said was that an edit distance would be a helpful metric of how much it was. I didn't make any specific commitments about how much that would be across a wide range of cases. Nor did I suggest that an edit distance would fully capture things, just that it would give a good sense of how close things were.
A related tidbit: Khan Academy is rolling out personal tutors powered by GPT-4. I'm wondering how effective this will be: although personal tutors are probably the ideal way to learn, if kids don't believe they're interacting with a conscious human being whose approval they seek, will it be effective? A lot of what teachers actually provide is just baby sitting and a human adult for kids to look at as an authority, and I'm not sure kids will react in the same way with GPT-4. Though it's certainly a huge boon to self-motivated kids who want to learn.
ETA: The paper itself is available here: https://cdn.openai.com/papers/gpt-4.pdf. See page 32 and onwards for some examples of GPT-4 reasoning around visual inputs (charts, papers, memes).
Some tech professional working from home wrote that he needs a security guard with a club to watch him and hit him if he looks at his phone during work hours. That would greatly improve his productivity.
I believe that was me back at the old place. The context was that managers are mostly useless, because I don't need them to "pass on expertise" or "provide strategic direction" or have any domain knowledge at all (I'd have more than them anyway because I'm at the cliff face rather than diverting my attention into being a people-organiser). What I need is for someone to just shame me into work through the panopticon effect, which could be accomplished by a monkey in a security guard's hat.
Idea: TaskmasterGPT. A multimodal model that watches everything you do, identifies questionable websites and activities, and issues command sequences to a robotic arm to start lashing you if you browse The Motte while you should be working.