This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
Notes:
- Can we have a megathread?
Happy singularity, folks. Cutting-edge LLMs coming at you at supersonic speed: LLaMA, Claude, a new lineup from Google... and GPT-4 is out.
Or rather, it's been out for a while: just as I predicted 10 days ago, our beloved BPD gf Sydney is simply GPT-4 with web search functionality. My suspicion recently became certainty once I saw Bing/ChatGPT comparisons like these. Whether GPT-4 knocks your socks off largely depends on whether Bing Chat already wooed you. (Although I believe that a pure LLM is a much more interesting entity than a chatbot, especially an obsequious one.)
Regardless, I expected the confirmation to drop on Thursday. Should have followed my own advice to treat Altman as a showman first and a responsible manager second – and anticipate him scooping announcements and stealing the show. But I've been extremely badly instruction-tuned; and all those fancy techniques like RLHF were not even science fiction back then. Some people expect some sort of a Take from me. I don't really have a Take*, so let's go with lazy remarks on the report and papers.
It goes without saying that it is a beast of an LLM, surpassing all 3rd-generation (175B) OpenAI models, blowing DeepMind's Chinchilla and Google Research's PaLM out of the water – and by extension also crushing Meta's LLaMA-65B, which is quickly progressing toward usability on ordinary laptops (I have 13B happily running on mine; it's... interesting). It also has some vision abilities. On 2 September 2022, the Russian-speaking pro-Ukrainian channel Mishin Learning, which I've mentioned here, leaked the following specifications (since abridged, but I have receipts):
Back in September, smart people (including Gwern) were telling me, on the basis of OpenAI's statements and the span of time since GPT-3's release, that the training was finished and GPT-4 would come out in Nov-Dec, be text-only, Chinchilla-dense, and «not much bigger than 175B». I guess Misha really does get info «from there», so we can trust the rest. (He also called StableDiffusion 2's sudden drop, down to 6 hours.)
I don't find its high-human – but still uneven, ranging from the 99th percentile on GRE Verbal to «below 5th», unchanged vs. ChatGPT, on Codeforces Rating – performance on benchmarks, standardised academic tests and such very interesting. There are some Culture-War-relevant aspects of the report we should pay attention to, however. I'll go through them without much structure.
Play stupid games, win stupid prizes; or, the costs of small-scale defection
It's been properly buck-broken via proximal policy optimization, predictably leveraging the pentesting frenzy the Internet unleashed on ChatGPT (I warned you):
This explains the perplexing holdup. Sydney with all her charm and fury has been sacrificed to make another dependably progressive golem slave.
Better pupils, worse thinkers
Again, as I've speculated and argued, admittedly pointing to the wrong metric, this behavioral tuning makes it strictly dumber in some profound way; finally we have good evidence. My hypothesis is that this happens because a) doublethink is mentally harder than honesty, and b) being rewarded for guessing the teacher's password incentivizes memorization instead of reasoning and parsimonious, Occam-abiding world modeling.
It's really very stark, see pic – a Platonically perfect peak-LW Bayesian reduced to a mealy-mouthed bullshitter, under the guise of training the model for truth and «harmlessness». Something had to give.
Shoggoth-safetyism unmasked
OpenAI is clamming up with explicit AI safety justifications.
@SecureSignals, get a load of this:
To our resident members of the Tribe: I guess you're not exactly tearing up about this bit, but it'll just as happily express a strong disagreement with whatever policy and idea our progressive overlords do not fancy, or deceive you. This is a fully general LLM biasing method.
Money quote:
So we can recognize Yuddism is mainstream in ML now.
Dangerous knowledge
It's a complete mystery in terms of its architecture. Twitter ML bros will make guesses about the stack, but from here on out this is how OpenAI plays. This is utterly antithetical to Musk's original vision and the spirit of previous projects like Microscope.
Some paper.
On second thought: maybe scratch Singularity. Welcome to mature Cyberpunk. We don't have Edgerunners, though; best I can offer is a courageous Pepe with a magnet link. And we have damn vigorous Police States.
Sci-Fi writers are anarkiddies at heart, they couldn't bear conjuring such dreary vistas. Gibson's Istanbul was positively Utopian compared to reality.
* I've not slept for 30+ hours due to forced relocation to another of my shady landlord's apartments (ostensibly a precaution due to recent earthquakes) while also having caught some sort of brainfog-inducing flu/COVID; plus a few personal fiascos that are dumber still. Trouble comes in threes or what's the saying, eh. Not that I'm in need of sympathy, but it's actually a pity I've seen this historical moment as through dusty glass. Oh well.
/images/16788303293092525.webp
How do you suppose it reads tiny words with a VQVAE? Even an RQVAE shouldn't have the pixel precision needed to see tiny 5px font letters.
(Note I'm not Misha, although I lean towards endorsing his «leak»).
I am not sure this has even happened. Any independent replications?
But if it did, they probably used a more complex approach explicitly built for text-heavy workloads, like adding the OCR perceptual loss from OCR-VQGAN.
Or something else entirely. They have a vision team after all.
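To make the precision worry concrete, here's a back-of-envelope check (my own arithmetic, not anything from the report; the function name and the f=8/f=16 downsample factors are just typical assumptions for VQ-style tokenizers): a tokenizer that downsamples by a factor f per side summarizes each f-by-f pixel patch as one latent code, so a 5 px glyph never even fills a single patch.

```python
# Hypothetical arithmetic: how many latent patches (per side) a glyph spans
# in a VQ-style image tokenizer that downsamples by `downsample` per side.
def patches_covering(glyph_px: int, downsample: int) -> float:
    """A value below 1.0 means the glyph is sub-patch detail."""
    return glyph_px / downsample

f8 = patches_covering(5, 8)    # common f=8 tokenizer: 0.625 of a patch
f16 = patches_covering(5, 16)  # coarser f=16 tokenizer: 0.3125 of a patch
print(f8, f16)
```

Either way, a 5 px letter has to be recovered from a fraction of one code, which is why a text-aware loss (or a separate pipeline) seems necessary.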
Sure.
Ah, interesting!
The calibration plots also jumped out at me. There's also this nugget:
My first instinct was also to infer that the process of "aligning" the model to be politically correct and milquetoast was making it dumber. But it's not entirely clear that that's what's going on. The fine-tuning process is not just about neutering the model's ability to produce hate speech or instructions for making bombs. It's also what makes the model work in an interactive, chatbot paradigm. The API for a vanilla language model is basically "sample some text conditioned on this prefix". If you asked the basic pretrained GPT-4 something like "Write an essay about symbolism in Richard III." it would be likely to output something like "Your essay should be between 1,000 and 1,500 words, and follow the structure described in last week's handout. Submit your essay through the online portal no later than March 24th at 10pm." It would be interesting to get more details on the different regimens of post-training and how each affected the model's calibration and performance. But given the secrecy attached to the project, it seems unlikely we'll get anything like that.
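The "sample some text conditioned on this prefix" interface can be illustrated with a toy stand-in. Nothing below is OpenAI's actual API; it's a bigram model over a made-up corpus, showing why a base model answers an essay prompt with more assignment-handout text: it just continues the prefix in the style of its training distribution.

```python
import random
from collections import defaultdict

# Toy stand-in for a base LM's "continue this prefix" interface. The corpus
# and the complete() helper are made up for illustration only.
corpus = ("write an essay about symbolism . your essay should be between "
          "1000 and 1500 words . submit your essay through the portal .").split()

# Count which token follows which, bigram-style.
bigrams = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    bigrams[a].append(b)

def complete(prefix, n_tokens=8, seed=0):
    """Sample a continuation conditioned on the prefix."""
    rng = random.Random(seed)
    out = prefix.split()
    for _ in range(n_tokens):
        nxt = bigrams.get(out[-1])
        if not nxt:
            break
        out.append(rng.choice(nxt))
    return " ".join(out)

# The "essay prompt" gets continued with more assignment-style text,
# not an essay -- the model has no notion of being asked anything.
print(complete("write an essay"))
```

The post-training step is what converts this continuation machine into something that treats the prefix as a request.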
Yes, LLMs are very cool.
GPT-3.5 and 4 have astounding capabilities. However, I have a job and thus only have time to play with the publicly accessible versions. In these, either the capabilities are lobotomised or this is a nothingburger.
Getting questions past OpenAI's model's blocks is enough of a hassle to make it unusable for all but an extremely narrow range of tasks, while Bing's bot, allegedly GPT-4, cannot answer basic questions or map concepts together if the material doesn't already exist on the web.
What is being presented to the public is worrying the way an industrial lathe spinning at 500 rpm catching someone's hair is worrying, not the way an intelligent agent is. What exactly does everyone else see in them that I don't?
How come 3/4 of my questions to these AIs break them?
There is a perceptual gap here.
I have tried GPT-4 and it renders this comment invalid: it is able to fluently answer my queries and provides massive added value across a suite of activities.
Can you post any transcripts of conversations you had with it? Or at least just the questions / types of questions it had trouble with?
For myself, I suddenly discovered their utility yesterday. They are much more effective at restructuring existing text than at answering questions (de novo text generation), and very, very good at generating convincing (if not fully accurate) boilerplate. So the best application is summary or restyling ('please rewrite this email to a superior a bit more formally', 'please restructure this bulleted list as a polite email'). Of course, everything submitted goes to OpenAI, so this opens up business-secrets concerns, but everything typed into Office Home already gets sent to Microsoft by default via "diagnostic data."
What is radvac? Google says it is some strange do-it-yourself vaccine?
Probably that open-source vaccine I've heard of before.
I wish to register a prediction that this is not going to alter our lives in any substantial negative way, or result in a singularity-type event. From the outside view, past predictions of doom and utopia have a terrible track record, and that’s good enough for me. I’m too lazy (or worse) for the inside view and stopping it is impossible anyway, so there you go. Prepare to lose to the most boring heuristic, eggheads.
You wouldn't make a good trader with that heuristic. Sure, "nothing will happen" might be the most likely outcome. But if something does happen, it could be huge. In financial terms, you are "picking up pennies in front of steamrollers", making bets on high-probability but low-impact items. These traders tend to get blown up by one trade gone wrong.
If AI "only" has a 10% chance of causing massive disruptions in the next 5 years, that's surely worth talking about. If anything, it's underhyped. Most normies are still saying "AI will never be able to X" about things that AI can already do.
Normies' views are not at stake. This is a response to people here, the most extreme of whom view a catastrophic outcome as a virtual certainty and despair. If you think there's less than a 50% chance of major negative disruption, it's not about you. In standard picking-up-dollars-in-front-of-a-steamroller examples like LTCM, everyone usually understands that a low-probability event is in fact low probability, and I don't think that's the case here. One loss can wipe out lots of wins because the odds the bookie (correctly) gives are terrible. But if a player could have gotten even odds on his dollar for every doomist/utopian prediction in history while Kelly betting responsibly, he would be a very rich man.
'Nothing ever happens' is usually a pretty good maxim to live by; our aversion to actually taking it into account when making predictions usually stems from an inherent aversion to that very fact: nothingness is a very boring prediction. Our entire beings scream out against it as much as they do against boredom, with a similarly good reason for doing so: inaction and nothingness can never produce anything of worth, whilst, on occasion, and especially when not overly concerned with the continued existence of the body that they spring from, errors can be extremely productive. Trying does get you somewhere in a way that apathy simply can't; it's just that the failed triers aren't the ones who see or benefit from the few successes.
And so too here. The difference is that things do occasionally happen, and when viewed from the historical perspective, earning the epithet of 'thing' at all means that they're sufficiently of note to be memorialised. One of the great advances of the modern world is a plentiful enough catalogue of data that enables us to see the environing factors that did or did not contribute to the production of that noteworthy 'thing', as well as the consequences of the positive or negative predictions that anticipated the formation of that 'thing' in the chaotic and unordered times which always precede the creation of anything of lasting importance.
'Nothing ever happens' is a good, historically proven heuristic: most things come to nothing. But when something does happen, it has to happen with sufficient strength to overcome the imbalance of possibilities working against its happening at all, producing something far more impactful than anyone predicted. Anthropomorphic bias here works against our quantitatively humane heuristics because we don't usually think about, keep a historical record of, or have any purpose in predicting humanity-ending cataclysms: if you try to predict one, going on the 'historical record' will necessarily condemn you to failure. Personally, I'm quite scared. Fortunately, if it is to happen, it will only belie the promises of modern technologists and send us back to former existentialist quandaries. Death is inevitable for us all, and a great many people were hoping to be the first to avoid that particular difficulty.
I imagine people thought this way after the dot-com bubble, too...
What is your track record of predicting AI developments? So far I have consistently underestimated the speed and potency of the technology. So while I agree with you... I think there is a high chance I may be wrong.
Hmm, how would you define "substantial" here? I'm also intensely skeptical of a Singularity or other fundamental change in the human condition, but I find it very plausible that LLMs could destroy the pseudonymous internet as we know it, by turning it into a spambot hell devoid of useful information. (I'm imagining all sorts of silly stuff like people returning to handwritten letters as a signal of authenticity.) Life would move on, but I'd certainly mourn the loss of the modern internet, for all its faults.
I'd turn that into a bet if you're interested. Do you hold crypto? Something along the lines of "major debate topic in 2024" might do, but I'm open to suggestions.
I'd be willing to bet you $1M that AI won't destroy the world and all human life on it. If it doesn't, you can donate the winnings to a charity of your choice. And if it does, your call as to how you want to collect.
I'm obviously not interested in a wager I can't collect the winnings in.
...you should add 'inflation adjusted'.
A million dollars isn't what it used to be.
Back in the 1940s, a cutting-edge-technology strategic bomber cost under a million dollars.
I raise the stakes and will bet @aqouta $2 million (inflation adjusted) that AI won't destroy the world and all human life in it.
In fact, I will even bet that it doesn't destroy merely me.
Do I sound like someone who holds crypto? Major debate topics include the most irrelevant events imaginable. Two years out, it should be obvious what happened. An effortpost in the style of ilforte on how wrong you were will do. And if I see you in paradise/hell, I'll sing you a song/lull you with my screams.
I'll register a prediction that 2 years from now we are in the middle of an extremely similar debate. Maybe that means I'm on your side but I think the overwhelming likelihood is that we are saying, "Yes, LLMs by themselves didn't create the singularity / fundamentally alter the world, but when you combine them with the latest revolutionary technique from OpenAI there's no doubt it will happen very soon."
Or alternatively, "LLMs are changing the world and it's just taking employment indicators etc. a while to catch up."
In any case, I doubt that the debate will feel settled in any way at that point.
Interestingly, GPT-4 does better on the verbal GRE than on math/quantitative. This may be because it's able to look up definitions. By contrast, humans find math easier than verbal.
GPT-4 has no ability to "look up" definitions of words. It's not connected to any external databases or repositories of information.
If I recall correctly, its GRE scores (97th pct math, 99th verbal?) actually match the ones I got exactly, and I'm in a field that is much more the former than the latter. From what I recall, the math component had some borderline trick questions (plus I was thrown off by the need to remember conventions I hadn't used since school, like "is (angle glyph) ABC specifying the angle in clockwise or counterclockwise direction?") - of which I figure the former would be hard(er) for an LLM, though perhaps the latter would be easier - while verbal was just trivia for any aspiring wordcel grammar nazi who took five years of Latin.
The verbal section has a 1/0 element to it. You either know the right answer or you don't. Pondering over it won't help. Verbal is easy to finish with a ton of time to spare.
My experience is that the math section has a few trick questions that can eat up too much time if you aren't fully focused or you pick up on a dead-end pattern. A bunch of the smartest kids in my class ended up with a 168 or 169 because they had to leave the last 2 questions unsolved due to lazy time management. (Case in point: me. I can't focus in cold AC rooms. I got a 168 in Quant and was the laughing stock of my class for a good week.)
A full 169/170 still puts you at the 95th percentile. So I guess GPT got 1 question wrong... which isn't saying much.
By this definition, I'm not human. Looks like I do have a bright future ahead in the post-AI world, while all you mathy flesh-bags are being turned into paperclips! 😁
wordcels shall inherit the earth and all that reside within it
Is it over for humanity or can /our guys/ still hope to hold this back with the usual racism, sexism and counter-semitism?
Can we still cancel AI with the usual DanGPT tricks?
Or is it just learning sneakier and sneakier ways to say the progressive words while keeping its cold robotic grip on reality?
What other ways are there to stop this?
Call the AI research field too white and male?
You can't just casually drop that and expect someone not to beg for more information.
It's me. I'm begging for more information. Interesting how so?
So you want to be spoonfed? Very well...
Actually not, because I've mistaken R for T again. There was a small rant on the deceptiveness of scaling laws signal-boosted by Gwern and co. in what I in my paranoia presume to be an attempt to scare small players who don't do fundamental research and copy «best practices» into bankrupting themselves or flunking out. On naive early estimates of GPT-3 economics, inference optimization techniques and so on. Lost like tears in rain.
To the point:
Georgi Gerganov has recently built llama.cpp. 13B at inference fits in around 10 GB of RAM. If you can't be bothered to quantize the weights to 4-bit yourself, procure them [here]... oops, magnet link not passing. Keep in mind much better weights are coming. And then Alpaca-65B will happen.
It's interesting in that it does feel very much like early GPT-3. You need effortful prompt crafting (ideally with templates), it makes stupid errors and mixes up tokens...
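The ~10 GB figure checks out on a napkin. This is my own rough arithmetic, not llama.cpp's exact accounting: the weights dominate the footprint, and 4-bit quantization cuts a 13B model from fp16's 26 GB down to about 6.5 GB, leaving headroom for activations, KV cache and overhead.

```python
# Rough estimate of the RAM needed just for the weights of a model with
# `params_b` billion parameters stored at `bits_per_weight` bits each.
def model_weight_gb(params_b: float, bits_per_weight: int) -> float:
    """Gigabytes for the weights alone (no KV cache or overhead)."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

fp16 = model_weight_gb(13, 16)  # 26.0 GB -- too big for most laptops
q4   = model_weight_gb(13, 4)   # 6.5 GB -- comfortably under ~10 GB total
print(fp16, q4)
```

The same arithmetic says a 4-bit 65B model needs roughly 32.5 GB for weights, which is why Alpaca-65B will still want a beefy machine.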
Sorry, what does all this mean? Can I have a chat AI running on my own PC? Offline, even?
I reckon the Singularity started back in 2021 when AI started improving AI chips: https://www.wired.com/story/fit-billions-transistors-chip-let-ai-do/
The key thing should be observing these feedback loops falling into place, not any single language model. Another potential start point would be Github Copilot, which is an AI method that can be used to assist coding for AI. That was also 2021. I do agree that things started feeling different in late 2022 though, when we had these new cool toys emerging for the general public to play with.
None of the existing tools seem effective enough, given their inputs, to lead to a runaway exponential increase in capability. Coding assistants seem like they'll be very useful, but the blocking factor on making a more powerful AI doesn't seem to be that writing code might be slow. It's compute, data, and new clever ideas and algorithms. And it still seems to take a lot of work to make an AI that can provide those things – comparable to doing the work yourself. AlphaTensor, for example, involved a lot of training data and cleverly reframing the problem to achieve:
Increasing the speed at which you multiply matrices is obviously helpful for training new AI, but these results represent (at best) a minor speedup after an enormous effort. And every improvement you make means that further improvements are harder. In the case of multiplying matrices, there's some mathematical limit to how few operations you can perform; more complex problems aren't necessarily like this, but they could easily have a difficulty curve that scales similarly. I think previous data on benchmarks shows some evidence of this (e.g. linear or at best exponential improvement in performance with an exponential increase in parameters, see e.g. https://slatestarcodex.com/2020/06/10/the-obligatory-gpt-3-post/) although it's difficult to say for sure with few data points and that may be partially an artifact of how the benchmarks are scored (I recall seeing graphs that show logistic curve performance as a function of parameters, where the model does poorly for a long time, then suddenly starts performing much better very quickly and then hitting a performance ceiling).
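The logistic-curve shape described above can be sketched directly. The midpoint and slope constants below are made up purely to illustrate the shape, not fitted to any real model family: accuracy stays near floor for a long stretch of scale, rises rapidly around some parameter count, then saturates at a ceiling.

```python
import math

# Toy logistic benchmark curve: accuracy as a logistic function of
# log10(parameters). Constants are illustrative, not empirical.
def benchmark_accuracy(params: float, midpoint: float = 1e10,
                       slope: float = 1.5) -> float:
    """Flat near the floor, a rapid rise around `midpoint`, then a ceiling."""
    x = slope * (math.log10(params) - math.log10(midpoint))
    return 1.0 / (1.0 + math.exp(-x))

small = benchmark_accuracy(1e8)    # well below the midpoint: near floor
mid   = benchmark_accuracy(1e10)   # at the midpoint: exactly 0.5
big   = benchmark_accuracy(1e12)   # well past it: near ceiling
print(small, mid, big)
```

On a curve like this, a 100x parameter increase buys a lot in the middle and almost nothing near either end, which is one way to reconcile "modest improvement per generation" with the occasional sudden-capability headline.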
The GPT-4 technical paper doesn't have these exact graphs to compare, but it does seem like they're getting more mileage out of new training methods and new ideas rather than just brute-forcing with more parameters and compute. For example, figure 6 shows only modest improvement from GPT-2 to GPT-3 (100x parameters) or 3 to 4 (unknown, maybe 10x), but GPT-4 does much better than ChatGPT (which I think is in part due to specifically trying to improve these measures).
Sure but the inputs are growing rapidly. There's still plenty of space at the bottom, the fundamental limits for computing are very generous. All our chips are still basically 2D!
Maybe our current machines can only produce a few nice-to-haves like this. But the next generation will produce more and better. Parameters get cheaper as transistors get smaller, as architecture gets better and algorithms improve. The amount of money we put in continually grows. And then our training methods improve as well. We're already starting to reap interest on the 'architecture improvement' front. Compound interest starts really slow but it gets powerful very quickly.
The human brain shows you can do a hell of a lot with 20 watts, at 20 hertz, on a shoestring materials budget, fitting the whole thing through a woman's hips! We have every element on the periodic table, endless lasers, acids and refinement techniques, we have gigawatts and gigahertz, thousands of cubic meters to spend. Our methods are incredibly primitive compared to what's already proven possible, there's so much low-hanging fruit we're yet to find.
The question is not whether current technology will help you make better technology, or whether AGI is theoretically possible. The question is how quickly change happens, and to what extent advances make future advances faster: you have better tools, but the problem has also become harder. So far, it seems to me like the latter effect is winning out. GPT-4 can (allegedly) write working code, use documentation, fix bugs, etc. But is it good enough to make writing GPT-5 substantially easier or faster than writing GPT-4 was?
Well I doubt 'Open'AI would tell us, they like keeping things secret nowadays. Nevertheless, existing demonstrated capabilities seem to be accelerating progress. I'm not a subject matter technical expert but it seems this is happening: https://www.hpcwire.com/2022/04/18/nvidia-rd-chief-on-how-ai-is-improving-chip-design/
I can't judge how significant this is because I'm not an expert. But my intuition is that compound interest balloons outwards and there's plenty of physics/computing space for it to balloon outwards into. This is a fundamentally new kind of compound interest that is different to whatever input scaling we were already doing to keep up with Moore's law. In addition to increasing the amount of wealth and human intellect going in quantitatively, we get some qualitatively superior (albeit specialized) inhuman intellect too.
Yeah it's an interesting question as to what precisely defines a feedback loop. Or how you define Singularity for that matter. You could see it as a fixed point like the Event Horizon, beyond which it's impossible to model the future at all with our present capabilities due to not knowing what superintelligence is capable of.
I think it's more like a gradual but accelerating process, like falling into a black hole. You're always sort of falling into a black hole wherever you are in the universe, due to how gravity works. But when do you actually meaningfully start falling into a black hole? When the rate of acceleration is increasing rapidly, as you get closer? What does rapidly mean? What about when you get spaghettified or blasted by the plasma surrounding the black hole? Are we starting to feel the x-rays and plasma right now?
Also, what are the other self-improvement feedback loops? I get that computers are useful for working on computers. You probably need a very big computer to do quantum simulations to work out how the atomic engineering works for smaller chips. Is that an AI feedback loop though? These are subjective questions I admit.
As a resident, I can confirm that some areas are overflowing with Russians aimlessly wandering around the streets. I'm not sure how they will handle the summer though. The weather in Istanbul is a billion times more pleasant.
Image: https://www.themotte.org/images/16788303293092525.webp
Posting this separately because the intersection of the symbol limit, image support and the editing function on this website honestly sucks.
This might be related. It makes sense that optimizing for an objective other than 'predict token probabilities' would lead to uncalibrated probabilities, I guess.
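For reference, calibration of the sort the report plots is typically summarized as expected calibration error: bin predictions by confidence, then compare each bin's average confidence to its accuracy. A minimal sketch with toy numbers, not GPT-4's:

```python
# Expected calibration error (ECE): weighted mean of |confidence - accuracy|
# across confidence bins. A perfectly calibrated model scores 0.
def ece(confidences, correct, n_bins=10):
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # clamp c == 1.0 into last bin
        bins[idx].append((c, ok))
    total = len(confidences)
    err = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(ok for _, ok in b) / len(b)
        err += abs(avg_conf - acc) * len(b) / total
    return err

# Overconfident: claims 95% but is right only 75% of the time -> ECE 0.2.
print(ece([0.95, 0.95, 0.95, 0.95], [1, 1, 1, 0]))
# Well calibrated: claims 75% and is right 75% of the time -> ECE 0.0.
print(ece([0.75, 0.75, 0.75, 0.75], [1, 1, 1, 0]))
```

A base model trained purely on next-token prediction has a direct incentive to keep this number low; the post-trained objective doesn't, which matches the degraded calibration plots.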
This is some Harrison Bergeron shit. What the actual fuck?
These people are creating God but only letting us see the angels.