And killing somebody after infection is the easy part - somehow spreading to "every single person", or even a significant fraction, is a million times harder. People really underestimate how hard it is to build superweapons that can end civilization (it's easy in the movies!). I think if there are going to be problems with widespread unfiltered AI, it'll be because a large number of unstable individuals become individually capable of killing thousands of people, rather than a few people managing to kill billions.
This is exactly @jeroboam's point - you say "AI is a junior engineer" as if that's some sort of insult, rather than unbelievably friggin' miraculous. In 2020, predicting "in 2025, AI will be able to code as well as a junior engineer" would have singled you out as a ridiculous sci-fi AI optimist. If we could only attach generators to the AI goalposts as they zoom into the distance, it would help pay for some of the training power costs... :)
It's weird and surprising that current AI works differently enough from us that it's gone superhuman in some ways while remaining subhuman in others. We'd all thought that AGI would be unmistakable when it arrived, but the reality seems to be much fuzzier than that. Still, we're living in amazing times.
Great post. But I'm pessimistic; Scott's posted about how EA is positively addicted to criticizing itself, but the trans movement is definitely not like that. You Shall Not Question the orthodox heterodoxy. People like Ziz may look ridiculous and act delusional (dangerously so, in retrospect), but it wouldn't be "kind" to point that out!
When I go to rationalist meetups, I actually think declaring myself to be a creationist would be met more warmly than declaring that biology is real and there's no such thing as a "female brain in a male body". (Hell, I bet people would be enthused at getting to argue with a creationist.) Because of this, I have no way to know whether 10% or 90% of the people around me are reasonable and won't declare me an Enemy of the People for saying unfashionable true things. If it really is 90% ... well, maybe there's hope. We'd just need a phase change where it becomes common knowledge that most people are anti-communist gender-realist.
Haven't read BoC, but a LitRPG parody I quite enjoyed was "This Quest is Bullshit!".
I didn't really like The Three-Body Problem either, but at least I could tell it had a professional author and professional editing.
Yeah, I thought it was bad sci-fi (if you judge it fairly, not putting your thumb on the scale because it's basically the only breakout Chinese sci-fi novel that exists). But it wasn't badly written, at least by the standards of English sci-fi.
The clueless people who made Last Wish really messed up. They were supposed to make a soulless by-the-numbers sequel to a forgettable spinoff of an overrated series. Instead they made one of the best animated films in years, better than anything Pixar's done since Coco. I sure hope somebody got fired for that.
They do not have a cognitive architecture that resembles human neurology. In terms of memory, they have a short-term memory and a long-term one, but the two are entirely separate, with no way to consolidate one into the other outside of the training phase. The closest human analogue would be a neurological defect that prevents the consolidation of long-term memories.
Insofar as any analogy is really going to help us understand how LLMs think, I still think this is a little off. I don't believe their context window really behaves in the same way as "short-term memory" does for us. When I'm thinking about a problem, I can send impressions and abstract concepts swirling around in my mind - whereas an LLM can only output more words for the next pass of the token predictor. If we somehow allowed the context window to consist of full embeddings rather than mere tokens, then I'd believe there was more of a short-term thought process going on.
I've heard LLM thinking described as "reflex", and that seems very accurate to me, since there's no intent and only a few brief layers of abstract thought (ie, embedding transformations) behind the words it produces. Because it's a simulated brain, we can read its thoughts and, quantum-magically, pick the word that it would be least surprised to see next (just like smurf how your brain kind of needle scratches at the word "smurf" there). What's unexpected, of course - what totally threw me for a loop back when GPT3 and then ChatGPT shocked us all - is that this "reflex" performs so much better than what we humans could manage with a similar handicap.
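To make the "least surprised" bit concrete, here's a toy sketch of greedy next-token selection - made-up vocabulary and scores, nothing to do with any real model's internals:

```python
# Toy illustration of greedy next-token selection: the "least surprising"
# word is simply the one the model assigns the highest probability.
# Vocabulary and logits here are invented for the example.
import numpy as np

vocab = ["the", "cat", "sat", "smurf"]
logits = np.array([2.1, 0.3, 1.7, -4.0])   # hypothetical raw model scores

probs = np.exp(logits - logits.max())
probs /= probs.sum()                        # softmax -> probabilities

next_token = vocab[int(np.argmax(probs))]   # greedy pick: "the"
print(next_token, probs)
```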
The real belief I've updated over the last couple of years is that language is easier than we thought, and we're not particularly good at it. It's too new for humans to really have evolved our brains for it; maybe it just happened that a brain that hunts really really well is also pretty good at picking up language as a side hobby. For decades we thought an AI passing the Turing test, and then understanding the world well enough to participate in human civilization, would require a similar level of complexity to our brain. In reality, it actually seems to require many orders of magnitude less. (And I strongly suspect that running the LLM next-token prediction algorithm is not a very efficient way to create a neural net that can communicate with us - it's just the only way we've discovered so far.)
So, I had a little more success than you last year, and you can see my transcript here. Part of the reason is that I didn't give it a minimal prompt. Try to give it full context for what it's doing - this is an LLM, not a Google search, and brevity hurts it. And don't "help" it by removing the story from the problem - after all, English comprehension is its strength. Tell it, up front, exactly how you're going to interact with it: it can "think step by step", it can try some experiments on its own, but you won't help it in any way. The only thing you'll do is run the code it gives you, or submit an answer to the site, telling it the (exact) error message that AoC generates.
To reiterate, give it all the information a human solving AoC would have. That's the fairest test.
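If it helps to see the shape of what I mean, here's a rough sketch, written against the OpenAI Python client (1.x style) rather than the ChatGPT UI I actually used; the model name, file path, and prompt wording are illustrative placeholders, not my real prompt:

```python
# Rough sketch of a "full context" setup for an AoC run. Assumes the OpenAI
# Python client (1.x) and an API key in the environment; model name, file
# path, and prompt text are placeholders.
from openai import OpenAI

client = OpenAI()

system = (
    "You are solving an Advent of Code puzzle on your own. Think step by step, "
    "and run any experiments you like. I will not help you in any way: I will "
    "only run the code you give me, or submit an answer to the site and report "
    "back the exact message it returns."
)

puzzle = open("day03.txt").read()  # the full, unedited problem statement + input

resp = client.chat.completions.create(
    model="gpt-4-turbo",  # placeholder; use whatever model you're testing
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": puzzle},
    ],
)
print(resp.choices[0].message.content)
```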
My prediction is that o1 will do better (of course), maybe solving a few in the day 10-20 range. However, I think it'll still have problems with certain problems, and with debugging, especially when text output (or a diagram in the input) needs to be parsed character-by-character. This is a fundamental problem with LLMs: textual output that looks well-formatted and readable to us is fed into the LLM as a gobbledegook mixture of tokens, and it just has no clue how to process it (but, sadly, pretends that it can). This is related to how they have trouble with anagrams or spelling questions (e.g. how many Rs are in "strawberry"). I wonder if there's some way we could process text output so it tokenizes properly.
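If you want to see the mismatch for yourself, here's a minimal sketch (assuming the tiktoken library and its cl100k_base encoding, the one used by GPT-4-era models):

```python
# A minimal sketch of the tokenization problem: the model never sees letters,
# only integer token IDs, so character-level questions force it to reason
# about chunk boundaries it can't actually inspect.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("strawberry")
print(tokens)                             # a few integer IDs, not ten letters
print([enc.decode([t]) for t in tokens])  # the multi-character chunks the model "sees"
```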
For something based on 20-year-old (if not older) tech, it retains excellent particle effects and highlights. And the capship scales... so good. Fighting around a Hecate class in a nebula remains a standout experience. Irritating, but standout.
Not to mention the plot! Wing Commander got all the accolades (and even name-brand actors), but most of its story was melodrama. It never managed to convey the same sense of desperation as Freespace, of fighting a true all-out war for humanity's survival against impossible odds.
I'd say a steelmanning of the Yuddite view is this: "Yes, we along with everyone else did not predict that LLMs could be so powerful. They do not fit our model of an agentic, recursively self-improving AI driven by reward signals, and even a superintelligent LLM is likely to super-understand and do what its creator wants (which is still a risk, but of a different kind). However, it would be a mistake to extrapolate from these last few years, in which LLMs have led the AI race, and assume that this will continue indefinitely. It is still possible that agentic AIs will once again surpass predictive models in the short-to-mid-term future, so there is still risk of FOOM and we need to keep studying them."
I've spoken with some doomers who have this level of intellectual humility. I can't imagine seeing it from Yudkowsky himself, sadly.
Any layman can tell you that the airplane flies.
And that's the point. That's the one, last, important step that (much of) science is lacking. Have you built something that works AT ALL? It's not that engineering doesn't suck. It's that modern "science" is even worse, because so much of its product (random unreplicated research papers, written on esoteric subjects, skimmed by friendly peer reviewers and read by nobody else) never needs to pass that final filter.
Citation needed...? It's a little hard to ask the pig. And even if true, should I care overmuch that the pig "feels stressed" for the last hour of its life? Humans go through worse (to say nothing of how animals die in nature!). If you want me to care about animal welfare, you should focus on the part that really matters - the life the pig lived - rather than the lurid, but ultimately unimportant, details of its death.
"Brutally" slaughtering a pig in "disgusting" "industrial" conditions? Those are very subjective words. The pig doesn't care that it's not being given a dignified sendoff by its loving family at the end of a fulfilled life in a beautiful grassy glade with dandelions wafting in the breeze. Humans fear death; animals don't even understand the concept. As long as we kill them quickly, I really don't give a shit how it's done.
Which isn't to say I don't have concerns about factory farming. The rest of the pig's life may be filled with suffering, and (IMO) we're rich enough, as a society, to do better. My morality-o-meter is ok with sacrificing, say, 0.01% of value to humans to improve the life of pigs by 500%.
Maybe I'm missing some brilliant research out there, but my impression is we scientifically understand what "pain" actually is about as well as we understand what "consciousness" actually is. If you run a client app and it tries and fails to contact a server, is that "pain"? If you give an LLM some text that makes very little sense so it outputs gibberish, is it feeling "pain"? Seems like you could potentially draw out a spectrum of frustrated complex systems that includes silly examples like those all the way up to mosquitos, shrimp, octopuses, cattle, pigs, and humans.
It'd be nice if we could figure out a reasonable compromise for how "complex" a brain needs to be before its pain matters. It really seems like shrimp or insects should fall below that line. But it's like abortion limits - you should pick SOME value in the middle somewhere (it's ridiculous to go all the way to the extremes), but that doesn't mean it's the only correct moral choice.
I agree: when I worked at Google, I remember their security measures being extremely well thought out - so much better than the lax approach most tech companies take. What I DON'T trust is their ideological capture. They won't abuse people's information by accident, but I won't be surprised if they start doing it on purpose to their outgroup. And they have the tools to do it en masse.
On most forums, if you're a bad actor waging the culture war, it's probably a decent strategy to post a bunch of links like this that are ridiculous non-sequiturs. Most people are too lazy to follow them and have the (usually reasonable) assumption that what's said in them is being accurately represented. Fortunately, I think The Motte is better than that. Looking forward to guesswho's inevitable (re-)permabanning. We need good leftist posters, but he's not one.
I'm far from an expert (and I doubt anyone else in this thread is either), but I'm not sure I really agree with your "extremely dangerous" assessment. Lots of things have a 100% kill rate. Like, congratulations, they've reinvented rabies? A virus that represents a serious risk to society needs to combine a number of unlikely factors, and "killing the host" is probably the easy part. (Ironically, after a certain point, high lethality makes a virus less threatening - a virus's host needs to survive to spread it on!) To truly threaten civilization, you'd have to combine it with a long asymptomatic but highly contagious incubation period.
Of course, because the media are idiots, the article you linked mentions the "surprisingly rapid" death of the mice as if that's supposed to make it more, not less, scary. Ah, journalists, never change.
Ugh, what a ridiculous take. The ability to move a body and process senses and learn behaviour that generates food is miraculous, yes. We can't build machines that come close to this yet. It's amazing that birds can do it! And humans! And cats, dogs, pigs, mice, ants, mosquitos, and 80 million other species too. Gosh, wow, I'm so agog at the numinous wondrousness of nature.
That doesn't make it intelligence. Humans are special. Intelligence is special. Until transformers and LLMs, every single story, coherent conversation, and, yes, Advent of Code solution was the creation of a human being. Even if all development stops here, even if LLMs never get smarter and these chatbots continue to have weird failure modes for you to sneer at, something fundamental has changed in the world.
Do you think you're being super deep by redefining intelligence as "doing what birds can do?" I'd expect that from a stoner, not from a long-standing mottizen. Words MEAN things, you know. If you'd rather change your vocabulary than your mind, I don't think we have anything more to discuss.
Wow, you're really doubling down on that link to a video of a bird fishing with bread. And in your mind, this is somehow comparable to holding a complex conversation and solving Advent of Code problems. I honestly don't know what to say to that.
Really, the only metric that I need is that ChatGPT makes me more productive in my job and personal projects. If you think that's "unreasonably low", well, I hope our eventual AI Overlords manage to meet your stringent requirements. The rest of the human race won't care.
Then I tried it on Day 7 (adjusting the prompt slightly and letting it just use Code Interpreter on its own). It figured out what it was doing wrong on Part 1 and got it on the second try. Then it did proceed to try a bunch of different things (including some diagnostic output!) and spin and fail on Part 2 without ever finding its bug. Still, this is better than your result, and the things it was trying sure look like "debugging" to me. More evidence that it could do better with different prompting and the right environment.
EDIT: Heh, I added a bit more to the transcript, prodding ChatGPT to see if we could debug together. It produced some test cases to try, but failed pretty hilariously at analyzing the test cases manually. It weakens my argument a bit, but it's interesting enough to include anyway.
So, I gave this a bit of a try myself on Day 3, which ChatGPT failed in your test and on YouTube. While I appreciate that you framed this as a scientific experiment with unvarying prompts and strict objective rules, you're handicapping it compared to a human, who has more freedom to play around. Given this, I think your conclusion that it can't debug is a bit too strong.
I wanted to give it more of the flexibility of a human programmer solving AoC, so I made it clear up front that it should brainstorm (I used the magic "think step by step" phrase) and iterate, only using me to try to submit solutions to the site. Then I followed its instructions as it tried to solve the tasks. This is subjective and still pretty awkward, and there was confusion over whether it or I should be running the code; I'm sure there's a better way to give it the proper AoC solving experience. But it was good enough for one test. :) I'd call it a partial success: it thought through possible issues and figured out the two things it was doing wrong on Day 3 Part 1, and got the correct answer on the third try (and then got Part 2 with no issues). The failure, though, is that it never seemed to realize it could use the example in the problem statement to help debug its solution (and I didn't tell it).
Anyway, the transcript's here, if you want to see ChatGPT4 troubleshooting its solution. It didn't use debug output, but it did "think" (whatever that means) about possible mistakes it might have made and alter its code to fix those mistakes, eventually getting it right. That sure seems like debugging to me.
Remember, it's actually kind of difficult to pin down GPT4's capabilities. There are two reasons it might not be using debug output like you want: a) it's incapable, or b) you're not prompting it right. LLMs are strange, fickle beasts.
I'm on Apple's AI/ML team, but I can't really go into details.
...are you seriously asking this? I'm not an insect. If you want to claim some observation of insect behavior has even the slightest relevance to human society, the burden of proof's on you.
Hoo boy. Speaking as a programmer who uses LLMs regularly in my work: you're very, VERY wrong about that. Maybe you should go tell Google that the 20% of their new code that is written by AI is all garbage. The code modern LLMs generate is typically well-commented, well-reasoned, and well-tested, because LLMs don't take the same lazy shortcuts that humans do. It's not perfect, of course, and not quite as elegant as an experienced programmer can manage, but that's not the standard we're measuring by. You should see the code that "junior engineers" often get away with...