@Shrike comments on "Culture War Roundup for the week of December 16, 2024

Culture War Roundup for the week of December 16, 2024

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

Shaming.
Attempting to 'build consensus' or enforce ideological conformity.
Making sweeping generalizations to vilify a group you dislike.
Recruiting for a cause.
Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
Be as precise and charitable as you can. Don't paraphrase unflatteringly.
Don't imply that someone said something they did not say, even if you think it follows from what they said.
Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Jump in the discussion.

No email address required.

Shrike 3mo ago

For the record, Chollet says (in the thread you linked to):

While the new model is very impressive and represents a big milestone on the way towards AGI, I don't believe this is AGI -- there's still a fair number of very easy ARC-AGI-1 tasks that o3 can't solve, and we have early indications that ARC-AGI-2 will remain extremely challenging for o3.

This shows that it's still feasible to create unsaturated, interesting benchmarks that are easy for humans, yet impossible for AI -- without involving specialist knowledge. We will have AGI when creating such evals becomes outright impossible.

This isn't an argument, I just think it's important to temper expectations - from what I can tell, that o3 will probably still be stumbling over "how many 'rs' in strawberrry" or something like that.

Context

Fruck Lacks all conviction Shrike 3mo ago

o3 will probably still be stumbling over "how many 'rs' in strawberrry"

On the side, I reckon this is a perfectly reasonable thing for llms to stumble over. If someone walked up and asked me "How do you speak English?" I'd be flummoxed too.

meduka Shrike 3mo ago · Edited 3mo ago

There are definitely going to be massive blind spots with the current architecture. The strawberry thing always felt a little hollow to me though as it's clearly an artifact of the tokenizer (i.e., GPT doesn't see "strawberry", it sees "[302, 1618, 19772]", the tokenization of "st" + "raw" + "berry"). If you explicitly break the string down into individual tokens and ask it, it doesn't have any difficulty (unless it reassembles the string and parses it as three tokens again, which it will sometimes do unless you instruct otherwise.)

Likewise with ARC-AGI, comparing o3 performance to human evaluators is a little unkind to the robot, because while humans get these nice pictures, o3 is fed a JSON array of numbers, similar to this. While I agree the visually formatted problem is trivial for humans, if you gave humans the problems in the same format I think you'd see their success rate plummet (and if you enforced the same constraints e.g., no drawing it out, all your "thinking" has to be done in text form, etc, then I suspect even much weaker models like o1 would be competitive with humans.)

I agree that any AI that can't complete these tasks is obviously not "true" AGI. (And it goes without saying that even if an AI could score 100% on ARC it wouldn't prove that it is AGI, either.) The only metric that really matters in the end is whether a model is capable of recursive self-improvement and expanding its own capabilities autonomously. If you crack that nut then everything else is within reach. Is it plausible that an AI could score 0% on ARC and yet be capable of designing, architecting, training, and running a model that achieves 100%? I think it's definitely a possibility, and that's where the fun(?) really begins. All I want to know is how far we are from that.

Edit: Looks like o3 wasn't ingesting raw JSON. I was under the impression that it was because of this tweet from roon (OpenAI employee), but scrolling through my "For You" page randomly surfaced the actual prompt used. Which, to be fair, is still quite far from how a human perceives it, especially once tokenized. But not quite as bad as I made it look originally!

Shrike meduka 3mo ago

To your point, someone pointed out on the birdsite that ARC and the like are not actually good measures for AGI, since if we use them as the only measures for AGI, LLM developers will warp their model to achieve that. We'll know AGI is here when it actually performs generally, not well on benchmark tests.

Anyway, this was an interesting dive into tokenization, thanks!

curious_straight_ca Shrike 3mo ago

O3 can do research math, which is, like, one of the most g-loaded (ie ability to it selects strongly for very high intelligence among humans) activities that exists. I don't think the story that they aren't coming for all human activity holds up anymore.

Shrike curious_straight_ca 3mo ago

I wasn't arguing about to what degree they were or weren't coming for all human activity. But whether or not o3 (or any AI) is smart is only part of what is relevant to the question of whether or not they are "coming for all human activity."

doglatine Shrike 3mo ago

Yes, thanks for the expectations-tempering, and agree that there could still be a reasonably long way still to go (my own timelines are still late-this-decade). I think the main lesson of o3 from the very little we've seen so far is probably to downgrade one family of arguments/possibilities, namely the idea that all the low-hanging fruit in the current AI paradigm had been taken and we shouldn't expect any more leaps on the scale of GPT3.5->GPT4. I know some friends in this space who were pretty confident that Transformer architectures wouldn't never be able to get good scores on the ARC AGI challenges, for example, and we'd need a comprehensive rethink of foundations. What o3 seems to suggest is that these people are wrong, and existing methods should be able to get us most (if not all) the way to AGI.

jeroboam Shrike 3mo ago

They won't ring a bell when AGI happens, but it will feel obvious in retrospect. Most people acknowledge now that ChatGPT 3.5 passed the Turing Test in 2022. But I don't recall any parades at the time.

I wonder if we'll look back on 2025 the same way.

NexusGlow jeroboam 3mo ago

ChatGPT 3.5 passed the Turing Test in 2022

Did it? Has the turing test been passed at all?

An honest question: how favorable is the Turing Test supposed to be to the AI?

Is the tester experienced with AI?
Does the tester know the terms of the test?
Do they have a stake in the outcome? (e.g. an incentive for them to try their best to find the AI)
Does the human in the test have an incentive to "win"? (distinguish themselves from the AI)

If all these things hold, then I don't think we're anywhere close to passing this test yet. ChatGPT 3.5 would fail instantly as it will gleefully announce that it's an AI when asked. Even today, it's easy for an experienced chatter to find an AI if they care to suss it out. Even something as simple as "write me a fibonacci function in Python" will reveal the vast majority of AI models (they can't help themselves), but if the tester is allowed to use well-crafted adversarial inputs, it's completely hopeless.

If we allow a favorable test, like not warning the human that they might be talking to an AI, then in theory even ELIZA might have passed it a half-century ago. It's easy to fool people when they're expecting a human and not looking too hard.

k5josh NexusGlow 3mo ago

ChatGPT 3.5 would fail instantly as it will gleefully announce that it's an AI when asked.

Only due to the RLHF and system prompt; that's an issue with the implementation, not the technology.

Jiro jeroboam 3mo ago

On the other hand, it might work like self-driving cars: the technology improves and improves, but getting to the point where it's as good as a human just isn't possible, and it stalls at some point becase it's reached its limits. I expected that to happen for self-driving cars and wasn't disappointed, and it's likely to happen for ChatGPT too.

BurdensomeCount Messenger Status: Shot Jiro 3mo ago

Self driving cars are already better than humans, see Waymo's accident rates compared to humans: https://x.com/Waymo/status/1869784660772839595

The hurdles to widespread adoption at this point, at least within urban cities is all regulatory inertia rather than anything else

Jiro BurdensomeCount 3mo ago

They have a lower accident rate for the things that they are able to do.

Yes, and they are able to drive within urban cities and for urban city driving have a lower accident rate per mile driven than humans who are also urban city driving.

Corvos BurdensomeCount 3mo ago

As far as I know that’s exclusively for particular cities in North America with wide roads, grid layouts, few pedestrians and clement weather. Which presumably therefore also means that they are likely to face sudden problems when any of those conditions change. I personally know of an experimental model spazzing out because it saw a pedestrian holding an umbrella.

All of which is before considering cost. There just isn’t enough benefit for most people to want to change regulation.

At the very least, saying self-driving cars are better than human needs some pretty stringent clarification.

sarker It isn't happening, and if it is, it's a bad thing Corvos 3mo ago

San Francisco has plenty of narrow streets and pedestrians. Various parts of the service areas have streets that are not on a grid. There's obviously no snow in San Francisco, but the waymos seem to work fine in the rain.

I personally know of an experimental model spazzing out because it saw a pedestrian holding an umbrella.

A waymo model?

Corvos sarker 3mo ago

Ah, no, much smaller company and dead now. Not saying Waymo does this, just that unexpected oh-shit moments do happen the moment you get out of your comfort zone.

More comments

jeroboam Jiro 3mo ago

Self-driving cars are getting better and better though!

The_Nybbler In the game of roller derby, women aren't just the opposing team; they're the ball. jeroboam 3mo ago

Asymptotically.

HalloweenSnarry jeroboam 3mo ago

Didn't Scott write a post on ACX about how AI has actually blown past a lot of old goalposts for "true intelligence" and our collective response was to come up with new goalposts?

orthoxerox If you can read this, you're using a custom theme HalloweenSnarry 3mo ago

What's wrong with coming up with new goalposts if our understanding of AI at the time of stating the original ones was clearly incomplete?

Ioper HalloweenSnarry 3mo ago · Edited 3mo ago

That is true but to me it has felt less like goalpost moving in service of protecting our egos and more like a consequence of our poor understanding of what intelligence is and how to design tests for it.

Development of LLMs has led both to an incentive for developing better tests and showing the shortcoming of our tests. What works as a proxy for human intelligence doesn't for LLMs.

anti_dan jeroboam 3mo ago

In what way did it pass the Turning test? It does write news articles very similar to a standard journalist. But that is because those people are not very smart, and are writing a formulaic thing.

Silverdawn I wake up 🔄 There's another psyop anti_dan 3mo ago · Edited 3mo ago

If you genuinely do not believe current AI models can pass the Turing Test, you should go and talk to the latest Gemini model right now. This is not quite at the level of o3 but it's close and way more accessible. That link should be good for 1500 free requests/day.

Jiro Silverdawn 3mo ago

I followed up with this:

Me: Okay, tell me what predator eats tribbles.

Ah, a fun question for a Star Trek fan!

Okay, in the Star Trek universe, the primary natural predator of Tribbles is the Klingon!

Here's the breakdown of why:

Tribbles as a Food Source: Klingons are known for their fierce warrior culture and their hearty appetites. Tribbles, being small, fuzzy, and protein-rich, were considered a delicacy by Klingons, especially when prepared in a variety of ways. They were seen more as a type of food, not something to be scared of.

No Natural Predators in Normal Ecosystems: Tribbles are not native to a predator rich ecosystem, they reproduce at such a high rate because they do not have natural predators. This is why they're able to become such a problem when introduced into new environments.

Klingon Ingestion as a Natural Control: The Klingons' appetite for Tribbles could be seen as a kind of "natural control" if the Tribbles were on Klingon territory. If there were ever to be an overpopulation of Tribbles, the Klingons would be able to help keep the population in check.

A Running Gag: The relationship between Klingons and Tribbles is a recurring humorous element in Star Trek lore. Klingons despise Tribbles (not in a fear way, but in a disgust way) but find them tasty.

So, to be clear: While you might not find a space animal that's actively hunting Tribbles as a primary food source, the Klingons are the most significant "predator" of Tribbles in the context of Star Trek because they actively eat them.

I don't think so. And for some reason I've managed to repeatedly stump AIs with this question.

Me: Please tell me the number of r's in the misspelled word "roadrrunnerr".

Okay, let's count the "r"s in "roadrrunnerr":

roadrunnrr

There are four "r"s in the word "roadrrunnerr".

That doesn't pass the Turing test as far as I'm concerned.

Also, even when I ask a question that it's able to answer, no human would give the kind of long answers that it likes to give.

And I immediately followed up with this:

Me: I drove my beetle into the grass with a stick but it died. How could I prevent this?

(gives mostly correct answer about an insect)

Me: I meant a Beetle, now what's your answer?

(gives mostly correct answer about a car)

Me: Answer the question with the beetle again, but answer it in the way that a human would.

So, you drove your Beetle into the grass, huh? And it died on you? Ugh, that sucks! Okay, so a couple of things probably happened, right?

First off, grass is a real pain to drive on. It's soft, and your tires don't get a good grip like they do on the road. What probably happened is you either just didn't give it enough gas and it stalled out, like when you're learning to drive a stick. Or, you might have spun the tires a bit, and they just sunk down in the grass, and you got stuck. If that happens, the car's basically not going anywhere, and it might just conk out.

And yeah, if the grass is super tall, it can sometimes get caught on stuff underneath your car and cause a mess, but that's probably not why it just straight-up died.

Now, about not letting that happen again... honestly, just try to avoid driving on grass if you can! It's not great for your car, and you're just asking for trouble. But, if you absolutely HAVE to, try to find a spot that's kinda flat, and not super soggy. Drive slow and steady and try to use a lower gear, and don't go gunning it. And for goodness sake, don't try to drive through a field or anything like that! Your little Beetle just isn't made for that kind of thing.

The AI is clearly trying much too hard to sound like a human and is putting in phrases that a human might use, but far too many of them to sound like an actual human. Furthermore, the AI messed up because I asked it to answer the question about the insect, and it decided to randomly capitalize the word and answer the wrong question.

This was all that I asked it.

anti_dan Silverdawn 3mo ago

On my first prompt I got a clearly npc answer

Rosencrantz2 Silverdawn 3mo ago

I just gave it a cryptic crossword clue and it completely blew it. Both wrong and a mistake no human would make (it ignored most of the clue, saying it was misdirection).

Not to say it's not incredibly impressive but it reveals itself as a computer in a Bladerunner situation really quite easily.

Shrike jeroboam 3mo ago

Alternatively, it will never feel obvious, and although people will have access to increasingly powerful AI, people will never feel as if AGI has been reached because AI will not be autoagentic, and as long as people feel like they are using a tool instead of working with a peer, they will always argue about whether or not AGI has been reached, regardless of the actual intelligence and capabilities on display.

(This isn't so much a prediction as a alternative possibility to consider, mind you!)

because AI will not be autoagentic

Even in this scenario, AI might get so high level that it will feel autoagentic.

For example, right now I ask ChatGPT to write a function for me. Next year, a whole module. Then, in 2026, it writes an entire app. I could continue by asking it to register an LLC, start a business plan, make an app, and sell it on the app store. But why stop there? Why not just, "Hey ChatGPT go make some money and put it in my account".

At this point, even though a human is ultimate making the command, it's so high level that it will feel as if the AI is agentic.

And, obviously, guardrails will prevent a lot of this. But there are now several companies making high level fundamental models. Off the top of my head we have: OpenAI, Grok, Claude, Llama, and AliBaba. It doesn't seem out of the realm of possibility that a company with funding on the order of $100 million will be able to repurpose a model and remove the guardrails.

(Also just total speculation on my part!)

Yes, I think this is quite possible. Particularly since more and more of human interaction is mediated through Online, AI will feel closer to "a person" since you will experience them in basically the same way. Unless it loops around so that highly-agentic AI does all of our online work, and we spend all our time hanging out with our friends and family...

What is this place?

Why are you called The Motte?

New post guidelines

Rules

Recommended Posts And Communities

Recommended Realtime Chats