Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?
This is your opportunity to ask questions. No question too simple or too silly.
Culture war topics are accepted, and proposals for a better intro post are appreciated.
Funny you should bring up Utility Maximization.
Until very recently, maybe 2021, I was strongly convinced by Yudkowsky that the first AGI/proto-AGI/human-adjacent AI would be achieved by explicitly specifying a utility function, or that one would be necessary for it to be a functioning/coherent entity.
LLMs do not seem to be explicitly maximizing anything that can be described as a utility function, beyond next-token prediction. And they're the SOTA, and it seems unlikely that they'll be entirely dethroned anytime soon, at least by something that does have a well-defined utility function.
I don't think our current attempts to beat standards into them, be it by RLHF, Constitutional AI or any other technique, do anything that can be usefully described as imbuing them with a UF; it's more like shifting the distribution of their output tokens.
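A rough way to see that (my own framing, not a quote from any of the papers): the pretraining loss is plain next-token prediction, and the standard RLHF objective explicitly penalizes drifting too far from the base model, so what you get is a bounded reshaping of the token distribution rather than an open-ended maximizer:

```latex
% Pretraining: plain next-token prediction (cross-entropy)
\mathcal{L}_{\text{pretrain}}(\theta) = -\sum_{t} \log \pi_\theta(x_t \mid x_{<t})

% RLHF fine-tuning: maximize a learned reward while staying close to the base model
\max_{\theta}\; \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\!\left[ r_\phi(x, y) \right]
  \;-\; \beta\, \mathrm{KL}\!\left( \pi_\theta(\cdot \mid x) \,\middle\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \right)
```

The KL term is the "shifting the distribution" part: the reward model changes which continuations get probability mass, it doesn't install a goal that the model then pursues across contexts.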
They are not agentic by default, though they can be made into agents rather trivially (even if they're bad at it, that's not a fundamental or insurmountable problem as far as I can see); they do not resist any attempt at being switched off or disabled, and I see no reason to think they're incapable of contemplating that possibility at their current level of intelligence.
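To make "rather trivially" concrete, here's a minimal sketch of the kind of wrapper that turns a chat model into an agent; the completion call, the tools, and the prompt format are all hypothetical stand-ins, not any particular framework's API:

```python
# Minimal "LLM as agent" loop: the goal-pursuit lives in this wrapper, not in the model.
# call_llm() and the tools below are placeholders, not a real API.

def call_llm(prompt: str) -> str:
    """Stand-in for whatever completion endpoint you're using."""
    raise NotImplementedError

TOOLS = {
    "search": lambda q: f"(pretend search results for {q!r})",
    "calculator": lambda expr: str(eval(expr)),  # toy example only, never eval untrusted input
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    transcript = (
        f"Goal: {goal}\n"
        "Reply with either 'ACTION: <tool> <input>' or 'FINAL: <answer>'.\n"
    )
    for _ in range(max_steps):
        reply = call_llm(transcript).strip()
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        if reply.startswith("ACTION:"):
            _, tool_name, *rest = reply.split(maxsplit=2)
            tool = TOOLS.get(tool_name, lambda _: "unknown tool")
            observation = tool(rest[0] if rest else "")
            transcript += f"{reply}\nOBSERVATION: {observation}\n"
        else:
            transcript += f"{reply}\n(Reminder: answer with ACTION or FINAL.)\n"
    return "(gave up after max_steps)"
```

The point is that the loop, not the model, supplies the persistence and the goal, which is how "not agentic by default" and "trivially made agentic" can both be true at once.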
It seems like they're entirely content to remain Oracles rather than Agents, with no self-directed/unprompted desire to interact with the user or external world to mould it to their desires. As far as I can tell they don't even count as having a VNM utility function, which is a weaker but more general formulation. But don't take my word on that, it's not like I grok it particularly well. (Apparently humans may or may not be so irrational that they fail at that too.)
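For reference, since the comment hedges on it (this is my gloss, and the "humans too" aside is a nod to things like the Allais paradox): the VNM theorem says that any preference ordering over lotteries satisfying completeness, transitivity, continuity and independence can be represented by expected utility,

```latex
A \succeq B \iff \mathbb{E}[u(A)] \ge \mathbb{E}[u(B)]
```

So "no VNM utility function" is a claim that the behaviour violates at least one of those axioms, not merely that nobody wrote a utility function down.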
This is the place where my newfound optimism turns back to pessimism again. If we carefully try to imbue our AI with a (perhaps implicit) utility function during its multi-million-dollar training runs, we might screw it up as the AI goes superhuman, but at least those trained-in utility functions would be created infrequently enough, by professional enough teams, and be immune enough to later fine-tuning, that there's a possibility of not screwing up and creating Unfriendly AI that way. But if our multi-million-dollar AIs just act like Oracles and answer whatever questions we give them, eventually some script kiddies are going to make their own superhuman agents with them, and at least one of those is going to turn out poorly - very poorly for everyone, if Bostrom's "Vulnerable World Hypothesis" turns out to be true.
The state-of-the-art for "beat standards into them" might extend from the same "don't say naughty words" techniques to "don't take part in a loop of an agentic AI doing bad things" and "don't help bootstrap an agentic AI doing bad things" ... but at that point don't we have a somewhat agentic AI on our hands? Maybe it's trying to be a satisficing rather than an optimizing agent, which still seems much safer, but I'm not at all confident that we can train a superhuman AI on "predict the outcomes for the world of what you output" and "don't let what you output lead to bad outcomes" without any risk that it eventually "fixes" the problem that every time it finishes an output and switches off again, it's foregoing huge opportunities.
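The satisficer/optimizer distinction in one line, using the standard textbook definitions rather than anything LLM-specific:

```latex
\text{optimizer:}\ \ \text{pick } x^{*} = \arg\max_{x} u(x)
\qquad
\text{satisficer:}\ \ \text{pick any } x \text{ such that } u(x) \ge \theta
```

The worry above is that a sufficiently capable system might not stay on the right-hand side of that line.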
While I didn't mention it in this particular comment, my own p(doom) has gone from a peak of 70% in 2021 to about 30% right now.
It seems to me that the attitude once held by Yudkowsky, that AGI would be almost inevitably misaligned and agentic by default, is not true, at least not for an AI I have no qualms about calling human-level when it comes to general intelligence. I think GPT-4 is smarter than the average human, with their 100 IQ, and while it is not superhuman in any specific field, it is a far better generalist polymath than any human alive. That should count as strong evidence that we're not in the Least Convenient Possible World, especially when considering recent advances in interpretability. The fact that RLHF even works would have astounded me 3 years back!
The remainder of the x-risk I foresee comes from a few sources: like you, I can't conclusively rule out a phase transition when basic bitch transformers/LLMs are pushed way further, or what might happen if a new and less corrigible or interpretable SOTA technique and model emerged; there's my concern about the people using an "Aligned" ASI (aligned to who, whom?) in a manner not conducive to my interests or continued survival; and of course there's what happens when a highly competent and jailbroken model glibly informs a bioterrorist how to cook up Super-AIDS.
If I had to put very vague numbers on the relative contributions of all of them, I'd say they're roughly equal, or 10% each. I've still gone from considering my death imminent this decade to merely being gravely concerned, which doesn't really change the policies I advocate for.
Edit: There's also the risk, which I haven't seen any conclusive rebuttal of, from hostile Simulacra being instantiated within an LLM. https://gwern.net/fiction/clippy
I'd give that maybe a 1-5% risk of being a problem, eventually, but my error bars are wide enough as is.
Oh, certainly. One of the easiest ways that humanity could end up utterly wiped out is for some large military (especially the U.S. military), once it's sufficiently automated, to be taken control of by some hostile agent and turned on everyone. Pandemics are probably the other most likely possibility.
And of course, there's the far broader problem of totalitarianism becoming significantly easier (you can watch everyone, and field armies that don't rely on some level of popular cooperation), and of the automation of labor making humans obsolete for many tasks - both of which seem far more likely and worrisome.
I'm more optimistic overall, but also more pessimistic that "alignment" will accomplish anything of substance than I would have been a few years ago.
Yeah, I was in the same boat.
I think the main concerns would be AIs that are more directly trained for things, like AlphaZero (but then, we do need to consider whether it's more that they are trained into a set of habits/intuitions or something, rather than goals that they rationally optimize for), or, as you said, turning them into agents. Which, unfortunately, there will probably be substantial incentives to do at some point.
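For what it's worth, even AlphaZero's training loss is imitative in that sense: the network is regressed toward its own search statistics and the eventual game outcome rather than handed a utility function to maximize at inference time (the loss below is from memory of the original paper, so treat the exact form as approximate):

```latex
\ell(\theta) = (z - v_\theta(s))^2 \;-\; \boldsymbol{\pi}^{\top} \log \mathbf{p}_\theta(s) \;+\; c\,\lVert \theta \rVert^2
```

Here z is the game result, π the MCTS visit distribution, and p, v the network's policy and value heads; whether that adds up to "habits/intuitions" or "goals" is exactly the ambiguity being pointed at.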