This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
I know what people have written about mesa-optimizers. They've also written about the Waluigi effect. I am not sure I «know» what mesa-optimizers with respect to ML are. The onus is on those theorists to mechanistically define them and rigorously show that they exist. For now, all evidence that I've seen has been either Goodhart/overfitting effects well-known in ML, or seeing-Jesus-in-a-toast tier things like Waluigi.
To be less glib, and granting the premise of mesa-optimizers existing, please see Plakhov section here. In short: we do not need to know internal computations and cogitations of a model to know that the regularization will still mangle and shred any complex subroutine that does not dedicate itself to furthering the objective.
And it's not like the horny-humans-versus-evolution example, because «evolution» is actually just a label for some historical pattern that individual humans can frivolously refuse to humor with their life choices; in model training, the pressure to comply with the objective bears on any mesa-optimizer in its own alleged «lifetime», directly (and not via social shaming or other not-necessarily-compelling proxy mechanisms). Imagine if you received a positive or negative kick to the reward system conditional on your actions having increased/decreased your ultimate procreation success: this isn't anywhere near so easy to cheat as what we do with our sex drive or other motivations. Evolution allows for mesa-optimizers, but gradient descent is far more ruthless.
…Even that would be something of a category error. Models or sub-models don't really receive rewards or punishments; this is another misleading metaphor that is, in itself, predicated upon our clunky mesa-optimizing biological mechanisms. They're altered based on the error signal; results of their behavior and their «evolution» happen on the same ontological plane, unlike our dopaminergic spaghetti, which one can hijack with drugs or self-deception. «Reinforcement learning should be viewed through the lens of selection, not the lens of incentivisation».
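To make the regularization point concrete, here is a toy sketch, not a reproduction of Plakhov's argument and certainly not evidence about real networks. It assumes plain gradient descent with L2 weight decay on a one-parameter linear fit; the names `w_used` and `w_idle` and all hyperparameters are made up for illustration. `w_idle` stands in for a «subroutine» that does not touch the prediction, so the only pressure it ever feels is the decay term.

```python
def train(steps=2000, lr=0.05, weight_decay=0.1):
    """Gradient descent with L2 weight decay on a toy objective (illustrative only)."""
    xs = [1.0, 2.0, 3.0, -1.0]
    ys = [2.0 * x for x in xs]  # the target depends on x through a single weight

    w_used = 0.0  # parameter that the prediction actually uses
    w_idle = 5.0  # hypothetical parameter that never influences the prediction

    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w_used.
        grad_used = sum(2.0 * (w_used * x - y) * x for x, y in zip(xs, ys)) / n
        # w_idle does not enter the prediction, so its loss gradient is zero.
        grad_idle = 0.0

        # Both parameters receive the same decay term; only one gets an error signal.
        w_used -= lr * (grad_used + weight_decay * w_used)
        w_idle -= lr * (grad_idle + weight_decay * w_idle)

    return w_used, w_idle


w_used, w_idle = train()
print(f"w_used = {w_used:.4f}, w_idle = {w_idle:.6f}")
```

The used weight settles slightly below 2 (the decay shrinks it a little), while the idle one decays geometrically toward zero. Whether anything like this carries over to structured circuits in large models is exactly the open question; the sketch only shows the direction of the pressure.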
Humans have a pervasive agency-detection bias. When so much depends on whether an agent really is there, it must be suppressed harshly.
I beg to differ.
The doomers have been wrong for decades, and keep getting more wrong; the AI naysayers are merely wrong in another way. Yudkowsky's whole paradigm has failed, in large part because he's been an AI naysayer in every sense in which current AI has succeeded. Who is being proven correct? The people Yud, in his obstinate ignorance, had been mocking and still mocks: AI optimists and builders, the pioneers of DL.
You are simply viewing this through the warped lens of Lesswrongian propaganda, with the false dichotomy of AI skepticism and AI doom. The central position both those groups seek to push out of the mainstream is AI optimism, and the case for it is obvious: less labor, more abundance, and everything good we've come to expect from scientific progress since the Enlightenment, delivered as if from a firehose. We are literally deploying those naive Golden Age Sci-Fi retrofuturist dreams that tech-literate nerds loved to poke holes in, like a kitchen robot that is dim-witted yet can converse in a human tongue and seems to have personality. It's supposed to be cool.
Even these doomers are, of course, ex-optimists: they intended to build their own AGI by the 2010s, and now that they've made no progress while others have struck gold, they're going on podcasts, pivoting to policy advice, attempting to character-assassinate those more talented others, and calling them derogatory names like «stupid murder monkeys fighting to eat the poison banana».
Business as usual. We're discussing a similar thing with respect to nuclear power in this very thread. Some folks lose it when a technical solution makes their supposedly necessary illiberal political demands obsolete, and begin producing FUD.
Good point about mesaoptimizers and the difference between evolution and gradient descent.
Here's where I disagree. As someone once said, "he who rules is he who sets the null hypothesis". I claim that the onus is on AI researchers to show that their technology is safe. I don't have much faith in glib pronouncements that AI is totally understood and safe.
Nuclear power, on the other hand, is well understood, has bounded downside, and is a mature technology. It's not going to destroy the human race. We can disprove the FUD against it. But in 1945, I might have felt differently.
It's not impossible but very hard in practice to prove a negative. You know that anti-nuclear people also demand extremely strong, cost-prohibitive proofs of safety, which is why we're in this mess. Of course, they have other nefarious motives to suppress human flourishing, but so do AI alarmists.
More to the point: decades ago, Nick Bostrom proposed a taxonomy of X-risks. Those risks should be rigorously compared, for we must hedge all of them somehow. Some of those risks seem highly likely to me, follow from our prior social failures and even particularities of the current trend, and are comparable to «total human death» in moral (if not «utilitarian») badness, so the argument about «risk from AI cannot be quantified» doesn't hold. Bostrom:
It is counterproductive to focus only on the well-propagandized model of AI takeover through FOOM, in an age when AI built on principles radically different from those preferred by FOOM-argument-inventors is undergoing its Cambrian explosion; and in doing so exacerbate those Crunch-type risks. It is unprincipled. Moreover, it's wishful thinking: if only we could guard our asses from this one threat model! Perhaps one type of risk is truly greater than another, in raw probability or expected negative value or both. But just rehashing thought experiments about Seed AI from the 90s won't suffice to prove that the orthodox AI risk is the greater evil.
Now Bostrom himself proposes building a 6.3 regime, and Eliezer helpfully paves the way to it through his alarmism about training of capable models. I say we should at least demand they spell out why the possibility of eternity under their benevolent yoke, or fizzling out due to squandering our chances to expand, is preferable to getting paperclipped.
Because for me it is not so clear-cut. And be aware that we can fizzle out. I've argued about this here. We evidently have more than one chance to build an «aligned» (or as I'd rather have it, no-alignment-needed) AGI. We don't have infinite time for globohomo committees to surmount their perverse incentives, discover the true name of God through the game of musical chairs at Davos and immanentize Dath Ilan before proceeding to build said AGI – nor, I'd say, very good odds at aligning those committees to play the game in our interest.