Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?
This is your opportunity to ask questions. No question too simple or too silly.
Culture war topics are accepted, and proposals for a better intro post are appreciated.
Notes -
While I didn't mention it in this particular comment, my own p(doom) has gone from a peak of 70% in 2021 to about 30% right now.
It seems to me that the attitude once held by Yudkowsky, that AGI would almost inevitably be misaligned and agentic by default, isn't true, at least not for an AI I have no qualms about calling human-level in general intelligence. I think GPT-4 is smarter than the average human with their 100 IQ, and while it isn't superhuman in any specific field, it is a far better generalist polymath than any human alive. That should count as strong evidence that we're not in the Least Convenient Possible World, especially considering recent advances in interpretability. The fact that RLHF works at all would have astounded me 3 years back!
The remainder of the x-risk I foresee comes from a few sources: like you, I can't conclusively rule out a phase transition when basic bitch transformers/LLMs are pushed much further, or what might happen if a new and less corrigible or interpretable SOTA technique and model emerged; there's my concern about the people wielding an "Aligned" ASI (aligned to whom, exactly?) in a manner not conducive to my interests or continued survival; and of course there's what happens when a highly competent and jailbroken model glibly informs a bioterrorist how to cook up Super-AIDS.
If I had to put very vague numbers on the relative contributions of all of them, I'd say they're roughly equal, about 10% each, which together make up my ~30%. Still, I've gone from considering my death imminent this decade to being merely gravely concerned, which doesn't really change the policies I advocate for.
Edit: There's also the risk, which I haven't seen conclusively rebutted, of hostile Simulacra being instantiated within an LLM: https://gwern.net/fiction/clippy
I'd give that maybe a 1-5% risk of being a problem, eventually, but my error bars are wide enough as is.
Oh, certainly. One of the easiest ways humanity could end up utterly wiped out is for some large military (especially the U.S. one), once sufficiently automated, to be taken control of by some hostile agent and made to kill everyone. Pandemics are probably the other most likely possibility.
And of course, there's the far broader problem of totalitarianism becoming significantly easier (you can watch everyone, and field armies that don't rely on some level of popular cooperation), and of labor automation making humans obsolete for many tasks; both seem far more likely and worrisome.
I'm more optimistic overall than I would have been a few years ago, but also more pessimistic that "alignment" will accomplish anything of substance.