Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?
This is your opportunity to ask questions. No question too simple or too silly.
Culture war topics are accepted, and proposals for a better intro post are appreciated.
Notes -
Huh. I find the fact that nobody contested any of this mildly concerning in itself.
People, contrary to appearances, I wasn't born a Doomer; quite the opposite. I spent the majority of my life expecting to see technological marvels that would utterly change my standard of living and lead to a bright future.
I still do; I just think that the process also carries a substantial risk of killing all of us, or at least of making life very difficult for me.
I want to have reasons to hope. Convince me I'm wrong. My mind isn't so open that my brains fall out, but I'm eager to know if I'm making fundamental errors.
Our current training procedures seem to inculcate our ideas about "ought" roughly as well as they do our ideas about "is", so even if in theory one could create a paperclip-maximizer AGI, in practice perhaps whatever we eventually make with superhuman intelligence will at least have near-human-ideal ethics.
I'm not sure this gets us even a full order of magnitude below your 40%, though. Intelligence can bootstrap via "self-play", whereas ethics seems to have come from competitive+cooperative evolution, so we really might see the former foom to superhuman while the latter stays stuck at whatever flaky GPT-7 levels we can get from scraped datasets. For all I know, at those levels we just get "euthanize the humans humanely", or at best "have your pets spayed or neutered".
Part of the reason I went from a p(doom) of 70% to a mere 40% is that our LLMs seem to almost want to be aligned, or at the very least remain unagentic unless you set up systems akin to AutoGPT, useless as that is today.
It didn't drop further because, while the SOTA is quite well aligned, if overly politically correct, there's still the risk of hostile simulacra being instantiated within one, as in Gwern's Clippy story, or of some malignant human idiot running something akin to ChaosGPT on an LLM far superior to modern ones. And of course there's the left-field possibility of new types of models that are both effective and less alignable.
As it stands, they seem very safe, especially after RLHF, and I doubt GPT-5 or even 6 will pose any real risk.