JhanicManifold

6 followers   follows 0 users   joined 2022 September 04 20:29:00 UTC

No bio...

User ID: 135


There are a few ways that GPT-6 or 7 could end humanity, the easiest of which is to massively accelerate progress in more agentic forms of AI like Reinforcement Learning, which has the "King Midas" problem of value alignment. See this comment of mine for a semi-technical argument for why a very powerful AI based on "agentic" methods would be incredibly dangerous.

Of course the actual mechanism for killing all of humanity is probably something like a super-virus with an incredibly long incubation period, high infectivity and a high death rate. You could produce such a virus with literally only an internet connection: send the proper DNA sequence to a protein synthesis lab, have the product shipped to some guy you pay/manipulate on the darknet, and have him mix the powders he receives in the mail into some water, kickstarting the whole epidemic. Or pretend to be an attractive woman (with deepfakes and voice synthesis) and get all of that done for free.

GPT-6 itself might be very dangerous on its own, given that we don't actually know what goals are instantiated inside the agent. It's trained to predict the next word in the same way that humans are "trained" by evolution to replicate their genes: the end result is that we care about sex and our kids, but we don't literally care about maximally replicating our genes, otherwise sperm banks would be a lot more popular. The worry is that GPT-6 will not actually have the exact goal of predicting the next word, but a funhouse-mirror version of it, which might be very dangerous once it gets to very high capability.

** per kg of lean body mass, so at 20% body fat this would be around 1.92g/kg, which is a bit closer to the 1.8g/kg number

I'm 150 pounds, and with the recommended RDA of 0.8 g/kg body weight for protein intake a day

The RDA is way, way too low if you want to maximise muscle gain and minimise muscle loss on a calorie deficit, and the same goes for the 1.6g/kg figure for athletes. The real number is more like 2.4g/kg of lean body mass. It's true that protein "completeness" is not that big of a deal, since it's pretty easy to get all the amino acids from a few different vegetables. What is hard to do on a vegan diet is eat a high-protein, low-calorie diet, which is what you want if you're planning on losing weight while maintaining or building muscle at the same time. All vegan foods that contain protein also contain even more fat or even more carbs. You can meet your protein needs from vegan stuff, but you'll need to eat a shit ton of calories to do so. On the other hand, my 2300 cal/day non-vegan diet gives me 190g of protein without much effort.
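To make the arithmetic concrete, here's a minimal sketch of the numbers above; the function name and the 20% body fat example are just for illustration:

```python
# Rough protein-target arithmetic for the figures discussed above.
# Uses the 2.4 g per kg of *lean* body mass heuristic; names are illustrative.

LB_TO_KG = 0.4536

def protein_targets(weight_lb, body_fat_frac):
    """Compare the RDA (0.8 g/kg total weight) with 2.4 g/kg of lean mass."""
    weight_kg = weight_lb * LB_TO_KG
    lean_kg = weight_kg * (1.0 - body_fat_frac)
    return {
        "rda_g_per_day": 0.8 * weight_kg,                     # standard RDA
        "cut_target_g_per_day": 2.4 * lean_kg,                # high-protein target on a deficit
        "equiv_g_per_kg_total": 2.4 * (1.0 - body_fat_frac),  # 1.92 g/kg at 20% body fat
    }

# A 150 lb person at 20% body fat: RDA ~54 g/day vs ~131 g/day at 2.4 g/kg lean mass.
print(protein_targets(150, 0.20))
```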

When you actually try to use it to diagnose stuff, telling it "don't output your answer right away, write your inner monologue working through the question, considering multiple possibilities" almost always improves the answer, at least in the domains I've tested on.
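As a minimal sketch of that prompt pattern (the `ask_model` callable here is a hypothetical stand-in for whatever chat API is being used, and the wording is illustrative):

```python
# Sketch of the "inner monologue first, answer last" prompting pattern described above.
# `ask_model` is a hypothetical stand-in for an actual chat-completion call.

def build_diagnostic_prompt(question):
    return (
        "Don't output your answer right away. First write your inner monologue "
        "working through the question, considering multiple possibilities and the "
        "evidence for and against each. Only then, on a final line starting with "
        "'ANSWER:', give your single best answer.\n\n"
        f"Question: {question}"
    )

def diagnose(question, ask_model):
    """ask_model: callable taking a prompt string and returning the model's reply text."""
    reply = ask_model(build_diagnostic_prompt(question))
    # Keep only the final answer line; the monologue is just thinking space for the model.
    for line in reversed(reply.splitlines()):
        if line.strip().startswith("ANSWER:"):
            return line.strip()[len("ANSWER:"):].strip()
    return reply  # fall back to the raw reply if the model ignored the format
```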

By far the worst effects of both nicotine and caffeine should be the chronically lowered sleep quality, no? That's mostly why I wouldn't want kids vaping, given that sleep affects basically every other aspect of your life.

In academic circles there's real prestige that comes with the word "tenured". I think getting that distinction lets professors "relax" in a very important sense. Most really smart researchers in math and physics just want to be left alone to do their research; they want to be insulated from the practicalities of real life so they have space to think, and tenure provides that in a way that your suggestion doesn't. You don't have to worry that you won't be able to eat if you don't publish a set number of papers or look good on whatever citation metric, so you can do a bit more pie-in-the-sky thinking than you could without tenure. This is probably less true in fields that require lots and lots of grant proposals and lots of actual equipment, but for the theory-heavy fields tenure is really important.

Banning DEI stuff seems easily positive to me, but banning tenure altogether is just insane; it would make Texas far less competitive as a place for promising young researchers.

Yup, "wegovy" is literally just the branded name of semaglutide at 2.4mg weekly. The 1mg/week dose is called "ozempic"; all of these are literally the same molecule.

semaglutide 2.4mg weekly injections that make you waaay less hungry. Currently the hottest thing in obesity treatment.

Yeah this feels more like a cultural misunderstanding than anything else. I can likewise imagine myself in a playful situation with a 4 year old where I'd say something like "spank my butt" to the child, even if it would sound weird out of context.

Stuart Russell is the guy you want for that job. Professor at Berkeley, literally wrote the textbook on AI, and can articulate the worries very well.

You can make the case that string theorists have abandoned truth-seeking (though they would certainly disagree), but this is completely unrelated to wokeness. And there are plenty of other branches of physics, chemistry and biology which are very practical (like condensed matter physics) and pretty much untouched.

could you please try to explain yourself in one or two succinct paragraphs instead of in giant essays or multi-hour long podcasts?

That's a fair point, here are the load-bearing pieces of the technical argument from beginning to end as I understand them:

  1. Consistent Agents are Utilitarian: If you have an agent taking actions in the world and having preferences about future states of the world, that agent must be utilitarian, in the sense that there must exist a function V(s) that takes in possible world-states s and spits out a scalar, such that the agent's behaviour can be modelled as maximising the expected future value of V(s). If there is no such function V(s), then our agent is not consistent, and there are cycles in its preference ordering: it prefers state A to B, B to C, and C to A, which is a pretty stupid thing for an agent to do (see the small sketch after this list).

  2. Orthogonality Thesis: This is the statement that the ability of an agent to achieve goals in the world is largely separate from the actual goals it has. There is no logical contradiction in having an extremely capable agent with a goal we might find stupid, like making paperclips. The agent doesn't suddenly "realise its goal is stupid" as it gets smarter. This is Hume's "is vs ought" distinction: the "ought" is the agent's value function, and the "is" is its ability to model the world and plan ahead.

  3. Instrumental Convergence: There are subgoals that arise in an agent for a large swath of possible value functions. Things like self-preservation (E[V(s)] will not be maximised if the agent is not there anymore), power-seeking (having power is pretty useful for any goal), intelligence augmentation, technological discovery, and human deception (if it can predict that the humans will want to shut it down, the way to maximise E[V(s)] is to deceive us about its goals). So no matter what goals the agent really has, we can predict that it will want power over humans, want to make itself smarter, want to discover technology, and want to avoid being shut off.

  4. Specification Gaming of Human Goals: We could in principle make an agent with a V(s) that matches ours, but human goals are fragile and extremely difficult to specify, especially in python code, which is what needs to be done. If we tell the AI to care about making humans happy, it wires us to heroin drips or worse; if we tell it to make us smile, it puts electrodes in our cheeks. Human preferences are incredibly complex and largely unknown; we would have no idea what to actually tell the AI to optimise. This is the King Midas problem: the genie will give us what we say (in python code) we want, but we don't know what we actually want.

  5. Mesa-Optimizers Exist: But even if we did know how to specify what we want, right now no one actually knows how to put any specific goal at all inside any AI that exists. A mesa-optimiser refers to an agent which is being optimised by an "outer loop" with some objective function V, but which learns to optimise a separate function V'. The prototypical example is humans being optimised by evolution: evolution "cares" only about inclusive genetic fitness, but humans don't; given the choice to pay $2000 to a lab to get a bucketful of your DNA, you wouldn't do it, even though that is the optimal policy from the inclusive-genetic-fitness point of view. Nor do men stand in line at sperm banks, or ruthlessly optimise to maximise their number of offspring. So while something like GPT4 was optimised to predict the next word over a dataset of human internet text, we have no idea what goal was actually instantiated inside the agent; it's probably some fun-house-mirror version of word-prediction, but not exactly that.
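To make point 1 a bit more concrete, here's a minimal toy sketch of the "no preference cycles means some scalar V(s) exists" direction; the states and preferences below are made up purely for illustration:

```python
# Toy illustration of point 1: if an agent's strict preferences over world-states
# contain no cycle, we can assign a scalar V(s) consistent with them (here via a
# topological sort); if there is a cycle (A > B > C > A), no such V(s) exists and
# the agent can be money-pumped. States and preferences are made up.

from graphlib import TopologicalSorter, CycleError

def value_function(prefers):
    """prefers[a] = set of states that a is strictly preferred to.
    Returns a dict V(s) consistent with the preferences, or None if they are cyclic."""
    try:
        order = list(TopologicalSorter(prefers).static_order())  # least-preferred first
    except CycleError:
        return None  # cyclic preferences: no consistent V(s)
    return {state: float(rank) for rank, state in enumerate(order)}

consistent = {"A": {"B"}, "B": {"C"}, "C": set()}  # A > B > C
cyclic     = {"A": {"B"}, "B": {"C"}, "C": {"A"}}  # A > B > C > A

print(value_function(consistent))  # e.g. {'C': 0.0, 'B': 1.0, 'A': 2.0}
print(value_function(cyclic))      # None: the agent is inconsistent
```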

So to recap, the worry of Yudkowsky et al. is that a future version of the GPT family of systems will become sufficiently smart and develop a mesa-optimiser inside of itself with goals unaligned with those of humanity. These goals will lead to it instrumentally wanting to deceive us, gain power over earth, and prevent itself from being shut off.

I listened to that one, and I really think that Eliezer needs to develop a politician's ability to take an arbitrary question and turn it into an opportunity to talk about what he really wanted to talk about. He's taking these interviews as genuine conversations you'd have with an actual person, instead of having a plan about what things he wants to cover for the type of audience listening to him in that particular moment. While this conversation was better than the one with Lex, he still didn't lay out the AI safety argument, which is:

"Consistent Agents are Utilitarian + Orthogonality Thesis + Instrumental Convergence + Difficulty of Specifying Human Goals + Mesa-Optimizers Exist = DOOM"

He should be hitting those 5 points on every single podcast, because those are the actual load-bearing arguments that convince smart people; so far he's basically just repeating doom predictions and letting the interviewers ask whatever they like.

Incidentally, while we're talking of AI, over the past week I finally found an argument (that I inferred myself from interactions with chatGPT, then later found Yann LeCun making a similar one) that convinced me that the entire class of auto-regressive LLMs like the GPT series is much less dangerous than I thought, and basically has a very slim chance of getting to true human-level. And I've been measurably happier since finding an actual technical argument for why we won't all die in the next 5 years.
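For context, the LeCun-style version of the argument, at least as I understand it, is the compounding-error point: if each generated token has some small probability ε of derailing the answer and nothing ever corrects it, the probability of a long output staying on track decays roughly like (1-ε)^n. A toy illustration, with made-up values of ε:

```python
# Toy illustration of the compounding-error argument about auto-regressive LLMs:
# if each token independently has probability eps of stepping off track and errors
# are never corrected, an n-token answer stays on track with probability (1-eps)**n.
# The eps values below are made up for illustration.

for eps in (0.001, 0.01, 0.05):
    for n in (100, 1000, 10000):
        p_ok = (1 - eps) ** n
        print(f"eps={eps:<6} n={n:<6} P(still on track) ~ {p_ok:.3g}")
```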

Has anyone else tried Github Copilot, and found it to have really insidious downsides? I noticed the other day that copilot really fucks up my mental flow when building a program, it's like it prevents me from building a complete mental map of the global logic of what I'm doing. It's not that the code copilot outputs is bad exactly, it's that writing the code myself seems to make me understand my own program much better than just reading and correcting already-written code. And overall this "understanding my code" effect makes me code much faster, so I'm not even sure that copilot truly provides that large of a speed benefit. I also notice my mind subtly "optimizing for the prompt" instead of just writing the code myself, like some part of my mind is spending its resources figuring out what to write to get copilot to produce what I want, instead of just writing what I want.

Maybe copilot is a lifesaver for people who aren't programming particularly complex programs, and I do think it's useful for quickly making stuff that I don't care about understanding. But if I'm writing a new Reinforcement Learning algorithm for a research paper, there is no way that I'd use it.

Though who am I kidding, the world will then be in need of saving from AI killing everyone by year 20XX.

I don't think you're being charitable enough to Yudkowsky and AI safety people, I think he has a very specific and falsifiable model of AI killing everyone. In my own model, if we are all still alive in 2033 AND we have AI that routinely can write scientific papers of moderate quality or higher, I think the problem will have turned out to be much easier than expected, and I don't further expect humanity to need to be saved by 2050 or something.

I think Yudkowsky recently renamed his field to "AI not-kill-everyone-ism" specifically to distinguish it from those other relatively minor concerns. "AI safety" is in fact not associated with the alignment problem if you talk to run-of-the-mill professors and researchers; they still mean safety in the sense of "how can I prevent my half-a-million-dollar robot from taking actions that fry its motors?" or "how can we ensure privacy when training this LLM on user data?".

After your comment I tried myself to make chatGPT play chess against stockfish. Telling it to write 2 paragraphs of game analysis before trying to make the next move significantly improved the results. Telling it to output 5 good chess moves and explain the reasoning behind them before choosing the real move also improves results. So does rewriting the entire history of the game in each prompt. But even with all of this, it gets confused about the board state towards the midgame, it tries to capture pieces that aren't there, or move pieces that were already captured.

The two fundamental problems are the lack of long term memory (the point of making it write paragraphs of game analysis is to give it time to think), and the fact that it basically perpetually assumes that its past outputs are correct. Like, it will make a mistake in its explanation and mention that a queen was captured when in fact it wasn't, and thereafter it will assume that the queen was in fact captured in all future outputs. All the chess analysis it was trained on did not contain mistakes, so when it generates its own mistaken chess analysis it still assumes it didn't make mistakes and takes all its hallucinations as the truth.
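A minimal sketch of that kind of loop, assuming the python-chess package for the ground-truth board and a Stockfish binary on the path; `ask_llm` is a hypothetical stand-in for the chat call, and the prompt wording is illustrative rather than the exact one I used:

```python
# Sketch of a chatGPT-vs-Stockfish loop along the lines described above: the full move
# history is re-sent every turn, and the model is asked for analysis and candidate moves
# before committing to one. `ask_llm` is a hypothetical chat-API wrapper; the prompt
# wording is illustrative. Requires the python-chess package and a stockfish binary.

import chess
import chess.engine

def llm_move(board, history_san, ask_llm):
    prompt = (
        "We are playing chess. Moves so far (SAN): " + " ".join(history_san) + "\n"
        "First write two paragraphs of analysis of the current position. "
        "Then list 5 good candidate moves and explain the reasoning behind each. "
        "Finally output your chosen move alone on a line starting with 'MOVE:'."
    )
    reply = ask_llm(prompt)
    for line in reply.splitlines():
        if line.strip().startswith("MOVE:"):
            try:
                return board.parse_san(line.split("MOVE:", 1)[1].strip())
            except ValueError:
                break  # illegal move or a piece that isn't there: fall through
    return next(iter(board.legal_moves))  # crude fallback so the game can continue

def play(ask_llm, stockfish_path="stockfish", max_plies=200):
    board, history = chess.Board(), []
    engine = chess.engine.SimpleEngine.popen_uci(stockfish_path)
    try:
        while not board.is_game_over() and len(history) < max_plies:
            if board.turn == chess.WHITE:   # the LLM plays White
                move = llm_move(board, history, ask_llm)
            else:                           # Stockfish plays Black
                move = engine.play(board, chess.engine.Limit(time=0.1)).move
            history.append(board.san(move))
            board.push(move)
    finally:
        engine.quit()
    return board, history
```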

Because Von Neumann could do shit like argue convincingly with an expert on Byzantine history after someone gave him a set of encyclopedias on said history: he read the books once and could remember everything in them well enough to give the expert trouble. At dinner parties people would call out books, and he'd just start reciting them from memory until someone told him to stop. He didn't train to do any of this; it was just an incidental fact about his brain. He revolutionised a bunch of separate branches of mathematics and physics by the time he died at 53, and he wasn't some social recluse like Dirac; he was apparently quite socially adept.

I am entirely sure that if Von Neumann had tried his hand at business instead of being obsessed by physics (like a lot of smart people get), he'd have been one of the best businessmen of all time. Same thing for pretty much any other field. The anecdotes about him really do point at his brain being a completely unambiguous upgrade to the normal human brain, with basically no downsides.

It's pretty much impossible to know what the ideal outcome for AGI is, because one of the first things I would ask of an aligned AGI is to augment my own intelligence in a direction aligned with my current values, and I would then leave the job of figuring out what I want to do with infinity to my future better self. A future where we remain meat-machines with our current brains intact and become shepherded by great all-knowing AIs is very unlikely, I don't want to trust and be in awe of the Jupiter-Brain, I want to be the Jupiter-Brain.

To give him the benefit of the doubt, maybe he could ask those questions but avoids them to try to keep it humanities-focused. No less painful to listen to.

Oh that's way too charitable towards him, I think he really wanted to go as technically deep as he was able, given that this is about AI and he views AI as part of his region of expertise. At least the crypto-dudes on the bankless podcast asked Eliezer their own naive but sincere questions, they knew they were out of their depth and didn't try to somehow pretend at expertise. But Lex insistently tries to steer the conversation towards "deep topics", and for him this means bringing up Love, Consciousness, The Human Condition, etc.

I think he's trying to imitate Old Esteemed Physicists, who after a lifetime of deep technical contributions spend a few minutes or a small book talking about Beauty and Love, but with Lex it just perpetually feels unearned.

If you took a 200 IQ big-brain genius, cut off his arms and legs, blinded him, and then tossed him in a piranha tank I don't think he would MacGyver his way out.

I fully agree for a 200 IQ AI, I think AI safety people in general underestimate the difficulty that being boxed imposes on you, especially if the supervisors of the box have complete read access and reset-access to your brain. However, if instead of the 200 IQ genius, you get something like a full civilization made of Von Neumann geniuses, thinking at 1000x human-speed (like GPT does) trapped in the box, would you be so sure in that case? While the 200 IQ genius is not smart enough to directly kill humanity or escape a strong box, it is certainly smart enough to deceive its operators about its true intentions and potentially make plans to improve itself.

But discussions of box-evasion have become kind of moot, since none of the big players seem to have hesitated even a little bit before directly connecting GPT to the internet...

Very difficult to tell. The only actual training metric is the average log probability of getting the next word correct, and in that metric the gap between GPT3 and GPT2 is larger than that between 4 and 3, but understanding how that metric maps onto our own intuitive notions of "performance" is really hard. And human perceptions of intelligence are really only sensitive to small changes around the human average, I think GPT2 was too dumb in an unusual way for people to really get a good sense of its capabilities.
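For concreteness, that metric is just the average log-probability the model assigns to the actual next token over held-out text (its negative is the cross-entropy loss, and exponentiating the negative gives the perplexity); a toy sketch with made-up numbers:

```python
# Toy illustration of the training metric mentioned above: the average log-probability
# the model assigns to the token that actually came next. Its negative is the
# cross-entropy loss, and exp(-avg_logprob) is the perplexity. Numbers are made up.

import math

p_next_token = [0.40, 0.05, 0.90, 0.22, 0.61]  # model's probability for each true next token

avg_logprob = sum(math.log(p) for p in p_next_token) / len(p_next_token)
print(f"average log-prob : {avg_logprob:.3f} nats/token")
print(f"cross-entropy    : {-avg_logprob:.3f} nats/token")
print(f"perplexity       : {math.exp(-avg_logprob):.2f}")
```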

Sooo, Big Yud appeared on Lex Fridman for 3 hours, a few scattered thoughts:

Jesus Christ his mannerisms are weird. His face scrunches up and he shows all his teeth whenever he seems to be thinking especially hard about anything. I don't remember him being this way in the public talks he gave a decade ago, so this must either only happen in conversations, or something has changed. He wasn't like this on the bankless podcast he did a while ago. It also became clear to me that Eliezer cannot become the public face of AI safety: his entire image, from the fedora to the cheap shirt, the facial expressions and the flabby small arms, oozes "I'm a crank" energy, even if I mostly agree with his arguments.

Eliezer also appears to very sincerely believe that we're all completely screwed beyond any chance of repair and all of humanity will die within 5 or 10 years. GPT4 was a much bigger jump in performance from GPT3 than he expected, and in fact he thought that the GPT series would saturate to a level lower than GPT4's current performance, so he doesn't trust his own model of how Deep Learning capabilities will evolve. He sees GPT4 as the beginning of the final stretch: AGI and SAI are in sight and will be achieved soon... followed by everyone dying. (in an incredible twist of fate, him being right would make Kurzweil's 2029 prediction for AGI almost bang on)

He gets emotional about what to tell the children, about physicists wasting their lives working on string theory, and I can see real desperation in his voice when he talks about what he thinks is really needed to get out of this (global cooperation about banning all GPU farms and large LLM training runs indefinitely, on the level of even stricter nuclear treaties). Whatever you might say about him, he's either fully sincere about everything or has acting ability that stretches the imagination.

Lex is also a fucking moron throughout the whole conversation, he can barely even interact with Yud's thought experiments of imagining yourself being someone trapped in a box, trying to exert control over the world outside yourself, and he brings up essentially worthless viewpoints throughout the whole discussion. You can see Eliezer trying to diplomatically offer suggested discussion routes, but Lex just doesn't know enough about the topic to provide any intelligent pushback or guide the audience through the actual AI safety arguments.

Eliezer also makes an interesting observation/prediction about when we'll finally decide that AIs are real people worthy of moral consideration: that point is when we can pair midjourney-like photorealistic video generation of attractive young women with chatGPT-like outputs and voice synthesis. At that point he predicts that millions of men will insist that their waifus are actual real people. I'm inclined to believe him, and I think we're only about a year or at most two away from this actually being a reality. So: AGI in 12 months. Hang on to your chairs, people, the rocket engines of humanity are starting up, and the destination is unknown.