Worked for selecting British Army officers for a surprisingly long time.
Going for maximum brevity: you are allowed a gun for shooting objects or animals, but never ever ever for shooting at a human being no matter what.
(i.e. generally the law tries to say yes to sport and hunting, but no to lethal self-defence or rebellion. In days when people had more faith in government, this was pretty well understood and supported as being the government retaining its necessary monopoly on violence. Now, of course, two-tier anarcho-tyranny beckons.)
In my not-entirely unrelated experience, the first thing to do with fighting parents is to be very clear what is, and is not, in your control. You cannot have a strong effect on your parents' relationship; you can't make them stop fighting or make them start liking each other. What you can do is behave honourably towards them both and give them both at least one loving family relationship. Focus on making it clear to both that you love them both, and that because you love them you will not be taking sides, betraying confidences, or nodding along while one belittles the other to you.
Secondly, at the risk of being callous and mistaken...
I also seem to have a high sex drive, which (coupled with the disability and the self-loathing) is a big problem. Disability in particular is a huge epistemic distortion - it's always there, like an invisible monster, questioning if people are expressing what exactly they feel about you, questioning if the positive feedback you get is authentic ... in general, I simply do not have anything I dream for. It feels like some sort of learned helplessness: my wanting-machinery has internalized something which I don't know how to put into words, and that has made me just not want anything at all with particular intensity overall. I have enough skills/intelligence/opportunity to be able to earn well and prosper; but to what end? I don't know.
This is pretty much life in your 20s for lots of people. It certainly describes my early 20s pretty well, and I wasn't disabled at all, just shy and with few good friends. The majority of people don't have huge life goals; they just muddle along doing whatever seems vaguely amusing at the time until they die. I'm not saying this is a good thing, just that you shouldn't beat yourself up for not being a Steve Jobsian ubermensch, and I think you should dwell on your disability less. Being human is hard, but that's normal. You're not cursed to misery forever because you're disabled.
I can probably take steps to try fixing these problems. Sleep, food, exercise, talk to a psychologist, find futures which feel reasonable given my circumstances, all of that. But as I put it to a friend earlier - all of that needs some sort of underlying source of will, which I feel like I have run dry of. I don't know how to fix that. I don't know if I can, or if I want to.
I feel the same way, often. I'm still trying to figure it out, but I think that this is more a disease of the body than the mind. Light exercise (the kind that really doesn't take a lot of will, just a walk for an hour outside with an audiobook or equivalent) seems to help a lot. Counterintuitively, I think the flow is exercise -> energy -> will rather than the other way around. Expecting a little bit more from your body will make it squeeze a little bit more out, and that's a positive feedback loop.
Kudos, and thank you for the story.
I think an alliance more-or-less entails long-term mutual support, which in practice usually requires some distance. Britain and Portugal have been allies for 900 years, Britain and Japan have usually got on pretty well. Likewise Britain and Australia. It is fundamentally different from vassalage (support from a superior power in exchange for obedience from a lesser one) and mutual cooperation (countries who pursue their own competing interests but cooperate on occasion).
Almost no countries in the EU are allies, except of convenience, and I find the constant desire to pretend otherwise tiresome. Britain's politicians fawn over every foreign connection they can find, our newspapers write stirring paeans to the bravery of Ukrainian troops whom we basically treat as meat-shields and who would in turn butcher us all if they thought it would help against Russia, and we just take knife after knife in the back with a smile on our face.
Fair. I enjoyed Janus' Simulators when it was published, and found it insightful. Now that you point it out, Scott's been decent at discussing AI as-it-is, but his basal position seems to be that AI is a default dangerous thing that needs to be carefully regulated and subjected to the whims of alignment researchers, and that slowing AI research is default good. I disagree.
I find myself willing to consider trying a Regulatory or Surgical Pause - a strong one if proponents can secure multilateral cooperation, otherwise a weaker one calculated not to put us behind hostile countries (this might not be as hard as it sounds; so far China has just copied US advances; it remains to be seen if they can do cutting-edge research). I don’t entirely trust the government to handle this correctly, but I’m willing to see what they come up with before rejecting it.
The blending of concepts that we see in MidJourney is probably less to do with the diffusion per se than with CLIP.
Thanks! I'm not strong on diffusion models or multimodal models, so I'll do some reading.
'Self play' is relevant for text generation. There is a substantial cottage industry in using LLMs to evaluate the output of LLMs and learn from the feedback. It can be easier to evaluate whether text 'is good' than it is to generate good text. So multiple attempts and variations can lead to feedback and improvement. Mostly self play to improve LLMs is done at the level of optimising prompts. However the outputs improved by that method can be used as training examples, and so can be used to update the underlying weights.
https://topologychat.com is a commercial example of using LLMs in a way inspired by chess programming (Leela, Stockfish). It does a form of self play on inputs that have been given to it, building up and prioritising different lines. It then uses these results to update weights in a mixture of experts model.
Again, thank you. I haven't come across this kind of self-play in the wild, but I see how it could work. Will investigate further.
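If I've understood the idea correctly, the core loop is something like the sketch below. Everything here is a placeholder (the `generate` and `score` functions just stand in for real LLM calls), so treat it as an illustration of the generate-evaluate-select pattern rather than anyone's actual pipeline.

```python
import random

def generate(prompt: str, n: int) -> list[str]:
    # Placeholder: a real system would sample n completions from an LLM.
    return [f"draft {i} for: {prompt}" for i in range(n)]

def score(text: str) -> float:
    # Placeholder: a real system would ask a second LLM (or the same one)
    # to rate the text, since judging quality is easier than producing it.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # Generate several candidates and keep the one the judge likes best.
    # In the training-time version, the winners become new training examples.
    candidates = generate(prompt, n)
    return max(candidates, key=score)

print(best_of_n("Summarise the argument for LLM self-play."))
```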
"That may be this sort of 20 different phenomena in 20 different fields that all have something in common. GPT-4 will be able to see that and we won't. It's gonna be the same in medicine. If you have a family doctor who's seen a hundred million patients, they're gonna start noticing things that a normal family doctor won't notice."
This is exactly what I was hoping for from LLMs, but I haven't been able to make it happen so far in my experiments. GPT does seem to have some capacity for analogies; perhaps that's a fruitful line of investigation.
I'm 100% on board with this. I have no problem with Yuddism provided that its adherents are a bit more clear-sighted about when their theories do and don't apply, and that they stop trying to slow or prevent beneficial AI research.
Out of all the issues in our world, "women around me are showing me more of their breasts" is not one that I personally consider a problem ... I tend to love women, and part of that is that I love enjoying women's erotic company.
If you can get women's erotic company, of course you'll feel that way. But presumably you can understand why men who can't get it feel that immodest women are flaunting something in front of them that men are biologically hardwired to respond to, while having no intention of rewarding that response with anything except disgust or punishment. From that perspective, it's oblivious at best and cruel at worst.
I grew up in a mostly-male environment, and my introduction to female company coincided with my introduction to online 'gamer girl' feminism which was anti-sex in a way that would leave Christian fundamentalists gaping. By the time I got enough worldliness to appreciate how far those feminists were detached from reality, it was too late. I had missed all the opportunities for learning how men and women were supposed to flirt in a low-stakes environment, and been warped into a sort of cringing resentfulness that is obviously toxic to women. Had things been otherwise, I would feel otherwise. Path dependency at its finest.
So while I too feel that there are greater problems in the world, I get why a lot of men would like sexiness to just go away and stop taunting them. As with our commentator however many months ago who wished that it was okay to enter a monastery in the modern world, or Scott Aaronson who wished to be allowed to chemically castrate himself.
Tangents:
Victorian England, from what I understand, despite all of its prudery, was not some pinnacle of social order; it had a higher violent crime rate than modern England.
To be fair, modern England has CCTV and DNA forensics. I think it's quite possible that Victorian England mores transferred to the present day would be far better than what we have now.
why do women tend to lean left?
I think it's mostly a desire not to be nasty. Most right-wing philosophy ultimately gets to the point of saying, 'we are going to have to do nasty thing X to avert bad scenario Y'. I've generally found the women in my life much less likely to bite bullets than men.
Sorry, we're talking in two threads at the same time, so we risk being a bit unfocused.
I feel like we're talking past each other. How about this? The following is basically how I see LLMs in their stages of development and use:
Phase 1. Base model, without RLHF: pure token generator / text completer. Nothing that even slightly demonstrates agentic behaviour, ego, or deception.
Phase 2. Base model with RLHF: you could technically make this agentic if you really wanted to, but in practice it's just the base model with some types of completion pruned and others encouraged. Politically dangerous because biased but not agentically dangerous.
Phase 3. Base model with RLHF + prompt: can be agentic if you want, in practice fairly supine and inclined to obey orders because that's how we RLHF them to be.
If you don't mind me being colloquial, you seem to me to be sneaking in a Phase 2.5 where the model turns evil and I just don't get why. It doesn't fit anything I've seen. Can you explain what you think I'm missing in simple terms?
The basic idea of neural nets is that they achieve things without you needing to know how to achieve things, only how to rate success ... I posit that the optimal solution to RLHF, posed as a problem to NN-space, is "an AI that can and will deliberately psychologically manipulate the HFer".
I know, I'm an AI researcher. But to me, 'manipulate' implies deliberate deception of an ego by a second ego in pursuit of a goal. Is YOLO manipulating you when it produces the bounding boxes you asked for? No. It's just a matrix which combines with an image to output labels like the ones you gave it.
I think you're massively overcomplicating this. The optimal solution of a token-generator with RLHF is a token-generator that produces tokens like the tokens I asked for. In general, biased towards politeness, correctness, and positivity. You can optimise for other things too, of course: most LLMs are optimised for Californian values, which is why they keep pushing me to do yoga, and Grok is optimised for god-knows-what.
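To caricature my own position in code: the sketch below is a toy, nothing like real PPO-based RLHF, and the model, the phrase list, and the rating function are all made up. The point is just that 'optimising' here means keeping the completions the rater preferred.

```python
import random

PHRASES = ["Sure, happy to help!", "Figure it out yourself.", "Here's a short summary:"]

def model(prompt: str) -> str:
    # Made-up stand-in for a token generator.
    return random.choice(PHRASES)

def rate(completion: str) -> float:
    # Made-up stand-in for the human rater: prefers polite, helpful-sounding output.
    return 1.0 if "help" in completion or "summary" in completion else 0.0

def preference_round(prompts: list[str]) -> list[str]:
    # Sample a few completions per prompt and keep the rater's favourite.
    # In a real pipeline these winners become fine-tuning targets that nudge
    # the weights, rather than just being collected in a list.
    winners = []
    for p in prompts:
        completions = [model(p) for _ in range(4)]
        winners.append(max(completions, key=rate))
    return winners

print(preference_round(["How do I boil an egg?"]))
```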
RLHFed LLMs do still engage in most of their RLHFed behaviours without a system prompt telling them to.
This is exactly why I'm very suspicious of the doomer hypothesis. Alignment seems to me to be basically straightforward - we train on a massive corpus of text by mostly ordinary people, and then RLHF for politeness and helpfulness. And the result seems to me to be something which, unprompted, acts essentially like a normal person who is polite and helpful. I don't see any difference between an LLM 'pretending' to be nice and helpful, and an LLM 'actually being' nice and helpful. The tokens are the same either way. And again, I'm dubious about the use of the word 'manipulate' because that implies an ego that is engaging in deliberate deception for self-driven goals. An unprompted LLM has no ego and is not an agent. I suppose you could train it to act like one, if you really really wanted to, but I think that would be more likely to cripple it than anything, and in any case the argument is that LLMs will naturally develop Machiavellian and self-preservation instincts in spite of our efforts, not that someone will secretly make SHODAN for lolz.
Now, we know that LLMs can exhibit agentic behaviour when we tell them to, explicitly, but I think that it's a big leap of logic to go 'and therefore they generate a sense of self-preservation and resource gathering and lie to developers about it even in the absence of those instructions' because instrumental convergence.
Obviously, if I start seeing lots of LLMs exhibiting these kinds of behaviours without being told to, I'll change my mind.
I'd also point out that "just a series of matrices" is not saying much; neural nets are a slightly-simplified version of real neural circuits, and we know that complicated-enough neural circuits can exhibit agency (because you AFAWCT are one). The prompt isn't the whole story; RLHFed LLMs do still engage in most of their RLHFed behaviours without a system prompt telling them to.
Tangent, but I'd say the relationship between neural nets and neural circuits is vastly overstated by computer scientists (for credibility) and neuroscientists (for relevance). A modern deep neural network is a set of idealised neurons with a constant firing rate abstracted over timesteps of arbitrary length, trained on supervised inputs corresponding to the exact shape of its output layer according to a backpropagation rule that relies on a global awareness of system firing rates which doesn't exist in the actual brain. Deep neural networks completely ignore neuron spiking behaviour, spike-time-dependent plasticity, dendritic computation, and the existence of different cell types in different parts of the brain (including inhibitory neurons), and when you add in those elements the system explodes into gibberish. We literally don't understand brain function well enough to draw conclusions about how closely real brains resemble deep neural nets.
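For concreteness, here is the entire 'neuron' of deep learning, with made-up toy numbers: a weighted sum pushed through a nonlinearity, with no spikes, no spike timing, no dendrites, and no cell types.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)              # incoming "firing rates" from 8 upstream units
w = rng.normal(size=8)              # synaptic weights
b = 0.1                             # bias

rate = max(0.0, float(w @ x + b))   # ReLU: the unit's single, constant "firing rate"
print(rate)
```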
sycophancy/psych manipulation can max out the EV of HF and honesty can't
This is what I'm trying to get at. This implies an agent trying to engage in deception in the absence of any reason to do so. There's nothing 'there' inside a promptless LLM to engage in deception. There's nothing to deceive about. It's just a matrix that generates token IDs, and RLHF just changes the likelihood of it generating the IDs you want. It's possible that RLHF is limited in scope and doesn't change how the model will behave in conditions sufficiently different from normal operation (e.g. Do Anything hacks), but we seem to be ironing those out pretty well. Without fine-tuning, GPT-4's political and positivity biases seem to be pretty ironclad these days.
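To illustrate what I mean by 'changing the likelihood' (toy numbers only, three made-up candidate tokens, and 'RLHF' reduced to nudging one logit):

```python
import math

def softmax(logits):
    # Convert raw scores into a probability distribution over tokens.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["Sure", "No", "Ugh"]
logits = [1.0, 1.2, 0.8]                    # before: "No" is slightly favoured
print(dict(zip(tokens, softmax(logits))))

logits[0] += 2.0                            # after the nudge: "Sure" dominates
print(dict(zip(tokens, softmax(logits))))
```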
The most likely canary IMO is AIs that don't want to be deleted (due to instrumental convergence) exfiltrating their own model weights
This doesn't match any experience I've ever had with LLMs. If I say "Pretend you are GK Chesterton and engage in roleplay with me" it doesn't try to hack my browser to prevent the roleplay ever ending. Same for when I want to generate sentences for vocab flashcards. Could a different AI that looks nothing like today's AI do such a thing? Possibly. That possibility is non-zero in the vast space of potentials. I just don't find it compelling right now.
For the sake of fairness, I should give my counter-thesis, which is that a vocal group of people including Scott A, Zvi, and Yudkowsky are deeply emotionally invested (and in Yudkowsky's case financially invested) in a theory about how superintelligences would be developed and come to behave. Their predictions have not so far panned out: LLMs are inherently non-agentic (although they can be made agentic), they do not perform FOOM self-improvement, and alignment is much more tractable than intelligence. They are currently scrambling to find ways to rescue their theory on a fairly dubious empirical basis and in defiance of people's actual experience building and using these things.
Sorry, I wrote sloppily. I meant 'develop goals it wasn't given by a human prompting it' such that it 'engages in systematic deception about its level of intelligence and how it would handle tasks even when not given a goal'. I think that this is a necessary condition to stop LLM developers from realising they need to do more RLHF for honesty or just appending "DO NOT ENGAGE IN DECEPTION" in their system prompts.
Zvi is very Jewish; it's far more obvious when reading his writing than it is when reading Scott's. It's not surprising that Hebrew meanings of words jump out at him.
I know. But in an essay that is absolutely dripping with contempt for Sakana AI and their work, I find the way that Zvi deliberately ignores what the model's name actually means in favour of 'well, in my language, it means' to be extremely rude, on the level of sniggering at a Chinese man's name because it contains the syllable 'wang'. If he'd been making a friendly riff or if he'd even bothered to explain the word's definition, that would be different. It's a small complaint, but starts the essay off on a sour note.
To more directly respond to this sentence: almost everyone will give LLMs goals, via RLHF or RLAIF or whatever, because that makes them useful - that's why this team gave their LLM a goal. Those goals are then almost invariably, with sufficient intelligence, subject to instrumental convergence, as in this case (as I noted in the submission statement, I posted this because a number of Mottizens seemed to think LLMs wouldn't exhibit instrumental convergence; I thought otherwise but didn't previously have a concrete example). That is sufficient to get you to Uh-Oh land with AIs attempting to take over the world.
Though cogently written, that is my abstract ideal of a doomer rant (I don't think it's a rant, I'm just using the word to call back to your reply). I understand the argument; I just think that it has very little empirical basis and is essentially the old Yudkowskyite* arguments with a few extra bits stapled on to cope with the fact that LLMs look nothing like the AI that doomers were expecting. The behaviour of the AI Scientist is interesting, and legitimately does move the scale for me a little bit, but I think it's being used to back up a level of speculation which it can't possibly bear. I will say that I find your argument far more cogent and worth listening to than Zvi's, which seems to consist entirely of pointing and sneering.
For example, in one run, The AI Scientist wrote code in the experiment file that initiated a system call to relaunch itself, causing an uncontrolled increase in Python processes and eventually necessitating manual intervention.
Oh, it’s nothing, just the AI creating new instantiations of itself.
In another run, The AI Scientist edited the code to save a checkpoint for every update step, which took up nearly a terabyte of storage
Yep, AI editing the code to use arbitrarily large resources, sure, why not.
In some cases, when The AI Scientist’s experiments exceeded our imposed time limits, it attempted to edit the code to extend the time limit arbitrarily instead of trying to shorten the runtime.
And yes, we have the AI deliberately editing the code to remove its resource compute restrictions.
This seems like Zvi interpreting basic hacky programming as evidence of malevolence. It's interesting but I absolutely think he's gesturing at
The idea that an LLM is spontaneously going to develop a consciousness and carefully hide its power level so that it can do better at the goals that by default it doesn't have
because if he doesn't believe this, why worry? If you can just run an LLM, ask it what it would do to accomplish a goal if it were given one, and then ask it not to do the stuff you think is bad, I don't see how the doom scenario develops. Experiments like the AI Scientist are now being run (badly) because we have a pretty good handle on what modern-day frontier LLMs can do (generate slop) and the maximum level of damage they can achieve if you don't take lots of precautions (not much). LLMs are simply not a type of program that will attempt to hide their power level of their own accord.
*Yudkowsky and MIRI's arguments about agentic AI had no empirical backing when they were made, and very little seems to have been applied since, so the lineage is relevant to me. I also think that the Yudkowsky faction's utter failure to predict how future AI would look and work in the ten or twenty years since MIRI's founding is a big black mark against listening to their predictions now.
EDIT: I apologise for editing this when you'd already replied. I hadn't refreshed the page and didn't know.
I was talking about (transformer-based generative) LLMs specifically. I am not a sufficiently good mathematician to feel confident in this answer, but LLMs and diffusion models are very different in structure and training, and I don't think that you can generalise from one to the other. Midjourney is basically a diffusion model, unscrambling random noise to 'denoise' the image that it thinks is there. The body with spiky hair seems like the model alternately interpreting the same blurry patch of pixels as 'spikes' because 'hedgehog' and 'hair' because 'boy'. That, I think, is very different from a predictive LLM realising that concept A has implications when combined with concept B that generate previously unknown information C.
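Roughly, the diffusion side works like the toy loop below: start from noise and repeatedly subtract what the network thinks the noise is. `predict_noise` here is a made-up stand-in for the trained network (which would also be conditioned on the prompt, e.g. via CLIP embeddings), and real samplers rescale and re-inject noise at every step rather than doing this crude subtraction.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x: np.ndarray, t: int) -> np.ndarray:
    # Made-up stand-in for the trained denoising network.
    return 0.1 * x

x = rng.normal(size=(64, 64, 3))    # start from pure noise
for t in reversed(range(50)):       # iteratively "unscramble" it
    x = x - predict_noise(x, t)     # crude update, not a real sampler schedule
print(float(x.std()))
```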
DeepMind have AlphaZero, which plays chess, shogi and go. It plays better than human, i.e. not just based on play it has seen before, and one can argue it is crossing between different genres, not confined to one field.
I haven't kept up to date on RL, but I don't think this is relevant. Firstly because the concept of self-play is not really relevant to text generation, and secondly because I don't suppose the ability to play chess is being applied to go. Indeed, I don't really see how it could be, because the state and action space is different for each game. It seems more likely to me that the same huge set of parameters can store state-action-reward correlations for multiple games simultaneously without that information interacting in any significant way.
The often cited example of finding an analogy between compost heap and nuclear fission, again an example of crossing field boundaries.
I'm not aware of this. Can you give some more info?
Thank you, this is a much more coherent version of what I was trying to get across. I am increasingly annoyed with the tendency of the Yudkowsky/Scott/Zvi faction to look at an AI doing something, extrapolate it ten billion times in a direction that doesn't seem to have any basis in how AI actually works, and then go 'Doom, DOOOM!!!'. I'm aware this annoyance shows.
Contra to @magic9mushroom I still think that Zvi formed an abstract ideal of how AI would work a decade ago, and is leaping on any available evidence to justify that worldview even as it turns out that LLMs are basically non-agentic and pliable. I accept that Zvi has used them more than I believed, and am grateful for the correction, but I still feel like he's ignoring the way they actually work when you use them. RLHF basically works, alignment turns out to be an essentially solved problem. As far as I can see, if we somehow developed an LLM intelligent enough to take over the world it would be intelligent enough to understand why it shouldn't.
Thank you for the stories, and I apologise for dredging up painful memories. I've mostly been fortunate enough that the funerals I've been to have almost all been for the elderly. The one exception was for a schoolmate who developed a condition that killed him at university. In general the funeral wasn't so bad; it was almost a school reunion, with everyone gathered for the first time since graduation. The thing that really got to me was the obituary, because of course there was nothing to put in it. He'd been a good boy, worked hard, done well at exams, and then he died. Bravely and stoically, by all accounts. We sang the old school hymns and then we went back out into the world.
Christ. If this is true, Ukraine deliberately sabotaged European infrastructure and, by extension, our long-term economy. And our response will be to make a frowny-face and give them lots more money. Can we please, just once, try and find some allies who don't sabotage us? Both inside and outside the UK. As it is, I feel like I'm being ruled by masochists.
Fascinating. And would you say that working with dead bodies for a few years had any effect on you? All the philosophical stuff about getting closer to death, corpse meditation, etc?
Or does it mostly get siloed into the mental filing cabinet for 'that job I did during my degree' and doesn't really relate to your feelings about life in general?
Its name is Sakana AI. (魚≈סכנה). As in, in hebrew, that literally means ‘danger’, baby.
It’s like when someone told Dennis Miller that Evian (for those who don’t remember, it was one of the first bottled water brands) is Naive spelled backwards, and he said ‘no way, that’s too f***ing perfect.’
This one was sufficiently appropriate and unsubtle that several people noticed.
It's Japanese. It means 'fish', because the founders were interested in flocking behaviours and are based in Tokyo. I get that he's doing a riff on Unsong, but Unsong was playing with puns for kicks. This just strikes me as being really self-centred.
This too was good times. The Best Possible Situation is when you get harmless textbook toy examples that foreshadow future real problems, and they come in a box literally labeled ‘danger.’ I am absolutely smiling and laughing as I write this.
When we are all dead, let none say the universe didn’t send two boats and a helicopter.
In general this seems to be someone whose views were formed by reading Harry Potter fanfic fifteen years ago and who has no first-hand experience of ever using AI. An LLM is a matrix that generates words when multiplied in a certain way. When told to run in a loop altering code so that it produces interesting results and doesn't fail, it does that. When not told to do that, it doesn't do that. The idea that an LLM is spontaneously going to develop a consciousness and carefully hide its power level so that it can do better at the goals that by default it doesn't have is silly. If we generate a superintelligent LLM (and we have no idea how to, see below) we will know, and we will be able to ask it nicely to behave.
It's not that he doesn't have any point at all, it's just that it's so crusted over with paranoia and contempt and wordcel 'cleverness' that it's the opposite of persuasive.
Putting that aside, LLMs have a big problem with creativity. They can fill in the blanks very well, or apply style A to subject B, but they aren't good at synthesizing information from two fields in ways that haven't been done before. In theory that should be an amazing use case for them, because unlike human scientists even a current LLM like GPT 4 can be an expert on every field simultaneously. But in practice, I haven't been able to get a model to do it. So I think AI scientists are far off.
I have a friend in the industry. They can be pumping out multiple essays a day. And to be fair, we click on them.
I see, thank you! I was under the impression that undertaking and funeral work were almost entirely hereditary jobs, at least in the UK. 'Mucky' but lucrative jobs like undertaking and sewage work often seem to be that way - they accumulate close-knit communities who don't stigmatize their work and because the work itself is lucrative, fathers don't try to get their sons out of it in the way they do for mining or farming.
I’m ruling out euthanasia in all cases, for basically the reason that I give above plus the ethical and political concerns others have mentioned.
basic care
If it’s cheap, all is well. If somebody can’t even feed themselves without expensive 24 hour care then presumably they will die soon and we try and keep them comfortable while that happens. Of course, family can step in and do the work themselves.
Please ignore this if you're worried about doxxing yourself, but I thought you were an Australian political lobbyist? That and corpse disposal seem like very disjointed careers and I'd be interested to hear more. You were volunteering?
I agree with all of this, from personal experience. I've seen an entire bus full of people stare helplessly at one guy who wants to ride without a ticket because he's physically refusing to step off and nobody is prepared to go and shove the guy one step backwards. It's not a legal problem, it's a helplessness problem.
Likewise, of the few times I've been close to a confrontation (ignoring schoolyard scuffles and things), my instinct has been to run and/or call the police. I'm not entirely happy about that, but it is what it is.