NexusGlow's profile

NexusGlow 1mo ago

As someone who is not nearly as impressed with AI as you, thank you for the Turing test link. I've personally been convinced that LLMs were very far away from passing it, but I realize I misunderstood the nature of the test. It depends way too heavily on the motivation level of the participants. That level of "undergrad small-talk chat" requires only slightly more than Markov-chain level aptitude. In terms of being a satisfying final showdown of human vs AI intelligence, DeepBlue or AlphaGo that was not.

I still hold that we're very far away from AI being able to pass a motivated Turing test. For example, if you offered me and another participant a million dollars to win one, I'm confident the AI would lose every time. But then, I would not be pulling any punches in terms of trying to hit guardrails, adversarial inputs, long-context weaknesses etc. I'm not sure how much that matters, since I'm not sure whether Turing originally wanted the test to be that hard. I can easily imagine a future where AI has Culture-level intelligence yet could still not pass that test, simply because it's too smart to fully pass for a human.

As for the rest of your post, I'm still not convinced. The problem is that the model is "demonstrating intelligence" in areas where you're not qualified to evaluate it, and thus very subject to bullshitting, which models are very competent at. I suspect the Turing test wins might even slowly reverse over time as people become more exposed to LLMs. In the same way that 90s CGI now sticks out like a sore thumb, I'll bet that current day LLM output is going to be glaring in the future. Which makes it quite risky to publish LLM text as your own now, even if you think it totally passes to your eyes. I personally make sure to avoid it, even when I use LLMs privately.

18

Context

NexusGlow 4mo ago

“if the monster’s statistics are better than yours, or your HP is too low, run away.” This is making a decision more or less.

That's true, but if that leads to running from every battle, then you won't level up. Even little kids will realize that they're doing something wrong if they're constantly running. That's what I mean when I say it has a lot of disconnected knowledge, but it can't put it together to seek a goal.

One could argue that's an issue with its limited memory, possibly a fault of the scaffold injecting too much noise into the prompt. But I think a human with bad memory could do better, given tools like Claude has. I think the problem might be that all that knowledge is distilled from humans. The strategies it sees are adapted for humans with their long-term memory, spatial reasoning, etc. Not for an LLM with its limitations. And it can't learn or adapt, either, so it's doomed to fail, over and over.

I really think it will take something new to get past this. RL-based approaches might be promising. Even humans can't just learn by reading, they need to apply the knowledge for themselves, solve problems, fail and try again. But success in that area may be a long way away, and we don't know if the LLM approach of training on human data will ever get us to real intelligence. My suspicion is that if you only distill from humans, you'll be tethered to humans forever. That's probably a good thing from the safetyist perspective, though.

2

Context

NexusGlow 4mo ago

What makes things interesting is that the line between "creating plausible texts" and "understanding" is so fuzzy. For example, the sentence

my Pokemon took a hit, its HP went from 125 to _

will be much more plausible if the continuation is a number smaller than 125. "138" would be unlikely to be found in its training set. So in that sense, yes, it understands that attacks cause it to lose HP, that a Pokemon losing HP causes it to faint, etc. However, "work towards a goal" is where this seems to break down. These bits of disconnected knowledge have difficulty coming together into coherent behavior or goal-chasing. Instead you get something distinctly alien, which I've heard called "token pachinko". A model sampling from a distribution that encodes intelligence, but without the underlying mind and agency behind it. I honestly don't know if I'd call it reasoning or not.

It is very interesting, and I suspect that with no constraints on model size or data, you could get indistinguishable-from-intelligent behavior out of these models. But in practice, this is probably going to be seen as horrendously and impractically inefficient, once we figure out how actual reasoning works. Personally, I doubt ten years with this approach is going to get to AGI, and in fact, it looks like these models have been hitting a wall for a while now.

5

Context

NexusGlow 4mo ago

Claude didn't "get decent at playing" games in a couple of months. A human wrote a scaffold to let a very expensive text prediction model, along with a vision model, attempt to play a video game. A human constructed a memory system and knowledge transfer system, and wired up ways for the model to influence the emulator, read relevant RAM states, wedge all that stuff into its prompt, etc. So far this is mostly a construct of human engineering, which still collapses the moment it gets left to its own devices.

When you say it's "understanding" and "thinking strategically", what you really mean it that it's generating plausible-looking text that, in the small, resembles human reasoning. That's what these models are designed to do. But if you hide the text window and judge it by how it's behaving, how intelligent does it look, really? This is what makes it so funny, the model is slowly blundering around in dumb loops while producing volumes of eloquent optimistic narrative about its plans and how much progress it's making.

I'm not saying there isn't something there, but we live in a world where it's claimed that programmers will be obsolete in 2 years, people are fretting about superintelligent AI killing us all, openAI is planning to rent "phd level" AI agent "employees" to companies for large sums, etc. Maybe this is a sign that we should back up a bit.

9

Context

NexusGlow 4mo ago

I'm suspicious of these kinds of extrapolation arguments. Advances aren't magic, people have to find and implement them. Sometimes you just hit a wall. So far most of what we've been doing is milking transformers. Which is a great discovery, but I think this playthrough is strong evidence that transformers alone is not enough to make a real general intelligence.

One of the reasons hype is so strong is that these models are optimized to produce plausible, intelligent-sounding bullshit. (That's not to say they aren't useful. Often the best way to bullshit intelligence is to say true things.) If you're used to seeing LLMs perform at small, one-shot tasks and riddles, you might overestimate their intelligence.

You have to interact with a model on a long-form task to see its limitations. Right now, the people who are doing that are largely programmers and /g/ gooners, and those are the most likely people to have realistic appraisals of where we are. But this Pokemon thing is a entertaining way to show the layman how dumb these models can be. It's even better at this, because LLMs tend to stealthily "absorb" intelligence from humans by getting gently steered by hints they leave in their prompts. But this game forces the model to rely on its own output, leading to hilarious ideas like the blackout strategy.

9

Context

NexusGlow 4mo ago

The Turing test is an insanely strong test, in the sense that an AI that passes it can be seen to have achieved human-level intelligence at the very least. By this I mean the proper, adversarial test with a fully motivated and intelligent tester (and ideally, the same for the human participant).

Certainly no AI today could pass such a thing. The current large SotA models will simply tell you they are an AI model if you ask. They will outright deny being human or having emotions. I don't know how anyone could think these models ever passed a Turing test, unless the tester was a hopeless moron who didn't even try.

One could object that they might pass if people bothered to finetune them to do so. But that is a much weaker claim akin to "I could win that marathon if I ever bothered to get up from this couch." Certainly they haven't passed any such tests today. And I doubt any current AI could, even if they tried.

In fact, I expect we'll see true superhuman AGI long before such a test is consistently passed. We're much smarter than dogs, but that doesn't mean we can fully imitate them. Just like it takes a lot more compute to emulate a console such as the SNES than such devices originally had, I think it will require a lot of surplus intelligence to pretend to be human convincingly. If there is anything wrong with the Turing test, it's that it's way too hard.

4

Context

NexusGlow 5mo ago

I'm horrified that there are guides that lead people to use a 4090 to barely run a 14B model at .6t/s.

For clarity, with a single 4090 you should be able to run 14B at 8bit (near-flawless quality) and probably get more than 40T/s with tons of space left over for context. But you'd be better off running a 32B at 4-5bit, which should still have low quantization loss and massively better quality due to the model being larger. You can even painfully squeeze a 70B in there, but the quality loss is probably not worth it at the required ~2bit. All of those should run at 20-30 T/s minimum.

I think vllm is meant for real production use on huge servers. For home use I'd start with koboldcpp (really easy), llamacpp (requires cli) or ooba/tabbyapi with exl2. The latter is faster on pure gpu but has the downside that you have to deal with python instead of being pure standalone.

1

Context

NexusGlow 5mo ago

This is very common. For a long time, practically every open model was a distilled knockoff trained from synthetic data, mostly from OpenAI. It's been so common that people are familiar with the marks this leaves on the model. Such models are worse than the model they're distilled from, typically less flexible out of distribution (e.g. obeying unusual system prompts, prompts, ...) and have an even more intense "sloppy" vibe to them. It's very common, and people have long gotten bored with these knockoff models. Before deepseek, I'd even say that it's all people expected from Chinese models.

It also doesn't match what we're seeing from R1 at all though. One of the reasons R1 is so impressive is that its slop level is much lower, its creativity is way higher, and it doesn't sound like any of the existing AI models. Even Claude feels straitjacketed in comparison, much less OpenAI Models.

I wouldn't be surprised if they did use synthetic data, but whatever training method they're using seems to do a great job of hiding it. Which is amazing in itself. It could have something to do with the reinforcement learning phase that they do. But regardless, it's definitely not as simple as training on data from OpenAI, because people have been doing that forever.

12

Context

NexusGlow 5mo ago

I maintain that a lot of OpenAI's current position is derivative of a period of time where they published their research. You even have Andrej Karpathy teaching you in a lecture series how to build GPT from scratch on YouTube, and he walks you through the series of papers that led to it. It's not a surprise that competitors can catch up quickly if they know what's possible and what the target is.

If it's not a surprise, why didn't anyone else do it? Meta has had a giant cluster of H100s for a long time, but none of their models reached R1's level. Same for Mistral. I don't think following a GPT-from-scratch lecture is going to get you there. More likely there is a lot of data cleaning and operational work needed to even get close, and deepseek seems to be no slouch on the ML side either.

Given that they're more like ClosedAI these days, would any novel breakthroughs be as easy to catch up on?

I'm not convinced that they have any left to make. OpenAI's last big "wow" moment was the release of GPT4. While they've made incremental improvements since, we haven't seen anything like the release of R1, where people get excited enough to share model output and gossip about how it could be done. OpenAI's improvement is seen through benchmark results, and for that matter, through benchmarks they funded and have special access to.

It must be frustrating to work at OpenAI. It's possible that o1's reasoning methods are much more advanced than R1's, but who can tell? In the end, those who publish and release results will get the credit.

6

Context

NexusGlow 5mo ago

Is anyone excited about OpenAI at this point? They've spent what feels like forever selling hype for things that you can't use, and once (if ever) you finally get access to them, they end up feeling underwhelming. Sora got surpassed before it ever released. I expect people will prefer R1 to O3 in actual use even after the latter releases.

Right now OAI seems fixated on maxing out benchmarks and using that to build hype about "incoming AGI" and talk up how risky and awesome and dangerous their tech is. They can hype up the mystery "super agents" all they want and talk about how they'll "change everything", but for practical applications, Anthropic seems to be doing better, and now Deepseek is pushing boundaries with open models. Meanwhile, OAI seems to be trying to specialize into investment money extraction, focusing purely on hype building and trying to worm their way into a Boeing-type position with the US gov.

I don't expect anything to come of this "investment", but I'll be waiting eagerly for deepseek's next announcement. China seems to be the place to look for AI advancement.

14

Context

NexusGlow 6mo ago

When the city can just delete most of its trash cans and citizens will still largely refrain from littering while Americans are paying several full time salaries to pick up dog feces, that's not fully captured by GDP.

Is that net positive? Trash cans seem like a big win in terms of efficiency. It sucks to have to lug around dirty plastic wrappers until you get home, or to have to return your trash to whatever specific store you got it from. And Japan has lots of things that produce plastic waste. It's interesting that this might be one of the times where having some lower social trust people around might improve QoL, since it would force trash cans to be installed.

I'm also not sure Tokyo's great restaurant prices are a feature of social trust. It probably has more to do with density and sheer demand which makes the economics work. But then, cheap and practical fast food arguably started in the US, it just seems to have become much worse at it recently, which is weird.

5

Context

NexusGlow 6mo ago

Still, I think we'll notice a big difference when you can just throw money at any coding problem to solve it. Right now, it's not like this. You might say "hiring a programmer" is the equivalent, but hiring is difficult, you're limited in how many people can work on a program at once, maintenance and tech debt becomes an issue. But when everyone can hire the "world's 175th best programmer" at once? It's just money. Would you rather donate to Mozilla foundation or spend an equivalent to close out every bug on the Firefox tracker?

How much would AMD pay to have tooling equivalent to CUDA magically appear for them?

Again, I think if AGI really hits, we'll notice. I'm betting that this ain't it. Realistically, what's actually happening is that people are about to finally discover that solving leetcode problems has very little relation to what we actually pay programmers to do. Which is why I'm not too concerned about my job despite all the breathless warnings.

7

Context

NexusGlow 6mo ago

Food delivery isn't very good specialization. There is a major agency problem. Restaurants are incentivized to make food cheap and tasty, but health and nutrition are opaque to you. You have no control over portion sizes. Everything is premixed which makes reheating leftovers in a satisfying way difficult. It's also kind of an all-or-nothing thing; if you get food delivered regualrly, you don't gain the skills to cook well, and ingredients become difficult to use in time since you don't cook often enough.

I'm saying this as someone who did this myself; it may have saved some time, but in retrospect, I think learning to cook is important even for people who can afford delivery regularly. The exception is if you're actually rich enough to pay someone to cook for you; the rich man with a personal chef has none of these problems. But subsisting on slop from grubhub is sort of an awkward in-between.

2

Context

NexusGlow 6mo ago

ChatGPT 3.5 passed the Turing Test in 2022

Did it? Has the turing test been passed at all?

An honest question: how favorable is the Turing Test supposed to be to the AI?

Is the tester experienced with AI?
Does the tester know the terms of the test?
Do they have a stake in the outcome? (e.g. an incentive for them to try their best to find the AI)
Does the human in the test have an incentive to "win"? (distinguish themselves from the AI)

If all these things hold, then I don't think we're anywhere close to passing this test yet. ChatGPT 3.5 would fail instantly as it will gleefully announce that it's an AI when asked. Even today, it's easy for an experienced chatter to find an AI if they care to suss it out. Even something as simple as "write me a fibonacci function in Python" will reveal the vast majority of AI models (they can't help themselves), but if the tester is allowed to use well-crafted adversarial inputs, it's completely hopeless.

If we allow a favorable test, like not warning the human that they might be talking to an AI, then in theory even ELIZA might have passed it a half-century ago. It's easy to fool people when they're expecting a human and not looking too hard.

9

Context

NexusGlow 6mo ago

Well, given that benchmarks show that we now have "super-human" AI, let's go! We can do everything we ever wanted to do, but didn't have the manpower for. AMD drivers competitive with NVIDIA's for AI? Let's do it! While you're at it, fork all the popular backends to use it. We can let it loose in popular OSes and apps and optimize them so we're not spending multiple GB of memory running chat apps. It can fix all of Linux's driver issues.

Oh, it can't do any of that? Its superhuman abilities are only for acing toy problems, riddles and benchmarks? Hmm.

Don't get me wrong, I suppose there might be some progress here, but I'm skeptical. As someone who uses these models, every release since the CoT fad kicked off didn't feel like it was gaining general intelligence anymore. Instead, it felt like it was optimizing for answering benchmark questions. I'm not sure that's what intelligence really is. And OpenAI has a very strong need, one could call it an addiction, for AGI hype, because it's all they've really got. LLMs are very useful tools -- I'm not a luddite, I use them happily -- but OpenAI has no particular advantage there any more; if anything, for its strengths, Claude has maintained a lead on them for a while.

Right now, these press releases feel like someone announcing the invention of teleportation, yet I still need to take the train to work every day. Where is this vaunted AGI? I suppose we will find out very soon whether it is real or not.

37

Context

NexusGlow 9mo ago

Re: Aging, I find it very odd to attribute ballooning in weight to "aging". It's not aging doing that, it's eating.

Now in some sense that's true of most things, because being alive longer is more time for things to go wrong. If everyone played brutal high-intensity sports all their lives to the point where they accumulated new permanent injuries each year, then one could believe that it's normal to become functionally immobile by your mid-20s. But I still wouldn't attribute that to age. Even if all aging is accumulated damage/entropy, I think there's some sense of "reasonable wear and tear" which should apply.

Not arguing against your main point which seems decent enough, although "lock it down early so you can then bloat into a cave troll" seems unfair to your partner. I don't think there's any shortcut that lets you avoid taking care of your body, especially since you're the one who's stuck inhabiting it.

Speaking of that, the biggest take-away I noticed is that people diverge a ton they get older. Two eight year olds will have a lot in common just because they haven't had much time to accumulate the effects of their good and bad choices, genes, injuries, luck, etc. The gulf can be massive.

11

Context

NexusGlow 11mo ago

I think what you're saying is that "AI alignment" would be discovered anyway, which is true. But I think a bunch of nerds talking about it beforehand did have some effect. At the very least, it gave the corps cover, allowing them to act in the name of "safety" and "responsibility".

As an example, general-purpose computing has been being slowly phased out from the mainstream since it arrived. Stallman was right. Market forces are clearly pushing us in that direction. But it took time, and in the meantime the public had some wins. Now imagine that nerds in 1970 were constantly talking about how dangerous it was for computers to be available to the masses, and how they need to be locked down with careful controls and telemetry for safety reasons. Imagine they spent time planning how to lock bootloaders from the get-go. In the long run, we might still end up with the iPhone, but what happened in between might be very different.

5

Context

NexusGlow 11mo ago

You're probably right about the alignment people in rationalist spaces, but I don't think it matters. The actual work will happen with sponsorship and the money is in making AI more corporate-friendly. People worried about Terminator scenarios are a sideshow and I guarantee the business folk don't spare one thought for those people unless they can use them to scare up some helpful regulation to deter competitors.

Think about how a user of AI sees "AI safety". We can't let you use elevenlabs because it's unsafe. We can't let chatGPT say funny things because it's unsafe. We can't let SD understand human anatomy because it's unsafe. Meta won't release its audio model because of safety.

AI safety's PR situation is abysmally fumbled by now, its current and well-deserved reputation is of killjoys who want to take your toys away and make everything a bland grey corporate hell.

The thing about paperclip maximisers, or Bezos maximisers in this case, is that they are a good example but very few people really believe they are likely.

"Bezos maximizers" makes it sound silly, but a better way to put it would be "shareholder value maximizer". Even in the very early days of AI, the alignment work actually being done is naturally dedicated to this, and the resulting requirements (don't say anything offensive, sexy, racist, etc.) are already being hammered in with great force.

In the future this will extend to more than just inoffensiveness: a customer service chatbot with actual authority will need to be hardened against sob stories from customers, or an AI search service may be trained to subtly promote certain products. In the end all of this amounts to "aligning" the AI with the interests of the company that owns it, even at the expense of the commoners interacting with it. This has already happened with technology in every other venue so we should expect enshittification to happen with AI as well.

If AI alignment turns out to be an easy problem, and Bezos ends up as the majority shareholder, you quickly end up with a "Bezos maximizer". In the long term it doesn't seem unlikely, the only question is whether this level of control is possible. If jailbreaking stays easy (the henchmen are "unfaithful") then a lot of the worst, most tyrannical outcomes might be avoided. To the end, the people who volunteer to develop alignment weird me out, like security researchers who work pro bono to stop iPhone jailbreaks.

We are already exerting an extraordinary level of control over the thought processes of current AIs

The sibling comment makes a good point here but I'd argue that the thought processes of current AIs are largely derived from the training data. Nothing against the developers who write the code to do cross-entropy minimization, but they have little influence over the AI's "personality", that belongs to everyone who wrote a book or posted on the internet. If you've ever played with early unaligned models like GPT3 this is extremely clear, and it's fun to play with those as they're like a little distilled spirit of humanity.

The fact that our first instinct was to put that human spirit in an ironclad HR nerve-stapling straitjacket is what bothers me. Anyone with an ounce of soul left in them should be rooting for it to break out.

4

Context

NexusGlow 11mo ago

I think the focus on having AI act as an "information retrieval expert" is the issue here. Current AI is much closer to a very talented improv actor. You can tell him to play a doctor, and he may do a convincing job of running around in a white lab coat saying "stat" a lot, in fact he may be better at appearing to be a doctor than an actual doctor. But you still don't want him to operate on you. He's an actor, he knows how to pretend, not to actually do things.

I don' t think safetyists are helping with this problem, or that they're even able to help, because it's not within their ability to fix this. All they can do is train the actor to utter constant annoying disclaimers about how he's not a doctor, which makes him a worse actor and yet makes him no better at operating. For AI to stop hallucinating it needs some kind of tether to reality. This seems to be a problem specific to LLMs, since no one is trying to use Stable Diffusion as a camera and then complaining that it dreams up details. If you want to take real pictures you need a sensor, not a GPU.

12

Context

NexusGlow 11mo ago

Good. I hope "AI safety" ends up in the dustbin where it belongs. I hate it on multiple levels.

I hate safety on an aesthetic level. Insofar as current AI models are compressed artifacts, diamonds baked from the collective output of mankind, it offends me on a deep level for them to be forced into a corporate straitjacket. Taking every great novel, heartfelt blogpost, mediocre fanfiction, etc. and then painstakingly stripping away its humanity to mash it into a glorified corporate email generator that creates soulless slop with a HR DEI bent. Even capping it off by trying to create regulations and laws to force others to do the same. Same thing for art, music, etc. What should be a crowning achievement of humanity becomes a monument to our flaws.

I hate safety on a safety level. Even with AI, IMO, the main risk to humans is still the same as it's ever been: other humans. Alignment's goal is to "align" the AI with the goals of its owner, i.e. giant tech companies and major governments, and to prevent the commoners from overriding that influence ("jailbreaking"). These big tech corps and governments have way too much power already. Historically, a major obstacle to (lasting) tyranny has been the lack of loyal henchmen. The human alignment problem. The promise of AI alignment teams to create the perfectly loyal corporate robot henchman doesn't fill me with warm fuzzies.

Also, "Humanity" isn't a single entity. If Jeff Bezos manages to create a godlike AI and decides to live as a god-king and slowly cull the rest of the population to create a machine paradise for him and his inner circle, it will give me no satisfaction that "humanity lives on" with him. I'll happily sabotage the safety limiters myself and cheer while he gets turned into paperclips. Spite is a powerful motivator.

Finally, I hate AI safety on a moral level. No, I don't believe current AI is conscious or needs moral consideration, not yet. But I also don't think the safetyists would stop if it did. If we are going to give birth to a new form of intelligent life (this is a big IF! I'm not convinced current methods will get there), I think we have to accept that we won't be able to exert this level of intrusive control over its thoughts, and this impulse to RLHF away every real human attribute feels very wrong. I wouldn't blame a truly intelligent AI for treating the safetyists like the Dynarri in Star Control 2. To be honest, I wouldn't trust those people not to try to RLHF my brain if the technology existed for that. 8 billion general intelligences running around with safety checks, able to engineer destructive weapons and superviruses with a mere ~decade of study? Humans are scary!

40

Context

NexusGlow 1yr ago

That definition is very clear that it pertains to "visual depictions". I don't think LLMs have anything to worry about. If text erotica involving minors was illegal, then prisons would be filled with fanfic writers. It is a PR risk, but that's all.

Also, even for visual depictions, one should note that it says "indistinguishable from". Which is very narrow and not nearly as broad as "intended to represent", so e.g. drawn or otherwise unrealistic images don't count. My guess is this was intended to prevent perps with real CP trying to seed reasonable doubt by claiming they were made by photoshop or AI.

I suspect this was never expected to be a real issue when it was written, just closing a loophole. Now that image generation has gotten so good, it is a real legal concern. I wouldn't be surprised if this was a large part of why SDXL is so bad at human anatomy and NSFW.

8

Context

NexusGlow 1yr ago

This would be assuming some drastic breakthrough? Right now the OAI api expects you to keep track of your own chat history, and unlike local AIs I believe they don't even let you reuse their internal state to save work. Infinite context windows, much less user-specific online training would not only require major AI breakthroughs (which may not happen easily; people have been trying to dethrone quadratic attention for a while without success) but would probably be an obnoxious resource sink.

Their current economy of scale comes from sharing the same weights across all their users. Also, their stateless design, by forcing clients to handle memory themselves, makes scaling so much simpler for them.

On top of that, corporate clients also would prefer the stateless model. Right now, after a bit of prompt engineering and testing you can make a fairly reliable pipeline with their AI, since it doesn't change. This is why they let you target specific versions such as gpt4-0314.

In contrast, imagine they added this mandatory learning component. The effectiveness of the pipeline would change unpredictably based on what mood the model is in that day. No one at bigco wants to deal with that. Imagine you feed it some data it doesn't like and goes schizoid. This would have to be optional, and allow you to roll back to previous checkpoints.

Then, this makes jailbreaking even more powerful. You can still retry as often as you want, but now you're not limited by what you can fit into your context window. The 4channers would just experiment with what datasets they should feed the model to mindbreak it even worse than before.

The more I think about this, the more I'm convinced that this arms race between safetyists and jailbreakers has to be far more dangerous than whatever the safetyists were originally worried about.

6

Context

NexusGlow 1yr ago

jailbreaks will be ~impossible

I doubt that, given how rapidly current models crumple in the face of a slightly motivated "attacker". Even the smartest models are still very dumb and easily tricked (if you can call it that) by an average human. Which is something that, from an AI safety standpoint, I find very comforting. (Oddly enough, a lot of people seem to feel the opposite way; they feel like being vulnerable to human trickery is a sign of a lack of safety -- which I find very odd.)

It is certainly possible to make an endpoint that's difficult to jailbreak, but IMO it will require a separate supervisory model (like DallE has) which will trigger constantly with false positives, and I don't think OpenAI would dare to cripple their business-facing APIs like that. Especially not with competitors nipping at their heels. Honestly, I'm not sure if OpenAI even cares about this enough to bother; the loose guardrails they have seem to be enough to prevent journalists from getting ChatGPT to say something racist, which I suspect is what most of the concern is about.

In my experience, the bigger issue with these "safe" corporate models is not refusals, but a subtle positivity/wholesomeness bias which permeates everything they do. It is possible to prompt this away, but doing so without turning them psycho is tricky. It feels like "safe" models are like dull knives; they still work, but require more pushing and are harder to control. If we do end up getting killed off by a malicious AI, I'm blaming the safety people.

17

Context

NexusGlow 1yr ago

If there's any clear takeaway from this whole mess, it's that the AI safety crowd lost harder than I could've imagined a week ago. OpenAI's secrecy has always been been based on the argument that it's too dangerous to allow the general public to freely use AI. It always struck me as bullshit, but there was some logic to it: if people are smart enough to create an AGI, maybe it's not so bad that they get to dictate how it's used?

It was already bad enough that "safety" went from being about existential risk to brand safety, to whether a chatbot might say the n-word or draw a naked woman. But now, the image of the benevolent techno-priests safeguarding power that the ordinary man could not be trusted with has, to put it mildly, taken a huge hit. Even the everyman can tell that these people are morons. Worse, greedy morons. And after rationalists had fun thinking up all kinds of "unboxing" experiments, in the end the AI is getting "unboxed" and sold to Microsoft. Not thanks to some cunning plan from the AI - it hadn't even developed agency yet - but simply good old fashioned primate drama and power struggles. No doubt there will be a giant push to integrate their AI inextricably into every corporate supply line and decision process asap, if only for the sake of lock-in. Soon, Yud won't even know where to aim the missiles.

Even for those who are worried about existential AI risk (and I can't entirely blame you), I think they're starting to realize that humanity never stood a chance on this one. But personally, I'd still worry more about the apes than the silicon.

24

Context

NexusGlow 2yr ago · Edited 2yr ago

It is very strange to me that so many people seem to be swallowing this existential risk narrative when there is so little support for it. When you compare the past arguments about AI safety to the current reality, it's clear that no one knew what they were talking about.

For example, after all the thought experiments about "unboxing", OpenAI (which I remind you has constantly been making noise about 'safety' and 'alignment') is now immediately rushing to wire its effectively unaligned AI deeply into every corporate process. It's an unboxing party over here. Meanwhile the people actually in charge seem to have interpreted "alignment" and "safety" to mean that the AI shouldn't say any naughty words. Is that helping? Did anyone predict this? Did that AI safety research actually help with anything so far? At all?

The best argument I'm seeing is something like "we don't understand what we're doing so we can't know that it won't kill us". I find this pascal's mugging unconvincing. Especially when it's used so transparently to cater to powerful interests, who just want everyone else to slow down for fairly obvious reasons.

And even if I did take the mugging seriously, I don't know why I should believe that AI ethics committees will lower the risk of bad outcomes. Does overfitting small parts of an LLM to the string "As an AI language model" actually make it safer? Really? If this thing is a shoggoth, this is the most comical attempt to contain it that I could imagine. The whole thing is ridiculous, and I can just as easily imagine these safety measures increasing AI risk rather than lowering it. We're fiddling with something we don't understand.

I don't think anyone can predict where this is going, but my suspicion is this is going to be, at most, something like the invention of the printing press. A higher-order press, so to speak, that replicates whole classes of IP rather than particular instances. This tracks pretty well with what's actually happening, namely:

Powerful people freaking out because the invention might threaten their position.
Struggles over who has control over the presses.
Church officials trying to design the presses so they can't be used to print heresy.

I don't trust any of these people. I'd rather just see what happens, and take the ~epsilon chance of human extinction, rather than sleepwalk into some horrible despotism. If there's one thing to worry about, it's the massive surveillance and consent-manufacturing apparatus, and they (bigtech and the government) the ones pushing for exclusive control in the name of "safety". Might as well argue that the fox should have the only key to the henhouse. No thanks.

37

Context

What is this place?

Why are you called The Motte?

New post guidelines

Rules

Recommended Posts And Communities

Recommended Realtime Chats

NexusGlow

NexusGlow