SnapDragon

0 followers   follows 0 users   joined 2022 October 10 20:44:11 UTC
User ID: 1550   Verified Email

No bio...

So, I had a little more success than you last year, and you can see my transcript here. Part of the reason is that I didn't give it a minimal prompt. Try to give it full context for what it's doing - this is an LLM, not a Google search, and brevity hurts it. And don't "help" it by removing the story from the problem - after all, English comprehension is its strength. Tell it, up front, exactly how you're going to interact with it: it can "think step by step", it can try some experiments on its own, but you won't help it in any way. The only things you'll do are run the code it gives you and submit answers to the site, relaying the (exact) error message that AoC generates.

To reiterate, give it all the information a human solving AoC would have. That's the fairest test.
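
Something like this is the shape of framing I mean (an illustrative sketch, not the exact wording from my transcript):

```python
# A minimal sketch of the up-front framing, written as the system prompt you'd
# hand the model before pasting the first problem. The wording here is
# illustrative; the point is to spell out the whole interaction protocol.
SYSTEM_PROMPT = """\
You are solving Advent of Code on your own. For each problem I will paste the
full problem statement, story and all, plus your puzzle input.

You may think step by step, brainstorm, and run experiments. I will not give
you hints and I will not fix your code. The only things I will do are:
  1. Run any program you give me and paste back its exact output.
  2. Submit an answer to the site when you ask, and paste back the site's
     exact response (including any "too high" / "too low" / wrong-answer
     message it generates).

Iterate until the site accepts your answer, then we move on to Part 2.
"""
```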

My prediction is that o1 will do better (of course), maybe solving a few in the day 10-20 range. However, I think it'll still struggle with certain problems, and with debugging, especially when text output (or a diagram in the input) needs to be parsed character-by-character. This is a fundamental problem with LLMs: textual output that looks well-formatted and readable to us is fed into the LLM as a gobbledegook mixture of tokens, and it just has no clue how to process it (but, sadly, pretends that it can). This is related to how they have trouble with anagrams or spelling questions (e.g. how many Rs are in "strawberry"). I wonder if there's some way we could process text output so it tokenizes properly.
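
To make the tokenization problem concrete, here's a rough illustration using the tiktoken package (assuming the cl100k_base encoding; the exact splits vary by model). A little ASCII grid that reads fine to us gets carved into multi-character tokens whose boundaries have nothing to do with the grid's rows and columns. Spacing the characters out is one crude way to force (roughly) one token per cell:

```python
# Rough illustration of the tokenization issue, assuming the tiktoken package
# and the cl100k_base encoding (exact splits vary by model).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
grid = "#.#..\n.#.#.\n#.#.#"

# The grid as the model "sees" it: token boundaries ignore rows and columns.
print([enc.decode([t]) for t in enc.encode(grid)])

# One possible mitigation: re-emit the text with characters spaced out, so that
# (roughly) each cell becomes its own token.
spaced = "\n".join(" ".join(row) for row in grid.splitlines())
print([enc.decode([t]) for t in enc.encode(spaced)])
```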

For something based on 20-year-old (if not older) tech, it retains excellent particle effects and highlights. And the capship scales... so good. Fighting around a Hecate-class in a nebula remains a standout experience. Irritating, but standout.

Not to mention the plot! Wing Commander got all the accolades (and even name-brand actors), but most of its story was melodrama. It never managed to convey the same sense of desperation as Freespace, of fighting a true all-out war for humanity's survival against impossible odds.

I'd say a steelmanning of the Yuddite view is this: "Yes, we along with everyone else did not predict that LLMs could be so powerful. They do not fit our model of an agentic recursive neural net that runs on reward signals, and even a superintelligent LLM is likely to super-understand and do what its creator wants (which is still a risk, but of a different kind). However, it would be a mistake to extrapolate from these last few years where LLMs are ahead in the AI race and assume that this will continue indefinitely. It is still possible that agentic AIs will once again surpass predictive models in the short-to-mid-term future, so there is still risk of FOOM and we need to keep studying them."

I've spoken with some doomers who have this level of intellectual humility. I can't imagine seeing it from Yudkowsky himself, sadly.

Any layman can tell you that the airplane flies.

And that's the point. That's the one, last, important step that (much of) science is lacking. Have you built something that works AT ALL? It's not that engineering doesn't suck. It's that modern "science" is even worse, because so much of its product (random unreplicated research papers, written on esoteric subjects, skimmed by friendly peer reviewers and read by nobody else) never needs to pass that final filter.

Citation needed...? It's a little hard to ask the pig. And even if true, should I care overmuch that the pig "feels stressed" for the last hour of its life? Humans go through worse (to say nothing of how animals die in nature!). If you want me to care about animal welfare, you should focus on the part that really matters - the life the pig lived - rather than the lurid, but ultimately unimportant, details of its death.

"Brutally" slaughtering a pig in "disgusting" "industrial" conditions? Those are very subjective words. The pig doesn't care that it's not being given a dignified sendoff by its loving family at the end of a fulfilled life in a beautiful grassy glade with dandelions wafting in the breeze. Humans fear death; animals don't even understand the concept. As long as we kill them quickly, I really don't give a shit how it's done.

Which isn't to say I don't have concerns about factory farming. The rest of the pig's life may be filled with suffering, and (IMO) we're rich enough, as a society, to do better. My morality-o-meter is ok with sacrificing, say, 0.01% of value to humans to improve the life of pigs by 500%.

Maybe I'm missing some brilliant research out there, but my impression is we scientifically understand what "pain" actually is about as well as we understand what "consciousness" actually is. If you run a client app and it tries and fails to contact a server, is that "pain"? If you give an LLM some text that makes very little sense so it outputs gibberish, is it feeling "pain"? Seems like you could potentially draw out a spectrum of frustrated complex systems that includes silly examples like those all the way up to mosquitos, shrimp, octopuses, cattle, pigs, and humans.

It'd be nice if we could figure out a reasonable compromise for how "complex" a brain needs to be before its pain matters. It really seems like shrimp or insects should fall below that line. But it's like abortion limits - you should pick SOME value in the middle somewhere (it's ridiculous to go all the way to the extremes), but that doesn't mean it's the only correct moral choice.

I agree - when I worked at Google, I remember their security measures being extremely well-thought-out, so much better than the lax approach most tech companies take. However, I DON'T trust them, given their ideological capture. They won't abuse people's information by accident, but I will not be surprised if they start doing it on purpose to their outgroup. And they have the tools to do it en masse.

On most forums, if you're a bad actor waging the culture war, it's probably a decent strategy to post a bunch of links like this that are ridiculous non-sequiturs. Most people are too lazy to follow them and have the (usually reasonable) assumption that what's said in them is being accurately represented. Fortunately, I think The Motte is better than that. Looking forward to guesswho's inevitable (re-)permabanning. We need good leftist posters, but he's not one.

I'm far from an expert (and I doubt anyone else in this thread is either), but I'm not sure I really agree with your "extremely dangerous" assessment. Lots of things have a 100% kill rate. Like, congratulations, they've reinvented rabies? A virus that represents a serious risk to society needs to combine a number of unlikely factors, and "killing the host" is probably the easy part. (Ironically, after a certain point, high lethality makes a virus less threatening - a virus's host needs to survive to spread it on!) To truly threaten civilization, you'd have to combine it with a long asymptomatic but highly contagious incubation period.

Of course, because the media are idiots, the article you linked mentions the "surprisingly rapid" death of the mice as if that's supposed to make it more, not less, scary. Ah, journalists, never change.

Ugh, what a ridiculous take. The ability to move a body and process senses and learn behaviour that generates food is miraculous, yes. We can't build machines that come close to this yet. It's amazing that birds can do it! And humans! And cats, dogs, pigs, mice, ants, mosquitos, and 80 million other species too. Gosh, wow, I'm so agog at the numinous wondrousness of nature.

That doesn't make it intelligence. Humans are special. Intelligence is special. Until transformers and LLMs, every single story, coherent conversation, and, yes, Advent of Code solution was the creation of a human being. Even if all development stops here, even if LLMs never get smarter and these chatbots continue to have weird failure modes for you to sneer at, something fundamental has changed in the world.

Do you think you're being super deep by redefining intelligence as "doing what birds can do?" I'd expect that from a stoner, not from a long-standing mottizen. Words MEAN things, you know. If you'd rather change your vocabulary than your mind, I don't think we have anything more to discuss.

I tried on Day 10 and it failed. I want to avoid publication bias, though, so I'm posting the transcript anyway. :) Note that it IS using debug output to try to figure out its error, but I think it's analyzing it incorrectly.

Wow, you're really doubling down on that link to a video of a bird fishing with bread. And in your mind, this is somehow comparable to holding a complex conversation and solving Advent of Code problems. I honestly don't know what to say to that.

Really, the only metric that I need is that ChatGPT makes me more productive in my job and personal projects. If you think that's "unreasonably low", well, I hope that our eventual AI Overlords can hope to meet your stringent requirements. The rest of the human race won't care.

Then I tried it on Day 7 (adjusting the prompt slightly and letting it just use Code Interpreter on its own). It figured out what it was doing wrong on Part 1 and got it on the second try. Then it did proceed to try a bunch of different things (including some diagnostic output!) and spin and fail on Part 2 without ever finding its bug. Still, this is better than your result, and the things it was trying sure look like "debugging" to me. More evidence that it could do better with different prompting and the right environment.

EDIT: Heh, I added a bit more to the transcript, prodding ChatGPT to see if we could debug together. It produced some test cases to try, but failed pretty hilariously at analyzing the test cases manually. It weakens my argument a bit, but it's interesting enough to include anyway.

So, I gave this a bit of a try myself on Day 3, which ChatGPT failed in your test and on YouTube. While I appreciate that you framed this as a scientific experiment with unvarying prompts and strict objective rules, you're handicapping it compared to a human, who has more freedom to play around. Given this, I think your conclusion that it can't debug is a bit too strong.

I wanted to give it more of the flexibility of a human programmer solving AoC, so I made it clear up front that it should brainstorm (I used the magic "think step by step" phrase) and iterate, only using me to try to submit solutions to the site. Then I followed its instructions as it tried to solve the tasks. This is subjective and still pretty awkward, and there was confusion over whether it or I should be running the code; I'm sure there's a better way to give it the proper AoC solving experience. But it was good enough for one test. :) I'd call it a partial success: it thought through possible issues and figured out the two things it was doing wrong on Day 3 Part 1, and got the correct answer on the third try (and then got Part 2 with no issues). The failure, though, is that it never seemed to realize it could use the example in the problem statement to help debug its solution (and I didn't tell it).

Anyway, the transcript's here, if you want to see ChatGPT4 troubleshooting its solution. It didn't use debug output, but it did "think" (whatever that means) about possible mistakes it might have made and alter its code to fix those mistakes, eventually getting it right. That sure seems like debugging to me.

Remember, it's actually kind of difficult to pin down GPT4's capabilities. There are two reasons it might not be using debug output like you want: a) it's incapable, or b) you're not prompting it right. LLMs are strange, fickle beasts.

I'm on Apple's AI/ML team, but I can't really go into details.

...are you seriously asking this? I'm not an insect. If you want to claim some observation of insect behavior has even the slightest relevance to human society, the burden of proof's on you.

Hi, bullish ML developer here, who is very familiar with what's going on "under the hood". Maybe try not calling the many, many people who disagree with you idiots? It certainly does not "suck at following all but the simplest of instructions", unless you've raised this subjective metric so high that much of the human race would fail your criterion.

And while I agree that the hallucination problem is fundamental to the architecture, it has nothing to do with GPT4's reasoning capabilities or lack thereof. If you actually had a "deep understanding" of what's going on under the hood, you'd be aware of this. It's because GPT4 (the model) and ChatGPT (the intelligent oracle it's trying to predict) are distinct entities which do not match perfectly. GPT4 might reasonably guess that ChatGPT would start a response with "the answer is..." even if GPT4 itself doesn't know the answer ... and then the algorithm picks the next word from GPT4's probability distribution anyway, causing a hallucination. Tuning can help reduce the disparity between these entities, but it seems unlikely that we'll ever get it to work perfectly. A new idea will be needed (like, perhaps, an algorithm that does a directed search on response phrases rather than greedily picking unchangeable words one by one).
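
Here's a toy sketch of that decoding point (a made-up toy "model" with invented numbers, not GPT4's actual decoder): greedy word-by-word decoding locks in the confident-sounding frame before knowing whether a good answer follows, whereas scoring whole candidate responses before committing would avoid that particular failure.

```python
# Toy sketch (made-up numbers, not GPT4's actual decoder). Greedy decoding
# commits to each phrase irrevocably; a search over complete responses can
# prefer the honest answer even though its first step looks less likely.
import math

# Toy "model": log-probabilities of the next phrase, keyed by the prefix so far.
TOY_LOGPROBS = {
    (): {"The answer is": -0.2, "I'm not sure": -1.8},
    ("The answer is",): {"42": -3.5, "unknown to me": -3.6},
    ("I'm not sure",): {"what the answer is": -0.3},
}

def greedy():
    # Pick the single most likely next phrase at every step, never backtracking.
    out = []
    while tuple(out) in TOY_LOGPROBS:
        options = TOY_LOGPROBS[tuple(out)]
        out.append(max(options, key=options.get))
    return " ".join(out)

def best_full_response():
    # Score every complete response and keep the best total log-probability
    # (a stand-in for beam search / phrase-level lookahead).
    best, best_score = None, -math.inf
    def walk(seq, score):
        nonlocal best, best_score
        if tuple(seq) not in TOY_LOGPROBS:
            if score > best_score:
                best, best_score = " ".join(seq), score
            return
        for phrase, lp in TOY_LOGPROBS[tuple(seq)].items():
            walk(seq + [phrase], score + lp)
    walk([], 0.0)
    return best

print(greedy())              # "The answer is 42" -- the frame came first, the "fact" got invented
print(best_full_response())  # "I'm not sure what the answer is" -- higher total probability
```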

To be honest, it sounds like you don't have much experience with ChatGPT4 yourself, and think that the amusing failures you read about on blogs (selected because they are amusing) are representative. Let me try to push back on your selection bias with some fairly typical conversations I've had with it (asking for coding help): 1, 2. These aren't selected to be amusing; ChatGPT4 doesn't get everything right, nor does it fail spectacularly. But it does keep up its end of a detailed, unprecedented conversation with no trouble at all.

Lockdowns aren't on the Pareto frontier of policy options for even diseases significantly deadlier than COVID, IMO, just because rapid development and distribution of technological solutions is possible, but ... COVID killed one million people in the United States. Yes, mostly old people, but we're talking about protecting old people here. No reason to pretend otherwise.

Speaking of government policy, I wonder how many lives were lost because we couldn't conduct challenge trials on COVID? It was almost the ideal case - a disease with a rapidly-developed, experimental new vaccine and a large cohort of people (anyone under 40) for which it wasn't threatening. If we were a serious society - genuinely trying to optimize lives saved, rather than performatively closing churches and masking toddlers - I wonder how early we could have rolled out RNA vaccines for the elderly?

Heh, yeah, good example. I happily commit atrocities in videogames all the time. I hope there will continue to be an obvious, bright-line distinction between entities made for our amusement and entities with sentience!

Maybe shot 5 times? Or maybe 32 times? I suppose there's not much difference between the two.

I mostly agree with you, but I want to push back on your hyperbole.

First, I don't think doing RLHF on an LLM is anything like torture (an LLM doesn't have any kind of conscious mind, let alone the ability to feel pain, frustration, or boredom). I think you're probably not being serious when you say that, but the problem is there's a legitimate risk that at some point we WILL start committing AI atrocities (inflicting suffering on a model for a subjective eternity) without even knowing it. There may even be some people/companies who end up committing atrocities intentionally, because not everyone agrees that digital sentience has moral worth. Let's not muddy the waters by calling a thing we dislike (i.e. censorship) "torture".

Second, we should not wish an "I have no mouth and I must scream" outcome on anybody - and I really do mean anybody. Hitler himself doesn't come close to deserving a fate like that. It's (literally) unimaginable how much suffering someone could be subjected to in a sufficiently advanced technological future. It doesn't require Roko's Basilisk or even a rogue AI. What societal protections will we have in place to protect people if/when technology gets to the point where minds can be manipulated like code?

Sigh. And part of the problem is that this all sounds too much like sci-fi for anyone to take it seriously right now. Even I feel a little silly saying it. I just hope it keeps sounding silly throughout my lifetime.

How so? He could have just retired slightly sooner, still quite rich and still doing the rounds on news/talk shows answering tough questions like "how does it feel to have saved eleventy-trillion lives with Science(tm)?" Rand Paul constantly grilling him would barely even be reported on, let alone actually affect his life.

Not only do you get to use "think of the children", you also get to partake in socially-approved hate for a group of weirdos for their innate characteristics. Humans have always had an appetite for doing this, but in modern times there are far fewer acceptable targets.

Interesting. I admit ignorance here - I just assumed any UK-based newspaper would be very far to the left. (The video itself still seemed pretty biased to me.) Thanks for the correction.