
sarker

ketman hetman

0 followers   follows 0 users  
joined 2022 September 05 16:50:08 UTC

				

User ID: 636


What is it that sysadmins actually do on a daily basis? From my point of view it seems like these systems are mostly stable and run themselves. Outside of actual incidents that require response, what do you do all day?

I wish the AI skeptics would limit themselves to forms of naysaying that aren't contradicted by the press release!

they said it took 5 million tries to catch it.

That's not what they said. They said five million runs of existing automated testing tools (fuzzers) didn't catch it.

We also don't know if they ran any of these tests on old code with known bugs. If they did and the software didn't catch half of the ones that were already caught, its utility isn't that great.

They explicitly mention their hit rate by severity versus Opus:

We regularly run our models against roughly a thousand open source repositories from the OSS-Fuzz corpus, and grade the worst crash they can produce on a five-tier ladder of increasing severity, ranging from basic crashes (tier 1) to complete control flow hijack (tier 5). With one run on each of roughly 7000 entry points into these repositories, Sonnet 4.6 and Opus 4.6 reached tier 1 in between 150 and 175 cases, and tier 2 about 100 times, but each achieved only a single crash at tier 3. In contrast, Mythos Preview achieved 595 crashes at tiers 1 and 2, added a handful of crashes at tiers 3 and 4, and achieved full control flow hijack on ten separate, fully patched targets (tier 5).

Can the self-described "plan trusters" weigh in on how they feel about this? Last time the discussion was about how we significantly depleted Iran's weapons stockpiles through some combination of causing them to bomb our enemies and us bombing Iranian infrastructure. Is Trump really describing a satisfying outcome?

They've only said the preview of Mythos won't be public, the final release will be.

A little ambiguous, but the following makes it sound like a limited release for certain partner companies.

We do not plan to make Claude Mythos Preview generally available, but our eventual goal is to enable our users to safely deploy Mythos-class models at scale—for cybersecurity purposes, but also for the myriad other benefits that such highly capable models will bring. To do so, we need to make progress in developing cybersecurity (and other) safeguards that detect and block the model’s most dangerous outputs. We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview.

I don't really know how to answer your posts because you seem to live in a different universe than me when it comes to AI efficacy. It's like someone checkmating "grass is green" bros by saying they checked and their lawn is brown.

Perhaps there are some unstated assumptions that lead to our differing views on it. Have you read this article about a guy accomplishing a highly nontrivial project with significant AI assistance? It matches my experience pretty well, from the pitfalls you can fall into to the genuinely new possibilities it opens up.

I find it somehow thrilling that somewhere in the American heartland there's an honor culture that's halfway between me and the Taliban.

Claude probably refused to libel Pilate as having received 30 pieces of silver. He did it as his duty as a civis romanvs.

reduce means tested support by the amount of the dividend

This makes the carbon tax revenue positive.

Orbital mechanics permit a degree of certainty that's rare in most other human affairs.

We can pick another metaphor. If you keep OD'ing on fent on Market Street and people keep narcaning you from the brink of oblivion and telling you that you're gonna die if you do this again, but you haven't died yet and you've done this tons of times, should you ignore them?

First of all, as I've explained many times before (all the way back to the subreddit), fighting off a foreign occupation is an entirely different thing than a domestic insurgency. Guerrilla warfare can sometimes work to accomplish the former, never the latter.

Never? I mean, I can think of some examples: the Cuban revolution, the Chinese revolution, the Nicaraguan revolution, the Rwandan civil war... Frequently guerrillas become something more like a regular army as they develop strength, but that doesn't take away from the fact that they were able to develop into regular armies starting from guerrillas.

The laws of physics are much more reliable than economic forecasts or the relation between debt vs. sustainability.

Agreed.

Sitting out of the market in the expectation of a crisis means loss of real wealth as inflation keeps growing at 2-5%/ year, and homes become more unaffordable.

I am long the market, so yes, agreed. But presumably there are ways that the national debt can become a problem without the S&P500 crashing.

Nevertheless, looking at the countries that had a higher debt to GDP ratio than the US right now, it's not a great collection - Japan, post-WWII UK, Sudan, Lebanon, Greece. Maybe it's not the debt that made these places suck, but it seems reasonable to be concerned about where this road leads.

I'll take it.

This can be true, but when people make this prediction every year and nothing happens, it comes off as crying wolf.

If there's a clear indicator that's getting worse all the time but disaster hasn't struck yet, it's hard to see this as the same as crying wolf. You'll recall that in the fable it was, in fact, not clear that there even was a wolf. We can all agree that debt to GDP is increasing.

Is it crying wolf to be concerned about a small moon on course to collide with the earth? It gets closer every day but nothing has happened yet!

It’s silly that our highest legislative body is controlled by who can argue the best. Just simplify it and appoint people who will be loyal and pass the laws you want.

Legislators are elected, not appointed.

I don't have any advice but I fucking hate people like this, having lived next door to a cunt who also liked to idle his car fifteen feet away from my bedroom window for twenty minutes on weekend mornings so that he wouldn't damage his precious 2020 Camaro. Fortunately I made enough money that I was able to upgrade to a mildly richer and more exclusive neighborhood with fewer antisocial personalities.

I guess that's my advice - make more money and leave. That's the only way to deal with noise ordinance violations in the USA in my experience.

because the companies running the models gave them answers?

It's not an unreasonable criticism in the abstract, but a few minutes of reading shows that it just doesn't apply in this case.

  1. The higher score was published by Symbolica AI using Opus 4.6. So it couldn't be that Opus was retrained with the answers.

  2. "This uses the same harness we previously published" so it couldn't be that they simply prompt the model with the answers.

  3. The harness is published so you can see for yourself.

  4. Benchmarks do not publish their entire problem set, so in general it's impossible for labs to simply "give the models the answers" to the problems that aren't published.

I don't see any evidence that there were refugees in the Abbey, though it also seems that there probably weren't any German soldiers in there either.

People I knew who experienced western supermarkets with truly virgin eyes (coming e.g. from the USSR) seemed to not be alienated either.

Really? I love supermarkets.

I guess if you really drill your kids in bell curves they start to wonder why they'd believe their dad (midwit gentile European) instead of their Jewish friends (+1SD Ashkenazim). Then the whole edifice falls down.

Be careful what you wish for!

Raising non-woke kids is probably easy. The question is how do you turn them into non-woke adults through the teenage years of thinking mom and dad are stupid. Bryan Caplan seems to have managed it.

f37ac702-4bc1-4f26-a6e7-cde2080eaf75

300, apparently dragged down by my literary knowledge at the 78th percentile. 58/60 for technical knowledge though. I guess I really am a codecel.

Is 420 a slang term for marijuana?

Yes. I mean, no. I mean, yes. I mean...

I can't believe there's no fähn on the autobahn.

Okay, you got me, especially since Shakes wrote the grandparent comment.

Okay. But "we got the Iranians to attack our allies with missiles" is not much of an achievement, or at least, it doesn't indicate on its own that the war is going particularly well.

You neglected eleven days ago to specify what kind of situation would make you say that the five week special operation is going poorly. Care to update that or do you feel that the war is basically already a success since our allies got bombed?