
Culture War Roundup for the week of July 15, 2024

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Ok this might just be funny to me, but the CrowdStrike worldwide outage is the funniest thing to happen in computer security this decade.

If you haven't caught up, 100+ million (billion?) computers around the world were simultaneously broken in an instant. It's black comedy for sure. Hospital & emergency systems around the world have crawled to a halt, and there will be a few hundred deaths that will be traced back to this event. Millions of $$ will be lost. But the humor comes from the cause of it.

Here is how things panned out:

  • CrowdStrike is a tech company with a $100 billion valuation that provides security services to the bulk of the world's businesses.
  • Most sensitive organizations (govt, military, healthcare) will refuse to work with you unless you are compliant & all your machines have this installed.
  • It is effectively an anti-virus that sits 1 level below your operating system, 'protecting' your organization from 'bad outcomes'.
  • On Friday afternoon (which we all know is the best time), CrowdStrike deployed a software update that began this outage.
  • For any other software this would be a simple restart or uninstall away, but since CrowdStrike is a 'trusted' security tool, it sits under the OS layer, bricking the whole device.
  • Alright, so how do they fix it ?...... THEY CANT !
  • The beauty of a bricked device is that you can't send any more software updates to it. You must do it manually. Raw dog it like the 90s.....all 100 million of these computers.
  • That's bad, but surely they can give those instructions to people and each person can fix their laptops themselves. Divide the labor.....
  • NOPE !
  • This software is used in vending machines, kiosks, tablet displays....and all sorts of devices that sometimes don't have keyboards and other times haven't been looked at for years. But at least there is a fix right ?
  • Yes....... but it needs you to start the computer in safe mode....which you can't because 'Bitlocker'.
  • Ah yes, Bitlocker. Turns out, another security measure, makes it so that 99% of a company's employees can't open safe mode.
  • So yes, a few hundred IT people will be responsible for fixing hundreds upon hundreds of laptops, daily, for weeks !

This is the Y2k that was promised.

The world spends billions in computer security every year, and no virus has managed the kind of world-wide disruption caused by one simple bug by the premier security company in the world.


No direct culture war implications, but goes to show just how much of a house-of-cards the tech ecosystem is. 1 little, simple, stupid bug can bring the whole world to a halt. Yet, the industry continues quarterly-earnings chasing.

Jobs keep getting cut, senior members get aged out, timelines get thinner and 'how many features did you deploy' remains the only metric for evaluation.

In tech, staying at a job for more than 3 years is seen as coasting. Devs are increasingly expected to do everything, because 'everyone should be full stack' and everything that isn't feature development (testing, staging, canaries) gets deprioritized. Overworked novices mean carelessness, and carelessness creates mistakes.

At the same time, devs get zero agency. Random HR types make lists of regulations mandating certain checkboxes for compliance, while having near-zero knowledge of the risks and benefits of these technical decisions. Therefore, the implications of a mistake are opaque to decision makers. So by being compliant, you've suddenly given CrowdStrike a button to shut your entire business down.

This kind of error should literally be impossible in a company the size of CrowdStrike. And if such an error does happen, it should be impossible for giant corporations to crumble with zero backup. Incompetence on display, on all sides. Having worked in 'prestigious tech companies', especially in 2024, it isn't surprising. At times the internal dysfunction is seriously alarming; other times it's a Tuesday.


I'm not going to hope for much out of this. Just like Spectre & Solar, people will cry about it for weeks, demand change, and everyone will get collective amnesia about it as the next quarter rolls around.

End of the day, tech workers are treated as disposable labor. Executive bean counters are divorced from the product. And the stock price is the only incentive that matters.

As long as tech is run by MBAs and smooth talkers, this will go on.

Some choice photos:

Yep, you're right.... i was going through twitter and got duped.

Once again, rawdogging the Internet pays off!

No direct culture war implications, at least not directly left/right. However, this was easily predictable by readers of Michael Crichton or Ayn Rand, both names in the “up/down” culture war (to coin a phrase).

Crichton’s most famous work, Jurassic Park, was largely about chaos theory. When working with a complex system, that is to say one driven by logic and rules, an outlier can bring down a house of cards through emergent effects. John Hammond not paying for a team of programmers led to dinos eating people. Today’s a mundane version of that.

Rand had a lot to say about innovative producers versus free riders, and apropos to today, about smart people who can create or repair machines versus everyday people who can just use their interfaces until something goes wrong. When it does, the cynical cry of, “Who is John Galt?” escapes their lips.

In the classic book “Atlas Shrugged”, the phrase Who is John Galt is a cry of despair and hopelessness. It describes a situation wherein the pistons are removed from an engine making that whole metal mass of a car useless.

The pistons form a small part of a vehicle’s mass, but provide the entire reason for a (petrol) car’s existence. Similarly most great organisations and societies are moved by a small group of people — the innovators. When those are removed, the entire thing falls apart. And the engine is usually among the last parts of the car to give up. And when the engine gives up, usually you don’t find a replacement — you just sell the vehicle to scrap. When the small minority of truly creative, entrepreneurial, risk-taking people are removed from a society, the society completely falls apart.

John Galt is a symbol of that risk taking, entrepreneurial guy. And when he gives up, the despair sets in. “Who is John Galt” is a cry from the masses who are confused about what is happening and who are despairing to get back the people in charge [the people who can take charge of reality through reason and bend it to their will].

And the truly creative, entrepreneurial guy need not be a rich industrialist. He can be a worker. He can be an artist. He can be a banker. He can be a professor. It is not about their wealth, but about how much they move the status quo.

The American IT industry was hit hard by COVID. Businessmen, C-suite execs, saw their people remoting in from home and trying not to return to the office. These execs, many of them free riders, realized they could halve their costs by hiring remote MSPs from out of country for IT and relying on Crowdstrike to be their security bottom line. A flood of IT layoffs happened this past year, deflating IT wages and making entry level jobs scarce.

Then today, only people with the admin password or a modicum of critical thought could restore the most well-protected systems. Today, companies across the globe learned who their John Galts were, their Eddie Willers, their Dagny Taggarts.

Although, as to the left/right culture war, imagine if this or worse had happened on Election Day and all the votes had to be hand-counted.

No direct culture war implications, at least not directly left/right.

Crowdstrike was the company who the DNC had analyze their network and blame Russia for Guccifer 2.0. I can write up a conspiracy theory about this being a result of the deep state panicking over the failed Trump assassination and forcing a patch to create a backdoor to cause a future major outage to maintain control pretty easily.

The American IT industry was hit hard by COVID. Businessmen, C-suite execs, saw their people remoting in from home and trying not to return to the office. These execs, many of them free riders, realized they could halve their costs by hiring remote MSPs from out of country for IT and relying on Crowdstrike to be their security bottom line. A flood of IT layoffs happened this past year, deflating IT wages and making entry level jobs scarce.

I do think that it is slightly more complicated than that. First of all, the layoffs of 80% of Twitter showed everyone that you don't need that many people to run a website. Multiple people predicted that if Twitter didn't stop working, other big tech companies would follow. Then there is the whole deal with Section 174, which has also affected the bottom line. Tech isn't unaffected by higher interest rates: when money was cheap they could amass people to be ready for "initiatives". Well, not anymore.

I can give you the point about the free riders. The worst thing about them is that they actively make our tech worse to push some number up on their OKRs. Google is making search worse so people stay longer trying to find what they came to Google for and watch more ads. Windows search always hits Bing when you do a search locally on your computer, just so that it increments a number and a free rider can get a bonus. And those are only the examples from search.

Your Section 174 link was fascinating. I feel that it underplayed the back story. It was sketched very briefly, but appears to go like this:

There are fiscal responsibility rules. If the US government passes a tax cut, the law should also include a tax increase in the future to balance the budget over the longer term. Legislators game this by writing a future tax increase that is stupid. Yes, it is in the law, but there is a nudge and wink that it will be repealed before it takes effect. This time the repeal never happened, so the deliberately stupid tax increase goes into effect.

This compounding dysfunction bodes ill for the future of the US.

In tech, staying at a job for more than 3 years is seen as coasting. Devs are increasingly expected to do everything, because 'everyone should be full stack' and everything that isn't feature development (testing, staging, canaries) gets deprioritized. Overworked novices mean carelessness, and carelessness creates mistakes.

This may be true amongst the Dev Set, but it's very much not once you get outside of that small corner of 'tech' and into the infrastructure side of things. While there are plenty of greenhorns puttering around, there are also the true gray beards who have been in the same position for decades and know literally everything about the systems they administer (and design and build).

I'm a network engineer and there are enough grizzled old men on my team that our collective experience no doubt stretches into the centuries, and several of these guys have gotten a big chunk of it at this one place. We just had a guy retire last year who started in the late 70s...

That bit jumped out at me as well. Even amongst the Dev Set that strikes me as a very FAANG / SV centric point of view.

I don't think this is too apocalyptic, probably most computers will be fixed by Monday.

But you bet your ass that everyone lost a lot of money today and that it may take weeks (or months) for some businesses to get back to the black.

Does anyone disagree with me that the amount of value destroyed by this failed patch outweighs all of the economic value CrowdStrike has ever provided? Imagine working at a company that would have been better off never existing.

There’s a reasonable case to be made that CrowdStrike isn’t a "real" company anyway: it’s a DeepState actor, worming its way into systems by enabling managers to check a box that satisfies regulatory compliance while giving wholesale control of their system to this opaque third-party.

I work in the industry and while I can confirm that regulatory compliance related to cybersecurity is theatrical bullshit, your assessment of CrowdStrike is completely wrong and nonsensical. It's certainly not the case for every vendor in the industry, but CrowdStrike's products and services do significantly reduce the risk of certain types of cybersecurity threats companies face.

Does anyone disagree with me that the amount of value destroyed by this failed patch outweighs all of the economic value CrowdStrike has ever provided?

I think it depends a lot on what your next alternative is. The morbid possibility is that CrowdStrike could be incompetent and also beat their customers rawdogging the internet. Even if this incident cost 1b USD, that's something like fifty major ransomware strikes. CrowdStrike could conceivably have blocked that many this year.

Of course, CrowdStrike isn't the only alternative. Businesses can use a variety of other protections and/or make themselves more robust to successful attacks. Whether they're more reliable or not is a !!fun!! question, but underneath that, there's a funner one: could businesses have made it? Contra a lot of reporting, I don't know that every regulated company has to use CrowdStrike specifically, but I do know that for even low levels of regulated industry it's a very common requirement that's accepted as a box checked, where alternatives that I could find required additional support not all IT teams would be able to provide.

This is effectively the argument that Lucas Critiqued.

In fact, it's almost exactly analogous to the Fort Knox example given in the article.

Why do you believe that CrowdStrike provides value?

Maybe it does, but where is the proof? Half of the world didn't use CrowdStrike, and how did they fare?

I would even say, let's do an RCT to prove that CrowdStrike improves outcomes. It is a perfect case where one could be done.

Maybe nobody wants to do such a test because they are afraid that it will show that CrowdStrike provides no value.

Remember masks during covid. The evidence is that they provided either minimal value or no value at all. And yet the government mandated their use in many countries. Sometimes people do stupid things on large scale.

I'm not saying CS provides no security, but it's hard to believe it provided as much global security as the damage it caused and that a competitor wouldn't have been better.

Sure. And asking how much security they provided requires addressing the counterfactual.

I disagree. Crowdstrike Falcon Sensor is meant to keep ransomware from happening, especially to (or through) the Internet of Things. Without it, at least some of the dozens of hospital systems which went down today would have already been hit by sophisticated unscrupulous organized criminals.

I feel sorriest for MGM, who got BSOD’d by Crowdstrike after getting ransomwared last year.

But you bet your ass that everyone lost a lot of money today and that it may take weeks (or months) for some businesses to get back to the black.

The market's reaction was surprisingly sanguine to this. CRWD stock opened 11% lower and stayed that way; almost everyone thought it would be down 30% or more. The Nasdaq was green for the first 2 hours and then went red, which could have been due to anything.

The economy is huge. Even when critical things fail, there is enough stuff that works, plus rapid response to fix the problem, that the damage is not as bad as the hype would suggest. Ironically, a bigger problem prompts a more rapid response to fix it, so it ends up being briefer or not as bad.

Is the implication that the market would properly "punish" them for destroying more value than they've ever created? It could just as easily reward them for extracting rents for "malware defense" while making all of its clients worse off.

The price of any asset is the net present value of all expected future cash flows.

It's not about the stock market punishing a company. It's about the stock market trying to correctly evaluate how much other parties might try to punish the company. If we look at Boeing, we know that increased regulatory scrutiny is very unlikely to increase cash flows, and spectacular reputational damage is unlikely to increase future business. And so cash forecasts are updated accordingly.
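The net-present-value claim above can be made concrete with a small sketch. All numbers here are hypothetical (the cash flows, the 10% trim, the 8% discount rate); nothing is taken from CrowdStrike's or Boeing's actual financials. It only illustrates how a revision to expected cash flows carries through to price.

```python
def net_present_value(cash_flows, discount_rate):
    """Discount a list of yearly cash flows (year 1, 2, ...) back to today."""
    return sum(
        cf / (1 + discount_rate) ** year
        for year, cf in enumerate(cash_flows, start=1)
    )


# Hypothetical forecasts: 100 per year for ten years, then the same
# forecast trimmed by 10% after an incident raises expected penalties.
baseline = [100.0] * 10
after_incident = [90.0] * 10

rate = 0.08  # hypothetical discount rate

npv_before = net_present_value(baseline, rate)
npv_after = net_present_value(after_incident, rate)
price_change = npv_after / npv_before - 1

print(f"Implied price change: {price_change:.1%}")
```

Because discounting is linear, a uniform 10% cut to every expected cash flow implies a 10% lower price whatever discount rate is assumed. Read that way, a drop of roughly that size says the market trimmed its forecasts moderately. It does not say anything directly about the total damage done to third parties.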

sure, but my claim was CrowdStrike has probably caused more economic loss from this one patch than they have ever provided, which is somewhat orthogonal to a statement about their stock price

the fact that their stock price is not zero only indicates that the world's ability to hold them liable for these losses is minimized

(or that they can be held liable and that my estimate of the damage caused is way, way off)

The cope is that this incident just shows how important CrowdStrike is.

Kinda like Boeing. They can have plane crashes, faulty parts, kill whistleblowers, etc... But we still have to buy Boeing planes – because we don't have a choice!

I'm less sanguine about Crowdstrike. Elon said he is ripping them out of all his companies. While the typical CEO drone probably won't do the same, Crowdstrike won't live this down. Maybe ever.

I predict a slow bleed out in their stock, although there's a good chance that internet morons bid it up higher over the next few weeks.

I think Elon did the right thing.

I have never heard about Crowdstrike. No computer I work with had it installed.

I totally understand that an average user is clueless and we need to protect him from his own actions. And yet, if this is such a necessity, why wouldn't Microsoft implement it directly in the OS?

Crowdstrike might be bleeding edge. The need for bleeding edge is always overvalued.

It reminds me of the times when everybody was trying to install antivirus software. Instead I always removed it, because it only consumed resources and provided very little benefit. The best protection was to limit what the user can do: do not install unauthorized software, don't even browse the internet for fun, just use your work-assigned software and web sites.

I think those who relied on third party antivirus software had worse outcomes because their users were more relaxed and less disciplined. At the same time those antivirus software makers got rich.

Probably the same has happened with Crowdstrike. Gradually Microsoft will implement something similar at no extra cost, and everybody will realize that Crowdstrike is pointless. Until new challenges come along and a new opportunistic company, playing on people's fears, convinces them to buy another scammy service.

And yet, if this is such a necessity, why wouldn't Microsoft implement it directly in the OS?

They did. The only thing missing from Windows integrated security is that it lacks the options to spy on users (breaking multiple privacy laws) and doesn't make it as easy to disrupt productive work by locking down the computer way too much. It also doesn't slow everything to crawl. Naturally corporate IT managers can't stand that.

It's already over. The biggest non-crisis ever, even if it was among the most widespread.

No direct culture war implications, but goes to show just how much of a house-of-cards the tech ecosystem is. 1 little, simple, stupid bug can bring the whole world to a halt. Yet, the industry continues quarterly-earnings chasing.

That seems overdramatic. I have not noticed any disruption at all; if not for all the headlines this morning I would have never known about this. An upgrade was rolled out and it was fixed. This requires manual intervention of servers, which is why IT exists as a profession in the first place. It's not a crisis like on the order of Covid or 2008, but more like a mass disruption. I think too many people are overreading into this as some sort of harbinger of the awaited collapse, and really it's not.

Airlines were grounded; again, this is a common occurrence. There was a similar incident as recently as 2023, when many flights were grounded: https://www.reuters.com/world/us/why-us-flights-were-grounded-by-faa-system-outage-2023-01-11/

It's bad, no doubt, but the mass-grounding of flights is something that typically happens every 2-3 years.

End of the day, tech workers are treated as disposable labor. Executive bean counters are divorced from the product. And the stock price is the only incentive that matters.

The fact they are paid so well and exhaustively vetted in the hiring process suggests they are not disposable. Companies invest a lot of resources in new hires. There is also a loss of perspective, in that people forget the other 3650 days of the past decade in which there was no major failure, but a single failure is suddenly a major indictment of the entire tech industry, as opposed to something more mundane like a mistake.

Crowdstrike stock was only down 11% today, which is far less than expected given that it has been implicated in the greatest IT failure ever. By comparison, Meta stock fell 15% in a day after it missed the highest of earnings estimates. This is reason to believe it's not as bad as the overly dramatic language would suggest.

One would hope such companies learn from past mistakes, but as tech changes, consequently so do the mistakes. So I can expect incidents like this in the future.

Wikipedia reports that 5.9% of flights were cancelled worldwide. It's definitely a lot of flights, but also not that much from a global perspective.

Twitter had flightradar24 animations showing flights disappearing, with Community Notes saying that the animation is fake and not from the CrowdStrike outage. With a 6% decrease you wouldn't really notice anything visually, or would notice only a slight reduction.

People love to lie on twitter for dramatic effect.

If we assume a 6% reduction of global economic activity for one day, it certainly is a loss of billions of dollars. And yet it is less than one extra holiday per year.
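A back-of-the-envelope version of that estimate is below. The world GDP figure is an assumption (roughly 100 trillion USD per year, rounded), and "one holiday" is crudely treated as a full day of lost output.

```python
# Assumption: world GDP of roughly 100 trillion USD per year (rounded).
WORLD_GDP_PER_YEAR = 100e12
DAYS_PER_YEAR = 365

# Assumption from the comment above: 6% of one day's activity is lost.
FRACTION_LOST = 0.06

daily_output = WORLD_GDP_PER_YEAR / DAYS_PER_YEAR
outage_loss = FRACTION_LOST * daily_output

# Crude comparison: a holiday modelled as one full day of lost output.
holiday_loss = daily_output
outage_in_holidays = outage_loss / holiday_loss

print(f"One day of world output: {daily_output / 1e9:.0f} billion USD")
print(f"Loss at 6% for one day:  {outage_loss / 1e9:.1f} billion USD")
print(f"Equivalent holidays:     {outage_in_holidays:.2f}")
```

Under these assumptions the loss comes to about 16 billion USD, or 0.06 of a holiday. The result scales linearly with both the GDP figure and the assumed 6%, so anyone who disputes either input can substitute their own.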

I can't help but think about this post that was linked on the SSC reddit a few days ago: https://matt.sh/panic-at-the-job-market

(long, rambly post that I don't fully agree with but it did say a few interesting things) In particular these two quotes:

Modern tech hiring, due to industry-wide persistent fear mongering about not hiring “secretly incompetent people,” has become a game divorced from meaningfully judging individual experience and impact

...

Such job descriptions also means: your job is physically impossible. You will always feel drained and incompetent because you can’t actually do everything everyday. You will always be behind because each of those bullet points can be multiple days of work per week just on their own (plus, how are you supposed to be productive in 35 different areas requiring months to years of experience if you actually want to be good at each task?). So, from day 1, you will already be about 4 months behind on your expected job responsibilities and you’ll never catch up. It turns into an endless game of managers and executives saying you are “underperforming” because you have 18 primary tasks, each primary task requires 4 to 20 hours of effort, and every manager wants their task done within 4 hours. You are setup to fail. What’s the point?

Maybe a point is some companies just shouldn’t exist if they can’t afford the fully staffed professional teams required to build and maintain their products? The worst secret in tech is amateur developers are happy to act like entry level workers across 20 arbitrary roles for years (in the absence of never having enough time to focus on building up long-term experience or best practices). You can’t get gud if you are always rushed from task to task without any chance of leveling up knowledge and capability through “deep work” as we would historically expect of professionals.

I don't think it's like that at every company, or even the majority. But there are certainly some companies like that. They, in theory, care greatly about their tech workers, because the salaries are high and they have a vague understanding that tech is important. But they don't have a good system for actually hiring good tech workers. And then, once hired, they use them all as generalists, moving quickly from one thing to another, with no chance to actually develop expertise or fix deep underlying issues. And they are never given any kind of decision-making authority in the company, only responsibility to "just fix whatever breaks."

I think that behavior happens the most in companies that are not "tech companies," but still use tech. Banks, airlines, large retailers, that sort of thing. They need tech to function, but it's just a cost center to them- they want to just pay a fixed price per month to "handle tech" and then not think about it ever again. And it seems like those are the ones being bitten in the ass by this thing, because it turns out that running a windows server with third-party antivirus on it with automatic updates is not actually very secure! I wonder if we'll see any restructuring, or if this sort of thing is just going to happen every so often forever, as companies get blindsided by tech issues that they don't understand and never cared to try and understand?

i think the culture of secrecy is the bigger problem. they are paid a lot and expected not to blabber to the media if they want to be employed now or in the future by other companies

I had a flight canceled today. I am fucking livid. This was over 12 hours after the rollout. Luckily I was able to get rescheduled onto a flight tomorrow, but frankly I have no confidence that that flight will happen either.

I was just going on a silly vacation. I cannot imagine how I would feel if I missed something important. There will never be justice for this. In a fair world, Crowdstrike would be sued into bankruptcy like Purdue Pharma. I'll be lucky if I get a drink voucher out of this.

Are you sure you can get nothing? Last time an airline messed up my connection I got around 800 euros out of it

In the United States, airlines aren't legally required to compensate customers for delays at all. I had a United flight recently delayed by eight hours and received a $15 lunch voucher and $100 in airline credit though.

If everyone uses this, what determined who got hit? Did they do a random staged rollout and stop once the problems started?

Pure hearsay--but my IT guy says "if your system had Crowdstrike installed, and it was on and running automatic updates when the update was pushed, then you got hit. If your system happened to be off, power-cycling, delaying updates, etc., then you missed it, and the actual fix was rolled out very quickly to prevent further problems."
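For contrast with the all-at-once push described in that hearsay, here is a minimal sketch of the staged (canary) rollout the question asks about. It is purely illustrative and not a description of CrowdStrike's actual update mechanism. The host names, the stage percentages and the `is_healthy` callback are all hypothetical.

```python
import hashlib


def rollout_bucket(host_id: str) -> int:
    """Map a host to a stable bucket in [0, 100) from a hash of its ID."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def staged_rollout(hosts, stages, is_healthy):
    """Push an update wave by wave, halting after the first unhealthy wave.

    stages: increasing percentages, e.g. [1, 10, 50, 100]
    is_healthy: hypothetical callback, True if a host survived the update
    Returns (updated_hosts, halted).
    """
    updated = []
    done = set()
    for percent in stages:
        # Hosts whose bucket falls under this stage and are not yet updated.
        wave = [h for h in hosts if rollout_bucket(h) < percent and h not in done]
        updated.extend(wave)
        done.update(wave)
        # Stop before widening the blast radius if any host in the wave broke.
        if not all(is_healthy(h) for h in wave):
            return updated, True
    return updated, False


hosts = [f"host-{i}" for i in range(1000)]  # hypothetical fleet
stages = [1, 10, 50, 100]

bad_updated, bad_halted = staged_rollout(hosts, stages, lambda h: False)
good_updated, good_halted = staged_rollout(hosts, stages, lambda h: True)

print(f"bad update:  {len(bad_updated)} hosts hit, halted={bad_halted}")
print(f"good update: {len(good_updated)} hosts updated, halted={good_halted}")
```

With a broken update, only the first non-empty wave is hit before the rollout halts. With a healthy one, every host is eventually updated. Hashing the host ID keeps each machine in the same bucket from one update to the next, which is one common way to pick canaries without keeping a separate list.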

So now "zero day" protection is also a zero day exploit.

Something something security monoculture? Truly critical infrastructure should probably be running multiple operating systems on vendor-diverse hardware in parallel, I guess?

security monoculture?

Tangential, but it's shocking how much small differences can impact results. In my industry, people decorrelate WTI from Brent, and then Brent from other Brent, by using 4% instead of 5% stoplosses. They then make the full range 1,2,3....% on each, then bottle them up into different ensembles, and after a few days they show massive divergence.

If your system was up when the update rolled out in the afternoon, and you turned off or reset your computer before the rollback patch, you got a BSOD easily fixable by anyone with the admin privilege.

Part of security is a monopoly on force — sorry, on access — so nobody dumb can infect the system, and few people had the privilege. I was one of the clever few who could boot with a Windows installation USB, delete the affected files, and be back up in minutes. Whereupon I was asked to get other PCs up in our building, which I gladly did.

On reddit, someone said they’d been speaking with their crowdstrike security rep the previous week, who said they had a beta for the new version which was getting BSOD on some windows systems, so they weren’t going to push it out until the bug was squashed. It’s assumed in IT the bad update accidentally got into global distrib.

Who is John Galt?

I was one of the clever few who could boot with a Windows installation USB, delete the affected files, and be back up in minutes. Whereupon I was asked to get other PCs up in our building, which I gladly did.

Sounds like your IT security is subpar. No drive encryption and USB boot devices not blocked? This means anyone can exfiltrate the contents of any of the drives.

Nonono, they are clearly following Best Practices (tm) -- after all, they have Crowdstrike!

Who is John Galt?

People old enough to have done things like "boot from a USB drive," but not so old as to be confused by computing devices generally?

Thirty years ago, relatively few undergraduates brought their own computers to college, though most had access to some kind of computer "lab." Twenty years ago, most undergrads brought their own computers to college. Ten years ago, it was common for many programs of higher learning to "give" students a laptop for curricular use, testing, etc. Today, I get a surprising number of students whose only computing device is their cell phone, or a similarly hobbled tablet-style appliance. They live in walled gardens and think that computing begins and ends with "apps." Throwaway consumption devices are, slowly but surely, crowding from our collective consciousness the general purpose (and modular!) machines that delivered the Information Age.

And in some ways, I suppose, that was always the goal ("it was always the plan to put the world in your hands...")--just as we don't need everyone to change their own oil, or know how to fly airplanes, we don't need everyone to be using desktop computers. But in much the way that the average American utterly fails to understand or, therefore, appreciate the systems that keep them fed, keep the power on, etc., I suspect that failure to even slightly understand the technology on which our civilization functions contributes to some pretty distorted perspectives--on the world, on life, on politics, etc.

I mean, one way of looking at it is that the affected computers are now very well protected from viruses.

Considering how millions of computers are gonna have to be booted into safe mode and have OS/antivirus files tampered with, just wait until malicious actors start "helpfully" supplying USB thumb drive images that promise to deploy the fix automatically (alongside rootkits). Rootkits that might silently disable or bypass CrowdStrike entirely.

This is happening, and CrowdStrike already has multiple pages warning about various efforts.

Likely eCrime Actor Uses Filenames Capitalizing on July 19, 2024, Falcon Sensor Content Issues in Operation Targeting LATAM-Based CrowdStrike Customers

Falcon Sensor Content Issue from July 19, 2024, Likely Used to Target CrowdStrike Customers

From the second link:

CrowdStrike Intelligence has monitored for malicious activity leveraging the event as a lure theme and received reports that threat actors are conducting the following activity:

  • Sending phishing emails posing as CrowdStrike support to customers
  • Impersonating CrowdStrike staff in phone calls
  • Posing as independent researchers, claiming to have evidence the technical issue is linked to a cyberattack and offering remediation insights
  • Selling scripts purporting to automate recovery from the content update issue

I was going to do a lot of stuff at work today. Was.

So I keep trying to find any information on the technical aspects of this failure. As in, why is it bricking systems. I get that it's a driver that runs under the operating system, and it's failing to load. But why? I've only seen random reports that Crowdstrike literally pushed a corrupted file onto millions of systems, which is rather remarkable if true. If it was actually a bug, I'm deeply curious to hear what the bug was and how it slipped through.

To get really wild and speculative, lately it's been getting reported that Intel 13th and 14th gen i9 CPUs might be defective at incredibly high rates, upwards of 50%. These defects manifest in whole hosts of ways like BSOD, software crashes, and memory errors. I wonder if it's possible a defective Intel CPU borked the executable of an otherwise rigorously tested release. Like I said though, pure speculation. The nature of the Intel failures is still being investigated anyways.

Ok so there's an update on what happened.

The exact crash is caused by dereferencing a null pointer. The offending assembly is readable by anyone, and it is as follows: mov r9d, dword ptr [r8]. The key is that the value of r8 is 0000 0000 0000 009c. The 9c is an offset of some sort set earlier, so it's dereferencing a null pointer plus that offset. The pointer is NULL because the contents of the file C-00000291.sys were published as all 0s, causing r8 to get loaded as all 0s.

So the offending assembly probably looks like

read r8, C-00000291.sys (some offset)

add r8, 9ch

mov r9d, dword ptr [r8]

causing the bug.
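To make the arithmetic in this comment concrete, here is a minimal Python sketch. It assumes, as the comment speculates (this is not confirmed), that an 8-byte base pointer is read straight out of the file and that 0x9c is then added to it. The unmapped 4 KiB first page and the helper names are assumptions for illustration, not CrowdStrike's actual code.

```python
import struct

FIELD_OFFSET = 0x9C   # the offset quoted in the comment above
PAGE_SIZE = 0x1000    # assumption: the first page of memory is never mapped


def effective_address(file_bytes: bytes, pointer_offset: int = 0) -> int:
    """Read an 8-byte little-endian base pointer from the file, then add the field offset."""
    (base,) = struct.unpack_from("<Q", file_bytes, pointer_offset)
    return base + FIELD_OFFSET


def would_fault(address: int) -> bool:
    """Treat any access inside the unmapped first page as a null-pointer style fault."""
    return address < PAGE_SIZE


zeroed_file = bytes(64)  # stand-in for a file that is all 0s
addr = effective_address(zeroed_file)
print(hex(addr), would_fault(addr))  # 0x9c True
```

A zeroed file gives a base of 0, so the access lands at 0x9c, which matches the register value reported above.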

From this, it kind of sounds like rather than having an on-disk data representation that would be parsed and converted to an in-memory data structure, they just loaded the file and accessed the raw bytes as a data structure with internal pointers. Which is... an approach, I guess.

It's an executable; that's how executables work.

Not really. The linker and a bunch of other transformations are going to happen before any of your instructions run. Dumping and loading bytes of a structure straight out of memory has long been considered a lazy and dangerous thing to do; no one is surprised that this sort of bug arose from it.

Eh, not really? Executable files have structure in them other than raw code and still have to be parsed by a loader. A file that's all zeros should fail to load. (Yes, I know DOS had .com files which were just code blobs loaded at a fixed address and immediately executed and I'm sure there are even more ancient examples of that sort of thing, but surely Windows kernel modules can't work like that.)

Anyway, the rumors I've read said that it was actually a data file and that's why they considered it acceptable to deploy it on a Friday -- the assumption being that changing configuration without rolling out a new version of the executable wouldn't break things too badly.

That might be how executables in an operating system work. Wouldn't be how extremely low level BIOS or ROM code that is meant to be executed before the OS loads would work. I can't say for certain exactly how that works these days, but when I was troubleshooting some BIOS code on an old computer of mine, I found myself decompiling a VGA BIOS. And that basically works by being in a certain memory block, it begins with a consistent signature to signal "Yup, there is code here" to the motherboard BIOS, and then it begins loading and executing instructions at a certain offset to initialize the card. Fun fact, you can actually reinitialize the VGA BIOS with a short assembly program that just CALL's to that location if memory serves.

What you are describing sounds more like a boot sector, i.e., raw machine code meant to be read from bootable media and executed directly by firmware (the mobo BIOS in your example)

I’d be surprised if in any modern operating system, executables (even those loaded and run at boot time) were handled that way. Then again, one is reminded of the old chestnut about idiot-proofing software…

The problem with Turing machines is that pretty much everything becomes equivalent at high enough levels of generality. Windows EXEs (and DLLs) have a specific format that makes it impossible to load an empty or (most) malformed file, but if the surrounding format is correct enough you can absolutely have it followed by a bunch of nonsensical instructions and memory locations -- there is a checksum, but (infamously), it isn't actually mandatory to load or run.

Worse, there's no rule that your executable is the only place that such instructions can come from, and few architectures try. Even in Harvard architectures like Atmels or PICs, there are specific instructions to transfer from the data bus into the program and vice versa. Modern operating systems on von Neumann architectures try to stop you from doing so by accident, by setting memory pages as either instruction or data, and in modern Windows machines further isolating data instructions with DEP, but it's ultimately just a set of flags.

There are arguments against doing this, in favor of having your base program load from more conventional configuration files with a strict format (e.g. JSON), or even having a very limited programming language that your core driver then 'runs'. They have some tradeoffs! But ultimately the problem is a lot more boring: in each case, you have to be able to recognize and respond to a corrupt file. And that's a solved problem! But you have to recognize it.
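As a rough illustration of "recognize and respond to a corrupt file", here is a minimal Python sketch of a loader that refuses a blob unless a magic number, a length field, and a CRC32 all check out. The file layout, the `CFG1` magic, and the function names are hypothetical; nothing here describes the real channel file format.

```python
import struct
import zlib

MAGIC = b"CFG1"                    # hypothetical magic number
HEADER = struct.Struct("<4sII")    # magic, payload length, CRC32 of payload


class CorruptFile(ValueError):
    """Raised instead of ever touching the contents of a bad file."""


def pack(payload: bytes) -> bytes:
    """Wrap a payload in the header described above."""
    return HEADER.pack(MAGIC, len(payload), zlib.crc32(payload)) + payload


def unpack(blob: bytes) -> bytes:
    """Return the payload, or raise CorruptFile if any check fails."""
    if len(blob) < HEADER.size:
        raise CorruptFile("too short")
    magic, length, crc = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise CorruptFile("bad magic")
    payload = blob[HEADER.size:]
    if len(payload) != length:
        raise CorruptFile("length mismatch")
    if zlib.crc32(payload) != crc:
        raise CorruptFile("checksum mismatch")
    return payload


try:
    unpack(bytes(4096))            # a file that is all 0s
except CorruptFile as err:
    print("rejected:", err)        # rejected: bad magic
```

The point is only that the failure path ends in a handled error the caller can respond to (for example by keeping the previous known-good file) rather than in a bad memory access.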

I'm pretty sure I could write a C program right now that would run in Windows 10 that will load and run arbitrary assembly instructions from a binary file. The C program might have all the trappings of a proper Win10 executable, but the file it loads and runs sight unseen wouldn't. I'm pretty sure that's what the Crowdstrike driver is doing with the file full of 00's.

Apparently, the corrupted file was just filled with nulls:

https://twitter.com/jeremyphoward/status/1814364640127922499

I'm trying to imagine what might cause that; truncating the file and then failing to write it? My filesystem-fu isn't really up to par.

Saving a file on a filesystem that journals metadata, followed by a computer crash before the file contents are flushed, is one way to achieve it.
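For anyone wondering what the usual defence against that failure mode looks like, here is a minimal Python sketch of the write-to-temp, flush, fsync, then rename pattern. The function name is made up for illustration, and whether anything like this was or wasn't done in the actual incident is unknown.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data so that a crash leaves either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()                # push Python's buffer down to the OS
            os.fsync(tmp.fileno())     # ask the OS to push the contents to disk
        os.replace(tmp_path, path)     # swap the name only after the contents are durable
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

On POSIX systems a fully careful version would also fsync the containing directory after the rename; that step is left out here to keep the sketch short.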

Wasn't there an old joke about an MBA cutting costs in half by getting rid of the 1s and standardizing on 0s?

This is not confirmed information, but I am hearing it on various technical grapevines and it seems plausible:

The primary bug is not new - the kernel-level driver that Crowdstrike runs (and has been running) has a dormant bug in the portion of it that parses config/data files. This update was "just" a config/data file, so deemed low-risk and put through fewer/simpler rounds of testing than a "real" update to their actual software. Whether it was a weird corner case or a malformed file, the kernel driver tripped over it and triggered the dormant bug. Since it's a kernel-level driver, crashing can affect the OS - and it did, generating an exception on a bad memory access (perfectly routine type of bug, but with privileges!) so the OS crashed.

I'm not in a position to confirm, but that seems quite plausible and dovetails with some of what I've heard.

For some reason I find myself thinking of this old XKCD 😉

Lol that is amazing. Sounds like the most plausible explanation, but maybe even worse because it seems like that should have been caught in a dev or staging environment

Forget about dev or staging, there's no excuse for not fuzz testing your config parser in current year plus nine.
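For readers who haven't seen one, here is a minimal Python sketch of what fuzzing a parser means in practice: throw the empty file, an all-zero file, and a pile of random bytes at it, and treat anything other than a clean, expected rejection as a bug. The toy length-prefixed format and all names are hypothetical, and a real setup would use a coverage-guided fuzzer rather than plain random input.

```python
import random
import struct


class ParseError(ValueError):
    """The only exception the parser is allowed to raise on bad input."""


def parse_records(blob: bytes) -> list[bytes]:
    """Toy format: repeated [2-byte little-endian length][payload]."""
    records = []
    pos = 0
    while pos < len(blob):
        if pos + 2 > len(blob):
            raise ParseError("truncated length field")
        (length,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        if pos + length > len(blob):
            raise ParseError("record runs past end of file")
        records.append(blob[pos:pos + length])
        pos += length
    return records


def fuzz(parser, iterations: int = 2000, seed: int = 0) -> list:
    """Return every (input, exception) pair where the parser failed unexpectedly."""
    rng = random.Random(seed)
    # Always try the empty file and an all-zero file before the random inputs.
    inputs = [b"", bytes(32)]
    inputs += [rng.randbytes(rng.randrange(64)) for _ in range(iterations)]
    crashes = []
    for blob in inputs:
        try:
            parser(blob)
        except ParseError:
            pass                       # clean rejection is the desired outcome
        except Exception as exc:       # anything else is a bug worth reporting
            crashes.append((blob, exc))
    return crashes


print(len(fuzz(parse_records)))  # 0
```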

Interestingly this crash has seemingly barely affected Finnish businesses and organizations at all (apart from cases where they have projects with foreign companies, of course). Apparently there were some minor glitches at the system of the bank I use, but I didn't notice it at all.

I wonder if it's simply that Finnish companies are patriotically committed to using F-Secure/WithSecure solutions above all others...

I’ve never heard of CrowdStrike and I’ve worked as a programmer for 25 years, so I assume more or less nobody uses CrowdStrike in Finland.

Heard much the same from a programmer friend.

Finnish = Linus Torvalds = Linux servers ?

Possibly a part of the explanation, too.

The competency crisis rages on. Boeing's planes fall out of the sky. The Secret Service forgets to check the nearby roof. Anti-virus software bricks your computer. These sorts of incidents have always happened, but it's hard to deny that they have gotten more frequent.

Boeing's planes fall out of the sky.

I could be wrong, but the number of fatal Boeing crashes or lesser incidents is not an outlier compared to past incidents and other manufacturers before all the media scrutiny. Anyone remember the 737 rudder jams during the 90s? https://en.wikipedia.org/wiki/Boeing_737_rudder_issues#:~:text=During%20the%201990s%2C%20a%20series,board%2C%20157%20people%20in%20total.

It was a different model and hardly got similar media attention despite two major accidents with lots of fatalities close together

The Boeing issue was somewhat unique in that it was arguably the result of a vulnerability that had been purposefully introduced.

A conscious choice was made to change the emergency autopilot disconnect from a physical switch to a software one, and also to exempt certain autopilot functions from said disconnect switch, thus invalidating the existing pilot checklist procedure for bad air data.

but it's hard to deny that they have gotten more frequent.

I'm always skeptical but never dismissive of such common sense. It could be recency bias and the availability heuristic at work.

I am starting to think there's the opposite of that kind of bias at play. 'Instinct distrust bias'?

I don't know what to call it, but it certainly feels like a lot of people turn very 'skeptical' when an aspect of their supported or preferred worldview is poked at in some way. The most obvious example of this would be mass immigration and the rise of housing prices. Implying a causal connection simply isn't a part of the program. Yet instinct would tell us it's the most obvious and important part of the entire problem in most if not all western countries.

Ditto depressed wages, rise of "the gig economy", etc...

Pick me! I’ll deny it!

I have zero reason to believe capability has gotten worse by any reasonable metric. Maybe—just maybe—that’s propped up by technology even as competency has tanked? But if so, I think there should be better evidence than black swans.

Compare complaints about the land boats of old. Why can’t we buy sweet Caddys anymore? I dunno, because they were death traps in an accident.

I’m still trying to find that Onion skit about accidentally invading the wrong Middle Eastern country.

I agree that our competence probably hasn’t declined that much. But our systems are much more integrated, with a lot more single points of failure. I doubt that bad updates were ever that unusual. But in 1990, when there were dozens of different OS and antivirus software combinations, one company’s bad update would only have affected the few companies that happened to run the wrong combination of systems. Now the combination of CrowdStrike and Windows is common enough that one bad update takes out thousands of computers in thousands of companies.

Why can’t we buy sweet Caddys anymore? I dunno, because they were death traps in an accident.

Well, I think it has more to do with fuel efficiency standards. They were also death traps, or not as perfectly safe as possible, but rounding off all the edges for aerodynamic efficiency gives all cars a sameness that's striking when compared to older designs.

You can build a land boat that's as safe as you like, but it's not going to meet fuel efficiency standards unless it's classified as a truck somehow. This also relates to the rise of SUVs: they're not-sedans, and so they don't have the same standards.

I remember a video about the old standard of round headlights. Super convenient for everything except aerodynamics. There was an awkward transition where companies tried to put the aero shell around their legally-mandated headlights, but that was unnecessary after the regulation got removed. Wish I could find it again.

Boeing used to be better. I believe the Secret Service was as well. But anti-virus software, and the companies which make it, have always sucked.

Ehh I'm going to press X to doubt on the Secret service.

John F. Kennedy got killed (I'll admit Trump would have been killed by Lee Harvey Oswald too)

Gerald Ford had 2 assassination attempts on him, both of which he got lucky and survived, but both were even crazier than Trump's

and just looking through Wikipedia, the list is so long and full of examples that it raises the question of whether Trump's was even remotely unusual.

Boeing I'll grant you though. I think part of it is that every corporation has its ups and downs and we have one down for Boeing right now, but remember the Ford Pinto? Boeing's issues are nowhere near as bad.

Checking to see if my flight will be delayed due to this. It still says "on schedule", but following the chain of "where is this plane coming from" backwards in time to see where my plane is, I see one flight where the expected departure time is before the expected arrival time of the airplane.

A website that follows and shows you the chain of previous flights of your plane sounds like a pretty cool idea

I do hope the fallout from this crap will be immense. Cloud was a bad idea from the beginning. This type of cloud security too.

This is like seeing a jet plane crash in the 1960s and being like "this idea will not work" or the Titanic sinking and thinking the same thing. Enough companies rely on such services that evidently it's worthwhile despite these risks.

Enough companies also relied on massive amounts of lead in the fuel they use for decades. In the digital era - not having all your data and services under your roof will forever be a bad idea. It's just that the beancounters were tired of paying those pesky sysadmins a livable wage.

I am not against the concept of services per se. But the critical ones should always be self-hosted. And the moronic, useless antivirus part of the security theater is up there with the critical ones. Antivirus hasn't been needed on Windows for a long, long time.

The best comment on all things cloud is way before the cloud was even a concept:

With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead

This is the opposite of the cloud.

It is Software-as-a-service, but the processing wasn’t being done on someone else’s computer.

This isn't "cloud" in any meaningful sense.

Indeed, if these computers were in the cloud, they'd be fixed much faster.

Hmm, centrally managed, by a third party, not on premises, critical security infrastructure with kernel access? There is definitely a reading of "cloud service" that describes it.

The machines are on prem. That's the whole point.

If the machines were off prem they would be managed by some company with at least basic sysadmin competence and it would merely be a major annoyance to fix this. As it is, every mom and pop with a moron for an IT department is going to have to fix it themselves.

My company’s shiny new ERP system is hosted by our vendor, a large and growing company which sells to many industry verticals. The system is still down.

If our little SMB IT department had been running it on premises, we would never have installed endpoint protection on our servers. We may have had all kinds of other problems we couldn’t hedge against because of our scale, but we have the good sense to weigh the risks ourselves instead of complying with a customer’s backside-covering audit checklist.

You're right that I failed to consider that there are tiny cloud shops out there. When I was talking about cloud I was thinking about the big three.

Well, not cloud, but internet in general.

These machines all updated something, because they are connected to the internet and set up for automatic updates.

People learn pretty quickly that automatic updates are a terrible idea. Even if the update doesn't screw up your data or your workflow, e.g. by taking away some feature you were depending on or crapping up the UI, it's likely the update process will kick in at an inconvenient time (like in the middle of a presentation). So they turned them off. Security people started crying about unpatched bugs, and got enough corporate power to get automatic updates considered a "best practice" (when it's not), and here we are.

The problem is that no automatic updates is also a terrible idea, as a majority of systems don't get patched, ever. The ideal is manual updates, with responsible companies/admins testing before deployment, and sadly I don't think that's gonna happen. The second best is gradual/tiered deployments with the ability to opt out, which is more realistic but still requires more effort than many companies are willing to provide.

I personally think that "no automatic updates" is better than the current hellscape of "lol we can break your device at any time", even with the problems it causes. I'd rather have hella security issues on the Internet than have my stuff randomly break (or just get worse) without my intervention.

Automatic updates are the worst thing. Everyone hates them, yet companies do it.

Automatic Windows updates destroyed two of my work laptops at my last job.

I've had Windows 10 updates fuck up some of the older software I have running for my job.

And people wonder why I turn Windows 10 updates off.

Now I'm going to have to fight off a Windows 11 upgrade, so as to not fuck up said software. You'd think local IT would be more paranoid about just gleefully installing whatever it is Microsoft tells them to, but...

I can't speak for your IT department, but in the past we would always test updates across a cross section of the business before rolling them out to everyone. Maybe like 10% of the computers would get the test updates, and we would only deploy if we had no issues on the test PCs. That's really all you can do though, sometimes issues come up even with testing.
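One common way to pick that kind of "roughly 10% of machines" test group without maintaining a list by hand is to hash each hostname into a stable bucket. Here is a minimal Python sketch; the function name and the choice of SHA-256 are assumptions for illustration, not a description of how any particular IT department or vendor does it.

```python
import hashlib


def in_test_ring(hostname: str, percent: int = 10) -> bool:
    """Deterministically place about `percent`% of hosts in the early-update group."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100   # stable bucket in 0..99
    return bucket < percent


hosts = [f"pc-{n:04d}" for n in range(1000)]   # hypothetical hostnames
early = [h for h in hosts if in_test_ring(h)]
print(len(early), "of", len(hosts), "hosts get the update first")
```

Because the bucket depends only on the hostname, the same machines land in the test group every time, which also means you want that group to be a deliberate cross section rather than whatever the hash happens to pick.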

"Internet was a bad idea from the beginning" is certainly an interesting argument.

I can definitely agree that canary-less fast global rollouts were a bad idea from the very beginning though.

How long do you wager it'll be before a major car company [thinking of Tesla here but I'm pretty sure they all do this now] bricks a significant number of its electric cars by pushing a bad update (rendering the car unable to start)?

That seems best case. What if it bricks while driving?

Probably highly unlikely. I have worked on mission critical software. While it wasn't automotive it was in a similar field. The code I wrote took six months to reach production. At that company we wrote maybe 5% as many lines of code per work week compared to a normal company. There was also extensive testing.

There may be individual events that happen. Mass brickings are unlikely.

Considering the overall quality of automotive software is 100% garbage I'm not as certain a massive screw-up would be as unlikely.

More like, for all of its benefits the internet has always been, and will always be, a point of vulnerability.

Note that it's crowdstrike, not cloudstrike. Doesn't detract from the post that much but just thought it was worth pointing out.

ClownStrike, I think

Thanks, fixed

Can you remove the strikethroughs, at least after the first one? It's a bit jarring.

If what is being reported is true and they released some unrunnable or improperly formatted file, I can’t even comprehend that level of incompetence. There is a lot of bullshit at my company which is also dealing with many of the issues you’ve addressed in your post, and of course we have incidents, but something so basic being released with such insane permissions would not be possible at my workplace. Of course that’s discounting any malicious actor, but the number of QA cycles and slow rollout that we go through would have caught something like this 5 weeks before it sniffed release.

Something or someone is deeply rotten at crowdstrike. They need to make a big-time firing or I predict that people will start fleeing in droves.

They don't stage releases sending them out to limited groups one at a time? They do one global update and hope for the best?

There are such obvious ways to limit the impact of this sort of screwup.

I'm going to play Carnac the Magnificent here and say they do indeed do staged rollouts.

They just don't properly check if one stage has succeeded before moving on to the next.

Rumors suggest that it may have been rolled out Friday morning local time.

Of course, a slow rollout is pointless if you have no canary process and no means of determining if you just bricked all Australia...
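The difference between a staged rollout and a staged rollout that actually checks its canaries fits in a few lines. Here is a minimal Python sketch under made-up assumptions: `deploy` and `is_healthy` are hypothetical callbacks, and the stage names are invented for the demo.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def staged_rollout(
    stages: Sequence[Sequence[str]],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> Tuple[List[str], Optional[int]]:
    """Deploy stage by stage; halt as soon as a stage's failure rate exceeds the limit.

    Returns the hosts that received the update and the index of the stage
    that triggered the halt (None if every stage passed).
    """
    updated: List[str] = []
    for index, stage in enumerate(stages):
        for host in stage:
            deploy(host)
            updated.append(host)
        # The health check gates the next stage; skipping it is the failure mode described above.
        failures = sum(1 for host in stage if not is_healthy(host))
        if stage and failures / len(stage) > max_failure_rate:
            return updated, index
    return updated, None


bricked = set()
stages = [["canary-1", "canary-2"], ["office-1", "office-2"], ["everyone-else"]]
result = staged_rollout(stages, bricked.add, lambda host: host not in bricked)
print(result)  # (['canary-1', 'canary-2'], 0)
```

With an update that bricks every host it touches, only the two canaries are lost and the later stages never receive it.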

This seems to me like a fairly usual level of competence from a bolt-on-security-as-a-product or compliance-as-a-service company. Examples:

  • CVE-2016-2208: buffer overflow in Symantec Antivirus "This is a remote code execution vulnerability. Because Symantec use a filter driver to intercept all system I/O, just emailing a file to a victim or sending them a link is enough to exploit it. [...] On Windows, this results in kernel memory corruption, as the scan engine is loaded into the kernel (wtf!!!), making this a remote ring0 memory corruption vulnerability - this is about as bad as it can possibly get". Basically "send an email with an attachment to pwn someone's computer. They don’t have to open the attachment, as long as they have Norton Antivirus (or anything that uses the Symantec Antivirus Engine) installed".
  • CVE-2020-12271: "A SQL injection issue was found in SFOS 17.0, 17.1, 17.5, and 18.0 before 2020-04-25 on Sophos XG Firewall devices, as exploited in the wild in April 2020. [...] A successful attack may have caused remote code execution that exfiltrated usernames and hashed passwords for the local device admin(s), portal admins, and user accounts used for remote access"
  • Okta data breach a couple months back: "For several weeks beginning in late September 2023, intruders had access to [Okta's] customer support case management system. That access allowed the hackers to steal authentication tokens from some Okta customers, which the attackers could then use to make changes to customer accounts, such as adding or modifying authorized users."

It's not that it's amateur hour specifically at CrowdStrike. It's the whole industry.

A general rule: the further a software product is away from "engineering candy", the worse it is.

Software engineers are some of the most entitled, overpaid people on the planet. (I should know!) They have lots of career options.

To get good engineers you need to either pay an outrageous salary or have an interesting product like a video game. Want to find engineers to work on your compliance software? Good luck. Hell, even Google engineers making 400k/year can't be bothered to work on essential but boring products, preferring instead to chase shiny baubles.

No one wants to do the dirty work where good job means not messing up.

I think the problem is that "good job" doesn't mean "not messing up" in the context of these compliance-as-a-service or security-blanket-as-a-service companies. Instead, "good job" is "implement as many features as possible to a level where it's not literally fraud to claim your product has that feature, and then have a longer checklist of supported features in your product than the competition has so the MBA types choose your product".

CrowdStrike's stock price is only down by about 10% today on one of the highest-impact and highest-profile incidents of this type I've seen. I'm pretty sure their culture of "ship it even if it's janky and broken" has netted them more than a 10% increase in net revenue, so it's probably net positive to have that kind of culture.

Their net revenue is under a billion a year. The total economic damage caused by this single bug is almost certainly larger than the total net income of the entire history of the company. In fact, it is almost certainly larger than the total gross income of the entire history of the company. I do not know where the valuation is coming from, but it certainly isn't from their revenue figures.

Lol P/E of 644.

But it's a hyper-growth company bro, surely they'll be able to pivot to making money once they've captured the full market bro.

Yeah but if they're not liable what relevance does that have to their share price?

I don't know if they're liable or not. I doubt Crowdstrike knows if they're liable or not.

the further a software product is away from "engineering candy", the worse it is.

To get good engineers you need to either pay an outrageous salary or have an interesting product like a video game.

I mean, you could get good engineers with a video game project, but for that you have to be willing to also pay them the outrageous salary. Video game projects are more art than engineering, requiring more designers than engineers. And the brilliant engineers won't work for that much below market rate; if that were their goal they'd go into research or try to get into an early-stage startup, not join a project that's just the application of an existing engine to a new gameplay design. The game projects that appeal to engineers don't sell enough for AAA development, they're nerd games like Factorio or RimWorld (sorry friends).

Not that game companies don't capitalize on the appeal of their projects to talent. They just capitalize by taking lower-tier but motivated engineers/artists/designers and running them into the ground.

I have always been of the opinion that antivirus is a poor idea, and at best, a half-baked solution preventing you from adopting better solutions, such as sandboxing/virtualization and general human security hygiene. I haven't run an antivirus (besides Windows's built-in Defender) in years on any of my computers or phones, and I've never gotten malware on my systems simply because I don't open any sketchy apps or files, and if I do, it's in a virtual machine isolated from the rest of my system.

That an entire industry (the antivirus industry) exists based on the premise of a bad idea that is not only ineffective but adds massive attack surface simply because attackers can exploit what is essentially a privileged system component with deep access to all parts of the system - a cure worse than the disease - should be a lesson in how easy it is for someone to get the basics of a skill (such as security) wrong.

The problem is that simply receiving a text may count as "opening a sketchy file". You really can't expect every boomer pecking at a computer to know the ins and outs of security.

This is not to defend this particular software, but your view leaves out some things as well.

Bad example? If you're targeted with zero-days like Pegasus, an antivirus software is not going to stop it. In fact the standard defense for this sort of thing is what I've advocated - isolation of system components via sandboxing/virtualization. I'm not sure what your argument is.

AV can at least detect anomalous network traffic or unexpected processes, which is obviously not as good as preventing the infection in the first place but is still valuable.

In this case, the systems were sandboxed - FORCEDENTRY escaped the sandbox. Sandboxing isn't a magical technology without vulnerabilities.

Would antivirus have actually detected this infection? Ignoring the fact that phones don't usually run antivirus (because they employ sandboxing security measures), in the case of FORCEDENTRY, the exploit was discovered because Citizen Lab specifically examined the phone of an anonymous Saudi activist. They don't say what exactly led to the phone being examined by them, but I'm willing to bet that it exhibited signs of infection that any general-purpose antivirus like McAfee wouldn't have detected.

Yes, sandboxing technology can still be vulnerable, but antiviruses are not a better security practice than sandboxing. Moreover - since you brought up a targeted spyware attack - if you're being specifically targeted by nation-state actors aided by NSO Group, you need to up your security anyways. So your comment that

You really can't expect every boomer pecking at a computer to know the ins and outs of security.

immediately after discussion of FORCEDENTRY confused me, because if your threat model includes zero-day attacks like FORCEDENTRY (for example, you're a political activist, journalist, or whistleblower), then yes, I do expect such a person to know the ins and outs of security. They should stay on top of their game, because their life literally depends on it. At that level of threat modeling, if you're genuinely worried about attacks from well-funded nation-states, then security is not something you can just ignore and expect to have taken care of for you.

Yes, sandboxing technology can still be vulnerable, but antiviruses are not a better security practice than sandboxing.

It's not one or the other.

Moreover - since you brought up a targeted spyware attack - if you're being specifically targeted by nation-state actors aided by NSO Group, you need to up your security anyways.

Bringing this up as an example was my mistake since it seems to have derailed the conversation.

There are plenty of vulnerabilities out there that are not zero days. There are plenty of systems out there that are vulnerable to such attacks. Not everything is patched as soon as the CVE is published and not every system is updated as soon as the patch is published. It's a simple fact of life that there is a time period between a vulnerability being disclosed and all systems being updated, even if those systems are enrolled in some kind of regular update scheme. Arguing against the need for at least detection and monitoring for threats because you have a lot of faith in sandboxing does not make sense.


I incidentally just learned about the Okta breach yesterday, simply by getting frustrated with Okta and searching Twitter for evidence of whether everyone else hates having to use it continuously as much as I do.

I have the opinion that the more data you give out, the more likely it will just get breached. Especially personal data meant to authenticate your identity. The best thing to do would be to not give data out at all (data that doesn't exist can't be stolen), but most of the rest of the world doesn't think the same way, and is extremely unlikely to question why we have normalized people giving away their data without a second thought.

It's mid-July. Likely an intern bypassing a safety check to try to get his project completed on time.

Don't they deploy updates like this in a development environment first to test for exactly this kind of thing? I work in very low-level, mostly unimportant IT and I sweat breaking a single website that gets 100 visitors per month. How does something as big as this not get tested first?

I thought this is exactly why they roll out updates gradually instead of distributing them all at once. Do we know for sure there was a staged rollout, or could they have mistakenly pushed this everywhere at once?

In this case I think it depends on what is being pushed. You have to keep in mind that this is a security tool specifically promising and designed to implement rapid defense against zero-day security exploits. Holding off for a week or so on a threat under active exploitation is not what they are being paid for.

Yeah, the paranoid option is that there was some serious zero-day that they were trying to react against, it worked fine on the development environment, and they made a tradeoff of the risk of this sort of incident against not pushing the big red button.

But being derpy is always an option.

That’s a good point, but they have to have some kind of staging environment or slow rollout, right? You can’t just release to all customers at once; that’s absolutely insane and asking for something like this to happen, even if it’s security-critical.
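For anyone unsure what a "slow rollout" looks like in practice, here is a minimal sketch in Python. Nothing in it reflects how any real vendor ships updates: `staged_rollout`, `RolloutResult`, the wave fractions, and the `deploy_and_check` callback are all hypothetical names made up for illustration. The idea is only that each wave is a small slice of the fleet, and the push stops as soon as a wave's failure rate crosses a threshold.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    halted: bool                                         # True if a wave failed and the push stopped
    updated: List[str] = field(default_factory=list)     # hosts that took the update and passed the check
    failed: List[str] = field(default_factory=list)      # hosts that took the update and failed the check
    untouched: List[str] = field(default_factory=list)   # hosts never reached


def staged_rollout(
    hosts: Sequence[str],
    deploy_and_check: Callable[[str], bool],
    stage_fractions: Sequence[float] = (0.01, 0.10, 1.0),  # cumulative share of the fleet (assumed values)
    max_failure_rate: float = 0.0,
) -> RolloutResult:
    """Push an update in growing waves; halt when a wave exceeds max_failure_rate."""
    result = RolloutResult(halted=False)
    start = 0
    for fraction in stage_fractions:
        # Each wave ends at the cumulative fraction, but always advances by at least one host.
        end = min(len(hosts), max(start + 1, int(len(hosts) * fraction)))
        wave = list(hosts[start:end])
        if not wave:
            continue
        wave_failures = [h for h in wave if not deploy_and_check(h)]
        result.failed.extend(wave_failures)
        result.updated.extend(h for h in wave if h not in wave_failures)
        if len(wave_failures) / len(wave) > max_failure_rate:
            result.halted = True
            result.untouched = list(hosts[end:])
            return result
        start = end
    return result


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(100)]
    # A broken update that crashes every machine it lands on.
    outcome = staged_rollout(fleet, deploy_and_check=lambda host: False)
    print(outcome.halted, len(outcome.failed), len(outcome.untouched))
```

With a 100-host fleet and an update that breaks every machine, the first wave is a single host, so one machine goes down and the other 99 are never touched. The tradeoff raised upthread still applies: every wave you wait on is time the rest of the fleet goes without the fix.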

I don't even understand why something like clownstrike is necessary in 2024. It should be possible for the OS to be locked down to the point where it's not necessary to have an anti-virus running. And if you need some other security system because you are worried about zero-day exploits from nation-state threats, then you should really reconsider your threat model, because the clownstrike system is effectively a malware distribution platform. I guess it's fine if you trust clownstrike and the US government, but it's a far from ideal situation. Clownstrike seems to have a very nice relationship with the US security state. For example, they were brought in by the DNC to do the hacking investigation and provided the attribution to Russia.

OS vendors should really expose some kind of interface that allows security vendors to perform these deep inspections 'safely'. I think Linux has eBPF, which I believe some vendors have been using to provide file system monitoring and network monitoring.

Also, SOC2 and similar compliance regimes mandate a lot of this stuff. We run most of our software on Fargate ECS, where the compute is completely managed by AWS. I've been using this as an excuse for why we can't run file monitoring and other garbage on our systems that use Fargate. I also suspect this is why these managed Docker/managed Kubernetes systems are popular: potentially you can avoid some of the tickbox security work. We also run all of our containers with a read-only root filesystem, so I don't even understand the threats that a file system monitoring system would be trying to remediate in our situation. Technically some kernel exploit could allow the root filesystem to be modified even if it's read-only, or AWS employees could fuck with us, but I suspect in these cases the file system monitoring could also be trivially bypassed.
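If you want to confirm from inside a container that the root filesystem really is mounted read-only, a rough Python sketch follows. It assumes a Linux container where `/proc/mounts` is readable and uses the standard `device mountpoint fstype options ...` layout. The function name `root_is_read_only` is made up for this example. It only reports the mount option, so it says nothing about the kernel-exploit case mentioned above.

```python
def root_is_read_only(mounts_text: str) -> bool:
    """Return True if the effective mount for '/' carries the 'ro' option.

    mounts_text is the content of /proc/mounts. Later entries shadow
    earlier ones, so the last entry whose mount point is '/' wins.
    """
    found = False
    read_only = False
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[1] != "/":
            continue
        found = True
        read_only = "ro" in fields[3].split(",")
    if not found:
        raise ValueError("no mount entry for '/' found")
    return read_only


if __name__ == "__main__":
    # Made-up sample resembling what a read-only container might report.
    sample = (
        "overlay / overlay ro,relatime 0 0\n"
        "proc /proc proc rw,nosuid,nodev,noexec 0 0\n"
    )
    print(root_is_read_only(sample))
```

On a real system you would call it as `root_is_read_only(open("/proc/mounts").read())`.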

I don't even understand why something like clownstrike is necessary in 2024.

Clownstrike and all the other security stuff is the triumph of the security engineers and MBA types over users and cowboy developer types. Security incidents happen. Security engineers blame users and cowboy developer types, come up with software to make computers crappier and less useful. MBAs (especially MBAs at companies making this malware) call this "best practices" and push to have them required by corporations and governments. Developers and users complain that their computers are slow and don't work, the MBAs and security engineers say 'that's how you know it's working'. Then something like this happens and the cowboy types indulge in schadenfreude.

I don't even understand why something like clownstrike is necessary in 2024.

Because, just like with DEI and other stupid corpo bullshit, business necessity has nothing to do with efficacy. You do the rituals and check the boxes because someone somewhere figures this lets the company cover its ass. Whether there was an actual threat of ass exposure to begin with doesn't even get considered.