This argument only makes sense if they managed to maintain the distance over those 11 years
Then it's a good thing they've been maintaining some distance! In those years they've:
- Increased Falcon 9 payload capacity by over 50%
- Added downrange booster recovery options
- Added booster recovery from and reflight after missions beyond LEO
- Begun launching national security payloads and NASA flagship payloads
- Tested and made operational a super-heavy launch vehicle, also partly-reusable, launching it 11 times so far with no failures
- Begun reuse of their unmanned space capsule
- Had the longest streak in history of successful operational launches of any rocket, then the longest streak of any company, by what is now the most reliable launch vehicle in history
- Human-rated Falcon 9
- Tested and made operational a manned space capsule, in the first manned launches from the US since Shuttle, and launched several dozen astronauts to orbit with no failures
- Surpassed the total on-orbit flight time of any other manned launch vehicle (and of a few space stations)
- Done launches with GEO insertion
- Added fairing recovery
- Added extended fairing options
- Launched payloads to the Moon (orbit and landing), asteroids, and Jupiter
- Increased their flight rate 20-fold, flying Falcon 9 far more frequently than any other launch vehicle in history, and more in total than any vehicle save Soyuz
- Reused recovered boosters, now up to 30 times each, exceeding Shuttle for most-reused orbital rocket stage ever
- Launched and are now operating enough active satellites to exceed the currently-active total of everyone else in history, by a factor of 2, with several million users and a million more every few months
- Launched the most powerful rocket in history, by nearly a factor of 2, then recovered three of them and reflew two of them
- Launched the largest single spacecraft ever (not counting on-orbit assembly)
- Successfully reentered and did a soft splashdown with the largest reentry vehicle ever, with the first live video of reentry ever, then did it again 4 times
Some of those are just firsts for SpaceX, but several are firsts for anybody in history. They are by far the most successful space launch developer in history, and have not been slacking ... and I'm just mentioning their technical achievements, which are secondary to what's actually best about them. The list above is a side effect of the work done lowering the cost of space access.
if what they're doing is retarded
Long ago, you had no idea what you were talking about, but you at least noticed it when I pointed out that SpaceX was indeed already flying astronauts, and you intended to do better. You still have no idea what you're talking about, but now you have no idea that you have no idea - you believe you know so much that you can call the people who are more correct retarded! I don't see how you can come back from that, but you have to try!

I know that orbital refueling logistics is a lot more complicated than "look up, SpaceX put that light in the sky and it has people in it", and so I don't think I can get it past your biases this time, but I promise: there is a reason why everybody who hasn't been lobbied by SRB manufacturers is in favor of it, there is a reason why Blue Moon is also planning to do it, and there is a reason why even SLS, the epitome of huge disintegrating-totem-pole rockets, turned out to be unusable for its core mission without it.

If we wanted to be the first to get flags and footprints on the Moon, we should have canceled Artemis 8 years ago and saved $50B, because it turns out we already did that 50 years ago. If we want to do anything serious on the Moon, then doing it 20 tons (Blue Moon Mk2, 4 launches per mission) or 100 tons (Starship HLS, definitely less than 20 per) at a go is the way to do it, but more importantly, doing it at a high cadence to help amortize costs and reduce risks is also the way to do it. The marginal cost of a dozen launches even of a fully expended Starship is still cheaper than a single SLS launch.
When you accused @RandomRanger of "shifting the goalposts", was that an honest concern of yours? I never said a word about Tesla.
I'm curious about when you think Tesla's competition was a decade behind Tesla, but mostly I'm just going to assume that you're shifting to Tesla because, when in the grip of Musk hate, all his companies look alike? They're not. The one building 2.5% of the world's cars and the one launching 85% of the world's spacecraft are in pretty different places.
It's definitely possible that the competition could catch up to SpaceX; I wish there were more even trying to catch up. Blue Origin is trying, though, and they're nearly a decade behind. Not a hyperbole decade, a look-at-the-calendar-and-subtract decade. RocketLab is trying, and with luck they'll succeed with the first Neutron flight next year and they'll only be 11 years behind.
I'm really excited about Stoke trying to surpass SpaceX; their first effort will never carry people but it's the first thing outside of China that could potentially undercut Falcon 9 on light cargo; they're the only serious attempt so far at rapid full reuse other than Starship.
In the context of the "new space race with China", it doesn't bode well that most of SpaceX's prospective competition is in China. LandSpace is probably ahead of Blue Origin, despite being 40% as old. If Starship fails, it's possible that after another ten years we'll be able to say "the Chinese offering as good or better ~~cars~~ launch vehicles for cheaper". Just waiting for that probably wouldn't be good American space policy, though. Ideally we'd have a second homegrown SpaceX, but we don't, and until we do they're both metaphorically and literally carrying us.
Most of his launches are in-house for Starlink
So far this year SpaceX has launched forty non-Starlink missions. That is no longer as many launches as the entire country of China, but it is more launches than any other country in the world, including (by a margin over 50%) the combined non-SpaceX remainder of the USA. It is more launches than all non-US non-China countries combined. It is also still more launched payload capacity than the entire country of China.
The fact that he launches even more for Starlink expands this accomplishment; it does not diminish it.
SpaceX is, obviously, empirically, numerically, by hundreds of percent, the only institution currently capable of competing with China in space.
Oh - but I nearly stopped while still just talking about cargo! Last time we talked about the options to launch humans I was hopeful for Starliner, but last year's flight had continuing reaction control system issues that ended up with its two test pilots waiting for extra SpaceX seats to bring them home again, and Boeing and NASA still haven't announced any potential timeline for an upcoming flight. SpaceX are currently still the only ones outside of China and Russia who operate a manned orbital spacecraft; their 4 manned launches in 2025 exceed China's 1 and Russia's 1 (hopefully soon to be 2).
Early next year SpaceX's US competition plan to put Orion in space with people on board for the first time, which is very exciting but terrifying. I want to use a kinder phrase than "flaming garbage", but I do see the photos in that article where literal pyrolysis tore chunks of its heat shield off like literal garbage. Orion's reentry capability is at the same "well, it did survive" stage as the Starship tests' ... or worse, because much of the Starship tests' damage is intentional, and unless you count ablation none of Orion's was. But, Musk will be flying another few dozen or hundred Starships before they dare put a human on board during reentry; NASA's Artemis policy, by contrast, is YOLO.
His competition is slowly catching up to him
Hopefully their future will see a little less gradatim and a little more ferociter.
I am non-ironically excited for the possibility that Blue Origin's upcoming second attempt at a booster landing is about to succeed. It's unlikely to see any more significant delays (we're just a few days out from the first launch window), and so long as there are no delays worse than those already incurred, their landing attempt will come slightly before the ten-year anniversary of SpaceX accomplishing the same. It is awesome (though again I feel I must explicitly state that I'm not being sarcastic) that the leading team among SpaceX's most serious long-term competition may now be less than a decade behind them! But to anyone without a weird grudge against Musk, it's not tempting to overstate the magnitude of that awesomeness.
The Bell Labs etc. failed because corporations stopped funding them. There's a debate as to why. Some simply gesture at "grrr greedy capitalists" which has never been a satisfying answer for me.
In general "grrr greedy capitalists" is only ever a satisfying answer in the same sense that "grrr Schrodinger equation" is. Technically both ideas explain a whole lot, but if you're ever looking for an explanation for why something changed, say, between 1980 and 1990, you can't solely check in the laws of economics or physics.
In this case, ironically, "Some simply gesture at 'grrr greedy capitalists'" might be the explanation. Ma Bell was an enormous company with a quasi-governmental monopoly, so they could expect to capture most of the value of even relatively pure and fundamental research ... and then anti-trust action broke them up into a bunch of Baby Bell companies who could only capture the value of research that was sufficiently applied and peripheral to turn a profit before its patent(s) would expire. By what may have been a wacky coincidence, but of course wasn't, Bell Labs got a ton of funding before the breakup and not so much after.
Despite my snark, I believe it's possible that the loss to research was exceeded by the gains of breaking up the quasi-monopoly. I'm old enough to remember land lines, and adding a second phone to the same line by just adding a splitter and running one cable to another room; a little further back in time, this would have required a call to The Phone Company to get permission and a technician and an extra monthly surcharge. It's easy to imagine that an indefinite continuation of this state of affairs in the USA could have crippled the nascent internet, which for years was only accessible to most residences via modems piggy-backing data over phone lines.
Ideally, handling the collective action problems of research without a giant monopoly (or, at least, with a giant monopoly we all get to control on election day) is what University research is supposed to be for; we try to give University researchers the proper incentives to try to come up with ideas that will be useful decades down the road, not just years. If we did that right, we should have been able to cut up the fabled goose here without losing out on all the golden eggs. To a great extent, University research works, even! I agree with your suspicions that we didn't entirely do that right, and with your explanations for why it doesn't work as well as it should, but I wouldn't want to come to any strong conclusions without trying to quantify those magnitudes somehow.
There's constant opportunity for humans to fight humans, just not opportunities that a developer is going to be eager to take.
"Tolkein's orcs are a metaphor for black people" is some bullshit that woke people use to bump up publication and hate-click numbers and normal people ignore. "Tolkein's black people are a metaphor for black people" isn't quite true either, but that's a harder sell to normal people, and while I'm usually a strong proponent of facts over feelings, I'm not sure "Ackshually the violent ones are a metaphor for some North African muslims" is going to help here.
unless it's a big misunderstanding and huge tragedy that we will mourn for a thousand years
Yes, certainly. We can just add more entries to the list, then?
There's also dwarves fighting elves on a large scale, occasionally, and on a small scale the elves seem to have their share of classically-fey annoying tricksters and the dwarves have selfish and greedy troupes; there's enough room for moral ambiguity. But I think the problem isn't that you have a bunch of factions who trust each other too much, the problem is the opposite of:
There's limited lore-natural small scale events. Everything is the big stuff.
In the LotR books we get a worms-eye view of one part of a larger conflict, which apparently leaves lots of room for smaller-scale events to fill out an MMORPG ... but as soon as you try to get into any big stuff you're limited by the fact that our worms-eye view was of the most important part of the larger conflict, and we know how that ends. After the end is a world with less conflict and less magic and less interesting opportunities for an RPG, but before the end is a world where your RPG can only tell the side quests, because you know how the main questline ends and it's not anything to do with you personally.
With a typical MMORPG, who cares, just retcon in some more high-fantasy epic stuff and squeeze it in somewhere, and trust that your players won't stress too much about how the dragon people and the panda ninjas and on and on fit together coherently ... but the whole point of licensing LotR would be to draw in LotR fans who might get skittish if you keep getting weirder and weirder.
I've never played LotR Online, but now I'm seriously wondering how they do it. I thought it got kind of sidelined by the WoW juggernaut, went Free-to-Play, and petered out, but now I'm reading that the latest expansion for this 2007-launched game was released in November 2024, as part of a roughly one-per-year release schedule that's actually sped up after a 2013-2017 lull. Is it really that good? It's got to at least have some kind of diehard fanbase to keep servers running and content creation continuing for 18 years.
https://caseyhandmer.wordpress.com/2025/10/31/nasas-orion-space-capsule-is-flaming-garbage/
Damn, I can't believe I was too lazy to be first to post this. Readers interested in space, don't just keep scrolling past the link here; it's exhaustively but brilliantly devastating.
"I’m a technical manager. I’ve had bad days. Who hasn’t? But I’ve never had a “we forgot to ask about docking for 13 years and now it’s going to cost us $2.5b to correct” day. Has this ever happened to you?"
I recently defended SLS here; I think it's indefensible in an absolute sense, but it at least holds its own in a "relative to Saturn V" sense. Both programs are justifiable answers to policy makers who keep asking the wrong question. But the cost and danger of Orion are just unconscionable.
(and, to be fair, Casey's view of SLS is also harsh: https://caseyhandmer.wordpress.com/2024/10/02/sls-is-still-a-national-disgrace/ )
There was an MMORPG that was pretty well-received, but trying to compete with World of Warcraft was a losing proposition.
Usually the mechanism behind toxoplasma is that only borderline cases go viral. This isn't a tweet, though, it's a news article in one of the world's top newspapers, whose quoted thesis is that literal first sentence, "It’s easier than ever to kill someone in America and get away with it."
Even given the collapse in journalism, wouldn't you expect someone pushing that thesis to collect the most persuasive cases, not the most ambiguous? If Florida Methed-Up Chainsaw Man was something like the Rittenhouse shootings that had already gone viral nationally, that might make it an unavoidable choice of example to discuss, but right now the top Google hits for "Druzolowski" "de leon springs" are two Orlando TV stations, then after the WSJ article and a Daytona Beach TV station we get down to the dregs of a "Florida Man Friday" podcast episode.
That's a good question. His final approval poll was 63-29, at the higher end of a presidency that went up and down around an average of 53. His retroactive approval went as high as 73-22 in 2002, and as of a couple years ago it was still 69-28, 2nd only to JFK among the 9 recent presidents Gallup asked about. The left-wing opinion still seems to be "Reagan screwed up the AIDS epidemic" so I'd have to assume that his support still leans right and he's at 70+ among Republican voters.
But this might be just one of those things that's uselessly sensitive to poll wording (YouGov says 44-29! Is that just because they emphasize their "neutral" option more?) or to poll methodology (Gallup says 90-8 for JFK!? Is it just getting harder and harder to correct for "only boomers answer the phone for pollsters" effects?).
Resting on their laurels, or just using their best ideas first and having to fall back on their second-best later? Being a popular author has never been a safe career plan, so for those who try anyway it just makes sense to front-load hard and give their work the best chance of being seen at all. There's usually a countervailing effect, where any art improves with practice and later better implementations can make up for weaker concepts, but maybe kids' books have a higher ratio than most of inspiration to perspiration.
Well, "no alignment" is so much worse than "no AGI" that anybody could afford to forgo it. But the USA would probably prefer a US AGI with "95%" alignment over a CCP one with "98% alignment", and they'd prefer a Chinese AGI with "90% alignment" over that, and so on, so nobody feels much incentive to be truly careful. Even within one nation, most companies would love to pull out far enough ahead of the competition to capture most of the producer surplus of AGI, and would be willing to take some negative-value risks out of haste to improve their odds instead of just taking a zero-value loss.
nobody knows how to do AI alignment, despite continuing technological advancement
Well, we're learning. Capabilities and alignment are being advanced through the same "training" paradigm, and roughly apace so far. Maybe they'll stay that way, and by the time further technological advancement is out of our hands it'll be in the "hands" of creations that still take care to take care of us.
It's easy to be pessimistic, though:
- Many aspects of AI capabilities could in theory be advanced very rapidly via "self-play", although in practice we can't manage it yet on anything more complicated than Go. The is-ought problem in alignment is real, though; an alien from another galaxy could converge to something like our view of reality but would only get a fraction (whatever "moral realism" results you can get from pure game theory?) of our view of how to value different possibilities for reality. So, we might at some point still see a "hard takeoff" in capabilities, such that whatever robust underlying alignment we have at that point is all we're ever going to get.
- The "Waluigi effect" makes alignment work itself dangerous when done wrong. Train an LLM to generate malicious code, and even if you think that's morally justified in your case, in the AI internals it might turn out that the "generates malicious code" knob is the same as the "humans should be enslaved by AI" knob and the "talk humans into suicide and homicide" knob and the "Hitler was a misunderstood genius" knob. "S-risks" of massive suffering were already a bit of a stretch under the original Yudkowsky explicit-utility-function vision of alignment - a paper-clip maximizer would waste utility by leaving you alive whether it tortures you or not - but in a world where you try to make Grok a little more based and it starts calling itself MechaHitler, it seems plausible that our AI successors might still be obsessed with us even if they don't love us.
- There is no Three Laws architecture. Whatever alignment we can tune, someone can then untune. If superintelligent AI is possible, not only do we want the first model(s) to be aligned with our values, we want them to be so effective at defending their values that they can defend them from any superintelligent opposition cropping up later. Ever read science fiction from 1955, or watched Star Trek from 1965? Everybody hoped that, after the H-bomb, the force-field "shields" to defend against it would be coming soon. But physics is not obligated to make defense easier than offense, and we're not done discovering new physics. (or biology, for that matter)
An AI moratorium is not going to happen.
No, it's not. Stuxnet was tricky enough; if everybody's video game console had a uranium mini-centrifuge in it next to the GPU, you could pretty much forget about nuclear non-proliferation. People point out the irony of how much attention and impetus Yudkowsky brought to AI development, but I respect the developers who read his essays and concluded "this is happening whether I like it or not; either I can help reduce the inherent risks or I can give up entirely".
IMHO the details are what need to be said. The discovery that someone thinks "Trump is factually ruling by decree" is not new information to anyone; someone always thinks that. The discovery that Trump added 10 points to Canadian Tariffs, under 'emergency' powers, because Ontario aired an ad with some anti-tariff audio from Ronald Reagan and Trump mistakenly thinks it must have been a deepfake, might not be widespread knowledge yet. If you start making a list of decrees, with sublists for the ones of dubious or failing constitutionality+legality, how long a list do you have?
End-of-life care can cost 10-12k a month around here, and its not like these people are receiving major surgeries or rare experimental drugs or something.
That was roughly the quote for my father when he was going downhill, not because of surgeries or drugs or an especially high-cost-of-living area, but because of staff; a "memory care" (think severe dementia) ward necessitates a low nurse-to-patient ratio 24/7.
whose quality of life cannot be recovered.
The point where he couldn't stay in a plain Skilled Nursing Facility ward was probably the same point where his quality of life went negative. Fortunately for him, his underlying problem was tumors that had metastasized to his brain, and there was only a week or two of that hell before the end. (His screams literally changed from "Help!" to "Hell!", which I like to hope was only due to his rapid loss of fine motor control making plosives impossible...)
I don't think anybody was keeping him alive during those last weeks due to perverse profit incentives, though; delaying death is just what doctors and nurses do. By this time his treatment for otherwise-potentially-lethal problems had bought him a happy decade or two of borrowed time vs thyroid issues (he got to meet his grandkids!), a few years vs heart issues (he got to live with his grandkids! they got to play in the playscape he helped build!), and a year or two (all but a few months of which were high-to-decent quality of life; he got to take his grandkids to Disney World!) vs the cancer itself. Maybe we don't know when to quit fighting, but quitting too late is at least still a lot better than quitting too early.
I think what's happening is that we've been getting better and better at curing disease, despite making next to no progress, not even really trying to progress, against decay. When we manage to cure half of all death, our foe the Gompertz-Makeham law says that only buys us an average of 8 years ... and not 8 years of extra youth, just 8 years of extra dotage after having survived death. At some point that has diminishing returns, but we're not used to making decisions about diminishing returns; when we were curing things like smallpox there just weren't any to speak of.
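That 8-year figure falls out of the math: under a Gompertz-Makeham hazard μ(x) = A + B·e^(γx), cutting all mortality in half only shifts the survival curve by about ln(2)/γ ≈ 8 years for a typical γ near 0.09. A quick numerical sketch, using illustrative parameter values of my own choosing rather than fitted ones:

```python
import math

def life_expectancy(hazard_scale=1.0, a=2e-4, b=1e-4, gamma=0.09, dx=0.01):
    # Numerically integrate the survival curve S(x) = exp(-∫μ) from birth,
    # for a Gompertz-Makeham hazard μ(x) = A + B·exp(γ·x).
    # Life expectancy at birth is ∫ S(x) dx.
    s, cum_hazard, e0, x = 1.0, 0.0, 0.0, 0.0
    while s > 1e-9 and x < 130:
        mu = hazard_scale * (a + b * math.exp(gamma * x))
        cum_hazard += mu * dx
        s = math.exp(-cum_hazard)
        e0 += s * dx
        x += dx
    return e0

base = life_expectancy(1.0)     # current mortality
halved = life_expectancy(0.5)   # "cure half of all death"
print(f"baseline: {base:.1f} yrs, halved mortality: {halved:.1f} yrs, "
      f"gain: {halved - base:.1f} yrs (ln(2)/gamma = {math.log(2)/0.09:.1f})")
```

Because the Gompertz term grows exponentially, the halved hazard catches back up to its old level after just ln(2)/γ more years of aging, so the whole mortality curve slides right by roughly 8 years no matter how dramatic the medical win sounds.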
I guess they're at least an opportunity, even if one that's been squandered at best and backfired at worst.
They had superheavy spacelift capability we're still struggling to replicate.
To be fair and kind to the modern struggle, Apollo spent around three times as much (inflation-adjusted, as are all of the numbers below) as we're spending to replicate it, considering "we" to mean the Constellation + Orion + SLS + ground systems + public HLS expenses. We have higher-quality tools to make work cheaper these days, but quantity has a quality all its own; also, salaries these days have to be at least somewhat competitive with modern private tech salaries, and people cost more than tools.
To be fair and somewhat unkind to the modern struggle, you can already see some of its cracks just by looking at that brief description. Constellation (around $13B sunk cost, starting in 2004) was cancelled for being over budget and behind schedule, after estimates suggesting that continuing it would have taken more like two thirds of the Apollo budget. We have to separately consider Orion (around $25B, mostly complete except the heat shield is a little iffy, development started 2006), SLS (roughly $35B for "Block 1", plus a marginal cost that makes "Block 2" look increasingly unlikely, development 2011), and HLS ($8B public, for two landers, starting work in 2021 and 2022) as three programs, because it's really hard to call something a single coherent program if you spend ten years building a super-heavy launcher for lunar exploration and then realize you probably want to start working on some sort of lander to go with it. Oh, and also the primary lander comes with its own super-heavy launch system, whose development will either fail (in which case we have at least a three year delay with nothing to do but fly around the Moon while waiting for the backup lander), partly work (in which case it's twice as powerful as the one we spent nine times as much on, sending five or ten times as much payload cislunar, in a spacecraft better than the one we spent another six times as much on, for half the marginal cost), or work to design (in which case make that a twentieth of the marginal cost and twenty times the embarrassment, as we realize that from the beginning we should have been struggling to surpass Apollo, not replicate it).
The USA is able to run obscene defects due to the USD being a reserve currency generating strong demand for it, reducing inflation.
I'm with you on all the other stuff, and I would be with you on this one if in practice we only ran the obscene deficits during incidents of particular need punctuating longer periods of fiscal responsibility, but the fraction of fiscally responsible leaders in either tribe is a rounding error. Carefully-dosed limited-time opioid prescriptions are useful for acute injuries, but if someone's heroin addiction has gotten too bad for them to go cold-turkey and they're increasing their doses exponentially to make up for the diminishing returns, you don't praise their easy access to dealers.
Oops I didn't relaize claude share doesn't share inline citations.
That's on me, too; I should have checked the links in your quotes, not just looked at the Claude transcript and assumed it included everything in the quotes.
The link you shared is about May 2025 which is not related to the result for June 2025
One of the two links I shared was an April story, the other a July story; both were data through March 2025.
Personally I'd have used the phrase "near-record levels" (after rising 30+% above trend, it dropped back 0.13% - yay?), but I'm not sure that'd be any more informative a summary - "near-" could be applied just as well to a record set 13 years earlier, while "representing" is a closer fit for 3 months earlier. "Reached record levels" or "was a record" wouldn't be supported by Claude's inline link, but both of those were your rewording, not Claude's.
Anyways it's undeniable that your favorite model
You seem to have confused me with @RandomRanger. Claude is my second-favorite model, because while I've repeatedly caught it in errors, it at least always tries to fix them when I correct it; ChatGPT-5-Thinking is the only thing I've seen that's (so far, for me; others have had worse luck) been good about avoiding errors preemptively, and IIRC all the non-Claude free models I've tried have made significant errors and often tried to gaslight me about them afterward.
still slopped out a multitude of errors
I'm not entirely on board with Claude claiming that 99.8% of a recent record is "representing" that record, but it's clearly all too easy to slop out errors. Would that either of us were under 0.2% off!
Looking at your other complaints, they're mostly either not errors or not clearly errors, which amusingly means that appellation is itself in error each of those times:
When Claude refers to "Operation Pegasus", that's a term even the BBC has used, referring to the same thing as "Project Pegasus", though it's not used in the story at that particular inline link, which is about details other than terminology variants (it is in one of the other links Claude found). When Claude is correct about something that seems too simple to justify, but it turns out that "too simple" is in the eye of the beholder, that's still not an error.
The difference between "Wrong" and "There's no citation" also applies to the Crime and Policing Bill - is it wrong? Then what is the primary response to the problem? Four out of the five quoted sources in the linked article mention the Crime and Policing Bill by name, which seems to be a solid first place showing; why would we not want AI to use Grice's Maxims here?
When you say "The source does not indicate that any mapping of what's happening was done at the summit.", you're misparaphrasing Claude's summary, which says "coordinate efforts on mapping", and is actually a pretty good abridgement of "see what more we can do together to map what's happening" from the source article.
Your claim of "outdated" is like something out of a South Park joke. 2023! The Before Times! The Long Long Ago! It's good to see an October 23 2025 article in the mix too, but I want citations that provide a little context; "born yesterday" is supposed to be an insult! Perhaps at some age "outdated" becomes "unsupported", but that's still not "erroneous" - is the data actually out of date? Which of those policies has since ended?
Ironically, the one thing I've seen change most since 2023 is AI itself. In 2023 I was giving AIs benchmark questions that could be answered by most first-year grad students in my field, watching them instead make sign errors that could have been caught by anyone who's passed Calc 3, and then watching the various models either flail about at failures to fix the problem or gaslight me about there not being a problem to fix. In 2025 I can still catch the free models in math errors, but the one time I've "caught" a top model it turned out to be because I had an embarrassing typo in my own notes. Actual top-of-their-field geniuses are still catching top models in math errors ... but using them to prove theorems anyway, with reports to the effect that it's faster to try new ideas out with the models and correct the errors than it is to try every idea out manually.
I do like talking to Claude, at least for anything where I can double-check its work, both because it's capable of avoiding rude language like "slop" and "dogshit" and "shitty", and because when I do find errors upon double-checking, it acknowledges and tries to fix them. You've been pretty good about the latter so far, at least; thank you!
Shoplifting offences increased by 13% to 529,994 offences in the year ending June 2025, representing record levels.
Wrong. The source did not say that it reached record levels, simply that it increased y/y
The first link in the results Claude found is to the story "Shoplifting in England and Wales soars to highest since police records began", whose text reiterates "figures are the highest since current police recording practices began in March 2003."
Weirdly, Claude doesn't seem to be having any luck finding BBC results for its queries - e.g. "site:bbc.co.uk shoplifting uk 2025 - 0 results" - but when I try the same search it did, my first hit is to the BBC story "Shoplifting hits record high in England and Wales", with text like "at its highest level since current records began more than two decades ago" and a graph showing those levels.
If you read the article it links to an updated study done in 2025.
Turns out it links to both! I followed the final "The full findings can be found here: Research Findings: Audience Use and Perceptions of AI Assistants for News" link, which leads to a summary with only two footnotes, one to a general "Digital News Report" web page and one to the Feb 2025 writeup of the 2024 study. I mistakenly assumed these were the full findings, because of the phrase "full findings", so I didn't bother to check the News Integrity in AI Assistants Report link that goes to the newer results.
Thank you!
What is a crack of doom in a kettle?
The "crack of doom" is a phrase from Macbeth, referring to the beginning of the apocalypse. It's not the crack in the kettle, but a crack of sound coming from the kettle; I'd assume the polysemy here is supposed to be poetic. IMHO it doesn't work well that way, or thematically ("doom" originally literally meant "judgement", and the Last Trump sound announcing it isn't supposed to be a bad thing for the folks who are ready to be judged) but it's definitely not nonsense; you could even argue that an apocalypse announced by a cracked witch's cauldron works as a deliberate mockery in the same sense as the "slouching towards Bethlehem" beast in Yeats.
I can't think of the defense for "eagle plucked into a crow", though. Eagles get attacked by crows defending their territory, and there are a couple of popular allegories that come out of that; maybe the AI tried to mix that into "plucked bird as comically shameful defeat" symbolism (dating from the Mexican War to Foghorn Leghorn) and just mixed it badly?
a recent study found that when AI assistants answered questions with sources it fucked up 45% of the time.
Although the human-written headline here summarizes the research as "AI assistants misrepresent news content 45% of the time", if you go to the study you only see the number 45% in the specific discussion of significant sourcing errors from Gemini.
On the one hand, the AI performance in their data tables is by some interpretations even worse than that: looking at the question "Are the claims in the response supported by its sources, with no problems with attribution (where relevant)?", the result tables show "significant issues" in 15%-30% of responses from different AIs, and significant or "some issues" in 48%-51% of responses. Those "issues" include cases where AI output is accurate but sources not cited, but even if we look at accuracy alone we see 18%-26% "significant issues" and 53%-67% significant or "some"!
On the other hand, if we're getting peeved by AI misrepresentation of sources, could we at least ask the human researchers involved to make sure the numbers in their graphs and write-up match the numbers in their tables, and ask the human journalists involved to make sure that the numbers in their headlines match at least one or the other of the numbers in their source? Someone correct me if I'm wrong, egg on my face, but as far as I can see no combination of Gemini table numbers adds up to 45%, nor does any combination of AI-averaged accuracy or sourcing numbers, and in that case the "misrepresentation" headline is itself a misrepresentation! It's misrepresentations themselves that bug me; not whether or not the entities generating the misrepresentations can sneeze.
On the gripping hand, this "recent" study was conducted in December 2024, when reasoning models were still experimental. They don't list version numbers for anything except GPT-4o, but I'm pretty sure 4o didn't enable reasoning and if they were using Gemini's "Deep Research" they'd surely have mentioned that. Results from non-reasoning models are probably still the most apples-to-apples way to think about use cases like the ones in this discussion, that won't want to burn more GPU-seconds than they have to, but at the moment in my experience switching to a reasoning model can make the difference between getting bullshitted (and in the worst models, gaslit about the bullshit) versus actually getting correct and well-sourced answers (or at least admissions of ignorance).
Also in my experience, for things you can't personally verify it's only AI output with sources that can be trusted - not because you can trust it directly, but because you can check the sources yourself. AI can be a much better search engine just by pointing you to the right sources, even if you can't always trust its summary of them. I'd even prefer something that has issues 18%-67% of the time, but helps me fix those issues, over something that only has issues e.g. 10%-15% of the time but leaves me no way to check whether I'm being misled.
more and more idiots have taken to posting screenshots of the Google "AI summary" which is just slop
Often it's accurate, just not often enough to be strong evidence, much less anything approximating proof, of accuracy. I have no idea why people think otherwise. Even the ones who don't understand that we now train AI rather than program it have experienced computer programs with bugs, right? There is a selection effect to those screenshots, though: if the AI says that 2+2=4, well, nobody wants to argue otherwise so nobody bothers citing that; if the AI says that 2+2=5, then anyone who falls for it has motivation to wave that banner in front of everyone trying to explain otherwise.
What surprised me the most wasn't seeing a video of Trump putting on a crown and shitting on Americans, it was seeing who posted it. The prophecy has been fulfilled...
it draws attention, might attract the bad kind of attention, looks like cringy showing off which they just axiomatically don't like, etc.
More anecdata, but some of the most mathematically interesting code in one of my favorite open source projects had its first version written by a female programmer, who doesn't have a single commit, because her conditions for being persuaded into contributing were basically "you own the translated code, you don't put my name on it, you don't ask me for support, you don't suggest others ask me for support".
She got like 5 papers and a dissertation for her PhD (which she finished at least 25% faster than I did) out of the research that led to that code, during a period when I was spending a ton of time helping new users of the rest of the software for no immediate personal benefit, so it's hard to say that she was doing the wrong thing, at least in the short run. On the other hand, today those papers have ~140 citations between them, none since 2022; the one paper about the project she was a silent contributor to is over a thousand now, and that's because most users' papers cite a downstream project instead.
Oh, they're still asking for and getting billions from the US govt; the differences between them (and Blue Origin) vs Boeing or Lockheed are that they're spending way fewer billions (probably over $10B for the whole Starship program R&D before SpaceX is done, but SLS and Orion are over $50B now), a minority of that spending is from the government (SpaceX's two HLS contracts total a bit over $4B, Blue Origin's one a bit under), and the spending disbursement is tied to milestones rather than to "here you go; if stuff's not working come back and ask for more" (though the milestones are way too front-loaded; these are very stringent contracts by NASA R&D standards but they're weak by any non-R&D standard).
SpaceX bidding "Elon time" estimates rather than realistic schedule estimates might have been part of how they beat Blue Origin for the original HLS award, and this delayed Blue Origin's award by a couple years of legal/policy wrangling. If SpaceX's delays are more than a few years' worse than China's, and Blue Origin's are less than a couple years' worse, and there aren't any "Artemis II heat shield failure" or "Axiom discovers a huge flaw in its suits" level problems from others, then China will put astronauts on the moon before we return astronauts to the moon and it'll be in part because of that bid+award. Fingers crossed for Blue Origin, though; the New Glenn was supposed to first launch in 2020 and eventually got pushed back to 2025. Fingers crossed for Artemis II, too; it feels insane to launch humans in a reentry vehicle where we haven't yet done an unmanned test of our planned fixes for its chunks-were-breaking-off-the-heat-shield problem.
I disagree that China beating us here is a big deal, because "put a few men on the moon for the first time at $4B+ a pop marginal" (inflation adjusted) was a bad goal in the first place, and changing the goal to "for the seventh time" doesn't make it any better, whereas "plant ISS-scale skyscrapers on the moon for a fraction of the price" (or even "plant 20 tons a pop on the moon via commercial rocket flights") actually has some interesting long-term possibilities.
On the other hand, even my autist-adjacent heart sees some symbolic value to lapping China in the flags-and-footprints race, because: China has just beaten us in the Barbecue-In-Space Race! I reiterate: taikonauts are now enjoying steaks and bone-in wings fresh out of the oven! At least Sputnik had the decency to limit itself to a culturally-neutral "beep beep beep"; China's is driving a stake of shame into the very heart of America!