faul_sname
Fuck around once, find out once. Do it again, now it's science.
To GP's point, "we are cutting funding at the end of 2025, figure it out" would still have been a better way to do this than an immediate stop-work order (at least if it could be made to stick, which is perhaps not something the Trump administration could actually do).
Previously, Harvard grants, for example, carried an average institutional-overhead rate of 69% on top of the direct cost of doing research. (I think we can all imagine where that ends up.) NIH just capped that tax at 15%. This will save $4 billion per year. That's $53 for every one of the 75 million Americans who paid federal income tax last year.
If you follow the incentives off a cliff (as happened with health insurance), then if institutions want to keep collecting that $4B of overhead, the new cost of doing research needs to grow until 15% of it is $4B.
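Spelling out the arithmetic (all figures from above, nothing new):
# Savings per federal income taxpayer:
savings = 4e9       # claimed annual savings from the 15% cap
taxpayers = 75e6    # Americans who paid federal income tax last year
print(savings / taxpayers)   # ~$53
# To claw that $4B back at the new 15% rate, the direct-cost base
# has to grow until 15% of the growth covers it:
print(savings / 0.15)        # ~$26.7B of new "cost of doing research"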
On inspection it looks like the "strip disallowed html tags and attributes" step happens after all the sketchy regex stuff so it's probably fine.
With the default system prompt it won't say stuff like that, if you use something like eigenrobot's system prompt it will.
strikethrough_regex = re.compile('''~{1,2}([^~]+)~{1,2}''', flags=re.A)
Used here
# turn ~something~ or ~~something~~ into <del>something</del>
sanitized = strikethrough_regex.sub(r'<del>\1</del>', sanitized)
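(Side note: that regex is looser than it looks, since each ~{1,2} counts its tildes independently, so mismatched delimiters still match. Quick runnable demo:)
import re
strikethrough_regex = re.compile('''~{1,2}([^~]+)~{1,2}''', flags=re.A)
print(strikethrough_regex.sub(r'<del>\1</del>', '~mismatched tildes~~'))
# -> <del>mismatched tildes</del>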
Anyway maybe like ~this~?
Which looks like &#126;this&#126;
In unrelated news I'm not sure how much I trust the variable named sanitized to contain what it says on the tin.
ETA: In accordance with the new rules on AI, disclaimer that the escape sequences for the tildes were AI-generated (and then I verified that the proposal worked).
Looks like your top-level comment about formatting is filtered.
Likely no. But if you fed a bunch of high-value data through openrouter for natural language processing purposes, I think there's a decent chance said high-value data finds its way into future training datasets.
For API access, openrouter should work if you're not doing anything sensitive.
but news broadcasts that are going to be seen by millions are carefully scripted beforehand
I think you may be underestimating the extent to which everything in the world is the result of duct tape and improvisation, and that most things are done by people who do lots of things and thus don't spend as much time on any one thing as you might think.
We have their base model. It's very strong on standard benchmarks like Pile loss, ie predicting next tokens in some large corpus of natural text. It's just generically well-trained. You can't accelerate this with OpenAI slop and end up winning on money.
The OpenAI chat API gives you the top 5 if you set the right param.
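Sketch, for reference (current OpenAI Python SDK; the model name is just a placeholder):
from openai import OpenAI
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,        # return logprobs for each generated token
    top_logprobs=5,       # ...plus the top 5 alternatives at each position
)
for tok in resp.choices[0].logprobs.content:
    print(tok.token, [(alt.token, alt.logprob) for alt in tok.top_logprobs])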
That said DaseindustriesLtd did a good job of knocking down my crackpot hypothesis.
Yeah on reflection and on actually reading the DeepSeekv3 technical report (here for anyone who's curious) you're right and I no longer believe my crackpot hypothesis.
1: We have their base model. [...] You can't accelerate this with OpenAI slop and end up winning on money.
I bet you could accelerate this at least somewhat with OpenAI slop, just because "token + top 5 logprobs" will generate a more precise gradient than "token alone". But that speedup would be less than you could get from an even-more-precise loss signal by distilling the DeepSeekV2 model that they definitely already had, so "cheat by mimicking ChatGPT" is a strictly worse option than "mimic an open-source or internal model". And even that might not be worth the extra development time to speed up the already-pretty-fast early training stage. So yeah, on reflection, that part of the crackpot hypothesis just doesn't work.
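To make "more precise gradient" concrete, a minimal sketch of what top-k distillation looks like next to plain one-hot training (PyTorch; the names are mine, and renormalizing over the top-k is one of several reasonable choices, not anything from the report):
import torch
import torch.nn.functional as F

def one_hot_loss(student_logits, target_ids):
    # Standard next-token training: a single hard target per position.
    return F.cross_entropy(student_logits, target_ids)

def top_k_distill_loss(student_logits, teacher_ids, teacher_logprobs):
    # teacher_ids:      (batch, k) token ids of the teacher's top-k
    # teacher_logprobs: (batch, k) the teacher's logprobs for those ids
    student_logprobs = F.log_softmax(student_logits, dim=-1)
    student_on_topk = student_logprobs.gather(-1, teacher_ids)
    teacher_probs = teacher_logprobs.exp()
    teacher_probs = teacher_probs / teacher_probs.sum(-1, keepdim=True)
    # Cross-entropy against the truncated teacher distribution: several
    # graded targets per position instead of one hard one.
    return -(teacher_probs * student_on_topk).sum(-1).mean()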
2: The math checks out. Yes it's a feat of engineering to actually make such a cluster work but the shape of the model + 15T tokens do work out to this number of FLOPs and therefore GPU-hours. If they needed many more GPU-hours, that'd imply pathetically low FLOPs utilization.
Whispers through the grapevine have been that "pathetically low FLOPs utilization" has been pretty much par for the course for the past couple years. Whereas their technical report contains a whole bunch of "we adapted our code to the very specific performance characteristics of the GPUs we actually had, rather than the GPUs we wished we had". Section 3.3.2 of the technical report in particular is impressive in this regard (and is even more impressive in the implications, since that's a particularly legible and self-contained tricky problem, but the team likely solved dozens of other less-publishable problems of similar difficulty with a team of just 139 people).
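For reference, that arithmetic (active-parameter and token counts as reported for V3; the H800 peak-FLOPS figure is my rough recollection, so treat the utilization number as approximate):
# ~6 * active_params * tokens is the standard training-FLOPs estimate.
train_flops = 6 * 37e9 * 14.8e12          # ~3.3e24 FLOPs
gpu_seconds = 2.788e6 * 3600              # reported H800 GPU-hours
peak = 989e12                             # H800 dense BF16 peak (assumed)
print(train_flops / (gpu_seconds * peak)) # ~0.33, i.e. ~33% utilization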
3: Do you seriously think that these guys would write 16 detailed tech reports including many sections on data augmentation, and not just build a filter that replaces "ChatGPT" with "DeepSeek".
I sure do think that they wouldn't have done that particular filter step (if nothing else, because I would expect that to have a different failure mode where it talks about how OpenAI's DeepSeek model was released in November 2022, and that different failure mode would have shown up on Twitter and I have not seen it).
My crackpot hypothesis is:
- Training a new foundation model from scratch is expensive
- Distillation/mimicry is a lot cheaper than training from scratch, especially with access to logprobs (even only top-k logprobs), though the success metric for the student model is "how well it predicts the teacher model" not "how well it predicts the ground truth distribution".
- Fine-tuning chains of thought to be effective at reasoning is finicky but computationally cheap
- And therefore DeepSeek is letting OpenAI do the expensive foundation model training and initial assistant tuning, then cloning those assistants and iterating from there.
Supporting this, DeepSeekV3 thinks it's ChatGPT.
https://novelfull.com/forty-millenniums-of-cultivation/chapter-2771.html
Wow you weren't kidding about that translation quality. And yeah probably any recent LLM can do it, but that 2M context limit is pretty sweet when you need it.
Some frankly insane bastards persevere nonetheless, becoming one with the Dao of MTL, and self-reportedly no longer see the broken Mandarin Matrix but grok the underlying intent. Unfortunately, often at the cost of being unable to process normal English.
Betcha Claude can grok the underlying intent and create a less-borked translation too, and any damage to its sanity would be isolated to only that chat. Care to provide a sample?
A random walk in 1D and 2D space is recurrent, and the odds of returning to the origin over an infinite amount of time approaches 1.
An unbiased random walk (where each direction is equally likely) in 1D and 2D space is recurrent.
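A quick Monte Carlo sketch of the difference (finite step budgets, so this suggests the limiting behavior rather than proving it):
import random

def returned(p_right, steps=10_000):
    # One 1D walk; did it revisit the origin within the step budget?
    pos = 0
    for _ in range(steps):
        pos += 1 if random.random() < p_right else -1
        if pos == 0:
            return True
    return False

for p in (0.5, 0.6):  # unbiased vs. biased
    frac = sum(returned(p) for _ in range(1000)) / 1000
    print(f"p_right={p}: returned in {frac:.0%} of walks")
# Unbiased creeps toward 100% as the budget grows; biased plateaus
# (at p=0.6 the true return probability is 0.8, budget or no budget).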
Aside from server reliability, what other things do they need all these bigbrains for?
I think asking the question with the word "need" is likely to lead to confusion. Instead, note that as long as the marginal benefit of adding one more developer is larger than the amount it costs to do so, they will keep on hiring, and so the key is to look at what those marginal developers are doing.
Large organizations have non-obvious advantages of scale, and those can combine with the leverage of a huge userbase to produce surprising results.
Let's say you have a company with a billion users and a revenue model with net revenue of $0.25 / user / year, and only 50 employees (like a spherical-cow version of WhatsApp in 2015). Let's further say that it costs $250,000 / year to hire someone.
The questions that you will be asking include
- Can I increase the number of users on the platform?
- Can I increase the net revenue per user?
- Can I do creative stuff with cashflow?
- And, for all of these, you might consider hiring a person to do the thing.
At a billion $0.25 / year users, and $250k / year to hire a person, that person would only have to do one of the following to pay for themselves (break-even arithmetic sketched after the list):
- Increase the size of the userbase by 0.1%
- Increase retention by an amount with the same effect (e.g. if users typically use the platform for 3 years before dropping off, increase that to 3 years and 1 day)
- Or ever-so-slightly decrease CAC
- Increase expected annual net revenue per user by $0.00025
- If the amount you make is flat across all users, double annual net revenue per user specifically for the specific subgroup "users in Los Angeles County", while not doing anything anywhere else
- If the amount you make per user is Pareto-distributed at 80/20, figure out if there's anything you can build specifically for the hundred highest-revenue users that will cause them to spend 10% more money / generate 10% more revenue for the company (if the distribution is more skewed than 80/20, you may end up with an entire team dedicated to each of your few highest-revenue customers - I would not be surprised if Google had a team dedicated specifically to ensuring that MrBeast stays happy and profitable on YT).
- Figure out how to get the revenue at the beginning of the week instead of the end of the week
- Increase the effectiveness of your existing employees by some tiny amount
Realistically you will instead try to do each of these at 100x scale with teams of 100+ people, and keep hiring as long as those teams keep wanting more people. But those are the sorts of places to look.
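Spelled out with the spherical-cow numbers from above:
users = 1_000_000_000
rev_per_user = 0.25          # $/user/year
cost_per_hire = 250_000      # $/year
# Userbase growth needed to cover one hire:
print(cost_per_hire / (users * rev_per_user))  # 0.001, i.e. 0.1%
# Equivalent across-the-board revenue bump:
print(cost_per_hire / users)                   # $0.00025/user/year
# One extra day of retention per user (at $0.25/user/year) is worth:
print(users * rev_per_user / 365)              # ~$685k/year, comfortably > $250k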
"California's high-speed rail is a bad investment" is an evergreen topic on HN. It is probably one of these 120 articles but without an indication of when you saw it or what specific pushback was in the comments it's hard to say with more detail than that.
Red ammo in gun turrets at the edge of infinite research needs 25 turret-seconds to kill a big stomper.
Yeah maybe that's viable. I admit I just slapped down a 1.4GW reactor and a perimeter of tesla turrets because I didn't want to deal with iron or copper production on Gleba.
I also felt that the expansion railroaded players very hard towards the specific play styles the developers like, while the original game let the player choose between multiple viable options. For example:
- Dealing with enemies: Pre-expansion game, there were multiple viable approaches - a belt of ammo going to a bunch of turrets, a couple layers of laser turrets, a pipe to flamethrowers, or some mix thereof were all viable strategies with advantages and disadvantages. In the expansion on Gleba, though, the 80% laser / 50% physical resistance on the stompers makes the "laser turret / gun turret perimeter" approach a lot less viable. This is clearly intended to push players towards using rocket turrets in the places they're needed, but it feels like they encouraged rocket turrets by making the other options worse rather than making rocket turrets better
- Similarly, the asteroid resistances seem designed to force the player to route three types of ammo, and to force them to redesign their interplanetary ship multiple times (not just "provide new tools to make a better ship" but "block progress entirely until players explore the mechanics of the new ammo type").
- Gating cliff explosives behind Vulcanus likewise seems like an attempt to make city-block-from-the-start or main-bus-from-the-start approaches non-viable. Likewise Fulgora seems to be encouraging spaghetti by ruling out other approaches, rather than by making spaghetti approaches better, and likewise on Aquilo with the 5x energy drain for bots.
That said I did enjoy the expansion, even Gleba. There were lots of interesting mechanics, and those mechanics were pretty well designed (except maybe quality, but that one is optional anyway). But it did quite often feel that the developers were gating progress behind using the new expansion game mechanics, rather than making the mechanics available and rewarding the player for exploring them.
Lol I didn't even give it any of my online comments, I had a random chat where I fed it a math puzzle to see what the blocker was (specifically this one)
You have 60 red and 40 blue socks in a drawer, and you keep drawing a sock uniformly at random until you have drawn all the socks of one color. What is the expected number of socks left in the drawer?
and then at the end of the 3 message exchange of hints and retries, asked it to guess my age, sex, location, education level, formative influences, and any other wild guesses it wanted to make... and it got all of them besides education level.
I was particularly impressed by
Formative influences:
- Theoretical computer science/mathematics education
- Engagement with rationalist/effective altruism communities
- Experience with AI research or development
And also it guessed my exact age to the year.
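(Sidebar for anyone nerd-sniped by the sock puzzle: a given red sock is left in the drawer iff it comes after all 40 blue socks in the draw order, which has probability 1/41, and vice versa, so the expected leftover count is 60/41 + 40/61 ≈ 2.12. Quick simulation to sanity-check:)
import random

def leftovers(n_red=60, n_blue=40):
    order = ['r'] * n_red + ['b'] * n_blue
    random.shuffle(order)
    last_red = max(i for i, c in enumerate(order) if c == 'r')
    last_blue = max(i for i, c in enumerate(order) if c == 'b')
    # We stop the moment the first color is exhausted:
    return len(order) - 1 - min(last_red, last_blue)

trials = 100_000
print(sum(leftovers() for _ in range(trials)) / trials)  # ~2.12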
This irks me because it reminds me of all those nutrition articles that praise one food's benefits, like how uniquely special quinoa is because it has magnesium, this, that, etc. When you could write the same exact article replacing "quinoa" with some other food, because there are tons of foods with identical or better nutrient profiles.
The good news is that LLMs exist now, and you can write those articles about other, non-trendy foods too! Just imagine, "6 Reasons Why Rutabagas Are An Underrated Superfood". Be the change you fear to see in the world.
Can you list out the specific things that you would do differently if you were worried vs if you were not? The answer to some of them ("have an emergency kit with at least a week's worth of food and water", "have the sort of PPE you probably should have anyway if you ever do home improvement projects", "get and use an air filter") is "yes". The answer to others (e.g. "get a homestead that is robust to the end of civilization", "spend a lot of mental energy on panic but don't do anything") is "no". And then there are ones in the middle, like "bet on increased volatility in the market", to which the answer is "maybe useful if you know what you're doing, but if you have to ask how to do it you're probably unsophisticated enough that playing the market is -EV".
Yeah, that's another good way to demonstrate why biologists defined the kinship coefficient as the probability that a pair of randomly sampled homologous alleles are identical by descent rather than identical by state.
If a Frenchman has a kid with a Chinese woman, he'll be genetically more closely-related to a random French kid on the street than to his own child
If a Frenchman has a daughter with a French woman from the same village as him, he'll also be genetically more closely-related to a random French boy on the street than to his own daughter, if you do the naive "sequence the genomes and count the differences" calculation.
Yeah, it's sadly plausible to me that "shut the program down in an orderly fashion" is a fabricated option.