faul_sname
Fuck around once, find out once. Do it again, now it's science.
Some frankly insane bastards persevere nonetheless, becoming one with the Dao of MTL, and self-reportedly no longer see the broken Mandarin Matrix but grok the underlying intent. Unfortunately, often at the cost of being unable to process normal English.
Betcha Claude can grok the underlying intent and create a less-borked translation too, and any damage to its sanity would be isolated to only that chat. Care to provide a sample?
A random walk in 1D and 2D space is recurrent, and the probability of returning to the origin approaches 1 as time goes to infinity.
An unbiased random walk (where each direction is equally likely) in 1D and 2D space is recurrent.
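For anyone who wants to see the recurrence numerically, here's a quick Monte Carlo sketch (hypothetical code, 1D case only): the fraction of unbiased walks that return to the origin within T steps keeps creeping toward 1 as T grows, which is Pólya's theorem in action.

```python
import numpy as np

# Fraction of unbiased 1D walks that return to the origin within T steps;
# Polya's theorem says this tends to 1 as T grows (true in 1D and 2D,
# false in 3D and up).
rng = np.random.default_rng(0)

def return_fraction(T, trials=10_000):
    steps = rng.choice([-1, 1], size=(trials, T))
    positions = steps.cumsum(axis=1)
    return (positions == 0).any(axis=1).mean()

for T in (10, 100, 1_000):
    print(T, return_fraction(T))
```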
Aside from server reliability, what other things do they need all these bigbrains for?
I think asking the question with the word "need" is likely to lead to confusion. Instead, note that as long as the marginal benefit of adding one more developer is larger than the amount it costs to do so, they will keep on hiring, and so the key is to look at what those marginal developers are doing.
Large organizations have non-obvious advantages of scale, and these can combine with the more familiar economies of scale that companies have to produce surprising results.
Let's say you have a company with a billion users and a revenue model with net revenue of $0.25 / user / year, and only 50 employees (like a spherical-cow version of WhatsApp in 2015). Let's further say that it costs $250,000 / year to hire someone.
The questions that you will be asking include:
- Can I increase the number of users on the platform?
- Can I increase the net revenue per user?
- Can I do creative stuff with cashflow?
- And, for all of these, you might consider hiring a person to do the thing.
At a billion users generating $0.25 / year each, and $250k / year to hire a person, that person would only have to do one of the following to pay for themselves:
- Increase the size of the userbase by 0.1%
- Increase retention by an amount with the same effect (e.g. if users typically use the platform for 3 years before dropping off, increase that to 3 years and 1 day)
- Or ever-so-slightly decrease CAC
- Increase expected annual net revenue per user by $0.00025
- If the amount you make is flat across all users, double annual net revenue per user for the specific subgroup "users in Los Angeles County", while not doing anything anywhere else
- If the amount you make per user is Pareto-distributed at 80/20, figure out whether there's anything you can build specifically for the hundred highest-revenue users that will cause them to spend 10% more money / generate 10% more revenue for the company. (If the distribution is more skewed than 80/20, you may end up with an entire team dedicated to each of your few highest-revenue customers - I would not be surprised if Google had a team dedicated specifically to ensuring that Mr Beast stays happy and profitable on YT.)
- Figure out how to get the revenue at the beginning of the week instead of the end of the week
- Increase the effectiveness of your existing employees by some tiny amount
Realistically you will instead try to do 100+x each of these with teams of 100+ people, and keep hiring as long as those teams keep wanting more people. But those are the sorts of places to look.
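A quick back-of-the-envelope check on those break-even numbers (same spherical-cow figures as above):

```python
# Break-even math for the spherical-cow numbers above.
users = 1_000_000_000
net_revenue_per_user = 0.25   # $ / user / year
cost_per_hire = 250_000       # $ / year

annual_net_revenue = users * net_revenue_per_user  # $250M / year
print(cost_per_hire / annual_net_revenue)  # 0.001 -> grow the userbase 0.1%
print(cost_per_hire / users)               # 0.00025 -> +$0.00025 / user / year
```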
"California's high-speed rail is a bad investment" is an evergreen topic on HN. It is probably one of these 120 articles but without an indication of when you saw it or what specific pushback was in the comments it's hard to say with more detail than that.
Red ammo in gun turrets, at the edge of infinite research, needs 25 turret-seconds to kill a big stomper.
Yeah maybe that's viable. I admit I just slapped down a 1.4GW reactor and a perimeter of tesla turrets because I didn't want to deal with iron or copper production on Gleba.
I also felt that the expansion railroaded players very hard towards the specific play styles the developers like, while the original game let the player choose between multiple viable options. For example:
- Dealing with enemies: Pre-expansion game, there were multiple viable approaches - a belt of ammo going to a bunch of turrets, a couple layers of laser turrets, a pipe to flamethrowers, or some mix thereof were all viable strategies with advantages and disadvantages. In the expansion on Gleba, though, the 80% laser / 50% physical resistance on the stompers makes the "laser turret / gun turret perimeter" approach a lot less viable. This is clearly intended to push players towards using rocket turrets in the places they're needed, but it feels like they encouraged rocket turrets by making the other options worse rather than by making rocket turrets better.
- Similarly, the asteroid resistances seem designed to force the player to route three types of ammo, and to force them to redesign their interplanetary ship multiple times (not just "provide new tools to make a better ship" but "block progress entirely until players explore the mechanics of the new ammo type").
- Gating cliff explosives behind Vulcanus likewise seems like an attempt to make city-block-from-the-start or main-bus-from-the-start approaches non-viable. Likewise Fulgora seems to be encouraging spaghetti by ruling out other approaches, rather than by making spaghetti approaches better, and likewise on Aquilo with the 5x energy drain for bots.
That said I did enjoy the expansion, even Gleba. There were lots of interesting mechanics, and those mechanics were pretty well designed (except maybe quality, but that one is optional anyway). But it did quite often feel that the developers were gating progress behind using the new expansion game mechanics, rather than making the mechanics available and rewarding the player for exploring them.
Lol, I didn't even give it any of my online comments. I had a random chat where I fed it a math puzzle to see what the blocker was (specifically this one; the answer is sanity-checked in the sketch at the end of this comment):
You have 60 red and 40 blue socks in a drawer, and you keep drawing a sock uniformly at random until you have drawn all the socks of one color. What is the expected number of socks left in the drawer?
and then at the end of the 3-message exchange of hints and retries, asked it to guess my age, sex, location, education level, formative influences, and any other wild guesses it wanted to make... and it got all of them besides education level.
I was particularly impressed by
Formative influences:
- Theoretical computer science/mathematics education
- Engagement with rationalist/effective altruism communities
- Experience with AI research or development
And also it guessed my exact age to the year.
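For anyone nerd-sniped by the sock puzzle: the leftover socks are exactly the ones sitting after the last sock of whichever color runs out first, which gives E = r/(b+1) + b/(r+1) = 60/41 + 40/61 ≈ 2.12. A quick Monte Carlo sketch (hypothetical code) to check:

```python
import numpy as np

# Monte Carlo for the sock puzzle. The leftovers are exactly the socks
# after the last sock of whichever color runs out first, so
# E = r/(b+1) + b/(r+1) = 60/41 + 40/61 ~= 2.12.
rng = np.random.default_rng(0)
r, b, trials = 60, 40, 100_000
socks = np.array([0] * r + [1] * b)  # 0 = red, 1 = blue

leftover = 0
for _ in range(trials):
    draw = rng.permutation(socks)
    last_red = np.flatnonzero(draw == 0)[-1]
    last_blue = np.flatnonzero(draw == 1)[-1]
    leftover += (r + b) - 1 - min(last_red, last_blue)
print(leftover / trials)  # ~2.12
```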
This irks me because it reminds me of all those nutrition articles that praise one food's benefits, like how uniquely special quinoa is because it has magnesium, this, that, etc., when you could write the exact same article replacing "quinoa" with some other food, because there are tons of foods with identical or better nutrient profiles.
The good news is that LLMs exist now, and you can write those articles about other, non-trendy foods too! Just imagine, "6 Reasons Why Rutabagas Are An Underrated Superfood". Be the change you fear to see in the world.
Can you list out the specific things that you would do differently if you were worried vs if you were not? The answers to some of them ("have an emergency kit, at least a week's worth of food and water", "have the sort of PPE you probably should have anyway if you ever do home improvement projects", "get and use an air filter") are "yes", the answers to others (e.g. "get a homestead that is robust to the end of civilization", "spend a lot of mental energy on panic but don't do anything") are "no", and then there are ones in the middle like "bet on increased volatility in the market", to which the answer is "maybe useful if you know what you're doing, but if you have to ask how to do it you're probably unsophisticated enough that playing the market is -EV".
Yeah, that's another good way to demonstrate why biologists defined the kinship coefficient as the probability that a pair of randomly sampled homologous alleles are identical by descent rather than identical by state.
If a Frenchman has a kid with a Chinese woman, he'll be genetically more closely-related to a random French kid on the street than to his own child
If a Frenchman has a daughter with a French woman from the same village as him, he'll also be genetically more closely-related to a random French boy on the street than to his own daughter, if you do the naive "sequence the genomes and count the differences" calculation.
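To make the identical-by-descent definition concrete, here's a toy sketch (hypothetical model, not real genomes): give every founder allele a unique label so descent can be tracked exactly, and the textbook parent-child kinship coefficient of 1/4 falls out of the simulation. An unrelated stranger, carrying fresh labels, would score 0 by descent even though their identity-by-state similarity would be very high.

```python
import numpy as np

# Toy model (hypothetical): give every founder allele a unique label so
# identity-by-descent is exact, then sample homologous allele pairs.
rng = np.random.default_rng(0)
L = 100_000  # loci; each person carries two allele copies per locus

dad = np.arange(0 * L, 2 * L).reshape(2, L)
mom = np.arange(2 * L, 4 * L).reshape(2, L)

# child inherits one randomly chosen copy per locus from each parent
cols = np.arange(L)
kid = np.stack([dad[rng.integers(2, size=L), cols],
                mom[rng.integers(2, size=L), cols]])

# kinship coefficient: P(two randomly sampled homologous alleles are IBD)
n = 500_000
loci = rng.integers(L, size=n)
a = dad[rng.integers(2, size=n), loci]
b = kid[rng.integers(2, size=n), loci]
print((a == b).mean())  # ~0.25, the parent-child kinship coefficient
```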
If you pick the most extreme companies by any two metrics, even highly correlated ones, they'll exhibit that kind of divergence, because the tails come apart (you'll also select for anomalies like data entry errors or fraud).
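A quick simulation of the tails coming apart (hypothetical numbers): even with two metrics correlated at 0.8, the company that tops one metric is rarely the company that tops the other.

```python
import numpy as np

# Two metrics correlated at rho = 0.8 across n companies: how often does
# the same company top both metrics?
rng = np.random.default_rng(0)
n, rho, trials = 10_000, 0.8, 1_000

hits = 0
for _ in range(trials):
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    hits += x.argmax() == y.argmax()
print(hits / trials)  # well under 50% despite the strong correlation
```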
It's not even clear that LLMs are analogous to cars here. When you call something a coder, I expect it to be able to do the job of a coder, rather than being a tool that helps improve performance.
The original tweet Jim referenced said
o3 is approximately equivalent to the #175 best human in competitive programming on CodeForces. The median IOI gold medalist has a rating of 2469; o3 has 2727.
Jim summarized this as
Apparently this AI is ranked as the 175th best coder on Earth.
Which is perhaps a little sloppy in terms of wording, but seems to me to be referring to coding as a task rather than a profession. I've never seen "coder" used as the word for the profession of people whose job requires them to write code, while I have seen that term used derogatorily to refer to people who can only code but struggle with the non-coding parts of the job like communicating with other people.
That said, if you're interpreting "coder" as a synonym for "software developer" and I'm interpreting it as meaning "someone who can solve leetcode puzzles", that's probably the whole disconnect right there.
From what you're saying they'd be more like a high-performance component that could improve a particular car, but won't be able to go anywhere on its own.
Yeah, that's a good analogy. Coding ability is a component of a functional software developer, an important one, but one that is not particularly useful in isolation.
Fine, if an LLM was actually the 174th best coder on Earth, and writing code is not the hard part of delivering value through code, we should be seeing LLMs being improved by people with next to no knowledge of programming, using LLMs to assist them.
Consider the following argument:
If my Lamborghini was actually the 175th-fastest-accelerating car on Earth, and accelerating to 60mph from a stop is not the slow part of my commute through gridlocked traffic, we should be seeing my commute become much faster because I have a fast-accelerating car.
This argument does not make sense, because accelerating from 0 to 60 is not a meaningful bottleneck on a commute through gridlocked traffic. Similarly, "being able to one-shot extremely tricky self-contained programming problems at 99.9th percentile speed" becoming cheap is not something that alleviates any major bottleneck the big AI labs face.
The basic algorithm underlying LLMs is very simple. Here's GPT-2 inference in 60 lines of not-very-complicated code. The equivalent non-optimized training code is similar in size and complexity. The open-source code that is run in production by inference and training providers is more complicated, but most of that complexity comes from performance improvements, or from compatibility requirements and the software and hardware limitations and quirks that come with those.
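To give a flavor of just how small the core algorithm is, here's a toy numpy sketch of one transformer block (my own simplification, not the linked code: ReLU stands in for GELU, biases are dropped, and there's a single attention head):

```python
import numpy as np

# Toy single transformer block in numpy (simplified sketch).

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv, Wo):
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    scores += np.triu(np.full((T, T), -1e9), k=1)  # each token sees only the past
    return softmax(scores) @ v @ Wo

def transformer_block(x, attn_params, W1, W2):
    x = x + causal_self_attention(layer_norm(x), *attn_params)  # attention + residual
    x = x + np.maximum(0, layer_norm(x) @ W1) @ W2              # MLP + residual
    return x

# usage with random weights
rng = np.random.default_rng(0)
T, d = 8, 16
attn = [rng.standard_normal((d, d)) * 0.1 for _ in range(4)]
W1, W2 = rng.standard_normal((d, 4 * d)) * 0.1, rng.standard_normal((4 * d, d)) * 0.1
print(transformer_block(rng.standard_normal((T, d)), attn, W1, W2).shape)  # (8, 16)
```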
The thing about performance on "solve this coding challenge" benchmarks is that coding challenges are tuned such that people have a low success rate at solving them, but most valuable work that people do with code is actually solving problems where the success rate is almost 100%. "Our AI has an 80% solve rate on problems which professionals can only solve 10% of the time" sounds great, but if the AI system only has a 98% solve rate on problems which professionals can solve 99.9% of the time, that will sharply limit the usefulness of that AI system. And that remains true even if the reason that the AI system only has a 98% solve rate is "people don't want to give it access to a credit card so it can set up a test cluster to validate its assumptions".
That limitation is unimportant in some contexts (e.g. writing automated tests, where if the test passes and covers the code you expect it to test, you're probably fine) and absolutely critical in other contexts (e.g. $X0,000,000 frontier model training runs).
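To put numbers on that 98%-vs-99.9% gap (hypothetical 50-step workflow): per-step reliability compounds, so the small-sounding difference becomes the difference between a tool that usually finishes end-to-end and one that usually fails somewhere.

```python
# Per-step success rate compounded over a hypothetical 50-step workflow.
for p in (0.999, 0.98):
    print(f"{p}: {p ** 50:.3f}")
# 0.999 -> 0.951: almost always finishes end to end
# 0.98  -> 0.364: usually fails somewhere along the way
```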
Also, alternative snarky answer
we should be seeing LLMs being improved by people with next to no knowledge of programming, using LLMs to assist them.
LLMs derive their ability to do stuff mostly from their training data, not their system architecture. And there are many, many cases of LLMs being used to generate or refine training data. Concretely, when OpenAI pops up their annoying little "which of these two ChatGPT responses is better" UI, the user answering that question is improving the LLM without needing any knowledge of programming.
I expect that "Stack Overflow" (i.e. a chat containing many SO users) could collectively place 175th in most programming competitions, and by that token be "the 175th best coder on earth, as measured by performance on competition-type problems".
Writing code is almost never the hard part of delivering value using code though.
Stack Overflow is better than most programmers at answering any particular programming question, and yet Stack Overflow cannot entirely replace development teams, because it cannot do things like "ask clarifying questions to stakeholders and expect that those questions will actually be answered". Similarly, an LLM does not expose the same interface as a human, and does not have the same affordances a human has.
... why do you think LLMs are not meaningfully increasing developer productivity at OpenAI? Lots of developers use Copilot. Copilot can use o1.
Maybe I am bad at giving it context.
You, me, and everyone else. Sarah Constantin has a good post, "The Great Data Integration Schlep", about the difficulty of getting all the relevant data together in a usable format in the context of manufacturing, but the issue is everywhere, not just manufacturing.
Obtaining the data is a hard human problem.
That is, people don’t want to give it to you.
[...]
Data cleaning doesn’t seem intellectually challenging, but it is surprisingly difficult to automate [...] Part of the issue is that the “reasonable” thing to do can depend on the “real-world” meaning of the data, which you need to consult a human expert on. For instance, are these two columns identical because they are literal duplicates of the same sensor output (and hence one can safely be deleted), or do they refer to two different sensors which happened to give the same readings in this run because the setting that would allow them to differ was switched off this time? The answer can’t be derived from the dataset, because the question pertains to the physical machine the data refers to; the ambiguity is inherently impossible to automate away using software alone.
There's a reason data scientists are paid the big bucks, and it sure isn't the difficulty of typing `import pandas as pd`.
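Sarah's duplicate-sensor example maps directly onto code: spotting identical columns is a one-liner, but deciding whether dropping one is safe isn't answerable from the data. A hypothetical sketch:

```python
import pandas as pd

# Hypothetical illustration: finding identical columns is trivial...
df = pd.DataFrame({
    "sensor_a": [0.1, 0.2, 0.3],
    "sensor_b": [0.1, 0.2, 0.3],  # same readings as sensor_a this run
})

dupes = [(a, b)
         for i, a in enumerate(df.columns)
         for b in df.columns[i + 1:]
         if df[a].equals(df[b])]
print(dupes)  # [('sensor_a', 'sensor_b')]

# ...but whether dropping one is safe depends on the physical machine:
# literal duplicate of one sensor's feed, or two sensors that merely
# agreed this run? That answer isn't in the DataFrame.
```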
Claude can give useful feedback on how to extend and debug vLLM, which is an LLM inference tool (and cheaper inference means cheaper training on generated outputs).
The existential question is not whether recursive self-improvement is possible (it is), it's what the shape of the curve is. If it takes an exponential increase in input resources to get a linear increase in capabilities, as has so far been the case, we're... not necessarily fine, misuse is still a thing, but not completely hosed in the way Yud's original foom model implies.
But when everyone can hire the "world's 175th best programmer" at once?
When everyone can hire the "world's 175th best-at-quickly-solving-puzzles-with-code" programmer at once. For quite significant cost. I think people would be better off spending that amount of money on Gemini + a long context window containing the entire code base + associated issue tracker issues + chat logs for most real-world programming tasks, because writing code to solve well-defined, well-isolated problems isn't the hard part of programming.
I find that frontier LLMs tend to be better than I am at writing code, and I am pretty good but not world class at writing code (e.g. generally in the first 1% but not the first 0.1% of people to solve each day of Advent of Code, back when I did that). What's missing tends to be context, and particularly the ability to obtain the necessary context to build the correct thing when that context isn't handed to the LLM on a silver platter.
Although a similar pattern also shows up pretty frequently in junior developers, and they often grow out of it, so...
and streamline the legal and narrative stuff, hopefully significantly
I for one would also be interested in your views on the legal and regulatory stuff. But then "here is what the regulations say, here is how they're interpreted, this is what the situation on the ground actually looks like, and here are the specific problem areas where the regulatory incentives result in stupid outcomes" is catnip to me.
If you want to get at the root of your embarrassment, try flipping the scenarios around.
A job that I find on my own merit would be infinitely preferable to one where I am referred.
An employee found by making a job posting and collecting external applications would be preferable to one found through the referrals of existing employees.
A date that I get by asking a girl out is infinitely more exciting than one set up by a friend.
A date that a girl goes on because some guy asked her out is more exciting than one set up by a friend who knows her preferences.
Do those flipped statements seem correct to you? If not, what's the salient difference between them?
I think your question breaks down to how many fast-growing gigantic multinational tech-ish companies get founded in Liberia in the next 15 years, because a 42% annualized growth rate (roughly 190x over 15 years) is not going to happen for things that depend on building infrastructure or anything else with high upfront capital requirements, but it is something that can happen and has happened with tech-ish companies. I'd expect at least 3 Google-scale tech companies to come out of a nation with 5 million super-geniuses in 15 years, so I'll go with a tentative yes.
If the rest of the world wasn't already rich and networked I think the answer switches to "no".
Wow you weren't kidding about that translation quality. And yeah probably any recent LLM can do it, but that 2M context limit is pretty sweet when you need it.