birb_cromble


				

				

				
0 followers   follows 0 users  
joined 2024 September 01 16:16:53 UTC

				

User ID: 3236

When I try to one-shot or purely vibe code, I get junk

That's fascinating. It's been my experience that the more I iterate, the worse it gets. It usually only works well when I one shot it.

Damn, I might make this for myself if he doesn't like it

Legacy concerns. The amount of custom code that has built up over the last 15 years is too big of a shift to deal with right now. It's on the backlog, but not anywhere near the top priority.

I have no idea why Claude Code is working so badly for you.

I'm not @ChickenOverlord, but I'm also seeing unimpressive results. Maybe we can get to the bottom of it.

I've tried Claude (via Claude Code), Gemini (via Gemini CLI), and GPT (via codex).

In all of them, I've used their equivalent of Claude.md/Agents.md to lay ground rules of how we expect the agent to behave. Multiple people have taken multiple shots at this.

We always use plan mode first.

Our documentation is markdown in the same repository, so that should be useful and accessible.

We're using Java, which is strongly typed, and all our endpoints carry additional OpenAPI annotations that should provide even more metadata.

We're using a pretty basic bitch tech stack, but it's not Spring Boot. All three models regularly fight us on that fact.

We have four levels of validation, each with its own entry point in the build scripts. These are described in a readme.md in the root of the project. The first is a linter. The second is unit tests and code coverage. The third is a single end to end test. The fourth is all end to end tests. We have instructed the models to use these validation targets to check their work.

Despite all this, we see common failure modes across all models we've tested.

  1. Bad assumptions about the tech stack. No, we do not use Spring Boot.
  2. A tendency to add more code, rather than fix code.
  3. An urge to "fix" "bad tests" that exist for very specific reasons. These specific reasons are usually covered with inline developer documentation as well.
  4. Confusion about what capabilities our version of Java has available. Yeah, the pattern matching preview was cool. Stop trying to turn it on with experimental feature flags.
  5. Writing tests that don't actually test the thing it's changing.

I'm sure there are more, but these immediately come to mind. There are four of us trying to make these things work, and we all keep running into the same problems again and again. It's not just me - even people with dramatically different writing styles and thought processes are seeing the same thing. I feel like I'm taking crazy pills, because a lot of people I know in real life are experiencing the same pain, but on the Internet it seems like I'm a huge outlier.

What's the disconnect here?

That last bit is the most interesting part to me.

Right now, my understanding is that VC is extremely hard to get because a handful of AI darlings have sucked all the air out of the room. If they IPO soon, VCs should theoretically have freed up capital to deploy as the OpenAIs/Anthropics of the world start to show a return.

If I believe the argument, then it should result in a much larger number of smaller investments, since labor is ostensibly the biggest cost of software startups and that cost should plummet.

Block is vibe coding now

It's interesting that you mention C# and null checks.

I also work in C# here and there, as well as a language that is a relatively verbose, garbage collected, class based, statically typed, single dispatch, object oriented language with single implementation inheritance and multiple interface inheritance. Like you, I'm seeing unimpressive results that do not justify the spend necessary for agentic coding.

Every time I've mentioned it here, I'm told the following:

  1. I'm using the wrong model. It does not matter what model I'm using - I'm using the wrong one. If it's not the absolute latest model as of three days ago, I'm speaking in bad faith because I'm using an outdated model (and I should ignore the fact that people were saying the same damned thing about the last version that they're now denigrating). If I am using the latest model, I should be using a different model from a different vendor. At this point I've tried Gemini 3.1 Pro/Thinking/Flash, Opus 4.5/4.6, and GPT 5.4. I'm running out of frontier models.
  2. Next, I'll be told I'm not using plan mode. I can read the manuals. I assure you that I am using plan mode. The fact that the agents frequently do not follow their own plan is apparently a moral failing on my part.
  3. Next, I'll be told I'm writing a bad spec and providing bad prompts. I'm an experienced developer. I'm a published author. I have an English minor from college. I worked as a technical writer for a while. If I can't write a solid prompt, I have to wonder who the ideal candidate is - especially when these things are supposedly so frighteningly powerful that the vendors claim to be half-afraid to release them.
  4. After that, I'll get barraged with vague claims about how the tech is so rapidly improving that my personal tribulations don't matter. Depending on the person, they'll either refer to radiology as a benchmark (ignoring the fact that the models return results even without a film) or say something about how the models are only improving, and inference is only getting cheaper.

Nobody seems to want to offer the sane take, which seems to be that there can be real efficiency gains for small, well-specified projects, provided you are already an expert in the domain and are willing to spend a considerable amount of time beating it into submission whenever it so much as coughs.

If you're working on a small (or perhaps exquisitely modularized) codebase, and it's chock full of documentation written in a way that the LLM can comfortably consume it without getting confused, and it's using only the happy path architecture and library set for your language, and it's in one of the "favored" languages (like Python), and you have a robust set of preexisting end to end tests that can help keep the LLM on the rails, then this technology is probably pretty great.

Outside of FAANG and a few startups, however, I'm not sure how often that's the case. Legacy code is real. Enterprise customers can have upgrade cycles that are measured in years. Backwards compatibility is worth more than features. Regulatory compliance issues might end up as a court summons instead of a JIRA ticket. That's not a world that does well with disposable code. Unless startups can outcompete every established player in every industry with those characteristics, I'm not sure how that changes. I can't rule out that such a future might happen, but given the moats around those industries, it'll be a tough row to hoe.

In our internal pilots, AI-generated PRs from frontier models make it through our test suite on the first try about 15% of the time. Another 30% never pass at all because they spiral out into schizophrenic fantasy lands, trying to call libraries that don't exist or attempting to rewrite a two million line codebase in "modern Python". Of the ones that do make it through, about three quarters of them end up failing code review, even as we update and refine our agent instructions. At this point, Dependabot has a better track record, and it doesn't even have Dario Amodei crying at night about how terrifyingly capable it is.

It pisses me off. The technology clearly has some uses, but fuck me if it doesn't feel like it's been wildly oversold. We still use it internally, but the mania is starting to die down. Management thinks it's the best thing ever because it can automatically spam LinkedIn for them. Development uses it as a more accessible StackOverflow. But we've given up on agentic coding for the time being. We'll probably look at it again in six months, assuming nothing bizarre happens between now and then.

That's one of the things that has caused the org I work for to re-think their internal AI push. The blast radius of bad developers is no longer limited by their own incompetence.

They might trust a Vibe-coded website, though.

The code coming back might be ugly, buggy, insecure, and probably completely impossible to scale.

But if it works, how much does the 'average' user care?

In my experience the average user starts to care right around the same time that their credit card number and mother's maiden name end up for sale to the highest bidder.

I think it's largely AI vendors cranking the screws.

On the other hand, Anthropic can't even manage two nines of uptime, so it may just be outright incompetence on their part. Being a PhD in machine learning does not make you an expert at SRE, and the models aren't quite there yet either.

A lot of people have brought it up, and he's not keen on the idea.

There are a lot of kinds of bears. We don't have to be picky. The experiment could probably bear a little variance.

(I appreciate your forbearance on the puns. They're what make my life bearable.)

My youngest brother is working through a Latin class, so I'm working through the Aeneid in the original Latin to help him out before his exams.

"Tech bro" is a tricky one, because it's coming from two directions at once.

On one hand, you have what is generally a younger, left wing contingent that dislikes the Alex Karps and Marc Andreessens of the world, and that splashes out to the industry at large. Whether I like it or not, I have to admit that there's a narcissistic, sociopathic, bloviating, stimulant-addled contingent of people in Silicon Valley who do a fantastic job of ruining the image of the entire industry.

On the other hand, you have the executive class. They loathe the jumped-up peasants who have the raw fucking audacity to earn a decent wage without getting a degree from an Ivy League school and getting a certificate that lets them work in a highly gatekept field. At this point, I think a lot of them would gladly detonate their own companies if they thought it would hurt the tech employees more (I'm looking at you, Jassy).

God that phrase chaps my ass.

I have coal miners in my family. There is no way that they would be able to learn to code with enough proficiency to find a job. Even if they could, they live in East god-damned Kentucky. After the return to office mania of the last 18 months, where the hell would they get a job doing it?

There have been a lot of green energy initiatives in my area. Every time, the initiative claims that it will create jobs and lower utility prices, and do so with minimal disruption to the local environment and community.

This is, inevitably, bullshit. The jobs never materialize. Our power bill goes up even faster than it had been rising before. Entire sections of forest get torn out and replaced with a few half-assed plantings of non-native softwoods. Even us dumb, cousin-fucking rednecks catch on to the game eventually.

Beyond that, I've noticed that "green energy" is for the peasants. For the things that actually matter to the neo-aristocracy right now, like data centers, we're burning fossil fuels like never before with local gas turbines.

At this point, I'm thoroughly fatigued by it. Whenever somebody hectors me about installing grid scale renewable energy, my default response is "you go first".

Thanks! This is exactly the kind of thing I'm looking for.

Let's do a natural experiment by dropping a quarter million bears into Baltimore and see what happens.

I'd normally ask this in the wellness thread, but it doesn't quite fit.

My father is going through chemotherapy, which has absolutely trashed his appetite, and he has some issues with swallowing due to previous radiation treatment. He's managing to keep his weight stable now by pounding shakes made from Ensure, ice cream, protein powder, and peanut butter, but he doesn't exactly enjoy it.

I recently made him bread pudding from scratch (with homemade bread) and he nearly made himself sick eating it. I think part of it is that his usual staples aren't quite appetizing enough to cut through the side effects of the treatment, and he might have better luck with something new.

Cooks of themotte, can you recommend any recipes that are tasty, easy to swallow, and have absolutely degenerate amounts of calories in each serving? The doctor has said that all the usual rules about healthy eating are out the window here: if it tastes and smells good enough to give him an appetite, and it fattens him up, it's a win.

If you're already doing fine and feel functional, what's the point?

I'm not the person you are responding to, but in my case it was incidental. I went seeking treatment for PTSD and they wanted to sort the tism out first.

My family did the best they could to beat it out of me and I guess it mostly worked.

For real though, I think all they can do is not waste money on things like new clothes, meat, restaurants, child care, medical expenses, any food that costs more than $3 a pound, getting their car fixed, having hot water, etc etc. They must learn how to make everything, fix everything, where to buy everything cheaply, and live a Buddha-like existence of self denial until they reach escape velocity, which they probably will never reach.

I know you mean this as snark, but also this, unironically.

To benchmark - I currently have enough in tax advantaged accounts that I can retire comfortably at 65 even with a Great Depression-scale market correction between now and then, and depending on how things go, I might be able to retire in as little as two years.

I was born into a poor family. I grew up so poor that I got in trouble at school more than once for not wearing shoes because my existing pair fell apart and we couldn't afford new ones. Even food wasn't a guarantee.

I managed to get from there to where I am now by pretty much doing what you described above. I worked 15-20 hours a week all through high school. I smoked the SATs, which got me accepted into several different schools. I chose to go to a fairly pedestrian state school rather than a prestigious engineering college because they offered a much larger aid package. I chose a major that wasn't my passion because I estimated that it would represent my best chance at not being poor. I learned how to cook my own meals and fix my own car.

When I graduated, I took a job at a relatively "safe" employer and I've been there for over 20 years. I kept my head down during the global financial crisis and did what I could to be at the bottom of the layoff pile. For the first three years I worked, I lived like a monk in a busted-up apartment that didn't even have hot water half the time. I used every spare penny I had to pay off my student debts.

Once I paid off my debts, I started putting 15% of my income into a 401(k) and maxing out a Roth IRA. It meant that I couldn't afford a nice car and I wasn't going to go on a vacation every year, but starting early meant that my gains had a chance to compound.

Even today I try to live simply. I try to keep my entertainment cheap. I don't take extravagant trips. I only have beef a handful of times a year because it's expensive. I live in a low cost of living area and have kept the same small, simple house for the last fifteen years.

I did what you suggested above, and it's working pretty well.

Is that code for "you don't act autistic"?

I ran into a bug once in a unique ID generation scheme because the system clock ran monotonically backwards for thirty seconds, then started marching forward again.

Writing a mitigation was actually surprisingly fun. It's not often you have to explain to a reviewer why the "didTimeTravelHappen" variable is in there.
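For anyone curious what that kind of mitigation can look like, here's a rough sketch. To be clear, this is not the actual fix: the ID layout (timestamp in the high bits, a 16-bit sequence counter in the low bits), the class name `MonotonicIdGenerator`, and the injectable clock are all assumptions made up for illustration. The only thing borrowed from the story is the `didTimeTravelHappen` flag. The idea is that when the clock reads earlier than the last timestamp we issued, we pin to the last timestamp and lean on the sequence counter until the clock catches up.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of an ID generator that tolerates a system clock
 * that runs backwards. Assumed layout: (timestamp << 16) | sequence.
 */
final class MonotonicIdGenerator {
    private static final int SEQUENCE_BITS = 16;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    // Wall-clock source in milliseconds; injectable so tests can rewind it.
    private final LongSupplier clock;
    private long lastTimestamp = Long.MIN_VALUE;
    private long sequence = 0;

    MonotonicIdGenerator(LongSupplier clock) {
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();

        // The clock reported a time earlier than one we already used.
        boolean didTimeTravelHappen = now < lastTimestamp;
        if (didTimeTravelHappen) {
            // Refuse to go backwards: keep issuing against the last timestamp.
            now = lastTimestamp;
        }

        if (now == lastTimestamp) {
            sequence++;
            if (sequence > MAX_SEQUENCE) {
                // Sequence space exhausted: borrow the next millisecond.
                now = lastTimestamp + 1;
                sequence = 0;
            }
        } else {
            // The clock moved forward, so the sequence can start over.
            sequence = 0;
        }

        lastTimestamp = now;
        return (now << SEQUENCE_BITS) | sequence;
    }
}
```

In production you'd pass `System::currentTimeMillis` as the clock. The tradeoff in this sketch is that the timestamp embedded in an ID can run ahead of the real clock for a while after a regression, which is usually an acceptable price for IDs that never repeat or go backwards.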