
Friday Fun Thread for November 1, 2024

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.


I'm in my 40s and believe I'm finally hitting my stride as a young cranky old man. What did it?

Working at a company full of Python developers using Google Cloud.

OMFG I do not care about

  • Kubernetes
  • Terraform
  • cloud triggers
  • Celery jobs
  • Python in general
  • anything that ends with .yaml
  • Docker

It's not because I don't know these technologies and can't handle it. It's because they're stupid. They seem like they were some half-baked approach done by someone barely competent at the task they were given and bam they're now the industry standard and we all need to use it and everyone frowns at you like you're an idiot if you think people shouldn't be forced to huff that original barely competent developer's farts all day every day.

Well, fuck that and fuck you if you agree with them. We should not tolerate the simplest things taking 100ms (or 5 seconds) or taking 100MB (or gigabytes) or 10 approved PRs.

I'm going to knee-jerk write everything I possibly can in C++ from now on. I'm pushing straight to main prod. I don't care if it's not memory safe or difficult to reason about or not "best practice". I will use indomitable volition to solve the problem and when I do it'll be so much faster and I get to really dig in and be cranky and old and superior. Behold, this actually takes only 50 micros and uses 5MB of RAM and the Hetzner server costs 1/10th and the overall cost is 1/100th and this is right and good and just. While you're entering day three debugging some inscrutable GCP error I'm shipping.

I am elite and I know how computers work and this is how you do it. Sorry if you can't keep up, young whippersnapper :sunglasses: :muscle_arm: :smug_smirking_face:

Get. Off. My. Lawn.

Docker is a great achievement. Like Dropbox or Tailscale, it didn't invent a new thing, but it packaged existing technologies into a premade solution so simple and painless that it just works and becomes just a tool.

Yes, it did a lot of the defaults wrong for running in prod. But it made deploying stuff and undeploying it ten times easier. How do I compile Apache Comet so that it uses the same glibc as the server in production? Yes, you could spin up a VM for that and Vagrant makes spinning VMs up easier. But Docker makes it a snap. How do I deploy some enterprise software that requires an idiosyncratic combo of JVM and Python? Docker makes it a snap. How do I test out and cleanly uninstall afterwards some random server software? Oh, and it needs Postgres and Redis to run. Again, Docker (Compose) makes it practically a zero-friction task.
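A minimal sketch of that last "try it out, then cleanly uninstall it" workflow, using the Docker SDK for Python (docker-py) since this is a thread full of Python people; the image tags, container names and password are placeholders, and in practice a two-service docker-compose.yml does the same thing with even less ceremony:

```python
# Sketch of "spin up Postgres + Redis for some server software, then leave
# nothing behind." Requires `pip install docker` and a running Docker daemon.
# Image tags, names and the password are placeholders.
import docker

client = docker.from_env()

pg = client.containers.run(
    "postgres:16",
    name="scratch-postgres",
    environment={"POSTGRES_PASSWORD": "scratch"},
    ports={"5432/tcp": 5432},
    detach=True,
)
redis = client.containers.run(
    "redis:7",
    name="scratch-redis",
    ports={"6379/tcp": 6379},
    detach=True,
)

try:
    # ... point the software under test at localhost:5432 and localhost:6379 ...
    pass
finally:
    # "Cleanly uninstall": stop and remove the containers; nothing is left
    # behind except the pulled images, which client.images.remove can delete.
    for c in (pg, redis):
        c.stop()
        c.remove()
```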

yeah Docker's fine

dockerizing every single possible thing is not really fine. it opens the door to some super aggressively microservice-oriented architecture that adds much, much more overall bloat

from now on. I'm pushing straight to main

The last guy I knew who pushed straight to master was a CTO and he bricked his company’s product while it was in active use by really serious clients.

I get that you’re exaggerating for effect and I’ve definitely known some very irritating guru programmers who insist that your one-man proof-of-concept prototype be audited like it’s Intel assembly code, but stuff like Docker and CI and PR requirements are used because they make good things happen and bad things not-happen at a vastly superior rate.

Likewise we had a very good programmer who implemented a bespoke deployment solution based on ansible and a bunch of other stuff, and even before he left it had started to contort into a horrifying mess of spaghetti that was steadily losing features as they became unsupportable. We eventually ended up with a custom Debian for all_dependencies.deb because one of the tools he’d used had stopped supporting versions properly.

We eventually ended up with a custom Debian for all_dependencies.deb because one of the tools he’d used had stopped supporting versions properly.

based

While you're entering day three debugging some inscrutable GCP error I'm shipping.

But are you? My experience has been k8s makes shipping - and by that I don't mean compiling the code (or whatever people do to package python apps in your country) and throwing it over the fence for some other people to figure out how to run it, but actually creating a viable product consumable over long periods of time by the end user - way smoother than any solution before it. Sure, I can build a 50-component system from the base OS up and manage all the configs and dependencies manually. Once. When I need to do it many times over and maintain it - in parallel to debugging bugs and developing new code - I say fuck it, I need something that automates it. It's not even the fun part. Yes, it means I'll pay the price in pure speed. If I were in a hedge fund doing HFT, I wouldn't pay it. 99% of places I've seen it's prudent to pay it. My time and my mental health are more valuable than CPU time. Not always, but often.

In cranky neckbeard era UNIX-based distributed system environments I almost never hit a problem that I can't diagnose fairly quickly. The pieces are manageable and I can see into all of them, so long as the systems are organized in a fairly simple way. Like maybe once or twice in 20 years have I been genuinely stumped by something that took weeks of debugging to get to the bottom of (they were both networked filesystems).

With cloud-based garbage, being stumped or confused about unintended behavior is more the norm. Especially on GCP. I am frequently stuck trying to make sense of some obscure error, with limited visibility into the Google thing that's broken. The stack of crap is so deep it's very time consuming to get through it all and we often just give up and try to come up with some hacky workaround, or live with not being able to cache something the way we want, or weaken security in a way we don't want to. It's just ugly through and through and everyone has learned helplessness about it.

I almost never hit a problem that I can't diagnose fairly quickly

There can be only two reasons for that, based on my experience: either you are an extreme, generational quality genius, proper Einstein of bug triage, or you've just got lucky so far. In the former case, good for you, but again, that works only as long as the number of problems to diagnose is substantially less than one person can handle. Even if you take 1 minute to diagnose any problem, no matter how hard it is, there are still only 1440 minutes in a day, and I presume you have to also eat, sleep and go to the can. Consequently, this means a bigger system will have to fall into the hands of people who, unlike you, aren't Einsteins. And if the system is built in a way that requires an Einstein to handle it, the system is now under catastrophic risk. It could be that the system you're dealing with right now is not the kind of system where you ever foresee any problem that you couldn't handle in a minute. That's fine - in that case, keep doing what you're doing, it works for you, no reason to change. I am just pointing out that not all systems are like that, and I have worked many times with systems that would be completely impossible to handle in "lone genius" mode. They are, in fact, quite common.

There can be only two reasons for that, based on my experience: either you are an extreme, generational quality genius, proper Einstein of bug triage, or you've just got lucky so far.

I just know UNIX really well. It's not a freak accident. I used to go to bed reading UNIX programming manuals when I was a teenager. I know it at a fairly fundamental level. But it's also an open platform and there have been a lot of forks, so there's been some natural selection on what runs today (not that it's all awesome everywhere).

I can't say the same about cloud platforms at all. They're purposefully atomized to a much larger extent and you can't see into them, and there are no wisdom-of-the-ancients textbooks that take you through the source code. The API is all you have, and the documentation usually sucks. Sometimes the only way I can figure some of the APIs out is by searching GitHub for ~hours to see if someone else has done this before, if I'm lucky.

Consequently, this means a bigger system will have to fall into the hands of people who, unlike you, aren't Einsteins. And if the system is built in a way that requires an Einstein to handle it, the system is now under catastrophic risk. It

None of what I'm arguing for really requires being the lone genius, but I recognize trying to hire teams of people with this kind of knowledge is probably a risk.

Whatever not my problem crank crank crank

Certainly I’ve found that diagnosing problems in Azure-based CI is an absolute nightmare because you can’t just go in and fiddle with stuff. You have to reupload your pipeline, wait the obligatory 40 minutes for it to rebuild in a pristine docker, then hope that the print statements you added are enough to diagnose the problem, which they never are.

That said, it was still better than our previous non-cloud CI because it didn’t fail if you had more PRs than PCs or if you got shunted onto the one server that was slow and made all your perfectly functional tests time out. So I can’t condemn it wholeheartedly.

And not just for you the original coder either. When I’m brought in on a project, the first step really shouldn’t be ‘reinstall your OS so it’s compatible with the libraries we use’.

Yeah, that's another aspect. When you graduate from "one man band" to development team, and from development team to 20 teams, each doing their own thing and needing to coordinate and figure out how not to step on each other's toes, it turns out hyper-smart CPU-optimal solutions are very rarely the best ones. You need common languages and solutions that can be made reusable and portable. Otherwise the indomitable-volition solution needs to be thrown out and redone, because however good whoever wrote it is, he is not very scalable. There were times when lone heroes could single-handedly win battles by their sheer will and awesomeness, and it's very romantic. But modern battles are never won that way.

I will push back slightly against ‘never’. Comma.ai was pretty much a one-man self-driving solution and it was on par with the big boys for motorway driving. Likewise, Palmer Luckey invented modern VR pretty much single-handedly. But it’s rare and usually only happens within niches the mainstream hasn't noticed are viable.

OK, maybe never is going too far. I'm not saying a one-man band can't compete necessarily. In some cases, with the man being particularly awesome, it can happen in a particular place at a particular time. But scaling this to a company of hundreds of people would be absolutely impossible, because one person cannot communicate effectively with hundreds, it's just physically not possible. One person or a small number of persons cannot be the bottleneck. And super-clever solutions would necessarily make them the bottleneck. It's either a one-man band (maybe with a limited cast of janitorial staff) or a scalable corporation, but not both. And for some niches, being small is fine, but most businesses want to grow. And, very frequently, those who do grow eat up those who don't.

Agreed.

I'm reminded of a cranky user from /r/learnjavascript who was like "you don't need any library, not a single one, everything can be done in standard JS". From my understanding, this works as long as you're the only one maintaining your own code.

And you have a perfect memory.

as a young cranky old man this is a risk I'm willing to take

Everything you listed except Celery is how I got into tech and make six figures now, lol. I don't know how computers work** since I don't have a CS degree and don't do tech stuff for fun (anymore), but I agree that a lot of people use the tools you listed terribly (especially Terraform and k8s, wtf). But I'm curious what your objections are to the tools you listed. How would you do things differently? Usually when I run into someone who pooh-poohs those tools, they're the sort of person who wants to write their own epic genius 1337 codegolf in-house tool that has zero documentation, is full of idiosyncrasies, and will become someone else's pain in the ass when they leave the company in a year. And then it's a part of the workflow that I have to use/work with/work around/slowly start making plans to sneakily deprecate. I dunno, I'm in my mid 30s. Maybe in a few years I'll start to get crusty too.

**by this I mean I have only basic knowledge about DSA, time/space complexity, Linux internals, etc. compared to turbo nerds who spend every weekend contributing to OSS for fun

ETA: One thing that I think is lost on a lot of engineers is the value of legibility. Terraform might suck, but you can explain what it does to some dumb non-technical stakeholder or some mid/low-quality engineer. It has tons of docs, and there are lots of slick videos explaining it on YouTube. HCL sucks, and it reinvents a lot of basic programming concepts but worse (for_each), yet it's pretty easy to get started with.

There's also the "nobody ever got fired for buying IBM" factor. As a manager, part of my job is pushing for new/better tooling. If it's something mainstream and there are case studies or tons of threads about it or some Gartner bullshit or whatever, I can get budget approved more easily. What I pick is almost certainly not the optimal tool/software, but I have to get shit done and I can't let perfect be the enemy of good.

This also comes into play with public cloud (touched on by @ArjinFerman). I've never worked anywhere that has fully optimized cloud spending, there's always tons of waste. But after the corporate card is attached to the AWS account, I can provision servers/containers/clusters when I need to, and I only get yelled at about billing once a year as long as nothing ever gets out of hand. Is it wasteful, inefficient, and dumb? Yes, but that's just a reflection of the wasteful, inefficient, and dumb nature of the vast majority of human organizations. It's not a technical problem.

tl;dr a lot of the devops/infra people know these tools are dumb/inefficient but the alternatives are endless red tape or deadlock.

Usually when I run into someone who pooh-poohs those tools, they're the sort of person who wants to write their own epic genius 1337 codegolf in-house tool that has zero documentation, is full of idiosyncrasies, and will become someone else's pain in the ass when they leave the company in a year.

To use a toy example, discussing one aspect: let's say you have an app that needs to be up all of the time. A simple solution is to set up the app on one box and a standby on the next box. If it goes down, you simply respond and assess and confirm yes, the primary is down. Let's start the standby.

People absolutely cannot resist looking at this and saying well why do that when you can have the standby take over automatically. And yes I get it that's a reasonable desire. And yes, you can do that, but that model is so much more complicated and difficult to get right. There are frameworks now that help with this, but the cost of setting up an app now is still an order of magnitude higher if you want this kind of automation.
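To make "harder than it looks" concrete, here's roughly the naive automatic-failover watchdog everyone writes first; the hostnames and the start command are made up, and the comments point at the parts (false alarms, split brain) that the real frameworks exist to handle:

```python
# Naive automatic-failover watchdog: the first thing everyone writes, and the
# reason this model is harder than it looks. Hostnames and commands are made up.
import subprocess
import time
import urllib.request

PRIMARY_HEALTH_URL = "http://primary.internal:8080/health"                       # hypothetical
START_STANDBY_CMD = ["ssh", "standby.internal", "systemctl", "start", "myapp"]   # hypothetical

def primary_is_up(timeout=2):
    try:
        with urllib.request.urlopen(PRIMARY_HEALTH_URL, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

failures = 0
while True:
    failures = 0 if primary_is_up() else failures + 1
    # Problem 1: is the primary actually down, or is the path between this
    # watchdog and the primary flaky? In the manual model a human answers that;
    # here we just guess with a threshold.
    if failures >= 3:
        # Problem 2: if the primary is alive but unreachable, starting the
        # standby now gives you two writers (split brain). Quorum, fencing and
        # leases exist precisely because this step cannot be this naive.
        subprocess.run(START_STANDBY_CMD, check=True)
        break
    time.sleep(10)
```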

Unfortunately, the modern distributed computing environment is organized around the idea that everything needs to be as high availability as Google search or YouTube. This is the default way of standing up an app.

Maybe your business has one big bread-and-butter app that needs this, and by all means go ahead, but businesses also have like 100x as many apps that are just support or bean-counting tools that absolutely don't need this, yet you kind of get pulled into setting them up the same way. It becomes so difficult to set up an app that teams lose the vocabulary of even proposing that as a solution to small problems.

Definitely agree. One of the more challenging parts of my job is having to be the guy who says, "Okay, you want this app to be HA... but why? If you can justify this to me and tie it to some positive business outcome that merits the extra engineering hours spent, we can do this. Otherwise, no." I've only ever worked on understaffed teams and so I've always had to be extremely judicious when allocating engineering effort. Most ICs want to do this kind of thing because it's cool, or "best practice," or they see it as a career builder/educational opportunity. (FWIW, in 1:1s I ask what their career growth goals are and actively try to match them with work that will help them progress, so I'm not entirely unsympathetic to their wishes.)

It also just seems a lot easier than it really is. There's the whole Aphyr Jepsen series where he puts a bunch of different distributed databases that everyone knows are supposed to be good and correct to the test, and they fall apart miserably. Almost every single one. It's bad enough that people don't really understand the CAP theorem's tradeoffs, but the real-world systems are even worse because they can't even live up to what they claim to guarantee.

If you really think your application has outgrown the directory tree of .json files or the SQLite instance, show me how you plan to deal with failures and data consistency issues. It's not trivial and if you think it is I'm not going to let you drive.
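For contrast, the boring baseline being defended here is roughly this, a sketch using nothing but the standard library's sqlite3; the schema and values are made up:

```python
# The boring baseline: a single-file SQLite database with WAL mode and real
# transactions. Standard library only; the schema and values are illustrative.
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute("PRAGMA journal_mode=WAL")  # readers don't block the writer
conn.execute(
    "CREATE TABLE IF NOT EXISTS accounts (id INTEGER PRIMARY KEY, balance INTEGER)"
)

# Either both updates land or neither does. There are no CAP tradeoffs to
# reason about because there is exactly one node.
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = ?", (1,))
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = ?", (2,))
```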

or they see it as a career builder/educational opportunity

I feel like this is the unstated rationale for using every single cloud provider's API

A simple solution is to set up the app on one box and a standby on the next box. If it goes down, you simply respond and assess and confirm yes, the primary is down. Let's start the standby.

Then the standby goes down, or doesn't start. Your next move? You start debugging shit while people around you run with their hair on fire and scream bloody murder at you, the system is down over 2 kiloseconds and you still didn't fix it yet, are you actually sent from the future to ruin us all?

And note that this will definitely happen at 3am, when you are down with the flu, your internet provider is hit by a freak storm and your dog ate something and vomited over every carpet in your house. That's always how it happens. It never goes the optimistic way. And then you realize it'd be cool if you had some tools that can help you in these situations - even if it means paying a little extra.

the bulk of my experience is in quant trading where every minute we were down cost tens of thousands. we actually did engineer a lot of systems the way I described just because they were so much easier to reason about and took much less effort to stand up and maintain

They are easier to reason about up to a point. Which a typical HFT trading setup will probably never cross, but a lot of other companies frequently do.

yes, and if we reach that point we will introduce the complex multi-master thing

but most things never reached that point

People absolutely cannot resist looking at this and saying well why do that when you can have the standby take over automatically. And yes I get it that's a reasonable desire. And yes, you can do that, but that model is so much more complicated and difficult to get right.

Who's going to get paged awake at 3AM on Saturday to run a shell script to fail over to the standby? I presume there are some services out there where two or three days of downtime is fine, but I don't have any experience with them.

In contrast I find it's pretty easy to set up a service with a few replicas and a load balancer with health checking in front of it so that nobody needs to be manually selecting replicas. It's not complicated and with reasonable infrastructure it's a problem you only need to solve once and use it everywhere, in contrast to hand rolling another unreliable service that's going to become somebody's operational burden.
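The service side of that really is tiny. A sketch with just the standard library; the /healthz path and port are arbitrary, and the load balancer itself (cloud LB, nginx, k8s Service, whatever) is configured elsewhere:

```python
# Minimal health endpoint a load balancer can poll. Standard library only;
# the path and port are arbitrary choices for this sketch.
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

if __name__ == "__main__":
    # The LB polls /healthz and stops routing to replicas that fail the check;
    # nobody has to manually "select" a replica at 3AM.
    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```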

Put another way, being a pager monkey for one unreliable service is already dumb. Doing that for ten services is just ridiculous.

In contrast I find it's pretty easy to set up a service with a few replicas and a load balancer with health checking in front of it so that nobody needs to be manually selecting replicas.

yeah that part's easy. what about if you want to make the database they write to redundant? you have to worry about CAP issues and that makes things much more obnoxious

Yeah but you've presumably already had to solve that problem one way or another because you've (I assume?) already got a service that needs a highly available database. Surely configuring replication for MySQL isn't insurmountable for a leet haxx0r such as yourself.

no? not every system wants the same CAP tradeoffs. not everything benefits from the overhead of being a distributed system. it's not free to make something distributed.

example: github.com has real costs because it's planet scale and multi-master. it takes 250ms+ to process your push because it makes geographic redundancy guarantees that cannot physically be improved on. if you did away with the multi-master setup it would take significantly less time

you have "solved" the distributed system problem here but making every push take that much longer is a big cost. in this case, it just so happens github doesn't care about making every developer wait an extra 250ms per push

to say nothing about how you've also blown up complexity that needs to be reasoned through and maintained

(and yes, it doesn't have to be geographically redundant, I'm simply upping the scale to demonstrate tradeoffs)

I certainly don't call 250ms a big cost here. That is literally so small that I would never notice it.

Mmm, I notice it. If I'm working on solo projects I switch to a git repo on a personal server instead of github just to avoid it

no?

So you've got a system where you can't pay some latency for availability (I'll level with you, 250ms is an ass-pull on your part, even planet scale databases like Spanner that are way overkill for something like this can manage much better latencies, to say nothing of a simple MySQL master/replica situation), but it's totally fine if it goes down and stays down over a weekend?

If we're talking about a system where 24% uptime (aka business hours) is totally cool, yeah I guess you don't need to think about reliability, but again I've never seen a system like this so I don't know if they exist.

If we're talking about a system where uptime actually matters, it's totally unsustainable to page someone awake to do a manual fail over every time the primary shits the bed. That also comes with a cost, and it's probably bigger than a few ms of latency to make sure your database is available. Making stuff run all the time is literally what computers are for.

(I'll level with you, 250ms is an ass-pull on your part, even planet scale databases like Spanner that are way overkill for something like this can manage much better latencies

I can tell you for an absolute fact that plans to use Spanner to back geographically redundant multi-master git repos made latency even worse. But this is a digression.

(and yes, it doesn't have to be geographically redundant, I'm simply upping the scale to demonstrate tradeoffs)

I'm saying the magic distributed database introduces tradeoffs over the single SQLite file, and they vary by project; I used github.com as a mundane but easily accessible example.

tl;dr a lot of the devops/infra people know these tools are dumb/inefficient but the alternatives are endless red tape or deadlock.

Oh, that explains a lot. I'd off myself if I had to work for a MegaCorp, so most of my work was for small companies with little to no red tape.

Yeah, it's pretty grim. The only place I didn't have to deal with that kind of thing was at a place where the entire leadership consisted of former software engineers. Otherwise it's a constant battle.

and the Hetzner server costs 1/10th

This in particular has me regularly scratching my head as to how we got here. Surely, I must be missing something if the whole industry decided this is the way to go. But why is it that any time I run the numbers, cloud compute ends up feeling like highway robbery? "Noo, you don't understand, you can set up auto-scaling, so you only pay for what you're using at any given time!" Sir, at that price differential I can rent enough servers to cover peak demand several times over, and still have money to spare relative to your galaxy-brained auto-scaling solution. "Nooo, you have to use reserved instances, they're cheaper!" How is that not just renting a server? How are you still making it several times more expensive given the scale you're operating at?

Am I missing something, or did they play us for absolute fools?
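For concreteness, the back-of-the-envelope I keep running looks something like this; every number below is a hypothetical placeholder, not anyone's real pricing:

```python
# Back-of-the-envelope for "rent enough servers to cover peak, several times over."
# All prices and factors are hypothetical placeholders -- substitute your own quotes.
DEDICATED_PER_MONTH = 50.0     # hypothetical dedicated box (Hetzner-class)
CLOUD_EQUIV_PER_MONTH = 500.0  # hypothetical on-demand cloud equivalent
PEAK_SERVERS = 4               # boxes needed to cover peak demand
AVG_UTILIZATION = 0.4          # fraction of peak the autoscaler averages out to

always_on_dedicated = PEAK_SERVERS * DEDICATED_PER_MONTH                    # 4 * 50 = 200
autoscaled_cloud = PEAK_SERVERS * AVG_UTILIZATION * CLOUD_EQUIV_PER_MONTH   # 4 * 0.4 * 500 = 800

print(f"dedicated at peak, 24/7: ${always_on_dedicated:.0f}/mo")
print(f"autoscaled cloud:        ${autoscaled_cloud:.0f}/mo")
# Even paying for peak capacity around the clock, the dedicated option wins
# whenever the per-unit price gap exceeds the savings from scaling down off-peak.
```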

Sir, at that price differential I can rent enough servers to cover peak demand several times over, and still have money to spare relative to your galaxy-brained auto-scaling solution.

I like this mental model.

It comes up in health care too. For example, if I get surgery done at a free market place like the Surgery Center of Oklahoma, there's a good chance the TOTAL cost will be less than just my personal out of pocket cost using a standard hospital that accepts insurance.

I also use a wealth insurance based approach to health care and I am regularly surprised that a bit of negotiating and shopping around can bring the cash price of something down to less than it would've been with co-insurance.

At the end of the day it's usually easier for a random organization to spend money on a cloud bill that keeps getting bigger than it is to spend money on sys admins to set up a cheaper, more DIY solution. Hiring the sys admin takes expertise a lot of orgs don't have, and often the search takes time, and you're kind of at the mercy of sys admins who have ornery and persnickety temperaments (not unlike me!).

If you're a tech company yourself you often have the talent to DIY it, though you may or may not consider this the highest ROI investment.

It's not unlike commercial real estate. You can probably save money by buying an office instead of renting, but it's not like you just write a check and you're done. You need to now bring on a facilities maintenance crew, and concern yourself with additional financing and accounting issues, and maybe also directly work with contractors and the regulatory state. Is it natural for your org to pivot into commercial real estate? Or are your resources better invested in your primary business?

  • reversed stupidity is not intelligence
  • everyone only pushes directly to staging, but some people also have a separate prod environment