Friday Fun Thread for January 10, 2025

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), and it is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.


Question for the software engineers:

Is there anything uniquely innovative or difficult to reproduce about the software/codebase for any of the big social media platforms (particularly Twitter/Facebook/Instagram/Reddit/TikTok/Youtube) or is their hold on the market mostly a result of network effects and their established large user bases?

Edit: Having clarified my thoughts after early responses, I think the core of what I want to understand is this: I know that there are many very intelligent people being paid handsomely as software engineers for these sites. Given the apparent simplicity and minimal improvement in the basic functions (from a user perspective) of many of these sites, what is it that these engineers are actually being paid to work on? Aside from server reliability, what other things do they need all these bigbrains for?

Aside from server reliability, what other things do they need all these bigbrains for?

I think asking the question with the word "need" is likely to lead to confusion. Instead, note that as long as the marginal benefit of adding one more developer is larger than the amount it costs to do so, they will keep on hiring, and so the key is to look at what those marginal developers are doing.

Large user bases have non-obvious advantages of scale, and these can combine with the advantages of scale that companies themselves have to produce surprising results.

Let's say you have a company with a billion users and a revenue model with net revenue of $0.25 / user / year, and only 50 employees (like a spherical-cow version of WhatsApp in 2015). Let's further say that it costs $250,000 / year to hire someone.

The questions that you will be asking include

  • Can I increase the number of users on the platform?
  • Can I increase the net revenue per user?
  • Can I do creative stuff with cashflow?
  • And, for all of these, you might consider hiring a person to do the thing.

At a billion users at $0.25 / year each, and $250k / year to hire a person, that person would only have to do one of

  • Increase the size of the userbase by 0.1%
  • Increase retention by an amount with the same effect (e.g. if users typically use the platform for 3 years before dropping off, increase that to 3 years and 1 day)
  • Or ever-so-slightly decrease CAC (customer acquisition cost)
  • Increase expected annual net revenue per user by $0.00025
  • If the amount you make is flat across all users, double annual net revenue per user for the specific subgroup "users in Los Angeles County", while not doing anything anywhere else
  • If the amount you make per user is Pareto-distributed at 80/20, figure out if there's anything you can build specifically for the hundred highest-revenue users that will cause them to spend 10% more money / generate 10% more revenue for the company (if the distribution is more skewed than 80/20, you may end up with an entire team dedicated to each of your few highest-revenue customers - I would not be surprised if Google had a team dedicated specifically to ensuring that Mr Beast stays happy and profitable on YT).
  • Figure out how to get the revenue at the beginning of the week instead of the end of the week
  • Increase the effectiveness of your existing employees by some tiny amount

Realistically you will instead try to do each of these 100+ times over, with teams of 100+ people, and keep hiring as long as those teams keep wanting more people. But those are the sorts of places to look; a quick sanity check of the thresholds above is sketched below.
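For concreteness, here's a minimal sketch of that break-even arithmetic, using only the spherical-cow numbers from above (a billion users, $0.25/user/year, $250k/year per hire) - nothing here is a real company's financials:

```python
# Back-of-the-envelope check of the break-even thresholds above.
# All inputs are the hypothetical figures from the comment, not real data.

users = 1_000_000_000
net_revenue_per_user = 0.25          # dollars / user / year
cost_per_hire = 250_000              # dollars / year

annual_net_revenue = users * net_revenue_per_user
print(f"annual net revenue: ${annual_net_revenue:,.0f}")          # $250,000,000

# Share of revenue one marginal hire must generate to pay for themselves:
breakeven_fraction = cost_per_hire / annual_net_revenue
print(f"break-even share of revenue: {breakeven_fraction:.2%}")   # 0.10%

# Equivalent user-count increase at the same revenue per user:
extra_users_needed = cost_per_hire / net_revenue_per_user
print(f"extra users needed: {extra_users_needed:,.0f}")           # 1,000,000 = 0.1% of 1B

# Equivalent increase in net revenue per user across the whole base:
extra_revenue_per_user = cost_per_hire / users
print(f"extra revenue per user: ${extra_revenue_per_user:.5f}")   # $0.00025
```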

Making a UI clone of Twitter should not be hard. Same for Reddit, though moderation and customization functions may require some more work. Making a full clone - with whatever ads, analytics, system functions, metrics, etc. exist but aren't visible to the public - may be more complicated. Making it work reliably at the scale Twitter works at may be a serious project for a serious, qualified team, though it's definitely not impossible, it just needs investment. For Reddit I'd say the same, with more investment since there are more options, but bare-bones clones of both, especially if they don't need ads/analytics and billion-user scale, would probably not be too hard.

Facebook is a bit tougher due to a myriad of privacy settings and modes, which may require some non-trivial approaches to data retrieval, and then there's whatever filthy black magic underlies their feed algorithm... Plus it has streaming video, which is its own big can of worms - I've personally never worked with it, but I've heard it has a lot of dark magic in it too. Instagram/YouTube are also built around video, so the same applies there.

is their hold on the market mostly a result of network effects and their established large user bases?

On the market - definitely. Even if they had some super awesome technologies, it would likely be possible to reproduce the same results, maybe with slightly higher costs and slightly less awesome performance, and for social media, network effects beat technology any day of the week. Don't get me wrong - you need a lot of technology to run code at Facebook's scale, and a lot of it goes not only to the site itself but to supporting the organization that supports it and makes money from it - but code superiority has nothing to do with their success. In fact, I have seen very successful projects (not the ones you named, but also famous names) where the code and the technology behind it are very subpar, but as long as it works and brings in the sweet dollars...

Given the apparent simplicity and minimal improvement in the basic functions (from a user perspective) of many of these sites,

You can make a toy Twitter in a weekend. Taking it from a toy to a billion-user business with billions in revenue is the hard part.
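To make "toy Twitter in a weekend" concrete, here's a rough sketch of the easy part - the core data model and a naive timeline. All of it is illustrative (made-up names, in-memory storage, no auth, no ads, no moderation, no scale), which is exactly the point: none of the hard parts show up here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set

# A toy in-memory "Twitter": just posts, follows, and a timeline.

@dataclass
class Post:
    author: str
    text: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class ToyTwitter:
    def __init__(self) -> None:
        self.posts: List[Post] = []                 # global append-only log
        self.follows: Dict[str, Set[str]] = {}      # follower -> set of followees

    def post(self, author: str, text: str) -> Post:
        p = Post(author, text[:280])                # enforce the length limit
        self.posts.append(p)
        return p

    def follow(self, follower: str, followee: str) -> None:
        self.follows.setdefault(follower, set()).add(followee)

    def timeline(self, user: str, limit: int = 20) -> List[Post]:
        # Naive "pull" timeline: scan everything and filter. Fine for a toy,
        # hopeless at real scale, where feeds get precomputed / fanned out.
        following = self.follows.get(user, set())
        relevant = [p for p in self.posts if p.author in following]
        return sorted(relevant, key=lambda p: p.created_at, reverse=True)[:limit]

if __name__ == "__main__":
    t = ToyTwitter()
    t.follow("alice", "bob")
    t.post("bob", "hello world")
    print([p.text for p in t.timeline("alice")])    # ['hello world']
```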

Aside from server reliability, what other things do they need all these bigbrains for?

  • Maintenance - finding and fixing bugs (there are always bugs)
  • Performance improvements - in any big, old piece of software there's probably something slow, and tons of money can be saved by making it faster
  • New features - you may not see them, but somebody else does
  • Revenue - for all those sites that means mostly serving ads, counting ads, selling ads, analyzing data from ads and so on
  • Business analytics - not the same as the above; the ad buyers get the above, this one is for the business itself
  • Internal tools - any large project has build systems, docs systems, test systems, etc. and somebody has to work on those
  • Moderation tools - for pretty much every site that allows user comments, if you don't want the FBI to visit you, you need moderation tools
  • Catching up with new technologies - there's always a new browser, new network protocol, new API, new login method, new security feature, new OS, new mobile platform, etc. that needs to be supported

Probably a couple dozen other things I forgot to mention.

To put it in perspective, I was able to put together a decent Yelp clone within 8 hours. I had a webserver framework and Adderall, but that'll give you an idea of what a ~3x engineer can do. With AI that's probably gotten better.

"Describe how you would implement a Twitter clone" is a fairly standard and easy interview question a senior software engineer should be able to answer to a reasaonable amount of detail. (The same question about Ticketmaster is significantly harder which would surprise most people outside the industry.)

Ticketmaster is way harder of course due to heterogeneity of the underlying data.
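A hypothetical illustration of that point (all names and numbers here are made up): a feed is basically one uniform record type, while ticketing inventory is a different seat map, section layout, and price structure for every venue, and each physical seat can be sold exactly once.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Illustrative only: heterogeneous seat inventory vs. a uniform feed of posts.

@dataclass
class Seat:
    section: str
    row: str
    number: int
    price_cents: int
    status: str = "available"      # available | held | sold

class VenueInventory:
    def __init__(self, seats: Dict[str, Seat]) -> None:
        self.seats = seats         # keyed by an opaque seat id

    def hold(self, seat_id: str) -> Optional[Seat]:
        # In a real system this check-and-set must be atomic under heavy
        # contention (everyone wants the same seats the moment sales open).
        seat = self.seats.get(seat_id)
        if seat is None or seat.status != "available":
            return None
        seat.status = "held"
        return seat

# Two venues, two completely different layouts - no single schema like
# "a post has an author and 280 characters of text" covers both.
arena = VenueInventory({"A-1-1": Seat("Lower Bowl A", "1", 1, 15000)})
theater = VenueInventory({"ORCH-C-12": Seat("Orchestra", "C", 12, 22000)})
print(arena.hold("A-1-1"), arena.hold("A-1-1"))   # second hold fails -> None
```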

Network effects, mostly.

The last novel model was TikTok's: short vertical videos fed to you by an algorithm forever. Everyone replicated it as quickly as possible, but couldn't defeat TikTok.

Twitter is another good example: no amount of money spent by FAANGs helped them build a viable clone. Bluesky is thriving purely on network effects.

The engineers are paid to lower operating costs and improve engagement with the ads. Social networks are some of the biggest datasets in the world, and people expect them to work for free and 24x7: every dollar you spend on running them is coming out of your ad revenue. At this scale it makes sense to do things like develop your own compression algorithm for data and get the major browsers to support it to lower your traffic costs by 1%. Or to hire the author of the programming language your software is written in and to give him a team to improve its performance.
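To put a rough number on that, here's a trivial back-of-the-envelope; the bandwidth bill and per-engineer cost are made-up placeholders, purely to show the shape of the argument:

```python
# Why "lower traffic costs by 1%" can justify whole teams at this scale.
annual_bandwidth_bill = 2_000_000_000      # dollars/year, hypothetical placeholder
savings = 0.01 * annual_bandwidth_bill     # the 1% saving mentioned above
engineer_cost = 500_000                    # fully-loaded cost per engineer, rough guess
print(f"1% saving: ${savings:,.0f}/year ~= {savings/engineer_cost:.0f} engineers paid for")
```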

The hardest parts to replicate are probably the server reliability, because that takes lots of intricate work, and the AI-driven systems (mostly recommendation / advertising), because you need data.

This matters different amounts for different companies. But I would say that network effects are a far bigger hurdle; the above is just sauce.

So in comparing, say, this site to Reddit, there's probably some complex code for managing the orders of magnitude greater traffic that themotte just doesn't worry about? Or are you mainly referring to baseline server reliability?

@lagrangian covered some of it: the fault tolerance you need as your system scales up. At that scale, freak incidents happen every day. I still remember the chaos in my office when Google services dropped for a few hours.

Consider also the kind of bugs you start to get when you have users worldwide, all expecting to use their own language and writing system, and expecting UI and help to be available in that language.

Then moderation. If you're building an up-and-coming social media platform, sooner or later someone is going to livestream a beheading or use it to send plausible death threats, and you're going to be forced to deal with that.

Of course, most startups fail, so these are problems you want to have. But still problems.

Then moderation.

"A webform I can paste content into for others to see? Guess I'll programmatically post enormous amounts of child pornography into it."

Scott had a point about witches overrunning communities. He was right. The devil and his followers notice your website and have endless amounts of suffering children to show everyone.

So in comparing, say, this site to Reddit, there's probably some complex code for managing the orders of magnitude greater traffic that themotte just doesn't worry about?

Right. Zorba pays for the site out of pocket, but that is not scalable. The site occasionally goes down - we even lost most of a day of posts not too long ago. That's no big deal at our scale - just ssh in, figure out the bug, deploy something manually, etc.

But at e.g. Google scale, it's $500k/minute of gross revenue on the line in an outage, to say nothing of customers leaving you permanently over the headache. Fractions of a percent of optimization are worth big bucks. Compliance headaches are real. Hardware failures are a continual certainty.

Read about the brilliance behind Spanner, the database a huge amount of Google is built on: their own internet cables, because why choose between C[onsistency] and A[vailability] when you can just not have network P[artitions]?

You need an incredible degree of fault tolerance in large systems. If n pieces in a pipeline each have a probability p of working, the whole system works with probability p^n - exponential decay.
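A quick way to feel the size of that effect (assuming independent failures; the values of p and n below are arbitrary):

```python
# How quickly per-component reliability compounds away in a pipeline:
# n pieces that each work with probability p work together with probability p**n.

for p in (0.999, 0.9999):
    for n in (10, 100, 1000):
        print(f"p={p}, n={n}: system works {p**n:.1%} of the time")

# 0.999**1000 ~= 36.8%: even "three nines" components give a 1000-deep
# pipeline that fails roughly two-thirds of the time.
```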

Plenty of it is feature bloat, that said. You really can serve an astonishing amount of traffic from a single cheap server.

I don't have a good sense of scale - how much would you expect running this site to cost per month?

Off the top of my head, $50.

  • Fermi estimate (also reproduced as a short script after this list):
    Multiply the following:

    • 207 "report" ctrl+f matches = comments
    • 5 lines/comment
    • 20 words/line
    • 5 characters/word
    • 4 bytes/character
    • = 414 kB
  • Comparing to dev tools, which shows 5.3MB, a factor of 10 I can't account for.

    • I'm a backend dev...what's an order of magnitude between friends.
    • Using that figure and 24k thread views on the culture war thread so far this week:
      • = 127 GB/week
      • = 210 kilobytes/second
  • Let's assume we want to serve a peak traffic of 10x average and don't care enough to set something that automatically scales up and down:

    • = 2.1 megabytes/second
  • This is... jack shit.

    • A 3.5" floppy disk can do 100 kbps, filling the entire 1.44mb in 14.4 seconds.
    • I think it could probably be served off a Raspberry Pi.
    • If the vyvanse hadn't worn off, I could probably calculate how many threads/Ghz/etc are needed, but I'm pretty comfortable saying "one of any shitty processor can handle this load"
  • Google cloud charges for egress:

    • Checks notes:
      • $0 for up to 200GB/month, then $0.11/GB up to 2TB.
    • So (127 * 4 - 200) * 0.11 = $34.
    • I am not actually sure if serving traffic is "egress". Best guess: no.
    • (This is why startups shouldn't hire FAANG engineers.)
  • Worst case:

    • Something like $34 for egress and $20 for the VM itself.
    • Pretty close to my first number!
  • @ZorbaTHut, how'd I do? And would you be willing to share what % of costs you've had donated, vs paid out of pocket? You really shouldn't have to be paying yourself.

    • Edit: the patreon is at $140/month, so it looks like this site may be slightly profitable (ignoring the enormous value of the free labor). Nice!
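The same arithmetic as a short script, in case anyone wants to poke at the inputs. Everything in it is a guess from the list above (page size, weekly views, and the assumed GCP egress tier), not a measured number or a quoted price:

```python
# Reproduction of the Fermi estimate above.

page_bytes = 207 * 5 * 20 * 5 * 4          # comments * lines * words * chars * bytes
print(f"estimated page size: {page_bytes/1e3:.0f} kB")          # ~414 kB

page_mb = 5.3                              # what dev tools actually reported
views_per_week = 24_000
weekly_gb = page_mb * views_per_week / 1e3
print(f"traffic: {weekly_gb:.0f} GB/week")                      # ~127 GB/week

avg_bytes_per_s = weekly_gb * 1e9 / (7 * 24 * 3600)
print(f"average: {avg_bytes_per_s/1e3:.0f} kB/s, "
      f"10x peak: {avg_bytes_per_s*10/1e6:.1f} MB/s")           # ~210 kB/s, ~2.1 MB/s

monthly_gb = weekly_gb * 4
free_gb, rate = 200, 0.11                  # assumed GCP egress tier from the notes above
egress_cost = max(0, monthly_gb - free_gb) * rate
print(f"egress: ~${egress_cost:.0f}/month (+ maybe $20 for the VM)")  # ~$34
```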

That was all without ChatGPT, but here's a transcript from talking to it afterwards. I think it looks reasonable until maybe the end, when I ask about VPN costs. It still comes out to ~$50. It did a decent job analyzing the amount of CPU used (which I skipped in the "jack shit" section).

People upload images to the Motte - could that account for the difference between the 400 kB and 5 MB?

I think some variety of "I'm an idiot" is more likely. I don't see any images. If they're included as hyperlinks, they're not loaded until you click (I think, and ~confirmed by dev tools)

Attached a screenshot of the resource usage breakdown. Largest element is 402kb for the banner, compared to 215kb (fifth place) for what I think is the actual comments (compare to my 414kb estimate - not bad).

Some of the overestimate is from my extensions. Filtering those out, I see 2.0 MB (2.3 MB uncompressed). 1.15 MB of that is fonts (unclear to me why that needs reloading each time - presumably this could be optimized out).

/images/17366984362281015.webp

Probably something on the order of $20-40 a month. Depends on how fancy the setup is, and how much traffic we get.