This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.
How beefy? When I looked into running AlphaFold or some of the other structural bio models a year or two ago, we were talking like $20k. Is it easier to run inference on the image generation models, or was I just stupid?
Both local FLUX models require 24 GB of VRAM uncompressed. You can buy a used 3090 with that much for $750 or a new one for $1,200. Or just rent time from an online GPU company like RunPod. And that's before you start getting into the quantized models; FLUX is really affordable.
Honestly I’m amazed that the 4070, being a 3090 with half the RAM, sold as well as it has. Though I stop being amazed once I talk to people that by all rights should know better.
It’s worth noting that even on a 3090, it still runs in low-RAM mode, since the text encoder has to fit in there too. You can get it running better if you care to optimize it (or use the 3090 as a secondary GPU), or you can just use the 8-bit quantization. It takes around 40 seconds per image at 1024px square.
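A rough way to see why the encoder matters is to add up weight sizes. This is a back-of-envelope sketch, not a measurement: it assumes roughly 12B parameters for the FLUX transformer (which is where the 24GB figure comes from at 16-bit) and roughly 4.7B for the T5-XXL text encoder, and it ignores activations, the VAE and the CLIP encoder, so real usage runs somewhat higher.

```python
# Rough VRAM budget for FLUX.1 [dev] weights at different precisions.
# Assumptions (approximate, not measured): ~12B parameters in the FLUX
# transformer, ~4.7B in the T5-XXL text encoder. Activations, the VAE and
# the CLIP encoder are ignored, so these are lower bounds.

GIB = 1024 ** 3

BYTES_PER_PARAM = {
    "bf16": 2.0,
    "fp8": 1.0,
    "nf4": 0.5,  # ignores the small per-block scale overhead
}

def weights_gib(params: float, precision: str) -> float:
    """Size of a set of weights in GiB at the given precision."""
    return params * BYTES_PER_PARAM[precision] / GIB

def fits(vram_gib: float, *components: float) -> bool:
    """True if all components can sit in VRAM at the same time."""
    return sum(components) <= vram_gib

FLUX_PARAMS = 12e9  # assumption: approximate transformer size
T5_PARAMS = 4.7e9   # assumption: approximate T5-XXL encoder size

for prec in ("bf16", "fp8", "nf4"):
    transformer = weights_gib(FLUX_PARAMS, prec)
    encoder = weights_gib(T5_PARAMS, prec)
    print(f"{prec}: transformer {transformer:.1f} GiB + encoder {encoder:.1f} GiB, "
          f"fits in 24 GiB together: {fits(24, transformer, encoder)}")
```

At 16-bit the transformer alone squeezes under 24 GiB, but not together with the encoder, which is why something has to be offloaded or quantized.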
You're still looking at ~850 USD for a new 3090 today, compared to around 600 USD for an nVidia 4070, plus the increased size and power/heat. Not a great deal if you need the VRAM, but it's not clear you do need it yet. I'd err in favor of futureproofing, but if you've got One Game You're Gonna Play seriously for the next couple years, I could see the argument.
Though the various suffixes are more clearly dumb. The 4070 Ti Super Omega Super Saiyan Blue is at the same price point, nearly the same size, and has a similar power profile, so it looks like you're trading slightly better DLSS for less VRAM?

I do beg to differ on the VRAM, specifically because other GPUs in the same price class all carry 24GB (and any next-gen console is going to have unified memory or at least 16GB of VRAM, I suspect), and after that it’s just a matter of wanting it now rather than after optimization. And while I do get that you’re still not fitting a 70B LLM in 24GB, the extra VRAM still reduces how much you need to swap with main memory, or at least that’s my impression.
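For a sense of scale on the 70B point, here's the weight storage alone at common precisions. It's a lower bound (it ignores the KV cache and activations) and treats the card as a flat 24 GB, so take the "left over" figures as rough.

```python
def weights_gb(params: float, bits_per_param: float) -> float:
    """Decimal gigabytes needed just to store the weights."""
    return params * bits_per_param / 8 / 1e9

PARAMS = 70e9   # a 70B-parameter model
CARD_GB = 24.0  # treating the card as a flat 24 GB

for label, bits in (("16-bit", 16), ("8-bit", 8), ("4-bit", 4)):
    size = weights_gb(PARAMS, bits)
    spill = max(0.0, size - CARD_GB)  # what has to live in system RAM instead
    print(f"{label}: {size:.0f} GB of weights, about {spill:.0f} GB left over for system RAM")
```

Even at 4-bit the weights overflow the card, but a bigger card shrinks the portion that has to be swapped.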
It’s also worth noting that a 3090 is no less capable in AI tasks than a 4090, 5090, or 6090 will be, simply because a card with more VRAM than 24GB won’t be released for a long, long time due to market segmentation.
The market seems to have caught onto that fact, unfortunately.
What does this mean?
They want to be able to charge enterprises and server operators massively more than ordinary consumers, even for very similar silicon, because they can afford it, and the product is more valuable to enterprises doing work on GPUs than to consumers playing Hogwarts Legacy or dicking around with open source AI models.
The goal is to charge each customer as close to their maximum price as possible. That's much higher for enterprises, so the graphics companies load features that are especially valuable to them onto specialty cards that cost much, much more than consumer cards without them. Much of this strategy has collapsed lately, with features like GPU VM passthrough coming to consumer cards, so they've reoriented toward promoting multi-card clusters that reach maximum VRAM for enterprises, which consumer platforms cannot accommodate.
CPU manufacturers do it too, which is why it's very hard to use ECC RAM on a computer unless you shell out for a workstation/server grade platform.
Somewhere in here is an idea: prompt AAA game studios to develop games that require huge amounts of VRAM, so that GPU manufacturers are elbowed into offering consumer cards with that much. But that will take time, and for all I know it would look like some sort of time-persistent AI shading model ("game rendered in the style of Van Gogh").
It has to be in a console first; developers don't lead PC-first, console-second, because of the sales figures (and nVidia still has some levers to pull, since they can always say "you can buy our chips under cost if you agree not to put more than X GB of unified RAM on your console" and defend their margins that way). And texture memory ultimately has diminishing returns past 1080p; even the biggest texture packs for modern games don't use half of the 24GB the largest cards have.
It is in nVidia’s interest that cheap GPUs don’t get very much VRAM, because the limiting factor for how smart and performant an LLM/image generation model can be is how fast it can access the matrices. If you could get 80GB of VRAM, which is eminently reasonable, on a card for 2,000 dollars, no business would buy the overpriced purpose-built cards.
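The "how fast it can access the matrices" point can be made concrete with a common rule of thumb for LLMs: when generating one token at a time, every weight is read once per token, so memory bandwidth divided by model size caps tokens per second. The bandwidth figures below are assumptions (about 936 GB/s for a 3090's VRAM, 51.2 GB/s theoretical peak for dual-channel DDR4-3200 system RAM), and the 16 GB model is hypothetical.

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on single-stream decode speed if every weight is read once per token."""
    return bandwidth_gb_s / model_gb

# Assumptions: approximate/theoretical peak bandwidths, not benchmarks.
BANDWIDTHS_GB_S = {
    "RTX 3090 VRAM": 936.0,
    "dual-channel DDR4-3200": 51.2,  # 3200 MT/s * 8 bytes * 2 channels
}
MODEL_GB = 16.0  # hypothetical: an 8B-parameter model stored at 16-bit

for name, bw in BANDWIDTHS_GB_S.items():
    print(f"{name}: at most {max_tokens_per_sec(bw, MODEL_GB):.1f} tokens/s")
```

This ignores compute limits, batching and the KV cache, so real numbers come in lower, but the order-of-magnitude gap between VRAM and system RAM is the point.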
Most image generators are a good deal more manageable: FLUX.1 dev is 24GB, but people have gotten it running on mobile GPUs with less than 4GB of VRAM. It's slower -- a couple of minutes per generation, as opposed to <20 sec on an Nvidia 3060 or equivalent -- but it's usable.
A good part of that is just that image generation models quantize better than LLMs without becoming as 'dumb', so you can go down to fp8 (8-bit) with relatively little loss of information, and nf4 is good enough for a lot of uses even if the results are noticeably different. But in the casual sphere, imagegen has also had a lot more software work done on partial staging and some CPU offloading.
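To illustrate why 8-bit storage loses little information, here's a minimal sketch of symmetric "absmax" quantization with one shared scale per tensor. Note this is plain integer quantization swapped in for illustration; fp8 and nf4 use different number grids (and real schemes usually use per-block scales), but the round-trip idea is the same. The weights are synthetic.

```python
import random

def quantize_absmax(weights, bits=8):
    """Map floats to signed ints in [-levels, levels] with one shared scale."""
    levels = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / levels or 1.0  # avoid /0 on all-zero input
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Map the integers back to approximate floats."""
    return [v * scale for v in q]

def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

random.seed(0)
# Synthetic stand-in for a weight tensor: small values centred on zero.
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

for bits in (8, 4):
    q, scale = quantize_absmax(weights, bits)
    err = max_abs_error(weights, dequantize(q, scale))
    print(f"{bits}-bit: max error {err:.2e}, bound is half a step = {scale / 2:.2e}")
```

Each weight moves by at most half a quantization step, and that step is about 18x coarser at 4-bit than at 8-bit, which is roughly why 8-bit is near-lossless while 4-bit formats are "good enough but different".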
I don't know about Grok's image gen specifically, but having used Stable Diffusion for almost 2 years now, I can tell you that the cutting-edge image generation stuff can be run reasonably fast (about 30-60s per batch of four 512x512 images) on a 5-year-old gaming PC with an Nvidia 1070 that I was already using as my home computer, without any upgrades. I did upgrade to a more modern gaming PC with a 4090 last year, which can do a batch of four 512x512 images in a few seconds. The entire new PC I bought, primarily for gaming, was around $4,500, with the largest chunk of that going to the GPU, which you could probably skimp on with a 4080 or a 3090 and still get plenty good performance.
I’ve been using Stable Diffusion on a 5-year-old second-hand laptop where the GPU was basically a "well, might as well get it since the extra cost is just €50" type of thing. Combine that with preconfigured, uncensored cloud rental services, and unrestricted image generation is ridiculously affordable if you care at all.