Friday Fun Thread for September 30, 2022

Be advised: this thread is not for serious in-depth discussion of weighty topics, and it is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.

As far as I know, each of these datasets has been curated by one person, to their own tastes: Hasuwoof for Yiffy, DirtyApples for Furry, and Zach for Zach3d. e621 is tagged well enough on high-score posts that curation seems fairly automatable, and as long as you're not abusing the download process, it's hard to tell a normal user from an archiver, especially if you filter before download. The code itself is... not fun, since most of it is poorly documented Python, but it's nothing ridiculous.
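To make "filter before download" concrete, here's a rough sketch of the idea, not anyone's actual curation code. It assumes you already have post metadata in hand as plain dicts; the field names (`id`, `score`, `tags`, `url`), the threshold, and the `filter_posts` helper are all made up for illustration, and nothing here touches the network.

```python
from typing import Iterable, Iterator


def filter_posts(
    posts: Iterable[dict],
    min_score: int = 100,                 # hypothetical cutoff, tune to taste
    required_tags: frozenset = frozenset(),
    blocked_tags: frozenset = frozenset(),
) -> Iterator[dict]:
    """Yield only the posts worth downloading, judged on metadata alone."""
    for post in posts:
        tags = set(post["tags"])
        if post["score"] < min_score:
            continue  # low-score posts tend to be less reliably tagged
        if not required_tags <= tags:
            continue  # missing at least one tag we insist on
        if tags & blocked_tags:
            continue  # carries a tag we want to keep out of the dataset
        yield post


if __name__ == "__main__":
    # Hypothetical metadata records; a real run would load these from a dump.
    sample = [
        {"id": 1, "score": 250, "tags": ["canine", "solo"], "url": "a.png"},
        {"id": 2, "score": 40, "tags": ["canine", "solo"], "url": "b.png"},
        {"id": 3, "score": 300, "tags": ["canine", "comic"], "url": "c.png"},
    ]
    keep = filter_posts(
        sample,
        required_tags=frozenset({"canine"}),
        blocked_tags=frozenset({"comic"}),
    )
    for post in keep:
        print(post["id"], post["url"])
```

The point is just that the selection happens on metadata, so the only files you ever fetch are the ones that pass, which keeps the traffic modest.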

((There's a My Little Pony-specific one that's supposed to have been released recently, but I know less about that.))

There's been some discussion of setting up teams for the difficult heavy lifting (e.g., improving tagging, building and parsing datasets with more eyes-on curation), but the big issues for now are cost and technical accessibility. The core model is expensive because it took literally millions of steps on a large dataset, but further tuning is relatively cheap, with most epochs taking less than a day on a single (beefy) cloud GPU server. But getting the data together and onto that machine rapidly enough can be complex to do right, and it's easy to end up with a staggering AWS bill if it's done wrong.

That'll be less of an issue if newer GPU generations continue to bulk up on VRAM; if the tuning is done at home, it's mostly an energy (and/or cooling) bill thing. And that might be coming as soon as this winter for people willing to splurge on the higher-VRAM versions of the 4090.