So it turns out that the triple-parenthesis thing can get you banned from Reddit even for benign nonsense. Some context: in some open-source AI image generators, you can use parentheses to emphasize terms that you want the model to pay more attention to. In this case, the author wrote "(((detailed face)))" and some other terms in their image prompt.
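For anyone unfamiliar with the convention: in AUTOMATIC1111's stable-diffusion-webui, each layer of parentheses multiplies the enclosed terms' attention weight, conventionally by 1.1 per nesting level. A rough sketch of the idea (the parser and the 1.1 factor here are illustrative simplifications, not the webui's actual code):

```python
# Minimal sketch of parenthesis-based prompt emphasis, modeled on the
# stable-diffusion-webui convention where each enclosing "(" multiplies
# a term's attention weight by 1.1. Illustrative only.

def emphasis_weights(prompt: str, base: float = 1.1):
    """Return (text, weight) pairs for each segment of the prompt."""
    weights = []
    depth = 0
    current = []
    for ch in prompt:
        if ch == "(":
            if current:
                weights.append(("".join(current).strip(), base ** depth))
                current = []
            depth += 1
        elif ch == ")":
            if current:
                weights.append(("".join(current).strip(), base ** depth))
                current = []
            depth = max(0, depth - 1)
        else:
            current.append(ch)
    if current:
        weights.append(("".join(current).strip(), base ** depth))
    return [(t, w) for t, w in weights if t]

print(emphasis_weights("a portrait, (((detailed face))), studio lighting"))
# [('a portrait,', 1.0), ('detailed face', 1.331...), (', studio lighting', 1.0)]
```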
With a warning for (furry) NSFW, for those interested in the content of the removed comment (or who, like me, appreciate the generated image):
How's AI going for your corner of the internet? You've made a couple of SOTA posts that seemed pretty informative, and I think last I heard the generators seemed like a neat toy, but were pretty limited. Is that still pretty much the state?
I've not been able to keep up much in the last three weeks due to work, but it's kinda been a mixed bag. They're still very much in the toy realm, rather than a central utility or even a tool.
Some people have been fighting the worst limitations of the system. The original StableDiffusion release was limited to prompts of 75 tokens (kinda syllables) or less, partly due to the tokenizer; there's been some development toward better or more varied approaches, though none I believe are complete yet, along with complete workarounds like applying multiple prompts to a single image. Aesthetic gradients are a more meaningful way of embedding the meaning of styles than simply throwing 'greg rutkowski' at the end of everything. So on.
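The 75-token budget comes from CLIP's text encoder, which sees 77 positions with two reserved for start/end markers. A quick way to see how fast sub-word tokens eat that budget, using the Hugging Face tokenizer for the text encoder SD 1.x ships with (treat this as a sketch):

```python
# Quick check of the CLIP token budget behind the 75-token prompt cap:
# CLIP's text encoder sees 77 positions, two of which are reserved for
# the start/end markers.
from transformers import CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = "a portrait, detailed face, studio lighting"
ids = tok(prompt).input_ids
print(len(ids) - 2, "of 75 usable tokens")  # subtract start/end markers

# Tokens are sub-word pieces ("kinda syllables"), so unusual words eat
# the budget faster than a word count would suggest:
print(tok.tokenize("nargacuga"))  # several pieces, not one token
```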
There are still some pretty harsh limits to what you can do. There have been a lot of different efforts toward solving resolution limits, and they've not made a ton of progress. Complex, specific scenes remain extremely difficult to get even with heavy curation, especially when multiple characters with specific (and especially contradictory) traits are involved. Some of this is probably specific to StableDiffusion's implementation, given the better results ScottAlexander got from Google's closed-shop equivalent, but it's at least a hard step to get there.
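As a sketch of why the resolution limit bites and the usual workaround: SD 1.x was trained at 512x512, and asking for a much larger canvas tends to duplicate subjects, so people generate small and then re-diffuse an upscale through img2img. Assuming a recent Hugging Face diffusers release (the model id and strength value here are illustrative choices, not anything from the comment):

```python
# Sketch of the common "generate at native size, then img2img an
# upscale" workaround for the 512x512 training resolution.
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

model = "runwayml/stable-diffusion-v1-5"
txt2img = StableDiffusionPipeline.from_pretrained(model, torch_dtype=torch.float16).to("cuda")

prompt = "a portrait, detailed face, studio lighting"
base = txt2img(prompt, height=512, width=512).images[0]  # native resolution

# Naive upscale, then let img2img re-add detail at the larger size;
# a low strength keeps the composition instead of re-rolling it.
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(model, torch_dtype=torch.float16).to("cuda")
final = img2img(prompt, image=base.resize((1024, 1024)), strength=0.3).images[0]
final.save("portrait_1024.png")
```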
People have had surprising luck training specialized models or embedding vectors for some complex concepts and combinations, such as Charr (horned and four-eared cat-people with complex gender presentation, 400 images) or Nargacuga (a sort of bat-wyvern-panther thing from Monster Hunter, 90 images), albeit at the cost of very strong overtuning. I was worried about classes with small numbers of referents, and to some extent I think the Nargacuga example is probably under the lower bound -- it's not replicating its training data, but it presses toward a handful of common poses -- but it's a much lower requirement than I'd have ever expected, and there are a lot of interesting ramifications that might have outside this sphere. Other concepts, sometimes even trivial ones with tons of referents, can be surprisingly hard to consistently get working 'right'. Some of that is somewhat understandable; it's less clear why models do better with flareon and jolteon than with umbreon and vaporeon.
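For a sense of how small these specialized concepts can be in practice, here's a minimal sketch of loading a textual-inversion-style embedding into a diffusers pipeline. The embedding file and the <nargacuga> trigger token are hypothetical placeholders, and load_textual_inversion assumes a reasonably recent diffusers release:

```python
# Minimal sketch of loading a specialized concept embedding (textual
# inversion) of the sort described above. The embedding file and its
# trigger token are hypothetical placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A ~90-image concept distills into a small learned embedding that the
# prompt references through its trigger token.
pipe.load_textual_inversion("./nargacuga-embedding.pt", token="<nargacuga>")

image = pipe("a <nargacuga> perched on a cliff at dusk").images[0]
image.save("nargacuga.png")
```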
There's been some mild interaction with 'traditional' digital artists looking to use Diffusers as part of their work, but it seems to be much rarer than I'd expect. Most of the major sites have banned or are in the process of banning AI-generated art, for unreasonable and reasonable causes alike, so I'm not sure how much of this is fear of getting caught up in that, not having any experience with the stuff, or not finding the toolset intuitive enough or useful at all.
Upstream models have gotten better. It took a lot longer than originally claimed for the SD-1.5 checkpoints to reach the public, but they do seem to be improving areas of weakness like eyes, hands, and so on, and given how SD-1.5 was trained, this probably points to filtering out bad data and using more GPU power rather than some deep revelation. I don't think people are gonna retrain the finetuned models in too much of a rush given other spaces for improvement, but it suggests that at least some of the weaknesses in earlier StableDiffusion results aren't unavoidable limits of the model.
The upstream politics are... not looking great, and I've not missed that the few commercial successes (such as NovelAI) are using lower-than-CCBill-tier card processing. I'd be surprised if we ever hear what's going on behind closed doors here, but... that's part of the problem, and what we have seen publicly is discouraging. The Furry Diffusion discord has largely responded by enforcing a strong no-politics, no-controversy rule, but there are a lot of places where the whole ecosystem could get shoved into a sack and dropped in a river.