Diffusion models work for text too. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10909201/
The blending of concepts that we see in Midjourney probably has less to do with diffusion per se than with CLIP, a building block in many text-to-image diffusion systems. CLIP trains a text encoder and an image encoder so that matching captions and images land close together in a shared embedding space. Moving concepts between different representations helps with concept generation. A lot of work is going into 'multimodal models' to make the integration between different modalities work better.
'Self play' is relevant for text generation. There is a substantial cottage industry in using LLMs to evaluate the output of LLMs and learn from the feedback. It can be easier to evaluate whether text 'is good' than it is to generate good text, so multiple attempts and variations can lead to feedback and improvement. Most self play to improve LLMs is done at the level of optimising prompts. However, the outputs improved by that method can be used as training examples, and so can be used to update the underlying weights.
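To make the 'generate several attempts, score them, keep the best' idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: generate_candidates and judge are hypothetical stand-ins for calls to a generator LLM and an evaluator LLM, and the scoring rule is a toy. It only shows the shape of the loop, not how any particular lab does it.

```python
import random
from typing import Callable, List, Tuple


def generate_candidates(prompt: str, n: int, rng: random.Random) -> List[str]:
    """Hypothetical stand-in for sampling n completions from an LLM."""
    fillers = ["good", "clear", "vague", "short", "detailed"]
    return [f"{prompt} -> {rng.choice(fillers)} answer #{i}" for i in range(n)]


def judge(candidate: str) -> float:
    """Hypothetical stand-in for an LLM evaluator (toy scoring rule)."""
    score = 0.0
    if "clear" in candidate:
        score += 1.0
    if "detailed" in candidate:
        score += 2.0
    if "vague" in candidate:
        score -= 1.0
    return score


def best_of_n(
    prompt: str,
    n: int,
    rng: random.Random,
    generate: Callable[[str, int, random.Random], List[str]] = generate_candidates,
    score: Callable[[str], float] = judge,
) -> Tuple[str, float]:
    """Sample n candidates, score each one, and return the top scorer."""
    candidates = generate(prompt, n, rng)
    scored = [(score(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def build_training_set(
    prompts: List[str], n: int = 8, seed: int = 0
) -> List[Tuple[str, str]]:
    """Collect (prompt, best completion) pairs that could later be used
    as fine-tuning examples to update the underlying weights."""
    rng = random.Random(seed)
    return [(p, best_of_n(p, n, rng)[0]) for p in prompts]
```

The pairs returned by build_training_set are the 'training examples' mentioned above. The evaluator only has to rank attempts, which is the easier job.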
https://topologychat.com is a commercial example of using LLMs in a way inspired by chess programming (Leela, Stockfish). It does a form of self play on the inputs it has been given, building up and prioritising different lines. It then uses these results to update weights in a mixture-of-experts model.
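For a feel of what 'building up and prioritising different lines' can look like, here is a generic best-first search sketch in Python. It is not a description of how that product works internally. The names toy_expand and toy_evaluate are hypothetical placeholders for 'propose continuations of a line' and 'score a line', which in a real system might both be LLM calls.

```python
import heapq
from typing import Callable, List, Tuple

Line = Tuple[str, ...]  # a line is a sequence of steps, like a chess variation


def best_first_lines(
    root: Line,
    expand: Callable[[Line], List[str]],
    evaluate: Callable[[Line], float],
    budget: int,
) -> List[Tuple[Line, float]]:
    """Repeatedly expand the highest-scoring line, up to `budget` expansions.

    Returns the expanded lines with their scores, in expansion order.
    """
    counter = 0  # tie-breaker so the heap never has to compare lines
    # heapq is a min-heap, so scores are stored negated.
    frontier = [(-evaluate(root), counter, root)]
    explored: List[Tuple[Line, float]] = []
    while frontier and len(explored) < budget:
        neg_score, _, line = heapq.heappop(frontier)
        explored.append((line, -neg_score))
        for step in expand(line):
            child = line + (step,)
            counter += 1
            heapq.heappush(frontier, (-evaluate(child), counter, child))
    return explored


def toy_expand(line: Line) -> List[str]:
    """Hypothetical expander: two continuations until the line has 3 steps."""
    return ["a", "b"] if len(line) < 3 else []


def toy_evaluate(line: Line) -> float:
    """Hypothetical evaluator: prefers lines containing more 'a' steps."""
    return float(line.count("a"))
```

With the toy functions, best_first_lines(("start",), toy_expand, toy_evaluate, budget=3) spends its whole budget going down the 'a' branch, which is the prioritising behaviour in miniature.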
Here's the quote from Geoffrey Hinton:
"why is a compost heap like an atom bomb? And GPT-4 says, well, the timescales and the energy scales are very different. That’s the first thing but the second thing is the idea of a chain reaction.
So in an atom bomb, the more neutrons around it, the more it produces, and in a compost heap, the hotter it gets, the faster it produces heat and GPT-4 understands that. My belief is when I first asked it that question, that wasn’t anywhere on the web. I searched, but it wasn’t anywhere on the web that I could find. It’s very good at seeing analogies because it has these features. What’s more, it knows thousands of times more than we do. So it’s gonna be able to see analogies between things in different fields that no one person had ever known before.
That may be this sort of 20 different phenomena in 20 different fields that all have something in common. GPT-4 will be able to see that and we won’t. It’s gonna be the same in medicine. If you have a family doctor who’s seen a hundred million patients, they’re gonna start noticing things that a normal family doctor won’t notice."
From the transcript of the video at https://medium.com/@jalilnkh/geoffrey-hinton-will-digital-intelligence-replace-biological-intelligence-fc23feb83cfb
they aren't good at synthesizing information from two fields in ways that haven't been done before
They are not good at that yet. But there are already indicators that they could become so.
- Midjourney will blend concepts from different places, and this goes beyond style transfer. A nice example I saw was an image of 'boy with a hedgehog'. The boy was holding a hedgehog, but his hair was also a bit spiky, like the hedgehog's. I think it is most unlikely that Midjourney had ever seen a composition/juxtaposition of that kind before, and both the hedgehog and the hair were modified to make the composition work.
- DeepMind have AlphaZero, which plays chess, shogi and Go. It plays better than any human, so it is not just reproducing play it has seen before, and one can argue it crosses between different genres rather than being confined to one field.
- The often-cited example of finding an analogy between a compost heap and an atom bomb is again an example of crossing field boundaries.
So the claim that machine learning can't synthesise information from two fields in ways that have not been done before needs more qualification to be defensible.
Thanks, yes, I made a mistake. My first post on theMotte.