
Friday Fun Thread for February 17, 2023

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.


One theory I've seen bandied about is that the recording industry has a long and storied history of being trigger-happy with lawsuits over its intellectual property, which made AI devs more hesitant to train models on professionally published music. This is in contrast with Stable Diffusion, ChatGPT, and GitHub Copilot, which were all trained on publicly available but copyright-protected images/text/code as well as public domain works. I don't know how much of an impact this actually had, but I imagine it's at least in the back of devs' minds. That said, there's no shortage of public domain music out there, and I wouldn't expect an AI trained only on classical music to be particularly bad - just limited.

Data size may also be an issue. A typical 3-minute song, even compressed, is far larger than a typical image or text file. Sheet music could be used instead, though I'm not sure how easy or cheap it is to get huge databases of sheet music. Scraping the internet for songs is also likely more complicated than scraping for images and text: music tends to be distributed through streaming services that intentionally make it annoying to download the actual files, whereas images and text are trivial to right-click and save.
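To put rough numbers on that size gap, here's some back-of-the-envelope arithmetic. The song size follows directly from bitrate; the image and text figures are illustrative assumptions, not measurements:

```python
# Ballpark size comparison: compressed song vs. typical image vs. text.
# The image/text figures are rough assumptions for illustration.

def mp3_size_bytes(minutes, bitrate_kbps=128):
    """Approximate MP3 size: bitrate (bits/s) * duration (s) / 8 bits per byte."""
    return int(bitrate_kbps * 1000 / 8 * minutes * 60)

song = mp3_size_bytes(3)        # a 3-minute, 128 kbps MP3 -> ~2.9 MB
jpeg = 500 * 1024               # a typical web JPEG, ~500 KB (assumption)
text = 5 * 1024                 # a few pages of plain text, ~5 KB (assumption)

print(f"song ~{song // 1024} KB, image ~{jpeg // 1024} KB, text ~{text // 1024} KB")
```

So even heavily compressed audio is roughly an order of magnitude bigger per training example than a web image, and three orders bigger than a text snippet.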

A couple of months ago, some Stable Diffusion enthusiasts developed what they called Riffusion, which used the Stable Diffusion architecture to generate music. It involved fine-tuning a model on image representations of audio (spectrograms), having the model generate new spectrogram images, and converting those back to sound. They got surprisingly decent results, but the state of the tech at the time didn't seem usable for much more than a toy, due in part to how short each output was. There are obvious workarounds for such limitations, but I don't know how far development has come since then on actually bringing them to reality. Given that using Stable Diffusion to generate music is a hack, I'm not sure it'd be worth it for devs to keep pulling that thread, but it's still a really clever and fun application of the tech.
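The spectrogram round trip at the heart of that approach can be sketched without any diffusion model at all. A minimal sketch using only scipy, with arbitrary parameter choices (sample rate, FFT size, iteration count are my assumptions, not Riffusion's actual settings): the magnitude spectrogram is the "image", and since it discards phase, phase is re-estimated with a few Griffin-Lim iterations when converting back to audio.

```python
# Sketch of the audio -> spectrogram "image" -> audio round trip.
# The magnitude spectrogram drops phase, so we recover a plausible phase
# via Griffin-Lim iteration. Parameters here are arbitrary illustrative choices.
import numpy as np
from scipy.signal import stft, istft

fs = 8000                                    # sample rate in Hz (assumption)
t = np.arange(fs) / fs                       # one second of samples
audio = np.sin(2 * np.pi * 440 * t)          # a 440 Hz test tone

# Forward: this magnitude array is what a diffusion model would see/emit.
_, _, Z = stft(audio, fs=fs, nperseg=256)
magnitude = np.abs(Z)                        # phase information is lost here

# Griffin-Lim: alternate between imposing the target magnitude and
# re-deriving a self-consistent phase from the resynthesized signal.
rng = np.random.default_rng(0)
phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
for _ in range(32):
    _, rebuilt = istft(magnitude * phase, fs=fs, nperseg=256)
    _, _, Z2 = stft(rebuilt, fs=fs, nperseg=256)
    phase = np.exp(1j * np.angle(Z2))

_, rebuilt = istft(magnitude * phase, fs=fs, nperseg=256)
print("reconstructed", rebuilt.shape[0], "samples from the magnitude image")
```

The hack works because a spectrogram is dense enough to pin down the audio almost uniquely; the cost is that output length is capped by the image width, which is one reason each Riffusion clip was so short.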