
Friday Fun Thread for February 7, 2025

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), and it is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.


The first song I consider to be really bad.

The second has a nice bass line that I would be happy to steal wholesale. Much more impressive.

The third one didn't seem any good, either. Sure, it's got a simple beat and chord changes, but that's much less impressive than a melody or a funky bass line.

Can it output sheet music? Can it output notation? Part of the problem is that the synthesizer isn't that great. Ideally it would output notation in something like Frescobaldi.

Can it write a melody for me? Can I give it a melody and have it write counterpoint? While it's interesting to give it words and get out sounds, I'm more interested in including music (notation) and getting music (notation) back.
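For what it's worth, the target format here is just text: Frescobaldi is an editor for LilyPond, and a LilyPond score is an ordinary text file, so a model that can emit text could in principle emit a score directly. A made-up example of what that output would look like, written from Python (the notes are arbitrary, not from any model):

```python
# Illustrative only: LilyPond is the plain-text notation format that
# Frescobaldi edits. A notation-native model would read and write
# strings like this instead of raw audio. The melody is made up.
score = r"""
\version "2.24.0"
\score {
  <<
    \new Staff { \clef treble c'4 d'4 e'4 g'4 | a'2 g'2 | c''1 }
    \new Staff { \clef bass   c2 g,2 | f,2 g,2 | c1 }
  >>
  \layout { }
}
"""

with open("sketch.ly", "w") as f:
    f.write(score)
# Engrave with: lilypond sketch.ly  (produces sketch.pdf)
```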

Can it output sheet music? Can it output notation? Part of the problem is that the synthesizer isn't that great. Ideally it would output notation in something like Frescobaldi.

No, and it fundamentally can't right now. Those models are trained on raw music, not on notation. During generation, the model isn't "composing" the way a human would, in the same sense that an image model isn't actually sketching, drawing and painting - the final image is condensed directly out of the diffusion process.
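To make that analogy concrete, here's a toy schematic of diffusion sampling; predict_noise stands in for a trained network and the constants are arbitrary. The point is that the loop operates on a raw array from start to finish:

```python
import numpy as np

# Toy schematic of diffusion sampling. predict_noise is a made-up
# stand-in for a trained denoising network, not a real model.
def predict_noise(x, t):
    return 0.1 * x  # placeholder

x = np.random.randn(64, 64)        # start from pure noise: raw "pixels"
for t in range(1000, 0, -1):       # repeatedly subtract predicted noise
    x = x - 0.001 * predict_noise(x, t)

# x is now the finished output, still just a raw array. The loop never
# passes through strokes, shapes, or notes, which is why notation can't
# simply be read off the generator.
```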

But this is clearly the next step in the value chain. Once audio creation models can input and output notation, they will completely change the creative process - in the same way that video models will become valuable once they can input and output an entire 3D scene into/from Blender. But this step is difficult: there are orders of magnitude less training data in all those cases (you need paired sets of music + notation, video + 3D models, etc.).

Music is, of course, simpler than 3D in this respect. You can run AI audio creation through the usual transcription aids or quickly rebuild a beat you like in Ableton by ear/hand.
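For example, a minimal version of that workflow, assuming Spotify's open-source basic-pitch as the transcription aid (the call below follows its documented usage; treat the file names as placeholders):

```python
# Sketch: run an AI-generated audio file through an off-the-shelf
# transcription model. Uses Spotify's basic-pitch; file names are
# placeholders.
from basic_pitch.inference import predict

model_output, midi_data, note_events = predict("udio_track.wav")
midi_data.write("udio_track.mid")   # midi_data is a PrettyMIDI object

# The MIDI can then be dragged into Ableton, or opened in MuseScore
# to get rough sheet music, and cleaned up by hand.
```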

Do you know how viable it would be for an AI model to be able to "reverse engineer" sheet music from an audio file? Knowing very little about music myself, my intuition is that one could train a model with lots of sheet music-audio file pairs and then feed it the latter to generate the former, but I could easily be missing some hurdle that would prevent this from being viable.
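In code, the shape of what I'm imagining would be something like this, as a purely hypothetical sketch (names, sizes, and the toy dataset are all made up):

```python
import torch
import torch.nn as nn

# Hypothetical sketch of pair training: spectrogram frames in,
# notation tokens out. Everything here is made up for illustration;
# a real system would be far larger.
class AudioToScore(nn.Module):
    def __init__(self, n_mels=128, hidden=256, vocab=512):
        super().__init__()
        self.encoder = nn.GRU(n_mels, hidden, batch_first=True)   # reads audio features
        self.embed = nn.Embedding(vocab, hidden)                  # notation token embeddings
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)   # writes notation tokens
        self.out = nn.Linear(hidden, vocab)

    def forward(self, spectrogram, score_tokens):
        _, h = self.encoder(spectrogram)                  # condense the audio
        dec, _ = self.decoder(self.embed(score_tokens), h)
        return self.out(dec)                              # logits over notation tokens

model = AudioToScore()
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Toy stand-in for a real corpus of (audio, sheet music) pairs.
paired_data = [(torch.randn(4, 100, 128), torch.randint(0, 512, (4, 50)))]

for spec, tokens in paired_data:
    logits = model(spec, tokens[:, :-1])                  # predict next notation token
    loss = loss_fn(logits.reshape(-1, 512), tokens[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```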

my intuition is that one could train a model with lots of sheet music-audio file pairs and then feed it the latter to generate the former

Yeah, that's the way. Once you run out of training data, you can probably also do self-learning by transcribing music that has no available sheet music, turning the generated notation back into sound through a synthesizer, comparing the results (this needs another model), and then trying again. Once you run out of music, you can continue with synthetic data (since current models can already make fresh sound files of high enough quality).
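Schematically, that loop would look like this; every object and threshold here is a hypothetical stand-in rather than a real API:

```python
# Schematic of the self-learning loop described above. transcriber,
# synthesizer, and judge are all hypothetical stand-ins.
def self_training_round(transcriber, synthesizer, judge, untranscribed_audio):
    new_pairs = []
    for audio in untranscribed_audio:
        notation = transcriber.transcribe(audio)      # guess the sheet music
        rendered = synthesizer.render(notation)       # play the guess back
        score = judge.similarity(audio, rendered)     # the separate comparison model
        if score > 0.9:                               # keep only convincing guesses
            new_pairs.append((audio, notation))
    transcriber.finetune(new_pairs)                   # and try again next round

# Once real music runs out, the same loop works on synthetic data:
# generate fresh audio with a current music model and feed it through.
```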

The devil is in the details, of course; e.g. current software transcription aids work much better for solo piano than for any other instrument (there aren't many different ways to modify the sound of a note on a piano). Guitars, on the other hand, are notoriously hard to transcribe. They kind of make up for it by having tabs available for a million songs, so at least there's a lot of training data. But the relationship between tabs and final sound is much less straightforward than for piano.

Text -> Image -> Spritesheet -> 3D Model pipelines are actually already here. They're just pretty bad at giving you usable topology, but you'll probably begin seeing AI-generated assets in production games within a few months. Not big or moving stuff, but static medium-poly assets like crates or plants.

There are a few on Hugging Face, and there's an integration called BlenderGPT in the works.

Different strokes, I guess. I'm also not primarily evaluating whether the exhibited technical/music-theory prowess of the songs in question is particularly impressive - most music isn't particularly rich in complex composition, and mediocre music is inevitably going to represent a large part of Udio's dataset. I consider all of the linked songs to be about on par with a lot of the music that gets released. Instead, I'm evaluating on the basis of "could this be a song that I'd hear out in the wild?"

I'm more interested in including music (notation) and getting music (notation) back.

Ideally, that'd be the goal of a machine learning-driven plugin. Unfortunately I'm not aware of any notation-producing ones worth their salt yet, but I do know that there are a number of very competent plugins which have focused on the generation of sound design.