
Friday Fun Thread for February 7, 2025

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.

Can it output sheet music? Can it output notation? Part of the problem is that the synthesizer isn't that great. Ideally it would output notation in something like Frescobaldi.

No, and it fundamentally can't right now. Those models are trained on raw music, not on notation. During creation, the model isn't "composing" like a human would, in the same sense that an image model isn't actually sketching, drawing and painting - the final image is directly condensed from the diffusion process.

But this is clearly the next step in the value chain. Once audio creation models can input and output notation, they will completely change the creative process - in the same way that video models will become valuable once they can input and output an entire 3D scene into/from Blender. But this step is difficult: there is orders of magnitude less training data in all those cases (you need matched sets of music + notation, video + 3D models, etc.).

Music is, of course, simpler than 3D in this respect. You can run AI audio creation through the usual transcription aids or quickly rebuild a beat you like in Ableton by ear/hand.
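For the "usual transcription aids" route, here's a minimal sketch using Spotify's open-source Basic Pitch transcriber; the file name is made up, and the exact shape of the returned note events may differ between versions:

```python
# Sketch: run an AI-generated audio file through an automatic transcription aid.
# Assumes `pip install basic-pitch` and a hypothetical local file generated_track.wav.
from basic_pitch.inference import predict

# predict returns the raw model output, a MIDI object, and a list of note events.
model_output, midi_data, note_events = predict("generated_track.wav")

# Write out MIDI, then import it into a DAW or a notation editor for cleanup.
midi_data.write("generated_track.mid")

# Peek at the first few detected notes (format varies by version).
for note in note_events[:5]:
    print(note)
```

This works tolerably for piano-like material; as noted further down the thread, polyphonic guitar and dense mixes are much harder.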

Do you know how viable it would be for an AI model to be able to "reverse engineer" sheet music from an audio file? Knowing very little about music myself, my intuition is that one could train a model with lots of sheet music-audio file pairs and then feed it the latter to generate the former, but I could easily be missing some hurdle that would prevent this from being viable.

my intuition is that one could train a model with lots of sheet music-audio file pairs and then feed it the latter to generate the former

Yeah, that's the way. Once you run out of training data, you can probably also do self-learning: transcribe music that has no available sheet music, transform the generated notation back into sound through a synthesizer, compare the results (this needs another model), and then try again. Once you run out of music, you can continue with synthetic data (since current models can already make fresh sound files of high enough quality).
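Very roughly, that synth-and-compare loop could look like the sketch below. Everything here is a hypothetical placeholder: `transcriber`, `synthesize`, and `similarity` stand in for the transcription model, a software synthesizer, and a learned audio-similarity model.

```python
# Sketch of the self-learning idea: transcribe unlabeled audio, render the
# transcription back to sound, score how close it is to the original, and
# keep the good transcriptions as new pseudo-labeled training pairs.
# `transcriber`, `synthesize`, and `similarity` are hypothetical placeholders,
# not real library calls.

def self_label(unlabeled_audio, transcriber, synthesize, similarity, threshold=0.9):
    new_pairs = []
    for audio in unlabeled_audio:
        notation = transcriber(audio)           # audio -> candidate sheet music
        rendered = synthesize(notation)         # sheet music -> audio via synth
        score = similarity(audio, rendered)     # second model judges closeness
        if score >= threshold:
            new_pairs.append((audio, notation))  # accept as a pseudo-label
    return new_pairs  # fine-tune the transcriber on original + new pairs
```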

The devil is in the details, of course, e.g. current software transcription aids work much better for solo piano than for any other instrument (there are not many different ways to modify the sound of a note on a piano). Guitars, on the other hand, are notoriously hard to transcribe. They kind of make up for it by having tabs available for a million songs, so at least there's a lot of training data. But the relationship between tabs and final sound is much less straightforward than for piano.

Text -> Image -> Spritesheet -> 3D Model pipelines are actually already here. They're just pretty bad at giving you usable topology, but you'll probably begin seeing AI-generated assets in production games within a few months. Not big or moving stuff, but static medium-poly assets like crates or plants.

There are a few on Hugging Face, and there's an integration called BlenderGPT in the works.
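If you want to gauge whether a generated asset is actually usable, a quick sanity check with the `trimesh` library looks something like this (the file name and the face-count budget are made up):

```python
# Sketch: sanity-check an AI-generated mesh before dropping it into a game scene.
# Assumes `pip install trimesh` and a hypothetical exported file crate.glb.
import trimesh

mesh = trimesh.load("crate.glb", force="mesh")

print(f"faces: {len(mesh.faces)}, vertices: {len(mesh.vertices)}")
print(f"watertight: {mesh.is_watertight}")              # holes are common in generated meshes
print(f"winding consistent: {mesh.is_winding_consistent}")

# Arbitrary budget for a "medium poly" static prop like a crate or a plant.
if len(mesh.faces) > 20_000:
    print("probably needs retopology/decimation before use")
```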