
Friday Fun Thread for February 7, 2025

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.


I have continued to play around with Udio music generation recently, and the stuff it spits out is... disconcertingly high-quality. Its auto-generated lyrics continue to be truly awful, though, so to avoid relying on Udio's lyrics or writing my own, I ask it to generate songs in Japanese, a language where the insipidness of the lyrics will be lost on me. It absolutely nails the musical aspects of its generations, though; I often like the musical content far more than I do many actual songs.

There's still some artifacting in its generations, but on the compositional level alone it's begun to ape human-created music so well that I think it clearly passes the musical Turing test. If it managed to improve the fidelity of its generations and had a lyric generator that wasn't so trash, I can see this being a dangerously addictive superstimulus for me. It's easy to fall into states wherein none of the media out there seems to appeal to you, and with generative machine learning the solution to that kind of malaise becomes "just prompt until the prediction machine spits out something you like". It feels utterly solipsistic and also very tempting at the same time.

Here are a number of funk/jazz fusion generations I got over the past couple of days:

https://voca.ro/1kaIduRHpYT1

https://voca.ro/15UbDIfpljpH

https://voca.ro/162be1cbaoaT

Is it just me or are these generations, musically speaking, genuinely really decent? They're still slightly genericised, but no more so than most music out there, and I was not expecting its generation ability to get this good this fast. Despite the fact that I am not an amateur when it comes to music, I like these, and this is something that makes me think that perhaps my music taste has been irreversibly broken.

The first song I consider to be really bad.

The second has a nice bass line that I would be happy to steal wholesale. Much more impressive.

The third one didn't seem any good, either. Sure, it has a simple beat and chord changes, but those are much less impressive than a melody or a funky bass line.

Can it output sheet music? Can it output notation? Part of the problem is that the synthesizer isn't that great. Ideally it would output notation in something like Frescobaldi.

Can it write a melody for me? Can I give it a melody and have it write counterpoint? While it's interesting to give it words and get out sounds, I'm more interested in including music (notation) and getting music (notation) back.

Can it output sheet music? Can it output notation? Part of the problem is that the synthesizer isn't that great. Ideally it would output notation in something like Frescobaldi.

No, and it fundamentally can't right now. Those models are trained on raw music, not on notation. During creation, the model isn't "composing" like a human would, in the same sense that an image model isn't actually sketching, drawing and painting - the final image is directly condensed from the diffusion process.

But this is clearly the next step in the value chain. Once audio creation models can input and output notation, they will completely change the creative process - in the same way that video models will become valuable once they can input and output an entire 3D scene into/from Blender. But this step is difficult: there are orders of magnitude less training data in all those cases (you need matched sets of music + notation, video + 3D models, etc.).

Music is, of course, simpler than 3D in this respect. You can run AI audio creations through the usual transcription aids, or quickly rebuild a beat you like in Ableton by ear/hand.

Do you know how viable it would be for an AI model to be able to "reverse engineer" sheet music from an audio file? Knowing very little about music myself, my intuition is that one could train a model with lots of sheet music-audio file pairs and then feed it the latter to generate the former, but I could easily be missing some hurdle that would prevent this from being viable.

my intuition is that one could train a model with lots of sheet music-audio file pairs and then feed it the latter to generate the former

Yeah, that's the way. Once you run out of training data, you can probably also do self-learning: transcribe music that has no available sheet music, turn the generated notation back into sound through a synthesizer, compare the results (this needs another model), and then try again. Once you run out of music, you can continue with synthetic data (since current models can already make fresh sound files of high enough quality).
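That bootstrapping loop can be sketched roughly as follows. Everything here is a hypothetical illustration: `transcribe`, `synthesize`, and `similarity` are toy stand-ins for the three real models the comment describes, and the "audio" is just a list of MIDI-like pitch numbers.

```python
# Hypothetical self-training loop for audio -> notation transcription.
# transcribe, synthesize, and similarity are toy stand-ins for real models.

def transcribe(audio):
    # Stand-in: a real system would run a trained audio->notation model.
    return {"notes": sorted(set(audio))}

def synthesize(notation):
    # Stand-in: a real system would render notation back to audio.
    return list(notation["notes"])

def similarity(a, b):
    # Stand-in for the learned comparison model; here, simple set overlap.
    matched = len(set(a) & set(b))
    return matched / max(len(set(a) | set(b)), 1)

def self_train_step(audio, threshold=0.9):
    """Transcribe, re-synthesize, and keep the pair only if it round-trips."""
    notation = transcribe(audio)
    rendered = synthesize(notation)
    if similarity(audio, rendered) >= threshold:
        return (audio, notation)  # accept as a new (audio, notation) training pair
    return None  # reject and try again with another sample or model update

pair = self_train_step([60, 64, 67, 60])  # MIDI-like pitches: a C major chord
```

The point of the round-trip check is that rejected pairs never pollute the training set, which is what keeps the loop from drifting once it leaves human-made sheet music behind.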

The devil is in the details, of course; e.g. current software transcription aids work much better for solo piano than for any other instrument (there are not many different ways to modify the sound of a note on a piano). Guitars, on the other hand, are notoriously hard to transcribe. They kind of make up for it by having tabs available for a million songs, so at least there's a lot of training data. But the relationship between tabs and final sound is much less straightforward than for piano.

Text -> Image -> Spritesheet -> 3D Model models are actually already here. They're just pretty bad at giving you usable topology, but you'll probably begin seeing AI generated assets in production games inside a few months. Not big or moving stuff, but static medium poly assets like crates or plants.

There's a few on huggingface, and an integration called BlenderGPT they're working on.

Different strokes, I guess. I'm also not primarily evaluating whether the exhibited technical/music theory prowess of the songs in question is particularly impressive - most music isn't particularly rich in complex composition, and mediocre music is inevitably going to represent a large part of Udio's dataset. I consider all of the linked songs to be about on par with a lot of the music that gets released. Instead, I'm evaluating on the basis of "could this be a song that I'd hear out in the wild?"

I'm more interested in including music (notation) and getting music (notation) back.

Ideally, that'd be the goal of a machine learning-driven plugin. Unfortunately I'm not aware of any notation-producing ones worth their salt yet, but I do know that there are a number of very competent plugins which have focused on the generation of sound design.

How long is it going to be before video games start including fully dynamic soundtracks? Divinity Original Sin 2 did it a little bit (swapping out instruments based on the character focus), but there's room for a lot more than that, particularly if it's in realtime.
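One simple way to realize that kind of dynamic soundtrack is to ship per-instrument stems and have each stem's volume track the game state, crossfading in real time. This is a minimal sketch of that idea only; the state names, weights, and functions are all invented for illustration and don't correspond to any engine's actual API.

```python
# Hypothetical dynamic-soundtrack mixer: each instrument stem's volume
# chases a target set by the current game state, so the arrangement
# changes in real time (e.g. drums and brass swell when combat starts).

STEM_WEIGHTS = {
    # game state -> per-stem target volumes (0.0 to 1.0); all values invented
    "explore": {"strings": 0.8, "drums": 0.2, "brass": 0.0},
    "combat":  {"strings": 0.4, "drums": 1.0, "brass": 0.9},
}

def mix_levels(current, target, rate=0.5):
    """Move each stem's volume toward its target by `rate` per tick (a crossfade)."""
    return {
        stem: current.get(stem, 0.0) + rate * (vol - current.get(stem, 0.0))
        for stem, vol in target.items()
    }

# Player was exploring, then enters combat; one mixer tick later:
levels = {"strings": 0.8, "drums": 0.2, "brass": 0.0}
levels = mix_levels(levels, STEM_WEIGHTS["combat"])
```

Divinity's instrument-swapping is the two-state version of this; generative audio would let the stems themselves be composed on the fly instead of pre-authored.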

(See also Vaudeville for AI-based dialogue)

Also, how long before they get past one terabyte of data?

I think we're at like 300 GB on the very largest games? I doubt we'll hit 1 TB in at least the next 2 or 3 years, and we'll likely see game sizes drop or stabilize as neural textures or even full-on neural rendering takes off.

Didn't https://en.wikipedia.org/wiki/Black_(video_game) have some interesting ideas about dynamic soundscape?

As for 1TB games - those won't be for me and my stone age machine.