
Tinker Tuesday for November 19, 2024

This thread is for anyone working on personal projects to share their progress, and hold themselves somewhat accountable to a group of peers.

Post your project, your progress from last week, and what you hope to accomplish this week.

If you want to be pinged with a reminder asking about your project, let me know, and I'll harass you each week until you cancel the service.


Boid simulation in Redot

Googling around for an answer on whether it's possible to set primitive-type uniforms with Redot's shader API (which is how the C++ project was handling its simulation), I ended up running into this repository. On one hand, that's a bit of a shame - not much more for me to do. On the other hand, with my relative lack of experience in Go/Re-dot, I'd have been wasting a lot of time figuring out minutiae, so this saved me a lot of pain, and I still managed to learn a bunch of things:

Redot has "particle shaders" which grant you full control over individual particles in a system. It's what I was going to use originally, but when I was experimenting with them I noticed that there's no way to have an infinite lifetime on particles, and that the lifetime is inseparable from the emission rate, so setting a long one would just give me a slow drip of new "boids" showing up. I was quite prepared to say screw it, and just draw my own particles on a texture, with a compute shader, but I noticed that this project is actually using the particle system + shader combo. To my surprise, the particle lifetime was set to 1 second... and then it hit me - as long as the simulation is done in a separate compute shader, and the particle one is just their for setting up the position and orientation, it doesn't matter that a particle "dies" and another is created, the user won't even see that it happened.

It turns out that primitive-type uniforms - the original issue that caused me to find the project - aren't available in Redot, but you can have named properties in a buffer (like so), and that's a pretty elegant solution, imo.
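
For reference, this is roughly what that looks like on the GLSL side - one storage buffer with named fields standing in for the individual float/int uniforms you'd otherwise want (the binding number and field names below are placeholders, not the repo's actual layout):

```glsl
layout(set = 0, binding = 0, std430) restrict readonly buffer Params {
    float delta;
    float view_radius;
    float separation_weight;
    float alignment_weight;
    float cohesion_weight;
    uint num_boids;
} params;

// In main() you then read them as params.view_radius, params.num_boids, etc.,
// while the CPU side just packs the same values into a single byte array.
```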

Originally, starting the simulation up was a disappointment. The README, as well as the accompanying videos, talks about simulating 100K boids. My GPU could handle 32K at 60+ FPS, or maybe 50K at 30 FPS. By last week's goal of "go big, or go home", where "big" was supposed to mean millions, it felt like it was time to pack it up and never speak of the idea again. Kind of devastating, because the big idea that made me revisit the project was to optimize the simulation by sorting the particles / boids into a grid. That's already done here - both the C++ project from last week and this Godot version that I found go into detail about it. The shader code has a flag for turning off the spatial sorting, and indeed without it the frame rate drops by half. A part of me thought "fair enough" - my hardware is pretty old, and those are roughly the numbers I ran into the first time I took a stab at it - but I felt like the sorting should be giving a much bigger boost.

So then I started playing around with the parameters. Since the boids are sorted on a grid, they only have to look up other boids from neighboring grid cells. Smaller grid cells, even fewer look-ups. That did speed things up quite a bit, and I could simulate 131K at 20 FPS. Somehow this gave me a strong feeling that I wasn't at the hardware limit, and that there must be some issue in the code.
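
The neighbor lookup boils down to something like this (a sketch with made-up buffer and helper names, the real shader's layout will differ): each boid finds its own cell, then only iterates over the boids stored in the 3x3 block of cells around it.

```glsl
// Hypothetical bin layout: boids sorted by cell, with a per-cell start index
// and count into the sorted list.
ivec2 my_cell = ivec2(floor(my_pos / cell_size));

for (int dy = -1; dy <= 1; dy++) {
    for (int dx = -1; dx <= 1; dx++) {
        uint cell = cell_to_index(my_cell + ivec2(dx, dy)); // made-up helper
        uint start = bin_start.data[cell];
        uint end = start + bin_count.data[cell];
        for (uint i = start; i < end; i++) {
            // accumulate separation / alignment / cohesion against boid i,
            // instead of scanning the entire boid buffer
        }
    }
}
```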

It's been a while since I took that GPU programming course, but roughly from what I remember: at some point it got a lot easier to squeeze more cores into hardware than to squeeze more speed out of a core, and with GPUs in particular it turns out you can fit a lot more processing units in if they are all performing the same instruction at the same time. If you want to run a linear algorithm on a GPU, it will blaze through millions of data points like it was nothing. Even when you go quadratic, things still run pretty smoothly, which is why it wasn't necessarily surprising that the spatial sorting increased performance, just not by that much. If you can eliminate data points from the list of what needs to be processed, that is in theory a plus, but the bane of all GPUs is thread divergence. If one group of threads takes one branch of a particular if-block and another takes the else, what happens in practice is that the hardware has to run the same code twice: first all the threads where the condition is met, and then all the threads where it is not. The more flow control you use, the more divergence, and the bigger the hit on performance.
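
A contrived example of what divergence looks like in shader code (nothing from the actual project, just an illustration): if threads in the same group disagree on the condition, both branches effectively execute back to back, and every thread pays for both.

```glsl
if (distance(my_pos, other_pos) < view_radius) {
    // "expensive" neighbor math
    steer += flocking_forces(my_pos, other_pos); // made-up helper
} else {
    // cheap path - but the threads that land here still sit idle while the
    // threads in the branch above finish, and vice versa
    steer += vec2(0.0);
}
```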

Still, that should not have happened here. The boids are sorted, so on each pass the threads should be processing boids that are next to each other, and looking up the same neighboring grid cells. There might be some divergence, but things should be mostly staying in sync, which again should mean much higher performance. What could possibl... oh, that motherfucker - he's not actually using the sorted indices. When a thread selects a boid to process, it just indexes straight into the unsorted buffer containing the boid data (position, velocity, etc.). Since the boids move around, there's no guarantee that boids adjacent in that buffer are anywhere near each other in space. Sure, the sorting helps, because when you're looking up the neighbors you don't have to check all of them for each boid, but like I was saying above, I was getting divergence up the wazoo.

Well, one "if (use_bins) my_index = bin_reindex.data[my_index];" later that 131K simulation was running at 100FPS. I can go as high as 262K at 40FPS. By 524K, the shader program crashes, by the looks of the error my GPU doesn't have enough memory, though the message is not very clear.

Not a lot of code written, but that was quite fun. Now I want to see if I can reimplement my old simulation. "Follow the player, collide with each other" should be a bit easier on the GPU, since it necessarily requires that the boids stay separated and spread out equal-ish on the grid. I'd also like to do something about the spaghetti GDScript. Let's see how it goes.