Performant data structures for bullet hell

One interesting challenge when coding the game engine for Magicore Anomala is figuring out the ideal data structures for different scene objects.

Since we're on a 7MHz CPU and need to process hundreds of objects per frame, every CPU cycle counts. Here are the requirements for the bullet objects:

  1. We have a list of 200 bullets. Every frame, each active bullet in the list needs to be updated.
  2. During the update, if a bullet leaves the play area, then it needs to despawn and/or become inactive. This means any random bullet in the list can despawn at any time, leaving free memory in that spot.
  3. At arbitrary times (sometimes multiple in a frame), we need to spawn a new bullet to the list.

Number 3 is the hardest here: To spawn a new bullet, we have to find a free entry in our list of bullets. It's tough to do that fast.

I don't have much experience with manual memory management, so these kinds of problems are brand new to me.

My current solution

A barebones array is out of the question—to spawn a bullet, we'd have to traverse the array to find the first free spot, and that is way too costly.

To go with the array of bullet objects, I have a second array of pointers, and I pre-populate it with a pointer to every element of the bullets array.

FreeBulletStackPtr:
    dc.l    FreeBulletStack

FreeBulletStack:
    dc.l    Bullets+BULLETSIZE*0    ; <-- We are here
    dc.l    Bullets+BULLETSIZE*1
    dc.l    Bullets+BULLETSIZE*2
    dc.l    Bullets+BULLETSIZE*3
    dc.l    Bullets+BULLETSIZE*4
    ...

Bullets:
    dcb.b   BULLETSIZE*200

Every time we want to spawn a bullet, we look at the current entry in our stack, and we spawn it to that location:

FreeBulletStackPtr:
    dc.l    FreeBulletStack+4       ; Now points to next element in stack

FreeBulletStack:
    dc.l    0
    dc.l    Bullets+BULLETSIZE*1    ; <-- We are here
    dc.l    Bullets+BULLETSIZE*2
    dc.l    Bullets+BULLETSIZE*3
    dc.l    Bullets+BULLETSIZE*4
    ...

I zeroed out the first item in the stack to make things easier to visualize. After that pointer gets used to spawn a bullet, we move down the stack, and that entry is now junk.

Let's spawn in a couple more bullets.

FreeBulletStackPtr:
    dc.l    FreeBulletStack+12

FreeBulletStack:
    dc.l    0
    dc.l    0
    dc.l    0
    dc.l    Bullets+BULLETSIZE*3    ; <-- We are here
    dc.l    Bullets+BULLETSIZE*4
    ...

But wait—the first bullet has gone offscreen, so that bullet needs to despawn. First, we set that object to "inactive" in the bullets array, so the update loop knows to skip over it (not pictured). Then, we write its pointer to FreeBulletStack:

FreeBulletStackPtr:
    dc.l    FreeBulletStack+8

FreeBulletStack:
    dc.l    0
    dc.l    0
    dc.l    Bullets+BULLETSIZE*0    ; <-- We are here
    dc.l    Bullets+BULLETSIZE*3
    dc.l    Bullets+BULLETSIZE*4
    ...

As you can see, the next bullet will spawn into the newly-freed memory of the one that just despawned. This means the bullets can despawn in any order, and we can always instantly figure out where in memory to spawn in the next bullet.

Improvements?

This solves the problem of spawning in new bullets, but the update routine still has to loop through the entire list and check if the bullet is active before running physics on it.

I'm actually fine with this. In the worst-case scenario where all 200 bullets are active, I need to loop through the entire list anyway. If I make sure I'm hitting 60fps in that scenario, then I'm guaranteed to hit 60fps if there are fewer bullets—even if I'm wasting time by having to find and skip over inactive objects. The overhead of reading and following pointers (e.g. in a linked list) would be less efficient in our worst-case scenario than a simple little check to see if each bullet is inactive.

To help with this further, each attack has a "max bullets" value specified. When I design an attack, I approximate the maximum number of bullets that can possibly be onscreen, and I instantiate a list of that size. That ensures the worst-case scenario for that attack is not still looping through 100 inactive objects, and I can dedicate that extra CPU to the design of the attack in some way.

I think further optimizing this system would involve really scrutinizing the performance cost of specific CPU instructions or memory reads. I'm already doing that where I can, so it's diminishing returns at this point.

As someone with little experience managing memory and data structures, it was a pleasure to come up with a system that worked well for this very specific need.

But oftentimes, the experience I gain (and change in requirements) is causing me to rip out old systems and replace them with something better, so I wouldn't be surprised if that eventually happens here, too.