Rapidly building custom file formats for my game

Earlier this year, I had to figure out how I was going to load level data into Magicore Anomala. A level contains backgrounds, sprites, animations, dialogue scripts, and much more. Ideally, I want to pack all of these things into a single file, so that I can deal with the filesystem and file names as little as possible.

I did some cursory searching for tools or frameworks that make it easy to define and build custom file formats, but what I found was both too complex for my use case and not flexible enough. So I created my own tool, and I open-sourced it in case others find it useful too.

Simple Binary Builder

Simple Binary Builder is a tool I made that lets me define data structures in clean, readable Python and then build them by sourcing my data from a simple .toml file.

For example, here is a data structure that I use for sound effects:

class SFXDef(Block):
    data_pointer: U32
    length: U16
    period: U16
    volume: U16
    channel: U8
    priority: U8
    sfx_data: Bytes

    def set_data_pointer(self, _):
        return 0

    def set_period(self, data: dict):
        return round(3579545 / data['samplerate'])

    def set_channel(self, _):
        return 0

    def set_length(self, _):
        return self.sfx_data.size() // 2

    def set_sfx_data(self, data: dict):
        return sfx.get_sfx(self.root_path/data['data'], data['samplerate'])

And here is how the SFX are defined in my .toml file:

[[sfx]]
data = "sfx/shoot.wav"
samplerate = 22050
volume = 64
priority = 64

[[sfx]]
data = "sfx/jump.wav"
samplerate = 27000
volume = 48
priority = 60

Then, as part of my build script, I just give this .toml file to Simple Binary Builder, and it outputs a data file with the exact layout you see in the Python type annotations. Some values (like volume and priority) are taken right from the .toml file, but others require logic or computation, handled in the corresponding Python setter function.
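
To make the "exact layout" idea concrete, here is a minimal sketch that packs one SFXDef by hand with Python's struct module. The field order and widths come from the annotations above. The big-endian byte order, the pack_sfx helper, and the placeholder sample bytes are assumptions made for illustration, not Simple Binary Builder's actual implementation.

```python
import struct

# U32, U16, U16, U16, U8, U8 in the same order as the SFXDef annotations.
# ">" (big-endian, no padding) is an assumption for this sketch.
SFX_HEADER = struct.Struct(">IHHHBB")


def pack_sfx(entry: dict, sfx_data: bytes) -> bytes:
    """Pack one sound effect: fixed-size header followed by the raw samples."""
    period = round(3579545 / entry["samplerate"])  # same formula as set_period
    length = len(sfx_data) // 2                    # same formula as set_length
    header = SFX_HEADER.pack(
        0,                  # data_pointer (set_data_pointer returns 0)
        length,
        period,
        entry["volume"],    # copied straight from the .toml entry
        0,                  # channel (set_channel returns 0)
        entry["priority"],  # copied straight from the .toml entry
    )
    return header + sfx_data


shoot = {"data": "sfx/shoot.wav", "samplerate": 22050, "volume": 64, "priority": 64}
placeholder_samples = bytes(8)  # stand-in for the converted audio data
blob = pack_sfx(shoot, placeholder_samples)
print(SFX_HEADER.size, len(blob))
```

The header is always 12 bytes (4 + 2 + 2 + 2 + 1 + 1), so whatever reads the file can read fixed-size fields first and then the variable-length sample data.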

Python was a great choice because it's extremely flexible and you can be hacky with it. For instance, the sfx_data setter takes the path of the .wav file and converts it to 8-bit RAW by calling an external program, sox. I was able to do this in just a few lines of code, thanks to Python's subprocess module.
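
A conversion like that might look roughly like the sketch below. The function names, the choice of signed mono samples, and writing to standard output are all assumptions. The real sfx.get_sfx may use different sox options.

```python
import subprocess
from pathlib import Path


def build_sox_command(wav_path: Path, samplerate: int) -> list[str]:
    """Build a sox command line that writes headerless 8-bit audio to stdout."""
    return [
        "sox", str(wav_path),
        "-t", "raw",             # headerless output
        "-r", str(samplerate),   # resample to the target rate
        "-e", "signed-integer",  # assumption: signed samples
        "-b", "8",               # 8 bits per sample
        "-c", "1",               # assumption: mono
        "-",                     # write to stdout instead of a file
    ]


def wav_to_raw8(wav_path: Path, samplerate: int) -> bytes:
    """Run sox (must be installed and on PATH) and return the raw sample bytes."""
    result = subprocess.run(
        build_sox_command(wav_path, samplerate),
        check=True,           # raise CalledProcessError if sox fails
        capture_output=True,  # collect stdout rather than printing it
    )
    return result.stdout
```

Keeping the command construction in its own function makes it easy to inspect or unit-test without having sox installed.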

I am also using some very aggressive reflection in Simple Binary Builder to enable the definition format you see above, where the data structure is defined using Python type annotations. It was very important to me that my data structures were highly self-documenting like that, but still extremely flexible by living in the code. That also lets me create more primitive types as needed, like this 8-bit Bool type and a simple List type:

class Bool(U8):
    def __init__(self, parent: Optional[Block], data: bool) -> None:
        super().__init__(parent, 0xff if data else 0x00)


class List(Block):
    """Contains item count, item offsets, and a byte array of all the items."""
    count: U16
    offsets: Array[U32]
    items: Array

    def set_count(self, _):
        return len(self.items)

    def set_offsets(self, _):
        offsets = []
        current_offset = self.offsets.offset() + (4 * self.count)
        for item in self.items:
            offsets.append(current_offset)
            current_offset += item.size()
        return offsets

    def set_items(self, data):
        return data
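
For anyone curious how annotation-driven definitions like these can work, here is a heavily simplified sketch of the general technique: read a class's type annotations and, for each field, call a matching set_<name> method if one exists. The MiniBlock, MiniU8, and MiniU16 names are hypothetical stand-ins, and this is not Simple Binary Builder's actual code.

```python
from typing import get_type_hints


class MiniU8:
    """Hypothetical stand-in for an 8-bit field type."""


class MiniU16:
    """Hypothetical stand-in for a 16-bit field type."""


class MiniBlock:
    """Hypothetical base class: resolves each annotated field to a value."""

    def build(self, data: dict) -> dict:
        values = {}
        # get_type_hints returns the annotated fields in declaration order.
        for name in get_type_hints(type(self)):
            setter = getattr(self, f"set_{name}", None)
            # Use the setter when one is defined, else copy from the input data.
            values[name] = setter(data) if setter else data[name]
        return values


class MiniSFX(MiniBlock):
    volume: MiniU8
    period: MiniU16

    def set_period(self, data: dict):
        return round(3579545 / data["samplerate"])


built = MiniSFX().build({"volume": 64, "samplerate": 22050})
print(built)
```

Here volume has no setter, so it is copied from the input, while period is computed. That is the same split between plain and computed fields described earlier.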

Going back to the SFX example, I'm using that List type to store all my SFX as a list:

class SFXs(List):
    count: U16
    offsets: Array[U32]
    items: Array[SFXDef]

So in my .toml file, I just have the list of SFX definitions, and then the count and offsets fields are computed at build time by the setter functions!
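
As a worked example of the offsets arithmetic in set_offsets, the standalone function below repeats the same calculation with plain integers. It assumes the list starts at byte 0, so the offsets table begins right after the 2-byte count. The item sizes are made up for illustration, and whether offset() is measured from the start of the file or of the block is not something this sketch settles.

```python
def compute_offsets(offsets_field_start: int, item_sizes: list[int]) -> list[int]:
    """Mirror List.set_offsets: the first item begins right after the offsets table."""
    current = offsets_field_start + 4 * len(item_sizes)  # one U32 per item
    offsets = []
    for size in item_sizes:
        offsets.append(current)
        current += size
    return offsets


# Two hypothetical items: a 12-byte header plus 1000 and 500 bytes of samples.
offsets = compute_offsets(offsets_field_start=2, item_sizes=[1012, 512])
print(offsets)  # [10, 1022]
```

With two items, the table itself takes 8 bytes, so the first item lands at 2 + 8 = 10 and the second at 10 + 1012 = 1022.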

Rapid iteration

I think one of the biggest advantages of modern development tools is the extremely rapid iteration process. Development goes much faster when we can see our changes as soon as we make them.

When developing my own game engine, I lose a lot of that luxury, and menial things like getting data into my game can be a slog. Simple Binary Builder solved some of those problems on its own, and its extensibility let me hack some of my own workflow on top of it.

For instance, the rooms in my levels have a grid of 16x16 collision tiles, and a "hazards layer" which is just a bitmap of spikes and things that kill you. This data is all read from a PNG file, like this one:

Then, my room definition in the .toml file looks like this:

[[rooms]]
has_hazards = true
has_tiles = true
event_id = 128
event_params = 1
starting_pos = [8, 192]
exits = [-1, 2, -1, -1]
background_id = 0
hazards_image = "rooms/1-hazards.png"

Of note here is hazards_image. The PNG path gets sent over to my script, which extracts the white pixels into a bitmap for the hazards layer (the image above is obviously placeholder graphics). More than that, my script looks at the purple tiles to build a tilemap for the room. A full purple block means that the tile has collision on all sides, and a thin rectangle means it's a platform that you can pass through from underneath.
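
The hazards extraction can be sketched as a simple bit-packing step: one bit per pixel, set when the pixel is white. The example below works on an image already loaded as rows of RGB tuples (an image library could provide that). The bit order, with the leftmost pixel in the highest bit, and the function name are assumptions, not the actual script.

```python
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)


def pack_hazards(pixels: list[list[tuple[int, int, int]]]) -> bytes:
    """Pack white pixels into a 1-bit-per-pixel bitmap, 8 pixels per byte."""
    out = bytearray()
    for row in pixels:
        for start in range(0, len(row), 8):
            byte = 0
            for bit, pixel in enumerate(row[start:start + 8]):
                if pixel == WHITE:
                    byte |= 0x80 >> bit  # leftmost pixel goes in the highest bit
            out.append(byte)
    return bytes(out)


# Tiny 8x2 test image: white pixels at both ends of the first row only.
row_a = [WHITE, BLACK, BLACK, BLACK, BLACK, BLACK, BLACK, WHITE]
row_b = [BLACK] * 8
hazards = pack_hazards([row_a, row_b])
print(hazards.hex())  # 8100
```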

As I add more tiles with unique behaviors, I can just tell my script to look for a new color in the PNG and put the corresponding tile ID in the tilemap.
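
One way to keep that extensible is a lookup table from color to tile ID, so a new tile type is one new table entry. The sketch below classifies each 16x16 cell of an image given as rows of RGB tuples. The exact purple value, the tile IDs, and the full-versus-partial rule are assumptions for illustration.

```python
TILE_SIZE = 16
EMPTY, SOLID, PLATFORM = 0, 1, 2  # hypothetical tile IDs
PURPLE = (128, 0, 128)            # assumed marker color
BLACK = (0, 0, 0)

# color -> (tile ID when the cell is completely filled, tile ID when partly filled)
COLOR_TILES = {PURPLE: (SOLID, PLATFORM)}


def classify_tile(pixels, tile_x: int, tile_y: int) -> int:
    """Return the tile ID for one 16x16 cell based on its marker-colored pixels."""
    counts = {}
    for y in range(tile_y * TILE_SIZE, (tile_y + 1) * TILE_SIZE):
        for x in range(tile_x * TILE_SIZE, (tile_x + 1) * TILE_SIZE):
            color = pixels[y][x]
            if color in COLOR_TILES:
                counts[color] = counts.get(color, 0) + 1
    if not counts:
        return EMPTY
    color = max(counts, key=counts.get)  # dominant marker color in the cell
    full_id, partial_id = COLOR_TILES[color]
    return full_id if counts[color] == TILE_SIZE * TILE_SIZE else partial_id


def build_tilemap(pixels) -> list[list[int]]:
    """Classify every whole 16x16 cell of the image."""
    rows = len(pixels) // TILE_SIZE
    cols = len(pixels[0]) // TILE_SIZE
    return [[classify_tile(pixels, tx, ty) for tx in range(cols)] for ty in range(rows)]


def demo_row(y: int) -> list:
    """One pixel row of a 48x16 demo image: empty, solid, and platform cells."""
    platform = PURPLE if y < 4 else BLACK  # thin strip at the top of the cell
    return [BLACK] * 16 + [PURPLE] * 16 + [platform] * 16


image = [demo_row(y) for y in range(TILE_SIZE)]
tilemap = build_tilemap(image)
print(tilemap)  # [[0, 1, 2]]
```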

So, instead of having to maintain an entirely separate level editor application, I can simply design my levels entirely in Photoshop, just as quickly! It helps that these rooms don't each require tons of custom assets, but this is a case where a simple problem got a simple solution.

Try it yourself

These are some examples of how I used Simple Binary Builder for Magicore, but perhaps you have a way to make use of it in your own game. You can find it here on GitHub. I tried to provide lots of examples in the readme. If you do end up using it, I'd love to hear about it!