Hot loading code on Amiga, the dumb way

August 13, 2024

In my opinion, an underrated programming skill, one that only comes with experience, is knowing when to follow "good practice", when to be clever, and when to be dumb. Today, we're being dumb.

The codebase of Magicore Anomala has steadily grown over the years, and it's currently sitting at about 70KB. It's only going to get much larger, perhaps eventually even pushing against the classic Amiga's 512KB of general-purpose RAM. The more code, the less room for the game.

So to keep things sustainable, I need to be able to load code from disk only when it's needed (like level-specific code). These code "modules" need full, unrestricted access to the game's codebase and memory.

The situation is this: let's say a code module wants to call a game function, or adjust a variable such as the player's position. These functions and variables can live anywhere in memory, and their locations change every time the game is loaded. So how do the code modules find the game's functions and variables, if they're standalone units that exist separately from the main program?

Variables in Assembly

A brief explanation of how "variables" work in Assembly. Consider this example:

    add.w    #10,pl_PosX

You might infer that this simple instruction adds 10 to the player's X position, and you'd be correct. However, pl_PosX is meaningless to the computer; it needs to be translated into a memory address, specifically the address that holds the player's X position.

In simple terms, programs have a "data" section where these variables are defined. The assembler knows that pl_PosX is, say, 0x324 bytes into the data section. When creating the final executable, a table is added to the bottom of the program (the relocation table) that tracks references like this. It tells the OS that this instruction is trying to access DATA+0x324.

When the OS loads the executable, it finds some spot in memory to load the data section—let's say the data section gets loaded into address 0x20000. Now that we finally have a real memory address for our data, the OS can go through the relocation table and replace our instruction with this:

    add.w    #10,$20324
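To make the relocation step concrete, here's a small Python model of it. It is a simplified sketch, not the actual AmigaOS loader. It assumes each relocation entry points at one 32-bit address field that currently holds an offset into the data section, which is roughly what the hunk format's HUNK_RELOC32 entries describe. It uses the standard 68000 encoding of `add.w #10,<absolute long>`: opcode word $0679, then the immediate word, then the 32-bit address. The function name `apply_relocations` is made up for illustration.

```python
import struct

def apply_relocations(code: bytearray, reloc_offsets: list[int], data_base: int) -> None:
    """Add the data section's load address to each 32-bit address field in reloc_offsets.

    Simplified model: every entry points at a longword that currently holds an
    offset into the data section (e.g. DATA+$324).
    """
    for off in reloc_offsets:
        (value,) = struct.unpack_from(">I", code, off)  # the 68000 is big-endian
        struct.pack_into(">I", code, off, (value + data_base) & 0xFFFFFFFF)

# add.w #10,pl_PosX as assembled: opcode $0679, immediate $000A, address field = DATA+$324
code = bytearray.fromhex("0679000A00000324")
relocs = [4]  # the address field starts 4 bytes into the instruction

apply_relocations(code, relocs, data_base=0x20000)  # data section loaded at $20000
print(code.hex())  # 0679000a00020324, i.e. add.w #10,$20324
```

Note that the loader has to visit and patch every instruction like this one.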

FYI, these variables are usually called "symbols" or "labels", because they're doing little more than labeling a spot in memory.

Problem solved?

Okay, so there's already a good mechanism for performing "relocation" to convert these variables to absolute memory addresses when the code is loaded. However, I chose not to go down that route, for a couple of reasons.

The biggest reason is that my code almost never uses this absolute addressing method when accessing memory. I use base-relative addressing, which is both faster and smaller.

All your base are belong to us

Base-relative addressing is a technique where we access data by using a pointer to the data section, instead of hard-coding all those absolute memory addresses. The pointer is loaded into a register when the program starts, and it's expected to persist all throughout the program.

Let's say I have a pointer to my data section in register a4. Now, in my code, I can do this:

    add.w    #10,pl_PosX(a4)

As long as I tell the assembler that a4 is my base register, it's smart enough to see pl_PosX and replace it with the correct offset into the data section:

    add.w    #10,$324(a4)

This instruction adds 10 to the memory address (a4)+0x324, which, in our example, is where pl_PosX is stored. This translation is done by the assembler, without the OS having to take care of relocation or anything like that.

Compared to the absolute-address version, this instruction is 4 cycles faster and 2 bytes smaller, because it carries a 16-bit displacement instead of a full 32-bit address. Across the entire program, that adds up to a lot of savings.
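This Python sketch shows what the CPU does with the base-relative form, and why the loader doesn't need to patch anything. It models only this one instruction, using the standard 68000 encoding of `add.w #imm,d16(a4)`: opcode word $066C, the immediate word, then a 16-bit displacement, for 6 bytes in total. The memory size, load addresses, and function names are hypothetical.

```python
import struct

def effective_address(base_reg: int, disp16: int) -> int:
    """68000 (d16,An) mode: sign-extend the 16-bit displacement and add it to An."""
    if disp16 & 0x8000:
        disp16 -= 0x10000
    return (base_reg + disp16) & 0xFFFFFFFF

def run_addw_d16_a4(insn: bytes, memory: bytearray, a4: int) -> None:
    """Execute one add.w #imm,d16(a4) instruction against a flat memory image."""
    opcode, imm, disp = struct.unpack(">HHH", insn)
    if opcode != 0x066C:
        raise ValueError("only add.w #imm,d16(a4) is modelled here")
    ea = effective_address(a4, disp)
    (old,) = struct.unpack_from(">H", memory, ea)
    struct.pack_into(">H", memory, ea, (old + imm) & 0xFFFF)

# add.w #10,$324(a4): opcode $066C, immediate $000A, displacement $0324
insn = bytes.fromhex("066C000A0324")
memory = bytearray(0x40000)  # hypothetical 256KB memory image, all zeros

# The same instruction bytes work wherever the data section lands; only a4 differs
for data_section in (0x20000, 0x3A000):  # two hypothetical load addresses
    run_addw_d16_a4(insn, memory, a4=data_section)
    value = struct.unpack_from(">H", memory, data_section + 0x324)[0]
    print(hex(data_section + 0x324), value)
```

One caveat comes with this addressing mode. The displacement is a signed 16-bit value, so a4 can only reach about 32KB on either side of where it points.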

Furthermore, since we can access our entire data section through a4, it enables us to build position-independent code—code where no relocation is required. That means I can build code as standalone, raw binary files that contain nothing but the exact code and data I write; not an object file, not an executable, just the exact binary output of my source file. That's the key ingredient for my "dumb" code modules.

Fighting dumb with dumb

There's another problem, and this one shows up during the build process. The main program knows that pl_PosX is at position 0x324 of the data section, because the data section is included in the main build. But my standalone code modules are effectively each their own independent program, so they have no idea what pl_PosX or any other symbol is referring to. How can I get that information over to the modules?

There is probably a decent, clever way to do this using advanced assembler and/or linker features. But we're being dumb today, which means I'm sidestepping that added complexity and just writing a simple script.

When I build the main program, I use a build flag to generate a symbol map. The map file looks something like this:

    Symbols of DATA:
      0x00000000 _DataStart: local reloc, size 0
      0x00000000 atk_Params: local reloc, size 0
      0x00000080 os_GraphicsName: local reloc, size 0
      0x00000092 os_DOSName: local reloc, size 0
      0x0000009e scr_YPosTable: local reloc, size 0
      0x0000029e scr_ShakeTable: local reloc, size 0
      0x000002cc lvl_PathList: local reloc, size 0
      0x000002e6 lvl_Paths: local reloc, size 0

This is just an excerpt, but you get the idea. All I do is take this map file and put it through a script that reformats it into an include file, symbols.i, that my code modules can use.

The output looks like this:

    atk_Params              = $0000
    os_GraphicsName         = $0080
    os_DOSName              = $0092
    scr_YPosTable           = $009e
    scr_ShakeTable          = $029e
    lvl_PathList            = $02cc
    lvl_Paths               = $02e6
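The post doesn't show the script itself, so the following is only a guess at what such a script might look like, written in Python. It makes these assumptions:

- The map format matches the excerpt: a `Symbols of DATA:` header followed by `0x<offset> <name>:` lines.
- Names are padded to 24 columns to match the symbols.i sample.
- Names starting with an underscore are skipped, since `_DataStart` is absent from that sample.

The function name `map_to_include` is hypothetical.

```python
import re

# Matches map lines like "  0x00000080 os_GraphicsName: local reloc, size 0"
SYMBOL_LINE = re.compile(r"^\s*0x([0-9a-fA-F]+)\s+(\w+):")

def map_to_include(map_text: str, section: str = "DATA") -> str:
    """Turn the symbols listed under 'Symbols of <section>:' into 'name = $offset' lines."""
    out = []
    in_section = False
    for line in map_text.splitlines():
        if line.startswith("Symbols of "):
            # A new section header: only read symbols from the requested section
            in_section = line.strip() == f"Symbols of {section}:"
            continue
        match = SYMBOL_LINE.match(line)
        if not in_section or match is None:
            continue
        offset, name = int(match.group(1), 16), match.group(2)
        if name.startswith("_"):
            continue  # assumption: internal markers like _DataStart are left out
        out.append(f"{name:<24}= ${offset:04x}")
    return "\n".join(out) + "\n"

sample_map = """Symbols of DATA:
  0x00000000 _DataStart: local reloc, size 0
  0x00000000 atk_Params: local reloc, size 0
  0x00000080 os_GraphicsName: local reloc, size 0
  0x000002e6 lvl_Paths: local reloc, size 0
"""

# In a real build this would be written to symbols.i instead of printed
print(map_to_include(sample_map), end="")
```

The `section` parameter is there in case the map lists other sections under the same kind of header.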

Now, in my code module, I just include symbols.i, and I get access to every symbol in my program as a position-independent offset.

This isn't a one-time thing; it needs to be done on every single build. The reason is simple: If I insert some new data into the program, it will change all the offsets of the data that comes after it.

That's not an issue, because it takes virtually no time at all to "build" symbols.i. I build the main program, then symbols.i, then the code modules.
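To see why a stale symbols.i would be a problem, here's a tiny Python model that lays data out back to back. The sizes are inferred from the gaps between consecutive offsets in the map excerpt, assuming no alignment padding. `new_Var` is a hypothetical 4-byte variable inserted after `os_DOSName`.

```python
def layout(items: list[tuple[str, int]]) -> dict[str, int]:
    """Place each (name, size) item directly after the previous one and return the offsets."""
    offsets: dict[str, int] = {}
    pos = 0
    for name, size in items:
        offsets[name] = pos
        pos += size
    return offsets

# Sizes inferred from the gaps between consecutive offsets in the map excerpt
data = [
    ("atk_Params", 0x80),
    ("os_GraphicsName", 0x12),
    ("os_DOSName", 0x0C),
    ("scr_YPosTable", 0x200),
    ("scr_ShakeTable", 0x2E),
    ("lvl_PathList", 0x1A),
]

before = layout(data)
# Hypothetical: a new 4-byte variable added right after os_DOSName
after = layout(data[:3] + [("new_Var", 4)] + data[3:])

for name in ("os_DOSName", "scr_YPosTable", "lvl_PathList"):
    print(f"{name:<16} ${before[name]:04x} -> ${after[name]:04x}")
```

Everything after the insertion point moves by 4 bytes. For example, scr_YPosTable goes from $009e to $00a2, while os_DOSName stays at $0092. A module assembled against the old symbols.i would quietly read and write the wrong data.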

By the way, I use another base register for the code section, which gives my modules full access to all the functions in my main program, not just the data.

Modularity and modding

Rather than coming up with some library or API, I've given my code modules full, unrestricted access to the entire program's memory space and functions, as though they were a part of the main program itself.

I'm doing this in part to make my life easier, but I'm also doing it for the possibility of modding. Magicore's level files are 100% self-contained, meaning they contain all of their own assets, code, and data needed in a single package. That basically means new levels can be added to the game simply by dropping a new level file into the game directory. The level is all-powerful, because it can run its own code, so the sky is the limit for modders. They have complete access to everything.

And before you lecture me about the security implications of that, I'd like to remind you that this game is for the goddamn Amiga. What are they gonna do, play techno music at me?