Writing my own dialogue scripting language

December 2, 2024

Magicore Anomala is an Amiga game I'm working on. Watch the newest gameplay demo in full: Link

I made Doki Doki Literature Club using Ren'Py, a Python-based visual novel engine that has some staggering capabilities. But thinking about it, there are tons of games with visual novel-like dialogue features—maybe even the majority of them at this point.

How is dialogue implemented in these games? Do most of them use existing dialogue scripting libraries, or are they more often spinning up their own solution? I have no idea, but given the extremely specific features and limitations of Magicore, I had no choice but to create my entire stack from scratch (like literally everything else in the game engine).

The components

The dialogue system in Magicore is made up of four distinct components:

The script, the file containing the actual dialogue (and other commands)
The parser, which takes the script and compiles it into bytecode for the game engine
The bytecode, our compiled script that the game engine reads byte-by-byte to determine whether to draw text or run special functions
The renderer, or all the stuff that handles drawing the text and portraits to the screen

The script

The dialogue script format should be easy to read, and efficient to write and edit. Here's a snippet of dialogue from the latest demo:

let f_hurthands = 4

section 3
    char 0 l
    0 "I've made it out."
    if [f_hurthands]
        5 "Although my hands hurt for some reason..."
    fi
    char 1 r
    event 66
    1 "A--"
    "A spirit?!"
    char 0 l
    0 "..."
    char 1 r
    11 "..."
    "{40}...Are you evil?"
    char 0 l
    5 "Uh..."
    choice
        evil_maybe "Maybe?"
        evil_no "I don't think so..."
    endchoice

label evil_maybe
    char 0 l
    11 "Maybe?"
    char 1 r
    6 "Ah--"
    "Well--I can't vanquish you if you're only *maybe* evil!"
    char 0 l
    8 "Sorry..."
    char 1 r
    7 "..."
    jump evil_after

For the sake of the file extension and syntax highlighting, this filetype needs a name. Arbitrarily, I called it "magiscript" with a .mgs file extension. It's really not magic, but thankfully, it's script, so my name only 50% sucks.

The syntax

The syntax is loosely inspired by Ren'Py, but it's much simpler and more barebones, because I don't need nearly as many features.

The syntax is simply made up of directives (special commands) and dialogue (raw text to display).

You can see all kinds of unique directives in the above snippet:

A section is a top-level mark that is built into a table of contents. To start dialogue, I provide the section number as a parameter.
A label can be jumped to. Unlike a section, it isn't stored in the final file format; it's just an alias for a file offset.
char controls which portrait set to display (i.e. which character), and on which side (left/right). Some dialogue lines start with a number, which selects the portrait ID to display from the set (i.e. which facial expression).
if/fi encloses a conditional script block. It takes level flags as an argument, which the game can set and clear as it chooses.
choice/endchoice encloses a choice selection. The choices can even be conditional, like if you only want a choice to be shown when a certain flag is set.
event runs a level event, which is a code function that can do literally anything. Events 0-63 are level-specific events (they get loaded with the rest of the level), whereas 64-127 are global events. In the snippet, event 66 is a global event used to resume the cutscene timeline, usually to move and animate character sprites while dialogue is happening.
We can use let to define an alias, usually for level flags.
set and clear (not shown above) let us set level flags. This gives us complex branching logic, where we can set flags right in the script, and then conditionally branch depending on which flags are set.

There is some other syntax that isn't shown above, but that's the meat of it.

Using `let`

The let directive is pretty flexible and lets us substitute just about anything. Besides level flags, another nice use case would be to name characters, facial expressions, and events. Maybe something like this:

let f_hurthands = 4
let nova = 0
let nana = 1
let nova_neutral = 0
let nova_lookdown = 5
let nana_surprised = 1
let tl_play = 66

section 3
    char [nova] l
    [nova_neutral] "I've made it out."
    if [f_hurthands]
        [nova_lookdown] "Although my hands hurt for some reason..."
    fi
    char [nana] r
    event [tl_play]
    [nana_surprised] "A--"
    "A spirit?!"

It just depends on what I feel is more readable for myself (and faster to write). If I know what a simple digit represents, that can end up being more readable to me than a verbose alias.
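As a rough sketch of how that substitution could work (this is not the actual Magicore parser code; `parse_let` and `expand_aliases` are hypothetical names, and I'm assuming aliases are plain text replacements that are never expanded inside quoted dialogue):

```python
import re

# Matches an alias reference such as [nova] or [f_hurthands].
ALIAS_REF = re.compile(r"\[(\w+)\]")


def parse_let(line: str, aliases: dict[str, str]) -> None:
    """Record a `let name = value` line in the alias table."""
    name, _, value = line.strip().removeprefix("let").partition("=")
    aliases[name.strip()] = value.strip()


def expand_aliases(line: str, aliases: dict[str, str]) -> str:
    """Replace [name] references in the part of the line before any dialogue text."""
    # Split at the first quote so the dialogue text itself is left untouched.
    head, quote, tail = line.partition('"')

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in aliases:
            raise KeyError(f"undefined alias: {name}")
        return aliases[name]

    return ALIAS_REF.sub(lookup, head) + quote + tail
```

Running the expansion before the directive check means a line like `[nova_neutral] "I've made it out."` turns into `0 "I've made it out."`, which the parser can then recognize as a portrait change in the usual way.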

Syntax highlighting

By the way, it's extremely easy to add basic syntax highlighting in Vim for custom filetypes. Here is the entire syntax file for magiscript:

if exists("b:current_syntax")
  finish
endif

syn keyword directive char end jump label choice endchoice set clear if fi let section event wait

syn match comment '#.*$'
syn region string start='"' end='"'
syn match number '\v(^|\s)\d+'

let b:current_syntax = "magiscript"

hi def link comment Comment
hi def link string Constant
hi def link directive Statement
hi def link number Constant

It does have some slight inaccuracies that I may fix at some point, but it's good enough, and it took like 30 minutes to research and put together.

The parser

The parser's job is to take our script and convert it into bytecode. It's a very simple parser that evaluates the script line by line, converting the different directives to their corresponding opcodes and arguments.

Here is a simplified version of the top-level function in the script parser:

def parse_script(script_path: str) -> tuple[bytes, list[int]]:
    state = ScriptState()
    # Read script lines from file
    with open(script_path, 'r') as f:
        state.script = f.readlines()
    # Iterate through lines
    for line_number, line in enumerate(state.script):
        state.line_number = line_number
        # Remove comments
        line = line.split('#', 1)[0]
        # Skip blank lines (readlines() keeps the trailing newline)
        if not line.strip():
            continue
        directive = line.split()[0]
        # Check for known directive
        if directive in Directives:
            state.output += Directives[directive](state, line)
            continue
        # Check for portrait ID change
        if directive.isdigit():
            state.output += _directive_portrait(state, line)
            state.output += _directive_text(state, line)
            continue
        state.output += _directive_text(state, line)
    # Resolve jumps: patch each placeholder with a signed 16-bit big-endian offset
    for offset, to_label in Jumps.items():
        to_offset = (Labels[to_label] - offset).to_bytes(2, 'big', signed=True)
        state.output[offset + 1] = to_offset[0]
        state.output[offset + 2] = to_offset[1]
    return bytes(state.output), state.section_offsets

The main loop

Basically, after reading the file, the parser does this line by line:

Remove comments.
Check for known directives (e.g. if, jump, char). If found, run the function to handle that directive.
If the "directive" is a decimal number, it changes the portrait ID, and the rest of the line is dialogue text.
If no directive is found, then the line must simply be dialogue text.

Directive functions

Directives are parsed in this line of the loop:

        # Check for known directive
        if directive in Directives:
            state.output += Directives[directive](state, line)
            continue

Which is calling a function from this table:

Directives: dict[str, Callable[[ScriptState, str], bytes]] = {
    'char': _directive_char,
    'end': _directive_end,
    'label': _directive_label,
    'jump': _directive_jump,
    'set': _directive_set,
    'clear': _directive_clear,
    'let': _directive_let,
    'if': _directive_if,
    'fi': _directive_endif,
    'section': _directive_section,
    'event': _directive_event,
    'wait': _directive_wait,
}

Here is an example of one such directive function:

def _directive_char(state: ScriptState, line: str) -> bytes:
    """Sets the character ID for using a specific portrait set and textbox side."""
    args = line.split()
    opcode = Opcodes['char'] # opcode 0x01
    char_id = int(args[1]).to_bytes(1, 'big')
    side = (0xff if args[2] == 'l' else 0).to_bytes(1, 'big')
    return opcode + char_id + side

Pretty simple, right? Each directive function just takes the script line, checks the parameters, and returns some bytes.

Resolving jumps

After the script is done being parsed, we resolve jumps. The jump opcode is really just "Jump to file offset xxxx".

Well, we don't know any file offsets until we actually have the file. So, the script is fully compiled first, with every jump given a placeholder offset of 0000. Then we go back through and replace each placeholder with the real value: in the code above, that's the signed 16-bit distance from the jump opcode to the file offset of the corresponding label.
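Here is a minimal, self-contained sketch of that two-pass idea. It is a toy, not the real parser: the `(kind, argument)` input format and the `JUMP` opcode value are assumptions made up for the example, and the patched value follows the same "label offset minus jump offset" convention as the resolver shown earlier.

```python
JUMP = 0x02  # hypothetical opcode value, chosen only for this sketch


def compile_ops(ops: list[tuple[str, str]]) -> bytes:
    """Compile ("label" | "jump" | "text", argument) pairs into bytes."""
    output = bytearray()
    labels: dict[str, int] = {}  # label name -> byte offset in the output
    jumps: dict[int, str] = {}   # offset of a jump opcode -> target label

    # Pass 1: emit everything, leaving 0000 after each jump opcode.
    for kind, arg in ops:
        if kind == "label":
            labels[arg] = len(output)  # labels emit no bytes at all
        elif kind == "jump":
            jumps[len(output)] = arg
            output += bytes([JUMP, 0, 0])
        else:
            output += arg.encode("ascii")

    # Pass 2: every label offset is known now, so patch the placeholders.
    for offset, name in jumps.items():
        delta = labels[name] - offset
        output[offset + 1:offset + 3] = delta.to_bytes(2, "big", signed=True)
    return bytes(output)
```

Forward and backward jumps both work, because nothing is resolved until the whole output exists.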

Of note, if statements are just conditional jumps. The writer thinks of it as "if condition, run this block". However, the parser converts it to "if NOT condition, jump to the end of this block (fi)". So, just like labels, fi isn't its own opcode. It's more like a temporary unnamed label used for that conditional jump.
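One way to picture the if/fi handling is a stack of pending conditional jumps, sketched below. This is an illustration under assumptions rather than the real implementation: the `JUMPIFALL` opcode value is invented, the argument of `if` is treated as a ready-made bit mask, and alignment padding is ignored. The byte layout (offset, flags, mask, one word each) follows the jumpifall comment in the engine code.

```python
JUMPIFALL = 0x05  # hypothetical opcode value, chosen only for this sketch


def compile_if_blocks(lines: list[str]) -> bytes:
    """Compile `if <mask>` / `fi` / plain text lines into bytes."""
    output = bytearray()
    pending: list[int] = []  # offsets of jumpifall opcodes still waiting for fi

    for line in lines:
        words = line.split()
        if words[0] == "if":
            mask = int(words[1])
            pending.append(len(output))
            # "if NOT condition, jump": jump when the masked flags are clear.
            output += bytes([JUMPIFALL, 0, 0])  # offset placeholder
            output += (0).to_bytes(2, "big")    # flags: expected state (clear)
            output += mask.to_bytes(2, "big")   # mask: which bits to test
        elif words[0] == "fi":
            # fi emits nothing; it only tells the matching jump where to land.
            start = pending.pop()
            delta = len(output) - start
            output[start + 1:start + 3] = delta.to_bytes(2, "big", signed=True)
        else:
            output += line.strip().encode("ascii")

    if pending:
        raise SyntaxError("if without a matching fi")
    return bytes(output)
```

Because the stack pops the most recent `if`, nested blocks resolve correctly without any extra bookkeeping.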

That parser seems pretty dumb and unsafe

Indeed. The script parser is an internal tool that only I am using. Since I know what rules to follow, I don't feel the need for much error handling—especially since it's prone to change at any time. It's one of those "just get it to work" type things.

This could bite me later on. If I make a typo in the script that results in undefined behavior, it might be hard to locate that bug. I think once I'm 100% confident that the syntax and directives are set in stone, I'll go through and harden the parser to prevent such cases.
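For a sense of what that hardening might look like, here is a sketch of a stricter version of the char directive. `ScriptError` and `checked_char` are hypothetical names, and the specific checks are my assumptions about what would be worth validating; the opcode value 0x01 and the side encoding come from the directive function shown earlier.

```python
class ScriptError(Exception):
    """Carries the script line number so a typo is easy to find."""


def checked_char(line: str, line_number: int) -> bytes:
    """Validate a `char <id> <l|r>` line, then encode it (opcode 0x01)."""
    args = line.split()
    where = f"line {line_number + 1}"  # enumerate() counts from 0
    if len(args) != 3:
        raise ScriptError(f"{where}: char takes 2 arguments, got {len(args) - 1}")
    if not args[1].isdecimal() or int(args[1]) > 255:
        raise ScriptError(f"{where}: character ID must be 0-255, got {args[1]!r}")
    if args[2] not in ("l", "r"):
        raise ScriptError(f"{where}: side must be 'l' or 'r', got {args[2]!r}")
    return bytes([0x01, int(args[1]), 0xFF if args[2] == "l" else 0x00])
```

The unchecked version quietly treats anything other than `l` as the right side, so a typo like `char 0 left` would compile without complaint. Here it fails with the line number attached.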

Plus, I will eventually make the whole Magicore build pipeline open-source, so at some point, these tools will need enough safety to be usable by others.

The bytecode

When Magicore reads the compiled script, it doesn't perform any extra logic or preprocessing. It simply reads the script one byte at a time.

Here's the in-game function that reads the next byte in the script and processes it. Ready for some Assembly?

_txt_next_char:
            move.b      #TEXT_SPEED,_txt_NextCharTimer
            move.l      txt_pSection,a1
.n:         move.w      _txt_ScriptOffset,d1
            addq        #1,_txt_ScriptOffset
            moveq       #0,d0
            ; d0 = current char byte
            ; If char is 0 (pad), skip to next char
            move.b      (a1,d1.w),d0
            beq         .n
            cmp.b       #32,d0
            bhs         .literal
            ; Char < 32, so it's an opcode
            add.w       d0,d0
            lea         txt_jOpcodes,a0
            move.w      (a0,d0.w),d0
            jmp         (a0,d0.w)
            ; Char is a literal
            ; If space (32), look ahead to see if we need to wrap word
.literal:   sub.b       #32,d0
            beq         .lookahead
            ; Add new char to buffer
.addchar:   move.w      d0,-(sp)
            jsr         tchr_add
            move.w      (sp)+,d0
            ; Update HPos with char width
            bsr         _txt_get_char_width
            add.w       d0,txt_HPos
            ; Play SFX
            bsr         _txt_play_sfx
            rts

Even if you're not great with Assembly, the comments should help you get the general idea.

Opcodes

Byte values 0-31 are the opcodes. If you know your ASCII, you know that printable characters start at 32, so it ends up being quite efficient to repurpose values 0-31 in this way.

I use a jump table (basically a switch statement) to run a different function for each of the unique opcode values. But if the value is 32 or higher, the engine instead buffers that character to be rendered.
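If assembly isn't your thing, the same dispatch idea can be sketched in Python. This is a simplified model, not a port of the engine: it runs the whole script in one go instead of one character per tick, the handler names are made up, and the ctc opcode value (0x03) is an assumption. Only the char opcode (0x01, two parameter bytes) is taken from the parser code above.

```python
from typing import Callable

# A handler receives the script, the position just after the opcode byte,
# and an output log. It returns the position to continue reading from.
Handler = Callable[[bytes, int, list[str]], int]


def op_char(script: bytes, pos: int, log: list[str]) -> int:
    side = "left" if script[pos + 1] == 0xFF else "right"
    log.append(f"<char {script[pos]} {side}>")
    return pos + 2  # skip the two parameter bytes


def op_ctc(script: bytes, pos: int, log: list[str]) -> int:
    log.append("<ctc>")
    return pos  # no parameters to skip


OPCODES: dict[int, Handler] = {0x01: op_char, 0x03: op_ctc}


def run(script: bytes) -> list[str]:
    log: list[str] = []
    pos = 0
    while pos < len(script):
        byte = script[pos]
        pos += 1
        if byte == 0:        # padding byte: ignore it
            continue
        if byte < 32:        # opcode: dispatch through the table
            pos = OPCODES[byte](script, pos, log)
        else:                # literal character: buffer it for rendering
            log.append(chr(byte))
    return log
```

For example, `run(b"\x01\x00\xffHi\x00\x03")` produces a char change, the two letters, and a ctc, with the padding byte skipped.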

Here is an example of how the char directive is processed in-game:

_txt_directive_char:
            addq.w      #2,_txt_ScriptOffset
            ; Character ID (portrait set)
            move.b      1(a1,d1.w),d0
            ; Side ($00 right, $ff left)
            move.b      2(a1,d1.w),d1
            ext.w       d0
            ext.w       d1
            jsr         prt_change_set
            bsr         _txt_cr
            bra         _txt_next_char

Since the script is read byte by byte, different opcodes can easily have different parameter lengths. The char opcode has 2 bytes of parameters: the character ID (which selects the portrait set), and which side the portrait goes on. That's why we add 2 to the script offset: those 2 bytes are being used right now, so the engine should skip over them the next time a byte is read.

Meanwhile, an opcode like ctc needs no arguments. The parser inserts the ctc opcode at the end of every dialogue line, which tells the game engine to pause the script, and continue after the fire button is pressed.

_txt_directive_ctc:
            move.w      #TXT_STATE_IDLE,txt_State
            bra         ctc_enable

As of now, the jumpif opcodes have the most arguments, at 6 bytes (three words): the offset to jump to, the flag states to test against, and a mask selecting which level flags are relevant.

; Jump if all of the flags are set/clear
; opcode (byte), jump offset (word), flags (word), mask (word)
; Mask: Which bits are relevant
; Flags: Flag state (on or off) to test against
_txt_directive_jumpifall:
            move.w      lvl_Flags,d0
            ; d2 = flags, d3 = mask
            move.w      3(a1,d1.w),d2
            move.w      5(a1,d1.w),d3
            eor.w       d0,d2
            and.w       d2,d3
            ; Jump if all flags within the mask match
            beq         _txt_directive_jump
            ; Otherwise, just move on
.nojump:    addq.w      #6,_txt_ScriptOffset
            bra         _txt_next_char
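The flag test in that routine boils down to one XOR and one AND. Here is the same check as a small Python function (`flags_match` is a name invented for this sketch), which may be easier to reason about than the register shuffling:

```python
def flags_match(level_flags: int, flags: int, mask: int) -> bool:
    """True when every bit selected by mask has the same state in both words."""
    # XOR leaves a 1 wherever the two words differ; AND keeps only the bits
    # we care about. Zero means every relevant flag matched.
    return ((level_flags ^ flags) & mask) & 0xFFFF == 0
```

With a mask of `0b0011`, for instance, only the lowest two flags are compared and all the others are ignored. An empty mask always matches.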

Raw text

If a byte in the script is value 32 or higher, that means it's a literal ASCII character for us to draw, so we add it to the buffer.

Lookahead at runtime

The only time the engine actually looks ahead at upcoming data is when inserting a space. The reason for this is word wrapping. Is the next word going to fit on the current line? If not, we insert a newline instead of a space.
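A rough Python model of that lookahead is below. It is only a sketch: `wrap_text` and its width table are hypothetical, it works on a plain string rather than bytecode (where the lookahead would presumably also stop at an opcode), and I'm assuming the space itself counts toward the width needed.

```python
def wrap_text(text: str, widths: dict[str, int], line_width: int) -> str:
    """Swap a space for a newline whenever the next word would not fit."""
    out: list[str] = []
    hpos = 0  # horizontal position on the current line, in pixels
    for i, ch in enumerate(text):
        if ch == " ":
            # Look ahead: measure the upcoming word, up to the next space.
            next_word = text[i + 1:].split(" ", 1)[0]
            needed = widths[" "] + sum(widths[c] for c in next_word)
            if hpos + needed > line_width:
                out.append("\n")
                hpos = 0
                continue
        out.append(ch)
        hpos += widths[ch]
    return "".join(out)
```

Because the widths come from a per-character table, a line of narrow letters can hold more characters than a line of wide ones.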

In theory, we could do word wrapping at compile time instead of at runtime. If the parser knows what font is being used, and the width of every character in the font, it could preemptively figure out the newlines and save us some runtime code.

But I don't know if I like that. Text wrapping is also based on the padding of the textbox, and whether a portrait is being shown. Yes, those things are also known ahead of time, but it feels more future-proof to let the game engine handle text wrapping. I'll keep it that way unless it's taking a toll on performance in key scenes.

The renderer

Finally, we have to render the text. It's fairly complicated, so I won't go into full detail.

The font

The font used for Magicore dialogue is not final—I think it looks cool, but it has some readability issues.

In any case, I wanted my build system to support easily changing fonts. A font in Magicore is basically just a bitmap image with all the characters, and a table containing character widths (so each character takes only the space it needs, rather than a fixed-width cell).

There are plenty of external tools to generate bitmaps of fonts, but I got tired of manually generating font bitmaps while experimenting with different fonts.

Thankfully, Pillow supports rendering text to an image using ImageFont. That means as part of my build system, I can pass in any TTF font and render all the characters to a bitmap.

The video memory

Amiga doesn't have a static region of video memory—you get to give it a pointer to anywhere in Chip RAM, and whatever is there will be displayed.

That means I can have separate screen memory for the textbox and the main playfield. Using the Copper (Amiga's parallel coprocessor), I first provide a pointer to the textbox as the screen memory. Then, the Copper waits for line 56 (the bottom of the textbox) before switching the screen pointer back to the main playfield.
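To make that a bit more concrete, here is a Python sketch that builds the 16-bit words of a Copper list for such a split. Treat it as an illustration under assumptions: the helper names are invented, it only sets the pointer for the first bitplane (a real list would set one per bitplane), and it assumes "line 56" maps directly to the Copper's vertical position, which depends on where the display window starts. The instruction encodings and the BPL1PTH/BPL1PTL register offsets are the standard Amiga ones.

```python
BPL1PTH = 0x0E0  # bitplane 1 pointer, high word (custom chip register offset)
BPL1PTL = 0x0E2  # bitplane 1 pointer, low word


def copper_move(register: int, value: int) -> list[int]:
    """MOVE: write a 16-bit value to a custom chip register."""
    return [register & 0x01FE, value & 0xFFFF]


def copper_wait(vpos: int) -> list[int]:
    """WAIT: stall the Copper until the beam reaches the given line."""
    return [((vpos & 0xFF) << 8) | 0x01, 0xFFFE]


def set_bitplane_pointer(address: int) -> list[int]:
    """Point bitplane 1 at a Chip RAM address (two MOVEs, high then low)."""
    return copper_move(BPL1PTH, address >> 16) + copper_move(BPL1PTL, address)


def split_screen(textbox: int, playfield: int, split_line: int) -> list[int]:
    """Show the textbox memory at the top, then switch to the playfield."""
    return (
        set_bitplane_pointer(textbox)
        + copper_wait(split_line)
        + set_bitplane_pointer(playfield)
        + [0xFFFF, 0xFFFE]  # conventional end-of-list marker
    )
```

The whole split is just a handful of words that the Copper replays every frame, with no CPU involvement once the list is in place.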

Pretty cool, right? Another way to think about it is hardware-supported splitscreen.

The colors

Earlier, I mentioned we have a buffer of characters to render. Why do we need a buffer, rather than just rendering each character immediately?

The reason is that we have a visual effect where each character fades in over a few frames. It's subtle, but you can see it if you look closely at the text in the video. It looks quite a bit better than having the characters appear instantly at full white.

The textbox uses 5 bitplanes to achieve 32 colors (to learn more about bitplanes, see my last post).

The textbox colors are distributed like this:

00      BG (black)
01      Text frame 1 (dark gray)
02      Text frame 3 (white)
03      Text frame 2 (light gray)
04      Text emphasis (red)
05-31   Portrait image

When a new character is added to the buffer, it gets drawn over 3 frames:

Blit to bitplane 0 (color 1 = dark gray)
Blit to bitplane 1 (color 3 = light gray)
Erase from bitplane 0 (color 2 = white)
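The reason this works is that a pixel's color index is built from one bit per bitplane, with bitplane 0 as the lowest bit. A tiny Python model of a single pixel (the function names are made up for this sketch) shows the three steps landing on colors 1, 3, and 2 in that order:

```python
PALETTE = {0: "black", 1: "dark gray", 2: "white", 3: "light gray"}


def pixel_color(plane0: bool, plane1: bool) -> int:
    """Combine one bit from each bitplane into a color index."""
    return int(plane0) | (int(plane1) << 1)


def fade_in_colors() -> list[int]:
    """Color index of a text pixel after each of the three fade-in steps."""
    plane0 = plane1 = False
    colors = []
    plane0 = True                      # frame 1: blit to bitplane 0
    colors.append(pixel_color(plane0, plane1))
    plane1 = True                      # frame 2: blit to bitplane 1
    colors.append(pixel_color(plane0, plane1))
    plane0 = False                     # frame 3: erase from bitplane 0
    colors.append(pixel_color(plane0, plane1))
    return colors
```

That also explains the slightly odd palette order: white sits at index 2 and light gray at index 3 so that each step only has to touch a single bitplane.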

Any emphasis characters (drawn in red) don't have this fade-in effect. I wouldn't mind adding it, but it depends on how many colors we want to reserve for the portrait image. Right now, the portrait gets 27 colors, but the assets so far only use about 20 colors, so we might have a few colors to spare. It just depends on the color complexity of future portraits.

Wrapping up

The dialogue system is a good example of everything that goes into both the build pipeline and runtime functionality. I've been working on Magicore and its game engine as a side project for over 2 years now. Hopefully, this gives you a good understanding of what that time goes into (and why I don't have a release date yet).