Magicore Anomala is powered largely by the Amiga's blitter, allowing me to quickly clear the screen and draw hundreds of objects every frame at a full 60fps. The blitter runs in parallel with the CPU and excels at copying or manipulating large blocks of data.
But the blitter goes above and beyond the functionality of simply hauling bits around. You can shift, mask, and logically combine up to three independent sources anywhere in shared memory.
Today I'll show you how Magicore uses the copper and blitter to convert and copy a 24-bit RGB color palette into the Amiga's 12-bit color registers, every frame, using zero CPU cycles.
Quick blitter intro
"Blit" stands for "block transfer"—in other words, transferring a block of memory from a source to a destination.
Jay Miner, the "father of the Amiga", insisted on calling the Amiga's blitter a "bimmer" (which evidently did not catch on). He wanted to distinguish it as a "bitmap image manipulator" because it could shift and combine up to three independent sources in many unique ways.
As you'll soon see, we can use the blitter (bimmer?) to do some clever bit shifting, masking, and ORing on a big chunk of memory—something the CPU would strain to do.
Our RGB data
I have a palette of 32 colors, and each color is stored with 8-bit RGB values.
To make this work, I technically store it as GRB (swapped green and red). In memory, each color looks like 00GgRrBb, and we want to convert them to 0RGB (discarding the low 4 bits of each color).
I'm storing the palette as 8-bit colors because I'm doing some color effects like additive blending and converting HSV to RGB. These operations are much easier on byte-aligned values, especially for our 7MHz CPU.
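To make the layout concrete, here is a small Python model of the storage format and the conversion we're after. It only illustrates the bit arithmetic and is not the game's code; the names `pack_grb8` and `to_rgb4` are made up for this sketch.

```python
def pack_grb8(r, g, b):
    """Pack three 8-bit channels into one 32-bit 00GgRrBb longword."""
    return (g << 16) | (r << 8) | b


def to_rgb4(grb8):
    """Keep the high nibble of each channel and arrange them as 0RGB."""
    g = (grb8 >> 20) & 0xF   # high nibble of Gg
    r = (grb8 >> 12) & 0xF   # high nibble of Rr
    b = (grb8 >> 4) & 0xF    # high nibble of Bb
    return (r << 8) | (g << 4) | b


color = pack_grb8(r=0xAB, g=0xCD, b=0xEF)
print(f"{color:08X}")           # 00CDABEF
print(f"{to_rgb4(color):04X}")  # 0ACE
```

Note that the upper word (00Gg) already has green's high nibble sitting in the position it needs in 0RGB, which is presumably why swapping green and red "makes this work".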
The blit operation
Here is the blit operation to convert our 00GgRrBb to 0RGB:
; 00GgRrBb -> 0RGB
; A points to RrBb, B points to 00Gg
; C has no source, BLTCDAT loaded with $00f0 as a constant
; 1. Mask A (R0B0), shift A 4 bits (0R0B)
; 2. D = A + BC, i.e. 0R0B | (00Gg & $00f0)
dc.w $0001,$0000 ;wait for blitter
dc.w BLTCON0,$4df8 ;4: shift A, d: use ABD, f8: D = A + BC
dc.w BLTCON1,$0000
dc.w BLTCDAT,$00f0 ;C is always $00f0 to mask B
dc.w BLTAPTH,0 ;GRB8 color source +2 (RrBb)
dc.w BLTAPTL,0
dc.w BLTBPTH,0 ;GRB8 color source (00Gg)
dc.w BLTBPTL,0
dc.w BLTDPTH,0 ;Copperlist for color registers
dc.w BLTDPTL,0
dc.w BLTAFWM,$f0f0 ;mask out color low bits
dc.w BLTALWM,$f0f0
dc.w BLTAMOD,2
dc.w BLTBMOD,2
dc.w BLTDMOD,2
dc.w BLTSIZE,32<<6+1 ;blit 32 lines of 1 word each
The above is done using copper (coprocessor) instructions. Note that the copper can only write to the blitter registers when the CDANG bit in COPCON is set. Here is an equivalent using the CPU:
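_scr_blit_colors:
    ; Blit color registers
    ; Unlike the copper list, this does not wait for the blitter, so it must be idle when called
    move.l  #$4df80000,BLTCON0(a6)  ;BLTCON0=$4df8, BLTCON1=$0000
    move.w  #$00f0,BLTCDAT(a6)      ;C is always $00f0 to mask B
    move.l  #CopColor+2,BLTDPTH(a6) ;Copperlist for color registers (+2 skips the register word)
    lea     cfx_WorkingColors,a0
    move.l  a0,BLTBPTH(a6)          ;GRB8 color source (00Gg)
    addq.l  #2,a0
    move.l  a0,BLTAPTH(a6)          ;GRB8 color source +2 (RrBb)
    move.l  #$f0f0f0f0,BLTAFWM(a6)  ;BLTAFWM and BLTALWM: mask out color low bits
    moveq   #2,d0
    move.w  d0,BLTAMOD(a6)
    move.w  d0,BLTBMOD(a6)
    move.w  d0,BLTDMOD(a6)
    move.w  #32<<6+1,BLTSIZE(a6)    ;blit 32 lines of 1 word each (this write starts the blit)
    rts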
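If the magic numbers $4df8 and 32<<6+1 look opaque, this Python sketch rebuilds them from their parts. The bit positions follow the standard BLTCON0 and BLTSIZE layouts; the helper names are invented for illustration.

```python
def minterm_byte(fn):
    """Build the 8-bit logic function code.

    Bit (A*4 + B*2 + C) is set whenever fn(A, B, C) is true.
    """
    code = 0
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                if fn(a, b, c):
                    code |= 1 << (a * 4 + b * 2 + c)
    return code


def bltcon0(shift_a, use_a, use_b, use_c, use_d, lf):
    """Assemble BLTCON0: A shift in bits 15-12, channel enables in 11-8, minterms in 7-0."""
    return (shift_a << 12) | (use_a << 11) | (use_b << 10) | (use_c << 9) | (use_d << 8) | lf


def bltsize(height, width_words):
    """Assemble BLTSIZE: height in bits 15-6, width (in words) in bits 5-0."""
    return (height << 6) | width_words


lf = minterm_byte(lambda a, b, c: a | (b & c))   # D = A + BC
print(f"{lf:02X}")                               # F8
print(f"{bltcon0(4, 1, 1, 0, 1, lf):04X}")       # 4DF8
print(f"{bltsize(32, 1):04X}")                   # 0801
```

The C channel is disabled (use_c = 0) yet still takes part in the logic function, because the blitter uses whatever value was preloaded into BLTCDAT.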
A more readable breakdown
Let's walk through what happens step by step:
- A gets loaded with RrBb (from source memory)
- B gets loaded with 00Gg (from source memory)
- C gets loaded with 00f0 (as a constant)
- A gets masked with f0f0. A is now R0B0
- A gets shifted 4 bits. A is now 0R0B
- The minterms kick in to combine A, B, and C:
  - B intersects C, combining 00Gg with 00f0 to give us 00G0
  - A unions BC, combining 0R0B with 00G0 to give us 0RGB
- The result 0RGB gets written to destination D
- A, B, and D move 2 bytes forward but have reached the end of the line, so they move another 2 bytes forward (because of our modulo 2), which brings them to the next color entry
This all happens in 8 cycles. That's the power of the Amiga blitter!
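The steps above can be modeled in a few lines of Python. This is a simplified sketch rather than a cycle-accurate emulation: memory is a flat list of 16-bit words, pointers are word indexes, and it assumes that bits shifted out of one A word carry into the next, which is how the A shifter is generally documented. The function name `blit_colors` and the sample data are made up.

```python
AFWM = ALWM = 0xF0F0   # first/last word masks (both apply to a 1-word line)
BLTCDAT = 0x00F0       # constant C value used to mask B
SHIFT = 4              # A shift from BLTCON0
MOD_WORDS = 1          # modulo of 2 bytes = 1 word


def blit_colors(src, dst, count):
    """src: [00Gg, RrBb, ...] words; dst: copperlist [register, value, ...] words."""
    a_ptr, b_ptr, d_ptr = 1, 0, 1      # A = source+2 bytes, B = source, D = copperlist+2 bytes
    prev_a = 0                         # previous masked A word, feeds the shifter
    for _ in range(count):
        a = src[a_ptr] & AFWM & ALWM                         # RrBb -> R0B0
        shifted = (((prev_a << 16) | a) >> SHIFT) & 0xFFFF   # R0B0 -> 0R0B
        prev_a = a
        dst[d_ptr] = shifted | (src[b_ptr] & BLTCDAT)        # D = A + BC
        # One word per line, then the modulo skips to the next entry.
        a_ptr += 1 + MOD_WORDS
        b_ptr += 1 + MOD_WORDS
        d_ptr += 1 + MOD_WORDS
    return dst


src = [0x00CD, 0xABEF, 0x0012, 0x3456]   # two colors stored as 00GgRrBb
dst = [0x0180, 0x0000, 0x0182, 0x0000]   # COLOR00 and COLOR01 moves, values empty
blit_colors(src, dst, 2)
print(" ".join(f"{w:04X}" for w in dst))  # 0180 0ACE 0182 0315
```

The model also shows why the mask has to come before the shift: the masked-off low nibble is exactly the part that would otherwise spill into the top nibble of the next color.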
Copper vs. CPU
Above, I gave examples of performing this blit using either the copper or the CPU. The CPU version takes about 64 cycles, which is perfectly reasonable for a 68000.
The copper version takes 0 CPU cycles, because it's the exact same set of instructions every frame. No logic needs to be performed to set up the copper instructions, except for the very first time. Only a small optimization over the CPU version, but it feels cool!
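That one-time setup mostly comes down to filling in the pointer words that were left as 0 in the copper list. Here is a hedged Python sketch of the idea, treating the copper list as a flat list of words. The register offsets are the standard custom chip offsets, while `patch_pointer` and the two addresses are hypothetical.

```python
# Custom chip register offsets for the blitter pointers
BLTBPTH, BLTBPTL = 0x04C, 0x04E
BLTAPTH, BLTAPTL = 0x050, 0x052
BLTDPTH, BLTDPTL = 0x054, 0x056


def patch_pointer(copperlist, reg_high, address):
    """Split a 32-bit address across a pair of high/low copper moves.

    copperlist is a flat list of words: [register, value, register, value, ...].
    """
    for i in range(0, len(copperlist) - 3, 2):
        if copperlist[i] == reg_high and copperlist[i + 2] == reg_high + 2:
            copperlist[i + 1] = (address >> 16) & 0xFFFF   # high word
            copperlist[i + 3] = address & 0xFFFF           # low word
            return
    raise ValueError(f"register pair ${reg_high:03X} not found")


cop = [BLTAPTH, 0, BLTAPTL, 0,
       BLTBPTH, 0, BLTBPTL, 0,
       BLTDPTH, 0, BLTDPTL, 0]

palette_addr = 0x00012340     # hypothetical address of the GRB8 palette
cop_color_addr = 0x00020000   # hypothetical address of the color moves

patch_pointer(cop, BLTBPTH, palette_addr)          # B reads 00Gg
patch_pointer(cop, BLTAPTH, palette_addr + 2)      # A reads RrBb
patch_pointer(cop, BLTDPTH, cop_color_addr + 2)    # D writes the value words
```

Once the addresses are in place, nothing in the list changes from frame to frame, so the CPU never has to touch it again.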
A minor downside
Only one component at a time can access shared memory. If the blitter is running, the CPU has to wait its turn to access shared memory, whether that's reading or writing data or simply fetching its next instruction.
By default, the blitter gives every 4th DMA cycle to the CPU. Thankfully, this blit is quite small (only 32 words), so we're not tied up for too long. For larger blits, you can try to structure your program so that your most expensive CPU instructions happen during the blit. A single multiply or divide can take 70 or more CPU cycles, so it's a great time for the CPU to be doing that, rather than repeatedly stalling while it waits to fetch its next instruction.
If your Amiga has "Fast RAM" then this is no issue, because Fast RAM is CPU-only and doesn't have to be shared among all the other chips.
Blitter fun
One of the most fun parts of working on Magicore is leveraging Amiga-specific features to make the game do cool stuff. The blitter is so powerful that it just makes me hope I come across all kinds of unique use cases for it. Maybe it gets your imagination going, too.