Magicore Anomala is powered largely by the Amiga's blitter, allowing me to quickly clear the screen and draw hundreds of objects every frame at a full 60fps. It runs in parallel with the CPU and excels at copying or manipulating large blocks of data.
But the blitter goes above and beyond the functionality of simply hauling bits around. You can shift, mask, and logically combine up to three independent sources anywhere in shared memory.
Today I'll show you how Magicore uses the copper and blitter to convert and copy a 24-bit RGB color palette into the Amiga's 12-bit color registers, every frame, using zero CPU cycles.
Quick blitter intro
"Blit" stands for "block transfer"—in other words, transferring a block of memory from a source to a destination.
Jay Miner, the "father of the Amiga", insisted on calling the Amiga's blitter a "bimmer" (which evidently did not catch on). He wanted to distinguish it as a "bitmap image manipulator" because it could shift and combine up to three independent sources in many unique ways.
As you'll soon see, we can use the blitter (bimmer?) to do some clever bit shifting, masking, and ORing on a big chunk of memory—something the CPU would strain to do.
Our RGB data
I have a palette of 32 colors, and each color is stored with 8-bit RGB values.
To make this work, I technically store it as GRB (swapped green and red). In memory, each color looks like 00GgRrBb, and we want to convert it to 0RGB (discarding the low 4 bits of each channel).
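For reference, here is what the conversion computes for a single color. This is a plain Python sketch, not the game's assembly. It assumes each palette entry is a big-endian 32-bit longword laid out as 00GgRrBb, and the function name grb8_to_rgb4 is made up for illustration.

```python
def grb8_to_rgb4(color: int) -> int:
    """Convert one 00GgRrBb longword to a 12-bit 0RGB color register value."""
    g = (color >> 16) & 0xFF  # byte 1: green
    r = (color >> 8) & 0xFF   # byte 2: red
    b = color & 0xFF          # byte 3: blue
    # Keep only the high nibble of each 8-bit channel, then pack as 0RGB.
    return ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4)


# Green $12, red $AB, blue $34 -> $0A13
print(hex(grb8_to_rgb4(0x0012AB34)))  # 0xa13
```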
I'm storing the palette as 8-bit colors because I'm doing some color effects like additive blending and converting HSV to RGB. These operations are much easier on byte-aligned values, especially for our 7MHz CPU.
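To illustrate why whole-byte channels are convenient, here is a hypothetical saturating additive blend of two 00GgRrBb colors. It is only a sketch of the general idea, not Magicore's blending code. Each channel is a full byte, so it can be pulled out, added, and clamped on its own, with no nibble juggling.

```python
def add_saturating(c1: int, c2: int) -> int:
    """Additively blend two 00GgRrBb colors, clamping each 8-bit channel at $FF."""
    out = 0
    for shift in (16, 8, 0):  # green, red, blue bytes
        total = ((c1 >> shift) & 0xFF) + ((c2 >> shift) & 0xFF)
        out |= min(total, 0xFF) << shift
    return out


# Green $80+$90 clamps to $FF; red $80+$40 = $C0; blue $80+$10 = $90
print(hex(add_saturating(0x00808080, 0x00904010)))  # 0xffc090
```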
The blit operation
Here is the blit operation to convert our 00GgRrBb colors to 0RGB:
```asm
; 00GgRrBb -> 0RGB
; A points to RrBb, B points to 00Gg
; C has no source, BLTCDAT loaded with $00f0 as a constant
; 1. Mask A (R0B0), shift A 4 bits (0R0B)
; 2. D = A + BC, i.e. 0R0B | (00Gg & $00f0)
        dc.w    $0001,$0000         ;wait for blitter
        dc.w    BLTCON0,$4df8       ;4: shift A, d: use ABD, f8: D = A + BC
        dc.w    BLTCON1,$0000
        dc.w    BLTCDAT,$00f0       ;C is always $00f0 to mask B
        dc.w    BLTAPTH,0           ;GRB8 color source +2 (RrBb)
        dc.w    BLTAPTL,0
        dc.w    BLTBPTH,0           ;GRB8 color source (00Gg)
        dc.w    BLTBPTL,0
        dc.w    BLTDPTH,0           ;Copperlist for color registers
        dc.w    BLTDPTL,0
        dc.w    BLTAFWM,$f0f0       ;mask out color low bits
        dc.w    BLTALWM,$f0f0
        dc.w    BLTAMOD,2
        dc.w    BLTBMOD,2
        dc.w    BLTDMOD,2
        dc.w    BLTSIZE,(32<<6)+1   ;blit 32 lines of 1 word each
```
The above is done using copper (coprocessor) instructions. Here is an equivalent using the CPU:
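```asm
; Blit color registers
; a6 = custom chip base ($dff000); the blitter must already be idle
_scr_blit_colors:
        move.l  #$4df80000,BLTCON0(a6)  ; BLTCON0 = $4df8, BLTCON1 = $0000
        move.w  #$00f0,BLTCDAT(a6)      ; constant C
        move.l  #CopColor+2,BLTDPTH(a6) ; D = first color value in the copperlist
        lea     cfx_WorkingColors,a0
        move.l  a0,BLTBPTH(a6)          ; B = 00Gg
        addq    #2,a0
        move.l  a0,BLTAPTH(a6)          ; A = RrBb
        move.l  #$f0f0f0f0,BLTAFWM(a6)  ; BLTAFWM and BLTALWM
        moveq   #2,d0
        move.w  d0,BLTAMOD(a6)
        move.w  d0,BLTBMOD(a6)
        move.w  d0,BLTDMOD(a6)
        move.w  #(32<<6)+1,BLTSIZE(a6)  ; writing BLTSIZE starts the blit
        rts
```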
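The single BLTCON0 value $4df8 packs three settings into one word. As a sketch in Python (so it runs anywhere), here is how it breaks down under the documented OCS BLTCON0 bit layout, along with a bit-by-bit evaluator for the 8-bit minterm. Both function names are hypothetical.

```python
def decode_bltcon0(value: int) -> dict:
    """Split a BLTCON0 word into its A shift, DMA channel enables, and minterm."""
    return {
        "ash": (value >> 12) & 0xF,     # bits 15-12: shift A right by this many bits
        "use_a": bool(value & 0x0800),  # bit 11: fetch A from memory
        "use_b": bool(value & 0x0400),  # bit 10: fetch B from memory
        "use_c": bool(value & 0x0200),  # bit 9: fetch C from memory
        "use_d": bool(value & 0x0100),  # bit 8: write D to memory
        "minterm": value & 0xFF,        # bits 7-0: logic function of A, B, C
    }


def apply_minterm(minterm: int, a: int, b: int, c: int) -> int:
    """Evaluate an 8-bit minterm bit by bit over three 16-bit words."""
    result = 0
    for bit in range(16):
        # Each (A, B, C) bit combination selects one minterm bit:
        # index 7 = ABC, 6 = AB~C, 5 = A~BC, ..., 0 = ~A~B~C.
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        if (minterm >> index) & 1:
            result |= 1 << bit
    return result


fields = decode_bltcon0(0x4DF8)
print(fields)
# {'ash': 4, 'use_a': True, 'use_b': True, 'use_c': False, 'use_d': True, 'minterm': 248}

# A already masked and shifted (0R0B), B = 00Gg, C = the $00f0 constant
print(hex(apply_minterm(fields["minterm"], 0x0A03, 0x0012, 0x00F0)))  # 0xa13
```

Since use_c is off, the C channel is never fetched from memory, and the blitter uses whatever is already in BLTCDAT. That is why the constant $00f0 can live there. Minterm $f8 sets bits 7 through 3: every combination where A is 1, plus ~ABC. That works out to A + BC.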
A more readable breakdown
Let's walk through what happens step by step:
- A gets loaded with RrBb (from source memory)
- B gets loaded with 00Gg (from source memory)
- C gets loaded with 00f0 (as a constant)
- A gets masked with f0f0. A is now R0B0
- A gets shifted right 4 bits. A is now 0R0B
- The minterms kick in to combine A, B, and C:
  - B intersects C, combining 00Gg and 00f0 to give us 00G0
  - A unions BC, combining 0R0B and 00G0 to give us 0RGB
- The result 0RGB gets written to destination D
- A, B, and D move 2 bytes forward but have reached the end of the line, so they move another 2 bytes forward (because of our modulo of 2). This brings A and B to the next color entry, and D to the next color value in the copperlist
This all happens in 8 cycles. That's the power of the Amiga blitter!
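To make the walkthrough concrete, here is a small Python model of this particular blit. It is one word wide, masks A and shifts it right by 4, reads B directly, holds C at the $00f0 constant, and computes minterm $f8 as A | (B & C). This is a simplified sketch, not an emulator. It ignores DMA timing and assumes the bits carried into the A shifter start at zero. The function names, memory layout, and addresses are made up for illustration.

```python
def read_word(mem: bytearray, addr: int) -> int:
    """Read a big-endian 16-bit word, as the 68000 and custom chips see it."""
    return int.from_bytes(mem[addr:addr + 2], "big")


def write_word(mem: bytearray, addr: int, value: int) -> None:
    mem[addr:addr + 2] = value.to_bytes(2, "big")


def blit_palette(mem: bytearray, apt: int, bpt: int, dpt: int, height: int,
                 cdat: int = 0x00F0, ash: int = 4,
                 afwm: int = 0xF0F0, alwm: int = 0xF0F0,
                 amod: int = 2, bmod: int = 2, dmod: int = 2) -> None:
    """Model a 1-word-wide ABD blit with minterm $F8 (D = A + BC)."""
    prev_a = 0  # assumption: the A shifter's carried-in bits start at zero
    for _ in range(height):
        # One word per line is both the first and last word, so both masks apply.
        a = read_word(mem, apt) & afwm & alwm
        b = read_word(mem, bpt)
        # Shift A right; bits entering at the top come from the previous masked A word.
        shifted = (((prev_a << 16) | a) >> ash) & 0xFFFF
        prev_a = a
        # Minterm $F8: D = A | (B & C), with C fixed at the BLTCDAT constant.
        write_word(mem, dpt, shifted | (b & cdat))
        # Advance one word, then skip the modulo (the other half of each entry).
        apt += 2 + amod
        bpt += 2 + bmod
        dpt += 2 + dmod


# Hypothetical layout: palette at $000 (4 bytes per color), copper MOVEs at $100.
SRC, COP = 0x000, 0x100
palette = [0x0012AB34, 0x00FFFFFF, 0x00000000, 0x0080C0F0]
mem = bytearray(0x200)
for i, color in enumerate(palette):
    mem[SRC + 4 * i:SRC + 4 * i + 4] = color.to_bytes(4, "big")
    write_word(mem, COP + 4 * i, 0x180 + 2 * i)  # MOVE target: COLOR00, COLOR01, ...

# A -> RrBb (SRC+2), B -> 00Gg (SRC), D -> MOVE data words (COP+2)
blit_palette(mem, apt=SRC + 2, bpt=SRC, dpt=COP + 2, height=len(palette))
print([hex(read_word(mem, COP + 2 + 4 * i)) for i in range(len(palette))])
# ['0xa13', '0xfff', '0x0', '0xc8f']
```

The mask matters for more than dropping the low nibbles. Because each masked A word ends in a zero nibble, the 4 bits shifted into the next color's top nibble are always zero.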
Copper vs. CPU
Above, I gave examples of performing this blit using either the copper or the CPU. The CPU version takes about 64 cycles, which is perfectly reasonable for a 68000.
The copper version costs the CPU 0 cycles, because it's the exact same set of instructions every frame. No logic needs to run to set up the copper instructions, except the very first time. It's only a small optimization over the CPU version, but it feels cool!
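The one-time setup amounts to writing the source and destination addresses into the data words of the copperlist's blitter-pointer MOVEs. Here is a hypothetical Python sketch of that patch. It treats the copperlist as a flat list of 16-bit words, in the same (register, value) pairs as the dc.w lines. The register offsets are the standard custom-chip offsets. The helper name and addresses are invented.

```python
# Standard custom-chip register offsets (from $DFF000) for the blitter pointers.
BLTBPTH, BLTBPTL = 0x04C, 0x04E
BLTAPTH, BLTAPTL = 0x050, 0x052
BLTDPTH, BLTDPTL = 0x054, 0x056


def patch_copper_pointer(coplist: list, reg_hi: int, address: int) -> None:
    """Write a 32-bit address into the MOVEs targeting reg_hi and reg_hi + 2.

    coplist is a flat list of 16-bit words in (first word, second word) pairs.
    MOVE first words are even register offsets; WAIT/SKIP first words have
    bit 0 set, so they never match.
    """
    for i in range(0, len(coplist) - 1, 2):
        if coplist[i] == reg_hi:
            coplist[i + 1] = (address >> 16) & 0xFFFF  # high word
        elif coplist[i] == reg_hi + 2:
            coplist[i + 1] = address & 0xFFFF          # low word


coplist = [
    0x0001, 0x0000,  # WAIT for blitter
    BLTAPTH, 0, BLTAPTL, 0,
    BLTBPTH, 0, BLTBPTL, 0,
    BLTDPTH, 0, BLTDPTL, 0,
]
palette_addr = 0x00021000  # hypothetical chip RAM address of the GRB8 palette
colors_addr = 0x00030402   # hypothetical address of the first color MOVE's data word

# One-time setup; afterwards the copper replays these words unchanged every frame.
patch_copper_pointer(coplist, BLTAPTH, palette_addr + 2)  # A -> RrBb
patch_copper_pointer(coplist, BLTBPTH, palette_addr)      # B -> 00Gg
patch_copper_pointer(coplist, BLTDPTH, colors_addr)       # D -> color values
print([hex(w) for w in coplist])
```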
A minor downside
Only one component at a time can access shared memory. If the blitter is running, that means the CPU has to wait its turn to read from shared memory—whether that's reading/writing data, or simply fetching CPU instructions.
By default, the blitter gives every 4th DMA cycle to the CPU. Thankfully, this blit is quite small (only 32 words), so we're not tied up for long. For larger blits, you can try to structure your program so that your most expensive CPU instructions happen during the blit. A single multiply or divide can take 70 or more CPU cycles, most of which are spent computing internally rather than accessing memory. That makes it a great thing for the CPU to be doing while the blitter runs, rather than repeatedly waiting to fetch its next instruction.
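As a rough illustration of that arbitration, here is a toy Python model. It is not cycle-accurate. It assumes the blitter yields one chip-bus slot to a waiting CPU after every three consecutive slots it uses. It ignores bitplane, sprite, copper, audio, and disk DMA, as well as the blitter's own idle cycles. The 96 accesses come from this blit reading A, reading B, and writing D once for each of the 32 colors.

```python
def simulate_bus(blitter_accesses: int, cpu_waiting: bool = True) -> list:
    """Toy model of chip-bus slots while the blitter runs in its default mode.

    The blitter takes slots until it has used three in a row; if the CPU is
    waiting, the next slot goes to the CPU and the count starts over.
    """
    slots = []
    remaining = blitter_accesses
    run = 0  # consecutive slots the blitter has taken
    while remaining > 0:
        if cpu_waiting and run == 3:
            slots.append("cpu")
            run = 0
        else:
            slots.append("blit")
            remaining -= 1
            run += 1
    return slots


# This palette blit reads A, reads B, and writes D once per color: 32 * 3 accesses.
slots = simulate_bus(32 * 3)
print(len(slots), slots.count("cpu"))  # 127 31
```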
If your Amiga has "Fast RAM", this is much less of an issue for code and data stored there, because Fast RAM is CPU-only and doesn't have to be shared with all the other chips.
One of the most fun parts of working on Magicore is leveraging Amiga-specific features to make the game do cool stuff. The blitter is so powerful that it just makes me hope I come across all kinds of unique use cases for it. Maybe it gets your imagination going, too.