As one does, I’ve written a game boy emulator that also supports the game boy color. Unfortunately, it has a few glitches, with some games refusing to run at all.

For instance, in GBC mode, Pokémon Crystal nearly immediately shows a bright red screen that (slowly, but still) flashes a vertical stripe pattern in white and red:

screenshot of the bright red screen

screenshot of the red and white bars

On my way to work, I watched a video by TheZZAZZGlitch about types of crashes in Pokémon Red/Blue. One of them, the RST 38 crash, also displays a vertical stripe pattern. Interesting.

I started looking for the reason why this happens. A full decompilation of Pokémon Crystal was made by fans, and it includes a symbol table file, indicating what function (or data structure) many addresses belong to. I added loading support for this file to my emulator, and made it print the function name, the currently loaded ROM bank and the CPU registers every time it hit an address:

BANK:01 PC:0100 SP:fffe A:11 F:10000000 B:00 C:00 D:ff E:56 H:00 L:0d Start
BANK:01 PC:016e SP:fffe A:11 F:10000000 B:00 C:00 D:ff E:56 H:00 L:0d _Start
BANK:01 PC:0175 SP:fffe A:11 F:11000000 B:00 C:00 D:ff E:56 H:00 L:0d _Start.cgb
BANK:01 PC:0177 SP:fffe A:01 F:11000000 B:00 C:00 D:ff E:56 H:00 L:0d _Start.load
BANK:01 PC:017d SP:fffe A:01 F:11000000 B:00 C:00 D:ff E:56 H:00 L:0d Init
BANK:01 PC:01a2 SP:fffe A:04 F:10000000 B:00 C:00 D:ff E:56 H:00 L:0d Init.wait
BANK:01 PC:01b1 SP:fffe A:00 F:10000000 B:10 C:00 D:ff E:56 H:c0 L:00 Init.ByteFill
BANK:01 PC:01b1 SP:fffe A:ff F:00000000 B:0f C:ff D:ff E:56 H:c0 L:01 Init.ByteFill
BANK:01 PC:01b1 SP:fffe A:ff F:00000000 B:0f C:fe D:ff E:56 H:c0 L:02 Init.ByteFill
BANK:01 PC:01b1 SP:fffe A:ff F:00000000 B:0f C:fd D:ff E:56 H:c0 L:03 Init.ByteFill
...

It was very long. Eventually I found that it hits the Start function again much later on (after 48 frames, or 800ms in real time):

...
BANK:01 PC:0100 SP:ff45 A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55 Start
BANK:01 PC:016e SP:ff45 A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55 _Start
BANK:01 PC:0177 SP:ff45 A:00 F:10000000 B:80 C:00 D:7f E:81 H:ff L:55 _Start.load
BANK:01 PC:017d SP:ff45 A:01 F:10000000 B:80 C:00 D:7f E:81 H:ff L:55 Init
BANK:01 PC:01a2 SP:ff45 A:04 F:10000000 B:80 C:00 D:7f E:81 H:ff L:55 Init.wait
BANK:01 PC:01a2 SP:ff45 A:7a F:01010000 B:80 C:00 D:7f E:81 H:ff L:55 Init.wait
BANK:01 PC:01a2 SP:ff45 A:7a F:01010000 B:80 C:00 D:7f E:81 H:ff L:55 Init.wait
...

It also doesn’t detect the system as a GBC, since the A register is not set to $11 anymore. Notably, the stack pointer (SP) has a weird value: $ff45 is the register used to configure the rendered line count at which an interrupt should fire. All the addresses around it are also various PPU and APU registers. How did it get set to this?

I made it so my emulator will print the current bank and registers for every run instruction after those 48 first frames to get a clue, and well…

...
BANK:41 PC:4240 SP:c0e1 A:06 F:11000000 B:7f C:01 D:7f E:81 H:ff L:44 _continue_HDMATransfer.final_ly_loop
BANK:41 PC:4240 SP:c0e1 A:06 F:11000000 B:7f C:01 D:7f E:81 H:ff L:44
BANK:41 PC:4241 SP:c0e1 A:06 F:01110000 B:7f C:01 D:7f E:81 H:ff L:44
BANK:41 PC:4243 SP:c0e1 A:06 F:01110000 B:7f C:01 D:7f E:81 H:ff L:44
BANK:41 PC:4244 SP:c0e1 A:07 F:01110000 B:7f C:01 D:7f E:81 H:ff L:44
BANK:41 PC:4245 SP:c0e1 A:07 F:11010000 B:7f C:00 D:7f E:81 H:ff L:44
BANK:41 PC:4247 SP:c0e1 A:07 F:11010000 B:7f C:00 D:7f E:81 H:ff L:44
BANK:41 PC:424a SP:c0e1 A:07 F:11010000 B:7f C:00 D:7f E:81 H:ff L:55
BANK:01 PC:424c SP:c0e1 A:07 F:11010000 B:7f C:00 D:7f E:81 H:ff L:55
BANK:01 PC:424f SP:c0e1 A:07 F:11010000 B:7f C:00 D:7f E:81 H:ff L:55
BANK:01 PC:4251 SP:c0e1 A:07 F:01000000 B:7f C:00 D:7f E:81 H:ff L:55
BANK:01 PC:4252 SP:c0e1 A:07 F:01000000 B:7f C:00 D:7f E:81 H:ff L:55
BANK:01 PC:4253 SP:c0e1 A:07 F:01000000 B:7f C:00 D:7f E:81 H:ff L:55
BANK:01 PC:4254 SP:c0e1 A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55
BANK:01 PC:0038 SP:c0df A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55
BANK:01 PC:0038 SP:c0dd A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55
BANK:01 PC:0038 SP:c0db A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55
...

tangent explanation

Here’s the code for _continue_HDMATransfer.final_ly_loop:

.final_ly_loop
	cp [hl]
	jr z, .final_ly_loop
	ld a, [hl]
	dec c
	jr nz, .final_ly_loop
	ld hl, rHDMA5
	res 7, [hl]
	ei

	ret

However, this only runs until the res 7, [hl] instruction (at PC:424a) because the loaded ROM bank changes to the first bank. According to the symbol table and the assembly, address $424c in bank 1 contains *squints*

FacingBoulderDust1: ; boulder dust 1
	db 4 ; #
	db  0,  0, ABSOLUTE_TILE_ID, $fe
	db  0,  8, ABSOLUTE_TILE_ID, $fe
	db  8,  0, ABSOLUTE_TILE_ID, $fe
	db  8,  8, ABSOLUTE_TILE_ID, $fe

FacingBoulderDust2: ; boulder dust 2
	db 4 ; #
	db  0,  0, ABSOLUTE_TILE_ID, $ff
	db  0,  8, ABSOLUTE_TILE_ID, $ff
	db  8,  0, ABSOLUTE_TILE_ID, $ff
	db  8,  8, ABSOLUTE_TILE_ID, $ff

data about facing boulder dust. It starts at the last db statement of FacingBoulderDust1, which disassembles to: (not fully confirmed, the SP register doesn’t change like it should if this was the first instruction)

ld [$0408], sp
cp a, $0
nop
inc b
rst 38

And rst 38 calls the reset vector at $0038 (in the 0th ROM bank, which is always loaded from $0000 to $3fff inclusive). Opening the ROM file with hexyl shows

│00000030│ 6f d1 e9 00 00 00 00 00 ┊ ff 00 00 00 00 00 00 00 │

that $0038 is $ff, which is also the rst 38 instruction. It loops, pushing $0039 all over the address bus, starting with a bit of work RAM, then all of cartridge RAM (which is not open to writes at this point), all of video RAM, all cartridge ROM and then loops back to the registers between $ff00 and $ffff.

...
BANK:01 PC:0038 SP:ff49 A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55
BANK:01 PC:0038 SP:ff47 A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55
BANK:01 PC:0038 SP:ff45 A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55
BANK:01 PC:0039 SP:ff45 A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55
BANK:01 PC:003a SP:ff45 A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55
BANK:01 PC:003b SP:ff45 A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55
BANK:01 PC:003c SP:ff45 A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55
...
BANK:01 PC:00fd SP:ff45 A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55
BANK:01 PC:00fe SP:ff45 A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55
BANK:01 PC:00ff SP:ff45 A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55
BANK:01 PC:0100 SP:ff45 A:07 F:00100000 B:80 C:00 D:7f E:81 H:ff L:55 Start
...

But, when the stack pointer is at $ff45, by some miracle, my emulator steps over the rst 38 instruction and gets all the way to the Start function again. This shouldn’t happen, but this is not my highest worry right now. Let’s step back a bit.

end of tangent

So what’s happening here? And by here I mean

.final_ly_loop
	cp [hl]
	jr z, .final_ly_loop
	ld a, [hl]
	dec c
	jr nz, .final_ly_loop
	ld hl, rHDMA5
> res 7, [hl]
	ei

	ret

that res 7, [hl], which writes to rHDMA5, which is $ff55, the register that controls HDMA (High Direct Memory Access) transfers. It lets the game quickly copy data from one area of the address bus to another. res 7, [hl] resets the 7th bit of that register to 0.

My code to handle writes to this register is this:

let double_speed_mode = self.double_speed_mode();
let hdma = self.hdma.as_mut().unwrap();
hdma.write_control(value);
let cycles: TCycle<Absolute> = TCycle::new(4 + 32 * u64::from(hdma.blocks));
self.cycles += cycles.to_relative(double_speed_mode);

let src = hdma.src;
let dest = hdma.dest;
let blocks = hdma.blocks;
hdma.src += u16::from(blocks + 1) << 4;
hdma.dest += u16::from(blocks + 1) << 4;

for i in 0..u16::from(blocks + 1) << 4 {
    let value = self.read(src + i);
    self.write(dest + i, value);
}

self.hdma.as_mut().unwrap().write_control(0xFF);

My guess would be that it overwrites the wrong area of memory and changes the bank somehow?

In debugging:

thread 1 "rboy2" hit Breakpoint 1, rboy2_core::bus::Bus::write (self=0x555557927c60, address=65365, value=57)
    at rboy2_core/src/bus.rs:575
575                     let double_speed_mode = self.double_speed_mode();
(gdb) n
576                     let hdma = self.hdma.as_mut().unwrap();
(gdb) n
577                     hdma.write_control(value);
(gdb) p *hdma
$6 = rboy2_core::bus::Hdma {src: 55328, dest: 8208, transfer_mode: rboy2_core::bus::HdmaTransferMode::HBlank, blocks: 127}

The source address is $d820 and the destination address is $2010, so cartridge ROM. In Pokémon Crystal’s memory controller, MBC3, $2000 to $3fff is the ROM bank selection register, so as soon as this starts, the bank changes from $41 to $01, putting the program counter into random data and nearly immediately locking up the emulation. How did we get to writing to this address?

Well, the $ff51 to $ff54 registers (labeled rHDMA1 to rHDMA4 in assembly) control the source and destination addresses. The beginning of the _continue_HDMATransfer function sets the source and destination to the values in the HL and DE CPU registers:

_continue_HDMATransfer:
; a lot of waiting around for hardware registers
	; [rHDMA1, rHDMA2] = hl & $fff0
	ld a, h
	ldh [rHDMA1], a
	ld a, l
	and $f0 ; high nybble
	ldh [rHDMA2], a
	; [rHDMA3, rHDMA4] = de & $1ff0
	ld a, d
	and $1f ; lower 5 bits
	ldh [rHDMA3], a
	ld a, e
	and $f0 ; high nybble
	ldh [rHDMA4], a

With a little more debugging, I found another place where an invalid value is written to the HDMA destination address register:

Thread 1 "rboy2" hit Breakpoint 4, rboy2_core::bus::Hdma::write_high_dest (self=0x55555792bc98, value=22)
    at rboy2_core/src/bus.rs:735
735	        self.dest = self.dest & 0xF0 | (u16::from(value) << 8);
(gdb) n
736	    }
(gdb) p *self
$34 = rboy2_core::bus::Hdma {src: 53248, dest: 5840, transfer_mode: rboy2_core::bus::HdmaTransferMode::HBlank, blocks: 127, active: false}
(gdb) n
rboy2_core::bus::Bus::write (self=0x555557927a60, address=65363, value=22) at rboy2_core/src/bus.rs:603
603	    }
(gdb) n
rboy2_core::bus::Bus::write_cpu (self=0x555557927a60, address=65363, value=22) at rboy2_core/src/bus.rs:669
669	    }
(gdb) p cpu
No symbol 'cpu' in current context
(gdb) n
rboy2_core::cpu::Cpu::write_u8 (self=0x555557930868, bus=0x555557927a60, place=..., value=22)
    at rboy2_core/src/cpu/mod.rs:80
80	    }
(gdb) p *self
$35 = rboy2_core::cpu::Cpu {regs: rboy2_core::cpu::registers::Registers {pc: 16920, sp: 49377, a: 22, b: 127, c: 1, d: 150, e: 0, h: 208, l: 0, f: rboy2_core::cpu::registers::F (rboy2_core::cpu::registers::_::InternalBitFlags (32))}, ime: true, halted: false}
(gdb) p bus.current_rom_bank_number()
$36 = 65

The higher bit of the destination address of HDMA is set to $16 by the instruction at $4218 in bank $41 - inside _continue_HDMATransfer. I can match it to my logs with the CPU register values:

BANK:41 PC:4218 SP:c0e1 A:16 F:00100000 B:7f C:01 D:96 E:00 H:d0 L:00

Huh, A is $16 but D, which it was just copied from, is $96? It’s because the code ORs A with $1f before writing it to the destination address high byte register (HDMA3), and it does that because as says Pan Docs:

FF53–FF54 — HDMA3, HDMA4 (CGB Mode only): VRAM DMA destination (high, low) [write-only]

These two registers specify the address within 8000-9FF0 to which the data will be copied. Only bits 12-4 are respected; others are ignored. The four lower bits of this address will be ignored and treated as 0.

Bits 15 to 12 are supposed to be ignored, and in my write handling, they weren’t:

impl Hdma {
  fn write_high_dest(&mut self, value: u8) {
    self.dest = self.dest & 0xF0 | (u16::from(value) << 8);
  }
}

After an update:

impl Hdma {
  fn write_high_dest(&mut self, value: u8) {
    self.dest = (u16::from_be_bytes([value, self.dest.to_be_bytes()[1]]) & 0x1FF0) | 0x8000;
  }
}

Then finally, the game can show its opening animation:

animation of Pokémon Crystal’s intro, slightly glitchy

Before this, few GBC-only games would boot. Even now, Pokémon Crystal doesn’t let the player leave their room, nor does Pokémon Silver. The HDMA controller here only supports general transfers, not HBlank transfers which run periodically instead of all at once, and the sound is glitchy when a game turns on double speed mode. But those are problems for another day!

On top of the sources mentioned in the article, I also used mGBA to confirm the correct behavior of the ROM with its debugger.