
getting started with the cpu

To emulate the CPU properly, I had to emulate all of its internals. The CPU is very similar to both the Z80 and the 8080, yet people aren’t quite sure which one it actually is.

The more important part is that the instruction set of the CPU is known and well documented. So we have to set up the internal structure of the CPU, which contains registers, flags, a stack pointer and a program counter.

struct Registers
{
	unsigned char A;
	unsigned char B;
	unsigned char C;
	unsigned char D;
	unsigned char E;
	unsigned char F;
	unsigned char H;
	unsigned char L;
};

struct Flags
{
	unsigned char Z; //	ZERO FLAG; set to 1 if the current operation results in zero, or if two values match on a CP instruction
	unsigned char N; //	SUBTRACT FLAG; set to 1 if a subtraction was performed
	unsigned char H; //	HALF CARRY FLAG; set to 1 if a carry occurred from the lower nibble in the last operation
	unsigned char C; //	CARRY FLAG; set to 1 if a carry occurred in the last operation, or if A is the smaller value on a CP instruction
	bool HALT;
};

As you can see, there are 8 internal registers, which are accessible on their own, but often are accessed in pairs (AF, BC, DE, HL).

AF is the A-register together with the Flag-register (F). Read as a 16-bit pair, A is the high byte and F is the low byte; all 4 flags are stored in the high nibble of F, and its low nibble is always zero (in binary: ZNHC0000).
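To make the pairing and the flag layout concrete, here is a minimal sketch. The helper names (packFlags, unpackFlags, makePair, splitPair) are made up for illustration and are not part of the emulator's actual code; the Flags struct is a trimmed copy of the one above.

```cpp
#include <cstdint>

// Trimmed copy of the Flags struct from above (HALT left out for brevity).
struct Flags {
    unsigned char Z, N, H, C;
};

// Hypothetical helper: pack the four flags into the F register.
// Bit 7 = Z, bit 6 = N, bit 5 = H, bit 4 = C, low nibble always 0.
uint8_t packFlags(const Flags& f) {
    return static_cast<uint8_t>((f.Z ? 0x80 : 0) | (f.N ? 0x40 : 0) |
                                (f.H ? 0x20 : 0) | (f.C ? 0x10 : 0));
}

// Hypothetical helper: unpack F into the individual flags.
// The low nibble of F is ignored.
Flags unpackFlags(uint8_t F) {
    Flags f;
    f.Z = (F >> 7) & 1;
    f.N = (F >> 6) & 1;
    f.H = (F >> 5) & 1;
    f.C = (F >> 4) & 1;
    return f;
}

// Hypothetical helper: combine two 8-bit registers into a pair, e.g. H and L into HL.
uint16_t makePair(uint8_t high, uint8_t low) {
    return static_cast<uint16_t>((high << 8) | low);
}

// Hypothetical helper: write a 16-bit value back into two 8-bit registers.
void splitPair(uint16_t value, uint8_t& high, uint8_t& low) {
    high = static_cast<uint8_t>(value >> 8);
    low = static_cast<uint8_t>(value & 0xFF);
}
```

With helpers like these, HL would be read as makePair(registers.H, registers.L), and AF as makePair(registers.A, packFlags(flags)).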

This is the basic structure we need to work with our CPU. My first step was to get the GameBoy boot-ROM to boot properly. It is only 256 bytes in size and well documented, e.g. here. Each step of the boot-ROM is explained, so it is pretty easy to understand what is happening, and why.

So, after we load up the boot-ROM into our virtual memory…

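//	load bootrom (256 bytes)
unsigned char memory[0x100];
FILE* file = fopen("gbboot.rom", "rb");
if (file == NULL) {
	printf("Could not open gbboot.rom\n");
	std::exit(EXIT_FAILURE);
}
//	read at most sizeof(memory) bytes, so an oversized file cannot overflow the buffer
fread(memory, 1, sizeof(memory), file);
fclose(file);
Note that we use unsigned char as the datatype, since each memory address holds a single byte.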

.. we can start to create our main loop, and a rudimentary CPU-class.

int main() {
  while(1) {
    // call our cpu
    stepCPU(pc, sp, registers, flags);
  }
  return 0;
}

The CPU’s job is to read the byte at the memory address that our program counter (PC) is pointing to. This byte is then interpreted as the corresponding opcode, which we will need to implement (so the CPU actually knows what it’s supposed to do).

Therefore, we pass all of our structures to our CPU (or references to them, to be more precise), and let the CPU handle the rest.

So, our CPU is going to be a giant switch-statement that can handle each of the Gameboy’s opcodes. There are around 500 opcodes that need to be implemented, so this is one of the biggest parts of the work.

With a short look at izik’s GBOps table, you can see how opcodes are encoded:

Gameboy Opcodes

For example, if we read out the byte that our PC is currently pointing at, and it is a 0x14, our opcode to execute would be INC D, which in that case, would increase the D-register. You can also see, the operation does affect the Z-Flag, resets the N-Flag, affects the H-Flag, and does not affect the C-Flag.
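As a rough sketch of what that flag behaviour means in code, an 8-bit increment could look like the following. The helper name inc8 is an assumption made for this example, and the Flags struct is a trimmed copy of the one above.

```cpp
#include <cstdint>

// Trimmed copy of the Flags struct from above (HALT left out for brevity).
struct Flags {
    unsigned char Z, N, H, C;
};

// Hypothetical helper: increment an 8-bit register the way INC r does.
void inc8(uint8_t& reg, Flags& flags) {
    // H is set when the low nibble is 0xF before the increment,
    // because the +1 then carries from bit 3 into bit 4.
    flags.H = ((reg & 0x0F) == 0x0F) ? 1 : 0;
    reg = static_cast<uint8_t>(reg + 1);  // wraps from 0xFF to 0x00
    flags.Z = (reg == 0) ? 1 : 0;         // Z reflects the result
    flags.N = 0;                          // N is always reset
    // flags.C is deliberately left untouched
}
```

Inside the switch-statement, the case for 0x14 would then call inc8(registers.D, flags) and advance PC by one.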

Our CPU function should look something like this:

int stepCPU(uint16_t& pc, uint16_t& sp, Registers& registers, Flags& flags) {

	//	fetch the opcode that PC is pointing to
	uint8_t opcode = memory[pc];

	switch (opcode) {

		//	NOP
		case 0x00:
			pc++;
			return 1;
		...
		default:
			printf("Unsupported opcode: 0x%02x at 0x%04x\n\n\n", opcode, pc);
			std::exit(EXIT_FAILURE);
	}
}

I added a default printf() so I would be notified by my code, if there is an opcode, that isn’t implemented yet.

The general principle that applies here is that the CPU reads memory at PC’s location, processes whatever it is supposed to do (add, subtract, jump, etc.) and then advances PC to the next instruction. How far PC advances depends on the opcode’s length. There are opcodes like the previously mentioned INC D, which is a single byte long, and then there are opcodes like ADD A, n, which (in this case) has a length of 2, so the PC has to be incremented twice instead of once.

(0xC6) ADD A, n - adds an immediate byte to the Register A

Example:
PC = 0x123
our PC points to 0x123
memory[0x123] = 0xC6 
our memory reads 0xC6 at that location, which translates to ADD A, n
registers.A = registers.A + memory[pc + 1]
we read the immediate byte and add it to register A
PC += 2
PC is increased by two

and now this loops to the next instruction..
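The walkthrough above leaves out the flags. A minimal sketch that includes them, assuming the usual ADD behaviour (Z from the result, N reset, H and C from the carries), could look like this. The names add8 and execAddImmediate are made up for this example, and memory is passed in as a pointer to keep the block self-contained.

```cpp
#include <cstdint>

// Trimmed copy of the Flags struct from above (HALT left out for brevity).
struct Flags {
    unsigned char Z, N, H, C;
};

// Hypothetical helper: a = a + value, with flags updated as for ADD A, n.
void add8(uint8_t& a, uint8_t value, Flags& flags) {
    unsigned int sum = a + value;  // wide enough to see the carry out of bit 7
    flags.H = (((a & 0x0F) + (value & 0x0F)) > 0x0F) ? 1 : 0;  // carry out of bit 3
    flags.C = (sum > 0xFF) ? 1 : 0;                            // carry out of bit 7
    a = static_cast<uint8_t>(sum);
    flags.Z = (a == 0) ? 1 : 0;
    flags.N = 0;
}

// Hypothetical helper: execute ADD A, n, assuming memory[pc] is 0xC6.
void execAddImmediate(const uint8_t* memory, uint16_t& pc, uint8_t& a, Flags& flags) {
    add8(a, memory[pc + 1], flags);  // the immediate byte follows the opcode
    pc += 2;                         // opcode byte + immediate byte
}
```

Keeping the arithmetic in its own helper means the same flag logic can be reused by the register variants of ADD.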

The main work in the beginning was running the code, seeing at which opcode I got a console output, implementing the corresponding opcode, and running again. Once you reach a PC above or equal to 0x100, you have successfully booted your Gameboy’s boot-ROM.

Note: The boot-ROM isn't really necessary to run games. You can just set your PC = 0x100 from the beginning, and your games will boot anyway.

But who would want to miss the Nintendo logo animation, and the satisfying *po-Ling* sound in the beginning? ;-)
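If you do skip the boot-ROM, it is probably a good idea to also put the registers into the state the boot-ROM would have left them in. The following is a sketch under that assumption: the values are the commonly documented post-boot state of the original Gameboy (DMG) and are not taken from this emulator, the function name skipBootRom is made up, and hardware I/O registers are ignored entirely.

```cpp
#include <cstdint>

// Compact copies of the structs from above.
struct Registers {
    unsigned char A, B, C, D, E, F, H, L;
};

struct Flags {
    unsigned char Z, N, H, C;
    bool HALT;
};

// Hypothetical helper: set up the CPU as if the boot-ROM had already run.
// Values assumed from common DMG documentation, not from the original code.
void skipBootRom(uint16_t& pc, uint16_t& sp, Registers& r, Flags& f) {
    r.A = 0x01; r.F = 0xB0;  // AF = 0x01B0
    r.B = 0x00; r.C = 0x13;  // BC = 0x0013
    r.D = 0x00; r.E = 0xD8;  // DE = 0x00D8
    r.H = 0x01; r.L = 0x4D;  // HL = 0x014D
    // Mirror F = 0xB0 (binary 1011 0000) in the separate Flags struct.
    f.Z = 1; f.N = 0; f.H = 1; f.C = 1;
    f.HALT = false;
    sp = 0xFFFE;             // stack pointer set by the boot-ROM
    pc = 0x0100;             // cartridge entry point
}
```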

You may have thought by now “how are you going to fit ~500 opcodes, in a single-byte encoding?” The answer is pretty easy: you don’t.

The Gameboy’s CPU has an extended instruction set, which is reached by using a specific byte as a prefix for a second opcode table. That means every 0xCB byte you come across in opcode position isn’t an opcode itself, but a sign that the next byte is an instruction in the CB-table.

0xCB prefixed opcodes

Therefore, our switch-statement, has another switch-statement inside:

//
//	From here on CB-Prefix OpCodes
//
case 0xcb:
	switch (readFromMem(pc + 1))
	{
	//	RLC - L2 - T8 - Z00C
	//	Rotate left TO carry bit
	case 0x00: return op_rlc(registers.B, flags, registers, pc); break;
	case 0x01: return op_rlc(registers.C, flags, registers, pc); break;
	case 0x02: return op_rlc(registers.D, flags, registers, pc); break;
	case 0x03: return op_rlc(registers.E, flags, registers, pc); break;
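The body of op_rlc isn't shown above, so here is only a sketch of the rotate itself, following the "Z00C" flag note in the comment. The helper name rlc and its signature are assumptions for this example; it leaves out the PC handling and the return value that the real op_rlc takes care of.

```cpp
#include <cstdint>

// Trimmed copy of the Flags struct from above (HALT left out for brevity).
struct Flags {
    unsigned char Z, N, H, C;
};

// Hypothetical helper: rotate a register left by one bit.
// Bit 7 moves into both bit 0 and the carry flag.
void rlc(uint8_t& reg, Flags& flags) {
    uint8_t bit7 = (reg >> 7) & 1;
    reg = static_cast<uint8_t>((reg << 1) | bit7);
    flags.C = bit7;                // old bit 7 ends up in carry
    flags.Z = (reg == 0) ? 1 : 0;  // Z reflects the result
    flags.N = 0;                   // always reset
    flags.H = 0;                   // always reset
}
```

Since every CB-prefixed instruction is two bytes long (the "L2" in the comment), the real function would also advance PC by 2.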

With all this new knowledge, some help from the discord, and a few days of eager work (and actually having a blast with it), implementing the opcodes to make the boot-ROM work wasn’t that much work, but implementing every single other opcode took several days.

You will eventually come across a moment (or a ton of them) where your CPU won’t behave the way you expect it to, and you will have to debug it. For this case, you can always take a look at BGB, which offers a very nice debugger that lets you step through single instructions, so you can compare your own debugger’s disassembly to how the code is really supposed to run.

BGB’s debugger

Most of the time, masking addresses or handling the flags the right way is what turns out to be really problematic. Such mistakes cause your CPU to run horribly wrong at some point, but the error will be really hard to spot, and may cause you to facepalm once you finally catch the bug after 3 consecutive days of debugging, because you simply mixed up a bitwise-OR with a bitwise-AND, or a plus with a minus.

So, once we have all our opcodes implemented, and are … well, rather… sure that they are correct, it’s time to test our CPU!
