
charming sound – the apu

To finish off our NES emulator, we will, of course, do the APU.
The APU has five voices / channels. We will implement four of them (the DMC sample channel is left out), and they are very similar to the GameBoy's:
– Two Squarewave Channels
– a Triangle Channel and
– a Noise Channel.


These channels consist of length counters, volume envelopes, frequency sweeps, a frame sequencer and duties, so pretty much everything we already know from the GameBoy APU.

Registers    Channel   Units
-----------  --------  ----------------------------------------------------------
$4000-$4003  Pulse 1   Timer, length counter, envelope, sweep
$4004-$4007  Pulse 2   Timer, length counter, envelope, sweep
$4008-$400B  Triangle  Timer, length counter, linear counter
$400C-$400F  Noise     Timer, length counter, envelope, linear feedback shift register
$4010-$4013  DMC       Timer, memory reader, sample buffer, output unit
$4015        All       Channel enable and length counter status
$4017        All       Frame counter

It is important to know that, in general, one APU cycle takes 2 CPU cycles. With this background, we can now start creating our channels.
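The 2:1 ratio can be sketched as a tiny divider that turns CPU cycles into whole APU cycles and carries an odd leftover cycle over to the next call. The struct and its names are hypothetical and not part of the emulator code shown here.

```cpp
#include <cstdint>

// Hypothetical helper: converts CPU cycles into whole APU cycles (2:1 ratio).
struct ApuClockDivider {
    uint8_t leftover = 0;  // 0 or 1 CPU cycles that have not been consumed yet

    // Returns how many APU cycles elapse for the given number of CPU cycles.
    uint32_t advance(uint32_t cpuCycles) {
        const uint32_t total = cpuCycles + leftover;
        leftover = static_cast<uint8_t>(total & 1);  // keep the odd cycle
        return total / 2;
    }
};
```

Calling advance() with the cycle count of each executed CPU instruction would then tell us how often to step the channels.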

First of all, we need to set up our APU.

void initAPU() {

	SDL_setenv("SDL_AUDIODRIVER", "directsound", 1);
	//SDL_setenv("SDL_AUDIODRIVER", "disk", 1); <-- this saves to an audio file, if you need to investigate your waveforms
	SDL_Init(SDL_INIT_AUDIO);

	// Open our audio device; Sample Rate will dictate the pace of our synthesizer
	SDLInitAudio(44100, 1024);

	if (!SoundIsPlaying)
	{
		SDL_PauseAudio(0); // we need to unpause SDL audio, this is necessary
		SoundIsPlaying = true;
	}
}

squarewave (SC1 / SC2)

Address  Bitfield   Description
-------  ---------  -----------------------------------------------------------
$4000    DDlc.vvvv  Pulse 1 duty cycle, length counter halt, constant volume/envelope flag, and volume/envelope divider period
$4004    DDlc.vvvv  Pulse 2 duty cycle, length counter halt, constant volume/envelope flag, and volume/envelope divider period
$4001    EPPP.NSSS  Pulse 1 sweep (see APU Sweep)
$4005    EPPP.NSSS  Pulse 2 sweep (see APU Sweep)
$4002    LLLL.LLLL  Pulse 1 timer low 8 bits
$4006    LLLL.LLLL  Pulse 2 timer low 8 bits
$4003    llll.lHHH  Pulse 1 length counter load and timer high 3 bits
$4007    llll.lHHH  Pulse 2 length counter load and timer high 3 bits
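To make the bitfields a bit more tangible, here is a small sketch of how the 11-bit timer value and the length counter index could be pulled out of $4002/$4003. The struct and function names are made up for illustration; the emulator itself reads the registers directly via readFromMem.

```cpp
#include <cstdint>

// Illustrative container for the fields encoded in $4002 / $4003.
struct PulseTimerFields {
    uint16_t timerPeriod;  // 11 bits: HHH from $4003, LLLL.LLLL from $4002
    uint8_t  lengthIndex;  // 5 bits: llll.l from $4003
};

PulseTimerFields decodePulseTimer(uint8_t reg4002, uint8_t reg4003) {
    PulseTimerFields f;
    f.timerPeriod = static_cast<uint16_t>(((reg4003 & 0x07) << 8) | reg4002);
    f.lengthIndex = static_cast<uint8_t>(reg4003 >> 3);
    return f;
}
```

The same layout applies to Pulse 2 with $4006/$4007.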

Once we have made sure that our APU cycling is correct, we can start implementing every feature of our channel.

volume envelope

//	bit 7 of $4017 set = mode 1 (5-step, last tick at 18641); clear = mode 0 (4-step, last tick at 14915)
if (((apu_cycles_sc1 == 3729 || apu_cycles_sc1 == 7457 || apu_cycles_sc1 == 11186 || apu_cycles_sc1 == 18641) && (readFromMem(0x4017) >> 7)) ||
			   ((apu_cycles_sc1 == 3729 || apu_cycles_sc1 == 7457 || apu_cycles_sc1 == 11186 || apu_cycles_sc1 == 14915) && (readFromMem(0x4017) >> 7) == 0)) {
	if (!SC1envelopeStart) {
		SC1envelopeDivider--;
		if (SC1envelopeDivider < 0) {
			SC1envelopeDivider = readFromMem(0x4000) & 0b1111;
			if (SC1envelope > 0) {
				SC1envelope--;
				SC1envelopeVol = SC1envelope;
			}
			else {
				SC1envelopeVol = SC1envelope;
				if (readFromMem(0x4000) & 0b100000) {
					SC1envelope = 15;
				}
			}
		}
	} 
	else {
		SC1envelopeStart = false;
		SC1envelope = 15;
		SC1envelopeDivider = readFromMem(0x4000) & 0b1111;
	}
}
if (SC1constantVolFlag) {
	SC1amp = SC1constantVol;
}
else {
	SC1amp = SC1envelopeVol;
}

The first thing you will notice is that the envelope is ticked by the frame counter, which is more of a cycle counter in this case. The counter dictates on which APU cycles the volume envelope (and other stuff) is ticked; we will get to that in a second.
The envelope constantly decreases in volume. Once it has reached zero, it is only reloaded to 15 on the next tick if the loop flag (0x4000 & 0x20) is set. The constant volume flag (0x4000 & 0x10) dictates whether the envelope is actually used, or if the constant volume is used.
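The same behaviour can be written as a small, self-contained unit, which makes it easier to test in isolation. This is only a sketch: the struct, its member names and the way period and loop flag are passed in are assumptions, not the layout used in the emulator.

```cpp
#include <cstdint>

// Sketch of an envelope unit; clock() is meant to be called on every envelope tick.
struct Envelope {
    bool    start   = false;  // set when the channel's 4th register is written
    uint8_t divider = 0;
    uint8_t decay   = 0;      // current envelope volume, 0..15

    // period: the vvvv bits, loop: bit 5 of $4000 / $4004 / $400C
    void clock(uint8_t period, bool loop) {
        if (start) {               // restart: full volume, reload the divider
            start = false;
            decay = 15;
            divider = period;
            return;
        }
        if (divider > 0) {         // divider still running
            divider--;
            return;
        }
        divider = period;          // divider expired: reload and step the volume
        if (decay > 0) decay--;
        else if (loop) decay = 15;
    }

    // constantVolume: bit 4; in that case the vvvv bits are the volume itself
    uint8_t output(bool constantVolume, uint8_t period) const {
        return constantVolume ? period : decay;
    }
};
```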

mode 0:    mode 1:       function 
---------  -----------  -----------------------------  
- - - f    - - - - -    IRQ (if bit 6 is clear)  
- l - l    - l - - l    Length counter and sweep  
e e e e    e e e - e    Envelope and linear counter 

Above you can see when the cycle counter ticks each individual component (of each channel). This translates to these cycle numbers:

mode 0 (4 steps)
------
apu_cycles_sc1 == 3729
apu_cycles_sc1 == 7457
apu_cycles_sc1 == 11186
apu_cycles_sc1 == 14915

mode 1 (5 steps, the fourth step at 14915 ticks nothing)
------
apu_cycles_sc1 == 3729
apu_cycles_sc1 == 7457
apu_cycles_sc1 == 11186
apu_cycles_sc1 == 18641

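The mode is set in 0x4017 & 0x80 (bit 7, which is what the code checks with >> 7): mode 1 if set, mode 0 if clear. Bit 6 (0x40) is the IRQ flag mentioned in the table above.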
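The step table can also be sketched as one lookup function instead of repeating the long conditions in every channel. This assumes that mode 1 is the five-step sequence (with nothing ticked on its fourth step), as the table shows; the struct and function names are made up for illustration.

```cpp
#include <cstdint>

// Which units a frame counter step ticks (illustrative names).
struct FrameStep {
    bool envelope;  // envelopes and the triangle's linear counter
    bool length;    // length counters and sweep units
};

// Returns what is ticked when the APU cycle counter reaches apuCycle.
// mode5Step corresponds to bit 7 of $4017.
FrameStep frameStepAt(uint32_t apuCycle, bool mode5Step) {
    const uint32_t lastStep = mode5Step ? 18641u : 14915u;
    if (apuCycle == 3729 || apuCycle == 11186) return {true, false};
    if (apuCycle == 7457 || apuCycle == lastStep) return {true, true};
    return {false, false};  // nothing is ticked on any other cycle
}
```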

sweep & length counter

Since these components are ticked on the same steps, we will do them in one go.

//	Length Counter & Sweep
if (apu_cycles_sc1 == 7457 || apu_cycles_sc1 == 14915) {
	//	Length Counter - NOT halted by flag
	if ((readFromMem(0x4000) & 0x20) == 0x00) {
		//	length > 0
		if (SC1len) {
			SC1len--;
		}
	}

	//	Sweep
	if (SC1sweepEnabled) {
		SC1sweepDivider--;
		if (SC1sweepDivider < 0) {
			int16_t post = SC1timerTarget >> (readFromMem(0x4001) & 0b111);
			int8_t neg = (readFromMem(0x4001) & 0b1000) ? -1 : 1;
			int16_t sum = (uint16_t)(post * neg);
			SC1timerTarget = SC1timerTarget + sum;
			if (SC1timerTarget >= 0x7ff || SC1timerTarget <= 8) {
				SC1amp = 0;
				writeToMem(0x4001, readFromMem(0x4001) & 0x7f);		
				SC1enabled = false;									
			}
		}
		if(SC1sweepDivider < 0 || SC1sweepReload) {
			SC1sweepReload = false;
			SC1sweepDivider = (readFromMem(0x4001) >> 4) & 0b111;
		}
	}
}
if (SC1len <= 0) {
	SC1enabled = false;
}

The length counter is as simple as it gets. If it is above zero (and not halted), it gets reduced by one. If it hits zero, it disables the channel.
The (frequency) sweep is a little bit more complex. Basically, we can sweep upwards or downwards (neg), and by a certain margin (post). The resulting value becomes the new target value for the (general) Timer, the one that actually controls the frequency of the output. (The other parts prevent sounds that are too high or too low, and reload the sweep divider.)
I have covered this more intensively in the GameBoy section, and since this is the exact same approach, I’m only explaining the code a little bit here.
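As a worked example of the sweep arithmetic, the target calculation can be pulled out into two small functions. They mirror the code above (including its mute bounds), but the function names and the choice to pass the $4001 value as a parameter are only for illustration.

```cpp
#include <cstdint>

// Computes the period the sweep would move the timer to.
// reg4001 layout: EPPP.NSSS (N = negate, SSS = shift count)
int sweepTargetPeriod(int currentPeriod, uint8_t reg4001) {
    const int  shift  = reg4001 & 0x07;
    const bool negate = (reg4001 & 0x08) != 0;
    const int  change = currentPeriod >> shift;
    return negate ? currentPeriod - change : currentPeriod + change;
}

// Same bounds as the check in the code above: too high or too low mutes the channel.
bool sweepMutes(int targetPeriod) {
    return targetPeriod >= 0x7ff || targetPeriod <= 8;
}
```

With a period of 0x100 and a shift of 1, sweeping upwards in period gives 0x180 (a lower tone) and negating gives 0x80 (a higher tone).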

timer and duty

//	handle timer
if (SC1timer <= 0x00) {
	SC1timer = SC1timerTarget;

	//	tick duty pointer
	++SC1dutyIndex %= 8;
}
else {
	SC1timer--;
}

//	duty table (needs to be declared before it is used)
uint8_t duties[4][8] = {
	{0, 0, 0, 0, 0, 0, 0, 1 },
	{0, 0, 0, 0, 0, 0, 1, 1 },
	{0, 0, 0, 0, 1, 1, 1, 1 },
	{1, 1, 1, 1, 1, 1, 0, 0 }
};

//	handle duty
int duty = readFromMem(0x4000) >> 6;
if (duties[duty][SC1dutyIndex] == 1)
	SC1freq = SC1amp;
else
	SC1freq = 0;

The timer is what actually controls the frequency of the tone you are hearing, if it’s high or low. Whenever our timer expires, we reload it, and tick (and modulo) the duty index – else, we just decrease our timer.
The duty index selects the duty of the tone (how much of the pulse wave actually is positive, and how much is negative). These values are predetermined by the NES, and you can just copy the array as it is. Whenever we have a 1 as duty, the current volume will be used. Otherwise, there is no volume, and we output a 0.
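To get a feeling for which timer value gives which tone: the duty index advances once every (timer + 1) APU cycles, a full waveform has 8 duty steps, and one APU cycle is 2 CPU cycles. That makes 16 * (timer + 1) CPU cycles per waveform. A small sketch, assuming the NTSC CPU clock of roughly 1.789773 MHz (the function name is made up):

```cpp
#include <cstdint>

// Approximate output frequency of a pulse channel for a given 11-bit timer value.
double pulseFrequencyHz(uint16_t timerPeriod) {
    const double cpuHz = 1789773.0;  // assumption: NTSC CPU clock
    return cpuHz / (16.0 * (timerPeriod + 1));
}
```

A timer value of 253, for example, lands very close to the concert pitch A at 440 Hz.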

filling audio buffer

if (!--SC1pcc) {
	SC1pcc = frames_per_sample;
	//	enabled channel
	if (SC1enabled && (readFromMem(0x4015) & 0b1) && SC1len) {
		SC1buf.push_back((float)SC1freq / 100);
		SC1buf.push_back((float)SC1freq / 100);
	}
	//	disabled channel
	else {
		SC1buf.push_back(0);
		SC1buf.push_back(0);
	}
}

Since we need to sync to 44100 Hz (we use it, just like in the GB emulator), we only select certain audio samples, and not every single one created. For me, frames_per_sample set to 18 worked pretty well.
So, if our channel is still enabled, we have a value in our length counter, and the channel's bit in the Status Register (0x4015) is set, we push the value to our buffer; if not, we push zero.
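If a fixed integer like 18 drifts too much for your setup, one alternative is a fractional accumulator that decides on every APU cycle whether a sample is due. This is just a sketch of that idea with made-up names; it is not what the code above does, and the rates you pass in are up to you.

```cpp
// Decides when to emit a sample if the APU rate is not a whole multiple of the sample rate.
struct SampleClock {
    double cyclesPerSample;
    double accumulator = 0.0;

    SampleClock(double apuHz, double sampleRate)
        : cyclesPerSample(apuHz / sampleRate) {}

    // Call once per APU cycle; returns true when a sample should be pushed.
    bool tick() {
        accumulator += 1.0;
        if (accumulator >= cyclesPerSample) {
            accumulator -= cyclesPerSample;  // keep the fractional remainder
            return true;
        }
        return false;
    }
};
```

The remainder that stays in the accumulator is what keeps the long-term sample rate on target.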

Triangle (SC3)

$4008     CRRR.RRRR  Linear counter setup (write)
bit 7     C---.----  Control flag (this bit is also the length counter halt flag)
bits 6-0  -RRR RRRR  Counter reload value
$400A     LLLL.LLLL  Timer low (write)
bits 7-0  LLLL LLLL  Timer low 8 bits
$400B     llll.lHHH  Length counter load and timer high (write)
bits 2-0  ---- -HHH  Timer high 3 bits

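The Triangle Channel has neither an envelope nor a sweep, but it has a linear counter, which works as a second, finer-grained duration counter next to the length counter. The triangle shaped waveform itself comes from a fixed table of amplitude values.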

//	Linear Counter
//	bit 7 of $4017 set = mode 1 (5-step, last tick at 18641); clear = mode 0 (4-step, last tick at 14915)
if (((apu_cycles_sc3 == 3729 || apu_cycles_sc3 == 7457 || apu_cycles_sc3 == 11186 || apu_cycles_sc3 == 18641) && (readFromMem(0x4017) >> 7)) ||
	((apu_cycles_sc3 == 3729 || apu_cycles_sc3 == 7457 || apu_cycles_sc3 == 11186 || apu_cycles_sc3 == 14915) && (readFromMem(0x4017) >> 7) == 0)) {

	if (SC3linearReload) {
		SC3linearCounter = SC3linearReloadValue;
	}
	else if (SC3linearCounter > 0) {
		SC3linearCounter--;
	}
	if (!SC3controlFlag) {
		SC3linearReload = false;
	}

}
//	handle timer
if (SC3timer == 0x00) {
	SC3timer = SC3timerTarget;

	if (SC3linearCounter && SC3len) {

		SC3ampIndex = (SC3ampIndex + 1) & 0x1F;

		//	handle amp from table
		if (SC3timerTarget >= 2 && SC3timerTarget <= 0x7ff) {
			SC3freq = SC3triangleAmps[SC3ampIndex];
		}
	}

}
else {
	SC3timer--;
}

So the (general) Timer doesn’t work with duties here. Instead, as long as both the linear counter and the length counter are non-zero, it steps through a table of triangle values. Together with the missing envelope and sweep, these are the only differences to SC1 and SC2.

uint8_t SC3triangleAmps[32] = { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };

These values are, again, predetermined and can be copied just like that.
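If you would rather not type the table by hand, the same 32 values can be generated: 15 down to 0, followed by 0 up to 15. A minimal sketch (the function name is made up):

```cpp
#include <array>
#include <cstdint>

// Builds the 32-step triangle sequence: 15..0 followed by 0..15.
std::array<uint8_t, 32> makeTriangleTable() {
    std::array<uint8_t, 32> table{};
    for (int i = 0; i < 16; i++) {
        table[i]      = static_cast<uint8_t>(15 - i);  // falling half
        table[16 + i] = static_cast<uint8_t>(i);       // rising half
    }
    return table;
}
```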

noise channel (SC4)

$400C     --lc.vvvv  Length counter halt, constant volume/envelope flag, and volume/envelope divider period (write)
$400E     M---.PPPP  Mode and period (write)
bit 7     M--- ----  Mode flag
bits 3-0  ---- PPPP  The timer period is set to entry P of the following:

Rate  $0    $1    $2    $3    $4    $5    $6    $7    $8    $9    $A    $B    $C    $D    $E    $F
----  ------------------------------------------------------------------------------------------------
NTSC  4     8     16    32    64    96    128   160   202   254   380   508   762   1016  2034  4068
PAL   4     8     14    30    60    88    118   148   188   236   354   472   708   944   1890  3778

$400F     llll.l---  Length counter load and envelope restart (write)
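The period table translates directly into a lookup on a write to $400E. This sketch uses the NTSC row from the table above; the array and function names are made up, and how the result ends up in SC4timerTarget and SC4modeFlag is left to your write handler.

```cpp
#include <cstdint>

// NTSC timer periods, indexed by the PPPP bits of $400E.
static const uint16_t noisePeriodsNtsc[16] = {
    4, 8, 16, 32, 64, 96, 128, 160, 202, 254, 380, 508, 762, 1016, 2034, 4068
};

// PPPP (bits 3-0) select the timer period.
uint16_t noisePeriodFromReg(uint8_t reg400E) {
    return noisePeriodsNtsc[reg400E & 0x0F];
}

// M (bit 7) is the mode flag.
bool noiseModeFromReg(uint8_t reg400E) {
    return (reg400E & 0x80) != 0;
}
```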

The noise channel comes with an envelope, just like SC1 and SC2. Instead of using duties to get ones and zeroes, though, it calculates semi-random values, which gives us the distorted sounds we all know (yeah, same as on the GameBoy).

//	handle timer
if (SC4timer <= 0x00) {
	SC4timer = SC4timerTarget;

	//	calc lfsr
	/*When the timer clocks the shift register, the following actions occur in order:

	Feedback is calculated as the exclusive-OR of bit 0 and one other bit: bit 6 if Mode flag is set, otherwise bit 1.
	The shift register is shifted right by one bit.
	Bit 14, the leftmost bit, is set to the feedback calculated earlier.*/
	uint16_t feedback = (SC4modeFlag) ? (SC4lfsr & 1) ^ ((SC4lfsr >> 6) & 1) : (SC4lfsr & 1) ^ ((SC4lfsr >> 1) & 1);
	SC4lfsr >>= 1;
	SC4lfsr |= feedback << 14;
}
else {
	SC4timer--;
}
if (!--SC4pcc) {
	SC4pcc = frames_per_sample;
	//	enabled channel
	if (SC4enabled && (readFromMem(0x4015) & 0b1000) && SC4len) {
		SC4buf.push_back((SC4lfsr & 1) ? 0 : ((float)SC4amp / 100));
		SC4buf.push_back((SC4lfsr & 1) ? 0 : ((float)SC4amp / 100));
	}
	//	disabled channel
	else {
		SC4buf.push_back(0);
		SC4buf.push_back(0);
	}
}

As you can see, we calculate the LFSR (which has a starting value of 1 on boot) following the description in the code comment. Bit 0 of the LFSR then decides whether we push the current volume to our buffer (bit clear) or zero (bit set).
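The three steps from the comment can also be written as a pure function, which is handy for checking the sequence without running the whole emulator. The function name is made up; the logic is the same as in the timer code above.

```cpp
#include <cstdint>

// One step of the 15-bit noise shift register.
uint16_t stepLfsr(uint16_t lfsr, bool modeFlag) {
    const int tap = modeFlag ? 6 : 1;                       // second feedback bit
    const uint16_t feedback = (lfsr ^ (lfsr >> tap)) & 1;   // bit 0 XOR tap bit
    return static_cast<uint16_t>((lfsr >> 1) | (feedback << 14));
}
```

Starting from 1, the first step moves the feedback bit into bit 14, so the register becomes 0x4000.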

playing from our buffers

The only thing left is to let our buffers fill up, mix our channels together, queue our audio data to SDL, and wait until the buffer drains to a certain point. That is exactly what we are going to do.

void stepAPU(unsigned char cycles) {

	stepSC1(cycles);
	stepSC2(cycles);
	stepSC3(cycles);
	stepSC4(cycles);

	if (SC1buf.size() >= 100 && SC2buf.size() >= 100 && SC3buf.size() >= 100 && SC4buf.size() >= 100) {

		for (int i = 0; i < 100; i++) {
			float res = 0;
			if (useSC1)
				res += SC1buf.at(i) * volume;
			if (useSC2)
				res += SC2buf.at(i) * volume;
			if (useSC3)
				res += SC3buf.at(i) * volume;
			if (useSC4)
				res += SC4buf.at(i) * volume;
			Mixbuf.push_back(res);
		}
		//	send audio data to device; SDL_QueueAudio expects the length in bytes, so we multiply the number of samples by sizeof(float)
		SDL_QueueAudio(1, Mixbuf.data(), (Uint32)(Mixbuf.size() * sizeof(float)));

		SC1buf.clear();
		SC2buf.clear();
		SC3buf.clear();
		SC4buf.clear();
		Mixbuf.clear();

		//TODO: we could, instead of just idling everything until music buffer is drained, at least call stepPPU(0), to have a constant draw cycle, and maybe have a smoother drawing?
		while (SDL_GetQueuedAudioSize(1) > 4096 * 4) {
		}
	}

}
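The mixing part can be isolated into a small function, which also makes it easy to try out things like clamping. This is a sketch with made-up names: it takes any number of channel buffers, scales and sums them, and clamps the result to [-1, 1], which the code above does not do.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sums the channel buffers sample by sample, scaled by volume, and clamps to [-1, 1].
std::vector<float> mixChannels(const std::vector<std::vector<float>>& channels, float volume) {
    // only mix as many samples as every channel can provide
    std::size_t count = channels.empty() ? 0 : channels[0].size();
    for (const auto& channel : channels)
        count = std::min(count, channel.size());

    std::vector<float> mixed(count, 0.0f);
    for (const auto& channel : channels)
        for (std::size_t i = 0; i < count; i++)
            mixed[i] += channel[i] * volume;

    for (float& sample : mixed)
        sample = std::max(-1.0f, std::min(1.0f, sample));
    return mixed;
}
```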
