bleeding ears – time to add audio

The title is a serious warning. If you want to get sound working for your emulator, prepare to have your ears stressed out to the maximum.

The audio / sound processing unit (SPU from now on) was really one of the hardest parts of the emulator, but it would never have felt complete without audio. So, prepare to bite down on your mouthpiece, and power through.

The SPU consists of 4 voices, SC1, SC2, SC3 and SC4.

SC1	Square-Wave Channel – Sweep – Volume Envelope – Length Counter
SC2	Square-Wave Channel – Volume Envelope – Length Counter
SC3	Waveform Channel – Length Counter
SC4	Noise Channel – Volume Envelope – Length Counter

sound registers

        SC1
NR10 0xFF10 -PPP NSSS  Sweep period, negate, shift 
NR11 0xFF11 DDLL LLLL  Duty, Length load (64-L) 
NR12 0xFF12 VVVV APPP  Starting volume, Envelope add mode, period 
NR13 0xFF13 FFFF FFFF  Frequency LSB 
NR14 0xFF14 TL-- -FFF  Trigger, Length enable, Frequency MSB

        SC2      
     0xFF15 ---- ----  Not used 
NR21 0xFF16 DDLL LLLL  Duty, Length load (64-L) 
NR22 0xFF17 VVVV APPP  Starting volume, Envelope add mode, period 
NR23 0xFF18 FFFF FFFF  Frequency LSB 
NR24 0xFF19 TL-- -FFF  Trigger, Length enable, Frequency MSB

        Wave 
NR30 0xFF1A E--- ----  DAC power 
NR31 0xFF1B LLLL LLLL  Length load (256-L) 
NR32 0xFF1C -VV- ----  Volume code (00=0%, 01=100%, 10=50%, 11=25%) 
NR33 0xFF1D FFFF FFFF  Frequency LSB 
NR34 0xFF1E TL-- -FFF  Trigger, Length enable, Frequency MSB

        Wave Table      
     0xFF30 0000 1111 Samples 0 and 1
     ....      
     0xFF3F 0000 1111 Samples 30 and 31 

        Noise
     0xFF1F ---- ----  Not used 
NR41 0xFF20 --LL LLLL  Length load (64-L) 
NR42 0xFF21 VVVV APPP  Starting volume, Envelope add mode, period 
NR43 0xFF22 SSSS WDDD  Clock shift, Width mode of LFSR, Divisor code 
NR44 0xFF23 TL-- ----  Trigger, Length enable

        SPU Power
NR50 0xFF24 ALLL BRRR  Vin L enable, Left vol, Vin R enable, Right vol 
NR51 0xFF25 NW21 NW21  Left enables, Right enables 
NR52 0xFF26 P--- NW21  Power control/status, Channel length statuses 


(props to http://gbdev.gg8.se/wiki)
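To make the bit layouts above a bit more tangible, here is a minimal sketch of how some of the fields could be pulled out of the raw register bytes. The helper names are made up for illustration and are not taken from the emulator.

```cpp
#include <cstdint>

// Hypothetical helpers, named for illustration only.

// NRx3 holds the lower 8 bits, NRx4 (TL-- -FFF) the upper 3 bits
// of the 11-bit frequency value.
inline uint16_t channelFrequency(uint8_t nrx3, uint8_t nrx4) {
    return static_cast<uint16_t>(((nrx4 & 0x07) << 8) | nrx3);
}

// NRx1 of the square channels: DDLL LLLL
inline uint8_t dutyPattern(uint8_t nrx1) { return nrx1 >> 6; }
inline uint8_t lengthCounter(uint8_t nrx1) {
    return static_cast<uint8_t>(64 - (nrx1 & 0x3F));   // "Length load (64-L)"
}

// NRx4: bit 7 = trigger, bit 6 = length enable
inline bool isTrigger(uint8_t nrx4) { return (nrx4 & 0x80) != 0; }
inline bool lengthEnabled(uint8_t nrx4) { return (nrx4 & 0x40) != 0; }
```

The same masks work for SC1 and SC2; SC3 and SC4 use different length and volume layouts, as the table shows.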

general

Each channel has a frequency, although the word ‘frequency’ can be misleading in this context. Think of it as a timer that ticks down, and as soon as it reaches zero, it ticks its inner FrameSequencer.

Every channel has a register (NRx4) that triggers the channel when a value with bit 7 set is written to it.

Triggering a channel entails multiple steps, which differ slightly per channel: mostly resetting timers and reloading volumes and other values.
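The timer and the trigger could be sketched roughly like this. It is a simplified illustration with a made-up struct, not the emulator's actual code; which values get reloaded on a trigger, and the reload value of 64 for an empty length counter, are assumptions based on the register descriptions above.

```cpp
#include <cstdint>

// Hypothetical, heavily simplified state of a square channel.
struct SquareChannel {
    bool enabled = false;
    int timer = 0;         // counts down in CPU cycles
    int period = 0;        // reload value for the timer
    int length = 0;        // length counter
    int volume = 0;        // current volume (0..15)
    uint8_t nrx2 = 0;      // VVVV APPP
    int expirations = 0;   // how often the timer hit zero (for illustration)
};

// Count the timer down by the cycles the CPU just spent.
void stepTimer(SquareChannel& ch, int cycles) {
    if (ch.period <= 0) return;      // no period set, nothing to count
    ch.timer -= cycles;
    while (ch.timer <= 0) {
        ch.timer += ch.period;       // reload, keeping the overshoot
        ch.expirations++;            // this is where the channel would get clocked
    }
}

// Handle a write to NRx4: only a value with bit 7 set triggers the channel.
void writeNRx4(SquareChannel& ch, uint8_t value) {
    if ((value & 0x80) == 0) return;
    ch.enabled = true;
    ch.timer = ch.period;                // reset the timer
    ch.volume = ch.nrx2 >> 4;            // reload the starting volume
    if (ch.length == 0) ch.length = 64;  // assumption: restart an empty length counter
}
```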

frame sequencer

The frame sequencer (FS) dictates which of the channel's units gets clocked. The FS is ticked whenever the channel's frequency / timer has expired. The FS is one byte wide.

Step	0	1	2	3	4	5	6	7
Len Ctr	Clock	–	Clock	–	Clock	–	Clock	–
Vol Env	–	–	–	–	–	–	–	Clock
Sweep	–	–	Clock	–	–	–	Clock	–

After step 7, the FS wraps around to step 0.
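The step table translates almost directly into code. The following is a minimal sketch with made-up type names, not the emulator's actual implementation:

```cpp
#include <cstdint>

// Which units get clocked on the current step (hypothetical type).
struct FrameSequencerTick {
    bool length;
    bool envelope;
    bool sweep;
};

struct FrameSequencer {
    uint8_t step = 0;   // current step, 0..7

    // Reports what to clock on the current step, then advances.
    FrameSequencerTick tick() {
        FrameSequencerTick t{};
        t.length   = (step % 2) == 0;          // steps 0, 2, 4, 6
        t.envelope = step == 7;                // step 7 only
        t.sweep    = step == 2 || step == 6;   // steps 2 and 6
        step = static_cast<uint8_t>((step + 1) & 7);   // wrap around after step 7
        return t;
    }
};
```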

length counter

The length counter ticks down every time it’s clocked. Once it reaches zero, it disables the channel, which makes it possible to set how long a tone is supposed to play. The length counter only runs if it is enabled by the channel's length enable bit.
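As a rough sketch (struct and function names are invented for this example), clocking the length counter could look like this:

```cpp
// Hypothetical length counter state.
struct LengthCounter {
    bool enabled = false;   // length enable bit (bit 6 of NRx4)
    int counter = 0;        // remaining length
};

// Called when the FS clocks the length counter.
// Returns true if this clock ran the counter out, i.e. the channel
// has to be disabled now.
bool clockLength(LengthCounter& lc) {
    if (!lc.enabled || lc.counter == 0) return false;   // not running
    lc.counter--;
    return lc.counter == 0;
}
```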

volume envelope

The volume envelope allows tones to increase or decrease in volume. Every time it is clocked by the FS, it decrements the envelope's period timer. Once that timer reaches zero, the volume is increased or decreased by one, depending on the envelope add mode bit. If the new volume is not in the range of 0 to 15, it is ignored and the volume envelope is disabled. Otherwise the new volume is applied to the channel.
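Sketched in code, the envelope might look like the following. The names are hypothetical, and treating a period of 0 as "envelope off" is a simplifying assumption of this example.

```cpp
// Hypothetical envelope state, filled from NRx2 (VVVV APPP) on trigger.
struct VolumeEnvelope {
    int volume = 0;        // current volume, 0..15
    int period = 0;        // PPP
    int timer = 0;         // counts down on every FS envelope clock
    bool addMode = false;  // A: true = louder, false = quieter
    bool active = true;
};

// Called when the FS clocks the volume envelope.
void clockEnvelope(VolumeEnvelope& env) {
    if (!env.active || env.period == 0) return;   // assumption: period 0 = off
    if (--env.timer > 0) return;                  // timer not expired yet
    env.timer = env.period;                       // reload the timer

    int next = env.volume + (env.addMode ? 1 : -1);
    if (next < 0 || next > 15) {
        env.active = false;   // out of range: ignore it and stop the envelope
        return;
    }
    env.volume = next;
}
```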

sweep

The sweep, or rather frequency sweep, is used for the well-known, typical Gameboy sound effects that include lots of boioioing, peeewww and boom. It is also clocked by the FS, and once its timer reaches zero, it shifts the current frequency and adds the result to it (or subtracts it, depending on the negate bit). Then a check is performed to see whether the new frequency is in the range of 0 to 2047. If that is the case, the new frequency is stored in a shadow register and also written back to memory!

If the check fails, the channel and the sweep are completely disabled until they are enabled again by a trigger.
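A simplified sketch of one sweep step, with invented names, could look like this. It ignores hardware quirks and only illustrates the shift, the range check and the write-back:

```cpp
#include <cstdint>

// Hypothetical sweep state, filled from NR10 (-PPP NSSS) on trigger.
struct Sweep {
    int shadow = 0;       // shadow copy of the frequency
    int shift = 0;        // SSS
    bool negate = false;  // N
    int period = 0;       // PPP
    int timer = 0;
    bool enabled = true;
};

// Called when the FS clocks the sweep. 'frequency' stands in for the
// value held in NR13/NR14. Returns false if the channel must be disabled.
bool clockSweep(Sweep& sw, uint16_t& frequency) {
    if (!sw.enabled || sw.period == 0) return true;
    if (--sw.timer > 0) return true;   // timer not expired yet
    sw.timer = sw.period;

    int delta = sw.shadow >> sw.shift;
    int next = sw.negate ? sw.shadow - delta : sw.shadow + delta;
    if (next < 0 || next > 2047) {
        sw.enabled = false;            // range check failed
        return false;
    }
    sw.shadow = next;                          // keep the shadow register in sync
    frequency = static_cast<uint16_t>(next);   // write back
    return true;
}
```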

power

The power registers allow you to enable / disable the complete audio unit or separate channels, or even to pan the audio to different (stereo) speakers.
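As an illustration, panning a single channel sample with NR51 and NR52 could be sketched as follows. The function and struct are made up for this example, and the master volumes from NR50 are left out to keep it short.

```cpp
#include <cstdint>

struct StereoSample {
    float left;
    float right;
};

// Route one channel's sample (channel = 1..4) to the left / right speaker.
// NR52 bit 7 is the master power switch; in NR51 the upper nibble holds the
// left enables and the lower nibble the right enables.
StereoSample panChannel(float sample, int channel, uint8_t nr51, uint8_t nr52) {
    StereoSample out{0.0f, 0.0f};
    if ((nr52 & 0x80) == 0) return out;                        // audio unit is off
    if (nr51 & (0x10 << (channel - 1))) out.left = sample;     // left enable bit
    if (nr51 & (0x01 << (channel - 1))) out.right = sample;    // right enable bit
    return out;
}
```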

sync to cpu

It was hard to get the audio synced to the CPU without getting either audio pops and cracks or a choppy emulation whenever one of the two had to wait for the other. In the end, it was solved by syncing the emulation to the audio, so the CPU waits to execute more opcodes until the APU / SPU is done playing the currently queued samples.

implementation

The general approach was to use SDL’s audio queuing function, thus guaranteeing that there is always audio to play, and no situation occurs, where we can’t feed the audio data fast enough.

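//      BASIC SDL AUDIO INITIALIZATION
//      ('internal' is assumed to be a project macro for 'static')
internal void SDLInitAudio(int32_t SamplesPerSecond, int32_t BufferSize)
{
	SDL_AudioSpec AudioSettings = { 0 };

	AudioSettings.freq = SamplesPerSecond;	//	e.g. 44100
	AudioSettings.format = AUDIO_F32SYS;	//	32 bit float samples, native byte order
	AudioSettings.channels = 2;		//	stereo
	AudioSettings.samples = BufferSize;	//	device buffer size in sample frames

	//	no callback is set, so audio is pushed with SDL_QueueAudio;
	//	a device opened with SDL_OpenAudio always has the ID 1
	if (SDL_OpenAudio(&AudioSettings, 0) != 0) {
		SDL_Log("SDL_OpenAudio failed: %s", SDL_GetError());
		return;
	}

	//	the device starts paused, so unpause it to play queued audio
	SDL_PauseAudio(0);
}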

So, I created buffers for all the channels, and since we need a sample rate that a framework / soundcard can work with, I chose 44100 Hz. Therefore I needed exactly 44100 samples per second from each channel. Since the channels are clocked far more often than 44100 times per second (the Gameboy CPU runs at 4194304 Hz), I only keep one sample roughly every 95 cycles (4194304 / 44100) and drop the rest.
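One way to decide when a sample is due is a small accumulator, sketched below. This is an assumption about how it could be done, not necessarily how the emulator does it; the constant and struct names are invented. The nice part is that no rounding error builds up, because 4194304 / 44100 is not a whole number.

```cpp
// Assumed clock rates: Gameboy CPU clock and the chosen output sample rate.
constexpr int kCpuClockHz = 4194304;
constexpr int kSampleRateHz = 44100;

// Hypothetical helper that tells the channel how many samples to emit
// for the cycles that just passed.
struct Downsampler {
    int accumulator = 0;

    int step(int cycles) {
        accumulator += cycles * kSampleRateHz;
        int samples = accumulator / kCpuClockHz;   // usually 0, sometimes 1
        accumulator %= kCpuClockHz;                // keep the remainder
        return samples;
    }
};
```

Each channel would call step(cycles) in its own step function and push its current amplitude to its buffer once for every sample reported.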

void stepSPU(unsigned char cycles) {

	//	advance all 4 channels by the cycles the CPU just spent
	stepSC1(cycles);
	stepSC2(cycles);
	stepSC3(cycles);
	stepSC4(cycles);

	//	wait until every channel has produced at least 100 samples
	if (SC1buf.size() >= 100 && SC2buf.size() >= 100 && SC3buf.size() >= 100 && SC4buf.size() >= 100) {

		//	mix the channels by adding up their amplitudes
		for (int i = 0; i < 100; i++) {
			float res = 0;
			if (useSC1)
				res += SC1buf.at(i) * volume;
			if (useSC2)
				res += SC2buf.at(i) * volume;
			if (useSC3)
				res += SC3buf.at(i) * volume;
			if (useSC4)
				res += SC4buf.at(i) * volume;
			Mixbuf.push_back(res);
		}
		//	send audio data to device 1; SDL expects the length in bytes, not in samples
		SDL_QueueAudio(1, Mixbuf.data(), Mixbuf.size() * sizeof(float));

		SC1buf.clear();
		SC2buf.clear();
		SC3buf.clear();
		SC4buf.clear();
		Mixbuf.clear();

		//	sync emulation to audio: busy-wait until the queue has drained far enough
		while (SDL_GetQueuedAudioSize(1) > 4096 * sizeof(float)) {}
	}

}

Once all buffers hold at least 100 samples, I mix all 4 channels together by simply adding up the amplitudes, each multiplied by the volume, and push the result to SDL’s audio queue. After queuing the audio data, I clear all buffers, and we are set.

pitfalls / bugs

Dealing with all the different effects is hard enough, but it gets even worse when the framework you are using has a serious bug that might drive you insane. That is what happened to me with SDL 2.0.9 on Windows. Apparently this version has a bug that shows up when you have (heavily loaded) USB devices attached to your computer, causing micro lag every 3-7 seconds. So every few seconds, my frame length went from ~10-11 ms up to 55-75 ms, which of course causes choppy and laggy emulation in both the PPU and the APU/SPU.

That is why I spent 2 full days trying to debug my code, when it was actually SDL’s fault. Going back to 2.0.8 did the trick, and the micro lag was gone.

Other than that, making audio work was quite the sculpting process. It went from horrible sounds and almost breaking my speakers, to the first recognizable sounds like this:

First not horrible sounds from SC1

Then you try to iron out the quirks, implement the features of the channel, and have something almost decent like this:

First proper SC1 playback

And finally, when everything is put together, you have something nice to hear:

Audio completed

I still encountered bugs, like the high-pitched noise you can hear in the following Mystic Quest / Final Fantasy Adventure video. This was due to channels not being disabled properly, so it was easy to fix.

gimmick

When I worked on the APU / SPU, I stumbled upon a little bug that actually sounded very nice. So I decided to dress my bug up as a feature and implement a remix mode (the user can toggle it on / off), where the audio is a little bit different, but has a very catchy tune to it, I would say.

Note: It's only a type change of the waveform variable from unsigned to signed.

Remix mode demo

So, after I finished sound, it was time to make use of this emulator, and get more games running. So, let’s go bug hunting.
