Putting the things together

So, over time I have discussed various isolated things related to 8088-based PCs. Specifically:

These topics are not as isolated as they seem at first. Namely, I was already using the auto-EOI trick for the streaming data program, to get the best possible performance. And I streamed audio data, which is related to sound cards. When I discussed the latching timer, I also hinted at music already (and the auto-EOI feature). And again, when I discussed auto-EOI in detail, I mentioned digital audio playback.

Once I had built my Tandy sound card (using the PCB that I ordered from lo-tech, many thanks to James Pearce for making this possible), I needed software to do something with it. The easiest way to get it to play something, is to use VGM files. There are quite a few captured songs from games (mostly from Sega Master System, which used the same audio chip), and various trackers can also export songs to VGM.

VGM is a very simple file format: it simply stores the raw commands sent to the sound chip(s). Between commands, it stores delays. There are simple VGM files, which only update the music 50 times or 60 times per second (synchronized to PAL or NTSC screen updates). These are easy to play: Just set up your timer interrupt to fire at that rate, and output the commands. But then there’s the more interesting files, which contain digital samples, which play at much higher rates, and they are stored with very fine-grained delay commands. These delays are in ticks of 44.1 kHz resolution. So the format is flexible enough to support very fast updates to sound chips, eg for outputting single samples at a time, up to 44.1 kHz sample rate.

The question of course is: how do you play that? On a modern system, it’s no problem to process data at 44.1 kHz in realtime. But on an 8088 at 4.77 MHz, not so much. You have about 108 CPU cycles to process each sample. That is barely enough to just process an interrupt, let alone actually processing any logic and outputting data to the sound chip. A single write to the SN76489 takes about 42 CPU cycles by the way.

So the naïve way of just firing a timer interrupt at 44.1 kHz is not going to work. Polling a timer is also going to be difficult, because it takes quite some time to read a 16-bit value from the PIT. And those 16-bit values can only account for an 18.2 Hz rate at the lowest, which gets you about 50 ms as maximum delay, so you will want to detect wraparound and extend it to 32-bit to be able to handle longer delays in the file. This will make it difficult to keep up high-frequency data as well. It would also tie up the CPU 100%, so you can’t do anything else while playing music.

But what if we view VGM not as a file containing 44.1 kHz samples, but rather as a timeline of events, where the resolution is 44.1 kHz, but the actual event rate is generally much lower than 44.1 kHz? Now this sounds an awful lot like the earlier raster effect with the latched timer interrupt! We’ve seen there that it’s possible to reprogram the timer interrupt from inside the interrupt handler. By not resetting the timer, but merely setting a new countdown value, we avoid any jitter, so we remain at an ‘absolute’ time scale. The only downside is that the countdown value gets activated immediately after the counter goes 0, so that is before the CPU can reach your interrupt handler. Effectively that means you always need to plan 1 interrupt handler ahead.

I have tried to draw out how the note data and delay data is stored in the VGM file, and how they need to be processed by the interrupt handlers:

vgm-data-order

We basically have two challenges here, with the VGM format:

  1. We need a way to ‘snoop ahead’ to get the proper delay to set in the current handler.
  2. We need to process the VGM data as quickly as possible.

For the first challenge, I initially divided up my VGM command processor into two: one that would send commands to the SN76489 until it encountered a delay command. The other would skip all data until it encountered a delay command, and returned its value.

Each processor has its own internal pointer, so in theory they could be in completely different places in the file. Making the delay processor be ‘one ahead’ in the stream was simple this way.

There was still that second challenge however: Firstly, I had to process the VGM stream byte-by-byte, and act on each command explicitly in a switch-case statement. Secondly, the delay values were in 44.1 kHz ticks, so I had to translate them to 1.19 MHz ticks for the PIT. Even though I initially tried with a look-up-table for short delays, it still wasn’t all that fast.

So eventually I decided that I would just preprocess the data into my own format, and play from there. The format could be really simple, just runs of:

uint16_t delay;
uint8_t data_count;
uint8_t data[data_count]

Where ‘delay’ is already in PIT ticks, and since there is only one command in VGM for the SN76489, which sends a single byte to its command port, I can just group them together in a single buffer. This is nice and compact.

I have now reversed the order of the delays and note data in the stream, and in the following diagram you can see how that simplifies the processing for the interrupt handlers:

preprocessed-data-order

As you can see, I can now just process the data ‘in order’: The first delay is sent at initialization, then I just process note data and delays as they occur in the stream.

Since I support a data_count of 0, I can get around the limitation of the PIT only being able to wait for 65536 ticks at most: I can just split up longer delays into multiple blocks with 0 commands.

I only use a byte for the data_count. That means I can only support 255 command bytes at most. Is that a problem? Well no, because as mentioned above, a single write takes 42 CPU cycles, and there are about 108 CPU cycles in a single tick at 44.1 kHz. Therefore, you couldn’t physically send more than 2 bytes to the SN76489 in a single tick. The third byte would already trickle over into the next tick. So if I were ever to encounter more than 255 bytes with no delays, then I could just add a delay of 1 in my stream, and split up the commands. In practice it is highly unlikely that you’ll ever encounter this. There are only 4 different channels on the chip, and the longest commands you can send are two bytes. You might also want to adjust the volume of each channel, which is 1 byte. So worst-case, you’d probably send 12 bytes at a time to the chip. Then you’d want a delay so you could actually hear the change take effect.

That’s all there is to it! This system can now play VGM data at the resolution of 44.1 kHz, with the only limitation being that you can’t have too many commands in too short a period of time, because the CPU and/or the SN76489 chip will not be able to keep up.

Well, not really, because there is a third challenge:

  1. VGM files (or the preprocessed data derived from it) may exceed 64k (a single segment) or even 640k (the maximum amount of conventional memory in an 8088 system).

Initially I just wanted to accept these limitations, and just load as much of the file into memory as possible, and play only that portion. But then I figured: technically this routine is a ‘background’-routine since it is entirely driven by an interrupt, and I can still run other code in the ‘foreground’, as long as the music doesn’t keep the CPU too busy.

This brought me back to the earlier experiment with streaming PWM/PCM data to PC speaker and Covox. The idea of loading the data into a ringbuffer of 64k and placing the int handler inside this ringbuffer makes a lot of sense in this scenario as well.

Since the data is all preprocessed, the actual interrupt handler is very compact and simple, and copying it around is very little overhead. The data rate should also be relatively low, unless VGMs use a lot of samples. In most cases, a HDD, or even a floppy, should be able to keep up with the data. So I gave it a try, and indeed, it works:

Or well, it would, if the floppy could keep up! This is a VGM capture of the music from Skate or Die, by Rob Hubbard. It uses samples extensively, so it is a bit of ‘worst case’ for my player. But as you can hear, it plays the samples properly, even while it is loading from disk. It only messes up when there’s a buffer underrun, but eventually recovers. Simpler VGM files play perfectly from floppy. Sadly this machine does not have a HDD, so I will need to try the Skate or Die music again some other time, when I have installed the card into a system with a HDD. I’m confident that it will then keep up and play the music perfectly.

But for now, I have other plans. They are also music-related, and I hope to have a quick demonstration of those before long.

This entry was posted in Oldskool/retro programming and tagged , , , , , , , , , , , , , , , . Bookmark the permalink.