Putting the things together, part 2: MIDI and other problems

Remember a few months ago, when I explained my approach to playing VGM files? Well, VGM files are remarkably similar to Standard MIDI Files. In a way, MIDI files are also just time-stamped captures of data sent to a sound device. MIDI, however, is an even stronger case for my approach than VGM, since MIDI has even higher resolution (up to microsecond resolution, i.e. 1 MHz).

So when I was experimenting with some old MIDI hardware, I developed my own MIDI player. I then decided to integrate it with the VGM preprocessor, and use the same technology and file format. This of course opened up a can of worms…

(For more background information on MIDI and its various aspects, see also this earlier post).

You know what they say about assumptions…

The main assumption I made with the VGM replayer is that all events are on an absolute timeline with 44.1 kHz resolution. The VGM format has delay codes, where each delay is relative to the end of the previous delay. MIDI is very similar; the main difference is that MIDI is more of a ‘time-stamped event’ format. This means that each individual event has a delay, and in the case of multiple events occurring at the same time, a delay value of 0 is supported. VGM, on the other hand, supports any number of events between delays.
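To make the ‘time-stamped event’ idea concrete: a Standard MIDI File stores each event's delta time as a variable-length quantity, with 7 data bits per byte and the high bit set on every byte except the last. Here is a minimal decoder sketch for that encoding (the function name and interface are my own, not from any particular player):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one SMF variable-length quantity (VLQ): 7 data bits per byte,
   high bit set on all bytes except the last. Advances *pos past the
   consumed bytes and returns the decoded delta time. */
static uint32_t read_vlq(const uint8_t *data, size_t *pos)
{
    uint32_t value = 0;
    uint8_t byte;
    do {
        byte = data[(*pos)++];
        value = (value << 7) | (byte & 0x7F);
    } while (byte & 0x80);
    return value;
}
```

A delta of 0 is a perfectly valid single byte (`0x00`), which is how SMF expresses multiple events at the same instant.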

So implicitly, you assume here that the events/commands do not take any time whatsoever to perform, since the delays do not take the processing time of the events/commands into account. This means that in theory, you could have a delay shorter than the time it takes to output all data, so that the next event starts while the previous data is still in progress:

Overlapping data

In practice, this should not be a problem with VGM. After all, VGM was originally developed as a format for capturing sound chip register writes in emulators. Since the captured software was written to run on actual hardware, the register writes implicitly never overlap. As long as the emulator accurately emulates the hardware and accurately generates the delay values, you should never see any ‘physically impossible’ VGM data.

MIDI is different…

With MIDI, there are a number of reasons why you actually can get ‘physically impossible’ MIDI data. One reason is that MIDI is not necessarily just captured data. It can be edited in a sequencer, or even generated altogether. Aside from that, a MIDI file is not necessarily just a single part, but can be a combination of multiple captures (multi-track MIDI files).

Aside from that, not all MIDI interfaces run at the same speed. The original serial MIDI interface is specified as 31.25 kbps, with one start bit, one stop bit, and no parity. This means that every byte is transmitted as a frame of 10 bits, so you can send 3125 bytes per second over a serial MIDI link. However, there are other ways to transfer MIDI data. For example, if you use a synthesizer with a built-in sequencer, the data does not necessarily have to go through a physical MIDI link; the keyboard input can be processed directly by the sequencer, via a faster bus. Or instead of a serial link, you could use a more modern connection, such as USB, FireWire, Ethernet or WiFi, which are much faster as well. Or you might not use physical hardware at all, but virtual instruments with a VSTi interface or similar.
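The serial MIDI arithmetic above is worth pinning down, since the rest of the argument depends on it. A small helper (my own, purely illustrative) turns the figures from the spec into a per-byte transmission time:

```c
#include <assert.h>
#include <stdint.h>

/* Classic serial MIDI: 31250 bits/s, and each byte is framed as
   1 start bit + 8 data bits + 1 stop bit = 10 bits. */
enum { MIDI_BAUD = 31250, BITS_PER_FRAME = 10 };

/* Time needed to transmit n bytes over a serial MIDI link, in microseconds.
   10 bits / 31250 bps = exactly 320 us per byte. */
static uint32_t midi_tx_time_us(uint32_t n_bytes)
{
    return n_bytes * (BITS_PER_FRAME * 1000000u / MIDI_BAUD);
}
```

So a typical 3-byte Note On takes 960 µs on the wire, and any delta time shorter than that is ‘impossible’ on a classic serial interface.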

In short, it is certainly legal for MIDI data to have delays that are ‘impossible’ to play on certain MIDI interfaces, and I have actually encountered quite a few of these MIDI files during my experiments.

But what is the problem?

We have established that ‘impossible’ delays exist in the MIDI world. But apparently this is not usually a problem, since people use MIDI all the time. Why is it not a problem for most people? And why is it a problem for this particular method?

The reason it is not a problem in most cases is that the timing is generally decoupled from the sending of data. That is, the data is generally put into a FIFO buffer, so new data can queue up while the MIDI interface is still busy sending the earlier data.

Another thing is that timing is generally handled by dedicated hardware. If you implement the events with a simple timer that is polled, and process each event as soon as the timer has passed its delay-point, then the timing remains absolute, and it automatically corrects itself as soon as all data has been sent. The timer just continues to run at the correct speed at all times.

Why is this not the case with this specific approach? It is because this approach relies on reprogramming the timer at every event, making use of the latched properties of the timer to avoid any jitter, as explained earlier. This only works, however, if the timer is in rate-generator mode, so that it automatically restarts every time the counter reaches 0.

This means that we have to write a new value to the timer before it reaches 0 again, otherwise it will repeat the previous value. And this is where our problem is: when the counter reaches 0, an interrupt is generated. In the handler for this interrupt, I output the data for the event, and then write the new counter value (actually for two interrupts ahead, not the next one). If I were to write a counter value that is too small, the next interrupt would fire while we are still in the interrupt handler for the previous event. Interrupts are still disabled at that point, so this timer event is missed, and the timer restarts with the same value. Our timing is now thrown off, and is no longer on the absolute scale.
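For reference, the counter values being written here are not in the 44.1 kHz units of the event stream: the PC's 8253/8254 PIT is clocked at 1193182 Hz, so each delay has to be converted to PIT ticks first. A sketch of that conversion, assuming the 44.1 kHz timeline carried over from the VGM player (the overflow handling is deliberately simplified; a real player would split overlong delays across multiple interrupts):

```c
#include <assert.h>
#include <stdint.h>

/* The PC's 8253/8254 PIT is clocked at 1193182 Hz; the preprocessed
   event stream uses a 44.1 kHz timeline. */
enum { PIT_HZ = 1193182, VGM_HZ = 44100 };

/* Convert a delay in 44.1 kHz samples to a PIT reload count,
   rounded to the nearest tick. A count of 0 means 65536 on the PIT,
   the longest programmable interval; we clamp to that here. */
static uint16_t delay_to_pit_count(uint32_t samples)
{
    /* 64-bit intermediate to avoid overflow in the multiply. */
    uint64_t ticks = ((uint64_t)samples * PIT_HZ + VGM_HZ / 2) / VGM_HZ;
    return ticks > 65536 ? 0 : (uint16_t)ticks;
}
```

One 44.1 kHz sample is about 27 PIT ticks, so even very short delta times map to a meaningful counter value, but writing a count smaller than the handler's own execution time is exactly the failure mode described above.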

Is there a solution?

Well, that is a shame… we had this very nice and elegant approach to playing music data, and now everything is falling apart. Or is it? Well, we do know that worst-case, we can send data at 3125 bytes per second. We also know how many bytes we need to send for each event. Which means that we can deduce how long it takes to process each event.

This means that we can mimic the behaviour of ‘normal’ FIFO-buffered MIDI interfaces: When an event has an ‘impossible’ delay, we can concatenate its data onto the previous event. Furthermore, we can add up the delay values, so that the absolute timing is preserved. This way we can ensure that the interrupt will never fire while the previous handler is still busy.

So, taking the problematic events in the diagram above, we fix it like this:

Regrouped data

The purple part shows the two ‘clashing events’, which have now been regrouped into a single event. The arrows show that the delays have been added together, so that the total delay for the event after that is still absolute. This means that we do not sacrifice any accuracy either, since a ‘real’ MIDI interface with a FIFO buffer would have treated it the same way: the second MIDI event would effectively be concatenated to the previous data in the FIFO buffer. It wouldn’t physically be possible to send it any faster over the MIDI interface.

This regrouping can be done for more than just two events: you can keep concatenating data until eventually you reach a delay that is ‘possible’ again: one that fires after the data has been sent.
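The regrouping pass might be sketched like this. The `event_t` struct and the microsecond units are hypothetical simplifications for illustration (the real preprocessor works on its own tick scale and carries the actual payload bytes); the logic of folding ‘impossible’ events into their predecessor and carrying the delay forward is what the text above describes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical preprocessed event: a delta delay before the event
   fires, and the number of MIDI bytes it outputs. */
typedef struct {
    uint32_t delay_us;  /* delta relative to the previous event */
    uint32_t n_bytes;   /* payload length; actual data omitted here */
} event_t;

enum { US_PER_MIDI_BYTE = 320 };  /* 10 bits at 31250 bps */

/* Merge any event whose delay is shorter than the transmission time of
   the previous event's data into that previous event: payloads are
   concatenated and the folded delay is carried forward to the next
   event, so absolute timing is preserved. Returns the new event count.
   Works in place on the array. */
static size_t regroup(event_t *ev, size_t n)
{
    if (n == 0) return 0;
    size_t out = 0;  /* index of the last kept event */
    for (size_t i = 1; i < n; i++) {
        uint32_t busy_us = ev[out].n_bytes * US_PER_MIDI_BYTE;
        if (ev[i].delay_us < busy_us) {
            /* 'Impossible' delay: concatenate onto the previous event,
               and push its delay onto the next event's delta. */
            ev[out].n_bytes += ev[i].n_bytes;
            if (i + 1 < n)
                ev[i + 1].delay_us += ev[i].delay_us;
        } else {
            ev[++out] = ev[i];
        }
    }
    return out + 1;
}
```

Because `busy_us` is recomputed from the growing merged event, the loop naturally keeps concatenating until it reaches a delay that is ‘possible’ again, just as described above.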

Here is an example of the MIDI player running on an 8088 machine at 4.77 MHz. The MIDI device is a DreamBlaster S2P (a prototype from Serdaco), which connects to the printer port. This requires the CPU to trigger the signal lines of the printer port at the correct times to transfer each individual MIDI byte:

This entry was posted in Oldskool/retro programming. Bookmark the permalink.

5 Responses to Putting the things together, part 2: MIDI and other problems

  1. soviet says:

    Very interesting, now I figured out why the “little delay” warning appears on the screen.

  2. Thanks for this write-up. I’ve never been this deep down into a MIDI player implementation (yet?) but I did run across it, more as a thought experiment than a practical problem.

    It may be useful to logically separate MIDI as a logical protocol, the transport interface, and SMF (Standard MIDI File). The delays (deltas) are specific to SMF and have no counterpart elsewhere in MIDI.

    I believe the transports (at least the original serial one and the USB MIDI class) say absolutely nothing about timing. As you say, the classic serial interface has a known bandwidth and a fairly predictable behavior if you keep the FIFOs full. (It’s also not terribly fast, which leads to various ways of optimizing MIDI event data.) USB is in comparison quite unpredictable. There is a known maximum bandwidth, but that is purely theoretical because a USB MIDI device will likely share the bus with others. There is a difficult to predict latency as well, caused by the OS and USB host controller hardware. And interestingly, low/full-speed USB works on a 1 millisecond cycle, so even though USB has much higher bandwidth than a serial MIDI cable, it has also higher latency.

    Does any of that matter? That’s what I don’t know. Do MIDI synths quantize the incoming events? Probably. Can the human ear tell a few milliseconds off here and there? Not really. But it is bothersome.

    • Scali says:

      It may be useful to logically separate MIDI as a logical protocol, the transport interface, and SMF (Standard MIDI File). The delays (deltas) are specific to SMF and have no counterpart elsewhere in MIDI.

      Yes, I already mentioned that in an earlier post, so I added a link to that: https://scalibq.wordpress.com/2017/03/29/trackers-vs-midi/

      Do MIDI synths quantize the incoming events? Probably.

      Yes, I would say that digital synths would at least need to quantize the events to the sampling rate.
      Early digital devices generally had only about 32 kHz sampling rate. So at the very least they could not start or stop notes at a resolution of more than 1/32000th of a second, which is a much lower resolution than the 1 MHz resolution used by the MIDI timestamps.
      Later synths would have higher resolutions, such as 44.1, 48, 96, 192 kHz… but still nowhere near 1 MHz.

      But it is bothersome.

      Yes, I suppose my approach is a bit OCD in a way… I wanted to have the fastest possible replay method, but I also wanted to get the best accuracy possible.
      The way I designed my replayer, there was no reason to quantize the data (other than the ‘quantizing’ or whatever you want to call it when I combine the data of two or more events).
      Many DOS MIDI players do quantize the data, but they do this mainly because they can then use a fixed-speed timer interrupt at the quantized interval (usually a multiple of the framerate, so for 70 Hz VGA, they’d use 140, 280… even 700 Hz). This is suboptimal in 2 ways:
      1) You quantize the data in a non-musical way… because the quantization interval is just an arbitrary speed of the replayer, and not a function of the tempo of the song or anything
      2) The interrupt fires at every interval, even when there is no data, so there is always CPU overhead. The higher you want your resolution to be, the more CPU overhead it takes.

      My approach eliminates both problems: I don’t quantize, and I don’t fire interrupts unless there is data to play.
      I could extend this idea by inserting non-music events. Eg, one of the reasons for quantizing MIDI data is to sync the replayer to a multiple of the framerate. This way you can also time graphics routines on the timer interrupt.
      In my case, I could put non-music events in the data stream at the exact framerate, so you’d get the same functionality, but still with lower overhead.

      The only downside compared to a quantized stream is that you can’t predict when there is NO data. That is, you can fire a non-music event at the start of each frame, but there could be music data directly following it. When you quantize it, you know that there won’t be any more music data until the next quantization interval (which brings us back to how trackers are designed).

  3. I don’t know, making things work right when it’s not unreasonably difficult doesn’t sound OCD to me 🙂

    Right, digital synths can’t very well start events in the middle of a sample, so that would be the upper bound. They might also have some not terribly fast microcontroller or CPU, which could significantly reduce the granularity.

    As a random data point, the Miles Sound System (MSS) XMIDI uses 120 Hz as a quantization basis. That is low enough that it causes problems when converting from generic SMF, but still sufficient when creating game music.

    • Scali says:

      Yes, the early Roland MT-32s are notorious for not being able to keep up with SysEx commands at full speed. There may be other issues.
      If I listen to this SC-55 playing CANYON.MID: https://youtu.be/fxnrg1CMc6E
      At about 1:36, it seems to struggle somewhat with the timing. I wonder what causes that… the SC-55 or the player?

      And yea, 120 Hz quantization is a bad idea for generic SMF songs. That is why I most specifically did NOT want to go down that route to get MIDI playback working on very slow machines. I suppose that’s the OCD part. It seems that most others would just ‘accept’ that playing VGM or MIDI or other data simply requires a faster machine.

      I suppose that’s why the MPU-401’s Intelligent Mode was so popular with early games: the interface would take care of the timing, and no quantization was necessary.

      But as we learnt from trackers, if you compose your music around a certain tempo, there is no problem.
      This is a very interesting presentation by LFT on game music: https://youtu.be/aEjcK5JFEFE
      At 16:23 he goes into the tempo limitations of early game music.
