Any real-keeping lately?

The 5-year anniversary of my inaugural ‘Just keeping it real’-article came and went. Has it been that long already? It’s also been quite some time since I’ve last written about some oldskool coding, or even anything at all. Things have been rather busy.

However, I have still been doing things on and off. So I might as well give a short overview of things that I have been doing, or things that I may plan to do in the future.

Libraries: fortresses of knowledge

One thing that has been an ongoing process, has been to streamline my 8088-related code, and to extract the ‘knowledge’ from the effects and utilities that I have been developing into easy-to-use include and source files for assembly and C. Basically I want to create some headers for each chip that you regularly interact with, such as the 8253, the 8259A and the 6845. And at a slightly higher level, also routines for dealing with MDA, Hercules, CGA, EGA and VGA, and also audio, such as the PC speaker or the SN76489.

For example, to make it easy to set up a timer interrupt at a given frequency, or to enable/disable auto-EOI mode, or to perform horizontal or vertical sync for low-level display hacking like in 8088 MPH. That is the real challenge here. The header files should be easy to use, while at the same time giving maximum control and performance.

I am somewhat inspired by the Amiga’s NDK. It contains header files that allow easy access to all the custom chip registers. For some reason, something similar is not around for the PC hardware, as far as I know. There’s very extensive documentation, such as Ralf Brown’s Interrupt List, and the BOCHS ports list. But these are not in a format that can be readily used in a programming environment. So I would like to make a list of constants that describes all registers and flags, in a way they can be used immediately in a programming context (in this case assembly and C, but it should be easy to take a header file and translate it to another language. In fact, currently I generally write the assembly headers first, then convert them to C). On top of that, I try to use macros where possible, to add basic support routines. Macros have the advantage that they are inlined in the code, so there is no calling overhead. If you design your macro just right, it can be just as efficient as hand-written code. It can even take care of basic loop unrolling and such.

Once this library gets more mature, I might release it for everyone to use and extend.

Standards are great, you can never have too many of them

As I was creating these header files, I came to the conclusion that I was doing it wrong, at least, for the graphics part. Namely, when I first started doing my oldskool PC graphics stuff, I started with VGA and then worked my way down to the older standards. I created some basic library routines in C, where I considered EGA to be a subset of VGA, and CGA a subset of EGA in turn. I tried to create a single set of routines that could work in CGA, EGA or VGA mode, depending on a #define that you could set. Aside from that, I also added a Hercules mode, which didn’t quite fit in there, since Hercules is not an IBM standard, and is not compatible at all.

There are two problems with that approach:

  1. As we know from software such as 8088 MPH, EGA and VGA are in fact not fully backward compatible with CGA at all.  Where CGA uses a real 6845 chip, EGA and VGA do not. So some of the 6845 registers are repurposed/redefined on EGA/VGA. Various special modes and tricks work entirely differently on CGA than they do on EGA or VGA (eg, you can program an 80×50 textmode on all, but not in the same way).
  2. If you set a #define to select the mode in which the library operates, then by definition it can only operate in one mode at a time. This doesn’t work for example in the scenario where you want to be able to support multiple display adapters in a single program, and allow the user to select which mode to use (you could of course build multiple programs, one for each mode, and put them behind some menu frontend or such. Various games actually did that, so you often find separate CGA, EGA, VGA and/or Tandy executables. But it is somewhat cumbersome). Another issue is that certain videocards can actually co-exist in a single system, and can work at the same time (yes, early multi-monitor). For example, you can combine a ‘monochrome’ card with a ‘color’ card, because the IBM 5150 was originally designed that way, with MDA and CGA. They each used different IO and memory ranges, so that both could be installed and used at the same time. By extension, Hercules can also be used together with CGA/EGA/VGA.

So now that I have seen the error of my ways, I have decided to only build header files on top of other header files when they truly are supersets. For example, I have a 6845 header file, and MDA, Hercules and CGA use this header. That is because they all use a physical 6845 chip. For EGA and VGA, I do not use it. Also, I use separate symbol names for all graphics cards. For example, I don’t just make a single WaitVBL-macro, but I make one specific for every relevant graphics card. So you get a CGA_WaitVBL, a HERC_WaitVBL etc. You can still masquerade them behind a single alias yourself, if you so please. But you can also use the symbols specific to each given card side-by-side.

And on his farm he had some PICs, E-O, E-O-I

The last oldskool article I did was focused around the 8259A Programmable Interrupt Controller, and the automatic End-of-Interrupt functionality. At the time I already mentioned that it would be interesting for high-frequency timer interrupt routines, such as playing back digital audio on the PC speaker. That was actually the main reason why I was interested in shaving a few cycles off. I have since used the auto-EOI code in a modified version of the chipmod routine from the endpart of 8088 MPH. Instead of the music player taking all CPU, it can now be run from a timer interrupt in the background. By reducing the mixing rate, you can free up some time to do graphics effects in the foreground.

That routine was the result of some crazy experimentation. Namely, for a foreground routine, the entire CPU is yours. But when you want to run a routine that triggers from an interrupt, then you need to save the CPU state, do your routines, and then restore the CPU state. So the less CPU state you need to save, the better. One big problem with the segmented memory model of the 8088 is how to get access to your data. When the interrupt triggers, the only segment you can be sure of is the code segment. You have no idea what DS and ES are pointing to. You can have some control over SS, because you can make sure that your entire program only uses a single segment for the stack throughout.

So one idea was to reserve some space at the top of the stack, to store data there. But then I figured that it might be easier to just use self-modifying code to store data directly in the operands of some instructions.

Then I had an even better idea: what if I were to use an entire segment for my sample data? It can effectively be a 64k ringbuffer, where wraparound is automatic, saving the need to do any boundschecking on my sample pointer. It is a 16-bit pointer, so it will wrap around automatically. And what if I would put this in the code segment? I only need a few bytes of code for an interrupt handler that plays a sample, increments the sample pointer, and returns from the interrupt. I can divide the ringbuffer in two segments. When the sample pointer is in the low segment, I put the interrupt handler in the high segment, and when the sample pointer switches to the high segment, I move the interrupt handler in the low segment.

Since each segment is so large, I do not need to check at every single sample. I can just do it in the logic of the foreground routine, every frame or such. This makes it a very efficient approach.

I also had this idea of placing the interrupt handlers in segment 0. The advantage here is that CS will point to 0, which means that you can modify the interrupt vector table directly, by just doing something like mov cs:[20h], offset myhandler. This allows you to have a separate handler for each sample, and include the sample in the code itself, so the code effectively becomes the sample buffer. But at the time I thought it may be too much of a hassle. But then reenigne suggested the exact same thing, so I thought about it once more. There may be something here yet.

I ended up giving it a try. I decided to place my handlers 32 bytes apart. 32 bytes was enough to make a handler that plays a sample and updates the interrupt vector. The advantage of spacing all handlers evenly in memory is that they all had the instruction that loaded the sample in the same place, so they were all spaced 32 bytes apart as well. This made it easy to address these samples, and update them with self-modifying code from a mixing loop. It required some tricky code that backs up the existing interrupt vector table, then disables all interrupts except irq 0 (the timer interrupt), and restores everything upon exist. But after some trial-and-error I managed to get it working.

As we were discussing these routines, we were wondering if this would perhaps be good enough as a ‘replacement’ for Trixter’s Sound Blaster routines in 8088 Corruption and 8088 Domination. Namely, the Sound Blaster was the only anachronism in these productions, because streaming audio would have been impossible without the Sound Blaster and its DMA functionality.

So I decided to make a proof-of-concept streaming audio player for my 5160:

As you can see, or well, hear, it actually works quite well. At least, with the original controller and Seagate ST-225, as in my machine. Apparently this system uses the DMA controller, and as such, disk transfers can work in the background nicely. It introduces a small amount of jitter in the sample playback, since the DMA steals bus cycles. But for a 4.77 MHz 8088 system, it’s quite impressive just how well this works. With other disk controllers you may get worse results, when they use PIO rather than DMA. Fun fact: the floppy drive also uses DMA, and the samples are of a low enough bitrate that they can play from a floppy as well, without a problem.

Where we’re going, we don’t need… Wait, where are we going anyway?

So yes, audio programming. That has been my main focus since 8088 MPH. Because, aside from the endpart, the weakest link in the demo is the audio. The beeper is just a very limited piece of hardware. There must be some kind of sweet-spot somewhere between the MONOTONE audio and the chipmod player of the endpart. Something that sounds more advanced than MONOTONE, but which doesn’t hog the entire CPU like the chipmod player, so you can still do cool graphics effects.

Since there has not been any 8088 production to challenge us,  audio still remains our biggest challenge. Aside from the above-mentioned disk streaming and background chipmod routine, I also have some other ideas. However, to actually experiment with those, I need to develop a tool that lets me compose simple music and sound effects. I haven’t gotten too far with that yet.

We could also approach it from a different angle, and use some audio hardware. One option is the Covox LPT DAC. It will require the same high-frequency timer interrupt trickery to bang out each sample. However, the main advantage is that it does not use PWM, and therefore it has no annoying carrier wave. This means that you can get more acceptable sound, even at relatively low sample rates.

A slightly more interesting option is the Disney Sound Source. It includes a small 16-byte buffer. It is limited to 7 kHz playback, but at least you won’t need to send every sample indvidually, so it is less CPU-intensive.

Yet another option is looking at alternative PC-compatible systems. There’s the PCjr and Tandy, which have an SN76489 chip on board. This allows 3 square wave channels and a noise channel. Aside from that, you can also make any of the square wave channels play 4-bit PCM samples relatively easily (and again no carrier wave). Listen to one of Rob Hubbard’s tunes on it, for example:

What is interesting is that there’s a home-brew Tandy clone card being developed as we speak. I am building my own as well. This card allows you to add an SN76489 chip to any PC, making its audio compatible with Tandy/PCjr. It would be very interesting if this card became somewhat ‘standard’ for demoscene productions.

(Why not just take an AdLib, you ask? Well, various reasons. For one, it was rather late to the market, not so much an integral part of 8088 culture. Also, it’s very difficult and CPU-consuming to program. Lastly, it’s not as easy to play samples on as the other devices mentioned. So the SN76489 seems like a better choice for the 8088. The fact that it was also used in the 8088-based PCjr and Tandy 1000 gives it some extra ‘street cred’).

Aside from that, I also got myself an original Hercules GB102 card some time ago. I don’t think it would be interesting to do another demo on exactly the same platform as 8088 MPH. Instead, it would be more interesting to explore other hardware from the 8088 era. The Hercules is also built around a 6845 chip, so some of the trickery done in 8088 MPH may be translated to Hercules. At the same time, the Hercules also offers unique features, such as 64 kB of memory, arranged in two pages of 32 kB. So we may be able to make it do some tricks of its own. Sadly, it would not be a ‘world’s first’ Hercules demo, because someone already beat me to it some months ago:

This entry was posted in Oldskool/retro programming, Software development and tagged , , , , , , , . Bookmark the permalink.

6 Responses to Any real-keeping lately?

  1. jdwii says:

    I was hoping to see some kind of update on your opinions on Ryzen

    • Scali says:

      Well, I have not seen any new information. Nothing in-depth about the architecture, no pricing, no release dates. I’d also like to wait for independent benchmarks to see what performance is really like.

      • jdwii says:

        It sounds like they learned their lesson and that the engineering team is in charge instead of idiot businessmen and marketing idiots

        Why i wanted you to comment on their architecture

      • Scali says:

        Well, that’s the thing we don’t know yet. They put on a little show for the press, but it could just have been some cherry-picked benchmarks and some hyperbole, making Zen look a lot better than it really is. In which case it would be Barcelona and Bulldozer all over again. I don’t expect Zen to be that bad though. But if it is really as good as they say? Let’s wait and see what kind of benchmark figures the independent reviewers will come up with first.

  2. Michal says:

    Have you thought about multi-voice synthesis on the PC speaker? That would be a middle ground between sample playback and 1 channel beeping. It is described here . It was used by Budokan (real time in-game), Music Construction Set, Blockout, Turbo Outrun etc.

    • Scali says:

      I have yes, but I’m not sure if it’s worth the effort.
      It isn’t significantly less CPU-heavy than regular sample playback, and it doesn’t sound as good (MCS requires 100% CPU to replay music. Might as well use the chipmod player approach from 8088 MPH if you use 100% CPU anyway, since that sounds much better).
      I think Budokan targets much faster systems than 8088 at 4.77 MHz. Not sure about the other games, but I wouldn’t be surprised if they did as well (either that, or they only used the ‘fancy’ routine during the title screen, not in-game, so they are 100% CPU again).

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s