Last weekend, there was a special demoscene party, the 1991 party, with, obviously, 1991 as a theme. Well, that is just my bag, baby! The focus was mainly on C64 and Amiga, which were the most popular platforms for gaming and demoscene activity in those days. I wanted to do a small production myself as well. I decided to go with a PC production because, of all my oldskool code experiments, the PC stuff was in the most mature state. Also, the PC platforms of those early days have not been explored very much, so it’s still possible to create some refreshing things and work out interesting ideas for early PC.
Speaking of early PCs… Because of my early PC escapades, I came into contact with Trixter a while ago, and we started bouncing ideas back and forth. Trixter’s platform of choice is the PCjr, the slowest PC ever made, but it does have slightly more capable graphics and audio hardware than a regular PC with CGA and PC speaker. Earlier this year, Trixter released the world’s first intro on PCjr, namely INTROjr:
I especially liked the rasterbars. There is no raster interrupt on the PC, so there is no easy way to time code to the screen position. In INTROjr, the rasterbars may not be perfectly stable, but that is understandable, given that the CPU is far too slow to do any remotely accurate polling for the horizontal blank interval, and unlike on a C64, it is virtually impossible to count out cycle-exact code. Firstly, because the 8088 CPU has an internal prefetch queue and other buffering that make it very difficult to predict what state the CPU is in at any given time (e.g., is the next instruction already fetched, or not?). Secondly, because of the way DRAM is refreshed on a PC.
The ‘D’ in DRAM stands for ‘Dynamic’. This means that the memory can only hold its contents for a limited time, as opposed to static RAM (SRAM). So the RAM needs to be ‘refreshed’ (contents read and written back) periodically in order to retain its data. On many systems, this memory refresh is performed automatically by the chipset. For example, on a C64, the VIC-II chip takes care of the refresh. And on the Amiga, one of the DMA channels of the Agnus chip performs the refresh.
A PC is built up from generic parts, rather than a custom chipset. It does not have any hardware to automatically refresh the memory. Instead, a ‘software’ solution is employed: one of the timers in the chipset is set up to trigger the DMA controller periodically to read a single byte (whenever memory is read, the contents of that cell are lost, so a read will always trigger a refresh).
The problem with this solution for cycle-exact coding is that you never quite know when the timer-triggered DMA transfer will steal away some cycles to refresh memory. On systems like the C64 and Amiga, the refresh is synced to the display output, so it is very predictable. Memory is always refreshed in the exact same cycles on screen, so the whole process happens in the background and can basically be ignored.
If you want to read more about DRAM refreshes on PC, and how to get around it, there is some more in-depth information on Andrew Jenner’s blog, and also some information on how to get the CPU and CGA adapter into a synchronized state which he calls ‘lockstep’. This idea is similar to the ‘stable raster’ I described for C64 earlier, but on PC it is even harder to do. The short version of it is that this is for a real PC only, and the PCjr performs its DRAM refresh in a slightly different way. Trixter has not yet found a reliable workaround for the problem that works on PCjr. So, given these limitations, the rasterbars in INTROjr are very nice indeed.
Another thing I really liked is that the scroller and the rasterbars run at full framerate, even though the other effects may not. Running effects ‘in 1 frame’ is the hallmark of good C64 and Amiga demos. So it is very nice to see that on the PC, where it is even harder to synchronize things and to perform asynchronous processing. Remember, we are coding on the bare metal here: we don’t have an OS with threading functionality, let alone multiple CPU cores. Even if we did have threading functionality, we wouldn’t have the resources to run multiple threads this efficiently and this well synchronized.
1-bit ought to be enough for everyone
Another really cool thing that Trixter has made is MONOTONE, a tracker aimed at the PC speaker. In the hands of a skilled musician, it is capable of some very interesting sounds:
So, hopefully Trixter and I can combine forces to push the limits of early PCs. The first idea was to do something like INTROjr, with a logo on top, a scroller at the bottom, some music, and some 3d objects in the center of the screen. That would look like a classic Amiga demo, such as Phenomena’s Animotion for example:
Anyway, for the 1991 party, I wanted to do a simple PC intro, based on the subpixel-correct donut renderer that I developed earlier, for 16-bit x86 systems. I wanted to add a logo and a scroller. This is what I came up with:
I took Trixter’s idea for the timer interrupt, and modified it to work on different PCs. Namely, a PCjr is a fully synchronized system, much like a C64 or Amiga, where all timings are based off the same crystal. This means that CPU, video chip and things like timers all run in sync (which is why early PCs had such peculiar clockspeeds, such as 4.77, 7.16 and 9.54 MHz: they were all derived from a 14.31818 MHz crystal, which is four times the NTSC colour subcarrier frequency, the same timing family that gives NTSC its 59.94 Hz refresh rate). So for the PCjr, you can create a raster interrupt by waiting for the vertical blank interval once, then setting up a timer at the refresh rate of your screen, and it will trigger at the vertical blank interval at every frame.
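To make the arithmetic behind a fully synchronized system concrete, here is a small Python sketch. It is not code from either intro. It assumes the standard 14.31818 MHz crystal and CGA-style NTSC timings of 912 pixel clocks per scanline and 262 scanlines per frame; the constant and function names are made up for illustration.

```python
from fractions import Fraction

# Assumed master crystal: four times the NTSC colour subcarrier (315/88 MHz).
CRYSTAL_HZ = Fraction(315_000_000, 22)   # 14.31818... MHz
CPU_HZ = CRYSTAL_HZ / 3                  # the 4.77 MHz CPU clock
PIT_HZ = CRYSTAL_HZ / 12                 # timer input clock, about 1.193182 MHz

# Assumed CGA-style NTSC frame geometry, counted in crystal cycles.
PIXEL_CLOCKS_PER_LINE = 912
LINES_PER_FRAME = 262


def pit_ticks_per_frame():
    """Number of timer ticks in one video frame."""
    crystal_cycles = PIXEL_CLOCKS_PER_LINE * LINES_PER_FRAME
    return Fraction(crystal_cycles, 12)


def frame_rate_hz():
    """Frame rate implied by the geometry above."""
    return PIT_HZ / pit_ticks_per_frame()


if __name__ == "__main__":
    print("CPU clock (MHz):", float(CPU_HZ) / 1e6)
    print("timer ticks per frame:", pit_ticks_per_frame())
    print("frame rate (Hz):", float(frame_rate_hz()))
```

Under these assumptions a frame is a whole number of timer ticks, which is what allows a free-running timer to stay locked to the display once it has been started at the vertical blank.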
On later PCs, different parts of the system would have their own clock generator, and they would run asynchronously. This is a problem that I ran into for my target machine (a fast 286, or an entry-level 386sx/dx). If you set up a timer interrupt, it will not be in sync with the video refresh, so it will drift quickly. So instead, I used a one-shot timer, which I resynchronized every frame. I set up a timer to trigger a few scanlines before the end of the screen (an arbitrary safety margin), and then I enter a loop to wait for the vertical blank interval to start. Now I am re-synchronized to the screen, and I can set up a new one-shot timer to act as a faux raster interrupt. There will always be some inaccuracy, because not all systems run their timers at exactly the same speed, but there is no drift. So worst case you may not hit the exact scanline you were aiming for, but at least you will always hit the same scanline (give or take some jitter), rather than drifting up or down the screen over time.
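The difference between a free-running timer and a one-shot timer that is re-armed every frame can be illustrated with a toy simulation. This is only a sketch of the idea, not the intro's code: the frame length, timer period, delay and clock skew below are hypothetical numbers, and the function names are invented for the example.

```python
from fractions import Fraction


def free_running_offsets(frame_ticks, timer_period, frames):
    """Position within the frame at which a free-running timer fires."""
    return [(n * timer_period) % frame_ticks for n in range(frames)]


def one_shot_offsets(frame_ticks, delay, frames, clock_skew=Fraction(1)):
    """Position within the frame at which a one-shot timer fires when it is
    re-armed at the start of every vertical blank. clock_skew models a timer
    that runs slightly faster or slower than expected."""
    offsets = []
    vblank_start = Fraction(0)
    for _ in range(frames):
        fire_time = vblank_start + delay * clock_skew
        offsets.append(fire_time - vblank_start)
        # The handler polls for the next vertical blank and re-arms there.
        vblank_start += frame_ticks
    return offsets


# Hypothetical machine: a frame lasts 17024.5 timer ticks, so no integer
# timer period can match it exactly.
FRAME_TICKS = Fraction(34049, 2)

drifting = free_running_offsets(FRAME_TICKS, 17024, 5)
stable = one_shot_offsets(FRAME_TICKS, 16000, 5,
                          clock_skew=Fraction(1001, 1000))

if __name__ == "__main__":
    print("free-running:", [float(o) for o in drifting])
    print("one-shot:    ", [float(o) for o in stable])
```

In this model the free-running timer lands on a different position every frame, while the one-shot timer is off from its target by a constant amount but never moves.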
And that is good enough for this particular intro. The intro runs in a 16-colour videomode. The timer interrupt is used to switch palettes between the logo, the donut and the scroller, so that more than 16 colours are visible on screen at a time (as you can see, I have a few black scanlines between the different parts, so I can easily hide any inaccuracies of the timer interrupt). The scroller is also updated as part of the timer interrupt, so it will always run at full framerate, regardless of how fast the system can render the donut.
The palette switching is not done by just overwriting all the RGB registers. Instead it uses a lesser-known feature of the VGA card, which allows you to have either 4 banks of 64 colours or 16 banks of 16 colours. It is quite a simple trick, if you know how. Namely, you can override the highest bits of the palette index via a register in the attribute controller (port 0x3C0). You can do this with the Color Select Register, which is at index 14h. For more information, see this page. This allows you to switch palettes with just a single command, rather than having to write 48 bytes of palette data, which can take quite a bit of CPU-time.
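As a rough model of how the bank selection works, the sketch below computes which of the 256 DAC entries a 4-bit pixel ends up using. It follows the documented VGA behaviour as I understand it: bits 3-2 of the Color Select Register always supply bits 7-6 of the DAC index, and when bit 7 of the Attribute Mode Control Register (index 10h) is set, bits 1-0 of Color Select replace bits 5-4 coming from the palette register. The function name and the identity palette are assumptions for the example; this is not the intro's code.

```python
def dac_index(pixel, palette_regs, color_select, p54s):
    """Return the 8-bit DAC index for a 4-bit pixel value.

    palette_regs: the 16 attribute controller palette registers (6 bits each)
    color_select: value of the Color Select Register (index 14h)
    p54s:         True if Mode Control bit 7 is set (16 banks of 16 colours)
    """
    index = palette_regs[pixel & 0x0F] & 0x3F
    if p54s:
        # Bits 5-4 come from Color Select bits 1-0 instead.
        index = (index & 0x0F) | ((color_select & 0x03) << 4)
    # Bits 7-6 always come from Color Select bits 3-2.
    return index | ((color_select & 0x0C) << 4)


# Illustrative setup: palette registers 0..15 map straight to 0..15.
IDENTITY_PALETTE = list(range(16))

if __name__ == "__main__":
    for bank in (0, 1, 2):
        row = [dac_index(p, IDENTITY_PALETTE, bank, True) for p in range(16)]
        print("bank", bank, "->", row[0], "..", row[-1])
```

With an identity palette and 16 banks of 16, the DAC index is simply bank * 16 + pixel, so changing the one register value selects an entirely different set of 16 colours.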
Another way to scroll
Getting a scroller to work in the planar 16-colour videomode was quite an interesting problem as well. EGA/VGA do have support for horizontal scrolling (panning), but as far as I know, it can only be used for scrolling the top of the screen, and optionally keeping the bottom part of the screen fixed, using the line-compare function. So whereas on C64/Amiga you can change the scroll position at any point on the screen via a raster interrupt or the copper, that approach is not likely to be compatible with most EGA/VGA hardware. I have to do some more testing on this, because I am not 100% sure that my code was working properly at the time, but it seems logical that the scroll register is latched for the line-compare function, and is only read at vertical blank.
So, instead I opted for a ‘software’ scroller. The scroller in INTROjr is a software scroller as well, since the PCjr hardware does not have any horizontal scrolling support. However, the pixelformat for PCjr is much like CGA: two 4-bit pixels are packed together in a single byte. So you can make an acceptable scroller by just moving the data one byte in memory, which effectively scrolls 2 pixels.
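A byte-wise scroller of this kind boils down to a memory move per scanline. The sketch below models one scanline as a bytearray and assumes the left pixel sits in the high nibble of each byte; the helper names are hypothetical and this is not Trixter's code.

```python
def scroll_row_left(row, new_byte):
    """Move a packed-pixel scanline one byte (2 pixels) to the left and
    feed a new byte in at the right edge."""
    row[:-1] = row[1:]
    row[-1] = new_byte & 0xFF


def unpack_pixels(row):
    """Expand packed bytes into 4-bit pixel values, left pixel first
    (assumed to be the high nibble)."""
    pixels = []
    for byte in row:
        pixels.append(byte >> 4)
        pixels.append(byte & 0x0F)
    return pixels


if __name__ == "__main__":
    line = bytearray([0x12, 0x34, 0x56])
    scroll_row_left(line, 0x78)
    print(unpack_pixels(line))
```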
In my case, a byte-oriented scroller would scroll 8 pixels at a time, which would be far too much. So the scroller would have to perform bit-oriented movement of the data. If you were to do this on the CPU, it would get very expensive, since shift/rotate are very slow operations on these early CPUs.
However, the EGA ALU has support for rotating and masking built in! So I decided to use that instead. On the CPU side, it would be reasonably cheap: I just have to write all bytes for the scrolltext twice, so that I can handle the overlap of 2 characters within 1 byte. The EGA ALU will then rotate and mask the bits into place, so they can be displaced in a pixel-accurate way.
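To show why writing each byte twice is enough, here is a simplified Python model of a single bitplane in write mode 0, with only the Data Rotate and Bit Mask stages (set/reset, logical functions and the map mask are left out). It assumes the destination is read first so that the latch holds the bits that must be preserved. The class and function names are invented for the example, and this is a sketch of the technique rather than the intro's actual routine.

```python
def ror8(value, count):
    """Rotate an 8-bit value right by count bits."""
    count &= 7
    return ((value >> count) | (value << (8 - count))) & 0xFF


class EgaPlane:
    """One bitplane with a latch, a rotate count and a bit mask."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.latch = 0
        self.rotate = 0
        self.bit_mask = 0xFF

    def read(self, addr):
        # A CPU read loads the latch with the byte at that address.
        self.latch = self.mem[addr]
        return self.latch

    def write(self, addr, cpu_byte):
        # Rotate the CPU data, then take masked bits from it and the
        # remaining bits from the latch.
        rotated = ror8(cpu_byte, self.rotate)
        keep = self.latch & ~self.bit_mask & 0xFF
        self.mem[addr] = (rotated & self.bit_mask) | keep


def put_glyph_row(plane, addr, glyph, shift):
    """Draw 8 glyph pixels starting 'shift' pixels into the byte at addr."""
    plane.rotate = shift
    plane.bit_mask = 0xFF >> shift            # right part of the first byte
    plane.read(addr)
    plane.write(addr, glyph)
    if shift:
        plane.bit_mask = (0xFF << (8 - shift)) & 0xFF   # left part of the next
        plane.read(addr + 1)
        plane.write(addr + 1, glyph)


if __name__ == "__main__":
    plane = EgaPlane(3)
    put_glyph_row(plane, 0, 0xFF, 3)   # first character row
    put_glyph_row(plane, 1, 0x81, 3)   # next character shares byte 1
    print([format(b, "08b") for b in plane.mem])
```

The CPU only ever writes the unmodified glyph byte; all of the shifting and merging of two neighbouring characters within one byte is done by the rotate and mask stages.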
The donut will have to be drawn to a backbuffer in order to avoid nasty flashing and garbage during drawing. For the logo, this is not really a problem: I can just place the logo in both the front- and backbuffer, and make sure I never overwrite it while drawing. The scroller will be racing the beam on the frontbuffer however. The flip will have to be performed by the timer interrupt now, rather than having the 3d rendering routine itself wait for VBL after it has finished a frame. A simple solution is to have the renderer set a global flag when a frame has completed rendering, and then enter a loop to wait until the flag is reset. The timer interrupt checks the flag every time it is triggered and has reached the VBL. If the flag is set, it will perform a flip and reset the flag. This makes the 3d rendering loop exit its wait loop, and it can start rendering the next frame in the new backbuffer.
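The handshake between the renderer and the timer interrupt can be sketched as a small simulation. It is a deliberately simplified model: time advances in whole frames, render_cost is a hypothetical number of frames that one donut image takes to draw, and the names are invented for the example.

```python
def simulate(frames, render_cost):
    """Simulate the flag handshake for a number of video frames.

    Returns how often the scroller was updated and on which frames the
    page flip happened."""
    frame_ready = False      # the global flag shared with the interrupt
    front_page = 0
    work_left = render_cost
    scroller_updates = 0
    flips = []

    for frame in range(frames):
        # Main loop: render unless we are waiting for the flag to clear.
        if not frame_ready:
            work_left -= 1
            if work_left == 0:
                frame_ready = True          # done, now wait for the flip

        # Timer interrupt, once it has reached the vertical blank.
        scroller_updates += 1               # scroller runs every frame
        if frame_ready:
            front_page ^= 1                 # hardware page flip
            frame_ready = False             # releases the renderer
            work_left = render_cost
            flips.append(frame)

    return scroller_updates, flips


if __name__ == "__main__":
    updates, flips = simulate(frames=9, render_cost=3)
    print("scroller updates:", updates)
    print("flips on frames:", flips)
```

In this model the scroller is updated on every frame, while the donut only flips whenever a complete image is available.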
This way you can get nicely asynchronous effects, while still making use of backbuffering in videomemory and using hardware page flipping.
An interesting bug that occurs here is the following: the polygon renderer makes use of the EGA latches. When the timer interrupt kicks in to update the scroller, these latches will be invalidated. When the polygon renderer then continues, it may render garbage, because the latches had been initialized just before the scroller kicked in, and have now been overwritten with scroll-data.
Although this is an interesting and somewhat peculiar bug, I wanted to have a pixel-perfect intro, so I decided to look for a workaround. I came up with the following: Before the scroller starts, it assumes that the latches contain useful data. It will write a byte to the backbuffer in the scroll area, in order to save the latches. After it is done updating the scroller, it will load this byte again, so that the latches contain the same data as they did before the timer interrupt kicked in.
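The save and restore of the latches can be modelled in a few lines. The sketch below assumes the save is done with a write that stores the latch contents unchanged (as write mode 1 does) and that the restore is an ordinary read of the same address. The class, the preserve switch and the addresses are hypothetical, and the real scroller work is reduced to plain reads.

```python
class PlanarVram:
    """Four bitplanes with one latch byte per plane."""

    def __init__(self, size):
        self.planes = [bytearray(size) for _ in range(4)]
        self.latches = [0, 0, 0, 0]

    def read(self, addr):
        # Any CPU read reloads all four latches from that address.
        self.latches = [plane[addr] for plane in self.planes]

    def write_latches(self, addr):
        # Store the latch contents unchanged into all four planes.
        for plane, latch in zip(self.planes, self.latches):
            plane[addr] = latch


def scroller_interrupt(vram, scratch_addr, scroll_addrs, preserve=True):
    """Stand-in for the scroller update inside the timer interrupt."""
    if preserve:
        vram.write_latches(scratch_addr)    # save the renderer's latches
    for addr in scroll_addrs:
        vram.read(addr)                     # scroller work clobbers latches
    if preserve:
        vram.read(scratch_addr)             # reload the saved latches


if __name__ == "__main__":
    vram = PlanarVram(16)
    vram.latches = [0x11, 0x22, 0x33, 0x44]
    scroller_interrupt(vram, scratch_addr=0, scroll_addrs=[4, 5])
    print([hex(v) for v in vram.latches])
```

Calling the handler with preserve=False reproduces the bug in this model: the renderer resumes with latches that hold scroller data instead of its own.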
Since the scroller runs on the frontbuffer, the corresponding pixels in the backbuffer are not used, and can freely be used as temporary storage. The scroller always runs at full framerate, so once the buffers have been flipped, this temporary data is always overwritten with valid scroll data before it is displayed.