Well, you’ve already seen the demo in my previous post. And Trixter, reenigne and VileR have already covered most of the technical details in these articles:
8088 MPH: We Break All Your Emulators
1K colours on CGA: How it’s done
8088 PC Speaker MOD player: How it’s done
More 8088 MPH how it’s done
CGA in 1024 Colors – a New Mode: the Illustrated Guide
I have done some technical articles myself:
8088 MPH: Sprites? Where we’re going, we don’t need… sprites!
8088 MPH: The polygons
So I think I will approach it from a different angle.
If you have been following this blog closely, you’ll know that I’ve already been in contact with Trixter and reenigne for a few years now, and we have already been sharing some code, knowledge and such regarding oldskool PC programming.
However, about a year ago, the idea of doing a serious 8088+CGA production together started taking shape. In my experience this has mostly been Trixter’s idea. Perhaps even a long-time dream of his, to put the PC on the map as a serious demo platform. He ‘recruited’ VileR because of his discovery of a 512-colour trick, and his excellent skills in working within the limitations of CGA. He ‘recruited’ reenigne because of his cycle-exact CGA hacking, which unlocked new videomodes and made new effects possible for the first time. And he ‘recruited’ me, because he wanted the demo to also have some 3d parts.
I feel it has been an honour to have been part of this project. I’m not sure if he selected me because he thought I could deliver the goods, or just because I was the only one crazy enough to try 3d on 8088+CGA, but it’s all the same :)
Since Trixter asked me to do some polygons, I started writing a sprite compiler. Erm what? Yes… I had played with the idea of a sprite compiler before, and this would be an excellent target platform. So I finally had a good excuse to write it. Also, my initial experiments in trying to port the 3d renderer of 1991 donut to CGA were not too successful:
I figured that I wouldn’t be able to do much more than a small cube at a handful of frames, because of the extremely limited CGA memory write speed (about 170 KB/s). So doing some vectorbobs might be more realistic.
I would like to stress how much of a team effort this demo has been. We set up a mailing list, and started bouncing ideas around, sharing code, debugging and optimizing each other’s code, and getting inspired to do new effects or expand on them.
One example I’d like to share is what led to the final shading on the 3d objects. It started out with a discussion on creating fade-ins or fade-outs. I wanted to try a fade-to-white effect, which is commonly seen on C64. Now, we only have 16 fixed colours, but if you group them into gradients, you can get quite decent results. So I asked VileR if he had some suggestions for gradients that I could use. He came up with a whole list of gradients, where some were unexpectedly large. In one particular graphics mode, he had a blue gradient of 7 colours, and a red gradient of 6 colours.
Initially I was not planning on doing any lighting on the polygons, because the palettes seemed too restricted. But once I saw this, I thought: “Okay, shading may actually look good in that particular palette”. And Trixter had been asking me to add dithering for a long time. Initially I was reluctant to do so, again because the palettes just didn’t seem to have any usable colours to make dithering look good. But now I thought: “If I were to dither between the two nearest colours in these gradients, I’d actually have quite a selection of shades, and it may actually look good!”
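To make the idea concrete, here is a minimal sketch in C of dithering between the two nearest colours of a gradient. It is an illustration only, not the demo's actual code: the 7-entry `gradient` table holds placeholder values rather than VileR's real gradient, and the checkerboard pattern and the names `dither_pixel` and `SHADE_COUNT` are my own assumptions.

```c
/* Hypothetical 7-entry gradient, darkest first. The values are
   placeholders, not the colours VileR actually suggested. */
#define GRADIENT_LEN 7
static const unsigned char gradient[GRADIENT_LEN] = { 0, 1, 2, 3, 4, 5, 6 };

/* Dithering between neighbouring entries yields 2*N - 1 shades:
   N solid ones plus N - 1 in-between mixes. */
#define SHADE_COUNT (2 * GRADIENT_LEN - 1)

/* Return the colour for pixel (x, y) at a given shade (0..SHADE_COUNT-1).
   Even shades are solid; odd shades are a 50% checkerboard (an assumed
   pattern) of the two neighbouring gradient entries. */
unsigned char dither_pixel(int shade, int x, int y)
{
    int lo = shade / 2;        /* lower gradient entry */
    int hi = (shade + 1) / 2;  /* same as lo for even shades */
    return ((x + y) & 1) ? gradient[hi] : gradient[lo];
}
```

With a 7-colour gradient this gives 13 shades instead of 7, which is the "quite a selection of shades" effect described above.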
The only problem that was left was that shading made the renderer slower, and dithering even more so. Namely, as already explained in Trixter’s coverage, the renderer is designed to only render the changes relative to the previous frame, to save fillrate. If you don’t do any shading, it is basically only updating the changes at the edges. Every time a polygon changes shade, the entire polygon needs to be filled on screen again. And by adding dithering, you effectively double the number of shades you have, so changes in polygon colour will be more frequent. So I implemented it, and shared the results. We voted that the shading and dithering looked good enough to use despite the hit in framerate, so we kept them.
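The cost of a shade change can be illustrated with a small C sketch. This is a simplified model I made up for illustration, not the renderer used in the demo: it looks at a single scanline of a single convex polygon, and the `Span` type and `pixels_to_write` function are hypothetical names.

```c
/* One scanline of a convex polygon: pixels [left, right) in one shade. */
typedef struct { int left, right, shade; } Span;

static int max_i(int a, int b) { return a > b ? a : b; }
static int min_i(int a, int b) { return a < b ? a : b; }

/* Count the pixels of the new span that have to be written when the
   previous frame's span is still on screen. Pixels that only need to be
   restored to the background are not counted in this simplified model. */
int pixels_to_write(Span prev, Span cur)
{
    int width = cur.right - cur.left;
    int overlap;

    if (width <= 0)
        return 0;
    if (prev.shade != cur.shade)
        return width;              /* shade changed: refill the whole span */

    /* Same shade: the overlapping part is already correct on screen. */
    overlap = min_i(prev.right, cur.right) - max_i(prev.left, cur.left);
    if (overlap < 0)
        overlap = 0;
    return width - overlap;
}
```

For a 40-pixel span that shifts two pixels to the right, this model writes 2 pixels if the shade is unchanged and all 40 if the shade changed.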
The fade-in/out routines also made it into the demo by the way, they are used in the DeLorean part.
Tools used during production
As you can see in that early cube video, I initially used a Philips XT clone. I got this machine from BokanoiD, and it was very useful during early development. However, as I already discussed earlier with the CGADEMO, it is not a cycle-exact clone of the IBM PC/XT.
Another problem with that machine was that although the ATi Small Wonder videocard had a composite out, it did not appear to generate a signal. Upon closer inspection, it appeared that a lot of components were missing from the card, mostly resistors, a transistor and some caps. These components were basically what makes up the RGBI->NTSC composite DAC ladder. Reenigne and I studied some online photos of other ATi Small Wonders, which did have all the components, and we decided to try and solder them on to see what happens. So reenigne sent me the parts, which I then soldered onto my card… and indeed, the NTSC composite output started working! I am not sure why the components were left off… Perhaps because it was a machine sold in a PAL country, and the NTSC signal wouldn’t work anyway, and might even damage equipment?
However, as I noticed, it only worked correctly for regular RGBI-oriented stuff. The fixed 16 colours in 40×25 textmode worked correctly, and the 4-colour CGA palettes also worked. But when using the colorburst in 640×200 mode, the 16 artifact colours were all wrong compared to a real IBM CGA (either old style or new style).
So, I knew that I had two reasons to locate a real IBM machine: for cycle-exact effects and for proper composite output. I eventually had to buy an IBM PC/XT 5160 machine second-hand. I could not find any with a CGA card installed though. So Trixter offered to send me one of his cards. That way we at least knew that it was an original, tested and compatible IBM CGA card. He only had one old style card though, which he needed at home for captures, so he sent me a new style card. It would be cycle-exact, but its colours would be slightly off. Not too much of a problem for software development. He would take his old style CGA card to Revision, so we could put it in my machine for the final capture on-site. Which is exactly what happened: the capture shown during the compo, which is now also on YouTube, is taken from my machine with Trixter’s old style CGA card.
During development, I initially used a Philips CM8833 monitor. This was the RGBI CGA monitor I originally used back in the late 80s on my Commodore PC10-III. Sadly, it broke down after a while. Luckily Ikilledher had a Commodore 1084S, which he gave to me, so I could continue development with a real monitor. Neither of these monitors did composite in colour though, so I used a Samsung LCD TV for that.
Another problem I ran into was with OpenWatcom. Initially I was using Turbo C++ 3.1 for all my DOS retroprogramming needs. But I ran into some issues. For one, it does not seem to handle uninitialized data properly: even uninitialized data takes up space in your binary. Since we were on a tight budget of a single 360KB floppy, this was a problem. Also, the compiler is far from state-of-the-art. So I tried looking at OpenWatcom, since that is a more modern cross-compiler, which supports 8088/8086.
After modifying my codebase to be compatible with OpenWatcom, I found that the code would run fine in DOSBox or PCem, but it locked up on a real 8088. I spent a number of hours debugging the exact issue, and eventually pinpointed it to the FPU detection routine in their libc. Namely, on 286 and newer, you can execute an fwait instruction if no FPU is present, and it will just complete immediately. On an 8088/8086 however, the CPU will wait endlessly for the FPU to signal that it is no longer busy. If you do not have an actual FPU installed, the code will just lock up. So, once I found what was causing this problem, I modified the code in the libc to be compatible with 8088, while maintaining compatibility with newer systems. I have filed a bug report, and made the patched libc available here.
Once I solved this issue, I noticed that my code was still not behaving properly. As I found out, OpenWatcom defaults to unsigned char, where Turbo C++ and MS C/C++ default to signed char. Once I had fixed the datatypes to signed char, I could move to OpenWatcom C for development. The 3d polygon part is done with OpenWatcom C and inline assembly. The sprite part was done in Turbo Assembler.
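A minimal C sketch of why the default matters, assuming a byte-sized offset that is added to a wider value. The function names are made up for illustration and do not come from the demo's source.

```c
#include <limits.h>

/* Explicitly signed: -3 stays -3 on every compiler. */
int add_offset_signed(int base, signed char offset)
{
    return base + offset;
}

/* Explicitly unsigned: the same bit pattern is read as 253. */
int add_offset_unsigned(int base, unsigned char offset)
{
    return base + offset;
}

/* Plain char: behaves like one of the two functions above, depending on
   the compiler's default. This is the variant that silently changes
   meaning when you switch compilers. */
int add_offset_plain(int base, char offset)
{
    return base + offset;
}

/* Non-zero if this compiler treats plain char as signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Calling `add_offset_plain(100, -3)` returns 97 where plain `char` is signed and 353 where it is unsigned, which is why spelling out `signed char` makes the code behave the same on both compilers.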
At the party
As Trixter has already mentioned, we came to the party with a working set of effects, but the two days of development at the party really transformed it into a polished product. The first real response we had was when gasman came to see the demo running on real hardware. We saw a smile on his face in all the right places. That was the first sign that this demo might actually work the way we hoped. Since nobody had ever done a major production on this platform for a major demoparty before, we had no idea how people would respond to it, and whether they would ‘get’ the platform. Which is why we designed the intro sequence to explain that this was not just a PC (286/386/486, VGA, Sound Blaster etc), but actually THE IBM PC, the original from 1981, which is no match for a C64 hardware-wise. We thought our main competitors would be some very strong C64 demos, a platform that has been explored for demos for some 30 years now, with some very experienced demo makers in the scene.
Since the organizers were behind on schedule, and we had a rather ‘unique’ platform (NTSC composite, which is always a gamble in a PAL country, and IBM CGA signals are not 100% perfect anyway, so you need capture equipment that is somewhat forgiving), gasman agreed to let us provide our own capture, since we had a working capture setup already. He did not even ask to inspect the inside of our machines. I think that was a nice sign of respect and trust.
The demo turned out to be a huge success at the party. People loved it, despite the rather crude hardware and rough soundtrack. I think some of the best responses we could have gotten are the ones that say that this is what the demoscene is all about: pushing hardware to the limits and beyond. And how it inspires other people to push on as well.
Loader text screens
I have seen complaints that the loader font was difficult to read, so I will just give you the texts of all loader screens, in case you had trouble reading them:
welcome to 4.77 mhz! welcome to cga! what’s a bitplane?
and now i see with eyes serene the very heart of the machine
no copper! no vic-ii! what are we supposed to do?
dots are my favours… except when they’re saviour’s.
sprites? where we’re going, we don’t need sprites
you may want to close your eyes for this
race the beam on your mark… get set… go!
if my calculations are correct, when this baby hits 8088 miles per hour, you’re going to see some serious *!?*
and now we must bid you farewell no paula… no sid… no problem
This demo seems to have become bigger than just the demoscene, and has found its way to various other tech-related news sites, blogs and forums. It even made its way to Adafruit, where a very appropriate quote from Teller happened to appear underneath the article:
Sometimes, magic is just someone spending more time on something than anyone else might reasonably expect.
I suppose that is certainly true for this production. A lot of work went into it. We are the first to do such a production on this platform, so we had to research the hardware thoroughly, and write our own tools for everything. Reenigne’s blog does a good job of explaining just how much thought went into the new display modes, or into the mod player for example. Some people, even Sylvester Hesp, game developer at Nixxes, think it’s just a case of getting together for a few days, writing some assembly, ‘counting some cycles’, and that’s it. But writing and optimizing the actual code is only a small part of what made this demo possible. We first had to figure out what to write, and how to write it. The same goes for my polygon routines, for example: they work very differently from the polygon renderers I have discussed on this blog earlier, and in fact I have never written a polygon routine like this before.
A lot of research went into this demo, in every single part of it. Some people seem to think that you do a demo like this top-down: “I want to do this-and-that effect”, and then code it. But instead, this demo is very much a bottom-up affair: we started off by just studying the hardware, and exploring the limits. As we became more familiar with the hardware, and got more control over it, we got inspired to try new effects, or to add extra features to existing effects (e.g., the sprite part started out as just moving sprites, but later we found a way to combine sprites and scrolling; and initially the sprite would just move near the bottom of the screen, but after some experimenting we managed to make it move over the entire screen without flicker).
Some people also seem to take the comparison with C64 the wrong way. As far as I am concerned, “PC SUXX!” is not a joke. If you have been following my blog, you know that I grew up with a C64, and am still quite fond of the machine and the scene surrounding it. This demo does not intend to prove that the PC is better than the C64. We know it is not, and we know that there are some things the C64 does, that we can never do. Also, the C64 has a head start of about 30 years on us. It would be unrealistic to think that we could close that gap completely with just one demo. That level of refinement takes many years to reach. We were heavily inspired by the C64 scene, and took various ideas from C64 and adapted them to our platform. We approached the PC as a fixed hardware platform, allowing special cycle-exact code and tricks, much like what people do on C64.
Some people also think that our system is a lot faster because we have a 4.77 MHz CPU, and the C64 has a 1 MHz CPU. This is a case of the MHz myth, and Trixter has already covered that in an earlier blog post. The short version is that the DRAM modules used in all early 1980s microcomputers are more or less the same speed, so all 8-bit systems have more or less the same memory bandwidth. This dictates most of the performance (there was no caching yet). CPUs may run at vastly different clockspeeds, but performance-wise, they are very similar. The Z80 also ran at 3.5 MHz in most implementations, but still a ZX Spectrum isn’t exactly a computing powerhouse compared to a C64 either.
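As a rough back-of-the-envelope illustration of that point, here is a small C sketch. The figures are my own assumptions for the sake of the arithmetic and are not taken from Trixter's post: an 8088 bus cycle takes at least 4 clocks, a 6510 can access memory every clock, and both move one byte per access. DRAM refresh, wait states and video chip cycle stealing are ignored.

```c
/* Peak bus throughput in bytes per second on an 8-bit data bus,
   where each memory access transfers one byte. */
long peak_bytes_per_second(long clock_hz, int clocks_per_access)
{
    return clock_hz / clocks_per_access;
}

/* Assumed figures, for illustration only: */
#define IBM_PC_CLOCK_HZ   4770000L  /* 8088 at about 4.77 MHz */
#define IBM_PC_BUS_CLOCKS 4         /* minimum clocks per 8088 bus cycle */
#define C64_CLOCK_HZ      1000000L  /* 6510 at roughly 1 MHz */
#define C64_BUS_CLOCKS    1         /* one memory access per clock */
```

Under these assumptions the 8088 peaks at 1,192,500 bytes per second against 1,000,000 for the 6510: a ratio of about 1.2 to 1, nowhere near the 4.77 to 1 that the clock speeds suggest.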
So where does that leave us? Well, we did not just want to do the first serious demo on the original IBM PC with CGA for the novelty and the ‘1k hack’. We wanted to give it the best we got, so we have optimized all routines in our demo to the best of our abilities, much like many contemporary C64 demos. So we tried to set the standard for this platform as high as we could. I am sure it can be done better, but until someone else makes a demo for this same platform, we won’t really know just how good or bad our attempt was. At some point I said though: “Some stuff looks or sounds so smooth and effortless, that you can’t tell anymore how difficult it was to make”.
And lastly, some people even think the demo is fake. Sadly, it is rather difficult to find the right hardware, but I have found two people who recorded the demo running on their IBM PC 5150:
The first video shows the whole demo, and there are two problems:
1) The high-colour modes do not work, because the IBM CGA card in that machine has an HD6845 instead of the MC6845 that we used during development. This can be fixed with a small tweak in the code however.
2) The Kefrens bars are unstable. As you can read in the comments, this was caused by a network driver. Doing a clean boot without loading the driver was enough to stabilize that.
The second video does not show the whole demo, but it does show the 256 colour plasma and the 1024 colour girl working, so apparently this PC has a compatible IBM CGA card. Put the two together, and there is the proof that the whole demo works on a real IBM PC 5150 with IBM CGA.
I don’t think that part has quite sunk in yet… We actually won the oldskool demo compo with an IBM PC from 1981 with CGA! That’s just crazy!