Wow, part 4 already. I was only planning for 2 parts initially: one for PC, and one for Amiga. But after 3 parts, there is still some unfinished business. Let’s see if I can wrap it up now.
First, let’s wrap up the Amiga business. Last time, I had the polygon routine worked out, but I did not have an entire cube encoded as quads yet. The PC version used triangles (more on that later). So, the first thing I did was to complete all 6 faces, and render the cube transparently, at the actual speed of an Amiga 500/1000/2000 (which would be the original hardware back in 1985):
I was quite happy to see that the polygon routine could run at acceptable speed. After all, the routine I described was a very generic routine. You could get it much faster if you optimize for the special case of a cube. You could eliminate the scratchpad step (affectionately known as ‘cookie cut’) altogether. Only 3 faces are visible at any given time, so only 3 bitplanes are required, and each face can be rendered directly to the bitplane.
The next step was to port the backface culling and lighting code to the Amiga. I also applied a similar blue-ish palette. Then I added rotation around two axes. And this is the final result:
So there it is then: I now have a proper 3d object rendered on the Amiga, making extensive use of its unique blitter capabilities.
Triangles are not real polygons
Back to the PC side of things now… As I said above, I only used triangles for the cube geometry in the first version. Triangles are by far the most common polygons, and a lot of 3d software and hardware will use only triangles these days. However, this is about oldskool graphics, and especially in the early days, polygons with more than 3 vertices were used a lot.
Polygons with 4 or more vertices are generally more efficient to render. Let’s go back to the Amiga, and look at what is required to render one face of the cube. If this face is a quad:
- 4 lines to draw
- 1 rectangular area to floodfill
- 1 copy operation from scratchpad to backbuffer
And if the face were two triangles:
- 6 lines to draw (one line is actually drawn twice, because it is shared by both triangles)
- 2 rectangular areas to floodfill
- 2 copy operations from scratchpad to backbuffer
So clearly, using quads is much more efficient here. On the Amiga it is also really simple to do: there is basically no difference rendering triangles or polygons with more vertices. You always just draw the lines for all edges, determine the bounding rectangle, and then floodfill (in fact, the floodfill method even allows you to render concave polygons with no extra effort). On top of that, there are other benefits, which hold for the PC as well. For example, the backface culling only has to be performed once for each polygon. The same goes for flatshading. By using quads, you have only half the polygons to cull and shade.
Another issue of efficiency is in the way my fill routine works. As I explained in part 1, the trick with unchained VGA (and CGA/EGA) is that you can quickly draw consecutive pixels of the same colour on the screen. By breaking up the cube faces in triangles, you also break up a single run of pixels into two. So it is obvious that I should be rendering proper n-gons, not just triangles. But how?
A REAL polygon filler
When you render triangles, you will generally render them from top to bottom, and from left to right. A common approach is this:
- Sort the vertices from low to high y-coordinate. This effectively divides the triangle into a ‘top’ and ‘bottom’ half, split at the scanline of the middle vertex.
- Sort the edges left to right.
- Render the top half.
- Swap the ‘short’ edge with the remaining edge.
- Render the bottom half.
This way you can render any triangle at any orientation. It is obvious that this approach will not work for more than 3 vertices though. For more than 3 vertices, you need some way to describe the topology. A common way to do so is to store the vertices in strictly clockwise (or counter-clockwise) order. This way no extra information is required to store the edges. The first edge is v0 to v1, the second edge is v1 to v2, and so on. Then if we walk from index 0 towards the end of the vertex list, we walk down the left side of the polygon (assuming counter-clockwise orientation), and if we walk from the end towards 0, we walk down the right side.
If you would just sort the vertices on y, it would break the topology. However, the vertices always remain in the same relative order, under any transform. The only exception is that if the polygon is backfacing, the order is reversed. So we can ‘rotate’ the vertices around in the list, until the topmost vertex (lowest y) is at index 0. Then once again we can walk down our edges from top to bottom. If you don’t just want to cull all backfaces, but also want to render them, then you can reverse the order while rotating the vertices around inside the list. This way your rasterizer will not have to deal with any special cases, but can simply assume that the edges on the left are always on one side of the list, and the edges on the right are at the other end of the list.
Once the polygons are sorted, the rasterizing is similar to the triangle method described above: you render one part, until you reach the scanline where the shortest edge ends. Then you replace that edge with the next edge (of the same side of the polygon), and render the next part. You continue until the left and right edge end at the same vertex. This way you can render any n-gon, as long as it is convex. And this is the end-result:
There is no visual difference, but this time the faces are rendered as quads instead of triangles, so it is more efficient. I have also improved the palette a bit, to make the shading look better, and I have fixed a few minor bugs, and added some more rotation.
Before I forget, I decided to rasterize the polygons with a Bresenham-based algorithm to step along the edges. I figured that would make it extra oldskool, and it is also an approach that I had not used before. An advantage of Bresenham is that it does not require any divisions or multiplications. This is an attractive feature for old CPUs, where such operations are very slow. Another advantage of Bresenham is that it does not require fractions. So there is no need for floating-point or fixed-point mathematics. Which is also attractive for old CPUs, which have only limited precision (such as the 16-bit registers in the 286 and below, which would not be enough to properly rasterize on a 320×200 screen. Extended precision would be required, which would take up extra registers and instructions). On the other hand, Bresenham does require some conditional jumps. However, on old CPUs, these happen to be very cheap, since they don’t have any caches, out-of-order execution, branch prediction or anything. They will not stall.
The updated binaries for CGA/EGA/VGA can be downloaded here: http://bohemiq.scali.eu.org/OLDSKOOL.zip
Coming full circle
To conclude, I would also like to show my old 486 renderer. It pretty much picks up where the flatshaded cube left off: it adds texture mapping, and more complex geometry in the form of a torus, or ‘donut’:
I actually had to put in some effort for that one. Namely, when I originally wrote that, I developed it for early Win9x and DirectDraw 1.0 on my 486. For many years, the code has worked fine. However, on newer systems, the old VGA display modes are no longer supported in DirectDraw. No more 320×200. So it now refuses to run, since it was never made to handle any higher resolutions (note: with some display drivers you can add 320×200 yourself as a custom resolution, after which the original code should work again).
I still had the source code, so I decided to try and fix it. However, when I tried to compile the project, I found that ddraw.lib is no longer part of the DirectX SDK. Then I figured it might be better to port it back to DOS anyway. My first try was to get it to work with Borland Turbo C++, which I had been using for developing the cube code. However, it only supports 16-bit code, and this was 32-bit code. So I ran into a lot of problems with the inline assembly code, and with the large arrays that this renderer uses. I decided it was more trouble than it’s worth. It would be better to just keep it 32-bit, and perhaps use a DOS extender. But… where does one get the required tools to develop DOS code in 2011?
It’s a small world
As it turns out, there are still a number of DOS extender projects around. One of them caught my eye in particular: Japheth’s HX DOS Extender. Among its features is a limited emulation of Win32 APIs, allowing you to run Windows console executables from DOS, to a certain degree. This also implies that it is compatible with many Windows development environments, including Visual Studio 2010, which I normally use for development. Now that would be very nice, I could just use tools that I am already familiar with.
When I looked at the included samples, the Plasma one in particular caught my eye. It’s a small world, this plasma example was based on the Plasma tutorial that Ewald and I once wrote in Win32 assembly. And that plasma code was exactly the DirectDraw framework that I also used for my 486 renderer. So it looked like it would be very easy to get my code working with HX.
Sadly, it was not as simple as it looked. I was hoping that HX’ DirectDraw emulation would just run the binary as-is under DOS. But no, I had compiled it with dynamically linked C runtimes, and that was a bit too much for the HX Win32 layer. So well, I decided to just strip the Win32 code, and rewrite the DirectDraw code with standard VGA code, and make it a proper DOS executable.
I could reuse a lot of code from the flatshaded cube, and I had my code back up and running quite quickly. I must say, it felt rather funny to use ints and in/out inside Visual Studio 2010, not to mention compiling something and then running it in DOS. But it worked, and it worked well. It was nice to see the renderer in action again. It uses linear texture mapping, and another ‘modern’ feature is subpixel correction, making it look very smooth and stable, even at low resolutions.
The original DirectDraw version can still be downloaded here: http://bohemiq.scali.eu.org/486compo.exe
And the new DOS version can be downloaded here: http://bohemiq.scali.eu.org/486compo.zip