Just keeping it real, part 9

Right, the topic is still the Commodore 64. This time I want to get more technical and explain how to actually program some things. But first I would like to paint a picture of C64 programming in general.

When the C64 was introduced in 1982, game development was still in its infancy. In fact, the whole concept of home computers was still relatively new. Only a few years earlier, ‘home computing’ consisted of either designing and building a microcomputer yourself (as the members of the Homebrew Computer Club in Silicon Valley, the birthplace of Apple Computer, did), or ordering a computer in kit form, such as the Altair or the Apple I, and putting it together yourself. In those early days, computers were as much about assembling and modifying hardware as they were about software.

When pre-built, off-the-shelf home computers became readily available, they drew in a new breed of computer users, who were not necessarily interested in the hardware engineering side, but concentrated entirely on software engineering. Initially, most programs were still quite primitive, and generally written by one person. When it came to games, this meant that a single person would do everything from writing the code to drawing the graphics and doing the music. When you look at early C64 games, it is usually quite obvious that one man made the entire game. The graphics and sound or music are not very sophisticated.

Sound programming

In early C64 games, some primitive sound effects may be the only sound there is. If there is music, it is often an adaptation of a classical piece, or a classic pop song, rather than an original composition. In some cases it is quite obvious that the music was made by a programmer rather than a musician, because the adaptation will contain some off-key notes or bad timing, indicating that the programmer may have been somewhat tone-deaf.

A classic example of this is the VIC-20 version of Radar Rat Race, with a dodgy note:

Compare to the C64 version:

It seems that the person who did the music in the VIC-20 version was too tone-deaf to hear the problem. The C64 version, which was done later, fixes this.

But things changed for the better. One person who has played, shall we say, an instrumental role here, is Rob Hubbard. He is somewhat of a legend when it comes to C64 music. He was among the first programmers to specialize in music. He already had a background as a professional session musician before he learnt how to program. In these early days, there were no tools for composing music on a C64 yet, so he had to write his own routines for editing and replaying music on the C64. Since his routines were far more advanced than anything before him, his music also sounded much richer and more mature than anything anyone had ever heard from a C64.

Many other C64 musicians would study Rob Hubbard’s routines and create their own routines based on his ideas. Just like Rob Hubbard, these people were still programmers as well. You will find that other famous composers such as Jeroen Tel and Chris Hülsbeck also had their own routines, giving their music their own ‘fingerprint’. Chris Hülsbeck also released Soundmonitor, which was one of the first ‘tracker’ tools for editing music.

The fact that these composers also wrote their own replay routines often makes their tunes instantly recognizable. These replay routines are still assembly routines at the lowest level, tweaking the registers of the SID chip directly, with exact timing. Each programmer/composer’s routine had its own distinct instruments and effects, which resulted in a signature sound and style. The epic soundtracks these people managed to create from a simple 3-channel sound chip from 1982 are all the more impressive if you also factor in the fact that the C64 only had a 1 MHz processor, and only 64 KB of memory. The music routines also had to be very compact, and very efficient. They had to use a minimum of CPU time, because most of the CPU time was spent on the game logic and the updating of the screen.

Therefore most music routines are called once per frame, so 50 times per second (on PAL). They are generally processed in the small timeslot between the last visible scanline of the current screen and the first visible scanline of the next screen, with the music routine taking only a few hundred CPU cycles at most.
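To put ‘a few hundred cycles’ in perspective, here is a rough back-of-the-envelope calculation in Python. The 312 raster lines per frame and the 985,248 Hz CPU clock are my assumptions for a PAL C64 and are not stated above; the 63 cycles per line is the PAL figure discussed further down. Treat it as a sketch, not a measurement.

```python
# Rough frame budget for a PAL C64. The constants marked "assumption"
# are not taken from the article; check them for your machine.
CYCLES_PER_LINE = 63      # CPU cycles per raster line (PAL)
LINES_PER_FRAME = 312     # assumption: raster lines per PAL frame
CPU_CLOCK_HZ = 985_248    # assumption: PAL CPU clock in Hz


def cycles_per_frame() -> int:
    """Total CPU cycles that pass during one frame."""
    return CYCLES_PER_LINE * LINES_PER_FRAME


def frame_rate_hz() -> float:
    """Frames per second implied by the clock and the frame length."""
    return CPU_CLOCK_HZ / cycles_per_frame()


def share_of_frame(routine_cycles: int) -> float:
    """Fraction of one frame's cycles used by a routine."""
    return routine_cycles / cycles_per_frame()


if __name__ == "__main__":
    print(cycles_per_frame())             # cycles in one frame
    print(round(frame_rate_hz(), 2))      # roughly 50 Hz
    print(f"{share_of_frame(300):.1%}")   # a 300-cycle player routine
```

Under these assumptions a 300-cycle replay routine uses well under 2% of a frame, which is why it fits comfortably in the vertical blank.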

The SID music file format can be seen as ‘compiled’ music: it consists of actual 6502 code, which updates the SID’s registers, usually 50 times per second (although there are also advanced ‘multispeed’ tunes which update more than once per frame, for even tighter timing, allowing for special sounds and effects). A SID player for other platforms actually emulates the 6510 and the SID chip of the C64, and runs the code ‘as is’.

This also means that it’s very easy for a programmer to use a tune in a C64 program. You can just load the data of a SID file into memory, call its init routine once, and then call its play routine once per frame. Et voilà, music is playing.
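Strictly speaking, a .sid file is a small header followed by the actual C64 data, and the header tells you where to load the data and which addresses to call. As a sketch, assuming the common PSID/RSID header layout (big-endian 16-bit fields right after the 4-byte magic; the offsets are from my recollection of the format, so verify them against the format description), this is how you could pull out those addresses in Python. The names `SidHeader` and `parse_sid_header` are made up for this example.

```python
import struct
from typing import NamedTuple


class SidHeader(NamedTuple):
    magic: bytes        # b"PSID" or b"RSID"
    version: int
    data_offset: int    # where the C64 data starts inside the file
    load_address: int   # 0 means: stored in the first two data bytes
    init_address: int   # call once before playback
    play_address: int   # call once per frame
    songs: int
    start_song: int


def parse_sid_header(blob: bytes) -> SidHeader:
    """Parse the first 18 bytes of an assumed PSID/RSID header."""
    header = SidHeader(*struct.unpack_from(">4s7H", blob, 0))
    if header.magic not in (b"PSID", b"RSID"):
        raise ValueError("not a PSID/RSID file")
    if header.load_address == 0:
        # The data then starts with a little-endian load address,
        # just like a regular C64 .prg file.
        (load,) = struct.unpack_from("<H", blob, header.data_offset)
        header = header._replace(load_address=load)
    return header
```

On the C64 side, the equivalent is a `jsr` to the init address at startup and a `jsr` to the play address from your once-per-frame interrupt.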

This period only lasted a short while though, as in the Amiga days music software would become readily available (NoiseTracker, ProTracker and various other ‘trackers’ were available for free), and no programming was required anymore. Musicians could concentrate entirely on composing their music. This music was stored as a data file, and played back through a standard replay routine which usually came with the tracker software. In a way this made compositions less personal, less unique.

Racing the beam

So, apparently programming music on the C64 requires an intimate knowledge of the sound chip and the CPU. With graphics it is no different. I already discussed copper/raster bars briefly in part 3, where I explained how carefully timed changes to the palette can draw horizontal bars or more elaborate patterns on screen. Techniques like these are known as “racing the beam”, as in: your drawing is timed to the position of the beam (cathode ray) of the display, and affects the screen directly. This is in sharp contrast with more modern systems, where you generally draw an entire frame in a so-called ‘back buffer’, which is not visible on screen. Once the entire frame has been drawn, the back buffer and front buffer are swapped, to display the new image.

‘Racing the beam’ is generally very easy on the Amiga, since the copper can do fairly accurately timed updates to the registers of the custom chipset. On the Commodore 64 however, there is no copper, so you have to carefully synchronize your CPU with the raster position manually. This requires a deeper level of understanding of the hardware than anything I’ve discussed so far. You need to count exactly how many cycles each instruction takes, in order to determine the raster position after each instruction.

And that is just the CPU-side…

Doing bad lines

To make things even more difficult, not every cycle is available to the CPU. This was also the case on the Amiga, where things like bitplanes, sprites and the blitter would steal cycles from the CPU, since they were all using the same memory. The exact timings for all DMA operations by the custom chipsets are also documented very well in the Amiga Hardware Reference Manual. However, on the Amiga the timing is not as relevant as on the C64, and generally you only have to worry about overall performance (in a nutshell: more and/or larger bitplanes and longer blit operations steal more cycles from the CPU). The copper would take care of critically timed routines.

On the C64, the memory is shared by the VIC and the CPU as well. The 6510 is so slow that it can only access memory at every other cycle. So the CPU takes the even cycles, leaving the odd cycles to the VIC. However, to make things more complicated, the VIC needs to use the even cycles as well, every now and then. The VIC can halt the CPU during these cycles.

On normal scanlines, the CPU gets 63 cycles (for a PAL system at least). However, every 8th scanline of the display area is a so-called ‘bad line’. As mentioned in part 8, the graphics modes of the VIC are closely related to the 8×8 character set. This also goes for the colour information. At every 8th scanline, a new line of characters starts, and the VIC-II will fetch the 40 character codes for that line from screen memory, together with the matching data from the ‘color RAM’, which encodes the colour for each character on screen. The VIC-II needs 40 extra cycles for this, leaving only 23 CPU cycles on a bad line.
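As a sketch of the bookkeeping, here is the per-line budget in Python. It assumes the usual bad line condition, which is not spelled out above: the raster line lies in the range $30-$F7 and its low three bits equal the vertical scroll value (YSCROLL, 3 by default). It also assumes the display is switched on, and ignores sprites.

```python
CYCLES_PER_LINE = 63   # PAL, from the text above
BADLINE_STOLEN = 40    # cycles the VIC-II takes on a bad line


def is_bad_line(raster: int, yscroll: int = 3) -> bool:
    """Assumed bad line condition: display area and matching YSCROLL."""
    return 0x30 <= raster <= 0xF7 and (raster & 7) == (yscroll & 7)


def cpu_cycles(raster: int, yscroll: int = 3) -> int:
    """CPU cycles on a raster line, ignoring sprites."""
    stolen = BADLINE_STOLEN if is_bad_line(raster, yscroll) else 0
    return CYCLES_PER_LINE - stolen


if __name__ == "__main__":
    for line in range(0x30, 0x3C):
        print(hex(line), cpu_cycles(line))
```

With the default scroll value, the first bad line under this model is line $33, and there are 25 of them per frame, one for each row of characters.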

A sprite stole my cycle!

The VIC chip can also display up to 8 sprites. For each sprite that is enabled on the current scanline, the VIC also needs to take two extra cycles from the CPU. In fact, it is more complicated than that, because the VIC will signal the CPU 3 cycles in advance that it wants to use the bus. So the actual number of cycles it takes also depends somewhat on which sprites are enabled, and what the CPU is doing exactly when it is first signaled. If you really want to know, here is more information on it.

So, when you want to have cycle-exact routines on the C64, you not only need to time your code to be exactly 63 cycles on a normal scanline, but you also need to factor in the cycles lost by the sprites that are enabled, and handle the special case of a bad line. So you really REALLY have to know what your hardware is doing, at every given point in time. You can’t keep it any more real than this.
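Putting the numbers from the last two sections together gives a simple, optimistic estimate in Python. It only uses the figures mentioned above (63 cycles per line, 40 for a bad line, 2 per sprite) and deliberately ignores the 3-cycle advance warning, so the real number can be a few cycles lower. The function name is made up for this example.

```python
CYCLES_PER_LINE = 63   # PAL
BADLINE_STOLEN = 40    # taken by the VIC-II on a bad line
SPRITE_STOLEN = 2      # taken per sprite shown on the line


def optimistic_cpu_cycles(bad_line: bool, sprites_on_line: int) -> int:
    """Upper bound on the CPU cycles available on one scanline.

    Simplification: the 3-cycle bus takeover warning is not modelled,
    so treat the result as a best case, not an exact count.
    """
    if not 0 <= sprites_on_line <= 8:
        raise ValueError("the VIC-II has 8 sprites")
    stolen = SPRITE_STOLEN * sprites_on_line
    if bad_line:
        stolen += BADLINE_STOLEN
    return CYCLES_PER_LINE - stolen


if __name__ == "__main__":
    for bad in (False, True):
        for sprites in (0, 4, 8):
            print(bad, sprites, optimistic_cpu_cycles(bad, sprites))
```

A bad line with all 8 sprites on it leaves the CPU only a handful of cycles, which is why cycle-exact code usually needs a separate code path for such lines.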

To make matters even more complicated, the shortest possible instruction on the 6510 processor takes 2 cycles. So it is impossible to delay your code by just 1 cycle. You can only get into cycle-exact sync by using delays of either 2 or 3 cycles.
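To illustrate: any delay of 2 cycles or more can be built from 2-cycle and 3-cycle instructions. The Python sketch below picks `nop` (2 cycles) and a zero-page `bit` (3 cycles) as building blocks. That choice is mine, and note that `bit` changes the processor flags, so another 3-cycle instruction may suit your code better.

```python
from typing import List


def delay_plan(cycles: int) -> List[str]:
    """Return a list of 6510 instructions that burn exactly `cycles`."""
    if cycles < 2:
        raise ValueError("the shortest instruction takes 2 cycles")
    plan: List[str] = []
    if cycles % 2 == 1:
        # An odd delay needs exactly one 3-cycle instruction.
        plan.append("bit $ea")   # zero-page BIT: 3 cycles, clobbers flags
        cycles -= 3
    plan += ["nop"] * (cycles // 2)   # each NOP: 2 cycles
    return plan


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, delay_plan(n))
```

For longer delays you would use a loop rather than a long run of `nop` instructions, but the same even/odd reasoning applies to the remainder.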

Raster stability is a virtue

Why is it important to be cycle-exact anyway? Well, this is mainly for various hacks with the VIC chip. These hacks generally consist of writing a certain value to a certain register at the right time, which overrides the value that the VIC chip updated earlier. This will ‘confuse’ the VIC into thinking it is in a different phase of building up the screen than it really is. The side-effects can be very useful, such as avoiding a bad line, or in fact, triggering extra bad lines (which means you can update the colours more often than once every 8 scanlines, removing some of the limitations of the native graphics modes), or having sprites appear in the borders of the screen. For other effects, such as raster bars, it might not be necessary to be completely cycle-exact, but you still don’t want too much jitter, and you will need to put in delays of a few cycles to get to the start of a scanline.

I suppose the ‘Hello World’ of C64 graphics coding is doing raster bars. Let’s start there, and take it one step further, and set up a routine that is synchronized cycle-exact to the raster position. From there, you can perform a number of classic effects. Setting up such cycle-exact code is known as a “stable raster” in C64 jargon. How does one set up such a stable raster? Well, there are a number of different approaches. The one I have chosen here is the one I found to be most intuitive.

We will start out by setting up a raster interrupt. When our program starts, we have no idea where we are on the screen. So all we can do is pick a scanline to wait for (do not pick a bad line), set up the interrupt for it, and perform an endless loop, waiting for the raster interrupt to trigger.

Once the raster interrupt triggers, all we know is that we were in an endless loop, where each branch instruction takes 3 cycles. The interrupt was handled as soon as the last branch instruction had completed. Since we don’t know in which of the 3 cycles the instruction was when the interrupt occurs, we don’t know if we have 0, 1 or 2 extra cycles of ‘jitter’ since the interrupt was triggered (which may be good enough for some effects). Aside from that, there are 9 cycles of overhead for the CPU to finish the last instruction (at least 2 cycles) and push the context on the stack and call the interrupt routine (7 cycles). So we know that we are already quite far from the start of the scanline by the time we can execute our first instruction.

We can improve the accuracy by setting up a new raster interrupt for the next line, and fill up the remaining cycles on the scanline with nop-instructions. Each nop-instruction takes 2 cycles, so when the next interrupt triggers, we can only be 1 cycle off at most.
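The jitter in both steps comes from the same thing: the CPU finishes the current instruction before it handles the interrupt. A small Python model makes the numbers concrete. It is a simplification that ignores exactly when in an instruction the 6510 samples the interrupt line, and the function names are made up for this example.

```python
from typing import Set


def wait_for_boundary(irq_cycle: int, instr_cycles: int) -> int:
    """Cycles until the current instruction ends, for a CPU that runs
    back-to-back instructions of `instr_cycles` cycles each."""
    return (-irq_cycle) % instr_cycles


def possible_jitter(instr_cycles: int) -> Set[int]:
    """All delays that can occur, over every possible arrival cycle."""
    return {wait_for_boundary(c, instr_cycles) for c in range(instr_cycles)}


if __name__ == "__main__":
    print(sorted(possible_jitter(3)))   # endless loop of 3-cycle jumps
    print(sorted(possible_jitter(2)))   # run of 2-cycle NOPs
```

In this model the 3-cycle loop gives 0, 1 or 2 cycles of jitter, and the run of `nop` instructions narrows that down to 0 or 1.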

From here, we can poll the raster position register directly. We will assume that we are one cycle off, and delay our code so that we read the raster position at exactly the point when the next scanline starts. If we were indeed one cycle off, then the raster position has updated as expected. If we were already in sync, then we still see the previous scanline. Now we know whether to correct for 1 cycle or not, which is easily done with a conditional branch: a taken branch is 3 cycles, not taken is 2 cycles.

Although, it’s not quite as simple as that… is it ever? When a taken branch crosses a page boundary, it takes 4 cycles rather than 3. So if your branch happens to straddle a page boundary, the correction will not work. It’s not likely, but you could disassemble your code to check the exact position of each instruction.

From this point on, we know EXACTLY where we are. We know on which scanline, and in which cycle on this scanline (namely, we are on the 3rd cycle of the current scanline after the last branch). So we know all we need to know about our raster. We can count cycles to the end of the scanline, and since we know on which scanline we are, we also know whether we are on a bad line or not (and adjust our timing for that).
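The correction step is easiest to see with a tiny model. The Python sketch below follows the description above: the raster register is read one cycle before the line starts (in sync) or right at the line start (one cycle late), and the branch is taken only when the old line was still visible. Cycle numbers are 0-based and relative to the start of the line, and the cost of the compare instruction itself is left out, so this is an illustration of the idea rather than exact timing for real code.

```python
def branch_end_cycle(jitter: int, page_cross: bool = False) -> int:
    """0-based cycle of the new line on which the branch finishes.

    jitter: 0 if we were already in sync, 1 if we were one cycle late.
    page_cross: True if the taken branch crosses a page boundary.
    """
    if jitter not in (0, 1):
        raise ValueError("only 0 or 1 cycle of jitter is left at this point")
    read_cycle = -1 + jitter            # when the raster register is sampled
    saw_new_line = read_cycle >= 0      # late code already sees the new line
    if saw_new_line:
        branch = 2                      # branch not taken
    else:
        branch = 4 if page_cross else 3 # branch taken
    return read_cycle + branch


if __name__ == "__main__":
    print(branch_end_cycle(0), branch_end_cycle(1))
    print(branch_end_cycle(0, page_cross=True), branch_end_cycle(1, page_cross=True))
```

Without a page crossing, both cases end on the same cycle (0-based cycle 2, so the 3rd cycle of the line). With a page crossing the two cases differ by one cycle again, which is exactly the problem described above.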

So basically, if you want to perform cycle-exact code on scanline N, you’ll have to start with a raster interrupt for scanline N-2. This raster interrupt will then set up an interrupt for N-1 with a jitter of 1 cycle. Then we correct for that last cycle of jitter, and arrive cycle-exact on the 3rd cycle of scanline N (so if you need to be on the first or second cycle of a scanline, you’ll have to delay for an extra line, meaning you’ll need to start at N-3 instead).

Once you have everything under control, you should be able to do raster bars over the entire width of each scanline (including bad lines), which is a good way to visualize your stable raster. It will result in something like this:

Now, for raster bars this is overkill. This video was recorded with VICE, where the borders were set to ‘debug’. This means you see the full 63 cycles on screen. In reality the first part of the scanline is lost in the horizontal retrace which happens ‘on the left’ of the left border. So for raster bars it is not required to be this cycle-exact. Even a few cycles of jitter could be hidden in this invisible part of the screen. But it does show that we indeed have a jitter-free raster now, and we can count out cycles for any position on the screen, to perform any kind of VIC-II trickery we can think of.

Up close and personal

Well, that concludes this first encounter with C64 programming. A kind of programming that is very low-level. Not only are you doing everything in assembly language, but you also have to keep track of the cycles spent by your CPU and your VIC-II chip. You need to get to know your hardware inside-out, even more so than with the Amiga. Let alone modern computers. It isn’t even possible to know exactly what a modern computer is doing at every cycle anymore (and to people who think they’re retro when they code raster bars by just drawing horizontal lines in an RGB back buffer on a modern system: that has absolutely nothing to do with what the effect is about, as my explanation should have shown).

This ‘up close and personal’ style of programming probably also explains why cracking, training games, and modifying other people’s programs in various other ways were such popular pastimes on the C64. Programmers had a very good idea of what was happening inside the machine, and had no problem reverse-engineering other people’s code. Some programmers did not even use an assembler at all, but just coded directly with a monitor. They were used to ‘cryptic’ code, where memory addresses and things were just coded directly in hex, rather than using some ‘nice’ symbolic names. I noticed that Amiga code was also still quite ‘primitive’ and hardcore in this respect. Probably because many Amiga users started out on C64, and just continued to code that way on the Amiga later.

Racing the beam also confronts you directly with just how slow a 1 MHz 6510 CPU really is. Even a single cycle is already visible as 8 pixels at 320×200. And just storing a byte to screen memory already takes 4 cycles at the least. The 286 and Amiga that I’ve covered before were already too slow to just do regular software rendering, and required all sorts of trickery to be able to fill polygons quickly. But on the C64 it seems even less likely that it will do any kind of 3D graphics. The CPU cannot even perform multiply or divide operations (which I will probably cover in a future blog)! Nevertheless, people have been doing quite impressive 3D graphics on the C64.
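The arithmetic behind those numbers is simple enough to write down. A small Python sketch, using only the figures above (8 pixels per cycle, 4 cycles per store, 63 cycles per PAL scanline, 320 pixels of display width):

```python
PIXELS_PER_CYCLE = 8     # one CPU cycle is 8 hires pixels wide
CYCLES_PER_LINE = 63     # PAL
STORE_CYCLES = 4         # one absolute store, e.g. to screen memory
DISPLAY_WIDTH = 320      # pixels in the display window


def pixels_spanned(cycles: int) -> int:
    """How far the beam travels, in pixels, during `cycles` CPU cycles."""
    return cycles * PIXELS_PER_CYCLE


def stores_while_beam_in_display() -> int:
    """Whole 4-cycle stores that fit while the beam crosses the display."""
    display_cycles = DISPLAY_WIDTH // PIXELS_PER_CYCLE
    return display_cycles // STORE_CYCLES


if __name__ == "__main__":
    print(pixels_spanned(CYCLES_PER_LINE))    # full line width in pixels
    print(pixels_spanned(STORE_CYCLES))       # width of a single store
    print(stores_while_beam_in_display())     # stores per display line
```

So a single store is already 32 pixels wide, and only ten of them fit in the time the beam needs to cross the 320-pixel display window.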


6 Responses to Just keeping it real, part 9

  1. Pingback: Revision 2013 is keeping it real | Scali's OpenBlog™

  2. Pingback: Just keeping it real… like it’s 1991 | Scali's OpenBlog™

  3. Pingback: CGADEMO by Codeblasters | Scali's OpenBlog™

  4. Note RE: Radar Rat Race (VIC-20): The VIC-I cannot produce equal temperament scales. It’s not a tone-deaf issue.

  5. Lars Wadefalk says:

    Thanks for the walkthru.
    That Kefrens demo is the sickest thing ever.
    Most of the stuff they do in it is as hard as getting to the moon.
