The IBM PCjr was a huge flop in the marketplace. As such, it was only in production for about 14 months, and never even reached my part of the world. When I grew up, I had a vague notion that these machines existed, since many games offered enhanced Tandy audio and video, and some would advertise it as PCjr (which is what it was). I never actually saw a Tandy machine in the flesh though, let alone a PCjr. But it had always intrigued me: apparently there were these PCs that had better graphics and sound than the standard PCs and clones that I knew. A few weeks ago though, I finally got my own PCjr, a 128 kb model with floppy drive, and I would like to share a quick list of what makes it cool, and what does not. Not just as a user, but as a retro-coder/demoscener.
What is cool?
- The video chip does not suffer from ‘CGA snow’ in 80 column mode
- You get 16 colours in 320×200 mode and 4 colours in 640×200 mode, as opposed to 4 and 2 colours respectively on CGA
- You get a 4-channel SN76496 sound chip
- There is no separate system and video memory, so you can use almost the entire 128k of memory for graphics, if you want (the video memory is bank-switched, much like on a C64)
- The machine comes in a much lighter and smaller case than a regular PC
- The keyboard has a wireless infrared interface
- It has a ‘sidecar’ expansion mechanism
- It has two cartridge slots
What is uncool?
- Because the video memory and system memory are shared, the video chip steals cycles from the CPU
- 128k is not a lot for a machine that has to run PC DOS, especially if part of that memory is used by the video chip
- IBM omitted the DMA controller on the motherboard
- All connectors are proprietary, so you cannot use regular PC monitors, joysticks, expansion cards or anything
- The keyboard has a wireless infrared interface
Let me get into the ‘uncool’ points in some more detail.
Shared video memory
Shared memory was very common on home computers in the 80s. Especially on a 6502-based system, this could be done very elegantly: The 6502 can only access memory every other cycle. So by cleverly designing your video circuitry, you could make it run almost entirely in the unused memory cycles of the 6502. The C64 is an excellent example of this: most of the video is done in the unused cycles. There are only two exceptions: sprites and colorram. At the beginning of each scanline, the VIC-II chip will steal some cycles to read data for every enabled sprite. And every 8th scanline, the VIC-II will load a new line from colorram. Those are the only cycles it steals from the CPU.
The PCjr, however, does not use a 6502 but an 8088, and an 8088 can and will access memory on every cycle. As a result, the video circuit slows down the CPU: it steals one in every 4 IO cycles (one IO cycle is 4 CPU cycles at 4.77 MHz). The CPU therefore runs at only 3/4 of its nominal speed, about 3.57 MHz effectively.
On the bright side though, the video accesses also refresh the memory. This was also very common on home computers in the 80s. PCs are an exception however. The solution that IBM came up with for this is both creative and ugly: IBM wired the second channel of the 8253 timer to the first channel of the 8237 DMA controller. This way the timer will periodically trigger a DMA read of a single byte. This special read is used as a memory refresh trigger. By default, the timer is set to 18 IO cycles, so one in every 18 IO cycles is lost to refresh. On a regular PC, the CPU therefore runs at about 17/18 of its nominal speed, about 4.5 MHz. Considerably faster than the PCjr.
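The arithmetic behind those two figures can be sketched as follows. This is a rough model that assumes the code is entirely limited by bus access, so every lost IO cycle costs the CPU that time in full; the function name and the rounded 4.77 MHz clock are my own choices for illustration.

```python
# Rough model: the CPU loses a fixed fraction of its bus (IO) cycles.
# Assumes bus-bound code, so a lost IO cycle is lost CPU time in full.

CPU_CLOCK_MHZ = 4.77  # nominal 8088 clock in the PC and PCjr (rounded)


def effective_mhz(stolen, out_of, clock_mhz=CPU_CLOCK_MHZ):
    """Effective clock when `stolen` of every `out_of` IO cycles are lost."""
    return clock_mhz * (out_of - stolen) / out_of


pcjr = effective_mhz(1, 4)   # video circuit takes 1 in every 4 IO cycles
pc = effective_mhz(1, 18)    # DRAM refresh takes 1 in every 18 IO cycles

print(f"PCjr: {pcjr:.2f} MHz, regular PC: {pc:.2f} MHz")
```

The same function also covers the reprogrammed timer: `effective_mhz(1, 19)` gives the slightly higher figure you get with a refresh every 19 IO cycles.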
The downside of the regular PC however is that the memory refresh is not synchronized to the screen in any way. On the PCjr it is, so it is very predictable where and when cycles are stolen. It always happens in the same places on the same scanline (again, much like the C64 and other home computers). In 8088 MPH, we worked around the lack of synchronization on the regular PC by reprogramming the timer to 19 IO cycles (this means the memory is refreshed more slowly, but there should be plenty of tolerance in the DRAM chips to allow this without issue in practice). An entire scanline on CGA takes 76 IO cycles, so 19 divides the scanline exactly. The trick was just to get the timer and the CRTC synchronized: ‘in lockstep’. On a PCjr you get this ‘lockstep’ automatically, it is designed into the hardware.
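To see why 19 works and the default of 18 does not, here is a small simulation of where the refresh cycles land on each scanline. It is only a sketch: it assumes the timer starts exactly at the beginning of a scanline, and the function name is made up for this example.

```python
SCANLINE_IO_CYCLES = 76  # one CGA scanline in IO cycles (figure from the text)


def refresh_slots(period, scanlines):
    """For each scanline, list the offsets (in IO cycles) where a refresh lands.

    Assumes the first refresh coincides with the start of scanline 0.
    """
    total = SCANLINE_IO_CYCLES * scanlines
    slots = [[] for _ in range(scanlines)]
    for t in range(0, total, period):
        slots[t // SCANLINE_IO_CYCLES].append(t % SCANLINE_IO_CYCLES)
    return slots


default = refresh_slots(18, 4)    # stock timer setting
lockstep = refresh_slots(19, 4)   # reprogrammed timer

for line, (a, b) in enumerate(zip(default, lockstep)):
    print(f"scanline {line}: period 18 -> {a}, period 19 -> {b}")
```

With a period of 19 every scanline gets its refreshes at the same four offsets, while with 18 the pattern shifts from one scanline to the next.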
128 kb ought to be enough for anyone
The first PC had only 16kb in the minimum configuration. This was enough only for running BASIC and using cassettes. For PC DOS you would need 32kb or more. However, by 1984, when the PCjr came out, it was common for DOS machines to have much more memory than that. Since the PCjr shares its video memory with the system, you lose up to 32kb for the framebuffer, leaving only 96kb for DOS. That is not a lot.
What is worse, the unique design of the PCjr makes it difficult to even extend the memory beyond 128kb. There are two issues here:
- The memory is refreshed by the video circuit, so only the 128kb that is installed on the mainboard can be refreshed automatically.
- The video memory is placed at the end of the system memory, so in the last 32kb of the total 128kb.
It is complicated, but there are solutions to both. Memory expansions in the form of sidecars exist. These contain their own refresh logic, separate from the main memory. An interesting side-effect is that this memory is faster than the system memory. Namely, the system memory is affected by every access of the video circuit, which is a lot more than the minimum number of accesses required for refreshing. The memory expansion therefore steals fewer cycles from the CPU, so when your code and data live in this part of the memory, the CPU will run faster. With some memory expansions (for example ones based on SRAM, which does not need refresh at all), the CPU is actually faster than on a regular PC.
The second problem is that if you extend the memory beyond 128kb, there will be a ‘gap’ for the video memory in the first 128kb. DOS and applications expect the system memory to be a single consecutive block of up to 640kb. So DOS will just allocate the video memory as regular memory, leading to corruption and crashes.
There is a quick-and-dirty solution to this: after DOS itself is loaded, load a device driver that allocates the remaining memory up to 128kb. This driver then effectively ‘reserves’ the memory, so that it will not be re-used by DOS or applications. You will lose some of your memory, but it works.
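As a back-of-the-envelope illustration of what that costs you, the sketch below computes how much memory such a filler driver would have to reserve and how much is left for programs. The function name is hypothetical, and the 40kb figure for where DOS ends is an invented example value, not a measurement.

```python
KB = 1024
ONBOARD_END = 128 * KB  # end of the on-board memory, which holds the video memory


def filler_reservation(dos_end, total_memory):
    """Return (reserved_bytes, free_bytes) for a filler driver loaded at dos_end.

    The driver claims everything from dos_end up to the 128kb boundary, so
    that DOS only hands out memory above the video memory.
    """
    if total_memory <= ONBOARD_END:
        raise ValueError("no expansion memory, so there is no gap to guard")
    if dos_end >= ONBOARD_END:
        raise ValueError("DOS already ends beyond the on-board memory")
    reserved = ONBOARD_END - dos_end
    free = total_memory - ONBOARD_END
    return reserved, free


# Invented example: DOS ends at 40kb on a machine expanded to 640kb.
reserved, free = filler_reservation(40 * KB, 640 * KB)
print(f"reserved: {reserved // KB}kb, left for programs: {free // KB}kb")
```

The less DOS itself occupies, the more memory the driver has to throw away, which is the price of this approach.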
Most games with enhanced PCjr graphics and/or audio are actually aimed at Tandy 1000 machines, and will require more than 128kb. The Tandy 1000 however is designed to take more than 128kb, and its video memory is always at the end of the system memory, regardless of the size. This means that not all games for Tandy 1000 will run on a PCjr as-is. If it’s games you want, the Tandy is the better choice hands down.
To preserve as much memory as possible, you will probably want to use the oldest possible DOS, which is PC DOS 2.10. The latest version of DOS to officially support the PCjr is PC DOS 3.30. The main feature you would be missing out on, is support for 3.5″ floppies and HD floppies. But your PCjr does not support drives for those floppies anyway, so there’s no real reason to run the newer version. There was never any support for hard disks for the PCjr either, although in recent years, some hobbyists have developed the jr-IDE sidecar. Since this also gives you a full 640k memory upgrade, you can run a newer DOS with proper support for the hard drive without a problem anyway.
No DMA controller
As already mentioned, the original PC uses its DMA controller for memory refresh. That part is solved by using the video chip on the PCjr. But the DMA controller is also used for other things. As I blogged earlier, it is used for digital audio playback on sound cards. That will not be a problem, since there are no ISA slots to put a Sound Blaster or compatible card in a PCjr anyway.
But the other thing that DMA is used for on PCs is floppy and hard disk transfers. And that is something that is great for demos. Namely, we can start a disk transfer in the background, while we continue to play music and show moving graphics on screen, so we can get seamless transitions between effects and parts.
On the PCjr, not a chance. The floppy controller requires you to poll for every incoming byte. Together with the low memory, that is a bad combination. This will be the most difficult challenge for making interesting demos.
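To get a feel for how tight polled transfers are, here is a rough time budget. The 250 kbit/s data rate is my assumption for double-density disks and is not stated above; the 3/4 factor is the share of IO cycles the video circuit leaves over, as discussed earlier.

```python
# Rough time budget for polling the floppy controller one byte at a time.
# Assumption (not from the text): double-density disks at 250 kbit/s.

DATA_RATE_BITS_PER_S = 250_000
CPU_CLOCK_HZ = 4_770_000   # nominal 8088 clock (rounded)
BUS_SHARE = 3 / 4          # share of IO cycles left over by the video circuit


def cycles_per_byte(data_rate=DATA_RATE_BITS_PER_S, clock_hz=CPU_CLOCK_HZ):
    """Nominal CPU cycles that pass between two consecutive bytes from disk."""
    byte_time_s = 8 / data_rate
    return byte_time_s * clock_hz


nominal = cycles_per_byte()       # cycles between bytes at the nominal clock
effective = nominal * BUS_SHARE   # what bus-bound code can actually use

print(f"{nominal:.0f} cycles per byte, roughly {effective:.0f} usable")
```

Under these assumptions that leaves room for only a dozen or so typical instructions per byte, so the polling loop cannot share the CPU with music or graphics while a sector is coming in.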
Proprietary connectors
This one is self-explanatory: you need special hardware, cables and adapters for PCjr. You cannot re-use hardware from other PCs.
Wireless keyboard
I listed this as both ‘cool’ and ‘uncool’. The uncool parts are:
- It requires batteries to operate.
- You can’t use a regular keyboard, only the small and somewhat awkward 62-key one.
- The wireless interface is very cheap. It is connected to the Non-Maskable Interrupt (as discussed earlier), and requires the CPU to decode the signals.
This means that the keyboard can interrupt anything. The most common annoyance that people reported is that you cannot get reliable data transfers via (null-)modem, since the keyboard interface will interrupt the transfer and cause you to lose bytes.
It also means that keyboard events are much slower to handle on the PCjr than on a regular PC.
And it means that the interface is somewhat different. On a real PC, the keyboard triggers IRQ 1 directly. You can then read the scancode directly from the keyboard controller (port 60h). On the PCjr, the hardware triggers the NMI. The NMI handler then has to decode the bits sent via the wireless interface with time-critical loops. This yields PCjr-specific scancodes, which are then translated by another routine on the CPU. Finally, the CPU generates a software interrupt to trigger the IRQ 1 handler, for backward compatibility with the PC.
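The order of those steps can be modelled in a few lines. This is purely a sketch of the flow: the bit order, the scancode values and all function names are invented for illustration, and the real encoding and translation tables are not reproduced here.

```python
# Hypothetical values throughout: this only models the order of the steps,
# not the real bit encoding or the real scancode tables.

JR_TO_PC = {0x02: 0x1E}  # made-up PCjr -> PC scancode translation table


def decode_bits(bits):
    """Step 1: assemble 8 sampled bits into a byte (LSB first is an assumption)."""
    value = 0
    for position, bit in enumerate(bits):
        value |= (bit & 1) << position
    return value


def translate(jr_scancode):
    """Step 2: map a PCjr-specific scancode to its PC equivalent."""
    return JR_TO_PC[jr_scancode]


def nmi_handler(bits, irq1_handler):
    """Model of the NMI path; all of this work is done by the CPU itself."""
    jr_scancode = decode_bits(bits)
    pc_scancode = translate(jr_scancode)
    irq1_handler(pc_scancode)  # stands in for the software interrupt


received = []
nmi_handler([0, 1, 0, 0, 0, 0, 0, 0], received.append)
print([hex(code) for code in received])
```

On a regular PC only the last step exists, and it is started by the hardware rather than by code.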
For me personally, the PCjr definitely scores as ‘cool’ overall. I don’t think I would have liked it all that much if it were my main PC back in the day. It is very limited with so little memory, just one floppy drive, and no hard drive. But as a retro/demoscene platform, I think it offers just the right combination of capabilities and limitations.
I am not sure I understand how “memory bus stealing” works. You say that the video circuitry “steals” one IO in every 4 cycles, meaning that the CPU runs at the effective speed of 3.57MHz instead of 4.77MHz.
I have never worked so low level in such a configuration, and I’m no expert in 8086/8088. But I always believed that the CPU would be forbidden to **access** the bus one in every four cycles. Meaning that it would stall if an access was required at the wrong time, but also that it would not be impacted if it had to do something “internal”. Like using an internal register or performing a multi-cycle operation. If so, the performance impact would be highly dependent on the nature of the code, and would not be proportional to the amount of “steals”.
Yes, I have to elaborate:
Indeed, I specifically say it steals IO cycles, which does not necessarily halt the entire CPU, it merely prevents the CPU from accessing the bus. However, we are specifically talking about an 8088 here. And an 8088 is basically a 16-bit CPU shoehorned onto an 8-bit bus. Since the instruction set was also designed for a 16-bit bus, many instructions are two or more bytes long, and require multiple IO cycles to fetch from memory. The 8086 and 8088 have been designed with a primitive code cache, known as the prefetch buffer (which for some reason is 6 bytes on the 8086 but only 4 bytes on the 8088). This means that the 8088 always tries to fetch instructions from memory ahead of time. But because the 8-bit bus is so slow, this is highly ineffective, and the 8088 is effectively bottlenecked by the speed at which it can fetch instructions, since it generally can’t fill the prefetch buffer as quickly as it executes code (a byte takes 4 cycles to fetch, so a 2-byte instruction already needs 8 cycles to fetch, while many of them execute in only 2-4 cycles).
There are a few exceptions indeed, where the instructions themselves are complex, and require more cycles to execute than they do to fetch from memory. Multiply and division are such examples. However, since they are so slow, most code tries to avoid using them, and uses addition, shifting, or precalculated tables. Which brings you back to memory access being the primary bottleneck.
So yes, technically you are correct that given the right mix of instructions, the PCjr’s effective speed can be higher than 3.57 MHz. You may even get remarkably close to 4.77 MHz in isolated cases.
In practice however, it really is ~3.57 MHz effectively, since 8088-optimized code is pretty much entirely IO-limited. Optimizing 8088 code for speed generally starts by optimizing it for size: every byte you can shave off gains you 4 CPU cycles.
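A crude way to put numbers on this is to say that an instruction takes as long as the slower of its fetch and its execution. The sketch below does exactly that; it assumes prefetching overlaps perfectly with execution and ignores data accesses, and the 118-cycle figure for a multiply is only an illustrative value.

```python
CYCLES_PER_FETCH = 4  # CPU cycles to fetch one byte over the 8-bit bus


def instruction_cycles(length_bytes, exec_cycles, bus_share=1.0):
    """Crude model: an instruction costs max(fetch time, execution time).

    bus_share is the fraction of IO cycles available to the CPU
    (1.0 on an idealised PC, 0.75 on the PCjr).
    """
    fetch_cycles = length_bytes * CYCLES_PER_FETCH / bus_share
    return max(fetch_cycles, exec_cycles)


# A typical short instruction: 2 bytes, 3 cycles to execute.
short_pc = instruction_cycles(2, 3)
short_jr = instruction_cycles(2, 3, bus_share=0.75)

# A slow instruction: 2 bytes, about 118 cycles to execute (illustrative).
slow_pc = instruction_cycles(2, 118)
slow_jr = instruction_cycles(2, 118, bus_share=0.75)

print(f"short: {short_pc:.1f} vs {short_jr:.1f} cycles")
print(f"slow:  {slow_pc:.1f} vs {slow_jr:.1f} cycles")
```

In this model the short instruction is fetch-bound and slows down by the full 4/3 on the PCjr, while the slow one is not affected at all. It also shows the size rule: for fetch-bound code, each byte removed saves 4 cycles.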
Thanks for this extensive and very clear answer! Your blog is a great source of info by the way. Kudos for all the work done!