The strong ARM

I’ve done some posts on x86 vs ARM over the years, most recently on the new Microsoft Surface Pro X, which runs a ‘normal’ desktop version of Windows 10 on an ARM CPU, while also supporting x86 applications through emulation. This basically means that Microsoft is making ARM a ‘regular’ desktop solution that can be a full desktop replacement.

Rumours of similar activity in the Apple camp have been going around for a while as well. Ars Technica has run a story on it now, as it seems that Apple is about to make an official announcement.

In short, Apple is planning to do the same as Microsoft: instead of having their ARM devices as ‘second class citizens’, Apple will make a more conventional laptop based on a high-end ARM SoC, and will run a ‘normal’ version of macOS on it. So again a ‘regular’ desktop solution, rather than the iOS that current ARM devices run, which cannot run regular Mac applications. At this point it is not entirely clear whether these ARM devices can also run x86 applications. However, in the past, Apple did exactly that, to make 68k applications run on the PowerPC Macs, for a seamless transition. And they offered the Rosetta environment for the move from PowerPC to x86.

Aside from using emulation/translation to run applications as-is, they also offered a different solution however: they provided developers with a compiler that would generate code for multiple CPU architectures into a single binary (so both 68k and PPC, or both PPC and x86), a so-called Fat binary or Universal binary. The downside of this solution is of course that it requires applications to be compiled with this compiler, which rules out any x86 applications currently on the market.
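For the curious, a fat/universal binary can be recognized by its header. As a minimal sketch (assuming the Mach-O format used on the Mac, where the fat header starts with the magic number 0xCAFEBABE stored big-endian on disk), a hypothetical checker could look like this; a real tool would go on to parse the per-architecture entries that follow the header:

```cpp
#include <cstdint>
#include <fstream>

// Check whether a file starts with the Mach-O fat/universal binary
// magic number (0xCAFEBABE, big-endian on disk). Sketch only: it does
// not parse the fat_arch entries that list the contained architectures.
bool looksLikeUniversalBinary(const char* path)
{
    std::ifstream f(path, std::ios::binary);
    std::uint8_t m[4] = {0, 0, 0, 0};
    f.read(reinterpret_cast<char*>(m), 4);
    return f && m[0] == 0xCA && m[1] == 0xFE && m[2] == 0xBA && m[3] == 0xBE;
}
```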

In this sense it does not help that Intel is still struggling to complete their move from 14nm to 10nm and beyond. Apple can have its ARM SoCs made on 7nm, which should help to close the performance gap between ARM and high-end x86. I suppose that means that Intel will have to earn its right to be in Macs from now on. If Intel can maintain a performance benefit, then x86 and ARM can co-exist in the Mac ecosystem. But as soon as x86 and ARM approach performance parity, then Apple would have little reason to continue supporting x86.

Interesting times ahead.

Posted in Hardware news | 2 Comments

Batch, batch, batch: Respect the classics!

Today I randomly stumbled upon some discussions about DirectX 12, Mantle and whatnot. It seems a lot of people somehow think that the whole idea of reducing draw call overhead was new for Mantle and DirectX 12. While some commenters managed to point out that even in the days of DirectX 11, there were examples of presentations from various vendors talking about reducing draw call overhead, that seemed to be as far back as they could go.

I on the other hand have witnessed the evolution of OpenGL and DirectX from an early stage. And I know that the issue of draw call overhead has always been around. In fact, it really came to the forefront when the first T&L hardware arrived. One example was the Quake renderer, which used a BSP tree, to effectively depth-sort the triangles. This was a very poor case for hardware T&L, because it created a draw call for every individual triangle. Hardware T&L was fast if it could process large batches of triangles in a single go. But the overhead of setting the GPU up for hardware T&L was quite large, given that you had to initialize the whole pipeline with the correct state. So sending triangles one at a time in individual draw calls was very inefficient on that type of hardware. This was not an issue when all T&L was done on the CPU, since all the state was CPU-side anyway, and CPUs are efficient at branching, random memory access etc.

This led to the development of ‘leafy BSP trees’, where triangles would not be sorted down to the individual triangle level. Instead, batches of triangles were grouped together into a single node, so that you could easily send larger batches of triangles to the GPU in a single draw call, and let the hardware T&L do its thing more efficiently. To give an idea of how old this concept is, a quick Google search turned up a discussion on BSP trees and their efficiency with T&L hardware from 2001.
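The batching effect can be sketched in a few lines of C++ (a hypothetical, simplified model: no actual plane splitting, just the leaf-grouping idea). Sorting down to individual triangles yields one draw call per triangle; a ‘leafy’ tree that stops splitting once a node holds at most N triangles yields far fewer, larger batches:

```cpp
#include <cstddef>
#include <vector>

struct Triangle { float v[9]; };  // 3 vertices, xyz each

// A 'leafy' BSP leaf: instead of one triangle per node, each leaf
// stores a whole batch that can go to the GPU in a single draw call.
struct Leaf { std::vector<Triangle> batch; };

// Group triangles into leaves of at most maxPerLeaf triangles.
// (Illustrative sketch: real leafy BSP building splits by planes;
// here we only show the effect on the draw-call count.)
std::vector<Leaf> buildLeaves(const std::vector<Triangle>& tris,
                              std::size_t maxPerLeaf)
{
    std::vector<Leaf> leaves;
    for (std::size_t i = 0; i < tris.size(); i += maxPerLeaf) {
        Leaf leaf;
        for (std::size_t j = i; j < tris.size() && j < i + maxPerLeaf; ++j)
            leaf.batch.push_back(tris[j]);
        leaves.push_back(leaf);
    }
    return leaves;  // one draw call per leaf
}
```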

But one classic presentation from NVIDIA that has always stuck in my mind is their Batch Batch Batch presentation from the Game Developers Conference in 2003. This presentation was meant to ‘educate’ developers on the true cost of draw calls on hardware T&L and early programmable shader hardware. To put it in perspective, they use an Athlon XP 2700+ CPU and a GeForce FX5800 as their high-end system in that presentation, which would have been cutting-edge at the time.

What they explain is that even in those days, the CPU was a huge bottleneck for GPUs. There was so much time spent on processing a single draw call and setting up the GPU, that you basically got thousands of triangles ‘for free’ if you would just add them to that single call. At 130 triangles or less, you are completely CPU-bound, even with the fastest CPU of the day.

So they explain that the key is not how many triangles you can draw per frame, but how many batches per frame. There is quite a hard limit to the number of batches you can render per frame, at a given framerate. They measured about 170k batches per second on their high-end system (and that was a synthetic test doing only the bare draw calls, nothing fancy). So if you would assume 60 fps, you’d get 170k/60 = 2833 batches per frame. At one extreme of the spectrum, that means that if you only send one triangle per batch, you could not render more than 2833 triangles per frame at 60 fps. And in practical situations, with complex materials, geometry, and the rest of the game logic running on the CPU as well, the number of batches will be a lot smaller.

At the other extreme however, you can take these 2833 batches per frame, and chuck each of them full of triangles ‘for free’. As they say, if you make a single batch 500 triangles, or even 1000 triangles large, it makes absolutely no difference. So with larger batches, you could easily get 2.83 million triangles on screen at the same 60 fps.
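The arithmetic above can be condensed into two small helpers (a sketch; the 170k batches/second figure is the synthetic measurement quoted from the presentation):

```cpp
#include <cstdint>

// Maximum number of batches per frame, given a measured draw-call
// throughput (batches per second) and a target framerate.
std::uint32_t batchesPerFrame(std::uint32_t batchesPerSecond, std::uint32_t fps)
{
    return batchesPerSecond / fps;
}

// Triangle throughput per frame if every batch carries 'trisPerBatch'
// triangles: the batch count is the limit, not the triangle count.
std::uint64_t trianglesPerFrame(std::uint32_t batchesPerSecond,
                                std::uint32_t fps,
                                std::uint32_t trisPerBatch)
{
    return static_cast<std::uint64_t>(batchesPerFrame(batchesPerSecond, fps))
           * trisPerBatch;
}
```

With the numbers from the presentation, one triangle per batch caps you at 2833 triangles per frame at 60 fps, while 1000 triangles per batch gets you to 2.83 million at the same framerate.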

And even in 2003 they already warned that this situation was only going to get worse, since the trend was, and still is, that GPU performance scales much more quickly than CPU performance over time. So basically since the early days of hardware T&L, the whole CPU overhead problem has been a thing. Not just since DirectX 11 or 12. These were the days of DirectX 7, 8 and 9 (they included numbers for GeForce2 and GeForce4MX cards, which are DX7-level, and they all suffer from the same issue; even a GeForce2MX can do nearly 20 million triangles per second if fed efficiently by the CPU).

So as you can imagine, a lot of effort has been put into both hardware and software to try and make draw calls more efficient. Like the use of instancing, rendering to vertexbuffers, supporting texture fetches from vertex shaders, redesigned state management, deferred contexts and whatnot. The current generation of APIs (DirectX 12, Vulkan, Mantle and Metal) are another step in reducing the bottlenecks surrounding draw calls. But although they reduce the cost of draw calls, they do not solve the problem altogether. It is still expensive to send a batch of triangles to the GPU, so you still need to feed the data efficiently. These APIs certainly don’t make draw calls free, and we’re nowhere near the ideal situation where you can fire off draw calls for single triangles and expect decent performance.

I hope you liked this bit of historical perspective. The numbers in the Batch Batch Batch presentation are very interesting.

Posted in Direct3D, Oldskool/retro programming, OpenGL, Software development, Vulkan | 4 Comments

Politicians vs entrepreneurs

Recently the discussion of a newly published book caught my attention. The book investigated some of the ramifications of the financial crisis of 2007-2008. Specifically, it investigated how a bank received government support. This was done in the form of the government buying a lot of controlling shares in the bank, and also installing a CEO. This CEO was a politician.

In short, the story goes that initially he did a good job, by carefully controlling the bank’s spending and nursing the bank back to health. However, as time went on, the bank was ready to grow again, and invest in new projects. This became a problem, partly because of the government being a large shareholder, and partly because of the CEO being a politician. They were reluctant to take risks, which resulted in various missed opportunities. Ironically enough it also meant that the government could have cashed in their shares at an opportune moment, and they would have had their full investment back, and even a profit. But because the government was reluctant to do so at the time, it is unlikely that they will get another opportunity soon, as the current Corona-crisis made the shares drop significantly, and the government would be at a huge loss again.

This in turn led to an internal struggle between the CEO and other members of the board, who wanted ‘real’ bankers, more willing to take risks, and expand on opportunities. Eventually it led to the ousting of the CEO.

What struck me with this story was that I recognize these different management styles in software as well. I’d like to name “Delphi” as a key word here. Back in my days at university, I once did an internship with two other students, at a small company. As a result, Delphi has been on my resume for ages, and I ended up doing projects at various different Delphi-shops. This caused me to realize at some point that you should not put skills on your resume that you don’t want to use.

Why Delphi?

Delphi is just an example I’m giving here, because I have first-hand experience with this situation. There are various other, similar situations though. But what is the issue with Delphi? Well, for that, we have to go back to the days of MS-DOS. A company by the name of Borland offered various development tools. Turbo Pascal was one of them, and it was very popular with hobbyists (and also demosceners). It had a very nice IDE for the time, which allowed you to build and debug from the same environment. Its compile-speed was revolutionary. And in those days, that mattered. Computers were not very fast, and it could take minutes to build even a very simple program, before you could run, test and debug it.

Turbo Pascal was designed to make building and linking as fast and efficient as possible (see also here). Today you may be used to just hitting “build and run in debugger” in your IDE, because it just takes a few seconds, and it’s an easy way to see if your new addition/fix will compile and work as expected. But in those days, that was not an option in most environments. Turbo Pascal was one of the first environments that made this feasible. It led to an entirely different way of developing: instead of meticulously preparing and writing your code to avoid any errors, you could use the compiler itself as a tool to check for errors.

When the move was made from MS-DOS to Windows, in the 90s, a new generation of Turbo Pascal was developed by Borland. This version of Turbo Pascal was called Delphi. Windows was an entirely different beast from MS-DOS though. DOS itself was written in assembly, and interacting with DOS or the BIOS required low-level programming (API calls were done via interrupts). This, combined with the fact that machines in the early days of DOS were limited in terms of CPU and memory, meant that quite a lot of assembly code was used. Windows however was written in a combination of assembly and C, and its APIs had a C interface.

As a result, not everyone who used Turbo Pascal for DOS, would automatically move to Delphi. Many developers, especially the professional ones, would use C/C++. And for the less experienced developers, there now was a big competitor in the form of Visual Basic. Where Delphi was supposed to promote its IDE and its RAD development as key selling points, Visual Basic now offered similar functionality for fast application development, but combined it with a BASIC dialect, which was easier to use than Pascal, for less experienced developers.

This meant that Delphi quickly became somewhat of a niche product. It was mainly used by semi-professionals. People who couldn’t or wouldn’t make the switch to C/C++, but who were too advanced to be using something like Visual Basic. The interesting thing is that even though during my internship in the early 2000s, I already felt that Delphi was a niche product, on its way out, it still survives to this day.

Delphi as a product has changed hands a few times. Borland no longer exists. Today, a company by the name of Embarcadero maintains Delphi and various other products originating from Borland, and they still aim to release a new major version every year.

While I don’t want to take away from their efforts (Delphi is a good product for what it is: a Pascal-based programming environment for Windows and other platforms), fact of the matter is that Embarcadero is a relatively small company, and they are basically the only ones aiming for Pascal solutions. Compare that to the C/C++ world, where there are various different vendors of compilers and other tools, and most major operating systems and many major applications are actually developed with this language and these tools. The result is that interfacing your code with an OS or third-party libraries, devices, services and whatnot is generally trivial and well-supported in C/C++, while you are generally on your own in Delphi.

And that’s just comparing Delphi with C/C++. Modern languages have since arrived, most notably C#, and these modern languages make development easier and faster than Delphi with its Pascal underpinnings. Which is not entirely a coincidence, given that Anders Hejlsberg, the original developer of Turbo Pascal and the lead architect of Delphi, left Borland for Microsoft in 1996, and became the lead architect of C#.

Back to the point

As I said, the use of Delphi can generally be traced back to semi-professional developers who started using Turbo Pascal in DOS. For the small company of my internship that was certainly the case. Clearly, being dependent on Delphi is quite a risk as a business. Firstly because there is only one supplier of your development tools. And development tools need maintenance. It has always been common for Delphi (and other development tools) to require updates when new versions of Windows were released. Since development tools tend to interact with the OS at a relatively low level, to make debugging and profiling code possible, they also tend to be more vulnerable to small changes in the OS than regular applications. So if Embarcadero cannot deliver an update in time, or even at all, you may find yourself in the situation that your application can not be made to work on the latest version of Windows.

Another risk stems from the fact that Delphi/Pascal is such a niche language. Not many developers will know the language. Most developers today will know C#, Java or C/C++. They can find plenty of jobs with the skills they already have, so they are not likely to want to learn Delphi just to work for you. The developers that remain are generally older, more experienced developers, and their skills are a scarce resource which will be in demand, so they will be more expensive to hire.

This particular company was so small, that it was not realistic to expect them to migrate their codebase to another language. The migration itself would be too risky and have too much of an impact. With the amount of development resources they had, it would take years to migrate the codebase (even so, I would still recommend to develop new things in C/C++ or C# modules, which integrate into the existing codebase, and whenever there is time, also convert relevant existing code to such C/C++ or C# modules, so that eventually a Delphi-free version of the application may be within reach).

However, over time I also worked at other companies that mainly used Delphi. And I’ve come to see Delphi as a red flag. The pattern always appeared to be that just a few semi-professionals with a Turbo Pascal background developed some core technology that the company was built on, and moving to Delphi was the logical/only next step.

Some of these companies ‘accidentally’ grew to considerable size (think 100+ employees), yet they never shook their Delphi roots, even though at that size the risk factor of scarce developer and other resources would no longer apply. All the other risks do, of course. So it should be quite obvious that action is required to get away from Delphi as quickly as possible.

Politician or entrepreneur?

That brings me to the original question. Because it seems that even though these companies have grown over time, their semi-professionalism from their early Turbo Pascal/Delphi days remains, and is locked into their company culture.

So the people who should be calling the shots, don’t want to take any risks, and just want to try and please everyone. The easiest way of doing that is to retain the current status quo. And that sounds an awful lot like a politician. Especially if you factor in that these people are semi-professionals, not true professionals. They may not actually have a proper grasp of the technology their company works with. They merely work based on opinions and second-hand information. They are reactive, not proactive.

Ironically it tends to perpetuate itself, because when that is the company culture, the people they tend to hire, will also be the same type of semi-professionals (less skilled developers, project managers without a technical background etc). Should they ‘accidentally’ hire a true professional/entrepreneur, then this person is not likely to function well in this environment. Those people would want to improve the company, update the culture, and be all they can be. But that may rub too many people the wrong way.

With a true entrepreneur it’s much easier to explain risks and possibilities, and plot a path to a better future. They will be more likely to try new things, and understand that not every idea may lead to success, so they may be more forgiving for experimentation as well (I don’t want to use the word ‘failure’ here, because I think taking risks should not be done blindly. You should experiment and monitor, and try to determine as early as possible whether an idea will be a success or not, so that you minimize the cost of failed ideas).

I think it’s the difference between looking at the past, and trying to hold on to what you’ve got, versus looking to the future and trying to gauge what you can do better, using creativity and innovation. A politician may be good in times of crisis, to try and minimize losses. But they will never bring a company to new heights.

And my experience in such companies is that they still use outdated/archaic tools, and tend to have a very outdated software portfolio. Still selling products based on source code that hasn’t had any proper maintenance in over 10 years. Constantly running into issues like moving to Windows 10 or moving to 64-bit, which is not even an issue in the first place for other organizations, because they had already updated their tools and codebase before this ever became an issue (for example, C# is mostly architecture-agnostic, so most C# code will compile just fine for 32-bit and 64-bit, x86 or ARM. And since the .NET framework is part of Windows itself, your C# code will generally work fine out-of-the-box on a new version of Windows).

Being reactive only is a recipe for a technical debt disaster. I have experienced that they would not do ANY maintenance on their codebase whatsoever, outside of paid projects. So there was no development or maintenance unless they had a paying customer who specifically wanted a solution. Which also meant that the customer had to pay for all maintenance. This was an approach that obviously was not sustainable, since you could not charge the customers for what it would cost to do proper maintenance and solve all the technical debt. It would make your product way too expensive. The company actually wanted to have competitive pricing of course, even trying to undercut competitors. And project managers would also want to keep things as cheap as possible, so the situation only got worse over time.

I think Microsoft shows a very decent strategy for product development with Windows. Or at least, they did in the past 20+ years. For example, they made sure that Windows XP was a stable version. They could then move to a more experimental Windows, in the form of Vista, where they could address technical debt, and also add new functionality (such as Media Foundation and DirectX 10). Vista may not have been a huge success, but there was always XP for customers to fall back on. The same pattern repeated with Windows 7 and Windows 8-10. Windows 7 continued what Vista started, but made it stable and reliable for years to come. This again gave Microsoft the freedom to experiment with new things (touch interfaces, integrating with embedded devices, phones, tablets etc, and the Universal Windows Platform). Windows 8 and 8.1 were again not that successful, but Windows 10 is again a stable version of this technology.

So in general, you want to create a stable version of your current platform, for your customers to fall back on. The more stable you make this version, the more freedom you have to experiment with new and innovative ideas, and get rid of your technical debt.

I just mentioned Delphi as an obvious red flag that I encountered over the years, but I’m sure there are plenty of other red flags. I suppose Visual Basic would be another one. Please share your experiences in the comments.

Posted in Software development | 2 Comments

Some thoughts on emulators

Recently I watched Trixter’s latest Q&A video on YouTube, and at 26:15 there was a question regarding PC emulators:

That got me thinking, I have some things I’d like to share on that subject as well.

First of all, I share Trixter’s views in general. Although I am a perfectionist, I am not sure if perfectionism is my underlying reason in this case though. I think emulators should strive for accuracy, which is not necessarily “perfection”. It is more of a pragmatic thing: you want the emulator to be able to run all software correctly.

However, that leads us into a sort of chicken-and-egg problem, which revolves around definitions as well. What is “all software”? What does “correctly” mean? And in the case of a PC emulator, there’s even the question of what a “PC” is exactly. There are tons of different hardware specs for PCs, even if you only look at the ones created by IBM themselves. Let alone if you factor in all the clones and third-party addons. I will just give some thoughts on the three subjects here: What hardware? What software? What accuracy/correctness?

What hardware?

While the PC is arguably the most chaotic platform out there, in terms of different specs, we do see that emulators for other platforms also factor in hardware differences.

For example, if you look at the C64, at face value it’s just a single machine. However, if you look closer, then Commodore has always had both an NTSC and a PAL version of the machine. Their hardware was similar, but due to the different requirements of the TV signals, the NTSC and PAL machines were timed differently. This also led to software having to be developed specifically for an NTSC or PAL machine.

As a result, most emulators offer both configurations, so that you can run software targeted at either machine. Likewise, there are some differences between revisions of the chips, most notably the SID sound chip. While they are compatible at the software level, the 6581 version sounds quite different from the later 8580 version. Most emulators therefore let you select from the various chip revisions, so that the sound most closely matches that specific revision of the machine.

The PC world is not like that however. There were so many different clone makers around, and most of their clones were so far from perfect, that the number of different possible configurations would be impossible to catalogue and emulate. At the same time, the fact that basically no two machines were exactly alike also makes it less relevant to emulate every single derivation. As long as you can emulate one machine ‘in the ballpark’, it gives you exactly the same experience as real hardware did back in the day.

So the question is more about defining which ‘ballparks’ you have. I would say that the original IBM PC 5150 would make a lot of sense to emulate correctly, as a starting point. This is the machine that the earliest software was targeted at, and also the machine that early clones were cloning.

The PC/XT 5160 and 5155 are just slight derivations of the 5150, and the differences generally do not affect software, they only matter for physical machines. For example, they no longer have the cassette port, and they have 8 expansion slots with a slightly narrower form factor than the 5150 did.

Likewise, because most clones of that era are generally imperfect, and could not run all software correctly, they are less interesting as an emulator target.

Another two machines that make an interesting ballpark are the IBM PCjr and the Tandy 1000. They are related to the original PC, but offer extended audio and video capabilities. The Tandy 1000 was more or less a clone of the PCjr, but the PCjr was a commercial flop, while the Tandy 1000 was a huge success. In practice, this means a lot more software targets the Tandy 1000 specifically, rather than the PCjr original.

From then on, the PC standard became more ‘blurred’. Clones took over from IBM, and software adapted to this situation, by being more forgiving about different speeds, or slight incompatibilities between chipsets and such. So perhaps a last ‘exact’ target could be the IBM AT 5170, but after that, just “generic” configurations for the different CPU types (386, 486, Pentium etc) would be good enough, because that’s basically what the machines were at that point.

What software?

For me the answer to this one is simple: One should strive to be able to run all software. I have seen various emulator devs dismiss 8088 MPH, because it is the only software of its kind, in how it uses the CGA hardware to generate extra colours and special video modes. I don’t agree with that argument.

The argument also seems to be somewhat unique to the PC emulator scene. If you look at C64 or Amiga emulators, they do try to run all software correctly. Even when demos or games find new hardware tricks, emulators are quickly modified to support this.

What accuracy/correctness?

I think this is especially relevant for people who want to use the emulator as a development tool. In the PC scene, it is quite common that demos are developed exclusively on DOSBox, and they turn out not to run on real hardware at all. Being able to run as much software as possible is one thing. But emulators should not be more forgiving than real hardware. Code that fails on real hardware, should also fail on an emulator.

An interesting guideline for accuracy/correctness is to emulate “any externally observable effects”. In other words: you can emulate things as a black box, as long as there is no way that you can tell the difference from the outside. At the extreme, it means you won’t have to emulate a machine down to the physical level of modeling all gates and actually emulating the electrons passing through the circuit. Which makes sense in some way, because the integrated circuits that make up the actual hardware are also black boxes to a certain extent. Only the input and output signals can be observed from the outside.

However, that is difficult to put into practice, because what exactly are these “externally observable effects”? It seems that this is somewhat of a chicken-and-egg problem: a definition that may shift as new tricks are discovered. I already mentioned 8088 MPH, which used NTSC artifacting in a new way. Up to that point, emulators had always assumed that you could basically only observe 16 different artifact colours. It was known that there was more to NTSC artifacts than just these fixed 16 colours, but because nobody ever wrote any software that did anything with it, it was ignored in emulation, because it was not ‘externally observable’.

Another example is the C64 demo Vicious Sid. It has a “NO SID” part:

It exploits the fact that there is a considerable amount of crosstalk between video and audio in the C64’s circuit. So by carefully controlling the video signal, you can effectively play back controlled audio by means of this crosstalk.

So although it was known that this crosstalk exists, it was ignored by emulators, as it was just considered ‘noise’. However, Vicious Sid now does something ‘useful’ with this externally observable effect, so it should be emulated in order to run this demo correctly. And indeed, emulators were modified to make the video signal ‘bleed’ into the audio, like on a real machine.

This also indicates that there may be various other externally observable effects that are already known, but ignored in emulators so far, just waiting to be exploited by software in the future.

Getting back to 8088 MPH, aside from the 1024 colours, it also has some cycle-exact effects. These too cause a lot of problems with emulators. One reason is the waitstates that can be inserted on the data bus by hardware. CGA uses single-ported memory, so it cannot have both the video output circuit and the CPU access the video RAM at the same time. Therefore, it inserts waitstates on the bus, to block the CPU whenever the output circuit needs to access the video RAM.

This was a known externally observable effect, but no PC emulator ever bothered to emulate the hardware to this level, as far as I know. PC emulators tend to just emulate the different components in their own vacuum. In reality all components share the same bus, and therefore the components can influence each other. It is relevant that waitstates are actually inserted on the bus, and are actually adhered to by other components.
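A toy model makes the point (purely illustrative; the ownership pattern is hypothetical and much simpler than real CGA timing): when CPU and video circuit share one bus, a CPU access to video RAM that collides with the output circuit must stall, so the cost of an access depends on bus state, not on the CPU alone.

```cpp
#include <cstdint>

// Toy shared-bus model: the video output circuit owns the bus during
// certain cycles; a CPU access to video RAM during such a cycle is
// delayed by waitstates until the bus is free. (Illustrative only:
// the 1-in-4 ownership pattern is made up, not real CGA timing.)
struct Bus {
    std::uint64_t cycle = 0;

    // Hypothetical pattern: video circuit owns the bus 1 cycle in every 4.
    bool videoOwnsBus() const { return cycle % 4 == 0; }

    // Perform one CPU access to video RAM; returns the number of
    // cycles it took, including any inserted waitstates.
    std::uint32_t cpuAccessVideoRam() {
        std::uint32_t cost = 0;
        while (videoOwnsBus()) { ++cycle; ++cost; }  // waitstate(s)
        ++cycle; ++cost;                             // the access itself
        return cost;
    }
};
```

An emulator that models the CPU in a vacuum would report a constant cost per access; in this model the cost varies with the cycle at which the access happens, which is exactly the kind of externally observable effect cycle-exact software depends on.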

It is also relevant that although the different components may run on the same clock generator, they tend to have their own clock dividers internally, and this means that the relative phase of components to each other should also be taken into account. That is, there is a base clock of 14.31818 MHz on the motherboard. The CPU clock of 4.77 MHz is derived from that by dividing it by 3. Various other components run at other speeds, derived from that same base clock, such as 1.19 MHz for the PIT and 3.58 MHz for the NTSC colorburst and related timings.

We have found during development of 8088 MPH that the IBM PC is not designed to always start in the exact same state. In other words, the dividers do not necessarily all start at the same cycle on the base clock, which means that they can be out of phase in various ways. The relative phase of mainly CPU, PIT and CGA circuit may change between different power-cycles. In 8088 MPH this leads to the externally observable effect that snow may not always be hidden in the border during the plasma effect. You can see this during the party capture:

The effect was designed to hide the snow in the border. And during development it did exactly that. However, when we made this capture at the party, the machine had apparently powered on in one of the states where the waitstates were shifted to the right somewhat. Normally there are two ‘columns of snow’ hidden in the border. But because of this phase shift, the second column of snow was now clearly visible on the left of the screen.

We did not change the code for the final version. But since we were now aware of the problem, we just power-cycled the machine until it was in one of the ‘good’ phase states, before we did the capture (it is possible to detect which state the system is in, via software. As far as we know it is not possible to modify this state in any way though, through software, so only a power-cycle can change it):

So in general I think this really is something that emulators MUST do: components should really interact with each other, and the state of the bus really is an externally observable effect. As is the clock phase.

For most other emulators this is obvious, because software on a C64, Amiga, Atari ST or various other platforms tends to have very strict timing requirements anyway. More often than not, software will not work as intended if the emulation is even a single cycle off. For PCs it is not that crucial, but I think that at least for the PC/XT platforms, this exact timing should be an option. Not just for 8088 MPH, but for all the cool games and demos people could write in the future, if they have an emulator that enables them to experiment with developing this type of code.

Related to that is the emulation of the video interface. Many PC emulators opt for just emulating the screen one frame at a time, or per-scanline at best. While this generally ‘works’, because most software tries to avoid changing the video state mid-frame or mid-scanline, it is not how the hardware works. If you write software that changes the palette in the middle of a scanline, then that is exactly what you will see on real hardware.

Because at the end of the day, let’s face it: that is how these machines work. You should emulate how the machine works. And this means it is more than the sum of its parts. Emulating only the individual components, while ignoring any interaction, is an insufficient approximation of the real machine.

Posted in Oldskool/retro programming

Windows and ARM: not over yet

As you may recall, I was quite fond of the idea of ARM and x86 getting closer together, where on the one hand, Windows could run on ARM devices, and on the other hand, Intel was developing smaller x86-based SoCs in their Atom line, aimed at embedded use and mobile devices such as phones and tablets.

It has been somewhat quiet on that front in recent years. On the one hand because Windows Phones never managed to gain significant market share, and ultimately were abandoned by Microsoft. On the other hand because Intel never managed to make significant inroads into the phone and tablet market with their x86 chips either.

However, Windows on ARM is not dead yet. Microsoft recently announced the Surface Pro X. It is a tablet, which can also be used as a lightweight laptop when you connect a keyboard. There are two interesting features here. Firstly, the hardware: unlike previous Surface Pro models, it does not run on an x86 CPU, but on an ARM SoC, one that Microsoft developed in partnership with Qualcomm: the Microsoft SQ1. It is quite a high-end ARM CPU.

Secondly, there is the OS. Unlike earlier ARM-based devices, the Surface Pro X does not get a stripped-down version of Windows (previously known as Windows RT), where the desktop is very limited. No, this gets a full desktop. What’s more, Microsoft integrated an x86 emulator in the OS. Which means that it can not only run native ARM applications on the desktop, but also legacy x86 applications. So it should have the same level of compatibility as a regular x86-based Windows machine.

I suppose we can interpret this as a sign that Microsoft is still very serious about supporting the ARM architecture. I think that is interesting, because I’ve always liked the idea of having competition in terms of CPU architectures and instruction sets.

There are also other areas where Windows targets ARM. There is Windows 10 IoT Core. Microsoft supports a range of ARM-based devices here, including the Raspberry Pi and the DragonBoard. I have tried IoT Core on a Raspberry Pi 3B+, but was not very impressed. I wanted to use it as a cheap rendering device connected to a display, but the RPi’s GPU is not supported by the drivers, so you get software rendering only. The DragonBoard does have hardware rendering support however, so I will be trying that out soon.

I ported my D3D11 engine to my Windows Phone (a Lumia 640) in the past, and that ran quite well. Developing for Windows 10 IoT is very similar, as it supports UWP applications. I dusted off my Windows Phone recently (I no longer use it, since support has been abandoned, and I switched to an Android phone for everyday use), and did some quick tests. Sadly Visual Studio 2019 does not appear to support Windows Phones for development anymore. But I reinstalled Visual Studio 2017, and that still worked. I can just connect the phone with a USB cable, and deploy debug builds directly from the IDE, and have remote debugging directly on the ARM device.

I expect the DragonBoard to be about the same in terms of usage and performance. Which should be interesting.

Posted in Direct3D, Hardware news, Software development, Software news

Bitbucket ends support for Mercurial (Hg), a quick guide

I have been a long-time user of Bitbucket for my personal projects, over 10 years now, I believe. My preferred source control system has always been Mercurial (Hg), especially in those early days, when I found the tools for Git on Windows to be quite unstable and plagued with compatibility issues. Using TortoiseHg on Windows was very straightforward and reliable.

As a result, all of my repositories that I have created on Bitbucket over the years, have been Mercurial ones. However, Git appears to have won the battle in the end, and this has triggered Bitbucket to stop supporting Mercurial repositories. They will no longer allow you to create new Mercurial repositories starting February 1st 2020, and by June 2020, they will shut down Mercurial access altogether, and what’s worse: they will *delete* all your existing Mercurial repositories.

So basically you *have* to migrate your Mercurial repositories before June 1st 2020, or else you lose your code and history forever.

Now, given such a decision, it would have been nice if Bitbucket had offered an automatic migration service, but alas, there is no such thing. You need to manually convert your repositories. I suppose the most obvious choice is to migrate them to Git repositories on Bitbucket. That is what I have done. There are various ways to do it, and various sources that give you half-baked solutions using half-baked tools. So I thought I’d explain how I did it, and point out the issues I ran into.


First of all, we have to choose the tools we want to use for this migration. I believe the best tool for the job is the hg-git Mercurial plugin. It allows the hg commandline tool to access Git repositories, which means you can push your existing Hg repository directly to Git.

As I said, I use TortoiseHg, and they already include the hg-git plugin. You need to enable it though, by ticking its checkbox. Go to File->Settings, and enable it in the dialog:


Sadly, that turned out not to work very well. There is a problem with the distributed plugins, which causes the hg-git extension to crash with a strange ‘No module named selectors!’ message. This issue here discusses it, although there is no official fix yet. But if you scroll down, you do find a zip file with a fixed distribution of the hg-git plugin and its dependencies. Download that file, unzip it into the TortoiseHg folder (replacing the existing contents in the lib folder), and hg-git is ready for action!

On the Git-side, I use TortoiseGit. Git is a bit ‘special’ though. That is, most Git-tools do not include Git itself, but expect that you have a binary Git distribution (git.exe and supporting tools/libs) installed already. For Windows, there is Git for Windows to solve that dependency. Install that first, and then install TortoiseGit (or whatever tool you want to use. Or just use Git directly from the commandline). But first I want to mention an important ‘snag’ with Git on Windows (or any platform for that matter).

Git wants to convert line-endings when committing or checking out code. On UNIX-like systems you generally don’t notice, because Git defaults to LF as the line-ending for all text files in a repository. LF happens to be the native line-ending on UNIX-like systems, so effectively there is not usually any conversion going on. On Windows, CRLF is the default line-ending, so text files on a Windows system are converted to LF when you commit them, and converted back to CRLF when you check out.

Git argues that this is better for cross-platform compatibility. Personally I think this is a bad idea. There are two reasons why:

  1. Just because you are using Git from a Windows system does not necessarily mean you are using CRLF line-endings: you may be checking out code for a different platform, or using tools that do not use CRLF. Most Windows software is quite resilient to both types of line-endings, and various tools use LF endings even on Windows (Doxygen, for example, generates HTML files with LF endings, which is not a problem, because browsers can handle HTML with either line-ending). Some tools even require LF endings, else they do not work.
  2. Git cannot reliably detect whether a file is text or binary. This means that you have to add exceptions to a .gitattributes file to take care of any special cases. Which you usually find AFTER you’ve committed them, and they turn out to break something when someone else checks them out.

So I would personally suggest not using the automatic conversion of line-endings. I prefer committing and checking out as-is. Just make sure you get the file right on the initial commit, and it will check out correctly on all systems.
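For reference, the as-is behaviour can also be set from the command line after installation, and the few genuine special cases can be pinned per file in .gitattributes rather than left to autodetection (the file patterns below are just examples, not part of any default configuration):

```shell
# Commit and check out as-is; no automatic CRLF/LF conversion:
git config --global core.autocrlf false

# If a specific tool does require LF, or a file must always be treated as
# binary, state it explicitly in .gitattributes instead of relying on Git's
# text/binary autodetection:
cat > .gitattributes <<'EOF'
*.sh   text eol=lf
*.png  binary
EOF
```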

When you install Git for Windows, make sure you select as-is on the following dialog during installation:


As it says, it is the core.autocrlf setting, in case you want to change it later.

I would argue that the whole CRLF/LF thing is a non-issue anyway. Most version control systems do not perform these conversions. In fact, oftentimes code is distributed as a tarball or a zip file. When you extract those, you don’t get any conversion either. But still you can compile that same code on various platforms without doing any conversion at all.

Using hg-git

The process I used to convert each repository is a simple 8-step process. I based it on the article you can find here, but modified it somewhat for this specific use-case.

1) Rename your existing repository in Bitbucket. I do this by adding ‘Hg’ to the end. So for example if I have “My Repository”, I change its name to “My Repository Hg”. This means that the URL will also change, so you can no longer accidentally clone from or commit to this repository from any working directories/tools. It also means that you can create the new Git repository under the same name as your original Hg repository.

2) Do a clean hg clone into a new directory. This makes sure you don’t run into any problems with your working directory having uncommitted changes, or perhaps a corrupted local history or such. You can just use TortoiseHg to create this new clone, or run hg from the commandline.

3) Create the new Git repository on Bitbucket, and grab its URL (eg https://(username)

Note that you can use either https or ssh access for Git. However, I have found it to be quite troublesome to get ssh set up and working under Windows, so I would recommend using https here.

4) Open a command prompt, go to the directory of your clean hg clone from step 2, and run the following command:

hg bookmark -r default master

This command is very important, because it links the ‘default’ branch of your Hg repository to the ‘master’ branch of your Git repository. In both environments they have a special status.

5) Now push your local Hg repository to the new Git repository using hg-git:

hg push https://(username)

At this point your new Git repository should be filled with a complete copy of your Hg repository, including all the history. You can now delete this local clone. If you have any working directories you want to change to Git, the final 3 steps will explain how to do that.
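Putting steps 2 to 5 together, the core of the migration boils down to just a few commands. The repository names and URLs below are placeholders for illustration; substitute your own:

```shell
# 2) Fresh clone of the (renamed) Hg repository, to avoid any local state:
hg clone https://bitbucket.org/myuser/my-repository-hg my-repo-migration
cd my-repo-migration

# 4) Link Hg's 'default' branch to Git's 'master' branch:
hg bookmark -r default master

# 5) Push the full history to the new Git repository (requires the hg-git
#    plugin to be enabled):
hg push https://bitbucket.org/myuser/my-repository.git
```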

(Note: I am not entirely sure, but I believe hg-git will always do an as-is push to Git, so no changing of line-endings. I don’t think it even uses git.exe at all, so it probably will not respond to the Git for Windows configuration anyway. At any rate, it did an as-is push with my repositories, and I could not find any setting or documentation on hg-git for changing line-endings).

6) Analogous to step 2, perform a new git clone from the repository into a new directory, to make sure there can be no existing files or other repository state that may corrupt things.

7) Make sure your working directory is at the latest commit of the default branch of your Hg repository (you may need to modify the .hg/hgrc file, because it will still point to the old URL of the Hg repository, which we renamed in step 1, so you need to set it to the correct URL). Since this step and the next one are somewhat risky, you might want to create a copy of your working directory first, so you can always go back to Hg if the transition to Git didn’t go right.

8) Remove the (hidden) .hg directory from the existing working directory of your old Hg repository, and copy the (hidden) .git directory from the directory in step 6. This will effectively unlink your working directory from Hg altogether, and link it to Git (also at the latest commit).

You can now delete the local clone from step 6.

Note that if you chose a different line-ending option than ‘as-is’, you may find that Git now shows a lot of changes in your files. And when you do a diff, you don’t see any changes. That’s because Git considers it a change when it has to modify the line-endings, but diff does not show line-endings as changes.

Posted in Software development

Am I a software architect?

In the previous article, I tried to describe how I see a software architect, and how he or she would operate within a team or organization. One topic I also wanted to discuss is the type of work a software architect would actually do. However, I decided to save that for a later article, so that is where we are today.

As you could read in the previous article, ‘software architecture’ is quite vague and ‘meta’. Software architecture happens at a high level of abstraction by nature, so trying to describe it will always remain vague. And that’s not just me. I followed a Software Architecture course at university, and it was just as vague. They made you aware of this, however. The course concentrated only on a case study where requirements and constraints had to be extracted from the input of various parties, and had to be translated into an architecture document (various diagrams and explanations at various levels, some aimed at the customer, some at end-users, some at developers etc. Also various usage scenarios and a risk analysis/evaluation of the architecture). No actual programming was involved.

That is somewhat artificial of course. The architect’s work is rarely that cut-and-dried in practice. Firstly, not all software can just be designed ‘on paper’ like that. Or at least, the chances of getting it right the first time would be slim. I find that often when I need to design new software, I am introduced to various technologies that I have not used before, and therefore I do not know how they would behave in an architecture beforehand. To give a simple example: if you were to design a C# application, you might make a lot of use of multithreading and Tasks (thread pooling). But if you were to design a browser-based JavaScript application, the usage of threading is very limited, so you would choose a different route for your design.

Secondly, in practice, especially these days with Agile, DevOps, cross-functional teams and all that, the architect is generally also a team member, and will also participate in development, as I’ve already mentioned in the previous article.

So let’s make all this a little more practical. The guideline I use is that the architect “solves problems”. And I mean that quite literally. That is, the architect takes the high-level problem, and translates it to a working solution. My criterion here is that the problem is “solved”, as in: once the code has been written, the problem is no longer a problem. There is now a library, framework or toolset that handles the problem in a straightforward way (I’m sure most developers have found those ‘headache’ products that just keep generating new bugs, no matter how much you try to fix them, and never performed satisfactorily to begin with. A real architect delivers problem-free products).

That does not necessarily mean that the architect writes the actual code. It means that the architect reduces the high-level abstraction and complexity, and translates it to the level required for the development team to build the solution. What level that is exactly, depends on the capabilities of the team. Worst-case, the architect will actually have to write some code entirely by himself.

But if you think about it that way, as a developer you are using such libraries, frameworks and toolsets all the time. You don’t write your own OS, compiler, database, web browser etc. Other people have solved these problems for you. Problems that may be too difficult for the average developer to solve by themselves anyway. Each of these topics requires highly skilled developers. And even then, these developers may be skilled in only one or perhaps a few of these topics at best. That is, it’s unlikely for an OS expert to also be an expert at database, compiler or browser technology as well. People tend to specialize in a few topics, because there’s just not enough time to master everything.

So that is how I see the role of architects in practice: these are the experts who create the ‘building blocks’ for the topics your company specializes in.

Perhaps it is interesting to take some topics that I have discussed earlier on this blog. If you look at my retro programming, you will find that quite a bit of it has the characteristics of research and development. That is, a lot of it is ‘off the beaten path’, and is about things that are not done often, or perhaps have not been done at all before.

The goal is to study these things, and get them under control. Beat them into submission, so to say. Take my music player for example. It started as the following problem:

“Play back VGM files for an SN76489 sound chip on a PC”

You can start with the low-hanging fruit, by just taking simple VGM files with 50 Hz or 60 Hz updates, and focus on reasonably powerful PCs. Another developer had actually developed a VGM player like that, but ran into problems with VGM files taken from certain SEGA games, which used samples:

The problem with VGM files is that each sample is preceded by a delay value, so it doesn’t appear to be a fixed rate. And in fact, the sample could be optimized, so that the rate is indeed not fixed at all. That is, the SN76489 will play a given sample indefinitely. So if your sample data contains two or more samples of the same value, it can be optimized by just playing the first one, and adding the delays. Various VGM tools will optimize the data in this way.
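The optimization described above can be sketched as a tiny filter: treat the stream as (delay, sample-value) pairs, drop any write that repeats the previous value (a no-op on the SN76489), and fold its delay into the delay of the next real write. The input values below are made up purely for illustration:

```shell
# Input: one "delay sample" pair per line. A repeated write of the same value
# is dropped, and its delay is carried forward to the next real write.
optimize_vgm() {
  awk '
    NR == 1 { d = $1; s = $2; carry = 0; next }
    $2 == s { carry += $1; next }                 # redundant write: just wait
            { print d, s; d = carry + $1; carry = 0; s = $2 }
    END     { print d, s }
  '
}

# Example: two writes of value 80 (735 ticks apart), then value 81.
out=$(optimize_vgm <<'EOF'
735 80
735 80
735 81
EOF
)
echo "$out"    # "735 80" then "1470 81": the middle write was merged away
```

The merged stream produces the exact same output on the chip, but its delays are no longer at a fixed rate, which is what makes naive fixed-rate playback insufficient.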

The other guy had trouble getting the samples to play properly at all, even on a faster system. VGM allows for 44.1 kHz data at most, so the straightforward way would be to just fire a timer interrupt at 44.1 kHz, and perform some internal bookkeeping to handle the VGM data. If your machine is fast enough (a fast 386/486 at least), it can work, but it is very much brute force. Especially when compared to the Sega Master System the game came from, which only has a Z80 CPU at 4 MHz.

So I thought there was a nice challenge: if a Z80 at 4 MHz can play this, then so should an 8088 at 4.77 MHz. And the machine shouldn’t just be able to play the data by hogging the CPU. It should actually be able to play the data in the ‘background’, so using interrupts.

That is the problem I tried to solve, and as you see, I also drew some diagrams to try and explain the concept at a higher level. Once I had the concept worked out, I could just write the preprocessor and player, and the problem was solved.

I later re-used the same idea for a MIDI player. Since the basic problem of timing and interrupts was already solved, it was relatively easy to just write an alternative preprocessor that interpreted MIDI data instead of VGM data. Likewise, modifying the player for MIDI data instead of SN76489 was also trivial.

And these players also made use of two other problems that had been solved previously. The first is the auto-end-of-interrupt feature of the 8259 chip, which I discussed earlier. Because I solved that problem by creating easy-to-use library functions and macros, it was trivial to add them to any program, such as this player.

The second one is the disk streaming, which I also discussed earlier. Again, since the main logic of using a 64k ringbuffer and firing off 32k reads with DMA in the background was solved, it was relatively easy to add it to a VGM or MIDI player. Which in turn worked around the memory limitations (neither VGM nor MIDI data is very compact, and more complex files can easily go beyond 640k).

So, I think to sum up, here are a few characteristics that apply to a software architect:

  • Able to extract requirements and constraints from the stakeholders’ wishes/descriptions
  • Able to determine risks and limitations
  • Able to translate the problem into a working solution
  • Able to find the best/fastest/most optimal/elegant solution
  • Able to explain the problem and solution to the various stakeholders at the various levels of abstraction/knowledge required
  • Able to explain the solution to other developers so that they can build it
  • Able to actually ‘solve’ the problem by creating a library/framework/toolset/etc that is easy to (re)use for other developers


Here’s another thing I’d like to add to the previous article. I talked about how a software architect would work together with project managers, developers and other departments within the organization. I would like to stress how important it is that you can actually work together. I have been in situations where I may have had the title of architect on paper, and I was expected to be responsible for various things. But the company had a culture where managers decided everything, often without even informing me, let alone consulting me.

If you ever find yourself in such a situation, then RUN. The title of architect is completely meaningless if you do not actually have a say in anything. How can you be responsible for the development of software when you have no control over the terms under which that software is to be developed? And of course, managers never understand (or at least won’t admit) that it’s often their decisions that lead to projects not meeting their targets.

Take some of the things described above, for example, such as gathering requirements and constraints, and doing risk assessment. These things take time, and time has to be allocated in the life cycle of the project to perform these tasks. If a project manager can just decide that you do not get any time to prepare, and you just have to start developing right away, because they already set a deadline with the client, there’s not much you can do. Except RUN.

Posted in Software development

Who is a software architect? What is software architecture?

After the series of articles I did on software development a while ago, I figured that the term ‘Software Architect’ is worth some additional discussion. I have argued that Software Engineering may mean different things to different people, and the same goes for Software Architecture. There does not appear to be a single definition of what Software Architecture is, or what a software architect does and does not do, what kind of abilities, experience and responsibilities a software architect should have.

In my earlier article, I described it as follows:

Software Engineering seemed like a good idea at the time, and the analogy was further extended to Software Architecture around the 1990s, by first designing a high-level abstraction of the complex system, trying to reduce the complexity and risks, and improving quality by dealing with the most important design choices and requirements first.

I stand by that description, but it already shows that it’s quite difficult to determine where Software Engineering ends and Software Architecture starts, as I described architecture as an extension of engineering. So where exactly does engineering stop, and where does architecture start?

I think that in practice, it is impossible to tell. Also, I think it is completely irrelevant. I will try to explain why.

Two types of architects

If you take my above description literally, you might think that software architects and software engineers are different people, and their workload is mutually exclusive. That is not the way I meant it though. I would argue that the architecture is indeed an abstraction of the system, so it is not the actual code. Instead it is a set of documents, diagrams etc, describing the various aspects of the system, and the philosophies and choices behind it (and importantly, also an evaluation of possible alternatives and motivation why these were not chosen).

So I would say there is a clear separation between architecture and engineering: architecture stops where the implementation starts. In the ideal case at least. In practice you may need to re-think the architecture at a certain phase during implementation, or even afterwards (refactoring).

That however does not necessarily imply that there is a clear separation between personnel. That is, the people specifying the architecture are not necessarily mutually exclusive with the people implementing it.

The way I see it, there are two types of architects:

  1. The architect who designs the system upfront, documents it, and then passes on the designs to the team of engineers that build it.
  2. The architect who works as an active member of the team of engineers building the system.

Architect type 1

The first type of architect is a big risk in various ways, in my opinion. When you design a system upfront, can you really think of everything, and get it right the first time? I don’t think you can. Inevitably, you will find things during the implementation phase that may have to be done differently.

When the architect is not part of the actual team, it is more difficult to give feedback on the architecture. There is the risk of the architect appearing to be in an ‘ivory tower’.

Another thing is that when the architect only creates designs, and never actually writes code, how good can that architect actually be at writing code? It could well be that his or her knowledge of actual software engineering is quite superficial, and not based on a lot of hands-on experience. This might result in the architect reading about the latest-and-greatest technologies, but only having superficial understanding, and wanting to apply these technologies without fully understanding the implications. This is especially dangerous since usually new technologies are launched with a ‘marketing campaign’, mainly focusing on all the new possibilities, and not looking at any risks or drawbacks.

Therefore it is important for an architect to be critical of any technology, and to be able to take any information with a healthy helping of salt, cutting through all the overly positive and biased blurb, and understanding what this new technology really means in the real world, and more importantly, what it doesn’t.

The superficial knowledge in general might also lead to inferior designs, because the architect cannot think through problems down to the implementation detail. They may have superficial knowledge of architectural and design patterns, and they may be able to ‘connect the dots’ to create an abstraction of a complete system, but it may not necessarily be very good.

Architect type 2

The second type is the one I have always associated with the term ‘Software Architect’. I more or less see an architect as a level above ‘Senior Software Engineer’. So not just a very good, very experienced engineer, but one of those engineers with ‘guru’ status: the ones you always go to when you have difficult questions, and they always come up with the answers.

I don’t think just any experienced software engineer can be a good architect. You do need to be able to think at an abstract level, which is not something you can just develop by experience. I think it is more the other way around: if you have that capability of abstract thought, you will develop yourself as a software engineer to that ‘guru’ level automatically, because you see the abstractions, generalizations, connections, patterns and such, in the code you write, and the documentation you study.

I think it is important that the architect works with the team on implementing the system. This makes it easier for team members to approach him with questions or doubts about the design. It also makes it easier for the architect to see how the design is working in practice, and new insights might arise during development, to modify and improve the design.

Once an architect stops working hands-on with the team, he will get out of touch with the real world and eventually turn into architect type 1.

I suppose this type of architect brings us back to the earlier ’10x programmers’ as I mentioned in an earlier blog. I think good architects are reasonably rare, just like 10x programmers. I am not entirely sure to what extent 10x programmers are also architects, but I do think there’s somewhat of an overlap. I don’t think that overlap is 100% though. That is, as mentioned earlier, there is more to being a software architect than just being a good software engineer. There are also various other skills involved.

The role of an architect

If it is not even clear what an architect really is or isn’t, then it is often even less clear what his role is in an organization. I believe this is something that many organizations struggle with, especially ones that try to work according to the Scrum methodology. Allow me to give my view on how an architect can function best within an organization.

In my view, an architect should not work on a project-basis. He should have a more ‘holistic’ view. On the technical side, most organizations will have various projects that share similar technology. An architect can function as the linking pin between these projects, and make sure that knowledge is shared throughout the organization, and that code and designs can be re-used where possible.

At the management level, it is also beneficial to have a ‘linking pin’ from the technical side, someone who oversees the various projects from a technical level. Someone who knows what kind of technologies there are available in-house, and who has knowledge of or experience in certain fields.

Namely, one of the first things in a new project is (or should be) to assess the risks, cost and duration. The architect will be able to give a good idea of technology that is already available for re-use, as well as the people best suited for the project. Building the right team is essential for the success of a project. There is also a big correlation between the team and the risks, cost and duration. A more experienced team will need less time to complete the same tasks, and may be able to do it better, therefore with less risk, than a team of people who are inexperienced with the specific technology.

Since architects are scarce resources, it would not be very effective to have them work on one project full-time. Their unique skills and knowledge will not be required on a single project all the time, and meanwhile those same skills cannot be applied to other projects where they may be needed. So it is a waste to make an architect work as a ‘regular’ software engineer full-time.

With project managers it seems more common that they only work on a project for a part of their time, and that they run multiple projects at a time. They balance their time between projects on an as-needed basis. For example, at the start of a new project, there may be times where 100% of their time is spent on a single project, but once the project is underway, sometimes a weekly meeting can be enough to remain up-to-date. I think an architect should be employed in much the same way. Their role inside a project is very similar, the main difference being that a project manager focuses on the business side of things, where the architect focuses on the technical side of things.

This is where the lead developer comes in. The lead developer will normally be the most experienced developer in a team. It is the task of the lead developer to make sure the team implements the design as specified by the architect. So in a way the lead developer is the right-hand man (or woman) of the architect in the project, taking care of day-to-day business, in the technical realm.

The project manager will need to work both with the lead developer and the architect. As long as the project is on track, it will mainly be the lead developer informing the project manager of day-to-day progress. But whenever problems arise, the current project planning may no longer be valid or viable. In that case, the architect should act as an ‘in-house consultant’, where the project manager and lead developer ask the architect to analyse the problems, and to determine the next course of action. Was the planning too optimistic, and should milestones be moved further into the future, at a more realistic pace? Did the team get stuck on a programming problem that the architect or perhaps some other expert can assist them with? Does the design not work as intended in practice, and does the architect need to work with the team to modify the design and rework the existing code?

The holistic position of the architect also allows him to look at other development-related areas, such as coding standards, build tools etc. And the architect can keep up with the state of technology from research or competing companies, and help plan the strategy of a product line. Likewise, the architect can always be consulted during the sales process. Firstly to answer technical questions about existing products. Secondly, when a potential customer wants certain functionality that is not yet available, the architect can give an expert opinion on how feasible/costly it will be to implement that functionality, and what kind of resources/personnel it would take. In the more complex/risky cases, the architect might spend a few weeks on a feasibility study, possibly developing a proof-of-concept/prototype in the process.

Finally, I would like to reference a few resources that are somewhat related. Firstly, here is an interesting article discussing strategy:

As I argued in my earlier blogs on Agile/Scrum… these methods are often not used correctly in practice. One of the reasons is because people try to predict things too far into the future. This article describes a very nice approach to planning that is not focused on timelines/deadlines/milestones as much, but on current, near term and future time horizons, where the scope is different for each term. The further things are away, the less detail you have. Which is good, because you can’t predict in detail anyway.

The second one is the topic of Software Product Lines:

It is a great way to look at software development. This goes well beyond just ‘design for change’, and the idea and analogy to product lines such as in (car) manufacturing may make the way of thinking more concrete than just ‘design for change’.

Posted in Software development | 1 Comment

AMD Bulldozer: It’s time to settle

As you may remember, AMD’s Bulldozer has always been somewhat controversial, for various reasons. One thing in particular was that AMD claimed it was the ‘first native 8-core desktop processor’. This led to a class-action lawsuit, because consumers thought this was deceptive advertising.

I think they have a point there. Namely, if Bulldozer were just like any other 8-core CPU out there, why would AMD spend all this time talking about their CMT architecture, modules and such? Clearly these are not just regular 8-core CPUs.

AMD argued that the majority of consumers would have the same understanding of ‘core’ as AMD does in their CMT marketing. The judge basically said: “Well, we’d have to see about that”. This led AMD to settle: AMD probably figured that if anyone actually investigated and ran a survey among consumers, the majority would NOT turn out to share AMD’s understanding.

Which makes sense, because AMD is still a minor player in all this. Intel is the market leader, and they have always marketed their SMT/HyperThreading CPUs as having ‘logical cores’ vs ‘physical cores’. The first Pentiums with HT were marketed as having a single physical core and two logical cores. That is the standard that was set in the x86 world, and the one consumers would be familiar with; Intel has always stuck by it. The first Core i7s were marketed as having 4 physical cores and 8 logical cores (or alternatively 4 cores/8 threads). And AMD shot themselves in the foot here… With their CMT marketing, they are clearly implying that CMT should be seen as more or less the same thing as SMT/HyperThreading. In fact, AMD actually argued that the OS needs a CMT-aware scheduler. Apparently a regular scheduler for a regular 8-core CPU didn’t work as expected.

So, the bottom line is that Bulldozer does not perform as you would expect from a regular 8-core CPU. And there’s enough of AMD’s marketing material around that shows that AMD knows this is the case, and that they felt there is a need to explain this, and also find excuses why performance may not meet expectations.

But you already know my opinion on the matter. I’ve written a number of articles on AMD’s Bulldozer and CMT back in the day, and I’ve always argued that it’s like a “poor man’s HyperThreading”:

This shows that HyperThreading seems to be a better approach than Bulldozer’s modules. Instead of trying to cram 8 cores onto a die, and removing execution units, Intel concentrates on making only 4 fast cores. This gives them the big advantage in single-threaded performance, while still having a relatively small die. The HyperThreading logic is a relatively compact add-on, much smaller than 4 extra cores (although it is disabled on the 2500, the HT logic is already present, the chip is identical to the 2600). The performance gain from these 8 logical cores is good enough to still be ahead of Bulldozer in most multithreaded applications. So it’s the best of both worlds. It also means that Intel can actually put 6 cores into about the same space as AMD’s 8 cores.

So here the difference between CMT and SMT becomes quite clear: with a single thread running, that thread has more ALUs under SMT than under CMT. With two threads running, each thread has (effectively) fewer ALUs under SMT than under CMT.
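The arithmetic behind this can be sketched in a few lines. This is a simplified model, assuming 3 ALUs per Sandy Bridge-era SMT core (the 2500/2600 mentioned above) and 4 ALUs per Bulldozer module, statically split 2+2 between its two ‘cores’:

```python
# Simplified sketch: effective ALUs available per thread under SMT vs CMT.
# The resource counts (3 ALUs per SMT core, 4 per CMT module) are
# assumptions for illustration, not exact microarchitectural data.

def alus_per_thread(total_alus, active_threads, shared):
    """SMT shares all ALUs dynamically among active threads; CMT
    statically partitions the module, so a thread only ever sees
    its own fixed half, even when the other 'core' is idle."""
    if shared:
        return total_alus / active_threads
    return total_alus / 2  # CMT: fixed 2-way split regardless of load

smt_single = alus_per_thread(3, 1, shared=True)   # 3.0 ALUs for one thread
smt_dual   = alus_per_thread(3, 2, shared=True)   # 1.5 ALUs per thread
cmt_single = alus_per_thread(4, 1, shared=False)  # 2.0 ALUs, half module idle
cmt_dual   = alus_per_thread(4, 2, shared=False)  # 2.0 ALUs per thread
```

Which is exactly the pattern described above: SMT wins single-threaded (3 vs 2), while CMT only pulls ahead per-thread when both threads are loaded (2 vs 1.5) — at the cost of much more die area.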

And that’s why SMT works, and CMT doesn’t: AMD’s previous CPUs also had 3 ALUs per thread, but in order to reduce the size of the modules, AMD chose to use only 2 ALUs per thread. It is a case of cutting off one’s nose to spite one’s face: CMT struggles in single-threaded scenarios, compared to both the previous-generation Opterons and the Xeons.

At the same time, CMT is not actually saving a lot of die-space: there are 4 ALUs in a module in total. And yes, obviously, when a module has more resources for its two threads, and single-threaded performance is poor anyway, one would expect it to scale better than SMT does.

But what does CMT bring, effectively? Nothing. Their chips are much larger than the competition’s, or even their own previous generation. And since the Xeon is so much better at single-threaded performance, it can stay ahead in heavy multithreaded scenarios, despite the fact that SMT does not scale as well as CMT or SMP. But the real advantage that SMT brings is that it is a very efficient solution: it takes up very little die-space. Intel could do the same as AMD does, and put two dies in a single package. But that would result in a chip with 12 cores, running 24 threads, and it would absolutely devour AMD’s CMT in terms of performance.

Or perhaps an analogy can make it more clear. Both SMT and CMT partially share some resources between multiple ‘cores’ as they are reported to the OS. As I said, Intel calls them ‘logical’ cores, but you can also see them as ‘virtual cores’.

The analogy then is virtual machines: you can take a physical machine, and use virtualization hardware and software to run multiple virtual machines on that single physical machine. Now, if you were to pay for two physical servers, and you were actually given a single physical server, with two virtual machine instances, wouldn’t you feel deceived? Yes, to the software it *appears* like two physical servers. But performance-wise it’s not entirely the same. The two virtual machines will be fighting over the physical resources that are shared, which they would not if they were two physical machines. That’s pretty much the situation here.

All I can say is that it’s a shame that Bulldozer is still haunting AMD now that they are back on track with Zen, which is a proper CPU with a proper SMT implementation, and they no longer need marketing that makes it sound like their CPUs have more physical cores than they actually do.

Posted in Hardware news | Leave a comment

Steinberger guitars and basses: less is more

Time for a somewhat unusual blogpost. As you may know, aside from playing with software and hardware, I also dabble in music, mostly with (electric) guitars. I recently bought a new guitar, a Steinberger GT-PRO “Quilt Top” Deluxe in Wine Red:

There is quite a story behind this. I think there are various parallels with my usual blogs, such as this guitar’s design dating from the 1980s, and the amount of engineering and optimization that went into it.

For those of you who aren’t familiar with guitars, I have to say that the guitar world is generally quite conservative. Revolutionary guitar designs do not come along every year, or in fact not even every decade. I think roughly speaking, there are only a few major revolutions in the history of the guitar:

  1. The move from using ‘catgut’ strings (actually nylon on modern guitars) to using steel strings. The ‘original’ guitar, commonly known as the classical or Spanish guitar, used strings made from animal intestines, as did most other classical stringed instruments (violin, cello, double bass etc). At some point in the early 1900s, a new type of guitar was developed by C.F. Martin & Company, designed for steel strings instead. These are also known as ‘western guitars’. They have a very different shape of body and neck. The body is generally larger, which together with the steel strings allows for more volume. The neck is longer and thinner than that of a classical guitar.
  2. The development of electric amplification in 1931. Now that many guitars had steel strings, it was possible to develop a ‘pickup’ module that uses electromagnetic induction, which can be placed under the strings, to convert the vibration of the strings to an electric signal, which can be sent to an amplifier and speaker. These pickups are simple passive circuits, with just magnets and a coil of copper wire. They are still used today in pretty much the same form.
  3. The ‘solidbody’ electric guitar, developed by Les Paul in 1940. As guitars became louder because of amplification, feedback became an issue: the acoustic body and strings of the guitar would resonate uncontrollably from the vibrations generated by the amplifier and speaker, which generated a feedback loop. Les Paul solved this by using a solid piece of wood for the guitar body. The body had lost its function as an acoustic amplifier anyway, now that there was electric amplification. A solid piece of wood was much more resistant to feedback.
  4. The Stratocaster guitar, developed by Leo Fender in 1954. It had many interesting ideas, including a body shape that no longer resembled a traditional guitar, but was contoured for ergonomic purposes. It also introduced a new type of ‘tremolo’ system. I will get into more detail on this guitar later.

And I think that is pretty much it. At least, as far as complete guitars go. There have been small innovations on certain parts of the guitar, but generally these are considered optional extras or aftermarket upgrades, and do not significantly alter the guitar as we know it. In fact, the Fender Stratocaster remains one of the most popular guitars to this day, as does the Gibson Les Paul (originally from 1952, but the models with humbucking pickups and sunburst finish made between 1958 and 1960 became the archetypal model), and most players still use these guitars in more or less the same form as they were originally launched in the 1950s. Most other guitars are also just slight variations on these original guitars, and are mainly different in shape or choice of woods, but do not differ significantly in terms of engineering.

Enter the 1980s

In the late 1970s and early 1980s, there was a big revolution in rock/metal. Namely, we entered the era of the guitar hero, ushered in by Eddie van Halen. What made Eddie van Halen somewhat unique is that he built his own guitar from a combination of aftermarket parts and parts he ‘borrowed’ from other guitars. He used a Gibson humbucker pickup and put it into a Strat-style guitar. He was also a very early adopter of the new Floyd Rose double-locking tremolo:



His guitar became known as the Frankenstrat. It became the template of the ‘Superstrat’ guitar, and various guitar companies, including Ibanez, Jackson, Charvel and ESP, would jump into this market by offering Superstrat-style guitars straight from the factory.

This was also the time in which I grew up, and this type of virtuoso playing was what attracted me most. As a result, a Superstrat guitar seemed like the ultimate guitar for me, because it basically offered you the best engineering and the most features. You got the best tremolo systems, the most advanced pickup configurations, guitar necks that were designed for optimum playability (generally thin, wide necks with a relatively flat profile and very large, aka ‘jumbo’ frets), combined with the ergonomics of the Stratocaster body design. I have always been attracted to clever engineering and optimized designs, in any area of life.

Getting back to the Stratocaster

What made these Frankenstrats and Superstrats possible is the vision of Leo Fender, I would say. Above I said the Stratocaster was one of the big revolutions in terms of guitar design. I have to add some nuance to that statement. Namely, Leo Fender designed another guitar before that, in 1950. The first model was a single-pickup one, known as the ‘Esquire’. He then introduced a two-pickup model, which was initially known as the ‘Broadcaster’. However, because the company Gretsch already sold a drum kit under the name ‘Broadkaster’, Fender decided to change the name to ‘Telecaster’. This model is still sold today. It can be seen as the forerunner of the Stratocaster, but guitarists also liked the original, so it remains a popular guitar in its own right. Fender also designed the Fender Precision bass before the Stratocaster, where he introduced the contoured body shape.

The Esquire/Broadcaster/Telecaster was one of the first solidbody electric guitars on the market. What made Fender somewhat unique is that Leo Fender was not a guitarist, or even a musician at all. He started out with an electronics repair shop, and then focused on designing and building amplifiers and pickups for electric guitars and basses. So Leo Fender was an engineer, not a guitarist, not a musician, and certainly not a luthier.

When he decided to build guitars, he also approached it like an engineer: he wanted the guitars to be cheap and easy to mass-produce, durable, easy to fine-tune, and easy to repair. This led him to make certain design choices that conventional guitar builders might never have made. One example is that he chose to screw the neck to the body. Another example is in his choice of woods, which were very hard woods. They were very durable, relatively light, and easy to work on. But they delivered a tone that was not necessarily very conventional. It was a very bright and jangly tone. But certain artists, especially country artists, loved this new sound, because it would cut through very well with fast lead playing.

The Telecaster introduced the wood types, the bolt-on neck, and had three adjustable saddles for setting intonation and string height for each pair of strings. It had a cutaway on the body to access the high frets on the neck. And it had two pickups, with a volume knob, tone knob, and a pickup selector switch for easy access below the bridge.

The Precision bass introduced the ergonomic ‘contoured’ body shape. It added a second cutaway above the neck, for even better access to the high frets.

The Stratocaster took these ideas further. It perfected the adjustable bridge by having individual saddles for all 6 strings. It added a third pickup and a second tone knob. It introduced a contoured guitar body, inspired by the Precision bass. And it introduced the new ‘tremolo’ system.

Now, ‘tremolo’ needs some explanation: what Leo Fender called a ‘tremolo’ was actually a ‘vibrato’ system (and funnily enough, some of his amps had a tremolo effect, which he called ‘vibro’). Namely, tremolo is a fluctuation in volume. The ‘tremolo’ system on a Strat does not do that. It allows you to lower or raise the pitch, so you can perform vibrato effects, or portamento (pitch slides). But somehow the name stuck, so even today, most guitarists and guitar builders refer to Strat-like vibrato systems as ‘tremolo’.

Another typical feature of the Stratocaster construction is that the body is a ‘frontloader’. That is, the cavities for all the electronics are routed out from the front of the body. These cavities are then covered up by a large plastic scratchplate which covers most of the body.

Customization through mass production

I think it’s safe to say that the simple mass-production design of the Fender guitars is what gave rise to aftermarket parts. It is very easy to replace a bridge, a neck, the electronics, or even to perform some custom routing. The scratchplate will cover it up anyway, so nobody will notice if the routing is a bit sloppy.

That is how Eddie van Halen could build his own Frankenstrat. And how many other players did very similar things to their guitars. Especially Floyd Rose systems were installed by many guitarists on their Strats and similar guitars in the 1980s.

But, the Floyd Rose is still a part designed to be fitted to a standard Strat-style design with little modification. It still makes use of the same idea of a sustain block under the bridge, which is held under tension by three springs and an adjustable ‘claw’:

This system is rather difficult to adjust, and the large springs also have an additional problem: They are susceptible to vibrations and give a sort of ‘reverberation’ effect. When you hit the strings hard, the springs will start to vibrate along with the bridge and body. When you then mute the strings, you hear the springs ‘echo’. This in turn can be picked up again by the strings and pickups, so the ‘echo’ can also be heard in the sound coming from the amp.

Another thing is that you still have the regular tuners on the guitar, which you need to put the strings on the guitar and perform the initial tuning, before clamping down the strings with the locking nut. After that, they are mostly ‘vestigial’. The bridge only has fine-tuners. Over time, the strings may go out of tune to the point where they are beyond the limited reach of the fine-tuners, and you need to unlock the nut, use the regular tuners, then lock again and fine-tune.

A third issue with a Floyd Rose is that you can no longer adjust the height of each individual saddle. You can only lower or raise the bridge as a whole, but the relative saddle heights are basically fixed.

So, while the Floyd Rose is an improvement over the traditional Strat tremolo in terms of tuning stability, it is basically just that: an improvement on a design dating back to the 1950s, which has to work inside the limitations of the original Strat design to an extent, because it is meant to be an aftermarket part, which can be installed on existing guitars.

Enter Ned Steinberger

So, what you should take away from Leo Fender and his Stratocaster is that he basically ‘started fresh’, with no preconceived ideas of what materials to use, or what kind of design and construction method. The result was a guitar that had a very unique look, sound and identity, and also pushed playability and versatility to new heights.

From then on, most guitars have been more or less evolutionary in nature: taking existing guitar designs such as the Stratocaster as the basis, and changing/improving certain details. The Floyd Rose tremolo system is arguably the largest improvement in that sense.

But then came Ned Steinberger. He basically did the same as what Leo Fender did 30 years earlier: He designed basses and guitars without any preconceived ideas, just trying to find the best materials available, and trying to engineer new solutions to create the best instruments possible.

Ned Steinberger was also not a luthier by trade; he originally designed furniture. And as far as I know, like Leo Fender, Ned did not actually play guitar or bass himself. An interesting fact is that his father is Jack Steinberger, a Nobel-prize-winning physicist. So he grew up in a household of science.

Steinberger started by designing basses. He wanted to create a bass that was as light and compact as possible, while also being very durable. This led him to use modern composite materials such as carbon fiber and graphite, rather than wood. Another defining feature of the Steinberger instruments is the headless design. By using strings with ball-ends at both sides, rather than only at the bridge, there is no longer any need for conventional tuning pegs, and the big headstock that they are mounted on. The tuners can be moved to the bridge instead. Unlike the Floyd Rose, these are not fine-tuners, but full-range tuners, made possible by a very fine-threaded 40:1 gear ratio. Also, Steinberger did not sacrifice per-string adjustment of string height or intonation.


Another defining feature is the active electronics. Where conventional pickups for guitar and bass are passive coils and magnets, active pickups use an on-board amplifier, powered by a battery (usually a 9v block). Because of the amplifier, the signal from the pickup itself does not have to be that strong. This means that pickups can be designed with smaller magnets and different coils. The result is that there is less magnetic pull on the strings, allowing the strings to vibrate more freely and more naturally, resulting in a more natural tone with better sustain. The on-board active electronics also allow for additional filtering and tone shaping. The electronics for Steinberger were designed and manufactured by EMG.

The body is actually a hollow ‘bathtub’ created from carbon fiber, where the top is like a ‘lid’. This allows plenty of space to house all the electronics. And unlike wood, there is no problem with feedback.

Another detail is the use of a zero-fret, rather than a conventional nut. This means there is no special setup required for any specific string gauge or desired string height (‘action’) for the player. The ‘special case’ of the nut is eliminated.

You could say that everything about this bass guitar has a ‘less is more’ approach. The neck and body are reduced to little more than the absolute minimum for the player to hold and play the instrument.

And then, the Steinberger guitar

Once Steinberger introduced their bass, and it became quite popular with players, the next logical step was to create a sister-instrument in guitar form. It looks quite similar to the bass, and it is very similar in terms of design and construction, except for one important detail:

The bass did not have a tremolo system, because these are very uncommon on basses, and are not very practical anyway, given the big, heavy and long strings. However, a guitar had to have one, Ned Steinberger must have thought. He may have seen this as an interesting engineering challenge.

And boy did Steinberger live up to that challenge! Because of his basic design with the double-ball strings and the zero-fret, there was no need for any additional locking of the strings. This basic design was effectively already equivalent to a Floyd Rose, but with the added benefit of having full-range tuners, instead of just fine-tuners. And the basic design of the bridge could easily be made to rotate around a pivot point, like a Strat-style tremolo does, without sacrificing the adjustability of the saddles.

Also, Steinberger could simply add one big spring to the metal base of the tremolo, instead of the multiple springs and the claw found underneath a Strat-style guitar. This got rid of the ‘echo’ problem of the springs. Also, he added a simple adjuster knob at the back, so you could adjust the spring to make sure the tremolo was in tune in its center position. Much easier than the claw and screws on a Strat.

But Steinberger did not just stop there. A characteristic of any traditional tremolo/vibrato system is that all the strings receive the same amount of movement. This however does not affect all strings equally. The thicker, lower pitched strings will have a much bigger drop or rise in pitch than the thinner, higher pitched strings.

While this is good enough for small changes in pitch on chords (a bit of vibrato), or changes of pitch on single notes, anything more advanced will go out of tune. This has only very limited musical application.

So, Steinberger thought: Challenge accepted! And he came up with what he called the TransTrem. As far as I know, it is still the only system of its kind. The ‘Trans’ in the name is short for ‘transpose’. Steinberger managed to make each individual saddle on the bridge move at its own calibrated rate. This allowed him to make each string change pitch at the same rate, which means that all 6 strings remain in tune while operating the tremolo. From there, it was a relatively small step to make the tremolo ‘lock’ in a few ‘preset’ positions. This means you could transpose the entire tuning of the guitar up or down by a few steps (E is standard, you can go down to D, C and B, and up to F# and G), and remain in tune while doing so! So you can change tuning on the fly while playing a song.
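The preset positions above can be read as fixed semitone offsets applied equally to all six strings. A small sketch to make that concrete; the preset names come from the post, but mapping them to exact semitone offsets is my own reading of the note names, not an official Steinberger spec:

```python
# Sketch of the TransTrem's transposing presets as semitone offsets
# from E standard. The offsets are my interpretation of the preset
# note names (E standard, down to D/C/B, up to F#/G) -- assumptions,
# not documented TransTrem calibration data.

NOTE_NAMES = ['E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B',
              'C', 'C#', 'D', 'D#']  # chromatic scale starting at E

PRESETS = {'B': -5, 'C': -4, 'D': -2, 'E': 0, 'F#': 2, 'G': 3}

def transpose(open_string_semitone, preset):
    """Shift one open string by the preset's offset. The calibrated
    saddles apply the same *musical* interval to every string, which
    is why chords stay in tune across presets."""
    return (open_string_semitone + PRESETS[preset]) % 12

# The low E string (offset 0 from E) lands on the preset's namesake note:
for name in ('B', 'C', 'D', 'E', 'F#', 'G'):
    print(name, '->', NOTE_NAMES[transpose(0, name)])
```

The key point the sketch illustrates: a conventional tremolo moves all saddles by the same *distance*, which gives each string a different interval; the TransTrem’s per-saddle calibration gives each string the same interval instead.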

Eddie van Halen was an early adopter of the Steinberger guitar, and he composed a few songs that made use of this special TransTrem feature. The song ‘Summer Nights’ is a good example:

And as you can see, the guitar stays in tune during all of this, and sounds good in all tunings and during all trickery that Eddie van Halen pulls off.

Here is also a nice documentary on Steinberger, where Ned talks about how he came to the headless design, and his thoughts about ergonomics. It also gives some nice insights into the factory itself:

Fast-forward: Present day

I don’t suppose Steinberger ever became such a household name as Fender and its Stratocaster. So what happened? Steinberger’s guitars and basses were certainly ‘space age’ technology at the time. Back in the 1980s, people loved futuristic stuff, and initially Steinberger couldn’t make enough of them. By 1987, Ned sold the company to Gibson, one of the largest, and ironically enough, one of the more traditional guitar companies.

But by the 1990s, the futuristic look had gone out of style, and I suppose Steinberger went out of fashion along with it. Perhaps the instruments were too far ahead of their time? Ned Steinberger had moved on to a new instrument company, called NS Design. Somewhere halfway through the 1990s, Gibson stopped producing the original Steinberger guitars.

From then on, the Steinberger name would resurface every now and then, usually on cheaper Asian-made guitars. The guitar I bought is a ‘Spirit by Steinberger’, and is made in China. This Spirit-model has been in and out of production a few times. When they were launched initially, I saw one at a shop, and tried it out. I always liked the original Steinbergers, so I liked the Spirit, because it looked and felt like those original guitars. It just was a lot cheaper. But I never bought it.

Recently, Gibson has relaunched the Spirit guitars again, so I figured I’d not miss out this time, and I bought one right away. Where the original guitars were very much high-end guitars, these ones are clearly budget guitars, and it seems that their price and their compact form are the main selling points. It is marketed as a travel guitar, more or less. There also seems to be some renewed interest in headless guitars in general, where some high-end brands such as Strandberg and Kiesel make headless guitars, albeit with ‘traditional’ body shapes. Perhaps that has something to do with the Spirit re-entering production.

My Spirit GT-Pro Deluxe differs from the original in various ways:

  1. It does not have the TransTrem, but a cheaper R-Trem, which does not have the transposing capability, but is otherwise more or less equivalent to a Floyd Rose style tremolo.
  2. It is made of wood, rather than composite materials.
  3. It has passive electronics rather than active.

Not having the TransTrem is a bit of a shame, but not surprising, given that it was a very complex piece of hardware. The R-Trem works quite well in practice, and is much easier to tune and set up than a Floyd Rose is.

The fact that it is made of wood is also interesting. Namely, there is always a considerable discussion about choices of tone wood and construction in guitars. This guitar has considerably less wood than any conventional guitar does. Yet, it sounds remarkably conventional. Also, it does not seem to suffer all that much in terms of sustain. So perhaps this guitar is just making a mockery out of conventional wisdom in terms of guitar building?

The passive electronics are made by Steinberger themselves, which is a bit of a shame in the sense that I have no idea what they are, so I have no frame of reference whatsoever. Combined with the unconventional guitar body, choice of woods and construction, I basically cannot say anything about what makes the guitar sound the way it does.

All I can say is that I quite like how the overall package sounds. The pickups are not very loud, but I’m not sure how much of that is due to the fact that there’s such a small guitar body, and how much of that is due to the pickup design. The guitar seems to have a very ‘clean’ and ‘neutral’ sound to it. It is something I more or less associate with active EMG pickups, but again, I don’t know how much of this comes from the tiny body, and how much comes from deliberately designing the passive pickups to sound like the original Steinberger with EMGs. What I can say however is that the guitar seems to have a very nice top-end. Pinch harmonics sound really powerful and sustain well on the high B and E strings. The guitar body and neck are mostly made of maple, and that bright top-end is something I would associate with maple.

The quality of the pickups seems very good at any rate. They are extremely low-noise, and you do not suffer from microphonic feedback or other noises (and as I said, no dreaded ‘echo’ from the trem either).

It has a somewhat ‘dry’ palm-muted sound. It sounds okay on the low E string, but lacks a bit of ‘oomph’ compared to my usual guitars on the A and D strings. Again I’m not entirely sure how much of that is down to the overall lack of output of the pickups, or specifically because of the acoustic qualities of the guitar itself (perhaps lacking a bit of midrange).

So it could be that these are very generic pickups, and the guitar just sounds the way it does because of its wood and construction. It could also be that these pickups are very specifically tuned for such an unusual guitar as this one, and regular aftermarket pickups might not work at all in this particular guitar.

But overall, after I set up the guitar to my liking, I find it quite easy to play, and although it takes some getting used to its specific sonic characteristics, it can sound very nice. It has a very unique feel to it. It is extremely light and well-balanced, and you have excellent reach to even the highest frets. It’s so light that the entire guitar shakes when you perform vibrato with your fingers. Coming from the other end of the spectrum, where my first decent electric guitar was a Les Paul, that is very strange indeed. I suppose it is also an indication that I should try to optimize my playing to use the minimum amount of effort required for vibrato, and let the guitar do the work.

I hope to own a ‘real’ Steinberger one day, the original graphite/carbon fiber model with the TransTrem and the active electronics. Sadly they are rare, and since they are so complex, chances are someone has botched them over the years, so buying second-hand is quite a risk (I often find that even with ‘simple’ Floyd Rose guitars, people with no clue have damaged the tremolo system beyond repair). But I will have one, one day! For now I’d settle for just being able to play one someday, to know what a real Steinberger feels and sounds like.

For now, I’ll leave you with a quick recording I did of a Joe Satriani song:

