AMD Bulldozer: It’s time to settle

As you may remember, AMD’s Bulldozer has always been somewhat controversial, for various reasons. One of them was that AMD claimed it was the ‘first native 8-core desktop processor’. This led to a class-action lawsuit from consumers who considered this deceptive advertising.

I think they have a point there. Namely, if Bulldozer was just like any other 8-core CPU out there, why would AMD spend all this time talking about their CMT architecture, modules and such? Clearly these are not just regular 8-core CPUs.

AMD argued that the majority of consumers would have the same understanding of ‘core’ as AMD does in their CMT marketing. The judge basically said: “Well, we’d have to see about that”. This led AMD to want to settle, presumably because AMD figured that if someone actually investigated the matter and surveyed consumers, the majority would NOT share that understanding.

Which makes sense, because AMD is still a minor player in all this. Intel is the market leader, and they have always marketed their SMT/HyperThreading CPUs as having ‘logical cores’ vs ‘physical cores’. The first Pentiums with HT were marketed as having a single physical core and two logical cores. That set the standard in the x86 world, the one consumers would be familiar with, and Intel has always stuck by it. The first Core i7s were marketed as having 4 physical cores and 8 logical cores (or alternatively 4 cores/8 threads).

And AMD shot themselves in the foot here… With their marketing of CMT, they clearly imply that CMT should be seen as more or less the same thing as SMT/HyperThreading. In fact, AMD actually argued that the OS needs a CMT-aware scheduler. Apparently a regular scheduler for a regular 8-core CPU didn’t work as expected.

So, the bottom line is that Bulldozer does not perform as you would expect from a regular 8-core CPU. And there is enough of AMD’s marketing material around to show that AMD knows this is the case, that they felt the need to explain it, and that they also looked for excuses for why performance might not meet expectations.

But you already know my opinion on the matter. I’ve written a number of articles on AMD’s Bulldozer and CMT back in the day, and I’ve always argued that it’s like a “poor man’s HyperThreading”:

This shows that HyperThreading seems to be a better approach than Bulldozer’s modules. Instead of trying to cram 8 cores onto a die and removing execution units, Intel concentrates on making only 4 fast cores. This gives them a big advantage in single-threaded performance, while still keeping the die relatively small. The HyperThreading logic is a relatively compact add-on, much smaller than 4 extra cores (although HT is disabled on the 2500, the logic is still present, since the chip is identical to the 2600). The performance gain from these 8 logical cores is good enough to still be ahead of Bulldozer in most multithreaded applications. So it’s the best of both worlds. It also means that Intel can actually put 6 cores into about the same space as AMD’s 8 cores.

So here the difference between CMT and SMT becomes quite clear: with single-threading, each thread has more ALUs with SMT than with CMT. With multithreading, each thread has fewer ALUs (effectively) with SMT than with CMT.

And that’s why SMT works and CMT doesn’t: AMD’s previous CPUs had 3 ALUs per thread. But in order to reduce the size of the modules, AMD chose to use only 2 ALUs per thread this time. It is a case of cutting off one’s nose to spite one’s face: CMT struggles in single-threaded scenarios, compared to both the previous-generation Opterons and the Xeons.

At the same time, CMT is not actually saving a lot of die space: there are 4 ALUs in a module in total. Admittedly, when you have more resources for two threads inside a module, and the single-threaded performance is poor to begin with, one would expect CMT to scale better than SMT.
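To make the ALU arithmetic above concrete, here is a minimal back-of-the-envelope sketch in Python. It is a deliberately simplified model, not a performance simulator. The figures of 2 ALUs per Bulldozer integer core (4 per module) come from the text above; the figure of 3 integer ALUs per Intel core is an assumption (roughly Sandy Bridge), and the names `alus_per_thread`, `SMT_ALUS_PER_CORE` and `CMT_ALUS_PER_INT_CORE` are hypothetical, chosen for illustration.

```python
"""Back-of-the-envelope model: integer ALUs per thread, SMT core vs CMT module.

Assumptions (simplified, for illustration only):
- SMT core: 3 integer ALUs shared by up to 2 hardware threads
  (roughly Intel Sandy Bridge; an assumption, not from the article).
- CMT module: 2 integer cores with 2 ALUs each (4 per module, per the article);
  a thread can only use the ALUs of its own integer core.
Real performance also depends on AGUs, caches, the shared front-end, etc.
"""

SMT_ALUS_PER_CORE = 3       # assumed Intel core (hypothetical constant)
CMT_ALUS_PER_INT_CORE = 2   # Bulldozer integer core, per the article


def alus_per_thread(design: str, threads: int) -> float:
    """Effective ALUs per thread when 1 or 2 threads run on one SMT core or CMT module."""
    if threads not in (1, 2):
        raise ValueError("an SMT core or CMT module runs 1 or 2 threads")
    if design == "smt":
        # Both threads compete for the same ALUs; idealized as an even split.
        return SMT_ALUS_PER_CORE / threads
    if design == "cmt":
        # Each thread owns one integer core; a lone thread cannot
        # borrow the idle integer core's ALUs.
        return float(CMT_ALUS_PER_INT_CORE)
    raise ValueError(f"unknown design: {design!r}")


if __name__ == "__main__":
    for threads in (1, 2):
        smt = alus_per_thread("smt", threads)
        cmt = alus_per_thread("cmt", threads)
        print(f"{threads} thread(s): SMT {smt:.1f} ALUs/thread, CMT {cmt:.1f} ALUs/thread")
```

Running it prints 3.0 vs 2.0 ALUs per thread for a single thread, and 1.5 vs 2.0 for two threads: SMT wins when one thread runs, CMT when two do. The model treats ALU count as the only resource and assumes an even split between SMT threads, whereas in practice a single thread rarely keeps every ALU busy.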

But what does CMT bring, effectively? Nothing. AMD’s chips are much larger than the competition’s, or even than their own previous generation. And since the Xeon is so much better in single-threaded performance, it can stay ahead in heavily multithreaded scenarios, despite the fact that SMT does not scale as well as CMT or SMP. The real advantage of SMT is that it is a very efficient solution: it takes up very little die space. Intel could do the same as AMD and put two dies in a single package. But that would result in a chip with 12 cores running 24 threads, and it would absolutely devour AMD’s CMT in terms of performance.

Or perhaps an analogy can make it clearer. Both SMT and CMT partially share some resources between multiple ‘cores’ as they are reported to the OS. As I said, Intel calls them ‘logical’ cores, but you can also see them as ‘virtual cores’.

The analogy then is virtual machines: you can take a physical machine and use virtualization hardware and software to run multiple virtual machines on that single physical machine. Now, if you paid for two physical servers, but were actually given a single physical server running two virtual machine instances, wouldn’t you feel deceived? Yes, to the software it *appears* to be two physical servers. But performance-wise it’s not the same. The two virtual machines will be fighting over the shared physical resources, which they would not do if they were two physical machines. That’s pretty much the situation here.

All I can say is that it’s a shame that Bulldozer is still haunting AMD now that they are back on track with Zen, which is a proper CPU with a proper SMT implementation, and they no longer need to market their CPUs in a way that makes it sound like they have more physical cores than they actually do.

This entry was posted in Hardware news.