Well, the Bulldozer reviews have finally arrived. They shouldn’t surprise people who’ve been following this blog. John Fruehe was spreading lies about Bulldozer’s performance, and I called him out over a year ago. His claims, such as “a lot more than 17% IPC”, simply didn’t make sense given Bulldozer’s design.
So it is no surprise then that we read the following in Anand’s review:
AMD’s goal with Bulldozer was to have IPC remain constant compared to its predecessor, while increasing frequency, similar to Prescott. If IPC can remain constant, any frequency increases will translate into performance advantages. AMD attempted to do this through a wider front end, larger data structures within the chip and a wider execution path through each core. In many senses it succeeded, however single threaded performance still took a hit compared to Phenom II:
At the same clock speed, Phenom II is almost 7% faster per core than Bulldozer according to our Cinebench results. This takes into account all of the aforementioned IPC improvements. Despite AMD’s efforts, IPC went down.
So, single-threaded performance is a weakness of Bulldozer, as I already said over a year ago (everyone who defended AMD/John Fruehe, thank you for playing). There is no “secret sauce”.
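To put that 7% in perspective, here is a rough sketch of the arithmetic. It assumes single-threaded performance is simply IPC times clock frequency, which ignores turbo, cache and memory effects. The function name and the 3.3 GHz reference clock are my own illustration, not figures from the review.

```python
def break_even_clock(ref_clock_ghz, relative_ipc):
    """Clock a chip needs to match a reference chip.

    relative_ipc is the chip's IPC divided by the reference chip's IPC,
    under the simplifying assumption that performance = IPC * clock.
    """
    return ref_clock_ghz / relative_ipc


# Phenom II is roughly 7% faster per clock (Cinebench figure quoted above),
# so Bulldozer's IPC relative to Phenom II is about 1 / 1.07.
bulldozer_relative_ipc = 1 / 1.07

# Hypothetical Phenom II reference clock, chosen only for illustration.
needed_ghz = break_even_clock(3.3, bulldozer_relative_ipc)
print(f"Bulldozer needs about {needed_ghz:.2f} GHz to match Phenom II at 3.3 GHz")
```

In other words, under these assumptions Bulldozer has to spend its first 7% of extra clock speed just to get back to where Phenom II already was.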
What about multi-threaded performance then? Bulldozer fares a bit better there, but it is still not too impressive. It often struggles to keep up with the Phenom II X6. Ironically enough, it seems to be at its best when the new 256-bit AVX instructions are used:
AMD also sent along a couple of x264 binaries that were compiled with AVX and AMD XOP instruction flags. We ran both binaries through our x264 test, let’s first look at what enabling AVX does to performance:
Everyone gets faster here, but Intel continues to hold onto a significant performance lead in lightly threaded workloads.
The standings don’t change too much in the second pass, the frame rates are simply higher across the board. The FX-8150 is an x86 transcoding beast though, roughly equalling Intel’s Core i7 2600K. Although not depicted here, the performance using the AMD XOP codepath was virtually identical to the AVX results.
Note that these are binaries provided by AMD, so we don’t know how fair or unfair they are against Intel’s CPUs. However, these results are plausible. Intel and AMD both have 4 units for 256-bit AVX in their CPUs. AMD’s Bulldozer runs at a slightly higher clockspeed. So, given that both Intel’s and AMD’s AVX units have about the same performance per cycle, it makes sense that AMD comes out slightly on top in the AVX-heavy second pass. Intel still wins the first pass because of its much better single-threaded performance.
It is funny though that the 256-bit AVX benchmark is one of the best results that Bulldozer chalks up… After all, the real strength of Bulldozer was supposed to be that each 256-bit unit splits into two 128-bit units, for a total of 8 units, where Intel has only 4. But we don’t see Bulldozer outperforming Intel in any of the other tests, while I’m sure that 3dsmax, Cinebench and Photoshop make heavy use of 128-bit SIMD code. It is about equal to the i7 2600K in the regular x264 second pass test, which I assume is a non-AVX 128-bit test. So apparently the splitting up and sharing of the AVX units doesn’t really work that well for Bulldozer either. Its 8 units cannot outperform Intel’s 4 units. Then again, I already mentioned earlier that a single unit can still have multiple ports, so there is still pipelining and instruction-level parallelism going on in Intel’s SIMD units, as in the Phenom units for that matter. As a result, Phenom II, with 6 cores and 6 128-bit units, is never far behind Bulldozer’s 8 units (not much further than the difference in clockspeed would indicate).
Die size, price and power consumption
Now, let’s get to the issue of economics. Performance is only one part of the story. We’ve seen that the Bulldozer FX-8150 is roughly at the same performance level as the Core i5 2500 and the Phenom II X6 1100T.
The Phenom II X6 is the largest, at 346 mm² die area. Then again, it is the only 45 nm CPU. Bulldozer at 32 nm is 315 mm². But it has about twice the transistor count that a Phenom II X6 has. So a Phenom II at 32 nm would be a considerably smaller CPU than Bulldozer is.
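As a back-of-the-envelope check on that last claim, here is a small sketch that assumes an ideal shrink, where die area scales with the square of the feature size. Real shrinks fall short of this, because I/O pads and analog blocks do not scale as well as logic, so treat the result as a lower bound rather than a prediction. The function name is my own.

```python
def ideal_shrink_area(area_mm2, old_node_nm, new_node_nm):
    """Die area after an ideal shrink: area scales with (new / old) squared."""
    return area_mm2 * (new_node_nm / old_node_nm) ** 2


PHENOM_II_X6_45NM_MM2 = 346  # die size at 45 nm, from the text
BULLDOZER_32NM_MM2 = 315     # die size at 32 nm, from the text

phenom_at_32nm = ideal_shrink_area(PHENOM_II_X6_45NM_MM2, 45, 32)
print(f"Ideal Phenom II X6 at 32 nm: {phenom_at_32nm:.0f} mm2")
print(f"Fraction of Bulldozer's die: {phenom_at_32nm / BULLDOZER_32NM_MM2:.2f}")
```

Even if a real shrink ended up a good deal larger than this ideal figure of about 175 mm², there would be plenty of room left before reaching Bulldozer’s 315 mm².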
The 2500 is also much smaller than Bulldozer, measuring only 216 mm². And that even includes a GPU, which Phenom and Bulldozer do not have. So although all three CPUs have roughly the same performance, the 2500 is by far the cheapest to make. This suggests that HyperThreading is a better approach than Bulldozer’s modules. Instead of trying to cram 8 cores onto a die and removing execution units, Intel concentrates on making only 4 fast cores. This gives them the big advantage in single-threaded performance, while still having a relatively small die. The HyperThreading logic is a relatively compact add-on, much smaller than 4 extra cores (although it is disabled on the 2500, the HT logic is already present, since the chip is identical to the 2600). The performance gain from these 8 logical cores is good enough to still be ahead of Bulldozer in most multithreaded applications. So it’s the best of both worlds. It also means that Intel can actually put 6 cores into about the same space as AMD’s 8 cores, in which case Intel’s CPU will actually be better in multithreaded applications as well. We shall see that in a few months’ time, when Intel launches Sandy Bridge-E, the high-end line of Sandy Bridge.
Do we see the die size reflected in the price though? Not really, no. The FX-8150 is currently at $245, the 1100T is at $190, and the 2500 is $210. So the largest CPU is actually the cheapest. And the 2500 is cheaper than the FX-8150, but not by much. So AMD’s CPUs are not priced that unreasonably, given their performance. The problem is mainly that it comes at the cost of AMD’s profit margin.
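A quick way to see how little die size is reflected in the price is to divide one by the other. This is only a sketch using the prices and die areas quoted in this post. It ignores yield, the cost difference between 45 nm and 32 nm wafers, and the fact that the 2500’s die also contains a GPU, so it says nothing precise about actual margins.

```python
# (street price in USD, die area in mm2), both taken from the text above
chips = {
    "FX-8150": (245, 315),
    "Phenom II X6 1100T": (190, 346),
    "Core i5 2500": (210, 216),
}


def price_per_mm2(price_usd, area_mm2):
    """Retail dollars the vendor gets per square millimetre of silicon."""
    return price_usd / area_mm2


ratios = {name: price_per_mm2(price, area) for name, (price, area) in chips.items()}

# List the chips from lowest to highest revenue per unit of die area.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: ${ratio:.2f} per mm2")
```

By this crude measure Intel collects the most per square millimetre and the Phenom II X6 the least, which is the margin problem in a nutshell.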
Then there is another issue that often plagues large dies: power consumption. Bulldozer seems to do quite well when idle, better than Phenom, and almost as low as the Core i5/i7. However, when we get to load:
Under load however, Bulldozer consumes quite a bit of power easily outpacing the Phenom II X6:
I suppose Global’s 32nm process in combination with Bulldozer’s high frequency targets are to blame here.
AMD might have come up with a better CPU if they had shrunk Phenom II down to 32 nm. It would be considerably smaller than Bulldozer, and they could probably increase clockspeed a bit on 32 nm, and/or add a bit more cache, which could boost performance enough to put it past the FX-8150.
The conclusion is simple then: Bulldozer is not the competitive chip that AMD needs at this point. Given the lower price and lower power consumption, the Phenom II X6 1100T still seems to be a very good alternative to Bulldozer for most consumers. Because of Phenom’s stronger single-threaded performance, it is as good as or better than Bulldozer in many situations, including games. But the Core i5 2500 and i7 2600 are still the undisputed leaders in both performance and power consumption, and still very affordable as well. So Bulldozer really does not have a lot going for it.
I suppose the following phrase would fit here nicely:
“I told you so!”
I suppose this also means the end of John Fruehe’s career. His credibility is completely destroyed. Who will ever take him seriously again?