AMD Bulldozer: Nothing to see here, moving on

Posted on October 12, 2011 by Scali

Well, Bulldozer reviews have finally arrived. They shouldn’t surprise people who’ve been following this blog. John Fruehe was spreading lies about Bulldozer’s performance, I called him out over a year ago. The claims he was making about things like “a lot more than 17% IPC” simply didn’t make sense, given the Bulldozer’s design.

So it is no surprise then that we read the following in Anand’s review:

AMD’s goal with Bulldozer was to have IPC remain constant compared to its predecessor, while increasing frequency, similar to Prescott. If IPC can remain constant, any frequency increases will translate into performance advantages. AMD attempted to do this through a wider front end, larger data structures within the chip and a wider execution path through each core. In many senses it succeeded, however single threaded performance still took a hit compared to Phenom II:

At the same clock speed, Phenom II is almost 7% faster per core than Bulldozer according to our Cinebench results. This takes into account all of the aforementioned IPC improvements. Despite AMD’s efforts, IPC went down.

So, single-threaded performance is a weakness of Bulldozer, as I already said over a year ago (everyone who defended AMD/John Fruehe, thank you for playing). There is no “secret sauce”.

What about multi-threaded performance then? Well, Bulldozer fares a bit better in that, but still it is not too impressive. It often struggles to keep up with the Phenom II x6. Ironically enough it seems to be at its best when the new 256-bit AVX instructions are used:

AMD also sent along a couple of x264 binaries that were compiled with AVX and AMD XOP instruction flags. We ran both binaries through our x264 test, let’s first look at what enabling AVX does to performance:

Everyone gets faster here, but Intel continues to hold onto a significant performance lead in lightly threaded workloads.

The standings don’t change too much in the second pass, the frame rates are simply higher across the board. The FX-8150 is an x86 transcoding beast though, roughly equalling Intel’s Core i7 2600K. Although not depicted here, the performance using the AMD XOP codepath was virtually identical to the AVX results.

Note that these are binaries provided by AMD, so we don’t know how fair or unfair they are against Intel’s CPUs. However, these results are plausible. Intel and AMD both have 4 units for 256-bit AVX in their CPUs. AMD’s Bulldozer runs at a slightly higher clockspeed. So, given that both Intel’s and AMD’s AVX units have about the same performance per cycle, it makes sense that AMD comes out slightly on top in the AVX-heavy second pass. Intel still wins the first pass because of its much better single-threaded performance.

It is funny though that the 256-bit AVX benchmark is one of the best results that Bulldozer chalks up… After all, the real strength of Bulldozer was supposed to be that each 256-bit units splits up in two 128-bit units, for a total of 8 units, where Intel has only 4. But we don’t see Bulldozer outperforming Intel in any of the other tests, while I’m sure that 3dsmax, Cinebench and PhotoShop make heavy use of 128-bit SIMD code. It is about equal to the I7 2600k in the regular x264 second pass test, which I assume is a non-AVX 128-bit test. So apparently the splitting up and sharing of the AVX units doesn’t really work that well for Bulldozer either. Its 8 units cannot outperform Intel’s 4 units. Then again, I already mentioned earlier that a single unit can still have multiple ports, so there is still pipelining and instruction-level parallelism going on in Intel’s SIMD units, as in the Phenom units for that matter. As a result, Phenom II with 6 cores and 6 128-bit units, is never far behind Bulldozer’s 8 units (not much further than the difference in clockspeed would indicate).

Die size, price and power consumption

Now, let’s get to the issue of economics. Performance is only one part of the story. We’ve seen that the Bulldozer FX-8150 is roughly at the same performance level as the Core I5 2500 and the Phenom II X6 1100T.

The Phenom II X6 is the largest, at 346 mm² die area. Then again, it is the only 45 nm CPU. Bulldozer at 32 nm is 315 mm². But it has about twice the transistor count that a Phenom II X6 has. So a Phenom II at 32 nm would be a considerably smaller CPU than Bulldozer is.

The 2500 is also much smaller than Bulldozer, measuring only 216 mm². And that even includes a GPU, which Phenom and Bulldozer do not have. So although all three CPUs have roughly the same performance, the 2500 is by far the cheapest to make. This shows that HyperThreading seems to be a better approach than Bulldozer’s modules. Instead of trying to cram 8 cores onto a die, and removing execution units, Intel concentrates on making only 4 fast cores. This gives them the big advantage in single-threaded performance, while still having a relatively small die. The HyperThreading logic is a relatively compact add-on, much smaller than 4 extra cores (although it is disabled on the 2500, the HT logic is already present, the chip is identical to the 2600). The performance gain from these 8 logical cores is good enough to still be ahead of Bulldozer in most multithreaded applications. So it’s the best of both worlds. It also means that Intel can actually put 6 cores into about the same space as AMD’s 8 cores. In which case Intel’s CPU will actually be better in multithreaded applications as well. We shall see that in a few months time, when Intel launches Sandy Bridge-E, the high-end line of Sandy Bridge.

Do we see the die size reflected in the price though? Not really, no. The FX-8150 is currently at $245, the 1100T is at $190, and the 2500 is $210. So the largest CPU is actually the cheapest. And the 2500 is cheaper than the FX-8150, but not by much. So AMD’s CPUs are not priced that unreasonably, given their performance. The problem is mainly that it comes at the cost of AMD’s profit margin.

Then another issue that often plagues large dies: power consumption. Bulldozer seems to do quite well when idle, better than Phenom, and almost as low as Core I5/I7. However, when we get to load:

Under load however, Bulldozer consumes quite a bit of power easily outpacing the Phenom II X6:

I suppose Global’s 32nm process in combination with Bulldozer’s high frequency targets are to blame here.

AMD might have come up with a better CPU if they shrunk Phenom II down to 32 nm. It would be considerably smaller than Bulldozer, and they could probably increase clockspeed a bit on 32 nm, and/or add a bit more cache, which could boost performance enough to put it past the FX-8150’s.

Conclusion

The conclusion is simple then: Bulldozer is not the competitive chip that AMD needs at this point. Given the lower price and lower power consumption, the Phenom II X6 1100T still seems to be a very good alternative to Bulldozer for most consumers. Because of Phenom’s stronger single-threaded performance, it is as good or better than Bulldozer in many situations, including games. But the Core I5 2500 and I7 2600 are still the undisputed leaders in both performance and power consumption, and still very affordable as well. So Bulldozer really does not have a lot going for it.

I suppose the following phrase would fit here nicely:

“I told you so!”

I suppose this also means the end of John Fruehe’s career. His credibility is completely destroyed. Who will ever take him seriously again?

This entry was posted in Hardware news and tagged AMD, benchmark, Bulldozer, Fruehe, FX, games, JFAMD, John, mistake, performance, review, Zambezi. Bookmark the permalink.

30 Responses to AMD Bulldozer: Nothing to see here, moving on

k1net1cs says:

October 12, 2011 at 12:07 pm

“Who will ever take him seriously again?”
Mostly the same AMD fanboys who defended him back then, who’d in turn blame ‘outdated BIOS’ being used on the test benches.

I was kinda hoping 8150 would put up a better fight than this, so that I can get an i5-2500K at a cheaper price. =/
People would probably sticking to 8120 or 4100 for the asking price; 8120 is essentially 8150 with lesser OC ratio.

Reply
- k1net1cs says:
  
  October 12, 2011 at 12:08 pm
  
  *OC potential
  
  8150 is practically cherry-picked 8120.
  
  Reply
- Scali says:
  
  October 13, 2011 at 7:45 pm
  
  I would hope that AMD fanboys are not *that* stupid.
  The systems used by the major review sites (such as Anandtech.com that I used as an example here), were all provided by AMD, with a high-end Asus motherboard (see the article: “For this review, AMD supplied us with ASUS’ Crosshair V Formula AM3+ motherboard based on AMD’s 990FX chipset.”). Why would AMD provide reviewers with boards with outdated BIOS?
  
  I hope AMD fanboys are smart enough to stand up to AMD for a change, and make it very VERY clear that AMD’s customers are not taking their crap any longer.
  The lies have to stop somewhere. It’s not the first time they’ve done this. Remember Barcelona? The famous ‘40%’ video from Randy Allen… but also the fake benchmarks they spread: http://www.zdnet.com/blog/ou/amd-posts-blatantly-deceptive-benchmarks-on-barcelona/567
  
  Reply
  - k1net1cs says:
    
    October 13, 2011 at 8:08 pm
    
    Yeah, I still remember that Barcelona fiasco.
    Especially when they smack-talked Intel about their underhanded tactics not too long before.
    
    Anyway, some of them already squawks about how using a different motherboard would have better results in Guru3D forum.
    Not that this is the general Guru3D stance about BD, just a small number of butt-hurt AMD users; most of the AMD loyalists there have been bashing BD left & right.
    What they (still) don’t get is that those review sites who generated BD’s mediocre results (and PII’s BD-mocking performance) were using a motherboard that’s considered one of the best pedigree in AMD boards, the Asus Crosshair; can’t get anything better than that these days.
    
    Here’s a thread for reading fun (or make fun of).
    http://forums.guru3d.com/showthread.php?t=352281
    I also tried to explain briefly why BD’s performing not much better than PII but, I might’ve explained it wrong.
bonzaibunnynzai says:

October 12, 2011 at 7:29 pm

I would totally get the cheapest 8-core and OC that bastard, thats for sure. But damn, I really did want AMD to score something here. I hear Piledriver is the next “big” thing with it being a 10 core? or some crap. But when I read that it will be 10% stronger than Bulldozer…..well, erm. Maybe improve the threaded performance and single thread. Hurf Durf.

Intel is ahead and damnit they will keep the prices high. I really was hoping for some competition in pricing heh. Son of a bitch! Those prices will drop though. I just hope Intel doesn’t raise much. Yeah, I don’t see any AMD fanboys coming over here and bowing their head. Right now the rage is great! And wow, my i7-920 OCed @ 4.2 still holds strong.

Reply
Bonzai says:

October 12, 2011 at 7:29 pm

I must be drunk. Boy did I screw up my name.

Reply
- k1net1cs says:
  
  October 13, 2011 at 8:21 pm
  
  Are you sure it’s not a Freudian slip? =b
  
  Reply
Alessandro Vullo says:

October 12, 2011 at 9:54 pm

I recently discovered your blog and I really appreciate the content …
on the web there is a lot of trash and misleading marketing … It’s always nice to find somewhere to learn something new…

greetings from italy,
Ale.

Reply
Somerandomguy says:

October 13, 2011 at 9:02 am

Zambezi is definitely a fail. But people who think they saw that coming were really only right by chance alone. Zambezi was not at all likely to be that bad and it’s likely that we’ll see the “Bulldozer that should have been” with Piledriver.
There’s just nothing revolutionary that AMD has to do with Piledriver to fix this mess. Increase IPC a bit, increase clock speed a bit, improve turbo a bit and the only major problem that’s left is the Windows 7 scheduler. That may or may not be fixable before the release of Windows 8.
But then the uarch itself is actually okay. Certainly not impressive, still a bit behind Sandy, but Ivy isn’t likely to increase CPU performance much, because Intel has quite some catching up to do with their GPU and certainly won’t want Ivy to compete with the 4 core SB-E parts.
An FX with 5 Piledriver modules and a TDP of 140W could well deserve the “FX” though and actually compete with the 4 core SB-E parts in at least some applications.

Reply
- Scali says:
  
  October 13, 2011 at 9:11 am
  
  Give me a bit more credit than “chance alone”.
  I argued my case quite well, directly in the face of AMD’s own John Fruehe. I was right in every way. I’m not that lucky.
  
  And no, I don’t see how Piledriver will just fix this. Increasing IPC and clock speed are not trivial matters. Especially not since AMD is already at relatively high clock speeds.
  And increasing IPC? AMD has been struggling to do that for many years. With an all-new architecture they even go DOWN (for obvious reasons, as I’ve explained well over a year ago already). So in a few months time they can suddenly bring out a CPU that makes a significant jump in IPC, just like that? Gee, why didn’t they do that with Phenom then? Or Phenom II? Or Bulldozer itself?
  No, the microarchitecture has serious problems. Much like Pentium 4, you’re not going to fix that with a few updates. The module idea just doesn’t work, as I’ve said all along. HT gives a much better balance between single-threaded and multi-threaded performance, and single-threaded performance is too important to ignore. Aside from that, HT is just much more efficient in terms of performance-per-transistor-count. As I’ve stated above, Bulldozer is an incredibly large chip, given its performance (which you are definitely not going to fix with a few updates). In fact, the comparison with Sandy Bridge is just unfair. SB is so much smaller. The upcoming 6-core/12-thread variations of SB-E will be a more direct comparison of architectures, since they will at least have similar size and TDP to Bulldozer. And I think we all have a pretty good idea what that is going to look like: massacre.
  
  Reply
  - Somerandomguy says:
    
    October 13, 2011 at 3:21 pm
    
    Well, some of your concerns may have been legitimate. I myself didn’t expect Zambezi to “bulldozer” Sandy Bridge like some AMD fanbois expected it to, because it was obvious that per core performance would’t even be on a par with that of the Sandy Bridge.
    I just expected it to get a lot closer to it than Phenom II is. And that is still possible and IMO probable with Piledriver (again, Ivy Bridge isn’t likely to increase CPU performance much).
    
    As for increasing IPC and clock speed not being trivial matter. Well, guess what, _nothing_ in this business is a trivial matter! You need a lot of brain power to invent and improve anything at all in this business, but that’s not news, is it to you?
    If AMD didn’t have the brain power to improve Bulldozer, Bulldozer wouldn’t even exist as an actual chip to begin with. Bobcat wouldn’t exist. K10 wouldn’t exist. K8 wouldn’t exist. AMD would have gone out of business a long time ago or perhaps they would have become some reseller or whatever doesn’t take much brain power.
    And if Globalfoundries didn’t have the brain power to improve their 32nm process, they wouldn’t even have a 32nm SOI process. They wouldn’t have a 28nm process. They wouldn’t have a 45nm process. You get the idea, right?
  - Scali says:
    
    October 13, 2011 at 3:38 pm
    
    I would say that ALL of my concerns have been legitimate (go ahead, I’ve said many things on my blogs, in the comments, and on various other websites. Find me anything that is not legitimate). And in fact, I was even too optimistic about Bulldozer. For example, although I spoke out my concerns about IPC, given the limited number of ALUs and AGUs, I never actually thought that the IPC would drop below Phenom. I named various reasons that would decrease IPC in Bulldozer, but at the same time acknowledged that there would be room for improvement, so they could probably compensate that. I merely said that Fruehe’s number of ‘a lot more than 17%’ was unrealistically positive.
    
    Other than that, you’re not making a lot of sense… You can name K8, K10 and all that… But have you looked at the development time? These are all ‘new’ microarchitectures, that were in development for multiple years. Bulldozer is another new microarchitecture, and now you’re expecting them to roll out a few years of work on a new microarchitecture in just a few months? That’s not going to happen.
    
    Besides, the move from K8 to K10 was hardly spectacular to begin with. The performance difference between two dualcore K8s on a dual-CPU board and a ‘native quadcore’ K10 was negligible. Micro-architecture-wise, it was hardly spectacular. Not a lot of gain in the IPC department. The main advancements were because of smaller process nodes, so they could fit more cores onto a die, and increase clockspeed. However, Bulldozer is already on GF’s newest process, so it will be a few years before they can do a shrink again.
    
    Also, last time I looked, they DON’T have a 28 nm process. Are you perhaps confused because AMD’s chipsets and GPUs are made by TSMC, not by GF (who also don’t have their 28 nm operational yet, but should have it in a few months time).
    
    Anyway, instead of this baseless faith in AMD, why don’t you try to give any technical examples of how AMD can improve Bulldozer? Clockspeed and die shrink seem out of the question, since 32nm is as good as it gets for now… Does it have a problem like the Phenom’s TLB bug perhaps? Or something else that they can easily solve with a respin that would massively improve performance? I don’t know of anything, and what I see is just that Bulldozer’s module approach simply doesn’t work. The cores are too anemic, as I’ve been saying all along. They’d need to add more execution units, but that’d require a big redesign of the architecture, and they don’t have the space on the die to do that in the first place. So, how?
Somerandomguy says:

October 13, 2011 at 5:44 pm

Yeah, I expected BD to increase per core performance mostly via higher clock speed, and guess what: There have been plenty of rumors that GFs 32nm process hasn’t gone so well so far. Improve that and by next year, AMD should be able to add a few hundred MHz to base clock and perhaps a hundred MHz to maximum turbo.
Even on a process that works the way it’s supposed to at production start, you can still get about a hundred MHz more out of it every few months (Phenom II 955 to Phenom II 965 and so on, for example). Always lots of knobs to turn, AFAIK.

You bet that Piledriver has already been in development for months. Ideas about what to change for PD have certainly been around for longer than that, but of course it takes time to turn them into a reality so I guess there was no way these could have made it into BD.
This is the first x86 uarch with CMT. You bet that lots of things are simply suboptimal. That would mean even more knobs to turn in addition to the process.
(And then there’s the Windows 7 scheduler that may or may not be fixed until Windows 8 is launched. Not everyone is going to install an early version of that, of course.)

Also, there is one major difference between the change from K8 to K10 and BD to PD: K8 wasn’t a fail. AMD screwed up K10 by not improving K8 enough, but now they (and possibly GF) screwed up by not getting BD (and its process) right. I think they underestimated Intel after Pentium 4, now they overestimated how BD (not the uarch itself but its current design) would actually turn out.
So this time around, if AMD’s management isn’t outright stupid, which I don’t think they are (just not as efficient as they should be), they’ve better had their engineers working on the problem for months now, or to hell with AMD. 😉
So I don’t think I have to point out exactly where BD may be flawed but fixable with PD. There’s certainly plenty of knobs to turn that AMD’s engineers know about, and AMD’s management should be able to realize that increased IPC is an important part of the solution and adding more cores is only possible for the very top part (that would still suck without an increase in IPC).
So basically it is obviously IPC or death which wasn’t the case with the change from K8 to K10 since K8 wasn’t a fail.

According to BRIGHT SIDE OF NEWS, GF does have a 28nm-SLP and 28nm-HP process available (http://www.brightsideofnews.com/news/2011/9/6/amd-not-leaving-soi-for-28nm-10-core-macau-and-20-core-dublin-cpus.aspx), prototypes are said to have been around since early 2011 (http://www.brightsideofnews.com/news/2011/4/27/amd-fusion-tapeouts-unveiled-28nm-in-2q-20112c-20nm-in-2012-and-14nm-in-2014.aspx).
So even if production only starts in a few months, that’s just to get a fab ready to actually produce, don’t you think?
So they basically do have a 28nm process. Not a 28nm-SHP process yet though, of course.

Reply
- Scali says:
  
  October 13, 2011 at 6:26 pm
  
  Okay, so AMD is about as close to 28 nm as Intel is to 22 nm… Why do people always forget about moving targets?
  
  For the rest… uhhh, why do people talk about CMT as if it’s new and sooo difficult? CMT is just a stupid term for a watered-down implementation of SMT, which has been around for ages. SMT shares all resources of a core over multiple threads. CMT shares a few resources of a core over multiple threads, where the rest remains dedicated. How is CMT more difficult than SMT?
  The real problem is that CMT will always be suboptimal compared to SMT. Not something you can fix with a few small updates.
  
  And you have to be specific about what a ‘failure’ really is. Again, I asked you to get technical, but you don’t. You don’t produce any arguments.
  Let me give you a technical argumentation of what a ‘failure’ could be in terms of a CPU.
  Roughly we can disginguish the following classes of problems:
  1) The manufacturing process is suboptimal.
  2) The circuit is bugged.
  3) The microarchitecture is suboptimal.
  
  For 1), we could take Fermi as example. There were problems manufacturing such large CPUs, so Fermi was clocked lower than expected, and some units had to be disabled. In such a case, tweaks to the circuit design and general maturing of the process can improve the situation. This resulted in the far more successful GTX460 and later GTX500 chips, which were still the same basic architecture.
  
  With 2) we could take Phenom as example. The chip itself was okay, but because of a bug, the TLB cache had to be disabled, causing a huge performance degradation. This could be fixed by patching the bugged circuit in a later revision. Again, still the same basic architecture.
  
  The obvious example for 3) is Pentium 4. Although initially Pentium 4 was also plagued with manufacturing problems (excessive leakage at high clockspeeds), in the end it was the microarchitecture itself that let it down. Even when manufactured on a good process (such as the 65 nm that also produced the first Core2 series), and clocked at high speeds, the chips just didn’t perform well. From a technical point-of-view it was not a failure… Manufacturing didn’t fail, and there were no bugs in the chip design that held back the true performance. It’s just that even though Intel got the true performance out of the microarchitecture, that performance was not good enough for most popular applications. The Athlon64 and Core2 designs just got a lot more performance out of a lot less die space.
  
  A chip could suffer from any combination of these three. Now, let’s look at Bulldozer. Does it suffer from 1)? I don’t think that is really the case. AMD even set the world record for clock speed with a Bulldozer chip, and most reviewers reported overclocking results in the 4.5-5 GHz region, so it would seem that there aren’t any major manufacturing issues. Clock speed scaling is okay. And considering the huge size of 315 mm², its power consumption is not that outrageous either (it just looks outrageous because its main competitors in performance are much smaller chips). They may be able to squeeze a few hundred extra MHz out of it as the 32 nm process matures, but I don’t expect huge leaps.
  
  Moving on to 2) then. Any bugs? Well, none that we have heard of so far. It would appear that all circuits in the chip are working and enabled, so there doesn’t seem to be an easy fix to the performance problems. The problem seems more that AMD sliced the cores too thin with only two ALUs and AGUs per core, and the shared FPU.
  
  What about 3) then? Yes, I think we have a winner here. Not an easy fix. Just like Pentium 4 was never fixed, even though Intel kept improving it for 5 years in total, before finally retiring it in favour of Core2. So if Intel can’t fix Pentium 4’s microarchitectural problems in 5 years… how is AMD going to do this in just a few months?
  
  Reply
  - Somerandomguy says:
    
    October 13, 2011 at 8:01 pm
    
    No, AMD is probably not about as close to 28nm as Intel is to 22nm (not sure about 28nm Bobcat though, which could actually be just a couple of months away, but that’s a different story).
    For that GF would probably have to have a 28nm-SHP (NOT 28nm-HP, NOT 28nm-SLP) process ready for production in a few months, but they don’t seem to. So no AMD 28nm silicon until 2013 other than Bobcat.
    
    CMT really is a watered-down implementation of SMT? So?
    To me this sounds just like the thing to try out as an underdog, which AMD is, of course.
    Sure, it’s probably never going to be on a par with Intel’s uarch. So?
    If it’s not too far behind overall and does shine here and there you can sell it for a bit less.
    If it is too far behind so that you have to sell it for a lot less (BD ist still way too expensive for how it performs), then something’s probably wrong and you have to fix it. I think this is exactly the case with BD.
    
    So in a way, I think you’re actually right with 3). It’s just that I think it would actually make sense for AMD to try out a suboptimal architecture if it is easier to implement for them, but that doesn’t completely shield you from still screwing it up more or less, does it?
    So I think we may have more or less of a case of 2) here as well and I guess we’ll see plenty of rumors and educated guesses about it in the next few months by people way way more in the know than me like charlie, dresdenboy and Hans De Vries. So why should I try and go into detail here? Not going to, sorry.
    
    I think 1) is quite possibly true. I mean AMD basically admitted that they couldn’t ship as many Llanos as they intended to, because of problems at GF.
    But there will be improvements in manufacturing process anyways.
    
    And then there are of couse the normal, unspectacular improvements as well, if you just started of with a new uarch, because there’s always something to improve, whether it’s an “optimal” uarch or a suboptimal one. (Which makes me wonder what Netburst may have turned into, if Intel didn’t have another uarch in the pipe.)
    
    Now combine all of that and I just. Do. Not. See. Why it would be anything but likely that Piledriver will look significantly better than Bulldozer.
    But I guess we have to agree to disagree.
  - Scali says:
    
    October 13, 2011 at 8:37 pm
    
    No, let’s not agree to disagree… We were in a discussion, you presented some points, I argued those, and now you completely change your stance.
    
    Firstly: you brought up the 28 nm as a way to improve Bulldozer. Now you admit that this is still about 2 years away. How can it save Bulldozer then? Intel is tick-tocking, remember? 2 years means another die-shrink and another micro-architecture. AMD can’t compete today, what will things look like 2 years from now?
    
    Secondly: You brought up CMT, claiming it was all-new, and very suboptimal etc. I say it isn’t, because it’s a simplified form of SMT, which various other CPUs have been using for years. And they all got SMT right the first time. So how badly could AMD possibly screw up a simpler form of that, after they could already see how everyone else has done theirs? Doesn’t make sense.
    I also don’t agree it was the right step to take. AMD was probably better off without it altogether, and should just have made a 6-core, with faster cores. More of an improved Phenom II X6. I think this is a technology that you can’t do halfway. If you do, you end up with something like Bulldozer: you don’t save enough die space by sharing logic to really make a compact 8-core CPU. At the same time you also don’t share enough logic to get good enough performance per core. AMD has 4 ALUs/AGUs per module… Intel has only 3 ALUs/AGUs per HT core. Yet Intel has much better per-core performance, because the 3 ALUs can be used by either thread. Intel also has enough multithreading performance to keep up with AMD’s design if two treads are running in their core. If AMD just made a faster 6-core, they’d at least have reasonably competitive per-core performance, which would make them a lot more competitive than having 8 slow cores.
    
    So, I see no reason to agree to disagree with you. You fail to provide any arguments. You basically just ‘trust’ that AMD will improve Piledriver, because they have to.
    Well, AMD also had to come up with a good Bulldozer in the first place, because Phenom II was no longer cutting it against Core i7. A lot of people trusted that AMD would become competitive again with an all-new architecture. Worked really well for them, didn’t it?
    
    I was one of the few who pointed out that none of AMD’s technical information actually pointed to them closing the performance gap with Intel, and I pinpointed some bottlenecks in their approach. Apparently logic and analysis are more reliable than trust.
    
    So my logic and analysis say: we don’t know of any bugs and structural problems in Bulldozer, so we can assume there aren’t any (we knew about the TBL bug in Barcelona shortly before launch, because OEMs reported it). So until something comes up, you can’t argue that AMD will improve with Piledriver, because you can’t name anything that they can actually improve on.
    Should something come up, I will reconsider and do a new analysis based on that information… but until then, this is just what it is.
Somerandomguy says:

October 13, 2011 at 10:04 pm

You misunderstood why I mentioned 28nm. What I meant was that GF wouldn’t have a 28nm process if it didn’t have the brain power to invent one, but that same brain power can and will also be used to improve an already used process (especially if there’s more or less of a screw up).
I mean initially, you basically exaggerated; as if there was zero chance of any increase in clock speed at all, which is of course totally absurd.
But then you backed off from that. I quote: “They may be able to squeeze a few hundred extra MHz out of it as the 32 nm process matures, but I don’t expect huge leaps.”
Uh… yeah, that’s exactly a part of the improvements Piledriver needs.

What will things look like 2 years from now? I have a bad feeling that they’ll screw things up again with steamroller after they’ve fixed their mess with Piledriver. So Excavator will have to step in and fix again. ;D
Today, BD actually may not be as terribly bad for servers as it is for consumers. So I guess at least some profit is to be had in servers, where software optimizations are certainly more likely to happen, and the consumer parts should just be sold for a lot less.
Even if they don’t make a profit with it in the consumer space, the basic uarch will at least be out in the open already, so that makes it a little more likely that at least a little bit of early software optimizations for that uarch happen in some consumer software as well and before Piledriver will also be in notebooks. BD as a “pipe cleaner” basically.
That’s also why I’m not at all convinced that it would have been a good idea shrinking K10 again, other than in Llano. At some point you just have to come out with a uarch that was until then in development, because it’ll need at least some software optimizations and you want those to happen as early as possible and especially before that uarch will be in just about everything from notebooks to supercomputers.

AFAIK CMT really is new in x86 uarchs, but it’s definitely new for AMD. BD is hands down a new uarch for them anyways. As I said earlier, certainly lots of knobs to turn (we’re still talking high tech here). A bunch of them were probably not turned so wisely.

But I’m repeating myself over and over. And no, you don’t really argue my points. What I hear from you is basically just utter pessimism. Of course there’s also the opposite of that (like with the folks who thought BD would be on a par with Sandy), but then most of the time the truth is somewhere in between.
Now don’t get me wrong, there is always a chance of another screw up right after a screw up. It wouldn’t be pessimistic to say that there is at least a low chance of that happening, it would be realistic.
But to jump on that chance and treat it as if it was like 85% or something and also basically ignoring that there are reasons to think it ‘s way lower than that, that’s just pessimism.
You could be right again, but you can also win the lottery.

Reply
- Scali says:
  
  October 13, 2011 at 10:59 pm
  
  “as if there was zero chance of any increase in clock speed at all, which is of course totally absurd.”
  
  Uhh, no. I said: “Clockspeed and die shrink seem out of the question, since 32nm is as good as it gets for now…”
  With “as good as it gets for now” I mean that 32nm appears to be operating on track (and other processes are still well into the future). Obviously any process will get better as it matures, and CPUs will generally get a few hundred MHz boost during their lifetime because of this. But since this is such a general rule in chip manufacturing, I didn’t think it was necessary to point that out specifically.
  What I meant is that they only have 32 nm to go with, and although it can still improve somewhat, there doesn’t seem to be any structural problem that’s holding back clockspeed. As such it is unlikely that clockspeeds will go up by much more than the usual few hundred MHz over time.
  
  Problem is, since it’s a general rule, the same happens at Intel. They’re just about to release an i7 2700K actually. So these improvements don’t mean anything. If the competition improves as much as you do, you’re not closing the gap.
  
  “Today, BD actually may not be as terribly bad for servers as it is for consumers.”
  
  I have no idea why that would be. Power consumption is even more important in servers (multiple CPUs, small enclosures etc), and Bulldozer is not exactly a star there. It may be able to deliver the same performance as a 2600K in certain threaded environments, but at the much higher power consumption, it’s not attractive at all.
  
  “Even if they don’t make a profit with it in the consumer space, the basic uarch will at least be out in the open already, so that makes it a little more likely that at least a little bit of early software optimizations for that uarch happen in some consumer software as well and before Piledriver will also be in notebooks. BD as a “pipe cleaner” basically.”
  
  Not going to happen. AMD’s market share is too small for developers to optimize for it specifically. That’s why 3DNow! failed as well, for example. Besides, AMD doesn’t provide decent tools. At least Intel provides its own compiler and tools for developers to optimize with. How are developers going to optimize for BD when AMD doesn’t provide the tools?
  Heck, the same is happening to AMD on the GPGPU-side. We’re still waiting for those OpenCL applications to come in. Some GPU physics would be nice as well. Nobody seems to be developing for OpenCL. nVidia on the other hand is actively supporting developers with CUDA tools, and hands-on assistance, and CUDA is actually used in major applications (PhotoShop, Premiere?).
  
  “AFAIK CMT really is new in x86 uarchs, but it’s definitely new for AMD.”
  
  That’s because as I explained already, CMT is a nonsense term for applying technology that has been around for years. The term CMT is new, the actual technology it describes is not. Hence, nonsense.
  
  “As I said earlier, certainly lots of knobs to turn (we’re still talking high tech here). A bunch of them were probably not turned so wisely.”
  
  More nonsense. CPU architectures don’t work that way. You can’t just go and tweak things around. Only very minor tweaks are possible without having to go through a completely new design cycle, that takes years to complete. I don’t see anything in Bulldozer that AMD can easily tweak. That’s my point. They need a big redesign to fix this.
  
  “What I hear from you is basically just utter pessimism.”
  
  Try realism. Some things are just not possible.
  Again, let me throw the Pentium 4 in your face. Now, clearly you’re not going to argue that Intel has at least as many developers on their team as AMD, who are at least as smart, and have at least as many resources at their disposal. Clearly you’re also not going to argue that Intel’s manufacturing process was better than AMD’s at the time of Pentium 4, as it always is anyway.
  So, then tell me… why was Intel never able to ‘turn some knobs’ and make the Pentium 4 into a huge success? They spent 5 years on it, went from 180nm to 130nm, 90nm and finally 65nm… but no matter how often they tweaked and shrunk it, they couldn’t get the IPC and clockspeed up far enough.
  Then they come up with a new architecture, only a fraction of the size of the Pentium 4/D, running at a much lower clockspeed, on the same 65 nm process, and *boom*, IPC shoots through the roof.
  
  Tell me then, how come Pentium 4 didn’t work, no matter what they tried, and Core2 worked properly from the get-go?
  Did those Intel engineers just screw up 5 years in a row, and then get lucky? And have they been lucky since Core2? Because everything they release now is always fine. Or was it just some rare disease that made these engineers stupid for 5 years, and then they were magically cured?
  This is reality though, this really happened, a few years ago. You really still think I’m being pessimistic, and it’s just as simple as turning a few knobs? I think that’s a VERY naive view.
  There are some screwups that you just can’t fix, and you have to start over from scratch.
  
  Reply
  - Somerandomguy says:
    
    October 13, 2011 at 11:09 pm
    
    *sigh* And of course I’d have to repeat myself again (and again and again), which I’m not going to do anymore now.
    Just wait and see. That’s all we can do anyways.
    Over and out.
  - Scali says:
    
    October 13, 2011 at 11:11 pm
    
    No, you’d have to respond to my Pentium 4 statement, which debunks your “but AMD has smart engineers, so they’ll fix it” argument.
    But you can’t.
FX says:

October 14, 2011 at 8:32 am

John Fruehe is a shithole. his 33% more core and 50% performance is absolutely shit. Bulldozer is even whorse than Phenom II. This is an AMD’s version of Pentium 4

Reply
- k1net1cs says:
  
  October 14, 2011 at 11:28 am
  
  Well, Pentium 4 was also the processor to beat in breaking 8GHz record for BD, so I guess now it goes full circle…
  
  Reply
  - Scali says:
    
    October 14, 2011 at 11:35 am
    
    Yup… there is a glaring disparity though: Intel could survive with Pentium 4 for 5 years because they had an advantage in manufacturing, combined with a strong brand. So Intel could get away with having to sell larger chips at higher clockspeeds and more power consumption against AMD.
    But, although AMD now has their own ‘Pentium 4’, Intel still retains the lead in manufacturing and brand strength.
    
    The current comparison between 2600K and FX-8150 is distorted anyway, since the i7 2000-series is the successor to the mainstream 800-series, not the high-end 900-series. That is yet to come, in the form of Sandy Bridge-E. And another successor on 22 nm, Ivy Bridge, is not too far off either.
    So you are comparing AMD’s most high-end processor against a mainstream CPU from Intel (which is MUCH smaller, and not a true display of what the Sandy Bridge architecture is capable of).
  - k1net1cs says:
    
    October 14, 2011 at 1:41 pm
    
    Not surprisingly, you’d actually find AMD apologists who’d say BD isn’t meant to be a high-end parts. =)
    No, seriously, you would. =|
    
    In any case, some interesting links:
    
    http://www.kitguru.net/components/cpu/zardon/amd-fx-8150-black-edition-8-core-review-with-gigabyte-990fxa-ud7/
    Benchmark with a Gigabyte board.
    
    http://www.xtremesystems.org/forums/showthread.php?275873-AMD-FX-quot-Bulldozer-quot-Review-%284%29-!exclusive!-Excuse-for-1-Threaded-Perf
    A thread about how turning off cores in each module actually helps FX-8150’s performance a bit.
    If you could open it, that is; seems like it’s been hammered by visitors ever since.
  - Scali says:
    
    October 14, 2011 at 2:03 pm
    
    Yes, I’ve even had that discussion with Theo Valich of BSN*. He also said BD was not meant to be a high-end part. Well excuse me… but if you’re going to design an 8-core chip with 315 mm² die area, what the heck are you doing if it’s NOT supposed to be high-end? And if this is not high-end, then what is? I know they plan on glueing two dies together for the server-market (yes, the same trick that Intel used for Pentium D and Core2 Quad, which AMD considered ridiculous at the time), but that approach would be even more of a disaster on the desktop market than Zambezi already is. Very expensive chip to make, and very poor performance/watt ratio.
    A lot of irony in that name btw. Zambezi is a river in Africa. Denial is also a river in Africa 🙂
    Obviously just because Zambezi doesn’t *perform* like a high-end part doesn’t mean that it wasn’t *meant* to be one. If I were asked to design a mainstream part, I’d design it to have a mainstream die-size, not the largest chip we’ve seen in years. Mainstream is all about low prices, large volumes. Huge dies aren’t exactly good for chips with low prices, or high volumes for that matter.
    
    And yea, disabling every second core in a module is similar to disabling HT on an Intel CPU: You avoid situations where shared resources in the core/module get starved because of contention.
    Windows 7 has a HT-aware scheduler that already tries to avoid these situations, so you rarely see this problem on HT systems anymore. But there is no Bulldozer support, and as far as I heard, there won’t be a patch for Windows 7 either, so we have to wait for Windows 8 to get a proper Bulldozer scheduler.
    
    Ofcourse the obvious downside is this: Bulldozer is NOT an HT chip, it really is a large 8-core chip. So disabling half of it is a huge waste of transistors, unlike disabling HT.
    
    Doesn’t look like the Gigabyte board is better than the board that AMD supplied (Asus Crosshair). The Cinebench multithreaded score is lower than what Anand got. They didn’t post the single-threaded Cinebench score at Kitguru… seems like a biased thing to do.
NEWIMPROVED JDWII says:

October 16, 2011 at 6:29 am

“I hope AMD fanboys are smart enough to stand up to AMD for a change, and make it very VERY clear that AMD’s customers are not taking their crap any longer.”

Done! I’m sick and tired of AMD being so quiet and then they come out with a failure. I told JF-AMD this i emailed Amd this i said it on basically(troll) every forum I clicked on -1 on their youtube video saying how this is better then the 4 core wit HT from Intel. All i can say is i’m pissed about his launch and I’ll NEVER trust Amd to make a high-end CPU again. At least for awhile i still say they can patch some things up with BD. But its going to take more then a couple of stepping to fix this CPU.
I’m most likely the only AMD fanboy on this site who wants to show you some respect their getting mad at the wrong person(company) Its ok they will understand some day just like i did.

As for CMT on bulldozer it has nice gains but it takes so much more Die space to do.

“No, let’s not agree to disagree… We were in a discussion, you presented some points, I argued those, and now you completely change your stance.”

That was funny.

May i ask why are the FX processors so much money they should be much cheaper. 245$ is to much for the 8150 it needs to be 199.99$ at that price i would of gotten it. But at 245$(280$ on newegg) the 1100T is a much better deal and that’s why i got it and i’m waiting for PD and W8 i don’t care to have a Intel killer i really can care less i just want a competitive CPU or at least a Price/Performance king. and the 8150 is not competitive with the 2500K its competitive with the Phenom II x6 which is a nice cpu but that’s not the point. Was’t just the engineering department that messed this up it was also the marketing department that messed this up i trusted them because they called it the FX and i figured that meant at the very least competitive performance Boy was i wrong.

Oh yeah and i’ll be man enough to admit it you where right all along. Even though i wish you were wrong.

Reply
- Scali says:
  
  October 16, 2011 at 10:32 am
  
  I suppose AMD just has a huge problem with the FX. It’s a very large chip, so expensive to make (and on a relatively immature process, so yields are probably not that high yet… especially considering that they have to push clockspeed and TDP quite far to get to reasonable performance).
  An even more important factor is that they have been working on it for many years, so they have invested a lot in R&D. They will have to earn that money back somehow, so they’re going to have to sell these CPUs for as much as they can.
  So for the first time in years, AMD is not pricing their CPUs very competitively.
  I just wonder what AMD is going to do in the near future. Intel is going to launch a 2700K soon, which will probably bump down prices on their other CPUs. After that, Sandy Bridge-E will introduce new 6-core high-end CPUs, which might also reduce prices of other products… And then the 22 nm Ivy Bridge, again prices going down (or at least, they’ll introduce faster CPUs at the same price-point, so effectively price/performance goes up, hence all other CPUs need to drop in price to remain competitive).
  
  Reply
Pingback: AMD introduces Trinity: When is an improvement not an improvement? | Scali's blog
Pingback: AMD Bulldozer: It’s time to settle | Scali's OpenBlog™
Pingback: Yet more thoughts! | Scali's OpenBlog™

	OEM on MartyPC: PC emulation done…
	equipthering on An Amiga can’t do Wolfen…
	Mike Dawson on Running anything Remedy/Future…
	.NET Core: the small… on Migrating to .NET Core: the fu…
	Scali on Video playback on low-end MS-D…