The myth of CMT (Cluster-based Multithreading)

The first time I heard someone use the term ‘CMT’, I was somewhat surprised. Was there some kind of CPU multithreading technology that I had somehow missed? But when I looked it up, things became quite clear. If you google the term, you’ll mainly land on AMD marketing material explaining ‘cluster-based multithreading’ (sometimes also called ‘clustered multithreading’).

This in itself is strange, because another page you will find is this one: http://dl.acm.org/citation.cfm?id=640477.640525

Triggered by the ever increasing advancements in processor and networking technology, a cluster of PCs connected by a high-speed network has become a viable and cost-effective platform for the execution of computation intensive parallel multithreaded applications.

So apparently the term ‘cluster-based multithreading’ had already been used before AMD’s CMT, and in a far less confusing sense: it simply refers to conventional clustering of PCs to build a virtual supercomputer.

So CMT is just an ‘invention’ by AMD’s marketing department. They coined a term that sounds close to SMT (Simultaneous Multithreading), in an attempt to compete with Intel’s HyperThreading. Now clearly, HyperThreading is just a marketing term as well, but it is Intel’s name for their implementation of SMT, which is a commonly accepted term for a multithreading approach in CPU design, and had been in use long before Intel implemented HyperThreading (IBM started researching it in 1968, to give you an idea of the historical perspective here).

Now the problem I have with CMT is that people are actually buying it. They seem to think that CMT is just as valid a technology as SMT. And worse, they think that the two are closely related, or even equivalent. As a result, they are comparing CMT with SMT in benchmarks, as I found in this Anandtech review a few days ago: http://www.anandtech.com/show/5279/the-opteron-6276-a-closer-look/6

AMD claimed more than once that Clustered Multi Threading (CMT) is a much more efficient way to crunch through server applications than Simultaneous Multi Threading (SMT), aka Hyper-Threading (HTT).

Now, I have a problem with comparisons like these… Let’s compare the benchmarked systems here: http://www.anandtech.com/show/5279/the-opteron-6276-a-closer-look/2

Okay, so all systems have two CPUs. So let’s look at the CPUs themselves:

  • Opteron 6276: 8-module/16-thread, which has two Bulldozer dies of 1.2B transistors each, total 2.4B transistors
  • Opteron 6220: 4-module/8-thread, one Bulldozer die of 1.2B transistors
  • Opteron 6174: 12-core/12-thread, which has two dies of 0.9B transistors each, total 1.8B transistors
  • Xeon X5650: 6-core/12-thread, 1.17B transistors

Now, it’s obvious where things go wrong here just by looking at the transistor count: the Opteron 6276 is more than twice as large as the Xeon. So how can this be a fair comparison of the merits of CMT vs SMT? If you throw twice as much hardware at the problem, it is bound to handle more threads better. The chip is already at an advantage anyway, since it can handle 16 simultaneous threads, where the Xeon can only handle 12.
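
To put a number on that, here is a quick back-of-the-envelope sketch (Python; it uses nothing but the transistor and thread counts from the list above):

    # Transistor budgets (in billions) and hardware thread counts, as listed above.
    cpus = {
        "Opteron 6276": (2.4, 16),
        "Opteron 6220": (1.2, 8),
        "Opteron 6174": (1.8, 12),
        "Xeon X5650": (1.17, 12),
    }

    xeon_transistors = cpus["Xeon X5650"][0]
    for name, (transistors, threads) in cpus.items():
        ratio = transistors / xeon_transistors
        print(f"{name}: {transistors}B transistors ({ratio:.2f}x the Xeon), {threads} threads")
    # The Opteron 6276 works out to roughly 2.05x the Xeon's transistor budget,
    # which is the 'more than twice as large' point made above.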

But if we look at the actual benchmarks, we see that the reality is different: AMD actually NEEDS those two dies to keep up with Intel’s single die. And even then, Intel’s chip excels in keeping response times short. The new CMT-based Opterons are not all that convincing compared to the smaller, older Opteron 6174 either, which can handle only 12 threads instead of 16, and just uses vanilla SMP for multithreading.

Let’s inspect things even more closely… What are we benchmarking here? A series of database scenarios, with MySQL and MSSQL. This is integer code. Well, that *is* interesting. Because what exactly was it that CMT did? Oh yes, it doesn’t do anything special for integers! Each module simply has two dedicated integer cores. It is the FPU that is shared between the two threads inside a module, and we are not using it here. Well, lucky AMD: best-case scenario for CMT.
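
As a simplified picture of what that means inside a module (a sketch; the dedicated integer cores and the shared FPU are the parts the argument relies on, while the front-end and L2 entries are additional well-known details of the Bulldozer module, added for completeness):

    # Simplified resource layout of one Bulldozer module (two threads per module).
    BULLDOZER_MODULE = {
        "dedicated per thread": [
            "integer core (2 ALUs + 2 AGUs)",
            "integer scheduler",
            "L1 data cache",
        ],
        "shared by both threads": [
            "FPU",                      # unused in the integer-only database benchmarks above
            "fetch/decode front-end",   # extra detail, not needed for the argument
            "L2 cache",                 # extra detail, not needed for the argument
        ],
    }

    for role, resources in BULLDOZER_MODULE.items():
        print(f"{role}: {', '.join(resources)}")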

But let’s put that in perspective… Let’s take a simplified look at the execution resources, namely the integer ALUs in each CPU.

The Opteron 6276 with CMT disabled has:

  • 8 modules
  • 8 threads
  • 4 ALUs per module
  • 2 ALUs per thread (the ALUs cannot be shared between threads, so disabling CMT disables half the threads, and as a result also half the ALUs)
  • 16 ALUs in total

With CMT enabled, this becomes:

  • 8 modules
  • 16 threads
  • 4 ALUs per module
  • 2 ALUs per thread
  • 32 ALUs in total

So nothing special happens here, really. Since CMT doesn’t share the ALUs, it works exactly the same as the usual SMP approach, and you would expect the same scaling, since the execution units are dedicated per thread anyway. Enabling CMT just gives you more threads.

The Xeon X5650 with SMT disabled has:

  • 6 cores
  • 6 threads
  • 3 ALUs per core
  • 3 ALUs per thread
  • 18 ALUs in total

With SMT enabled, this becomes:

  • 6 cores
  • 12 threads
  • 3 ALUs per core
  • 3 ALUs per 2 threads, effectively ~1.5 ALUs per thread
  • 18 ALUs in total

So here the difference between CMT and SMT becomes quite clear: with a single thread per core/module, each thread has more ALUs with SMT than with CMT. With multithreading enabled, each thread (effectively) has fewer ALUs with SMT than with CMT.
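
The same arithmetic, spelled out as a short Python sketch (only the module/core, ALU and thread counts from the lists above go into it):

    # Effective integer ALUs per thread for each configuration listed above.
    # 'units' are modules on the Opteron and cores on the Xeon.
    configs = {
        "Opteron 6276, CMT off": (8, 2, 1),  # half the ALUs disabled along with half the threads
        "Opteron 6276, CMT on":  (8, 4, 2),
        "Xeon X5650, SMT off":   (6, 3, 1),
        "Xeon X5650, SMT on":    (6, 3, 2),
    }

    for name, (units, alus_per_unit, threads_per_unit) in configs.items():
        total_alus = units * alus_per_unit
        total_threads = units * threads_per_unit
        print(f"{name}: {total_alus} ALUs, {total_threads} threads, "
              f"{total_alus / total_threads:.1f} ALUs per thread")
    # Opteron 6276: 2.0 ALUs per thread whether CMT is on or off.
    # Xeon X5650: 3.0 ALUs per thread with SMT off, effectively 1.5 with SMT on.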

And that’s why SMT works and CMT doesn’t: AMD’s previous CPUs also had 3 ALUs per thread, but in order to reduce the size of the modules, AMD chose to use only 2 ALUs per thread this time. It is a case of cutting off one’s nose to spite one’s face: CMT struggles in single-threaded scenarios, compared to both the previous-generation Opterons and the Xeons.

At the same time, CMT is not actually saving a lot of die-space: there are still 4 ALUs per module in total. And yes, obviously, when you have more resources for two threads inside a module, and the single-threaded performance is poor anyway, one would expect it to scale better than SMT does.

But what does CMT bring, effectively? Nothing. AMD’s chips are much larger than the competition’s, or even than their own previous generation. And since the Xeon is so much better at single-threaded performance, it can stay ahead in heavily multithreaded scenarios, despite the fact that SMT does not scale as well as CMT or SMP. But the real advantage of SMT is that it is a very efficient solution: it takes up very little die-space. Intel could do the same as AMD does and put two dies in a single package. That would result in a chip with 12 cores running 24 threads, and it would absolutely devour AMD’s CMT in terms of performance.

So I’m not sure where AMD gets the idea that CMT is ‘more efficient’, since they need a much larger chip, which also consumes more power, to get the same performance as a Xeon that is not even a high-end model. The Opteron 6276 tested by Anandtech is the top of the line; the Xeon X5650, on the other hand, is a midrange model clocked at 2.66 GHz. The top model of that series is the X5690, clocked at 3.46 GHz. Which shows another advantage of smaller chips: better clockspeed scaling.

So, let’s not pretend that CMT is a valid technology, comparable to SMT. Let’s just treat it as what it is: a hollow marketing term. I don’t take CMT seriously, or people who try to use the term in a serious context, for that matter.


56 Responses to The myth of CMT (Cluster-based Multithreading)

  1. NewImprovedjdwii says:

    Simple: they want to be like HP/Apple/Nintendo, that is, be different. Now, I will say SMT usually scales around 20-30%, where CMT can be 55-80%, but I will agree with you and say it’s harder to do since it’s a bigger die, and it just means AMD doesn’t make as much money as Intel.

  2. Pingback: AMD Steamroller | Scali's OpenBlog™

  3. Pingback: Anonymous

  4. Pingback: AMD's New High Performance Processor Cores Coming Sometime in 2015 - Giving Up on Modular Architecture

  5. Pingback: AMD Confirms Development of High-Performance x86 Core With Completely New Architecture

  6. Pingback: AMD’s New High Performance Processor Cores Coming Sometime in 2015 … « Reviews Technology

  7. Ventisca says:

    So you’re saying that AMD’s CMT is nothing but a marketing gimmick?
    I’m no expert, but after reading your article, (maybe) I have a similar opinion. :D
    The module is actually two cores, but under just one instruction fetch and decode unit. So what AMD has done is not the same level of technology as SMT; instead, they just run more threads on more cores.
    The newer AMD core architecture, Steamroller, splits the decode unit for each core in the module, so each module has 2 instruction decoders. So it’s clear that they are actually two “separated” cores.

    • Randoms says:

      They are still sharing the branch predictor, fetch and the SIMD cluster.

      So they are still not fully separated cores. It is a step back from the original CMT design, but it is still a CMT design.

  8. Pingback: AMD FX Series Making a Comeback Within Two Years - APU 14 Conference Reveals Future Roadmaps

  9. Pingback: F.A.Q pertanyaan yang sering diajukan tentang Arsitektur AMD CMT yang ada di AMD APU dan FX - SutamatamasuSutamatamasu

  10. Lionel Alva says:

    Would you know of any tenable alternatives to SMT then?

    • Scali says:

      Well no… There is no alternative. Why should there be an alternative? That’s like asking “What is an alternative to cache?” or “What is an alternative to pipelining instructions?”
      There are no alternatives; they are just techniques to improve performance in a CPU design.

  11. Scali says:

    Yay, it gets posted on Reddit for the umpteenth time… Cognitive dissonance ensues with the posters there, trying hard to discredit this piece… with far-fetched and nonsensical arguments (actually arguing against the AMD marketing material that I put directly on here; if you have to argue against AMD’s own marketing material in order to discredit my article, you know you’ve completely lost it)… But nobody is man enough to comment here.
    The reason I can’t wait for AMD to go bankrupt is that it will hopefully be the end of these AMD fanboys.
    I am tired of their endless insults and backstabbing.

    • UIGoWild says:

      Do you think the CPU market will be better without competition? It doesn’t take a marketing degree to understand that without competition, prices would sky-rocket and innovation would go slower. Now I guess you’re thinking that I’m an AMD fan and all that, but that’s just childish. I’m not trying to defend the people who insulted you; being a fanboy of a company and never thinking twice is not clever at all.

      Although, by saying:

      The reason I can’t wait for AMD to go bankrupt is that it will hopefully be the end of these AMD fanboys.

      You kinda show that you’re just the opposite: an “anti-AMD”. That’s not better than a fanboy. I hope AMD will get better and that we’ll see real competition now that they have announced that they’re going for SMT, not because I’m an AMD fan, but because I want the best for the customers.

      • Scali says:

        Do you think the CPU market will be better without competition?

        This is the fallacy known as a ‘leading question’.

        It doesn’t take a marketing degree to understand that without competition, prices would sky-rocket and innovation would go slower.

        This is the fallacy known as ‘slippery slope’.

        You kinda show that you’re just the opposite: an “anti-AMD”. That’s not better than a fanboy.

        Nice try, but I’m anti-fanboy, not anti-AMD.

        Anyway, if you take a glimpse at reality for a moment, you’ll see that we’ve effectively been without any real competition in the CPU market for many years. Prices haven’t exactly skyrocketed so far, and innovation hasn’t exactly slowed down. What we do see is that innovation has moved into other areas than just CPU performance at all cost (such as the breakneck GHz race in the Pentium 3/4 era, which customers didn’t exactly benefit from: they received poor, immature products with a tendency to overheat, become unstable or just break down, from both sides).
        Currently there’s innovation in things like better power-efficiency, Intel scaling down their x86 architectures to also move into tablet/smartphone/embedded markets, and more focus on graphics acceleration and features (for the first time ever, Intel is actually the leader in terms of GPU features, with the most complete DX12 GPUs on the market).

      • UIGoWild says:

        Okay. Let’s say I haven’t been perfectly clear. And yeah, my comment may have looked like an attack or something, but I was just thinking that you were at risk of ruining your credibility by saying that you wished for AMD to go bankrupt.

        You said:
        Nice try, but I’m anti-fanboy, not anti-AMD.

        So okay, I might have been reacting a bit too quickly. Actually, I totally agree with you on that point. Being a fanboy of a company, any company, is not a clever choice. But I still hold to my point: I would rather keep AMD in the race just to be sure there’s a “tangible” competitor to Intel (or nVidia for that matter). I would be saying the same thing if Intel was the one lagging behind. I may be pessimistic, but I don’t like the idea of having only one company holding more than 70% of a market. (Which is already a huge chunk and the actual share of Intel at the moment [ps. don’t quote me on that, but I’m pretty sure it’s close to that].)

        And even though the competition over performance wasn’t really strong (it’s been forever since AMD was close to Intel), I still think that this competition was good for the customers in the end.

      • Klimax says:

        @UIGoWild
        You are still massively wrong. There is still competition: it is called Intel’s older chips. If there are no improvements and prices are higher, then the only new chips sold will be replacements and a trickle of new computers. And a massive second-hand market. Therefore no price change is to be expected. Look up monopoly pricing. It is not what you think it is. Not even remotely.

  12. Justin Ayers says:

    “There is still competition: it is called Intel’s older chips.” But the key point you’re missing is that competition between businesses is essential.

    • Klimax says:

      Not necessarily for some markets, like the CPU market. Because even five-year-old chips can be good enough for many people, they form effective competition to new chips, since potential buyers don’t have a pressing need to upgrade, and if new chips were substantially more expensive, then even new buyers could skip them and get old chips instead.

      That is one of the reasons why monopolies are not illegal; only abuse of a dominant/monopoly position is. And you forget that we are already there: AMD ceased to be a competitor to Intel about four to six years ago.

      • HowDoMagnetsWork says:

        Let’s assume that Intel actually will end up increasing their prices, believing they’d make more money. Then customers buy more older chips. Years pass, barely any new Intel CPUs are bought, most of the old ones are out of stock. What now? If AMD is in the race, people switch to AMD, even if their devices are half as good as Intel’s. If AMD is not in the race, customers will be forced to pay Intel tremendous prices or just not use their products. Of course, if the company is full of good people, they would never do that, rendering competition useless. But what company is full of good people? Competition is very important for any market.

      • Scali says:

        People aren’t forced to buy new CPUs. CPUs don’t really break down or wear out (in case you missed it, earlier this year, I was part of the team that released 8088 MPH, a demo that runs on the original IBM PC 5150 from 1981. We used plenty of original 80s PCs during development, with their original 8088 CPUs, and they still worked fine, 30+ years after they were made).
        There’s no point in buying older chips if you already have an older chip like that.
        Likewise, performance-per-dollar is a delicate balance. If Intel makes their CPUs too expensive, people simply will not upgrade, because they cannot justify the cost vs the extra performance (perhaps you youngsters don’t know this, but in the good old days when Intel was the only x86-supplier, it often took many years for a new CPU architecture to become mainstream. For example, the 386 was introduced in 1985, but didn’t become mainstream until around 1990. It was just too expensive for most people, so they bought 8088/286 systems instead).

        This means that Intel is always competing against itself, and has only limited room for increasing prices. At the same time they constantly need to improve performance at least a little, to keep upgrades attractive enough.
        If they don’t, they will price themselves out of their own market. If people don’t buy new CPUs, Intel has no income. Which is obviously a scenario that Intel needs to avoid at all costs.

        AMD is really completely irrelevant in most of today’s market already, because their fastest CPUs can barely keep up with mainstream Intel CPUs of a few generations ago. A lot of people have already upgraded to these CPUs or better, and have no interest in getting an AMD CPU at all, even if AMD would give them away for free.
        So we’ve already had the scenario of Intel competing against its older products for many years now. Not much will change if AMD disappears completely.

        It seems a lot of AMD fanboys think that the whole CPU market is in the sub-$200 price bracket where AMD operates. In reality most of it is above that.

  13. Reality Cop says:

    Scali, you’re damn blind. In those “good old days when Intel was the only x86 supplier”:

    1. x86 wasn’t the only option. You had PCs built with MOS, Motorola, and Zilog CPUs all over the place. You had Sun SPARC workstations.

    2. Intel was NOT the only x86 supplier. AMD, NEC, TI, and others were making x86 clones before 1990.

    • Scali says:

      Oh really now?

      1. x86 wasn’t the only option. You had PCs built with MOS, Motorola, and Zilog CPUs all over the place. You had Sun SPARC workstations.

      You think I didn’t know that? I suggest you read some of my Just keeping it real articles. You could have figured it out anyway, since I explicitly said ‘x86 supplier’.

      2. Intel was NOT the only x86 supplier. AMD, NEC, TI, and others were making x86 clones before 1990.

      They were not clones, they were ‘second source’. These fabs made CPUs of Intel’s design, commissioned by Intel. That’s like saying TSMC makes ‘Radeon and GeForce clones’ because they build the actual GPUs that nVidia and AMD design.
      For all intents and purposes, these second source CPUs are Intel CPUs. Intel was the only one designing the x86 CPUs, even if other fabs also manufactured them (which was the point in that context anyway).

      What is your point?

      • k1net1cs says:

        “What is your point?”

        Likely trying to look overly smart.
        At least he tried…but IGN said “6/10 for looking up Wikipedia”.

        Funny how a “Reality Cop” who tried to call you out has to be directed to a collection of articles titled “Just Keeping It Real” for actual, real info on what you’ve done.

  14. OrgblanDemiser says:

    Sooo… who cares if AMD continues to exist? Does it hurt anyone? Personally, as long as my computer works fine and doesn’t cost me too much, I’m happy with that.

    • Scali says:

      It’s mostly AMD’s marketing and its fanboy following, which distort the truth, misleading/hurting customers.

      • OrgblanDemiser says:

        True. But isn’t that the case with most companies nowadays? I mean, just looking at some HDMI cable boxes makes me laugh sometimes (e.g. “High speed”, “1080P ready”, “Gold plated” and such). Internet providers displaying their speeds in megabits instead of megabytes. Apple showcasing a good old tablet pen, calling it an “innovation”. (I’ll be careful and not extrapolate on this.) And to be topical with recent news: (“recent”) Volkswagen. (No need to add more :P)

        At this point it seems like the customer is taken for a fool at every corner. Fanboy or not, I guess you have to be careful and seek the truth backed by facts and not by advertisement money.

        So again, with AMD, I think people have to admit that when you buy their chips, you buy subpar components. For budget builds I agree the price might be a valid argument, but it’s subpar nonetheless.

      • Scali says:

        Fanboy or not, I guess you have to be careful and seek the truth backed by facts and not by advertisement money.

        That is what this blog is here for.

      • semitope says:

        “That is what this blog is here for.”

        hahahhahaha

        You can’t be serious with that line. You know you are always bashing AMD. Really strange, but you are a hateboy.

      • Scali says:

        But I am serious. The thing is, AMD has far more dubious marketing than most other companies (I mean, take the recent HBM scam… you can’t really be defending that nonsense, can you?), so I don’t cover Intel and nVidia as often. They do come along every now and then. You know what’s strange? AMD is just a marginal player in the CPU and GPU market these days, with < 20% marketshare in both arenas… So they aren’t selling a whole lot of products compared to Intel and nVidia. Yet there are so many people always trashing me and my AMD-related blogs (and only those blogs; the other blogs don’t receive such trash-posts at all). It’s amazing how rabid the following of such a small and meaningless company is.

      • semitope says:

        AMD doesn’t have dubious marketing; at best they have weak marketing. Dubious marketing is lying about the 970 specs so it doesn’t look weaker than its competition. Dubious marketing is GameWorks, etc. What AMD does is at best a little misunderstanding here and a poor statement there. Like “overclocker’s dream”, when they likely meant it can take tons of power and has watercooling.

        You are just extremely biased against them. There aren’t many people posting against you here, and I just ended up back here after doing a search. Yet you think they are rabid, never mind the mountains of ignorant nVidia consumers who get crapped on at every turn. But you completely ignore what nVidia does to its consumers.

      • Scali says:

        Sounds like you don’t know what is dubious and what isn’t. The claim that HBM’s bandwidth compensates for the lower memory capacity is an outright lie. GameWorks is not dubious at all. It does exactly what it says on the tin: it offers added value for nVidia hardware.
        If anything, the 970 specs were a misunderstanding/poor statement. nVidia responded by explaining how the 970 uses its 4 GB of memory in detail, so that is cleared up.

        I am not biased at all, but your statements clearly are. You are only proving my point further with posts such as this one.

      • semitope says:

        Again, if it’s a lie, the Fury X should not keep up with the 980 Ti when VRAM becomes very important.

        Instead of claiming it’s a lie, why not figure out how the hell the lie seems to be true?

      • Scali says:

        But it isn’t true. There’s plenty of evidence of Fury X performance tanking when the vram-wall is hit.
        This review for example measures average frame times and such: http://techreport.com/review/28513/amd-radeon-r9-fury-x-graphics-card-reviewed/14
        As you can tell from their charts, and their conclusion, in certain memory-heavy games, there are spikes in the framerate, and it is not as smooth in 4k as the GeForce cards, which have more memory.
        Here is another review that shows similar data: http://hexus.net/tech/reviews/graphics/84170-amd-radeon-r9-fury-x/?page=12
        As they also say, it isn’t as smooth.
        Here is a third review, concluding the same: http://www.anandtech.com/show/9390/the-amd-radeon-r9-fury-x-review/22
        And an entire article investigating the issue here: http://www.extremetech.com/gaming/213069-is-4gb-of-vram-enough-amds-fury-x-faces-off-with-nvidias-gtx-980-ti-titan-x/2

      • semitope says:

        That looks like a mix.

        E.g. the Fury X renders more frames under 25 ms than the 980 Ti (33% vs 16%), even though the slowest 1% (ONE PERCENT) takes longer on average. The Witcher 3 results are not bad. Not confirmed to be VRAM-related.

        I don’t see the point of the TechReport link. The results aren’t bad.

        For the ExtremeTech link, the problem with assuming it’s HBM is that the same difference is seen at other resolutions. The issue exists at all the resolutions; it simply gets worse because the resolution and the demands are higher.

        Anandtech assumes it’s due to the HBM size, but at 1440p the same trend persists. Should we assume the Fury X has more spikes due to the game and driver, or jump to assuming the HBM is an issue even though it is not an issue during most of the test? Also, when were the frame dips? During frame transitions?

      • Scali says:

        That’s the difference between you and me. I write my own graphics engines, and I can easily create scenarios that use a lot of memory, and benchmark them. I don’t need to rely on games to do that (games whose code AMD and NVidia have also analysed, and created driver ‘optimizations’ for, so you’re never sure what you’re testing exactly anyway). Even so, it is clear that everyone concludes that the AMD hardware doesn’t run as smoothly. So you will have to accept that as fact, even if you want to continue being in denial about 4 GB being the reason (what else could it possibly be?).

        “Drivers” would be a poor reason obviously, since Fury is not a new architecture. It’s the same GCN 1.2 they’ve been using for a few years now, so drivers should be quite mature.

      • semitope says:

        Are you an Unreal Engine developer?

      • semitope says:

        I was taking dubious to mean something worse. It’s really a meaningless word in the capacity in which it is being used. Almost all the companies concerned are very guilty of this.

        I don’t get how you can defend nVidia, yet bash AMD for minor things. What Huddy said could be a simple way of explaining what their engineers are doing with HBM. The important thing I remember is that they said they had engineers specifically assigned to making the memory limitation less of an issue.

        Yet here you are claiming it was a lie, rather than realizing it works and trying to figure out what he was really saying.

      • Scali says:

        I am not defending nVidia. Difference is, nVidia admitted that they had published the wrong specs for 970, and explained how it worked. AMD doesn’t admit their lies, they just keep piling on new lies time and time again.

        And please, don’t try defending Huddy. He’s just a clueless marketing guy. I am a graphics developer with decades of experience in the field. I know the ins and outs of CPUs, GPUs and APIs, and what he says simply is not true, for technical reasons I have already explained earlier. No point in further discussion.
        So stop wasting your time.

      • semitope says:

        Huddy is not a clueless marketing guy. He has technical experience.

        nVidia only spoke about the 970 issue when it was found out. Let’s take a guess whether they would have if nobody had pointed it out.

        What lies should AMD come clean on? Odds are they are just perceived lies on your part.

      • Scali says:

        Really? There are a number of lies documented on this blog, which AMD has never come clean on.
        E.g. claims of Barcelona being 40% faster than any other CPU on the market at launch, or Bulldozer having more than 17% higher IPC than Barcelona.
        Then there’s the tessellation issues.
        And what about all those claims about Mantle? Being an open standard, being a console API and whatever else.
        And now HBM.
        It has been proven on all counts that AMD’s claims were false. I don’t “perceive lies”, I am an expert in the field.
        Huddy understands as little about technology as the average fanboy. He’s proven that much. He even commented on some of my blogs personally, but wasn’t able to have a technical discussion. He just threw insults and threats around.
        But you probably understand even less about technology than he does, if you believe his crap. I suggest you talk to someone who actually has a clue. You’ll see that nobody with a clue will be able to contest anything I write on this blog on a technical level. Everything is 100% true and verified. I stand by that. Feel free to try and prove me wrong, but you’ll have to do that with technical arguments and proper evidence. I see neither.

      • semitope says:

        Barcelona looked like it ran into major bugs, and the clock speed and some silicon were severely cut. With Barcelona and Bulldozer there is always the question of “in what?”; a single number won’t represent all cases of comparison. If it’s 40% faster in anything, and 17% faster in anything, it’s not a lie.

        Some of the claims you respond to are from users.

        E.g. Mantle being a console API? Who said that?

        Mantle was never released. APPARENTLY things can change. Mantle has gone into Vulkan, which is good enough. I really do not get that complaint. They told us their plans, their plans changed, and Mantle went into Vulkan. What is the issue? If DX12 weren’t what it is, they probably would have put out Mantle, IMO; they decided to back DX12 instead, IIRC.

        Not sure what tessellation you are talking about, but users see it in their own gaming, so I don’t know how it’s AMD lying about something. If users see no benefit to going over 16x tessellation yet suffer a huge performance loss, then that’s that. Even disregarding AMD, it’s not to our benefit. AMD’s complaint is that using that much tessellation is overdoing it; gamers complain about the same thing. Even nVidia’s gamers complain about it. What is the issue?

        I already responded to your HBM claim. Huddy did not say that it’s because of faster swapping to system RAM; he said the working set can be kept in HBM while swapping with system RAM doesn’t get in the way of the GPU. What you were responding to was likely some interpretation you heard from lay people.

        Go ahead and link to Huddy’s comments, with proof it was Huddy. From his education and history he seems technically competent. And you are clearly biased far more than he could be, considering he has worked for AMD’s competition after working for AMD. Not sure why you feel you need to belittle the guy. He has more experience in these things than you; being a graphics designer means next to nothing when he’s involved with much more than you. https://www.linkedin.com/in/richardhuddy

        I would understand if he chose not to spend much time with you. You do not see your own bias, but it’s very obvious, and he would be wasting his time.

        I realize you are just ultra-sensitive to anything AMD does.

      • Scali says:

        So you’re still in denial? Well, it’s your choice to be ignorant.
        I see no point in discussing these issues with you again. Everything is already explained in the blog posts. Your excuses are pathetic in light of all the information already presented.

        Linking to Huddy’s comments is simple:
        https://scalibq.wordpress.com/2012/05/03/richard-huddy-comments-on-my-blog/
        https://scalibq.wordpress.com/2012/05/09/richard-huddy-responds-again/

        Feel free to email him to ask if it was really him. I did.
        Note that Andrew Copland (also a game developer, look him up in LinkedIn if you like) also asks the same questions… but Huddy is unable to answer anything of a technical nature.

        Note also that at the time there was no information about Mantle yet… but even so, everything said back then turned out to be true. We’re not going to drop DirectX, and we’re not going to drop hardware abstraction. Even Mantle, being specific to the GCN-architecture, had some level of abstraction, instead of just programming the GPU directly.
        As you see, neither Johan Andersson, Michael Gluck, Andrew Copland, nor myself were expecting to drop the API. We’re all graphics developers. Huddy was the only one claiming something different, and he was wrong.

        Also, see the link already in the HBM article: https://www.reddit.com/r/Amd/comments/3xn0zf/fury_xs_4gb_hbm_exceeds_capabilities_of_8gb_or/
        As Huddy is quoted there: 4GB HBM “exceeds capabilities of 8GB or 12GB”.
        So he literally says that 4 GB of HBM is better than 4 GB, or even 8 to 12 GB of GDDR5. Since the only difference between HBM and GDDR5 is bandwidth, he attributes more bandwidth to be a substitute for 8 or 12 GB.
        It’s all there, you’re just trying hard to remain in denial.

      • semitope says:

        I just looked through the Huddy stuff. In the post you said he replied to, you were working off the assumption that he wanted there to be no API. You claim this is dumb, but why do you assume he wouldn’t know an API is necessary, and just jump to assuming he must really mean there should be no API? Then someone corrects you (basically destroys your entire post) by pointing out Huddy was not the one asking for the API to go away, and I would assume developers did not mean for the API to literally go away, but to get less in the way.

        The fact that you missed something so obvious in the interview, VERY obvious in the interview, and decided to bash Huddy for it should clue you in to your bias. The text you linked to says:

        “Huddy says that one of the most common requests he gets from game developers is: ‘Make the API go away.'”

        It’s not hard to know where the statement comes from. You should not have titled the post “Richard Huddy talks nonsense again”.

        I see Huddy points out it wasn’t his opinion. You still attack the guy even though there’s no error in what he is saying. Some developers said that. End of story.

        You do not agree with those developers, apparently, but why take issue with Huddy?

        Copland’s comment was misguided as well. He should email Andersson and ask Huddy for other developers who shared the sentiment, so he can have his question answered. I suspect he made that comment because you made it seem like these were Huddy’s comments. In the very blog post, the only thing Huddy said was that repi gets it. The larger quote was from Andersson himself.

        Also, developers would not be forced into anything. Just because lower-level access might be possible does not mean it has to be used. E.g., IIRC Naughty Dog did really low-level stuff for the Uncharted games on PS3; that doesn’t mean other developers had to get that hardcore. Simply having the capability to do it might be what some developers want.

        Huddy wasn’t claiming the API needed to go away; he was saying developers said that. AND a reasonable assumption would be that the developers do not mean it should go away, but say that in the sense that it should get OUT of the way, because DX11 might have been a problem for them.

        Again, hypersensitive.

        Exceeds capabilities, not capacity. I assumed it was an estimate, and stretching it with up to 12 GB, but it depends on how VRAM works and is used by the GPU. If you’re going to store more than 4 GB of data to be used right away, then probably not. But clearly their claim is along the lines that GDDR5 has a lot of inefficiencies, and the way HBM works with their optimizations gets rid of some of them.

        I doubt any GDDR5 GPU would be able to process that much at the same time anyway.

      • Scali says:

        “Destroys your post”… heh, not quite. Anyway, I already point to that ‘other developer’, that’s Johan Andersson of DICE (repi), and I quoted him verbatim. He does NOT ask for the API to go away, that much is clear from his quote. Huddy misrepresented Andersson’s statement. I even contacted Andersson himself about it, but he did not want to side with Huddy on this. I think we’re done here anyway. You can’t even keep track of all the things that have already been discussed, and your posts aren’t adding any value. It’s just noise.

        Stupid nonsense about GDDR5 vs HBM. As already stated, it’s just memory. The only difference is bandwidth. Memory management is done in software, not in GDDR5 or HBM itself. So nothing changes there. AMD basically claims that they can suddenly do much more efficient memory management now that they have HBM. Which is a load of BS.
        Besides, as already said… when you run out of vram, the memory bottleneck is the PCI-e interface to system memory. This is orders of magnitude slower than GDDR5 or HBM, making the vram technology completely irrelevant for performance in this case.
        I mean, how little do you understand about computers?
        If you want to copy from A to B, and your access speed from A is lower than that to B, you will never be able to copy faster than the speed of A.
        Look it up for yourself. Even a Skylake with overclocked DDR4 can do only about 50 GB/s max: http://www.legitreviews.com/ddr4-memory-scaling-intel-z170-finding-the-best-ddr4-memory-kit-speed_170340/2
        Even a mainstream GDDR5-based card such as the GTX950 already has twice that: http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-950/specifications
        A GTX 980Ti or Titan X’s memory is more than three times as fast still.
        So if the memory you’re copying from is so much slower than your GDDR5 or HBM, how is HBM going to make a difference? It isn’t. The bottleneck is on the other side.
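
        To make that concrete, a minimal sketch (Python; the DDR4 and GDDR5 figures are the ones from the links above, while the PCI-e 3.0 x16 and HBM numbers are rough values added here for illustration): whatever memory sits on the card, the copy can never run faster than the slowest link in the chain.

            # Streaming data from system memory to the video card:
            # the chain is only as fast as its slowest link.
            def effective_copy_bandwidth(*links_gb_per_s):
                return min(links_gb_per_s)

            SYSTEM_DDR4 = 50.0    # ~50 GB/s, overclocked DDR4 (legitreviews link above)
            PCIE_3_X16 = 16.0     # ~16 GB/s theoretical (rough value, added for illustration)
            GDDR5_GTX950 = 105.0  # roughly twice the DDR4 figure (geforce.com link above)
            HBM_FURY_X = 512.0    # rough HBM figure, added for illustration

            # Whether the card has GDDR5 or HBM, the spill path is capped far below either:
            print(effective_copy_bandwidth(SYSTEM_DDR4, PCIE_3_X16, GDDR5_GTX950))  # 16.0 GB/s
            print(effective_copy_bandwidth(SYSTEM_DDR4, PCIE_3_X16, HBM_FURY_X))    # 16.0 GB/s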

      • semitope says:

        Did Huddy even link to that statement from Andersson? Anyway, you are taking a negative interpretation as usual: first assuming Huddy must be an idiot for saying something he didn’t; then, when you find out he didn’t, you try to claim he must be lying that developers said that, and even if they did say it, they didn’t mean it.

        A sensible interpretation would be that they are saying they want the API out of the way. Whatever.

        Did Andersson not want to side with Huddy, or did he just not bother replying to you? I am sure those two speak often enough. I doubt repi is the type to waste his time on someone like you.

        The only difference between HBM and GDDR5 is bandwidth? And you claim Huddy says dumb things…
        Between the internal operation of HBM, the memory controller, the connection to the GPU and their own software optimizations, maybe they can do more.

        Yes, when you “run out of VRAM”, i.e. when what you have in VRAM is not what the GPU needs, you have an issue. Nobody denies that, I think.

        Maybe message AMD’s engineers for an explanation of how their use of HBM differs from GDDR5’s typical usage.

      • Scali says:

        What the hell do you want anyway?
        It’s crystal-clear: Andersson simply spoke out that he would have liked a more low-level interface to the hardware, like on consoles (but not direct hardware access without any kind of abstraction layer).
        Huddy misinterpreted that, and in the bit-tech article you can clearly see him using the words “drop the API”, and other parts of his story also indicate that he is pushing for direct hardware access. That is NOT what developers were talking about. They know the downside of direct hardware access, and they know it’s never going to work on a heterogeneous platform such as the PC.

        And yes, the only difference between HBM and GDDR5 is indeed bandwidth, at least as far as the rest of the system is concerned. The internal operation of HBM, the memory controller and connection to the GPU aren’t relevant. The rest of the system doesn’t see this, and it isn’t relevant. The net result of these differences is just higher bandwidth for the system.
        “Software optimizations”… that’s nonsense of course. You can perform the same optimizations for any type of memory, and these have been done for years already.

        Yes, when you “run out of VRAM”, i.e. when what you have in VRAM is not what the GPU needs, you have an issue. Nobody denies that, I think.

        You are. Because you’re the one arguing that 4 GB suddenly isn’t 4 GB when it’s HBM. So ‘running out of VRAM’ is somehow different when you have HBM?
        In the real world, 4 GB is 4 GB, regardless of what memory technology or speed. It fits exactly 4 GB. So you always run out of memory at the exact same point, namely at the 4 GB threshold.

        Why would I need to message AMD engineers? I already know it isn’t going to work. Magic doesn’t exist. Besides, if there was some kind of magic to it, then it would be Huddy’s job to talk to the engineers, and make some kind of press release about this magic. Instead, we got smoke and mirrors… and cards that clearly exhibit spiky performance in memory-hungry games.

      • semitope says:

        This is just too weird. Let me get this straight: your main objection to dropping the API etc. is just that it would be difficult and therefore no developer would want it? It would be tough, so Huddy must be an idiot for even talking about that kind of situation? No way Andersson would ever want that kind of access to a GPU? You’re using your personal opinion to call him, and possibly any developer who actually did voice these views, an idiot. You try to say Andersson didn’t because you do not want to look a fool for calling someone like him an idiot. Why excuse him for his statement by claiming it’s an ideal, and not anyone else? What if he really wanted that situation to come about?

        So what if the rest of the system does not see it? It’s still a factor, and one the software could exploit in a way not possible with GDDR5.

        I’m not claiming 4 GB isn’t 4 GB; I am saying how you use 4 GB of HBM can be different from how you use 8 GB of GDDR5.

        Your evidence for issues with high memory usage was dubious.
        You would ask the engineers because you do not know. You are just brushing everything aside and pretending HBM is GDDR5. You did no research before going off on your biased witch hunt, most of which (probably all) would never happen if nVidia was the one making the case for 4 GB of HBM etc.

      • Scali says:

        STFU and RTFA, I have explained in great detail why an abstraction layer is required.

        Also, I *am* an engineer, unlike Huddy. I don’t need to ask AMD’s engineers, I know everything they do… and by the looks of it, a considerable deal more, given the fact that I pointed out in great detail that Bulldozer wasn’t going to work in practice, more than a year before they had the actual CPU on the market.

      • semitope says:

        You explained why it’s preferred, IIRC. It’s not a requirement, and you cannot say that just because you think it should be so, all developers would want it so.

        AFAIK Bulldozer CPUs work in practice. Otherwise AMD would be in a lot more trouble after selling CPUs that wouldn’t turn on.

        Sorry, if you have been involved in GDDR5 and HBM development and know the tech in detail; I thought you didn’t.

      • Scali says:

        So you don’t get it. TL;DR: If you don’t use an abstraction layer, your code will only work on the exact hardware you’ve targeted. You lose any kind of backward and forward compatibility. I suggest you read up on 8088 MPH and just how picky it is in regards to hardware (and you thought PC compatible meant compatible…). Acceptable for a demoscene prod, but not for any games or other commercial software.

      • semitope says:

        What I am curious about is what position you take on this:

  15. aron says:

    Talking about the used market is pretty short-sighted. I mean, think about this: with the growth of people and businesses using computers, if the new processors that come out don’t compete with used ones, that is going to drive the price of old processors up. It won’t drop the prices of new processors, though. Even so, everyone can’t go used; eventually those would be gone. Either that, or we would be using really, really shitty processors that don’t support modern stuff (such as DDR3/4 RAM, PCI-e slots for graphics cards (want to start digging around for those old AGP cards?), modern motherboards), and it would have an adverse effect on the other lines of production. Who would purchase DDR4 memory if the only CPU they can find is an Intel Q6600 or an old AMD x64? Either that, or in order for those companies to remain in business, we would have to go back to older technologies.

    • Scali says:

      Most people don’t buy processors, they buy complete systems. You won’t be able to buy a new system with a used processor. And especially for business users, used systems aren’t an option. They need a reliable system with a proper service contract. So new is the only option.

  16. Stashix says:

    Firstly, thanks for the blog; it makes for some interesting reading.
    I would be interested in your take on the whole GameWorks deal, especially since it seems to me there are a lot of unsubstantiated claims circulating around.
