FutureMark’s Time Spy: some people still don’t get it

Today I read a review of AMD’s new Radeon RX470 on Tweakers.net, by Jelle Stuip. He used Time Spy as a benchmark, and added the following description:

There has recently been some controversy about 3DMark Time Spy, because it turned out that the benchmark does not use vendor-specific code paths, but a single generic code path for all hardware. Futuremark sees that as a plus: it does not matter which video card you test, because when running Time Spy they all follow the same code path, so you can make fair comparisons. That also means that there is no specific optimization for AMD GPUs, and AMD’s implementation of asynchronous compute is not fully exploited. In games that do use it, the relative performance of AMD and Nvidia GPUs can therefore differ from what the Time Spy benchmark represents.

Sorry Jelle, but you don’t get it.

Indeed, Time Spy does not use vendor-specific code paths. However, ‘vendor’ is a misnomer here anyway: you can write a path specific to AMD’s or NVidia’s current GPU architecture, but that is no guarantee that it will be any good on architectures from the past, or architectures from the future. What people of the “vendor specific code path” school of thought really mean is that you need to write architecture-specific paths. And in this case it is not just the microarchitecture itself: the actual configuration of the video card (the balance between the number of shaders, shader performance, memory bandwidth and other such factors) also has a direct effect on how async compute code performs.

However, in practice that is not going to happen of course, because that means that games have to receive updates to their code indefinitely, for each new videocard that arrives, until the end of time. So in practice your shiny new videocard will not have any specific paths for yesterday’s games either.

But, more importantly… He completely misinterprets the results of Time Spy. Yes, there is less of a difference between Pascal and Polaris than in most games/benchmarks using Async Compute. However, the reason for that is obvious: Currently Time Spy is the only piece of software where Async Compute is enabled on nVidia devices *and* results in a performance gain. The performance gains on AMD hardware are as expected (around 10-15%). However, since nVidia hardware now benefits from this feature as well, the difference between AMD and nVidia hardware is smaller than in other async compute scenarios.

Important to note also is that both nVidia and AMD are part of FutureMark’s Benchmark Development Program: http://www.futuremark.com/business/benchmark-development-program

As such, both vendors have been actively involved in the development process, had access to the source code throughout the development of the benchmark, and have actively worked with FutureMark on designing the tests and optimizing the code for their hardware. If anything, Time Spy might not be representative of games because it is actually MORE fair than various games out there, which are skewed towards one vendor.

So not only does Time Spy exploit async compute very well on AMD hardware (as AMD themselves attest to here: http://radeon.com/radeon-wins-3dmark-dx12/), but Time Spy *also* exploits async compute well on nVidia hardware. Most other async compute games/benchmarks were optimized by/for AMD hardware alone, and as such do not represent how nVidia hardware would perform with this feature, since it is not even enabled in the first place. We will probably see more games that benefit as much as Time Spy does on nVidia hardware, once they start optimizing for the Pascal architecture as well. And once that happens, we can judge how well Time Spy has predicted the performance. Currently, DX12/Vulkan titles are still too much of a vendor-specific mess to draw any fair conclusions (eg. AoTS and Hitman are AMD-sponsored, ROTR is nVidia-sponsored, DOOM Vulkan doesn’t have async compute enabled for nVidia (yet?), and uses AMD-specific shader extensions).

Too bad, Jelle. Next time, please try to do some research on the topic, and get your facts straight.

Posted in Direct3D, Hardware news, Software development, Software news, Vulkan | 5 Comments

GeForce GTX1060: nVidia brings Pascal to the masses

Right, we can keep this short about the GTX1060… It does exactly what you’d expect: it scales down Pascal as we know it from the GTX1070 and GTX1080 to a smaller, cheaper chip, aimed at the mainstream market. The card is functionally exactly the same, apart from the missing SLI connector.

But let’s compare it to the competition, the RX480. And as this is a technical blog, I will disregard price. Instead, I will concentrate on the technical features and specs.

Radeon RX480:

Die size: 230 mm²
Process: GloFo 14 nm FinFET
Transistor count: 5.7 billion
Memory bandwidth: 256 GB/s
Memory bus: 256-bit
Memory size: 4/8 GB
TDP: 150W
DirectX Feature level: 12_0

GeForce GTX1060:

Die size: 200 mm²
Process: TSMC 16 nm FinFET
Transistor count: 4.4 billion
Memory bandwidth: 192 GB/s
Memory bus: 192-bit
Memory size: 6 GB
TDP: 120W
DirectX Feature level: 12_1

And well, if we were to go just by these numbers, the Radeon RX480 looks like a sure winner. On paper it all looks very strong. You’d almost think it’s a slightly more high-end card, given the higher TDP, the larger die, higher transistor count, higher TFLOPS rating, more memory and more bandwidth (the specs are ~30% higher than those of the GTX1060). In fact, the memory specs are identical to those of the GTX1070, as is the TDP.

But that is exactly where Pascal shines: thanks to the excellent efficiency of this architecture, the GTX1060 is as fast as or faster than the RX480 in pretty much every benchmark you care to throw at it. If this were to come to a price war, nVidia would easily win: their GPU is smaller, their PCB can be simpler because of the narrower memory interface and the lower power consumption, and they can use a smaller/cheaper cooler because they have less heat to dissipate. So the cost of building a GTX1060 will be lower than that of an RX480.

Anyway, speaking of benchmarks…

Time Spy

FutureMark recently released a new benchmark called Time Spy, which uses DirectX 12, and makes use of that dreaded async compute functionality. As you may know, this was one of the points that AMD has marketed heavily in their DX12-campaign, to the point where a lot of people thought that:

  1. AMD was the only one supporting the feature
  2. Async compute is the *only* new feature in DX12
  3. All gains that DX12 gets, come from using async compute (rather than the redesign of the API itself to reduce validation, implicit synchronization and other things that may reduce efficiency and add CPU overhead)

Now, the problem is… Time Spy actually showed that GTX10x0-cards gained performance when async compute was enabled! Not a surprise to me of course, as I already explained earlier that nVidia can do async compute as well. But many people were convinced that nVidia could not do async compute at all, not even on Pascal. In fact, they seemed to believe that nVidia hardware could not even process graphics and compute in parallel, period. And if you take that as absolute truth, then you have to ‘explain’ this by FutureMark/nVidia cheating in Time Spy!

Well, of course FutureMark and nVidia are not cheating, so FutureMark revised their excellent Technical Guide to deal with the criticisms, and also published an additional press release regarding the ‘criticism’.

This gives a great overview of how the DX12 API works with async compute, and how FutureMark made use of this feature to boost performance.

And if you want to know more about the hardware-side, then AnandTech has just published an excellent in-depth review of the GTX1070/1080, and they dive deep into how nVidia performs asynchronous compute and fine-grained pre-emption.

I was going to write something about that myself, but I think Ryan Smith did an excellent job, and I don’t have anything to add to that. TL;DR: nVidia could indeed do async compute, even on Maxwell v2. The scheduling was not very flexible however, which made it difficult to tune your workload to get proper gains. If you got it wrong, you could receive considerable performance hits instead. Therefore nVidia decided not to run async code in parallel by default, but just serialize it. The plan may have been to ‘whitelist’ games that are properly optimized, and do get gains. We see that even in DOOM, the async compute path is not enabled yet on Pascal. But the hardware certainly is capable of it, to a certain extent, as I have also said before. Question is: will anyone ever optimize for Maxwell v2, now that Pascal has arrived?

Update: AMD has put a blog-post online talking about how happy they are with Time Spy, and how well it pushes their hardware with async compute: http://radeon.com/radeon-wins-3dmark-dx12

I suppose we can say that AMD has given Time Spy its official seal-of-approval (publicly, that is. They already approved it within the FutureMark BDP of course).

Posted in Direct3D, Hardware news, OpenGL, Software development, Vulkan | 21 Comments

AMD’s Polaris debuts in Radeon RX480: I told you so

In a recent blogpost, after dealing with the nasty antics of a deluded AMD fanboy, I already discussed what we should and should not expect from AMD’s upcoming Radeon RX480.

Today, the NDA was lifted, and reviews appear everywhere on the internet. Cards are also becoming available in shops, and street prices become known. I will make this blogpost very short, because I really can’t be bothered:

I told you so. I told you:

  1. If AMD rates the cards at 150W TDP, they are not magically going to be significantly below that. They will be in the same range of power as the GTX970 and GTX1070.
  2. If AMD makes a comparison against the GTX970 and GTX980 in some slides, then that is apparently what they think they will be targeting.
  3. If AMD does not mention anything about DX12_1 or other fancy new features, it won’t have any such things.
  4. You only go for an aggressive pricing strategy if you don’t have anything else in the sense of a unique selling point.

And indeed, all this rings true. Well, with 3. there is a tiny surprise: AMD actually makes some vague claims about a ‘foveated rendering’ feature. But at this point it is not entirely clear what it does, how developers should use it, let alone how it performs.

So, all this shows just how good nVidia’s Maxwell really is. As I said, AMD is one step behind, because they missed the refresh-cycle that nVidia did with Maxwell. And this becomes painfully clear now: even though AMD moved to 14nm FinFET, their architecture is so much less efficient that they can only now match the performance-per-watt that Maxwell achieved at 28 nm. Pascal is on a completely different level. Aside from that, Maxwell already has the DX12_1 featureset.

All this adds up to Polaris being too-little-too-late, which has become a time-honoured AMD tradition by now. At first only in the CPU department, but lately the GPU department appears to have fallen into the same pattern.

So what do you do? You undercut the prices of the competition. Another time-honoured AMD tradition. This is all well-and-good for the short term. But nVidia is going to launch those GTX1050/1060 cards eventually (and rumour has it that it will be sooner rather than later), and then nVidia will have the full Pascal efficiency at its disposal to compete with AMD on price. This is a similar situation to the CPU department again, where Intel’s CPUs are considerably more efficient, so Intel can reach the same performance/price levels with much smaller CPUs, which are cheaper to produce. So AMD is always on the losing end of a price war.

Sadly, the street prices are currently considerably higher than what AMD promised us a few weeks ago. So even that is not really working out for them.

Right, I think that’s enough for today. We’ll probably pick this up again soon when the GTX1060 surfaces.

Posted in Hardware news | 52 Comments

GameWorks vs GPUOpen: closed vs open does not work the way you think it does

I often read people claiming that GameWorks is unfair to AMD, because they don’t get access to the source code, and therefore they cannot optimize for it. I cringe every time I read this, because it is wrong on so many levels. So I decided to write a blog about it, to explain how it REALLY works.

The first obvious mistake is that although GPUOpen itself may be open source, the games that use it are not. What this means is that when a game decides to use an open source library, the code is basically ‘frozen in time’ as soon as they build the binaries, which you eventually install when you want to play the game. So even though you may have the source code for the effect framework, what are you going to do with it? You do not have the ability to modify the game code (eg DRM and/or anti-cheat will prevent you from doing this). So if the game happens to be unoptimized for a given GPU, there is nothing you can do about it, even if you do have the source.

The second obvious mistake is the assumption that you need the source code to see what an effect does. This is not the case, and in fact, in the old days, GPU vendors generally did not have access to the source code anyway (it might sound crazy, but in the old days, game developers actually developed games, as in, they developed the whole renderer and engine. Hardware suppliers supplied the hardware). These days, they have developer relations programs, and tend to work with developers more closely, which also involves getting access to source code in some cases (sometimes to the point where the GPU vendor actually does some/a lot of the hard work for them). But certainly not always (especially when the game is under the banner of the competing GPU vendor, such as Gaming Evolved or The Way It’s Meant To Be Played).

So, assuming you don’t get access to the source code, is there nothing you can do? Well no, on the contrary. In most cases, games and effect frameworks (even GameWorks) generally just perform standard Direct3D or OpenGL API calls. There are various game development tools available to analyze D3D or OpenGL code. For example, there is Visual Studio Graphics Diagnostics: https://msdn.microsoft.com/en-us/library/hh873207.aspx

Basically, every game developer already has the tools to study which API calls are made, which shaders are run, which textures and geometry are used, how long every call takes etc. Since AMD and nVidia develop these Direct3D and OpenGL drivers themselves, they can include even more debugging and analysis options into their driver, if they so choose.

So in short, it is basically quite trivial for a GPU vendor to analyze a game, find the bottlenecks, and then optimize the driver for a specific game or effect (you cannot modify the game, as stated, so you have to modify the driver, even if you would have the source code). The source code isn’t even very helpful with this, because you want to find the bottlenecks, and it’s much easier to just run the code through an analysis-tool than it is to study the code and try to deduce which parts of the code will be the biggest bottlenecks on your hardware.

The only time GPU vendors actually want/need access to the source code is when they want to make fundamental changes to how a game works, either to improve performance, or to fix some bug. But even then, they don’t literally need access to the source code, they need the developer to change the code for them and release a patch to their users. Sometimes this requires taking the developer by the hand through the source code, and making sure they change what needs to be changed.

So the next time you hear someone claiming that GameWorks is unfair because AMD can’t optimize for it, please tell them they’re wrong, and explain why.


Posted in Direct3D, OpenGL, Software development, Software news | 19 Comments

The damage that AMD marketing does

Some of you may have seen the actions of a user who goes by the name of Redneckerz on a recent blogpost of mine. That guy posts one wall of text after the next, full of anti-nVidia rhetoric, shameless AMD promotion, and an endless slew of personal attacks and fallacies.

He even tries to school me on what I may or may not post on my own blog, and how I should conduct myself. Which effectively comes down to me having to post *his* opinions. I mean, really? This is a *personal* blog. Which means that it is about the topics that *I* want to discuss, and I will give *my* opinion on them. You don’t have to agree with that, and that is fine. You don’t have to visit my blog if you don’t like to read what I have to say on a given topic. In fact, I even allow people to comment on my blogs, and they are free to express their disagreements.

But there are limits. You can express your disagreement once, twice, perhaps even three times. But at some point, when I’ve already given several warnings that we are not going to ‘discuss’ this further and need to keep things on-topic, you just have to stop. If not, I will make you stop by removing (parts of) your comments that are off-limits. After all, nobody is waiting for people to endlessly spew the same insults and keep making the same demands. It’s just a lot of noise that prevents other people from having a pleasant discussion (and before you call me a hypocrite: I may delete the umpteenth repeat of a given post, but I left the earlier ones alone, so it’s not like I don’t allow you to express your views at all).

In fact, I think even without the insults, the endless walls of text that Redneckerz produces are annoying enough. He keeps repeating himself everywhere. And that is not just my opinion. Literally all other commenters on that item have expressed their disapproval of Redneckerz’ posting style (which is more than a little ironic, given the fact that at least part of Redneckerz’ agenda is to try and paint my posting style as annoying and unwanted).

Speaking about the feedback of other users, they also called him out on having an agenda, namely promoting AMD. Which seems highly likely, given the sheer amount of posts he fires off, and the fact that their content is solely about promoting AMD and discrediting nVidia.

The question arose mainly whether he was just a brainwashed victim of AMD’s marketing, or whether AMD would actually be compensating him for the work he puts in. Now, as you can tell from the start of the ‘conversation’, this was not my first brush with Redneckerz. I had encountered him on another forum some time ago, and things went mostly the same. He attacked me in various topics where I contributed, in very much the same way as here: an endless stream of replies with walls-of-text, and poorly conceived ideas. At some point he would even respond to other people, mentioning my name and speculating what my reply would have been. However, I have not had contact with him since, and Redneckerz just came to my blog out of the blue, and started posting like a maniac here. One can only speculate what triggered him to do that at this moment (is it a coincidence that both nVidia and AMD are in the process of launching their new 16nm GPU lineups?)

Now, if Redneckerz was just a random forum user, we could leave it at that. But in fact, he is an editor for a Dutch gaming website, Gamed.nl: http://www.gamed.nl/editors/215202

That makes him a member of the press, so the plot thickens… I contacted that website, to inform them that one of their editors was running rampant on my blog and other forums, and that they might want to take action, because it’s not exactly good publicity for their site either. I got some nonsensical response about how they were not responsible for what their editors post on other sites. So I replied that this isn’t about who is responsible, but that what they could do is talk some sense into him, for the benefit of us all.

Again, they were hiding behind the ‘no responsibility’-guise. So basically they support his conduct. Perhaps they are in on the same pro-AMD thing that he is, whatever that is exactly.

I’ve already talked about that before, in general, in my blog related to the release of DirectX 12. About how the general public is being played by AMD, developers and journalists. Things like Mantle, async compute, HBM, how AMD allegedly has an advantage in games because they supply console APUs and whatnot. This nonsense has become so omnipresent that people think this is actually the reality. Even though benchmarks and sales figures prove the opposite (eg, nVidia’s GTX960 and GTX970 are the most popular cards among Steam users by a margin: http://store.steampowered.com/hwsurvey/videocard/).

Just like we have to listen to people claiming Polaris is going to save AMD. Really? The writing is already on the wall: AMD’s promotional material showed us a slide with two all-important bits of information:



First, we see them compare against the GeForce GTX970/980. Secondly, we see them stating a TDP of 150W. So the performance target will probably be between the GTX970 and GTX980 (and the TFLOPS rating also indicates that ballpark), and the power envelope will be around 150W. They didn’t just put these numbers on there at random. The low-balling price tag is also a tell-tale sign. AMD is not a charitable organization; they’re in this business to make money. They don’t sell their cards at $199 to make us happy. They sell them at $199 because they’ve done the maths, and $199 is their sweet spot for regaining marketshare while still making enough profit: desperately trying to keep people from buying more of those GTX960/970/980 cards until AMD gets their new cards on the market. If they had a killer architecture, they’d charge a premium, because they could get away with it. nVidia should have little trouble matching that price/performance target with their upcoming 1050/1060.

Which matches exactly with how I described the situation AMD is in: they are one ‘refresh’ behind on nVidia, architecture-wise, since they ‘skipped’ Maxwell, where nVidia concentrated on maximizing performance/watt, since they were still stuck at 28 nm. I said that it would be too risky for AMD to do the shrink to 16 nm and at the same time, also do a major architectural overhaul. So it would be unlikely for AMD to completely close the gap that nVidia had opened with Maxwell. And that appears to be what we see with Polaris. When I said it, I was accused of being overly negative towards AMD. In fact, Kyle Bennett of HardOCP said basically the same thing. And he was also met by a lot of pro-AMD people who attacked him. After AMD released their information on Polaris however, things went a bit quiet on that side. We’ll have to wait for the actual release and reviews at the end of this month, but the first signs don’t point to AMD having an answer to match Pascal.

The sad part is that it always has to go this way. You can’t say anything about AMD without tons of people attacking you. Even if it’s the truth. Remember John Fruehe? Really guys, I’m trying to do everyone a favour by giving reliable technical info, instead of marketing BS. I can do that, because I actually have a professional background in the field, and have a good hands-on understanding of CPU internals, GPU internals, rendering algorithms and APIs. Not because I’m being paid to peddle someone’s products, no matter how good or bad they are.

In fact, a lot of the comments I make aren’t so much about AMD’s products themselves, but rather about their inflated and skewed representation in the media.

Posted in Hardware news, Science or pseudoscience? | 25 Comments

Commander Keen 4: now in 16-colour composite CGA!

Perhaps you have already heard it through the grapevine, but just to be sure, I’d like to mention that VileR (of 8088 MPH fame) has patched the Keen 4 code and redone all the graphics to make the most of the 16-colour composite CGA mode.

It started with a tweet from John Carmack, after he had seen the video from the 8-bit guy covering CGA graphics (and featuring 8088 MPH):

I never knew about composite CGA optimizations!

So the original Keens had graphics designed for 4-colour RGBI mode only. Well, challenge accepted!

VileR has documented everything quite nicely on his blog. And you can find the download links and some discussion in this Vogons thread. So enjoy reading all about it!

I will leave you with some captures from my own PCs.

First my IBM 5160 with a new-style CGA card:

Then my Commodore PC10-III (8088 at 9.54 MHz) with its onboard Paradise PVC4:

And finally, the PC10-III with my ATi Small Wonder card:

The graphics were originally done for old-style CGA. As you can see, the new-style CGA is more saturated, but still acceptable.

The Paradise PVC4 is very saturated as well, and the colours are slightly off, but still somewhat recognizable.

The ATi Small Wonder is completely off however.

Posted in Oldskool/retro programming, Software development | Leave a comment

Thought: Neutral and objective versus politically correct.

I suppose most people don’t understand the difference between being neutral/objective and being politically correct. A neutral and objective observer can still be critical about what he observes. A politically correct observer cannot voice any criticism.

I suppose you could say that being neutral/objective is being at the origin, while being politically correct is being exactly halfway between the two extremes. When one extreme is larger than the other, the two are not the same.
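To make that analogy concrete (this is my own formalization): if the two extremes sit at $-a$ and $+b$ on some axis, then

```latex
\text{neutral} = 0,
\qquad
\text{politically correct} = \frac{(-a) + b}{2} = \frac{b - a}{2}
```

and the two positions coincide only when $a = b$, i.e. only when both extremes happen to be equally far out.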

My aim is to be as neutral and objective as possible. I have no desire whatsoever to be politically correct though. I voice whatever criticism I see fit.

Posted in Uncategorized | 6 Comments