Well, DirectX 12 has launched, together with a new version of Windows (Windows 10) and an improved driver model (WDDM 2.0).
Because, if you remember, we also had:
- Windows Vista + DirectX 10 + WDDM 1.0
- Windows 7 + DirectX 11 + WDDM 1.1
- Windows 8 + DirectX 11.1 + WDDM 1.2
- Windows 8.1 + DirectX 11.2 + WDDM 1.3
So, this has more or less been the pattern since 2006. Now, if we revisit AMD and Mantle, their claim was that introducing Mantle is what triggered Microsoft to work on DirectX 12. But since DX12 is released together with Windows 10 and WDDM 2.0, AMD is effectively claiming that they triggered Microsoft to rush out Windows 10.
Also, since WDDM 2.0 is quite a big overhaul of the driver model, and both Windows 10 and DirectX 12 are built around this new driver model, Microsoft would have had to work out WDDM 2.0 first, before they could design the DirectX 12 runtime on top of it.
An interesting tweet in this context is by Microsoft’s Phil Spencer:
We knew what DX12 was doing when we built Xbox One.
This clearly implies that the ideas behind DX12 existed prior to the development of the Xbox One, and probably also the idea of using a single OS and API on all devices, from phones to desktops to game consoles (we already saw this trend with Windows 8.x/RT/Phone).
Perhaps the real story here is that Sony rushed Microsoft with the PS4, so Microsoft had to develop Direct3D 11.x as an ‘in-between’ API, because they couldn’t afford to delay the Xbox One until Windows 10 and DirectX 12 were ready. I guess we will never know. We do see signs of this ‘one Windows’ approach in early news about the Xbox One (then still known as the ‘Xbox 720’), in articles going back as far as 2012 though. And we know that Direct3D 11 was used on Windows Phone as-is, which makes Windows 8 already mostly a ‘single OS on every platform’, with the Xbox One being the only exception (the Xbox actually does run a stripped-down version of Windows 8 for apps and the general UI, but not for games).
Now, if we move from the software to the hardware, there are some other interesting peculiarities. As I mentioned earlier, AMD does not have support for feature level 12_1 in their latest GPUs, which were launched as their official DirectX 12 line. I think it is even more telling that they do not support HDMI 2.0 either, and that all their ‘new’ GPUs are effectively rehashes of the GCN1.1 or GCN1.2 architecture. The only new feature is HBM support, and nothing has changed in the GPU architecture itself.
It seems that the AMD camp has already started its anti-12_1 offensive. I have already read people claiming that “DX12.1”, as nVidia advertises it, is not an official standard:
I’ve yet to find anything about DX12.1 that isn’t from nVidia, so it’s either an nVidia-specific extension to DX12 (e.g. like DX9a) or it’s a minor addendum (e.g. like DX10.1). Either way it appears the GeForce 900 series are the only thing that support it, and if that’s the case, it’s unlikely to be very important in the long run as obtuse/narrowly supported features tend to be passed over (e.g. like DX9a or 10.1, or other things like TerraScale or Ultra Shadow). Of course history may prove this assumption wrong, but that’s my guess. The Overclock.net link above includes slides from an nVidia PR presentation that shows a few features for DX12 and 12.1; perhaps others can find more about this.
Or this little gem on Reddit:
the guy in that blog surely show an anti amd bias. I would not dig too deep into his comments.
He spent a whole blog post complaining that amd does not have conservative rasterization to a standard that has not been released….
Erm yes, because AMD, having released new videocards less than a month before the official release of DirectX 12, is going to come up with ANOTHER new line of videocards with 12_1 support right now? Well no, we’ll be stuck with these GPUs for quite some time. If AMD had GPUs with 12_1 support just around the corner, they wouldn’t have put all that effort into releasing the 300/Fury line now. A new architecture is likely still more than a year away. AMD’s roadmap does not say much beyond HBM2 support in 2016.
The plot thickens, as it seems that Intel’s Skylake GPU actually does support 12_1 as well, and in fact supports even higher tiers than nVidia does (Intel does not advertise ‘DX12.1’ as nVidia does, but they do advertise ‘DX11.3’, which as we know is a special update that includes the new 12_1 features, as mentioned earlier with the introduction of Maxwell v2). Clearly AMD has dropped the ball with DX12. It looks like they simply no longer have the resources to develop new GPUs in time for new APIs. Which might explain the Mantle offensive: AMD knew they couldn’t deliver new hardware in time, but they could deliver a ‘DX12-lite’ API before Microsoft was ready with the real DX12.
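For what it’s worth, ‘feature level 12_1’ is not just a marketing label: it is something an application can query directly from the D3D12 runtime. A minimal sketch (the helper function name is mine), assuming a D3D12 device has already been created:

```cpp
#include <d3d12.h>

// Ask the driver/runtime which feature levels this device exposes.
bool SupportsFeatureLevel12_1(ID3D12Device* device)
{
    const D3D_FEATURE_LEVEL requested[] = {
        D3D_FEATURE_LEVEL_11_0, D3D_FEATURE_LEVEL_11_1,
        D3D_FEATURE_LEVEL_12_0, D3D_FEATURE_LEVEL_12_1
    };

    D3D12_FEATURE_DATA_FEATURE_LEVELS levels = {};
    levels.NumFeatureLevels        = _countof(requested);
    levels.pFeatureLevelsRequested = requested;

    if (SUCCEEDED(device->CheckFeatureSupport(
            D3D12_FEATURE_FEATURE_LEVELS, &levels, sizeof(levels))))
    {
        return levels.MaxSupportedFeatureLevel >= D3D_FEATURE_LEVEL_12_1;
    }
    return false;
}
```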
Lower CPU overhead, for whom?
The main point of Mantle was supposed to be lower CPU overhead. But is that all that relevant for the desktop? It doesn’t seem that way; Mantle didn’t exactly revolutionize gaming. What about consoles then? Well no, consoles have always had special APIs with low-level access anyway, so they wouldn’t really need Mantle or DX12 either.
But there are other devices out there with GPUs and much weaker CPUs: mobile devices.
That might have been Microsoft’s main goal: phones and tablets, getting higher performance and better battery life out of these devices. This is also what we see with Apple’s Metal, which was launched primarily as a new API for iOS. That is not just a coincidence.
Mantle is DX12?
And what of these claims that Microsoft copied Mantle? There was even a claim that the documentation was mostly the same, with some screenshots of alleged documentation of both, and alleged similarities. Now that the final documentation is out, it is clear that the two are not all that similar at the API level. DX12 still uses a lightweight COM approach, whereas Mantle uses a flat procedural approach. But most importantly, DX12 has some fundamental differences from Mantle, such as the distinction between bundles and command lists; Mantle only has ‘command buffers’. Again, it looks like Mantle is just a simplified version, rather than Microsoft cloning Mantle.
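To illustrate that difference, here is a minimal sketch (not taken from either API’s documentation; the function and variable names are mine, error handling is omitted): in DX12 a bundle is recorded once through the same COM interface as a regular command list, and then replayed from a direct command list every frame. Mantle’s flat command buffers have no such two-level structure.

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Record a small, reusable chunk of work into a bundle, once.
// The allocator must stay alive for as long as the bundle does.
ComPtr<ID3D12GraphicsCommandList> RecordBundle(
    ID3D12Device* device, ID3D12PipelineState* pso,
    ID3D12RootSignature* rootSig, const D3D12_VERTEX_BUFFER_VIEW& vbView,
    ComPtr<ID3D12CommandAllocator>& bundleAlloc)
{
    ComPtr<ID3D12GraphicsCommandList> bundle;
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_BUNDLE,
                                   IID_PPV_ARGS(&bundleAlloc));
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_BUNDLE,
                              bundleAlloc.Get(), pso, IID_PPV_ARGS(&bundle));

    bundle->SetGraphicsRootSignature(rootSig);
    bundle->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    bundle->IASetVertexBuffers(0, 1, &vbView);
    bundle->DrawInstanced(36, 1, 0, 0);
    bundle->Close();
    return bundle;
}

// Every frame, from a direct (D3D12_COMMAND_LIST_TYPE_DIRECT) command list:
//   directList->ExecuteBundle(bundle.Get());
```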
Time for some naming and shaming?
Well, we all know Richard Huddy by now, I suppose. He has made all sorts of claims about Mantle and DX11, changing his story over time. But what about some of the other people involved in this Mantle marketing scheme? I get a very bad taste in my mouth from all these ‘developers’ involved with AMD-related promotions.
First there is Johan ‘repi’ Andersson (DICE). I wonder if he really believed the whole Mantle thing, even including the early claims that there was no DX12 coming. He sure played along with the whole charade. I wonder how he feels now that AMD has pulled the plug on Mantle after little more than a year, with only a handful of games ever supporting Mantle at all, some of them not even faster than DX11. It appears he has been living in an AMD vacuum as well, making claims such as that DX11 multithreading was broken.
What he really meant to say was that AMD’s implementation of DX11 multithreading was broken, which you can see in Futuremark’s API Overhead test.
As you can see, there is virtually no scaling on AMD hardware. nVidia, however, gets quite reasonable scaling out of DX11 multithreading. Sure, Mantle and DX12 are better, but nevertheless, DX11 multithreading is not completely broken. The problem is in AMD’s implementation: AMD’s drivers cannot prepare a native command buffer beforehand. So the command list recorded on each thread is merely stored, and only when the actual draw commands are issued on the main thread does AMD’s driver patch up the native command buffer with the current state. This effectively makes it a single-threaded implementation. As nVidia shows, this is not a DX11 limitation; it *is* possible to make DX11 multithreading scale (and in fact, even single-threaded DX11 scales somewhat on CPUs with more cores, so it seems that nVidia also does some multithreading of their own at a lower level).
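For reference, this is the standard DX11 mechanism we are talking about (a minimal sketch; the helper function name is mine and the per-thread draw code is omitted): each worker thread records into its own deferred context, and only the finished command lists are played back on the immediate context. According to the above, the playback step is exactly where AMD’s driver ends up doing all the real work on the main thread.

```cpp
#include <d3d11.h>

// One of these runs per worker thread: record into a deferred context,
// then hand the finished command list back to the main thread.
void RecordOnWorkerThread(ID3D11Device* device, ID3D11CommandList** outList)
{
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);

    // ... record state changes and draw calls on 'deferred' here ...

    // FALSE: don't bother saving/restoring the deferred context state.
    deferred->FinishCommandList(FALSE, outList);
    deferred->Release();
}

// On the main thread, once the workers are done:
//   immediateContext->ExecuteCommandList(cmdList, FALSE);
//   cmdList->Release();
```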
Then I ran into another developer, Jasper ‘PrisonerOfPain’ Bekkers (DICE). He was active on some forums, and was doing some promotion of Mantle there as well, making claims about DX11 that were simply not true: claims that DX11 could not do certain things. When I pointed out the DX11 features that do the things he claimed were not possible, he changed his story somewhat, toning it down to the claim that Mantle would be able to do the same more efficiently, in theory. Which is something I never denied, as you know. I merely said that the gains would not be of revolutionary proportions. Which we now know is true.
And a third Mantle developer I ran into on some forums was Sylvester ‘.oisyn’ Hesp (Nixxes). He also made various claims about Mantle, DX11 and DX12, none of which held up in the end, as more became known about DX12 and the future of Mantle. He also made some very dubious claims, which make me wonder how well he even understands the hardware in the first place (I suppose we oldskool coders have a slightly different idea of what ‘understanding the hardware’ really means than the newer generation does). He literally claimed that an API design such as DX12 could have been used as far back as the DX8 era. Now, firstly, such a claim is quite preposterous, because you’re basically saying that Microsoft and the IHVs involved with the development of DX have been completely clueless all these years, and with DX12 they suddenly ‘saw the light’… Secondly, it demonstrates a clear lack of understanding of what problem DX12 is actually trying to solve.
That problem is about managing resources and pipeline states, in order to reduce CPU-overhead on the API/driver side. In the world of DX8, we had completely different usage patterns of resources and pipeline states. We had much slower GPUs with much less memory, and much more primitive pipelines and programmability. So the problems we faced back then were quite different from those today, and DX12 would probably be less efficient at handling GPUs and workloads of that era than DX8 was.
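To make that concrete: in DX12, ‘managing pipeline state’ means baking the entire shader/blend/rasterizer/depth configuration into one immutable object up front, so the driver has nothing left to validate at draw time. A minimal sketch (the helper function name is mine; the root signature and compiled shader blobs are assumed to exist):

```cpp
#include <d3d12.h>
#include "d3dx12.h"  // helper header providing the CD3DX12_* default descs

// Bake the full pipeline configuration into a single PSO, once, at load time.
// At draw time the command list just binds the prebuilt PSO; in DX8..DX11 the
// equivalent state was set piecemeal and validated per draw call.
ID3D12PipelineState* CreateSimplePSO(ID3D12Device* device,
                                     ID3D12RootSignature* rootSig,
                                     ID3DBlob* vsBlob, ID3DBlob* psBlob)
{
    D3D12_GRAPHICS_PIPELINE_STATE_DESC desc = {};
    desc.pRootSignature        = rootSig;
    desc.VS                    = { vsBlob->GetBufferPointer(), vsBlob->GetBufferSize() };
    desc.PS                    = { psBlob->GetBufferPointer(), psBlob->GetBufferSize() };
    desc.RasterizerState       = CD3DX12_RASTERIZER_DESC(D3D12_DEFAULT);
    desc.BlendState            = CD3DX12_BLEND_DESC(D3D12_DEFAULT);
    desc.SampleMask            = UINT_MAX;
    desc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
    desc.NumRenderTargets      = 1;
    desc.RTVFormats[0]         = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count      = 1;

    ID3D12PipelineState* pso = nullptr;
    device->CreateGraphicsPipelineState(&desc, IID_PPV_ARGS(&pso));
    return pso;
}
```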
And there are more developers, or at least people who pretend to be developers, who have made false claims about AMD and Mantle. Take the comment from someone calling himself ‘Tom’ on an earlier blog of mine about DirectX 11.3 and nVidia’s Maxwell v2. In that blog I pointed out that there had been no indication of current or future AMD hardware being capable of these new features. ‘Tom’ claimed that conservative rasterization and rasterizer ordered views would be possible on existing GCN hardware through Mantle.
Well, DirectX 12 is out now, and apparently AMD could not make it work in their drivers, because they do not expose this functionality.
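To be clear about what ‘expose’ means here: whether these features are available is simply a capability the application queries from the driver. A minimal sketch (the helper function name is mine), assuming a D3D12 device has been created:

```cpp
#include <d3d12.h>

// Ask the driver which of the 12_1 features it exposes.
bool HasConservativeRasterAndROVs(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS opts = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                           &opts, sizeof(opts))))
        return false;

    return opts.ROVsSupported &&
           opts.ConservativeRasterizationTier !=
               D3D12_CONSERVATIVE_RASTERIZATION_TIER_NOT_SUPPORTED;
}
// On current GCN parts, per the above, both come back as 'not supported'.
```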
Or take Angelo Pesce with his ‘C0DE517E’ blog, whom I covered in an earlier post. Well, on the desktop, GCN has not been very relevant at all since the introduction of Maxwell. AMD has been losing market share like mad, and is currently at an all-time low, dropping fast.
And don’t get me started on Oxide… First they had their Star Swarm benchmark, which was made only to promote Mantle (AMD sponsors them via the Gaming Evolved program), by showing that bad DX11 code is bad. Really, they show DX11 code that runs at single-digit framerates on most systems, while not exactly producing world-class graphics. Why isn’t the first response of most people the sane one: “But wait, we’ve seen tons of games doing similar stuff in DX11 or even older APIs, running much faster than this. You must be doing it wrong!”?
But here Oxide is again, in the news… This time they have another ‘benchmark’ (do these guys ever actually make any games?), namely “Ashes of the Singularity”.
And, surprise surprise, again it performs like a dog on nVidia hardware. And again, in a way that doesn’t make sense at all… The figures show it is actually *slower* in DX12 than in DX11, but somehow this is spun into a DX12 hardware deficiency on nVidia’s side. Now, if the game can reach a certain level of performance in DX11, clearly that is the baseline of performance you should also get in DX12, because that is simply what the hardware is capable of using only DX11-level features. Using the newer API, and optionally using new features, should only make things faster, never slower. That’s just common sense.
Now, Oxide actually goes as far as claiming that nVidia does not actually support asynchronous shaders. Oh really? Well, I’m quite sure that there is hardware in Maxwell v2 to handle this (nVidia has had asynchronous shader support in CUDA for years, via a technology they call Hyper-Q, long before AMD had any such thing. The only change in DX12 is that a graphics shader should be able to run in parallel with the compute shaders. That is not something that would be terribly difficult to add to nVidia’s existing architecture, so it is quite implausible that nVidia didn’t do this properly, or even ‘forgot’ about it). This is also what nVidia’s drivers report to the DX12 API, and it is well-documented in the various hardware reviews on the web.
It is unlikely for nVidia to expose functionality to DX12 applications if it is only going to make performance worse. That just doesn’t make any sense.
There’s now a lot of speculation out there on the web, by fanboys/’developers’, trying to spin whatever information they can find into an ‘explanation’ of why nVidia would allegedly be lying about their asynchronous shaders (they’ve been hacking at Ryan Smith’s article on AnandTech for ages now, claiming it has false info). The bottom line is: nVidia’s architecture is not the same as AMD’s. You can’t just compare things such as ‘engines’ and ‘queues’ without taking into account that they mean vastly different things depending on which architecture you’re talking about (it’s similar to AMD’s poorly scaling tessellation implementation: just because it doesn’t scale well doesn’t mean it’s ‘fake’ or whatever. It’s just a different architecture, which cannot handle certain workloads as well as nVidia’s).
What Oxide is probably doing is the same thing they did with Star Swarm: they feed it a workload that they KNOW will choke on a specific driver/GPU (in the case of Star Swarm they sent extremely long command lists to DX11. This mostly taxed the memory management in the driver, which was never designed to handle lists of that size. nVidia fixed up their drivers to deal with it, though. It was never really an API issue; they just sent a workload that was completely unrepresentative of any realistic game workload). Again a case of bad code being bad. When you optimize a rendering pipeline for an actual game, you look for a way to get the BEST performance out of the hardware, not the worst. So worst case you don’t use asynchronous shaders, and you should get DX11-level performance as a minimum (there is no way to explicitly use asynchronous shaders in DX11). Best case you use a finely tuned workload that makes use of new features such as asynchronous shaders to boost performance.
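At the API level, ‘asynchronous shaders’ in DX12 simply mean submitting compute work on a second queue of type COMPUTE, alongside the usual graphics (DIRECT) queue. Whether, and how well, the two actually overlap is up to the hardware and the driver’s scheduling, which is exactly where nVidia and AMD differ. A minimal sketch (the helper function and variable names are mine; fencing between the queues is only hinted at):

```cpp
#include <d3d12.h>

// Create a dedicated compute queue next to the usual graphics (DIRECT) queue.
ID3D12CommandQueue* CreateComputeQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;

    ID3D12CommandQueue* computeQueue = nullptr;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
    return computeQueue;
}

// Per frame, the two queues are fed independently and may overlap:
//   graphicsQueue->ExecuteCommandLists(1, &graphicsList); // DIRECT queue
//   computeQueue->ExecuteCommandLists(1, &computeList);   // COMPUTE queue
// Use an ID3D12Fence to synchronize wherever the compute results are consumed.
```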
It sounds like Oxide is just quite clueless in general, and that isn’t the first time. Remember this?
With relatively little effort by developers, upcoming Xbox One games, PC Games and Windows Phone games will see a doubling in graphics performance.
Suddenly, that Xbox One game that struggled at 720p will be able to reach fantastic performance at 1080p. For developers, this is a game changer.
The results are spectacular. Not just in theory but in practice (full disclosure: I am involved with the Star Swarm demo which makes use of this kind of technology.)
Microsoft never claimed any performance benefits for DX12 on the Xbox at all, and pointed out that DX11.x on the Xbox One already gave you these performance benefits over regular DX11. Besides, DX12 gives you performance benefits on the CPU side, while making the Xbox One go from 720p to 1080p would require more fillrate on the GPU side, which is not something any API can deliver (if the Xbox One were CPU-limited, you could just bump up the resolution to 1080p for free in the first place). Oxide has a pretty poor track record here, spreading dubious benchmarks and outright wrong information.
What is interesting though, is that AMD’s Robert Hallock has FINALLY admitted that DirectX 12 is not just Microsoft stealing Mantle, but Microsoft’s own creation:
DX12 it’s Microsoft’s own creation, but we’re hugely enthusiastic supporters of any low-overhead API. 🙂
Glad we got that settled.
So basically, not a lot of what we heard about AMD and Mantle turned out to be true. As I have been saying all along. Welcome to the era of Windows 10 and DirectX 12. These are going to be interesting times for game engines and rendering technology!
Edit: There have been some updates on the async compute shader story between nVidia, AMD and Oxide. See ExtremeTech’s coverage for the details. The short story is exactly as I said above: nVidia’s and AMD’s approaches cannot be compared directly. nVidia does indeed support async compute shaders on Maxwell v2, and indeed, there are workloads where nVidia is faster than AMD, and workloads where AMD is faster than nVidia. So Oxide did indeed (deliberately?) pick a workload that runs poorly on nVidia. Their claim that nVidia did not support it at all is a blatant lie, as are the claims of “software emulation” going around.
More specifically, nVidia’s implementation has less granularity than AMD’s, and nVidia also relies on the CPU/driver to handle some of the scheduling work. It looks like nVidia is still working on optimizing this part of the driver, so we may see improvements in async shader performance with future drivers.
So as usual, you read the truth here first 🙂