Direct3D 11.3 (and nVidia’s Maxwell Mark 2)

Posted on September 21, 2014 by Scali

A few days ago, nVidia introduced the new GTX970 and GTX980, based on the Maxwell architecture. A bit of a surprise, however, is that this is a ‘Mark 2’ version of the Maxwell architecture, with some new features that the original Maxwell in the GTX750 series did not have. A related surprise is that these features will also be supported in Direct3D 11.3, not just Direct3D 12. It appears that Microsoft wants to support Direct3D 11.3 and Direct3D 12 side-by-side, at least for the short term.

The new features include conservative rasterization, which was already mentioned earlier, at the presentation of Direct3D 12. Rasterizer ordered views were also mentioned before; they make efficient order-independent translucency possible, for example. Typed UAV loads are new as well, and have not been mentioned before as far as I know. And finally, tiled resources are now expanded from 2D to 3D volume textures.

These new features (along with nVidia’s new multiple-projection acceleration) allow for more efficient voxel-based rendering, which is especially interesting for efficient global illumination. This is something that nVidia is promoting heavily with their new cards, under the name Voxel Global Illumination (VXGI). It will be interesting to see how this will be implemented in actual games. It could be another big step up in realistic lighting.

At any rate, this shows once again that the claims made by the pro-Mantle crowd are certainly not true: we most definitely have not reached the end of development as far as rendering methods are concerned. APIs and GPUs are still being updated to support new ways of rasterizing, as I already said before. If AMD does not see it that way, that could just mean a lack of vision on their part. nVidia has once again shown that they are committed to pushing graphics technology ever further, by once again being the first to introduce new features.

We will have to see when and how AMD will respond. When will they introduce their next architecture? And will it include features such as conservative rasterization (volume tiled resources should already be supported by GCN, although for some reason D3D11.2 only received support for 2D textures)? Even more so: can AMD also improve the efficiency as much as nVidia has? Because another interesting feature of Maxwell is that despite the larger transistor count, and still being 28 nm, it is significantly less power-hungry than Kepler.
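
To make the difference between regular and conservative rasterization a bit more concrete, here is a minimal software sketch in Python. It is only a CPU-side model, not the Direct3D 11.3 API and not how any GPU implements it. The function names (`standard_coverage`, `conservative_coverage`) and the tiny 5x2 pixel grid are made up for illustration, the triangle is assumed to have counter-clockwise winding (positive signed area), and the regular path ignores fill rules.

```python
def edge(a, b, p):
    """Signed area of (a, b, p): >= 0 when p is on or to the left of edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def standard_coverage(tri, width, height):
    """Pixels whose centre lies inside the triangle (simplified, no fill rule)."""
    a, b, c = tri
    covered = set()
    for y in range(height):
        for x in range(width):
            centre = (x + 0.5, y + 0.5)
            if edge(a, b, centre) >= 0 and edge(b, c, centre) >= 0 and edge(c, a, centre) >= 0:
                covered.add((x, y))
    return covered


def conservative_coverage(tri, width, height):
    """Pixels whose square touches the triangle at all (separating axis test)."""
    a, b, c = tri
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    covered = set()
    for y in range(height):
        for x in range(width):
            # Axes of the pixel square: the triangle's bounding box must reach the pixel.
            if max(xs) < x or min(xs) > x + 1 or max(ys) < y or min(ys) > y + 1:
                continue
            corners = [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]
            # Axes of the triangle: reject if all four corners lie outside one edge.
            if any(all(edge(p, q, corner) < 0 for corner in corners)
                   for p, q in ((a, b), (b, c), (c, a))):
                continue
            covered.add((x, y))
    return covered


# A thin sliver that slips between the pixel centres of row 0.
sliver = [(0.6, 0.6), (3.4, 0.7), (0.6, 0.9)]
print(sorted(standard_coverage(sliver, 5, 2)))
print(sorted(conservative_coverage(sliver, 5, 2)))
```

A sliver like this misses every pixel centre, so the regular rule produces no pixels at all, while the conservative variant reports every pixel the triangle touches. That property is what makes it attractive for voxelization, where thin geometry should not slip through the grid.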

This entry was posted in Direct3D, Hardware news, OpenCL, OpenGL, Software development and tagged AMD, conservative rasterizing, D3D, Direct3D, Direct3D 11.3, Direct3D 12, DirectX, geforce, GTX970, GTX980, Maxwell, Microsoft, nvidia, volume tiled resource.

26 Responses to Direct3D 11.3 (and nVidia’s Maxwell Mark 2)

Klimax says:

September 21, 2014 at 4:35 pm

And then there is this funny bit:
http://blogs.msdn.com/b/directx/archive/2014/09/18/directx-12-lights-up-nvidia-s-maxwell-editor-s-day.aspx
“Developing an API requires working in a graphics stack where many pieces are constantly changing: the graphics kernel, hardware specific kernel drivers, the API, hardware specific user-mode drivers, and the app itself. Adding new features and fixing bugs in such an environment requires the owners of each piece to work together in real-time to solve problems together. For several months, NVIDIA’s engineers worked closely with us in a zero-latency environment. When we encountered bugs, NVIDIA was right there with us to help investigate. When we needed new driver features to make something run, NVIDIA set an aggressive implementation date and then met that date.”

Might explain AMD’s claims about DirectX 12…

- Scali says:
  
  September 21, 2014 at 5:15 pm
  
  It sounds more and more like AMD just dropped the ball with Microsoft, and then tried to do their own “DirectX 12-lite”, while MS and nVidia were doing the real next-gen stuff, with new hardware features and all (as I already said, it can’t be a coincidence that all DX11 hardware from nVidia will support DX12. This blog points in that direction yet again).
  It can’t be a coincidence that nVidia already has hardware on the market that supports features that will be in DX11.3/DX12. They most certainly can’t have developed that in response to Mantle. There must have been talk of these features for upcoming DX standards before Mantle was there, and AMD must have known about it, and AMD is probably working on support for these features in future GPUs. I wonder how long it will take AMD to respond; it might explain a lot.
  
  - Klimax says:
    
    September 21, 2014 at 8:44 pm
    
    Well, we can’t forget the other player: Intel. ROV looks like PixelSync. (Seems they were party to it too.)
  - Scali says:
    
    September 21, 2014 at 8:56 pm
    
    Yup, I mentioned that in an earlier blog. Seems that Intel does not work as closely with Microsoft on the drivers/API though (then again, graphics drivers were never Intel’s strong point). Still, Intel does seem closer to Microsoft than AMD, since Intel has demo’ed the power-saving capabilities of DX12 on their systems recently.
    
    It will be a crazy world when Intel gets a GPU out with support for these DX11.3/12 features before AMD does.
  - Klimax says:
    
    September 21, 2014 at 8:45 pm
    
    Forgot one bit of news:
    http://www.kitguru.net/components/graphic-cards/anton-shilov/amd-and-synopsys-to-co-design-14nm-10nm-apu-gpu-products/
    
    I don’t think it is a bright one though…
Tom says:

September 22, 2014 at 9:10 am

Most of these features are supported by Mantle.
Typed UAV loads: While UAVs are not defined, Mantle uses a universal buffer type (image) that can support any read/write operations with full type conversion. This is a more natural path for GCN, whose architecture can create unlimited resources.
The other architectures only support the creation of a limited number of resources. For example, Intel gen7dot5/gen8 has 255 slots for UAV/SRV/CBV/sampler/descriptor table. Fermi can support 8 UAV, 128 SRV, 14 CBV, 16 sampler, 5 descriptor table. Kepler/1st Maxwell can support 8 UAV, 2^20 SRV and sampler, 14 CBV, 5 descriptor table. 2nd Maxwell can support 64 UAV, 2^20 SRV and sampler, 14 CBV, 5 descriptor table. GCN can support anything, thanks to its unlimited resource creation. This is also how the console APIs work. D3D12 will have a TIER_3 class binding model for the Xbox One, and this class can be supported by GCN. The other microarchs will support TIER_1 (gen7dot5/gen8/Fermi/Kepler/1st Maxwell) and TIER_2 (2nd Maxwell).

Volume tiled resources are also supported on Mantle. They are just called Partially Resident Textures. These can also be 2D and 3D textures. Every GCN can use them.

Rasterizer ordered views are also possible on Mantle, because the API allows explicit access to the GDS, so it is possible to make the launch of any execution depend on GDS changes.

Conservative rasterization: this is also possible with LDA and GDS access. And the interpolation is manual on GCN.

- Scali says:
  
  September 22, 2014 at 9:26 am
  
  Most of these features are supported by Mantle.
  
  I think what you mean is that they *could be* supported by Mantle.
  Typed UAV loads, yes.
  Edit: after looking around for some of the more obscure info, partially resident volume textures appear to be supported by current GCN cards (http://www.slideshare.net/DevCentralAMD/gs4106-the-amd-gcn-architecture-a-crash-course-by-layla-mah). I had only seen examples and mentions of 2D PRTs so far.
  The other things, as you say, can only be emulated in software, so even if you can make them work, they won’t be efficient, and that’s the point.
  This site claims that the next GCN-iteration will have conservative rasterization in hardware: http://videocardz.com/51021/amd-gcn-update-iceland-tonga-hawaii-xtx
  And the thing they call UAV should be ROV, I think.
  
  Also, I’m not quite sure if you understand ROV properly. It is about guaranteeing the order in which threads are executed by the rasterizer. I don’t think you can do that on GCN, with or without ‘explicit access to the GDS’. Because there is a difference between just having some kind of atomic/ordered operations on compute shaders (which most GPUs have been able to do for years now), and having them efficiently dispatched in order by the rasterizer.
  If you can, then do as I do: give a proper technical explanation of how this would work, instead of just making meaningless claims.
  
  - Tom says:
    
    September 22, 2014 at 1:20 pm
    
    PRT supports 3D textures. On PS4 there will be some games with voxel cone tracing. Those will use cascaded 3D textures. The same feature can be used in Mantle.
    Volume Tiled Resources are nothing new. The Xbox One already got the extension. MS is just enabling it on PC. It was planned for DX11.2, but it was delayed to improve it. MS is just catching up with Sony (GNM) on this function.
    
    I know what is ROV. On GCN with Mantle you can limit the wavefront execution. It can wait for the GDS change, preventing the launch of further shader invocation arrays. This is how it can guarantee the order.
    
    Videocardz also wrote this: http://videocardz.com/51018/exclusive-upcoming-games-support-mantle
    Sims 4 is not a Mantle title. Should we trust them?
  - Scali says:
    
    September 22, 2014 at 4:32 pm
    
    I know what is ROV.
    
    Do you know what is English?
    Also, you fail to answer the question… What you describe sounds like GPGPU-related stuff. You ignore how the rasterizer fits into this story. The question was: can you force the order of threads generated by the rasterizer? If so, how?
    Intel and nVidia do this by guaranteeing the triangle-order from the rasterizer, and then having a sort of ‘critical section’ inside a pixel-shader to make sure that the per-pixel operations of each triangle are performed in-order as well.
    If the rasterizer does not ‘know’ about ROV, then it may try to be smart and triangles might ‘overtake’ each other. For example, say triangles 0-4 are queued on one cluster, while triangles 5-8 are queued on another… or triangles 0, 2, 4 etc. are queued on one cluster and triangles 1, 3, 5 etc. are queued on another, and triangles 0, 2, 4 take longer to render than 1, 3, 5… There are many kinds of scenarios where triangle order cannot be solved by just a critical section inside the shader.
    
    If this is possible with GCN/Mantle, I’d like to have some detailed code explaining how to set up both the rasterizer and the pixel shaders for that. And then we can see how efficient that will be. The most naive solution would just serialize all triangles, making it extremely slow. The critical section part is what makes it very efficient, since it only slows down when there is actual overlap of pixels.
    
    Should we trust them?
    
    Not saying we should, not saying we shouldn’t.
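
The ordering scheme described in the comment above (triangle order guaranteed by the rasterizer, plus a per-pixel ‘critical section’) can be modelled with a small CPU-side toy in Python. This is a sketch under stated assumptions, not Direct3D, OpenGL or Mantle code: the function names and the `expected` table are hypothetical, each fragment is a `(primitive_id, pixel, color, alpha)` tuple with one colour channel, and the `expected` table stands in for whatever the hardware rasterizer knows about submission order per pixel.

```python
from collections import defaultdict


def blend_over(dst, src_color, src_alpha):
    """Classic 'over' blend for one channel; the result depends on the order."""
    return src_color * src_alpha + dst * (1.0 - src_alpha)


def blend_in_arrival_order(fragments, background=0.0):
    """No ordering guarantee: blend fragments in whatever order they arrive."""
    framebuffer = defaultdict(lambda: background)
    for _prim, pixel, color, alpha in fragments:
        framebuffer[pixel] = blend_over(framebuffer[pixel], color, alpha)
    return dict(framebuffer)


def blend_in_primitive_order(fragments, background=0.0):
    """Per-pixel ordering: a fragment is held back until all earlier
    primitives touching the same pixel have been blended.

    Returns (framebuffer, waits), where waits counts the held-back fragments.
    """
    # Assumption: the rasterizer knows which primitives touch each pixel,
    # in submission order (at most one fragment per primitive and pixel).
    expected = defaultdict(list)
    for prim, pixel, _color, _alpha in sorted(fragments, key=lambda f: f[0]):
        expected[pixel].append(prim)

    next_index = defaultdict(int)   # per pixel: whose turn it is
    pending = defaultdict(dict)     # per pixel: fragments waiting for their turn
    framebuffer = defaultdict(lambda: background)
    waits = 0
    for prim, pixel, color, alpha in fragments:
        if expected[pixel][next_index[pixel]] != prim:
            waits += 1              # an earlier primitive has not finished here yet
        pending[pixel][prim] = (color, alpha)
        # Drain every fragment of this pixel whose turn has come.
        while next_index[pixel] < len(expected[pixel]):
            turn = expected[pixel][next_index[pixel]]
            if turn not in pending[pixel]:
                break
            c, a = pending[pixel].pop(turn)
            framebuffer[pixel] = blend_over(framebuffer[pixel], c, a)
            next_index[pixel] += 1
    return dict(framebuffer), waits


submitted = [
    (0, (0, 0), 1.0, 0.5),
    (1, (0, 0), 0.2, 0.5),   # overlaps primitive 0 on pixel (0, 0)
    (2, (1, 0), 0.8, 0.5),   # touches a pixel nobody else touches
]
arrived = [submitted[1], submitted[2], submitted[0]]  # primitive 1 overtakes primitive 0

reference = blend_in_arrival_order(submitted)
ordered, waits = blend_in_primitive_order(arrived)
print(blend_in_arrival_order(arrived) == reference)
print(ordered == reference, waits)
```

In this toy only the fragment of primitive 1 has to wait, because it shares a pixel with primitive 0; the fragment on the untouched pixel goes straight through. The sketch leans entirely on the per-pixel submission order being known up front, which is the part the comment argues has to come from the rasterizer rather than from the shader alone.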
- Scali says:
  
  March 6, 2015 at 12:07 pm
  
  Funny, you never answered the question *how* this could be implemented in Mantle… Yet people still link to this discussion as ‘proof’ that GCN-cards can do ROV as specced by DX12. It doesn’t prove anything, because you never explained anything.
  And I think you never explained anything because you simply cannot answer my question properly.
  
  Because as we both know… if you can’t control the order of threads coming from the rasterizer, the only alternative is to implement a ‘software’ rasterizer via a GPGPU-approach.
  Which would have a significant impact on the performance of the rasterizer.
  
- Scali says:
  
  June 5, 2015 at 1:33 pm
  
  It’s official now. AMD will not support feature level 12_1 on current hardware, nor volume tiled resources or conservative rasterization: http://www.computerbase.de/2015-06/directx-12-amd-radeon-feature-level-12-0-gcn/
  So as I already said, Tom, you are wrong.
  
  In fact, because of all the rebrands, some of the 2x0-series are even stuck at 11_1.
  Ouch.
  
Maxwell says:

September 22, 2014 at 10:18 am

http://blog.icare3d.org/2014/09/maxwell-gm204-opengl-extensions.html

- Scali says:
  
  September 22, 2014 at 10:40 am
  
  Ah yes, that adds some extra information to the discussion above. As I said: ROV works via the rasterizer. Apparently nVidia has an extension (NV_fragment_shader_interlock) that allows you to use beginInvocationInterlockNV() and endInvocationInterlockNV() in GLSL. We will have to see if AMD can implement this extension on their hardware.
  
  The other interesting extension is EXT_sparse_texture2: “This new extension adds the ability to retrieve texture access residency information from GLSL, to specify minimum allocated LOD to texture fetches and to return a constant zero value for lookups into unallocated pages. It also adds support for multi-sampled textures.”
  So although AMD’s GCN already supports volume textures, Maxwell is more capable (and if D3D11.3/12 requires support for all these features, then GCN may not be able to support it after all).
  
DX12 says:

October 2, 2014 at 7:04 pm

http://blogs.msdn.com/b/directx/archive/2014/10/01/directx-12-and-windows-10.aspx

“The final version of Windows 10 will ship with DirectX 12”

Closed, proprietary Mantle is Dead on Arrival and irrelevant.

Maxwell says:

October 7, 2014 at 2:07 pm

http://www.geforce.com/whats-new/articles/geforce-gtx-900m-laptops-available-now

Jonatan Reed (@TehRoot) says:

October 17, 2014 at 5:46 pm

Just a comment about your power efficiency notes. The main reason behind Maxwell power efficiency is a more aggressive power throttling algorithm, not really improvements in the actual architecture.

When the card is at full load the actual power consumption differential between 7xx and 9xx is something like 12% at the very best case.

- Scali says:
  
  October 17, 2014 at 9:19 pm
  
  You’re wrong though. Look at the actual specs: the 780Ti is a GPU with 7.1 billion transistors, where the 980 has ‘only’ 5.2 billion transistors.
  Technically the 980 is a midrange chip, where the 780Ti is high-end. Just look at the specs: 256-bit memory vs 384-bit, fewer CUDA cores, fewer texture units etc.
  
  Even so, the 980 performs slightly better than the 780Ti in virtually all scenarios. This is because they have improved the efficiency, by using larger/smarter caches for the cores, using more advanced colour compression, and various other tweaks.
  Add to that the fact that the 980 even supports various new features, and perhaps you get an idea of just how much better the Maxwell architecture really is.
  
  At any rate, power throttling certainly is not the whole story. In fact, I would argue that having ~2 billion fewer transistors is probably the largest factor in power reduction. It’s rather remarkable that they managed to get 780+ performance out of this smaller chip.
  
Maxwell says:

October 22, 2014 at 7:42 pm

http://www.geforce.com/whats-new/articles/geforce-344-48-whql-driver-released

GeForce 344.48 WHQL driver brings DSR to Kepler & Fermi GPUs.

DX12 says:

February 6, 2015 at 7:50 pm

http://www.anandtech.com/show/8962/the-directx-12-performance-preview-amd-nvidia-star-swarm

AMD’s total defeat is complete, their dead on arrival proprietary API is irrelevant as predicted.

- mhagain says:
  
  March 7, 2015 at 2:40 am
  
  I dunno.
  
  Leaving aside the fact that Vulkan *is* Mantle, it did serve a very important purpose in that it put the shits up both Microsoft and the ARB in terms of developing their APIs. I don’t know if that would have ever happened if Mantle hadn’t existed, but both parties were long overdue having the shits put up them. If AMD were to die tomorrow that’s one thing they can at least deservedly take credit for.
  
  - Scali says:
    
    March 7, 2015 at 9:34 am
    
    Well, I don’t buy the D3D-part. I think it’s more likely that GPU-vendors and game devs have discussed this ‘Mantle-like’ technology in meetings about planning D3D’s future. Then AMD ran with those ideas, knowing that Microsoft wouldn’t release a major update to DX until their next OS, which was still a long way off.
    Things came too soon for Windows 8, because DX9-class hardware was still too important (think Windows Phone/RT as well), and an API such as DX12/Mantle makes a lot of assumptions about how a GPU works, in order to reduce the abstraction.
    
    I suppose MS learnt the hard way with DX10 that launching an API that only runs on the latest OS and the latest hardware is not going to work, and meets a lot of resistance from developers and users.
    
    Vulkan clearly is inspired by Mantle/DX12, and AMD clearly had a big hand in that, because Khronos could never have developed such a radical new API by themselves. OpenGL overhauling failed time and time again.
    But I don’t think OpenGL was ever actually a target for AMD/Mantle.
    
    Mantle is mostly a PR-stunt, and I’m not buying it.
    I suppose the people at MS also had a bit of a “Lol? Whatever!” feeling about Mantle.
DX12 says:

March 2, 2015 at 8:00 pm

http://community.amd.com/community/amd-blogs/amd-gaming/blog/2015/03/02/on-apis-and-the-future-of-mantle

“However, if you are a developer interested in Mantle “1.0” functionality, we suggest that you focus your attention on DirectX® 12 or GLnext.”

DING DONG THE WITCH IS DEAD

- namae nanka says:
  
  March 15, 2015 at 9:37 pm
  
  Spoke too soon there.
  
