For some reason I keep reading the same misinformed nonsense about tessellation. A lot of people seem to think that nVidia somehow ‘emulates’ tessellation on their shaders, while AMD has a fixed-function unit. They also think that this will bottleneck nVidia’s hardware, while it will not happen on AMD (which is rather ironic, since as we all know, AMD is the one being bottlenecked in tessellation). Let me try to make it clear:
There is no such thing! nVidia’s shader use for tessellation is NOT, I repeat NOT different from AMD’s!!!
I have no idea where this nonsense originally came from, but it is rather annoying that so many people keep repeating it. The simple truth is that ALL vendors use shaders for tessellation, since that is simply how the pipeline in Direct3D 11 is designed. See Microsoft’s explanation for more detail: http://msdn.microsoft.com/en-us/library/windows/desktop/ff476882(v=vs.85).aspx
The short version is this:
Vertex shader -> Hull shader -> Tessellator -> Domain shader -> Geometry shader -> Pixel shader
The hull shader and the domain shader are the two new types of shaders added in the Direct3D 11 pipeline. They have been added for the simple reason that tessellation is programmable. So any vendor implementing a Direct3D 11-compatible GPU will be using shaders for tessellation. And since shaders have been unified since Direct3D 10, the hull and domain shaders are executed by the same shader units as all the other types of shaders. That is simply how Direct3D 11 works, regardless of brand.
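To make the point concrete, here is a toy sketch (not a real API, just an illustrative table) of the Direct3D 11 pipeline stage order, marking which stages are programmable — and therefore run on the unified shader units on every vendor’s hardware — and which stage is fixed-function:

```python
# Illustrative model of the Direct3D 11 pipeline stage order.
# "programmable" stages run on the unified shader units; only the
# tessellator itself is a fixed-function stage.
PIPELINE = [
    ("Vertex shader",   "programmable"),
    ("Hull shader",     "programmable"),    # new in Direct3D 11
    ("Tessellator",     "fixed-function"),  # the only non-shader tessellation stage
    ("Domain shader",   "programmable"),    # new in Direct3D 11
    ("Geometry shader", "programmable"),
    ("Pixel shader",    "programmable"),
]

programmable = [name for name, kind in PIPELINE if kind == "programmable"]
print(programmable)
```

Note that the hull and domain shaders sit on the programmable side of the table no matter whose GPU you run this pipeline on; the only fixed-function part is the tessellator sandwiched between them.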
The difference between AMD and nVidia lies in the part between the hull and domain shader stages: the tessellator. The tessellator itself is a fixed-function unit. The difference is that nVidia has implemented it in a parallel way, in what it calls the PolyMorph engine. In short, what happens is this:
The hull shader gets the source geometry, and does some calculations to decide how finely to subdivide each patch (the magical tessellation factors). The tessellator then generates the extra triangles, and the domain shader does some final calculations to position the new vertices correctly. The bottleneck in AMD’s approach is that it is implemented as a conventional pipeline. Where you’d normally pass a single triangle through the entire pipeline, you now get an ‘explosion’ of triangles at the tessellation stage. All these extra triangles need to be handled by the same pipeline, which was only designed to handle one triangle at a time. As a result, the rasterizer and pixel shaders get bottlenecked: they are fed only a single triangle at a time. This problem was already apparent in Direct3D 10, where the geometry shader could do some very basic tessellation as well, adding extra triangles on the fly. In practice this was rarely used, because it was often slower than just feeding a more detailed mesh through the entire pipeline.
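To get a feel for the scale of that ‘explosion’: with uniform integer partitioning, a triangle patch whose edges are each split into `factor` segments subdivides into `factor`² smaller triangles (a simplified model — the real tessellator supports several partitioning schemes and per-edge factors, but the quadratic growth is the point):

```python
# Simplified model of the triangle 'explosion' at the tessellation
# stage: uniform subdivision of a triangle, each edge split into
# `factor` segments, yields factor**2 sub-triangles
# (rows of 1 + 3 + 5 + ... upward/downward triangles).
def tessellated_triangle_count(factor: int) -> int:
    return factor * factor

for f in (1, 4, 8, 16):
    print(f"factor {f:2d} -> {tessellated_triangle_count(f):3d} triangles")
```

So at a modest factor of 8, every single input triangle becomes 64 triangles that the rest of the pipeline has to swallow — which is exactly where a single serial pipeline starts to choke.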
nVidia decided to tackle this problem head-on: their tessellator is not just a single unit that tries to stuff all the triangles through a single pipeline. Instead, nVidia has added 16 geometry engines. There is now extra hardware to handle the ‘explosion’ of triangles that happens through tessellation, so that the remaining stages will not get bottlenecked. There are extra rasterizers to set up the triangles, and feed the pixel shaders efficiently.
With AMD it is very clear just how much they are bottlenecked: the tessellator is the same on many of their cards. A Radeon 5770 will perform roughly the same as a 5870 under high tessellation workloads. The Radeon 5870 may have a lot more shader units than the 5770, but the bottleneck at the tessellator means that they cannot be fed. So the irony is that things work exactly the opposite of what people think: AMD is the one whose shaders get bottlenecked at high tessellation settings. nVidia’s hardware scales so well with tessellation because they have the extra hardware that allows them to *use* their shaders efficiently, i.e. NOT bottlenecked.