Time to get a bit more practical with the 3D engine. First things first, you can download the current build here:
It requires the following to be installed:
VC++ 2008 sp1 redistributable: http://www.microsoft.com/DOWNLOADS/details.aspx?FamilyID=a5c84275-3b97-4ab7-a40d-3802b2af5fc2&displaylang=en
DirectX August 2009 redistributable: http://www.microsoft.com/downloads/details.aspx?familyid=04AC064B-00D1-474E-B7B1-442D8712D553&displaylang=en
What I have here is basically a rough ‘proof of concept’. It’s my current codebase, compiled for D3D9, D3D10 and D3D11, playing a simple animation stored in a BHM file. I’ve tried it on a few computers, and I was especially interested in the more low-end/outdated hardware.
First up is an old Athlon XP 1800+ with a Radeon 9600XT card. It’s my old development PC (the one I originally wrote most of the D3D9 engine on), and it runs Windows XP sp3, so it can only use the D3D9 engine. The automatic fallback mechanism worked fine:
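The fallback mechanism can be sketched roughly like this. This is a minimal illustration, not the engine’s actual code: the backend list and init callbacks are hypothetical stand-ins for the real initialization routines (which would call D3D11CreateDevice, D3D10CreateDevice and IDirect3D9::CreateDevice respectively).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in: each entry tries to initialize one API version
// and returns true on success.
struct Backend {
    std::string name;
    std::function<bool()> init;
};

// Walk the list from newest to oldest API and use the first that works.
std::string pickBackend(const std::vector<Backend>& backends) {
    for (const auto& b : backends)
        if (b.init())
            return b.name;
    return "none"; // no usable API at all
}
```

On the XP machine, the D3D11 and D3D10 entries would fail (the runtimes aren’t available there), so the loop falls through to D3D9.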
So I was quite happy with that.
Then I tried a PC with Vista installed, and an Intel Q35 chipset. This is also a DX9-class chip, much like the Radeon 9600XT. However, it ONLY supports pixelshaders in hardware; vertex processing is done in software. The nice thing about having Vista installed was that I could test the ‘downlevel’ functionality in D3D11. That’s a nice new feature in D3D11, which allows you to use DX9 hardware (SM2.0 or better) via the new API. This is something they should have done right away with D3D10, in my opinion. The newer API has a very nice and straightforward interface, without all the legacy stuff that D3D9 still has in it (lots of renderstates and fixedfunction shading options).

The Q35 chipset seems to be the absolute minimum that D3D11’s downlevel support targets (actually even less, because of the missing vertex hardware). This is known as ‘level 9.1’. There are two other DX9-related ‘downlevels’, namely ‘level 9.2’ and ‘level 9.3’. These are for the more powerful SM2.0a/2.0b class of DX9 hardware (none of them requires SM3.0, just SM2.0 with some extra features, see also here). Aside from that, there are also ‘level 10.0’ and ‘level 10.1’ for D3D10/10.1 hardware. So in short, D3D11 is compatible with all DX9-class hardware or better (that is, all hardware with SM2.0 or better; the actual DX9 API supports a lot of legacy hardware as well).

That would have been a great feature in D3D10, but sadly we didn’t have it back then. Now it seems less relevant, as DX10+ hardware is widespread. So there are two sides to this functionality… Nice to have, but a shame that we didn’t get it much sooner.
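The feature-level mechanism boils down to handing D3D11 an ordered list of levels you can live with, and getting back the best one the hardware supports. Here is a simplified sketch of that selection: the enum values mirror the real D3D_FEATURE_LEVEL constants from d3dcommon.h, but the function itself is just an illustration of what D3D11CreateDevice does internally, not actual engine or runtime code.

```cpp
#include <cassert>
#include <vector>

// Values mirror the real D3D_FEATURE_LEVEL enum from d3dcommon.h.
enum FeatureLevel {
    LEVEL_NONE = 0,       // stand-in for "device creation failed"
    LEVEL_9_1  = 0x9100,  // minimum downlevel target: plain SM2.0 hardware
    LEVEL_9_2  = 0x9200,
    LEVEL_9_3  = 0x9300,  // SM2.0a/2.0b-class hardware
    LEVEL_10_0 = 0xa000,  // D3D10 hardware
    LEVEL_10_1 = 0xa100,  // D3D10.1 hardware
};

// D3D11CreateDevice takes an array of acceptable feature levels, ordered
// from most to least desirable, and creates the device at the first level
// the hardware supports. 'maxSupported' stands in for the driver's report.
FeatureLevel pickFeatureLevel(const std::vector<FeatureLevel>& requested,
                              FeatureLevel maxSupported) {
    for (FeatureLevel fl : requested)
        if (fl <= maxSupported)
            return fl;
    return LEVEL_NONE;
}
```

With a request list of {10.1, 10.0, 9.3, 9.2, 9.1}, the Q35 would come back as level 9.1, while a GM965/X3100 would come back as level 10.0.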
Anyway, I fired up the engine on the Vista machine with the Q35 chipset. Initially it didn’t work, because it could not find any supported display modes. I had assumed that all devices would at least support the R8G8B8A8_UNORM pixelformat modes (as Microsoft apparently does too, since they hardcoded that in their example code). That assumption held on nVidia hardware, ATi hardware, and my Intel GM965. However, for some reason the Q35 supports only B8G8R8A8_UNORM pixelformats. So I added a quick check to my initialization code that tries to pick a supported pixelformat (I could have just replaced it with B8G8R8A8, but then the GM965 wouldn’t work, because that one ONLY supports R8G8B8A8…).
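The check amounts to walking a preference list of pixelformats and taking the first one the device supports, rather than hardcoding a single format. A minimal sketch, where the `supported` callback is a hypothetical stand-in for the real query (which would enumerate display modes or call CheckFormatSupport on the device):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Try each format in order of preference and return the first one the
// device supports; an empty string means initialization should fail.
std::string pickBackBufferFormat(
        const std::vector<std::string>& preferred,
        const std::function<bool(const std::string&)>& supported) {
    for (const auto& fmt : preferred)
        if (supported(fmt))
            return fmt;
    return "";
}
```

This way the Q35 ends up with B8G8R8A8_UNORM and the GM965 with R8G8B8A8_UNORM, from the same preference list.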
It was quite cool to see that it actually worked just fine, even on this limited hardware:
So that was a good stress-test for the backward-compatibility that I tried to build into my D3D11 engine. One thing I noticed though… This Q35 got a framerate of about 360 fps. My laptop with Intel GM965 (a DX10/SM4.0 part) only scored about 120 fps at best.
While debugging the software vertexprocessing fallback path in my D3D9 engine, I figured out why my GM965 was so much slower: the Q35 doesn’t have hardware vertexprocessing, so it will always default to software. When I forced the GM965’s X3100 GPU to software processing, I got about 315 fps out of it.
So it seems that while my X3100 has REAL vertexprocessing in hardware (some hardware used to report hardware vp for compatibility reasons, but would still use a CPU path internally), the hardware is so low-end that it’s actually slower than software vp.
Not too surprising in retrospect. After all, the X3100 has only 8 unified shaders. With software vp, that means it has 8 dedicated pixel pipelines. With hardware vp, it has to share the 8 shader pipes between vertex and pixel processing. The concept of unified shaders is a good one, but it is based on the assumption that a significant portion of the pipes are idle at any given time, so that they can be re-used for other tasks. That works fine when you have dozens of pipelines, but when you have only 8, they are never idle.
Perhaps I should add an option for the user to force software vp, if that suits their hardware better. The irony is that this isn’t possible on the GM965 in D3D10/11, because those APIs see it as a ‘real’ SM4.0 part with actual hardware. Unlike in D3D9, there is no way to force software vertexprocessing, as far as I know. Technically you should always have hardware vertexprocessing, at least on D3D10 hardware and higher.
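In D3D9 this choice is made via the behaviour flags passed to IDirect3D9::CreateDevice. A minimal sketch of how such an option could work: the constants mirror the real values from d3d9.h/d3d9caps.h, but the selection function itself is just an illustration, not the engine’s actual code.

```cpp
#include <cassert>
#include <cstdint>

// Values mirror the real D3D9 constants from d3d9.h / d3d9caps.h.
const uint32_t D3DCREATE_SOFTWARE_VERTEXPROCESSING = 0x00000020;
const uint32_t D3DCREATE_HARDWARE_VERTEXPROCESSING = 0x00000040;
const uint32_t D3DDEVCAPS_HWTRANSFORMANDLIGHT      = 0x00010000;

// Pick the behaviour flag for IDirect3D9::CreateDevice: use hardware
// vertexprocessing when the caps report hardware T&L, unless the user
// explicitly forces the software path.
uint32_t pickVertexProcessing(uint32_t devCaps, bool forceSoftware) {
    if (forceSoftware || !(devCaps & D3DDEVCAPS_HWTRANSFORMANDLIGHT))
        return D3DCREATE_SOFTWARE_VERTEXPROCESSING;
    return D3DCREATE_HARDWARE_VERTEXPROCESSING;
}
```

A Q35-style chip without the HWTRANSFORMANDLIGHT cap always gets the software path; on an X3100-style chip the user override would be the way to claim back those dedicated pixel pipelines.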
Another thing that is interesting… In the previous blog I already said something about how the vertex declarations have changed in D3D10, to make them more efficient. There are various other changes in the D3D10/11 API that should make it more efficient than D3D9. However, on all machines I’ve tried so far, the D3D9 version delivered the highest framerates. Perhaps that will change when I render more complex scenes than this one… But I didn’t really expect D3D9 to be faster in ANY scenario. I wonder whether that is because the D3D10/11 drivers just aren’t mature enough yet in terms of optimizations, or whether the new API design’s performance advantages are purely theoretical and won’t amount to any tangible wins in practice on modern PCs.