It has been months since I wrote about my 3D engine. There hasn’t been too much to write about anyway. I have been working on other stuff most of the time, like the CPUInfo library, the BHM file format project, and the JPEG Loader. These projects are loosely related to the 3D engine. As I discussed before, I use the dependency walker routines from the CPUInfo library to analyze loading problems. And the BHM file format was originally developed to export objects and animations from 3dsmax, and import them into my own 3D engine.
The main thing I have been doing is splitting up the engine into a loader EXE and client DLLs containing the actual engine code, one for each API, currently Direct3D9, Direct3D10 and Direct3D11. The codebase had always been a monolithic executable, so I had to check all compiler and linker settings to make sure that everything worked properly according to the MFC rules. The EXE is now an MFC application (CWinApp), and the engine DLLs are MFC Extension DLLs. There are 32-bit and 64-bit versions of everything, and I use the Launcher from the CPUInfo project to automate the launching of the proper executable, as I mentioned before.
After everything compiled and ran properly again, I picked up work on the BHM importer. So far, I had a basic rendering framework, which could render in Direct3D9, Direct3D10 and Direct3D11, but it couldn’t load any objects or animations, so all I had to show were some simple generated objects, such as the archetypal donut. I decided to first try and make the BHM loader and animation work again in Direct3D9, as I still had the code from the old engine.
I ran into the problem that Direct3D9 uses vertex declarations, which differ from the input layouts of Direct3D10/11. The new engine was based around the newer input layout declaration scheme. In Direct3D10/11, you bind the input layout to the shader at creation time. Therefore I wanted to have input declarations as extra parameters for my shader loading and compiling function. I decided to build a routine that could translate a D3D10/11 input layout declaration to a D3D9 vertex declaration. Then I pass the D3D10/11 declaration to the shader loading and compiling function, which will silently convert it to D3D9 format and build a vertex declaration from it. Semantically, a vertex declaration isn’t entirely the same as an input layout, as the input layout is shader-specific, and a vertex declaration can be shared between all shaders that expect the same input (the actual mapping of the input to the shader is done at runtime in D3D9, causing extra overhead). However, if I treat a vertex declaration as shader-specific, it will still work as expected, and it makes my code simpler, as there are no special cases. If creating a new vertex declaration for every shader turns out to be wasteful, I could use a simple caching system to silently re-use them.
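To make the translation idea a bit more concrete, here is a minimal sketch of such a routine. It is not the engine’s actual code: the types are simplified stand-ins that loosely mirror `D3D11_INPUT_ELEMENT_DESC` and `D3DVERTEXELEMENT9`, so that it compiles without the DirectX headers, and all names (`InputElement`, `VertexElement9`, `toVertexDeclaration`) are hypothetical. It also assumes explicit byte offsets and covers only a handful of formats and semantics.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in enums; the real code would use DXGI_FORMAT_*, D3DDECLTYPE_*
// and D3DDECLUSAGE_* values instead.
enum class Format { Float2, Float3, Float4, UByte4 };
enum class DeclType9 { Float2, Float3, Float4, UByte4, Unused };
enum class Usage9 { Position, Normal, TexCoord, BlendWeight, BlendIndices, Color };

// Loosely mirrors D3D11_INPUT_ELEMENT_DESC (hypothetical, simplified).
struct InputElement {
    std::string semanticName;
    unsigned semanticIndex;
    Format format;
    unsigned inputSlot;
    unsigned byteOffset;  // assumed explicit, no "append aligned" shortcut
};

// Loosely mirrors D3DVERTEXELEMENT9 (hypothetical, simplified).
struct VertexElement9 {
    unsigned stream;
    unsigned offset;
    DeclType9 type;
    Usage9 usage;
    unsigned usageIndex;
};

const unsigned kEndStream = 0xFF;  // stream value used by the D3D9 end marker

// Map a D3D10/11-style element format to a D3D9-style declaration type.
DeclType9 toDeclType(Format f) {
    switch (f) {
        case Format::Float2: return DeclType9::Float2;
        case Format::Float3: return DeclType9::Float3;
        case Format::Float4: return DeclType9::Float4;
        case Format::UByte4: return DeclType9::UByte4;
    }
    throw std::invalid_argument("unsupported format");
}

// Map an HLSL semantic name to a D3D9-style usage.
Usage9 toUsage(const std::string& name) {
    if (name == "POSITION") return Usage9::Position;
    if (name == "NORMAL") return Usage9::Normal;
    if (name == "TEXCOORD") return Usage9::TexCoord;
    if (name == "BLENDWEIGHT") return Usage9::BlendWeight;
    if (name == "BLENDINDICES") return Usage9::BlendIndices;
    if (name == "COLOR") return Usage9::Color;
    throw std::invalid_argument("unsupported semantic: " + name);
}

// Translate an input layout into a vertex declaration, element by element,
// and append the end marker that D3D9 expects as the last entry.
std::vector<VertexElement9> toVertexDeclaration(const std::vector<InputElement>& layout) {
    std::vector<VertexElement9> decl;
    for (const InputElement& e : layout) {
        assert(e.inputSlot != kEndStream);  // reserved for the end marker
        decl.push_back({e.inputSlot, e.byteOffset, toDeclType(e.format),
                        toUsage(e.semanticName), e.semanticIndex});
    }
    decl.push_back({kEndStream, 0, DeclType9::Unused, Usage9::Position, 0});
    return decl;
}
```

In a real D3D9 backend, the resulting array would be handed to the device to create the actual vertex declaration object, and a cache keyed on the layout could return an existing declaration instead of creating a new one.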
Once that was solved, I got the D3D9 code up and running reasonably quickly. D3D10/11 followed not much later, although I had to modify the shaders a bit for the constant buffer system that the new APIs use. I haven’t yet figured out a good way to abstract those differences away. D3D10/11 allow you to directly map a structure in video memory and fill it. D3D9 accesses each variable separately, by name or index. You have to query the handle of the variable you want to change, and then set the value (there are various setter functions, one for each supported datatype).
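The difference between the two update styles can be sketched as follows. This is only one possible way to bridge them, not something the engine does: a small table describes where each named constant lives inside a CPU-side struct, so the same struct can either be copied in one go (D3D10/11 style) or pushed variable by variable (D3D9 style). Everything here is a stand-in that compiles without DirectX; `FakeConstantBuffer`, `FakeConstantTable9`, `PerObjectConstants` and the field names are hypothetical, and only float constants are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// CPU-side mirror of a hypothetical shader constant block.
struct PerObjectConstants {
    float worldViewProj[16];
    float diffuse[4];
};

// Stand-in for a D3D10/11 constant buffer: one block of memory that is
// overwritten with a single copy, as you would after mapping the buffer.
struct FakeConstantBuffer {
    std::vector<unsigned char> bytes;
    void update(const void* src, std::size_t size) {
        bytes.resize(size);
        std::memcpy(bytes.data(), src, size);
    }
};

// Stand-in for the D3D9 side: each constant is addressed by name and set
// separately, one call per variable.
struct FakeConstantTable9 {
    std::map<std::string, std::vector<float>> values;
    void setFloatArray(const std::string& name, const float* data, std::size_t count) {
        values[name].assign(data, data + count);
    }
};

// Describes where a named constant lives inside PerObjectConstants.
struct ConstantField {
    const char* name;
    std::size_t offset;  // byte offset inside the struct
    std::size_t count;   // number of floats
};

const ConstantField kPerObjectFields[] = {
    {"worldViewProj", offsetof(PerObjectConstants, worldViewProj), 16},
    {"diffuse", offsetof(PerObjectConstants, diffuse), 4},
};

// D3D10/11 style: copy the whole struct in one operation.
void uploadD3D11Style(FakeConstantBuffer& cb, const PerObjectConstants& c) {
    cb.update(&c, sizeof(c));
}

// D3D9 style: walk the field table and set every variable by name.
void uploadD3D9Style(FakeConstantTable9& table, const PerObjectConstants& c) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&c);
    for (const ConstantField& f : kPerObjectFields) {
        assert(f.offset + f.count * sizeof(float) <= sizeof(PerObjectConstants));
        const float* data = reinterpret_cast<const float*>(base + f.offset);
        table.setFloatArray(f.name, data, f.count);
    }
}
```

The calling code would then only fill in a `PerObjectConstants` and call whichever upload function matches the active API. The price is that the field table has to be kept in sync with both the struct and the shader by hand.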
At any rate, the BHM file can load, and the animation can be played (including skinning). I am now thinking of making an OpenGL example for loading and animating the object, as an example for the BHM project. I also want to add some kind of ‘info screen’ based on the CPUInfo library. Other than that, I have a basic rendering framework again, now able to leverage DX11, so I think I’ll play around with that as well. I’ll have a closer look at the tessellation API, and I will see what DirectCompute can do for me. After all, I had these ideas for tessellation a few months ago, when I used Cuda. Now I have no Cuda, but I have DirectCompute and OpenCL at last.
On another note…
The Radeon HD5770 isn’t that big of a success with me. There are still issues in Windows 7 x64, even though I have just installed the latest 10.1 drivers. I had the ‘wallpaper’ crash again while typing this blog (and of course auto-save wasn’t on, so I had to retype it from scratch). The installer for Vista x64 also seems broken. It always gives me a failure on the HDMI driver. I also tried to install it manually, but it seems the .inf file is broken, as it doesn’t detect any compatible drivers for my system. At least they fixed the audio problems in Windows 7, meaning that I no longer have problems with skipping audio when I play a fullscreen Youtube movie etc. Vista is not so lucky.
Another thing I noticed… Although there finally is some OpenCL support for ATi, it’s still not that great. Firstly, they still support a minimal featureset, while nVidia had added a lot of extra functionality already with the Cuda 3.0 release. Secondly, it’s still not part of the driver itself, so you still have to install the Stream SDK to get the required OpenCL drivers. Lastly, performance is not that great. I tried the included samples of GPU Caps Viewer on the 9800GTX+ in my work PC and the Radeon HD5770 at home (both systems using a Core2 Duo at 3 GHz, one a Conroe, the other a Penryn). The 9800GTX+ was faster in pretty much all the samples. E.g. the mesh deforming sample ran at about 75 fps on the HD5770, but 85 fps on the 9800GTX+. That’s odd, given that the GeForce is a much older card, generally considerably slower in regular 3D graphics, and, now sold as the GTS250, considerably cheaper than the HD5770 if you were to buy one today. I suppose it points back to this earlier blog: ATi’s architecture isn’t as well suited to GPGPU as nVidia’s is. It looks like this can only get worse when nVidia’s Fermi arrives. By the looks of it, ATi will still lose the physics battle, even when an OpenCL-based solution arrives.