Over the past few weeks I have been working on the OpenGL sample for loading and animating a simple BHM file. As I mentioned earlier, it has been a while since I last used OpenGL, so I had never really used its modern features such as vertex buffer objects and shaders. Things have progressed quite nicely so far, and I have a basic framework set up now.
I mentioned earlier that OpenGL does not seem to have an equivalent of the D3DX library, which makes it easy to load textures and shaders from disk, or to do basic math operations. At least, I have not been able to find a ‘go-to’ library for that sort of thing. I am not sure how developers handle that in general. I have seen that some like to use the code in the nVidia SDK. Perhaps other OpenGL developers don’t use OpenGL directly, but rather through middleware such as Ogre3D, Unity or Unigine, which solves these problems for them.
In my case, I want to release the source code under the BSD license. I want the code to be as platform-independent as possible. I also want to keep the example simple and lightweight. So I figured it would be best to just write a simple helper library myself. I have decided to call it GLUX (GL Useful eXtensions), a reference both to D3DX and to other ‘helper’ libraries such as GLU and GLUT. Coming from a D3D background, I also modeled the interface after D3DX, which I think is quite convenient: you can load a texture or shader in just a single call, for example, and you don’t have to worry about creating OpenGL resources and cleaning them up again if the loading somehow failed.
For the texture loading I decided to use FreeImage. FreeImage basically bundles a number of common free open source image libraries, and wraps them in a single interface. This makes it easy to support all the common formats (such as jpg, png, gif, tga, dds) as textures in OpenGL. I ran into a problem though, when I wanted to compile it for Windows x64. Apparently some of the OpenJPEG code bundled with FreeImage calls a function named lrint(), which is not part of the Microsoft standard C/C++ library. So the function was added manually, but it was implemented in inline assembly. Since the Microsoft compiler does not support inline assembly in x64 mode, the code would not compile. I had to add an #ifdef for x64 mode, and add an implementation in C++. Once that was done, I could compile it, and loading a JPG file worked nicely.
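To give an idea of what such a replacement can look like, here is a minimal sketch. It is not the actual patch I made to FreeImage: the name portable_lrint is made up for this example, and it assumes the default rounding mode (round to nearest, ties to even) rather than querying the FPU state the way a real lrint() does.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for lrint() on compilers that lack it.
// Assumption: default rounding mode (round to nearest, ties to even).
// Unlike the real lrint(), this ignores the current FPU rounding mode.
inline long portable_lrint(double x)
{
    // The caller must keep x within the range of a 32-bit long.
    assert(x >= -2147483648.0 && x <= 2147483647.0);

    const double lower = std::floor(x);
    const double frac  = x - lower;            // always in [0, 1)
    long result = static_cast<long>(lower);

    if (frac > 0.5) {
        ++result;                              // closer to the upper integer
    } else if (frac == 0.5 && (result % 2) != 0) {
        ++result;                              // exact tie: pick the even neighbour
    }
    return result;
}
```

With this, portable_lrint(2.5) gives 2 and portable_lrint(3.5) gives 4, which is the tie-to-even behaviour you would get from lrint() in the default rounding mode.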
Sadly it’s not quite as simple as that though. You don’t know what pixel format FreeImage loads (it depends on the file type). OpenGL generally wants 24-bit or 32-bit data. So you first call a FreeImage function to convert the data to 32-bit, so you at least know for sure that your pixels are 4 bytes each. Then the problem is that the byte order FreeImage uses for 32-bit pixels is not one of the standard OpenGL ones. OpenGL likes GL_RGB or GL_RGBA. FreeImage however stores its 32-bit pixels as BGRA (at least on little-endian machines such as x86). There is the EXT_bgra extension, which adds a GL_BGRA_EXT format, but the most compatible option is to just reorder the pixels yourself. GL_BGRA_EXT might not be supported everywhere, and I have also seen mention of performance problems when using it on some hardware. So I decided to just do the extra conversion step. It seems there is always more hassle than necessary to get something done…
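The reordering itself is trivial. The sketch below shows the idea; it is an illustration rather than the exact GLUX code, and the function name bgra_to_rgba is made up. It assumes the image has already been converted to 32-bit (for example with FreeImage_ConvertTo32Bits()) and that the pixels are tightly packed, 4 bytes each, with no padding between rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Swap the blue and red channels of tightly packed 32-bit pixels in place,
// turning B,G,R,A byte order into the R,G,B,A order that GL_RGBA expects.
// Assumption: 4 bytes per pixel, no padding between pixels or rows.
inline void bgra_to_rgba(std::uint8_t* pixels, std::size_t pixelCount)
{
    assert(pixels != nullptr || pixelCount == 0);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        std::uint8_t* p = pixels + i * 4;
        std::swap(p[0], p[2]);   // blue <-> red; green and alpha stay put
    }
}
```

After this pass the buffer can be handed to glTexImage2D() with GL_RGBA and GL_UNSIGNED_BYTE, without relying on any extension.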
For the maths, I am using a combination of my own routines, and some wrapped-up functionality from CML. To the end-user of the functions, this is not noticeable. I just use CML for convenience at this time. I didn’t want to waste too much time on writing all the math routines.
For the shaders, I found the OpenGL API especially clunky. There are many steps involved in getting GLSL shaders working:
1. Load the source code of the shader into memory yourself, as OpenGL only accepts it as an array of strings (zero-terminated, unless you pass explicit lengths).
2. Create a shader object.
3. Bind the source code to the shader.
4. Compile the shader (why is this a separate step? When is there ever a time that you would want to bind source code to a shader and NOT compile it?).
5. You can now discard the memory for the source code.
6. Call a function to check whether the shader compiled correctly (Why is this a separate step? Why is there not a return value for the compile function?).
7. If there was an error, call ANOTHER function to get the LENGTH of the error string.
8. If there was an error, call YET ANOTHER function to copy the actual error string to a local buffer, so you can display it.
9. If there was an error, don’t forget to discard the shader object.
10. If compilation is successful, you now have a working shader object… now you need to link it to a program object. A what? Yes, a program object. Create a program object.
11. Attach your compiled shader (or shaders: a program manages both vertex and fragment (aka pixel) shaders).
12. Link the program.
13. Again, the function may fail, but it has no return value, so see steps 6 through 9 for what to do next.
14. Congratulations, we can start using our shader!
So I decided to wrap most of this cruft into a few helper functions that give me a single call to load a shader from disk, compile it and, in the case of an error, clean up and retrieve the error message. I’ve done the same for the linker function. I still kept the shader and program objects separate, to retain control over the shaders, to make it possible to use just a single shader instead of a vertex/fragment shader pair, and to be able to insert glBindAttribLocation() calls before the linking process… That is convenient, but I may get into that at some later time.
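To make the compile steps above a bit more concrete, here is a rough sketch of what such a helper can look like. This is not the actual GLUX interface: the function names are invented for this example. To keep the sketch compilable without GL headers or a GL context, the OpenGL entry points are reached through a template parameter GL, which is assumed to be a small struct whose members forward to the gl* functions of the same name (glCreateShader(), glShaderSource(), glCompileShader(), glGetShaderiv(), glGetShaderInfoLog() and glDeleteShader()). The constants are the values from the OpenGL 2.0 headers.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Values copied from the OpenGL 2.0 headers so the sketch builds without them.
const unsigned kFragmentShader = 0x8B30;  // GL_FRAGMENT_SHADER
const unsigned kVertexShader   = 0x8B31;  // GL_VERTEX_SHADER
const unsigned kCompileStatus  = 0x8B81;  // GL_COMPILE_STATUS
const unsigned kInfoLogLength  = 0x8B84;  // GL_INFO_LOG_LENGTH

// Read a whole text file into a string; returns false if it cannot be opened.
inline bool readTextFile(const std::string& path, std::string& out)
{
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file) return false;
    std::ostringstream buffer;
    buffer << file.rdbuf();
    out = buffer.str();
    return true;
}

// Compile one shader from source. Returns the shader handle, or 0 on failure,
// in which case errorLog holds the message and the shader object is deleted.
// GL is a hypothetical wrapper type forwarding to the real gl* functions.
template <class GL>
unsigned compileShaderSource(GL& gl, unsigned type, const std::string& source,
                             std::string& errorLog)
{
    assert(type == kVertexShader || type == kFragmentShader);
    errorLog.clear();

    const unsigned shader = gl.CreateShader(type);
    if (shader == 0) { errorLog = "could not create shader object"; return 0; }

    const char* text = source.c_str();
    gl.ShaderSource(shader, 1, &text, nullptr);  // null lengths: zero-terminated
    gl.CompileShader(shader);

    int ok = 0;
    gl.GetShaderiv(shader, kCompileStatus, &ok);
    if (ok) return shader;

    int length = 0;                              // includes the terminator
    gl.GetShaderiv(shader, kInfoLogLength, &length);
    if (length > 1) {
        std::vector<char> log(static_cast<std::size_t>(length));
        gl.GetShaderInfoLog(shader, length, nullptr, &log[0]);
        errorLog.assign(&log[0]);
    }
    gl.DeleteShader(shader);                     // do not leak the failed object
    return 0;
}

// The single-call version: load the source from disk, then compile it.
template <class GL>
unsigned loadShader(GL& gl, unsigned type, const std::string& path,
                    std::string& errorLog)
{
    std::string source;
    if (!readTextFile(path, source)) { errorLog = "cannot open " + path; return 0; }
    return compileShaderSource(gl, type, source, errorLog);
}
```

A link helper would follow the same pattern with glCreateProgram(), glAttachShader(), glLinkProgram(), glGetProgramiv() and glGetProgramInfoLog(). Keeping it separate from the compile helper leaves room for calls that have to happen between compiling and linking.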
The program object is something that does not exist in Direct3D. It makes sure that the vertex shader’s output matches the fragment shader’s input. In Direct3D you can just set all your shaders to the pipeline directly. The vertex shader needs to match a certain vertex format, but the linkage between vertex and pixel shader is not important I suppose. In the end the vertex shader just outputs a set of interpolators. There could be a semantic mismatch, but no more than that.
Hack the pipeline
I mentioned before that the legacy pipeline is a tad limited… Apparently there has actually been an attempt to add a palette of world matrices to the pipeline in ARB_vertex_blend and ARB_matrix_palette. However, it looks like these extensions are not widely supported. It seems that people have decided that an array of matrix stacks is a bit of overkill. So I have just built my own independent matrix palette, and copy that to my shader’s uniform variables. In order to make single world matrices a tad easier to handle, I have also used an old trick: Instead of using GL_MODELVIEW for the model*view matrix and GL_PROJECTION for the perspective matrix, I move the view-matrix over to GL_PROJECTION, so that GL_MODELVIEW behaves like a regular world matrix, and GL_PROJECTION is now actually view*perspective. This gives me direct access to the world matrix. I don’t have to worry about the view matrix all the time. I still cannot retrieve the view matrix, but I will just build my own matrix storage for that. I already need it for my matrix palette anyway.
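The reason this trick is safe for the final vertex position is simply that matrix multiplication is associative: it does not matter whether the view matrix is grouped with the world matrix or with the projection matrix. The sketch below checks that with plain C++, without touching OpenGL at all. The Mat4 type and function names are made up for this example, a simple scale matrix stands in for a real perspective matrix, and the code uses OpenGL’s column-vector notation, so ‘view*perspective’ in my notation above is written as projection * view here.

```cpp
#include <cassert>
#include <cstddef>

// 4x4 matrix in OpenGL's column-major layout:
// element (row, col) is stored at m[col * 4 + row].
struct Mat4 {
    float m[16];
};

inline float at(const Mat4& a, std::size_t row, std::size_t col)
{
    assert(row < 4 && col < 4);
    return a.m[col * 4 + row];
}

// Translation matrix, laid out the way glTranslatef() would build it.
inline Mat4 translation(float x, float y, float z)
{
    Mat4 r = {};
    r.m[0] = r.m[5] = r.m[10] = r.m[15] = 1.0f;
    r.m[12] = x; r.m[13] = y; r.m[14] = z;
    return r;
}

// Scale matrix; used here as a simple stand-in for a perspective matrix.
inline Mat4 scaling(float x, float y, float z)
{
    Mat4 r = {};
    r.m[0] = x; r.m[5] = y; r.m[10] = z; r.m[15] = 1.0f;
    return r;
}

// r = a * b in column-vector convention (b is applied to the vertex first).
inline Mat4 multiply(const Mat4& a, const Mat4& b)
{
    Mat4 r = {};
    for (std::size_t col = 0; col < 4; ++col)
        for (std::size_t row = 0; row < 4; ++row)
            for (std::size_t k = 0; k < 4; ++k)
                r.m[col * 4 + row] += at(a, row, k) * at(b, k, col);
    return r;
}

// Classic split: GL_MODELVIEW = view * world, GL_PROJECTION = projection.
inline Mat4 classicTransform(const Mat4& projection, const Mat4& view, const Mat4& world)
{
    const Mat4 modelview = multiply(view, world);
    return multiply(projection, modelview);
}

// Split used here: GL_MODELVIEW = world, GL_PROJECTION = projection * view.
inline Mat4 hackedTransform(const Mat4& projection, const Mat4& view, const Mat4& world)
{
    const Mat4 viewProjection = multiply(projection, view);
    return multiply(viewProjection, world);
}
```

Both functions produce the same combined matrix for any input. In legacy OpenGL terms the second variant would be something like glMatrixMode(GL_PROJECTION), glLoadMatrixf() with the perspective matrix, then glMultMatrixf() with the view matrix. One thing to keep in mind: fixed-function lighting and fog work in eye space, which OpenGL derives from GL_MODELVIEW, so this split assumes that the lighting is done in your own shaders.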
Getting things to work
After figuring out the best way to handle GLSL and its various types of variables (attributes, uniforms and varyings), I eventually managed to get the skinning to render properly on my Radeon 5770. I more or less expected that it wouldn’t work on the Intel GM965 in one go, and indeed it took a while to get it working there. As I found out, the Intel driver only supports the basic 1.10 version of the language spec. Some things that have been in Direct3D forever, such as non-square matrices and unsigned integer types, apparently aren’t supported in version 1.10 of GLSL. So this is yet another case of the hardware having capabilities that the driver will not expose. Anyway, after ‘dumbing down’ the shader, I managed to get it to work on the Intel chip.
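As an illustration of what ‘dumbing down’ can mean for the matrix palette: without non-square matrix types, one common workaround is to upload every bone matrix as three vec4 rows and let the shader do three dot products per bone. The sketch below shows the CPU side of that idea. It is an example of the general technique, not necessarily the layout used in my sample, and the function name packPaletteRows is made up. It assumes column-major 4x4 input matrices that are affine, so the bottom row is always (0, 0, 0, 1) and can be dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pack column-major 4x4 bone matrices into three vec4 rows per bone,
// dropping the constant bottom row (0, 0, 0, 1) of an affine transform.
// Input:  boneCount * 16 floats.
// Output: boneCount * 12 floats: row0.xyzw, row1.xyzw, row2.xyzw per bone.
inline std::vector<float> packPaletteRows(const float* matrices, std::size_t boneCount)
{
    assert(matrices != nullptr || boneCount == 0);
    std::vector<float> rows(boneCount * 12);
    for (std::size_t bone = 0; bone < boneCount; ++bone) {
        const float* m = matrices + bone * 16;
        float* out = &rows[bone * 12];
        for (std::size_t row = 0; row < 3; ++row)
            for (std::size_t col = 0; col < 4; ++col)
                out[row * 4 + col] = m[col * 4 + row];  // transpose into rows
    }
    return rows;
}
```

The packed array can then go to a vec4 uniform array in one glUniform4fv() call with a count of boneCount * 3. As a bonus it takes a quarter fewer uniform registers than full 4x4 matrices, which matters on hardware with a small constant budget.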
Then I went to my FreeBSD system, which has an Intel G31 chipset, even more basic than the GM965. The FreeBSD drivers also aren’t as well-supported as the Windows ones. I already knew at an early stage that GLSL wasn’t going to work, since the required extensions aren’t implemented. It does support the older ARB_vertex_program and ARB_fragment_program extensions though. Those are basically SM2.0-level programmable shaders, but they can only be programmed in assembly language (those are two weaknesses of OpenGL right there: firstly there is no vendor-agnostic support for programmable shaders prior to SM2.0, and secondly, not all programmable hardware can be programmed with GLSL).
So I figured I would try my hand at supporting these legacy programmable shaders. It actually went quite smoothly. Assembling the shaders wasn’t quite as much work as the GLSL stuff I mentioned earlier. There is no linker stage required either. And I could re-use some of the code that I had written to set the vertex attributes. Apparently they share the same API there. So I mainly had to write the shaders themselves in assembly, and then write some new code to upload the matrix palette for this shader type. And before I knew it, there it was, a skinned animation on my FreeBSD box (I had not implemented lighting yet, so it looks a bit odd; that is not a FreeBSD-specific issue).
This feels rather peculiar to me, since I have never really used any 3D software on the FreeBSD machine. I don’t think there’s a lot available for FreeBSD in the first place, and not much is going to work on the Intel chip anyway. But nevertheless, it runs the animation just fine. And it is a great test case for my code. It compiles and runs on a completely different platform, with very limited features compared to my Windows machine with a DirectX 11 card.
CPU Performance bottleneck
I said before that the OpenGL code seemed to have quite a bit of CPU overhead. I accidentally stumbled across the major culprit here: freeglut. I originally used freeglut because I wanted to have a 64-bit version of GLUT. Since the source code of GLUT isn’t maintained anymore, I thought a newer open source project such as freeglut would be a better choice. However, at one point I noticed that my Radeon ran a certain build of my program at a much higher framerate, much closer to the nVidia and D3D performance. As it turned out, that build used the original GLUT binaries rather than the freeglut ones. So I grabbed the old, unmaintained GLUT source code and recompiled it for x64. That went pretty smoothly, and indeed it made a major difference in performance.
One thing I noticed is that I am getting more and more comfortable with OpenGL. Yes, some things are pretty clunky or just downright weird, but once you have written some functions to hide that from you, you quickly forget about it. As you get more experienced with the API, you also adapt to their way of thinking, and it is easier to get your head around new things.
I also seem to spend quite a bit more time on cleaning up this code and tweaking it, than I do on certain other code. After all, this is supposed to be an open source example, so it should be robust code, and easy to read, so that people can understand it and learn from it. I originally set up this framework based on my earlier Java and Direct3D frameworks (which both support the BHM format with skinned animations), but by re-implementing the code and paying attention to the details, I’ve also found things that could be improved, although they have been that way for many years. I may not ever have given them a second look otherwise.
The code is not quite ready for release yet. I still need to clean up some issues, add some comments, and write a basic readme file. I may also want to improve the lighting a bit, since it currently only does diffuse lighting. But it is getting there.