My previous blog post turned out to have more of an effect than I could have hoped for. I originally wrote it for a well-known website, because I wanted to reach a large audience. I was pleased to find that even though I just posted it on my humble blog, it was still picked up by several websites and discussed on various forums.
I was also pleased to see that people slowly but surely started to agree with what I wrote. Some of them were sceptical… some of them even tried compiling Bullet in both x87 and SSE mode themselves… and that took away their scepticism (or even their prejudice against me), because they saw exactly the same results as the ones I published. To some people it was apparently a bit of a culture shock that SSE wasn’t doing as much as they had always been led to believe. But well, it is what it is.
Two other things have happened in the meantime as well. The first is that since I originally wrote the blog post, nVidia has released the PhysX 2.8.4 SDK, which now compiles with SSE by default. It also contains some other CPU optimizations, and in certain cases it delivers 14% better performance than the previous version. This shows once again that although SSE can indeed boost PhysX’s performance a bit, the gain is nowhere near as dramatic as David Kanter claimed. I’m very happy to see that nVidia has addressed this issue, however. As I said before, there was no reason not to use SSE, even if the performance difference isn’t that great. We will take any performance we can get.
The other thing that has happened is that Bullet released its 2.77 SDK with some early GPGPU support. It includes examples for both DirectCompute and OpenCL. This should convince those who are still sceptical that things like cloth effects really ARE much faster on the GPU than on the CPU… and that it was not just nVidia artificially slowing down the CPU to make their GPUs look good.
In an ironic twist, even though AMD heavily promoted their partnership with Bullet and OpenCL, and some of the Bullet examples even use AMD and ATi textures, not all of the OpenCL examples in the Bullet SDK render correctly on AMD hardware. And funnily enough, Erwin apparently has trouble getting AMD to fix this… So what does this partnership really amount to?