After I wrote the blog post on microsleep, my interest in the issue was rekindled. This passage in particular was very much out of character for me:
When you have two or more windows each trying to render as quickly as possible, it seems that they are somehow starving each other. I am not quite sure what is happening, but it might have something to do with the fact that Windows likes to temporarily boost an application when it receives events, to make it more responsive.
I am one of those people who can’t leave well enough alone. I need answers, I need to know what is happening, I need to understand, so that I can find the best possible solution.
So, digging into the issue some more, I first implemented a message loop with just GetMessage(). The UpdateWindow() call was moved into the WM_PAINT handler, and the WM_PAINT handler would call InvalidateRect() before returning, so that a new WM_PAINT would be available instantly. This gives much the same effect of constantly drawing new frames as the PeekMessage()-based loop had.
This construction did not require usleep() to keep the threads happy. However, at the same time the number of UpdateWindow() calls per thread was lower than what PeekMessage() was capable of with a well-chosen value for usleep(). So PeekMessage() is still the most efficient solution, even with the usleep().
Then I started experimenting with the loop a bit. First I tried to vary the number of calls to PeekMessage(), calling it only every 2nd, 4th or 8th iteration. This did not appear to affect the issue much. Then I made an artificial delay (just a for-loop, not a Sleep() obviously) inside UpdateWindow(). I noticed that as long as the delay is long enough, usleep() is not required either. But what else does UpdateWindow() do? Nothing much really, just SetWindowText() once every N iterations.
So, I started to experiment with that value of N. And I noticed that more calls to SetWindowText() make the problems worse, and fewer calls reduce them. In fact, if you do too many SetWindowText() calls, even the GetMessage()-approach starts to choke (and again, the entire desktop becomes unresponsive, not just your own windows). Replacing SetWindowText() with OutputDebugString() made the problems go away. So, now I’m onto something, perhaps. Initially I didn’t suspect SetWindowText(), since I had already seen problems in my application while I only updated the window title twice per second. I thought this frequency of updating was low enough that it would not be a problem.
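To make the "once every N iterations" idea concrete, here is a minimal sketch of such a throttle. The class name TitleThrottle and its interface are hypothetical and not taken from my actual code; the Win32 call only appears in a comment so the snippet stays portable.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original code): counts rendered frames
// and reports when the window title may be refreshed, so that
// SetWindowText() is not called on every single frame.
class TitleThrottle {
public:
    // everyN == 0 is treated as 1 (update on every frame).
    explicit TitleThrottle(std::uint32_t everyN)
        : everyN_(everyN ? everyN : 1), count_(0) {}

    // Call once per rendered frame; returns true on every N-th call.
    bool shouldUpdate() {
        if (++count_ < everyN_) return false;
        count_ = 0;
        return true;
    }

private:
    std::uint32_t everyN_;
    std::uint32_t count_;
};

// Intended use inside the per-frame function (Win32 call shown as a comment):
//   if (throttle.shouldUpdate()) SetWindowText(hwnd, title);
```

Raising N lowers the number of title updates per second, which is the knob I was turning in the experiment above.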
An Aero to the knee…
Apparently SetWindowText() has something to do with the responsiveness here, interesting… But I am running Windows 7 with Aero enabled, and that has changed the way the windows are being drawn. So I wondered if setting Windows 7 to the classic theme would make a difference here. And indeed it did: in classic, there is no need for usleep(). I tried it in Windows XP x64 as well, just to be sure. And yes, Windows XP x64 does not require usleep() either.
So now the picture is starting to become clear: When running Aero, the SetWindowText() is not executed directly, but is queued for processing in the background. With a single thread, this works fine, but when two or more threads are both hammering the display at the same time, apparently there is never enough time to complete the drawing operations (even though you may have idle cores left on your CPU). That is where usleep() comes in. When running classic theme (or a version of Windows that predates Aero, such as XP), the drawing is handled differently, and SetWindowText() is not performed in the background, and as such, does not need to struggle for resources.
I have to disappoint people who run classic theme because they think it’s faster and lighter though: despite the issues with usleep(), Aero still posts the highest update rates (after all, it uses multithreading and hardware acceleration, what else did you expect?). Classic theme is marginally slower, and XP is slower still, in this test case.
Anyway, I now have a more or less satisfactory answer to the issue. And I can now be even more confident that the usleep() is a good workaround (although Microsoft might want to improve Aero so that it cannot be hung this easily). The GetMessage()-approach might be an even more robust workaround, albeit slightly less efficient. So I have not quite made up my mind as to which version I will be using in the end. I might just keep both and let the user switch.
About that usleep()…
Before I end this post, I would also like to give some more background information about usleep(). By default, the regular Sleep() works on the normal system timer interval, which is about 15 ms. However, this timer interval can be adjusted with timeBeginPeriod(). If you set this interval to 1 ms, Sleep(1) will do more or less what you expect it to (mind you, this affects the entire system, and may reduce performance somewhat, since there will be more context switching). However, this still limits you to about 1000 fps maximum. Granted, more than enough for our needs… but given that I’ve already seen values of more than 3000 fps per window rendering actual D3D scenes with my own usleep(), it is far from ideal.
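The arithmetic behind that ceiling, plus the timeBeginPeriod() usage, can be sketched roughly as follows. The names maxLoopRate and TimerResolution are made up for this example. The 15.625 ms figure in the comment is the commonly cited default tick, which I am assuming matches the "about 15 ms" above. The Windows part is guarded so the snippet still compiles elsewhere.

```cpp
// Upper bound on loop iterations per second when every iteration sleeps
// for at least one timer tick of `tickMs` milliseconds.
// Example: a 1 ms tick caps the loop at 1000 iterations per second, and
// an assumed default tick of 15.625 ms caps it at 64.
constexpr double maxLoopRate(double tickMs) { return 1000.0 / tickMs; }

#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>  // timeBeginPeriod/timeEndPeriod; link with winmm.lib

// Hypothetical RAII guard: requests a finer timer resolution for as long
// as the object lives, and releases the request again on destruction.
struct TimerResolution {
    UINT periodMs;
    bool active;

    explicit TimerResolution(UINT ms)
        : periodMs(ms), active(timeBeginPeriod(ms) == TIMERR_NOERROR) {}

    // Every successful timeBeginPeriod() needs a matching timeEndPeriod()
    // with the same argument.
    ~TimerResolution() { if (active) timeEndPeriod(periodMs); }

    TimerResolution(const TimerResolution&) = delete;
    TimerResolution& operator=(const TimerResolution&) = delete;
};

// Usage:
//   TimerResolution res(1);   // request a 1 ms tick for this scope
//   Sleep(1);                 // now sleeps roughly 1 ms instead of a full default tick
#endif
```

Since the request affects the whole system, it makes sense to hold such a guard only while the render loop is actually running.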
Another way to implement usleep() is to use a waitable timer. However, although the timer interval can be specified in units of 100 ns, in reality it is nowhere near that accurate. It appears that this timer is also affected by timeBeginPeriod(), and as a result, you won’t get better than 1 ms accuracy, so you may as well just use regular Sleep() with timeBeginPeriod().
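For reference, a waitable-timer version could look something like the sketch below. The function names relativeDueTime and waitableTimerSleep are hypothetical, and this is not the exact code I tested. The unit conversion is kept separate from the Windows-only part, which is guarded so the snippet compiles elsewhere.

```cpp
#include <cstdint>

// SetWaitableTimer() takes its due time in units of 100 ns.
// A negative value means "relative to now" rather than an absolute time.
constexpr std::int64_t relativeDueTime(std::int64_t microseconds) {
    return -(microseconds * 10);
}

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: blocks for roughly `microseconds` using a waitable
// timer. Returns false if the timer could not be created, set or waited on.
bool waitableTimerSleep(std::int64_t microseconds) {
    // Unnamed, manual-reset timer with default security attributes.
    HANDLE timer = CreateWaitableTimer(nullptr, TRUE, nullptr);
    if (!timer) return false;

    LARGE_INTEGER due;
    due.QuadPart = relativeDueTime(microseconds);

    // One-shot timer (period 0), no completion routine, no resume from suspend.
    bool ok = SetWaitableTimer(timer, &due, 0, nullptr, nullptr, FALSE) != 0;
    if (ok) ok = WaitForSingleObject(timer, INFINITE) == WAIT_OBJECT_0;

    CloseHandle(timer);
    return ok;
}
#endif
```

The API happily accepts a due time of a few hundred microseconds, but as described above, the actual wake-up still appears to be tied to the system timer interval.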
So the usleep()-implementation I presented earlier, based on SwitchToThread(), is indeed the most accurate implementation, as far as I know.
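As a reminder of the general idea, here is a minimal sketch of a yield-based microsleep. It is not a copy of the implementation from the earlier post: the name yieldingSleep is made up, the timing uses std::chrono::steady_clock as an assumption, and on non-Windows systems std::this_thread::yield() stands in for SwitchToThread() so the snippet remains portable.

```cpp
#include <chrono>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#else
#include <thread>
#endif

// Gives up the rest of the current time slice to another ready thread.
static void yieldTimeSlice() {
#ifdef _WIN32
    SwitchToThread();
#else
    std::this_thread::yield();  // stand-in for SwitchToThread() (assumption)
#endif
}

// Hypothetical helper: keeps yielding until at least `microseconds` have
// passed, then returns the time that actually elapsed, in microseconds.
std::int64_t yieldingSleep(std::int64_t microseconds) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    const auto target = std::chrono::microseconds(microseconds);

    auto elapsed = clock::now() - start;
    while (elapsed < target) {
        yieldTimeSlice();
        elapsed = clock::now() - start;
    }
    return std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
}
```

Note that a loop like this never blocks: if no other thread is ready to run, it simply spins until the deadline, so it trades CPU time for precision.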