The story of microsleep

It is time for a more hands-on post again. This time I am even going to give you some actual source code to play with!

The issue I want to discuss today, is the issue of multithreaded GUIs. For the project I am currently working on, there are two reasons why the GUI should be multithreaded:

  1. The software can render to multiple displays at the same time. Each display can be run on its own GPU. If each display gets its own renderthread, then each GPU could have its own CPU core assigned to it, giving true parallel rendering.
  2. Aside from the render windows, there is also a separate UI window where the user can control the renderers. If this UI gets its own thread(s), the renderers will not be slowed down while the UI is updating, so no sudden hiccups or slowdowns on the displays.

The anatomy of a single display renderer

First, let us look at the simplest form of rendering application. An application that only has a single window. Generally you will use a single thread, with a message loop of the following form:

	MSG msg;

	while( true )
	{
		while (PeekMessage( &msg, NULL, 0, 0, PM_REMOVE ))
		{
			TranslateMessage( &msg );
			DispatchMessage( &msg );
		}
		if (msg.message == WM_QUIT)
			break;

		UpdateWindow();
	}

This loop will just check if there are any messages, and process them. Once it has processed the entire message queue, it will render one frame. Then it will check for messages again, etc. This is the simplest and most efficient way to handle a window that needs to render frames as quickly as possible, such as for a game or a graphics demo. It works because most of the time there aren’t any messages, so it falls right through and renders a new frame. When there are messages, they generally don’t take a lot of time to process, so your framerate is not affected much. At the same time, the application always responds to messages after a single frame of rendering, so as long as you have ‘realtime’ framerates, Windows will never complain that the application has stopped responding.

This is a classic messageloop and worked fine back in the day of single-core processors. Even today, with multi-core processors, this is still the best way to handle a messageloop. Namely, Windows wants the messages for a window to be handled by the same thread that created the window. So if you want to make some clever multithreaded handling, you still run into the problem that everything needs to be synchronized back to the main window thread with the messageloop. In most cases, it’s more trouble than it’s worth, since the extra threads and synchronization logic will introduce more overhead than just this simple peek-and-dispatch approach from a single thread.

Scaling up to multiple windows

Windows will give each thread its own message queue, and will also make sure that the messages for a given window are posted to the queue of the thread that created it. If we want to extend the above approach to multiple windows and threads, then it follows that each thread will get its own copy of the messageloop. This means that each window will be completely independent of every other window, and the processing of each window can be allocated to its own physical core on the CPU.

Or at least… that is the theory… In practice, you will find that this approach will not work as well as you had hoped. With a single window, the above approach is well-behaved: your window will get as much CPU time as the system will allow, while the system will continue to respond just fine. When you have two or more windows like this, it seems to be one of Raymond Chen’s “What if two programs did this”-scenarios…

When you have two or more windows each trying to render as quickly as possible, it seems that they are somehow starving eachother. I am not quite sure what is happening, but it might have something to do with the fact that Windows likes to temporarily boost an application when it receives events, to make it more responsive (this is the default setting for regular versions of Windows. Server versions default to a more ‘fair’ scheduler. You can switch to this scheduler in regular versions as well, but this would require the adminstrator of the system to reconfigure the system, so not very userfriendly). The resulting behaviour appears to be that the message queues are filling up, and as a result, not only the windows themselves are becoming unresponsive, but the system as a whole is starting to respond poorly. This seems to happen even on a quadcore system with HyperThreading, which should be able to handle 8 threads simultaneously. So just having enough cores is not always the answer, apparently.

Time to sleep

Perhaps then, we can make the threads behave better if we let them sleep occasionally. As you might already know, Sleep(0) is legal: it merely makes your thread give up the remainder of its timeslice, and gets rescheduled for execution right away. So would that be enough? Sadly, no. Although responsiveness improves slightly, Sleep(0) does not give the fighting threads enough time to settle.

What about Sleep(1) then? Well, it fixes the issue of thread-fighting, and the system is well-behaved again. However, there is a problem… The granularity of the Windows scheduler is not fine enough to do an actual sleep of 1 ms. Instead, it will generally sleep for the duration of a timeslice. This is somewhere in the order of 10 ms on most systems. As a result, my rendering windows are now running at around 100 fps maximum. That is not acceptable. It means that it will be impossible to render in sync with a display of 100 Hz or more. So, we need a better solution.

Microsleep to the rescue

What we really want, is a Sleep() that can sleep for extremely short periods of time. In the order of 1 ms or even less. Some OSes have a ‘usleep()’ function for that (the u stands for the Greek letter Mu, or μ. Since this is not a valid letter in the ASCII character set, it is often replaced by the similar ‘u’, hence usleep for microsleep). This function will generally work with microseconds, hence the name (some OSes may even offer nanosleep() or such).

Such exact timing can generally only be performed reliably by a real-time OS, which Windows neither is, nor strives to be. As a result, Windows does not have anything like usleep() (although certainly not all OSes that provide usleep() or nanosleep() are anywhere near realtime, and have anywhere near the implied resolution). But we need it anyway! So what do we do? We roll our own, obviously!

In this particular case, we don’t need very exact timing anyway. As long as we free up enough time for all threads to co-exist peacefully, we are happy. My first approach then, was to just call Sleep(0) repeatedly inside a for-loop. The only question is: how many iterations does one need? After a bit of experimenting, it seemed that somewere between 25 and 50 iterations worked fine on my system. I could do more iterations, such as 100, but it had little effect (a nice side-effect of Sleep(0) is that it returns immediately when there is no other work scheduled, so once all threads started their ‘sleep-cycle’, they should exit the loop quickly, so extra iterations are almost ‘free’).

While looking into Sleep(0), I also noticed that since Windows XP, there is also SwitchToThread(). This seems to be a proper ‘yield’ operation for threads, as Sleep(0) itself has changed a bit since Windows Server 2003. So I decided to use that instead. My first attempt at usleep() then was this:

void usleep(int iters)
{
	for (int i = 0; i < iters; i++)
		SwitchToThread();
}

It works fine, the threads are behaving nicely, and the framerates of the windows are virtually uncompromised. But, it is a bit nasty that you never quite know how many iterations you need, and how long you are sleeping effectively… The number of iterations may differ from one computer to the next, depending on how many cores it has, and how fast they are.

You made it work, now make it work better

It would be nice to have a ‘real’ usleep(), where you can specify the time to wait in microseconds. If we measure the elapsed time after each SwitchToThread()-call, we can exit the loop when the wait-time was exceeded. It might not be super-accurate, but it is bound by time nonetheless, rather than just running for a given number of iterations regardless of how long they take.

There really is only one Windows timer that would be accurate enough for sub-millisecond-timing, and that is the performance counter. So, let’s use QueryPerformanceCounter() to time each SwitchToThread() call:

void usleep(unsigned __int64 ticks)
{
	LARGE_INTEGER frequency;
	LARGE_INTEGER currentTime;
	LARGE_INTEGER endTime;

	QueryPerformanceCounter(&endTime);

	// Ticks in microseconds (1/1000 ms)
	QueryPerformanceFrequency(&frequency);
	endTime.QuadPart += (ticks * frequency.QuadPart) / (1000ULL * 1000ULL);

	do
	{
		SwitchToThread();

		QueryPerformanceCounter(&currentTime);
	} while (currentTime.QuadPart < endTime.QuadPart);
}

And there we have it, our very own usleep() for Windows! As it turns out, the accuracy is actually quite reasonable, and certainly a big improvement over Sleep(1). With usleep(100L), each window can now update thousands of times per second again, while the threads remain well-behaved at all times. So we get very efficient use of a multi-core/multi-GPU system when building a multi-display renderer with this approach.

It might still be a good idea to keep the usleep() period user-configurable though. I have tested it on a variety of systems, from a Pentium 4 HT to a Core i7 2600K, and they all seemed well-behaved with a value of 100 microseconds. But you never know, there might be systems out there that need longer periods of time to remain well-behaved. Conversely, you may want to set the time as small as possible on your system, to get the most performance out of it. So it would be nice if end-users can configure this value somewhere, in case of trouble.

I have made a simple proof-of-concept program based on usleep(), which opens a number of windows, each running in a separate thread, and updates a counter in the title bar, so you get a good idea of how quickly each window runs. You can experiment with the values for usleep(), or replace it with Sleep(1), or vary the number of windows/threads the program uses. You can also remove the sleep altogether and see what happens. Be prepared though: your system may become very unresponsive and it might be difficult to even kill the process.

You can download the code here: http://bohemiq.scali.eu.org/MultithreadGUI.zip

About these ads
This entry was posted in Direct3D, Software development and tagged , , , , , , , , , . Bookmark the permalink.

6 Responses to The story of microsleep

  1. Pingback: The story of microsleep, followup | Scali's blog

  2. Pingback: The story of multi-monitor rendering | Scali's blog

  3. Rex says:

    The source code link is dead. Can you kindly please fix it?

    • Scali says:

      I’m afraid the web server will be down for the coming days.
      I can mail it to you though. Did you use a valid email address when you posted this? I can send it there.

      • Exitcode_0 says:

        Thank you very much for posting this information and sharing everything on your blog. I have had quite a time thumbing through them all. If you could email me the source code for this microsleep program I could greatly appreciate it.

  4. Phil Braica says:

    This is AWESOME!
    Yeah of course a million caviots ’cause it is windows, but fantastic!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s