Sunday, September 12, 2010

CPython Threading: Interrupting

I've decided to kick off a return to blogging with a series on multi-threaded development in Python (CPython, to be specific).  Yes, we all know there's a GIL in the water, but multi-threading is still an extremely useful concurrency strategy in Python for I/O-bound activities ... which tends to characterize most of my use cases for concurrency.  But there are lots of things that make multi-threaded programming tricky, and there aren't quite as many resources out there for Python as there are for, say, Java.

Disclaimer: I am not an expert at multi-threaded programming in Python (or any other language). Most of this has been trial & error, some help from the Google, and a lot of foundation from the excellent book on the subject by Brian Goetz: Java Concurrency in Practice (despite the title, the principles in the book apply to Python too). If you know of a better way or better explanation, please leave a comment so we can all benefit.

After getting over some of the challenges of mutable state and atomicity of operations, the next thing that bit me, in Python specifically, was the handling of asynchronous exceptions (like KeyboardInterrupt and, in some cases, SystemExit), and in particular how one goes about actually stopping a multi-threaded application.  Too many times I would end up with a script that would just hang when I hit CTRL-C (and I'd have to explicitly kill it).  So let's start there.

Asynchronous Exceptions


KeyboardInterrupt is not itself an OS signal, but it is how CPython reports one: by default the interpreter installs a handler for SIGINT (the signal CTRL-C sends) that raises KeyboardInterrupt. The rule is that on platforms where the signal module is present, Python signal handlers always run in the main thread, so KeyboardInterrupt is always raised there. SystemExit behaves differently: it is an ordinary exception raised by sys.exit() in whichever thread calls it, and if a worker thread raises it, only that worker ends. (A signal handler that calls sys.exit() will, of course, raise SystemExit in the main thread.) On other platforms, interrupts may apparently be delivered to an arbitrary thread (see the "Caveats" section of the thread module reference documentation for more info); for the sake of focus here, we will assume that you are working on a platform with the signal module present.
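To see the main-thread rule in action, here is a small experiment. It is a sketch that assumes Python 3 on a POSIX platform (it uses os.kill() to send SIGINT to its own process), and which_thread_handles_sigint is a hypothetical name. A worker thread sends the signal, but a temporary Python-level handler records the name of the thread that actually runs it, which should turn out to be the main thread.

```python
import os
import signal
import threading
import time

def which_thread_handles_sigint():
    """Send SIGINT from a worker thread and report which thread ran the handler."""
    handled_in = []

    def handler(signum, frame):
        # Record the name of whichever thread is executing this Python handler.
        handled_in.append(threading.current_thread().name)

    # signal.signal() may only be called from the main thread.
    previous = signal.signal(signal.SIGINT, handler)
    try:
        sender = threading.Thread(
            target=os.kill, args=(os.getpid(), signal.SIGINT), name='sender')
        sender.start()
        sender.join()
        # The C-level handler only sets a flag; give the main thread a moment
        # to notice it and run the Python-level handler.
        deadline = time.monotonic() + 2.0
        while not handled_in and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        # Put the original handler back (SIG_DFL if none was set from Python).
        signal.signal(signal.SIGINT,
                      previous if previous is not None else signal.SIG_DFL)
    return handled_in

if __name__ == '__main__':
    print(which_thread_handles_sigint())
```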

Let's start out with a simple example of a multi-threaded program that you cannot abort with CTRL-C:

import time
import threading

def dowork():
  while True:
    time.sleep(1.0)

def main():
  t = threading.Thread(target=dowork, args=(), name='worker')
  t.start()

  # Block until the thread completes.
  t.join()

if __name__ == '__main__':
  main()

The problem here is that nothing tells the worker thread to stop when the main thread receives the KeyboardInterrupt "signal".  Even if the exception is raised in the main thread, the worker's loop keeps running, and the interpreter will not exit while a non-daemon thread is still alive. (It's actually worse than that: as discussed below, in CPython 2.x a t.join() call without a timeout cannot be interrupted at all.) As a result, you'll have to go kill that Python process manually, sorry.

Stopping Worker Threads


Solution 1: Kill with Prejudice


So the quick fix here is to make the worker thread a daemon thread, by setting t.daemon = True (or calling t.setDaemon(True)) before calling t.start().  From the threading reference documentation:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left.
So in practice, once your main thread exits, a daemon thread will just stop in the middle of whatever it was doing. In many cases this abrupt termination of worker threads may be appropriate; however, there may also be cases where you actually want to manage what happens when your threads terminate: maybe they need to commit (or roll back) a transaction, save their state, etc. For this, a more thoughtful approach is required.
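Applied to the first example, the daemon fix might look something like this (a Python 3 sketch; start_worker is a hypothetical helper split out only so the thread is easy to inspect). The main thread joins with a timeout rather than calling a bare t.join(), for the reasons covered in the join() section below.

```python
import threading
import time

def dowork():
    while True:
        time.sleep(1.0)

def start_worker():
    t = threading.Thread(target=dowork, name='worker')
    t.daemon = True  # must be set before start()
    t.start()
    return t

def main():
    t = start_worker()
    try:
        # Join in short slices so CTRL-C can reach the main thread.
        while t.is_alive():
            t.join(timeout=1.0)
    except KeyboardInterrupt:
        # Returning from main() ends the program; the daemon worker dies with it.
        pass

if __name__ == '__main__':
    main()
```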

Solution 2: Instruct Politely


The alternative to just killing them is to instruct the thread to stop using some agreed-upon system. You are probably aware (or have guessed) by now that there is no Thread.stop() method in Python (and the one in Java is deprecated and generally considered a Bad Idea™).  So what you must do is to implement a "thread interruption policy" which in our case is basically a signaling mechanism that the main thread can use to tell the worker thread to stop.  Python provides a threading.Event class that is for exactly this type of inter-thread signaling.

threading.Event objects are very simple two-state (on/off) flags that can be shared between threads, without any additional locking, to pass "messages" between them. Here is a basic strategy for using a threading.Event to communicate a 'shutdown' message to a worker thread:
  1. You share a "shutdown" threading.Event instance between the threads (i.e. you either pass it to the threads or put it in a mutually accessible place).
  2. You set the event from the main thread when you receive the appropriate signal. Here we're focused on KeyboardInterrupt, but presumably users could also take some action within your application (e.g. "stop" button) to stop your application, i.e.
    shutdown_event.set()
  3. You check it (frequently) in each worker thread and take the appropriate action once it has been set, i.e.
    while not shutdown_event.is_set():
       do_some_work()
    do_some_cleanup()
    

It is probably worth pointing out here that this system is really just some conventions that you've established between your main thread and the workers. If the workers don't periodically check the shutdown event, then they won't stop their work -- and CTRL-C still won't work.
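The three steps can be sketched as a small self-contained program (Python 3; do_some_work, do_some_cleanup, and run_then_stop are hypothetical placeholders, and the "cleanup" just records that it ran). Here the main thread sets the event after a short delay instead of waiting for CTRL-C, so the example finishes on its own.

```python
import threading
import time

def do_some_work():
    time.sleep(0.05)  # hypothetical stand-in for one small unit of real work

def do_some_cleanup(log):
    log.append('cleaned up')  # hypothetical stand-in for commit/rollback, saving state, etc.

def worker(shutdown_event, log):
    # Step 3: check the event between units of work, then clean up.
    while not shutdown_event.is_set():
        do_some_work()
    do_some_cleanup(log)

def run_then_stop(run_for=0.2):
    shutdown_event = threading.Event()  # Step 1: one Event shared by both threads
    log = []
    t = threading.Thread(target=worker, args=(shutdown_event, log), name='worker')
    t.start()
    time.sleep(run_for)    # the main thread carries on with other things
    shutdown_event.set()   # Step 2: ask the worker to stop
    t.join(timeout=5.0)    # the worker should notice within about one work unit
    return t.is_alive(), log

if __name__ == '__main__':
    print(run_then_stop())
```

Calling run_then_stop() should report that the worker is no longer alive and that its cleanup step ran.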

Putting it Together


After applying the threading.Event model to our example, we are able to have our CTRL-C respected relatively quickly (as quickly as the worker thread gets around to checking the event).

import time
import threading

shutdown_event = threading.Event()

def dowork():
  while not shutdown_event.is_set():
    time.sleep(1.0)

def main():
  """ Start some threads & stuff. """

  t = threading.Thread(target=dowork, args=(), name='worker')
  t.start()

  try:
    while t.is_alive():
      t.join(timeout=1.0)
  except (KeyboardInterrupt, SystemExit):
    shutdown_event.set()

if __name__ == '__main__':
  main()
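One possible refinement (my suggestion, not part of the example above) is to replace time.sleep() in the worker with Event.wait(timeout). In Python 2.7 and later, wait() returns True as soon as the event is set and False if the timeout expires, so the worker can react to shutdown immediately instead of finishing its current one-second sleep. The stop() helper is hypothetical and only measures how long the worker takes to exit. Sketch in Python 3:

```python
import threading
import time

shutdown_event = threading.Event()

def dowork():
    # wait() returns False when the timeout expires and True as soon as the
    # event is set, so it acts as a one-second sleep that ends early on shutdown.
    while not shutdown_event.wait(timeout=1.0):
        pass  # periodic work would go here

def stop(t):
    """Set the shutdown event, wait for the worker, and return the delay in seconds."""
    started = time.monotonic()
    shutdown_event.set()
    t.join()
    return time.monotonic() - started

def main():
    t = threading.Thread(target=dowork, name='worker')
    t.start()
    try:
        while t.is_alive():
            t.join(timeout=1.0)
    except (KeyboardInterrupt, SystemExit):
        print('worker stopped after %.3f seconds' % stop(t))

if __name__ == '__main__':
    main()
```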

Working around uninterruptible Thread.join()


You may have noticed that we changed how we call Thread.join(). Calling join() on a thread without a timeout blocks until that thread completes. As I understand it, the wait happens inside an internal lock acquisition, and in CPython 2.x a blocking lock acquisition cannot be interrupted by KeyboardInterrupt. A join() with a timeout returns after at most that many seconds, giving the main thread a chance to receive the KeyboardInterrupt, so you can work around the problem by checking in a loop until the thread does exit:
while t.is_alive():
  t.join(timeout=0.1)
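With more than one worker, the same loop can be wrapped in a small helper. join_all below is a hypothetical function (not part of the threading module), sketched for Python 3: it joins every thread with a timeout and, if the main thread is interrupted, sets the shared shutdown event, waits for the workers, and re-raises the exception so the caller still sees it.

```python
import threading

def join_all(threads, shutdown_event, poll_interval=0.1):
    """Wait for all threads; on CTRL-C, tell them to stop, wait, and re-raise.

    join_all is a hypothetical helper, not part of the threading module.
    """
    try:
        for t in threads:
            # A join() with a timeout returns every poll_interval seconds,
            # which gives the main thread a chance to raise KeyboardInterrupt.
            while t.is_alive():
                t.join(timeout=poll_interval)
    except (KeyboardInterrupt, SystemExit):
        shutdown_event.set()
        for t in threads:
            t.join()  # workers are expected to notice the event and exit soon
        raise
```

Re-raising is a design choice; you could equally swallow the exception and return. Note that the final join() calls have no timeout, so this assumes the workers really do check the event.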

Other Events and Exceptions


You may notice that in the combined example I am also catching SystemExit for the sake of completeness. In a more complex app, you would need to make sure that other exceptions are also handled so that they, too, result in the shutdown message being sent to the worker threads.
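One way to cover the "other exceptions" case is to set the shutdown event in a finally clause rather than in an except clause, so that an unexpected error in the main thread stops the workers too. A sketch in Python 3, where run and main_body are hypothetical names standing in for your application's entry point:

```python
import threading

shutdown_event = threading.Event()

def dowork():
    while not shutdown_event.wait(timeout=0.1):
        pass  # periodic work would go here

def run(main_body):
    """Start a worker, run main_body(t), and always tell the worker to stop.

    main_body is a hypothetical callable standing in for the rest of your app.
    """
    t = threading.Thread(target=dowork, name='supervised-worker')
    t.start()
    try:
        main_body(t)
    finally:
        # Runs on normal return and on any exception (KeyboardInterrupt,
        # SystemExit, or an unexpected bug); the exception then keeps propagating.
        shutdown_event.set()
        t.join()
```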

You could also choose to register a signal handler (in your main thread) for other OS signals and raise an appropriate exception (e.g. SystemExit) or take other actions. The important point here is that these would all need to be handled in your main thread and communicated by some sort of convention to the worker thread(s).
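For example, a handler for SIGTERM (the signal sent by a plain kill <pid>) could raise SystemExit, so that the except (KeyboardInterrupt, SystemExit) clause from the combined example handles it just like CTRL-C. This is a Python 3 sketch with hypothetical function names; note that signal.signal() may only be called from the main thread.

```python
import signal

def handle_sigterm(signum, frame):
    # Python runs signal handlers in the main thread, so this SystemExit is
    # raised there and lands in the same except (KeyboardInterrupt, SystemExit)
    # clause that sets shutdown_event.
    raise SystemExit('terminated by signal %d' % signum)

def install_sigterm_handler():
    """Route SIGTERM into the shutdown path; call this from the main thread.

    Returns the previously installed handler so it can be restored.
    """
    return signal.signal(signal.SIGTERM, handle_sigterm)
```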

In Summary


Dealing with these "asynchronous events" in multi-threaded applications can be a little confusing (and sometimes a little frustrating when your app refuses to exit). Understanding the key points here will hopefully help make this a bit clearer:
  1. Signals are handled by the main thread (on platforms with the signal module). This means that KeyboardInterrupt is always raised in the main thread.
  2. The program exits once only daemon threads are left (e.g. when the main thread finishes), and any daemon threads are stopped abruptly at that point.
  3. For cases where you need more control over thread termination, use threading.Event objects to "signal" threads to exit.
  4. Be aware that Thread.join() without a timeout blocks and (in CPython 2.x) cannot be interrupted! Join with a timeout in a while-loop instead.