Output synchronisation in Python

A simple trick to making print more atomic.

(Not applicable to Pythons without the GIL, nor to those without the print statement!)

One of the first problems that a programmer might encounter when writing a multithreaded program is the interleaving of outputs from multiple threads. As the old joke demonstrates:

Q. Why did the multithreaded chicken cross the road?
A. to get side. To other the

(A source for the joke points out that the punchline isn’t even going to be repeatable each time!)

If a certain output is one of the primary products of your program, it’s not too much work to explicitly synchronise it in a variety of ways, such as locks and output queues. But if (like most of us from time to time) you want to add logging or print statements at ad hoc points in your code, worrying about synchronisation for each becomes a burden.

Say your program has defined a warn function, that can be used to print a message to the standard error stream.

import sys
import threading

def warn(msg):
    thread_name = threading.current_thread().name
    print >>sys.stderr, thread_name, msg

def task():
    for i in range(5):
        warn('Iteration %d' % i)

t1 = threading.Thread(target=task)
t1.start() # Run task in another thread
task()     # Also run task in the main thread
t1.join()

When you run the program, you get output like:

MainThreadThread-1 Iteration 0Iteration 0

MainThreadThread-1 Iteration 1Iteration 1
MainThread
Thread-1Iteration 2
Iteration 2MainThread
Thread-1Iteration 3
Iteration 3MainThread
Thread-1Iteration 4
Iteration 4

It’s obvious that strings aren’t being chopped up arbitrarily — any single element to the print statement is intact — but at the same, print statements are interfering with each other. Printing the thread name for one completes, but then a context switch occurs and the thread name for another is printed, before the message for the first thread can be appended to that line. It’s a mess to read when you’re trying to debug your program.

I hasten to aside that this is not the only possible output from the program, which is part of the problem! I ran it a couple of times to get one that I thought was pretty. Additionally, the fact that a given behaviour — no single elements chopped up — exhibits in a run does not guarantee that it’s dependable. It may simply be that context switches between elements are 1000000 times as likely as those between characters. In this case however, it happens that switches between characters are impossible, as will be explained below.

So, you fix it by constructing a whole string first, then printing it:

def warn(msg):
    thread_name = threading.current_thread().name
    line = '%s %s' % (thread_name, msg)
    print >>sys.stderr, line

MainThread Iteration 0Thread-1 Iteration 0

MainThread Iteration 1Thread-1 Iteration 1
MainThread Iteration 2
Thread-1 Iteration 2
MainThread Iteration 3

MainThread Iteration 4Thread-1 Iteration 3

Thread-1 Iteration 4

It seems that the most of the line is printed atomically, but the implicit end-of-line character is not! Why not? Because it gets compiled to a separate Python bytecode instruction:

>>> import sys
>>> import dis
>>> def f():
...     print >>sys.stderr, 'hi'
...
>>> dis.dis(f)
  1           0 LOAD_GLOBAL              0 (sys)
              3 LOAD_ATTR                1 (stderr)
              6 DUP_TOP
              7 LOAD_CONST               1 ('hi')
             10 ROT_TWO
             11 PRINT_ITEM_TO
             12 PRINT_NEWLINE_TO
             13 LOAD_CONST               0 (None)
             16 RETURN_VALUE

CPython’s much-loathed Global Interpreter Lock has done most of the work for making our print statements atomic. As long as two threads interfere only through their use of print, the worst that unsynchronised code can do is mess up the output, and that’s only because the statement may have been compiled to more than one instruction, as above. In other languages, not only could the output be garbled at the character level, the internal housekeeping for maintaining the stream could be “garbled” as well. At that point, the entire program’s behaviour is hard to reason about with any certainty.

So, how can we make our use of print fully atomic? Easily: by moving the end-of-line character into the string:

def warn(msg):
    thread_name = threading.current_thread().name
    line = '%s %s\n' % (thread_name, msg) # \n is now in explicitly in line
    print >>sys.stderr, line, # Use trailing , to indicate no implicit end-of-line

MainThread Iteration 0
Thread-1 Iteration 0
MainThread Iteration 1
Thread-1 Iteration 1
MainThread Iteration 2
Thread-1 Iteration 2
MainThread Iteration 3
Thread-1 Iteration 3
MainThread Iteration 4
Thread-1 Iteration 4

Whole lines are now printed, with no overlap.

What if you want to print a group of lines contiguously? Simply join them into a single string first. When they are printed, that thread will obtain the GIL, print the entire string using a single instruction, then release the lock and let other threads continue.

Note that autoflushing normally only applies by default to stderr. To autoflush stdout you could use python -u, or reopen it with more explicit file modes, or just use sys.stdout.flush() after your print statement. However the last option will compile to separate instructions, so you might need to add locking around the print and flush statements… (See this Stack Overflow question for more info.)

1 Response to Output synchronisation in Python

B. Doyle says:

August 7, 2016 at 7:51 pm

This is really cool and useful; I’ve bookmarked it as another useful idea when I might be working with threading. Unfortunately, I’ve begun to understand why there’s so much hype over the GIL as well, and hope to learn far more about it in the months ahead.