Some practical uses of goto

A few days ago there was another submission on Reddit on the evils of goto. The flow control primitive “goto” is “considered harmful” (which, I understand, forms was the editor’s title to Dijkstra’s famous letter).

“Considered harmful” has become a meme in the programming community. So has the simplistic tenet that goto should never, ever be used, culminating in other memes such as this one. It’s not in itself funny, but it does illustrate the schooled aversion that modern programmers have to goto. (The cartoon comments on the attitude, rather than intentionally enforces it. But it still provides ammunition to those who say absolutely that goto should not exist.)

The problems with goto

Goto is a low-level primitive. At the assembly language level, programs are sequences of instructions, normally executed consecutively. There are occasional branch points where, depending in the result of a computation, the next instruction to be executed may optionally be one located elsewhere in the sequence, where it is identified by a label. And there are gotos, which unconditionally continue execution from a labelled instruction. At this level, gotos are inevitable, and harmless, because the code should be generated by compilers from higher-level programs.

But early “higher-level” languages used goto, too. There were two major problems with this. It lead to progams that were hard to read. The result was called “spaghetti code”, because the flow of control weaved all through the program, and each strand had to be individually followed by the reader for them to have any understanding of what the program was doing.

The other objection is more theoretical. Ideally, the correctness of a program should be provable (and actively proved by the programmer). This is still a tricky problem, and in practice most programs are not proved correct.

Structured programs are composed from smaller programs. If the correctness of smaller programs can be proved, and we can have rules for analysing a compound program in terms of its parts, then we should be able to prove its correctness. So, for instance, Tony Hoare’s logic for program correctness has inference rules like the following:

Which essentially says, if P holds true before and after the execution of S (and B also holds true before S), then P will hold true before and after the execution of while B do S. The programmer’s job is to define P as a useful invariant which can be used to prove that the loop performs a correct computation.

In structured programming, the entire program is composed of these structures; and, with suitable invariants, its correctness can be proved.

What about goto? The problem with goto is that there is no longer an obvious recursive structure. It is harder to find self-contained subprograms whose correctness can be proved, and hence harder to prove the correctness of the whole program. The Knuth paper from the Reddit submission contains logic rules for gotos and labels; but the potential for overlapping code between gotos and their targets is still a problem.

How goto is used in practice

Although I like to think I use goto in sensible ways, I thought I should conduct a survey of my codebase. The keyword goto occurs in 24 places in my C code, in the projects fractal, compiler, fs, and svnindex. There are three ways in which it is used.

Cleaning up

The first is a jump to cleanup code before exiting a function, of this form:

int f(void)
{
    void *stuff = malloc(1000);

    if (!try_something())
    {
        result = -1;
        goto cleanup;
    }

    result = do_other_stuff();

cleanup:
    free(stuff);
    return result;
}

This occurs in svnindex, in code I copied from other parts of Subversion (in fact there is a helpful comment there, saying as much). Subversion has an exception framework in which the return value of a function is an exception (or NULL if no exception is thrown). But being C, it’s also the case that memory allocated for exceptions must be manually freed. In this piece of code an exception is thrown and handled in preceding lines; the cleanup code simply clears that previous exception before returning the final exception result of the function.

Jumping to cleanup code is a classic permitted use of goto in C. In this case it is a little more complicated because of the exception logic, but basically it’s just freeing some memory that was allocated earlier in the function.

Retrying an operation

The majority of my gotos are in this form:

int f(void)
{
    int rv;
restart:
    rv = do_something();
    if (specific_unusual_condition)
    {
        goto restart;
    }
    return rv;
}

I use this form when the operation in question is something that almost always succeeds the first time, but occasionally something arises part way through that means we need to start again. Structured programming purists would probably insist on replacing it with a while loop:

int f(void)
{
    int rv;
    int successful = 0;
    while (!successful)
    {
        successful = 1;
        rv = do_something();
        if (specific_unusual_condition)
        {
            successful = 0;
            continue;
        }
    }
    return rv;
}

But, not only is that harder to read, it obscures the qualitative fact that, normally, we expect the loop to run exactly once. It also requires an extra variable because the structure forces the head of the while loop to have two possible roles in the flow control. An alternative is a do-while loop:

int f(void)
{
    int rv;
    do
    {
        rv = do_something();
        if (specific_unusual_condition)
        {
            continue;
        }
    }
    while (0);
    return rv;
}

Which removes the variable. But the structure is identical to the goto implementation. The only difference is that we use the head of the loop as the label, and the word “continue” instead of “goto”. The loop’s test is vacuous and required only by the syntax.

A dubious case

Finally, there is one instance of goto in my compiler where it is used somewhat differently. There is a part of the compiler that emits an arbitrary control-flow-graph as a linear sequence of instructions. It has a loop like this:

while (!all_done)
{
    vertex = pop_vertex();
    if (done(vertex))
    {
        continue;
    }

do_next:
    emit(vertex);
    done(vertex);

    if (!next_vertex or done(next_vertex))
    {
        continue;
    }
    else
    {
        vertex = next_vertex;
        goto do_next;
    }
}

(It is only coincidence that the data the program operates on is also program code that whose structure is undergoing transformation. The general problem is that of a directed graph being listed in an efficient format, where edges are inferred to exist between subsequent nodes unless indicated otherwise.)

The loop effectively has two heads: one for the case where the next instruction follows immediately on from its predecessor, and one for when a new subsequence of instructions begins. Depending on the instruction emitted, the loop with either goto the do_next label for an immediately following instruction, or will continue to the top of the while statement for a new sequence.

I suspect in this case that the goto is unnecessary, and that it should be rewritten as a nested while loop.

However, the algorithm is inherently complicated. It is still necessary to understand all possible cases of how a vertex related to its successors, and cover them appropriately. There have been bugs in this bit of code where that wasn’t the case! Regardless of while loops, it is spagehetti-like code and should be rewritten.

A question on compilation

There is a well-known theorem in structured programming: any occurrence of goto in a program can be rewritten to use structured flow control with the addition of a finite number of boolean variables. The rewritten “restart” loop shown above provides an example of that.

Conversely, some structured programs should be able to be rewritten using gotos, with the elimination of a boolean variable.

If I was going to rewrite my restart loops as while loops, I wondered if the compiler would be able to optimise away the extra control variable.  Doing so could reduce register pressure, let more important variables remain in registers, and result in a faster program.  Take this bit of test code:

extern int check();

extern int work();

void goto_test1(void)
{
restart:
    if (check())
        goto restart;

    work();
}

void goto_test2(void)
{
    int successful = 0;
    while (!successful)
    {
        successful = 1;
        if (check())
        {
            successful = 0;
            continue;
        }
    }

    work();
}

In gcc with no optimisation, the functions are compiled as:

_goto_test1:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        jmp     L2
L4:
        nop
L2:
        call    _check
        testl   %eax, %eax
        jne     L4
L3:
        call    _work
        leave
        ret

_goto_test2:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        movl    $0, -12(%ebp)
        jmp     L6
L7:
        movl    $1, -12(%ebp)
        call    _check
        testl   %eax, %eax
        je      L6
        movl    $0, -12(%ebp)
        nop
L6:
        cmpl    $0, -12(%ebp)
        je      L7
        call    _work
        leave
        ret

The second, properly structured function spends instructions on setting and checking that control variable. But, with even a low level of optimisation (-O1), the output is:

_goto_test1:
L2:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
L3:
        call    _check
        testl   %eax, %eax
        jne     L3
        call    _work
        leave
        ret

_goto_test2:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
L6:
        call    _check
        testl   %eax, %eax
        jne     L6
        call    _work
        leave
        ret

The compiler is smart enough to tell that the value of the variable depends entirely on the path taken through the program, and when the structure is rewritten, the variable is superfluous.  It is optimised out.

This demonstrates that concern about extra boolean variables, at least, is not an excuse for using gotos.

Posted in Programming | Tagged , | Leave a comment

Debugging wxPython programs

wxPython is a great way to write full-featured native GUI applications in Python.  It’s a wrapper for wxWidgets, a portable windowing toolkit written in C++.  (wxWidgets used to be called wxWindows, but, thanks to Microsoft’s inexcusable penchant for giving all their products inanely generic names, inevitably had to be renamed.)

But it can be painstaking to add program features: you add a few lines, restart the app, and try to test that they behave correctly.  Then repeat for the new few lines. This is only made a harder by Python’s dynamic typing, because so many more incorrect pieces of code are happily admitted by the parser and runtime, and only become apparent when actually executed.

With most Python programs you can use the command line interpreter to run individual functions, or type in snippets of Python code to quickly test them.  You can also use Python’s introspective facilities to find out how to use a module.  There are two particular functions which are great for this:

  1. dir(x), which will return a list of the properties on xx could be a module, class or any other object; also the somewhat longwinded [k for k in dir(x) if 'str' in k] when the list is long and you want to cut it down to likely possibilities.
  2. help(x), which will print all available inbuilt documentation on the object x.  By default the help text will consist of the a list of the method definitions in x (if x is a class or an instance of a class), or a recursive list of definitions in a whole module if x is a module.  It will also include the docstrings if any are present.

When I’m learning to use a new module, I will load it on the command line, and find what I need using the these two functions.

But running a wxPython program is all or nothing affair.  The wxPython app is initialised, and then control is passed to the inbuilt main program loop.  How many times I’ve wanted to examine the value of part of my running app, just to see how it’s behaving!  The workaround has been to add lots of print statements.

It occurred to me today that I could possibly run it alongside the Python command line.

Typical wxPython programs, as I write them, have a file main.py that looks like this:

from gui import MainFrame
import wx

def main():
    app = wx.PySimpleApp()
    frame = MainFrame()

    frame.Show(1)
    app.MainLoop()

if __name__ == '__main__':
    main()

app and frame are Python objects that wrap a lot of serious C++ GUI code, which interacts with the operating system.  Essentially all of the application state is contained in those objects. Normally when the program is run, it is executed as above. When app.MainLoop is called, wxPython will take over and display an interactive, native window. That function only returns when the main window is closed.

I want to keep the command line running in the main thread, and launch the GUI in another thread. Doing so would mean I can test the GUI normally, but can also run snippets of code and introspect the running GUI from the command line.

I tried creating app and frame directly on the command line, and then starting app.MainLoop as a new thread:

from gui import MainFrame
import wx
app = wx.PySimpleApp()
frame = MainFrame()
frame.Show(1)
import threading
t = threading.Thread(target=app.MainLoop)
t.start()

It does not work, because wxPython has certain expectations about how these vital functions are executed. In particular, the main GUI code, including initialising the app and creating the main window, should all run from a single thread.

So instead, I tried something even simpler:

# In main.py:
def main():
    global app, frame
    ...

# On the command line:
import main
import threading
t = threading.Thread(target=main.main)
t.start()

At this point the GUI opens and runs normally. And the command line is ready for input at the same time! I can inspect it:

>>> main.frame.GetTitle()
'SQL Editor'

>>> tab = main.frame.notebook.GetCurrentPage()
>>> dir(tab.tc) # Find out what methods the text control on the tab supports
[..., 'Clear', ...]
>>> tab.tc.Clear() # Remove all the text in the GUI text box on that tab
>>> main.frame.Refresh() # Tell the GUI to update itself

These can be used to manipulate the running program in ad-hoc ways. There are dangers in doing so; a wild thread is coming and poking round in the internals of a running program, after all. But the GUI is mostly idle waiting for an event; in practice it is generally safe to examine it. Python’s much-maligned Global Interpreter Lock protects the simple Python objects in it, and the C++ parts of wxWidgets are threadsafe — up to a limit. wxWidgets uses a single thread for all graphical updates, which is why Refresh must be used from another thread to trigger drawing activity, rather than using drawing methods directly.

Posted in Programming | Tagged , , | Leave a comment

Starry Night

Since getting back into astronomy, I’ve become more aware of the night sky. I’ve always had a fondness for it: knowing that “approximately” 99.9% of the universe is out there; appreciating the cold beauty of a field of stars in the night. Continue reading

Posted in Reflections, Science | Tagged , | Leave a comment

Chrome vs Firefox rant

So Chrome is now the world’s No. 1 browser. That’s not surprising, since it’s had more obvious development lately than anything else, and an equally strong marketing campaign behind it. (It’s hard to do anything Google-related without inadvertently installing Chrome or at least having an ad for it somewhere on the page.) A lot of the presumed future direction of the web is implemented first and best in Chrome now; things like efficient JavaScript and WebGL.  Chrome Experiments is a great place to see what is becoming possible. Continue reading

Posted in Rants | Tagged , | Leave a comment

More blogging statistics

Since this is EJRH’s 150th “postiversary”, and I have no other exciting news to share, I might as well make it a meta-blog post.  Soon after I started, I was a bit worried that 20% of all my blog posts were navel-gazing postings about the blog itself.  That hasn’t turned out to be true in the longer term.  It may feel that I’ve run out of stuff to write about, but I’ve felt that way before and yet here we are. Continue reading

Posted in Reflections | Tagged , | Leave a comment

Fixing a ZIP file

A certain firewall I know habitually chops off the last few bytes of some responses. It’s usually noticeable when large binaries are downloaded. Installers may fail to run (fortunately most of them contain an integrity check), and ZIP files may refuse to open. In the old DOS days, one could attempt a repair of such a ZIP file with a program called PKZIPFIX. Most modern archive managers do not include repair functionality. (It should be noted that repair can take two approaches: single-bit repairs using redundant recovery information; and salvaging of the unaffected files in an archive. Some archive formats, such as RAR, can carry recovery information, but only the latter approach is applicable to ZIP files.) Continue reading

Posted in Programming | Tagged , | 1 Comment

Beyond Hello World

A common pattern for introducing a programming language is the Hello World program.

But what exactly can you tell about a language from looking at its Hello World program? Continue reading

Posted in Programming, Rants | Tagged | 12 Comments