Some practical uses of goto

A few days ago there was another submission on Reddit on the evils of goto. The flow control primitive “goto” is “considered harmful” (the phrase was coined by the editor of the journal in which Dijkstra’s famous letter was published).

“Considered harmful” has become a meme in the programming community. So has the simplistic tenet that goto should never, ever be used, culminating in other memes such as this one. It’s not in itself funny, but it does illustrate the schooled aversion that modern programmers have to goto. (The cartoon comments on the attitude, rather than intentionally enforces it. But it still provides ammunition to those who say absolutely that goto should not exist.)

The problems with goto

Goto is a low-level primitive. At the assembly language level, programs are sequences of instructions, normally executed consecutively. There are occasional branch points where, depending in the result of a computation, the next instruction to be executed may optionally be one located elsewhere in the sequence, where it is identified by a label. And there are gotos, which unconditionally continue execution from a labelled instruction. At this level, gotos are inevitable, and harmless, because the code should be generated by compilers from higher-level programs.

But early “higher-level” languages used goto, too. There were two major problems with this. It lead to progams that were hard to read. The result was called “spaghetti code”, because the flow of control weaved all through the program, and each strand had to be individually followed by the reader for them to have any understanding of what the program was doing.

The other objection is more theoretical. Ideally, the correctness of a program should be provable (and actively proved by the programmer). This is still a tricky problem, and in practice most programs are not proved correct.

Structured programs are composed from smaller programs. If the correctness of smaller programs can be proved, and we can have rules for analysing a compound program in terms of its parts, then we should be able to prove its correctness. So, for instance, Tony Hoare’s logic for program correctness has inference rules like the following:

Which essentially says, if P holds true before and after the execution of S (and B also holds true before S), then P will hold true before and after the execution of while B do S. The programmer’s job is to define P as a useful invariant which can be used to prove that the loop performs a correct computation.

In structured programming, the entire program is composed of these structures; and, with suitable invariants, its correctness can be proved.

What about goto? The problem with goto is that there is no longer an obvious recursive structure. It is harder to find self-contained subprograms whose correctness can be proved, and hence harder to prove the correctness of the whole program. The Knuth paper from the Reddit submission contains logic rules for gotos and labels; but the potential for overlapping code between gotos and their targets is still a problem.

How goto is used in practice

Although I like to think I use goto in sensible ways, I thought I should conduct a survey of my codebase. The keyword goto occurs in 24 places in my C code, in the projects fractal, compiler, fs, and svnindex. There are three ways in which it is used.

Cleaning up

The first is a jump to cleanup code before exiting a function, of this form:

int f(void)
{
    void *stuff = malloc(1000);

    if (!try_something())
    {
        result = -1;
        goto cleanup;
    }

    result = do_other_stuff();

cleanup:
    free(stuff);
    return result;
}

This occurs in svnindex, in code I copied from other parts of Subversion (in fact there is a helpful comment there, saying as much). Subversion has an exception framework in which the return value of a function is an exception (or NULL if no exception is thrown). But being C, it’s also the case that memory allocated for exceptions must be manually freed. In this piece of code an exception is thrown and handled in preceding lines; the cleanup code simply clears that previous exception before returning the final exception result of the function.

Jumping to cleanup code is a classic permitted use of goto in C. In this case it is a little more complicated because of the exception logic, but basically it’s just freeing some memory that was allocated earlier in the function.

Retrying an operation

The majority of my gotos are in this form:

int f(void)
{
    int rv;
restart:
    rv = do_something();
    if (specific_unusual_condition)
    {
        goto restart;
    }
    return rv;
}

I use this form when the operation in question is something that almost always succeeds the first time, but occasionally something arises part way through that means we need to start again. Structured programming purists would probably insist on replacing it with a while loop:

int f(void)
{
    int rv;
    int successful = 0;
    while (!successful)
    {
        successful = 1;
        rv = do_something();
        if (specific_unusual_condition)
        {
            successful = 0;
            continue;
        }
    }
    return rv;
}

But, not only is that harder to read, it obscures the qualitative fact that, normally, we expect the loop to run exactly once. It also requires an extra variable because the structure forces the head of the while loop to have two possible roles in the flow control. An alternative is a do-while loop:

int f(void)
{
    int rv;
    do
    {
        rv = do_something();
        if (specific_unusual_condition)
        {
            continue;
        }
    }
    while (0);
    return rv;
}

Which removes the variable. But the structure is identical to the goto implementation. The only difference is that we use the head of the loop as the label, and the word “continue” instead of “goto”. The loop’s test is vacuous and required only by the syntax.

A dubious case

Finally, there is one instance of goto in my compiler where it is used somewhat differently. There is a part of the compiler that emits an arbitrary control-flow-graph as a linear sequence of instructions. It has a loop like this:

while (!all_done)
{
    vertex = pop_vertex();
    if (done(vertex))
    {
        continue;
    }

do_next:
    emit(vertex);
    done(vertex);

    if (!next_vertex or done(next_vertex))
    {
        continue;
    }
    else
    {
        vertex = next_vertex;
        goto do_next;
    }
}

(It is only coincidence that the data the program operates on is also program code that whose structure is undergoing transformation. The general problem is that of a directed graph being listed in an efficient format, where edges are inferred to exist between subsequent nodes unless indicated otherwise.)

The loop effectively has two heads: one for the case where the next instruction follows immediately on from its predecessor, and one for when a new subsequence of instructions begins. Depending on the instruction emitted, the loop with either goto the do_next label for an immediately following instruction, or will continue to the top of the while statement for a new sequence.

I suspect in this case that the goto is unnecessary, and that it should be rewritten as a nested while loop.

However, the algorithm is inherently complicated. It is still necessary to understand all possible cases of how a vertex related to its successors, and cover them appropriately. There have been bugs in this bit of code where that wasn’t the case! Regardless of while loops, it is spagehetti-like code and should be rewritten.

A question on compilation

There is a well-known theorem in structured programming: any occurrence of goto in a program can be rewritten to use structured flow control with the addition of a finite number of boolean variables. The rewritten “restart” loop shown above provides an example of that.

Conversely, some structured programs should be able to be rewritten using gotos, with the elimination of a boolean variable.

If I was going to rewrite my restart loops as while loops, I wondered if the compiler would be able to optimise away the extra control variable.  Doing so could reduce register pressure, let more important variables remain in registers, and result in a faster program.  Take this bit of test code:

extern int check();

extern int work();

void goto_test1(void)
{
restart:
    if (check())
        goto restart;

    work();
}

void goto_test2(void)
{
    int successful = 0;
    while (!successful)
    {
        successful = 1;
        if (check())
        {
            successful = 0;
            continue;
        }
    }

    work();
}

In gcc with no optimisation, the functions are compiled as:

_goto_test1:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        jmp     L2
L4:
        nop
L2:
        call    _check
        testl   %eax, %eax
        jne     L4
L3:
        call    _work
        leave
        ret

_goto_test2:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        movl    $0, -12(%ebp)
        jmp     L6
L7:
        movl    $1, -12(%ebp)
        call    _check
        testl   %eax, %eax
        je      L6
        movl    $0, -12(%ebp)
        nop
L6:
        cmpl    $0, -12(%ebp)
        je      L7
        call    _work
        leave
        ret

The second, properly structured function spends instructions on setting and checking that control variable. But, with even a low level of optimisation (-O1), the output is:

_goto_test1:
L2:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
L3:
        call    _check
        testl   %eax, %eax
        jne     L3
        call    _work
        leave
        ret

_goto_test2:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
L6:
        call    _check
        testl   %eax, %eax
        jne     L6
        call    _work
        leave
        ret

The compiler is smart enough to tell that the value of the variable depends entirely on the path taken through the program, and when the structure is rewritten, the variable is superfluous.  It is optimised out.

This demonstrates that concern about extra boolean variables, at least, is not an excuse for using gotos.

Advertisements
This entry was posted in Programming and tagged , . Bookmark the permalink.

3 Responses to Some practical uses of goto

  1. Pingback: Struggling with line number problem « cartesian product

  2. Pingback: Line numbers problem solved, after a fashion « cartesian product

  3. Pingback: What’s this? Oh, ‘That’! | Yeah, Right!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s