Namespaces in C

This post describes why namespaces are useful in programming. It also discusses some of the obvious ways of simulating them in C, including a technique for “reifying” them, using structs.

A namespace is a set of names of objects in a system; it provides a way to disambiguate its objects from those with similar names in other namespaces. Namespaces are useful in larger programs, especially as a way to avoid clashes between symbols from independently-developed modules and libraries.

They also have the benefit of organising code semantically. When the namespace of a function, for instance, is visible in the code, it is obvious where a reader of the code should look for more information. That can apply both to external documentation, and to the implementation of the function in other source files.

Unfortunately, C has little namespacing functionality at the language level. Ordinary identifiers (functions, variables, typedef names and enumeration constants) all share one namespace, subject to scoping rules; only labels, struct/union/enum tags and struct members have name spaces of their own. And any identifier that must be shared between two compilation units is necessarily shared between them all: when you “extern” an identifier, you are using the link-level namespace.

Naming conventions for namespaces

The common workaround to C’s lack of namespaces is to use a standard name prefix for each module. The prefix indicates the app, library or module the name belongs to.

For example, in Subversion you have names like svn_fs_initialize; this indicates that the function belongs to the svn_fs library. Subversion describes several layers of namespacing conventions. svn_fs_initialize is an example of a name in the public API. Names that need to be visible to other modules within a single Subversion library use a double underscore to separate the namespace, for example, svn_fs_base__dag_get_node. Finally, non-exported functions used within a single module do not use a prefix.
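The three levels of naming can be sketched in a single file as follows. The “shp” library, its “rect” module and every function name here are hypothetical, chosen only to illustrate the convention; none of them comes from Subversion.

```c
#include <assert.h>

/* Hypothetical library "shp" with a module "rect"; all names here are
   illustrative and are not taken from Subversion. */
typedef struct { int w, h; } shp_rect;

/* No prefix: a helper private to this file. Being static, it never
   reaches the link-level namespace. */
static int clamp_non_negative(int v)
{
    return v < 0 ? 0 : v;
}

/* Double underscore: shared between the library's own source files,
   but not part of the public API. */
int shp_rect__raw_area(const shp_rect *r)
{
    return r->w * r->h;
}

/* Plain prefix: the public API. */
int shp_rect_area(const shp_rect *r)
{
    shp_rect safe = { clamp_non_negative(r->w), clamp_non_negative(r->h) };
    return shp_rect__raw_area(&safe);
}

/* Usage sketch: callers only ever see the public name. */
void shp_rect_demo(void)
{
    shp_rect r = { 3, 4 };
    assert(shp_rect_area(&r) == 12);
}
```

Only the static helper is truly hidden; the double-underscore name is still visible to the linker, and is private by convention alone.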

Note that double underscores in names need some care. In C, only identifiers that begin with a double underscore (or with an underscore followed by an uppercase letter) are reserved for the “implementation” (often read as compiler and standard library). In C++, however, any identifier containing a double underscore is reserved, which matters if a header might ever be included from C++ code. Subversion is a set of libraries, and its use of double underscores is well-defined and always alongside the prefix “svn”, so it is arguably safe in their case. In my code I have often used it to separate a class name from the names of methods, e.g. galaxy__add_star, as in my earlier post on encapsulation, and in the examples below. Cleaner code would avoid using double underscores like this, though.

There is a tradeoff between prefix uniqueness and verbosity. A very short namespace prefix runs the risk of conflicting with other libraries. A long one expands the size of the code and encumbers features like autocompletion — you need to type the whole namespace before the list of possible matches begins to narrow from the full set of symbols in your library.

Reified namespaces with structs

Can we separate the namespace from the name by something more tangible than a few characters? Instead of writing “namespace_object”, and relying on a human parser to recall that “_” separates the two parts, could we write the parts as separate names with a language symbol between them? After all, that is what other languages like C++ (“::”) and Python (“.”) do.

C++ is a static language in which the location and scope of every name is determined during compilation; “Namespace::Object” indicates that the object is defined in that namespace.

Python, on the other hand, is a dynamic language and does not have explicit namespaces (it does have scopes, however). When objects are created, they can be assigned to properties of some other object; the parent effectively becomes a namespace. This is what Python modules are: they are objects with properties, such as the module “os”, with its property “os.getcwd”.

C does not have an operator “::”. But it does have “.”: this is used to access a field in a struct. If we define a suitable struct, and make an instance of it with appropriate initial values, we can use it as a namespace. The namespace becomes reified (made real) in the sense that it is no longer a convention on the first few letters in each name, but is now a separate symbol in the program.

/* galaxy.h */
#ifndef GALAXY_H
#define GALAXY_H

typedef struct STAR { ... } STAR;

typedef struct GALAXY { ... } GALAXY;

extern GALAXY *galaxy__create(void);
extern void galaxy__destroy(GALAXY *galaxy);
extern void galaxy__add_star(GALAXY *galaxy, STAR *star);

static const struct
{
    GALAXY *(* create)(void);
    void (* destroy)(GALAXY *galaxy);
    void (* add_star)(GALAXY *galaxy, STAR *star);
} galaxy = {
    galaxy__create,
    galaxy__destroy,
    galaxy__add_star
};

#endif /* GALAXY_H */

The user of the module can refer to its functions using the namespace. It is obvious that galaxy.add_star is a member of the galaxy “namespace”. As long as no other symbols called galaxy are defined, there is no risk that add_star will conflict with another symbol in the program. The full name of the function is galaxy__add_star which obeys a traditional namespace convention.

/* main.c */
#include "galaxy.h"

int main(void)
{
    /* Every call to a galaxy-related function is prefixed by the "galaxy" namespace. */
    STAR *s = ...;  /* a star obtained elsewhere */
    GALAXY *g = galaxy.create();
    galaxy.add_star(g, s);
    galaxy.destroy(g);
    return 0;
}
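Because the listings above elide the struct bodies, here is a single-file sketch that compiles on its own. The fields of STAR and GALAXY, the fixed-size star array and the galaxy_demo function are assumptions made for illustration; in a real project the declarations would live in galaxy.h and the definitions in galaxy.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed contents: the original listings elide these struct bodies. */
typedef struct STAR { double mass; } STAR;

#define GALAXY_MAX_STARS 16   /* arbitrary limit for this sketch */
typedef struct GALAXY {
    STAR *stars[GALAXY_MAX_STARS];
    int num_stars;
} GALAXY;

GALAXY *galaxy__create(void)
{
    /* calloc zeroes num_stars; it returns NULL on allocation failure. */
    return calloc(1, sizeof(GALAXY));
}

void galaxy__destroy(GALAXY *galaxy)
{
    free(galaxy);   /* stars are owned by the caller in this sketch */
}

void galaxy__add_star(GALAXY *galaxy, STAR *star)
{
    if (galaxy->num_stars < GALAXY_MAX_STARS)
        galaxy->stars[galaxy->num_stars++] = star;
}

/* The namespace: a constant struct of function pointers. */
static const struct
{
    GALAXY *(* create)(void);
    void (* destroy)(GALAXY *galaxy);
    void (* add_star)(GALAXY *galaxy, STAR *star);
} galaxy = {
    galaxy__create,
    galaxy__destroy,
    galaxy__add_star
};

/* Usage: every call goes through the "galaxy" namespace.
   Returns the number of stars added, or -1 if allocation failed. */
int galaxy_demo(void)
{
    STAR sun = { 1.0 };
    int count;
    GALAXY *g = galaxy.create();
    if (g == NULL)
        return -1;
    galaxy.add_star(g, &sun);
    assert(g->num_stars == 1);
    count = g->num_stars;
    galaxy.destroy(g);
    return count;
}
```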

Within the galaxy module, it is possible to avoid use of the namespace and refer to galaxy__add_star directly. This may make a difference to performance (discussed below).

/* galaxy.c */
GALAXY *galaxy__load(const char *filename)
{
    GALAXY *g = galaxy__create();
    ...
    while (!feof(f))
    {
        STAR *s = read_star(f);
        galaxy__add_star(g, s);
    }
    ...
    return g;
}

Tradeoffs

There are some downsides to this approach.

The first is that the writer of the module needs to write a lot of header file boilerplate. Secondly, the users of the module need to use the namespace in every case: there is no easy way to import its symbols into their own code, as can be done with C++, Java or Python. They could refer to the symbol by its full name, but this is approximately as verbose.
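A partial workaround, offered here as a sketch rather than as part of the technique, is to copy a member into a short local function pointer, or to point a short alias at the whole namespace. The string_utilities namespace and its functions are hypothetical stand-ins for a module with an inconveniently long name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical namespace with a deliberately long name. */
static size_t string_utilities__length(const char *s) { return strlen(s); }
static int string_utilities__is_empty(const char *s) { return s[0] == '\0'; }

static const struct string_utilities_ns
{
    size_t (* length)(const char *s);
    int (* is_empty)(const char *s);
} string_utilities = {
    string_utilities__length,
    string_utilities__is_empty
};

size_t total_length(const char *a, const char *b)
{
    /* "Import" one symbol: a local pointer with a short name. */
    size_t (* const length)(const char *) = string_utilities.length;

    /* "Import" the whole namespace under a shorter alias. */
    const struct string_utilities_ns *su = &string_utilities;

    assert(a != NULL && b != NULL);

    if (su->is_empty(a))
        return length(b);
    return length(a) + length(b);
}
```

Both forms are scoped to the enclosing block, so they have to be repeated wherever they are wanted, and the alias is used with “->” rather than “.”.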

There are limitations too. Types (structs, enums, typedefs) cannot be put in a namespace, because there is no way to represent them by variables. So we cannot have a type called “galaxy.GALAXY”.

Having modifiable global variables in the namespace requires thought. As implemented above, the namespace becomes a distinct static object in every module that includes the header file. If it were to contain a variable (e.g. galaxy.enable_tracing), each module would have its own private copy of a “global” variable; setting galaxy.enable_tracing in main.c would not have any effect on the tracing code in galaxy.c.

For a namespace struct to support mutable global variables, it needs to be defined in one place only and referenced in all the compilation units that use it. So it cannot be a static header object. Instead, the above template becomes:

/* galaxy.h */
#ifndef GALAXY_H
#define GALAXY_H

typedef struct STAR { ... } STAR;

typedef struct GALAXY { ... } GALAXY;

struct galaxy_struct
{
    GALAXY *(* create)(void);
    void (* destroy)(GALAXY *galaxy);
    void (* add_star)(GALAXY *galaxy, STAR *star);
    int enable_tracing;
};

extern struct galaxy_struct galaxy;

#endif /* GALAXY_H */

/* galaxy.c */
#include "galaxy.h"

/* The single instance of the namespace object. It must come after the
   definitions (or declarations) of the functions it refers to. */
struct galaxy_struct galaxy =
{
    galaxy__create,
    galaxy__destroy,
    galaxy__add_star,
    0   /* enable_tracing */
};

This also moves some of the verbiage out of the header and into the private implementation file, which is useful, too. It means the full names of the library functions can be shortened, since they no longer need to be externally visible (assuming they are defined in the same module as the namespace instance).
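As a single-file sketch of this second form: the functions can be static and carry short names, because only the namespace object itself has external linkage. The counter namespace, its functions and counter_demo are hypothetical; in a real project the struct declaration and the extern line would sit in a header.

```c
#include <assert.h>
#include <stdio.h>

/* --- would live in counter.h (hypothetical module) --- */
struct counter_struct
{
    void (* increment)(void);
    int (* value)(void);
    int enable_tracing;          /* mutable, shared by all users */
};

extern struct counter_struct counter;

/* --- would live in counter.c --- */
static int count = 0;

/* Static functions with short names: invisible to the linker. */
static void increment(void)
{
    count++;
    if (counter.enable_tracing)
        printf("counter is now %d\n", count);
}

static int value(void)
{
    return count;
}

/* The single instance of the namespace object. */
struct counter_struct counter = { increment, value, 0 };

/* --- would live in a client file --- */
int counter_demo(void)
{
    counter.enable_tracing = 1;  /* seen by increment() as well */
    counter.increment();
    counter.increment();
    assert(counter.value() == 2);
    return counter.value();
}
```

Since the struct is no longer const, its function pointers can be overwritten as well; declaring those members individually const (for example “void (* const increment)(void);”) is one way to guard against that while leaving enable_tracing writable.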

Efficiency considerations

Using the first implementation of a namespace, as a static variable in a header file, means that each compilation unit that includes that header gets its own private copy. The const qualifier means the compiler can, in theory, replace every call through a struct member with a direct call to the function it points to. Does this happen in practice?

#include <stdio.h>

void f1(int x)
{
    printf("f1 %d\n", x);
}

void ns_f2(int x)
{
    printf("f2 %d\n", x);
}

static const struct {
    int filler;
    void (* f2)(int x);
} ns = { 0, ns_f2 };

struct nsx_struct {
    int filler;
    void (* f3)(int x);
};

extern struct nsx_struct nsx;

int main(void)
{
    f1(1);
    ns.f2(2);
    nsx.f3(2);
}

Without optimisation, the three calls are compiled by GCC to:

; Call f1, directly
	movl	$1, (%esp)
	call	_f1
; Call ns.f2 via the local, static namespace
	movl	_ns+4, %eax
	movl	$2, (%esp)
	call	*%eax
; Call nsx.f3 via the external namespace
	movl	_nsx+4, %eax
	movl	$2, (%esp)
	call	*%eax

So, calling a function in a namespace requires looking up the function in the struct, including applying its offset, and calling it via a register.

At optimisation level 1 and higher, this becomes:

; Call f1, directly
	movl	$1, (%esp)
	call	_f1
; Call ns.f2 via the local, static namespace
	movl	$2, (%esp)
	call	*_ns+4
; Call nsx.f3 via the external namespace
	movl	$2, (%esp)
	call	*_nsx+4

In practice (with this compiler), the local namespace and the external namespace call compiled to the same instructions. Testing showed that if f2 is the first element in ns, then the call to f2 can be optimised to a direct call (as with f1). However, adding filler prevents this optimisation in GCC. I suspect this is because it is difficult to express the jump to an offset of a symbol at the link level.

Further reading. This StackOverflow question has more discussion on the above technique, and additional suggestions for C namespaces.

This entry was posted in Higher-level C.

6 Responses to Namespaces in C

  1. Pingback: Check mate | EJRH

  2. andrew cooke says:

    thanks for putting this online; i was wondering about exactly the same question today and it was helpful to read someone else’s thoughts.

  3. andrew cooke says:

    i ended up with a slightly different approach that gives a “static” import – see http://isti.bitbucket.org/2012/06/09/isti-libc.html

    • ejrh says:

      Thanks Andrew, that looks like a neat trick! It means the implementation can have a fully distinguished prefix, but the usage of it can use a shorter one within that file.

  4. The problem of the second approach is that if the parameters of the pointer functions in the header file is different from that in .c file you will not know!
    no errors no nothing!!!

  5. Danny Chung says:

    Instead of defining functions in a structure, just use prefix_fuction. Then place the prototype in the header. That get rid of the inefficiency.
