Encapsulation in C

A previous post in my Higher-level C series introduced basic object creation, using structs. Here I’d like to elaborate on how to define and organise code that operates on objects.

Encapsulation is the combination of code with data. Essentially, if the caller has an object, they should be able to find the functions that can be applied to it the same way they find the data that comprises it.

There are several facets to encapsulation:

  1. Organising code so that functions that operate on an object type are close to the definition of the type. They can be close in terms of source location, for instance in the same file. Closeness can also be achieved with a naming convention, such as a common prefix for methods that operate on the type.
  2. Including the code as part of the object definition: methods.
  3. Using encapsulation in conjunction with data hiding to ensure that all access to object data by the caller is through methods on that object.
  4. Including the object as part of a method: closures. This is not directly possible in C, since a function pointer points only to a statically defined piece of code, and cannot carry information about the target object or other parameters.
  5. Polymorphism: finding the right method to apply to an object whose precise type is not known at compile time.

This post will discuss the simpler goals of organising the code related to a type, and of using data hiding. The term method will be used to mean a function whose primary argument is an object: this function will be considered a method of that object’s type.

I’ll go into closures and polymorphism in future posts.

Technique 1: naming conventions

The first technique is simply to use a disciplined naming system. There are two levels to this: file organisation, and function/type organisation.

Organise related types and methods into files with meaningful names. An idiom in C is, for each major type, to have two files: a header file containing the type definition and extern declarations of its methods; and a source code file containing the implementations of those methods.

Give consistent names to types and methods. For instance, start each method with a prefix that is similar to the name of the type. An example:

/* In the header file galaxy.h: */
typedef struct GALAXY { ... } GALAXY;
extern GALAXY *galaxy_create(void);
extern void galaxy_destroy(GALAXY *galaxy);
extern void galaxy_add_star(GALAXY *galaxy, STAR *star);
...

This adds helpful information to the program: the methods are part of the semantic definition of that type. They are the recommended means of interacting with objects of the type (and the only means when data hiding is in effect; see below). At a more practical level, anyone reading the code can tell from the name of a method where to find it in the source — it’s a GALAXY method, so it must be in galaxy.c!

There are several competing case conventions for types and methods. Unlike Java, whose official style guide specifies TitleCase for classes and camelCase for variables and methods, C has no dominant convention. My own C programs have tended to use UPPER_CASE for type names and lower_case for variables and functions, both using underscore as a separator. But it is almost always a good idea to follow existing conventions when adding to a program.

A naming convention is a simple, useful and — in existing C code — widespread technique of associating types with their methods. This post does not intend to prescribe any one convention or coding style, except to recommend that style be consistent within a given program.
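As a rough illustration of the convention, here is a minimal sketch of what galaxy.h and galaxy.c might contain, shown as one file so that it compiles on its own. The fields of STAR and GALAXY, the galaxy_total_mass method, and the int return value of galaxy_add_star (to report allocation failure) are assumptions made for the example and are not part of the original interface.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- galaxy.h: type definition and method declarations ---- */
typedef struct STAR { double mass; } STAR;   /* hypothetical field */

typedef struct GALAXY {
    STAR **stars;        /* borrowed pointers: the galaxy does not own the stars */
    size_t num_stars;
    size_t capacity;
} GALAXY;

extern GALAXY *galaxy_create(void);
extern void galaxy_destroy(GALAXY *galaxy);
extern int galaxy_add_star(GALAXY *galaxy, STAR *star);
extern double galaxy_total_mass(const GALAXY *galaxy);

/* ---- galaxy.c: every GALAXY method lives here, under the galaxy_ prefix ---- */
GALAXY *galaxy_create(void)
{
    GALAXY *galaxy = malloc(sizeof *galaxy);
    if (galaxy == NULL)
        return NULL;
    galaxy->stars = NULL;
    galaxy->num_stars = 0;
    galaxy->capacity = 0;
    return galaxy;
}

void galaxy_destroy(GALAXY *galaxy)
{
    if (galaxy == NULL)
        return;
    free(galaxy->stars);
    free(galaxy);
}

/* Returns 0 on success, -1 if the star array could not be grown. */
int galaxy_add_star(GALAXY *galaxy, STAR *star)
{
    assert(galaxy != NULL && star != NULL);
    if (galaxy->num_stars == galaxy->capacity) {
        size_t new_capacity = galaxy->capacity ? galaxy->capacity * 2 : 4;
        STAR **grown = realloc(galaxy->stars, new_capacity * sizeof *grown);
        if (grown == NULL)
            return -1;
        galaxy->stars = grown;
        galaxy->capacity = new_capacity;
    }
    galaxy->stars[galaxy->num_stars++] = star;
    return 0;
}

double galaxy_total_mass(const GALAXY *galaxy)
{
    assert(galaxy != NULL);
    double total = 0.0;
    for (size_t i = 0; i < galaxy->num_stars; i++)
        total += galaxy->stars[i]->mass;
    return total;
}
```

A caller only ever needs to remember the prefix: anything that acts on a GALAXY starts with galaxy_ and takes the object as its first argument.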

Technique 2: function pointers

If polymorphism is likely to become important, then it may be worth using the more flexible technique of making the methods part of the object. In C this is achieved with function pointers.

/* In the header file: */
typedef struct GALAXY GALAXY; /* Forward declaration, so that members can refer to GALAXY. */
struct GALAXY {
    ...
    void (* add_star)(GALAXY *galaxy, STAR *star);
    ...
};
extern GALAXY *galaxy_create(void); /* Constructor is unchanged, since it must be called without reference to an existing object. */

/* In the implementation: */
static void galaxy_add_star(GALAXY *galaxy, STAR *star) { ... }

GALAXY *galaxy_create(void)
{
    GALAXY *g = malloc(...);
    ...
    g->add_star = galaxy_add_star; /* Method is set when object is created. */
    ...
    return g;
}

Calling the methods is done as follows. Note that the object must be mentioned twice: once in finding the method, and again as an argument in calling the method.

GALAXY *galaxy = galaxy_create();
galaxy->add_star(galaxy, star);

There are costs to using function pointers like this, most importantly in redundant syntax, but also in runtime performance. The former cost is a matter of subjective opinion and may be considered worth it, especially if it provides consistency with other polymorphic parts of the program. The performance cost is quite small: the call is made indirectly through a pointer loaded from the object, so a few extra instructions will be executed when the method is called.
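To make the fragments above concrete, here is a small self-contained sketch of the function-pointer technique. The STAR and GALAXY fields, the second constructor galaxy_create_counting_only, and galaxy_destroy are hypothetical additions; the second constructor is there only to show that the method is chosen per object when it is created.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct STAR { double mass; } STAR;   /* hypothetical field */

typedef struct GALAXY GALAXY;   /* forward typedef so the member can name GALAXY */
struct GALAXY {
    size_t num_stars;
    double total_mass;
    void (*add_star)(GALAXY *galaxy, STAR *star);
};

/* The usual method: count the star and accumulate its mass. */
static void galaxy_add_star(GALAXY *galaxy, STAR *star)
{
    assert(galaxy != NULL && star != NULL);
    galaxy->num_stars++;
    galaxy->total_mass += star->mass;
}

/* A hypothetical variant that only counts stars. */
static void galaxy_add_star_counting_only(GALAXY *galaxy, STAR *star)
{
    assert(galaxy != NULL);
    (void)star;   /* deliberately unused */
    galaxy->num_stars++;
}

/* Shared allocation helper: the method is set when the object is created. */
static GALAXY *galaxy_alloc(void (*add_star)(GALAXY *, STAR *))
{
    GALAXY *g = malloc(sizeof *g);
    if (g == NULL)
        return NULL;
    g->num_stars = 0;
    g->total_mass = 0.0;
    g->add_star = add_star;
    return g;
}

GALAXY *galaxy_create(void)
{
    return galaxy_alloc(galaxy_add_star);
}

GALAXY *galaxy_create_counting_only(void)
{
    return galaxy_alloc(galaxy_add_star_counting_only);
}

void galaxy_destroy(GALAXY *galaxy)
{
    free(galaxy);
}
```

The calling code is identical for both kinds of galaxy, for example galaxy->add_star(galaxy, &star), and the behaviour depends on which constructor produced the object.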

Data hiding

What are the benefits of hiding data (or other implementation details)? Two spring to mind:

  1. It reduces the amount of detail that programmers are confronted with when trying to use a type.
  2. It can help to prevent over-dependence by the programmers on the implementation of that type.

The brute-force method is to use void * pointers outside the implementation. However, this unnecessarily discards both type safety and semantic information. Types should at least be distinguishable by intention, even if their implementations are hidden.
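The loss of type safety can be seen in a short sketch. All names here (KIND, galaxy_impl, star_impl, GALAXY_HANDLE, STAR_HANDLE, galaxy_num_stars) are hypothetical, and the run-time tag is an assumption added so that the misuse is detectable rather than undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical implementation types. Each starts with a tag so that
   misuse can at least be detected at run time. */
enum KIND { KIND_GALAXY, KIND_STAR };
struct galaxy_impl { enum KIND kind; size_t num_stars; };
struct star_impl   { enum KIND kind; double mass; };

/* Brute-force interface: every handle is just a void *. The two typedef
   names differ, but to the compiler they are the same type. */
typedef void *GALAXY_HANDLE;
typedef void *STAR_HANDLE;

/* Returns the number of stars, or -1 if the handle is not a galaxy. */
long galaxy_num_stars(GALAXY_HANDLE handle)
{
    assert(handle != NULL);
    /* A pointer to a struct may be converted to a pointer to its first member. */
    if (*(enum KIND *)handle != KIND_GALAXY)
        return -1;
    return (long)((struct galaxy_impl *)handle)->num_stars;
}

/* Passes a star where a galaxy is expected. The compiler accepts this
   without a diagnostic; only the tag check notices the mistake. */
long void_handle_demo(void)
{
    struct star_impl s = { KIND_STAR, 1.0 };
    STAR_HANDLE sh = &s;
    return galaxy_num_stars(sh);
}
```

Had the handles been pointers to two distinct struct types instead, the call in void_handle_demo would have been flagged by the compiler as an incompatible pointer type.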

A particularly nice trick exploits the fact that pointers to structs can be used without knowledge of the struct's full definition: a declaration of the struct's name is enough.

/* In public header file calcstate.h: */
typedef struct CALCULATION_STATE CALCULATION_STATE;

extern CALCULATION_STATE *calculation_state_create(void);
extern void calculation_state_update(CALCULATION_STATE *state, double timestep);

/* In implementation file calcstate.c: */
typedef struct CALCULATION_STATE
{
    double current_time;
    int num_bodies;
    ...
} CALCULATION_STATE;

void calculation_state_update(CALCULATION_STATE *state, double timestep)
{
    /* The implementation has access to the full definition of CALCULATION_STATE. */
    state->current_time += timestep;
    ...
}

/* In code that uses CALCULATION_STATE: */
/* The writer of this code does not need to know anything
   about CALCULATION_STATE except its name and what its methods are!
   The maintainer of CALCULATION_STATE can then change the
   implementation without this programmer having to rewrite
   a single line of code. */
CALCULATION_STATE *cs = calculation_state_create();
calculation_state_update(cs, 0.42);
...

There are some costs to this technique, but they are all minor:

  1. The struct declaration (but not its full definition) has to be written twice: once in the public header file and once in the implementation (in the source file or in a private header file).
  2. Access to member data by the caller must be via getter and setter methods, which is slightly less efficient than direct access.
  3. It can be harder to debug a program when the implementation is hidden from the caller.
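The getter and setter cost in point 2 looks like this in practice. The sketch below puts the header and the implementation in one file so that it compiles on its own; in a real program the struct definition would be visible only inside calcstate.c. The destroy, getter and setter functions are hypothetical names chosen to follow the prefix convention.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- calcstate.h: callers see only the name of the type ---- */
typedef struct CALCULATION_STATE CALCULATION_STATE;

extern CALCULATION_STATE *calculation_state_create(void);
extern void calculation_state_destroy(CALCULATION_STATE *state);
extern void calculation_state_update(CALCULATION_STATE *state, double timestep);
extern double calculation_state_get_current_time(const CALCULATION_STATE *state);
extern int calculation_state_get_num_bodies(const CALCULATION_STATE *state);
extern void calculation_state_set_num_bodies(CALCULATION_STATE *state, int num_bodies);

/* ---- calcstate.c: the only place that knows the members ---- */
struct CALCULATION_STATE {
    double current_time;
    int num_bodies;
};

CALCULATION_STATE *calculation_state_create(void)
{
    CALCULATION_STATE *state = malloc(sizeof *state);
    if (state == NULL)
        return NULL;
    state->current_time = 0.0;
    state->num_bodies = 0;
    return state;
}

void calculation_state_destroy(CALCULATION_STATE *state)
{
    free(state);
}

void calculation_state_update(CALCULATION_STATE *state, double timestep)
{
    assert(state != NULL);
    state->current_time += timestep;
}

double calculation_state_get_current_time(const CALCULATION_STATE *state)
{
    assert(state != NULL);
    return state->current_time;
}

int calculation_state_get_num_bodies(const CALCULATION_STATE *state)
{
    assert(state != NULL);
    return state->num_bodies;
}

void calculation_state_set_num_bodies(CALCULATION_STATE *state, int num_bodies)
{
    assert(state != NULL && num_bodies >= 0);
    state->num_bodies = num_bodies;
}
```

Because callers cannot see the members, they also cannot declare a CALCULATION_STATE on the stack or take its sizeof; every object has to come from calculation_state_create. The setter is also a natural place to validate input, as the num_bodies check suggests.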

With reasonable exceptions on a case-by-case basis, this technique should probably be the default policy for large-scale C programs.


6 Responses to Encapsulation in C

  1. Pingback: Higher-level C | EJRH

  2. phoxis says:

    This is a very good post. I am a C programmer and I structure problems in almost the style you have described in this post.
    A naming convention is always needed, and it should contain the object name and the work it does, separated by an underscore. I do follow the POSIX style of appending _t to the end of a new type, though. For example, mytype_t.
    For polymorphic functions I think a manual static polymorphism is also possible, which does not follow the OOP approach as defined. Say you have a function fun which has two variants: name the functions fun_1 and fun_2, consider them polymorphic, and call them as needed. In this case the static linking of the function code is done by you instead of the compiler.
    Examples of dynamic polymorphism can be seen in OS VFS code, where the correct filesystem manipulation functions are assigned to the function pointers.
    We can make private and public functions by defining an entire object in a single file and making the object’s data members global. To make a data member or function private, qualify it with static; to make it public, qualify it with extern. This makes the structure almost like the one we use in OOP languages.

    • ejrh says:

      Thank you! I’m hoping to go into polymorphism more in the next post. The VFS is an excellent example, I’ll try to use it. Apart from vtables of function pointers, the other main form of polymorphism is simple switch statements, which I’ve seen used in several large projects. I’m planning to contrast the two approaches.

  3. Pingback: On C++ | EJRH

  4. Pingback: Namespaces in C | EJRH

  5. HellMaster says:

    Thank you for this post. I found it very interesting and educational. But there are two things which are puzzling me. :-)
    First, why do you have to typedef the struct twice? Wouldn’t it be possible to typedef the struct in the header as:

    typedef struct CALCULATION_STATE CALCULATION_STATE;

    And then in your C file just define it as:

    struct CALCULATION_STATE
    {

    };

    And the second thing: why are you declaring the functions in your header file with the extern keyword?
    Maybe stupid questions, sorry for that, I’m not that skilled and I’m just missing something. :-)
