Objects in C

As the first topic for Higher-level C, I’ve picked a big one: objects.

User-defined objects in a language tie into many other topics, such as polymorphism, interfaces, inheritance and data-hiding. This is quite an intimidating list, so I’ll start with basic object use, and see about the harder stuff later.

Although one can argue about whether true object-oriented programming — objects as independent pieces of code that send messages to each other — has caught on, objects and classes have become a feature of most modern languages.

I want to create objects for these reasons:

  • User-defined types: languages provide the primitives, but algorithms are often better expressed using more specialised values.
  • Encapsulation: defining data and behaviour close to each other, and keeping related data together.
  • Polymorphism: being able to write code that works on several types regardless of their internal differences.

The technique covered in this post will aid the first goal and part of the second.  Polymorphism in C will be discussed in a later post.

Basic technique: struct

struct Planet {
    double x, y;
    double mass;
    char *name;
};

This creates a struct called Planet, which is effectively a type called “struct Planet”. It’s more convenient just to make a type directly, as:

typedef struct Planet {
    ...
} Planet;

Note that the tag Planet after struct on the first line isn’t necessary in this case, but it is required for self-referential types. For instance, if each Planet also had a member called parent, it would be declared as “struct Planet *parent”. Use of struct Planet (as opposed to just Planet) tells the compiler unambiguously that it’s a type; plain Planet will not work inside the definition, because Planet does not exist as a type until the typedef declaration as a whole has been processed.
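A minimal sketch of the self-referential case, assuming a hypothetical parent member (say, the body that a moon orbits) and a hypothetical helper planet_depth, neither of which is part of the struct used in the rest of this post:

```c
#include <stddef.h>  /* NULL */

typedef struct Planet {
    double x, y;
    double mass;
    char *name;
    struct Planet *parent;  /* hypothetical member; "Planet *parent" would not compile here */
} Planet;

/* Hypothetical helper: counts how many ancestors a body has. */
int planet_depth(const Planet *p) {
    int depth = 0;
    while (p->parent != NULL) {
        p = p->parent;
        depth++;
    }
    return depth;
}
```

Outside the definition, once the typedef is complete, plain Planet can be used everywhere, as in the parameter of planet_depth.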

We now have a type called Planet; so we can now declare variables like:

Planet p; /* resides on stack/static memory. */
Planet *q = malloc(sizeof (Planet)); /* new object, resides on heap. */

The problem with these is that they don’t initialise the object at all: the fields of a stack or heap object hold garbage, and those of a static object are, at best, null/zero. It’s best to always use a constructor to create the object. But a constructor that allocates the object itself only works for heap allocation. So we have two choices:

  1. Don’t provide for stack/static allocation at all. (Leaving the initialisation of such objects up to the caller is highly undesirable, since it’s error-prone and limits our ability to hide the implementation.)
  2. Provide a separate initialiser function that can be used to initialise an object that has already been allocated. The problem is that the caller has to remember to call it.
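For illustration, option 2 could be sketched as follows. The names init_planet and cleanup_planet are hypothetical, and the struct is repeated only so that the fragment compiles on its own.

```c
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* strlen, memcpy */

typedef struct Planet {
    double x, y;
    double mass;
    char *name;
} Planet;

/* Hypothetical initialiser: fills in storage that the caller already owns.
   Returns 0 on success, -1 if the name could not be copied. */
int init_planet(Planet *p, double x, double y, const char *name) {
    size_t len = strlen(name) + 1;
    p->x = x;
    p->y = y;
    p->mass = 0.0;
    p->name = malloc(len);
    if (p->name == NULL)
        return -1;
    memcpy(p->name, name, len);
    return 0;
}

/* Hypothetical counterpart: releases what init_planet allocated,
   but not the Planet itself, which belongs to the caller. */
void cleanup_planet(Planet *p) {
    free(p->name);
    p->name = NULL;
}
```

A caller would declare Planet p; and then has to remember both init_planet(&p, ...) and cleanup_planet(&p), which is exactly the weakness of this option.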

I generally prefer option 1, since it means there is a single official way of creating an object. And, if I resolve never to use static/stack allocation, there is no risk of me declaring an object and forgetting to initialise it.

A constructor looks like this:

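#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* strdup (POSIX) */

Planet *create_planet(double x, double y, const char *name) {
    Planet *p = malloc(sizeof (Planet));
    if (p == NULL)
        return NULL;  /* Out of memory: the caller must check for this. */
    p->x = x;
    p->y = y;
    p->mass = 0.0;  /* mass is not a constructor parameter; zeroed here (an assumption) so it is never garbage. */
    p->name = strdup(name);  /* Use own copy of the name, in case name gets freed or changed after Planet is created. */
    return p;
}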

In symmetry with the above, it’s wise to immediately write the destructor:

void destroy_planet(Planet *p) {
    if (p == NULL)
        return;  /* Like free, accept NULL and do nothing. */
    free(p->name);
    free(p);
}

(I hasten to point out that destroy_planet destroys a Planet object, and not the habitable body that the program’s host computer may reside on. Destructors can, of course, contain additional cleanup code that acts on external resources if the program required it. But in general, the scope of a destructor is to remove a representation of something rather than the thing itself.)

Obviously, for some objects, the destructor does not do anything except free the object pointer itself. But it’s always a good idea to use the destructor explicitly rather than saying free(p), because the internal composition of the object is really not the concern of the caller, and may one day contain members that require their own special cleanup. Additionally, the name “destroy_planet” has a semantic use that helps the programmer understand that a Planet object is being destroyed here, rather than the vague deallocation of heap memory implied by free.
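Putting the pair together, a caller might look like the sketch below. It is written to compile on its own, so the struct, constructor and destructor are repeated; the name is copied by hand rather than with strdup so that the fragment depends only on standard C, and describe_planet is a hypothetical caller invented for the example.

```c
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* strlen, memcpy */

typedef struct Planet {
    double x, y;
    double mass;
    char *name;
} Planet;

Planet *create_planet(double x, double y, const char *name) {
    size_t len = strlen(name) + 1;
    Planet *p = malloc(sizeof (Planet));
    if (p == NULL)
        return NULL;
    p->name = malloc(len);
    if (p->name == NULL) {
        free(p);  /* do not leak the half-built object */
        return NULL;
    }
    memcpy(p->name, name, len);
    p->x = x;
    p->y = y;
    p->mass = 0.0;
    return p;
}

void destroy_planet(Planet *p) {
    if (p == NULL)
        return;
    free(p->name);
    free(p);
}

/* Hypothetical caller: returns 0 on success, -1 if allocation failed. */
int describe_planet(double x, double y, const char *name) {
    Planet *p = create_planet(x, y, name);
    if (p == NULL)
        return -1;
    printf("%s is at (%g, %g)\n", p->name, p->x, p->y);
    destroy_planet(p);  /* never free(p) directly */
    return 0;
}
```

The caller never touches malloc or free itself, so a later change to what a Planet owns only has to be reflected in create_planet and destroy_planet.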

Efficiency considerations

Structs are translated to contiguous blocks of memory with a fixed offset for each member.  Access to a member is reasonably efficient:

double xpos = p->x;

The unoptimised code for x86 (generated by GCC) is:

movl    28(%esp), %eax   # 28 is the offset of p on the stack
fldl    8(%eax)          # 8 is the offset of the member x

So you can see that it doesn’t entail much work: the only additional work is loading the object pointer into a register first, and then locating the member relative to that register rather than relative to the stack pointer or data segment.  (This is in contrast to some language implementations, where member offsets are not fixed inside an object but must be looked up at runtime.)
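The fixed offsets can be inspected from C itself with the standard offsetof macro. The sketch below is only an illustration: planet_x_via_offset is a hypothetical helper, and the actual offsets depend on the struct layout, the compiler and the target.

```c
#include <stddef.h>  /* offsetof */
#include <string.h>  /* memcpy */

typedef struct Planet {
    double x, y;
    double mass;
    char *name;
} Planet;

/* Hypothetical helper: reads p->x the way the generated code does,
   by adding a compile-time constant offset to the object pointer. */
double planet_x_via_offset(const Planet *p) {
    double x;
    memcpy(&x, (const char *)p + offsetof(Planet, x), sizeof x);
    return x;
}
```

Nobody would write member access like this by hand; the point is that offsetof(Planet, x) is a constant known at compile time, so p->x needs no lookup at runtime.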

One final note on efficiency: for intensive data processing, the arrangement of memory can be significant. The main component of this is locality of reference. Reading a member of a struct will tend to load adjacent members into the cache too, which is great if several members tend to be needed at the same time in a tight loop.

However, if a single member is processed over all objects, caching is more effective if that data is in contiguous memory. Putting the data into structs impedes this, because the values of that member end up separated by all the other members. In very extreme cases, it can pay to pull a member out of the struct into an array, as in the technique below.
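The difference between the two layouts can be sketched with a loop that touches a single member. The function names are hypothetical, and whether the second form is measurably faster depends on the data size and the hardware; nothing is benchmarked here.

```c
#include <stddef.h>  /* size_t */

typedef struct Planet {
    double x, y;
    double mass;
    char *name;
} Planet;

/* Array of structs: consecutive masses are sizeof (Planet) bytes apart, so
   each cache line fetched also carries x, y and name, which go unused. */
double total_mass_structs(const Planet *planets, size_t n) {
    double total = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        total += planets[i].mass;
    return total;
}

/* Member pulled out into its own array: the masses are contiguous. */
double total_mass_array(const double *masses, size_t n) {
    double total = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        total += masses[i];
    return total;
}
```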

Alternative technique: no objects!

If structs did not exist (and in languages with no record type, or in programs written by programmers who started in BASIC :p), each field would be a separate array:

/* A big predefined maximum which ought to be enough for anyone! */
#define MAX_PLANETS 100
double xs[MAX_PLANETS];
double ys[MAX_PLANETS];
char *names[MAX_PLANETS];

Each planet is identified by an index into those arrays, and a planet is allocated by picking the next unused index.
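Allocation by "next unused index" might be sketched like this. The counter planet_count and the function alloc_planet are hypothetical names, and the arrays are repeated so that the fragment compiles on its own.

```c
#define MAX_PLANETS 100
double xs[MAX_PLANETS];
double ys[MAX_PLANETS];
char *names[MAX_PLANETS];
int planet_count = 0;  /* hypothetical: index of the next unused slot */

/* Hypothetical allocator: returns the new planet's index,
   or -1 when the predefined maximum has been reached. */
int alloc_planet(double x, double y, char *name) {
    int i;
    if (planet_count >= MAX_PLANETS)
        return -1;
    i = planet_count++;
    xs[i] = x;
    ys[i] = y;
    names[i] = name;  /* stores the pointer only; no copy is made */
    return i;
}
```

Every field then needs its own line in alloc_planet, and adding a field means adding another global array and another assignment.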

There are a number of problems with this approach:

  • The predefined maximum might not be enough for some applications; but making it larger means more memory is reserved for one kind of object that might be needed later for another kind in the same program.
  • Every member is in the global namespace: there’s no prefixing it with an object variable, so the name cannot be reused for other variables.
  • Any housekeeping for allocating objects, etc. needs to be kept in sync for every member.

Unfortunately, there are plenty of C programs written like this.  In my organisation they often seem to have been ported word-for-word from Fortran.

