Progress on fractals, and a minor (but arguably obvious) optimisation I stumbled upon today.

One of the “next steps” for my Fractal project is improving the colour scheme; my tentative plan for this is to allocate colours in a uniform distribution. The algorithm is:

- Sort the
`N`pixel values (on my desktop’s screen,`N`= 8294400). - Allocate a colour map, of size
`k`(e.g.`k`= 256 for 256 distinct colours). - Set
`M`_{i}in the map to value`i`(`N`/`k`), i.e. to the`i`‘th`k`‘ile of the sorted values. - To colour a pixel with value
`x`, do a binary search on the map, finding the the greatest index`i`where`M`_{i}≤`x`.

Sorting is the most expensive step (at around O(`N` log `N`) operations). But I’m using the standard `qsort`

function for that, so there’s little point trying to optimise it (except my own simple comparison function). The other expensive step is mapping each pixel, at a total cost of around O(`N` log `k`). I’m writing the binary search myself, so some optimisation is in scope. The initial code is:

/** Map a value to an index in the colour map. * * @param x value to map * @param map the colour map (e.g. built by @a build_colour_map) * @param map_size the size of the map * @return the index of this value in the map, an integer between 0 and @a map_size - 1. */ int map_colour(float x, float *map, int map_size) { int p = 0, q = map_size-1; /* What a pity that bsearch doesn't do range searches! It's bound to be better code than my off-the-cuff binary search... */ while (q > p) { int mp = (p + q)/2; /* mp and mp+1 are valid indices here, because p < q and mp is rounded down, hence the greatest mp can be is q-1. */ if (map[mp] <= x && map[mp+1] > x) return mp; else if (map[mp] > x) q = mp; else p = mp+1; } return 0; //TODO not sure if this is the best thing to return! }

(I saw on Ohloh.Net that my codebase is in the worst 10% of the population for number of comments, so I’m making a special effort! Comments notwithstanding, the code has not been thoroughly tested and so may not be correct…)

Whilst prematurely examining the generated assembly code, I saw strange sequences of instructions like this:

movl %eax, %edx shrl $31, %edx addl %eax, %edx sarl %edx

Why’s the compiler doing that? It knows it can use the “shift arithmetic right” instruction as a fast way of dividing by 2, but it’s not safe to use it on what *could* be a negative number. So, the compiler is extracting the sign bit (from position 31), adding it the number, and then using shift right. The only difference is that, if the input is negative, the result will be rounded differently; it is rounded towards zero, instead of downwards as is done by the pure `sarl`

instruction. (See the closing paragraph of “History and details” in the afore-linked Wikipedia article.)

But all this is care is unnecessary when we know there aren’t going to be any negative values. If we place the signed integers in this function with unsigned ones , that part of code becomes much more straightforward:

sarl %edx

Does it make a difference to performance, though? A simple benchmark gives the following results:

int | unsigned int | Improvement | |
---|---|---|---|

Time in seconds | 0.5883 | 0.5065 | 16% |

The benchmark consists of generating random pixel values, building the map, and then mapping each pixel value. The time for the program *sans* mapping is subtracted from each; so the 16% improvement applies entirely to the mapping stage. `N` was 10000000, and `k` was 256. The times shown are the average of 5 runs.

It’s worth putting a bit of thought into when signed integers are in fact necessary. One obvious example would be when -1 is used in an edge case; for instance if (in a similar use case) I wanted to return the index of the greatest value less than the argument, -1 would be a reasonable answer if the argument was equal to the first answer. (And of course the rest of the program would have to act on the assumption that an index outside those in the array could be returned.)

In this case, signed integers are not required. And one tiny, tiny part of my fractal program will run a little more efficiently.

That’s quite interesting that using a signed value has such a huge impact on the code. Though I suppose if you were dividing by any other number it wouldn’t make a difference. But that’s a good thing to know for optimization.