Why string operations should not be done using floating point.

Today at work I’ve been “sanity checking” some Python code to load trade data. To me, a *sanity check* is a brief check for well-formedness and obvious errors, rather than a correctness proof. In the case of data input it could be verification against a grammar or schema; for a calculation it’s checking for known relationships between values. This implies knowing what some of those relationships should be, of course!

Although I had no specifications there was a part of the program that deserved close checking. It’s a formula to convert a price in “display format” to a numeric value. The display format is interpreted as:

Display fomat: XXX.YYZ Numeric value: XXX + YY/32 + Z/32/8

Where XXX are digits in base 10, YY is a digit in base 32, and Z is a digit in base 8. For example, “100.051” = 100 + five 32nds + one 8th of a 32nd = 100.16015625.

Why they do this is unknown to me. Finance and accounting is full of crazy definitions. At least this one is merely confusing, inconsistent with mathematical convention, and prone to misunderstanding. (A previous case I’ve seen involved a risk measure defined as the square root of a monetary amount, expressed as a monetary amount. Dimensional analysis shows that such a figure is completely meaningless, because if sqrt($1) = $1, and sqrt(100 cents) = (10 cents), then $1 = 10 cents.)

Anyway, the original code translated the format by cutting it up using `int`

and `% 1`

(i.e. the fractional part of a number). Here’s one implementation (admittedly my own first draft of the algorithm as explained to me):

def convert_price2(p): """Price is displayed in the format XXX-YYZ (or XXX.YYZ) which is converted to a value as: x = XXX + YY/32 + Z/32/8 i.e. XXX are digits in base 10, YY is a digit in base 32, and Z is a digit in base 8 (crazy isn't it!).""" x = int(p) q = (p % 1) * 100 y = int(q) z = (q % 1) * 10 return x + y/32.0 + z/32.0/8.0

What happens when the display format is: 100.020? First it gets parsed as a floating point value. But how’s it stored? Not all real values can be stored precisely in floating point. And even apparently round (in decimal at least) values like 0.02 can’t be stored. This nearest floating point value is 100.0199999999999960209606797434389591217041015625. The flow of calcuation is:

Variable | convert_price2 | Expected |
---|---|---|

p | 100.0199999999999960209606797434389591217041015625 | 100.020 |

x | 100 | 100 |

q | 1.99999999999960209606797434389591217041015625 | 2.0 |

y | 1 | 2 |

z | 9.9999999999960209606797434389591217041015625 | 0 |

Return value | 100.0703124999999857891452847979962825775146484375 | 100.0625 |

A better algorithm would treat it as a string and extract the groups of digits properly. It’s not elegant, exactly, but it gives the right results.

def convert_price3(p): """Price is displayed in the format XXX-YYZ (or XXX.YYZ) which is converted to a value as: x = XXX + YY/32 + Z/32/8 i.e. XXX are digits in base 10, YY is a digit in base 32, and Z is a digit in base 8 (crazy isn't it!). N.B. Perform calculations on substrings rather than fractions if you want to avoid floating point errors!""" s = '%07.3f' % p x = int(s[0:3]) y = int(s[4:6]) z = int(s[6:7]) return x + y/32.0 + z/32.0/8.0

I guess it’s another example of why some computations (financial ones in particular) shouldn’t be done in floating point.

But I prefer to see it as an example of why financial computation is often just broken. If you try to extract digits from floating point using arithmetic, you have to expect this kind of behaviour. And if you use floating point for storing hetrogeneous-base numbers, you’re just crazy.

Finally, to compound my digit-shuffling woes, I’ve been asked to implement the reverse algorithm in an SQL query. Dealing with this format has certainly come close to a check on my sanity! :p