A friend sent me a link to QuantCup, which is “a quant trading themed programming contest”. Now, I have little enthusiasm for high speed trading, but I am always interested in programming.
Their first challenge is a price-time matching engine. This is a program that receives ask and bid “limits” (offers), and matches them against existing bid/ask limits, or queues them if there are no remaining matches.
There’s a nice specification of the problem. The contestant is to implement the functions given in a C header file; an inefficient version is provided as an example. The aim of the contest is to replace it with an optimised implementation. This is scored on both mean response time, and the standard deviation of response time.
There are three basic operations that need to be made efficient:
- Matching an incoming limit against existing ones.
- Queuing an unmatched limit.
- Cancelling a queued limit.
Unfortunately, the contest is open only to students at “top universities” (the usual US suspects are cited as examples of the right kind of institution).
I’ve never been in a contest, but I like the idea of them. So regardless of my eligibility I’ve begun implementing a version of the program.
The data structures used are a combined linked list and skip list, plus a hash table. The list contains price points ordered by price: bids at the low end, asks at the top. The two halves do not overlap, because overlapping bids and asks are matched as soon as they are entered. Each price point contains a linked list of orders, ordered by insertion time.
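A minimal sketch of these structures might look as follows; the type names, the `MAX_LEVEL` cap, and the field layout are my assumptions, not the actual implementation:

```c
#include <stdint.h>

#define MAX_LEVEL 8  /* assumed cap on skip-list node height */

typedef uint32_t t_price;
typedef uint32_t t_size;

/* One queued order; orders at a price point form a FIFO list,
   ordered by insertion time. */
typedef struct order {
    t_size size;
    struct order *next;        /* next order at the same price point */
} t_order_entry;

/* One price level: a node in both the ordered linked list of price
   points and the skip list that indexes it. */
typedef struct pricepoint {
    t_price price;
    t_order_entry *head, *tail;            /* FIFO queue of orders */
    struct pricepoint *forward[MAX_LEVEL]; /* skip-list links; level 0
                                              is the plain linked list */
} t_pricepoint;
```

Keeping `forward[0]` as the ordinary next pointer means the skip list and the linked list are literally the same nodes, so no separate index structure has to be kept in sync.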
There is a limit on the number of live orders (MAX_LIVE_ORDERS), so the memory for these is preallocated (orders). Initially these are kept in a linked list, whose head and tail are kept in a structure (free_orders).
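A free-list pool of this kind might be sketched like this, assuming the structure names from the post; the pool size and field layout are placeholders:

```c
#include <stddef.h>

#define MAX_LIVE_ORDERS 65536  /* assumed limit on live orders */

typedef struct order {
    unsigned long id;
    struct order *next;
} t_order;

static t_order orders[MAX_LIVE_ORDERS];            /* preallocated pool */
static struct { t_order *head, *tail; } free_orders;

/* Thread every pool entry onto the free list at startup. */
static void init_free_list(void) {
    for (size_t i = 0; i + 1 < MAX_LIVE_ORDERS; i++)
        orders[i].next = &orders[i + 1];
    orders[MAX_LIVE_ORDERS - 1].next = NULL;
    free_orders.head = &orders[0];
    free_orders.tail = &orders[MAX_LIVE_ORDERS - 1];
}

/* O(1) allocation: pop from the head of the free list. */
static t_order *alloc_order(void) {
    t_order *o = free_orders.head;
    if (o) free_orders.head = o->next;
    return o;
}

/* O(1) release: push back onto the head (simplest variant). */
static void release_order(t_order *o) {
    o->next = free_orders.head;
    free_orders.head = o;
}
```

The point of preallocation is that the hot path never calls malloc, which keeps latencies flat.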
Cancelling entails looking up the order id in a hash table to find the order structure, which is then removed from the queue. The hash table maps order ids to orders. This is necessary because a sequential order id is returned when a limit is placed, and this id is used in the cancel command. If we could return pointers as ids, we could use them directly when an order is cancelled; but pointers would be reused, potentially reducing robustness in the face of repeated cancellations, whereas the example implementation, at least, handles these gracefully. At any point in time, each order is either on the free list or in a hash table bin. The order ids that hash to the same bin are kept in a singly-linked list; a new order is appended to the end. This is done on the (possibly faulty) presumption that old orders are likely to be removed first.
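A chained hash table with tail-append could be sketched as below; the bin count and hash function are assumptions, not the actual choices:

```c
#include <stddef.h>

#define NBINS 1024  /* assumed power-of-two bin count */

typedef struct horder {
    unsigned long id;
    struct horder *next;    /* next order hashing to the same bin */
} t_horder;

static t_horder *bins[NBINS];

static size_t bin_of(unsigned long id) { return id & (NBINS - 1); }

/* Append to the end of the chain: older orders, presumed more likely
   to be cancelled or filled first, stay near the front where the
   search finds them sooner. */
static void hash_insert(t_horder *o) {
    t_horder **p = &bins[bin_of(o->id)];
    while (*p)
        p = &(*p)->next;
    o->next = NULL;
    *p = o;
}

/* Find and unlink the order with the given id; NULL if absent. */
static t_horder *hash_remove(unsigned long id) {
    t_horder **p = &bins[bin_of(id)];
    while (*p && (*p)->id != id)
        p = &(*p)->next;
    t_horder *o = *p;
    if (o) *p = o->next;
    return o;
}
```

Since ids are sequential, masking by a power-of-two bin count spreads consecutive ids across distinct bins, which keeps the chains short.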
Matching is done by finding the best ask/bid price point, and removing orders from it and subsequent price points as long as the incoming limit has value and the queued orders match it.
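That matching loop might be sketched as follows for an incoming bid against the ask side; the simplified types and the trade-reporting point are stand-ins, not the contest's actual interface:

```c
#include <stddef.h>

typedef struct morder { unsigned size; struct morder *next; } morder;
typedef struct mlevel {
    unsigned price;
    morder *head;           /* FIFO of queued asks at this price */
    struct mlevel *next;    /* next-best ask level */
} mlevel;

/* Consume queued asks while the incoming bid still has size and the
   best ask is at or below the bid price. Returns the unmatched size. */
static unsigned match_bid(mlevel **best_ask, unsigned price, unsigned size) {
    while (size && *best_ask && (*best_ask)->price <= price) {
        mlevel *lvl = *best_ask;
        while (size && lvl->head) {
            morder *o = lvl->head;
            unsigned traded = size < o->size ? size : o->size;
            size -= traded;
            o->size -= traded;       /* a trade of `traded` is reported here */
            if (o->size == 0)
                lvl->head = o->next; /* order fully filled: dequeue it */
        }
        if (!lvl->head)
            *best_ask = lvl->next;   /* level exhausted: advance */
        else
            break;                   /* incoming bid exhausted here */
    }
    return size;
}
```

The symmetric case (incoming ask against queued bids) is the same loop with the price comparison reversed.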
If the limit has value remaining after matching, a direct lookup is done to find its price point, if it exists. If not, a new price point is created and inserted into the queue, using the skip list structure. The order is appended to the end of the linked list attached to that price point.
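One way to get a direct price-point lookup is an array indexed by integer price; this is an assumption about the design, sketched here with placeholder types (the skip-list insertion for a missing level is left out):

```c
#include <stddef.h>

#define MAX_PRICE 65536  /* assumed bounded integer price range */

typedef struct qorder { unsigned size; struct qorder *next; } qorder;
typedef struct qlevel {
    unsigned price;
    qorder *head, *tail;   /* FIFO queue, appended at the tail */
    struct qlevel *next;   /* level-0 link in price order */
} qlevel;

/* Direct lookup: one slot per integer price. NULL means the price
   point does not exist yet and must be created and inserted into the
   skip list first. */
static qlevel *by_price[MAX_PRICE];

/* Append a residual order to its price point's FIFO queue.
   Returns 0 when the level is missing, so the caller knows to
   create it. */
static int enqueue(unsigned price, qorder *o) {
    qlevel *lvl = by_price[price];
    if (!lvl)
        return 0;
    o->next = NULL;
    if (lvl->tail)
        lvl->tail->next = o;
    else
        lvl->head = o;
    lvl->tail = o;
    return 1;
}
```

Keeping a tail pointer makes the append O(1), which matters because queuing is one of the three hot operations.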
The major outstanding optimisation difficulty is maintaining a good skip list. Currently the height of a node in it is determined randomly, since that’s easier than examining and updating the neighbours. When a node is removed nothing is done to optimise the remaining list. So it is possible for the skip list to degenerate over time.
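The random height choice described above is typically a geometric distribution: each extra level is taken with probability one half, which keeps the list balanced in expectation without examining neighbours. A sketch (the `MAX_LEVEL` cap is an assumed parameter):

```c
#include <stdlib.h>

#define MAX_LEVEL 8  /* assumed cap on node height */

/* Geometric height with p = 1/2: roughly half the nodes have height 1,
   a quarter height 2, and so on. No neighbour inspection on insert,
   and nothing to repair on delete, at the cost of possible
   degeneration over time. */
static int random_level(void) {
    int lvl = 1;
    while (lvl < MAX_LEVEL && (rand() & 1))
        lvl++;
    return lvl;
}
```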
The accompanying score program measures the latency for each order or cancel using a predefined (but possibly randomly generated) sequence of about 35000 of them. The score is (mean(latency) + stddev(latency))/2, measured in nanoseconds. This clearly depends on the machine the program runs on. That is not so much an issue of fairness (since all entries will be judged on the same machine) as of uncertainty about how well the competition is doing on the leader board.
On my netbook, the example program yields about 16275 and 16845 for the two components, resulting in a score of 16560.
My attempt on the same machine gives about 2829 and 4051 respectively, scoring 3440. On my newer desktop, the numbers are 297 and 721, with a more respectable score of 509, which IMHO is not bad for a few hours' work. I'm not sure why the standard deviation is so bad; it's fairly sporadic, too. (To get these numbers I averaged three runs; the standard deviation could vary by a factor of two between runs.) The computation should be completely deterministic (repeatable), so this suggests an external process or OS call is occupying that time.
Of course things will be different on the judging machine, if the program ever runs there.
Footnote. In C it’s easy to inadvertently write a nondeterministic (i.e. unrepeatable) program. A common example is a hash table in which the keys are pointers. If there is variation in initial conditions for things like heap and stack addresses, then pointers will be different and hash differently. For most operations this is hidden inside the hash implementation (since it will still conform to the interface of either containing an object, or not). But looping over the hash can return the items in a different order. In my case, this step was used in graph colouring, with the result that on different runs, a compiler would perform different — but equally correct — register allocation…
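The pointer-keyed hash in question can be illustrated with a tiny sketch; the hash function and bin count are illustrative, not taken from any real compiler:

```c
#include <stdint.h>

#define NBINS 64

/* Hashing a pointer's bits is fine for membership tests, but the
   resulting bin is determined by where the allocator (or the stack)
   happened to place the object. With address-space randomisation or
   any variation in heap layout, the same logical objects land in
   different bins on different runs, so iterating over the table
   visits them in a different order. */
static unsigned ptr_hash(const void *p) {
    uintptr_t v = (uintptr_t)p;
    return (unsigned)((v >> 4) % NBINS);  /* drop low alignment bits */
}
```

Within a single run the function is perfectly deterministic; it is only across runs that the iteration order, and hence anything derived from it, can change.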