[ts-gen] bug fix for debian lenny and other systems with gcc 4.3.2

Bill Pippin pippin at owlriver.net
Tue Mar 30 20:07:55 EDT 2010


Today's release fixes a bug in the shim that impacts users with debian
lenny, and other systems where g++ is still back at version 4.3.2.  The
bug does not show up with newer gcc, which is in part how the flawed
code was released originally.

The *next* release will include a dbms version increment, and
non-backward-compatible changes to journalling, so you probably want
to download this tarball to give yourself an interim checkpoint.

The fix occurred as part of a thorough refactoring of the hash table
component template class, and the remainder of this post after the next
paragraph is about that refactoring.  For those using debian lenny and
the like, please keep in mind that those details may not concern you;
the actual fix was only a minor part of the refactoring.

The bug was in two parts: the hash table component did not adequately
filter client hints about the desired starting size for the table, and
a client constructor call passed just such an improperly small starting
size.  The code used to work because older client code made a valid
size request, so the missing check in the component went unobserved.
At some point I changed the client code, though only after a newer version
of gcc included some change that allowed the component code to work
anyway.  As part of the hash table refactoring, the table now uses the
same handle component that other block-doubling containers use, and that
handle class includes a filter step for requested start sizes.  This bug
cannot recur without rapid and dramatic symptoms, i.e., a crash locally
on the first try, so it won't be able to escape again.
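
For readers who want a concrete picture of what "filtering a requested
start size" can mean, here is a minimal sketch.  It is not the shim's
code: the function name, the floor and ceiling constants, and the
choice to round up to a power of two are all assumptions made purely
for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names and limits; none of these come from the shim sources.
constexpr std::size_t kMinStartSize = 8;                     // assumed floor
constexpr std::size_t kMaxStartSize = std::size_t{1} << 20;  // assumed ceiling

// Filter a client hint: clamp it into [kMinStartSize, kMaxStartSize], then
// round up to a power of two so that later block doubling stays aligned.
inline std::size_t filter_start_size(std::size_t hint) {
    if (hint < kMinStartSize) hint = kMinStartSize;
    if (hint > kMaxStartSize) hint = kMaxStartSize;

    std::size_t size = kMinStartSize;
    while (size < hint) size *= 2;   // ends at or below kMaxStartSize

    assert((size & (size - 1)) == 0);  // result is always a power of two
    return size;
}
```

With a filter like this in the shared handle, a client that passes a
hint of zero gets the floor value instead of an unusable table, whatever
the compiler version.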

Debian lenny users may stop reading here if they so desire.  Now, from
the NEWS file:

      * Refactor hash map to use the same block-doubled handle objects
        as other containers use ...  There are other significant
        internal changes to the hash map class, though none of them
        fixes a known bug as the thinko mentioned above did.

        ...

        Unusually large collision sets spill to a splay tree --- tests
        show that the max extent size of eight cells for 64 bit hardware
        rules out all but one such case --- and extents as well as the
        hash array are allocated by a container-specific allocator from
        the block handle.

        Note that for 32 bit hardware, where MaxExtent decreases to
        four cells, there is a non-trivial number of spills, e.g., in
        the high single digit percentages, to the splay tree.

In other words, with respect to the last two paragraphs, if you are
deeply concerned about performance, consider using a 64 bit as opposed
to a 32 bit system.  I've made performance tests of the new code, and
given the number of symbols in the database, and the relatively small
number of large-collision-set elements spilled to a tree, there is
absolutely no measurable performance impact.

That being said, if the number of symbols in the database were to increase
greatly, as with options chains, there probably would be some small
performance impact for a 32 bit architecture.

I say this since when I completely replace the hash table with a splay
tree, with the same effect as if all elements spilled to the tree, there
is a small but measurable latency cost in event processing, on the order
of a microsecond for each event.  For any given event this is unimportant,
given much larger IO delays outside the control of the shim, yet over
several thousand events, as with market depth in a fast market, it would
add up to a few milliseconds.
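
The extent-then-spill layout from the NEWS excerpt above can be pictured
with the following sketch.  It is an illustration under assumptions, not
the shim's implementation: the class and member names are hypothetical,
std::map stands in for the splay tree, and the real code allocates from
the block handle rather than using the standard containers shown here.
Only the extent limits (eight cells for 64 bit, four for 32 bit) come
from the NEWS text.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch; every name here is illustrative.
// Extent limit per the NEWS text: 8 cells on 64 bit, 4 cells on 32 bit.
constexpr std::size_t kMaxExtent = (sizeof(void*) >= 8) ? 8 : 4;

// One collision set: a small fixed extent, plus an overflow tree.
class Bucket {
public:
    // Insert or overwrite.  Returns true if the entry lives in the tree.
    bool put(const std::string& key, int value) {
        for (std::size_t i = 0; i < used_; ++i) {
            if (cells_[i].first == key) {
                cells_[i].second = value;
                return false;
            }
        }
        auto it = spill_.find(key);
        if (it != spill_.end()) {
            it->second = value;
            return true;
        }
        if (used_ < kMaxExtent) {            // room left in the extent
            cells_[used_++] = std::make_pair(key, value);
            return false;
        }
        spill_[key] = value;                 // extent full: spill to the tree
        return true;
    }

    // Look up a key: scan the extent first, then fall back to the tree.
    const int* get(const std::string& key) const {
        for (std::size_t i = 0; i < used_; ++i) {
            if (cells_[i].first == key) return &cells_[i].second;
        }
        auto it = spill_.find(key);
        return it == spill_.end() ? nullptr : &it->second;
    }

    std::size_t spilled() const { return spill_.size(); }

private:
    std::array<std::pair<std::string, int>, kMaxExtent> cells_{};
    std::size_t used_ = 0;
    std::map<std::string, int> spill_;       // stand-in for the splay tree
};
```

The point of the design is that the common case touches only a short
contiguous run of cells, while the tree costs something only for the
rare oversized collision set.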

Note also that there may be other, much stronger performance-related
reasons why you should prefer a 64 bit system.  In testing locally I
see a big difference, though there are many factors besides word size,
so that the differences are apples-and-oranges comparisons.

Perhaps a better way to state this point is that relatively new
hardware will give you better performance than some old clunker, as
you would expect, and that memory architecture is probably the key
issue, so that you want a modern processor with large caches.  Not
a surprise.

Thanks,

Bill

