[ts-gen] Exploring shim sources (was [Re: recommended C++ compiler for R ...])

Bill Pippin pippin at owlriver.net
Tue Feb 27 15:17:38 EST 2007

Mel writes -- and I sympathize -- about the amount of source code to wade through:

> I will look at the trading-shim code ...
> without proper doc[s] ... 349 files

I've delayed writing documentation for the source code, mostly because it
would be immediately out of date.

I'm delighted to hear that someone has become interested in the sources.
Please, please, feel free to ask questions about the code.

Some of those 349 files are documentation, admittedly low level, but
documentation nevertheless.  In directory doc you'll find 45 .dot files,
each a source file for a hierarchical diagram, mostly of the derivation
relationships for the application domain class hierarchies.

Beyond the *.dot documentation files, there is the source code.  It is,
admittedly, voluminous:

    dir     .c      .h      .sql    tot
    src     26      83              109
    lib      7      76               83
    afd             34               34
    sql                     38       38
    tot     33     193      38      264

I'll leave the *.sql files outside the scope of this message, and focus
on the c++ code, which divides naturally between .c and .h source files.

First, note the large quantity of header file source code.  Counting files
exaggerates it, since many of the header files are shorter than the typical
.c file; still, measured in lines of code instead, there are far more lines
of header than of .c source, about 20k vs. 13k across the directories src,
afd, and lib.

In the remainder of this message I'll provide a thumbnail sketch/guide to
the sources for the trading-shim, first considering the .h files, to see
how one can avoid reading them in depth, and then focusing on the .c files,
with the goal of identifying those essential files that illuminate the
rest of the code.


The trading-shim c++ .h source code

One reason for the amount of header file code is that some headers include
large block comments.  One file in particular, dictionary.h in the newest release,
provides extensive comments explaining the naming conventions used for class,
member, and typedef names, and should be read as a starting guide to
identifier names, of which there are very many.  Some of the lib directory
header files also include large block comments, particularly for the container
class templates. 

The fundamental reason, however, that there is so much code in the header
files is the sheer number of derived classes.  This is the result of a
deliberate design decision: to use derivation hierarchies and virtual member
functions to represent conditional logic, resulting in more concise .c
code at the (lesser) cost of many derived classes.
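
A minimal sketch of that design choice, not taken from the shim sources:
instead of a switch on an event-kind code in a .c file, each derived class
overrides a virtual member function.  The class names Event, TickEvent, and
OrderEvent and the member describe() are hypothetical.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical base class: each derived class supplies its own behavior,
// so callers need no switch on an event-kind code.
class Event {
public:
    virtual ~Event() = default;
    virtual std::string describe() const = 0;
};

class TickEvent : public Event {
public:
    explicit TickEvent(int size) : size_(size) {}
    std::string describe() const override {
        return "tick " + std::to_string(size_);
    }
private:
    int size_;
};

class OrderEvent : public Event {
public:
    explicit OrderEvent(int id) : id_(id) {}
    std::string describe() const override {
        return "order " + std::to_string(id_);
    }
private:
    int id_;
};

// The caller is a single loop; the conditional logic lives in the
// derivation hierarchy rather than in an if/else chain here.
std::vector<std::string> describe_all(
        const std::vector<std::unique_ptr<Event>>& events) {
    std::vector<std::string> out;
    for (const auto& e : events) out.push_back(e->describe());
    return out;
}
```

Each new kind of event then costs one more derived class in a header, while
the calling code stays the same size, which is the trade-off described above.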

You may use the bin/hierarchy script to update *.dot files from the related
*.h files, and, as noted previously, the dot program from the Graphviz
package to convert the dot files to postscript or pdf.  Looking at the
derivation hierarchies as a diagram is a much better starting point for
understanding the longer header files than reading the header sources.
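
I haven't reproduced bin/hierarchy here.  As a rough illustration of the idea
only, the following sketch assumes single-line declarations of the form
"class Derived : public Base {" and turns header text into dot edges; the
function derivation_dot() is hypothetical, and real headers (multi-line
declarations, multiple inheritance, templates) would need more care.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: scan header text for single-line derivations such as
//   class Derived : public Base {
// and emit a Graphviz digraph with one "Base -> Derived" edge per match.
// Multi-line declarations and multiple inheritance are not handled.
std::string derivation_dot(const std::string& header_text) {
    static const std::regex decl(
        R"(^\s*class\s+(\w+)\s*:\s*(?:public|protected|private)?\s*(\w+))");
    std::istringstream in(header_text);
    std::ostringstream out;
    out << "digraph hierarchy {\n";
    std::string line;
    std::smatch m;
    while (std::getline(in, line)) {
        if (std::regex_search(line, m, decl)) {
            // m[1] is the derived class, m[2] its (first) base class
            out << "  " << m[2] << " -> " << m[1] << ";\n";
        }
    }
    out << "}\n";
    return out.str();
}
```

Written to a file, the result can be rendered with Graphviz as usual, e.g.
dot -Tpdf hierarchy.dot -o hierarchy.pdf.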

By the way, there are other dot files of interest as well: foreign.dot
defines a foreign key dependency graph for the database, and singles.dot
the inclusion hierarchy for the upper part of the singletree, which is itself
the root object of all Shim singletons, and about which more later.

The problem of looking up identifier definitions is another area where
reading the header files line by line should be the last resort.  Use
grep instead to find the typedef or class definition of a name, e.g.,
for Id:

    grep 'typedef.*Id' *.h    
    grep '^class  *Id' *.h
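
Both patterns are deliberately loose, so expect some extra hits.  To see what
they catch, the sketch below runs the same expressions over a few made-up
declarations using std::regex; for these two patterns the ECMAScript syntax
that std::regex uses by default agrees with grep's basic syntax, and
std::regex_search, like grep, matches anywhere in a line.  The helper
matching() is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Return the sample lines matched by a grep-style pattern.
// std::regex_search finds a match anywhere in the line, as grep does.
std::vector<std::string> matching(const std::string& pattern,
                                  const std::vector<std::string>& lines) {
    const std::regex re(pattern);
    std::vector<std::string> hits;
    for (const auto& line : lines)
        if (std::regex_search(line, re)) hits.push_back(line);
    return hits;
}
```

With samples such as "typedef int IdList;" and "class Ident {", the typedef
pattern also hits IdList and the class pattern also hits Ident, so skim the
output rather than trusting the first match.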

For the common case, the only reason you might need to read the header files
is to find data member declarations (nearly always at the very end of the
class definition), in order to determine their types, names, and position
in the derivation hierarchy.  Although reading the header may sometimes be
fastest, the class constructors also provide this information, and for
non-trivial classes they are typically defined in .c files, under a
conspicuous block comment labelled "Construction", so that again you may be
able to delay looking at the headers.

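To make the convention concrete, here is a hypothetical pair of classes, with
the header part and the .c part shown together; the names Record and Quote and
the exact banner are my own invention, not the shim's.  The point is that the
ctor's initializer list names the base class and each data member, so it
answers the same questions as the member declarations at the end of the class.

```cpp
#include <string>

// --- what a header might contain (hypothetical classes) ---
class Record {
public:
    explicit Record(int key) : key_(key) {}
    virtual ~Record() = default;
    int key() const { return key_; }
private:
    int key_;
};

class Quote : public Record {
public:
    Quote(int key, const std::string& symbol, double bid);
    const std::string& symbol() const { return symbol_; }
    double bid() const { return bid_; }
private:
    // data members last, as described above
    std::string symbol_;
    double bid_;
};

// --- what the matching .c file might contain ---
/* ------------------------------------------------------------------ */
/*  Construction                                                      */
/* ------------------------------------------------------------------ */

// The initializer list names the base class and every data member in
// declaration order, so the ctor alone tells a reader what a Quote holds.
Quote::Quote(int key, const std::string& symbol, double bid)
    : Record(key), symbol_(symbol), bid_(bid) {}
```
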
So, to sum up, as an alternative to wading through header files, consider
using grep or dot, and for classes, first look at the constructor (ctor)
definitions in .c files.  Consider the header files as reference materials,
not narrative text.


The trading-shim c++ .c source code

Of the .c sources in directory src, consider starting with shim.c.  There
you'll find the main() entry point and, following it, the run() method of
class IoFlow.  The main() function first constructs the root singleton object
(the singletree), and then calls IoFlow::run(), the main loop for the shim.
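
The overall shape of shim.c can be sketched as follows.  This is a simplified,
hypothetical stand-in: Shim and IoFlow borrow the names above, but their
members, the "demo" mode string, and the queue-driven loop are assumptions
made so the sketch runs on its own; the real IoFlow::run() polls file
descriptors rather than draining a queue.  The body of main() is kept in an
ordinary function, program(), so it can be exercised directly.

```cpp
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical root object: in the shim, the singletree ctor builds all
// the subordinate singletons; here it only stores a mode string.
class Shim {
public:
    explicit Shim(std::string mode) : mode_(std::move(mode)) {}
    const std::string& mode() const { return mode_; }
private:
    std::string mode_;
};

// Hypothetical loop object standing in for IoFlow.
class IoFlow {
public:
    IoFlow(const Shim& root, std::deque<std::string> input)
        : root_(root), input_(std::move(input)) {}

    // Main loop: handle one pending input per pass until none remain.
    std::vector<std::string> run() {
        std::vector<std::string> output;
        while (!input_.empty()) {
            output.push_back(root_.mode() + ": " + input_.front());
            input_.pop_front();
        }
        return output;
    }
private:
    const Shim& root_;
    std::deque<std::string> input_;
};

// The shape of main(): construct the root first, then enter the loop.
std::vector<std::string> program(std::deque<std::string> input) {
    Shim root("demo");                       // 1. build the root singleton
    IoFlow flow(root, std::move(input));     // 2. hand it to the loop object
    return flow.run();                       // 3. run the main loop
}
```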

The program call graph branches, then, into two parts, as main() calls
ctors for the singletree, and then enters the main loop.  Considering
that loop first, and the singletree constructor after, the immediately
following files provide the sources for the run loop and most of the
procedures called from it:

    shim.c      main() and the main run loop for the entire program
    time.c      bsd-select based polling, the resulting IO, and timeouts
    read.c      the tuple and event parser, also Message and Command ctors
    flow.c      event routing and (some of) the resulting dataflow
    next.c      command -> request mapping, more event dataflow
    send.c      the initial stages of event output (out)
    post.c      (out) posting of selected events to the database
    wire.c      (out) wire-format layout of requests according to the tws api
    echo.c      (out) log, or text formatted, layout of events
    term.c      term specific procedures (fairly low level)
    leaf.c      more low level procedures, particularly rules for operator[]()

So, the guts of the application, once initialization is complete, are found
in just 11 source files, for about 4500 lines of code.
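
Since time.c is the heart of that loop, a generic illustration of
select()-based polling with a timeout may help.  This is a minimal POSIX
sketch of the technique, not the shim's code; the function poll_readable() is
hypothetical, and it ignores the FD_SETSIZE limit and does not distinguish a
timeout from a select() error.

```cpp
#include <sys/select.h>
#include <unistd.h>
#include <vector>

// Wait up to timeout_ms for any of the given descriptors to become
// readable and return those that are.  An empty result means the timeout
// expired (or select failed; a real implementation would check errno).
std::vector<int> poll_readable(const std::vector<int>& fds, long timeout_ms) {
    fd_set readset;
    FD_ZERO(&readset);
    int maxfd = -1;
    for (int fd : fds) {
        FD_SET(fd, &readset);
        if (fd > maxfd) maxfd = fd;
    }
    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    std::vector<int> ready;
    if (select(maxfd + 1, &readset, nullptr, nullptr, &tv) > 0) {
        for (int fd : fds)
            if (FD_ISSET(fd, &readset)) ready.push_back(fd);
    }
    return ready;
}
```

A loop built on this would call it once per pass, read from whatever is
ready, and treat an empty result as the cue to run any timeout handling.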

The construction of the Shim singletree, the root object for all singletons,
is described in the following sources:

src:once.c      constructors for the top level application domain singletons
    make.c      subsidiary procedures called by the singletons above
    init.c      singleton initialization called from the Shim ctor
    mode.c      command line mode conditional initialization
    help.c      text printed by, e.g.: shim, shim --help, or shim --help cmds
    data.c      constructors for top level singleton pure-constant objects
    type.c      constructors and supporting code for the Types singleton
    tabs.c      (tws api) version-specific ctors and data tables
    rule.c      constructors and tables for the various language start symbols
    kind.c      tables for the various application finite domain tables
    atom.c      constructors for all the database relation objects
    load.c      (part of the) code to slurp the entire database into memory
lib:boot.c      constructors for the library Components singleton
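
One C++ detail matters when reading a root object like the singletree, if it
holds its singletons as data members (an assumption here): members are
constructed in declaration order, not in the order written in the ctor's
initializer list, so a singleton that depends on another must be declared
after it.  The sketch below, loosely modeled on the list above but with
hypothetical classes and contents, records the construction order to show this.

```cpp
#include <string>
#include <vector>

// Each hypothetical ctor appends its name so the order can be observed.
struct Log { std::vector<std::string> order; };

struct Components {            // loosely after lib:boot.c (hypothetical)
    explicit Components(Log& log) { log.order.push_back("Components"); }
};
struct Types {                 // loosely after src:type.c (hypothetical)
    Types(Log& log, const Components&) { log.order.push_back("Types"); }
};
struct Database {              // loosely after src:load.c (hypothetical)
    Database(Log& log, const Types&) { log.order.push_back("Database"); }
};

// Hypothetical root: members are built in the order declared below,
// so each singleton may safely use the ones declared before it.
class Shim {
public:
    explicit Shim(Log& log)
        : lib_(log), types_(log, lib_), db_(log, types_) {}
private:
    Components lib_;
    Types types_;
    Database db_;
};
```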

The file help.c is particularly interesting, since it provides the help
text explaining the command line modes and options, and in addition a brief
reference to the downstream command language.

As you can see, initialization is quite involved: there are almost 6500 lines
of .c source code devoted to it.  Since much of the initialization consists of
ctor and table definitions, however, it is often much easier to read than the
imperative core mentioned previously.
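
To illustrate why the table-driven parts read easily, here is a minimal
sketch of a finite-domain table in the spirit of kind.c or rule.c; the Kind
struct, side_table, and side_code() are hypothetical, and the shim's actual
tables may be organized quite differently.

```cpp
#include <cstring>

// Hypothetical finite-domain table: a constant array is data, not logic,
// so adding a value means adding a row rather than another branch.
struct Kind {
    const char* name;
    int code;
};

static const Kind side_table[] = {
    {"buy",   1},
    {"sell",  2},
    {"short", 3},
};

// Linear lookup over the table; returns -1 for an unknown name.
int side_code(const char* name) {
    for (const Kind& k : side_table)
        if (std::strcmp(k.name, name) == 0) return k.code;
    return -1;
}
```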

The following files in directory src will be of less immediate interest:

src:else.c      code for exceptional-case processing
    warn.c      dead code, more outside-the-common-case exception processing
    unit.c      a unit test, of no possible interest to anyone besides testers

The following files in directory lib are listed here for completeness:

lib:pool.c      memory allocator operations
    hash.c      string, hash code, and hash table lookup operations
    call.c      IO and string buffer operations
    fork.c      pseudo-terminal setup
    wrap.c      system-call wrappers
    time.c      not linked into the shim, for timestamp unit testing only
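
For readers curious about what a pool allocator typically does, here is a
minimal arena sketch.  It is an assumption about the general technique, not a
description of lib:pool.c: memory is carved sequentially from fixed-size
blocks and released all at once when the pool is destroyed.  The class Pool
and its members are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical arena: bump-allocate from the current block, start a new
// block when a request does not fit, free everything in the destructor.
class Pool {
public:
    explicit Pool(std::size_t block_size) : block_size_(block_size) {}

    // Return n bytes aligned to align (a power of two no larger than
    // alignof(std::max_align_t)), or nullptr if n exceeds a block.
    void* allocate(std::size_t n,
                   std::size_t align = alignof(std::max_align_t)) {
        if (n > block_size_) return nullptr;
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (blocks_.empty() || offset + n > block_size_) {
            blocks_.push_back(std::make_unique<unsigned char[]>(block_size_));
            offset = 0;
        }
        used_ = offset + n;
        return blocks_.back().get() + offset;
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    std::size_t block_size_;
    std::size_t used_ = 0;
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};
```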


