[ts-gen] Ib tws api --- shim log record format

Bill Pippin pippin at owlriver.net
Wed Mar 24 18:00:04 EDT 2010


This message is in response to a query from Mike Thornton about
the interpretation of the log record format, and in particular
the meaning of history data detail lines.  This message is,
however, much more general than the particular question that
triggered its posting.

The following is a general description of the shim's log format,
including pointers to reference documents and source files where
you can find more information.  The log format can only be
understood in the context of the IB tws api, so, to start, a
brief introduction to that interface:

    IB Tws Api Intro
    ----------------

    The IB tws api is a tcp-socket-based specification for client
    requests, and the resulting asynchronous messages produced in
    response by the IB tws process.  It provides access to market
    data and account information, and accepts user-initiated
    orders.

    Of particular interest to the trader, orders may be grouped
    into one-cancels-all (OCA) groups, either explicitly by the
    user or by default under control of the IB tws; the latter
    occurs for orders that share a common parent order.  So the
    application can, to some degree, chain together entries and
    stops.

    There are a variety of resource limitations that apply to
    the api, including max counts for subscriptions to market
    data (100) and market depth (3), and pacing limits for
    requests in general (20 milliseconds) and history queries
    in particular (cumulative, 10 seconds each).  There is also
    folklore to the effect that parent-child orders and order
    modification events should obey a moderate pacing limit,
    with 300 milliseconds being the collective consensus as to
    the largest pacing delay needed.

    The api requests and messages (collectively, events) consist
    of null-terminated tokens, either a fixed number of them or,
    in a few cases, a counted repeating group; and there is *no*
    message delimiter.

    Events begin with a pair of natural numbers, the event index
    and event version, and it is by matching on those that an api
    program can determine how the tokens that follow should be
    interpreted, and in particular, given the known length for
    most events, or the counted length for repeating groups ---
    history messages are the classic example here --- where the
    *next* event starts.

    The tws api process seems to respond to non-trivial errors in
    request format, including in particular invalid request indices,
    by terminating the connection.  Errors in individual data values,
    in contrast, seem to produce limited but useful error messages.

    The features of the api, including both the type and number of
    elements in events, and the range of possible events themselves,
    vary from one version to the next, and significantly over
    time.  Api versions are best identified by the (currently) two
    digit numbers exchanged by the client [shim] and server [tws]
    processes.  The tws server api level is currently into the
    mid-forties, and continues to provide support for earlier
    versions, all the way back to the mid or low single digits.
    Lower version clients lose access, to some degree, to newer
    api features.

    Newer versions of the api have included an increasing number of
    new message types that provide end markers for message lists;
    e.g., for the sequence of account data records periodically
    produced in response to an account data query, or the sequence
    of contract detail records, in response to a contract wildcard
    query.  There has been no change, in contrast, to the use of
    variable-length events with no message terminators.
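
As a rough sketch of what this framing means for a client that reads
the socket directly, the following Python fragment splits a byte
buffer into null-terminated tokens and then groups them into events
by matching on the leading index and version.  The names split_tokens,
read_events, and FIXED_LENGTHS are hypothetical, and the two table
entries only mirror the attribute counts of the market data price and
size messages shown later in this post; the real counts must come
from EReader.java for the api version in use.  Counted repeating
groups, such as history messages, are not handled here.

```python
# Hypothetical table: (event index, version) -> number of tokens that
# follow the index and version.  Illustrative only; take the real
# counts from EReader.java for your api version.
FIXED_LENGTHS = {(1, 5): 5, (2, 5): 3}


def split_tokens(buf: bytes):
    """Split a buffer into complete NUL-terminated tokens.

    Returns the decoded tokens and any trailing bytes that do not yet
    end in a NUL, which the caller should keep for the next read.
    """
    *tokens, rest = buf.split(b"\0")
    return [t.decode("ascii") for t in tokens], rest


def read_events(tokens, lengths=FIXED_LENGTHS):
    """Group tokens into events using the leading index and version.

    Returns the complete events and the tokens left over, which belong
    to an event that has not fully arrived yet.
    """
    events, pos = [], 0
    while pos + 2 <= len(tokens):
        key = (int(tokens[pos]), int(tokens[pos + 1]))
        if key not in lengths:
            raise ValueError(f"unknown event index/version {key}")
        end = pos + 2 + lengths[key]
        if end > len(tokens):
            break  # event incomplete; wait for more tokens
        events.append(tokens[pos:end])
        pos = end
    return events, tokens[pos:]
```

Since there is no message delimiter, a real reader would have to carry
both the leftover bytes and the leftover tokens across socket reads.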

For more information on the api, please see the official IB tws
api guide, and also two source files from their java sample client,
EClientSocket.java and EReader.java, which serve as the only
trustworthy documentation of the wire formats for requests
(EClientSocket) and messages (EReader).

The sample client source files will give you name, type, and number
of attribute values per api event, while the api guide explains the
values that those attributes can assume.  So, you'll need to look
at both.

The api guide is substantially redundant, consisting of similar
sections distinguished by language interface, and you can pick just
one and use only that; I recommend that you read either the Java
or C++ sections.  The api guide is available from the IB web site,
and the following link should work:

http://www.interactivebrokers.com/php/apiUsersGuide/apiguide.htm

If you download and unpack the java sample client sources, the
EClientSocket.java and EReader.java files should be found via the
path:

    IBJts/java/com/ib/client/

That's all herein about the api in isolation, and now, before I
focus on the shim's log record format, some context first about the
shim itself:

    The trading shim
    ----------------

    The trading shim has been described as a "dbms-augmented command
    interpreter" for the IB tws api.  By that is meant that client
    commands are expanded with information from the symbols database
    to generate api requests, and that the shim then reads and
    relays the resulting messages.  As an interpreter, the shim is
    written with the expectation that it be driven by downstream
    programs; it is of course possible to type commands in by hand,
    or pipe in short test scripts, but the full power, especially
    conditional processing for orders, requires a controlling client
    program, typically written in some scripting language such as
    Perl, Python, or Ruby.

    So, what exactly does the shim bring to the table?  What does it
    provide that makes it easier to write such downstream client
    programs?  After all, given the socket api, client scripts could
    just open a connection directly, poke api tokens down the
    socket, and read the resulting message tokens that are returned.

    On the request side, the shim provides translation of simple text
    command statements into api requests, using the dbms augmentation
    referred to above, as well as request queueing to honor pacing
    limits and other resource limits.

    On the message side, the shim provides error handling, with
    recovery from message syntax errors, in contrast to the abrupt
    termination approach used by the IB tws; and, finally, the log
    formatting of events, about which more to follow.

This raises the notion of shim log events, and the resulting
log stream.

    Shim log records
    ----------------

    A trading-shim api event is a command, request, message, or
    comment, where commands are accepted from the downstream and
    enqueued and dequeued according to pacing and resource limits;
    requests are translated from dequeued commands, and sent to
    the IB tws; messages are received, asynchronously, in response
    to api requests; and comments are created by the shim to 
    annotate other events.

    A log event is either a command, request, message, or comment,
    as above; or a log detail line, an item from a message repeating
    group, e.g., a history bar detail line.

    The log stream is the serialized sequence of log events output
    by the shim.

This brings me, finally, to the shim log record format, which in
part echoes the wire format of api events, so that you *will* need
the above mentioned EClientSocket and EReader files in order to
predict, as well as interpret, the api events listed in the log.
In fact, the log format makes little sense until considered in the
context of the api itself, and the shim's role as an intermediary to
simplify the development of downstream --- that is, client ---
programs.

    Log event format
    ----------------

    The text format of log events consists of vertical-bar-terminated
    tokens with a trailing newline.  This format is chosen to allow
    easy processing by downstream scripts, which can use the newlines
    to recognize record boundaries, and can split on vertical bars
    to break records up into tokens.  All events have a common prefix,
    about which more to follow; the payload, ditto; and an optional
    suffix.
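
A downstream script might split the stream along those lines as in
the following minimal Python sketch.  The function names are my own,
and the sketch assumes each record arrives as one complete,
bar-terminated line; excerpts truncated with "..." for display, as in
some of the examples in this post, would be rejected.

```python
def split_record(line: str):
    """Split one log line into its vertical-bar-terminated tokens.

    Whitespace padding inside tokens is preserved; callers can strip
    it as needed.
    """
    line = line.rstrip("\n")
    if not line.endswith("|"):
        raise ValueError("record is not bar-terminated")
    # Drop the final terminator so that split() yields no empty tail.
    return line[:-1].split("|")


def read_records(stream):
    """Yield the token list for each non-blank line of a log stream."""
    for line in stream:
        if line.strip():
            yield split_record(line)
```

For example, list(read_records(open(path))) would give one token list
per log event, where path names a saved log file.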

    Log event prefix
    ----------------

    For the log file and stdout channels for the log, the log event
    prefix consists of four fields: the process pid; the seconds
    field of the timestamp; the fraction part of the timestamp; and
    the event code.

    More precisely, of the four fields of the prefix:

    the 1st field ends with the pid; for the file and stdout
    channels it holds only the pid, while for the syslog output
    channel it also holds a textual date, time, and user name
    before the pid, with colons between the parts, so that there
    is still just one vertical-bar-terminated field for the pid;

    the 2nd field is seconds past midnight;

    the 3rd, the fractional part of the timestamp, is in microseconds,
    either absolute, or, with the diff option, as the differential from
    the previous event; and

    the 4th, the event type, is a numerical code in {1, 2, 3, 4}, as
    the event is a command, request, message, or comment, respectively.
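
The four prefix fields can be picked apart as in the following Python
sketch, which takes the token list obtained by splitting a record on
vertical bars.  Prefix, EVENT_TYPES, and parse_prefix are hypothetical
names.  The handling of the syslog channel is an assumption based only
on the description above: the pid is taken to be whatever follows the
last colon in the first field.

```python
from typing import NamedTuple

# Event codes as described above.
EVENT_TYPES = {1: "command", 2: "request", 3: "message", 4: "comment"}


class Prefix(NamedTuple):
    pid: int
    seconds: int       # seconds past midnight
    fraction: float    # fractional part, absolute or differential
    event_type: int


def parse_prefix(tokens):
    """Split a record's tokens into a Prefix and the remaining fields."""
    first, seconds, fraction, code = tokens[:4]
    # For file and stdout logs the first field is just the pid.  For
    # syslog it is assumed that date, time, and user name come first,
    # colon-separated, with the pid last.
    pid = int(first.rsplit(":", 1)[-1])
    prefix = Prefix(pid, int(seconds), float(fraction), int(code))
    return prefix, tokens[4:]
```

With the diff option, the fraction is relative to the previous event,
so a script that wants absolute times has to accumulate it itself.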

    Api payload text
    ----------------

    In addition, the following is true of the event payload, that
    part of the log record text after the prefix and not including
    suffix annotation:

    commands: for syntactically correct commands, the log echoes
              their text, with some reordering to reflect queueing.
              Note: syntax errors are horribly garbled, and so you
              must use the stderr text to understand what went wrong.

    requests: faithfully echo the wire format; the same code that
              uses the request object to lay out the request with
              null separators for the upstream, uses vertical bars
              for the log; the only difference is the output stream
              object (Logger or Sender) and the buffer type (NulBuf
              or BarBuf).

    messages: reflect the wire format, though with both whitespace
              formatting and often suffix annotation, e.g., the
              symbol translation for the tick id; the format for
              market data is explained below.

    Leaving aside the message prefix, and any suffix annotation, and
    focusing specifically on requests and messages, it's worth
    emphasizing the following about the payload for those event types:

    All log-formatted requests and messages have, as their payload
    immediately following the prefix, a one-for-one list of the
    tokens that make up the wire format request or message.

    So, since request and message logging reflects the wire format,
    and that format is documented by IB in the EClientSocket.java
    and EReader.java files of their sample client, you can match the
    attributes of the api event payload in the log against IB's
    source file text one-for-one.

Some examples may be of interest.  E.g., for market data, and with price
data obscured to honor IB's license restrictions on redistribution,
market data for Apple [last year] appeared in the log as follows: 

15481|65972|  0.000017|1| 9| 0|select tick  STK:SMART:AAPL:USD 1;|
15481|65972|  0.000016|2| 1| 5|1|5|1|AAPL|STK||0.0000||1|SMART||USD||||||
15481|65972|  0.022712|1| 2| 0|wait 2;|
15481|65972|  0.229638|3| 1| 5|    1| 1|   1?3.4|       3|1|...
15481|65972|  0.000008|3| 1| 5|    1| 2|   1?3.6|       2|1|...
15481|65972|  0.000004|3| 1| 5|    1| 4|   1?3.6|       1|0|...
15481|65972|  0.000005|3| 2| 5|    1| 0|                3|0|...
15481|65972|  0.000003|3| 2| 5|    1| 3|                2|0|...
15481|65972|  0.000003|3| 2| 5|    1| 5|                1|0|...
15481|65972|  0.000003|3| 2| 5|    1| 8|           179???|0|...
15481|65972|  0.000005|3| 1| 5|    1| 6|   1?5.0|       0|0|...
15481|65972|  0.000004|3| 1| 5|    1| 7|   1?2.9|       0|0|...
15481|65972|  0.000004|3| 1| 5|    1| 9|   1?2.8|       0|0|...
15481|65974|  1.798008|1|10| 0|cancel tick  STK:SMART:AAPL:USD;|
15481|65974|  0.000063|2| 2| 1|2|1|1||||

In the above text, the first four columns are prefix information: the
process id, seconds since midnight, the fractional part of the
timestamp, and the event type.  The event types here are, in order: a
command, request, command, ten market data messages, command, and
request.  Note from the low-microsecond values of the fractional part
of the timestamp for most of the events that the shim was running
with the diff option.

Payloads are repeated below; note that market data price and size
messages have seven and five attributes, respectively:

     1| 5|    1| 1|   1?3.4|       3|1|
     1| 5|    1| 2|   1?3.6|       2|1|
     1| 5|    1| 4|   1?3.6|       1|0|
     2| 5|    1| 0|                3|
     2| 5|    1| 3|                2|
     2| 5|    1| 5|                1|
     2| 5|    1| 8|           179???|
     1| 5|    1| 6|   1?5.0|       0|0|
     1| 5|    1| 7|   1?2.9|       0|0|
     1| 5|    1| 9|   1?2.8|       0|0|

Of course the question marks above are not part of the logging,
and are used herein to avoid running afoul of IB's license
restrictions.  For market data price events, the attributes are
message index, version, tick id, market data subtype, price, 
quantity, and the "can auto execute" flag.  Note also that the
whitespace formatting is not found in the original wire format
data as received over the socket.
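
To make the attribute order concrete, here is a small Python sketch
that interprets a market data payload, given as a list of tokens
starting at the message index.  The function name parse_tick and the
dictionary keys are my own choices; the subtype names are copied from
the suffix annotation the shim writes for this example, and only the
subtypes that appear in this post are listed.  The numbers used with
it should be made-up values, not real market data.

```python
# Subtype codes and names as they appear in the annotated log lines
# of this post; other codes exist but are not listed here.
SUBTYPES = {
    0: "size.bid.", 1: "price.outcry.bid.", 2: "price.outcry.ask.",
    3: "size.ask.", 4: "price.summary.last.", 5: "size.last.",
    6: "price.summary.high.", 7: "price.summary.low.",
    8: "size.volume.", 9: "price.summary.close.",
}


def parse_tick(payload):
    """Interpret a market data price (index 1) or size (index 2) payload."""
    fields = [f.strip() for f in payload]
    index, version, tick_id, subtype = (int(f) for f in fields[:4])
    tick = {
        "version": version,
        "tick_id": tick_id,
        "subtype": SUBTYPES.get(subtype, "unknown"),
    }
    if index == 1:
        # Price event: price, quantity, and the "can auto execute" flag.
        tick.update(kind="price", price=float(fields[4]),
                    size=int(fields[5]), can_auto_execute=fields[6] == "1")
    elif index == 2:
        # Size event: a single quantity; any flag padding is ignored.
        tick.update(kind="size", size=int(fields[4]))
    else:
        raise ValueError(f"not a market data price or size event: {index}")
    return tick
```

Because the fields are stripped before conversion, the same function
works on the padded log text and on unpadded wire tokens.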

The suffix annotation here consists of the 0-value flag padding for
market data size events, the tick subtype explanation, and the tick
id translation of 1 as STK:SMART:AAPL:

    1| 1|   1?3.4|       3|1|price.outcry.bid.   |STK:SMART:AAPL:
    1| 2|   1?3.6|       2|1|price.outcry.ask.   |STK:SMART:AAPL:
    1| 4|   1?3.6|       1|0|price.summary.last. |STK:SMART:AAPL:
    1| 0|                3|0|size.bid.           |STK:SMART:AAPL:
    1| 3|                2|0|size.ask.           |STK:SMART:AAPL:
    1| 5|                1|0|size.last.          |STK:SMART:AAPL:
    1| 8|           1?9494|0|size.volume.        |STK:SMART:AAPL:
    1| 6|   1?5.0|       0|0|price.summary.high. |STK:SMART:AAPL:
    1| 7|   1?2.9|       0|0|price.summary.low.  |STK:SMART:AAPL:
    1| 9|   1?2.8|       0|0|price.summary.close.|STK:SMART:AAPL:

For another example, recall that Mike asked about the interpretation
of history bar detail lines.  The following is an excerpt of payload
text from the log after running the exs/test script, which includes a
history query.  The prices have, again, been obscured to honor IB's
license restrictions on redistribution.

1|11| 0|select past FUT:ECBOT:YM:USD:20100618 h1 11 1d now;|
2|20| 3|14901|69734|  0.000049|2|20| 3||
1| 2| 0|wait  6;|
3| 4| 2|   -1|2106|HMDS data farm connection is OK:ushmds2a|
3| 4| 2|    6| 165|Historical Market Data Service query message: ...
1| 9| 0|select tick ibc:  266093 at SMART 1;|
2| 1| 5|14901|69740|  0.000052|2| 1| 5||
1| 2| 0|wait  2;|
3|17| 3|    6|8|
0| 1| 0|20100323  08:30:00|107nn.00|107nn.00|107nn.00|107nn.00|  98nn ...
0| 1| 0|20100323  09:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 203nn ...
0| 1| 0|20100323  10:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 149nn ...
0| 1| 0|20100323  11:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 103nn ...
0| 1| 0|20100323  12:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 106nn ...
0| 1| 0|20100323  13:00:00|107nn.00|107nn.00|107nn.00|107nn.00|  98nn ...
0| 1| 0|20100323  14:00:00|107nn.00|108nn.00|107nn.00|108nn.00| 259nn ...
0| 1| 0|20100323  15:00:00|108nn.00|108nn.00|108nn.00|108nn.00|  28nn ...
3| 2| 5|    7| 8|              nnnnnn|0|size.volume.                  ...
3| 1| 5|    7| 6|    nn.nn00|       0|0|price.summary.high.           ...
3| 1| 5|    7| 7|    nn.nn00|       0|0|price.summary.low.            ...
3| 1| 5|    7| 1|    nn.nn00|      nn|1|price.outcry.bid.             ...
3| 2| 5|    7| 0|                  nn|0|size.bid.                     ...
3| 1| 5|    7| 2|    nn.nn00|       n|1|price.outcry.ask.             ...
3| 2| 5|    7| 3|                   n|0|size.ask.                     ...
4|12| 0|# |post|event: history insert|

In the transcript above, there is the dequeue event for the history query
command; the resulting request; a wait command, used here to help keep the
history query answer reasonably close to its initiating request; two
messages from the IB tws about the status of the history data farm; a
market data command, and request; another wait command; and, finally,
the history query answer.

History data messages begin with a header.  Here, left to right after
the event type, it holds the history message index, 17; the version, 3;
the temporary contract id, 6, assigned on the fly in sequence to stand
for the contract expression occurring in the command text as
FUT:ECBOT:YM:USD:20100618; and the history bar detail line count, 8.

By the way, for most messages, the suffix is just the contract
expression from the related command, so that the downstream need not
understand these temporary contract id indices, what I call tick ids. 

Now, finally, to the answer for Mike's question, about the
interpretation of the prices in the detail lines.  The following excerpt
is from the file EReader.java:

    case HISTORICAL_DATA:
        ...
        ...    date = read ...
        ...    open = read ...
        ...    high = read ...
        ...     low = read ...
        ...   close = read ...
        ...  volume = read ...
        ...     WAP = read ...
        ... hasGaps = read ...

Repeating the history query answer, and focusing on the detail lines,
it's clear that the prices are open, high, low, and close, in that
order, followed by volume:

3|17| 3|    6|8|
0| 1| 0|20100323  08:30:00|107nn.00|107nn.00|107nn.00|107nn.00|  98nn ...
0| 1| 0|20100323  09:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 203nn ...
0| 1| 0|20100323  10:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 149nn ...
0| 1| 0|20100323  11:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 103nn ...
0| 1| 0|20100323  12:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 106nn ...
0| 1| 0|20100323  13:00:00|107nn.00|107nn.00|107nn.00|107nn.00|  98nn ...
0| 1| 0|20100323  14:00:00|107nn.00|108nn.00|107nn.00|108nn.00| 259nn ...
0| 1| 0|20100323  15:00:00|108nn.00|108nn.00|108nn.00|108nn.00|  28nn ...
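
A downstream script could read such an answer as in the following
Python sketch.  It assumes each record has already been split on
vertical bars and starts at the event type, as in the excerpt above;
the field positions are read off that excerpt rather than taken from
any specification, and the three leading fields of each detail line
are simply skipped.  Bar and parse_history are hypothetical names, and
the fields after volume (WAP and hasGaps in EReader.java) are ignored.

```python
from typing import NamedTuple


class Bar(NamedTuple):
    date: str
    open: float
    high: float
    low: float
    close: float
    volume: int


def parse_history(records):
    """Parse a history header record and the detail records after it.

    Returns the temporary contract id from the header and the bars.
    """
    header = [f.strip() for f in records[0]]
    # Header layout as in the excerpt: event type 3, message index 17,
    # version, temporary contract id, detail line count.
    if header[:2] != ["3", "17"]:
        raise ValueError("not a history message header")
    tick_id, count = int(header[3]), int(header[4])
    bars = []
    for rec in records[1:1 + count]:
        f = [x.strip() for x in rec]
        # f[0:3] are the leading fields of the detail line; the bar
        # itself starts at the date.
        bars.append(Bar(f[3], float(f[4]), float(f[5]),
                        float(f[6]), float(f[7]), int(f[8])))
    if len(bars) != count:
        raise ValueError("fewer detail lines than the header count")
    return tick_id, bars
```

The detail count in the header is what tells the reader where the
answer ends, which matches the counted repeating group of the wire
format.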

I realize that users might prefer some form of documentation besides
the IB sample client java sources, the files EClientSocket.java and
EReader.java.

Nevertheless, the IB sources are the only official "docs"; it's their
api, and this is their approach to documentation.  I'm not going to
try to duplicate it, especially given the version issues involved.

Thanks,

Bill

