[ts-gen] Ib tws api --- shim log record format
Bill Pippin
pippin at owlriver.net
Wed Mar 24 18:00:04 EDT 2010
This message is in response to a query from Mike Thornton about
the interpretation of the log record format, and in particular
the meaning of history data detail lines. This message is,
however, much more general than the particular question that
triggered its posting.
The following is a general description of the shim's log format,
including pointers to reference documents and source files where
you can find more information. The log format can only be
understood in the context of the IB tws api, so, to start, a
brief introduction to that interface:
IB Tws Api Intro
----------------
The IB tws api is a tcp-socket-based specification for client
requests, and the resulting asynchronous messages produced in
response by the IB tws process. It provides access to market
data and account information, and accepts user-initiated
orders.
Of particular interest to the trader, orders may be grouped
into one-cancels-all (OCA) groups, either by the user or by
default under control of the IB tws, and the latter occurs
for those orders that share a common parent order. So, the
application can, to some degree, chain together entries and
stops.
There are a variety of resource limitations that apply to
the api, including max counts for subscriptions to market
data (100) and market depth (3), and pacing limits for
requests in general (20 milliseconds) and history queries
in particular (cumulative, 10 seconds each). There is also
folklore to the effect that parent-child orders and order
modification events should obey a moderate pacing limit,
with 300 milliseconds being the collective consensus as to
the largest pacing delay needed.
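As a rough illustration of what a pacing limit means for a client, here is a minimal Python sketch. It is not code from the shim; the function name schedule_sends and its parameters are made up for this example. Given the times at which requests become ready, it computes send times that are never closer together than a minimum gap, such as the 20 milliseconds mentioned above.

```python
def schedule_sends(arrivals, min_gap):
    """Return send times so that consecutive sends are at least min_gap apart.

    arrivals: non-decreasing times, in seconds, at which requests become ready.
    min_gap:  minimum spacing between sends, e.g. 0.020 for 20 milliseconds.
    Both names are hypothetical; this is not the shim's queueing code.
    """
    sends = []
    earliest = float("-inf")  # no constraint before the first send
    for ready in arrivals:
        when = max(ready, earliest)  # wait if the previous send was too recent
        sends.append(when)
        earliest = when + min_gap
    return sends
```

A request that arrives after the gap has already elapsed is sent at once; only bursts are spread out.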
The api requests and messages (collectively events) consist
of null terminated tokens, of either a fixed number, or else,
in a few cases, to form a counted repeating group; and there
is *no* message delimiter.
Events begin with a pair of natural numbers, the event index
and event version, and it is by matching on those that an api
program can determine how the tokens that follow should be
interpreted, and in particular, given the known length for
most events, or the counted length for repeating groups ---
history messages are the classic example here --- where the
*next* event starts.
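To make the framing problem concrete, the following Python sketch splits a buffer of null terminated tokens into events by looking up the token count for each (index, version) pair. It is an illustration only, not the shim's code: the names FIXED_LENGTHS and split_events are hypothetical, the two table entries are taken from the market data examples later in this message, and counted repeating groups such as history messages are not handled.

```python
# Token counts per (event index, version), counting the index and version
# tokens themselves. Only these two entries come from this message's market
# data examples; a real table would have to be built from EReader.java.
FIXED_LENGTHS = {
    (1, 5): 7,  # market data price
    (2, 5): 5,  # market data size
}

def split_events(buf):
    """Split bytes of null-terminated tokens into complete events.

    Returns (events, rest), where rest holds the bytes of any incomplete
    trailing event, to be retried once more data has arrived.
    """
    tokens = buf.split(b"\0")
    tail = tokens.pop()  # bytes after the last null: a partial token, maybe empty
    events, pos = [], 0
    while pos + 2 <= len(tokens):
        key = (int(tokens[pos]), int(tokens[pos + 1]))
        length = FIXED_LENGTHS[key]  # KeyError means the event is not in the table
        if pos + length > len(tokens):
            break  # event not yet complete
        events.append([t.decode("ascii") for t in tokens[pos:pos + length]])
        pos += length
    rest = b"".join(t + b"\0" for t in tokens[pos:]) + tail
    return events, rest
```

Since there is no message delimiter, an unknown (index, version) pair leaves the reader unable to find the next event, which is why the sketch raises rather than guessing.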
The tws api process seems to respond to non-trivial errors in
request format, including in particular invalid request indices,
by terminating the connection. Errors in individual data values,
in contrast, seem to produce limited but useful error messages.
The features of the api, including both the type and number of
elements in events, and the range of possible events themselves,
vary from one version to the next, and significantly over
time. Api versions are best identified by the (currently) two
digit numbers exchanged by the client [shim] and server [tws]
processes. The tws server api level is currently into the
mid-forties, and continues to provide support for earlier
versions, all the way back to the mid or low single digits.
Lower version clients lose access, to some degree, to newer
api features.
Newer versions of the api have included an increasing number of
new message types that provide end markers for message lists;
e.g., for the sequence of account data records periodically
produced in response to an account data query, or the sequence
of contract detail records, in response to a contract wildcard
query. There has been no change, in contrast, to the use of
variable-length events with no message terminators.
For more information on the api, please see the official IB tws
api guide, and also two source files from their java sample client,
EClientSocket.java and EReader.java, which serve as the only
trustworthy documentation of the wire formats for requests
(EClientSocket) and messages (EReader).
The sample client source files will give you name, type, and number
of attribute values per api event, while the api guide explains the
values that those attributes can assume. So, you'll need to look
at both.
The api guide is substantially redundant, consisting of similar
sections distinguished by language interface, and you can pick just
one and use only that; I recommend that you read either the Java
or C++ sections. The api guide is available from the IB web site,
and the following link should work:
http://www.interactivebrokers.com/php/apiUsersGuide/apiguide.htm
If you download and unpack the java sample client sources, the
EClientSocket.java and EReader.java files should be found via the
path:
IBJts/java/com/ib/client/
That's all herein about the api in isolation, and now, before I
focus on the shim's log record format, some context first about the
shim itself:
The trading shim
----------------
The trading shim has been described as a "dbms-augmented command
interpreter" for the IB tws api. By that is meant that client
commands are expanded with information from the symbols database
to generate api requests, and that the shim then reads and
relays the resulting messages. As an interpreter, the shim is
written with the expectation that it be driven by downstream
programs; it is of course possible to type commands in by hand,
or pipe in short test scripts, but the full power, especially
conditional processing for orders, requires a controlling client
program, typically written in some scripting language such as
Perl, Python, or Ruby.
So, what exactly does the shim bring to the table? What does it
provide that makes it easier to write such downstream client
programs? After all, given the socket api, client scripts could
just open a connection directly, poke api tokens down the
socket, and read the resulting message tokens that are returned.
On the request side, the shim provides translation of simple text
command statements into api requests, using the dbms augmentation
referred to above, as well as request queueing to honor pacing
limits and other resource limits.
On the message side, the shim provides both error handling,
with recovery from message syntax errors, in contrast to the
abrupt termination approach used by the IB tws; and, finally,
the log formatting of events, about which more to follow.
This raises the notion of shim log events, and the resulting
log stream.
Shim log records
----------------
A trading-shim api event is a command, request, message, or
comment, where commands are accepted from the downstream and
enqueued and dequeued according to pacing and resource limits;
requests are translated from dequeued commands, and sent to
the IB tws; messages are received, asynchronously, in response
to api requests; and comments are created by the shim to
annotate other events.
A log event is either a command, request, message, or comment,
as above; or a log detail line, an item from a message repeating
group, e.g., a history bar detail line.
The log stream is the serialized sequence of log events output
by the shim.
This brings me, finally, to the shim log record format, which in
part echoes the wire format of api events, so that you *will* need
the above mentioned EClientSocket and EReader files in order to
predict, as well as interpret, the api events listed in the log.
In fact, the log format makes little sense until considered in the
context of the api itself, and the shim's role as an intermediary to
simplify the development of downstream --- that is, client ---
programs.
Log event format
----------------
The text format of log events consists of vertical-bar-terminated
tokens with a trailing newline. This format is chosen to allow
easy processing by downstream scripts, which can use the newlines
to recognize record boundaries, and can split on vertical bars
to break records up into tokens. All events have a common prefix,
about which more to follow; the payload, ditto; and an optional
suffix.
Log event prefix
----------------
For the log file and stdout channels for the log, the log event
prefix consists of four fields: the process pid; the seconds
field of the timestamp; the fraction part of the timestamp; and
the event code.
More precisely, of the four fields of the prefix:
the 1st field ends with the pid; for the file and stdout channels
it holds only the pid, while for the syslog output channel it also
holds a textual date, time, and user name before the pid, with
colons between the parts, so that there is still just one
vertical-bar-terminated field for the pid;
the 2nd field is seconds past midnight;
the 3rd, the fractional part of the timestamp, is in microseconds,
either absolute, or, with the diff option, as the differential from
the previous event; and
the 4th, the event type, is a numerical code in {1, 2, 3, 4}, as
the event is a command, request, message, or comment, respectively.
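A downstream script might pick the prefix apart along the following lines. This is a sketch in Python under stated assumptions, not code shipped with the shim: the names LogRecord, EVENT_TYPES and parse_record are hypothetical, only the file and stdout channels are handled (not the syslog form of the first field), and the fraction is read as a decimal number of seconds, which is how the sample lines further on in this message print it.

```python
from collections import namedtuple

# Event codes as listed above; any other code is reported as "unknown".
EVENT_TYPES = {1: "command", 2: "request", 3: "message", 4: "comment"}

LogRecord = namedtuple("LogRecord", "pid seconds fraction kind payload")

def parse_record(line):
    """Parse one log line from the file or stdout channel."""
    fields = line.rstrip("\n").split("|")
    if fields and fields[-1] == "":
        fields.pop()  # tokens are bar-terminated, so drop the empty tail
    if len(fields) < 4:
        raise ValueError("log record has no complete prefix: %r" % line)
    pid, seconds, fraction, code = fields[:4]
    return LogRecord(
        pid=int(pid),
        seconds=int(seconds),        # seconds past midnight
        fraction=float(fraction),    # absolute, or a differential with diff
        kind=EVENT_TYPES.get(int(code), "unknown"),
        payload=[f.strip() for f in fields[4:]],
    )
```

Everything after the fourth field is returned as a list of whitespace-stripped payload tokens, which the caller can then interpret according to the event type.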
Api payload text
----------------
In addition, the following is true of the event payload, that
part of the log record text after the prefix and not including
suffix annotation:
commands: for syntactically correct commands, the log echoes
their text, with some reordering to reflect queueing
Note: syntax errors are horribly garbled, and so you
must use the stderr text to understand what went wrong.
requests: faithfully echo the wire format; the same code that
uses the request object to lay out the request with
null separators for the upstream, uses vertical bars
for the log; the only difference is the output stream
object (Logger or Sender) and the buffer type (NulBuf
or BarBuf).
messages: reflect the wire format, though with both whitespace
formatting and often suffix annotation, e.g., the
symbol translation for the tick id; the format for
market data is explained below.
Leaving aside the message prefix, and any suffix annotation, and
focusing specifically on requests and messages, it's worth
emphasizing the following about the payload for those event types:
All log-formatted requests and messages have, as their payload
immediately following the prefix, a one-for-one list of the
tokens that make up the wire format request or message.
So, since request and message logging reflects the wire format,
and that format is documented by IB in the EClientSocket.java
and EReader.java files of their sample client, you can match the
attributes of the api event payload in the log against IB's
source file text one-for-one.
Some examples may be of interest. E.g., for market data, and with price
data obscured to honor IB's license restrictions on redistribution,
market data for Apple [last year] appeared in the log as follows:
15481|65972| 0.000017|1| 9| 0|select tick STK:SMART:AAPL:USD 1;|
15481|65972| 0.000016|2| 1| 5|1|5|1|AAPL|STK||0.0000||1|SMART||USD||||||
15481|65972| 0.022712|1| 2| 0|wait 2;|
15481|65972| 0.229638|3| 1| 5| 1| 1| 1?3.4| 3|1|...
15481|65972| 0.000008|3| 1| 5| 1| 2| 1?3.6| 2|1|...
15481|65972| 0.000004|3| 1| 5| 1| 4| 1?3.6| 1|0|...
15481|65972| 0.000005|3| 2| 5| 1| 0| 3|0|...
15481|65972| 0.000003|3| 2| 5| 1| 3| 2|0|...
15481|65972| 0.000003|3| 2| 5| 1| 5| 1|0|...
15481|65972| 0.000003|3| 2| 5| 1| 8| 179???|0|...
15481|65972| 0.000005|3| 1| 5| 1| 6| 1?5.0| 0|0|...
15481|65972| 0.000004|3| 1| 5| 1| 7| 1?2.9| 0|0|...
15481|65972| 0.000004|3| 1| 5| 1| 9| 1?2.8| 0|0|...
15481|65974| 1.798008|1|10| 0|cancel tick STK:SMART:AAPL:USD;|
15481|65974| 0.000063|2| 2| 1|2|1|1||||
In the above text, the first four columns are prefix information: the
process id, seconds since midnight, microseconds, and event type,
here: a cmd, req, cmd, 10 market data msgs, cmd, and req. Note from
the low-microsecond values of the fractional part of the timestamp
for most of the events that the shim was running with the diff option.
Payloads are repeated below; note that market data price and size
messages have seven and five attributes, respectively:
1| 5| 1| 1| 1?3.4| 3|1|
1| 5| 1| 2| 1?3.6| 2|1|
1| 5| 1| 4| 1?3.6| 1|0|
2| 5| 1| 0| 3|
2| 5| 1| 3| 2|
2| 5| 1| 5| 1|
2| 5| 1| 8| 179???|
1| 5| 1| 6| 1?5.0| 0|0|
1| 5| 1| 7| 1?2.9| 0|0|
1| 5| 1| 9| 1?2.8| 0|0|
Of course the question marks above are not part of the logging,
and are used herein to avoid running afoul of IB's license
restrictions. For market data price events, the attributes are
message index, version, tick id, market data subtype, price,
quantity, and the "can auto execute" flag. Note also that the
whitespace formatting is not found in the original wire format
data as received over the socket.
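The attribute list for price events can be applied to a payload with a few lines of Python. The sketch below is illustrative only; the field names in PRICE_FIELDS and the function parse_price_payload are my own labels for the attributes listed above, not identifiers from the shim or from IB. Values are kept as strings, so that obscured prices such as 1?3.4 pass through unchanged.

```python
# Hypothetical labels for the seven price attributes, in wire order.
PRICE_FIELDS = ("index", "version", "tick_id", "subtype",
                "price", "quantity", "can_auto_execute")

def parse_price_payload(payload):
    """Name the seven attributes of a market data price payload."""
    tokens = payload.rstrip("\n").split("|")
    if tokens and tokens[-1] == "":
        tokens.pop()  # tokens are bar-terminated, so drop the empty tail
    tokens = [t.strip() for t in tokens]  # remove the log's whitespace padding
    if len(tokens) != len(PRICE_FIELDS):
        raise ValueError("expected 7 tokens, got %d" % len(tokens))
    return dict(zip(PRICE_FIELDS, tokens))
```

For the first payload line above, this yields tick_id "1", subtype "1", price "1?3.4", quantity "3", and can_auto_execute "1".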
The suffix annotation here consists of the 0-value flag padding for
market data size events, the tick subtype explanation, and the tick
id translation of 1 as STK:SMART:AAPL:
1| 1| 1?3.4| 3|1|price.outcry.bid. |STK:SMART:AAPL:
1| 2| 1?3.6| 2|1|price.outcry.ask. |STK:SMART:AAPL:
1| 4| 1?3.6| 1|0|price.summary.last. |STK:SMART:AAPL:
1| 0| 3|0|size.bid. |STK:SMART:AAPL:
1| 3| 2|0|size.ask. |STK:SMART:AAPL:
1| 5| 1|0|size.last. |STK:SMART:AAPL:
1| 8| 1?9494|0|size.volume. |STK:SMART:AAPL:
1| 6| 1?5.0| 0|0|price.summary.high. |STK:SMART:AAPL:
1| 7| 1?2.9| 0|0|price.summary.low. |STK:SMART:AAPL:
1| 9| 1?2.8| 0|0|price.summary.close.|STK:SMART:AAPL:
For another example, recall that Mike asked about the interpretation
of history bar detail lines. The following is an excerpt of payload
text from the log after running the exs/test script, which includes a
history query. The prices have, again, been obscured to honor IB's
license restrictions on redistribution.
1|11| 0|select past FUT:ECBOT:YM:USD:20100618 h1 11 1d now;|
2|20| 3|14901|69734| 0.000049|2|20| 3||
1| 2| 0|wait 6;|
3| 4| 2| -1|2106|HMDS data farm connection is OK:ushmds2a|
3| 4| 2| 6| 165|Historical Market Data Service query message: ...
1| 9| 0|select tick ibc: 266093 at SMART 1;|
2| 1| 5|14901|69740| 0.000052|2| 1| 5||
1| 2| 0|wait 2;|
3|17| 3| 6|8|
0| 1| 0|20100323 08:30:00|107nn.00|107nn.00|107nn.00|107nn.00| 98nn ...
0| 1| 0|20100323 09:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 203nn ...
0| 1| 0|20100323 10:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 149nn ...
0| 1| 0|20100323 11:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 103nn ...
0| 1| 0|20100323 12:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 106nn ...
0| 1| 0|20100323 13:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 98nn ...
0| 1| 0|20100323 14:00:00|107nn.00|108nn.00|107nn.00|108nn.00| 259nn ...
0| 1| 0|20100323 15:00:00|108nn.00|108nn.00|108nn.00|108nn.00| 28nn ...
3| 2| 5| 7| 8| nnnnnn|0|size.volume. ...
3| 1| 5| 7| 6| nn.nn00| 0|0|price.summary.high. ...
3| 1| 5| 7| 7| nn.nn00| 0|0|price.summary.low. ...
3| 1| 5| 7| 1| nn.nn00| nn|1|price.outcry.bid. ...
3| 2| 5| 7| 0| nn|0|size.bid. ...
3| 1| 5| 7| 2| nn.nn00| n|1|price.outcry.ask. ...
3| 2| 5| 7| 3| n|0|size.ask. ...
4|12| 0|# |post|event: history insert|
In the transcript above, there is the dequeue event for the history query
command; the resulting request; a wait command, used here to help keep the
history query answer reasonably close to its initiating request;
two status messages from the IB tws about the status of the history data
farm; a market data command, and request; another wait command; and,
finally, the history query answer.
History data messages begin with a header, including the history bar
detail line count, here 8, as well as, left to right after the event
type, the history message index, 17; the version, 3; and the temporary
contract id, 6, assigned on the fly in sequence to stand for the contract
expression occurring in the command text, here FUT:ECBOT:YM:USD:20100618.
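A downstream script can use the count in the header to collect the detail lines that belong to one history answer. The following Python sketch assumes payload lines shaped like the excerpt above, each starting with the event type code; the function name group_history and the dictionary keys are hypothetical, and this is not code from the shim.

```python
from itertools import islice

def group_history(lines):
    """Group a history header line with the detail lines that follow it.

    lines: an iterable of payload lines, the first being the header.
    Returns (header, details): a dict and a list of token lists.
    """
    def tokens_of(line):
        tokens = line.rstrip("\n").split("|")
        if tokens and tokens[-1] == "":
            tokens.pop()  # tokens are bar-terminated, so drop the empty tail
        return [t.strip() for t in tokens]

    it = iter(lines)
    # Header tokens: event type, message index, version, tick id, line count.
    kind, index, version, tick_id, count = tokens_of(next(it))
    header = {"index": int(index), "version": int(version),
              "tick_id": int(tick_id), "count": int(count)}
    details = [tokens_of(line) for line in islice(it, header["count"])]
    if len(details) != header["count"]:
        raise ValueError("history answer ended before all detail lines arrived")
    return header, details
```

Lines after the counted group are left unread, so the same iterator can go on to the market data messages that follow.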
By the way, for most messages, the suffix is just the contract
expression from the related command, so that the downstream need not
understand these temporary contract id indices, what I call tick ids.
Now, finally, to the answer for Mike's question, about the
interpretation of the prices in the detail lines. The following excerpt
is from the file EReader.java:
case HISTORICAL_DATA:
...
... date = read ...
... open = read ...
... high = read ...
... low = read ...
... close = read ...
... volume = read ...
... WAP = read ...
... hasGaps = read ...
Repeating the history query answer, and focusing on the detail lines,
it's clear that the prices are open, high, low, close, and with those
followed by volume:
3|17| 3| 6|8|
0| 1| 0|20100323 08:30:00|107nn.00|107nn.00|107nn.00|107nn.00| 98nn ...
0| 1| 0|20100323 09:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 203nn ...
0| 1| 0|20100323 10:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 149nn ...
0| 1| 0|20100323 11:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 103nn ...
0| 1| 0|20100323 12:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 106nn ...
0| 1| 0|20100323 13:00:00|107nn.00|107nn.00|107nn.00|107nn.00| 98nn ...
0| 1| 0|20100323 14:00:00|107nn.00|108nn.00|107nn.00|108nn.00| 259nn ...
0| 1| 0|20100323 15:00:00|108nn.00|108nn.00|108nn.00|108nn.00| 28nn ...
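Putting the EReader.java field order together with the detail lines, a script could label each bar as in the Python sketch below. Two things here are assumptions of mine rather than facts from the shim: the names BAR_FIELDS, LEADING and parse_bar, and the treatment of the three leading tokens of each detail line as log fields to be skipped before the date. Values are kept as strings, and the fields after volume (WAP, hasGaps) are ignored, since the excerpt elides them.

```python
# Field order taken from the EReader.java excerpt above.
BAR_FIELDS = ("date", "open", "high", "low", "close", "volume")

# Number of leading log tokens before the date in a detail line; this is an
# assumption read off the excerpt, not something the log format guarantees.
LEADING = 3

def parse_bar(line):
    """Label the date, prices, and volume of one history bar detail line."""
    tokens = [t.strip() for t in line.rstrip("\n").split("|")]
    values = tokens[LEADING:LEADING + len(BAR_FIELDS)]
    if len(values) != len(BAR_FIELDS):
        raise ValueError("detail line too short: %r" % line)
    return dict(zip(BAR_FIELDS, values))
```

Applied to the first detail line above, this gives date "20100323 08:30:00" followed by the four obscured prices as open, high, low, and close.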
I realize that users might prefer some form of documentation besides
the IB sample client java sources, the files EClientSocket.java and
EReader.java.
Nevertheless, the IB sources are the only official "docs"; it's their
api, and this is their approach to documentation. I'm not going to
try to duplicate it, especially given the version issues involved.
Thanks,
Bill