[ts-gen] General followup questions, after: Availability of support and development status of Trading Shim
R P Herrold
herrold at owlriver.com
Fri Mar 26 11:17:02 EDT 2010
On Fri, 26 Mar 2010, a private email asked:
> Thanks for taking the time to prepare your response. A
> couple more questions:
Initially that poster had asked about some private support
contract matters, not generally applicable for discussion
here, and which were answered privately. That person's follow
on, however, raised matters of general interest, and so I
'surface' it here, with elision of personally identifying
information.
> * In skimming your documentation (which may be dated) it
> didn't seem that Trading Shim parsed the IB tick (well their
> equivalent of tick data) data but that was left up to the
> individual programmer using it. Is that correct or did I
> just get that wrong?
We are very focused with the shim to provide as transparent
an interpreter as possible, from the binary and ornate command
language and state machine that the TWS represents, and into
something textually clean and able to be manipulated as a well
formed Unix tool feeding a stream. We also solve the two uses
of NUL as both an end of line ("EOL") separator and an
intra-field separator, which causes others a lot of pain,
by making the EOL a NEWLINE.
The documentation is accurate, in this regard, but finding the
answer needs a Rosetta stone, and we handle that in the
mailing list for the project presently. A manual write /
re-write is on the docket, but we have some features to attain
first, that make it a bit premature to spend time documenting
items that were not yet 'nailed down' as to their final
implementation details. The description of the command
language in the online material, and the discussion of the
configuration file options, stand out as sore thumbs: they
reflect that we did not anticipate how those features would
finally end up. That said, most of it remains accurate.
Turning to consider ticks, you have noted that IB provides
'summarized' ticks:
Each tick element type is distinctly marked by IB, and thence
the shim, and has been from the first external deliverable
release -- The 'marking' is in the 'pipe' ["|" delimited]
field values, which pass through the IB message values
While these values are potentially changeable by IB, and thus
our outreference to their documentation,
http://www.trading-shim.org/doc/node55.html
it turns out that IB has been extremely good about NOT
changing the mappings
Bill had a post about reading history this week, but the same
principles apply to ticks
http://www.trading-shim.org/pipermail/ts-general/2010-March/000715.html
and his example here was apple ("AAPL") in the 190s:
1| 1| 1?3.4| 3|1|price.outcry.bid. |STK:SMART:AAPL:
1| 2| 1?3.6| 2|1|price.outcry.ask. |STK:SMART:AAPL:
1| 4| 1?3.6| 1|0|price.summary.last. |STK:SMART:AAPL:
1| 0| 3|0|size.bid. |STK:SMART:AAPL:
1| 3| 2|0|size.ask. |STK:SMART:AAPL:
1| 5| 1|0|size.last. |STK:SMART:AAPL:
1| 8| 1?9494|0|size.volume. |STK:SMART:AAPL:
1| 6| 1?5.0| 0|0|price.summary.high. |STK:SMART:AAPL:
1| 7| 1?2.9| 0|0|price.summary.low. |STK:SMART:AAPL:
1| 9| 1?2.8| 0|0|price.summary.close.|STK:SMART:AAPL:
We see that the following tick details are present both in a
numeric and a matching (added by us, expanded) text form:
1| 0 size.bid.
1| 1 price.outcry.bid.
1| 2 price.outcry.ask.
1| 3 size.ask.
1| 4 price.summary.last.
1| 5 size.last.
1| 6 price.summary.high.
1| 7 price.summary.low.
1| 8 size.volume.
(here, cumulative, for the session)
1| 9 price.summary.close.
There are additional attributes added with later server and
client versions, but those tick attributes shown are present
for ALL versions of the TWS supported by IB presently
> In other words, I want each element of
> the tick stream mapped to a distinct data object (or
> variable) that can then be manipulated directly.
* nod * .. using the left column as an index to a
data object name, you have just such a mapping trivially -- My
personal downstream parser maintains just such a table (I use
an associative array in the particular implementation I looked
at, I see ... slower, but it has never gotten 'overrun' by the
data stream, so there is no reason for me to refactor it yet)
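As a rough illustration of such a table, here is a minimal
Python sketch. It is not the downstream parser mentioned above.
The field layout (second field is the tick type, third is the
value, last is the contract) is an assumption read off the
sample lines, the names come from those same lines, and the
symbol and values are made up.

```python
# Sketch only: field positions are assumed from the sample lines.
TICK_NAMES = {
    0: "size.bid.",
    1: "price.outcry.bid.",
    2: "price.outcry.ask.",
    3: "size.ask.",
    4: "price.summary.last.",
    5: "size.last.",
    6: "price.summary.high.",
    7: "price.summary.low.",
    8: "size.volume.",
    9: "price.summary.close.",
}

def parse_tick(line):
    """Split one pipe-delimited line into (contract, tick_type, value)."""
    fields = [f.strip() for f in line.rstrip("\n").split("|")]
    return fields[-1], int(fields[1]), float(fields[2])

def update(table, line):
    """Keep the latest value per contract, keyed by the text name."""
    contract, tick_type, value = parse_tick(line)
    name = TICK_NAMES.get(tick_type, "unknown.%d" % tick_type)
    table.setdefault(contract, {})[name] = value
    return table

# Made-up values for a hypothetical symbol, not real market data.
sample = [
    "1| 1| 100.5| 3|1|price.outcry.bid. |STK:SMART:XYZ:",
    "1| 0| 3|0|size.bid. |STK:SMART:XYZ:",
]
table = {}
for line in sample:
    update(table, line)
```

A plain dictionary is the Python equivalent of an associative
array; a fixed-size list indexed by tick type would be the
faster variant, if it were ever needed.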
> It seemed like Trading Shim provided a data stream, but not
> a parsed data stream ready for consumption by an analysis
> module.
The parsing is subtle to a person who has not had to 'muck'
through binary traces -- NUL to 'pipe' or NEWLINE, as the
context is implied
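To show why 'context' matters: when the same NUL byte ends both
a field and a message, one way a reader can find the message
boundary is by knowing how many fields each message type
carries. A toy Python sketch of that idea follows. The message
ids and field counts are hypothetical, and this is neither the
shim's code nor the real TWS table.

```python
# Toy illustration: FIELD_COUNTS is hypothetical, not the TWS table.
FIELD_COUNTS = {"1": 6, "2": 4}   # message id -> total fields, id included

def nul_to_lines(raw):
    """Turn a NUL-terminated field stream into pipe-delimited lines."""
    fields = raw.decode("ascii").split("\0")
    if fields and fields[-1] == "":
        fields.pop()              # drop the empty piece after the final NUL
    lines, i = [], 0
    while i < len(fields):
        count = FIELD_COUNTS[fields[i]]   # the id implies the length
        lines.append("|".join(fields[i:i + count]))
        i += count
    return "\n".join(lines) + "\n"

# Two made-up messages, every field terminated by NUL.
messages = [["1", "6", "7", "1", "100.5", "3"], ["2", "6", "7", "0"]]
raw = b"".join(f.encode("ascii") + b"\0" for m in messages for f in m)
text = nul_to_lines(raw)
```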
> Is that part of the current functionality of Trading Shim or
> is that "an exercise left to the reader"?
the former, as noted
> * In looking at the doctoral paper, it seems that one of the
> project objectives was to create a multi-threaded,
> concurrent downloading tool.
I'll let Bill speak to his paper and the doctoral studies
objectives, as its author. :)
> Is that supported in the Trading Shim?
The pronoun here referring to 'multi-threaded' concurrency, I
think. The shim has never needed to add multiple threads to
fully keep up with the content fed to it by the TWS, and has a
'select' based input model which makes it pretty unlikely that
it would need to do so. See:
man 2 select_tut
for a far better discussion of the fine points of 'select(2)'
than I could ever write, with worked example code
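For readers who prefer a quick look before the man page, here
is a minimal Python sketch of a 'select' loop serving two
inputs from one thread. It is not the shim's code. The two
local socket pairs are stand-ins (assumptions) for the TWS
socket and a command stream.

```python
import select
import socket

# Two local socket pairs stand in for the TWS socket and a command stream.
tws_w, tws_r = socket.socketpair()
cmd_w, cmd_r = socket.socketpair()
names = {tws_r: "tws", cmd_r: "cmd"}

tws_w.sendall(b"tick\n")
cmd_w.sendall(b"command\n")

events = []
pending = set(names)
while pending:
    # Block until at least one descriptor is readable (1 s timeout).
    readable, _, _ = select.select(list(pending), [], [], 1.0)
    if not readable:
        break
    for sock in readable:
        events.append((names[sock], sock.recv(4096)))
        pending.discard(sock)      # one read each is enough for the demo

for s in (tws_w, tws_r, cmd_w, cmd_r):
    s.close()
```

A single thread services whichever input is ready, so no
locking between threads is needed.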
> How are the multiple sessions spawned and
> managed? (If this is documented, just point me to the
> appropriate materials). How are the different sessions
> managed in such a way to keep the sequence of the data
> stream in the proper event sequence? Is there a means of
> buffering (writing out to temp files possibly) the results
> until the backend database can write out the data stream?
I will give a high level walk-through, and hope that I do not
oversimplify the complexity of what is happening.
Particularly, there is an amazing amount of detail checking
and precise actions building data structures, and validating
inputs happening 'under the hood' in the shim which I almost
wholly gloss over. The details are out of scope here, but
stated in the common form:
the shim is a racehorse, built for speed and extremely
fast throughout, consistent with accuracy
I sample and track the competition, and have no doubt we
'scream' through data that others wallow and waddle about in
http://www.trading-shim.org/faq/?other-voices
The first question uses the term 'multiple sessions' in a
context that I think means 'multiple connections to a single
upstream TWS'. The TWS provides for up to 8 concurrent socket
connections from downstream clients (and seemingly one or more
additional but non-socket based connections to support the
Java AWT GUI displays. The IB .jar providing that
functionality is obfuscated and I have not tried to look)
One usage model we think sensible for the shim is to have a
single 'trading mode' connection, able to enter and supervise
orders, and as many 'data mode' connections up to the limit of
7 remaining (== 8 minus 1), as one wishes. It turns out that
one can do all 'data mode' operations in 'trading ['risk']
mode' so as few as one socket connection to the TWS by the
shim can suffice for both 'risk' and 'data' operations, at
the risk of adding latencies, discussed later herein.
These are simple 'nix processes, started from the command
line; triggered by clicking a GUI wrapper to fire off a
command; spawned as a sub-task from an already running parent;
or living in the inittab. Whatever. At one point we had the
shim firing off a charting data refresh and display thread,
but I think that turned out to be a dead end and that code
was removed. What I am saying is that there is no magic to
starting a process to connect to the TWS socket interface,
whether it is the shim, or otherwise.
The SHIM in its start up process reads an optionally present
configuration file, and then reads the standard input for
optional additional configuration details in a 'let' command,
recently described on the mailing list in a post by Bill.
There are also 'hard-coded' configuration values in the source
code, which may or may not have been over-ridden by values
from the config file or the 'let' command.
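The layering of those three sources can be pictured with a
short Python sketch. The key names and values are hypothetical
placeholders, and the precedence shown ('let' over config file
over hard-coded) is an assumption based on the order in which
the sources are read, not a statement of the shim's internals.

```python
from collections import ChainMap

# All keys and values here are hypothetical placeholders.
hard_coded = {"host": "localhost", "port": "0", "mode": "data"}
config_file = {"port": "1"}
let_command = {"mode": "risk"}

# Earlier maps win: 'let' overrides the file, which overrides defaults.
effective = ChainMap(let_command, config_file, hard_coded)
```

A lookup falls through to the hard-coded value only when
neither later source sets that key.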
Thereafter in the usual case, the shim connects to the
database, and to the TWS as the net effective configuration
settings indicate, and resumes processing input as it appears
on the standard in, and talking back and forth to the TWS
across the socket interface. It is well possible to graft
additional data and trade and control stream connectors onto
the shim's current complement, but we have not encountered a
compelling use case able to be released in the GPLv3 code, to
expose such an interface here. Talk to me privately for a
quote if you think you need such.
The next question is: "How are the different sessions managed
in such a way to keep the sequence of the data stream in the
proper event sequence" and I think a quick review of
assumptions is in order. There is a whole collection of
'unknowable unknown' latencies in play as IB marshals data to
deliver to the TWS on the TWS' upstream side. A trivial
example: Are ticks for a given instrument being received
through multicast by IB, or across a low speed serial TTY
link?
We can speculate as to some common instruments, but if a stock
is essentially traded OTC or on a bid and ask 'quotation'
basis, the tick data may be quite stale -- I watched SCOX one
morning a few years ago, and there was not a single tick or
trade until after 10:45 -- it looked like the sole market
maker just overslept ;)
What are the data transmission latencies TO IB's plants; what
are the data cleanup and transfer latencies inside and between
IB's plants? What latency does the particular IB load balancer
one is connected to add? Does it differ between .us, .ch and
.hk? Almost certainly. Look at the 'data farm' up and down
messages through a trading day. They come, they go, and one
may not be ABLE to even detect an outage for a quarter-minute
or two
Once delivered to the local TWS, across a data link of
variable latency, if there is only a single connection,
pulling a big chunk of history, I know that any transaction
detail that came in, and all tickstream data, will be delayed
until that history transfer is completed, because of the way
the TWS message that is carrying history detail is structured
[This is part of the reason I mentioned the shim's ability to
run a 'data' connection and, in another separate shim, one
as the 'risk' connection -- one can then structure and
classify how one's apparent latencies are expressed. But think
this through -- there is also only a single encrypt-able FIX
connection between the TWS and its IB counter-party servers.
What roadblock did it add? We cannot see, and so cannot know,
but rather only speculate]
A particular symbol's tick datastream ** seems ** to be
sequentially delivered, when I have compared recorded tick
streams against one second historical data, but I know of no
guarantees by IB in this regard.
Once down at the TWS, each session between it and a shim
instance is a TCP connection, and as such, natively
serialized.
Then the shim processes its 'select' data basically in a
round-robin fashion, and enqueues it for (writing to a file,
sending to the log, sending to standard out, or a combination
thereof) and there are minimal delays (usually a maximum of 20
mSec, and usually sub micro-Sec if one is on the right part of
the processing cycle) before it is done being handed
off to the host operating system's routines. ... again,
latencies of appearance which I cannot control and of which I
do not speak, as they will vary.
A trading strategy that requires sub 20 mSec response CANNOT
be run across data links that take over 2 times that to
transfer data to and from the counter-party [IB's next hop is
at least that far away from anyone not co-located at a
data-center with them], and it is a fool's game to pretend
otherwise. Probably a trading strategy that HAS to see EVERY
TICK should consider a platform other than IB, as IB does not
purport to deliver such, nor to deliver without time skew
between symbol samples.
The last question about 'holding back' sending messages along
until the database is coherent at a given state is not
something we do, nor that I see that we would ever want to do.
Do you have a use case please?
> * Has your team done any work with NVidia's CUDA platform?
> If not, is that on your roadmap (to leverage GPU's for
> massaging datastreams) with some type of target date?
The 'CUDA' numeric co-processor parallelization is more
applicable downstream of the shim's functions {very high speed
simulation and solving of projected trading surface topology}.
The shim is so fast that the book-keeping overhead of farming
out, and synchronizing returns would be wasted. It would be
bolting a battery of jet engines onto a barracuda ... not
really a good fit ;)
I do not discount 'CUDA' at all -- it is just not needed at
the layer the shim is working at
> * I should have stated that I'm using Ubuntu 64
> Desktop/Server editions as to access the larger memory that
> may be necessary as more data is retained in memory to avoid
> backend database IO issues. Does your code leverage a 64bit
> platform?
We develop on 64 bit, and encountered an anomaly with an older
Debian Stable compiler on a 32 bit platform recently. I
cannot presently recreate the anomaly after a recent
re-factoring. Our peak memory use is far below the limits a 32
bit 'nix architecture imposes, but yes, we can use a 64 bit
arch quite well ;)
> * The Freshmeat site has some broken links - you may want to
> check it or have their webmaster fix them
We do not have access to alter what appears at freshmeat
beyond using their web interface. Please send me a private
note of the outages, and I will pass it along to their admins.
I know they have been 'refreshing' their site recently, and
regret that they have removed some detail I enjoyed reading
> * You mentioned exploratory project estimates. At this
> point, I'm just looking for IB to provide data. Since the
> historical data lacks certain data elements, I'm looking at
> using the "tick" feed as the source for all data. This
> would require streaming out the tick data to a database
> while also using to feed real-time analytics. Does the
> "off-the-shelf" Trading Shim provide this or would this be
> considered custom or add-on work to achieve this?
This database write function is doable natively, and has been
'forever' -- it would be an example in the 'open loop' (no
closed feedback, back into the shim's standard in) client
model.
To implement it, one hooks a simple state machine filter
written in the scripting language of your choice to accept its
standard in through a pipe on the standard out of the shim,
and have that downstream client emit database transactions to
taste; additionally you can have the filter (in its state
machine aspect) fire off (say through a 'curl' POST) a control
event to signal a co-process to wake up and refresh analytics
only when new data is present (compared to say, refreshing
'regardless' every 5 seconds)
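A minimal Python sketch of such an 'open loop' filter follows,
under stated assumptions: the field layout is the one guessed
from the sample tick lines earlier, the table name and columns
are made up, SQLite stands in for whatever database is in use,
and a plain callback stands in for the 'curl' POST wake-up.

```python
import sqlite3

def handle_line(db, line):
    """Insert one tick line; return True if a row was written."""
    fields = [f.strip() for f in line.split("|")]
    try:
        row = (fields[-1], int(fields[1]), float(fields[2]))
    except (IndexError, ValueError):
        return False               # not a line this sketch understands
    db.execute("INSERT INTO ticks VALUES (?, ?, ?)", row)
    db.commit()
    return True

def run_filter(lines, db, notify):
    """Feed every line to the database; notify only on new data."""
    db.execute("CREATE TABLE IF NOT EXISTS ticks "
               "(contract TEXT, tick_type INTEGER, value REAL)")
    written = 0
    for line in lines:
        if handle_line(db, line):
            written += 1
            notify(line)           # stand-in for the 'curl' POST wake-up
    return written
```

In real use the `lines` argument would be `sys.stdin`, with the
shim's standard out piped into the script, so the analytics
side is only woken when a row has actually been written.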
Alternatively one can hook a 'reader' to 'tail' the syslog
file which the shim has been set up to write to, and post
process there in the same fashion
Alternatively one can hook a 'reader' to 'tail' the output
file which the shim has been set up to write to, and post
process there in the same fashion
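The 'tail' style reader can be sketched in Python as a function
that is polled with the last offset it returned. This is only
an illustration: the file here is a temporary stand-in for the
shim's output file, and the function name is made up.

```python
import os
import tempfile

def read_new_lines(path, offset):
    """Return (complete lines appended since offset, new offset)."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    end = data.rfind(b"\n") + 1        # ignore a partial trailing line
    lines = data[:end].decode("ascii", "replace").splitlines()
    return lines, offset + end

# Demo against a temporary file standing in for the shim's output file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first\nsecond\npart")
path = tmp.name
lines1, pos = read_new_lines(path, 0)
with open(path, "ab") as fh:
    fh.write(b"ial\n")
lines2, pos = read_new_lines(path, pos)
os.remove(path)
```

Holding back the partial last line avoids handing a
half-written record to the post-processing step.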
The shim is designed to be a well-behaved 'nix tool ;)
-- Russ herrold