[ts-gen] General followup questions, after: Availability of support and development status of Trading Shim

R P Herrold herrold at owlriver.com
Fri Mar 26 11:17:02 EDT 2010


On Fri, 26 Mar 2010, a private email asked:

> Thanks for taking the time to prepare your response.  A 
> couple more questions:

Initially that poster had asked about some private support 
contract matters, not generally applicable for discussion 
here, and which were answered privately.  That person's 
follow-up, however, raised matters of general interest, and so 
I 'surface' it here, with personally identifying information 
elided.

> * In skimming your documentation (which may be dated) it 
> didn't seem that Trading Shim parsed the IB tick (well their 
> equivalent of tick data) data but that was left up to the 
> individual programmer using it.  Is that correct or did I 
> just get that wrong?

With the shim we are very focused on providing as transparent 
an interpreter as possible, from the binary and ornate command 
language and state machine that the TWS represents, into 
something textually clean that can be manipulated as a 
well-formed Unix tool feeding a stream.  We also solve the 
dual use of NUL as both an end of line ("EOL") separator and a 
field separator, which causes others a lot of pain, by making 
the EOL a NEWLINE.
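To make the NUL problem concrete: in the TWS socket protocol every field is NUL-terminated, so a NUL alone cannot say where a message ends, and the reader has to know how many fields each message type carries. Below is a minimal Python sketch of that framing step under stated assumptions: the `FIELD_COUNTS` table is hypothetical and purely illustrative (real counts depend on the message type and version the TWS sends), and the output is a generic `|`-joined line, not the shim's actual output format.

```python
from typing import Dict, Iterator, List

# Hypothetical, illustrative table: total NUL-terminated fields per message,
# keyed by the message id in the first field. Real counts depend on the
# message type and version actually sent by the TWS.
FIELD_COUNTS: Dict[str, int] = {"1": 7, "2": 5}


def frame_messages(stream: bytes, field_counts: Dict[str, int]) -> Iterator[List[str]]:
    """Group NUL-terminated fields into whole messages.

    NUL ends every field, so it cannot by itself mark the end of a message;
    the message id tells us how many fields to collect. A trailing partial
    field or partial message is left unconsumed (a real reader would keep
    it buffered until more bytes arrive)."""
    # The element after the last NUL is either empty or an unfinished field.
    fields = stream.split(b"\0")[:-1]
    i = 0
    while i < len(fields):
        count = field_counts[fields[i].decode("ascii")]  # KeyError: unknown id
        if i + count > len(fields):
            break  # incomplete message: wait for more bytes
        yield [f.decode("ascii") for f in fields[i:i + count]]
        i += count


def to_text_lines(stream: bytes, field_counts: Dict[str, int]) -> List[str]:
    """One '|'-delimited, NEWLINE-terminated text line per complete message."""
    return ["|".join(msg) + "\n" for msg in frame_messages(stream, field_counts)]
```

Once each message sits on its own line with visible delimiters, ordinary Unix text tools (grep, awk, cut) can work on the stream.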

The documentation is accurate in this regard, but finding the 
answer needs a Rosetta stone, and we presently handle that on 
the mailing list for the project.  A manual write / re-write 
is on the docket, but we have some features to attain first, 
which makes it a bit premature to spend time documenting items 
that are not yet 'nailed down' as to their final 
implementation details.  The description of the command 
language in the online material, and the discussion of the 
configuration file options, stand out as sore thumbs that 
reflect that we did not anticipate how they would finally end 
up; that said, most of it remains accurate.

Turning to consider ticks, you have noted that IB provides 
'summarized' ticks:

Each tick element type is distinctly marked by IB, and thence 
by the shim, and has been from the first external deliverable 
release -- the 'marking' is in the 'pipe' ["|" delimited] 
field values, which pass the IB message values through.

These values are potentially changeable by IB, which is why 
our documentation points out to theirs:
 	http://www.trading-shim.org/doc/node55.html
but it turns out that IB has been extremely good about NOT 
changing the mappings.

Bill had a post about reading history this week, but the same 
principles apply to ticks:
 	http://www.trading-shim.org/pipermail/ts-general/2010-March/000715.html
and his example there was Apple ("AAPL") in the 190s:

     1| 1|   1?3.4|       3|1|price.outcry.bid.   |STK:SMART:AAPL:
     1| 2|   1?3.6|       2|1|price.outcry.ask.   |STK:SMART:AAPL:
     1| 4|   1?3.6|       1|0|price.summary.last. |STK:SMART:AAPL:
     1| 0|                3|0|size.bid.           |STK:SMART:AAPL:
     1| 3|                2|0|size.ask.           |STK:SMART:AAPL:
     1| 5|                1|0|size.last.          |STK:SMART:AAPL:
     1| 8|           1?9494|0|size.volume.        |STK:SMART:AAPL:
     1| 6|   1?5.0|       0|0|price.summary.high. |STK:SMART:AAPL:
     1| 7|   1?2.9|       0|0|price.summary.low.  |STK:SMART:AAPL:
     1| 9|   1?2.8|       0|0|price.summary.close.|STK:SMART:AAPL:
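A line of the sample above can be split into named parts with a few lines of Python. This is a sketch that reads the column layout off the sample, not from a specification: the first column is assumed to be a request identifier, the second is the numeric tick type, the last two are the expanded text name and the contract key, and whatever lies between is kept as text (price rows carry more columns than size rows, and the sample masks digits with `?`). `TickLine` and `parse_tick_line` are hypothetical names.

```python
from typing import List, NamedTuple


class TickLine(NamedTuple):
    req_id: int        # first column (assumed: request / ticker id)
    tick_type: int     # second column: IB's numeric tick type
    values: List[str]  # middle columns, kept as text
    label: str         # shim's expanded name, e.g. "price.outcry.bid."
    contract: str      # contract key, e.g. "STK:SMART:AAPL:"


def parse_tick_line(line: str) -> TickLine:
    """Split one '|'-delimited tick line into named parts.

    The number of middle columns differs between price and size rows,
    so the value columns are sliced out rather than read by position."""
    fields = [f.strip() for f in line.rstrip("\n").split("|")]
    return TickLine(
        req_id=int(fields[0]),
        tick_type=int(fields[1]),
        values=fields[2:-2],
        label=fields[-2],
        contract=fields[-1],
    )
```

Keeping the value columns as strings defers numeric conversion to the consumer, which matters here because the masked sample values would not parse as numbers.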

We see that the following tick details are present in both a 
numeric and a matching (added by us, expanded) text form:
 	1| 0		size.bid.
 	1| 1		price.outcry.bid.
 	1| 2		price.outcry.ask.
 	1| 3		size.ask.
 	1| 4		price.summary.last.
 	1| 5		size.last.
 	1| 6		price.summary.high.
 	1| 7		price.summary.low.
 	1| 8		size.volume.
 				(here, cumulative, for the session)
 	1| 9		price.summary.close.

Additional attributes were added with later server and client 
versions, but the tick attributes shown are present for ALL 
versions of the TWS presently supported by IB.

> In other words, I want each element of 
> the tick stream mapped to a distinct data object (or 
> variable) that can then be manipulated directly.

* nod * .. using the left column as a pointer to index to a 
data object name, you have just such a mapping trivially -- my 
personal downstream parser maintains just such a table (in the 
particular implementation I looked at, I see that I use an 
associative array ... slower, but it has never gotten 'overrun' 
by the data stream, so there is no reason for me to refactor 
it yet)
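The associative-array approach described above can be sketched as follows, assuming input already split into a contract key, a numeric tick type, and a value. `TICK_NAMES` transcribes the type-to-name list from this post (type 6 as the session high, as in the sample output); `TickTable` is a hypothetical name, and the sketch makes no claim about how the shim or any particular downstream parser stores its state.

```python
from typing import Dict, Optional, Tuple

# Numeric tick type -> expanded text name, per the list in this post.
TICK_NAMES: Dict[int, str] = {
    0: "size.bid.", 1: "price.outcry.bid.", 2: "price.outcry.ask.",
    3: "size.ask.", 4: "price.summary.last.", 5: "size.last.",
    6: "price.summary.high.", 7: "price.summary.low.",
    8: "size.volume.", 9: "price.summary.close.",
}


class TickTable:
    """Latest value per (contract, tick name), found by indexing the
    numeric tick type into TICK_NAMES: an associative-array mapping."""

    def __init__(self) -> None:
        self.latest: Dict[Tuple[str, str], str] = {}

    def update(self, contract: str, tick_type: int, value: str) -> str:
        # Keep unknown (newer) tick types under a generic name rather than
        # dropping them, since later TWS versions add attributes.
        name = TICK_NAMES.get(tick_type, f"tick.{tick_type}.")
        self.latest[(contract, name)] = value
        return name

    def get(self, contract: str, name: str) -> Optional[str]:
        return self.latest.get((contract, name))
```

Each incoming tick overwrites only its own slot, so an analysis module can read the current bid, ask, or last for a contract directly by name.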

> It seemed like Trading Shim provided a data stream, but not 
> a parsed data stream ready for consumption by an analysis 
> module.

The parsing is subtle to a person who has not had to 'muck' 
through binary traces: each NUL becomes either a 'pipe' or a 
NEWLINE, as the context implies.

> Is that part of the current functionality of Trading Shim or 
> is that "an exercise left to the reader"?

The former, as noted.

> * In looking at the doctoral paper, it seems that one of the 
> project objectives was to create a multi-threaded, 
> concurrent downloading tool.

I'll let Bill speak to his paper and the doctoral studies 
objectives, as its author.  :)

> Is that supported in the Trading Shim?

The pronoun here refers, I think, to 'multi-threaded' 
concurrency.  The shim has never needed to add multiple 
threads to fully keep up with the content fed to it by the 
TWS, and has a 'select' based input model which makes it 
pretty unlikely that it would ever need to do so.  See:
 	man 2 select_tut
for a far better discussion of the fine points of 'select(2)' 
than I could ever write, with worked example code.
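For readers who have not met the pattern, here is a minimal single-threaded `select(2)` loop in Python. It is a generic illustration, not the shim's code: in the test, two pipes stand in for the shim's standard input and its TWS socket, and `pump` and `handle` are hypothetical names.

```python
import os
import select
from typing import Callable, List


def pump(fds: List[int], handle: Callable[[int, bytes], None],
         timeout: float = 1.0) -> None:
    """Multiplex several inputs in one thread with select(2).

    Wait until any descriptor is readable, service each ready one in
    turn, and stop once every descriptor has reached end of file."""
    open_fds = list(fds)
    while open_fds:
        ready, _, _ = select.select(open_fds, [], [], timeout)
        if not ready:
            continue  # timeout: a real loop might do housekeeping here
        for fd in ready:
            chunk = os.read(fd, 4096)
            if chunk:
                handle(fd, chunk)
            else:  # an empty read means EOF on this descriptor
                open_fds.remove(fd)
```

The loop never blocks on one quiet input while another has data waiting, which is why a select-based design can keep up with several streams without threads.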

> How are the multiple sessions spawned and 
> managed?  (If this is documented, just point me to the 
> appropriate materials).  How are the different sessions 
> managed in such a way to keep the sequence of the data 
> stream in the proper event sequence?  Is there a means of 
> buffering (writing out to temp files possibly) the results 
> until the backend database can write out the data stream?

I will give a high level walk-through, and hope that I do not 
oversimplify the complexity of what is happening. 
Particularly, there is an amazing amount of detail checking 
and precise actions building data structures, and validating 
inputs happening 'under the hood' in the shim which I almost 
wholly gloss over.  The details are out of scope here, but 
stated in the common form:

 	the shim is a racehorse, built for speed and extremely
 	fast throughput, consistent with accuracy

I sample and track the competition, and have no doubt we 
'scream' through data that others wallow and waddle about in
 	http://www.trading-shim.org/faq/?other-voices

The first question uses the term 'multiple sessions' in a 
context that I think means 'multiple connections to a single 
upstream TWS'.  The TWS provides for up to 8 concurrent socket 
connections from downstream clients (and, seemingly, one or 
more additional non-socket-based connections to support the 
Java AWT GUI displays; the IB .jar providing that 
functionality is obfuscated, and I have not tried to look).

One usage model we think sensible for the shim is to have a 
single 'trading mode' connection, able to enter and supervise 
orders, and as many 'data mode' connections as one wishes, up 
to the remaining limit of 7 (== 8 minus 1).  It turns out that 
one can do all 'data mode' operations in 'trading ['risk'] 
mode', so as few as one socket connection to the TWS by the 
shim can suffice for both 'risk' and 'data' operations, at 
the cost of added latencies, discussed later herein.

These are simple 'nix processes, started from the command 
line; triggered by clicking a GUI wrapper to fire off a 
command; spawned as a sub-task from an already running parent; 
or living in the inittab.  Whatever.  At one point we had the 
shim firing off a charting data refresh and display thread, 
but I think that turned out to be a dead end and that code 
was removed.  What I am saying is that there is no magic to
starting a process to connect to the TWS socket interface, 
whether it is the shim, or otherwise.
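The usage model above, one 'risk' connection plus up to seven 'data' connections, each an ordinary process, can be sketched as a small parent launcher. This is an illustration under assumptions: `plan_modes` and `spawn` are hypothetical names, and the command template is a placeholder; substitute whatever command actually starts your shim in the desired mode, since the post does not give one.

```python
import subprocess
from typing import List, Sequence

# Socket client limit per TWS, as described in this post.
TWS_SOCKET_LIMIT = 8


def plan_modes(n_data: int, limit: int = TWS_SOCKET_LIMIT) -> List[str]:
    """One 'risk' connection plus n_data 'data' connections, within the limit."""
    if not 0 <= n_data <= limit - 1:
        raise ValueError(f"n_data must be between 0 and {limit - 1}")
    return ["risk"] + ["data"] * n_data


def spawn(template: Sequence[str], modes: List[str]) -> List[subprocess.Popen]:
    """Start one ordinary child process per mode.

    `template` is a hypothetical command line; the text '{mode}' in any
    argument is replaced by the mode name."""
    return [
        subprocess.Popen([arg.replace("{mode}", mode) for arg in template])
        for mode in modes
    ]
```

Each child would open its own socket connection to the TWS, so the parent needs no threads to supervise them; it only waits on its children.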

During start-up, the shim reads an optionally present 
configuration file, and then reads standard input for optional 
additional configuration details in a 'let' command, recently 
described on the mailing list in a post by Bill.  There are 
also 'hard-coded' configuration values in the source code, 
which may be overridden by values from the config file or the 
'let' command.
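The layering of configuration sources just described can be modelled with Python's `collections.ChainMap`. This sketch models only precedence; it assumes later sources override earlier ones (the 'let' command over the config file, the config file over compiled-in defaults), and it does not reproduce the shim's config file or 'let' syntax. Setting names are hypothetical.

```python
from collections import ChainMap
from typing import Dict, Mapping


def effective_config(hard_coded: Mapping[str, str],
                     config_file: Mapping[str, str],
                     let_command: Mapping[str, str]) -> Dict[str, str]:
    """Merge three configuration layers into the net effective settings.

    ChainMap searches its maps left to right, so 'let' values win over
    the config file, which wins over the compiled-in defaults (assumed
    precedence, following the start-up order)."""
    return dict(ChainMap(dict(let_command), dict(config_file), dict(hard_coded)))
```

Keys that no later layer mentions fall through to the hard-coded value, so a partial config file or 'let' command only needs to name what it changes.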

Thereafter, in the usual case, the shim connects to the 
database and to the TWS as the net effective configuration 
settings indicate, and resumes processing input as it appears 
on standard input, talking back and forth with the TWS across 
the socket interface.  It is quite possible to graft 
additional data, trade, and control stream connectors onto 
the shim's current complement, but we have not encountered a 
compelling use case, releasable in the GPLv3 code, for 
exposing such an interface here.  Talk to me privately for a 
quote if you think you need such.

The next question is: "How are the different sessions managed 
in such a way to keep the sequence of the data stream in the 
proper event sequence"  and I think a quick review of 
assumptions is in order.  There is a whole collection of 
'unknowable unknown' latencies in play as IB marshals data to 
deliver to the TWS on the TWS' upstream side.  A trivial 
example: Are ticks for a given instrument being received 
through multicast by IB, or across a low speed serial TTY 
link?

We can speculate as to some common instruments, but if a stock 
is essentially traded OTC or on a bid and ask 'quotation' 
basis, the tick data may be quite stale -- I watched SCOX one 
morning a few years ago, and there was not a single tick or 
trade until after 10:45 -- it looked like the sole market 
maker just overslept ;)

What are the data transmission latencies TO IB's plants; what 
are the data cleanup and transfer latencies inside and between 
IB's plants? What latency does the particular IB load balancer 
one is connected to add?  Does it differ between .us, .ch and 
.hk?  Almost certainly.  Look at the 'data farm' up and down 
messages through a trading day.  They come, they go, and one 
may not even be ABLE to detect an outage for a quarter-minute 
or two.

Once data is delivered to the local TWS, across a data link of 
variable latency, if there is only a single connection pulling 
a big chunk of history, I know that any transaction detail 
that came in, and all tickstream data, will be delayed until 
that history transfer is completed, because of the way the 
TWS message carrying history detail is structured.

[This is part of the reason I mentioned the shim's ability to 
run a 'data' connection in one shim, and the 'risk' connection 
in another separate shim -- one can then structure and 
classify how one's apparent latencies are expressed.  But 
think this through -- there is also only a single 
encrypt-able FIX connection between the TWS and its IB 
counter-party servers.  What roadblock did it add?  We cannot 
see, and so cannot know, but can only speculate.]
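The single-connection delay described above is head-of-line blocking, and a toy model shows its shape. The sketch below treats each connection as a FIFO that transfers one message at a time. The message names and timings are invented for illustration, not measurements, and the model deliberately ignores the shared upstream FIX link, whose behavior we cannot see.

```python
from typing import Dict, List, Tuple

# (message name, connection name, time ready to send, transfer time), seconds
Message = Tuple[str, str, float, float]


def completion_times(messages: List[Message]) -> Dict[str, float]:
    """Each connection sends one message at a time, in order of readiness;
    a message starts when it is ready and its connection is free."""
    free_at: Dict[str, float] = {}
    done: Dict[str, float] = {}
    for name, conn, ready, duration in sorted(messages, key=lambda m: m[2]):
        start = max(ready, free_at.get(conn, 0.0))
        free_at[conn] = done[name] = start + duration
    return done
```

With a large history reply ready at 0.0 s taking 5 s to transfer and a tick ready at 0.1 s taking 1 ms, the tick completes at about 5.001 s when both share one connection, and at about 0.101 s when it travels on its own connection.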

A particular symbol's tick datastream ** seems ** to be 
sequentially delivered, when I have compared recorded tick 
streams against one second historical data, but I know of no 
guarantees by IB in this regard.

Once down at the TWS, each session between it and a shim 
instance is a TCP connection, and as such natively 
serialized.

Then the shim processes its 'select' data basically in a 
round-robin fashion, and enqueues it for output (writing to a 
file, sending to the log, sending to standard out, or a 
combination thereof), and there are minimal delays (typically 
at most 20 mSec, and usually sub-microsecond if one is on the 
right part of the processing cycle) before it is done being 
handed off to the host operating system's routines ... again, 
latencies of appearance which I cannot control and of which I 
do not speak, as they will vary.
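The hand-off described above can be sketched as a round-robin drain of per-source queues with fan-out to several sinks. This is a generic Python illustration of round-robin servicing, not the shim's code; the source names and sinks are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, TextIO


def drain_round_robin(queues: Dict[str, Deque[str]],
                      sinks: List[TextIO]) -> List[str]:
    """Serve each source's queue in turn, one line per visit, copying
    every line to all sinks (file, log, stdout, ...). Returns the lines
    in the order they were handed off."""
    order: List[str] = []
    while any(queues.values()):
        for queue in queues.values():
            if queue:
                line = queue.popleft()
                for sink in sinks:
                    sink.write(line)
                order.append(line)
    return order
```

Taking one item per source per pass keeps a busy source from starving a quiet one, while each source's own lines stay in their arrival order.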

A trading strategy that requires sub 20 mSec response CANNOT 
be run across data links that take over 2 times that to 
transfer data to and from the counter-party [IB's next hop is 
at least that far away from anyone not co-located at a 
data-center with them], and it is a fool's game to pretend 
otherwise.  Probably a trading strategy that HAS to see EVERY 
TICK should consider a platform other than IB, as IB does not 
purport to deliver such, nor to deliver it without time skew 
between symbol samples.

The last question, about 'holding back' messages until the 
database is coherent at a given state, is not something we 
do, nor something I see us ever wanting to do.  Do you have a 
use case, please?

> * Has your team done any work with NVidia's CUDA platform? 
> If not, is that on your roadmap (to leverage GPU's for 
> massaging datastreams) with some type of target date?

The 'CUDA' numeric co-processor parallelization is more 
applicable downstream of the shim's functions {very high speed 
simulation and solving of projected trading surface topology}. 
The shim is so fast that the book-keeping overhead of farming 
work out and synchronizing the returns would be wasted.  It 
would be bolting a battery of jet engines onto a barracuda ... 
not really a good fit ;)

I do not discount 'CUDA' at all -- it is just not needed at 
the layer the shim is working at

> * I should have stated that I'm using Ubuntu 64 
> Desktop/Server editions as to access the larger memory that 
> may be necessary as more data is retained in memory to avoid 
> backend database IO issues.  Does your code leverage a 64bit 
> platform?

We develop on 64 bit, and recently encountered an anomaly with 
an older Debian Stable compiler on a 32 bit platform.  I 
cannot presently recreate the anomaly after a recent 
re-factoring.  Our peak memory use is far below the limits a 
32 bit 'nix architecture imposes, but yes, we can use a 64 bit 
arch quite well ;)

> * The Freshmeat site has some broken links - you may want to 
> check it or have their webmaster fix them

We do not have access to alter what appears at freshmeat 
beyond using their web interface.  Please send me a private 
note of the outages, and I will pass it along to their admins. 
I know they have been 'refreshing' their site recently, and 
regret that they have removed some detail I enjoyed reading

> * You mentioned exploratory project estimates.  At this 
> point, I'm just looking for IB to provide data.  Since the 
> historical data lacks certain data elements, I'm looking at 
> using the "tick" feed as the source for all data.  This 
> would require streaming out the tick data to a database 
> while also using to feed real-time analytics.  Does the 
> "off-the-shelf" Trading Shim provide this or would this be 
> considered custom or add-on work to achieve this?

This database write function is doable natively, and has been 
'forever' -- it would be an example in the 'open loop' (no 
closed feedback, back into the shim's standard in) client 
model.

To implement it, one hooks up a simple state machine filter, 
written in the scripting language of one's choice, which reads 
its standard input through a pipe from the standard output of 
the shim, and has that downstream client emit database 
transactions to taste; additionally, one can have the filter 
(in its state machine aspect) fire off (say through a 'curl' 
POST) a control event to signal a co-process to wake up and 
refresh analytics only when new data is present (compared to, 
say, refreshing 'regardless' every 5 seconds).
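A minimal sketch of such an open-loop filter in Python, under stated assumptions: SQLite stands in for "the database", the table layout and `ticks.db` file name are hypothetical, tick lines are recognised by the column layout seen in the sample output earlier (numeric first two columns, label and contract last), and a `notify` callback stands in for the 'curl' POST to an analytics co-process. The only state kept is the count of rows not yet committed.

```python
import sqlite3
import sys
from typing import Callable, Iterable


def run_filter(lines: Iterable[str], db: sqlite3.Connection,
               notify: Callable[[int], None], batch_size: int = 100) -> int:
    """Store tick lines read from the shim's standard output, calling
    notify(n) after each commit of n new rows. Returns rows stored."""
    db.execute("CREATE TABLE IF NOT EXISTS ticks "
               "(contract TEXT, tick_type INTEGER, label TEXT, raw TEXT)")
    pending = total = 0
    for line in lines:
        raw = line.rstrip("\n")
        fields = [f.strip() for f in raw.split("|")]
        # Assumed tick-line layout: numeric request id and tick type first,
        # text label and contract key last; anything else is skipped.
        if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        db.execute("INSERT INTO ticks VALUES (?, ?, ?, ?)",
                   (fields[-1], int(fields[1]), fields[-2], raw))
        pending += 1
        if pending >= batch_size:
            db.commit()
            notify(pending)  # wake analytics only when new rows exist
            total += pending
            pending = 0
    if pending:
        db.commit()
        notify(pending)
        total += pending
    return total


def main() -> None:
    conn = sqlite3.connect("ticks.db")  # hypothetical database file
    run_filter(sys.stdin, conn,
               notify=lambda n: print(f"{n} new rows", file=sys.stderr))
```

Calling `main()` with the shim's standard output piped into the script gives the open-loop arrangement described above. A smaller `batch_size` trades write efficiency for fresher analytics; a partial batch is committed only when input ends, so a long-running deployment would also want a time-based flush.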

Alternatively, one can hook a 'reader' to 'tail' the syslog 
file, or the output file, which the shim has been set up to 
write to, and post-process there in the same fashion.
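A 'tail'-style reader can be sketched in Python as a generator that polls an open file for newly appended lines. This is an illustration, not a description of any shim facility; `follow` and its parameters are hypothetical names, and the optional idle limit exists only so the loop can terminate in a test.

```python
import os
import time
from typing import Iterator, Optional, TextIO


def follow(fh: TextIO, poll_interval: float = 0.5,
           idle_limit: Optional[int] = None,
           start_at_end: bool = True) -> Iterator[str]:
    """Yield complete lines appended to an open file, like 'tail -f'.

    A partial last line is held back until its NEWLINE arrives. With
    idle_limit set, stop after that many consecutive empty polls;
    otherwise follow forever."""
    if start_at_end:
        fh.seek(0, os.SEEK_END)  # skip existing content, as 'tail -f' does
    pending = ""
    idle = 0
    while idle_limit is None or idle < idle_limit:
        chunk = fh.readline()
        if not chunk:
            idle += 1
            time.sleep(poll_interval)
            continue
        idle = 0
        pending += chunk
        if pending.endswith("\n"):
            yield pending
            pending = ""
```

Log rotation (the file being renamed and recreated) is not handled; a production reader would reopen the path when the file is replaced. Lines obtained this way can be handled exactly as lines read from the shim's standard output.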

The shim is designed to be a well-behaved 'nix tool  ;)

-- Russ herrold

