[ts-gen] Simple way to download historical data into a file

Bill Pippin pippin at owlriver.net
Fri Mar 13 15:33:46 EDT 2009


Someone on the yahoo list asked for a simple way to download
historical data, and I'm reposting a (lightly edited) copy of
my reply here.

For those already familiar with the exs/past.30.rb script, note
that it has been revised to use the new contract value syntax,
which makes it easier to understand; the array of symbols is now
simply a list of Dow 30 component names.

The repost follows:

... included a crude ruby script below that downloads 1 min
bars for the dow 30 from the beginning of the year up until today.
Its key virtue is its simplicity, and there are a number of changes
that you might make for production use.

I posted an older form of this script before, in response to a similar
question last year, and the new form below reflects both changes to
the dates of trading days, and also improvements to the shim's command
language.

In the past, users needed to look up contract ids in the database,
and use those numbers in commands to refer to the contracts.  Now,
in addition to the IB conids from the database, where the contract
expression would be, e.g., ibc:266093 at SMART, it's also possible to
use contract value expressions such as STK:SMART:AMAT:USD directly,
and this new form is illustrated with the ruby script I'm posting
below.

The core of the script is a loop that sends commands down a pipe:

    for i in mons
    for j in days[i]
    for s in syms
           printf past, s, i+1, j, "16:00:00";      printf wait;
      Shim.printf past, s, i+1, j, "16:00:00"; Shim.printf wait;
      sleep(11);
    end end end
    Shim.printf exit

The variable past is a format string for a history command:

    past = "select past STK:SMART:%-4s:USD m1 1 1d Ymd_T(2009%02u%02u  %s);"

Ruby's printf formatting turns this into a sequence of commands
such as the following:

    select past STK:SMART:AA  :USD m1 1 1d Ymd_T(20090102  16:00:00);
    select past STK:SMART:AXP :USD m1 1 1d Ymd_T(20090102  16:00:00);
    select past STK:SMART:BA  :USD m1 1 1d Ymd_T(20090102  16:00:00);
    select past STK:SMART:BAC :USD m1 1 1d Ymd_T(20090102  16:00:00);
    select past STK:SMART:C   :USD m1 1 1d Ymd_T(20090102  16:00:00);
    ...

In the commands above, the verb phrase "select past" indicates a
history query; the following colon separated expression is a contract
value expression, as mentioned above; and the additional arguments
are: a literal, m1, for 1-minute bars; a record index, 1, for other,
more obscure history query operands to be retrieved from the database;
another literal, 1d, for a query interval of 1 day; and the end
date-time, 4pm each trading day.
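
As a quick check of the formatting described above, here is a minimal
sketch that expands the same format string for a few symbols without
starting the shim.  The symbol list and the date (January 2nd) are
picked only for illustration.

```ruby
# Same format string as in the script, minus the trailing tab.
past = "select past STK:SMART:%-4s:USD m1 1 1d Ymd_T(2009%02u%02u  %s);"

# %-4s left-justifies the symbol in a four-character field;
# %02u zero-pads the month and the day to two digits.
commands = %w[AA AXP INTC].map do |sym|
  format(past, sym, 1, 2, "16:00:00")
end

puts commands
```

The padding is what keeps the generated commands column-aligned for
short symbols such as AA or C.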

The popen to get a pipe is also straightforward:

    Shim = IO.popen("./shim --data file save", "w")
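
If you want to try the pipe mechanics without the shim installed, the
sketch below uses cat as a stand-in subprocess, so that whatever is
written down the pipe can be read back and inspected.  The use of cat
and of the "r+" mode is my assumption for the demonstration; the
script itself opens the shim write-only.

```ruby
# Stand-in for the shim: `cat` echoes its input, so the commands sent
# down the pipe can be read back.  (Assumes a POSIX system with `cat`.)
commands = [
  "select past STK:SMART:AA  :USD m1 1 1d Ymd_T(20090102  16:00:00);\t",
  "wait 11;\n",
  "exit;\n"
]

echoed = IO.popen("cat", "r+") do |pipe|
  pipe.sync = true                    # flush each write instead of buffering
  commands.each { |c| pipe.print c }  # send the command text down the pipe
  pipe.close_write                    # signal end of input to the child
  pipe.read                           # collect what the child wrote back
end

puts echoed
```

Setting sync is optional here, but it shows one way to make sure each
command leaves Ruby's write buffer as soon as it is printed.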

Results appear in the shim's log file, due to the "file" command line
option in the popen above, and could be dup'd to the ruby script
by including a "cout" argument as well.  Log file text would include,
in part, text such as the following (dates are obscured to comply with
IB's license restrictions on redistribution of price data):

yyyymmdd  09:31:00|84.41|84.49|84.12|84.24|   192|84.27|false|STK:SMART:IBM:
yyyymmdd  09:32:00|84.34|84.49|84.33|84.39|   198|84.37|false|STK:SMART:IBM:
yyyymmdd  09:33:00|84.38|84.44|84.31|84.35|   109|84.35|false|STK:SMART:IBM:
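
A line like those above can be split on the "|" separator.  In the
sketch below the field names are an assumption on my part, based on
the usual order of IB historical bar attributes (open, high, low,
close, volume, WAP, has-gaps flag); check them against the IB sample
client sources before relying on them.  Real log lines may also carry
text not shown in the excerpt.

```ruby
# Field names are assumed, not taken from the shim documentation.
Bar = Struct.new(:time, :open, :high, :low, :close,
                 :volume, :wap, :has_gaps, :contract)

# Split one log line on "|" and convert the numeric fields.
def parse_bar(line)
  f = line.chomp.split("|")
  Bar.new(f[0], f[1].to_f, f[2].to_f, f[3].to_f, f[4].to_f,
          f[5].to_i, f[6].to_f, f[7] == "true", f[8])
end

line = "yyyymmdd  09:31:00|84.41|84.49|84.12|84.24|   192|84.27|false|STK:SMART:IBM:"
bar  = parse_bar(line)
puts "#{bar.contract} close=#{bar.close} volume=#{bar.volume}"
```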

For interpretation of the message attributes above, see the IB sample
client sources.  The complete ruby script follows my sig, and is also
included with the sources in the examples directory when you download
the shim.

Thanks,

Bill

-------- cut here ------------------------------------------------------

#!/usr/bin/ruby

#  author: Bill Pippin
#  email: pippin at trading-shim dot com; msgs may gate to the mailing list
#  copyright (c) 2008 Trading-shim.com, LLC  Columbus, OH
#  GPL version 3 or later, see COPYING for details
#  Updated 20090313 to reflect new contract value syntax and 2009 calendar.

=begin

  Collect one minute history bars for each:

    o trading week day from the beginning of the year to yesterday
    o regular trading session, 9:30 to 16:00
    o contract of the dow 30

  See additional comments following the executable code for details.

=end

Shim = IO.popen("./shim --data file save", "w")
mons = [ 0, 1, 2 ]
days = [

#M  T  W  R  F   M  T  W  R  F   M  T  W  R  F   M  T  W  R  F   M  T  W  R  F
[            2,  5, 6, 7, 8, 9, 12,13,14,15,16,    20,21,22,23, 26,27,28,29,30]
,
[2, 3, 4, 5, 6,  9,10,11,12,13,    17,18,19,20, 23,24,25,26,27                ]
,
[2, 3, 4, 5, 6,  9,10,11,12                                                   ]
]

time = [" 9:30:00", "10:00:00", "10:30:00", "11:00:00", "11:30:00", 
                    "12:00:00", "12:30:00", "13:00:00", "13:30:00", 
                    "14:00:00", "14:30:00", "15:00:00", "15:30:00", "16:00:00"]

syms = [
    "AA"  , "AXP" , "BA"  , "BAC" , "C"   , "CAT",
    "CVX" , "DD"  , "DIS" , "GE"  , "GM"  , "HD" ,
    "HPQ" , "IBM" , "INTC", "JNJ" , "JPM" , "KFT",
    "KO"  , "MCD" , "MMM" , "MRK" , "MSFT", "PFE",
    "PG"  , "T"   , "UTX" , "VZ"  , "WMT" , "XOM"
]

past = "select past STK:SMART:%-4s:USD m1 1 1d Ymd_T(2009%02u%02u  %s);\t"
wait = "wait 11;\n"
exit = "exit;\n"

# for each month in jan..mar
# for each weekday in the month until yesterday, excluding holidays
# for each symbol in the dow 30
#   emit query, pause the shim for 11 sec, and sleep for 11 sec

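sleep(2)    # brief pause before the first command

for i in mons
  for j in days[i]
    for s in syms
      # echo the command to stdout, then send the same text to the shim
           printf past, s, i+1, j, "16:00:00";      printf wait
      Shim.printf past, s, i+1, j, "16:00:00"; Shim.printf wait
      sleep(11)
    end
  end
end
Shim.printf exit

# The local variable "exit" shadows Kernel#exit, so a bare exit here
# would do nothing; close the pipe and wait for the shim instead.
Shim.close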

=begin

  Data will show up in the log file (defaults to log/ShimText); the
  shim must be installed, and exist as a binary in the current
  directory.  The inner loop echoes each command to standard output
  and also sends it to the shim subprocess; comment out the first of
  the two lines if you don't want the echo.

  The literal arguments in the query format variable "past" select one
  minute TRADE bars during regular trading hours for the day; other
  parameters to the printf statement can be used to change the query
  granularity, and the time array is provided for that purpose.  If
  you do want to modify this script, consider the following:

  As written, with queries for one minute bars over the trading day, this
  script produces 1440 queries.  That's 48 trading days, times 30 symbols
  in the dow 30.  At eleven seconds each, this'll take about 4.4 hours
  to run.

  If you want to increase the bar frequency, e.g., to one second
  intervals, consider first that using a half-hour duration for queries
  will require nearly two 40 hour collection weeks.  Catching up is
  hard to do!

  At this point you'll be needing to stop and restart the script from
  one day to the next.  In addition, since valid queries must request
  no more than 2k records, multiple queries would be needed for each
  day, so that it would also be necessary to insert a for loop using
  the table "time", and replace the end-of-day literal in the query
  print statement with the value indexed from time.  

  Modifying this script to synchronize the ruby script with the shim
  (use the cout command line option to dup the log output to the script),
  parameterize the PastFilter table index, choose subranges of the data,
  understand the calendar, check regular trading hours for the route and
  symbol, and otherwise preen data, is left as an exercise.

=end
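
The query counts in the closing comments can be rechecked with a few
lines of Ruby.  This is only a sketch: the days table is copied from
the script, the 11 second pace is the one the script uses, and the
figure of 14 half-hour windows is my assumption, taken from the number
of entries in the script's time table.

```ruby
# Trading days per month, copied from the days table in the script.
days = [
  [2, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 20, 21, 22, 23, 26, 27, 28, 29, 30],
  [2, 3, 4, 5, 6, 9, 10, 11, 12, 13, 17, 18, 19, 20, 23, 24, 25, 26, 27],
  [2, 3, 4, 5, 6, 9, 10, 11, 12]
]
symbols = 30    # dow 30 components
pace    = 11    # seconds between queries, as in the script
windows = 14    # assumed: one query per entry in the time table

trading_days = days.map(&:length).sum
daily        = trading_days * symbols    # one query per symbol per day
half_hourly  = daily * windows           # one query per half-hour window

# Convert a query count to hours of run time at the given pace.
def hours(queries, pace)
  (queries * pace / 3600.0).round(1)
end

puts "trading days:      #{trading_days}"
puts "daily queries:     #{daily} (about #{hours(daily, pace)} hours)"
puts "half-hour queries: #{half_hourly} (about #{hours(half_hourly, pace)} hours)"
```

Changing windows or pace shows quickly how the run time scales before
you commit to a finer bar size.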


