[ts-gen] Simple way to download historical data into a file
Bill Pippin
pippin at owlriver.net
Fri Mar 13 15:33:46 EDT 2009
Someone on the yahoo list asked for a simple way to download
historical data, and I'm reposting a (lightly edited) copy of
my reply here.
For those already familiar with the exs/past.30.rb script, you
should know that it has been revised to work with the new value
form contract syntax, so that it is easier to understand; the
array of symbols is now simply a list of dow30 component names.
The repost follows:
... included a crude ruby script below that downloads 1 min
bars for the dow 30 from the beginning of the year up until today.
Its key virtue is its simplicity, and there are a number of changes
that you might make for production use.
I posted an older form of this script before, in response to a similar
question last year, and the new form below reflects both changes to
the dates of trading days, and also improvements to the shim's command
language.
In the past, users needed to look up contract ids in the database,
and use those numbers in commands to refer to the contracts. Now,
in addition to the IB conids from the database, where the contract
expression would be, e.g., ibc:266093@SMART, it's also possible to
use contract value expressions such as STK:SMART:AMAT:USD directly,
and this new form is illustrated with the ruby script I'm posting
below.
The core of the script is a loop that sends commands down a pipe:
for i in mons
  for j in days[i]
    for s in syms
      printf past, s, i+1, j, "16:00:00"; printf wait;
      Shim.printf past, s, i+1, j, "16:00:00"; Shim.printf wait;
      sleep(11);
end end end
Shim.printf exit
The variable past is a format statement for a history command, where
past = "select past STK:SMART:%-4s:USD m1 1 1d Ymd_T(2009%02u%02u %s);"
The result from the ruby print formatting is a sequence of commands
such as the following:
select past STK:SMART:AA  :USD m1 1 1d Ymd_T(20090102 16:00:00);
select past STK:SMART:AXP :USD m1 1 1d Ymd_T(20090102 16:00:00);
select past STK:SMART:BA  :USD m1 1 1d Ymd_T(20090102 16:00:00);
select past STK:SMART:BAC :USD m1 1 1d Ymd_T(20090102 16:00:00);
select past STK:SMART:C   :USD m1 1 1d Ymd_T(20090102 16:00:00);
...
In the commands above, the verb phrase "select past" indicates a
history query; the following colon-separated expression is a contract
value expression, as mentioned above; and the additional arguments
are: a literal, m1, for 1-minute bars; a record index, 1, for other,
more obscure history query operands to be retrieved from the database;
another literal, 1d, for a query interval of 1 day; and the end
date-time, 4pm each trading day.
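As a quick check of the format string, here is a small dry run that builds
a few of these commands without starting the shim. It is only a sketch: the
constant PAST and the helper past_command are names made up for this
illustration, and the short symbol list is arbitrary.

```ruby
# Same format string as the script. %-4s left-justifies the symbol in a
# four-character field, so short symbols are padded with spaces.
PAST = "select past STK:SMART:%-4s:USD m1 1 1d Ymd_T(2009%02u%02u %s);"

# Hypothetical helper: build one history command. mon is a zero-based
# month index, as in the script, so 0 means January.
def past_command(sym, mon, day, time = "16:00:00")
  format(PAST, sym, mon + 1, day, time)
end

# Dry run: print the commands for 2 January instead of sending them.
["AA", "AXP", "BA"].each { |s| puts past_command(s, 0, 2) }
```

Printing the commands first is a cheap way to confirm the date and symbol
fields before spending eleven seconds per query against the shim.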
The popen to get a pipe is also straightforward:
Shim = IO.popen("./shim --data file save", "w")
Results appear in the shim's log file, due to the "file" command line
option in the popen above, and could be dup'd to the ruby script
by including a "cout" argument as well. Log file text would include,
in part, text such as the following (dates are obscured to comply with
IB's license restrictions on redistribution of price data):
yyyymmdd 09:31:00|84.41|84.49|84.12|84.24| 192|84.27|false|STK:SMART:IBM:
yyyymmdd 09:32:00|84.34|84.49|84.33|84.39| 198|84.37|false|STK:SMART:IBM:
yyyymmdd 09:33:00|84.38|84.44|84.31|84.35| 109|84.35|false|STK:SMART:IBM:
For interpretation of the message attributes above, see the IB sample
client sources. The complete ruby script follows my sig, and is also
included with the sources in the examples directory when you download
the shim.
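If you want to pull the bar text apart in ruby, a minimal sketch follows.
The field names are my assumption, guessed from the order of the IB
historical data attributes (open, high, low, close, volume, WAP, hasGaps);
they are not taken from the shim documentation, and real log lines may
carry additional leading text that would need to be stripped first.

```ruby
# Assumed field names, in the order the values appear in the sample text.
FIELDS = [:stamp, :open, :high, :low, :close, :volume, :wap, :has_gaps, :contract]

# Hypothetical helper: split one bar line on "|" and convert the numbers.
def parse_bar(line)
  values = line.chomp.split("|").map(&:strip)
  bar = Hash[FIELDS.zip(values)]
  [:open, :high, :low, :close, :wap].each { |k| bar[k] = bar[k].to_f }
  bar[:volume]   = bar[:volume].to_i
  bar[:has_gaps] = (bar[:has_gaps] == "true")
  bar
end

# Sample line copied from the log excerpt, with the date obscured.
line = "yyyymmdd 09:31:00|84.41|84.49|84.12|84.24| 192|84.27|false|STK:SMART:IBM:"
bar = parse_bar(line)
puts bar[:close], bar[:volume]
```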
Thanks,
Bill
-------- cut here ------------------------------------------------------
#!/usr/bin/ruby
# author: Bill Pippin
# email: pippin at trading-shim dot com; msgs may gate to the mailing list
# copyright (c) 2008 Trading-shim.com, LLC Columbus, OH
# GPL version 3 or later, see COPYING for details
# Updated 20090313 to reflect new contract value syntax and 2009 calendar.
=begin
Collect one minute history bars for each:
  o trading week day from the beginning of the year to yesterday
  o regular trading hour from 9:30 to 16:00
  o contract of the dow 30
See additional comments following the executable code for details.
=end
Shim = IO.popen("./shim --data file save", "w")
mons = [ 0, 1, 2 ]
days = [
#   M  T  W  R  F   M  T  W  R  F   M  T  W  R  F   M  T  W  R  F   M  T  W  R  F
  [             2,  5, 6, 7, 8, 9, 12,13,14,15,16,    20,21,22,23, 26,27,28,29,30],
  [ 2, 3, 4, 5, 6,  9,10,11,12,13,    17,18,19,20, 23,24,25,26,27],
  [ 2, 3, 4, 5, 6,  9,10,11,12]
]
time = [" 9:30:00", "10:00:00", "10:30:00", "11:00:00", "11:30:00",
        "12:00:00", "12:30:00", "13:00:00", "13:30:00",
        "14:00:00", "14:30:00", "15:00:00", "15:30:00", "16:00:00"]
syms = [
  "AA"  , "AXP" , "BA"  , "BAC" , "C"   , "CAT",
  "CVX" , "DD"  , "DIS" , "GE"  , "GM"  , "HD" ,
  "HPQ" , "IBM" , "INTC", "JNJ" , "JPM" , "KFT",
  "KO"  , "MCD" , "MMM" , "MRK" , "MSFT", "PFE",
  "PG"  , "T"   , "UTX" , "VZ"  , "WMT" , "XOM"
]
past = "select past STK:SMART:%-4s:USD m1 1 1d Ymd_T(2009%02u%02u %s);\t"
wait = "wait 11;\n"
exit = "exit;\n"
# for each month in jan..mar
#   for each weekday in the month until yesterday, excluding holidays
#     for each symbol in the dow 30
#       emit query, pause the shim for 11 sec, and sleep for 11 sec
sleep(2)
for i in mons
  for j in days[i]
    for s in syms
      # echo the command to stdout, then send the same text to the shim
      printf past, s, i+1, j, "16:00:00"; printf wait
      Shim.printf past, s, i+1, j, "16:00:00"; Shim.printf wait
      sleep(11)
    end
  end
end
Shim.printf exit
# the local variable "exit" shadows the method of the same name, so call
# Kernel.exit explicitly to leave the script
Kernel.exit
=begin
Data will show up in the log file (defaults to log/ShimText); the
shim must be installed, and exist as a binary in the current
directory. If the Shim.printf line in the body of the inner loop is
commented out in your copy, uncomment it to send text output to the
shim subprocess.
The literal arguments in the query format variable "past" select one
minute TRADE bars during regular trading hours for the day; other
parameters to the printf statement can be used to change the query
granularity, and the time array is provided for that purpose. If
you do want to modify this script, consider the following:
As written, with queries for one minute bars over the trading day, this
script produces 1440 queries: 48 days times 30 symbols in the dow 30.
At eleven seconds each, this'll take around 4.4 hours to run.
If you want to increase the bar frequency, e.g., to one second
intervals, consider first that using a half-hour duration for queries
will require nearly two 40 hour collection weeks. Catching up is
hard to do!
At this point you'll be needing to stop and restart the script from
one day to the next. In addition, since valid queries must request
no more than 2k records, multiple queries would be needed for each
day, so that it would also be necessary to insert a for loop using
the table "time", and replace the end-of-day literal in the query
print statement with the value indexed from time.
Modifying this script to synchronize the ruby script with the shim
(use the cout command line option to dup the log output to the script),
parameterize the PastFilter table index, choose subranges of the data,
understand the calendar, check regular trading hours for the route and
symbol, and otherwise preen data, is left as an exercise.
=end
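The extra loop over the time table described in the closing comments might
look like the dry run below. This is a sketch under assumptions, not part of
the distributed script: timed_commands and run_hours are hypothetical
helpers, and the bar size and duration literals (m1 and 1d) are left exactly
as in the original format string, since the spelling for finer bars and
half-hour durations is not given here and would need to be checked against
the shim documentation.

```ruby
# Format string and half-hour end times, copied from the script.
PAST  = "select past STK:SMART:%-4s:USD m1 1 1d Ymd_T(2009%02u%02u %s);"
TIMES = [" 9:30:00", "10:00:00", "10:30:00", "11:00:00", "11:30:00",
         "12:00:00", "12:30:00", "13:00:00", "13:30:00",
         "14:00:00", "14:30:00", "15:00:00", "15:30:00", "16:00:00"]

# Hypothetical helper: one command per month, day, end time, and symbol.
# The end-of-day literal "16:00:00" is replaced by a value from times.
def timed_commands(mons, days, times, syms)
  cmds = []
  mons.each do |i|
    days[i].each do |j|
      times.each do |t|
        syms.each { |s| cmds << format(PAST, s, i + 1, j, t) }
      end
    end
  end
  cmds
end

# Hypothetical helper: run time in hours at one query every pace seconds.
def run_hours(queries, pace = 11)
  queries * pace / 3600.0
end

# Dry run for one symbol on 2 January: one command per entry in TIMES.
cmds = timed_commands([0], [[2]], TIMES, ["IBM"])
puts cmds.first, cmds.last, cmds.size

# Run time for the script as written: 48 days times 30 symbols.
puts run_hours(48 * 30)
```

Collecting the commands in an array rather than writing them to the pipe
makes it easy to count the queries and estimate the run time before
committing to a long collection session.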