R and MySQL for tick data - Matlab, R project and Python | futures io social day trading
futures io futures trading


R and MySQL for tick data
Updated: Views / Replies:1,430 / 3
Created: by Big Mike Attachments:0

Welcome to futures io.

(If you already have an account, login at the top of the page)

futures io is the largest futures trading community on the planet, with over 90,000 members. At futures io, our goal has always been and always will be to create a friendly, positive, forward-thinking community where members can openly share and discuss everything the world of trading has to offer. The community is one of the friendliest you will find on any subject, with members going out of their way to help others. Some of the primary differences between futures io and other trading sites revolve around the standards of our community. Those standards include a code of conduct for our members, as well as extremely high standards that govern which partners we do business with, and which products or services we recommend to our members.

At futures io, our focus is on quality education. No hype, gimmicks, or secret sauce. The truth is: trading is hard. To succeed, you need to surround yourself with the right support system, educational content, and trading mentors Ė all of which you can find on futures io, utilizing our social trading environment.

With futures io, you can find honest trading reviews on brokers, trading rooms, indicator packages, trading strategies, and much more. Our trading review process is highly moderated to ensure that only genuine users are allowed, so you donít need to worry about fake reviews.

We are fundamentally different than most other trading sites:
  • We are here to help. Just let us know what you need.
  • We work extremely hard to keep things positive in our community.
  • We do not tolerate rude behavior, trolling, or vendors advertising in posts.
  • We firmly believe in and encourage sharing. The holy grail is within you, we can help you find it.
  • We expect our members to participate and become a part of the community. Help yourself by helping others.

You'll need to register in order to view the content of the threads and start contributing to our community.  It's free and simple.

-- Big Mike, Site Administrator

Reply
 
Thread Tools Search this Thread
 

R and MySQL for tick data

  #1 (permalink)
Site Administrator
Manta, Ecuador
 
Futures Experience: Advanced
Platform: My own custom solution
Favorite Futures: E-mini ES S&P 500
 
Big Mike's Avatar
 
Posts: 46,240 since Jun 2009
Thanks: 29,353 given, 83,234 received

R and MySQL for tick data

First, I want to say that the main MySQL tick data thread is here:

https://futures.io/elite-circle/21664-using-mysql-storing-tick-data.html

I don't want to have a split topic but I do think some discussion specific to using R and MySQL for tick data is applicable to the R subforum.

I've been storing tick data in MySQL for years but am recently expanding this project. Right now my database is around 50GB for 2 symbols (CL, ES) for 4 years of tick data.

I am going to expand this to around 20 symbols soon but wanted a better way of storing the data.

So I wrote this post:
https://futures.io/elite-circle/21664-using-mysql-storing-tick-data-9.html#post366671

Which I will go ahead and copy below for this thread:


Big Mike's other thread
I wanted to get input from you guys. I am continuing to expand my database efforts, in particular with tick data.

I am doing analysis on this data in R, but I also need to be able to do some SQL queries on the data from time to time.

This has lead to a problem. With SQL, the db is enormous and growing rapidly. For just a couple of symbols we're already over 50GB, and I am expanding to 20 symbols soon.

The file space is not the problem. I have 70TB of local storage... the problem is the amount of time it takes to run queries.

Even with 32GB ram on this machine, the most it can do (i5 class), SQL queries can take 10 minutes because it cannot keep the entire structure in memory.

Now, going forward I plan to be using R for almost everything, and minimizing SQL queries. But the issue is that just importing the data into R (SELECT query) can take 10 minutes... again, same reasons.

Here is the current table structure:

 
Code
CREATE TABLE `ES` (
  `key` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `date` date DEFAULT NULL,
  `time` time DEFAULT NULL,
  `last` double unsigned DEFAULT NULL,
  `lastsize` smallint(5) unsigned DEFAULT NULL,
  `bid` double unsigned DEFAULT NULL,
  `ask` double unsigned DEFAULT NULL,
  PRIMARY KEY (`key`),
  KEY `datetimelast` (`date`,`time`,`last`,`lastsize`)
) ENGINE=InnoDB;
So I was wondering if it would be more efficient as so:

 
Code
CREATE TABLE `ES` (
  `key` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `date` date DEFAULT NULL,
  `startid` double unsigned DEFAULT NULL,
  `endid` smallint(5) unsigned DEFAULT NULL,
  `file` char(10) DEFAULT NULL,
  PRIMARY KEY (`key`),
) ENGINE=InnoDB
date = YYYY-MM-DD, one entry per day
startid = IQFeed TickID for the first tick of 'date'
endid = IQFeed TickID for the last tick of 'date'
file = filename on filesystem that represents the actual data for this day (ie: 2013-11-05.csv)

Then 2013-11-05.csv on filesystem would look like:
time, last, lastsize, bid, ask

But we can make optimizations there too. Instead of each line having the full time stamp, full last price, full bid and full ask, why not use an offset?

So instead of:
11:22:05.201,41.25,15,41.24,41.26
11:22:06.725,41.26,21,41.25,41.27

We could storage just the full first tick and then subsequent ticks could be stored as an offset:
11:22:05.201,41.25,15,41.24,41.26
1.524,1,6,1,1

1.524 = difference from last timestamp was +1.524
1 = difference from last timestamp was 1 tick, use -1 if price had decreased by same
6 = difference from last timestamp was 6 size, use -6 if size had decreased by same
1 = difference from last timestamp was 1 tick bid price, use -1 as needed
1 = difference from last timestamp was 1 tick ask price, use -1 as needed

So now we have just one row in the table per day that tells us the starting tickid and ending tickid, and where to find the ticks themselves on the filesystem.

Then the actual data on the filesystem will be half the size because of using offsets instead of full values.

What this means is that when I am importing the data into R:

- Should be twice as fast in terms of disk I/O (half the disk I/O)
- Slower due to would need to modify the data (+/- offset) before mapping it to an array

But I think that memory I/O is ~100x faster than disk I/O, so the overhead from the +/- offset calculations would be far offset by reduced disk I/O.

In the end, in R, the data would be the same in the array as if I stored it all in MySQL like the first table structure example.

Any thoughts on this?

Mike

I think that for storing light data, like daily bars, MySQL by itself is fine. But the issue is when you get into databases being 500GB - 1TB like I am looking at, better solutions are required.

Simply looking at column oriented databases is not the answer. I already went down that path (discussed in the main MySQL thread linked above). Besides, for tick databases you are almost always just doing straight select's of the entire range of data, say 1 day at a time, 1 week at a time, 1 month, 1 year whatever.

So I think what I am proposing makes more sense, but I have not had time to test it yet. Before I start heading that direction I wanted to hear from some others to see if what I've written makes sense or could be improved.

Mike

Due to time constraints, please do not PM me if your question can be resolved or answered on the forum.

Need help?
1) Stop changing things. No new indicators, charts, or methods. Be consistent with what is in front of you first.
2) Start a journal and post to it daily with the trades you made to show your strengths and weaknesses.
3) Set goals for yourself to reach daily. Make them about how you trade, not how much money you make.
4) Accept responsibility for your actions. Stop looking elsewhere to explain away poor performance.
5) Where to start as a trader? Watch this webinar and read this thread for hundreds of questions and answers.
6)
Help using the forum? Watch this video to learn general tips on using the site.

If you want
to support our community, become an Elite Member.

Reply With Quote
The following user says Thank You to Big Mike for this post:
 
  #2 (permalink)
Quick Summary
Quick Summary Post

Quick Summary is created and edited by users like you... Add FAQ's, Links and other Relevant Information by clicking the edit button in the lower right hand corner of this message.

 
  #3 (permalink)
Site Administrator
Manta, Ecuador
 
Futures Experience: Advanced
Platform: My own custom solution
Favorite Futures: E-mini ES S&P 500
 
Big Mike's Avatar
 
Posts: 46,240 since Jun 2009
Thanks: 29,353 given, 83,234 received


I took one year of tick data (267M ticks, 25GB file) for SPY and compared the speed of fetching this from MySQL vs fetching from flat file in filesystem. The flat file is exported from MySQL, it is not based on the new method (no offsets, no length savings).

 
Code
# Standard fread flat file

> system.time(temp <- fread('/shares/temp.txt'))
   user  system elapsed 
186.988  18.852 334.565 

> nrow(temp)
[1] 267637370

# Pull same number of ticks from SQL table

> system.time(sql <- dbGetQuery(con,statement='SELECT * from SPY ORDER BY 1 LIMIT 267637370'))
   user  system elapsed 
136.340  11.292 675.592
Result is a 49% speed increase by using flat file instead of MySQL.

Mike

Due to time constraints, please do not PM me if your question can be resolved or answered on the forum.

Need help?
1) Stop changing things. No new indicators, charts, or methods. Be consistent with what is in front of you first.
2) Start a journal and post to it daily with the trades you made to show your strengths and weaknesses.
3) Set goals for yourself to reach daily. Make them about how you trade, not how much money you make.
4) Accept responsibility for your actions. Stop looking elsewhere to explain away poor performance.
5) Where to start as a trader? Watch this webinar and read this thread for hundreds of questions and answers.
6)
Help using the forum? Watch this video to learn general tips on using the site.

If you want
to support our community, become an Elite Member.

Reply With Quote
The following user says Thank You to Big Mike for this post:
 
  #4 (permalink)
Elite Member
near Paris, France
 
Futures Experience: Beginner
Platform: -
Favorite Futures: -
 
Nicolas11's Avatar
 
Posts: 1,070 since Aug 2011
Thanks: 2,232 given, 1,729 received

For what is worth, an article : Translating between R and SQL: the basics
Translating between R and SQL: the basics - Burns Statistics

Reply With Quote
The following 3 users say Thank You to Nicolas11 for this post:

Reply



futures io > > > > R and MySQL for tick data

Thread Tools Search this Thread
Search this Thread:

Advanced Search



Upcoming Webinars and Events (4:30PM ET unless noted)

Jigsaw Trading: TBA

Elite only

FuturesTrader71: TBA

Elite only

NinjaTrader: TBA

Jan 18

RandBots: TBA

Jan 23

GFF Brokers & CME Group: Futures & Bitcoin

Elite only

Adam Grimes: TBA

Elite only

Ran Aroussi: TBA

Elite only
     

Similar Threads
Thread Thread Starter Forum Replies Last Post
Using MySQL for storing tick data Big Mike The Elite Circle 131 October 3rd, 2016 04:09 PM
Looking for Software to store Price data in MySQL caliph Matlab, R project and Python 4 January 7th, 2013 03:11 PM
Store real time data from NT in mysql bascher NinjaTrader 23 November 4th, 2011 06:53 AM
trying to make a chartinig program that goes through data tick by tick acdrew0 Platforms and Indicators 11 January 27th, 2011 01:55 PM
Platform that can go tick by tick through historical data acdrew0 Platforms and Indicators 1 October 21st, 2010 01:10 PM


All times are GMT -4. The time now is 07:58 PM.

Copyright © 2017 by futures io, s.a., Av Ricardo J. Alfaro, Century Tower, Panama, +507 833-9432, info@futures.io
All information is for educational use only and is not investment advice.
There is a substantial risk of loss in trading commodity futures, stocks, options and foreign exchange products. Past performance is not indicative of future results.
no new posts
Page generated 2017-12-15 in 0.09 seconds with 19 queries on phoenix via your IP 54.163.61.66