Tick Database Storage - futures io
Discussion in Platforms and Indicators

#1
MXASJ
Asia
Experience: Beginner
Platform: NinjaTrader, TOS
Posts: 798 since Jun 2009
Thanks: 109 given, 799 received

Have any of you gone through the process of building a tick database? The deeper I look into it the harder it seems to implement as a one-man show.

I basically need the database to update from Ninja, IQFeed, or csv files, and then have that data available for use back in Ninja, Excel, R, etc. Over time that data becomes important.

It sounds simple enough until you try to do something like take the raw tick data from two instruments and create 5-minute OHLC "bars" to read into R for a covariance test or whatever... and then try 3-, 7-, and 10-minute OHLC data for comparison. I know Matlab has a package called Tick2Bar or something like that, but much of this already makes my head hurt (not a Matlab user. Yet.).

Is there a simple way you guys and gals do this? Obviously I want to automate these processes as much as possible.
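For the bar-building step on its own, a minimal sketch in stdlib Python (the function name and bucket layout are my own illustration, not anything from the platforms mentioned here) might look like:

```python
from datetime import datetime

def ticks_to_bars(ticks, minutes=5):
    """Aggregate (timestamp, price) ticks into n-minute OHLC bars.

    ticks: iterable of (datetime, price) pairs sorted by time.
    Returns {bar-open datetime: [open, high, low, close]}.
    Periods with no ticks simply produce no bar.
    """
    bars = {}
    for ts, price in ticks:
        # Floor the timestamp to its n-minute bucket, counting minutes
        # from midnight so odd sizes like 7 stay consistent all day.
        total = ts.hour * 60 + ts.minute
        start = total - total % minutes
        bucket = ts.replace(hour=start // 60, minute=start % 60,
                            second=0, microsecond=0)
        bar = bars.get(bucket)
        if bar is None:
            bars[bucket] = [price, price, price, price]
        else:
            bar[1] = max(bar[1], price)  # high
            bar[2] = min(bar[2], price)  # low
            bar[3] = price               # close
    return bars

ticks = [
    (datetime(2010, 3, 1, 9, 30, 5), 1096.75),
    (datetime(2010, 3, 1, 9, 31, 0), 1097.00),
    (datetime(2010, 3, 1, 9, 34, 59), 1096.50),
    (datetime(2010, 3, 1, 9, 35, 1), 1096.75),
]
for start, (o, h, l, c) in sorted(ticks_to_bars(ticks).items()):
    print(start, o, h, l, c)
```

Re-running with minutes=3, 7, or 10 gives the comparison series, and the result dumps to CSV easily for R or Excel.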


#2
Big Mike
Site Administrator
Swing Trader
Data Scientist & DevOps
Manta, Ecuador
Experience: Advanced
Platform: Custom solution
Trading: Futures & Crypto
Posts: 50,068 since Jun 2009
Thanks: 32,534 given, 98,493 received

I've done it.

I built a Ninja indicator that reads/writes ticks to a MySQL database. I just looked and couldn't find the .cs file, but I will check another spot for it later.

Mike

We're here to help -- just ask

For the best trading education, watch our webinars
Searching for trading reviews? Review this list

Follow us on Twitter, YouTube, and Facebook

Support our community as an Elite Member:
https://futures.io/elite/

Visit other sites? Please spread the word about your experience with our community!
 
#3
MXASJ


I'd love to see that code, Mike. I take it that it doesn't pull data out of the NT database but writes on OnBarUpdate? Does that tax your system?

I've just tried a third-party data conversion utility to export the NT7 db (MS SQL CE) to MySQL, without success... but I admit I'm out of my depth. The file structure thing is confusing to the uninitiated.

I'm beginning to appreciate how many resources some people commit to getting their data managed properly.

 
#4
Big Mike


MXASJ View Post
I'd love to see that code, Mike. I take it that it doesn't pull data out of the NT database but writes on OnBarUpdate? Does that tax your system? [...]

Right, it uses OnMarketData etc.; it does not convert the existing database.

I used to run it on a separate box, a Quad Core Q6600; it would come close to maxing out NT 6.5 (single-core/thread), so it would be much better to do it on NT7. The SQL db server should be local if possible as well, to reduce latency. I was recording half a dozen instruments.

Mike

 
#5
swandro
England
Experience: Advanced
Platform: SierraChart
Posts: 70 since Jul 2009
Thanks: 9 given, 79 received

Creating n-minute bar data (or indeed daily data etc.) from tick data is quite straightforward. However there are complications. These arise from deciding what to do with missing data. For example, what if the market had no transactions in a particular period - do you create dummy bars to fill in the gaps? The answer depends on what you want to do with the data in your analysis. You do not want dummy bars if you are doing moving-average type studies, but you will need them if you are doing time-related studies.

The second complication is to do with market hours. If you are creating a generalised routine, it will need to know the open and close times for each market, and any holiday dates, half-days etc.

Something else to consider is database size. If you are storing actual tick data, this can create huge databases. These will have to be planned carefully because they will need to be efficient and you will need to be able to back them up somewhere.

A final thought relates to multiple inputs to the database - you will need to decide which is the primary source and what to do when the data overlaps.

Just a few things to think about. It is something I want to do myself one day.
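The dummy-bar decision above can be made concrete in code. A minimal stdlib-Python sketch, where the flat-at-prior-close convention for dummy bars is just one reasonable choice among several:

```python
from datetime import datetime, timedelta

def fill_gaps(bars, minutes=5):
    """Insert dummy bars for empty periods, carrying the prior close forward.

    bars: {bar-open datetime: [open, high, low, close]}, as produced by a
    tick aggregator. Returns a new dict covering every n-minute slot between
    the first and last bar. Whether you want this at all depends on the
    study: skip it for moving-average work, keep it for time-based work.
    """
    if not bars:
        return {}
    step = timedelta(minutes=minutes)
    filled = {}
    t, end = min(bars), max(bars)
    last_close = bars[t][3]
    while t <= end:
        if t in bars:
            filled[t] = bars[t]
            last_close = bars[t][3]
        else:
            # Dummy bar: flat at the previous close, zero range.
            filled[t] = [last_close] * 4
        t += step
    return filled

bars = {
    datetime(2010, 3, 1, 9, 30): [1096.75, 1097.00, 1096.50, 1096.50],
    datetime(2010, 3, 1, 9, 45): [1096.75, 1097.25, 1096.75, 1097.00],
}
filled = fill_gaps(bars)
print(len(bars), "bars in,", len(filled), "bars out")
```

A fuller version would also need the session open/close times and holiday calendar swandro mentions, so gaps outside market hours are not filled at all.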

 
#6
darthtrader3.6
wny
Experience: Intermediate
Platform: ninja, excel
Trading: YM, equities
Posts: 86 since Jan 2010


Big Mike View Post
I used to run it on a separate box, a Quad Core Q6600, it would come close to maxing out NT 6.5 single-core/thread, it would be much better to do it on NT7. The SQL db server should be local if possible as well to reduce latency. I was recording for a half dozen instruments.

Ouch, I've been wanting to do something like this... it bugs me having all this data streaming to my machine and not saving it, but 6 instruments is pretty low for the amount of work involved.
Mike, do you know where the bottleneck is with this? I wonder how much more this could be pushed using a time-series db as opposed to SQL, if the bottleneck was with SQL.

 
#7
Big Mike

The bottleneck is NT 6.5, not MySQL.

Haven't tried it with NT 7. Also, I wrote it a year ago, when my coding technique was not nearly as good; a newer version written from the ground up would be far more efficient, I am sure.

For instance, I remember I was storing ticks one at a time (one query per tick). Duh, that is stupid. You can easily implement some buffering, but a year ago I didn't know how.
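The buffering Mike describes is easy to sketch. Below is a minimal stand-in using Python's stdlib sqlite3 rather than MySQL; the table layout and flush size are illustrative assumptions, and with MySQL the same `executemany()` batching pattern applies via its client library:

```python
import sqlite3

class TickBuffer:
    """Buffer ticks in memory and flush them as one batched INSERT."""

    def __init__(self, conn, flush_size=500):
        self.conn = conn
        self.flush_size = flush_size
        self.pending = []
        conn.execute("CREATE TABLE IF NOT EXISTS tick "
                     "(instrument TEXT, ts TEXT, price REAL, volume INTEGER)")

    def add(self, instrument, ts, price, volume):
        self.pending.append((instrument, ts, price, volume))
        if len(self.pending) >= self.flush_size:
            self.flush()

    def flush(self):
        # One multi-row statement per batch instead of one query per tick.
        self.conn.executemany("INSERT INTO tick VALUES (?, ?, ?, ?)",
                              self.pending)
        self.conn.commit()
        self.pending.clear()

conn = sqlite3.connect(":memory:")
buf = TickBuffer(conn, flush_size=100)
for i in range(250):
    buf.add("ES 03-10", f"2010-03-01 09:30:{i % 60:02d}", 1096.75, 1)
buf.flush()  # don't forget the tail on shutdown
print(conn.execute("SELECT COUNT(*) FROM tick").fetchone()[0])
```

The trade-off is that a crash loses at most one buffer's worth of ticks, in exchange for far fewer round-trips to the server.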

Mike

 
#8
darthtrader3.6

Ahh, that is cool, but I do really wonder what the upper limit would be with SQL...
Have you ever messed with Berkeley DB?

Here is a guy with a post doing this with some insane specs..
BerkeleyDB: High Volume database hitting limits : Sleepycat, BerkeleyDB, C++ API
"I am using Berkeley DB to store real time ticks for market data. To give you an idea of the amount of data...
- I get roughly 15000 ticks/sec on an average
- by the end of the day the database file grows upto 50GB."

At the bottom he says he is subscribed to 10,000 instruments. That's way overkill for my purposes, but to me, storing fewer than the 500 in the S&P would not really be worth the effort.

Of course, I'm not sure how Ninja would handle anything close to that. There are C# bindings for it.
Berkeley DB C# Bindings | Dinosaur Technology and Trading
"To give you an idea of why this is important, take a look at their white paper from 2006 on performance. In a transacted environment they achieved 125,486 single record writes per second. With modern 2009 hardware, and multiple CPU / Solid State Disk systems, this could readily record every single tick coming off of the NYSE and NASDAQ (multiple million per second)."

MXASJ, I'm pretty much in the same boat... Ninja, IQFeed... I think even though Matlab is expensive, it would be worth it over R just in terms of time and what's available.
Matlab supports Berkeley DB, so I wouldn't think it would be too bad once you got data streaming into Matlab.
The problem is that with the low-cost data vendors you are basically on your own. eSignal doesn't sound like it's worth the effort with Matlab from what I've found, and on DTN you can't really find much at all...
The CQG API supports Matlab... I've kind of taken the position that it's probably not worth doing this project until you're ready to pony up for the next level of datafeed expense. Unless someone comes up with some mind-blowing solution, but then again, I'm just not sure enough people are looking to do this for DTN/eSignal to spend their resources on it.
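The quoted Berkeley DB numbers can be sanity-checked with quick back-of-envelope arithmetic (assuming the 15,000 ticks/sec average holds over a 24-hour session, which is my assumption, not the original poster's):

```python
# Back-of-envelope check on the quoted Berkeley DB figures.
ticks_per_sec = 15_000
seconds_per_day = 24 * 60 * 60
db_bytes = 50 * 10**9  # "grows up to 50GB" by end of day

ticks_per_day = ticks_per_sec * seconds_per_day
bytes_per_tick = db_bytes / ticks_per_day
print(f"{ticks_per_day:,} ticks/day, ~{bytes_per_tick:.0f} bytes/tick")
```

Roughly 39 bytes per tick is plausible for a key, timestamp, price, and size per record, so the two quoted figures are at least consistent with each other.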

 
#9
wh
Neubrandenburg, Germany
Experience: Advanced
Platform: R
Trading: Stocks
Posts: 538 since Jun 2009
Thanks: 298 given, 505 received


swandro View Post
Creating n-minute bar data (or indeed daily data etc.) from tick data is quite straightforward. However there are complications. These arise from deciding what to do with missing data. [...]

I agree with you, but let's not be too pessimistic ...

https://www.google.com/codesearch/p?hl=de#hAxBGal4fIY/matlabcentral/files/3398/Tick2Bar.m&q=tick2bar&sa=N&cd=1&ct=rc

Create bar data, and then smooth this data (maybe with Heikin-Ashi) ... but you are right, there are too many approaches (forex, stocks, futures). I looked at forex and there is no real problem doing this, but it also depends on your feed.

Causality is the relationship between an event (the cause) and a second event (the effect), where the second event is a consequence of the first.
 
#10
MXASJ


I just had a silly thought that will probably consume me for the rest of today.

A strategy has access to BarsArray[0], which holds OHLC data for the type, periodicity, and days loaded of the strategy (say ES 03-10, type min, periodicity 5, days back 20).

How easy or difficult would it be to write that "old" BarsArray data to a csv file with time stamps?

Is this me not seeing the forest for the trees, or is it something hard to do? I've always been thinking of the Historical Data Manager export function, which exports ticks, but if a BarsArray can be written to a file... that could be enormously helpful for a lot of things.

Thoughts? Ideas? I'll go read up on FileRead/Write, Stream, and other similar things I never tried in NT.

 
#11
MXASJ

This outputs to the Output Window OK, going back the number of days specified in "Days to load" when you start the strategy. Next hurdle is getting to write a CSV file to disk:

 
Code

#region Using declarations
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Xml.Serialization;
using NinjaTrader.Cbi;
using NinjaTrader.Data;
using NinjaTrader.Indicator;
using NinjaTrader.Gui.Chart;
using NinjaTrader.Strategy;
using System.IO;
#endregion

// This namespace holds all strategies and is required. Do not change it.
namespace NinjaTrader.Strategy
{
    /// <summary>
    /// Enter the description of your strategy here
    /// </summary>
    [Description("Strategy to write historical data from BarSeries[0]")]
    public class DataWriter : Strategy
    {
        #region Variables
        #endregion

        /// <summary>
        /// This method is used to configure the strategy and is called once before any strategy method is called.
        /// </summary>
        protected override void Initialize()
        {
            CalculateOnBarClose = true;
            BarsRequired = 20;
        }

        /// <summary>
        /// Called on each bar update event (incoming tick)
        /// </summary>
        protected override void OnBarUpdate()
        {
            PrintWithTimeStamp(Open[0] + "," + High[0] + "," + Low[0] + "," + Close[0]);
        }

        #region Properties
        #endregion
    }
}

Having a bit of aggro with StreamWriter. This one only writes one line, though the Output Window looks OK. Here is the code:

 
Code

// Print the data to the Output Window
PrintWithTimeStamp(Open[0] + "," + High[0] + "," + Low[0] + "," + Close[0] + "," + Volume[0]);

// Write the data to PathFileName
// Note: File.CreateText() truncates the file on every call, which is why
// only the last bar survives -- File.AppendText() would keep prior lines.
using (StreamWriter sw = File.CreateText(PathFileName))
{
    sw.WriteLine(Time[0] + "," + Open[0] + "," + High[0] + "," + Low[0] + "," + Close[0] + "," + Volume[0]);
}

I've attached the strat as it stands now. NT7.

Attached Files
DataWriter.zip (1.1 KB)
 
#12
MXASJ

Too late to edit the bad code above, so here is the good code. It's based on something that was already available on the Ninja support site (duh!). My next step is to offer the option of changing the DateTime format.

This should make my workflow a bit simpler for now at least.

Indicator ExportChartData and sample output attached (NT7).

EDIT: I've completed the indicator (mostly) and posted it in the downloads section under Ninja Trader/Misc. Hope it helps.

Attached Files
ES_03_10_5min.txt (78.1 KB)
 
#13
ptd26
Portland, Oregon
 
 
Posts: 27 since Jan 2010
Thanks: 3 given, 2 received


Big Mike View Post
Right, it uses OnMarketData and etc, it does not convert the existing database.

I used to run it on a separate box, a Quad Core Q6600, it would come close to maxing out NT 6.5 single-core/thread, it would be much better to do it on NT7. The SQL db server should be local if possible as well to reduce latency. I was recording for a half dozen instruments.

Mike

Big Mike, what's the granularity of the ticks you are storing? Are you storing OHLC or the individual transactions, like this:

7:24:47.664433 [TRADE] ESH0.CME 1 @ 1096.75
7:24:47.664433 [VOLUME] ESH0.CME 115924 @ 0.0
7:24:47.664623 [BID] ESH0.CME 176 @ 1096.75
7:24:48.065112 [TRADE] ESH0.CME 1 @ 1096.75
7:24:48.065112 [VOLUME] ESH0.CME 115925 @ 0.0
7:24:48.06566 [BID] ESH0.CME 175 @ 1096.75

I spent about 5 minutes looking for a way to do this in NinjaTrader, but was not able to figure it out. I am currently logging to text files, but I plan to store binary files by day. Maybe eventually I will move everything to something like Berkeley DB. Whatever supports the most data compression and reliability.
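The text-to-binary move ptd26 describes can be sketched with fixed-size packed records. The field layout below is an illustrative assumption, not his actual format, and a real logger would write per-day files on disk rather than an in-memory buffer:

```python
import struct
from io import BytesIO

# One fixed-size record per event: microseconds since midnight, an event
# type byte (trade/bid/ask), price, and size.
RECORD = struct.Struct("<QcdI")  # 8 + 1 + 8 + 4 = 21 bytes, no padding

def write_tick(f, usec, kind, price, size):
    f.write(RECORD.pack(usec, kind, price, size))

def read_ticks(f):
    while True:
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:
            return
        yield RECORD.unpack(chunk)

# BytesIO stands in for a per-day file to keep the sketch runnable.
buf = BytesIO()
write_tick(buf, 26_687_664_433, b"T", 1096.75, 1)    # 7:24:47.664433 trade
write_tick(buf, 26_687_664_623, b"B", 1096.75, 176)  # bid depth update
buf.seek(0)
for usec, kind, price, size in read_ticks(buf):
    print(usec, kind, price, size)
```

Fixed-size records make day files trivially seekable (record n starts at byte n * 21), and generic compression on top of them recovers much of the redundancy.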

 
#14
 phyzfer 
Chicago
 
Experience: Intermediate
Platform: NinjaTrader
Broker: Mirus/Zen-FIre
Trading: GC, ES
 
Posts: 73 since Mar 2010
Thanks: 40 given, 32 received

If I may add my $.02: saving tick data is an immensely expensive endeavor (a data-storage, latency, and resource hog). Saving to a database, even with a buffer, would make any relational database grow very large very fast. Further, querying that data with SQL would take forever and is time-prohibitive. I worked on a project where we listened to tick data and saved it to binary files (similar to how NT saves tick data in NT7) using a buffer (like BM suggested). We then created binary-reader applications that could translate and chart the data as needed (time-based bars, tick bars, etc.).

I would recommend simply opening up a bunch of charts and having NT do all of the heavy lifting with regard to listening and saving the data (have numerous NT charts open but minimized, maybe?), letting it store the data in its binary format. Then create some sort of binary reader that can translate the saved binary files for later use (in Matlab, R, or anything else).

Does that make sense? I guess my question is: if you simply open up a 1-tick chart in NT, will the data be saved? I don't have a broker-connected NT at the moment, so I'm unable to test this.

 
#15
MXASJ

There is a 4 GB limit on the size of the SQL CE database engine that NT7 uses, so that would need to be taken into consideration, i.e. you couldn't record tick data for the whole CME Group product list for very long on a single instance of NT.

A side project I haven't looked into yet is sync between SQL CE and full SQL Server, but I'm looking at that more for getting an overview of multiple NT users' positions than for storing tick data.

I'm leaning towards the "buy" camp in the "buy vs. build" debate for historical tick data, if 1-second timestamps are OK.

phyzfer, if you need live futures data for development (not trading), try this:

Zen-Fire. The Ultimate Trading Solution.

That will get you login credentials for 30 days or so, which you can use with your existing NT installation.

 
#16
phyzfer

Doh! 4 GB? I don't understand. I was under the impression that if you opened a 1-tick chart in NT, it would save the data into the \Documents\NinjaTrader 7\db\tick folder. I guess that isn't correct then. Right?

If not, then going off of your strategy code, I'd create a binary writer to subscribe to the contracts and write to disk (via a buffer plus some compression). Then the fun part: we'd create an interface (a separate C# app) that would find, decompress, and return the data in whatever format the calling program (Matlab, R, even NT) needs.

I'd love to help with something like this. I think keeping tick data in a relational database makes no sense whatsoever. Querying the data will take forever and the db size will grow significantly.

 
#17
MXASJ

Where I get lost is this: if SQL CE is writing binaries to the file system and the DB is merely referencing those binaries... do the binaries themselves count towards the 4 GB limit, or are only the references (which should be just a few bytes each) counted?

I'll be honest I'm waaaaaaaaay out of my depth on this one.

 
#18
phyzfer

Personally, I don't THINK that NT uses the (SQL CE) database for any market data storage. I think it's used mostly for internal data (account performance, ATM strategies, etc.).

I was under the impression that NT saved all market data in the Documents\NinjaTrader 7\db folder. If you take a look, you'll see Tick, Minute, and Day subfolders that contain binary files (.ntd file extension). How'd these folders get populated?

Personally, I'd want NT to save all tick data for a product to disk whenever any chart is open.

 
#19
phyzfer

I just got this from NT's help documentation:

NinjaTrader Converts Real-Time Data into Historical Data
NinjaTrader stores real-time incoming tick data to your local PC if you have a Chart or Market Analyzer (must have an indicator column added) window open. This data can then be used as historical data. For example, if you open a chart and let it run all day long, the data collected today, will be available as historical data when you open the same chart tomorrow.


I take this to mean that NT is saving tick data on the machine (in the location I specified above) and makes that data available to NT later. We'd simply need a way to 'convert' it back and traverse the data. I'd also like to put some sort of backup strategy in place on the machine running this data collection.

A colleague of mine suggested I look into this for storing market data:
HDF Group - HDF5

Has anyone used HDF5?

thanks!
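For what it's worth, HDF5 suits this append-heavy pattern well. A minimal sketch using the h5py and NumPy packages, where the dataset path, field names, and compression settings are my own assumptions:

```python
import h5py
import numpy as np

# One compound-typed, resizable dataset per instrument per day.
tick_dtype = np.dtype([("usec", "u8"), ("price", "f8"), ("size", "u4")])

with h5py.File("ticks.h5", "w") as f:
    ds = f.create_dataset("ES/2010-03-01", shape=(0,), maxshape=(None,),
                          dtype=tick_dtype, chunks=True, compression="gzip")
    # Append a buffered batch: grow the dataset, then write the tail.
    batch = np.array([(26_687_664_433, 1096.75, 1),
                      (26_688_065_112, 1096.75, 1)], dtype=tick_dtype)
    ds.resize(ds.shape[0] + len(batch), axis=0)
    ds[-len(batch):] = batch

with h5py.File("ticks.h5", "r") as f:
    arr = f["ES/2010-03-01"][:]  # reads back as a NumPy structured array
    print(arr["price"])
```

Chunked storage plus gzip addresses the size concerns raised earlier, and both Matlab and R have HDF5 readers, which fits the "read it back anywhere" goal of this thread.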

 
#20
Trader.Jon
Near the BEuTiFULL Horse Shoe
Experience: Beginner
Platform: NinjaTrader
Broker: MBTrading Dukascopy ZenFire
Trading: $EURUSD when it is trending
Posts: 500 since Jul 2009
Thanks: 401 given, 184 received

At one time I was talking with a programmer about doing this type of project for me: he stopped answering my emails... can't think why he would do that. LOL@me

"" The TREE Data Server captures real-time financial data from one or several datafeed services, archives data in a historical database, and makes both live and archived data available to client applications. The system can be used in real-time charting, an ATS, tick feed simulator, etc. or any situation in which multiple clients need real-time access to the tick stream or archived tick data. In addition, the archived tick stream and tick data are available for offline data analysis and backtesting. ""

Support for the code is one person... but he does eventually answer email.
tree.sourceforge.net | TREE Data Server

TJ

 
#21
phyzfer

Trader.Jon,

Have you used this? I'm very interested and may be able to help him.
Looks like the last update was in 2008.

 
#22
Trader.Jon

No: I have not made any steps other than connecting with the source code maintainer, and trying to contract a local programmer to build a Linux box with Barchart API connectivity to TREE. YES: I was trying to build exactly what MXASJ described in his 'Tick Database' initial post.

The programmer I actually met in real life (i.e. a local with many years of experience, ranging from microcode controllers to neural nets) stopped responding to emails and phone calls after I passed on to him the TREE info and the Barchart.com development-tools logins I had acquired for the project. I got discouraged and moved on. Too much to do, not enough time.

The TREE source code provider has contact info on the webpage (or I can pass it on to you, along with the only email I exchanged with him): I contacted him to see if there was an API already built for TREE >> Barchart, as I did not want to use the available IB API (as a data source IB is lacking, since not all data is sent). Only the IB API is available.

Long term, my plan was:
1. build the TREE server with a cluster or parallel configuration
2. use the data for NT live (less stress on NT [lol]) and historical data for building a neural net/CUDA
3. exchange data on a 24-hour-delay basis for 'fills' with others that might have had disconnects etc.

 
#23
Big Mike

I want to bring this thread alive again, and I'm going to cheat and move it out of the NinjaTrader section, because I think the thread itself can be platform-independent.

I'm not a DBA, so I could use some advice. As I previously wrote in post #2, I had used NinjaTrader to store tick data in a MySQL database, but it was expensive. I haven't even booted up Ninja in well over a year, in fact I don't even have it installed anymore, so I need a new way to do this.

Right now I have a huge amount of data in a proprietary database (MultiCharts), but I'd like to decentralize it and just have my own private database so I can import and export to it as I please.

At this stage, I am not trying to interface with an API to record incoming ticks live as they arrive, although that naturally is the ultimate goal. For phase 1, I'd just like to import raw tick data which I've already got (exported from the existing platform) into a SQL db.

I think the database also needs to be smart enough to handle bid/ask sequencing, which means we need more than just instrument, date, time, last trade, and volume: we need bid and ask plus a tick-id sequence (like DTN IQFeed).

An example:


So, who wants to help build this? I'm looking for advice on the db structure, storage engine, layout etc. I'm happy to share the data I've accumulated in exchange (Elite only).

Mike

 
#24
Adamus
London, UK
Experience: Beginner
Platform: NinjaTrader, home-grown Java
Broker: IB/IQFeed
Trading: EUR/USD
Posts: 1,085 since Dec 2010
Thanks: 471 given, 786 received

Yes Mike, I'm interested in doing this kind of thing.

I can definitely help with the table structure.

I'm not so confident I know enough to say that using an SQL database is definitely a good idea, given the comments above, but I don't see why not, provided the database tables are correctly indexed. In any case, it would be easy to test performance and see whether it meets expectations.

What documentation method do you use for database schemas? I used to use entity-relationship diagrams that could output SQL, like Visio, but more recently I just write the schema-creation SQL.

You can discover what your enemy fears most by observing the means he uses to frighten you.
 
#25
Big Mike

Nothing so fancy, just the schema in SQL directly. I guess I will need to see the "downside" with my own eyes to find reasons not to use MySQL; I really can't imagine that performance wouldn't be more than adequate. The main question for step 1 is just the table structure, and whether one storage engine makes sense over another (MyISAM, InnoDB, etc.).

Mike

We're here to help -- just ask

For the best trading education, watch our webinars
Searching for trading reviews? Review this list

Follow us on Twitter, YouTube, and Facebook

Support our community as an Elite Member:
https://futures.io/elite/

Visit other sites? Please spread the word about your experience with our community!
  #26 (permalink)
 Adamus 
London, UK
 
Experience: Beginner
Platform: NinjaTrader, home-grown Java
Broker: IB/IQFeed
Trading: EUR/USD
 
Adamus's Avatar
 
Posts: 1,085 since Dec 2010
Thanks: 471 given, 786 received

As far as I remember, InnoDB gives you the ability to use relational integrity, so you can lock foreign keys down to primary keys that actually exist. I think there are other engines that are better for faster access.

I always used InnoDB in team projects because it gives more control to keep the data in the database in order, by preventing anyone (including myself) from stuffing it full of orphan records, e.g. lots of records in the Contract table without a valid Market foreign key.

IIRC with InnoDB there's an extra indexing statement you have to add to apply the indices - no big deal though.

So how many tables are you talking? First off, do bid/ask ticks go in the same table as last ticks?

 
Code
create table TICK (
    ID integer,
    CONTRACT_ID integer,
    TIME datetime not null,
    LAST decimal(15,6),
    LAST_SIZE integer,
    BID decimal(15,6),
    ASK decimal(15,6),
    constraint TICK_PK primary key (ID),
    constraint TICK_CONTRACT_FK foreign key (CONTRACT_ID) references CONTRACT (ID)
);
That's a completely FIRST attempt! I guess you also want a running volume total in there too, if it doesn't get calculated by the app as you go along - but I'd be tempted to leave it out, since a lot of charts won't be interested in the time frame used to calculate the current total and would calculate it anyway. That's just a guess, I've never worked with volume.

It looks like the tick id is going to be your primary key value - as long as you guarantee the tick id is always unique.

Extra question - date/time. I would put the timestamp with date and time in one column, since that's what the datetime data type is designed for. It might be better to stick with separate date and time fields though, if most software wants them that way. I don't know if it would be possible to merge 2 separate fields at import - it might require a separate SQL operation afterwards.
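If you do end up with separate fields in the source files, merging them at import is easy in a small loader script. A minimal Python sketch (the field layout and the "%Y-%m-%d %H:%M:%S" format are just assumptions about the CSV export):

```python
from datetime import datetime

def merge_date_time(date_str, time_str):
    """Combine separate date and time strings into one value,
    suitable for a single DATETIME column at import time."""
    return datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")

# Example row from a CSV export with separate date and time fields:
ts = merge_date_time("2011-08-15", "09:30:01")
print(ts.isoformat())  # 2011-08-15T09:30:01
```

Alternatively, MySQL can do the merge server-side at load time (LOAD DATA INFILE supports user variables with a SET clause), so no separate SQL operation afterwards would be needed.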

The following user says Thank You to Adamus for this post:
 
  #27 (permalink)
 NetTecture 
Szczecin
 
Experience: Intermediate
Platform: Ninja, writing own now
 
Posts: 212 since Mar 2010

I tried doing that for some months now on the side (besides a project that eats 10 hours a day) and I am giving up - it makes zero sense, at least for me. Storing book-2 ticks with all bid/ask changes for the complete CME group (around 600 million rows per day) is not something I think can be done effectively. Just to give you an idea: my current hardware is a dedicated SQL Server, 16 GB RAM, 4 cores assigned (a virtual machine, but I control the underlying hardware), with storage consisting of 8 VelociRaptors and one SSD.

The overhead is just too much.

I am now moving towards storing binary blobs for one base symbol + instrument type (example: YM futures) for a specific interval (one hour) and trying to use highly efficient extraction methods. This is a LOT more efficient in terms of storage.
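For anyone wondering what such a blob could look like on disk, here is a rough Python sketch of a fixed-width record layout for one symbol/hour bucket (the format itself is my illustration, not NetTecture's actual layout):

```python
import struct

# One tick record: epoch milliseconds (int64), price scaled to an
# integer number of ticks (int64), size (int32) -> 20 bytes per tick.
TICK_FMT = "<qqi"
TICK_SIZE = struct.calcsize(TICK_FMT)

def pack_ticks(ticks):
    """Pack (timestamp_ms, scaled_price, size) tuples into one blob."""
    return b"".join(struct.pack(TICK_FMT, *t) for t in ticks)

def unpack_ticks(blob):
    """Recover the tick tuples from a blob."""
    return [struct.unpack_from(TICK_FMT, blob, off)
            for off in range(0, len(blob), TICK_SIZE)]

ticks = [(1313398201123, 1231825, 3), (1313398201456, 1231850, 1)]
blob = pack_ticks(ticks)  # 40 bytes instead of two indexed SQL rows
assert unpack_ticks(blob) == ticks
```

At 20 bytes per tick, a 600-million-row day is roughly 12 GB before compression - a fraction of what the same data costs as individually indexed SQL rows.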

  #28 (permalink)
 Big Mike 
Site Administrator
Swing Trader
Data Scientist & DevOps
Manta, Ecuador
 
Experience: Advanced
Platform: Custom solution
Trading: Futures & Crypto
 
Big Mike's Avatar
 
Posts: 50,068 since Jun 2009
Thanks: 32,534 given, 98,493 received


NetTecture View Post
Storing book-2 ticks with all bid/ask changes for the complete CME group (around 600 million rows per day) is not something I think can be done effectively. [...]

Are you saying the load was too high?

Were you storing in real time, I assume? That is not required for me in phase 1, I am just importing.

If the issue were table scans, perhaps a unique DB per instrument instead of one gigantic db? Again, not a dba, which is why I'm looking for input.

The server is a Dual Xeon 5430, 24GB. I'm looking for input as to what the issue was on your side, and how many instruments you were archiving. I plan to only archive about a dozen instruments, perhaps that is the major difference between us.

Mike

  #29 (permalink)
 samurai 
CO
 
Experience: Intermediate
Platform: NinjaTrader
 
Posts: 29 since Sep 2009
Thanks: 228 given, 45 received

Another consideration is that MySQL timestamps do not support millisecond resolution. If millisecond resolution is needed, you cannot use a Timestamp field.

Instead, use a Decimal(17,3) field.

Here's a link describing a workaround: Once upon a timestamp(milliseconds)…. MySQL Preacher
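A quick sketch of that workaround in Python - storing the timestamp as epoch seconds with three decimal places, which fits a DECIMAL(17,3) column (the helper names are mine):

```python
from datetime import datetime, timezone
from decimal import Decimal

def to_decimal_ts(dt):
    """Datetime -> DECIMAL(17,3)-style value: epoch seconds with
    millisecond precision."""
    ms = round(dt.timestamp() * 1000)
    return Decimal(ms) / 1000

def from_decimal_ts(d):
    """DECIMAL(17,3)-style value -> timezone-aware datetime."""
    return datetime.fromtimestamp(float(d), tz=timezone.utc)

dt = datetime(2011, 8, 15, 13, 30, 1, 123000, tzinfo=timezone.utc)
d = to_decimal_ts(dt)
print(d)                        # 1313415001.123
assert from_decimal_ts(d) == dt
```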

The following user says Thank You to samurai for this post:
 
  #30 (permalink)
 gomi 
Paris
 
Experience: None
Platform: NinjaTrader
 
Posts: 1,248 since Oct 2009
Thanks: 275 given, 4,421 received

I remember that on SQLServer you can use table partitioning and automatically split your tables on your disks.

Also remember data rows are stored on disk in the order of the clustered index, so it must be chosen wisely, depending on your queries.

To add on performance issues, I'm not sure SQL is the way to go, it just eats up too much space if you use the basic data types. So you end up implementing your own binary database inside an SQL database, like NetTecture, and then you're using the SQL server more or less like a file system.

You could try the easy solution (brute insert in SQL, dunno what SQL server you will use but I'm sure they all have some sort of bulk copy for massive inserts), if you have performance issues you can always think of a more optimized solution.
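As a toy illustration of the batched-insert idea (using SQLite's executemany here only because it ships with Python - MySQL's multi-row INSERT or LOAD DATA INFILE plays the same role):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tick (
    contract_id INTEGER,
    ts          REAL,
    last        REAL,
    last_size   INTEGER)""")

ticks = [(1, 1313415001.123, 1231.25, 3),
         (1, 1313415001.456, 1231.50, 1)]

# One prepared statement, many rows, one transaction: far cheaper
# than committing each tick individually.
with conn:
    conn.executemany("INSERT INTO tick VALUES (?, ?, ?, ?)", ticks)

count, = conn.execute("SELECT COUNT(*) FROM tick").fetchone()
print(count)  # 2
```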

The following user says Thank You to gomi for this post:
 
  #31 (permalink)
 gomi 
Paris
 
Experience: None
Platform: NinjaTrader
 
Posts: 1,248 since Oct 2009
Thanks: 275 given, 4,421 received

Another idea is to use a NoSQL database. Cloud providers (Google, Amazon, Microsoft Azure) don't use SQL servers for their data; it's just too much overhead.

I might one day try to set up a tick database on my Azure account, could be fun

If you take a look at how the Table service is organized (there are also Blob and Queue services), it's
PartitionKey - RowKey - Object (max 64 kB; for more, use a link to a Blob object)

This structure is optimized to allow storage of billions of rows; your account can actually grow up to 100 TB of data, and if you never do a full scan and only query on PartitionKey and RowKey, performance is good.

So if you set PartitionKey = instrument name and RowKey = tick timestamp, you can easily build a database that will store billions of ticks, and on which you can run very speedy queries like "give me all the ticks on instrument xyz between startdate and enddate".

I'm pretty sure there are also desktop implementations of NoSQL data servers, you can check NOSQL Databases
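A toy in-memory sketch of that PartitionKey/RowKey layout in Python - one sorted row list per instrument, queried only by key range, never by full scan (all names are illustrative):

```python
from bisect import insort, bisect_left, bisect_right

# PartitionKey = instrument name, RowKey = tick timestamp.
store = {}

def insert(instrument, ts, price):
    """Add one tick, keeping each partition sorted by timestamp."""
    insort(store.setdefault(instrument, []), (ts, price))

def query_range(instrument, start, end):
    """All ticks on one instrument between start and end (inclusive):
    a pure key-range scan within a single partition."""
    rows = store.get(instrument, [])
    lo = bisect_left(rows, (start, float("-inf")))
    hi = bisect_right(rows, (end, float("inf")))
    return rows[lo:hi]

insert("ES", 1313415001.1, 1231.25)
insert("ES", 1313415002.2, 1231.50)
insert("YM", 1313415001.5, 11480.0)
print(query_range("ES", 1313415001.0, 1313415002.0))
# [(1313415001.1, 1231.25)]
```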

The following user says Thank You to gomi for this post:
 
  #32 (permalink)
sayfuji
Kiev
 
 
Posts: 6 since Mar 2011
Thanks: 0 given, 0 received

Trust my experience on this: for working with tick history, a SQL database is the better way. My team and I once built a system that needed to analyse historical tick data, and for our purposes SQL Server was the best fit (even better than Oracle).

  #33 (permalink)
MichaelE
Germany
 
 
Posts: 2 since Aug 2011
Thanks: 0 given, 0 received

I'm a software developer; I found this thread via Google and would like to share my knowledge, and hopefully also get information about other tick database systems and best practices. I'm currently developing a system that is intended to be commercial, so the plan is to buy the streams as well as history data from stock data vendors and pay license and service fees. My knowledge of handling stock data in IT systems is currently very limited, but my background in software architecture (including conventional relational database use cases) is strong.

Current plan:
Use a relational database for everything. The data is split into two tables: one key table and one table for the tick data. The key table contains references to other tables (stock id, stock exchange id, currency id) and provides a unique id that the tick table references. The tick table contains the price, bid, ask and the time; its PK is the reference to the key table plus the time field. The key table exists only to reduce the number of fields, and therefore the storage needed per tick (5NF).

All the analyses should be made on time frames like 1m, 5m, 1h, 1 day, and so on - not on a tick basis. For performance, end-of-day jobs aggregate the data and store the aggregates in separate tables. Views are used to combine the current day's data with the historical time-frame tables. The tick table is cleaned during the end-of-day job (which runs when the referenced stock closes) and the raw ticks are archived as blobs. I think this would improve performance a lot and keep the table space as small as possible, without having to implement blob handling for the analyses.
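The end-of-day rollup described above - aggregating raw ticks into time-frame bars - can be sketched in a few lines of Python (assuming ticks arrive in time order, so the first and last prices per bucket are the open and close):

```python
def ticks_to_bars(ticks, interval):
    """Aggregate (timestamp, price, size) ticks into OHLCV bars of
    `interval` seconds, keyed by the bar's start time."""
    bars = {}
    for ts, price, size in ticks:
        key = int(ts // interval) * interval
        bar = bars.get(key)
        if bar is None:
            bars[key] = [price, price, price, price, size]  # O H L C V
        else:
            bar[1] = max(bar[1], price)   # high
            bar[2] = min(bar[2], price)   # low
            bar[3] = price                # close = latest tick
            bar[4] += size                # volume
    return sorted((k, *v) for k, v in bars.items())

ticks = [(60.0, 10.0, 1), (90.0, 12.0, 2), (119.0, 11.0, 1), (120.0, 13.0, 1)]
print(ticks_to_bars(ticks, 60))
# [(60, 10.0, 12.0, 10.0, 11.0, 4), (120, 13.0, 13.0, 13.0, 13.0, 1)]
```

The same function rebuilds 1m, 5m, or any other interval from the same raw ticks, which is the argument for archiving ticks rather than only bars.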

For the stream handling I would buffer the incoming ticks. The buffer is flushed once 500 (or 5000) ticks have accumulated, or once 500 ms have passed. This reduces the transaction overhead.
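That buffering scheme - flush when a count threshold or an age threshold is hit, whichever comes first - might look like this (class and parameter names are mine):

```python
import time

class TickBuffer:
    """Collect incoming ticks and flush them in batches."""
    def __init__(self, flush_fn, max_ticks=500, max_age=0.5):
        self.flush_fn = flush_fn    # e.g. one bulk INSERT per batch
        self.max_ticks = max_ticks
        self.max_age = max_age      # seconds
        self.buf = []
        self.first_ts = None

    def add(self, tick, now=None):
        now = time.monotonic() if now is None else now
        if not self.buf:
            self.first_ts = now
        self.buf.append(tick)
        if len(self.buf) >= self.max_ticks or now - self.first_ts >= self.max_age:
            self.flush()

    def flush(self):
        if self.buf:
            self.flush_fn(self.buf)
            self.buf = []

batches = []
buf = TickBuffer(batches.append, max_ticks=3)
for i in range(7):
    buf.add(i, now=0.0)         # fixed clock, so only the count triggers
print(batches)  # [[0, 1, 2], [3, 4, 5]]
```

A real feed handler would also need a timer to flush a stale partial buffer when no new ticks arrive.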

@Mike
Don't use an ID per tick. I think performance would be very poor with this strategy, because the key-generation effort is too high and after years (or weeks) you would have to wrap the keys around once the limit is reached. Besides, it is unnecessary.

I would be deeply grateful for feedback from experienced users or developers.

  #34 (permalink)
 Big Mike 
Site Administrator
Swing Trader
Data Scientist & DevOps
Manta, Ecuador
 
Experience: Advanced
Platform: Custom solution
Trading: Futures & Crypto
 
Big Mike's Avatar
 
Posts: 50,068 since Jun 2009
Thanks: 32,534 given, 98,493 received


MichaelE View Post
All the analyses should be made on time frames like 1m, 5m, 1h, 1 day, and so on - not on a tick basis.

.
.
.

Don't use an ID per tick. I think performance would be very poor with this strategy, because the key-generation effort is too high and after years (or weeks) you would have to wrap the keys around once the limit is reached. Besides, it is unnecessary.

Tick based analysis is a requirement. The system needs to be good enough to capture the live data from the market, which is tick increment + bid/ask size.

Mike

  #35 (permalink)
MichaelE
Germany
 
 
Posts: 2 since Aug 2011
Thanks: 0 given, 0 received


Big Mike View Post
Tick based analysis is a requirement. The system needs to be good enough to capture the live data from the market, which is tick increment + bid/ask size.

I guess there is a misunderstanding. Why do you need an increment? It is unnecessary, because the time, together with the stock, stock exchange and currency, should be the PK. The total volume is also unnecessary, because it can easily be calculated from the quote sizes. (All my comments refer to your last screenshot.)

OK, if tick-based analysis is your requirement (it isn't mine), you must handle this with a big system. But to reduce the needed storage and improve performance I would increase the level of normalization and would also work with blobs for historic data.

  #36 (permalink)
Kostagr33k
Manhattan NY usa
 
 
Posts: 3 since Nov 2011
Thanks: 0 given, 0 received

BigMike,

Out of curiosity, did you get this working? If not, I may have time next week to talk about options.

Thanks

Kosta

  #37 (permalink)
 Big Mike 
Site Administrator
Swing Trader
Data Scientist & DevOps
Manta, Ecuador
 
Experience: Advanced
Platform: Custom solution
Trading: Futures & Crypto
 
Big Mike's Avatar
 
Posts: 50,068 since Jun 2009
Thanks: 32,534 given, 98,493 received


Kostagr33k View Post
BigMike,

Out of curiosity, did you get this working? If not, I may have time next week to talk about options.

Thanks

Kosta

I had it working over 2 years ago with NinjaTrader. I posted the code somewhere on the NT forums, but it was long ago and probably easier to start over.

Mike

  #38 (permalink)
Kostagr33k
Manhattan NY usa
 
 
Posts: 3 since Nov 2011
Thanks: 0 given, 0 received

Mike,

What I was trying to ask is whether you are still trying to figure out a more performant solution that would scale, or whether you decided not to move forward with the project at full scale?


kosta

  #39 (permalink)
 Big Mike 
Site Administrator
Swing Trader
Data Scientist & DevOps
Manta, Ecuador
 
Experience: Advanced
Platform: Custom solution
Trading: Futures & Crypto
 
Big Mike's Avatar
 
Posts: 50,068 since Jun 2009
Thanks: 32,534 given, 98,493 received


Kostagr33k View Post
Mike,

What I was trying to ask is whether you are still trying to figure out a more performant solution that would scale, or whether you decided not to move forward with the project at full scale?


kosta

It is in the back of my mind, but not something I am working on.

Mike

  #40 (permalink)
 Big Mike 
Site Administrator
Swing Trader
Data Scientist & DevOps
Manta, Ecuador
 
Experience: Advanced
Platform: Custom solution
Trading: Futures & Crypto
 
Big Mike's Avatar
 
Posts: 50,068 since Jun 2009
Thanks: 32,534 given, 98,493 received

Have documented entire process here, ongoing:



Mike

The following user says Thank You to Big Mike for this post:
 
  #41 (permalink)
philrose
Panama Panama
 
 
Posts: 1 since Jul 2012
Thanks: 0 given, 0 received


Big Mike View Post
Have documented entire process here, ongoing:

Mike

After completing your very strict registration process, I am still not able to see the post referenced above unless I become an "Elite" member.





Last Updated on July 20, 2012





Copyright © 2021 by futures io, s.a., Av Ricardo J. Alfaro, Century Tower, Panama, Ph: +507 833-9432 (Panama and Intl), +1 888-312-3001 (USA and Canada), info@futures.io
All information is for educational use only and is not investment advice.
There is a substantial risk of loss in trading commodity futures, stocks, options and foreign exchange products. Past performance is not indicative of future results.