[Other] Time Series database for tick and trade data. - Platforms and Indicators | futures.io


Time Series database for tick and trade data.
Started by: gregid · Views / Replies: 4,016 / 36 · Attachments: 2




  #21 (permalink)
Elite Member
Wrocław, Poland
 
Futures Experience: Intermediate
Platform: NinjaTrader, Racket
Favorite Futures: Ockham's razor
 
gregid's Avatar
 
Posts: 651 since Aug 2009
Thanks: 321 given, 601 received


Jasonnator View Post
Set up PostgreSQL according to the attachment and you're set.

Interesting - this definitely brings a battle-tested option to the table.

Have you personally had any experience with PostgreSQL as time series storage or are you considering this as an option?

Reply With Quote
 
  #22 (permalink)
Elite Member
Jacksonville, Florida United States
 
Futures Experience: Intermediate
Platform: Fully custom
Broker/Data: Optimus Futures, Interactive Brokers
Favorite Futures: Profitable ones
 
Jasonnator's Avatar
 
Posts: 67 since Dec 2014
Thanks: 18 given, 41 received

Hey Greg, been a while.

Yes, I have used PostgreSQL and it's what I've decided on for my custom platform. It crushed everything else I tested. It is truly enterprise grade. For a tick store, I couldn't find anything faster.
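To make the idea concrete, here is a minimal sketch of what a tick table and its typical query might look like. Jasonnator's actual schema is proprietary, so the table and column names below are purely hypothetical, and Python's built-in sqlite3 stands in for PostgreSQL so the example is self-contained:

```python
import sqlite3

# Hypothetical tick-store schema sketch. sqlite3 stands in for PostgreSQL
# so this runs without a server; the column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ticks (
        symbol TEXT    NOT NULL,
        ts_ns  INTEGER NOT NULL,   -- event time, epoch nanoseconds
        price  REAL    NOT NULL,
        size   INTEGER NOT NULL
    )
""")
# A composite index serves the common query: one symbol over a time range.
conn.execute("CREATE INDEX idx_ticks ON ticks (symbol, ts_ns)")

conn.executemany(
    "INSERT INTO ticks VALUES (?, ?, ?, ?)",
    [("ES", 1, 2100.25, 3), ("ES", 2, 2100.50, 1), ("CL", 1, 46.10, 2)],
)
rows = conn.execute(
    "SELECT ts_ns, price, size FROM ticks WHERE symbol = ? ORDER BY ts_ns",
    ("ES",),
).fetchall()
print(rows)  # [(1, 2100.25, 3), (2, 2100.5, 1)]
```

The composite `(symbol, ts_ns)` index is the key design choice: it turns the dominant backtesting query (all ticks for one instrument in one window) into a sequential index-range scan.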

Reply With Quote
The following 2 users say Thank You to Jasonnator for this post:
 
  #23 (permalink)
Elite Member
Wrocław, Poland
 
Futures Experience: Intermediate
Platform: NinjaTrader, Racket
Favorite Futures: Ockham's razor
 
gregid's Avatar
 
Posts: 651 since Aug 2009
Thanks: 321 given, 601 received



Jasonnator View Post
Hey Greg, been a while.

Yes, I have used PostgreSQL and it's what I've decided on for my custom platform. It crushed everything else I tested. It is truly enterprise grade. For a tick store, I couldn't find anything faster.



Excellent to hear and thanks for stopping by!

Reply With Quote
The following user says Thank You to gregid for this post:
 
  #24 (permalink)
StockSharp dev
Moscow, RU
 
Futures Experience: Advanced
Platform: StockSharp
Favorite Futures: ES
 
stocksharp's Avatar
 
Posts: 28 since Mar 2014
Thanks: 3 given, 12 received

I'll agree with @Jasonnator - PostgreSQL performs on par with the top database systems from IBM or Microsoft.

But the key advantage of "big" databases is that they provide flexible analytics and data-mining features, especially with big data. It is not enough just to be fast - all databases have roughly the same performance for primitive queries. The system should offer something more.

The good news: for trading, those features are almost useless. Really. Almost all operations are selecting a batch (for example, for backtesting), fixing bad data, or appending the newest rows. That is all!

In our app Hydra we started with a database (SQL Express as the underlying storage), but it showed extremely poor performance compared with the file system. Maybe it would be faster with features like database clustering, but at what price?

Another major question is who will use the trading app. For a company, it is no problem to have a dedicated team maintain the database. For a one-man band, a human-readable format like CSV can give odds to any DBMS - easy to read, supported by Excel, easy to load into any programming language, easy to modify (just open it in Notepad and move on).

Hope this helps.
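The one-man-band CSV approach described above can be sketched in a few lines. The field names here are illustrative assumptions, not Hydra's actual format:

```python
import csv
import io

# Illustrative "one-man band" storage: plain CSV, readable in Excel or
# Notepad. Field names are hypothetical, not any particular app's format.
ticks = [
    {"time": "2016-11-04T14:30:00.125", "price": "2100.25", "volume": "3"},
    {"time": "2016-11-04T14:30:00.250", "price": "2100.50", "volume": "1"},
]

# Writing: one header row, then one row per tick.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["time", "price", "volume"])
writer.writeheader()
writer.writerows(ticks)

# Loading it back is one call in any language with a CSV reader.
loaded = list(csv.DictReader(io.StringIO(buf.getvalue())))
print(loaded[0]["price"])  # 2100.25
```

In a real setup the `io.StringIO` buffer would be a file per instrument per day, which also makes partial loading trivial: you only open the files for the dates you need.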

Reply With Quote
The following 3 users say Thank You to stocksharp for this post:
 
  #25 (permalink)
Trading Apprentice
Cookeville, TN
 
Futures Experience: Advanced
Platform: R|Trader, Thinkorswim
Favorite Futures: CL, RB, 6E, ZB, DX
 
Posts: 36 since Oct 2011
Thanks: 9 given, 12 received


stocksharp View Post
I'll agree with @Jasonnator - PostgreSQL performs on par with the top database systems from IBM or Microsoft.

But the key advantage of "big" databases is that they provide flexible analytics and data-mining features, especially with big data. It is not enough just to be fast - all databases have roughly the same performance for primitive queries. The system should offer something more.

The good news: for trading, those features are almost useless. Really. Almost all operations are selecting a batch (for example, for backtesting), fixing bad data, or appending the newest rows. That is all!

Hello stocksharp - just to confirm: are you saying that PostgreSQL is useless for financial market data because of read-operation limitations?

Reply With Quote
 
  #26 (permalink)
Trading Apprentice
Cookeville, TN
 
Futures Experience: Advanced
Platform: R|Trader, Thinkorswim
Favorite Futures: CL, RB, 6E, ZB, DX
 
Posts: 36 since Oct 2011
Thanks: 9 given, 12 received


Jasonnator View Post
Hey Greg, been a while.

Yes, I have used PostgreSQL and it's what I've decided on for my custom platform. It crushed everything else I tested. It is truly enterprise grade. For a tick store, I couldn't find anything faster.

Hi Jason, can you elaborate a bit on your data request schema as well as current storage capacity req'd and where it's stored (home PC and/or cloud e.g. AWS)?

Also, if you're using a home server, what's your hardware specs (HDD capacity total, RAM, CPU)?

My developer for a backtesting & live-trading bot has elected to use PostgreSQL as well, per my indication that I'd like to store 3 months' worth of tick-level data across 30 U.S. futures instruments (plus 8-10 years of 1-hour OHLC bars). I figured <10TB of total storage capacity would be needed, but the real cost (>$300/mo) shows up due to the monthly up/down transfer quota between a colo'd server and a cloud solution like AWS. So now we're looking toward a home server.
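As a sanity check on a figure like <10TB, a back-of-envelope estimate is easy to run. The ticks-per-day rate and bytes-per-tick below are assumed round numbers for illustration, not measured values for these instruments:

```python
# Back-of-envelope storage estimate for 3 months of tick data across 30
# instruments. ticks_per_day and bytes_per_tick are assumptions chosen
# for illustration, not measurements.
instruments = 30
trading_days = 63            # roughly 3 months of sessions
ticks_per_day = 2_000_000    # assumed average for a liquid futures contract
bytes_per_tick = 40          # assumed: timestamp + price + size + overhead

total_bytes = instruments * trading_days * ticks_per_day * bytes_per_tick
total_tb = total_bytes / 1e12
print(f"{total_tb:.2f} TB")  # 0.15 TB -- far under a 10 TB budget
```

Under these assumptions raw tick storage is a small fraction of 10TB, which supports the point that transfer quotas, not disk capacity, tend to dominate the cloud bill.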


Last edited by ClutchAce; November 5th, 2016 at 03:47 PM. Reason: added detail
Reply With Quote
 
  #27 (permalink)
Elite Member
San Francisco, CA
 
Futures Experience: Advanced
Platform: SC, eSignal
Broker/Data: IB
Favorite Futures: Spreads
 
Posts: 44 since Jan 2015
Thanks: 43 given, 36 received


stocksharp View Post
I'll agree with @Jasonnator - PostgreSQL performs on par with the top database systems from IBM or Microsoft.

But the key advantage of "big" databases is that they provide flexible analytics and data-mining features, especially with big data. It is not enough just to be fast - all databases have roughly the same performance for primitive queries. The system should offer something more.

This is not completely accurate. Databases like Berkeley DB (BDB) are specifically about starting with the primitives, designing the underlying record format, and implementing everything around that (including the level of abstraction needed and even if relational type support is needed). Typical RDBMS like Postgres, Oracle, or MySQL have that design baked in based on the underlying storage formats provided by the software.

This is important for things like tick data, where storage per tick matters a great deal and has a direct bearing on performance (the more data that fits in memory, and the less backing store that has to be seeked, the better).
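The "design your own record format" idea can be illustrated with a fixed-width binary layout. The field layout here is hypothetical, just to show how few bytes a tick can take when you control the format:

```python
import struct

# Fixed-width binary record for one tick -- the kind of compact,
# self-designed layout the post describes. Layout is hypothetical:
# < = little-endian (no padding), q = int64 timestamp in ns,
# d = float64 price, I = uint32 size.
TICK = struct.Struct("<qdI")

packed = TICK.pack(1478270400_000_000_000, 2100.25, 3)
print(TICK.size)  # 20 bytes per tick

ts, price, size = TICK.unpack(packed)
print(ts, price, size)  # 1478270400000000000 2100.25 3
```

At 20 bytes per tick, a day of a liquid contract fits comfortably in RAM, and a key-value store like Berkeley DB can treat each record as an opaque fixed-size value with no per-row relational overhead.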

Reply With Quote
 
  #28 (permalink)
Elite Member
Jacksonville, Florida United States
 
Futures Experience: Intermediate
Platform: Fully custom
Broker/Data: Optimus Futures, Interactive Brokers
Favorite Futures: Profitable ones
 
Jasonnator's Avatar
 
Posts: 67 since Dec 2014
Thanks: 18 given, 41 received


ClutchAce View Post
Hi Jason, can you elaborate a bit on your data request schema as well as current storage capacity req'd and where it's stored (home PC and/or cloud e.g. AWS)?

Also, if you're using a home server, what's your hardware specs (HDD capacity total, RAM, CPU)?

My developer for a backtesting & live trading bot has elected to use PostgreSQL as well, per my indication that I'd like to store tick-level data for 3 months' worth across 30 U.S. futures instruments (plus 8-10 years of 1-hour OHLC bars). I figured <10TB of total storage capacity needed, but the real cost (>$300/mo) shows up due the total monthly up/down transfer quota between a colo'd server and a cloud solution like AWS. So, now we're looking toward a home server.

I'll apologize in advance because my answer will be somewhat vague, but that's due to a proprietary schema which I cannot disclose. I will say that I tested everything from an ancient Q6600 to an i7 2600K to some fairly esoteric Xeon systems. Surprisingly, we learned the most from the much older and slower systems. New hardware is just so blazing fast that some optimizations are not as easily discernible. Obviously, certain metrics must be interpreted in the context of system limitations and bottlenecks.

A solid understanding of instructions, clock cycles, fetching, prefetching, CPU registers, etc. is what is really required to truly optimize something like a tick data store. Forget managed-language access; you'll leave 30%+ on the table before even hitting your database or using what you get back from it. Unmanaged calls, pointers, pinning, and advanced performance-oriented data structures are where the true speed comes from. Otherwise, you'll give back most, if not all, of anything you gain from database optimization. So essentially, the database can be tuned to the hilt, but the way you load and use that data must be extremely efficient and fairly low-level when you're dealing with huge amounts of data like a tick store.

I hope that helps and is clearer than, say, mud. I wish you and your devs the best of luck. This is no one-man show when done right, but the capabilities it'll yield break all restraints and allow quant-level strategy development and testing. I have no doubt there are other ways (perhaps even better) of accomplishing a tick storage system. I am, however, offering insight on how I did it in an actual enterprise-grade implementation, not just based on a white paper or opinion.

Oh, one point I didn't address is my usage. I actually run my tick storage locally instead of in the cloud for maximum performance: I don't want to be limited by my Internet connection (even at 1 Gbps). My RAID setup is optimized for read operations, so I never wait more than a matter of seconds for as much tick data as I need for testing.

Reply With Quote
The following user says Thank You to Jasonnator for this post:
 
  #29 (permalink)
Elite Member
Jacksonville, Florida United States
 
Futures Experience: Intermediate
Platform: Fully custom
Broker/Data: Optimus Futures, Interactive Brokers
Favorite Futures: Profitable ones
 
Jasonnator's Avatar
 
Posts: 67 since Dec 2014
Thanks: 18 given, 41 received


ClutchAce View Post
Hello stocksharp - so just to confirm, you're stating that PostgreSQL is useless for financial market data, because of read operation limitations?

@stocksharp brings up a great point: you must consider the end user. Do you need multiple concurrent connections, or is it just you? If it's just you, I'd agree - go with a more human-readable format like CSV or JSON.

As far as operations performance, that is extremely (I'd argue entirely) dependent on design. For example, one of my benchmarks was loading AND replaying the entire NASDAQ 100 plus the top 6 futures contracts, every single tick, for 4 hours during peak volume (before Chicago lunch). I am well under 10 seconds on my run-of-the-mill i7 2600K with 32GB RAM and two SSDs in RAID 0. That may not sound all that great, but my system is not overclocked and this is just under 5GB of data. It also includes numerous logic checks to ensure data integrity and proper sequence. The best part was that this pipeline scaled 1:1 with system performance: on a system that was 50% slower, performance was halved; on systems 5-10x faster than my workstation, it was 5-10x faster. Design, design, design. It'll "SHOW YOU THE MONEY!!!"
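The "logic checks to ensure data integrity and proper sequence" part of such a replay pipeline might look like the following sketch. The record format and check rules here are assumptions for illustration, not Jasonnator's actual implementation:

```python
# Sketch of integrity checks a tick-replay pipeline might run while
# streaming: timestamps must be non-decreasing and sequence numbers
# contiguous. The (ts, seq, price, size) record format is hypothetical.
def replay(ticks):
    last_ts, last_seq = float("-inf"), 0
    for ts, seq, price, size in ticks:
        if ts < last_ts:
            raise ValueError(f"out-of-order timestamp at seq {seq}")
        if seq != last_seq + 1:
            raise ValueError(f"sequence gap: expected {last_seq + 1}, got {seq}")
        last_ts, last_seq = ts, seq
        yield price * size  # hand each validated tick off to strategy logic

stream = [(100, 1, 2100.25, 2), (100, 2, 2100.50, 1), (101, 3, 2100.50, 4)]
notional = sum(replay(stream))
print(notional)  # 14703.0
```

Because `replay` is a generator, the checks run tick-by-tick as data streams through, so a corrupt or gapped file fails fast instead of silently skewing a backtest.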

Reply With Quote
The following user says Thank You to Jasonnator for this post:
 
  #30 (permalink)
Elite Member
Manchester, NH
 
Futures Experience: Beginner
Platform: thinkorswim
Broker/Data: TD Ameritrade
Favorite Futures: Stocks
 
Posts: 882 since Jul 2012
Thanks: 595 given, 1,761 received


I'm going to address a few random things brought up in this thread.

1. JSON
Definitely don't. JSON is the only serialization format that I would absolutely recommend against.

JSON doesn't have explicit typing (hence no bit alignment), which inflates your storage requirements and makes compression ineffective. Parsing JSON is clunky and slow in most languages (C++, Python, bash). There's no clearly defined standard, so you have to specify properties in an application-specific header format, which is troublesome to maintain. And because of the way it accepts nested braces, writing a streaming application that consumes JSON downstream is a pain, though not impossible. JSON is an OK serialization format for configs, but not for market data.
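The storage-inflation point is easy to demonstrate by serializing one tick both ways. The field names and binary layout below are illustrative assumptions:

```python
import json
import struct

# Rough size comparison for one tick as JSON text vs a fixed-width
# binary record. Field names and layout are illustrative only.
tick = {"ts": 1478270400000, "price": 2100.25, "size": 3}

as_json = json.dumps(tick).encode()
as_binary = struct.pack("<qdI", tick["ts"], tick["price"], tick["size"])

print(len(as_json), len(as_binary))  # 50 20
```

The JSON form repeats every field name in every record and stores numbers as decimal text, so the ratio gets worse, not better, as precision grows; the binary record stays a constant 20 bytes.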

2. TeaFiles vs CSV vs Feather vs Mongo (Arctic) vs Influx vs Postgres vs MS SQL...
This is the most useless debate ever. You can't go wrong with anything (except JSON) when you're talking about OHLC data for <500 symbols, even going back 80 years. For example, the CRSP database stores 80-90 years of OHLC data on 10,000+ symbols and does its job entirely on a relational DBMS and commodity hardware. It's not fast, but it gets the job done. Focus on ease of use and familiarity rather than performance or storage requirements.

It costs ~$1k to get a PCIe storage device that bumps even the most naively designed SQL database into the 10^5 TPS region. By contrast, it takes weeks of effort to design a scalable schema and do some CI to collect data points on which indices you want, and weeks more to install an exotic DBMS you've never used before, familiarize yourself with its driver/DDL/QL, and configure it for production use. If you have the extra cash to be trading, surely your income flow is such that a few weeks of your time is worth more than $1k to you.

The time I'd start thinking about performance or storage optimizations is when (1) you are clearly I/O bound and (2) you have multiple hosts consuming the data at the same time. That means even if you have a single 40+ core workstation, it's still too early to start thinking about optimizations.


Last edited by artemiso; November 6th, 2016 at 01:09 AM.
Reply With Quote
The following 2 users say Thank You to artemiso for this post:






