NexusFi: Find Your Edge


Time Series database for tick and trade data.


Discussion in Platforms and Indicators

      Top Posters
    1. gregid with 9 posts (10 thanks)
    2. Jasonnator with 7 posts (8 thanks)
    3. stocksharp with 4 posts (4 thanks)
    4. ClutchAce with 3 posts (0 thanks)
      Best Posters
    1. artemiso with 1.7 thanks per post
    2. gregid with 1.1 thanks per post
    3. Jasonnator with 1.1 thanks per post
    4. stocksharp with 1 thanks per post
      Thread stats: 20,924 views · 35 thanks given · 25 followers · 40 posts · 2 attachments




 

  #21 (permalink)
 
 gregid 
Wrocław, Poland
 
Experience: Intermediate
Platform: NinjaTrader, Racket
Trading: Ockham's razor
Posts: 650 since Aug 2009
Thanks Given: 320
Thanks Received: 623


Jasonnator View Post
Set up PostgreSQL according to the attachment and you're set.

Interesting, this definitely brings a battle-tested option to the table.

Have you personally had any experience with PostgreSQL as time series storage or are you considering this as an option?


  #22 (permalink)
 
 Jasonnator 
Denver, Colorado United States
 
Experience: Intermediate
Platform: NT8 + Custom
Broker: NT Brokerage, Kinetick, IQFeed, Interactive Brokers
Trading: ES
Posts: 159 since Dec 2014
Thanks Given: 40
Thanks Received: 166

Hey Greg, been a while.

Yes, I have used PostgreSQL and it's what I've decided on for my custom platform. It crushed everything else I tested. It is truly enterprise grade. For a tick store, I couldn't find anything faster.
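A minimal sketch of what a PostgreSQL tick store along these lines could look like; the table layout, connection string, and use of psycopg2 are illustrative assumptions rather than the schema described above (which is proprietary):

```python
# Hypothetical sketch only -- the actual schema discussed in this thread is proprietary.
# Assumes a local PostgreSQL instance and the psycopg2 driver.
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=ticks user=postgres")  # connection details assumed
cur = conn.cursor()

# One narrow row per tick; a composite index supports the typical
# "give me symbol X between t0 and t1" backtest query.
cur.execute("""
    CREATE TABLE IF NOT EXISTS tick (
        symbol text             NOT NULL,
        ts     timestamptz      NOT NULL,
        price  double precision NOT NULL,
        size   integer          NOT NULL
    );
    CREATE INDEX IF NOT EXISTS tick_symbol_ts ON tick (symbol, ts);
""")

# Batched inserts (here via execute_values) matter far more than row-by-row
# INSERTs when appending millions of ticks per session.
rows = [("ESZ19", "2019-05-20 09:30:00.123-05", 2855.25, 3)]
execute_values(cur, "INSERT INTO tick (symbol, ts, price, size) VALUES %s", rows)
conn.commit()
```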

  #23 (permalink)
 
 gregid 
Wrocław, Poland
 
Experience: Intermediate
Platform: NinjaTrader, Racket
Trading: Ockham's razor
Posts: 650 since Aug 2009
Thanks Given: 320
Thanks Received: 623



Jasonnator View Post
Hey Greg, been a while.

Yes, I have used PostgreSQL and it's what I've decided on for my custom platform. It crushed everything else I tested. It is truly enterprise grade. For a tick store, I couldn't find anything faster.



Excellent to hear and thanks for stopping by!

  #24 (permalink)
 
 stocksharp 
Moscow, RU
 
Experience: Advanced
Platform: StockSharp
Trading: ES
Posts: 38 since Mar 2014
Thanks Given: 3
Thanks Received: 13

I'll agree with @Jasonnator - PostgreSQL has about the same performance as the top database systems from IBM or MS.

But the key advantage of the "big" databases is that they provide flexible analytics and data-mining features, especially with big data. It is not enough just to build a fast system - all databases perform about the same on primitive queries. The system should offer something more.

The good news: for trading, that is almost all useless. No, really. Most operations are batch selects (for example, to run a backtest), updates to fix bad (toxic) data, or appending the newest rows. That is all!

In our app Hydra we started with a database (SQL Express as the underlying storage), but it showed extremely poor performance compared with the file system. Maybe it would be faster with features like database clustering, but at what price?

Another major question is who will use the trading app. For a company, it is no problem to have the database maintained by a dedicated team. For a one-man band, a human-readable format like CSV can give odds to any DBMS: easy to read, supported by Excel, easy to load into any programming language, easy to modify (just open it in Notepad and go).

Hope this helps.
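A sketch of how simple the one-man-band CSV route described above can be; the file name, column layout, and use of pandas are assumptions made for the example:

```python
# Sketch of the "one-man band" CSV route. One file per symbol per day with a
# header row is assumed; none of these names come from the thread.
import pandas as pd

ticks = pd.read_csv(
    "ES_2019-05-20.csv",                      # hypothetical file name
    parse_dates=["timestamp"],
    dtype={"price": "float64", "size": "int32"},
)

# The typical trading workload: grab a batch for a backtest window...
window = ticks[(ticks["timestamp"] >= "2019-05-20 09:30") &
               (ticks["timestamp"] <  "2019-05-20 13:30")]

# ...or append the newest rows at the end of the session.
new_rows = pd.DataFrame([{"timestamp": "2019-05-20 15:59:59.950",
                          "price": 2855.25, "size": 2}])
new_rows.to_csv("ES_2019-05-20.csv", mode="a", header=False, index=False)
```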

  #25 (permalink)
 ClutchAce 
Cookeville, TN
 
Experience: Advanced
Platform: Sierra Chart, IB, Python
Trading: NQ, DAX, TOPIX
Frequency: Daily
Duration: Hours
Posts: 71 since Oct 2011
Thanks Given: 18
Thanks Received: 21


stocksharp View Post
I'll agree with @Jasonnator - PostgreSQL has about the same performance as the top database systems from IBM or MS.

But the key advantage of the "big" databases is that they provide flexible analytics and data-mining features, especially with big data. It is not enough just to build a fast system - all databases perform about the same on primitive queries. The system should offer something more.

The good news: for trading, that is almost all useless. No, really. Most operations are batch selects (for example, to run a backtest), updates to fix bad (toxic) data, or appending the newest rows. That is all!

Hello stocksharp - so just to confirm, you're stating that PostgreSQL is useless for financial market data, because of read operation limitations?

Reply With Quote
  #26 (permalink)
 ClutchAce 
Cookeville, TN
 
Experience: Advanced
Platform: Sierra Chart, IB, Python
Trading: NQ, DAX, TOPIX
Frequency: Daily
Duration: Hours
Posts: 71 since Oct 2011
Thanks Given: 18
Thanks Received: 21


Jasonnator View Post
Hey Greg, been a while.

Yes, I have used PostgreSQL and it's what I've decided on for my custom platform. It crushed everything else I tested. It is truly enterprise grade. For a tick store, I couldn't find anything faster.

Hi Jason, can you elaborate a bit on your data request schema, as well as the current storage capacity required and where it's stored (home PC and/or cloud, e.g. AWS)?

Also, if you're using a home server, what are your hardware specs (total HDD capacity, RAM, CPU)?

My developer for a backtesting & live trading bot has elected to use PostgreSQL as well, per my indication that I'd like to store 3 months' worth of tick-level data across 30 U.S. futures instruments (plus 8-10 years of 1-hour OHLC bars). I figured <10TB of total storage capacity would be needed, but the real cost (>$300/mo) shows up due to the total monthly up/down transfer quota between a colo'd server and a cloud solution like AWS. So now we're looking toward a home server.
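A quick back-of-envelope check on that storage figure; the per-tick record size and tick rates below are assumed round numbers, not measurements from the thread:

```python
# Back-of-envelope check on the storage estimate above. The per-tick record
# size and average tick rates are assumptions chosen for illustration.
BYTES_PER_TICK = 32          # e.g. timestamp + price + size + flags, uncompressed
TICKS_PER_DAY  = 2_000_000   # rough average for an active US futures contract
INSTRUMENTS    = 30
TRADING_DAYS   = 63          # ~3 months

tick_bytes = BYTES_PER_TICK * TICKS_PER_DAY * INSTRUMENTS * TRADING_DAYS
print(f"tick data: {tick_bytes / 1e9:.1f} GB")       # ~121 GB

# 10 years of 1-hour OHLC bars is tiny by comparison.
BARS_PER_DAY  = 23           # nearly-24h futures session
BYTES_PER_BAR = 48
bar_bytes = BYTES_PER_BAR * BARS_PER_DAY * 252 * 10 * INSTRUMENTS
print(f"hourly bars: {bar_bytes / 1e6:.1f} MB")      # ~83 MB
```

Under these assumptions the raw data lands in the hundreds of gigabytes rather than terabytes, which is why the transfer quota, not the disk, tends to be the real cost driver.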

  #27 (permalink)
 i960 
San Francisco, CA
 
Experience: Advanced
Platform: SC, eSignal
Broker: IB
Trading: Spreads
Posts: 46 since Jan 2015
Thanks Given: 44
Thanks Received: 43


stocksharp View Post
I'll agree with @Jasonnator - PostgreSQL has about the same performance as the top database systems from IBM or MS.

But the key advantage of the "big" databases is that they provide flexible analytics and data-mining features, especially with big data. It is not enough just to build a fast system - all databases perform about the same on primitive queries. The system should offer something more.

This is not completely accurate. Databases like Berkeley DB (BDB) are specifically about starting with the primitives, designing the underlying record format, and implementing everything around that (including the level of abstraction needed, and even whether relational-type support is needed at all). A typical RDBMS like Postgres, Oracle, or MySQL has that design baked in, based on the underlying storage formats the software provides.

This matters for things like tick data, where storage per tick is of pretty big importance and has a direct bearing on performance (the more data that can fit in memory and/or the less backing store that has to be seeked, the better).
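A small illustration of the storage-per-tick point; the field layout is an assumption, but it shows why a fixed-width record keeps data compact and seekable:

```python
# Illustration of the record-size point: a fixed-width binary tick record.
# The field layout here is an assumption, not any poster's actual format.
import struct

# int64 ns-since-epoch timestamp, float64 price, int32 size, int32 flags = 24 bytes
TICK = struct.Struct("<qdii")

packed = TICK.pack(1558361400123456789, 2855.25, 3, 0)
print(len(packed))            # 24 bytes per tick, fixed width

# Fixed width means record N sits at byte offset N * 24, so a time range can
# be reached by seeking instead of parsing everything before it.
ts, price, size, flags = TICK.unpack(packed)
```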

  #28 (permalink)
 
 Jasonnator 
Denver, Colorado United States
 
Experience: Intermediate
Platform: NT8 + Custom
Broker: NT Brokerage, Kinetick, IQFeed, Interactive Brokers
Trading: ES
Posts: 159 since Dec 2014
Thanks Given: 40
Thanks Received: 166


ClutchAce View Post
Hi Jason, can you elaborate a bit on your data request schema, as well as the current storage capacity required and where it's stored (home PC and/or cloud, e.g. AWS)?

Also, if you're using a home server, what are your hardware specs (total HDD capacity, RAM, CPU)?

My developer for a backtesting & live trading bot has elected to use PostgreSQL as well, per my indication that I'd like to store 3 months' worth of tick-level data across 30 U.S. futures instruments (plus 8-10 years of 1-hour OHLC bars). I figured <10TB of total storage capacity would be needed, but the real cost (>$300/mo) shows up due to the total monthly up/down transfer quota between a colo'd server and a cloud solution like AWS. So now we're looking toward a home server.

I'll apologize in advance because my answer will be somewhat vague, but that's due to a proprietary schema which I cannot disclose. I will say that I did testing on everything from an ancient Q6600 to an i7 2600K to some fairly esoteric Xeon systems. Surprisingly, we learned the most from the much older and slower systems. The new hardware is just so blazingly fast that some optimizations are not as easily discernible. Obviously, certain metrics must be interpreted in the context of system limitations and bottlenecks.

A solid understanding of instructions, clock cycles, fetching, pre-fetching, CPU registers, etc. is what is really required to truly optimize something like a tick data store. Forget managed-language access: you'll leave 30%+ on the table before you even hit your database or use what you get back from it. Unmanaged calls, pointers, pinning, and advanced performance-oriented data structures are where the true speed comes from. Otherwise you'll give back most, if not all, of anything you gain from database optimization. So essentially, the database can be tuned to the hilt, but the way you load and use that data must be extremely efficient and fairly low level when you're dealing with huge amounts of data like a tick store.

I hope that helps and is clearer than, say, mud. I wish you and your devs the best of luck. This is no one-man show when done right, but the capabilities it yields break all restraints and allow quant-level strategy development and testing. I have no doubt there are other ways (perhaps even better) of accomplishing a tick storage system. I am, however, offering insight on how I did it in an actual enterprise-grade implementation, not just based on a white paper or opinion.

Oh, one point I didn't address is my usage. I actually run my tick storage locally instead of in the cloud for maximum performance. I don't want to be limited by my Internet connection (even at 1Gbps). My RAID setup is optimized for read operations, so I never wait more than a matter of seconds for as much tick data as I need for testing.
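The unmanaged-code advice above is .NET-specific; a rough Python analogue of keeping per-tick object overhead out of the hot path is a memory-mapped structured array, sketched below (the file name and dtype are assumptions):

```python
# Rough Python analogue of "don't pay managed/object overhead per tick":
# a memory-mapped numpy array over a fixed-width binary tick file.
import numpy as np

tick_dtype = np.dtype([("ts", "<i8"), ("price", "<f8"), ("size", "<i4"), ("flags", "<i4")])

# No per-tick objects are created; the OS pages data in as slices are touched.
ticks = np.memmap("ES_ticks.bin", dtype=tick_dtype, mode="r")  # hypothetical file

# Vectorized filtering instead of a per-row loop keeps the hot path out of
# interpreted code -- the same idea as staying out of managed allocations.
morning = ticks[(ticks["ts"] >= 1_558_359_000_000_000_000) &
                (ticks["ts"] <  1_558_373_400_000_000_000)]
print(morning["price"].mean() if len(morning) else "no ticks in window")
```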

  #29 (permalink)
 
 Jasonnator 
Denver, Colorado United States
 
Experience: Intermediate
Platform: NT8 + Custom
Broker: NT Brokerage, Kinetick, IQFeed, Interactive Brokers
Trading: ES
Posts: 159 since Dec 2014
Thanks Given: 40
Thanks Received: 166


ClutchAce View Post
Hello stocksharp - so just to confirm, you're stating that PostgreSQL is useless for financial market data, because of read operation limitations?

@stocksharp brings up a great point: you must consider the end user. Do you need multiple concurrent connections? Is it just you? If it's just you, I'd agree with going with a more human-readable format like CSV or JSON.

As far as operations performance goes, that is extremely (I'd argue entirely) dependent on design. For example, one of my benchmarks was loading AND replaying the entire NASDAQ 100 plus the top 6 futures contracts, every single tick, for 4 hours during peak volume (before the Chicago lunch). I come in well under 10 seconds on my run-of-the-mill i7 2600K, 32GB RAM, and 2 SSDs in RAID 0. That may not sound all that great, but my system is not overclocked and this is just under 5GB of data. This also includes numerous logic checks to ensure data integrity and proper sequence. The best part is that this pipeline scaled 1:1 with system performance: on a system that was 50% slower, performance was half; on systems 5-10x faster than my workstation, it was 5-10x faster. Design, design, design. It'll "SHOW YOU THE MONEY!!!"
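The exact integrity checks aren't described, so the following is only a generic sketch of the kind of sequence and sanity checking a tick replay might run:

```python
# Generic sketch of replay-time integrity checks; the specific checks used in
# the benchmark above aren't published, so these are illustrative examples.
def check_tick(prev, tick):
    """Return a list of problems found for this tick relative to the previous one."""
    problems = []
    if prev is not None:
        if tick["ts"] < prev["ts"]:
            problems.append("timestamp went backwards")
        if tick["seq"] != prev["seq"] + 1:
            problems.append("sequence gap")
        if prev["price"] and abs(tick["price"] / prev["price"] - 1) > 0.05:
            problems.append("suspicious >5% jump between consecutive ticks")
    if tick["size"] <= 0:
        problems.append("non-positive size")
    return problems

def replay(ticks, on_tick):
    prev = None
    for tick in ticks:                 # ticks: iterable of dicts, oldest first
        errs = check_tick(prev, tick)
        if errs:
            raise ValueError(f"bad tick {tick}: {errs}")
        on_tick(tick)                  # hand the tick to the strategy/backtester
        prev = tick
```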

  #30 (permalink)
 artemiso 
New York, NY
 
Experience: Beginner
Platform: Vanguard 401k
Broker: Yahoo Finance
Trading: Mutual funds
Posts: 1,152 since Jul 2012
Thanks Given: 784
Thanks Received: 2,685


I'm going to address a few random things brought up in this thread.

1. JSON
Definitely don't. JSON is the only serialization format that I would absolutely recommend against.

JSON doesn't have explicit typing (hence no bit alignment), which inflates your storage requirements and makes compression ineffective. Parsing JSON is clunky and slow in most languages (C++, Python, bash). There's no clearly defined standard, so you have to specify properties in an application-specific header format, which is troublesome to maintain. And because of the way it accepts nested braces, writing a streaming application that consumes JSON downstream is a pain, though not impossible. JSON is an OK serialization format for configs, but not for market data. (A rough size comparison is sketched below.)

2. TeaFiles vs CSV vs Feather vs Mongo (Arctic) vs Influx vs Postgres vs MS SQL...
This is the most useless debate ever. You can't go wrong with anything (except JSON) when you're talking about OHLC data for <500 symbols, even going back 80 years. For example, the CRSP database stores 80-90 years of OHLC data on 10,000+ symbols and does its job entirely on a relational DBMS and commodity hardware. It's not fast, but it gets the job done. Focus on ease of use and familiarity rather than performance or storage requirements.

It costs ~$1k to get a PCIe storage device that bumps even the most naively designed SQL database into the 10^5 TPS region. By contrast, it takes weeks of effort to design a scalable schema and do some CI to collect data points on which indices you want. It takes weeks of effort to install an exotic DBMS that you've never used before, familiarize yourself with its driver/DDL/QL, and configure it for production use. If you even have the extra cash to be trading, surely you have a reasonable enough flow of income that a few weeks of your time is worth more than $1k to you.

The time I'd start thinking about performance or storage optimizations is when (1) you are clearly I/O bound and (2) you have multiple hosts consuming the data at the same time. That means even if you have a single 40+ core workstation, it's still too early to start thinking about optimizations.
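As a rough illustration of the JSON point in item 1 above, here is the same tick serialized as JSON text versus a fixed-width binary record (field names and layout are invented for the demo):

```python
# Rough illustration of the JSON overhead point: one tick as JSON text vs. a
# fixed-width binary record. Field names/layout are made up for this demo.
import json
import struct

tick = {"symbol": "ESZ19", "ts": 1558361400123456789, "price": 2855.25, "size": 3}

as_json = json.dumps(tick).encode()
as_binary = struct.pack("<qdi", tick["ts"], tick["price"], tick["size"])  # symbol kept per-file

print(len(as_json), len(as_binary))   # roughly 75 bytes of text vs 20 bytes, before compression
```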





Last Updated on May 21, 2019

