Time Series database for tick and trade data.


Discussion in Platforms and Indicators

Thread stats: 20,946 views · 35 thanks given · 25 followers · 40 posts · 2 attachments

Top Posters
1. gregid - 9 posts (10 thanks)
2. Jasonnator - 7 posts (8 thanks)
3. stocksharp - 4 posts (4 thanks)
4. artemiso - 3 posts (5 thanks)

Best Posters (thanks per post)
1. artemiso - 1.7
2. gregid - 1.1
3. Jasonnator - 1.1
4. stocksharp - 1.0




 

#31
 
 gregid 
Wrocław, Poland
 
Experience: Intermediate
Platform: NinjaTrader, Racket
Trading: Ockham's razor
Posts: 650 since Aug 2009
Thanks Given: 320
Thanks Received: 623


Quoting artemiso:
JSON is the only serialization format that I would absolutely recommend against.

Not trying to be facetious, but you can still do even worse than JSON... ehmm... YAML...


#32
 
 stocksharp 
Moscow, RU
 
Experience: Advanced
Platform: StockSharp
Trading: ES
Posts: 38 since Mar 2014
Thanks Given: 3
Thanks Received: 13


Quoting ClutchAce:
I figured <10TB of total storage capacity needed, but the real cost (>$300/mo) shows up due to the total monthly up/down transfer quota between a colo'd server and a cloud solution like AWS.

Keeping all the data in the same region (and using it during off-peak hours) will decrease the total price. Data transfer within a single region should be free.

#33
 
 stocksharp 
Moscow, RU
 
Experience: Advanced
Platform: StockSharp
Trading: ES
Posts: 38 since Mar 2014
Thanks Given: 3
Thanks Received: 13



Quoting ClutchAce:
Hello stocksharp - so just to confirm, you're stating that PostgreSQL is useless for financial market data because of read operation limitations?

We started with MS SQL.

Just a simple comparison: loading data (ticks or candles, it doesn't matter) from a local DB (using named pipes, the fastest inter-process communication) is 100-200 times slower than reading it from a regular CSV file.

Market data is not suited to relational databases. First of all, there are no relations. Pricing data is like streaming video - oriented toward a raw data format (text or binary). The fastest way is to load 1-2 weeks of tick data and use it in memory.

Want to upload it to some cloud service? Pretty easy. We use AWS for iterative backtesting (optimization), allocating ~20 servers at a time with replicated market data. The cheapest data storage is S3; we replicate it to EC2, as blobs of course. Any SQL queries or the like would kill the performance.

A DB has great potential, but not with market data and backtesting.
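The "load 1-2 weeks of ticks and use it in memory" approach can be sketched in a few lines of Python; the column layout here (timestamp, price, size) is a hypothetical example, not any particular vendor's format:

```python
import csv
import io

# Hypothetical tick CSV: timestamp, price, size (one trade per line)
sample = """2014-08-15T03:00:00.001,1955.25,3
2014-08-15T03:00:00.105,1955.50,1
2014-08-15T03:00:00.230,1955.25,5
"""

def load_ticks(stream):
    """Parse a tick CSV stream into an in-memory list of tuples."""
    return [(ts, float(price), int(size))
            for ts, price, size in csv.reader(stream)]

ticks = load_ticks(io.StringIO(sample))
print(len(ticks))   # 3
print(ticks[1])     # ('2014-08-15T03:00:00.105', 1955.5, 1)
```

A real loader would read the file once at startup and hand the in-memory list to the backtest loop; that single sequential read is where the CSV-vs-DB speed gap shows up.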

#34
 
 Jasonnator 
Denver, Colorado United States
 
Experience: Intermediate
Platform: NT8 + Custom
Broker: NT Brokerage, Kinetick, IQFeed, Interactive Brokers
Trading: ES
Posts: 159 since Dec 2014
Thanks Given: 40
Thanks Received: 166

There is some fantastic real-world experience coming out in this thread! Regardless of the approach, there is information in here for a wide array of users. Great thread guys.

#35
 ClutchAce 
Cookeville, TN
 
Experience: Advanced
Platform: Sierra Chart, IB, Python
Trading: NQ, DAX, TOPIX
Frequency: Daily
Duration: Hours
Posts: 71 since Oct 2011
Thanks Given: 18
Thanks Received: 21


Quoting stocksharp:
We started with MS SQL.

...A DB has great potential, but not with market data and backtesting.

Sorry for the extreme hypothetical, but I doubt Facebook could run their data collection/storage back-end off of CSV files.

Several individual traders in this thread have alluded to builds using DBs (Cassandra, Mongo, etc.), so if, in the context of financial market data, going flat-file CSV is faster, cheaper, and just as easy or easier to scale to some infinite horizon, why would anybody deploy a DB?

I'm going to be pulling unfiltered, streaming tick data for some of the most active futures instruments, such as Crude Oil, E-mini S&P, etc.; the number of rows will be in the tens of millions rather quickly. I get that reading from / writing to a CSV is a lot simpler and faster than a DB table, but when we're talking large-scale setups, it seems this argument won't be able to keep up with the reality of the demand.

#36
 
 stocksharp 
Moscow, RU
 
Experience: Advanced
Platform: StockSharp
Trading: ES
Posts: 38 since Mar 2014
Thanks Given: 3
Thanks Received: 13


Quoting ClutchAce:
going flat-file CSV is faster, cheaper, and just as easy or easier to scale to some infinite horizon, why would anybody deploy a DB?

For backtesting? Actually, I haven't seen any commercial trading app that keeps market data in a DB. NT, MC, SC, TS - they all use their own flat-file formats.


Quoting ClutchAce:
but when we're talking large-scale setups, it seems this argument won't be able to keep up with the reality of the demand.

Sure. CSV, like any binary format, is just files. Files are not a silver bullet. I'm sure there are hundreds and thousands of cases where a DB works better than plain files, even in trading and backtesting. For my cases (running data in a local environment or using cloud services), raw files are better suited.

My point in this thread is to plant a seed of doubt. The terms "data" and "database" are not equivalent. The right choice depends on many, many details (storage type, number of users, distribution, parallelism, read- vs. write-heavy workloads, etc.).

#37
 artemiso 
New York, NY
 
Experience: Beginner
Platform: Vanguard 401k
Broker: Yahoo Finance
Trading: Mutual funds
Posts: 1,152 since Jul 2012
Thanks Given: 784
Thanks Received: 2,685


Quoting ClutchAce:
Sorry for the extreme hypothetical, but I doubt Facebook could run their data collection/storage back-end off of CSV files.

Several individual traders in this thread have alluded to builds using DBs (Cassandra, Mongo, etc.), so if, in the context of financial market data, going flat-file CSV is faster, cheaper, and just as easy or easier to scale to some infinite horizon, why would anybody deploy a DB?

I'm going to be pulling unfiltered, streaming tick data for some of the most active futures instruments, such as Crude Oil, E-mini S&P, etc.; the number of rows will be in the tens of millions rather quickly. I get that reading from / writing to a CSV is a lot simpler and faster than a DB table, but when we're talking large-scale setups, it seems this argument won't be able to keep up with the reality of the demand.

A DBMS and a flat file are not all that different. Many DBMSes are just executables that sit on top of a flat file on your file system. You can always roll your own and achieve comparable functionality, but if you find yourself replicating most of the functionality of a DBMS, then chances are that years of painstaking B-tree optimization etc. are superior to your own solution.

The truth is, platform designers for NT, MC, SC, TS etc. are 20 years behind in their design choices because their users don't run into serious use cases. On the other hand, a company like FB has many flexible and unstructured use cases for their data that they have not yet discovered. The principle of 'big data' in a firm like FB is that they're collecting more data than they can churn analytics on for months to come, whereas most retail traders can backtest through their entire collection of data overnight. So you should let neither sway your decision on what to do.

I can name several use cases where a DBMS is superior:

1. Administrative privileges. This isn't just a matter of having multiple people in a company that you're working with. Even in a one-man shop, it is poor taste for your backtesting application to have the same write privileges as your storing application.

2. Constraints. For example, let's say you're storing EUR/USD data for FXCM and then you decide to trade EUR/USD on Hotspot. Now maybe you want to rename them to EUR/USD.FXCM and EUR/USD.Hotspot and modify all the associated data (tables). It's easy to make this modification with a `CASCADE` operation, but troublesome if you rolled your own solution with flat files. Or let's say you rolled a software change to the application that is storing your tick data last Thursday at 5 PM ET, and after a backtest today, you realized that all your data since last Thursday 5 PM ET was glitched. Maybe all of it has incorrect rollover offsets. With careful planning in your schema and a few statements in your DBMS's DSL, you can fix this.

3. Indexing and range selection. Let's say you're trying to select data from 3 AM on Aug 15, 2014 to 4:15 PM on Aug 17, 2014 for a backtest. Walking through the array that backs the file system directory structure is cheap because you probably don't have a large number of files; however, at some point you're going to have to walk through the timestamps. It takes O(2 log n) to do both endpoints, but many CSV parsing modules/libraries are agnostic to sort order and will brainlessly walk O(2n) through this. Because your handrolled solution is almost surely in row-major order, you're paying as much as 2 days of backtesting in I/O cost just to qualify your data for an actual loop over ~3 days of backtesting. Similarly, hash indices make short work of certain useful queries ("fetch me all ticks at support/resistance level"). It's ugly to serialize a B-tree or hash index blob in CSV or your own binary format to compete, and that's code that doesn't bring you joy or money.

4. One-stop shop for redundancy and backup. It's entirely possible to do backup with RAID and a bunch of Bash scripts (rsync etc.) but for many users, it's cleaner to take care of backup entirely at the DBMS application level. Newer DBMSes ensure you're backed up on commit and can fall back on your backup in a hardware-agnostic way. You could have 2 separate RAID0 machines and it would still work.
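The range-selection point in item 3 can be illustrated with a small sketch (the epoch-second timestamps are hypothetical): on sorted data, an index or a plain binary search finds both endpoints in O(log n), while a sort-agnostic CSV parse walks all n rows.

```python
import bisect

# Sorted tick timestamps (hypothetical epoch seconds)
timestamps = [100, 105, 110, 200, 205, 300, 305, 400]

def select_range(ts_sorted, start, end):
    """Return (lo, hi) slice bounds covering start <= t <= end, in O(log n)."""
    lo = bisect.bisect_left(ts_sorted, start)
    hi = bisect.bisect_right(ts_sorted, end)
    return lo, hi

lo, hi = select_range(timestamps, 105, 305)
print(timestamps[lo:hi])   # [105, 110, 200, 205, 300, 305]
```

A B-tree index in a DBMS does the same endpoint lookup on disk-resident data, which is exactly the cost the CSV walk can't avoid.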

#38
femto
Paris, France
 
Posts: 3 since Sep 2016
Thanks Given: 0
Thanks Received: 3

Maybe TimescaleDB should be considered - it's a PostgreSQL extension that handles time series more efficiently (search "TimescaleDB vs. PostgreSQL for time-series").

#39
 ab456 
New Delhi, India
 
Experience: Intermediate
Platform: SierraChart, NinjaTrader
Trading: ES, Stocks, Futures
Posts: 170 since Sep 2011
Thanks Given: 734
Thanks Received: 111


Quoting femto:
Maybe TimescaleDB should be considered - it's a PostgreSQL extension that handles time series more efficiently (search "TimescaleDB vs. PostgreSQL for time-series").

Thanks femto

I checked out the site and I am wondering what practical difference there is between the different versions mentioned in the link, from the perspective of a normal retail trader. If I use the free Community version, would that be good enough, or do I need to think about the Enterprise version?
- https://www.timescale.com/products


Has anyone else tried out TimescaleDB? What is your experience? Is it good for storing tick data that grows by around 1 million rows per trading session?

Does it have proper analytic functions etc. that will help in analyzing the data properly?

A lot of guys keep saying that relational databases are not good for storing tick data, but the TimescaleDB guys contest that point of view and say their database is one of the best out there for managing time-series data nowadays. All this gets me confused...

I cannot keep the data in separate flat files etc. as some friends have mentioned here. I need all this tick data imported into the DB on a daily basis. If anyone is doing something similar in practice, please share which particular DB you are using.
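For what it's worth, the daily-import workflow can be sketched with Python's built-in sqlite3 standing in for any relational or time-series DB such as TimescaleDB; the table and column names here are made up for illustration:

```python
import sqlite3

# In-memory DB for the sketch; a real setup would use a file or a server DB
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ticks (
        symbol TEXT NOT NULL,
        ts     TEXT NOT NULL,   -- ISO-8601 timestamp
        price  REAL NOT NULL,
        size   INTEGER NOT NULL
    )
""")
# Index on (symbol, ts) so range scans stay cheap as rows accumulate
conn.execute("CREATE INDEX idx_ticks_symbol_ts ON ticks (symbol, ts)")

# Hypothetical end-of-session batch: insert the day's ticks in one transaction
session_ticks = [
    ("ES", "2019-05-21T09:30:00.001", 2860.25, 2),
    ("ES", "2019-05-21T09:30:00.120", 2860.50, 1),
    ("CL", "2019-05-21T09:30:00.200", 63.10, 3),
]
with conn:
    conn.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?)", session_ticks)

# Range query of the kind a backtest would issue
rows = conn.execute(
    "SELECT ts, price, size FROM ticks "
    "WHERE symbol = ? AND ts BETWEEN ? AND ? ORDER BY ts",
    ("ES", "2019-05-21T09:30:00.000", "2019-05-21T09:30:01.000"),
).fetchall()
print(len(rows))   # 2
```

TimescaleDB would replace sqlite3 with PostgreSQL plus a hypertable, but the daily batch-insert-then-range-query pattern is the same.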

Thanks and regards

#40
 Optiondreamer 
Spain
 
Experience: Intermediate
Platform: NinjaTrader
Trading: Options
Posts: 13 since Oct 2009
Thanks Given: 12
Thanks Received: 16


Enough alternatives have been listed in this thread to make a decision. I think it depends on the tool you use to analyze the data.
If you use a third-party tool, the type of DB is not relevant, because it does not make sense to use a different one.
If you build your own tool, then you must choose the DB/library that best fits your goals, and here a lot of variables come in.
To be simplistic, one example would be to use TeaFiles for single data series. To make decisions from multiple data sources a relational DB could be used, but this could also be achieved in code, using single series.
There is no single solution; analyze your goals, then test and measure the tools that fit best.
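As a sketch of the single-series, flat binary idea (in the spirit of TeaFiles, though this fixed-record layout is invented for illustration, not the actual TeaFiles format):

```python
import os
import struct
import tempfile
from pathlib import Path

# One fixed-size record per tick: epoch-ms timestamp, price, size
RECORD = struct.Struct("<qdi")   # int64, float64, int32 (little-endian)

def write_series(path, ticks):
    """Append-style writer: one packed record per tick."""
    with open(path, "wb") as f:
        for ts_ms, price, size in ticks:
            f.write(RECORD.pack(ts_ms, price, size))

def read_series(path):
    """Read the whole series back as a list of (ts_ms, price, size) tuples."""
    data = Path(path).read_bytes()
    return [RECORD.unpack_from(data, i) for i in range(0, len(data), RECORD.size)]

ticks = [(1558429200001, 2860.25, 2), (1558429200120, 2860.50, 1)]
path = os.path.join(tempfile.mkdtemp(), "ES.ticks")
write_series(path, ticks)
print(read_series(path))  # [(1558429200001, 2860.25, 2), (1558429200120, 2860.5, 1)]
```

Fixed-size records make random access trivial: tick k lives at byte offset k * RECORD.size, so a time-range lookup can binary-search the file without reading it all.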





Last Updated on May 21, 2019

