[Other] Time Series database for tick and trade data. (Page 4) - Platforms and Indicators | futures.io
Time Series database for tick and trade data.
Started: April 22nd, 2015 (09:19 AM) by gregid    Views / Replies: 3,383 / 36
Last Reply: November 7th, 2016 (07:10 PM)    Attachments: 2


Old November 6th, 2016, 07:29 AM   #31 (permalink)
Elite Member
London, UK
 
Futures Experience: Intermediate
Platform: NinjaTrader, Julia
Favorite Futures: Ockham's razor
 
gregid's Avatar
 
Posts: 643 since Aug 2009
Thanks: 313 given, 592 received


artemiso View Post
JSON is the only serialization format that I would absolutely recommend against.

Not trying to be facetious, but you can still do even worse than JSON... ehmm... YAML...


Old November 6th, 2016, 10:21 AM   #32 (permalink)
StockSharp dev
Moscow, RU
 
Futures Experience: Advanced
Platform: StockSharp
Favorite Futures: ES
 
stocksharp's Avatar
 
Posts: 24 since Mar 2014
Thanks: 3 given, 12 received


ClutchAce View Post
I figured <10TB of total storage capacity needed, but the real cost (>$300/mo) shows up due to the total monthly up/down transfer quota between a colo'd server and a cloud solution like AWS.

Keeping all the data in the same region (and using it during off-peak hours) will lower the total cost. Data transfer within a single region should be free.


Old November 6th, 2016, 10:37 AM   #33 (permalink)
StockSharp dev
Moscow, RU
 
Futures Experience: Advanced
Platform: StockSharp
Favorite Futures: ES
 
stocksharp's Avatar
 
Posts: 24 since Mar 2014
Thanks: 3 given, 12 received



ClutchAce View Post
Hello stocksharp - so just to confirm, you're stating that PostgreSQL is useless for financial market data because of read operation limitations?

We started with MS SQL.

A simple comparison: loading data (ticks or candles, it doesn't matter) from a local DB, using named pipes as the fastest inter-process communication, is 100-200 times slower than reading the same data from a plain CSV file.

Market data is not a good fit for relational databases. First of all, there are no relations in it. Pricing data is like streaming video: it is oriented toward a raw data format (text or binary). The fastest way is to load 1-2 weeks of tick data and use it in memory.

Want to upload it to a cloud service? Pretty easy. We use AWS for iterative backtesting (optimization), allocating ~20 servers at a time with replicated market data. The cheapest data storage is S3, replicated to EC2, as blobs of course. Any SQL queries or the like would kill the performance.

A DB has great potential, just not with market data and backtesting.
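The "load a week of ticks into memory" approach described above can be sketched in a few lines of Python. This is a minimal illustration; the timestamp,price,size row layout is an assumed format for the example, not StockSharp's actual file format.

```python
import csv
import io

# Hypothetical tick rows: ISO timestamp, price, size (assumed layout).
SAMPLE = """\
2016-11-01T09:00:00.001,45.12,3
2016-11-01T09:00:00.105,45.13,1
2016-11-01T09:00:00.220,45.12,5
"""

def load_ticks(f):
    """Parse an entire tick file into an in-memory list of tuples."""
    return [(ts, float(px), int(sz)) for ts, px, sz in csv.reader(f)]

ticks = load_ticks(io.StringIO(SAMPLE))
print(len(ticks))   # 3
print(ticks[1][1])  # 45.13
```

The whole read is one sequential pass with no query planner, locks, or inter-process hops in the way, which is where the claimed speed advantage over a local DB round-trip comes from.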


Old November 6th, 2016, 11:22 AM   #34 (permalink)
Elite Member
Jacksonville, Florida United States
 
Futures Experience: Intermediate
Platform: Fully custom
Broker/Data: Optimus Futures, Interactive Brokers
Favorite Futures: Profitable ones
 
Jasonnator's Avatar
 
Posts: 67 since Dec 2014
Thanks: 18 given, 41 received

There is some fantastic real-world experience coming out in this thread! Regardless of the approach, there is information in here for a wide array of users. Great thread guys.


Old November 6th, 2016, 01:05 PM   #35 (permalink)
Trading Apprentice
Cookeville, TN
 
Futures Experience: Advanced
Platform: R|Trader, Thinkorswim
Favorite Futures: CL, RB, 6E, ZB, DX
 
Posts: 29 since Oct 2011
Thanks: 8 given, 7 received


stocksharp View Post
We started from MS SQL.

...DB has a great potential but not with a market data and backtesting.

Sorry for the extreme hypothetical, but I doubt Facebook could run their data collection/storage back-end off of CSV files.

Several individual traders in this thread have alluded to builds using DBs (Cassandra, Mongo, etc.). If, in the context of financial market data, going flat-file CSV is faster, cheaper, and just as easy (or easier) to scale to some infinite horizon, why would anybody deploy a DB?

I'm going to be pulling unfiltered, streaming tick data for some of the most active futures instruments, such as Crude Oil, E-Mini S&P, etc., so the number of rows will be in the tens of millions rather quickly. I get that reading from / writing to a CSV is a lot simpler and faster than a DB table, but when we're talking large-scale setups, it seems this argument won't keep up with the reality of the demand.


Old November 7th, 2016, 04:15 PM   #36 (permalink)
StockSharp dev
Moscow, RU
 
Futures Experience: Advanced
Platform: StockSharp
Favorite Futures: ES
 
stocksharp's Avatar
 
Posts: 24 since Mar 2014
Thanks: 3 given, 12 received


ClutchAce View Post
going flat-file CSV is faster, cheaper, and just as easy or easier to scale to some infinite horizon, why would anybody deploy a DB?

For backtesting? Actually, I haven't seen any commercial trading app that keeps market data in a DB. NT, MC, SC, TS: they all use their own flat-file formats.


ClutchAce View Post
but when we're talking large-scale setups, seems this argument won't be able to keep up with the reality of the demand.

Sure. CSV, like any binary format, is just files. Files are not a silver bullet. I'm sure there are hundreds or thousands of cases where a DB works better than plain files, even in trading and backtesting. For my use cases (running data in a local environment or in a cloud service), raw files are the better fit.

My point in this thread is to plant a seed of doubt: "data" and "database" are not the same thing. The right choice depends on many, many details (storage type, number of users, distribution, parallel access, read-heavy versus write-heavy workloads, etc.).


Old November 7th, 2016, 07:10 PM   #37 (permalink)
Elite Member
Manchester, NH
 
Futures Experience: Beginner
Platform: thinkorswim
Broker/Data: TD Ameritrade
Favorite Futures: Stocks
 
Posts: 854 since Jul 2012
Thanks: 585 given, 1,697 received


ClutchAce View Post
Sorry for the extreme hypothetical, but I doubt Facebook could run their data collection/storage back-end off of CSV files.

Several individual traders in this thread have alluded to builds using DBs (Cassandra, Mongo, etc.). If, in the context of financial market data, going flat-file CSV is faster, cheaper, and just as easy (or easier) to scale to some infinite horizon, why would anybody deploy a DB?

I'm going to be pulling unfiltered, streaming tick data for some of the most active futures instruments, such as Crude Oil, E-Mini S&P, etc., so the number of rows will be in the tens of millions rather quickly. I get that reading from / writing to a CSV is a lot simpler and faster than a DB table, but when we're talking large-scale setups, it seems this argument won't keep up with the reality of the demand.

A DBMS and a flat file are not all that different. Many DBMSes are just executables that sit on top of a flat file on your file system. You can always roll your own and achieve comparable functionality, but if you find yourself replicating most of the functionality of a DBMS, then chances are that years of painstaking B-tree optimization etc. are superior to your own solution.

The truth is, platform designers for NT, MC, SC, TS etc. are 20 years behind in their design choices because their users don't run into serious use cases. On the other hand, a company like FB has many flexible and unstructured use cases for their data that they have not yet discovered. The principle of 'big data' in a firm like FB is that they're collecting more data than they can churn analytics on for months to come, whereas most retail traders can backtest through their entire collection of data overnight. So you should let neither sway your decision on what to do.

I can name several use cases where a DBMS is superior:

1. Administrative privileges. This isn't just a matter of having multiple people in a company that you're working with. Even in a one-man shop, it is poor taste for your backtesting application to have the same write privileges as your storing application.

2. Constraints. For example, let's say you're storing EUR/USD data for FXCM and then you decide to trade EUR/USD on Hotspot. Now maybe you want to rename them to EUR/USD.FXCM and EUR/USD.Hotspot and modify all the associated data (tables). It's easy to make this modification with a `CASCADE` operation, but troublesome if you rolled your own solution with flat files. Or let's say you rolled a software change to the application that is storing your tick data last Thursday at 5 PM ET, and after a backtest today, you realized that all your data since last Thursday 5 PM ET was glitched. Maybe all of it has incorrect rollover offsets. With careful planning in your schema and a few statements in your DBMS's DSL, you can fix this.
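The `CASCADE` rename above can be sketched with SQLite from the Python standard library. Table and column names here are illustrative, not anyone's actual schema; real deployments would more likely use a server DBMS, but the mechanism is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FK actions only when this is on
conn.execute("CREATE TABLE instruments (symbol TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE ticks (
        symbol TEXT REFERENCES instruments(symbol) ON UPDATE CASCADE,
        ts     TEXT,
        price  REAL
    )""")
conn.execute("INSERT INTO instruments VALUES ('EUR/USD')")
conn.execute("INSERT INTO ticks VALUES ('EUR/USD', '2016-11-03T17:00:00', 1.11)")

# One statement renames the instrument; the FK cascades to every tick row.
conn.execute("UPDATE instruments SET symbol = 'EUR/USD.FXCM' "
             "WHERE symbol = 'EUR/USD'")
print(conn.execute("SELECT symbol FROM ticks").fetchone()[0])  # EUR/USD.FXCM
```

With flat files, the same rename means rewriting or relinking every affected file yourself and hoping nothing is missed partway through.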

3. Indexing and range selection. Let's say you're trying to select data from 3 AM on Aug 15, 2014 to 4:15 PM on Aug 17, 2014 for a backtest. Walking through the array that backs the file system directory structure is cheap because you probably don't have a large number of files; however, at some point you're going to have to walk through the timestamps. It takes O(2 log n) to do both endpoints, but many CSV parsing modules/libraries are agnostic to sort order and will brainlessly walk O(2n) through this. Because your handrolled solution is almost surely in row-major order, you're paying as much as 2 days of backtesting in I/O cost just to qualify your data for an actual loop of ~3 days of backtesting. Similarly, hash indices make short work of certain useful queries ("fetch me all ticks at a support/resistance level"). It's ugly to serialize a B-tree or hash index blob in CSV or your own binary format to compete, and that's code that doesn't bring you joy or money.
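The O(log n) endpoint lookup described in point 3 can be sketched with a binary search over a timestamp-sorted in-memory array; the tick data below is invented for illustration. A DBMS index does the equivalent on disk without loading everything first.

```python
from bisect import bisect_left, bisect_right

# Toy ticks, already sorted by timestamp (ISO-8601 strings sort chronologically).
ticks = [
    ("2014-08-15T02:59:59", 97.10),
    ("2014-08-15T03:00:00", 97.12),
    ("2014-08-16T12:00:00", 97.50),
    ("2014-08-17T16:15:00", 98.01),
    ("2014-08-17T16:15:01", 98.02),
]
stamps = [ts for ts, _ in ticks]

# Both window endpoints located in O(log n), versus O(n) for a naive CSV scan.
lo = bisect_left(stamps, "2014-08-15T03:00:00")
hi = bisect_right(stamps, "2014-08-17T16:15:00")
window = ticks[lo:hi]
print(len(window))  # 3
```

The catch, as noted above, is that a sort-order-agnostic CSV parser never gets to exploit this: it must read and qualify every row before the backtest loop even starts.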

4. One-stop shop for redundancy and backup. It's entirely possible to do backup with RAID and a bunch of Bash scripts (rsync etc.) but for many users, it's cleaner to take care of backup entirely at the DBMS application level. Newer DBMSes ensure you're backed up on commit and can fall back on your backup in a hardware-agnostic way. You could have 2 separate RAID0 machines and it would still work.

The following 2 users say Thank You to artemiso for this post:
     



Copyright © 2016 by futures.io. All information is for educational use only and is not investment advice.
There is a substantial risk of loss in trading commodity futures, stocks, options and foreign exchange products. Past performance is not indicative of future results.
 