[Other] Time Series database for tick and trade data. - Platforms and Indicators | futures io social trading
Time Series database for tick and trade data.
Started by gregid · Views / Replies: 4,154 / 36 · Attachments: 2



  #31 (permalink)
Elite Member
Wrocław, Poland
 
Futures Experience: Intermediate
Platform: NinjaTrader, Racket
Favorite Futures: Ockham's razor
 
gregid's Avatar
 
Posts: 651 since Aug 2009
Thanks: 321 given, 601 received


artemiso View Post
JSON is the only serialization format that I would absolutely recommend against.

Not trying to be facetious, but you can still do even worse than JSON... ehmm... YAML...
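To make the serialization point concrete, here is a rough sketch (the field names and record layout are made up for illustration) comparing one JSON-encoded tick against a fixed-width binary encoding:

```python
import json
import struct

# One tick: microsecond timestamp, price, size (hypothetical field layout).
tick = {"ts": 1692086400000000, "price": 99.25, "size": 3}

# JSON repeats every key name for every tick and encodes numbers as text.
as_json = json.dumps(tick).encode()

# A fixed-layout binary record: int64 timestamp, float64 price, uint32 size.
as_binary = struct.pack("<qdI", tick["ts"], tick["price"], tick["size"])

print(len(as_json), len(as_binary))  # binary is a fixed 20 bytes per tick
```

Multiply that per-tick overhead by tens of millions of rows and the text formats (JSON, YAML, CSV to a lesser degree) start to hurt on both disk and parse time.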

The following user says Thank You to gregid for this post:
 
  #32 (permalink)
StockSharp dev
Moscow, RU
 
Futures Experience: Advanced
Platform: StockSharp
Favorite Futures: ES
 
stocksharp's Avatar
 
Posts: 28 since Mar 2014
Thanks: 3 given, 12 received


ClutchAce View Post
I figured <10TB of total storage capacity needed, but the real cost (>$300/mo) shows up due to the total monthly up/down transfer quota between a colo'd server and a cloud solution like AWS.

Keeping all the data in the same region (and using it during off-peak hours) will decrease the total price. Data transfer within a single region should be free.

 
  #33 (permalink)
StockSharp dev
Moscow, RU
 
Futures Experience: Advanced
Platform: StockSharp
Favorite Futures: ES
 
stocksharp's Avatar
 
Posts: 28 since Mar 2014
Thanks: 3 given, 12 received



ClutchAce View Post
Hello stocksharp - so just to confirm, you're stating that PostgreSQL is useless for financial market data because of read-operation limitations?

We started with MS SQL.

Just a simple comparison: loading data (ticks or candles, it doesn't matter) from a local DB (using named pipes, the fastest inter-process transport) is 100-200 times slower than reading it from a regular CSV file.

Market data is not suited to relational databases. First of all, there are no relations. Pricing data is like streaming video: it is oriented toward a raw data format (text or binary). The fastest approach is to load 1-2 weeks of tick data and use it in memory.

Want to upload it to a cloud service? Pretty easy. We use AWS for iterative backtesting (optimization), allocating ~20 servers at a time with replicated market data. The cheapest storage is S3, replicated to EC2 as blobs, of course. Any SQL queries or the like would kill the performance.

A DB has great potential, just not with market data and backtesting.
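The load-it-all-into-memory approach described above can be sketched in a few lines; the three-column layout (timestamp, price, size) and the inline sample data are assumptions for illustration:

```python
import csv
import io

# Stand-in for a week of tick data on disk: timestamp_us, price, size.
sample = io.StringIO(
    "1692086400000000,99.25,3\n"
    "1692086400000450,99.50,1\n"
    "1692086400001200,99.25,5\n"
)

# Load the whole window into memory once (column-wise lists); the
# backtest then iterates over RAM instead of issuing per-row DB queries.
ts, px, sz = [], [], []
for row in csv.reader(sample):
    ts.append(int(row[0]))
    px.append(float(row[1]))
    sz.append(int(row[2]))

print(len(ts), px[-1])  # 3 99.25
```

In a real backtester you would read from an actual file and likely use numpy arrays or a binary format, but the shape of the idea is the same: one sequential read, then everything in memory.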

The following user says Thank You to stocksharp for this post:
 
  #34 (permalink)
Elite Member
Jacksonville, Florida United States
 
Futures Experience: Intermediate
Platform: Fully custom
Broker/Data: Optimus Futures, Interactive Brokers
Favorite Futures: Profitable ones
 
Jasonnator's Avatar
 
Posts: 67 since Dec 2014
Thanks: 18 given, 41 received

There is some fantastic real-world experience coming out in this thread! Regardless of the approach, there is information in here for a wide array of users. Great thread guys.

 
  #35 (permalink)
Trading Apprentice
Cookeville, TN
 
Futures Experience: Advanced
Platform: R|Trader, Thinkorswim
Favorite Futures: CL, RB, 6E, ZB, DX
 
Posts: 37 since Oct 2011
Thanks: 9 given, 12 received


stocksharp View Post
We started with MS SQL.

...A DB has great potential, just not with market data and backtesting.

Sorry for the extreme hypothetical, but I doubt Facebook could run their data collection/storage back-end off of CSV files.

Several traders in this thread have alluded to builds using DBs (Cassandra, Mongo, etc.). If, in the context of financial market data, going flat-file CSV is faster, cheaper, and just as easy or easier to scale to some infinite horizon, why would anybody deploy a DB?

I'm going to be pulling unfiltered, streaming tick data for some of the most active futures instruments, such as Crude Oil, E-Mini S&P, etc. The number of rows will be in the tens of millions rather quickly. I get that reading from / writing to a CSV is a lot simpler and faster than a DB table, but when we're talking large-scale setups, it seems this argument won't keep up with the reality of the demand.

 
  #36 (permalink)
StockSharp dev
Moscow, RU
 
Futures Experience: Advanced
Platform: StockSharp
Favorite Futures: ES
 
stocksharp's Avatar
 
Posts: 28 since Mar 2014
Thanks: 3 given, 12 received


ClutchAce View Post
going flat-file CSV is faster, cheaper, and just as easy or easier to scale to some infinite horizon, why would anybody deploy a DB?

For backtesting? Actually, I haven't seen any commercial trading app that keeps market data in a DB. NT, MC, SC, TS: they all use their own flat-file formats.


ClutchAce View Post
but when we're talking large-scale setups, it seems this argument won't keep up with the reality of the demand.

Sure. CSV, like any binary format, is just files. Files are not a silver bullet. I'm sure there are hundreds or thousands of cases where a DB works better than plain files, even in trading and backtesting. For my cases (running data in a local environment or using cloud services), raw files are the better fit.

My point in this thread is to plant a seed of doubt. The terms "data" and "database" are not equivalent. The right choice depends on many, many details (storage type, number of users, distribution, parallel access, read-heavy vs. write-heavy workloads, etc.).

 
  #37 (permalink)
Elite Member
Manchester, NH
 
Futures Experience: Beginner
Platform: thinkorswim
Broker/Data: TD Ameritrade
Favorite Futures: Stocks
 
Posts: 885 since Jul 2012
Thanks: 595 given, 1,760 received


ClutchAce View Post
Sorry for the extreme hypothetical, but I doubt Facebook could run their data collection/storage back-end off of CSV files.

Several traders in this thread have alluded to builds using DBs (Cassandra, Mongo, etc.). If, in the context of financial market data, going flat-file CSV is faster, cheaper, and just as easy or easier to scale to some infinite horizon, why would anybody deploy a DB?

I'm going to be pulling unfiltered, streaming tick data for some of the most active futures instruments, such as Crude Oil, E-Mini S&P, etc. The number of rows will be in the tens of millions rather quickly. I get that reading from / writing to a CSV is a lot simpler and faster than a DB table, but when we're talking large-scale setups, it seems this argument won't keep up with the reality of the demand.

A DBMS and a flat file are not all that different. Many DBMSes are just executables that sit on top of a flat file on your file system. You can always roll your own and achieve comparable functionality, but if you find yourself replicating most of the functionality of a DBMS, then chances are that years of painstaking B-tree optimization etc. are superior to your own solution.

The truth is, platform designers for NT, MC, SC, TS etc. are 20 years behind in their design choices because their users don't run into serious use cases. On the other hand, a company like FB has many flexible and unstructured use cases for their data that they have not yet discovered. The principle of 'big data' in a firm like FB is that they're collecting more data than they can churn analytics on for months to come, whereas most retail traders can backtest through their entire collection of data overnight. So you should let neither sway your decision on what to do.

I can name several use cases where a DBMS is superior:

1. Administrative privileges. This isn't just a matter of having multiple people in a company that you're working with. Even in a one-man shop, it is poor taste for your backtesting application to have the same write privileges as your storing application.

2. Constraints. For example, let's say you're storing EUR/USD data for FXCM and then you decide to trade EUR/USD on Hotspot. Now maybe you want to rename them to EUR/USD.FXCM and EUR/USD.Hotspot and modify all the associated data (tables). It's easy to make this modification with a `CASCADE` operation, but troublesome if you rolled your own solution with flat files. Or let's say you rolled a software change to the application that is storing your tick data last Thursday at 5 PM ET, and after a backtest today, you realized that all your data since last Thursday 5 PM ET was glitched. Maybe all of it has incorrect rollover offsets. With careful planning in your schema and a few statements in your DBMS's DSL, you can fix this.

3. Indexing and range selection. Let's say you're trying to select data from 3 AM on Aug 15, 2014 to 4:15 PM on Aug 17, 2014 for a backtest. Walking through the array that backs the file system directory structure is cheap because you probably don't have a large number of files; however, at some point you're going to have to walk through the timestamps. It takes O(2 log n) to do both endpoints, but many CSV parsing modules/libraries are agnostic to sort order and will brainlessly walk O(2n) through this. Because your hand-rolled solution is almost surely in row-major order, you're paying as much as 2 days of backtesting in I/O cost to qualify your data for an actual loop over ~3 days of backtesting. Similarly, hash indices make short work of certain useful queries ("fetch me all ticks at support/resistance level"). It's ugly to serialize a B-tree or hash index blob in CSV or your own binary format to compete, and that's code that doesn't bring you joy or money.

4. One-stop shop for redundancy and backup. It's entirely possible to do backup with RAID and a bunch of Bash scripts (rsync etc.) but for many users, it's cleaner to take care of backup entirely at the DBMS application level. Newer DBMSes ensure you're backed up on commit and can fall back on your backup in a hardware-agnostic way. You could have 2 separate RAID0 machines and it would still work.
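Point 3 above is easy to see in miniature. On a sorted timestamp column, each endpoint of a range query is a logarithmic binary search, which is essentially what a B-tree index buys you for free; a sort-order-agnostic CSV parser scans every row instead. A sketch, with fabricated timestamps:

```python
import bisect

# Sorted tick timestamps (epoch seconds, made up for illustration).
stamps = [100, 105, 110, 200, 250, 300, 400]

def select_range(ts, lo, hi):
    """Slice of ticks with lo <= t <= hi, O(log n) per endpoint."""
    i = bisect.bisect_left(ts, lo)    # first index with ts[i] >= lo
    j = bisect.bisect_right(ts, hi)   # first index with ts[j] > hi
    return ts[i:j]

print(select_range(stamps, 105, 300))  # [105, 110, 200, 250, 300]
```

The two `bisect` calls touch O(log n) elements regardless of how wide the range is, whereas a naive linear scan over a multi-gigabyte tick file pays full I/O just to find the window.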

The following 2 users say Thank You to artemiso for this post:

