Database setup recommendation


Discussion in Tech Support


#11
 
 sam028 
Site Moderator
 
Posts: 3,765 since Jun 2009
Thanks Given: 3,825
Thanks Received: 4,629


treydog999 wrote:
Optimally I would collect each depth of market change, so that I could estimate order cancellation and modification rates. If I do it after each trade, then it would basically just be a random snapshot.
...

You should start by doing some basic statistics and count how many changes per second you have for each tracked instrument.
It's easy using NinjaTrader's OnMarketDepth(), for example; then plot the results.
I'm afraid that a single server won't be able to collect level 2 updates for 500 symbols, and that's before writing the data anywhere.
Nanex does this, and NinjaTrader does too for their Market Replay, but not on a single server, I think.
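To make that counting step concrete, here is a minimal Python sketch rather than actual NinjaScript; the depth_events iterable of (timestamp, symbol) pairs is a hypothetical stand-in for whatever a feed callback such as OnMarketDepth() would deliver:

```python
from collections import defaultdict

def updates_per_second(depth_events):
    """Count market depth updates per whole second, per symbol.

    depth_events: iterable of (timestamp_seconds, symbol) pairs,
    e.g. emitted by a feed handler callback.
    Returns {symbol: {second: count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for ts, symbol in depth_events:
        counts[symbol][int(ts)] += 1
    return counts

# Fabricated example: three ES depth updates landing in the same second.
sample = [(1456000000.10, "ES"), (1456000000.55, "ES"), (1456000000.90, "ES")]
for symbol, per_second in updates_per_second(sample).items():
    peak = max(per_second.values())
    avg = sum(per_second.values()) / len(per_second)
    print(symbol, "peak/sec:", peak, "avg/sec:", round(avg, 1))
```

Plotting the per-second counts then shows where the bursts are, typically around the open.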


#12
 
 Big Mike 
Manta, Ecuador
Site Administrator
Developer
Swing Trader
 
Experience: Advanced
Platform: Custom solution
Broker: IBKR
Trading: Stocks & Futures
Frequency: Every few days
Duration: Weeks
Posts: 50,440 since Jun 2009
Thanks Given: 33,212
Thanks Received: 101,599

I would never use HDDs for this, only SSDs.

Sent from my phone

#13
 
 sam028 
Site Moderator
 
Posts: 3,765 since Jun 2009
Thanks Given: 3,825
Thanks Received: 4,629


I just did a quick check on ES 03-16, 15 minutes after the open today: about 2,000 market depth updates/second at the maximum, and the average seems to be around 300/second.
Okay, it's a liquid instrument, but if @treydog999 wants to deal with 500 symbols, it's going to be tricky to manage this kind of frequency.
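A rough back-of-envelope on those numbers, treating every symbol as if it were as busy as ES and assuming roughly 100 bytes per stored depth event (both assumptions are mine, not measurements from the thread):

```python
# Scale sam028's observed ES rates up to 500 symbols.
avg_per_symbol = 300      # updates/second, observed average
peak_per_symbol = 2000    # updates/second, observed maximum
symbols = 500
bytes_per_update = 100    # assumed encoded size of one depth event

avg_total = avg_per_symbol * symbols     # 150,000 updates/s
peak_total = peak_per_symbol * symbols   # 1,000,000 updates/s
print(f"average: {avg_total:,} updates/s ~ {avg_total * bytes_per_update / 1e6:.0f} MB/s")
print(f"peak:    {peak_total:,} updates/s ~ {peak_total * bytes_per_update / 1e6:.0f} MB/s")

# A full session at the average rate:
session_hours = 6.5
per_day_gb = avg_total * bytes_per_update * session_hours * 3600 / 1e9
print(f"~{per_day_gb:.0f} GB per {session_hours}-hour session")
```

Most of the 500 symbols will be far quieter than ES, so these are upper bounds, but they show why a single collection box is a stretch.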

#14
 
 treydog999 
Seoul, Korea
 
Experience: Intermediate
Platform: Multicharts
Broker: CQG, DTN IQfeed
Trading: YM 6E
Posts: 897 since Jul 2012
Thanks Given: 291
Thanks Received: 1,039


sam028 wrote:
I just did a quick check on ES 03-16, 15 minutes after the open today: about 2,000 market depth updates/second at the maximum, and the average seems to be around 300/second.
...

That does seem like a lot. I appreciate you taking a look at that for me. What if I did not need to collect the data myself? For example, buying it from a tick warehouse / professional data provider, and instead just using this database to store the data after the close and run queries against it. That may be a better alternative for me, as I definitely will not be able to handle 500 instruments the other way, maybe not even 100.

#15
 
 shodson 
OC, California, USA
Quantoholic
 
Experience: Advanced
Platform: IB/TWS, NinjaTrader, ToS
Broker: IB, ToS, Kinetick
Trading: stocks, options, futures, VIX
Posts: 1,976 since Jun 2009
Thanks Given: 533
Thanks Received: 3,709

I think you should build your software solution first, then scale up as needed. Don't worry about hardware needs for now.

So, get data for, say, 10 symbols. Write your apps or queries that are going to read and write this data. Use it for a while, work out the bugs. Keep adding symbols until your existing hardware starts to struggle; that will give you a good idea of how high-end your hardware solution needs to be.

In general, my philosophies towards hardware are:
- SSDs always, at least for anything where response time matters. You can use HDD for archive and backups. You can get a 2TB SSD for $600+ now.
- The more RAM the better
- RAID 10 is optimal; at a minimum, do RAID 0 for speed and back up transaction logs to a separate drive. You only need to upgrade to RAID 1+0 if you can't ever afford to miss a single transaction in your data stream. If you can reload the data later from an outside source, then you don't need RAID 1: just restore from a transaction log backup and reload anything after your last backup from your outside source.

Initially, I'm guessing you want to play with and research this set of data and come up with some algos based on the historical data, so the always-on, real-time aspect of the server is not crucial if you can reload it later from an outside source. Once you move to trading live money with it, real-time uptime will be crucial and you'll need to implement the highest degree of speed and availability.

And I don't think MongoDB or any other NoSQL solution is right for you. Those are more appropriate for large, unstructured data (video, documents, etc.), but if you're dealing with a deterministic structure of numbers, SQL will be superior for you both storage-wise and application-wise (query-ability). NoSQL does have superior horizontal scaling capabilities, but poor support for ACID transactions. For more, see this SQL vs. NoSQL article.
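To make the "deterministic structure of numbers" point concrete, here is a minimal sketch of what a depth-update table could look like, using SQLite through Python only to keep the example self-contained; the table layout and column names are illustrative guesses, not anything prescribed in the thread:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real database server
conn.executescript("""
CREATE TABLE depth_updates (
    symbol    TEXT    NOT NULL,
    ts_ns     INTEGER NOT NULL,   -- event timestamp, nanoseconds since epoch
    side      TEXT    NOT NULL,   -- 'B' or 'A'
    level     INTEGER NOT NULL,   -- book level, 0 = top
    operation TEXT    NOT NULL,   -- 'add', 'update', 'remove'
    price     REAL    NOT NULL,
    size      INTEGER NOT NULL
);
CREATE INDEX ix_depth_symbol_ts ON depth_updates (symbol, ts_ns);
""")

conn.execute(
    "INSERT INTO depth_updates VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("ES 03-16", 1456000000_000_000_000, "B", 0, "update", 1925.25, 142),
)

# The kind of after-the-close query treydog999 mentions, e.g. how many
# depth removals each symbol saw inside a time window.
rows = conn.execute("""
    SELECT symbol, COUNT(*) AS removes
    FROM depth_updates
    WHERE operation = 'remove' AND ts_ns BETWEEN ? AND ?
    GROUP BY symbol
""", (1456000000_000_000_000, 1456003600_000_000_000)).fetchall()
print(rows)
```

The index on (symbol, ts_ns) is what keeps those per-symbol, time-window queries cheap as the table grows.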

#16
 
 treydog999 
Seoul, Korea
 
Experience: Intermediate
Platform: Multicharts
Broker: CQG, DTN IQfeed
Trading: YM 6E
Posts: 897 since Jul 2012
Thanks Given: 291
Thanks Received: 1,039


shodson wrote:
I think you should build your software solution first, then scale up as needed. Don't worry about hardware needs for now.
...


Thanks for the comments. Have you heard of KERF? Just curious, as it is new for time series data. It's not well known or battle-tested, but since you seem knowledgeable I thought I could get your opinion.

In regards to using this for development and research, you are bang on. Our production servers use a totally different system that is an outsourced solution. I am doing some basic research to see what can be brought in-house.

The negative to that is that most of the tick data suppliers for the quantities I need usually do a one-time data dump and then daily FTP uploads for updates. I would not be able to get that back by going to my original data source. For example, Bloomberg uses this methodology, and that initial data dump is not cheap. Losing it would be disastrous, a 100x multiple of the cost of my test server. So RAID 1+0 is probably the way to go for me. Thoughts on RAID 5?

#17
 artemiso 
New York, NY
 
Experience: Beginner
Platform: Vanguard 401k
Broker: Yahoo Finance
Trading: Mutual funds
Posts: 1,152 since Jul 2012
Thanks Given: 784
Thanks Received: 2,685


treydog999 wrote:
The negative to that is that most of the tick data suppliers for the quantities I need usually do a one-time data dump and then daily FTP uploads for updates. ...

Store both the raw files from the FTP dump and the database derived from the raw files. They can just be in two different directories in the same file system.

If you want a cheap and quick backup of the raw files from the FTP dump, on top of the copy that resides on your database server, then since these are static you can probably do it very cheaply with Glacier or, failing that, a plug-and-play NAS or an external HDD.
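As a minimal sketch of keeping that second copy of the static raw dump, the Python below mirrors a hypothetical FTP dump directory to a second location and writes SHA-256 checksums so the copy can be verified later; the paths are placeholders, and the destination could equally be a NAS mount, an external HDD, or a staging area for an upload to Glacier:

```python
import hashlib
import shutil
from pathlib import Path

SRC = Path("/data/ftp_dump")    # raw files as delivered by the vendor (hypothetical path)
DST = Path("/backup/ftp_dump")  # NAS mount, external HDD, etc. (hypothetical path)

def sha256(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

manifest = []
for src_file in sorted(SRC.rglob("*")):
    if not src_file.is_file():
        continue
    dst_file = DST / src_file.relative_to(SRC)
    dst_file.parent.mkdir(parents=True, exist_ok=True)
    if not dst_file.exists():          # raw dumps are static, so copy each file once
        shutil.copy2(src_file, dst_file)
    manifest.append(f"{sha256(dst_file)}  {dst_file.relative_to(DST)}")

(DST / "MANIFEST.sha256").write_text("\n".join(manifest) + "\n")
print(f"backed up {len(manifest)} files")
```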





Last Updated on February 23, 2016

