Database setup recommendation
Updated: Views / Replies:1,524 / 16
Created: by treydog999 Attachments:0


  #1 (permalink)
Elite Member
seoul, Korea
 
Futures Experience: Intermediate
Platform: Multicharts
Broker/Data: CQG, DTN IQfeed
Favorite Futures: YM 6E
 
treydog999's Avatar
 
Posts: 894 since Jul 2012
Thanks: 291 given, 1,006 received

I am considering replicating something like what @Big Mike did using MariaDB and Toku, or possibly going with MongoDB. I have very little idea about what would fit my needs. Big Mike's setup, as shown in his thread, handles something like 6,000 instruments, which is definite overkill for me at this point.

So let's get my requirements down. I am going to be collecting tick data for about 500 instruments, maybe 1,000 after some development. We only have 5 levels of depth, not 10, so that should save some space; some instruments are only level 1. I will also save minute and daily data as well.

I have a decently sized budget of around 5k for this, not including a 1.5-year-old ThinkServer with a Xeon E3-1200 v3, which I am going to repurpose for this.

How many HDDs do I need? RAID 5 or 10?

Is going SSD vs HDD worth it for saving tick data for 500 instruments?

How would I set that up? One hard drive for the OS, one for the logs, and the RAID array for storage?

I am going to guess that going with Linux (Ubuntu) is going to be better than Windows as well.

How much storage space should I look to buy?

How important is RAM? The ThinkServer only has 8 GB at this point.

Sorry I really have no idea about anything here. I could really use some help.

Thanks in advance

 
  #2 (permalink)
Elite Member
Manchester, NH
 
Futures Experience: Beginner
Platform: thinkorswim
Broker/Data: TD Ameritrade
Favorite Futures: Stocks
 
Posts: 902 since Jul 2012
Thanks: 603 given, 1,785 received


Quoting 
I will also save minute and daily data as well.

Is going SSD vs HDD worth it for saving tick data for 500 instruments?

Depends on the type of queries that you are going to run. My answer is most probably not.


Quoting 
How important is RAM? The ThinkServer only has 8 GB at this point.

Depends on the memory-intensiveness (caching mechanism, index operations) of the database that you're using. My gut feeling is that your database application will probably need 32-64 GB without knowing more details.


treydog999 View Post
How many HDDs do I need? RAID 5 or 10?

How much storage space should I look to buy?

You've answered this yourself:


Quoting 
So let's get my requirements down. I am going to be collecting tick data for about 500 instruments, maybe 1,000 after some development. We only have 5 levels of depth, not 10, so that should save some space; some instruments are only level 1.

Probably give yourself around 20 MB x 500 instruments per day = 10 GB per day, so 5 TB should last about 2 years (roughly 500 trading days). Your budget is plenty.
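The sizing arithmetic above can be written as a quick back-of-envelope check. This is only a sketch: the 20 MB per instrument per day figure comes from the estimate above, while the 250 trading days per year, the decimal units (1 TB = 1,000,000 MB) and the function name are assumptions added for illustration.

```python
def storage_estimate_tb(instruments, mb_per_instrument_per_day=20.0,
                        trading_days_per_year=250, years=2.0):
    """Estimate raw storage in TB (decimal units: 1 TB = 1,000,000 MB)."""
    mb_per_day = instruments * mb_per_instrument_per_day
    total_mb = mb_per_day * trading_days_per_year * years
    return total_mb / 1_000_000


if __name__ == "__main__":
    # 500 instruments now, possibly 1,000 later (figures from the first post).
    for n in (500, 1000):
        print(f"{n} instruments: {storage_estimate_tb(n):.1f} TB over 2 years")
```

The estimate covers raw data only. Index overhead, compression and the minute/daily tables are not modelled, so treat the result as a starting point.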

RAID 10 vs 5 is a business decision first: How much are you willing to pay for faster performance? 10 is generally 'faster' than 5, at the trade-off of space. You can easily get 5 TB with 3 disks in a RAID 5 or 4 disks in a RAID 10, so it's up to you.
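To make the space trade-off concrete, here is a minimal sketch of the usable-capacity arithmetic, assuming the textbook layouts: RAID 5 gives up one disk's worth of space to parity, and RAID 10 gives up half of the disks to mirroring. The function names are made up for this example, and real arrays lose a little more to formatting and filesystem overhead.

```python
def data_disks(level, disks):
    """Number of disks' worth of usable space for a RAID 5 or RAID 10 array."""
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return disks - 1          # one disk's worth of space goes to parity
    if level == 10:
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return disks // 2         # every disk is mirrored once
    raise ValueError("only RAID 5 and RAID 10 are handled in this sketch")


def usable_tb(level, disks, disk_tb):
    """Usable capacity in TB for the given layout and per-disk size."""
    return data_disks(level, disks) * disk_tb


def min_disk_tb(level, disks, target_tb):
    """Smallest per-disk size in TB that reaches target_tb of usable space."""
    return target_tb / data_disks(level, disks)


if __name__ == "__main__":
    for level, disks in ((5, 3), (10, 4)):
        print(f"RAID {level}, {disks} disks: need {min_disk_tb(level, disks, 5.0)} TB "
              f"per disk for 5 TB; 3 TB disks give {usable_tb(level, disks, 3.0)} TB")
```

Both layouts need at least 2.5 TB per disk to reach 5 TB, so the difference in this case is one extra disk for RAID 10.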

Make sure you get a server with enough bays for all the disks you need and extras (you will care about this later).


Quoting 
How would I set that up? One hard drive for the OS, one for the logs, and the RAID array for storage?

I am going to guess that going with Linux (Ubuntu) is going to be better than Windows as well.

What logs? Database WAL/xlog? Depends on the database you're using - but most probably yes, splitting the drives would help speed it up.

Linux vs Windows: It depends on the rest of your ecosystem. If you're using something like MS SQL Server, then Windows has first-class support. If the rest of your machines are in a Windows environment (Active Directory, RDC etc.), then it's probably convenient to use Windows. Otherwise, Linux.

The following 3 users say Thank You to artemiso for this post:
 
  #3 (permalink)
Administrator: Retired Backtester
 Vendor: speedytradingservers.com 
Rennes France
 
Futures Experience: Advanced
Platform: NinjaTrader
Broker/Data: IB/Kinetick
Favorite Futures: Futures
 
sam028's Avatar
 
Posts: 3,366 since Jun 2009
Thanks: 3,574 given, 3,982 received


There are too many unknown parameters, IMVHO, to give good answers, so the first thing to do might be to choose the database type, SQL or NoSQL.
You can try to think about how you're going to get your data out of a NoSQL database; that may help you see whether it's a good idea or not.

Collecting 5 levels from 500 instruments will require very good write throughput: forget about the RAID 5 plus HDDs combination, unless perhaps you use a very good RAID card with at least 2 GB of cache.
I would use SSDs in RAID 0 (or RAID 10) for live collection, then archive the collected data on cheaper, bigger HDDs in RAID 5, as write performance won't be a problem in that case.
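The tiering idea above (fast SSDs for live collection, cheaper HDDs for the archive) could be sketched as a periodic job. This is only an illustration under a strong assumption: that collected data lands as plain files (for example one file per day) in a directory on the SSD array. A real MariaDB setup would more likely archive by table or partition. The function name, directory arguments and the 7-day cutoff are all hypothetical.

```python
import shutil
import time
from pathlib import Path


def archive_old_files(hot_dir, archive_dir, max_age_days=7, now=None):
    """Move files not modified for max_age_days from hot_dir to archive_dir.

    Returns the names of the files that were moved, in sorted order.
    """
    hot = Path(hot_dir)
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    moved = []
    for path in sorted(hot.iterdir()):
        if path.is_file() and path.stat().st_mtime < cutoff:
            # shutil.move copies and then deletes when crossing filesystems,
            # which is the case when going from the SSD array to the HDD array.
            shutil.move(str(path), str(archive / path.name))
            moved.append(path.name)
    return moved
```

Run from cron or a scheduler, something like this keeps only recent data on the fast array, so the SSDs can stay small.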

As for RAM, the more the better, but the Xeon E3 is limited to 32 GB.

You don't want to collect each depth of market change, only the depth of market status after a trade occurred, correct?

The following 3 users say Thank You to sam028 for this post:
 
  #4 (permalink)
Elite Member
seoul, Korea
 
Futures Experience: Intermediate
Platform: Multicharts
Broker/Data: CQG, DTN IQfeed
Favorite Futures: YM 6E
 
treydog999's Avatar
 
Posts: 894 since Jul 2012
Thanks: 291 given, 1,006 received


sam028 View Post
There are too many unknown parameters, IMVHO, to give good answers, so the first thing to do might be to choose the database type, SQL or NoSQL.
You can try to think about how you're going to get your data out of a NoSQL database; that may help you see whether it's a good idea or not.

Collecting 5 levels from 500 instruments will require very good write throughput: forget about the RAID 5 plus HDDs combination, unless perhaps you use a very good RAID card with at least 2 GB of cache.
I would use SSDs in RAID 0 (or RAID 10) for live collection, then archive the collected data on cheaper, bigger HDDs in RAID 5, as write performance won't be a problem in that case.

As for RAM, the more the better, but the Xeon E3 is limited to 32 GB.

You don't want to collect each depth of market change, only the depth of market status after a trade occurred, correct?

Ideally I would collect each depth of market change, so that I could estimate order cancellation and modification rates. If I do it only after each trade, then it would basically just be a random snapshot.


Just to help fix some of the parameters: let's say I am going with the exact same setup as @Big Mike and using MariaDB and Toku. That will make it more familiar and probably just easier to use, as SQL is more commonplace.

 
  #5 (permalink)
Elite Member
seoul, Korea
 
Futures Experience: Intermediate
Platform: Multicharts
Broker/Data: CQG, DTN IQfeed
Favorite Futures: YM 6E
 
treydog999's Avatar
 
Posts: 894 since Jul 2012
Thanks: 291 given, 1,006 received


I really appreciate this answer. It helped me get some basic estimates and an idea of what I should be looking for, and it gave me a frame of reference. Thanks as always, @artemiso.

 
  #6 (permalink)
Site Administrator
Manta, Ecuador
 
Futures Experience: Advanced
Platform: My own custom solution
Favorite Futures: E-mini ES S&P 500
 
Big Mike's Avatar
 
Posts: 46,240 since Jun 2009
Thanks: 29,354 given, 83,237 received

Storing data is only a small piece. Using it is more complex (timely queries).

My dual Xeon E5 with 128 GB of memory and RAID 10 SSDs on a 6 Gb/s SAS hardware RAID controller is barely up to the task. I would advise you to do months of proof-of-concept testing on a small scale before deciding what you really need in a larger deployment.

The following 2 users say Thank You to Big Mike for this post:
 
  #7 (permalink)
Elite Member
seoul, Korea
 
Futures Experience: Intermediate
Platform: Multicharts
Broker/Data: CQG, DTN IQfeed
Favorite Futures: YM 6E
 
treydog999's Avatar
 
Posts: 894 since Jul 2012
Thanks: 291 given, 1,006 received


Big Mike View Post
Storing data is only a small piece. Using it is more complex (timely queries).

My dual Xeon E5 with 128 GB of memory and RAID 10 SSDs on a 6 Gb/s SAS hardware RAID controller is barely up to the task. I would advise you to do months of proof-of-concept testing on a small scale before deciding what you really need in a larger deployment.

You make a really good point about testing on a small scale, and about usage (timely queries) being very important.

Is the upgrade from, say, 10k RPM HDDs to SSDs worth it? I know that you get about 10x the IOPS at about 10x the price. Also, enterprise-grade hardware is significantly more expensive. Is that worth it?

Using MariaDB/Toku, which tweaks or hardware do you think improve timely queries the most?

Thanks in advance

 
  #8 (permalink)
Elite Member
Manchester, NH
 
Futures Experience: Beginner
Platform: thinkorswim
Broker/Data: TD Ameritrade
Favorite Futures: Stocks
 
Posts: 902 since Jul 2012
Thanks: 603 given, 1,785 received


treydog999 View Post
Also, enterprise-grade hardware is significantly more expensive. Is that worth it?

Enterprise hardware is almost certainly desired if (i) you are going to plug it in a remote datacenter and leave it alone and inaccessible for a very long time and (ii) you have a lot of hardware working together in concert and (iii) you have a lot of data to write and re-write. (There's another, 4th reason why enterprise hardware is expensive that I will talk about at the end, that will probably be irrelevant for this discussion.)

It's a matter of probability. Hardware has a certain mean time before failure and a standard deviation of time before failure; enterprise hardware generally has a larger mean and a smaller standard deviation. Even the cheapest retail hardware will probably give you something like 57 years mean time before failure. You can quantify whether it matters to you by comparing two cases: 4 disks in a server in your closet versus 400 disks 8 hours' travel away. For most people, enterprise hardware is not necessary.
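One way to quantify the comparison above is to ask how likely it is that at least one disk fails within a year. The sketch below assumes independent disks with a constant failure rate (exponential lifetimes), which is a simplification and not something the post itself claims. The 57-year figure is the one quoted above, and the function name is made up.

```python
import math


def p_any_failure(disks, mtbf_years=57.0, horizon_years=1.0):
    """Probability that at least one of `disks` fails within the horizon.

    Assumes independent disks with exponentially distributed lifetimes,
    so the combined failure rate is disks / mtbf_years per year.
    """
    rate = disks / mtbf_years
    return 1.0 - math.exp(-rate * horizon_years)


if __name__ == "__main__":
    for n in (4, 400):
        print(f"{n} disks: {p_any_failure(n):.1%} chance of at least one failure in a year")
```

Under these assumptions the 4-disk closet server sees a failure in well under one year in ten, while with 400 disks at least one failure per year is close to certain, which is why the remote fleet justifies the more expensive hardware.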

For example, I'm in a large city where it is costly to put (a very large number of) servers within driving distance. Instead, most of my firepower is in an obscure city that I've only been to once in my lifetime. One day of downtime plus sending a sysadmin down to fix things is a lot costlier than the additional cost of enterprise hardware, so the price becomes justified immediately. In another case, I have servers about 10k miles away from my location; there, it's an absolute certainty that you'd rather spend a bit more on a better insurance policy (enterprise hardware).

(The fourth reason that enterprise hardware is expensive, and in this case I mean EMC etc., is that it gets very, very expensive to extract acceptable performance after your first 1 petabyte.)

The following 2 users say Thank You to artemiso for this post:
 
  #9 (permalink)
Elite Member
Luxembourg, Luxembourg
 
Futures Experience: Advanced
Platform: TWS
Broker/Data: Interactive Brokers
Favorite Futures: Stocks
 
Posts: 491 since May 2012
Thanks: 1,641 given, 1,126 received


treydog999 View Post
Raid 5 or 10?


artemiso View Post
RAID 10 vs 5 is a business decision first: How much are you willing to pay for faster performance? 10 is generally 'faster' than 5, at the trade-off of space. You can easily get 5 TB with 3 disks in a RAID 5 or 4 disks in a RAID 10, so it's up to you.

Regarding RAID5 vs RAID10, I was always under the impression that RAID 10 is also the "safer" option. Couldn't really find the article I was looking for, but here is something similar - RAID 5 Vs. RAID 10. With RAID5, you run the risk that you lose your entire dataset during a rebuild. With RAID 10, you still have a risk when 1 disk is being "cloned" after a failure, but that is a read-only operation (on the surviving drive) which places less stress on a disk vs the full read-write operation of RAID5 rebuild. Bear in mind though that the last time I investigated this was 2010/2011 and I have used RAID5 (actually RAIDZ1) since then without issue in my iTunes server, but it runs a much lighter workload than your proposed database.

You also need to budget for a backup solution. RAIDs can fail and your data can be lost. If you want to have snapshots (like Apple's Time Machine), then it would be good to have more storage space in your backup solution than in your database server. I would aim for about double the size in this scenario. If you do not need snapshots, then the same size is fine.

Also bear in mind that the less free space you have in your array, the more stress the workload places on your drives. In my server there is a very definite drop in performance once the free space drops below 50%, and once it drops to 20% things slow to a crawl. So always make sure you budget for sufficient size now. While you can increase size later with RAID5 (not sure about 10), every time you add drives the array rebuilds itself, which means you run a higher risk of failure.
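The free-space thresholds mentioned above can be turned into a simple monitoring check. This is a hedged sketch: the 50% and 20% cut-offs are the anecdotal figures from one RAIDZ1 server, not general constants, and the function names and labels are invented for illustration. `shutil.disk_usage` is the standard-library call that reports total, used and free bytes for a path.

```python
import shutil


def classify_free_space(free_fraction, warn_below=0.50, critical_below=0.20):
    """Map a free-space fraction (0.0 to 1.0) to 'ok', 'warn' or 'critical'."""
    if not 0.0 <= free_fraction <= 1.0:
        raise ValueError("free_fraction must be between 0 and 1")
    if free_fraction < critical_below:
        return "critical"
    if free_fraction < warn_below:
        return "warn"
    return "ok"


def check_path(path):
    """Classify the filesystem that holds `path` by its current free space."""
    usage = shutil.disk_usage(path)
    return classify_free_space(usage.free / usage.total)


if __name__ == "__main__":
    print(check_path("."))
```

Pointing `check_path` at the array's mount point from a scheduled job gives early warning before the array gets tight.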

The following 2 users say Thank You to grausch for this post:
 
  #10 (permalink)
Elite Member
Manchester, NH
 
Futures Experience: Beginner
Platform: thinkorswim
Broker/Data: TD Ameritrade
Favorite Futures: Stocks
 
Posts: 902 since Jul 2012
Thanks: 603 given, 1,785 received



grausch View Post
Regarding RAID5 vs RAID10, I was always under the impression that RAID 10 is also the "safer" option. Couldn't really find the article I was looking for, but here is something similar - RAID 5 Vs. RAID 10. With RAID5, you run the risk that you lose your entire dataset during a rebuild. With RAID 10, you still have a risk when 1 disk is being "cloned" after a failure, but that is a read-only operation (on the surviving drive) which places less stress on a disk vs the full read-write operation of RAID5 rebuild. Bear in mind though that the last time I investigated this was 2010/2011 and I have used RAID5 (actually RAIDZ1) since then without issue in my iTunes server, but it runs a much lighter workload than your proposed database.

Seems mostly about right. I'd also add that RAID 10 lets you lose up to half of your drives in the best-case scenario (one drive from each mirrored pair), as opposed to only 1 drive with RAID 5.
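The best-case versus worst-case point can be checked by brute force. The sketch below assumes a RAID 10 built from two-disk mirrored pairs and enumerates every combination of failed disks; the disk numbering and function names are made up for the example.

```python
from itertools import combinations


def raid10_survives(failed, pairs):
    """True if no mirrored pair has lost both of its disks."""
    failed = set(failed)
    return all(not (a in failed and b in failed) for a, b in pairs)


def raid10_survival_fraction(pairs, failures):
    """Fraction of all `failures`-disk failure combinations the array survives."""
    disks = [d for pair in pairs for d in pair]
    if not 0 <= failures <= len(disks):
        raise ValueError("failures must be between 0 and the number of disks")
    combos = list(combinations(disks, failures))
    survived = sum(raid10_survives(c, pairs) for c in combos)
    return survived / len(combos)


def raid5_survives(failures):
    """RAID 5 tolerates at most one failed disk, whichever disk it is."""
    return failures <= 1


if __name__ == "__main__":
    pairs = [(0, 1), (2, 3)]  # 4-disk RAID 10: two mirrored pairs
    for k in (1, 2, 3):
        print(f"{k} failed: RAID 10 survives {raid10_survival_fraction(pairs, k):.0%} "
              f"of cases, RAID 5 survives: {raid5_survives(k)}")
```

With 4 disks, RAID 10 survives 4 of the 6 possible two-disk failures, while RAID 5 survives none of them. Neither layout survives three failures.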

The following 2 users say Thank You to artemiso for this post:

Copyright © 2017 by futures io, s.a., Av Ricardo J. Alfaro, Century Tower, Panama, +507 833-9432, info@futures.io