NexusFi: Find Your Edge



 





Database setup recommendation


Discussion in Tech Support


  #1 (permalink)
 
 treydog999 
seoul, Korea
 
Experience: Intermediate
Platform: Multicharts
Broker: CQG, DTN IQfeed
Trading: YM 6E
Posts: 897 since Jul 2012
Thanks Given: 291
Thanks Received: 1,039

I am considering replicating something like what @Big Mike did using MariaDB and TokuDB, or possibly going with MongoDB. I have very little idea what would fit my needs. Big Mike's setup, as shown in his thread, handles something like 6,000 instruments, which is definite overkill for me at this point.

So let's get my requirements down. I am going to be collecting tick data for about 500 instruments, maybe 1,000 after some development. We only have 5 levels of depth, not 10, so that should save some space, and some instruments are only level 1. I will also save minute and daily data.

I have a decently sized budget of around 5k for this, not including a 1.5-year-old ThinkServer with a Xeon E3-1200 v3, which I am going to repurpose for this.

How many HDDs do I need? RAID 5 or 10?

Is going SSD vs HDD worth it for saving tick data for 500 instruments?

How would I set that up? One hard drive for the OS, one for the logs, and the RAID array for storage?

I am going to guess that going with Linux (Ubuntu) is going to be better than Windows as well.

How much storage space should I look to buy?

How important is RAM? The ThinkServer only has 8 GB at this point.

Sorry, I really have no idea about anything here. I could really use some help.

Thanks in advance

  #2 (permalink)
 artemiso 
New York, NY
 
Experience: Beginner
Platform: Vanguard 401k
Broker: Yahoo Finance
Trading: Mutual funds
Posts: 1,152 since Jul 2012
Thanks Given: 784
Thanks Received: 2,685


Quoting 
I will also save minute and daily data as well.

Is going SSD vs HDD worth it for saving tick data for 500 instruments?

Depends on the type of queries that you are going to run. My answer is most probably not.


Quoting 
How important is RAM? The ThinkServer only has 8 GB at this point.

Depends on the memory-intensiveness (caching mechanism, index operations) of the database that you're using. Without knowing more details, my gut feeling is that your database application will probably need 32-64 GB.


treydog999 View Post
How many HDDs do I need? Raid 5 or 10?

How much storage space should I look to buy?

You've answered this yourself:


Quoting 
So let's get my requirements down. I am going to be collecting tick data for about 500 instruments, maybe 1,000 after some development. We only have 5 levels of depth, not 10, so that should save some space, and some instruments are only level 1.

Probably give yourself around 20 MB per instrument per day: 20 MB x 500 = 10 GB per day, so 5 TB should last about 2 years. Your budget is plenty.
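A minimal sketch of that arithmetic, for anyone who wants to plug in their own numbers. The 20 MB per instrument per day figure is the rough guess from this post; the 252 trading days per year is an assumption added here (it is the day count under which 10 GB per day comes out close to 5 TB over 2 years), and `storage_tb` is just an illustrative helper name.

```python
# Back-of-the-envelope storage estimate using the figures from the post.
# TRADING_DAYS_PER_YEAR is an assumption, not something stated in the thread.
MB_PER_INSTRUMENT_PER_DAY = 20
TRADING_DAYS_PER_YEAR = 252


def storage_tb(instruments: int, years: float) -> float:
    """Return estimated raw storage in TB (decimal units: 1 TB = 1,000,000 MB)."""
    mb_per_day = MB_PER_INSTRUMENT_PER_DAY * instruments
    total_mb = mb_per_day * TRADING_DAYS_PER_YEAR * years
    return total_mb / 1_000_000


if __name__ == "__main__":
    for n in (500, 1000):
        print(f"{n} instruments, 2 years: {storage_tb(n, 2):.2f} TB")
```

This ignores indexes, compression, and the minute and daily bars, so treat the result as an order-of-magnitude figure only.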

RAID 10 vs 5 is a business decision first: How much are you willing to pay for faster performance? 10 is generally 'faster' than 5, at the trade-off of space. You can easily get 5 TB with 3 disks in a RAID 5 or 4 disks in a RAID 10, so it's up to you.

Make sure you get a server with enough bays for all the disks you need and extras (you will care about this later).
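To make the disk-count comparison concrete, here is a small sketch of the usual usable-capacity rules for identical disks: RAID 5 gives up one disk's worth of space to parity, and RAID 10 gives up half the disks to mirroring. The 3 TB disk size in the usage note is an assumption for illustration, not a recommendation from the thread.

```python
def usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity for a RAID 5 or RAID 10 array of identical disks."""
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb      # one disk's worth of space goes to parity
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return disks / 2 * disk_tb        # half the disks hold mirror copies
    raise ValueError(f"unsupported level: {level}")


if __name__ == "__main__":
    print(usable_tb("raid5", 3, 3.0))     # 3 x 3 TB in RAID 5
    print(usable_tb("raid10", 4, 3.0))    # 4 x 3 TB in RAID 10
```

With hypothetical 3 TB disks, both layouts end up with the same usable space, which is why the choice comes down to cost versus performance.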


Quoting 
How would I set that up? One hard drive for the OS, one for the logs, and the RAID array for storage?

I am going to guess that going with Linux (Ubuntu) is going to be better than Windows as well.

What logs? Database WAL/xlog? Depends on the database you're using - but most probably yes, splitting the drives would help speed it up.

Linux vs Windows: It depends on the rest of your ecosystem. If you're using something like MS SQL Server, then Windows has first-class support. If the rest of your machines are in a Windows environment (Active Directory, RDC etc.), then it's probably convenient to use Windows. Otherwise, Linux.

  #3 (permalink)
 
 sam028 
Site Moderator
 
Posts: 3,765 since Jun 2009
Thanks Given: 3,825
Thanks Received: 4,629


There are too many unknown parameters, IMVHO, to give good answers, so the first thing to do might be to choose the database type, SQL or NoSQL.
Try to think through how you're going to get your data out of a NoSQL database; that may help you see whether it's a good idea or not.

Collecting 5 levels from 500 instruments will require very good write throughput: forget about a couple of HDDs in RAID 5! Or maybe with a very good RAID card with at least 2 GB of cache.
I'd use SSDs in RAID 0 (or RAID 10) for live collection, then archive the collected data on cheaper, bigger HDDs in RAID 5, as write performance won't be a problem in that case.
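The live-then-archive idea could look roughly like the sketch below, assuming the collector writes one file per day or session into a directory on the SSD array. The function name `archive_old_files`, the directory layout, and the age cutoff are all hypothetical; a real database would more likely be archived with its own dump or partition tools.

```python
import shutil
import time
from pathlib import Path


def archive_old_files(live_dir: Path, archive_dir: Path, max_age_days: float) -> list[Path]:
    """Move files older than max_age_days from the live (SSD) tier to the archive (HDD) tier."""
    cutoff = time.time() - max_age_days * 86_400
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(live_dir.iterdir()):
        # Only plain files whose last modification is older than the cutoff are moved.
        if path.is_file() and path.stat().st_mtime < cutoff:
            target = archive_dir / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Run from a daily scheduled job, this keeps the fast array small while the large RAID 5 array only ever sees sequential bulk writes.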

As for RAM, the more the better, but the Xeon E3s are limited to 32 GB.

You don't want to collect every depth-of-market change, only the depth-of-market status after a trade has occurred, correct?

  #4 (permalink)
 
 treydog999 


sam028 View Post
You don't want to collect each depth of market change, only the depth of market status after a trade occurred, correct?

Optimally I would collect each depth-of-market change, so that I could estimate order cancellation and modification rates. If I do it only after each trade, then it would basically just be a random snapshot.
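A rough sketch of why every depth update matters for that estimate: at a single price level, any drop in displayed size that is not explained by trades can be counted as cancelled (or modified-down) volume. The event format and the name `estimate_cancels` are assumptions for illustration, and the sketch assumes every trade in the stream hit this one level.

```python
def estimate_cancels(events, start_size=0):
    """Split size changes at one price level into added, traded and cancelled volume.

    events: iterable of ("trade", qty) or ("depth", new_size) tuples in arrival order.
    """
    size = start_size
    traded_since_update = 0
    totals = {"added": 0, "traded": 0, "cancelled": 0}
    for kind, value in events:
        if kind == "trade":
            traded_since_update += value
            totals["traded"] += value
        elif kind == "depth":
            # Change in displayed size that trades alone do not explain.
            unexplained = (value - size) + traded_since_update
            if unexplained > 0:
                totals["added"] += unexplained
            else:
                totals["cancelled"] += -unexplained
            size = value
            traded_since_update = 0
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return totals


if __name__ == "__main__":
    stream = [("depth", 100), ("trade", 10), ("depth", 90),
              ("depth", 60), ("trade", 5), ("depth", 70)]
    print(estimate_cancels(stream))
```

With snapshots taken only after trades, the drop from 90 to 60 in this toy stream would be merged into whatever the next snapshot shows, and the cancelled volume could not be separated out.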


Just to help fix some of the parameters: let's say I am going with the exact same setup as @Big Mike, using MariaDB and TokuDB. That will make it more familiar and probably just easier to use, as SQL is more commonplace.

  #5 (permalink)
 
 treydog999 



I really appreciate this answer. It helped me get some basic estimates and an idea of what I should be looking for, as well as giving me a frame of reference. Thanks as always, @artemiso.

  #6 (permalink)
 
 Big Mike 
Manta, Ecuador
Site Administrator
Developer
Swing Trader
 
Experience: Advanced
Platform: Custom solution
Broker: IBKR
Trading: Stocks & Futures
Frequency: Every few days
Duration: Weeks
Posts: 50,399 since Jun 2009
Thanks Given: 33,173
Thanks Received: 101,538

Storing data is only a small piece. Using it is more complex (timely queries).

My dual Xeon E5 with 128 GB of memory and RAID 10 SSDs on a 6 Gb/s SAS hardware RAID controller is barely up to the task. I would advise you to do months of proof-of-concept testing on a small scale before deciding what you really need in a larger deployment.
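A small-scale proof of concept can start as simply as timing a bulk insert and one typical query on synthetic ticks. The sketch below uses Python's built-in `sqlite3` in memory purely as a stand-in so that it runs anywhere; it is not MariaDB or TokuDB, the schema and the name `run_poc` are made up for illustration, and the timings will not transfer to a real server. The point is the shape of the test, which can be repeated against the real database with real data volumes.

```python
import sqlite3
import time


def run_poc(n_ticks: int = 100_000, n_instruments: int = 50) -> dict:
    """Time a bulk insert and one per-instrument range query on synthetic ticks."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ticks (instrument INTEGER, ts INTEGER, price REAL, size INTEGER)")
    con.execute("CREATE INDEX idx_inst_ts ON ticks (instrument, ts)")
    rows = [(i % n_instruments, i, 100.0 + (i % 7) * 0.25, 1 + i % 5) for i in range(n_ticks)]

    t0 = time.perf_counter()
    with con:  # commit the whole batch as one transaction
        con.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?)", rows)
    insert_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    (matched,) = con.execute(
        "SELECT COUNT(*) FROM ticks WHERE instrument = ? AND ts BETWEEN ? AND ?",
        (0, 0, n_ticks // 2),
    ).fetchone()
    query_s = time.perf_counter() - t0
    con.close()
    return {"rows": n_ticks, "matched": matched, "insert_s": insert_s, "query_s": query_s}


if __name__ == "__main__":
    print(run_poc())
```

Scaling `n_ticks` up step by step and watching where insert or query time stops growing linearly gives an early hint of which resource runs out first.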

  #7 (permalink)
 
 treydog999 


Big Mike View Post
Storing data is only a small piece. Using it is more complex (timely queries).

My dual Xeon E5 with 128 GB of memory and RAID 10 SSDs on a 6 Gb/s SAS hardware RAID controller is barely up to the task. I would advise you to do months of proof-of-concept testing on a small scale before deciding what you really need in a larger deployment.


You make a really good point: testing on a small scale and actual usage (timely queries) are both very important.

Is the upgrade from, say, 10k RPM HDDs to SSDs worth it? I know that you get about 10x the IOPS at about 10x the price. Also, enterprise-grade hardware is significantly more expensive. Is that worth it?

Using MariaDB/TokuDB, what are the tweaks or hardware that you think improve timely queries the most?

Thanks in advance

  #8 (permalink)
 artemiso 


treydog999 View Post
Also, enterprise-grade hardware is significantly more expensive. Is that worth it?

Enterprise hardware is almost certainly desirable if (i) you are going to plug it in at a remote datacenter and leave it alone and inaccessible for a very long time, and (ii) you have a lot of hardware working together in concert, and (iii) you have a lot of data to write and rewrite. (There's a fourth reason why enterprise hardware is expensive that I will talk about at the end, but it is probably irrelevant to this discussion.)

It's a matter of probability. Hardware has a certain mean time before failure and a standard deviation of time before failure; enterprise hardware generally has a larger mean and a smaller standard deviation. Even the cheapest retail hardware will probably give you something like a 57-year mean time before failure. You can quantify whether it matters to you: compare 4 disks in a server in your closet with 400 disks 8 hours' travel time away. For most people, enterprise hardware is not necessary.
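One way to put numbers on the 4-disk versus 400-disk comparison is sketched below. It assumes a constant failure rate (an exponential time-to-failure model), which is a simplification added here and not something the post claims; the 57-year figure is the one quoted above, and `p_any_failure` is an illustrative name.

```python
import math


def p_any_failure(disks: int, mtbf_years: float, horizon_years: float = 1.0) -> float:
    """Probability that at least one of `disks` fails within the horizon (exponential model)."""
    rate = disks / mtbf_years            # expected failures per year across all disks
    return 1.0 - math.exp(-rate * horizon_years)


if __name__ == "__main__":
    for n in (4, 400):
        print(f"{n} disks: {p_any_failure(n, 57.0):.1%} chance of at least one failure per year")
```

Under this model the closet server sees a disk failure in roughly 7% of years, while the 400-disk fleet is all but certain to see at least one every year, which is where the travel time starts to matter.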

For example, I'm in a large city where it is costly to put (a very large number of) servers within driving distance of the city. Instead, most of my firepower is in an obscure city that I've only been to once in my lifetime. One day of downtime plus sending a sysadmin down to fix things is a lot costlier than the additional cost of enterprise hardware, so the price is justified immediately. In another case, I have servers about 10k miles away from my location; there, it's an absolute certainty that you'd rather spend a bit more on a better insurance policy (enterprise hardware).

(The fourth reason that enterprise hardware is expensive, and in this case I mean EMC etc., is that it gets very, very expensive to extract acceptable performance after your first petabyte.)

  #9 (permalink)
 grausch 
Luxembourg, Luxembourg
 
Experience: Advanced
Platform: TWS
Broker: Interactive Brokers
Trading: Stocks
Posts: 494 since May 2012
Thanks Given: 1,731
Thanks Received: 1,159


treydog999 View Post
Raid 5 or 10?


artemiso View Post
RAID 10 vs 5 is a business decision first: How much are you willing to pay for faster performance? 10 is generally 'faster' than 5, at the trade-off of space. You can easily get 5 TB with 3 disks in a RAID 5 or 4 disks in a RAID 10, so it's up to you.

Regarding RAID5 vs RAID10, I was always under the impression that RAID 10 is also the "safer" option. Couldn't really find the article I was looking for, but here is something similar - RAID 5 Vs. RAID 10. With RAID5, you run the risk that you lose your entire dataset during a rebuild. With RAID 10, you still have a risk when 1 disk is being "cloned" after a failure, but that is a read-only operation (on the surviving drive) which places less stress on a disk vs the full read-write operation of RAID5 rebuild. Bear in mind though that the last time I investigated this was 2010/2011 and I have used RAID5 (actually RAIDZ1) since then without issue in my iTunes server, but it runs a much lighter workload than your proposed database.

You also need to budget for a backup solution. RAID arrays can fail and your data can be lost. If you want snapshots (like Apple's Time Machine), it would be good to have more storage space in your backup solution than in your database server; I would aim for about double the size in this scenario. If you do not need snapshots, the same size is fine.

Also bear in mind that the less free space you have in your array, the more stress the workload puts on your drives. In my server there is a very definite drop in performance once free space drops below 50%, and once it drops to 20% things slow to a crawl. So always make sure you budget for sufficient size now. While you can increase the size later with RAID 5 (not sure about RAID 10), every time you add drives the array rebuilds itself, which means you run a higher risk of failure.
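If you want an early warning before the array gets that full, a simple check along these lines could run from a scheduled job. The 50% and 20% levels are the ones observed on the server described above and may well differ on other filesystems; the function names and status labels are illustrative.

```python
import shutil


def free_space_status(total_bytes: int, free_bytes: int) -> str:
    """Classify an array by free space, using the 50% / 20% levels from the post."""
    free_ratio = free_bytes / total_bytes
    if free_ratio < 0.20:
        return "critical"
    if free_ratio < 0.50:
        return "warning"
    return "ok"


def check_path(path: str) -> str:
    """Look up real usage for the filesystem holding `path` and classify it."""
    usage = shutil.disk_usage(path)      # named tuple: total, used, free (in bytes)
    return free_space_status(usage.total, usage.free)


if __name__ == "__main__":
    print(check_path("."))
```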

  #10 (permalink)
 artemiso 



grausch View Post
Regarding RAID5 vs RAID10, I was always under the impression that RAID 10 is also the "safer" option. Couldn't really find the article I was looking for, but here is something similar - RAID 5 Vs. RAID 10. With RAID5, you run the risk that you lose your entire dataset during a rebuild. With RAID 10, you still have a risk when 1 disk is being "cloned" after a failure, but that is a read-only operation (on the surviving drive) which places less stress on a disk vs the full read-write operation of RAID5 rebuild. Bear in mind though that the last time I investigated this was 2010/2011 and I have used RAID5 (actually RAIDZ1) since then without issue in my iTunes server, but it runs a much lighter workload than your proposed database.

Seems mostly about right. I'd also add that RAID 10 lets you lose up to half of your drives in the best-case scenario, as opposed to only 1 drive with RAID 5.
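The best case is not the whole picture, so here is a small sketch of the other end: once one disk has already failed, how likely is it that a second failure destroys the array? It assumes RAID 10 built from two-way mirror pairs and a second failure that hits the remaining disks uniformly at random, both of which are simplifications added here.

```python
from fractions import Fraction


def second_failure_fatal(level: str, disks: int) -> Fraction:
    """Chance that a second, uniformly random disk failure destroys a degraded array."""
    if level == "raid5":
        return Fraction(1)               # any second failure loses the array
    if level == "raid10":
        return Fraction(1, disks - 1)    # only the failed disk's mirror partner is fatal
    raise ValueError(f"unsupported level: {level}")


if __name__ == "__main__":
    for n in (4, 8):
        print(n, "disks:", second_failure_fatal("raid10", n), "vs", second_failure_fatal("raid5", n))
```

So a 4-disk RAID 10 survives a second failure two times out of three under these assumptions, and the odds improve as the array grows, while a degraded RAID 5 never survives one.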





Last Updated on February 23, 2016


© 2024 NexusFi™, s.a., All Rights Reserved.