NexusFi: Find Your Edge



 





Petabyte-scale storage server


Discussion in Tech Support





 

  #1 (permalink)
 artemiso 
New York, NY
 
Experience: Beginner
Platform: Vanguard 401k
Broker: Yahoo Finance
Trading: Mutual funds
Posts: 1,152 since Jul 2012
Thanks Given: 784
Thanks Received: 2,685

I noticed several members here have plenty of IT experience. Is there any particular hardware you would recommend for petabyte-scale storage? Or any existing build lists we could draw ideas from? We're thinking of building everything from scratch instead of using a turn-key expansion enclosure.

  #2 (permalink)
 
 Big Mike 
Manta, Ecuador
Site Administrator
Developer
Swing Trader
 
Experience: Advanced
Platform: Custom solution
Broker: IBKR
Trading: Stocks & Futures
Frequency: Every few days
Duration: Weeks
Posts: 50,446 since Jun 2009
Thanks Given: 33,217
Thanks Received: 101,610



My last job before trading was VP of a high-end storage company.

It is really impossible to answer your question without knowing a whole lot more about the purpose.

What level of redundancy is needed? Active/active, active/passive, none?
Is it primarily random or sequential IO?
What level of throughput/iops in both scenarios above is required?
How are you going to access the data (FC SAN, iSCSI, etc)?
How many hosts are going to connect to it at once?
What type of file system requirements do you have? One huge file system? 100 small file systems?
How do you want to handle file sharing and LUN provisioning?

In the end, you can just build your own bare 4U chassis loaded with as many 3TB spindles as will fit, get dozens of SAS cards, and attach them to hosts running Linux. That is the poor man's solution, and it will also require the most experience to handle.

If you are looking at Tier 1 storage and a seven figure budget, then the above would be laughed out of the park.

Mike

  #3 (permalink)
 
 Big Mike 


BTW, I haven't even talked about disaster recovery, snapshots, etc. Depending on your requirements, you may be able to do it at the software level, or may need a hardware solution to do it.

Mike

  #4 (permalink)
 artemiso 

Thanks @Big Mike, excellent hardware advice as always. I'll answer to the best of my ability, but I have to confess this is not my area of expertise.

"In the end, you can just build your own bare 4U chassis loaded with as many 3TB spindles as can fit, and get dozens of SAS cards and attach them to hosts running a Linux server. That is the poor man solution that will also require the most experience to handle.

If you are looking at Tier 1 storage and a seven figure budget, then the above would be laughed out of the park."


Right. Currently our data storage is completely outsourced. There's a submarine cable that goes directly across the river to a data center in Boston. As one might guess, this is becoming expensive and we're looking to reduce costs. We will keep the current arrangement, since it works well as a form of operational redundancy (disaster recovery, as you put it), but the plan is to set up a storage server in-house. We've thought of hiring a dedicated database engineer to solve the problem. I'm surprised how far we've gone without one.

Our current, first-impression plan is indeed as you've described: the poor man's solution, several stacks of bare 4U chassis with plenty of 2 TB spindles. That probably gives the highest capacity/cost ratio and is very achievable with a 6-digit budget. What are the drawbacks?

A reason I'm asking for general ideas is not that I have to be doing most of the assembly, but I'm ultimately responsible for allocating the budget and it would be good to have an informed opinion before I green-light a build. Is there any reading material that you'd recommend in this area?

"What level of redundancy is needed? Active/active, active/passive, none?"
Active/active.

"How do you want to handle file sharing and LUN provisioning?"
Haven't thought that far ahead. Opinions?

"Is it primarily random or sequential IO?"
Random.

"What level of throughput/iops in both scenarios above is required?"
Haven't decided yet; we've been revamping a lot of our software layer lately.

"How are you going to access the data (FC SAN, iSCSI, etc)?"
16 Gb FC SAN. Possibly IB-based SAN.

"How many hosts are going to connect to it at once?"
<=15.

"What type of file system requirements do you have? One huge file system? 100 small file systems?"
One huge.

  #5 (permalink)
 
 Big Mike 


Well, keep in mind you're on a trading site, and I left the industry 6 years ago.

That said, the first trouble area is wanting a single enormous file system. Technically you can do it but there are risks.

Random IO is also a bit of an issue, but you didn't give an IOPS number so it's unclear how demanding the application is. If it's not that demanding, just enormous, then you can save a lot of money.

Basically, demanding random IO (high IOPS) means you need lots of SSDs. Otherwise you could get away with 3TB spindles for the most part, plus a few SSDs (say 10%), if you use the right kind of file system, like ZFS, which can cache near-line data on SSD and push the rest to the slower spindles.
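
To put rough numbers on the spindle-versus-SSD trade-off described above, here is a minimal back-of-envelope sketch in Python. The figure of about 100 random IOPS per 7,200 rpm spindle and the RAID 10 write penalty of 2 are common rules of thumb, not numbers from this thread, and the function names are made up for illustration.

```python
import math

# Rule-of-thumb figures; assumptions for illustration, not measurements.
HDD_RANDOM_IOPS = 100      # assumed random IOPS for one 7,200 rpm spindle
RAID10_WRITE_PENALTY = 2   # each logical write lands on both halves of a mirror


def raid10_random_iops(spindles: int, read_fraction: float,
                       per_drive_iops: float = HDD_RANDOM_IOPS) -> float:
    """Estimate host-visible random IOPS for a RAID 10 set of identical drives."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw_iops = spindles * per_drive_iops
    write_fraction = 1.0 - read_fraction
    # A read costs one back-end IO; a write costs RAID10_WRITE_PENALTY of them.
    return raw_iops / (read_fraction + RAID10_WRITE_PENALTY * write_fraction)


def spindles_for_target(target_iops: float, read_fraction: float,
                        per_drive_iops: float = HDD_RANDOM_IOPS) -> int:
    """Smallest even spindle count whose estimate meets target_iops."""
    per_spindle = raid10_random_iops(1, read_fraction, per_drive_iops)
    count = math.ceil(target_iops / per_spindle)
    return count + (count % 2)  # RAID 10 mirrors drives in pairs


if __name__ == "__main__":
    print(raid10_random_iops(100, 0.7))     # one 100-spindle shelf, 70% reads
    print(spindles_for_target(50_000, 0.7)) # spindles for a hypothetical target
```

If the estimate for a shelf of spindles falls well short of what the application needs, that is the case where the SSD tier has to carry the hot data.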

If you want full redundancy, you should really invest in at least a hardware RAID controller head unit. At this level of capacity, you would need several of them. Each head unit can manage 100-200TB of capacity, depending on the level of performance you need. So each head unit would have, say, (100) 3TB spindles and (10) 512GB SSDs hanging off it.

At this point you can either present the spindles raw to the host, which might be preferred in some situations with ZFS, or have the head unit manage the RAID directly. For example, (5) RAID 10 arrays of 20 spindles each on the HDD side, and (1) RAID 10 array on the SSD side. RAID 10 is best for high random IO, but it costs more spindles than RAID 5 or RAID 6. Each box would give you roughly 50 x 2.6TB usable, so about 130TB of usable capacity, plus roughly 2.2TB of usable SSD cache for ZFS.
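
The capacity arithmetic above can be sketched in a few lines of Python. The 2.6TB usable per 3TB spindle and the layout of (5) arrays of 20 spindles come from the post. The 1,000TB target (1 PB in decimal units), the RAID 6 comparison, and the function names are assumptions added for illustration.

```python
import math


def usable_tb(level: str, drives_per_array: int, arrays: int,
              tb_per_drive: float) -> float:
    """Usable capacity of several identical arrays at a given RAID level."""
    data_drives = {
        "raid10": drives_per_array // 2,  # half the drives hold mirror copies
        "raid5": drives_per_array - 1,    # one drive's worth of parity
        "raid6": drives_per_array - 2,    # two drives' worth of parity
    }[level]
    return arrays * data_drives * tb_per_drive


def head_units_needed(target_tb: float, usable_tb_per_unit: float) -> int:
    """Head units required to reach target_tb, rounded up."""
    return math.ceil(target_tb / usable_tb_per_unit)


if __name__ == "__main__":
    target_tb = 1000  # assumed: 1 PB in decimal terabytes

    raid10 = usable_tb("raid10", drives_per_array=20, arrays=5, tb_per_drive=2.6)
    raid6 = usable_tb("raid6", drives_per_array=20, arrays=5, tb_per_drive=2.6)

    print(f"RAID 10: {raid10:.0f}TB per head unit, "
          f"{head_units_needed(target_tb, raid10)} head units for 1 PB")
    print(f"RAID 6:  {raid6:.0f}TB per head unit, "
          f"{head_units_needed(target_tb, raid6)} head units for 1 PB")
```

Under these assumptions, RAID 10 needs noticeably more head units than RAID 6 for the same petabyte, which is the "high cost (more spindles)" trade-off in concrete terms. The sketch ignores hot spares and file system overhead.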

The hardware RAID head units have two controllers per head unit, and each drive bay should be dual-ported so there are two paths to each drive. This provides redundancy. Each of these head units would connect to your FC fabric.

You might check names like Infortrend and Promise for some example 1U active/active hardware RAID head units. They just use SFP to connect from the head units to the SAS JBOD chassis.

The next level up from there would be to buy a tier 1 named solution, at many times the price, but with guaranteed service levels and such. That is probably 5x the cost of buying the head units and JBOD chassis from Infortrend or a similar vendor, buying your own drives, and managing all of it yourself.

As for where to turn for help, sorry, but I am out of the industry, and the company I was VP of went out of business shortly after I quit. I would start with the Infortrend or Promise websites, see if any of their solutions make sense, and find a reseller. You can then issue some RFQs.

You should also check hardforum.com and ask for advice there.

Mike

  #6 (permalink)
 artemiso 

@Big Mike

Great tips. We're starting work on this project. More complicated than initially expected and we might have to expand our office space because of the servers humming away. Will let you know how it goes.

  #7 (permalink)
 artemiso 

Confirmed! Breaking the contract on our current office lease to move. New office needs to be rewired for the considerable power delivery, three-phase etc. I wonder how other people are coping with their servers. Hope you're enjoying your holiday.





Last Updated on August 5, 2013


© 2024 NexusFi™, s.a., All Rights Reserved.