I don't think I have ever seen a DOM data sample. Do you have a sample of what you want to store that you can share?
It strikes me as something for which MySQL would be better suited, but it depends on the data resolution, your reasons for storing the data, and your longer-term plans for it.
Given the data resolution, it really depends on how you intend to store the data. If you are going to store it in single-day files, you can use Excel.
But if you intend to use Excel to store multiple days in one single file, chances are that after a few weeks or months of data gathering you will exceed Excel's limit of 1,048,576 rows, at which point MySQL would be more suitable.
Again, it really depends on your intentions, as well as your proficiency in both Excel and SQL.
In SQL you can build views (predefined queries that aggregate, calculate, and filter data in any way you can possibly imagine). So the approach you want is to send rows over to your SQL Server at whatever frequency you like, and then query the views back into your application when you need them. If your application is running on the same machine as your SQL Server and you have a decent memory allocation, this process will be the fastest and the cheapest on memory.
I might do a post about this eventually and show how I do it....
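In the meantime, here is a rough sketch of the shape of it, with Python's built-in sqlite3 standing in for SQL Server; the table, view, and column names (dom_updates, depth_by_level) are invented for illustration, not anyone's actual schema:

```python
# Batching rows into the database, then reading a view back out.
# sqlite3 is a stand-in here -- the same pattern applies to SQL Server/MySQL.
import sqlite3

conn = sqlite3.connect("dom.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS dom_updates "
    "(ts REAL, side TEXT, level INTEGER, px REAL, qty INTEGER)"
)

# A view is a predefined query: here, total resting size per side and level.
conn.execute("""
    CREATE VIEW IF NOT EXISTS depth_by_level AS
    SELECT side, level, SUM(qty) AS total_qty
    FROM dom_updates
    GROUP BY side, level
""")

# "Send rows over at whatever frequency you like" -- e.g. a buffered batch:
batch = [
    (1700000000.000, "bid", 0, 4500.25, 10),
    (1700000000.050, "ask", 0, 4500.50, 7),
    (1700000000.125, "bid", 1, 4500.00, 42),
]
conn.executemany("INSERT INTO dom_updates VALUES (?, ?, ?, ?, ?)", batch)
conn.commit()

# ...and query the view back from your application when you need it:
for side, level, total_qty in conn.execute(
    "SELECT side, level, total_qty FROM depth_by_level"
):
    print(side, level, total_qty)
```

Because the view is computed inside the database engine, the aggregation never touches your application's memory until you ask for the (much smaller) result set.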
Excel by contrast only holds about 1.05 million rows, and starts to hemorrhage after around 500K rows. It can't calculate, aggregate, or filter quickly at all, and there is no chance you could do any analysis and send it back to your application in any reasonable time.
Hope this helps.
Ian
In the analytical world there is no such thing as art, there is only the science you know and the science you don't know. Characterizing the science you don't know as "art" is a fool's game.
1. To do the initial analysis (not in real time, so no need to optimize for latency or execution speed), I just print to the NinjaTrader output window and copy batches of 100K records at a time into Excel. The raw data is about 3x the size of the final analyzed dataset, so for one day in the ES you will have around 150K to 200K raw rows. I typically keep a spreadsheet for each day, where I compute my various bets and test different things. In case it's not obvious, I play in the HFT space.... So I don't need months and years of history at a macro level to test my bets. I need weeks up to at most a few months of very, very detailed information to validate my current betting models. This might be a different approach from the needs of some traders.
2. Now in a production environment, optimizing for real-time data feeds and execution speed, you would need a different approach to work with this much data. If the goal is to synthesize 10K to 50K rows of data into your decision engine in real time to get alpha signals to trade off of, then you will certainly need to take the SQL approach. Here the idea would be to do an insert statement for each row into your database table, and then fetch various views in SQL as needed to determine your analytics (see the sketch after this list).
So if you are just getting started and want to do the analysis, then printing the output to an output window and copying it straight into Excel will work fine during the initial analysis phase.... But for real-time trading, there is no way to move between Excel and your application fast enough for it to have any value; you will have to go with SQL for this step.
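To make step 2 concrete, here is a minimal sketch of the real-time pattern, again with sqlite3 standing in for SQL Server; the schema and the imbalance view are made up for illustration, and only the pattern (one INSERT per incoming row, periodic reads from a view) is the point:

```python
# One INSERT per incoming DOM row, with a view queried back as an alpha input.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dom_updates (ts REAL, side TEXT, level INTEGER, px REAL, qty INTEGER)"
)
# Toy signal: net resting size at the top of the book.
conn.execute("""
    CREATE VIEW bid_ask_imbalance AS
    SELECT SUM(CASE WHEN side = 'bid' THEN qty ELSE -qty END) AS imbalance
    FROM dom_updates
    WHERE level = 0
""")

def on_dom_update(ts, side, level, px, qty):
    """Called once per incoming DOM row by your (hypothetical) feed handler."""
    conn.execute("INSERT INTO dom_updates VALUES (?, ?, ?, ?, ?)",
                 (ts, side, level, px, qty))

def fetch_signal():
    """Query the view back into the decision engine."""
    (imbalance,) = conn.execute("SELECT imbalance FROM bid_ask_imbalance").fetchone()
    return imbalance

# Feed a couple of fake updates, then read the signal back:
on_dom_update(time.time(), "bid", 0, 4500.25, 12)
on_dom_update(time.time(), "ask", 0, 4500.50, 9)
print(fetch_signal())  # 12 - 9 = 3
```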
There has been interest in some of these topics before from others, and I have been saying I would do it for a while now.... So maybe I will start a microstructure thread and share how to do some of this data modeling, analysis, betting logic, etc.
Ian
If you have any programming experience, I would suggest storing the raw data in flat binary files with a fixed record structure (one file per instrument per day). The ES alone has 2-3 million Level 1 updates (best bid/ask/trades) per day and more than 4 million Level 2 updates (10 levels of order book on each side). It's way too much data even for SQL. But as other posters said, it depends on whether you want to store the whole thing.
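For illustration, a minimal sketch of that layout in Python using the struct module; the five-field Level 1 record here (timestamp, bid, ask, bid size, ask size) is an assumed layout, not a prescription:

```python
# Fixed-size binary records, one file per instrument per day.
import struct

# Little-endian: 3 doubles + 2 ints = 32 bytes per Level 1 update.
RECORD = struct.Struct("<dddii")

def write_day(path, updates):
    """updates: iterable of (ts, bid, ask, bid_qty, ask_qty) tuples,
    written to one file per instrument per day, e.g. 'ES_20240115.l1'."""
    with open(path, "wb") as f:
        for row in updates:
            f.write(RECORD.pack(*row))

def read_day(path):
    """Yield records back in order; record N starts at byte N * RECORD.size."""
    with open(path, "rb") as f:
        buf = f.read()
    for off in range(0, len(buf), RECORD.size):
        yield RECORD.unpack_from(buf, off)
```

At 2-3 million 32-byte records that is on the order of 65-95 MB per instrument per day, and a sequential read of a flat file like this is far cheaper than fetching the same rows back out of a database.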