DOM & Math

Interesting post, maybe I can help.

For sure you can quantify whether information from the DOM correlates with how the market moves. To do this, you just need to build an extraction program to get the raw data. You don't need anything fancy like AI or ML to solve this for you; all you need is the raw data in a spreadsheet and a few basic formulas to determine whether there are any legit bets that consistently pan out.

Getting the data extracted and compiled into the right format for your analysis is way more complex than running the analysis itself. I can walk you through extracting the data, building the data model, and performing the analysis. I don't play in this particular (DOM / scalping) space myself, so even though I have this data I don't use it for this purpose. But generally, here is the approach.

1. To get the raw data: You will need to call your level 2 feed at some frequency to read every level and extract the data. You could sample every 1 minute, every 5 seconds, or whatever. To me, the best approach is always the most granular, which is to sample on every price level change, so that's what I do. You will need to build logic in your extraction code to index your ask and bid levels relative to the current best bid and best ask, so you have a relationship built in to sequence everything in order. To do this you need to tap your level 1 feed as well. With most software this won't sync 100% perfectly with your level 2 feed automatically, so you will have to solve for this. Without a good structure you can get out of sequence very easily and the whole thing falls apart. You will also need your level 1 data to capture volume. I strongly recommend capturing the starting volume and ending volume on the bid and ask at every price level. It's also very helpful to capture the added volume, subtracted volume, canceled volume, and transacted volume. Splitting these into different buckets will tell you a lot about the microstructure, and the level 2 data resting 2, 3, 4 levels out will have more context once you see these types of KPIs.
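
Here is a rough Python sketch of the indexing and bucketing described in step 1. It is only an illustration, not how any particular platform does it. The names `index_levels` and `bucket_volume_change`, the dict-based book snapshots, and the 0.25 tick size are all assumptions; your feed API will look different. The bucket split is also a simplification: from two snapshots plus the traded volume you can only recover the net of adds and cancels, not each one separately.

```python
from dataclasses import dataclass


@dataclass
class LevelRow:
    side: str      # "bid" or "ask"
    offset: int    # 0 = best bid/ask, 1 = first resting level out, ...
    price: float
    volume: int


def index_levels(bids, asks, tick_size):
    """Index each level by its distance in ticks from the best bid / best ask.

    bids and asks are hypothetical snapshots: dicts of price -> resting volume.
    Both sides are assumed to be non-empty.
    """
    best_bid = max(bids)
    best_ask = min(asks)
    rows = []
    for price, volume in bids.items():
        offset = round((best_bid - price) / tick_size)
        rows.append(LevelRow("bid", offset, price, volume))
    for price, volume in asks.items():
        offset = round((price - best_ask) / tick_size)
        rows.append(LevelRow("ask", offset, price, volume))
    # Sort so each side reads outward from the inside market.
    rows.sort(key=lambda r: (r.side, r.offset))
    return rows


def bucket_volume_change(start_volume, end_volume, transacted):
    """Split the change at one price level into rough buckets.

    Assumption: whatever part of the change is not explained by trades is
    treated as a net add (if positive) or a net cancel (if negative).
    """
    net_other = (end_volume - start_volume) + transacted
    return {
        "start": start_volume,
        "end": end_volume,
        "transacted": transacted,
        "added": max(net_other, 0),
        "canceled": max(-net_other, 0),
    }


book = index_levels(
    bids={100.00: 50, 99.75: 120, 99.50: 80},
    asks={100.25: 40, 100.50: 90},
    tick_size=0.25,
)
change = bucket_volume_change(start_volume=50, end_volume=30, transacted=25)
```

In the example, the level went from 50 to 30 while 25 traded, so on net 5 must have been added. Keying every row on `offset` rather than raw price is what keeps rows comparable as the inside market moves.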

2. To analyze the data (the easy part): All you need to do is define a series of bets that you want to test, then use a few simple look-forward formulas to check a few rows ahead and see whether your hypothesis panned out. Here is an example.

A: Hypothesis example: If the bid volume is 2x the ask volume on the first resting level out, then the market will move up. From here you just flag each row where this condition is true, then set a formula to look ahead 3, 5, 10 rows, etc. (each row is a price level change) and see whether the price level moved up or down. If it moved up, your bet was right: give yourself a +1. If it moved down, your bet was wrong: give yourself a -1. After analyzing X price level changes over X days, weeks, months, etc., count your +1s and -1s and see the score. You can build the results into any fancy output you want. Once you have the data you can measure this by time of day, day of week, etc. There are tons of ways to quantify how often and how decisively your bet panned out. Just make sure you dig deep enough. One or two days, or even a few weird market cycles, can produce outlier statistics. But you can easily solve this part.
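
The same look-ahead scoring can be sketched in a few lines of Python instead of spreadsheet formulas. This is only an illustration under assumptions: `score_hypothesis` is a made-up name, each row is assumed to be one price level change with hypothetical `price`, `bid_vol`, and `ask_vol` fields (volume on the first resting level out), and "2x" is read as "at least 2x". The sample rows are invented just to show the mechanics.

```python
def score_hypothesis(rows, lookahead, ratio=2.0):
    """Score the bet 'bid volume >= ratio * ask volume means price goes up'.

    For each row where the condition holds, look `lookahead` rows ahead:
    +1 if the price is higher, -1 if lower, no score if unchanged.
    """
    wins = losses = ties = 0
    for i, row in enumerate(rows):
        if i + lookahead >= len(rows):
            break  # not enough future rows left to check the outcome
        if row["bid_vol"] < ratio * row["ask_vol"]:
            continue  # condition not met, no bet on this row
        future_price = rows[i + lookahead]["price"]
        if future_price > row["price"]:
            wins += 1
        elif future_price < row["price"]:
            losses += 1
        else:
            ties += 1
    decided = wins + losses
    return {
        "wins": wins,
        "losses": losses,
        "ties": ties,
        "score": wins - losses,
        "hit_rate": wins / decided if decided else None,
    }


# Invented sample rows, one per price level change.
rows = [
    {"price": 100.00, "bid_vol": 200, "ask_vol": 80},   # bet: 200 >= 160
    {"price": 100.25, "bid_vol": 90,  "ask_vol": 100},  # no bet
    {"price": 100.50, "bid_vol": 300, "ask_vol": 100},  # bet: 300 >= 200
    {"price": 100.25, "bid_vol": 50,  "ask_vol": 60},   # no bet
    {"price": 100.00, "bid_vol": 150, "ask_vol": 70},   # no row to look ahead to
]
result = score_hypothesis(rows, lookahead=2)
```

To slice by time of day or day of week, you would add a timestamp field to each row and run the same function on each subset.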

B: Stress testing your results: This part is important. Just because you cross the 51% line doesn't mean you have a legit bet. You need to actually KILL it to have a shot at covering your commission cost and accounting for all the things that could go wrong. And finally, this is the most important part: you need to make sure you had any chance in hell of getting filled in the first place if you use limit orders. There are tons, and I mean tons, of edges that look like the holy grail and mathematically speaking show a 70%, 80%, 90% win rate, until you realize that these are only available to the top 5% of the queue that gets filled on the side that wins the price level.
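
Both stress tests can be sketched roughly in Python. Everything here is an assumption for illustration: the function names, the payoff measured in ticks, the 0.2-tick round-trip cost, and especially the fill model, which conservatively assumes your limit order joins the back of the queue and only fills if enough volume trades at that price afterwards (nobody ahead of you cancels). The sample bets are invented.

```python
def break_even_win_rate(win_ticks, loss_ticks, cost_ticks):
    """Win rate p where p*win - (1 - p)*loss - cost = 0."""
    return (loss_ticks + cost_ticks) / (win_ticks + loss_ticks)


def hit_rate(outcomes):
    """Share of +1 outcomes among a list of +1 / -1 results."""
    if not outcomes:
        return None
    return sum(1 for o in outcomes if o > 0) / len(outcomes)


def filled_only(bets, order_size=1):
    """Keep only bets whose limit order would have filled.

    Assumed queue model: the order sits behind `queue_ahead` contracts and
    fills only if at least queue_ahead + order_size trade at that price.
    """
    return [
        b for b in bets
        if b["traded_after"] >= b["queue_ahead"] + order_size
    ]


# Invented bets: outcome is +1 (right) or -1 (wrong).
bets = [
    {"outcome": 1,  "queue_ahead": 500, "traded_after": 120},  # winner, no fill
    {"outcome": 1,  "queue_ahead": 40,  "traded_after": 200},  # winner, filled
    {"outcome": -1, "queue_ahead": 300, "traded_after": 900},  # loser, filled
    {"outcome": 1,  "queue_ahead": 450, "traded_after": 100},  # winner, no fill
]

raw = hit_rate([b["outcome"] for b in bets])                     # on paper
realistic = hit_rate([b["outcome"] for b in filled_only(bets)])  # fills only
needed = break_even_win_rate(win_ticks=1, loss_ticks=1, cost_ticks=0.2)
```

With these made-up numbers the bet looks like 75% on paper, drops to 50% once you only count the orders that would have filled, and needs about 60% just to break even after costs. The losers tend to fill because the level trades all the way through, while the winners often bounce before the back of the queue gets touched.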

So that's the long answer. Short answer: yes, you can do all of this. I may eventually be willing to share some of the microstructure raw data and research I have, but getting the raw data is the hard part. Testing a bet is very easy.

The kind of questions you are asking and the pursuit you are on are the right ones, in my opinion.

Best of luck!


In the analytical world there is no such thing as art, there is only the science you know and the science you don't know. Characterizing the science you don't know as "art" is a fool's game.

