I decided to start a thread providing samples of historical market depth and MBO data, to consolidate answers to various questions on execution quality, data integrity and latency. My main goal is to encourage more latency transparency and data integrity across the products out there by giving traders an educational resource to benchmark against. This thread was also inspired by: and
FAQ
1. What are the key differences between this and [other data source]?
The main difference between this and most sources of historical data is that it comes with market-by-order (MBO), full-book market depth (all levels), nanosecond-precision timestamps, and one-to-one correspondence with exchange message sequence numbers. Here are a few comparisons with typical data feeds:
IQFeed, Nanex, QCollector, BaBAR: Do not provide millisecond resolution. Do not provide full historical market depth or MBO. The historical downloader only samples on trade ticks, not on every market depth update.
CQG, Continuum, TT etc. through a retail platform: Limited by the platform's timestamping issues: no transparency into native timestamping, event reordering introduced by the platform (e.g. Interactive Brokers snapshots at fixed time intervals), and poor granularity. Limited levels of market depth. No retail platform that I know of currently supports market-by-order. Most retail platforms also don't give you the actual message sequence numbers, which makes it difficult to ask CME or your broker for details about your trade.
Market replay or any realtime capture through NinjaTrader like GomCollector: Same as above: does not provide all levels and only shows market depth changes, not limit order events. You would still need to write your own indicator to capture this, which is prone to errors from connection interruptions etc.
Reuters tick history: Inaccurate level 2 timestamping because it uses two separate hosts to capture L1 and L2.
QuantHouse: Limited levels in market depth, no nanosecond resolution timestamping in historical data.
Rithmic: Lossy because of UDP protocol.
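One practical way to benchmark the integrity claims above is to check the exchange message sequence numbers for gaps, since a lossy capture (e.g. over UDP) will skip numbers. Here is a minimal sketch; the tuple layout `(sequence_number, timestamp_ns, event_type)` is an assumption for illustration, not the actual format of any particular feed:

```python
# Hypothetical MBO records as (sequence_number, timestamp_ns, event_type).
# Real feeds (e.g. CME MDP 3.0) use their own message layouts.

def find_sequence_gaps(records):
    """Return (expected, received) pairs wherever the exchange
    sequence number skips, indicating dropped messages."""
    gaps = []
    prev = None
    for seq, ts_ns, event in records:
        if prev is not None and seq != prev + 1:
            gaps.append((prev + 1, seq))
        prev = seq
    return gaps

sample = [(1001, 1_000, "add"), (1002, 1_050, "modify"),
          (1005, 1_200, "trade"), (1006, 1_210, "delete")]
print(find_sequence_gaps(sample))  # [(1003, 1005)]: messages 1003-1004 missing
```

Running the same check over a vendor's capture and over a reference dataset for the same session makes dropped messages immediately visible.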
Because of licensing reasons (see point 3), I will only provide limited samples on random dates and products.
2. What's useful about this data? How should I use it?
Because the samples are not on contiguous days, this is not intended for backtesting or developing a trading strategy. I recommend you use this primarily for understanding your execution and data quality, identifying vendor software issues, or performing post-trade forensics on your own orders. Here are a few things you can do:
- Find your own orders in the data and figure out your execution latency.
- Understand the sequence of events around fast bursts and volatile spikes.
- Benchmark the integrity of your data feeds or historical data.
- Use the sequence numbers to reference specific events to your broker or CME for investigation.
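For the first point, a rough sketch of the latency calculation: log your local send time when you submit an order, then locate the corresponding add event in the MBO data and take the difference of the nanosecond timestamps. The dictionaries and field names below are hypothetical, and note that this mixes your clock with the exchange's, so it only gives a meaningful figure if your local clock is synchronized (e.g. via PTP):

```python
# Sketch of execution-latency measurement, assuming you log local send
# times (ns since epoch) and can find your own orders in the MBO data
# by order id. All names here are illustrative, not a real feed API.

def execution_latency_ns(local_sends, mbo_add_events):
    """Map order id -> (exchange add timestamp - local send timestamp)."""
    latencies = {}
    for order_id, sent_ns in local_sends.items():
        if order_id in mbo_add_events:
            latencies[order_id] = mbo_add_events[order_id] - sent_ns
    return latencies

local_sends    = {"A1": 1_700_000_000_000_000_000}
mbo_add_events = {"A1": 1_700_000_000_000_450_000}  # exchange add event
print(execution_latency_ns(local_sends, mbo_add_events))  # {'A1': 450000}, i.e. 450 µs
```

The same matching approach works for the other direction: given a fill, the sequence number of the matching trade event is what you would quote to your broker or CME when asking for details.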
This thread is dedicated solely to the discussion of historical market depth, MBO and their implications for these types of use cases. Please use the official tick data sharing thread instead if you are looking for backtesting or backfill data.
3. Are there any terms of use for these data samples?
The main issue has to do with licensing. The CME has never charged fees for historical redistribution of data, but Sections 1 and 11 of their agreement do not technically allow any end users of their data to receive a compilation of data covering an extended period unless they have an explicit subscriber agreement. Large websites like Barchart get around this by having you sign up and click an "Agree" checkbox before you use their market data APIs, though these are practically available to the public since there's no policing of fake identities used for sign-up. Unfortunately, I'm not an authorized representative of this site, so I can't dictate a user policy. I also don't think it's in the spirit of an online forum to have people submit themselves to restrictive user or subscriber agreements. This limits me to the next best …