Data Quality Check (ES)

Howdy--

So I've cobbled together ES data using this thread and the QCollector thread, but upon backtesting it I noticed some large losses that shouldn't have been possible (I was using a fixed stop). Upon investigation, the large losses occurred in the backtest because I had data gaps whereby the Open[0] was MUCH less than the Close[1]. So I built a little indicator to determine if other data gaps were present in my ES data--and lo and behold my ES data is riddled with price gaps.

I have a working knowledge of the Settlement Policies whereby Closing prices are set, so I would expect there to be SOME differences between Open[0] and Close[1], so in my indicator I added in the ability to filter out gaps that were less than a user defined tick setting (i.e., 4 ticks, 8 ticks, whatever). So after applying the "acceptable tick error", the remaining gaps are ones to take a look at.
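For anyone who wants to see the gap filter outside of NinjaTrader, here is a rough Python sketch of the idea. It is not the attached indicator, just an illustration: the `Bar` type, the `find_price_gaps` function and the sample prices are all made up for the example. The only market fact assumed is the ES tick size of 0.25 points.

```python
from typing import List, NamedTuple, Tuple

ES_TICK_SIZE = 0.25  # ES trades in 0.25-point increments


class Bar(NamedTuple):
    """Hypothetical minimal bar: only the two prices the check needs."""
    open_price: float
    close_price: float


def find_price_gaps(
    bars: List[Bar], max_ticks: float, tick_size: float = ES_TICK_SIZE
) -> List[Tuple[int, float]]:
    """Return (bar_index, gap_in_ticks) for each bar whose open differs
    from the previous bar's close by more than max_ticks."""
    gaps = []
    for i in range(1, len(bars)):
        # Equivalent of Open[0] - Close[1], expressed in ticks.
        gap_ticks = (bars[i].open_price - bars[i - 1].close_price) / tick_size
        if abs(gap_ticks) > max_ticks:
            gaps.append((i, gap_ticks))
    return gaps


# Made-up sample data: the second bar opens 1 tick away (ignored),
# the third bar opens 13 points below the prior close (flagged).
bars = [
    Bar(1200.00, 1201.00),
    Bar(1201.25, 1203.00),
    Bar(1190.00, 1192.00),
]
gaps = find_price_gaps(bars, max_ticks=8)
print(f"{len(gaps)} gap(s) larger than 8 ticks: {gaps}")
```

The sign of the gap is kept so you can tell a gap down from a gap up; use `abs()` on the result if you only care about the size.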

The indicator plots the tick price difference when the difference is greater than the acceptable tick range on the chart above the bar where the gap occurred. I also printed some of the relevant info to the output window so that I could see how many errors were present on the chart in question.

The indicator also checks if Feb follows Jan (etc) and 2007 follows 2006 (etc). I had originally done this because I was cobbling together data, and I couldn't remember what I had downloaded and uploaded, so I used the indicator to tell me where I had time gaps (then I went out and found the missing data).
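The calendar check can be sketched the same way. Again, this is only an illustration in Python, not the code in the attached indicator: `find_month_gaps` and the sample dates are hypothetical. It counts months on a single running scale, so the "Feb follows Jan" and "2007 follows 2006" checks collapse into one comparison.

```python
from datetime import date
from typing import List, Tuple


def find_month_gaps(bar_dates: List[date]) -> List[Tuple[date, date]]:
    """Return (previous_date, current_date) pairs where the data skips
    one or more calendar months, or where the dates run backwards."""
    suspicious = []
    for prev, cur in zip(bar_dates, bar_dates[1:]):
        # Months elapsed between the two bars, counting across year ends.
        months_apart = (cur.year - prev.year) * 12 + (cur.month - prev.month)
        if months_apart > 1 or months_apart < 0:
            suspicious.append((prev, cur))
    return suspicious


# Made-up sample: December 2006 rolls into January 2007 (fine),
# but February 2007 is missing entirely (flagged).
bar_dates = [
    date(2006, 11, 30),
    date(2006, 12, 29),
    date(2007, 1, 3),
    date(2007, 3, 1),
]
month_gaps = find_month_gaps(bar_dates)
for prev, cur in month_gaps:
    print(f"Missing data between {prev} and {cur}")
```

This only catches holes of a month or more, which is what I needed when figuring out which files I had forgotten to download; shorter holes would need a check against the session calendar.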

Use the indicator as you see fit--it's attached.

I made sure that I was using a Session Template that included RTH and ETH, and I also made sure that my Merge Policy was set to "MergeBackAdjusted".

Here is an example of my ES daily Output Window from 2005 until now:

So as you can see, I have 198 data gaps larger than 8 ticks (which is the threshold I set).

Doing a similar analysis on my ES 60 min bars from 2005 forward, I have 181 data gaps larger than 8 ticks.

Here are my goals:

1. We are nothing without quality data. If we don't have quality data, we'll have shitty backtest results and we will fail. Failure is not an option.

2. There is a ton of tick data on futures.io (formerly BMT), but I can't seem to find any threads out there that have set out to explore the quality of this data. My hope is that my little indicator will be the jumping off point for a movement within futures.io (formerly BMT) whereby we scrub available data for cleanliness before looking each other in the eyes and saying that the data is good and that we should trade capital upon the backtest results.

3. Given the presence of pricing gaps, then what? How can we fix this (the data itself, import settings, instrument settings, rollover offsets, time zones, etc.)?

4. Would someone who has a "pure" ES data set (perhaps one purchased from one of the tick data vendors, tick data dot com for instance) be willing to run this little indicator to see how many data gaps are present (and thereby considered "normal") for ES historical tick data on some of the main bar periods (e.g., annual, monthly, weekly, daily, 60 min, 30 min, 15 min, 5 min, 1 min)? The runs only take a few seconds, and perhaps the Output Window results for each of the bar periods could be posted up.

5. Once we have a "pure" picture of what the data should look like, we can then begin working on cleaning up the posted data--and perhaps even creating a new "clean" data thread that is moderated to only accept data that has been scrubbed by a community consensus for what constitutes "clean" data. The existing QCollector thread should still keep going (and much of the data might very well be deemed "clean" by the community).

Thanks for everyone's help, and my hope is that a number of us will come out the backend of this better off.

All best,

Aventeren