Though I am new to this forum (this is my first post), I am not new to trading. This is going to be a more general question and I did not know which part of the forum to choose, but since I am using NinjaTrader, I will post it here.
With my friend, we have linked NinjaTrader to a MySQL database to record all trades from our script, and we do all tests and analyses directly on the database, so we do not need NinjaTrader for that part. We just need it to run the script and record all trades on market replay data to the database. Then we apply filters on top of the database with out of sample testing (OOST). The analyzer provides us the OOST results filtered by minimal drawdown, weighted R2 (coefficient of determination, but weighted to put more weight on recent trades), profit index (which compares profit before and after the OOST cutoff date) and other indicators we define ourselves. Then we compare the baseline (all trades in the database) against the best OOST results for different filter settings, and the selected settings we put into automated trading with NT7 on our Chicago-based server.
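The weighted R2 mentioned above could be sketched roughly like this. This is a minimal illustration, not the posters' actual formula: it fits a weighted least-squares line to the equity curve and computes a weighted R2, with the linearly increasing weight scheme being my own assumption.

```python
# Weighted R^2 of an equity curve: fit a weighted least-squares line and
# measure how well it explains the curve, weighting recent trades more.
# The linearly increasing weights are an assumption; the posters' exact
# weighting scheme is not specified in the thread.

def weighted_r2(equity, weights=None):
    n = len(equity)
    x = list(range(n))
    if weights is None:
        # linearly increasing weights: the most recent trade counts n times
        # as much as the first one
        weights = [i + 1 for i in range(n)]
    sw = sum(weights)
    mx = sum(w * xi for w, xi in zip(weights, x)) / sw
    my = sum(w * yi for w, yi in zip(weights, equity)) / sw
    # weighted least-squares slope and intercept
    sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(weights, x, equity))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum(w * (yi - (slope * xi + intercept)) ** 2
                 for w, xi, yi in zip(weights, x, equity))
    ss_tot = sum(w * (yi - my) ** 2 for w, yi in zip(weights, equity))
    return 1.0 - ss_res / ss_tot

# A perfectly linear equity curve scores exactly 1.0
print(weighted_r2([100, 110, 120, 130, 140]))  # -> 1.0
```

A smooth, steadily rising equity curve scores close to 1; a choppy one scores lower, and the weighting penalizes recent choppiness more than old choppiness.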
The issue is
we have come that far that we can play with out of sample testing as we like, and right now we have started testing what we call "interlaced out of sample testing". That means we take every second week of trades, put them together into a "learning" set, run the optimization on it, and then apply the result to the remaining ("every first") weeks and display it in a chart. Then we can see this 50 % out of sample test, which gives us (or should give us) a profit curve with the statistical characteristics of a standard OOST like the standard 70/30 method. We can choose the starting day and ending day, while the very last week is always in the learning set.
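The interlaced split described above can be sketched in a few lines. This is a simplified illustration, assuming trades are already grouped by week; the rule that the very last week always lands in the learning set is taken from the post.

```python
# Interlaced out-of-sample split: every second week of trades goes into the
# "learning" set, the weeks in between form the out-of-sample set. Per the
# post, the very last week must always end up in the learning set, so week
# parity is counted from the end.

def interlaced_split(trades_by_week):
    """trades_by_week: list of lists, one inner list of trades per week.
    Returns (learning, out_of_sample)."""
    learning, oos = [], []
    n = len(trades_by_week)
    for i, week in enumerate(trades_by_week):
        # distance from the last week; the last week (distance 0) learns
        if (n - 1 - i) % 2 == 0:
            learning.append(week)
        else:
            oos.append(week)
    return learning, oos

weeks = [["t1"], ["t2"], ["t3"], ["t4"], ["t5"]]
learn, oos = interlaced_split(weeks)
print(learn)  # -> [['t1'], ['t3'], ['t5']]
print(oos)    # -> [['t2'], ['t4']]
```

With an odd number of weeks this gives slightly more than 50 % to learning; with an even number it is exactly 50/50.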
The question is
What do you think about this approach, does it even make sense to you? We have been playing with live futures trading with this automated script for about a year and we think there is huge potential in it. The iOOST (interlaced OOST) at 50 % (the sets can be created in any way we want, as we program it ourselves) is, for example on the CL contract, about 10 % better in profitability; the profit curve is a smoother line than that of the standard 70/30 out of sample test, with roughly 20-30 % smaller drawdowns. We can put the lines in charts and do whatever we want. We did check that we achieve the same results in reality as on market replay data (or with filters), with an average slippage of 0.75 ticks on the TF or ES contracts. That is OK, as the average trade is above 5 ticks, on some contracts even greater than 10.
Later, when we have, say, 2 years of market replay data, we would like to proceed with automatically rolling out of sample testing, so the system could, based on many indicators, set itself up and automatically pick the best settings itself. Right now we have NT7's MRD since roughly March 2011, and I think this is still too short a timeframe to start playing with that idea.
This idea of iOOST is still in testing and we do not run it live yet; we need to program a lot of tests to find out if it is programmed well. I was just curious what more experienced traders on this forum may think about this idea. Any ideas, experience, or discussion on this topic is highly appreciated.
I'm not really sure if I understand your OOST, but I think you include future, unknown price data in the "out of sample" first-week test. You have to make sure that you only apply "learning" data from prior to the "out of sample" test data. Anything else is curve fitting. A sliding 70/30 approach complies with that. Perhaps you can draw a graphical representation of your OOST model.
2 related excellent books:
Trading Systems - Chapter 6 - Periodic re-optimisation and walk forward analysis
The Evaluation and Optimization of Trading Strategies - Chapter 10/11 - Optimization/Walk-Forward Analysis
It is still in development and it does not chart the in and out of sample data in different colours. I will try to explain it better:
imagine we have 5 weeks of, say, 50 trades, no filters, in the database; then we apply filters and about 30 % of the trades are selected. The filters are applied only in compliance with the OOST settings. So in the final set of different settings, each of which could be put on the live market, each has some results and we are selecting just one, as it is "one" baseline trading set (say, on a 30-minute data timeframe).
To make an OOST from these 5 weeks, we take, say, 3 weeks of learning and two weeks of out of sample set, with filters. Then we apply this second level of filters, which only shows us the good results. It is the opposite of just running a random settings test on an SMA crossover. Example:
You have an SMA crossover system which has fixed fast and slow periods; this produces the baseline trades written to the database. Then you apply a filter that keeps only the trades that were executed when a certain-period EMA on, say, a 60-minute timeframe allowed that trade to happen. So we would have one baseline set of trades and then a lot of filtered trade sets based on different periods of that EMA. If we have EMAs of period 1, 2, 3, ... 10, then we would have 10 different sets of trades that would have happened with those SMA crossover settings on 30-minute data only if that certain-period EMA allowed them to happen.
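The "EMA allows the trade" mechanism could look something like this. The pass rule (a long trade passes if the 60-minute close at entry is above the EMA of that period) is my own assumption for illustration; the thread never states the exact condition.

```python
# Build one filtered trade set per EMA period from a single baseline set.
# ASSUMPTION: "the EMA allowed the trade" means the last 60-minute close
# before entry is above the EMA of that period (a long-side rule); the
# actual condition the posters use is not given in the thread.

def ema(values, period):
    # standard exponential moving average with smoothing 2 / (period + 1)
    k = 2.0 / (period + 1)
    e = values[0]
    for v in values[1:]:
        e = v * k + e * (1 - k)
    return e

def filtered_sets(baseline, periods):
    """baseline: list of (trade_id, closes_before_entry) tuples.
    Returns {period: [trade_id, ...]} -- one trade set per EMA period."""
    sets = {}
    for p in periods:
        sets[p] = [tid for tid, closes in baseline
                   if closes[-1] > ema(closes, p)]
    return sets

baseline = [
    ("A", [10, 11, 12, 13]),   # rising market: last close above the EMA
    ("B", [13, 12, 11, 10]),   # falling market: last close below the EMA
]
sets = filtered_sets(baseline, range(1, 11))
print(sets[2])  # -> ['A']
```

Each dictionary entry is one candidate trading system, which matches the "we generate a lot of trading systems" idea: one baseline, many filtered variants, all obtained from a single pass over the database.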
This we do not consider to be curve fitting, as we do not know the results in advance; we just make a lot of versions of one system and then have a lot of results to select from based on our criteria.
So in fact we generate a lot of trading systems and their results.
On these sets we run the OOST, standard 70/30. Some are good, some are not, as the EMA period, for example, did not help at all in some cases, while in others it helped a lot.
This interlaced OOST builds the "70" learning set from the trades that took place in every second week of that baseline. In other words, we select the best-period EMA based on every second week's trades, then we run those settings on every first week to see how it worked.
We think that for a bad filter, what I call here the "EMA period filter", we would receive bad results, like the 70 part growing while the remaining 30 is falling, so the OOST proves the system settings are wrong. But with another filter we see that it helped the curve go sideways during market plunges and grow with the market, so it makes more money than the baseline and has a lower drawdown.
We then select the best result and run it live. This reverse approach immediately gives us hundreds of results, compared to trying to optimize the variables of the SMA crossover with this 60-minute filter and then running one OOST on that single setting. We do it in large groups, thus saving a lot of time finding good settings for a system.
With iOOST we want to see the profit curve, where just every second week was in the learning set, growing and smooth; if the filter settings did not work, we should see something like teeth in that profit curve, as the OOST on the "first" weeks that were not in the learning set simply did not work.
It is not easy to explain without an image (here it would really be worth a thousand words), but it is faster to write two pages about it than to program the charting right away; that is hard work and takes some time, with expected delivery (so we know we have programmed it well and can put it on real money) around the end of next week.
...and thank you for the books, we will check them out, as there is always a lot to learn....
with this approach we are a little bit worried about what we call "second level" overfitting, as we are doing a lot of optimizations on the learning data, so we then have a lot of out of sample tests and we select the best one.
This second level is something most traders never get to experience, as they optimize a script, then run ONE out of sample test, and if it works, they put it live. With our analysis software we are able to run many OOSTs quickly, so we then select the settings that give the best result not only on the learning part but also on the OOST part, either the 30 % (or whatever percentage we set) or the interlaced week-on-week test.
I hope it is now even clearer what we are doing here. Any opinions?
I'm not sure. You wrote: "In other words we select the best period EMA based on every second week's trades, then we run this settings on every first week to see, how it worked". In live trading you don't have the data from next week. Keep in mind that you are not searching for ONE best setting for the WHOLE backtest interval. You have to search for a setting which, applied to single intervals, gives the highest rewards with minimum deviation. It's all about probabilities.
Results in Rs per (for instance) month:
Bad : 2, -5, -5, -20, 50, -10, -10, 5 Overall: 7
Good: -1, 2, 2, 1, -2, 2, 2, 1 Overall: 7
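The point of the two series above is that identical totals can hide very different risk. A quick stdlib check makes the deviation difference explicit:

```python
# Both result series total 7 R, but their deviations differ enormously;
# the "Good" series wins on consistency. Numbers are copied from the post.

from statistics import pstdev

bad  = [2, -5, -5, -20, 50, -10, -10, 5]
good = [-1, 2, 2, 1, -2, 2, 2, 1]

print(sum(bad), sum(good))     # -> 7 7
print(round(pstdev(bad), 1))   # population std dev of the bad series
print(round(pstdev(good), 1))  # an order of magnitude smaller
```

The "Bad" series owes its entire profit to one +50 outlier; by a deviation criterion it would be rejected even though the totals match.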
we had the trades for the next week, but those trades were not contained in the learning set. We have them for all weeks for the baseline, like the SMA crossover with both periods fixed; this provides one result. But then we know with which EMA period each trade would have happened, and we select the settings that provide a good 70/30 OOST (or the week-on/week-off interlaced OOST). Since we have the full trading set for each EMA period, we have a lot of out of sample tests and we just select the best one.
I do not understand this. If you mean that we take the whole interval of trades and optimize on it: NO, we do not do that. We take only 70 % of the trades for learning, then we do the OOST on the rest.
It is like doing the learning on trades between January and June and then running the OOST on trades that happened in July and August.
we never use future information for the 70/30 OOST, but we do use every second future week and test the weeks in between, so we have OOST coverage even in the early stages of the trade dataset and also have the very last week in the learning set. These settings would be run live; then after a week we would do another optimization, again with the last week in learning, but with every second week of the set being an OOST week, not a learning week.
I know it is a strange approach, but why should it not make sense? If the system should not work in the OOST period, then those weeks that were not contained in the learning set of weeks should generate a loss, right?
I meant that you shouldn't use OOST to GET your settings; you use OOST to PROVE your settings. The results are the strategy result metrics for the OOS periods. They should be positive and occur with small deviation. I didn't have any success optimizing my settings on a week-by-week (or month-by-month) approach.
yes, we are proving settings with OOST; the thing is just that we do many OOSTs in a short time and then select the system with the best settings, best learning + best OOST
it is interlaced, not a rolling optimization where the system settings would change from month to month; here we insert the 30 or 50 % in between the learning set, like at the beginning, middle and end, to see if it worked in all time periods...