I've created an extremely large-scale testing environment, split into equal-sized data sets representing 'in sample' and 'out of sample' data. I've loaded millions of rows of data from hundreds of thousands of completely random backtests, using a collection of 23,000 random indicators/signals/variables, broken down into instrument/hour-of-day combinations. Each trade enters the market at a random point within a specific hour of day and stays in the market for a random period of time, ranging between approximately 30 minutes and 3 hours.
The goal of this is to test, on a very 'macro' level, just how 'telling' any statistical output from a backtest is, such as profit factor, Sharpe ratio, win %, ratio W/L, etc., and more importantly, to test various combinations of these, such as "Profit Factor * Sharpe Ratio", to determine the predictivity of a 'formula'. In other words, the database should 'ask' and 'answer' the question:
"When an optimization produces a (statistical variable here) that is higher than the average for this instrument/hourofday combination, how much higher or lower than average is my out-of-sample profit/loss likely to be?"
So, for example:
"When a backtest produces a Sharpe Ratio that is X amount higher than the average Sharpe Ratio for this specific instrument/hour combination, how much higher or lower than average is my out-of-sample profit/loss likely to be?"
Once again, the entry/exit logic is generated *completely* at random, so this is solely a test of just how big a boogeyman curve-fitting is at the broadest level of backtesting and optimization, specifically for intraday trades ranging from 30 minutes to 3 hours.
As many of you experienced traders might expect, the predictive value of any one of these variables in a standalone sense is nearly zero. Since these are completely random sets of entry/exit conditions, every great backtest result in the in-sample is likely to fall to pieces in the out-of-sample. The goal here is more to prove their value relative to one another, and more importantly, to determine their value when used in concert/combination with one another, and still more importantly, when used in concert/combination for specific instruments over specific periods of the day.
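To make that "higher than the group average in-sample, what happens out-of-sample?" question concrete, here's a minimal sketch of how I'd query it with pandas. The column names (is_sharpe, oos_profit, etc.) and the random filler data are just placeholders for illustration, not the actual schema:

```python
import numpy as np
import pandas as pd

# Hypothetical schema: one row per random backtest, carrying an
# in-sample (IS) statistic and the matching out-of-sample (OOS) profit.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "instrument": rng.choice(["ES", "NQ", "CL"], n),
    "hour": rng.integers(0, 24, n),
    "is_sharpe": rng.normal(0, 1, n),      # in-sample Sharpe Ratio
    "oos_profit": rng.normal(0, 100, n),   # out-of-sample net profit
})

# For each instrument/hour-of-day group: when the in-sample Sharpe is
# above the group average, how far does OOS profit sit from the group
# average?
g = df.groupby(["instrument", "hour"])
above = df["is_sharpe"] > g["is_sharpe"].transform("mean")
oos_excess = df["oos_profit"] - g["oos_profit"].transform("mean")
print("avg OOS excess when IS Sharpe is above its group mean:",
      round(oos_excess[above].mean(), 2))
```

With purely random data this number hovers near zero, which is exactly the baseline the real test is measured against.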
On that note, if any of you are interested, feel free to post little calculations you'd like to test, and very soon I'll post a variety of results: not only each standalone variable for each instrument globally, but also your own calculations and statistical blends. I may even post results broken down by each instrument/hour-of-day combination as well, if there is enough interest.
Here are the statistical variables we have to work with:
Net Profit
Gross Profit
Gross Loss
Profit Factor
Max Drawdown (as percent)
Sharpe Ratio
Number Of Trades
Time In Market
% Profitable
Ratio Avg Win/Avg Loss
Avg MAE
Avg MFE
Avg ETD
We also have the following values normalized, relative to the instrument/hour-of-day combination:
ProfitFactor
SharpeRatio
% Profitable
Ratio Avg Win/Avg Loss
If you'd like to use these in your calculation, refer to them as 'NormalizedPF', 'NormalizedSharpeRatio', etc.
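For clarity, here's one way such per-combination normalization could work: a z-score of each statistic relative to its own instrument/hour-of-day group. This is a sketch of the idea, not necessarily the exact transform used in the database:

```python
import pandas as pd

# Hypothetical normalization: z-score of a statistic within its
# instrument/hour-of-day combination.
def normalize(df, col):
    g = df.groupby(["instrument", "hour"])[col]
    return (df[col] - g.transform("mean")) / g.transform("std")

df = pd.DataFrame({
    "instrument": ["ES", "ES", "ES", "NQ", "NQ", "NQ"],
    "hour": [9, 9, 9, 9, 9, 9],
    "profit_factor": [1.0, 1.5, 2.0, 0.8, 1.0, 1.2],
})
df["NormalizedPF"] = normalize(df, "profit_factor")
print(df["NormalizedPF"].round(3).tolist())
# → [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

Note how a profit factor of 1.2 in a weak group can normalize to the same score as 2.0 in a strong group, which is the whole point of scoring each backtest relative to its own instrument/hour context.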
Some example calculations might be (keep in mind I use NinjaTrader's 'labels' when I refer to these statistical data points):
((Profit Factor - 1) * Sharpe Ratio) * Number Of Trades
or
(NormalizedProfitFactor * SharpeRatio) * NumberOfTrades
or
((% Profitable * Ratio W/L) / MaxDrawdown) * Number Of Trades
or
NetProfit * (MFE / MAE)
etc. Anything you can dream up using those variables.
The significance of this testing lies in the fact that *massive* data pools are being used, so the resulting 'predictivity scores' are likely to have some real validity, especially those that score well over multiple instruments/markets, in both directions, …
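One way a 'predictivity score' for any submitted blend could be computed is the rank correlation between the in-sample formula value and the out-of-sample profit across all backtests. This is only an assumed scoring method for illustration; the column names and filler data are hypothetical:

```python
import numpy as np
import pandas as pd

# One candidate 'predictivity score': Spearman rank correlation between
# an in-sample formula value and out-of-sample net profit.
def predictivity(df, formula):
    score = formula(df)
    # Spearman correlation = Pearson correlation of the ranks
    return score.rank().corr(df["oos_profit"].rank())

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "profit_factor": rng.lognormal(0, 0.3, n),
    "sharpe": rng.normal(0, 1, n),
    "num_trades": rng.integers(10, 200, n),
    "oos_profit": rng.normal(0, 100, n),
})

# Example blend from above: (Profit Factor - 1) * Sharpe * Number Of Trades
blend = lambda d: (d["profit_factor"] - 1) * d["sharpe"] * d["num_trades"]
print("predictivity:", round(predictivity(df, blend), 4))
```

A rank correlation is less sensitive to outliers than raw Pearson correlation, which matters when a few lucky random backtests post extreme profits; on fully random data the score should sit near zero.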