My thought is essentially that we're dealing with algorithms. Ratios and analysis that usually work for equities might not be relevant here. Say you day trade futures, set your daily target, and turn off your platform after 3 p.m. Would you calculate a standard deviation for that? Diversification also doesn't work very well with these systems, as your results for August show. The few things we know for sure are: 1) this service has been in business for many years and has real customers, so it is possible it adds some value to trading; 2) some systems do work well, as you showed us. The rest is at our own risk, as they say. I'm not very technical and may be too theoretical in this post; if so, I apologize. Please keep posting your observations. The topic seems very popular.

The following user says Thank You to tradeday for this post:

I'm curious to learn more about portfolio analysis. I haven't used it to any great extent, but here are a few questions/suggestions:

1. It looks like the portfolio analysis is not working on these systems. Do you agree?

2. You might try capturing the entries/exits of your systems and increasing the holding time by adding time to the exit. This will force the systems to be in the market more. You could then run your correlation analysis on these modified systems. I think there might be more chance of overlap, and thus higher correlation levels. Try doubling the holding time.

3. You might force a higher level of anti-correlation by picking systems that trade only uncorrelated markets.

4. The problem with max DD is that it almost always increases with a longer track record. The solution is to decrease leverage and accept the potential for lower returns.

In hindsight, it looks like it might have been a better idea either to run the entire portfolio through an incubation period before going live or, as an alternative, to run each system through its own incubation period before taking it live. Regardless of how good the historical analysis results look, Murphy's law always seems to rear its ugly head when real money is on the line.

I have a big update coming with full September PnL and some (potentially alarming) results from some additional analysis I have done now that I can scrape the isystems website. Just need to get the analysis finished in a presentable format.

The mid-month October update is that I'm up a couple of hundred dollars, and that is net of over $1000 in license fees and commissions! I mention this because, as of Friday's close, fees represent exactly half of my approximately $6.5k in total losses!

The following 5 users say Thank You to SMCJB for this post:

As I have already mentioned, I have created a function in R that, given a system number, scrapes the isystems website and returns the monthly PnL table. I then created a second function (which called the first) that, given a system number, a start date and an end date, returned the average monthly PnL of that system between those dates. This was then replaced by a third function that, when passed the PnL data table from the first function and the two dates, returned an array of all the monthly PnLs between those dates instead of performing the average calculation.

There were two main advantages to this. First, to perform multiple analyses of the same system I only scrape the website once. Second, I can now easily calculate more statistics than the mean. For example, I can calculate the Sharpe ratio of any date range by just dividing the mean of the array by its standard deviation (a monthly Sharpe ratio, unannualized and ignoring the risk-free rate). It was then relatively simple to calculate the Tracked/Backtest Ratio for any system given just its system number, start date and since-tracked date.
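The author's functions were written in R and are not shown in the thread. The following is only a Python sketch of the same idea, under two assumptions: the scraped table is represented as a hypothetical dict mapping `(year, month)` to PnL, and the Tracked/Backtest ratio is taken to mean average tracked monthly PnL divided by average backtest monthly PnL.

```python
from datetime import date
from statistics import mean, stdev


def pnl_between(table, start, end):
    """Monthly PnLs for months in [start, end), oldest first.

    `table` maps (year, month) -> PnL, a hypothetical stand-in for the
    scraped monthly PnL table (scraping itself is not shown here).
    """
    lo, hi = (start.year, start.month), (end.year, end.month)
    return [table[ym] for ym in sorted(table) if lo <= ym < hi]


def monthly_sharpe(pnls):
    """Mean / standard deviation of monthly PnL (no risk-free rate, not annualized)."""
    return mean(pnls) / stdev(pnls)


def tracked_backtest_ratio(table, start, since_tracked, end):
    """Average tracked monthly PnL divided by average backtest monthly PnL.

    Assumption: the backtest runs from `start` up to `since_tracked`,
    and the tracked period from `since_tracked` up to `end`.
    """
    backtest = mean(pnl_between(table, start, since_tracked))
    if backtest <= 0:
        raise ValueError("ratio is not meaningful for a non-positive backtest mean")
    return mean(pnl_between(table, since_tracked, end)) / backtest


table = {(2016, 1): 100, (2016, 2): 300, (2016, 3): 100, (2016, 4): 100}
print(pnl_between(table, date(2016, 1, 1), date(2016, 3, 1)))  # [100, 300]
print(tracked_backtest_ratio(table, date(2016, 1, 1), date(2016, 3, 1), date(2016, 5, 1)))  # 0.5
```

Passing the table in, rather than re-fetching it inside each function, mirrors the "scrape once, analyse many times" advantage described above.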

All I needed then was a complete list of all the system numbers. To get this, I went to the isystems website, pulled up a table with every system in it, and inspected the table element to bring up the underlying code. I copied that code into Word, used search & replace to find the system name and number HTML tags and insert line breaks, then copied the result into Excel, where with very little manipulation I created a lookup table of name and number. The last step was to create a CSV file with system name, system number, contract, type and some dates, which was saved and then loaded into R.
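The Word/Excel step could also be scripted. This is a sketch only: the real isystems markup is not shown in the thread, so the link format `<a href="...?systemId=NNN">Name</a>` below is a hypothetical assumption that would need adjusting to whatever the page actually contains.

```python
import csv
import io
import re
from html.parser import HTMLParser


class SystemLinkParser(HTMLParser):
    """Collect (name, number) pairs from links like <a href="...?systemId=NNN">Name</a>.

    The systemId query parameter is a hypothetical stand-in for the real markup.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._number = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            match = re.search(r"systemId=(\d+)", href)
            if match:
                self._number = int(match.group(1))
                self._text = []

    def handle_data(self, data):
        if self._number is not None:  # only keep text inside a system link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._number is not None:
            self.rows.append(("".join(self._text).strip(), self._number))
            self._number = None


def lookup_csv(html):
    """Return the (name, number) rows and the same lookup table as CSV text."""
    parser = SystemLinkParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["name", "number"])
    writer.writerows(parser.rows)
    return parser.rows, out.getvalue()


html = ('<table><tr><td><a href="/detail?systemId=101">Alpha ES</a></td></tr>'
        '<tr><td><a href="/detail?systemId=202">Beta CL</a></td></tr></table>')
rows, csv_text = lookup_csv(html)
print(rows)  # [('Alpha ES', 101), ('Beta CL', 202)]
```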

... and a Potentially Flawed Hypothesis

The underlying theory of my analysis is that by using the since-tracked data we can perform an out-of-sample incubation. Then, by selecting the best systems among those that performed similarly in incubation to their backtest, we should be picking robust systems rather than systems that only look good in a backtest. We can then trade these systems with an expectation that they are not overfitted.

An obvious implication of this is that we are assuming that, if systems perform similarly in incubation to the backtest, they will continue to perform similarly. After my terrible second month I decided to do more testing; specifically, I wanted to test the 'obvious implication' above. I did this by filtering the systems to include only those with at least 2 years of since-tracked data. I then calculated how those systems did in their first year of since-tracked data and then in their second year. If my theory was correct, I would expect the data to look something like a barbell tilted at a 45 degree angle, with the center of the barbell just below a ratio of 1.
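The year-1 versus year-2 test can be sketched as below, assuming each system's data is a hypothetical pair of monthly PnL lists (backtest, since-tracked), oldest first, and that each year's T/B ratio is that year's average tracked monthly PnL divided by the average backtest monthly PnL.

```python
from statistics import mean


def year_ratios(backtest, tracked):
    """(year-1, year-2) Tracked/Backtest ratios, or None if < 24 tracked months.

    Assumes a positive backtest mean; otherwise the ratio is not meaningful.
    """
    if len(tracked) < 24:
        return None
    bt = mean(backtest)
    return mean(tracked[:12]) / bt, mean(tracked[12:24]) / bt


def year_ratio_pairs(systems):
    """Keep only systems with at least 2 years of tracked data."""
    pairs = {}
    for name, (backtest, tracked) in systems.items():
        ratios = year_ratios(backtest, tracked)
        if ratios is not None:
            pairs[name] = ratios
    return pairs


# Hypothetical systems: name -> (backtest monthly PnL, tracked monthly PnL)
systems = {
    "A": ([100] * 12, [50] * 12 + [150] * 12),
    "B": ([100] * 12, [80] * 10),  # too short, filtered out
}
print(year_ratio_pairs(systems))  # {'A': (0.5, 1.5)}
```

Plotting the first value of each pair against the second gives the scatter the hypothesis predicts should cluster along the 45 degree line.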

In reality, the data looks like this (at varying levels of magnification/subsetting):

All 344 Systems with more than 2 years of tracked data

Eliminate outliers - 320 Systems

The expected sweet spot - 155 Systems

While the slope of the best-fit line does get slightly steeper as we focus on the expected sweet spot, the R² is so low as to imply that any relationship is meaningless. Unfortunately, I think this implies that there is no obvious relationship between how well a system performs in its 2nd year and how it performed in its 1st year, and as such my idea of using since-tracked data to incubate and pick robust systems seems to be fatally flawed.

Why would this be the case? The framework I used to develop the few successful systems I have is one taught by @kevinkdog. It's a framework that assumes you develop systems in the same way, using walk-forward optimization, with the incubation as the last reality check. Following this methodology, the incubation is normally a reasonable indicator of results. When systems fail incubation, there's often a reason why: you performed one too many optimizations, with just a little bit too much knowledge of the data, tainting or biasing your results. In this case we have systems developed by about 100 developers. We have no knowledge of how they developed the systems, or whether they are overfitted. Hence it could be completely random that they did well (or badly) in their first 12 months, and not an indication of robustness in any way. If we generated 350 random systems, would the results look any different?

Next Steps ... are there any?

At this point I believe my initial hypothesis is flawed and that my analysis does not help identify robust systems. As such, it seems silly to leave money invested in systems whose results appear to be random, and I plan to deactivate them in the next few days unless I miraculously find something that does give me confidence. (As an aside, I have already plotted the monthly Sharpe ratio of the first 12 months of since-tracked data versus the second 12 months. The results are as disappointing as the T/B ratio charts above.)

I do still believe that this data set could be very valuable. It contains almost 1,000 systems that in some cases have several years of real, non-backtested performance history. That is quite unique and has to be valuable; all I need to do is find out how! I have an idea for how I could use this data to create an even larger dataset, which, combined with some calculated metrics, could be a potential machine learning project. I'm just not sure I have the time to follow that path right now. Maybe I need to find a grad student with nothing to do!

The following 10 users say Thank You to SMCJB for this post:

I was going to post another message replying to some of the comments and questions posted here, but I'm not sure it's really worth my time, or theirs in reading my responses. Instead, I would like to thank @Mabi @mattz @Sazon @suit @tpredictor @tradeday and anybody else I have missed for your comments, ideas, suggestions and support. A couple of you in particular definitely gave me some things to think about.

In the coming days I hope to post some more results and thoughts (especially about costs), but after that I suspect any further posts will be a lot more sporadic in nature. Hopefully this thread isn't dead yet though. :-)

SMCJB

The following 6 users say Thank You to SMCJB for this post:

1. Have you looked for correlation in the "sign" of the returns rather than their magnitude? The regression might be too steep a hurdle.
2. Have you tried to break the systems down into factor components, or based on proxy systems? If you could view the systems in terms of factor components, this might explain the performance better. I'd suggest the factor components of mean reversion, trend and volatility; perhaps more exist. A PCA might enable this. A simple method is to check for correlations against your proxy systems and then group the systems based on those correlations. You should then be able to view year-over-year performance in relation to the factors.
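Suggestion 1 can be sketched as a simple sign-agreement check, assuming hypothetical lists of per-system year-1 and year-2 PnL totals. It compares how often a system's year-2 sign matches its year-1 sign against the rate expected if the two years were independent, given the overall share of winners in each year.

```python
def sign_agreement(year1, year2):
    """Observed vs independence-expected rate of matching return signs.

    A value of exactly 0 is counted as non-positive.
    """
    pairs = list(zip(year1, year2))
    observed = sum((a > 0) == (b > 0) for a, b in pairs) / len(pairs)
    p1 = sum(a > 0 for a in year1) / len(year1)  # share of winners in year 1
    p2 = sum(b > 0 for b in year2) / len(year2)  # share of winners in year 2
    expected = p1 * p2 + (1 - p1) * (1 - p2)
    return observed, expected


print(sign_agreement([1, -1, 2, -2], [3, -1, -1, 2]))  # (0.5, 0.5)
```

An observed rate clearly above the expected one would suggest that winners tend to stay winners even if the magnitudes do not line up.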

I was hoping to see better correlation between these groups (it would have been nice to see that the ones that did well in the first 12 months also tended to do well in the second 12 months).

Based on your work, I agree that the process used to develop the strategies might be the key. With a consistent development process, it might be easier to find some predictive "markers" for future performance (as you have seen with your work outside of this study).

I'm going to keep thinking about this, because you've done a great job collecting and analyzing this data, and maybe there needs to be another way to think about how to frame this problem.

The following 3 users say Thank You to kevinkdog for this post:

I suspect the changes you are seeing might be due to changing factors in the market. You need to group the strategies by the factors that are predictive of their performance and then see if those factors have changed. If the factors haven't changed significantly but strategy performance has, then it indicates the strategy is failing to adapt. If the factors have changed, then it suggests the strategy could come back to life when/if the factors return. As a first step, create proxy strategies/factors for testing this, such as intraday range, 1-day range, 5-day trend, 5-day reversion, etc.

A day-trading strategy's profits will probably correlate well with the magnitude of the 1-day range. Once you see the performance of the strategies in relation to their factors, you should understand the cause of the underperformance. If you build this into a classifier, then you can turn the strategy on or off based on the factors.
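A very simple version of the "classifier" idea is a single-factor on/off rule (a decision stump). The sketch below is hypothetical: it picks the cutoff on one factor, for example the prior day's range, that maximises in-sample PnL, then switches the strategy on only when the factor is at or above that cutoff. The factor value must be known before each trade to avoid look-ahead, and because the cutoff is fitted in-sample it is itself an optimization that needs its own out-of-sample check.

```python
def fit_threshold(factor, pnl):
    """Return (cutoff, total) maximising in-sample PnL when trading only if factor >= cutoff."""
    best_cut, best_total = None, float("-inf")
    for cut in sorted(set(factor)):
        total = sum(p for f, p in zip(factor, pnl) if f >= cut)
        if total > best_total:  # ties keep the lower cutoff
            best_cut, best_total = cut, total
    return best_cut, best_total


def is_on(factor_value, cut):
    """Strategy is enabled when the (lagged) factor is at or above the cutoff."""
    return factor_value >= cut


# Hypothetical daily factor values (e.g. prior-day range) and strategy PnL
factor = [1, 5, 2, 6, 3, 7]
pnl = [-10, 20, -5, 15, -8, 30]
cut, total = fit_threshold(factor, pnl)
print(cut, total)  # 5 65
```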

I would add that once you can identify the underlying factors, you can start to turn the strategies on and off with more intelligence. For example, mean reversion is going to be more prominent during uncertain times. Momentum works best when there is a positive catalyst. Markets tend to trend when no one is watching. You can also use the factor analysis for your diversification. You can force the strategies to always be "in market" to generate your correlations.