Back to the OP's question, one thing to think about is how to define if a strategy is working or not. It needs to be a binary answer (yes or no) that has a bit more behind it than whether you are making money or losing money.
If you have fully back- and forward-tested an idea, you should have a few datapoints that you can use to evaluate the strategy when it is live. This is an area I'm still working on, but most of the criteria I'm using are based on deviations in P&L from model vs. actual, and deviations in the volatility of returns.
Those deviations need to be monitored both on the positive side and the negative side. If the system is doing significantly better (or worse) than the model, it would indicate I've missed something and need to be very careful. While it is a lot easier to pull the plug on a system that is losing money than one that is making money, if you are deviating more than n%/period from your model, your model is wrong and you need to find out why.
The challenge (for me at least) is to work out parameters that indicate All Clear, Caution, and Stop and apply them objectively. Personally I've found it helpful to accept that all systems will fail at some point, and part of the design process therefore needs to include ways to recognize and mitigate that failure.
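To make the idea concrete, here is a minimal sketch of that traffic-light check in Python. The function name, the percentage thresholds, and the figures in the examples are all invented placeholders, not recommendations; the one thing carried over from the post is that the deviation is measured symmetrically, so outperforming the model is flagged just like underperforming it.

```python
# Hypothetical sketch: classify a live strategy as "all clear",
# "caution", or "stop" based on how far actual P&L deviates from
# the modelled (backtested) P&L over the same period.
# Threshold values are placeholders; tune them to your own system.

def evaluate_strategy(model_pnl: float, actual_pnl: float,
                      caution_pct: float = 10.0,
                      stop_pct: float = 25.0) -> str:
    """Return 'all clear', 'caution', or 'stop'.

    Deviation is measured symmetrically: doing much *better* than
    the model is treated as suspicious too, because it still means
    the model is wrong and you need to find out why.
    """
    if model_pnl == 0:
        raise ValueError("model P&L must be non-zero to compute % deviation")
    deviation_pct = abs(actual_pnl - model_pnl) / abs(model_pnl) * 100.0
    if deviation_pct >= stop_pct:
        return "stop"
    if deviation_pct >= caution_pct:
        return "caution"
    return "all clear"

print(evaluate_strategy(model_pnl=1000.0, actual_pnl=1080.0))  # within 10% -> all clear
print(evaluate_strategy(model_pnl=1000.0, actual_pnl=1300.0))  # +30% -> stop, even though it made money
```

The second example is the point of the post: a system 30% ahead of its model gets stopped and investigated just as one 30% behind would.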
As I said earlier, a backtest can provide useful information if paired with forward/sim test data.
Suppose you backtest and get a promising result with respect to some parameter that interests you, say a win rate of 67% over an 18-month backtest period.
If you then forward test/sim for 3 months and observe a simulated win rate of 63%, and you also run a backtest over the same period you just sim'ed and get a win rate of 65%, you can start to draw some linear interpretations of the relationship between your backtest and forward test.
It indicates that your forward test has a variance of about -4% with respect to the long-term backtest and -2% with respect to the in-kind back/forward tests.
You can do the same thing for drawdown, profit ratio, etc, etc.
However, as I stated earlier, there's always a "fudge" factor that must be employed due to the inherent limitations of simulation and backtesting.
This "fudge" or safety factor will depend on your own risk tolerance and your observed data from live trading.
If, for instance, you were really thorough (like some people) and paid for 2 different data streams and 2 different accounts so you could live trade and sim trade at the same time, you could gather all three results: back, forward, and live. Comparing all of them lets you develop your own factor to normalize the results of the various methods against each other.
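One way that normalization could look, as a sketch: derive simple ratio factors between backtest, sim, and live measurements of the same metric, then use the live/backtest ratio to discount future backtest results into a realistic live expectation. The function name and every figure here are invented for illustration; the post doesn't prescribe a ratio method.

```python
# Hypothetical sketch: given the same metric measured three ways
# (backtest, sim/forward, live), derive ratio "fudge factors" that
# translate a backtest number into a realistic live expectation.
# All figures below are invented examples, not real results.

def fudge_factors(backtest: float, sim: float, live: float) -> dict:
    """Return pairwise ratios between the three measurement modes."""
    return {
        "sim_per_backtest": sim / backtest,
        "live_per_sim": live / sim,
        "live_per_backtest": live / backtest,
    }

f = fudge_factors(backtest=67.0, sim=63.0, live=60.0)

# Discount a new strategy's 70% backtested win rate into a
# live expectation using the observed live/backtest ratio:
expected_live = 70.0 * f["live_per_backtest"]
print(round(expected_live, 1))  # about 62.7
```

A ratio is only one choice; the -4%/-2% arithmetic earlier in the thread implies an additive offset would work just as well. Which is more stable is something only your own live data can tell you.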
I think I am missing the terminology here, or perhaps you're just not using well-defined terminology, because I think there are 2 separate issues here:
1) the performance of your system in live or sim-account real-time vs the performance in backtest against data collected over that same period
2) the performance of your back test over one period against which you build and possibly optimise your system vs the performance forward tested over another period vs the performance traded real-time over yet another period.
Number (1) is something you can control by throwing resources at it to get the best data, i.e. data that is closest to the real-time data that streamed to the system during live trading.
Number (2) is something that is in the lap of the gods, and the only control you have is to make sure you build a robust system that is not curve fit, e.g. a long term trend following system such as the Turtles used.
What terms would you use to refer to (1) and (2) ?
You can discover what your enemy fears most by observing the means he uses to frighten you.