Let's say I have software that generates random rules for trading stocks. Together the rules make a strategy, and I want to avoid curve-fitting the strategy to the data. I have 10 years of stock data.
To try to avoid curve fitting, I divide the data into an in-sample set and an out-of-sample set. The software generates strategies and tests them on the in-sample data. If a strategy achieves at least a 60% win rate in-sample, it is then tested on the out-of-sample data.
If the out-of-sample test also shows a 60% win rate or more, the strategy is saved for further robustness testing.
Alternatively, the software can generate strategies on the whole 10 years of data and simply reject any that don't achieve a 60% win rate over the entire period.
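To make the comparison concrete, here is a minimal sketch of both filters. Everything here is a toy stand-in for the real software: the "data" is simulated daily returns, the "rule generator" just picks random trade days, and the 60% threshold and 70/30 split are assumptions for illustration.

```python
import random

random.seed(0)

N = 2520  # roughly 10 years of daily bars
data = [random.gauss(0.0005, 0.01) for _ in range(N)]  # toy daily returns

SPLIT = int(N * 0.7)   # first 70% in-sample, remaining 30% out-of-sample
THRESHOLD = 0.60       # minimum acceptable win rate

def random_strategy():
    """Toy stand-in for the rule generator: trade on 200 random days."""
    return set(random.sample(range(N), 200))

def win_rate(trade_days, lo, hi):
    """Win rate of the strategy's trades that fall inside data[lo:hi]."""
    trades = [data[i] for i in trade_days if lo <= i < hi]
    return sum(r > 0 for r in trades) / len(trades) if trades else 0.0

def approach_a(s):
    """Filter on in-sample first, then confirm out-of-sample."""
    return (win_rate(s, 0, SPLIT) >= THRESHOLD
            and win_rate(s, SPLIT, N) >= THRESHOLD)

def approach_b(s):
    """Single filter over the whole 10 years."""
    return win_rate(s, 0, N) >= THRESHOLD

candidates = [random_strategy() for _ in range(1000)]
kept_a = sum(approach_a(s) for s in candidates)
kept_b = sum(approach_b(s) for s in candidates)
print(f"Of 1000 random strategies, approach A kept {kept_a}, approach B kept {kept_b}")
```

Note that in approach A the out-of-sample data is still being used as a filter: every strategy that survives has, by construction, performed well on both slices, which is part of what the question is asking about.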
The question is: do both of the above approaches carry the same risk of ending up with a curve-fitted strategy?