NexusFi: Find Your Edge


Why using the term "curve-fitting" is wrong


Discussion in Traders Hideout


  #1 (permalink)
 Outlier 
Germany
 
Experience: Advanced
Platform: TradeStation
Trading: Futures
Posts: 88 since May 2012
Thanks Given: 53
Thanks Received: 93

After seeing the term misused and abused hundreds of times, and having been guilty of it myself until just a few years ago, I had to write this.

Curve-fitting is a term from regression analysis (non-linear regression in particular) and means constructing a curve, such as a high-order polynomial, that best fits a series of data points. It is commonly used as an aid for visualization. Think of a curve that cuts right through your data. Curve-fitting by itself is neither good nor bad, because it makes no claim at all about extrapolation or generalization performance.
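To make that concrete, here is a minimal sketch of one form of curve-fitting: exact polynomial interpolation (Lagrange form), where the curve passes through every data point. The data points and the function name lagrange_fit are made up for illustration. The fit is perfect at the data, yet that says nothing about what the curve does outside it.

```python
def lagrange_fit(xs, ys):
    """Return the unique polynomial of degree len(xs)-1 through all points."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly

# Made-up, roughly flat data (an assumption for illustration)
xs = [0, 1, 2, 3, 4]
ys = [1.0, 1.2, 0.9, 1.1, 1.0]

curve = lagrange_fit(xs, ys)

# The curve reproduces every data point
in_sample_errors = [abs(curve(x) - y) for x, y in zip(xs, ys)]

# Outside the data the same curve heads far away from the 0.9 to 1.2 range
extrapolated = curve(6)
print(max(in_sample_errors), extrapolated)
```

Here the value at x = 6 comes out at about -12.3, even though every data point lies between 0.9 and 1.2. The fit itself is fine as a description of the data; the trouble only starts when you treat it as a forecast.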

Over-fitting, the correct term from statistics and machine learning, means that a model generalizes poorly. On the training set (in-sample) the model performs well, but on the validation and test sets (out-of-sample) it performs badly. Over-fitting, also called high variance, occurs when a model has too many degrees of freedom (too much capacity), so that training fits it to random noise (sampling error) rather than to the underlying structure.

The opposite is under-fitting, also called high bias. An under-fitted model performs about the same on the training and validation sets, but poorly on both, because its capacity is too small to capture enough of the underlying structure.

What you want is a good fit: a model that balances the bias-variance tradeoff.
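A small sketch of the three cases, using k-nearest-neighbour regression as a stand-in for a trading model (my choice for illustration, not something from the sources above). The data generator, the noise level and the values of k are all assumptions. Small k means high capacity, large k means low capacity.

```python
import random

def knn_predict(train, x, k):
    """Predict y at x as the average y of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of the k-NN model (fitted on train) over data."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def make_data(rng, n):
    """Hypothetical data: structure y = 2x plus Gaussian noise."""
    points = []
    for _ in range(n):
        x = rng.random()
        points.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))
    return points

rng = random.Random(0)
train = make_data(rng, 60)   # in-sample
valid = make_data(rng, 60)   # out-of-sample, same distribution

# k=1: maximum capacity, k=60: minimum capacity (predicts the overall average)
errors = {k: (mse(train, train, k), mse(train, valid, k)) for k in (1, 8, 60)}
for k, (train_mse, valid_mse) in errors.items():
    print(f"k={k:2d}  train MSE={train_mse:.3f}  validation MSE={valid_mse:.3f}")
```

With k=1 the training error is exactly zero, because every training point is its own nearest neighbour, while the validation error is not: that is over-fitting. With k=60 the model predicts the same average everywhere and ignores the structure completely: that is under-fitting. A value in between will typically give the lowest validation error.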

Over-fitting is countered by the following:
- increasing the number of trades, by raising their frequency and/or using more data, which naturally reduces the sampling error (best approach)
- reducing the model capacity
- advanced techniques like regularization, early stopping, and pruning
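Why more trades help can be shown with the standard error of the average trade. This is a rough sketch that assumes the trades are independent and come from the same distribution, which real trades only approximate; the function name and the numbers are made up for illustration.

```python
import math

def standard_error(per_trade_std, n_trades):
    """Standard error of the mean trade result, assuming independent trades."""
    return per_trade_std / math.sqrt(n_trades)

# Same per-trade standard deviation (1.0), increasing sample size
se_by_n = {n: standard_error(1.0, n) for n in (25, 100, 400)}
for n, se in se_by_n.items():
    print(n, se)
```

Under that assumption the sampling error shrinks with the square root of the number of trades: four times as many trades halve it.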

Under-fitting is countered by increasing the model capacity.

More at https://en.wikipedia.org/wiki/Curve_fitting
https://en.wikipedia.org/wiki/Overfitting

Successful systematic traders know about this.

"I can talk a little more about over-fitting, if not my personal proprietary techniques. First of all I like the [term] over-fitting rather than curve-fitting because curve-fitting is a term from non-linear regression analysis. It is where you have a lot of data and you are fitting the data points to some curve. Well, you are not doing that with futures. Technically there is no curve-fitting here; the term does not apply. But what you can do is you can over-fit. The reason I like the term over-fit rather than curve-fit is that over-fit shows that you also can under-fit. The people who do not optimize are under-fitting." -- William Eckhardt

William Eckhardt: The man who launched 1,000 systems

Related to these issues is data-mining bias. When a large number of systems is evaluated during training or even validation, the best systems may meet your criteria just by chance. The more systems you test, the better the best of them will look through random variation alone.
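A quick simulation shows the effect. Every "system" below has zero edge by construction: its trades are pure Gaussian noise. The number of trades, the number of systems and the seed are arbitrary assumptions.

```python
import random
import statistics

def random_system_mean(rng, n_trades=100):
    """Average trade of a zero-edge system: every trade is pure noise."""
    return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_trades))

rng = random.Random(42)
results = [random_system_mean(rng) for _ in range(1000)]

# Best average trade among the first 10, 100 and 1000 systems tested
best_of = {n: max(results[:n]) for n in (10, 100, 1000)}
for n, best in best_of.items():
    print(f"best of {n:4d} systems: {best:.3f}")
```

None of these systems has any edge, yet the best one found can only improve as more systems are tested. That improvement is exactly the bias you need to remove before trusting a result.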

It can be countered by:
- evaluating a selected group of systems on a second validation set, taking into account the whole distribution of performance
- evaluating the final choice of system(s) on the test set that is only used once, to get an unbiased estimate of performance
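The data splitting behind these two steps could look like the sketch below. The function name chronological_split and the 50/20/15/15 split are assumptions for illustration, not a recommendation. The segments are kept in time order so that later data never leaks into earlier stages.

```python
def chronological_split(items, percents=(50, 20, 15, 15)):
    """Split a time-ordered sequence into contiguous segments:
    train, validation 1, validation 2, test."""
    n = len(items)
    bounds = [0]
    cumulative = 0
    for pct in percents[:-1]:
        cumulative += pct
        bounds.append(n * cumulative // 100)   # integer math avoids rounding surprises
    bounds.append(n)
    return [items[a:b] for a, b in zip(bounds, bounds[1:])]

bars = list(range(1000))   # stand-in for 1000 time-ordered bars
train, val1, val2, test = chronological_split(bars)
print(len(train), len(val1), len(val2), len(test))
```

The idea is to optimize on train, shortlist on val1, re-check the shortlist on val2, and touch test exactly once for the final choice.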

None of this, however, can protect against regime changes. The markets could change enough to invalidate any statistical or structural edge. The whole training-validation-test approach of data splitting works under the assumption that all segments are drawn from the same distribution. Unfortunately, that distribution may change substantially in the future. For example, the advent of HFT was a serious regime change for discretionary stock scalpers. Fortunately, regime changes often happen gradually over time, giving the trader time to adapt. It may also help to be slightly on the high-bias side, that is, to prefer a somewhat simpler model.
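One simple way to watch for such a change is to compare recent results with the backtest. The sketch below uses a Welch-style z statistic for a difference in means; this is my own illustrative choice, and the trade results are made up. It assumes roughly independent trades and only looks at the mean, so it will miss other kinds of change.

```python
import math
import statistics

def mean_shift_z(reference, recent):
    """Welch-style z statistic for a difference in mean trade result."""
    se = math.sqrt(statistics.variance(reference) / len(reference)
                   + statistics.variance(recent) / len(recent))
    return (statistics.fmean(recent) - statistics.fmean(reference)) / se

# Made-up per-trade results
backtest = [0.5, -0.2, 0.8, 0.1, -0.4, 0.6, 0.3, -0.1]
live_same = list(backtest)                   # same distribution
live_worse = [r - 1.0 for r in backtest]     # every trade 1.0 worse

z_same = mean_shift_z(backtest, live_same)
z_worse = mean_shift_z(backtest, live_worse)
print(z_same, z_worse)
```

A value near zero is what you expect if nothing has changed; a strongly negative value, as in the second case, is a hint that live results no longer come from the distribution the system was built on.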

  #3 (permalink)
 
 Fluid Fox 
Bangor, Maine
Legendary Retail Failure
 
Experience: Intermediate
Platform: NinjaTrader 8
Trading: MNQ
Posts: 677 since Sep 2018
Thanks Given: 2,968
Thanks Received: 2,711




I'm glad someone realized and said this (years ago), because I was very confused.

Just giving it a bump / attention.

Last Updated on January 21, 2020


© 2024 NexusFi™, s.a., All Rights Reserved.
Av Ricardo J. Alfaro, Century Tower, Panama City, Panama, Ph: +507 833-9432 (Panama and Intl), +1 888-312-3001 (USA and Canada)
All information is for educational use only and is not investment advice. There is a substantial risk of loss in trading commodity futures, stocks, options and foreign exchange products. Past performance is not indicative of future results.