Statistical significance

August 28th, 2014, 08:15 AM

Suppose you have a database of trades A.

Suppose you want to use a simple ‘filter’ to improve your statistics. Let’s say it is a filter that you consider valid, e.g. don’t take trades after time y. The reason is that after time y, price doesn’t have enough time or opportunity to reach your target (you don’t hold overnight, and there is no exceptional volatility). Another example would be analyzing whether your stop-loss and profit target can be made dependent on current volatility. Of course, the reason could be anything, as long as the trader suspects there is some causality or logic behind using the filter.

We don’t want to use too many filters, since more filters mean more chance of curve fitting.
How many trades B within dataset A are sufficient for the result to be statistically significant? And if this depends on the number of trades in database A, what does that relationship look like?

At the moment I use a rule of thumb of around 30 observations (the law of diminishing returns from statistics), so the filter needs at least around 30 trades (= B) to be of any significant value. I also look at scatterplots/histograms of the data and use common sense (assuming I have some) to judge whether the difference is statistically significant. But I wonder whether others use a more methodical approach?
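One more methodical alternative to a fixed count of 30 is a power (sample-size) calculation: decide how large a change in win rate you care about, then ask how many trades are needed to detect it. The sketch below uses the standard normal-approximation formula for a one-sided, one-sample test of a proportion. The function name, the 5% significance level and the 80% power are illustrative assumptions, not anything from the thread.

```python
import math
from statistics import NormalDist

def required_trades(p_base: float, p_filter: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of filtered trades needed for a one-sided,
    one-sample z-test of a proportion to detect a filter win rate of
    p_filter against a baseline win rate of p_base.

    Assumptions (illustrative): each trade is a simple win/loss, trades
    are independent, and p_base is treated as known because set A is
    much larger than the filtered subset.
    """
    if not (0 < p_base < 1 and 0 < p_filter < 1) or p_base == p_filter:
        raise ValueError("need two different win rates strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)       # quantile giving the desired power
    spread = (z_alpha * math.sqrt(p_base * (1 - p_base))
              + z_beta * math.sqrt(p_filter * (1 - p_filter)))
    return math.ceil((spread / (p_base - p_filter)) ** 2)

# Baseline win rate 0.40: how many filtered trades to detect various drops?
for p_filter in (0.20, 0.30, 0.35):
    print(f"filter win rate {p_filter:.2f}: {required_trades(0.40, p_filter)} trades")
```

With a baseline of 0.40, detecting a drop to 0.20 needs roughly 33 trades, but a drop to 0.30 needs about 140, so the required B depends mainly on how large the effect is rather than on a fixed number. The size of A matters only through how precisely its own win rate is known, which this sketch ignores.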

August 28th, 2014, 08:21 AM

A rule of thumb of 30 observations helps with Central Limit Theorem issues (it makes a normal approximation more reasonable), but it doesn't prove statistical significance.

I would suggest starting here: https://www.csulb.edu/~msaintg/ppa696/696stsig.htm

From that page, you need to answer these two questions:

Quoting

1) what is the probability that the relationship exists;
2) if it does, how strong is the relationship

To answer those questions, you'll need to determine how best to study the relationship between the number of trades "B" and the number of trades in your database "A".
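As a hedged illustration of one way to study that relationship, the sketch below compares the filtered trades with the remaining trades of A in a 2×2 table, using the counts Zwaen posts later in the thread (A: 80 targets out of 200; B: 5 out of 25) and assuming B's trades are a subset of A's. It computes Pearson's chi-square test of independence without Yates' continuity correction (question 1) and the phi coefficient as a simple effect-size measure (question 2). The function name is hypothetical.

```python
import math

def two_by_two_chi_square(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Pearson chi-square (no continuity correction) for the 2x2 table
        filtered trades:  a targets, b stop-outs
        remaining trades: c targets, d stop-outs
    Returns (statistic, p-value with 1 df, phi coefficient)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # count expected under independence
            stat += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square upper tail for 1 df
    phi = math.sqrt(stat / n)                 # 0 means no association
    return stat, p_value, phi

# B = 5 targets / 20 stop-outs; rest of A = 75 targets / 100 stop-outs
stat, p, phi = two_by_two_chi_square(5, 20, 75, 100)
print(round(stat, 3), round(p, 4), round(phi, 3))
```

For these counts the statistic is about 4.76 (p ≈ 0.03) and phi is about 0.15, a modest association. Comparing B with the rest of A, rather than with all of A, keeps the two groups from sharing trades.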

August 28th, 2014, 08:30 AM

Interesting link, thanks! I will certainly read this.

August 28th, 2014, 08:31 AM

Welcome.

There's a lot on that page to take in. I use quite a few of those analysis techniques in my own work. I'm not that great at stats, but I know some of the basics, so feel free to ask questions.

September 3rd, 2014, 03:20 PM

Hi Eric,

Thanks! I did the following calculation and wonder whether my assumptions and calculations are correct, or whether I am making mistakes.

I will use simple numbers. Reality is of course not that simple, but to illustrate the idea, each trade either wins or loses a fixed amount (the target is always t and the loss is always l; their sizes are not relevant to the calculation).

Suppose you have a set of 200 trades (set A) with positive EV. You want to evaluate whether filter B is relevant. Filter B contains 25 trades and has negative EV. We want to know whether the result for these 25 trades is significant, so we compare the two distributions.

Set A:
200 trades
80 trades closed at target
120 trades stopped out
P(closed at target) = 80/200 = 0.40
P(stopped out/closed trade) = 120/200 = 0.60

Set B (filter):
25 trades
5 trades closed at target
20 trades stopped out
P(closed at target) = 5/25 = 0.20
P(stopped out/closed trade) = 20/25 = 0.80

If set B had the same statistics as set A, its distribution would be:
Number of trades closed at target = 0.40 × 25 = 10
Number of trades stopped out/closed = 0.60 × 25 = 15
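The expected counts above can be sketched as follows; the helper name is hypothetical. The snippet also checks the common rule of thumb that every expected count should be at least 5 for the chi-square approximation to be reasonable.

```python
def expected_counts(n_filter: int, wins_a: int, n_a: int) -> tuple[float, float]:
    """Expected (closed at target, stopped out) counts for n_filter trades
    if they had the same win rate as set A (wins_a out of n_a)."""
    expected_target = n_filter * wins_a / n_a
    expected_stopped = n_filter * (n_a - wins_a) / n_a
    return expected_target, expected_stopped

exp_target, exp_stopped = expected_counts(25, 80, 200)
print(exp_target, exp_stopped)  # 10.0 15.0
# Rule of thumb for the chi-square approximation: all expected counts >= 5
print(min(exp_target, exp_stopped) >= 5)  # True
```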

Then
(5-10)^2/5 = 5.0
(20-15)^2/20 =1.25
Sum= 6.25

Degrees of freedom =(2-1)*(2-1)=1
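For comparison, here is a sketch of the textbook Pearson goodness-of-fit statistic, which divides each squared deviation by the expected count E rather than the observed count O: χ² = Σ (O − E)²/E. With O = (5, 20) and E = (10, 15) this gives 25/10 + 25/15 ≈ 4.17 instead of 6.25; dividing by O, as in the calculation above, is the variant known as Neyman's chi-square. The function names are illustrative.

```python
def pearson_chi_square(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def neyman_chi_square(observed, expected):
    """Neyman's variant: divides by the observed count O instead of E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / o for o, e in zip(observed, expected))

observed = [5, 20]       # filter B: closed at target, stopped out
expected = [10.0, 15.0]  # 25 trades split 0.40 / 0.60 as in set A
degrees_of_freedom = len(observed) - 1  # k - 1 for a goodness-of-fit test

print(round(pearson_chi_square(observed, expected), 3))  # 4.167
print(round(neyman_chi_square(observed, expected), 3))   # 6.25
print(degrees_of_freedom)                                # 1
```

With 4.17 and 1 degree of freedom, the result lies between the 5% (3.841) and 2.5% (5.024) critical values, so the evidence is somewhat weaker than the 6.25 figure suggests. For a goodness-of-fit test with k outcome categories the degrees of freedom are k − 1; here that is 1, the same value the (2−1)·(2−1) contingency-table formula gives.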

Then I see in the table at https://sites.stat.psu.edu/~mga/401/tables/Chi-square-table.pdf
that 6.25 lies between the 2.5% and 1% critical values, so can I say, with a 97.5-99% chance of being right, that these distributions are significantly different?!
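Instead of a printed table, the tail probability for one degree of freedom can be computed directly: a chi-square variable with 1 df is the square of a standard normal, so P(χ² ≥ x) = erfc(√(x/2)). This is a small sketch with a hypothetical helper name. Note that the resulting p-value (about 0.012 for 6.25) is the probability of a discrepancy at least this large if the filter made no difference; it is not the probability that the conclusion is right.

```python
import math

def chi2_sf_df1(x: float) -> float:
    """P(X >= x) for a chi-square variable X with 1 degree of freedom.

    Uses X = Z^2 with Z standard normal, so P(X >= x) = P(|Z| >= sqrt(x))
    = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2))

# Sanity checks against the usual table values for df = 1
for critical in (3.841, 5.024, 6.635):
    print(critical, round(chi2_sf_df1(critical), 4))

print(round(chi2_sf_df1(6.25), 4))  # the statistic from the calculation above
```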

September 3rd, 2014, 03:38 PM

Not having done the calculations myself with your data, I can't say for certain that this is correct... but at a quick glance I can't see anything wrong.

Regarding interpretation, you are testing whether the distributions are different. Your null hypothesis is that they are the same.

With your data, the chi-square of 6.25 is greater than the critical value for p = .025 at df = 1 (5.024), so you can reject the null hypothesis that the distributions are the same at the 2.5% significance level (i.e. accepting at most a 2.5% chance of rejecting a true null). You can't really say that they are different with 97.5-99% certainty... you can only say that the null hypothesis is rejected, which in your case is what you want to see.
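As a cross-check that does not rely on the large-sample chi-square approximation, an exact one-sided binomial test asks how likely 5 or fewer targets in 25 trades would be if the filter's true win rate were set A's 0.40. It is one-sided because the question is whether the filter hurts, whereas the chi-square test is two-sided; like the calculation above, it treats A's win rate as known. The helper name is hypothetical.

```python
from math import comb

def binomial_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p).

    Used here as an exact one-sided p-value: the chance of k or fewer
    targets in n trades if the true win rate were p.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

wins_b, n_b = 5, 25  # filter B: 5 targets out of 25 trades
baseline = 80 / 200  # set A win rate, treated as known
p_value = binomial_lower_tail(wins_b, n_b, baseline)
print(round(p_value, 4))
```

For these numbers the exact one-sided p-value comes out at roughly 0.03.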

Discussion in Traders Hideout