Help wanted on statistics / testing approach for prediction - Index Futures Trading | futures.io

 Started: June 29th, 2014 (06:02 PM) by aquarian1 Views / Replies: 1,087 / 16 Last Reply: July 5th, 2014 (02:01 PM) Attachments: 4


# Help wanted on statistics / testing approach for prediction


July 2nd, 2014, 12:02 AM   #3 (permalink)
Market Wizard
Houston TX

Favorite Futures: Energy

Posts: 1,515 since Dec 2013
Forum Reputation: Legendary

 This post has been selected as an answer to the original posters question

Interesting question. You may want to message @NJAMC, as I know he has applied some machine learning to market problems that may look a lot like this.

Just curious: why did you exclude "B" as a 4th high-prediction condition?

Also, in your table "all three D^J^G" is 112, but "All 3 pairs" [which I assume is (D^J)^(D^G)^(J^G)] is only 83. When you have D^J^G, don't you always have all 3 pairs as well, so the number of occurrences should be the same?
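A minimal sketch of that consistency check, assuming the data can be laid out as one record per day with a true/false flag per condition. The `days` list below is made up for illustration; the real table is not shown in the thread. Under this reading of "all 3 pairs", the two counts are always identical, so a mismatch would suggest the table rows are defined differently.

```python
# Hypothetical day-by-day flags; the real data set is not part of the thread.
days = [
    {"D": True,  "J": True,  "G": True},
    {"D": True,  "J": True,  "G": False},
    {"D": True,  "J": False, "G": True},
    {"D": False, "J": True,  "G": True},
    {"D": True,  "J": True,  "G": True},
    {"D": False, "J": False, "G": False},
]


def count(days, predicate):
    """Number of days for which predicate(day) is true."""
    return sum(1 for day in days if predicate(day))


# Days on which D, J and G all occur.
all_three = count(days, lambda d: d["D"] and d["J"] and d["G"])

# Days on which every pair occurs: (D^J)^(D^G)^(J^G).
all_pairs = count(
    days,
    lambda d: (d["D"] and d["J"]) and (d["D"] and d["G"]) and (d["J"] and d["G"]),
)

print(all_three, all_pairs)
```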

 The following user says Thank You to SMCJB for this post:

July 2nd, 2014, 01:49 AM   #4 (permalink)
Elite Member
Manchester, NH

Futures Experience: Beginner
Platform: thinkorswim
Broker/Data: TD Ameritrade
Favorite Futures: Stocks

Posts: 835 since Jul 2012
Thanks: 579 given, 1,627 received

No point finding the highest probability without knowing the nature of M. I'd rather not have a 99% chance of winning a \$500M lottery if it also comes with a 1% chance of being kidnapped, tortured and incrementally dismembered for 30 days.

But if you want to find the combination with the highest joint probability, this is trivial. Let Xi denote each observable condition, for any nonnegative integer i. By the chain rule, we have:

P(M & Xj & Xk) = P(M & Xj) * P(Xk | M & Xj)

The improvement of M & Xj & Xk over M & Xj is simply:

P(M & Xj & Xk) - P(M & Xj)
= P(M & Xj) * P(Xk | M & Xj) - P(M & Xj)
= P(M & Xj) * (P(Xk | M & Xj) - 1)

Axiomatically, 0 <= P(S) <= 1 for any event S, so we can conclude that P(M & Xj & Xk) - P(M & Xj) <= 0. In other words, introducing any additional condition will never yield a higher joint probability. Hence, you will find that the combination of conditions that yields the highest joint probability is D alone. It's theoretically impossible to perform better than using D alone.
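The claim that a joint probability cannot rise when a condition is added can be checked empirically. The sketch below is only an illustration: the `days` records and the helper names `joint_prob` and `adding_condition_never_helps` are assumptions, not anything from the original post.

```python
from itertools import combinations

# Hypothetical observations: one record per day, M is the outcome.
days = [
    {"M": True,  "D": True,  "J": True,  "G": False},
    {"M": True,  "D": True,  "J": False, "G": True},
    {"M": False, "D": True,  "J": True,  "G": True},
    {"M": True,  "D": False, "J": True,  "G": True},
    {"M": False, "D": False, "J": False, "G": True},
    {"M": True,  "D": True,  "J": True,  "G": True},
    {"M": False, "D": True,  "J": False, "G": False},
    {"M": False, "D": False, "J": True,  "G": False},
]


def joint_prob(days, names):
    """Empirical probability that all named events occur on the same day."""
    hits = sum(1 for day in days if all(day[n] for n in names))
    return hits / len(days)


def adding_condition_never_helps(days, outcome, conditions):
    """True if P(outcome & S & X) <= P(outcome & S) for every subset S and extra X."""
    for k in range(1, len(conditions)):
        for subset in combinations(conditions, k):
            base = joint_prob(days, (outcome,) + subset)
            for extra in conditions:
                if extra in subset:
                    continue
                if joint_prob(days, (outcome,) + subset + (extra,)) > base:
                    return False
    return True


print(joint_prob(days, ("M", "D")))
print(joint_prob(days, ("M", "D", "J")))
print(adding_condition_never_helps(days, "M", ["D", "J", "G"]))
```

Note that this only concerns the joint probability P(M & conditions), which counts how often everything happens together, not the conditional probability P(M | conditions).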
 The following user says Thank You to artemiso for this post:

July 3rd, 2014, 11:45 AM   #5 (permalink)
Elite Member
Atkinson, NH USA

Futures Experience: Intermediate
Favorite Futures: Futures, CL

Posts: 1,864 since Dec 2010

 This post has been selected as an answer to the original posters question


artemiso
 No point finding the highest probability without knowing the nature of M. I'd rather not have a 99% chance of winning a \$500M lottery if it also comes with a 1% chance of being kidnapped, tortured and incrementally dismembered for 30 days. But if you want to find the combination with the highest joint probability, this is trivial. Let Xi denote each observable condition, for any nonnegative integer i. We have: P(M & Xj & Xk) = P(M & Xj) * P(Xk | M & Xj). The improvement of M & Xj & Xk over M & Xj is simply: P(M & Xj & Xk) - P(M & Xj) = P(M & Xj) * P(Xk | M & Xj) - P(M & Xj) = P(M & Xj) * (P(Xk | M & Xj) - 1). Axiomatically, 0 <= P(S) <= 1 for any event S, so we can conclude that P(M & Xj & Xk) - P(M & Xj) <= 0. In other words, introducing any additional condition will never yield a higher joint probability. Hence, you will find that the combination of conditions that yields the highest joint probability is D alone. It's theoretically impossible to perform better than using D alone.

@artemiso,

Great summary, and this looks correct assuming everything is independent. Given that the data used to derive the probabilities comes from a non-linear, non-stationary data set, it might not be fair to say all observations are independent (it depends on how you derive your features), as they may depend on the prior day's trend. Your proposal is a great way to start: I would hope most of the data is mostly independent, so this is a valid hypothesis and worth trying. I just want to caution that the results might not carry forward in time, so validate the solution forward and backward if you can.
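One simple way to act on that caution is to estimate a probability on the earlier part of the history and compare it with the later part. The sketch below assumes the records are in chronological order; the data and the helper names `cond_prob` and `split_in_time` are hypothetical.

```python
def cond_prob(days, outcome, conditions):
    """Empirical P(outcome | all conditions true); None if the conditions never occur."""
    matching = [d for d in days if all(d[c] for c in conditions)]
    if not matching:
        return None
    return sum(1 for d in matching if d[outcome]) / len(matching)


def split_in_time(days, train_fraction=0.5):
    """Split chronologically ordered records into an earlier and a later block."""
    cut = int(len(days) * train_fraction)
    return days[:cut], days[cut:]


# Hypothetical chronological history (oldest first).
history = [
    {"M": True,  "D": True},
    {"M": True,  "D": True},
    {"M": False, "D": True},
    {"M": False, "D": False},
    {"M": False, "D": True},
    {"M": True,  "D": True},
    {"M": False, "D": True},
    {"M": False, "D": True},
]

earlier, later = split_in_time(history)
in_sample = cond_prob(earlier, "M", ["D"])
out_of_sample = cond_prob(later, "M", ["D"])
print(in_sample, out_of_sample)
```

A large gap between the two estimates hints that the relationship is not stable over time. Swapping the roles of the two blocks gives the backward check.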

RapidMiner is a great open-source package. Loading the data into that tool may help you look at it in different ways and check whether your hypothesis is correct.

 The following 2 users say Thank You to NJAMC for this post:

July 3rd, 2014, 06:30 PM   #6 (permalink)
The fun is in the numbers
Point Roberts, WA, USA

Platform: IB and free NT
Broker/Data: IB
Favorite Futures: ES

Posts: 2,132 since Dec 2010

NJAMC
 @artemiso, Great summary, and this looks correct assuming everything is independent. Given that the data used to derive the probabilities comes from a non-linear, non-stationary data set, it might not be fair to say all observations are independent (it depends on how you derive your features), as they may depend on the prior day's trend. Your proposal is a great way to start: I would hope most of the data is mostly independent, so this is a valid hypothesis and worth trying. I just want to caution that the results might not carry forward in time, so validate the solution forward and backward if you can. RapidMiner is a great open-source package. Loading the data into that tool may help you look at it in different ways and check whether your hypothesis is correct.

Thank you.

The conditions are not assumed to be independent, but each letter is exclusive.

July 3rd, 2014, 06:45 PM   #7 (permalink)
The fun is in the numbers
Point Roberts, WA, USA

Platform: IB and free NT
Broker/Data: IB
Favorite Futures: ES

Posts: 2,132 since Dec 2010

SMCJB
 Interesting question. You may want to message @NJAMC, as I know he has applied some machine learning to market problems that may look a lot like this. Just curious: why did you exclude "B" as a 4th high-prediction condition? Also, in your table "all three D^J^G" is 112, but "All 3 pairs" [which I assume is (D^J)^(D^G)^(J^G)] is only 83. When you have D^J^G, don't you always have all 3 pairs as well, so the number of occurrences should be the same?

Thank you.

RE: Just curious why did you exclude "B" as a 4th high prediction condition?

- You're correct, I would test for it. I was only giving the above as an example of how far I had progressed. I thought it best to get input from others first, before proceeding further. As my knowledge of probability and stats is limited, others might see a flaw in HOW I am approaching the problem.

------------
RE: Also in your table "all three D^J^G" is 112 but "All 3 pairs" [which I assume is (D^J)^(D^G)^(J^G)] is only 83. When you have D^J^G don't you always have all 3 pairs as well hence the number of occurrences should be the same?

- I think you may well be correct. I will have to go back and check over things. Thank you for highlighting this.
I'm in a bit of a brain fog at the moment (more than usual LOL!)

July 3rd, 2014, 07:07 PM   #8 (permalink)
The fun is in the numbers
Point Roberts, WA, USA

Platform: IB and free NT
Broker/Data: IB
Favorite Futures: ES

Posts: 2,132 since Dec 2010

artemiso
 No point finding the highest probability without knowing the nature of M. I'd rather not have a 99% chance of winning a \$500M lottery if it also comes with a 1% chance of being kidnapped, tortured and incrementally dismembered for 30 days. But if you want to find the combination with the highest joint probability, this is trivial. Let Xi denote each observable condition, for any nonnegative integer i. We have: P(M & Xj & Xk) = P(M & Xj) * P(Xk | M & Xj). The improvement of M & Xj & Xk over M & Xj is simply: P(M & Xj & Xk) - P(M & Xj) = P(M & Xj) * P(Xk | M & Xj) - P(M & Xj) = P(M & Xj) * (P(Xk | M & Xj) - 1). Axiomatically, 0 <= P(S) <= 1 for any event S, so we can conclude that P(M & Xj & Xk) - P(M & Xj) <= 0. In other words, introducing any additional condition will never yield a higher joint probability. Hence, you will find that the combination of conditions that yields the highest joint probability is D alone. It's theoretically impossible to perform better than using D alone.

Thank you.

No, I am not just looking for the set with the highest probability.
"Goal
To establish a systematic way of investigating the possible condition sets to find the highest-probability combinations for a rule set to predict my outcome, which I can then test by searching my database."

There will be some days when D does not occur. On those days I could have J^G, and knowing the odds of that giving M would be useful.

Additionally, if D^G is lower than D alone and I have a day with D^G, I would like to know that M has become less likely.

I thought that with multiple rules forming a set, one could derive a better trading system.
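A rough sketch of such a systematic search: enumerate every combination of conditions and tabulate how often it occurred and how often M followed. Everything here is illustrative. The `days` records and the `condition_table` helper are assumptions, and a combination is counted whenever all of its conditions are true, regardless of the other conditions.

```python
from itertools import combinations


def condition_table(days, outcome, conditions):
    """Rows of (label, occurrences, P(outcome | combination)), best odds first."""
    rows = []
    for k in range(1, len(conditions) + 1):
        for subset in combinations(conditions, k):
            matching = [d for d in days if all(d[c] for c in subset)]
            if not matching:
                continue  # combination never occurred, nothing to estimate
            wins = sum(1 for d in matching if d[outcome])
            rows.append(("^".join(subset), len(matching), wins / len(matching)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical observations: one record per day, M is the outcome.
days = [
    {"M": True,  "D": True,  "J": True,  "G": False},
    {"M": True,  "D": True,  "J": False, "G": True},
    {"M": False, "D": True,  "J": True,  "G": True},
    {"M": True,  "D": False, "J": True,  "G": True},
    {"M": False, "D": False, "J": False, "G": True},
    {"M": True,  "D": True,  "J": True,  "G": True},
    {"M": False, "D": True,  "J": False, "G": False},
    {"M": False, "D": False, "J": True,  "G": False},
]

table = condition_table(days, "M", ["D", "J", "G"])
for label, n, p in table:
    print(f"{label:6s} n={n} P(M|{label})={p:.2f}")
```

The occurrence count is kept next to each estimate because a high percentage from only a handful of days is weak evidence.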

Last edited by aquarian1; July 3rd, 2014 at 07:13 PM.

July 3rd, 2014, 07:14 PM   #9 (permalink)
The fun is in the numbers
Point Roberts, WA, USA

Platform: IB and free NT
Broker/Data: IB
Favorite Futures: ES

Posts: 2,132 since Dec 2010

Perhaps I need to go back to the drawing board.

July 3rd, 2014, 07:22 PM   #10 (permalink)
Elite Member
Manchester, NH

Futures Experience: Beginner
Platform: thinkorswim
Favorite Futures: Stocks

Posts: 835 since Jul 2012

NJAMC
 @artemiso, Great summary...

Thanks.

NJAMC
 ...and this looks correct for assuming everything is independent.

Actually, my solution is in the general form and holds even for dependent variables. In the special case that {Xi, i is a positive integer} is a collection of pairwise independent variables, you can simplify further: P(Xj | Xk) = P(Xj) and P(Xi & Xj) = P(Xi)*P(Xj).
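Whether two conditions look independent in a sample can be checked directly against the product rule P(Xi & Xj) = P(Xi)*P(Xj). A minimal sketch with made-up data, where the helper names `prob` and `independence_gap` are hypothetical:

```python
def prob(days, names):
    """Empirical probability that all named conditions are true on the same day."""
    return sum(1 for d in days if all(d[n] for n in names)) / len(days)


def independence_gap(days, a, b):
    """P(a & b) - P(a) * P(b); zero when a and b are independent in the sample."""
    return prob(days, (a, b)) - prob(days, (a,)) * prob(days, (b,))


# Hypothetical flags: J is unrelated to D here, while G always equals D.
days = [
    {"D": True,  "J": True,  "G": True},
    {"D": True,  "J": False, "G": True},
    {"D": False, "J": True,  "G": False},
    {"D": False, "J": False, "G": False},
]

print(independence_gap(days, "D", "J"))
print(independence_gap(days, "D", "G"))
```

On real data the gap will rarely be exactly zero, so a small gap should be judged against the sample size rather than taken as proof of independence.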

I think what you mean to say is that it doesn't solve the problem if each of the random variables {Xi, i is a positive integer} is itself a member of some non-stationary stochastic process. Thank you for pointing that out. Well, that's an issue with @aquarian1's methodology...

 The following user says Thank You to artemiso for this post:
