Help wanted on statistics / testing approach for prediction



  #1 (permalink)
 
 aquarian1 
Point Roberts, WA, USA
 
Experience: Advanced
Platform: IB and free NT
Broker: IB
Trading: ES
Posts: 4,034 since Dec 2010
Thanks Given: 1,509
Thanks Received: 2,593

I am looking for help with a statistics / testing approach for finding rule sets that increase the probability of certain outcomes.

Situation
I have a database of conditions for EOD results from 1 Feb 2012 forward to 27 June 2014. This equals 605 records.
Each condition has a letter associated with it and these go from A to U.
I want to establish a rule set that gives the highest predictive strength for a chosen condition occurring the next day, i.e. my desired predicted outcome (e.g. M).

Goal
To establish a systematic way of investigating the possible condition sets to find the highest-probability combinations for a rule set to predict my outcome, which I can then test by searching my database.

Here is where I am at:






My goal would be something like a set of rules such as:
1. If D^G and G^J and D^~J then M will happen 68% of the time.
2. If pair 1 or 2 and not pair N1b M will happen 50% of the time.

Perhaps Venn diagrams would be helpful in determining the best rule sets?

I am looking for ideas on an approach to find a solution just as much as a solution.

Thanks in advance.

-----------
Clarifying notes:
1. "^" symbol = the AND condition, so D^G means "both D and G occur"
2. In the table of occurrences of individual conditions D occurred 223 of 390 records or 57.2% of the time M happened the next day. The percentage on the right 12.4% = 223 of 1796 and is just a relative strength %.
3. "~" symbol = NOT
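
A minimal sketch of how a single rule of this kind could be scored. It assumes each record is stored as the set of condition letters seen that day, in date order; the `records` data and the `rule_hit_rate` helper are hypothetical and not taken from the actual database. It counts the days on which the rule fires and the share of those days followed by M.

```python
from typing import Callable, List, Set, Tuple

Day = Set[str]  # assumption: one set of condition letters per trading day


def rule_hit_rate(
    records: List[Day], rule: Callable[[Day], bool], outcome: str = "M"
) -> Tuple[int, int, float]:
    """Return (fires, hits, hit_rate): rule checked on day t, outcome on day t+1."""
    fires = hits = 0
    # Pair each day with the following day; the last day has no "next day".
    for today, tomorrow in zip(records, records[1:]):
        if rule(today):
            fires += 1
            if outcome in tomorrow:
                hits += 1
    rate = hits / fires if fires else float("nan")
    return fires, hits, rate


# Hypothetical toy data, oldest day first.
records = [
    {"D", "G"}, {"M"}, {"D", "J"}, {"A"},
    {"D", "G"}, {"M", "D"}, {"D", "G"}, {"B"},
]

# The rule D ^ G ^ ~J written as a plain Python predicate.
rule = lambda day: "D" in day and "G" in day and "J" not in day

fires, hits, rate = rule_hit_rate(records, rule)
print(f"fired {fires} times, M followed {hits} times, hit rate {rate:.1%}")
```

Reporting the number of firings next to the percentage matters here: a rule that fires only a handful of times in 605 records can show a high hit rate by chance.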

  #3 (permalink)
 
 SMCJB 
Houston TX
Legendary Market Wizard
 
Experience: Advanced
Platform: TT and Stellar
Broker: Advantage Futures
Trading: Primarily Energy but also a little Equities, Fixed Income, Metals and Crypto.
Frequency: Many times daily
Duration: Never
Posts: 5,041 since Dec 2013
Thanks Given: 4,375
Thanks Received: 10,192


Interesting question. You may want to message @NJAMC, as I know he has applied some machine learning to market problems that probably look a lot like this.

Just curious: why did you exclude "B" as a 4th high-prediction condition?

Also, in your table "all three D^J^G" is 112 but "All 3 pairs" [which I assume is (D^J)^(D^G)^(J^G)] is only 83. When you have D^J^G, don't you always have all 3 pairs as well, so the number of occurrences should be the same?
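
The logical point in the last question can be checked mechanically. A small sketch, assuming "All 3 pairs" really does mean (D^J)^(D^G)^(J^G) as guessed above: it walks the full truth table and collects any combination where the two expressions disagree.

```python
from itertools import product

# Every True/False combination of D, J and G (8 rows in total).
mismatches = []
for d, j, g in product([False, True], repeat=3):
    all_three = d and j and g
    all_three_pairs = (d and j) and (d and g) and (j and g)
    if all_three != all_three_pairs:
        mismatches.append((d, j, g))

print("rows where the two expressions differ:", mismatches)
```

An empty list means the two counts must match on any data set, so a difference such as 112 versus 83 would point to a counting or definition issue in the table rather than to the data.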

  #4 (permalink)
 artemiso 
New York, NY
 
Experience: Beginner
Platform: Vanguard 401k
Broker: Yahoo Finance
Trading: Mutual funds
Posts: 1,152 since Jul 2012
Thanks Given: 784
Thanks Received: 2,685

There is no point finding the highest probability without knowing the nature of M. I'd rather not have a 99% chance of winning a $500M lottery if it also comes with a remaining 1% chance of being kidnapped, tortured and incrementally dismembered for 30 days.

But if you want to find the combination with the highest joint probability, this is trivial. Let Xi denote each observable condition for any nonnegative integer i. We have:

P(M & Xj & Xk) = P(M & Xj) * P(Xk | M & Xj)

The improvement of M & Xj & Xk over M & Xj is simply:

P(M & Xj & Xk) - P(M & Xj) = P(M & Xj) * P(Xk | M & Xj) - P(M & Xj) = P(M & Xj) * ( P(Xk | M & Xj) - 1 )

Axiomatically 0 <= P(S) <= 1 for any event S, so we can conclude P(M & Xj & Xk) - P(M & Xj) <= 0.

In other words, introducing any additional condition will never yield a higher joint probability. Hence, you will find that the combination of conditions that yields the highest joint probability is D alone. In terms of joint probability, it is theoretically impossible to do better than using D alone.
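
A small sketch of what this argument does and does not cover, on a hypothetical toy sample (the `sample` data and both helper functions are made up for illustration). The joint frequency of "M next day and all conditions today" can only stay flat or fall as conditions are added, while the conditional frequency of M given the conditions is a different quantity that is free to rise or fall.

```python
# Hypothetical toy sample: (conditions seen today, did M happen the next day?)
sample = [
    ({"D", "G"}, True),
    ({"D", "G"}, True),
    ({"D", "G"}, False),
    ({"D"}, True),
    ({"D"}, False),
    ({"D"}, False),
    ({"G"}, False),
    ({"A"}, True),
]
n = len(sample)


def joint(conds):
    """Relative frequency of (M next day AND every condition in conds today)."""
    return sum(1 for today, m in sample if m and conds <= today) / n


def conditional(conds):
    """Relative frequency of M next day among days where every condition occurred."""
    fired = [m for today, m in sample if conds <= today]
    return sum(fired) / len(fired)


print("joint:      ", joint({"D"}), joint({"D", "G"}))
print("conditional:", conditional({"D"}), conditional({"D", "G"}))
```

In this toy sample the joint frequency drops from 0.375 to 0.25 when G is added, while the conditional frequency rises from 0.5 to about 0.67, so which of the two quantities a rule set is meant to maximize needs to be stated up front.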

  #5 (permalink)
 
 NJAMC 
Atkinson, NH USA
Market Wizard
 
Experience: Intermediate
Platform: NinjaTrader 8/TensorFlow
Broker: NinjaTrader Brokerage
Trading: Futures, CL, ES, ZB
Posts: 1,970 since Dec 2010
Thanks Given: 3,037
Thanks Received: 2,394


@artemiso,

Great summary, and this looks correct assuming everything is independent. Given that the data used to derive the probabilities comes from a non-linear, non-stationary data set, it might not be fair to say all observations are independent (it depends on how you derive your features), as they may depend on the prior day's trend. Your proposal is a great way to start, as I would hope most of the data is mostly independent; this is a valid hypothesis and worth trying. I just want to put the caution out there that the results might not carry forward in time, so validate the solution forward and backward if you can.
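
One way to act on that caution is a simple chronological split: measure a rule on the earlier part of the history, then check whether the hit rate holds on the later part. The sketch below is only an illustration; the `records` data, the 70/30 split, and the helper names are all assumptions.

```python
from typing import Callable, List, Set, Tuple

Day = Set[str]  # assumption: one set of condition letters per trading day


def hit_rate(
    records: List[Day], rule: Callable[[Day], bool], outcome: str = "M"
) -> Tuple[int, float]:
    """Return (fires, rate): rule checked on day t, outcome checked on day t+1."""
    results = [outcome in nxt for today, nxt in zip(records, records[1:]) if rule(today)]
    rate = sum(results) / len(results) if results else float("nan")
    return len(results), rate


def split_chronologically(records: List[Day], train_fraction: float = 0.7):
    """Split without shuffling so the test part lies strictly after the train part."""
    cut = int(len(records) * train_fraction)
    # The day pair that straddles the cut is dropped from both parts.
    return records[:cut], records[cut:]


# Hypothetical toy history, oldest day first.
records = [{"D"}, {"M"}, {"D"}, {"M"}, {"D"}, {"M"}, {"A"}, {"D"}, {"B"}, {"D"}]
rule = lambda day: "D" in day

train, test = split_chronologically(records)
in_fires, in_rate = hit_rate(train, rule)
out_fires, out_rate = hit_rate(test, rule)
print(f"in-sample:     fired {in_fires}, hit rate {in_rate:.0%}")
print(f"out-of-sample: fired {out_fires}, hit rate {out_rate:.0%}")
```

The toy data is deliberately built so that a rule that looks perfect in-sample fails out-of-sample. Swapping the roles of the two parts gives the "backward" check.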

There is a great package, RapidMiner, which is open-source. Loading the data into that tool may help you look at it in different ways and understand whether your hypothesis is correct.

Nil per os
-NJAMC [Generic Programmer]

  #6 (permalink)
 
 aquarian1 


Thank-you.

The conditions are not assumed to be independent but each letter is exclusive.

..........
peace, love and joy to you
.........
  #7 (permalink)
 
 aquarian1 


Thank-you

RE: Just curious why did you exclude "B" as a 4th high prediction condition?

- You're correct, I would test for it. I was only giving the above as an example of how far I had progressed. I thought it best to get input from others first, before proceeding further. As my knowledge of probability and statistics is limited, others might see a flaw in HOW I am approaching the problem.


------------
RE: Also in your table "all three D^J^G" is 112 but "All 3 pairs" [which I assume is (D^J)^(D^G)^(J^G)] is only 83. When you have D^J^G don't you always have all 3 pairs as well hence the number of occurrences should be the same?

- I think you may well be correct. I will have to go back and check over things. Thank you for highlighting this.
I'm in a bit of a brain fog at the moment (more than usual LOL!)

  #8 (permalink)
 
 aquarian1 


Thank-you.

No, I am not just looking for the set with the highest probability. My stated goal was:
"Goal
To establish a systematic way of investigating the possible condition sets to find the highest-probability combinations for a rule set to predict my outcome, which I can then test by searching my database."

If I follow your reply, yes, D alone is highest. I think that by "joint probabilities" you are referring to "ands".

There will be some days when D does not occur. On those days I could have J^G, and knowing the odds of that giving M would be useful.

Additionally, if D^G is lower than D alone and I have a day with D^G, I would like to know that M has become less likely.

I thought that with multiple rules forming a set, one could derive a better trading system.
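
A rough sketch of how such a rule table could be built systematically. It assumes records are stored as sets of letters per day in date order; the `records` data, the `score_rules` helper, and the `min_fires` cutoff are hypothetical. It enumerates single conditions, AND pairs, and AND-NOT pairs, then lists each rule's number of firings and the share of firings followed by M.

```python
from itertools import combinations, permutations
from typing import List, Set, Tuple

Day = Set[str]  # assumption: one set of condition letters per trading day


def score_rules(
    records: List[Day], letters: List[str], outcome: str = "M", min_fires: int = 2
) -> List[Tuple[str, int, float]]:
    """Return (rule name, fires, hit rate) rows, best hit rate first."""
    day_pairs = list(zip(records, records[1:]))  # (today, next day)

    candidates = {}
    for a in letters:  # single conditions
        candidates[a] = lambda day, a=a: a in day
    for a, b in combinations(letters, 2):  # A ^ B (order does not matter)
        candidates[f"{a}^{b}"] = lambda day, a=a, b=b: a in day and b in day
    for a, b in permutations(letters, 2):  # A ^ ~B (order matters)
        candidates[f"{a}^~{b}"] = lambda day, a=a, b=b: a in day and b not in day

    table = []
    for name, rule in candidates.items():
        outcomes = [outcome in nxt for today, nxt in day_pairs if rule(today)]
        if len(outcomes) >= min_fires:  # skip rules with too few firings
            table.append((name, len(outcomes), sum(outcomes) / len(outcomes)))
    # Highest hit rate first, then most firings, then name for a stable order.
    return sorted(table, key=lambda row: (-row[2], -row[1], row[0]))


# Hypothetical toy history, oldest day first.
records = [
    {"D", "G"}, {"M"}, {"D"}, {"A"}, {"D", "G"},
    {"M"}, {"J", "G"}, {"M"}, {"D"}, {"A"},
]
ranked = score_rules(records, ["D", "G", "J"])
for name, fires, rate in ranked:
    print(f"{name:6s} fired {fires} times, M followed {rate:.0%}")
```

The same table answers both questions raised above: which rule applies on days when D is absent, and whether adding a condition such as G or ~G raises or lowers the hit rate relative to D alone. With many candidate rules tested on a few hundred records, some will rank highly by chance, so the top rows are better treated as hypotheses to test on later data than as established probabilities.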

  #9 (permalink)
 
 aquarian1 

Thank-you all for your replies.

Perhaps I need to go back to the drawing board.

  #10 (permalink)
 artemiso 



NJAMC View Post
@artemiso,

Great summary...

Thanks.


NJAMC View Post
...and this looks correct assuming everything is independent.

Actually, my solution is in the general form and holds even for dependent variables. In the special case that {Xi, i is a positive integer} is a collection of pairwise independent variables, you can simplify further: the conditional probability P(Xj | Xk) = P(Xj), and the joint probability P(Xi & Xj) = P(Xi)*P(Xj).

I think what you mean to say is that it doesn't solve the problem if each of the random variables {Xi, i is a positive integer} is itself a member of some non-stationary stochastic process. Thank you for pointing that out. Well, that's an issue with @aquarian1's methodology...





Last Updated on July 5, 2014

