Help wanted on statistics / testing approach for prediction


Discussion in Emini and Emicro Index


  #11 (permalink)
 artemiso 
New York, NY
 
Experience: Beginner
Platform: Vanguard 401k
Broker: Yahoo Finance
Trading: Mutual funds
Posts: 1,152 since Jul 2012
Thanks Given: 784
Thanks Received: 2,685


aquarian1 View Post
Thank-you.

No, I am not just looking for the set with the highest probability.
"Goal
To establish a systematic way of investigating the possible condition sets to find the highest-probability combinations for a rule set to predict my outcome, which I can then test by searching my database."

If I follow your reply, yes, D is highest alone. I think that by "joint probabilities" in your reply you are referring to "ands".

There will be some days when D does not occur. On those days I could have J^G, and knowing the odds of that giving M would be useful.

Additionally, if D^G is lower than D alone and I have a day with D^G, I would like to know that M has become less likely.

I thought that if one has multiple rules forming a set that one could derive a better trading system.

You're most welcome.

I will hint that two important rules to bear in mind are that (1) correlation does not imply causation, and (2) description is not prescription. Historically, most daily deaths occur on the days I wear black underwear. That's nice to know and very descriptive, but it is probably not a prescriptive relationship.

If you think of it in this way, you don't need formal mathematics to gain an intuition for many statistical problems that you are encountering.

  #12 (permalink)
 
 aquarian1 
Point Roberts, WA, USA
 
Experience: Advanced
Platform: IB and free NT
Broker: IB
Trading: ES
Posts: 4,034 since Dec 2010
Thanks Given: 1,509
Thanks Received: 2,593


artemiso View Post
Thanks.

Actually, my solution is in the general form and holds even for dependent variables. In the special case that {Xi, i is a positive integer} is a collection of pairwise independent variables, you can further simplify: P(Xj | Xk) = P(Xj) and P(Xi & Xj) = P(Xi)*P(Xj).

I think what you mean to say is that it doesn't solve the problem if each of the random variables {Xi, i is a positive integer} is itself a member of some non-stationary stochastic process. Thank you for pointing that out. Well, that's an issue with @aquarian1's methodology...
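
For reference, the independence condition in the quote above can be checked directly against the data. This is only a minimal sketch: it assumes each day is stored as a set of condition letters, and the `days` sample and the `independence_gaps` helper are made up for illustration. A gap near zero is what P(Xi & Xj) = P(Xi)*P(Xj) would look like in the sample; a large gap suggests the pair is dependent.

```python
from itertools import combinations

def independence_gaps(days, labels):
    """For each pair of conditions, return P(a & b) - P(a) * P(b)."""
    n = len(days)
    # Marginal frequency of each condition across all days.
    p = {c: sum(c in d for d in days) / n for c in labels}
    gaps = {}
    for a, b in combinations(labels, 2):
        p_joint = sum(a in d and b in d for d in days) / n
        gaps[(a, b)] = p_joint - p[a] * p[b]
    return gaps

# Made-up data: each day is the set of condition letters seen that day.
days = [{"D", "G"}, {"D", "G"}, {"J"}, {"D", "G", "J"},
        set(), {"J"}, {"D", "G"}, set()]
gaps = independence_gaps(days, ["D", "G", "J"])
for pair, gap in gaps.items():
    print(pair, round(gap, 4))
```

In this toy sample D and G always occur together, so their gap is large, while D and J are closer to independent. With only a few hundred records, small gaps can easily be sampling noise.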

@artemiso


I was hoping there might be a better methodology/approach and there would be others who understand the problem better. This is why I asked for help and started the thread.

Your reply is too scholarly for me to understand:
"each of the random variables is itself a member of some non-stationary stochastic process."

1. I do not believe the variables are independent. As stated, they are all based on the same data series, specifically EOD data for the ES. I would expect that they are non-independent.

2. I do understand that correlation is not causation, but I believe I am a long way from there and it is like sinking the boat before I can even find it! One has to start somewhere. I'm still in the water.

Your reply does not seem to offer an alternative approach.

..........
peace, love and joy to you
.........
  #13 (permalink)
 
 NJAMC 
Atkinson, NH USA
Market Wizard
 
Experience: Intermediate
Platform: NinjaTrader 8/TensorFlow
Broker: NinjaTrader Brokerage
Trading: Futures, CL, ES, ZB
Posts: 1,970 since Dec 2010
Thanks Given: 3,037
Thanks Received: 2,394



aquarian1 View Post
@artemiso

Thank-you.
I was hoping there might be a better methodology/approach and there would be others who understand the problem better. This is why I asked for help and started the thread.

Your reply is too scholarly for me to understand:
"each of the random variables is itself a member of some non-stationary stochastic process."

1. I do not believe the variables are independent. As stated, they are all based on the same data series, specifically EOD data for the ES. I would expect that they are non-independent.

2. I do understand that correlation is not causation, but I believe I am a long way from there and it is like sinking the boat before I can even find it! One has to start somewhere. I'm still in the water.

Your reply does not seem to offer an alternative approach.

@aquarian1,

I am stuck a bit with the "classifications" of the EOD closes. Is each day assigned one such class or are multiple classes assigned to each day?

So is the following something like the language?
- A = Close Up from previous day
- B = Close Up from previous day by large amount
- C = Close Down from Previous day
- D = Close Down from Previous day by Large amount

Or do the classes reach back further?
- A = Closed up 1 day in a row
- B = Closed up 2 days in a row
- C = Closed up 3 days in a row
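
To make the first guess above concrete, here is a rough sketch of how such labels could be derived from a series of daily closes. The letter meanings are only the guesses listed in this post, not aquarian1's actual conditions, and the 1% `big_move` threshold and the `label_closes` name are assumptions chosen for illustration. Note that under this scheme a day can carry more than one label.

```python
def label_closes(closes, big_move=0.01):
    """Return one set of labels per day, starting with the second close.

    A = close up from previous day, B = up by a large amount,
    C = close down from previous day, D = down by a large amount.
    `big_move` is an assumed threshold (1% by default).
    """
    labels = []
    for prev, cur in zip(closes, closes[1:]):
        change = (cur - prev) / prev
        day = set()
        if change > 0:
            day.add("A")
            if change >= big_move:
                day.add("B")
        elif change < 0:
            day.add("C")
            if change <= -big_move:
                day.add("D")
        labels.append(day)
    return labels

closes = [100.0, 100.5, 102.0, 101.9, 100.0]  # made-up prices
labels = label_closes(closes)
print(labels)
```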

Nil per os
-NJAMC [Generic Programmer]

LOM WIKI: NT-Local-Order-Manager-LOM-Guide
Artificial Bee Colony Optimization
  #14 (permalink)
 
 NJAMC 
Atkinson, NH USA
Market Wizard
 
Experience: Intermediate
Platform: NinjaTrader 8/TensorFlow
Broker: NinjaTrader Brokerage
Trading: Futures, CL, ES, ZB
Posts: 1,970 since Dec 2010
Thanks Given: 3,037
Thanks Received: 2,394


aquarian1 View Post
I am looking for help on statistics / testing approach to search for rule sets for increased probabilities of certain outcomes.

Situation
I have a database of conditions for EOD results from 1 Feb 2012 forward to 27 June 2014. This equals 605 records.
Each condition has a letter associated with it and these go from A to U.
I want to establish a rule set that gives the highest predictive strength for a condition occurring the next day, which is my desired predicted outcome (e.g. M).

Goal
To establish a systematic way of investigating the possible condition sets to find the highest-probability combinations for a rule set to predict my outcome, which I can then test by searching my database.

Here is where I am at:






My goal would be something like a set of rules such as:
1. If D^G and G^J and D^~J then M will happen 68% of the time.
2. If pair 1 or 2 and not pair N1b then M will happen 50% of the time.

Perhaps Venn diagrams would be helpful in determining the best rule sets?

I am looking for ideas on an approach to find a solution just as much as a solution.

Thanks in advance.

-----------
Clarifying notes:
1. "^" symbol = the AND condition, so D^G is "both D and G occur"
2. In the table of occurrences of individual conditions D occurred 223 of 390 records or 57.2% of the time M happened the next day. The percentage on the right 12.4% = 223 of 1796 and is just a relative strength %.
3. "~" symbol = NOT
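
A rule written in this notation can be tested against the database with a few lines of code. The sketch below is one possible way to do it, not the poster's method: it assumes each record is a set of condition letters in date order, and the `rule_stats` helper and the sample `days` are invented for illustration. It returns the number of matching days, the number of those followed by the outcome, and the resulting hit rate.

```python
def rule_stats(days, outcome, required=(), forbidden=()):
    """Count days matching a rule and how often `outcome` follows the next day.

    days      : list of sets of condition letters, in date order
    required  : letters that must all be present (the "^" conditions)
    forbidden : letters that must all be absent (the "~" conditions)
    """
    matches = hits = 0
    # The last day has no "next day", so zip() leaves it out.
    for today, tomorrow in zip(days, days[1:]):
        if (all(c in today for c in required)
                and not any(c in today for c in forbidden)):
            matches += 1
            hits += outcome in tomorrow
    rate = hits / matches if matches else None
    return matches, hits, rate

# Made-up records, oldest first.
days = [{"D", "G"}, {"M"}, {"D", "G", "J"}, {"J"},
        {"D"}, {"M", "G"}, {"D", "G"}, {"M"}]

print(rule_stats(days, "M", required=("D",)))                       # D alone
print(rule_stats(days, "M", required=("D", "G")))                   # D^G
print(rule_stats(days, "M", required=("D", "G"), forbidden=("J",))) # D^G^~J
```

Keeping the match count next to the percentage matters here: with 605 records, a rule with several conditions may match only a handful of days, and a high hit rate on so few days says little.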

Hummmm....

I am starting to get your approach. I think Rapid Miner may help as I think you have created a class of "things":
A, B, C, D, E,... U

What I think might help here is to develop a Fitness function. So create a function that does something like what you have stated, but I think of it this way:
Fitness = k1*A + k2*B + ... + kx*U

You can then use a genetic algorithm to "search" this function and maximize the fitness. Each of k1, k2, ..., kx likely takes one of 3 values: -1, 0, +1 (NOT, absent, present).

This is the approach I would likely take to solve this, as you have stated you have ~20 possible inputs, and their combinations lead to a very large search space.
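
The search described above can be sketched with a small genetic algorithm (which is presumably what is meant here). Everything in this sketch is an assumption made for illustration: each genome holds one coefficient per letter A..U with +1 = must be present, -1 = must be absent, 0 = ignored; the fitness is taken to be the next-day hit rate of the outcome on matching days; and rules matching fewer than `min_matches` days score zero. With 21 letters and 3 values each there are 3^21 (roughly 10^10) candidate rules, which is why a guided search is used instead of brute force.

```python
import random

LETTERS = "ABCDEFGHIJKLMNOPQRSTU"  # the 21 condition letters A..U

def fitness(genome, days, outcome, min_matches=5):
    """Hit rate of `outcome` on the day after each day that matches the rule."""
    matches = hits = 0
    for today, tomorrow in zip(days, days[1:]):
        ok = all((k != 1 or c in today) and (k != -1 or c not in today)
                 for c, k in zip(LETTERS, genome))
        if ok:
            matches += 1
            hits += outcome in tomorrow
    # Rules that match too few days score zero, to discourage overfitting.
    return hits / matches if matches >= min_matches else 0.0

def mutate(genome, rng, rate=0.1):
    """Randomly reassign some coefficients to -1, 0 or +1."""
    return tuple(rng.choice((-1, 0, 1)) if rng.random() < rate else k
                 for k in genome)

def crossover(a, b, rng):
    """Single-point crossover of two parent genomes."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def search(days, outcome, generations=40, pop_size=60, seed=0):
    rng = random.Random(seed)
    # Start from sparse rules: mostly 0 ("ignore this letter").
    pop = [tuple(rng.choice((-1, 0, 0, 0, 1)) for _ in LETTERS)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, days, outcome), reverse=True)
        elite = pop[: pop_size // 4]  # keep the best quarter unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=lambda g: fitness(g, days, outcome))

def describe(genome):
    """Render a genome in the thread's notation, e.g. 'D^G^~J'."""
    return "^".join(("" if k == 1 else "~") + c
                    for c, k in zip(LETTERS, genome) if k != 0)
```

A call such as `best = search(days, "M")` followed by `describe(best)` would give the best rule found for a list of day-sets `days`. Any rule found this way has been fitted to the same records it is scored on, so it would need checking on data that was held back from the search.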

  #15 (permalink)
 
 aquarian1 
Point Roberts, WA, USA
 
Experience: Advanced
Platform: IB and free NT
Broker: IB
Trading: ES
Posts: 4,034 since Dec 2010
Thanks Given: 1,509
Thanks Received: 2,593


NJAMC View Post
@aquarian1,

I am stuck a bit with the "classifications" of the EOD closes. Is each day assigned one such class or are multiple classes assigned to each day?

So is the following something like the language?
- A = Close Up from previous day
- B = Close Up from previous day by large amount
- C = Close Down from Previous day
- D = Close Down from Previous day by Large amount

Or do the classes reach back further?
- A = Closed up 1 day in a row
- B = Closed up 2 days in a row
- C = Closed up 3 days in a row

It is
"specifically EOD data for the ES"
not EOD closes.

So not what you posted, which would certainly be some very good things to consider later.
I have not got that far yet.

..........
peace, love and joy to you
.........
  #16 (permalink)
 
 NJAMC 
Atkinson, NH USA
Market Wizard
 
Experience: Intermediate
Platform: NinjaTrader 8/TensorFlow
Broker: NinjaTrader Brokerage
Trading: Futures, CL, ES, ZB
Posts: 1,970 since Dec 2010
Thanks Given: 3,037
Thanks Received: 2,394


aquarian1 View Post
It is
"specifically EOD data for the ES"
not EOD closes.

So not what you posted, which would certainly be some very good things to consider later.
I have not got that far yet.

Hummmm.... Okay,

So the letters represent "features" that occurred that day?

  #17 (permalink)
 
 SMCJB 
Houston TX
Legendary Market Wizard
 
Experience: Advanced
Platform: TT and Stellar
Broker: Advantage Futures
Trading: Primarily Energy but also a little Equities, Fixed Income, Metals and Crypto.
Frequency: Many times daily
Duration: Never
Posts: 5,041 since Dec 2013
Thanks Given: 4,375
Thanks Received: 10,192


NJAMC View Post
Hummmm....

I am starting to get your approach. I think Rapid Miner may help as I think you have created a class of "things":
A, B, C, D, E,... U

What I think might help here is to develop a Fitness function. So create a function that does something like what you have stated, but I think of it this way:
Fitness = k1*A + k2*B + ... + kx*U

You can then use a genetic algorithm to "search" this function and maximize the fitness. Each of k1, k2, ..., kx likely takes one of 3 values: -1, 0, +1 (NOT, absent, present).

This is the approach I would likely take to solve this, as you have stated you have ~20 possible inputs, and their combinations lead to a very large search space.

Thought you might like this thread...





Last Updated on July 5, 2014

