I'm having a play with MATLAB for the first time with the intention of pattern hunting.
Wondered if anyone has experience with Markov approaches using MATLAB. Initially keen just to make contact with anyone with relevant practical experience. Very much a side project that I am using to get familiar with the software/code.
Great to hear back from anyone active in this dark corner of futures.io (formerly BMT).
Thought I would document the side project I'm undertaking with Matlab / Markov. I've asked around this and other forums to see what work has been done by us little guys and have drawn a blank so far. Seems to be the scholarly, hedge fundy types that get their teeth into it.
A disclaimer: I am a total novice with this stuff, just driven by an inquiring mind and a little spare time.
So what's a nice guy like me doing in a (mental) place like this? Well I've figured out that I will not trade unless I have worked up a 'system' that I can empirically test. I will then trade it on a discretionary basis, and more confidently than I would without putting in the hard miles of coding and testing. Without putting in the back testing groundwork my mind just gets too paranoid and the whole thing falls in on itself.
I got here because I am working up a system based on market structure/volatility. I have watched 15 min charts on forex pairs pretty much full time for 2 years and I can sense patterns in there. One of my main indicators is ADX and I can intuitively get close to predicting trend days and consolidation days in the context of the last few days' PA. I have also been spotting repetitions in price extremes at key times of the day, London and US open for example (these are well known).
So my starting point is to work out whether a trade idea such as 'go long off a double bottom in the Asian late and London open session' can be made safer by knowing that today is more likely to be an up day.
So I was looking for a way to underpin the intuition with a decent bit of back testing and, who knows, maybe I'll go so far as putting some trades on as a result.
There are then two elements to the approach: i) profiling 'typical' price structure in the context of 24hrs of forex price action from UK midnight to midnight and ii) developing a hidden Markov model that can make a prediction on the direction of price for the day.
The first part is relatively straightforward. I split the day into the obvious (to me) trading sessions. Remember I am in the UK. Asian Late, London Open, Morning, US Open, Afternoon, Asian Early. The sessions aim to have a similar amount of activity, so the London and US opens are short (2 hours or so), the morning and afternoon a little longer, and the Asian sessions take up around half the day from 19:00-06:30.
I then use an NT strategy to identify in which sessions the low of the day (LOD) and high of the day (HOD) occur, and in which sessions any double tops / bottoms occur with the LOD/HOD. That output can then easily be manipulated to provide a probability for price extremes as the day progresses. I figure that the same market participants get up, brush their teeth and play out the whole game day after day after day. So there will inevitably be routines in their trading decisions.
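To make the session tally concrete, here is a minimal sketch in Python rather than the NT strategy itself. The session order follows the list above; the function names and the eight sample days are made up purely for illustration.

```python
from collections import Counter

# Session order through the UK day, as listed in the post
SESSIONS = ["Asian Late", "London Open", "Morning",
            "US Open", "Afternoon", "Asian Early"]

def extreme_probabilities(session_per_day):
    """Fraction of days on which the extreme (e.g. the LOD) fell in each session."""
    counts = Counter(session_per_day)
    total = len(session_per_day)
    return {s: counts.get(s, 0) / total for s in SESSIONS}

def prob_already_in(probs, upto_session):
    """Probability that the extreme has printed by the end of `upto_session`."""
    idx = SESSIONS.index(upto_session)
    return sum(probs[s] for s in SESSIONS[: idx + 1])

# Made-up sample: the session that held the LOD on each of 8 days
lod_sessions = ["Asian Late", "London Open", "London Open", "US Open",
                "Asian Late", "Morning", "London Open", "Afternoon"]

lod_probs = extreme_probabilities(lod_sessions)
print(lod_probs)
print(prob_already_in(lod_probs, "London Open"))
```

The cumulative figure is the useful one when trading: it answers "given where we are in the day, how likely is it that the low is already in?"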
So that deals with the intraday PA; what about the PA across days?
I've opted for a Markov approach. I first became aware of this in a great book called 'Trading Regime Analysis - The Probability of Volatility' by Murray Gunn. Mr Gunn's idea is that all systems work some of the time; the key is knowing when they will, which is down to volatility - you are either trending or range bound.
So my take on this is that there is some structure in the markets, all rooted in human emotion, and whilst the observations look random, they mask an underlying structure. Markov modeling deals exactly with this. And it does so based only on the last observation. In this way the approach is not like a (lagging) indicator. You only need to know the current state to predict the next state, and you train your prediction on the window of the last n days to come up with a maximum likelihood estimate.
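As a rough illustration of "current state in, next state out", the sketch below estimates a first-order transition matrix from a window of past states by simple counting (which is the maximum likelihood estimate for a plain, visible Markov chain) and reads off the most likely next state. This is a simplified stand-in, not the hidden Markov model itself; the function names, the uniform fallback for unseen states and the sample window are assumptions.

```python
import numpy as np

def transition_matrix(states, n_states):
    """Count i -> j moves in the window and normalise each row to sum to 1."""
    counts = np.zeros((n_states, n_states))
    for prev, nxt in zip(states[:-1], states[1:]):
        counts[prev, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    safe = np.where(row_sums == 0, 1, row_sums)
    # States never seen as a starting point fall back to a uniform guess
    # (an assumption of this sketch, not something from the post)
    return np.where(row_sums > 0, counts / safe, 1.0 / n_states)

def predict_next(states, n_states):
    """Most probable next state given only the last state, plus its probability."""
    P = transition_matrix(states, n_states)
    row = P[states[-1]]
    return int(np.argmax(row)), float(row.max())

# Made-up window of daily states: 0 = down day, 1 = up day
window = [0, 1, 1, 0, 1, 1, 1, 0, 1]
state, confidence = predict_next(window, n_states=2)
print(state, confidence)
```

Only the last entry of the window decides which row of the matrix is used; the rest of the window only shapes the probabilities in that row.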
So for this part I have extracted two sets of data from my price series (I am using EURUSD). The first is the 'state', which I have defined as whether yesterday was an up day or a down day, as I am trying to work out if today is going to be an up or down day. The other set is the 'emissions', i.e. those things that we can see. For this I have chosen price relative to yesterday's mid price: 3 levels for the price being higher by increasing degrees, and the same inversely for lower prices. Matlab has some built-in functions/algorithms that get to work on the relationship between the states and emissions and output the most probable next state and its confidence weight. In my case: is today predicted to be an up or down day, and how confident is that prediction?
For back testing purposes the goal is to train the model on in-sample data and then apply predictions of up/down days to out of sample data as a filter to, say, 'go long off a double bottom in the Asian late and London open session'.
And that is where I am at now. I have previously been bitten by the ogre of curve fitted strategies so I am skeptical of anything written in computer code. But this Markov stuff seems less vulnerable to those issues. And it's a whole lot of fun too.
So I have a draft of the model done and am just starting the grind of testing. I'll post up the preliminary output for anyone that is weird enough to care.
Other (non-trading) work has dominated my time over the last month so I haven't made as much progress as I had planned, but I have some developments to post up.
The intra day PA side of the strategy is done and I have started backtesting that. And yes, a simple long off a double bottom registers as mildly profitable for my in sample data on EURUSD (April '07 - Jan '11).
The Markov model I drafted out in Matlab is great fun. Here's where I am with that.
I've settled for a three state system, determined by the daily range. State 1 is a down day with a close 45 pips or more below the open, state 2 is a range day with the close within 45 pips of the open (either up or down), state 3 is an up day with the close more than 45 pips above the open. Why these settings? From the sample data this gave me a fairly good spread of days of each type. I have not run tests with any other settings, primarily to avoid curve fitting.
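For anyone who prefers to see the rule as code, here is a minimal sketch of the three-state classification. The 45 pip threshold and the state numbers come from the description above; the function name, the 0.0001 pip size for EURUSD and the rounding step are assumptions of the sketch.

```python
def classify_day(open_price, close_price, threshold_pips=45, pip_size=0.0001):
    """Return 1 (down day), 2 (range day) or 3 (up day) from the open-to-close move."""
    # Round to a tenth of a pip so floating point noise cannot flip a boundary case
    move_pips = round((close_price - open_price) / pip_size, 1)
    if move_pips <= -threshold_pips:   # close 45 pips or more below the open
        return 1
    if move_pips > threshold_pips:     # close more than 45 pips above the open
        return 3
    return 2                           # everything in between is a range day

print(classify_day(1.3000, 1.2950))  # close 50 pips below the open
print(classify_day(1.3000, 1.3020))  # close 20 pips above the open
print(classify_day(1.3000, 1.3060))  # close 60 pips above the open
```

Note that the boundaries follow the wording literally, so a close exactly 45 pips down counts as a down day while a close exactly 45 pips up still counts as a range day.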
The emissions side of the model is sampled 6 times across each day. The emissions are the distance from yesterday's mid price. There are 6 of these, 3 for increasing degrees above and in reverse for below. Those are numbered 1-6.
So a sample output may look like:
1 1 1 1 1 1 (this is a down day, so each of the 6 states is 1)
5 4 4 3 2 1 (PA went lower through the day, and the emissions report that)
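A minimal sketch of how prices could be turned into emission codes 1-6 like the row above. The band edges (20 and 40 pips either side of yesterday's mid) are purely hypothetical, since the actual levels are not given here, and the function name and sample prices are likewise made up.

```python
from bisect import bisect_right

# Hypothetical band edges in pips relative to yesterday's mid price;
# the real levels used in the NT strategy are not stated in the post
EDGES_PIPS = [-40, -20, 0, 20, 40]

def emission_code(price, yesterday_mid, pip_size=0.0001):
    """Map a price to a code 1-6: 1-3 below yesterday's mid, 4-6 at or above it."""
    dist = round((price - yesterday_mid) / pip_size, 1)
    return bisect_right(EDGES_PIPS, dist) + 1

# Made-up prices at the 6 intraday sample points, with yesterday's mid at 1.3000
prices = [1.3025, 1.3010, 1.3005, 1.2990, 1.2970, 1.2950]
codes = [emission_code(p, 1.3000) for p in prices]
print(codes)
```

With these made-up prices the day drifts from 25 pips above the mid to 50 pips below it, and the codes step down accordingly.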
An output like the above is generated by the NT strategy and written to a text file which is made available to Matlab. In my sample I have an 860 x 6 matrix for the states, where each row is a day and every entry in the row is the same (either all 1s, 2s or 3s); and an 860 x 6 matrix for the emissions, where each row can hold different numbers representing the 'emission' at each of the 6 intra day time slots.
So the variables in the Markov code in Matlab just determine at which point in the matrix to start training the model and at which point to stop. I tested on a fixed and moving window and the moving window performs better - and makes more sense to me. Too many or too few days and the model fails to capture the underlying structure. Training the model on 480 data points (80 days) provides a really stable result.
So the Matlab code I've hacked together trains on the first 80 days, then applies that model to the input data and outputs the most probable next state. It then shifts along by one data point and continues the process, outputting the next most probable state until it reaches the end of the sample. The output is written into a separate file which can then be compared to what actually happened at each next point.
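The train, predict, shift loop can be sketched roughly as follows. This is a simplified Python stand-in and not the Matlab code: it works at day level, estimates the matrices by counting with add-one smoothing, and makes one Bayes update per day from yesterday's state and today's first-slot emission rather than fitting a full hidden Markov model. All names, the smoothing, the zero-based codes and the random demo data are assumptions.

```python
import numpy as np

def fit_counts(day_states, first_emis, n_states=3, n_emis=6):
    """Count-based estimates with add-one smoothing (an assumption of this sketch)."""
    A = np.ones((n_states, n_states))  # day-to-day state transitions
    B = np.ones((n_states, n_emis))    # first-slot emission given the day type
    for prev, nxt in zip(day_states[:-1], day_states[1:]):
        A[prev, nxt] += 1
    for s, e in zip(day_states, first_emis):
        B[s, e] += 1
    return A / A.sum(axis=1, keepdims=True), B / B.sum(axis=1, keepdims=True)

def walk_forward(day_states, first_emis, window=80):
    """Train on the previous `window` days, predict day t, then slide on by one day."""
    preds = []
    for t in range(window, len(day_states)):
        A, B = fit_counts(day_states[t - window:t], first_emis[t - window:t])
        # P(today's state | yesterday's state, today's first emission), up to scale
        post = A[day_states[t - 1]] * B[:, first_emis[t]]
        post = post / post.sum()
        preds.append((t, int(np.argmax(post)), float(post.max())))
    return preds

# Random demo data, zero-based: states 0-2 (down, range, up), emissions 0-5
rng = np.random.default_rng(0)
demo_states = rng.integers(0, 3, size=120)
demo_emis = rng.integers(0, 6, size=120)
preds = walk_forward(demo_states, demo_emis, window=80)
print(preds[:3])
```

Each prediction only ever sees data from before the day being predicted, which is the point of the moving window; on random demo data like this the confidences should hover near one third.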
So the states which were predicted in the first time slot (Asian Late), which runs to 07:30 GMT, are shown below. The practical side of this is that I can consult the model at the beginning of my trading day and see what it's predicting - up day, down day or range day.
The model predicted from the 860 day sample:
132 down days from 284 actual down days, 46.48% accuracy
197 range days from 324 actual range days, 60.18% accuracy
118 up days from 252 actual up days, 46.83% accuracy
127 days the model predicted a trend day but got the direction wrong, 23.69% of all trend days
I also ran outputs of random 60-day sets and the std dev is very tight, so I have some faith in the consistency.
My benchmark to compare these figures to would be random accuracy for a three-outcome guess, i.e. 33%. So at roughly 46-60% the model is more accurate than random on this first run for all three states.
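For reference, the per-state figures are of the form "days predicted correctly divided by actual days of that type". A minimal sketch of that calculation against the one-in-three benchmark, using a made-up ten-day sample rather than the real output file (the function name is hypothetical):

```python
from collections import Counter

def per_state_hit_rate(actual, predicted):
    """For each actual state, the share of days on which the prediction matched."""
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {s: hits[s] / totals[s] for s in sorted(totals)}

# Made-up sample: 1 = down day, 2 = range day, 3 = up day
actual    = [1, 1, 2, 2, 2, 3, 3, 3, 3, 1]
predicted = [1, 2, 2, 2, 3, 3, 3, 1, 2, 1]

rates = per_state_hit_rate(actual, predicted)
baseline = 1 / 3  # random guess across three states
for state, rate in rates.items():
    print(state, round(rate, 4), "edge over random:", round(rate - baseline, 4))
```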
Next step is to keep chugging out the tests to try and break it. I'll put it to work on other currency pairs, as I have decent tick data going back to '07 for several of those, and do the out of sample comparisons for EURUSD and then for these others. Days predicted to be up/down/range can be integrated into NT backtesting code very easily.
I will reiterate a disclaimer here:
I am a novice and possibly do not know what I am doing. I am self taught with NT and Matlab coding and only have a year's experience with the former and about 6 weeks with the latter. All these results may well just be the product of chance. I have no training with probability at all (although I do have a few books on probability and statistics). I would welcome questions from anyone who does know a thing or two about this stuff and can blow my ideas and methodology up.
Looks like a great thread and you are off and running! I don't have any direct experience with Matlab and HMM, but I have done some self learning about Machine Learning. I will help where I can and hope to learn, as you have a good connection to the market which I feel I am lacking in my efforts. For others interested in the detail of the Hidden Markov Model, Wikipedia has a good page:
Find attached the Markov's toolbox 4.0, latest version, for Matlab (entirely compatible with the freeware GNU Octave). The documentation can be viewed inside the zip file via Matlab. I use Matlab as a passive user and am still in the learning process, so I can't help any further, but you can compare it with your model.
Thanks for the feedback. I was not following the thread, but I am now.
Yes, I have that one and there are a few more that I collated as I trawled the web. Once I have time to review them I'll drop in some links for others to get as confused as I did!
Very many thanks for this. I was not aware of this toolbox and have had to hack some code together from samples at the Matlab file exchange. Not yet taken a look at it, but will soon. Again very many thanks for sharing.