Intro:
I have decided to document the development and deployment of my first FOREX ATS. Upon completion of the system, I hope to be able to review this journal to see where I went wrong/right in the development process.

I welcome comments along the way...

AIM: I am developing an ATS to trade multiple currency pairs. I hope to use this as a basis to continue developing a separate ATS focused on ES contracts. I am looking to complete the FOREX ATS in time to be tested in this year's MQL competition. I hope that the structured competition will provide a good means of testing the performance of the strategy.

I will look to update this journal a few times a week at least...

Journal Entry #1:
A bit of background to development as it currently stands: So far I have downloaded and installed MT5, which I have no experience using. I also have no experience trading forex, so this will be a challenge.

The 1st thing I did after dl'ing MT5 was play around with the application. The strategy optimization/analysis tools look awesome.
I then read a few articles on how to develop different strategies. I have to say I am quite impressed by how the MQL community is set up. Everything seems to promote collaboration, which is a big plus for newbies like me.
Following this I reviewed last year's competition details to see how it fared. The rules are interesting and allowed me to formulate a development strategy. I am a little concerned, as it appears that there are 1000s of entries to this competition and a lot of the strategies appear to be VERY basic. These strategies do appear to succeed in this competition, and I ask myself if this is more a lottery than a test of skill. I am looking to develop a statistically robust strategy, which may not win against some of these 'lucky' strategies. I guess the measure of success will not be winning the competition but testing whether the development framework I employ can result in a consistently profitable strategy. Time will tell if I am successful or not.

My next step was to download a whole bunch of EURUSD 1m data for importing into SAS (my preferred analytics package).

Once I have my historical 1m time series I need to look at creating some outcome flags. To do this I have to define a crude business case based on the questions: What is a profitable trend? What is my entry point?
By conducting some univariate analysis I get an understanding of how the price moves over different time frames. I don't want to trade too quickly as this is a trend model, and I don't want to exit too quickly based on the price moving against me either. To capture these characteristics I apply a price filter and some time constraints.

Once I have these answered I can then extend the single 1m observations to a window of observations where I can enter the market and be profitable. Now that I have these trading horizons defined for both long and short trends I can move onto some indicators.
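
A minimal sketch of how such outcome flags could be built, assuming a bar counts as a good long entry when price rises by a target amount within a fixed number of bars before it falls by a stop amount. The function names, the `target`, `stop` and `max_bars` parameters and the toy prices are all hypothetical; the journal does not give the actual SAS logic.

```python
def flag_long(prices, target, stop, max_bars):
    """Return one 0/1 flag per bar.

    A bar is flagged 1 when, looking forward at most max_bars bars,
    price rises by at least `target` before it falls by `stop`.
    All names and thresholds here are hypothetical.
    """
    flags = []
    n = len(prices)
    for i in range(n):
        entry = prices[i]
        flag = 0
        for j in range(i + 1, min(i + 1 + max_bars, n)):
            move = prices[j] - entry
            if move <= -stop:     # adverse move came first: not a good entry
                break
            if move >= target:    # favourable move reached inside the window
                flag = 1
                break
        flags.append(flag)
    return flags


def flag_short(prices, target, stop, max_bars):
    """Mirror image of flag_long: negate the series so falls become rises."""
    return flag_long([-p for p in prices], target, stop, max_bars)


# Toy 1m closes, not real EURUSD data
prices = [1.3000, 1.3010, 1.3005, 1.3040, 1.3020, 1.2980, 1.2950]
long_flags = flag_long(prices, target=0.0025, stop=0.0015, max_bars=3)
short_flags = flag_short(prices, target=0.0025, stop=0.0015, max_bars=3)
print(long_flags)   # [1, 1, 1, 0, 0, 0, 0]
print(short_flags)  # [0, 0, 0, 1, 1, 1, 0]
```

Runs of consecutive 1s are the "window of observations" where an entry would have been profitable under these assumed rules.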

AJEspy


The modeling session continues, so whilst my code runs it's a good idea to jot down my thoughts...
On further consideration I have decided to continue with investigating the outcomes.
I think my original filter was too light at 0.0035. As a result, my flag cut-off was too small at 0.0095. The horizon cut-off was set at 0.0055.
I was capturing too much noise in the data. I have just increased the filter to 0.0055 and the flag cut-off to 0.017, with a horizon limit of 0.01.

On running a random model, the results are promising. The mean GP for both long and short flagged trades has doubled. The number of trades has also halved. I am happy enough with this for now to move on to the indicators.

As I have no idea about what works in FOREX I have opted to start with a top down analysis and will conduct a thorough data mining/dredging plan. I will then focus my attention on any promising indicators to see if a couple of customized indicators can be developed (bottom up) to be used in the ATS.

To complete the data mining I have used an MQL5 script, which I picked up on trial, that allows the downloading of the standard MQL historical indicator values. I have 217 indicator and time-frame combinations to check against my outcomes. To complete this analysis I am using basic univariate IV.

Another midnight modelling session!
I have begun high level analysis of 31 different indicators across 7 different time windows for a total of 217. Some indicators have multiple variables associated with the indicator (for example ADX_Wilder, which has ADX_Wilder, +DI and -DI). Because this is very broad brush stroke analysis I will be looking at these supplementary vars also.

The analysis consists of constructing a histogram for the variable, then using these buckets as initial buckets for WOE calcs against my outcomes. I have written a quick piece of code which should allow me to complete the +317 vars in a few days.

Once I have compiled the rough WOE calcs I will be able to narrow down the var list by discarding the indicators with low IVs. I will then look for commonalities between the different vars and see if I can get some initial thoughts on constructing some potent custom vars.

So far I have knocked off 14 primary vars... only leaves +303!

The code works great and I have managed to initially test 105 different vars so far... so about a qtr of the way through. At this rate I may be able to finish the initial testing this evening.
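
For readers unfamiliar with WOE and IV, here is a rough sketch of the per-bucket calculation, assuming the common convention WOE = ln(% of flagged outcomes / % of unflagged outcomes) per bucket and IV = sum of (difference in those percentages) x WOE. The function name `woe_iv`, the bucket edges and the 0.5 smoothing for empty cells are my assumptions, not the code used in the journal.

```python
import math


def woe_iv(values, outcomes, edges):
    """Weight of Evidence per bucket and total Information Value.

    values   : indicator readings
    outcomes : 1 where the outcome flag is set, else 0
    edges    : ascending bucket boundaries (hypothetical histogram edges)
    """
    n_buckets = len(edges) + 1
    events = [0] * n_buckets
    non_events = [0] * n_buckets
    for v, y in zip(values, outcomes):
        b = sum(v >= e for e in edges)   # index of the bucket v falls in
        if y:
            events[b] += 1
        else:
            non_events[b] += 1

    tot_e, tot_n = sum(events), sum(non_events)
    woe, iv = [], 0.0
    for e, n in zip(events, non_events):
        # 0.5 smoothing avoids log(0) for empty cells (an assumption)
        pe = (e + 0.5) / (tot_e + 0.5 * n_buckets)
        pn = (n + 0.5) / (tot_n + 0.5 * n_buckets)
        w = math.log(pe / pn)
        woe.append(w)
        iv += (pe - pn) * w
    return woe, iv


# Toy data: low readings never flagged, high readings always flagged
woe, iv = woe_iv([1, 1, 2, 2], [0, 0, 1, 1], edges=[1.5])
print(woe, iv)
```

Each term of the IV sum is non-negative, so a var whose flagged and unflagged outcomes are spread identically across the buckets scores an IV of zero.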

On a different note...
The primary model concept is that at any time the market will immediately begin to operate in one of two states:
1 - trending up, or 2 - trending down.
Based on technical analysis a probability is assigned to each of these states.
As I was perusing the web today I came across grid trading systems which, from my brief introduction, are used for trading sideways trending markets... I have not accounted for this 3rd state. As I continue my analysis I will toy with some ideas about how I can incorporate a sideways trend strategy into my model framework.

Developing this beast is going to be interesting...


After a monster effort I have completed first pass analysis of 287 different vars. I will now begin culling those vars with the lowest IVs.
If a var has an outright IV of less than 0.01 for either long or short outcomes I will move it to a discard list.
74 vars have been removed by this rule.
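
The discard rule above can be sketched as follows, reading "either" to mean that a var is dropped if its long IV or its short IV is below the floor. The var names and IV values are made up for illustration.

```python
IV_FLOOR = 0.01  # threshold stated in the journal


def split_by_iv(iv_table, floor=IV_FLOOR):
    """Split vars into (keep, discard) lists.

    iv_table maps a var name to (iv_long, iv_short).
    A var is discarded if either IV falls below the floor.
    """
    keep, discard = [], []
    for name, (iv_long, iv_short) in iv_table.items():
        if iv_long < floor or iv_short < floor:
            discard.append(name)
        else:
            keep.append(name)
    return keep, discard


# Hypothetical vars and IVs, not results from the journal
iv_table = {
    "rsi_m15": (0.042, 0.037),
    "cci_h1": (0.008, 0.051),
    "mom_m5": (0.003, 0.002),
}
keep, discard = split_by_iv(iv_table)
print(keep)     # ['rsi_m15']
print(discard)  # ['cci_h1', 'mom_m5']
```

If the intended rule were "below the floor for both outcomes", the `or` would become `and` and `cci_h1` would survive.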
I will then focus on what is left and begin to selectively cull the weaker of the remainder. As we are initially focusing on polarised states I will consider the correlation of the WOE in this cull. Weak IV vars which are nicely correlated will be retained, whereas vars with stronger IV values but noisier WOE will be discarded. I will however try to retain the more promising vars in each group so as to ascertain any commonalities.

The first pass cull has resulted in a subset of 185 different indicators. The subset is a mixed bag. Some vars have weak Information Value but are nicely distributed with respect to the underlying histogram; others have relatively strong IV but the distribution of outcomes is noisy. These vars will require a lot of work to develop anything useful, which will result in significant loss of IV.

The next step in the indicator analysis is to gain some insight into how the variables relate to each other. Each prospective var will capture some level of unique information. To gauge this I will construct a summary matrix of the elements used in the construction of the various indicators. In addition to this high level 'map' I will construct various correlation matrices.

This analysis will assist me in maximising the amount of information from the various indicators.


So I coded up a SQL join of 185 different variables from 180 different data tables. My machine did not have the computing power to run it.

I split the join down to just the first 10 vars and ran the correlation on that. Doing it this way would take too long and I wouldn't get my high level image of how the vars fit with each other very easily.

I wasn't getting anywhere very fast so I decided to go back and be more aggressive in my cull. I decided that I would only retain vars if they were either the best of similarly distributed indicators or distributed completely differently within the indicator type. I also would not keep more than 3 of the same indicator (still too soft, I know).

Second cull has resulted in 93 variables being retained. This is a little more manageable. Back to running correlations.

I have just finalised 4 different correlation matrices: Pearson, Spearman, Kendall and Hoeffding. I will use these matrices to continue the cull with the aim of cutting at least 3/4 of the current variables. I will then be able to focus on the remaining variables.

To do this I will group the vars with a correlation of over 0.5, working in the order of Hoeffding, Kendall, Spearman then lastly Pearson. I will then refine the WOE buckets for each var and, among vars with similar information content, remove those with the weakest resultant IV.
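
One way to sketch this grouping step for a single correlation matrix: link any two vars whose absolute correlation exceeds 0.5, treat each connected cluster as one group, and keep the var with the highest IV in each. The function names, the single-linkage choice, the var names and the numbers are assumptions for illustration; the journal does not say how groups are formed or how the four measures are combined.

```python
def correlated_groups(names, corr, threshold=0.5):
    """Group vars linked by |correlation| > threshold (single linkage).

    corr maps an (a, b) pair, with a listed before b in names,
    to the correlation between the two vars.
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the group root, compressing the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(corr[(a, b)]) > threshold:
                parent[find(a)] = find(b)   # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def keep_strongest(groups, iv):
    """Keep the var with the highest IV from each group."""
    return [max(g, key=lambda n: iv[n]) for g in groups]


# Hypothetical vars, correlations and IVs
names = ["ma_fast", "ma_slow", "rsi", "adx"]
corr = {
    ("ma_fast", "ma_slow"): 0.92, ("ma_fast", "rsi"): 0.30,
    ("ma_fast", "adx"): 0.10, ("ma_slow", "rsi"): 0.25,
    ("ma_slow", "adx"): 0.05, ("rsi", "adx"): -0.61,
}
iv = {"ma_fast": 0.03, "ma_slow": 0.05, "rsi": 0.08, "adx": 0.02}

groups = correlated_groups(names, corr)
survivors = keep_strongest(groups, iv)
print(groups)     # [['ma_fast', 'ma_slow'], ['rsi', 'adx']]
print(survivors)  # ['ma_slow', 'rsi']
```

Running the same routine once per measure, in the stated order, would be one way to apply all four matrices in turn.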

The Hoeffding analysis alone resulted in removal of 22 more vars.

OK so I have finalised filtering variables based on both Kendall and Hoeffding measures of correlation as well as reviewing different variable components and Information Value. I managed to chop through half my variables with the current retained count sitting at 47!

So I began at ~300 and have cut that down to 47, which at this stage is not too bad. I still need to focus on variables, as gleaning as much info as possible from the data is a sure way to increase the likelihood of good trades, which in turn leads to increased expected returns.

My next task is to combine the 47 with the outcome flags into a single datamart. I will then calculate the WOE for the 47 based on the histograms and run that through a step-wise logistic regression. I just want to get a feel for what I am dealing with. I still need to cull because I know that some vars need to be dealt with as a quotient of another var (like MAs where one crosses above or below another). If I analysed the whole 47 that would result in 47! vars (I'm thinking on the fly so this may be incorrect) or some other crazy number which is just impossible to crunch with my computing power. So what I will do is cull the list down a little more with the step-wise, then use a Box-Cox to transform the remainder, get the quotients and then check all of them for some more info.
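
As a quick check on that count, assuming "quotient of another var" means simple pairwise ratios a/b, the number of candidates is a count of pairs rather than a factorial. The sketch below compares the two using only the standard library.

```python
import math

n_vars = 47

unordered_pairs = math.comb(n_vars, 2)   # a/b and b/a treated as one candidate
ordered_pairs = math.perm(n_vars, 2)     # a/b and b/a treated as different
orderings = math.factorial(n_vars)       # 47!: ways to order all 47 vars

print(unordered_pairs)  # 1081
print(ordered_pairs)    # 2162
print(orderings > 10 ** 59)  # True: 47! is a 60-digit number
```

Under that assumption the pairwise quotients run to a couple of thousand extra vars at most, which is heavy for a first-pass WOE sweep but far short of 47!.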

In addition to these vars I think I may model the time series itself, and construct some vars out of the resultant models.

On another note I further formulated my trading model framework today. I am hoping to batch optimise 4 log-reg models to capture the correlation. I have no idea how to achieve this in SAS or any other stat package. I think I may estimate the singles then optimise the combination with a NN. I will then overlay my risk management and money management and optimise that with an NN also.

It will be interesting to see the results.
