TensorFlow + NT8 Strategy - Battle of the Bots style

May 21st, 2021, 10:37 PM

Jasonnator

I definitely tried using a "stepped" learning rate with the LearningRateScheduler callback. I actually tested this extensively because I know about the risk of getting stuck in local minima (the peak/valley problem). Again, it worked fine on the sonar dataset but absolutely did not with the financial dataset.
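For anyone following along, a "stepped" schedule could look roughly like the sketch below. This is not the exact schedule from my notebook: the starting rate, drop factor, and step length are made-up values, and the function is plain Python so it runs without TensorFlow installed.

```python
def step_decay(epoch, lr=None, initial_lr=1e-3, drop=0.5, epochs_per_drop=10):
    """Return the learning rate for `epoch` under a stepped schedule.

    The rate starts at `initial_lr` and is multiplied by `drop` every
    `epochs_per_drop` epochs. `lr` (the current rate Keras passes in) is
    accepted for signature compatibility but is not used.
    All default values are illustrative assumptions.
    """
    steps = epoch // epochs_per_drop
    return initial_lr * (drop ** steps)


# Hooking it up in Keras would look like this (not executed here):
#   callback = tf.keras.callbacks.LearningRateScheduler(step_decay, verbose=1)
#   model.fit(X_train, y_train, epochs=50, callbacks=[callback])

if __name__ == "__main__":
    for epoch in (0, 9, 10, 25, 40):
        print(epoch, step_decay(epoch))
```

The callback calls the function once per epoch with the epoch index and the current rate, so the schedule itself stays easy to test in isolation.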

Any pointers to what that "problem" may be would be very helpful.

I could be completely wrong but the fact that the different model architectures work with a very similarly structured dataset (sonar) but do not with the financial data leads me to believe the indicators just don't have any predictive information contained in them.

From my experience, you should be able to train the model well even with almost complete crap data. In these overfit situations, your validation performance will drop off fairly rapidly. The model should still learn almost regardless of the input (given enough entropy; you can't feed it all 0's, which contain no information, for example). There is something simply not working right in this model. The model might be underfitting as well, with just not enough weights to get a fit.

May 22nd, 2021, 03:27 PM

Small update and learning point:

I thought a lot about @NJAMC's comment that when I z-score normalize my dataset, I am effectively zooming out. Since the EMAs and ATR have such contrasting values, a blanket normalization across all of them does not treat them "fairly" with respect to the model. Z-scoring the EMAs results in a significant difference in the value the model sees. At the same time, when the ATR values are centered, they are changed by a much larger amount relative to their raw values. This results in most of the ATR information being "lost" because the values end up squeezed into such a tight window that their resolution is effectively gone.

Greg is absolutely correct that a properly built network should be able to learn from pretty much any dataset, including a crap one. There is a small amount that these models learn from the dataset when I get the normalization and learning rates correct, but not enough to put this into production by any means.

I did a quick test to challenge this theory and @NJAMC is right! I completely removed the ATR from the dataset by commenting out adding it to the queueWorker //this.queueWorker.AddIndicator(this.atr14) as well as commenting out adding its value //this.queueWorker.AddValue(this.atr14). With ATR removed from the dataset, I now had a vector with 10 values instead of 15. I used the same z-score normalizer and noticed the following:

The accuracy remained mostly unchanged
The ROC AUC improved by 5% on all models

The 5% ROC AUC improvement without changing any model architecture is validation that the EMA and ATR values need to be normalized independently. The increase is promising but with a peak value of only 57%-58%, it is still well below what a skillful classifier should be (70%-80% minimum IMO). I think this is because the 2 EMAs are not a sufficient predictor of whether the trade was a winner or a loser (no tangible predictive ability in the input information).
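As a side note on why accuracy can stay flat while ROC AUC moves: accuracy only looks at which side of the threshold each prediction lands on, while ROC AUC measures how well the scores rank winners above losers. Here is a minimal sketch using only the standard library. The labels and scores are invented for illustration and are not from my dataset.

```python
def roc_auc(labels, scores):
    """ROC AUC as the chance that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at the given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Made-up example: 1 = winning trade, 0 = losing trade.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.60, 0.40, 0.45, 0.55, 0.30, 0.42]
model_b = [0.60, 0.49, 0.48, 0.55, 0.30, 0.42]

if __name__ == "__main__":
    for name, scores in (("A", model_a), ("B", model_b)):
        print(name, accuracy(labels, scores), round(roc_auc(labels, scores), 3))
```

Both hypothetical models classify the same samples correctly at a 0.5 threshold, but model B ranks two of the winners a bit higher, so only its ROC AUC goes up.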

I've added this to the "stuff I've learned" since starting this post and will definitely incorporate it in the future. Manipulating data in Python is so tedious and this will add another level of complexity, but as this little experiment has shown, it is necessary. My basic thinking is that I will need to break apart the input (X_train) into its individual features, normalize them independently, recombine them, create a TensorFlow dataset, and fire off a training job.
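A rough sketch of that plan, written with only the standard library so it is easy to run anywhere. The rows below are invented (two EMA-like columns and one ATR-like column), and the helper names are my own placeholders rather than anything from the notebook.

```python
from statistics import fmean, pstdev


def fit_column_stats(rows):
    """Compute (mean, std) for each column of a list of equal-length rows."""
    columns = list(zip(*rows))  # break the matrix apart into features
    stats = []
    for col in columns:
        std = pstdev(col)
        # Guard against constant columns, which would divide by zero.
        stats.append((fmean(col), std if std > 0 else 1.0))
    return stats


def apply_column_stats(rows, stats):
    """Z-score each value with its own column's mean/std, then recombine."""
    return [
        [(x - mean) / std for x, (mean, std) in zip(row, stats)]
        for row in rows
    ]


# Hypothetical rows: EMA-like values near 2000, ATR-like values near 2.
X_train = [
    [2087.2, 2085.0, 1.8],
    [2090.5, 2086.1, 2.4],
    [2093.1, 2087.9, 2.1],
    [2089.4, 2088.3, 1.5],
]

stats = fit_column_stats(X_train)             # fit on training data only
X_scaled = apply_column_stats(X_train, stats)

if __name__ == "__main__":
    for row in X_scaled:
        print([round(v, 3) for v in row])
```

Keeping the fitted statistics separate from the transform means the same means and standard deviations can be reused on validation, test, and live data.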

May 22nd, 2021, 06:35 PM

@Jasonnator
Thank you very much for sharing your efforts on the TF/NT8 integration.
Please bear with my limited knowledge about machine learning and statistics.

If I understand your notebook example and the code of TensorFlowFib50Strat.cs right, you're passing (the somewhere normalized?) data for e.g. an EMA directly into an input neuron.

I'll stick with your ipynb example to explain my question/thinking.

First, if you normalize a value which has no real maximum (outside of the training set), how will you handle values that come in above the max value of the normalized set? I had this question lately in a coding exercise where I had to normalize the "age" of a data sample, but asked myself how to apply the model to the population if I don't know whether it contains older people than the sample. Probably a simple question; as I said, still (eagerly) learning ;-)

Is it really important to know that ema14_0 had the value 2087.169189 at timestamp 1? Will this input reoccur in the same way, so that any predictability can be derived from it? I think my approach would be to create binary descriptions for ema14_0 and all the others to describe the context of a signal, like ema14_0_rising true/false, above_close... open.. has_crossed... and so on.
Would this create too many inputs for a NN?
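To make the idea concrete, here is a minimal sketch of what I mean by binary descriptions. The flag names, the example values, and the function itself are only my illustration and are not taken from the strategy code.

```python
def binary_context(ema, close, i):
    """Describe bar `i` with true/false flags instead of raw indicator values.

    `ema` and `close` are equal-length lists; `i` needs a previous bar.
    """
    if len(ema) != len(close) or i < 1 or i >= len(ema):
        raise ValueError("need matching series and an index with a previous bar")
    above_now = close[i] > ema[i]
    above_prev = close[i - 1] > ema[i - 1]
    return {
        "ema_rising": ema[i] > ema[i - 1],
        "close_above_ema": above_now,
        "close_crossed_ema": above_now != above_prev,
    }


# Invented values for illustration.
ema14 = [2086.0, 2086.8, 2087.2]
close = [2085.5, 2086.5, 2088.0]
features = binary_context(ema14, close, 2)

if __name__ == "__main__":
    print(features)
```

With three flags per indicator, three indicators would give nine inputs per bar, which is fewer than the 15 raw values in the current vector.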

thank you very much for your input.

askerix

May 22nd, 2021, 10:34 PM

I'll take a stab at trying to clear this up. I think the most common form of normalization is squeezing values into either 0 to 1 or -1 to 1. This is commonly known as min/max normalization. It can be very useful and sometimes better, but it has the drawback you mentioned: what if new data is outside your computed min or max? Then you have a problem, possibly. There is another way of normalizing called z-score. It has other names as well, like centering, standard score, etc. (sklearn calls it the StandardScaler). Standardizing a dataset means that you do the appropriate math so that the distribution has a mean of 0 and a standard deviation of 1. You could end up with a value of 2.5, which just means that particular value is 2.5 standard deviations above the mean. For financial data, which may have significantly higher/lower values than the min or max seen in training, machine learning tends to be more accepting of z-score. There are definitely exceptions, so this is not a hard/fast rule, more of a general one.

So your example of 2087.xyz may have a normalized value of -0.7 or maybe -1.23 when compared to the overall distribution. Standard statistics (bell curve) still apply, meaning that if the data is roughly normally distributed, values between -1 and 1 will still contain approximately 68% of all data.
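Here is a small sketch of the difference when a new value lands outside the training range. The training values are invented, and the scaling is done by hand with the standard library rather than with sklearn.

```python
from statistics import fmean, pstdev

# Invented training values for a single feature.
train = [2080.0, 2085.0, 2090.0, 2095.0, 2100.0]

lo, hi = min(train), max(train)
mean, std = fmean(train), pstdev(train)


def min_max(x):
    """Scale x relative to the min and max seen during training."""
    return (x - lo) / (hi - lo)


def z_score(x):
    """Express x as a number of training standard deviations from the mean."""
    return (x - mean) / std


new_value = 2110.0  # higher than anything in the training set

if __name__ == "__main__":
    print("min/max:", min_max(new_value))
    print("z-score:", round(z_score(new_value), 3))
```

Min/max scaling pushes the new value past the 0 to 1 range the model was trained on, while the z-score just reports it as an unusually large but still meaningful number of standard deviations.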

Your reference to timestamp is not quite accurate in this context. The ema14_0 is the first position in the input vector, which has 15 elements. Following up on your next question about the input reoccurring, I'm not quite sure what you mean. Whether it is the first input or the 15th, the model will learn any relationships it can between any/all of the inputs.

As far as representing values as binary, I am serializing every value as binary out of the indicator, then on the Python side, pandas takes care of deserializing them. This is why I created the type array before reading the file in with pandas: pandas takes that type array so it knows how many bytes to read per value.
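To show the idea of a fixed record layout, here is a minimal sketch that swaps in Python's built-in struct module in place of the pandas type array, so it runs without any extra packages. The layout (15 little-endian float32 features followed by one int32 label) is an assumption for illustration and may not match the real file.

```python
import io
import struct

# Assumed record layout: 15 little-endian float32 features + 1 int32 label.
N_FEATURES = 15
RECORD = struct.Struct("<" + "f" * N_FEATURES + "i")


def write_records(stream, samples):
    """Write (features, label) pairs as fixed-size binary records."""
    for features, label in samples:
        stream.write(RECORD.pack(*features, label))


def read_records(stream):
    """Read every fixed-size record back into (features, label) pairs."""
    data = stream.read()
    if len(data) % RECORD.size != 0:
        raise ValueError("file length is not a whole number of records")
    return [
        (list(rec[:N_FEATURES]), rec[N_FEATURES])
        for rec in RECORD.iter_unpack(data)
    ]


# Round trip through an in-memory buffer standing in for the binary file.
buffer = io.BytesIO()
write_records(buffer, [([float(i) for i in range(15)], 1), ([0.5] * 15, 0)])
buffer.seek(0)
samples = read_records(buffer)

if __name__ == "__main__":
    print(RECORD.size, "bytes per record;", len(samples), "records read")
```

The format string plays the same role as the type array: it tells the reader how many bytes each value occupies so the stream can be cut into records.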

The basics of how this works is:

Signal is generated by the strategy
At the time of the signal, the previous 5 values are serialized out for each indicator (all 3) for a total of 15 + the label
When the trade closes, the label is added to the training sample
Once all historical data has been processed, the binary file is created by the strategy
The binary file created is read in by pandas
Different models are trained on the data
Models are evaluated on a test dataset which they have not previously seen
Results are interpreted
Learning hopefully occurs
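The first few steps above can be sketched in Python as follows. The real work happens in the C# strategy, so this only shows the shape of a sample. The series values, the second EMA's name, the ordering of indicators inside the vector, and the choice to take the 5 bars strictly before the signal are all assumptions.

```python
def build_sample(indicators, signal_index, lookback=5):
    """Flatten the `lookback` values before `signal_index` from each indicator."""
    if signal_index < lookback:
        raise ValueError("not enough history before the signal")
    vector = []
    for series in indicators:
        vector.extend(series[signal_index - lookback:signal_index])
    return vector


# Invented indicator series, 8 bars each.
ema_fast = [2080.0 + i for i in range(8)]
ema_slow = [2070.0 + i for i in range(8)]  # hypothetical second EMA
atr14 = [1.0 + 0.25 * i for i in range(8)]

# A signal fires on bar 7: capture 3 indicators x 5 values = 15 features.
sample = {"x": build_sample([ema_fast, ema_slow, atr14], 7), "label": None}

# Later, when the trade closes, the label is filled in (1 = winner, 0 = loser).
sample["label"] = 1

if __name__ == "__main__":
    print(len(sample["x"]), sample["label"])
```

The label stays empty until the trade closes, which mirrors the order of the steps in the list.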

May 26th, 2021, 08:04 PM

Jasonnator

Yet another learning point: sklearn's StandardScaler does in fact do per-feature (column) scaling. This means that simply removing ATR, not any change in the normalization, was the reason for the small bump in ROC AUC.

Bit by bit, I am learning more and more. Hopefully I can help someone along the way to better analyze their dataset when trying to apply machine learning to their trading. Ultimately, what I have taken away is to test, test, test absolutely everything. There is no comprehensive documentation (that I can find) out there on this, which means every step must be approached with skepticism and the assumption that you could be wrong. Although frustrating at times, I am definitely enjoying the journey and fully expect that I'll crack this at some point and start bringing a lot more machine learning based algos into my trading.


Discussion in NinjaTrader
