TensorFlow + NT8 Strategy - Battle of the Bots style - futures io
Discussion in NinjaTrader

  #11 (permalink)
 NJAMC 
Atkinson, NH USA
 
Experience: Intermediate
Platform: NinjaTrader 8/TensorFlow
Broker: NinjaTrader Brokerage
Trading: Futures, CL, ES, ZB
 


Originally posted by Jasonnator:
I definitely tried using a "stepped" learning rate with the LearningRateScheduler callback. I tested this extensively actually because I know about peak/valley minima. Again, it worked fine on the sonar dataset but absolutely did not with the financial dataset.

Any pointers to what that "problem" may be would be very helpful.

I could be completely wrong but the fact that the different model architectures work with a very similarly structured dataset (sonar) but do not with the financial data leads me to believe the indicators just don't have any predictive information contained in them.

In my experience, you should be able to train the model well even on almost complete crap data. In these overfit situations your validation performance will drop fairly rapidly, but the model should still learn the training set almost regardless of the input, given enough entropy (you can't feed it all 0's, which contain no information, for example). Something is simply not working right in this model. The model might be underfitting as well: just not enough weights to get a fit.
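One way to sanity-check that claim is to train on pure noise and compare training and validation accuracy. The sketch below is a minimal illustration, not a neural network: a 1-nearest-neighbour memorizer stands in for a high-capacity model, and the data, sizes, and function names are all made up for the example.

```python
import random

random.seed(0)

def make_noise(n, n_features=15):
    """Random features with coin-flip labels: there is no real signal to learn."""
    X = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n)]
    y = [random.randint(0, 1) for _ in range(n)]
    return X, y

def predict_1nn(X_train, y_train, x):
    """Return the label of the closest training sample (pure memorization)."""
    best = min(range(len(X_train)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
    return y_train[best]

def accuracy(X_train, y_train, X, y):
    """Fraction of samples in (X, y) that the memorizer labels correctly."""
    hits = sum(predict_1nn(X_train, y_train, x) == t for x, t in zip(X, y))
    return hits / len(y)

X_train, y_train = make_noise(300)
X_val, y_val = make_noise(200)

train_acc = accuracy(X_train, y_train, X_train, y_train)
val_acc = accuracy(X_train, y_train, X_val, y_val)
print(f"train accuracy:      {train_acc:.2f}")
print(f"validation accuracy: {val_acc:.2f}")
```

A model with enough capacity fits the training noise perfectly while validation stays near a coin flip. If a network cannot even fit its own training set, that points to a problem in the setup (or too little capacity) rather than to the data alone.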

Nil per os
-NJAMC [Generic Programmer]

  #12 (permalink)
 Jasonnator 
Denver, Colorado United States
 
Experience: Intermediate
Platform: NT8 + Custom
Broker: NT Brokerage, Kinetick, IQFeed, Interactive Brokers
Trading: ES
 

Small update and learning point:

I thought a lot about @NJAMC's comment that when I z-score normalize my dataset, I am effectively zooming out. Since the EMAs and ATR have such contrasting values, a blanket normalization across all of them does not treat them "fairly" with respect to the model. Z-scoring the EMAs results in a significant difference in the value the model sees. At the same time, when the ATR values are centered, they are changed by far more relative to their regular values. This results in most of the ATR information being "lost": the values end up squeezed into such a tight window that their resolution is effectively gone.
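To make the "squeezed into a tight window" effect concrete, here is a toy sketch that assumes a hand-rolled blanket normalizer, meaning one mean and one standard deviation computed over every value in the table. The numbers are invented at roughly ES-like magnitudes. Note that sklearn's StandardScaler scales each column separately, so this effect only appears when normalization really is done across all columns at once.

```python
from statistics import mean, pstdev

# Hypothetical rows: [ema_value, atr_value] at ES-like magnitudes.
rows = [
    [2085.0, 4.0],
    [2087.0, 5.0],
    [2089.0, 6.0],
    [2091.0, 7.0],
]

def zscore(values):
    """Standardize a list to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Blanket: one mean/std computed over every number in the table.
flat = [v for row in rows for v in row]
blanket = zscore(flat)
blanket_atr = blanket[1::2]  # every second value in the flat list is an ATR

# Per column: the ATR feature gets its own mean/std.
per_column_atr = zscore([row[1] for row in rows])

blanket_atr_range = max(blanket_atr) - min(blanket_atr)
per_column_atr_range = max(per_column_atr) - min(per_column_atr)
print(f"ATR spread, blanket z-score:    {blanket_atr_range:.4f}")
print(f"ATR spread, per-column z-score: {per_column_atr_range:.4f}")
```

With a blanket normalizer the ATR values differ only in the third decimal place, while per-column scaling spreads them over more than two standard deviations.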

Greg is absolutely correct that a properly built network should be able to learn from pretty much any dataset, including a crap one. There is a small amount that these models learn from the dataset when I get the normalization and learning rates correct, but not enough to put this into production by any means.

I did a quick test to challenge this theory and @NJAMC is right! I completely removed the ATR from the dataset by commenting out the call that adds it to the queueWorker, //this.queueWorker.AddIndicator(this.atr14), as well as the call that adds its value, //this.queueWorker.AddValue(this.atr14). With ATR removed from the dataset, I now had a vector with 10 values instead of 15. I used the same z-score normalizer and noticed the following:
  1. The accuracy remained mostly unchanged
  2. The ROC AUC improved by 5% on all models

The 5% ROC AUC improvement without changing any model architecture is validation that the EMA and ATR values need to be normalized independently. The increase is promising but with a peak value of only 57%-58%, it is still well below what a skillful classifier should be (70%-80% minimum IMO). I think this is because the 2 EMAs are not a sufficient predictor of whether the trade was a winner or a loser (no tangible predictive ability in the input information).
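For readers less familiar with the metric: ROC AUC can be read as the probability that a randomly chosen winner gets a higher score than a randomly chosen loser, so 0.5 means no ranking skill and 1.0 is perfect. The sketch below computes it directly from that definition on made-up labels and scores. In practice sklearn.metrics.roc_auc_score does the same job; the function name here is only for illustration.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. 0.5 means no ranking skill, 1.0 is perfect.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]               # 1 = winning trade, 0 = losing trade
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]   # made-up model outputs
auc = roc_auc(labels, scores)
print(f"ROC AUC: {auc:.3f}")
```

Seen this way, a peak of 57%-58% means the model ranks a winner above a loser only slightly more often than a coin flip would.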

I've added this to the "stuff I've learned" since starting this post and will definitely incorporate it in the future. Manipulating data in Python is so tedious and this will add another level of complexity, but as this little experiment has shown, it is necessary. My basic thinking is I will need to break apart the input (X_train) into its individual features, normalize them independently, then recombine them, create a TensorFlow dataset, and fire off a training job.
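A minimal sketch of that split, normalize, recombine idea in plain Python follows. It assumes the 15 columns are laid out indicator by indicator in blocks of 5, which is a guess about the layout, and it scales each block with one shared mean and standard deviation so the shape of each 5-bar window is preserved. The function name and sample numbers are hypothetical.

```python
from statistics import mean, pstdev

LOOKBACK = 5  # values per indicator, per the thread

def normalize_by_group(X, group_size=LOOKBACK):
    """Z-score each block of `group_size` columns with its own mean/std.

    Assumes columns are laid out indicator by indicator, e.g.
    [ema14_0..ema14_4, <second EMA>_0..4, atr14_0..4] (layout is a guess).
    Returns the normalized rows plus the (mean, std) per group so the same
    statistics can be reused on validation and live data.
    """
    n_cols = len(X[0])
    stats = []
    for start in range(0, n_cols, group_size):
        block = [v for row in X for v in row[start:start + group_size]]
        stats.append((mean(block), pstdev(block)))
    X_norm = [
        [(v - stats[j // group_size][0]) / stats[j // group_size][1]
         for j, v in enumerate(row)]
        for row in X
    ]
    return X_norm, stats

# Two made-up samples: 5 values each of two EMAs and one ATR.
X_train = [
    [2085.0, 2085.5, 2086.0, 2086.5, 2087.0,
     2080.0, 2080.2, 2080.4, 2080.6, 2080.8,
     4.0, 4.5, 5.0, 5.5, 6.0],
    [2090.0, 2089.5, 2089.0, 2088.5, 2088.0,
     2084.0, 2084.1, 2084.2, 2084.3, 2084.4,
     6.5, 6.0, 5.5, 5.0, 4.5],
]
X_norm, stats = normalize_by_group(X_train)
atr_block = [v for row in X_norm for v in row[10:15]]
print("ATR block after scaling:", [round(v, 2) for v in atr_block])
```

Keeping the returned stats matters: validation and live samples should be scaled with the training statistics, not with their own.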

  #13 (permalink)
 askerix 
Switzerland
 
Experience: None
Platform: NT
Trading: Ukulele
 


@Jasonnator
thank you very much for sharing your efforts on the TF/NT8 integration.
please bear with my limited knowledge about machine learning and statistics.

If I understand your notebook example and the code of TensorFlowFib50Strat.cs right, you're passing (the somewhere normalized?) data for e.g. an EMA directly into an input neuron.

I'll stick with your ipynb example to explain my question/thinking.

first - if you normalize a value which has no real maximum (outside of the training set) - how will you handle values which are reported above the max value of the normalized set? I had this question lately at a coding practice where I had to normalize the "age" of a data sample - but asked myself how to apply the model to the population if I don't know whether it contains older people than the sample. probably a simple question - as I said - still (eagerly) learning ;-)

Is it really important to know that ema14_0 had the value of 2087.169189 at timestamp 1? Will this input reoccur in the same way, so that any predictability can be derived from it? I think my approach would be to create binary descriptions for ema14_0 and all the others to describe the context of a signal, like ema14_0_rising true/false, above_close... open.. has_crossed... and so on
Would this create too many inputs for a NN?
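The kind of binary context features described above could be sketched as follows. The feature names follow the examples in the post, but the exact definitions (what counts as rising, what counts as a cross) are illustrative assumptions, as are the sample values.

```python
def context_features(ema, close):
    """Describe the latest bar with booleans instead of raw price levels.

    `ema` and `close` are equal-length lists, oldest value first.
    Feature names follow the post's examples; the exact definitions
    here are illustrative assumptions.
    """
    rising = ema[-1] > ema[-2]
    above_close = ema[-1] > close[-1]
    # A cross happened if the EMA was on the other side of price one bar ago.
    has_crossed = (ema[-2] > close[-2]) != above_close
    return {
        "ema14_0_rising": int(rising),
        "ema14_0_above_close": int(above_close),
        "ema14_0_has_crossed": int(has_crossed),
    }

ema14 = [2086.0, 2086.5, 2087.2]   # made-up EMA values
close = [2087.0, 2087.1, 2086.9]   # made-up closing prices
features = context_features(ema14, close)
print(features)
```

Three flags per indicator is a small number of inputs by neural network standards. The trade-off is that the flags discard magnitude, for example how far the EMA sits from price.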

thank you very much for your input.

askerix

  #14 (permalink)
 Jasonnator 


I'll take a stab at clearing this up. I think the most common form of normalization is squeezing values into either 0 to 1 or -1 to 1. This is commonly known as min/max normalization. It can be very useful and sometimes better, but it has the drawback you mentioned: what if new data falls outside your computed min or max? Then you have a problem, possibly. There is another way of normalizing called z-score. It goes by other names as well, like centering, standard score, etc. (sklearn calls it the StandardScaler). Standardizing a dataset means doing the appropriate math so that the distribution has a mean of 0 and a standard deviation of 1. You could end up with a value of 2.5, which just means that particular value is 2.5 standard deviations above the mean. For financial data, which may later take significantly higher or lower values than the min and max seen in training, machine learning tends to be more accepting of z-score. There are definitely exceptions, so this is not a hard and fast rule, more of a general one.
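Here is a small sketch of the difference, using made-up numbers and only the standard library. Both scalers are fitted on the training values and then applied to a new value that is higher than anything seen in training.

```python
from statistics import mean, pstdev

train = [2050.0, 2070.0, 2087.0, 2100.0, 2120.0]  # made-up training values
new_value = 2150.0                                 # higher than anything seen

# Min/max scaling: parameters come from the training set only.
lo, hi = min(train), max(train)
minmax_new = (new_value - lo) / (hi - lo)

# Z-score: also fitted on the training set only.
mu, sigma = mean(train), pstdev(train)
z_new = (new_value - mu) / sigma

print(f"min/max scaled: {minmax_new:.3f}")  # lands outside the 0..1 range
print(f"z-score:        {z_new:.3f}")       # just a larger number of sigmas
```

Neither scaler fails on the new value. The min/max output simply leaves the range the model was trained on, whereas the z-score has no fixed range to leave in the first place.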

So your example of 2087.xyz may have a normalized value of -0.7 or maybe -1.23 when compared to the overall distribution. If the data is roughly normally distributed (bell curve), standard statistics still apply, meaning that the range from -1 to 1 will contain approximately 68% of all data.
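The 68% figure can be checked with the standard library's NormalDist. Keep in mind that it only holds when the underlying data is roughly normal: z-scoring shifts and rescales values but does not change the shape of the distribution.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_1 = share_within(1.0)
within_2 = share_within(2.0)
print(f"within 1 sigma: {within_1:.4f}")
print(f"within 2 sigma: {within_2:.4f}")
```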

Your reference to timestamp is not quite accurate in this context. The ema14_0 is the first position in the input vector, which has 15 elements. Following up on your next question about the input reoccurring, I'm not quite sure what you mean. Whether it is the first input or the 15th, the model will learn any relationships it can between any or all of the inputs.

As far as representing values as binary, I am serializing every value as binary out of the indicator, and then on the Python side pandas takes care of deserializing them. This is why I created the type array before reading the file in with pandas: pandas takes that type array so it knows how many bytes to read per value.
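The idea of a type description that tells the reader how many bytes each value occupies can be illustrated with the standard library's struct module. This is a sketch only: the record layout below (little-endian, 15 float32 features followed by a one-byte label) is an assumption and is not taken from the strategy's code, and the helper names are hypothetical.

```python
import struct

# Assumed record layout (not taken from the strategy's code):
# little-endian, 15 float32 features followed by one uint8 label.
RECORD = struct.Struct("<15fB")

def write_samples(samples):
    """Pack (features, label) pairs into one bytes object."""
    return b"".join(RECORD.pack(*features, label) for features, label in samples)

def read_samples(blob):
    """Unpack fixed-size records back into (features, label) pairs."""
    return [(list(rec[:15]), rec[15]) for rec in RECORD.iter_unpack(blob)]

samples = [
    ([2087.0 + i * 0.25 for i in range(15)], 1),
    ([2090.0 - i * 0.25 for i in range(15)], 0),
]
blob = write_samples(samples)
decoded = read_samples(blob)
print(RECORD.size, "bytes per record;", len(decoded), "records read")
```

The writer and the reader must agree on byte order, field widths, and field order. If any of these differ, the file still parses but the values come out as garbage, which is worth ruling out before blaming the model.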

The basics of how this works:
  1. A signal is generated by the strategy
  2. At the time of the signal, the previous 5 values are serialized out for each indicator (all 3) for a total of 15 + the label
  3. When the trade closes, the label is added to the training sample
  4. Once all historical data has been processed, the binary file is created by the strategy
  5. The binary file is read in by pandas
  6. Different models are trained on the data
  7. Models are evaluated on a test dataset which they have not previously seen
  8. Results are interpreted
  9. Learning hopefully occurs
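Steps 1 to 3 above can be sketched in Python as follows. This is not the strategy's actual C# code: the function name, the "ema_slow" placeholder for the second EMA, the series values, and the choice to end each window at the bar before the signal are all assumptions made for illustration.

```python
LOOKBACK = 5

def build_sample(indicators, signal_bar, lookback=LOOKBACK):
    """Flatten the `lookback` values before a signal into one feature vector.

    `indicators` maps a name to its full series (oldest first). Whether the
    signal bar itself is included, and the ordering inside each block, are
    assumptions; here the block ends at the bar before the signal.
    """
    names, vector = [], []
    for name, series in indicators.items():
        window = series[signal_bar - lookback:signal_bar]
        for i, value in enumerate(window):
            names.append(f"{name}_{i}")
            vector.append(value)
    return names, vector

# Made-up series; "ema_slow" is a placeholder name for the second EMA.
indicators = {
    "ema14": [2080.0 + i for i in range(10)],
    "ema_slow": [2070.0 + 0.5 * i for i in range(10)],
    "atr14": [5.0 + 0.1 * i for i in range(10)],
}
names, vector = build_sample(indicators, signal_bar=8)
sample = {"features": vector, "label": None}  # label is unknown until exit
sample["label"] = 1                           # filled in when the trade closes
print(names[0], names[-1], len(vector))
```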

  #15 (permalink)
 Jasonnator 


Yet another learning point: sklearn's StandardScaler does in fact scale per feature (column), so the EMAs and ATR were already being normalized independently. This means that simply removing ATR, not any change in normalization, was the reason for the small bump in ROC AUC.

Bit by bit, I am learning more and more. Hopefully I help someone along the way better analyze their dataset when trying to apply machine learning to their trading. Ultimately, what I have taken away is to test, test, test absolutely everything. There is no comprehensive documentation (which I can find) out there on this which means every step must be approached with skepticism and the assumption that you could be wrong. Although frustrating at times, I am definitely enjoying the journey and fully expect that I'll crack this at some point and start bringing a lot more machine learning based algos into my trading.

Last Updated on May 26, 2021


Copyright © 2021 by futures io, s.a., Av Ricardo J. Alfaro, Century Tower, Panama, Ph: +507 833-9432 (Panama and Intl), +1 888-312-3001 (USA and Canada), info@futures.io