Combining discretionary trading, risk management and ML as an art


I am hoping that this thread will be a combination of ideas on:

risk management

trading psychology

discretionary trading

and how ML can possibly assist (but not replace) decisions in these areas.

Therefore, even if you are not planning to automate your trading (which I am not, at this time), this thread could still be of use to you. It is not a pure ML thread at all, but mostly a trading thread, with ML as a possible adjunct in certain areas.

You may not know anything about ML but have questions about it and how it might help you in your trading. Though I myself will probably not know the answer, hopefully others will. Don't worry if you are a beginning trader, and don't worry if your question might seem silly: asking a question can start a flow, with others asking theirs and an exchange of ideas.

There are limitations to approaching trading with a pure "think of something, back-test, repeat" loop.
We want to understand, intelligently, whether we are just seeing shadows in our setups.

As all variables and indicators (correction: most variables) are price-, time-, and volume-based, there is an interplay, an interrelationship, between them. We should expect that they will not be independent. The question is whether an indicator has added new, useful information to our setup, or whether the information it adds is already contained in another indicator we are using.

Rather than replacing the human, ML algorithms could possibly guide the trader toward better setups.

Naturally, there are many questions a non-auto-trader might like to pose (perhaps not specific at all), and this could be a place for them.

So even if you know nothing about ML and still have a question, don't let that lack of ML knowledge stop you from posting it.

(Clearly I am new to exploring this tool, but others may be able to answer how ML could help with part of your trading strategy or trade management strategy.)

I would like to thank @rleplae (Ron) for guiding us to this resource (Weka).

------------- update 1 Nov 2017 ----

Machine learning algorithms can be divided into 3 broad categories:

supervised learning,

unsupervised learning, and

reinforcement learning.

Supervised learning is useful in cases where a property (label) is available for a certain dataset (training set), but is missing and needs to be predicted for other instances.

Unsupervised learning is useful in cases where the challenge is to discover implicit relationships in a given unlabeled dataset (items are not pre-assigned).

Reinforcement learning falls between these 2 extremes — there is some form of feedback available for each predictive step or action, but no precise label or error message.

Supervised Learning
1. Decision Trees: A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance-event outcomes, resource costs, and utility.

--------------------------- from wiki ----------------------------------------

From a business decision point of view, a decision tree is the minimum number of yes/no questions that one has to ask, to assess the probability of making a correct decision, most of the time. As a method, it allows you to approach the problem in a structured and systematic way to arrive at a logical conclusion.
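As a minimal sketch of that idea (using Python's scikit-learn here purely for illustration; the attribute names and data below are made up, not from any real strategy or from Weka):

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical setup attributes: [above_20MA, momentum_positive] (1 = yes, 0 = no)
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
# Hypothetical outcomes: 1 = winning setup, 0 = losing setup
y = [1, 1, 0, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The tree learns the single yes/no question that separates this data
# (here: is price above the 20MA?)
print(tree.predict([[1, 1], [0, 0]]))  # [1 0]
```

Weka's J48 builds the same kind of tree from an .arff file; the point is just that the model is a sequence of yes/no questions.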

2. Naïve Bayes Classification: Naïve Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naïve) independence assumptions between the features. Bayes' theorem is P(A|B) = P(B|A) P(A) / P(B), where P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the class prior probability, and P(B) is the predictor prior probability.
Uses:
To mark an email as spam or not spam
To classify a news article as technology, politics, or sports
To check whether a piece of text expresses positive or negative emotion
Face recognition software
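A tiny sketch of the spam case (scikit-learn, with made-up word counts; nothing here is real email data):

```python
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count features per email: [count("free"), count("meeting")]
X = [[3, 0], [2, 0], [0, 2], [0, 3]]
y = ["spam", "spam", "ham", "ham"]

clf = MultinomialNB()
clf.fit(X, y)

# A new email dominated by "free" is classified as spam
print(clf.predict([[4, 0]])[0])  # spam
```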

3. Ordinary Least Squares Regression: If you know statistics, you have probably heard of linear regression before. Least squares is a method for performing linear regression. You can think of linear regression as the task of fitting a straight line through a set of points. There are multiple possible strategies to do this, and the "ordinary least squares" strategy goes like this: draw a line, then for each of the data points measure the vertical distance between the point and the line, square it, and add these up; the fitted line is the one where this sum of squared distances is as small as possible.
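A quick numerical check of that description (numpy; the line y = 2x + 1 is an arbitrary example I made up):

```python
import numpy as np

# Points lying exactly on y = 2x + 1; least squares should recover
# slope 2 and intercept 1, since the sum of squared vertical
# distances is then zero.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1

slope, intercept = np.polyfit(x, y, 1)
print(round(float(slope), 6), round(float(intercept), 6))  # 2.0 1.0
```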

4. Logistic Regression: Logistic regression is a powerful statistical way of modeling a binomial outcome with one or more explanatory variables. It measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function, which is the cumulative logistic distribution.
Uses:
Credit Scoring
Measuring the success rates of marketing campaigns
Predicting the revenues of a certain product
Is there going to be an earthquake on a particular day?
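A minimal sketch of the credit-scoring use (scikit-learn; the single feature and its values are invented for illustration):

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical single feature (say, years of clean credit history);
# outcome 1 = loan repaid, 0 = defaulted.
X = [[1], [2], [3], [7], [8], [9]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X, y)

print(list(model.predict([[2], [8]])))         # [0, 1]
print(model.predict_proba([[8]])[0, 1] > 0.5)  # True: estimated repayment probability
```

The model outputs a probability via the logistic function, which is then thresholded into a class.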

5. Support Vector Machines: SVM is a binary classification algorithm. Given a set of points of 2 types in N-dimensional space, SVM generates an (N − 1)-dimensional hyperplane to separate those points into 2 groups. Say you have some points of 2 types on a sheet of paper which are linearly separable. SVM will find a straight line which separates those points into the 2 types and is situated as far as possible from all of them. In more dimensions the dividing boundary is a hyperplane. Sometimes a straight line cannot separate the classes, and then a non-linear boundary is used; the kernel trick allows this by implicitly mapping the points into a higher-dimensional space where they become linearly separable.

Uses:
In terms of scale, some of the biggest problems that have been solved using SVMs (with suitably modified implementations) are display advertising, human splice site recognition, image-based gender detection, large-scale image classification...
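A sketch of the linearly separable case (scikit-learn; in 2-D the "hyperplane" is just a line, and the data are invented):

```python
from sklearn.svm import SVC

# Two well-separated clouds of 2-D points
X = [[0, 0], [1, 0], [0, 1], [4, 4], [5, 4], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear")  # maximum-margin separating line
clf.fit(X, y)
print(list(clf.predict([[0.5, 0.5], [5, 5]])))  # [0, 1]
```

Swapping kernel="linear" for kernel="rbf" is the kernel trick mentioned above: the data are implicitly mapped into a higher-dimensional space where a linear boundary suffices.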

----
Logistic regression, or logit regression, or logit model[1] is a regression model where the dependent variable (DV) is categorical. This article covers the case of a binary dependent variable—that is, where the output can take only two values, "0" and "1", which represent outcomes such as pass/fail, win/lose, alive/dead or healthy/sick. Cases where the dependent variable has more than two outcome categories may be analysed in multinomial logistic regression, or, if the multiple categories are ordered, in ordinal logistic regression.[2] In the terminology of economics, logistic regression is an example of a qualitative response/discrete choice model.

Logistic regression was developed by statistician David Cox in 1958.[2][3] The binary logistic model is used to estimate the probability of a binary response based on one or more predictor (or independent) variables (features). It allows one to say that the presence of a risk factor increases the probability of a given outcome by a specific percentage. https://en.wikipedia.org/wiki/Logistic_regression
-------
In statistics, a categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property.[1] In computer science and some branches of mathematics, categorical variables are referred to as enumerations or enumerated types.
The probability distribution associated with a random categorical variable is called a categorical distribution.

Categorical data is the statistical data type consisting of categorical variables or of data that has been converted into that form, for example as grouped data. More specifically, categorical data may derive from observations made of qualitative data that are summarized as counts or cross tabulations, or from observations of quantitative data grouped within given intervals. Often, purely categorical data are summarized in the form of a contingency table. However, particularly when considering data analysis, it is common to use the term "categorical data" to apply to data sets that, while containing some categorical variables, may also contain non-categorical variables.
-----

The logit (/ˈloʊdʒɪt/ LOH-jit) function is the inverse of the sigmoidal "logistic" function or logistic transform used in mathematics, especially in statistics. When the function's variable represents a probability p, the logit function gives the log-odds, or the logarithm of the odds p/(1 − p).[1]
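The logit/logistic pair is easy to verify numerically:

```python
import math

def logit(p):
    # log-odds of probability p
    return math.log(p / (1 - p))

def logistic(x):
    # inverse of logit: maps log-odds back to a probability
    return 1 / (1 + math.exp(-x))

print(logit(0.5))                      # 0.0 (even odds)
print(round(logistic(logit(0.8)), 6))  # 0.8 (round-trips)
```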


----------
Tasks can be categorized into deep learning (the application of artificial neural networks to learning tasks that contain more than one hidden layer) and shallow learning (tasks with a single hidden layer).

Topic modeling is a related problem, where a program is given a list of human language documents and is tasked to find out which documents cover similar topics. https://en.wikipedia.org/wiki/Topic_modeling

Approaches

1. Decision tree learning uses a decision tree as a predictive model, which maps observations about an item to conclusions about the item's target value.

2. Association rule learning is a method for discovering interesting relations between variables in large databases.

3. Artificial neural networks
An artificial neural network (ANN) learning algorithm, usually called "neural network" (NN), is a learning algorithm that is inspired by the structure and functional aspects of biological neural networks. Computations are structured in terms of an interconnected group of artificial neurons, processing information using a connectionist approach to computation. Modern neural networks are non-linear statistical data modeling tools. They are usually used to model complex relationships between inputs and outputs, to find patterns in data, or to capture the statistical structure in an unknown joint probability distribution between observed variables.

4. Deep learning
Falling hardware prices and the development of GPUs for personal use in the last few years have contributed to the development of the concept of deep learning which consists of multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.[29]

5. Inductive logic programming

Inductive logic programming (ILP) is an approach to rule learning using logic programming as a uniform representation for input examples, background knowledge, and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program that entails all positive and no negative examples. Inductive programming is a related field that considers any kind of programming languages for representing hypotheses (and not only logic programming), such as functional programs.

6. Support vector machines

Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.

7. Clustering
Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that observations within the same cluster are similar according to some predesignated criterion or criteria, while observations drawn from different clusters are dissimilar. Different clustering techniques make different assumptions on the structure of the data, often defined by some similarity metric and evaluated for example by internal compactness (similarity between members of the same cluster) and separation between different clusters. Other methods are based on estimated density and graph connectivity. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis.
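A minimal sketch with scikit-learn's k-means (the observations are invented and form two obvious groups; no labels are supplied):

```python
from sklearn.cluster import KMeans

# Six 1-D observations forming two obvious groups
X = [[1.0], [1.2], [0.8], [8.0], [8.2], [7.8]]

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = list(km.fit_predict(X))

# No labels were given: the algorithm discovers the grouping itself
print(labels[:3], labels[3:])
```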

8. Bayesian networks

A Bayesian network, belief network or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning.

9. Reinforcement learning

Reinforcement learning is concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. Reinforcement learning differs from the supervised learning problem in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected.
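A toy sketch of that feedback loop: a two-armed bandit in plain Python (the payoff probabilities are invented, and this is an illustration of the learning mechanism, not a trading system):

```python
import random

# Epsilon-greedy agent on a 2-armed bandit: arm 1 pays off more often,
# and the agent learns to prefer it purely from reward feedback,
# with no labeled answers ever provided.
random.seed(0)
true_win_prob = [0.3, 0.7]   # hypothetical payoff probabilities
value_est = [0.0, 0.0]       # running estimate of each arm's value
pulls = [0, 0]

for step in range(2000):
    if random.random() < 0.1:   # explore 10% of the time
        arm = random.randrange(2)
    else:                       # otherwise exploit the best current estimate
        arm = 0 if value_est[0] > value_est[1] else 1
    reward = 1 if random.random() < true_win_prob[arm] else 0
    pulls[arm] += 1
    value_est[arm] += (reward - value_est[arm]) / pulls[arm]  # incremental mean

print(pulls[1] > pulls[0])  # the better arm gets pulled far more often
```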

10. Representation learning

Several learning algorithms, mostly unsupervised learning algorithms, aim at discovering better representations of the inputs provided during training. Classical examples include principal components analysis and cluster analysis. Representation learning algorithms often attempt to preserve the information in their input but transform it in a way that makes it useful, often as a pre-processing step before performing classification or predictions, allowing reconstruction of the inputs coming from the unknown data generating distribution, while not being necessarily faithful for configurations that are implausible under that distribution.

Manifold learning algorithms attempt to do so under the constraint that the learned representation is low-dimensional. Sparse coding algorithms attempt to do so under the constraint that the learned representation is sparse (has many zeros). Multilinear subspace learning algorithms aim to learn low-dimensional representations directly from tensor representations for multidimensional data, without reshaping them into (high-dimensional) vectors.[30] Deep learning algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features. It has been argued that an intelligent machine is one that learns a representation that disentangles the underlying factors of variation that explain the observed data.[31]
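A quick PCA sketch (scikit-learn, synthetic data): two strongly correlated inputs collapse to essentially one informative direction, which is the "better representation" idea in miniature:

```python
import numpy as np
from sklearn.decomposition import PCA

# Correlated 2-D data lying near the line y = x; PCA finds that one
# direction explains almost all the variance.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, t + 0.01 * rng.normal(size=(200, 1))])

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_[0] > 0.99)  # True
```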

11. Similarity and metric learning

In this problem, the learning machine is given pairs of examples that are considered similar and pairs of less similar objects. It then needs to learn a similarity function (or a distance metric function) that can predict if new objects are similar. It is sometimes used in Recommendation systems.

12. Sparse dictionary learning

In this method, a datum is represented as a linear combination of basis functions, and the coefficients are assumed to be sparse. Let x be a d-dimensional datum and D a d × n matrix, where each column of D represents a basis function and r is the coefficient vector representing x using D. Mathematically, sparse dictionary learning means solving x ≈ Dr where r is sparse. Generally speaking, n is assumed to be larger than d to allow the freedom for a sparse representation.

Learning a dictionary along with sparse representations is strongly NP-hard and also difficult to solve approximately.[32] A popular heuristic method for sparse dictionary learning is K-SVD.

Sparse dictionary learning has been applied in several contexts. In classification, the problem is to determine which classes a previously unseen datum belongs to. Suppose a dictionary for each class has already been built. Then a new datum is associated with the class such that it's best sparsely represented by the corresponding dictionary. Sparse dictionary learning has also been applied in image de-noising. The key idea is that a clean image patch can be sparsely represented by an image dictionary, but the noise cannot.[33]

13. Genetic algorithms

A genetic algorithm (GA) is a search heuristic that mimics the process of natural selection, and uses methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms found some uses in the 1980s and 1990s.[34][35] Vice versa, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.[36]
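A self-contained toy GA in plain Python (the all-ones bit-string target is the standard "OneMax" teaching example, not anything trading-specific):

```python
import random

# Evolve a 10-bit string toward all ones (fitness = number of 1 bits)
# using selection, single-point crossover, and mutation.
random.seed(1)
N, BITS = 30, 10
pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(N)]

def fitness(ind):
    return sum(ind)

for gen in range(60):
    pop.sort(key=fitness, reverse=True)
    survivors = pop[: N // 2]                 # selection (fittest half kept)
    children = []
    while len(children) < N - len(survivors):
        a, b = random.sample(survivors, 2)
        cut = random.randrange(1, BITS)       # single-point crossover
        child = a[:cut] + b[cut:]
        if random.random() < 0.2:             # occasional mutation
            i = random.randrange(BITS)
            child[i] ^= 1
        children.append(child)
    pop = survivors + children

best = max(pop, key=fitness)
print(fitness(best))  # should reach, or come close to, the maximum of 10
```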

14. Rule-based machine learning

Rule-based machine learning is a general term for any machine learning method that identifies, learns, or evolves "rules" to store, manipulate, or apply knowledge. The defining characteristic of a rule-based machine learner is the identification and utilization of a set of relational rules that collectively represent the knowledge captured by the system. This is in contrast to other machine learners that commonly identify a singular model that can be universally applied to any instance in order to make a prediction.[37] Rule-based machine learning approaches include learning classifier systems, association rule learning, and artificial immune systems.

15. Learning classifier systems

Learning classifier systems (LCS) are a family of rule-based machine learning algorithms that combine a discovery component (e.g. typically a genetic algorithm) with a learning component (performing either supervised learning, reinforcement learning, or unsupervised learning). They seek to identify a set of context-dependent rules that collectively store and apply knowledge in a piecewise manner in order to make predictions.[38]
Applications

Applications for machine learning include:

Automated theorem proving[39][40]
Adaptive websites
Affective computing
Bioinformatics
Brain–machine interfaces
Cheminformatics
Classifying DNA sequences
Computational anatomy
Computer vision, including object recognition
Detecting credit-card fraud
General game playing[41]
Information retrieval
Internet fraud detection[28]
Linguistics
Marketing
Machine learning control
Machine perception
Medical diagnosis
Economics
Insurance
Natural language processing
Natural language understanding[42]
Optimization and metaheuristic
Online advertising
Recommender systems
Robot locomotion
Search engines
Sentiment analysis (or opinion mining)
Sequence mining
Software engineering
Speech and handwriting recognition
Financial market analysis
Structural health monitoring
Syntactic pattern recognition
Time series forecasting
User behavior analytics
Translation[43]

There is an example of a credit file with 1000 instances (this would represent 1000 people applying for loans).

The cost of rejecting a good credit risk (a misclassification) is considered to be 1/5 of the cost of accepting a bad credit risk (so a missed opportunity costs 1/5 of a bad decision).

Using one ML algorithm (J48) without consideration of cost, you could correctly select 70%,
and with missing a good application costing you $1,000
and accepting a bad application costing you $5,000,
you would have total costs of $1,027,000 on 1000 applications.


(I multiplied the $1 and $5 by $1000)
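The arithmetic can be reconstructed. The breakdown below is one hypothetical split of the errors (the actual confusion matrix is in the screenshot, not reproduced here) that is consistent with the quoted totals:

```python
# Hypothetical error breakdown over 1000 applications, consistent with
# roughly 70% correct and the quoted total of $1,027,000:
good_rejected = 112   # assumed count of good loans rejected, at $1,000 each
bad_accepted = 183    # assumed count of bad loans accepted, at $5,000 each

total_cost = good_rejected * 1_000 + bad_accepted * 5_000
accuracy = (1000 - good_rejected - bad_accepted) / 1000

print(total_cost)  # 1027000
print(accuracy)    # 0.705
```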
--------------
If you apply a cost matrix to the problem, then the ML algorithm minimizes total cost by sacrificing some overall accuracy for fewer bad applications accepted.


Keep your mind in the future, in the now.

The following user says Thank You to aquarian1 for this post:

the cost of taking a setup that turns into a costly loss can be higher than

the cost of not taking a setup that would have given a small profit

More experienced traders realize you cannot just set a stop wherever you like, e.g. at a 1:5 W:L size ratio. Where a stop can be placed is a function of the instrument, the place in the price movement, and many other factors.

If stops are too tight for these factors, you keep getting stopped out quickly - dying a death of a thousand cuts.

However, different setups in your arsenal can have different:

expectations of working out

amounts that they produce

So I see a potential analogy to the above post: the cost of losing is much higher than the cost of not winning (missing out on an opportunity).

Many losses of $500 in a row (or a high percentage of them relative to wins) can quickly knock out a small-capital trader.
It would be like a small new startup bank with limited capital: they just cannot take many losses at the start, before they have built up their capital. They are better off passing on doubtful setups.

Also, it is psychologically expensive, eroding your confidence.


Why not add to one of the existing big discussion threads in the Elite Automated Trading section?

Mike


So this is why I made this thread. I don't see using ML for trading as simply throwing numbers in the top and turning the crank. One needs to understand the complexity of trading to apply ML to it intelligently.

That is the reason I am working slowly through the course. I am trying to pause and think of examples of its application and misapplication. I see a danger in rushing into applying ML without knowing its dangers and limitations:

Selecting a subset of attributes rather than the entire db

not using the training set for testing

how Naive Bayes can give very good results even when the assumption of independence is clearly violated

etc

I am starting to see the complexity of the many ML techniques.
When that complex matrix is applied to the complexity of the many, many factors that go into trading,
we get complexity to the second power.

Though that presents challenges, it also offers potential.

Finding the right ML techniques to complement your trading strategies will probably prove to be an art.


The reason is in the post after yours (you are too quick for me!). :-)

I want to stay apart from the existing threads on automation, which I see as being much more linear; I want more depth to the discussion, and not only about automated trading.

I think there is room for a thread on the intricacies of intelligently applying the factors of ML, discretionary trading, trading psychology and risk management.

However, I am perfectly OK if you move this to a non-elite trading journal if you feel this is the wrong place.


When I looked at the credit model example I could see a parallel to trading.

In the credit model, 1000 loan applications are the training set (the training set would be like your database (=db) of past trades for your strategy).
So this is 1000 historical setups, 1000 instances.
In the credit db, 700 are good loans and 300 are bad, so that is like a db where 700 of your setups were "good" and 300 were "bad".

For each loan you had characteristics (credit rating, own or rent, reason for the loan etc.) -these are called attributes.
For your setup you had attributes (above/below 20Ma, momentum positive or negative, etc)


Now each loan had a result (called the class attribute): repaid = good, not repaid = bad,
and each of your trades has a result: win/loss.

Each trade has a potential cost (as well as a potential payoff). So if you lose 5 pts on "Bigswings" that don't work out and lose 1 pt on "scalpers" that don't work out, then those are the costs.

So in the lesson they are teaching that if you know your costs for each outcome and they are different (5 vs 1), then rather than going for the highest accuracy in your ML algorithm, you would seek to minimize the total cost. On best accuracy alone you might get 70% predictive accuracy, but with lowest total cost you might have 60.9% accuracy.
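The same cost-weighted thinking can be sketched for setups. Every number below is invented for illustration (hypothetical win rates and point sizes); the point is to rank setups by expected value rather than raw win rate:

```python
# name: (win_rate, points_won, points_lost) -- all illustrative numbers
setups = {
    "Bigswing": (0.45, 8.0, 5.0),
    "Scalper":  (0.55, 1.5, 1.0),
}

for name, (p_win, win_pts, loss_pts) in setups.items():
    # expected points per trade = win_rate * payoff - loss_rate * cost
    expected = p_win * win_pts - (1 - p_win) * loss_pts
    print(name, round(expected, 3))  # Bigswing 0.85, Scalper 0.375
```

A setup with the lower win rate can still have the higher expected value, and vice versa; the cost side of the ledger matters as much as the accuracy side.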


Last night I did the lesson on neural networks. The instructor is not very impressed by them. He did his Master's thesis on them (at U of Calgary, which seems to have a strong ML comp-sci dept) and made an improved algo. What he is impressed with is the name - he thinks it's brilliant! Neural nets themselves? Not so much.

In any case, they were called the perceptron in 1957 when first introduced, then fell out of favour after a paper showed their limitations, and then came back into vogue when a way around the limitation (of a linear decision boundary) was found with the "kernel" trick. {They are akin to support vector machines, which also use boundaries for classification.} They can be multilayer (hidden layers in addition to the input and output layers). Additional layers greatly increase the number of weights and therefore computations, though they may not add much to predictive accuracy in data mining. They are akin to linear regression analysis and go through repetitive learning cycles (epochs) to adjust the weight vectors based on the error rate, using gradient descent, with a later rise in error used to terminate the search.
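Those update cycles are easy to see in a bare-bones 1957-style perceptron (plain Python; learning the AND function is just a classroom illustration, nothing to do with Weka's implementation):

```python
# Single perceptron learning AND: weights are nudged in proportion to
# the error over repeated passes (epochs) until the outputs are correct.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]
b = 0.0
lr = 0.1

for epoch in range(20):
    for (x1, x2), target in data:
        out = 1 if (w[0] * x1 + w[1] * x2 + b) > 0 else 0
        err = target - out
        w[0] += lr * err * x1   # adjust each weight toward reducing the error
        w[1] += lr * err * x2
        b += lr * err

print([1 if (w[0] * x1 + w[1] * x2 + b) > 0 else 0 for (x1, x2), _ in data])  # [0, 0, 0, 1]
```

Because a single perceptron's boundary is a straight line, it could never learn something like XOR; that is exactly the linear limitation that hidden layers and the kernel trick later got around.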


I take notes as I am doing the lessons (these lessons are youtube videos).

2 neural net options:
1. The Voted Perceptron (under the "functions" classifiers), which got 86% on the ionosphere dataset
2. SMO, another choice, which got 88.6%

BY COMPARISON, two other algos that are not NNs:
1. Logistic = 88.9%
2. SimpleLogistic = 88.3%

So on that db the 2 NN algos didn't perform better than the Logistic algo.

------------------- my takeaway ------

I have noticed in trading that many people are attracted to complicated-sounding things. So if their indicator was called a "heuristic adaptive quantum stochastic" indicator -- boy, it must be good!

But a brilliant name that conjures up how sophisticated one is seems to pander more to one's ego than to results. Certainly KISS can't be over-applied to ST day trading, for there is complexity in the markets. However, watching others via journals etc., one of the biggest dangers to a trader is too much ego. (BTW, I'm not calling confidence ego.)

Especially for new traders, jumping to complexities before getting a solid handle on the basics (S/R, double tops, etc.) can be a recipe for account blow-up.
