IBKR Quant Blog





Decision Tree For Trading Using Python - Part IV


By Mario Pisa Peña, QuantInsti Blog

In this final piece of the series, the author focuses on visualization. In Part I, Part II, and Part III he discussed creating the predictor variables and obtaining the data set for decision trees.

Visualize Decision Trees for Classification

We have at our disposal a very powerful tool for graphically analyzing the tree that the ML algorithm has created automatically. The graph shows the most significant nodes that maximize the output and will help us determine, where applicable, some useful trading rules.

The graphviz library allows us to graph the tree that the DecisionTreeClassifier function has automatically created with the training data.

from sklearn import tree
import graphviz
dot_data = tree.export_graphviz(clf, out_file=None,filled=True,feature_names=predictors_list)
graphviz.Source(dot_data)
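
graphviz.Source(dot_data) renders the tree inline in a Jupyter notebook. If you are working from a plain script, here is a small sketch to write the graph to a file instead (the file name is arbitrary):

# Save the rendered tree to classification_tree.png instead of displaying inline
graph = graphviz.Source(dot_data)
graph.render('classification_tree', format='png', cleanup=True)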

 

[Figure: classification decision tree rendered by graphviz]

 

Note that the image only shows the most significant nodes. In this graph we can see all the relevant information in each node:

  • The predictor variable used to split the data set.
  • The value of Gini impurity.
  • The number of data points available at that node.
  • The number of target variable data points belonging to each class, 1 and 0.

We can observe a pair of pure nodes that allows us to deduce possible trading rules. For example:

  • On the third leaf starting from the left: when the closing price is lower than the EMA10, the ATR is higher than 51.814 and the RSI is lower than or equal to 62.547, the market decreases.
  • On the fifth leaf starting from the left, we can deduce the following rule: when the ADX is less than or equal to 19.243, the RSI is less than or equal to 62.952 and the RSI is greater than 62.547, the market goes up. (A sketch of these rules as code follows this list.)
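
As an illustration only, these deduced rules can be written as boolean signals on the indicator columns. A minimal sketch, assuming df still holds the close price and the EMA10, ATR, RSI and ADX indicator columns computed in the earlier parts (the column names are assumptions, not confirmed by the original code):

# Hypothetical boolean signals for the two rules read off the tree;
# 'Close', 'EMA10', 'ATR', 'RSI' and 'ADX' are assumed column names
rule_down = (df['Close'] < df['EMA10']) & (df['ATR'] > 51.814) & (df['RSI'] <= 62.547)
rule_up = (df['ADX'] <= 19.243) & (df['RSI'] <= 62.952) & (df['RSI'] > 62.547)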


Make forecasts

Now let's make predictions with the data set reserved for testing; this is the part that tells us whether the algorithm is reliable on data that was unknown during training.

y_cls_pred = clf.predict(X_cls_test)

Performance analysis

Finally, we evaluate the performance of the algorithm on unknown data by comparing it with the results obtained during training. To do this we will use the classification_report function from the sklearn.metrics library.

The report shows some parameters that will help us to evaluate the goodness of the algorithm:

  • Precision: The fraction of predicted positives that are truly positive; it indicates how reliable a positive prediction is.
  • Recall: The fraction of actual positives that the model manages to detect.
  • F1-score: The harmonic mean of precision and recall.
  • Support: The number of samples of each class, used as weights to compute the average values of precision, recall and F1-score.
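
To make these definitions concrete, here is a small sketch that recomputes precision, recall and F1 for the positive class from the confusion matrix, using the y_cls_test and y_cls_pred arrays from the previous step:

from sklearn.metrics import confusion_matrix

# For a binary problem, ravel() returns tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_cls_test, y_cls_pred).ravel()
precision = tp / (tp + fp)  # fraction of predicted 1s that are correct
recall = tp / (tp + fn)     # fraction of actual 1s that are detected
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)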

 

Anything above 0.5 is usually considered a good number.

from sklearn.metrics import classification_report
report = classification_report(y_cls_test, y_cls_pred)
print(report)

[Output: classification report for the test set]

 

Decision Trees for Regression

Now let’s create the regression decision tree using the DecisionTreeRegressor function from the sklearn.tree library.

Although the DecisionTreeRegressor function has many parameters that I invite you to know and experiment with (help(DecisionTreeRegressor)), here we will see the basics to create the regression decision tree.

[Figure: DecisionTreeRegressor parameters]

These parameters tell the algorithm how it must build the tree; because it follows a recursive approach, we must set some limits on the tree's growth.

  • criterion: For regression decision trees we can choose Mean Absolute Error (MAE) or Mean Squared Error (MSE). These criteria define the loss function used to evaluate the performance of the learning algorithm and are the most commonly used for regression algorithms. Although a full treatment is beyond the scope of this post, the criterion essentially lets us tune the accuracy of the model; the tree-building algorithm also stops evaluating branches in which no improvement is obtained according to the loss function. Here we leave the default, Mean Squared Error (MSE).
  • max_depth: The maximum number of levels the tree may have; here we leave the default value of None.
  • min_samples_leaf: This parameter is optimizable and indicates the minimum number of samples required at each leaf node.
# Regression tree model
from sklearn.tree import DecisionTreeRegressor
dtr = DecisionTreeRegressor(min_samples_leaf = 200)

Now we train the model with the training data sets; once we fit the model, the algorithm is fully trained.

dtr.fit(X_rgs_train, y_rgs_train)

Now we need to make forecasts with the model on unknown data. For this we will use the 30% of the data we reserved for testing and, finally, evaluate the performance of the model (a sketch of that step follows the visualization below). But first, let's take a graphical look at the regression decision tree that the ML algorithm has automatically created for us.

Visualize the model

To visualize the tree, we again use the graphviz library, which gives us an overview of the regression decision tree for analysis.

from sklearn import tree
import graphviz
dot_data = tree.export_graphviz(dtr,
                  out_file=None,
                  filled=True,
                  feature_names=predictors_list)
graphviz.Source(dot_data)
 

 

[Figure: regression decision tree rendered by graphviz]

 

In this graph we can see all the relevant information in each node:

  • The predictor variable used to split the data set.
  • The value of MSE.
  • The number of data points available at that node.
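
To close the loop on the forecast and evaluation step promised above, here is a minimal sketch; the original shows no code for this part, so the metric choice (mean squared error and R²) is an assumption, while dtr, X_rgs_test and y_rgs_test come from the previous steps:

from sklearn.metrics import mean_squared_error, r2_score

# Predict on the 30% of data reserved for testing
y_rgs_pred = dtr.predict(X_rgs_test)

# Compare the predictions against the actual continuous target
print('MSE:', mean_squared_error(y_rgs_test, y_rgs_pred))
print('R2:', r2_score(y_rgs_test, y_rgs_pred))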


Conclusion

It might look like we have found a crystal ball, but nothing could be further from the truth. Mastering these techniques requires a lot of study and a thorough understanding of the mathematical techniques behind them.

Trees appear easy to create, and extracting rules that promise to be useful looks straightforward, but the truth is that decision trees need to be parametrized, and these parameters can and must be optimized.

To continue deepening our knowledge of decision trees, and to really help us extract reliable trading rules, in the next post we will move on to ensemble mechanisms, which build a robust model by combining the models created by a single algorithm.

  • Parallel ensemble methods or averaging methods: several models are created by one algorithm and the forecast is the average of all the models:
    • Bagging
    • Random Subspace
    • Random Forest
  • Sequential ensemble methods or boosting methods: the algorithm creates models sequentially, each new model refining the previous one to reduce its bias:
    • AdaBoost
    • Gradient Boosting

 

To download the code in this article, visit the QuantInsti website. Explore their Python educational offerings in the Executive Programme in Algorithmic Trading (EPAT™).

This article is from QuantInsti and is being posted with QuantInsti’s permission. The views expressed in this article are solely those of the author and/or QuantInsti and IB is not endorsing or recommending any investment or trading discussed in the article. This material is for information only and is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad-based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation by IB to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.







qplum - A Machine Learning Approach to Systematic Global Macro Strategies


In case you missed it! The webinar recording is available on our YouTube channel:

https://youtu.be/KfMWzPcamTs

 


 

From the time of Benjamin Graham to Jeremy Grantham, using macroeconomic data in asset allocation has been the most universally accepted form of tactical asset allocation. However, without a machine learning foundation, global macro has often been considered an art rather than a science. In this webinar, we give a preview of the use of macroeconomic data for asset allocation in a deep learning framework. Think of this as a "RoboWarren" that is able to look at over 100 macroeconomic indicators to predict whether a stock market correction is imminent.

Speaker: Ankit Awasthi, Quantitative Portfolio Manager at qplum

Sponsor: qplum

 

Information posted on IBKR Quant that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Quant are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.







Measuring Factor Exposures: Uses and Abuses


The post Measuring Factor Exposures: Uses and Abuses appeared first on Alpha Architect Blog https://alphaarchitect.com/blog/

 

By Tommi Johnsen, PhD | December 3rd, 2018
 

Measuring Factor Exposures: Uses and Abuses

  • Ronen Israel and Adrienne Ross
  • The Journal of Alternative Investments
  • A version of this paper can be found here
  • Want to read our summaries of academic finance papers? Check out our Academic Research Insight category

What are the research questions?

  1. USES: Can investors really separate “alpha” from “beta”? What are the ins-and-outs of understanding the exposures in a portfolio and their contribution to “alpha”?
  2. ABUSES: Are there differences in the way strategies are constructed in academic articles vs. the way practitioners actually implement those strategies that are consequential for investors?

What are the Academic Insights?

  1. YES.  But it's a bit complicated.  Two things are needed to separate alpha from beta: first, a risk model, and second, a regression matching the returns of a portfolio to the risk exposures present in that portfolio. What are the appropriate risk factors (CAPM, Fama-French, etc.) that are associated with or explain the returns produced by a portfolio?  Once identified, regression analysis is conducted to decompose portfolio returns into their component sources of risk as they exist in the portfolio: portfolio returns are regressed against the returns of the risk variables, which accomplishes the separation needed to isolate alpha.  Any return that is not separated and associated with a risk exposure is real alpha.  The results of several such regressions are presented in Exhibit 3.  Four risk models are estimated, beginning with the basic CAPM (model 1), and three Fama-French factors are added sequentially (models 2, 3, and 4).  Note that as risk variables are added, alpha decreases from 6.1% to 1.8%, as it is essentially being reassigned to the risk exposures actually embedded in the portfolio. (A code sketch of this regression appears after this list.)
  2. YES.  While not intuitively obvious, the way that academics conduct research is designed to illustrate or test a theory and, hopefully, get the results published.  The studies are definitely not designed to show investors how to implement the significant results.  Be aware of the following potential sources of discrepancy between the performance of actual portfolios and the published academic results. First, academics may not account for the costs of implementation, including management fees, transaction costs, and taxes. Second, practitioners are generally interested in large- and mid-cap universes for reasons of investability, while academics may construct their studies to encompass all capitalization classes; their results may therefore be a function of the proportion of smaller capitalization ranges included in the study.  Academic studies are often unconcerned with industry exposures and, for the most part, utilize equal weighting schemes. And while the studies are very concerned with making risk adjustments to their results (i.e., determining the presence of "alpha"), they are not concerned with the practical requirements of maintaining a particular active risk stance in the portfolio.  Factor studies generally construct portfolios using dollar- and market-neutral long-short positions to test their theories. And although practitioners may also construct long-short portfolios, they are often constrained by practical considerations such as the need to control active risk exposures and the need to maintain asset allocations. Finally, differences in how factors are measured (single-component factors vs. multiple components) may produce more or less robust results.
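
To make point 1 concrete, here is a minimal sketch of such a decomposition using statsmodels; the data and factor names are hypothetical, not the paper's:

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical monthly excess returns for three factors and a portfolio
rng = np.random.default_rng(0)
factors = pd.DataFrame(rng.normal(0.0, 0.02, (120, 3)),
                       columns=['MKT', 'SMB', 'HML'])
portfolio = 0.002 + factors @ [0.9, 0.2, 0.3] + rng.normal(0.0, 0.01, 120)

# Regress portfolio returns against the factor returns;
# the intercept ('const') is the alpha left after removing factor exposures
model = sm.OLS(portfolio, sm.add_constant(factors)).fit()
print(model.params)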

Why does it matter?

When investors are choosing among managers and the various products offered, the comparison of alpha and beta can be tricky. Are the factors constructed similarly across managers and consistent with respect to implementation costs? Is the manager delivering a portfolio of factor tilts and is that consistent with the investment objective? Can any of the differences in performance relative to published research on factors be explained by the factor and portfolio construction process? Ultimately the investor will want to distinguish between managers offering a portfolio of factor tilts vs. managers delivering real alpha.

The most important chart from the paper

[Figure: chart from the paper]

 

The results are hypothetical results and are NOT an indicator of future results and do NOT represent returns that any investor actually attained. Indexes are unmanaged and do not reflect management or trading fees, and one cannot invest directly in an index.

Visit Alpha Architect Blog to read an abstract from the paper Measuring Factor Exposures: Uses and Abuses.

 

  • The views and opinions expressed herein are those of the author and do not necessarily reflect the views of Alpha Architect, its affiliates or its employees. Our full disclosures are available here. Definitions of common statistics used in our analysis are available here (towards the bottom).
  • Join thousands of other readers and subscribe to our blog.
  • This site provides NO information on our value ETFs or our momentum ETFs. Please refer to this site.

  

Alpha Architect empowers investors through education. The company designs affordable active management strategies for Exchange-Traded Funds and Separately Managed Accounts. Visit their website to learn more: https://alphaarchitect.com

This article is from Alpha Architect Blog and is being posted with Alpha Architect’s permission. The views expressed in this article are solely those of the author and/or Alpha Architect and IB is not endorsing or recommending any investment or trading discussed in the article. This material is for information only and is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad-based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation by IB to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.







2018 Recap - Popular Quant Article Tags


Python takes the crown for the most popular article tag in the IBKR Quant Blog 2018 Tag Cloud.

R programming language ranks second, followed closely by Algo Trading, Machine Learning, and Artificial Intelligence.
 

[Figure: IBKR Quant Blog 2018 tag cloud]

Graph source: IBKR tag cloud image created with the free online word cloud generator https://www.wordclouds.com/. Data from IBKR Quant Blog.

 

Find links below to read articles on these popular topics.


Python


R

 

Algo Trading, Machine Learning, and Artificial Intelligence

 

 

What to look for in 2019?

As data science tools constantly evolve, stay tuned for more articles on Alternative Data and Cryptocurrency. Learn more about the current trends in this field with these articles:

 

 

Information posted on IBKR Quant that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Quant are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.







Decision Tree For Trading Using Python - Part III


By Mario Pisa Peña, QuantInsti Blog

In this installment, the author discusses how to obtain the data set for decision trees. In Part I and Part II, he discussed creating the predictor variables.

Obtaining the data set for decision trees

We have all the data ready! We have downloaded the market data, applied some technical indicators as predictor variables and defined the target variable for each type of problem: a categorical variable for the classification decision tree and a continuous variable for the regression decision tree.

We are going to perform a small operation to sanitize the data and prepare the data set that each algorithm will use. We must clean the data by dropping the rows with NA values; this step is crucial for computing the trees cleanly.
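
A one-line sketch of this cleaning step on the df DataFrame used throughout the series:

# Drop the rows with NA values introduced by the indicators' look-back windows
df = df.dropna()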

Next, we create the data set of predictor variables, that is to say, the indicators we have calculated. This data set is common to the two decision trees we are going to create: a classification decision tree and a regression decision tree.

X = df[predictors_list]
X.tail()

[Output: last rows of the predictor data set X]

We then select the target dataset for the classification decision tree:

y_cls = df.target_cls
y_cls.tail()

[Output: last rows of the classification target y_cls]

Finally, we select the target dataset for the regression decision tree:

y_rgs = df.target_rgs
y_rgs.tail()

[Output: last rows of the regression target y_rgs]

Splitting the data into training and testing data sets

The last step in preparing the data sets is to split them into train and test sets. This is necessary to fit the model with one portion of the data, usually 70% or 80%, and use the remainder to test the goodness of the model. If we did not do so, we would run the risk of over-fitting the model. We want to test the model with unknown data once it has been fitted, in order to evaluate its accuracy.

We are going to create the train data set with 70% of the data from the predictor and target variable data sets, leaving the remaining 30% to test the model.

For classification decision trees, we are going to use the train_test_split function from the sklearn model_selection library to split the data set. Since the output is categorical, it is important that the training and test data sets keep the class proportions. The train_test_split function takes the predictor and target data sets as input, plus some parameters:

  • test_size: The size of the test data set, in this case, 30% of the data for the tests and, therefore, 70% for the training.
  • random_state: Since the sampling is random, this parameter allows us to reproduce the same randomness in each execution.
  • stratify: To ensure that the training and test samples are proportional, we pass the target series to this parameter (stratify=y). This means that, for example, if there are more days with positive than negative returns, the training and test samples will keep the same proportions.
from sklearn.model_selection import train_test_split
y=y_cls
X_cls_train, X_cls_test, y_cls_train, y_cls_test = train_test_split(X, y, test_size=0.3, random_state=432, stratify=y)
print (X_cls_train.shape, y_cls_train.shape)
print (X_cls_test.shape, y_cls_test.shape)

Here we have:

  • Train predictor variables dataset: X_cls_train
  • Train target variables dataset: y_cls_train
  • Test predictor variables dataset: X_cls_test
  • Test target variables dataset: y_cls_test

For regression decision trees we simply split the data at the specified rate; since the output is continuous, we do not need to worry about the proportionality of the output in the training and test data sets.
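
The original shows no code for this split; here is a minimal sketch reusing the already imported train_test_split, without the stratify argument since the target is continuous (the random_state value simply mirrors the classification split):

# Split predictors and the continuous target, 70% train / 30% test
X_rgs_train, X_rgs_test, y_rgs_train, y_rgs_test = train_test_split(
    X, y_rgs, test_size=0.3, random_state=432)
print(X_rgs_train.shape, y_rgs_train.shape)
print(X_rgs_test.shape, y_rgs_test.shape)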

Again, here we have:

  • Train predictor variables dataset: X_rgs_train
  • Train target variables dataset: y_rgs_train
  • Test predictor variables dataset: X_rgs_test
  • Test target variables dataset: y_rgs_test

So far we have:

  • Downloaded the market data.
  • Calculated the indicators that we will use as predictor variables.
  • Defined the target variables.
  • Split the data into a training set and a test set.

Apart from slight variations in obtaining the target variables and in the procedure for splitting the data sets, the steps taken so far have been the same for both problems.

Decision Trees for Classification

Now let’s create the classification decision tree using the DecisionTreeClassifier function from the sklearn.tree library.

Although the DecisionTreeClassifier function has many parameters that I invite you to know and experiment with (help(DecisionTreeClassifier)), here we will see the basics to create the classification decision tree.

[Figure: DecisionTreeClassifier parameters]

Basically, these parameters tell the algorithm how it must build the tree; because it follows a recursive approach, we must set some limits on the tree's growth.

  • criterion: For classification decision trees we can choose Gini impurity or entropy (information gain). These criteria define the loss function used to evaluate the performance of the learning algorithm and are the most commonly used for classification algorithms. Although a full treatment is beyond the scope of this post, the criterion essentially lets us tune the accuracy of the model; the tree-building algorithm also stops evaluating branches in which no improvement is obtained according to the loss function.
  • max_depth: Maximum number of levels the tree will have.
  • min_samples_leaf: This parameter is optimizable and indicates the minimum number of samples required at each leaf node.
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(criterion='gini', max_depth=3, min_samples_leaf=6)
clf

Now we train the model with the training data sets; once we fit the model, the algorithm is fully trained.

clf = clf.fit(X_cls_train, y_cls_train)
clf

Now we need to make forecasts with the model on unknown data. For this we will use the 30% of the data we reserved for testing and, finally, evaluate the performance of the model. But first, let's take a graphical look at the classification decision tree that the ML algorithm has automatically created for us.

 

To download the code in this article, visit the QuantInsti website and explore the educational offerings in their Executive Programme in Algorithmic Trading (EPAT™).

This article is from QuantInsti and is being posted with QuantInsti’s permission. The views expressed in this article are solely those of the author and/or QuantInsti and IB is not endorsing or recommending any investment or trading discussed in the article. This material is for information only and is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad-based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation by IB to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.







Disclosures

We appreciate your feedback. If you have any questions or comments about IBKR Quant Blog please contact ibkrquant@ibkr.com.

The material (including articles and commentary) provided on IBKR Quant Blog is offered for informational purposes only. The posted material is NOT a recommendation by Interactive Brokers (IB) that you or your clients should contract for the services of or invest with any of the independent advisors or hedge funds or others who may post on IBKR Quant Blog or invest with any advisors or hedge funds. The advisors, hedge funds and other analysts who may post on IBKR Quant Blog are independent of IB and IB does not make any representations or warranties concerning the past or future performance of these advisors, hedge funds and others or the accuracy of the information they provide. Interactive Brokers does not conduct a "suitability review" to make sure the trading of any advisor or hedge fund or other party is suitable for you.

Securities or other financial instruments mentioned in the material posted are not suitable for all investors. The material posted does not take into account your particular investment objectives, financial situations or needs and is not intended as a recommendation to you of any particular securities, financial instruments or strategies. Before making any investment or trade, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice. Past performance is no guarantee of future results.

Any information provided by third parties has been obtained from sources believed to be reliable and accurate; however, IB does not warrant its accuracy and assumes no responsibility for any errors or omissions.

Any information posted by employees of IB or an affiliated company is based upon information that is believed to be reliable. However, neither IB nor its affiliates warrant its completeness, accuracy or adequacy. IB does not make any representations or warranties concerning the past or future performance of any financial instrument. By posting material on IB Quant Blog, IB is not representing that any particular financial instrument or trading strategy is appropriate for you.