Machine Learning for Algorithmic Trading in Python: A Complete Guide – Part III


Posted September 25, 2023
Chainika Thakar and Varun Divakar
QuantInsti

See Part I for an overview and Part II for creating hyperparameters.

Splitting the data into test and train sets

First, let us split the data into the input values and the prediction values. Here we pass the OHLC data with a one-day lag as the data frame X and the Close values of the current day as y. Note that the lagged columns in X use lower-case names, while the current-day Close used for y is capitalized.

X = df[['open','high','low','close']]
y = df['Close']
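To make the naming convention concrete, here is a minimal sketch on a made-up three-day price table. It assumes the lower-case columns are built by shifting the capitalized columns by one day; the prices and the use of shift(1) are illustrative assumptions, not the data frame from the earlier parts.

```python
import pandas as pd

# Hypothetical three-day price table; the values are made up for illustration.
df = pd.DataFrame({
    'Open':  [100.0, 101.0, 102.0],
    'High':  [102.0, 103.0, 104.0],
    'Low':   [99.0, 100.0, 101.0],
    'Close': [101.0, 102.0, 103.0],
})

# Assumed convention: each lower-case column holds the previous day's value.
for col in ['Open', 'High', 'Low', 'Close']:
    df[col.lower()] = df[col].shift(1)

X = df[['open', 'high', 'low', 'close']]  # yesterday's OHLC (inputs)
y = df['Close']                           # today's close (target)

# Row 1 pairs day 0's OHLC with day 1's close; row 0 has no previous day, so it is NaN.
print(X.iloc[1].tolist(), y.iloc[1])
```

The first row of X is empty because there is no earlier day to lag from, which is why missing values need handling before fitting.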


In this example, to keep the tutorial short and relevant, I have chosen not to create any polynomial features but to use only the raw data.

If you are interested in various combinations of the input parameters and in higher-degree polynomial features, you are free to transform the data using the PolynomialFeatures class from the preprocessing package of scikit-learn.
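As a rough illustration of what such a transformation produces, the sketch below expands two made-up rows of two features into degree-2 polynomial features. The array X_raw and the choice of degree=2 are assumptions for the example only.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two made-up rows with two features each, call them a and b.
X_raw = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

# degree=2 adds squares and the pairwise product; include_bias=False omits the constant column.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X_raw)

# The output columns are ordered a, b, a^2, a*b, b^2.
print(X_poly)
```

With four OHLC inputs, the same degree-2 expansion would yield 14 columns, so the feature count grows quickly with the degree.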

You can find detailed information in the Quantra course on Python for Machine Learning in Finance.

Now, let us also create a dictionary that holds the size of the train data set and its corresponding average prediction error.

avg_err={}


Getting the best-fit parameters to create a new function

I want to measure the performance of the regression function as a function of the size of the input dataset. In other words, I want to see whether increasing the amount of input data reduces the error. For this, I used a for loop to iterate over the same data set but with different lengths.

At this point, if you are interested, explore the ‘reset’ function and how it can help us make a more reliable prediction.

(Hint: It is a part of the Python magic commands)

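import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.impute import SimpleImputer

# Fill missing values (for example, from the one-day lag) with column means.
# Done once, before the loop, so that the grid search never sees NaN values.
imp = SimpleImputer()
X = imp.fit_transform(X)

param_grid = {'alpha': [0.1, 0.5, 1.0], 'max_iter': [1000, 2000, 5000]}

for t in np.arange(50, 97, 3):
    # Use the first t% of the rows as the train set
    split = int(t * len(X) / 100)
    # Cross-validated search over the Lasso hyperparameters
    reg = GridSearchCV(estimator=Lasso(), param_grid=param_grid)
    reg.fit(X[:split], y[:split])
    best_alpha = reg.best_params_['alpha']
    best_iter = reg.best_params_['max_iter']
    # Fit a plain Lasso with the best parameters on the same train set
    reg1 = Lasso(alpha=best_alpha, max_iter=best_iter)
    reg1.fit(X[:split], y[:split])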


Let me explain what I did in a few steps.

First, I created a set of evenly spaced numbers ‘t’ starting at 50 and increasing in steps of 3 up to, but not including, 97 (so the last value is 95). The purpose of these numbers is to choose the percentage of the dataset that will be used as the train data set.
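The short sketch below lists the percentages that np.arange generates and the split indices they map to. The dataset length n_rows is a hypothetical value chosen only to make the arithmetic easy to follow.

```python
import numpy as np

percentages = np.arange(50, 97, 3)  # the stop value 97 is excluded
n_rows = 1000                       # hypothetical dataset length

# Same formula as in the loop: convert each percentage into a row index.
splits = [int(t * n_rows / 100) for t in percentages]

print(percentages.tolist())
print(splits[0], splits[-1])
```

There are 16 train sizes in total, ranging from 50% to 95% of the data.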

Second, for a given value of ‘t’, I converted this percentage of the data set's length into a row index, truncated to an integer. Then I divided the total data into train data, which includes the data from the beginning till the split, and test data, which includes the data from the split till the end. The reason for adopting this approach rather than a random split is to maintain the continuity of the time series.
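As a small sketch of this chronological split, the example below slices ten time-ordered stand-in observations. The array data and the percentage t = 77 are made up for illustration.

```python
import numpy as np

data = np.arange(10)              # stand-in for 10 time-ordered observations
t = 77                            # hypothetical train percentage
split = int(t * len(data) / 100)  # 7.7 is truncated to 7

# Everything before the split is train data; everything from the split on is test data.
train, test = data[:split], data[split:]

print(train.tolist(), test.tolist())
```

Every test observation comes after every train observation, so the model is never fitted on data from the period it is later evaluated on.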

After this, we pull the best parameters, that is, those that generated the lowest cross-validation error, and then use them to create a new model, reg1, which is a simple Lasso regression fit with the best parameters.
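The following is a minimal, self-contained sketch of this search-then-refit pattern on synthetic data. The arrays X_demo and y_demo, the coefficients, and the noise level are assumptions for illustration; by default GridSearchCV ranks candidates by the estimator's score (R² for Lasso), so the best parameters are those with the highest cross-validated score.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic regression data: 200 rows, 4 features, small noise (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = X_demo @ np.array([1.5, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Cross-validated search over the same grid as in the loop above.
param_grid = {'alpha': [0.1, 0.5, 1.0], 'max_iter': [1000, 2000, 5000]}
search = GridSearchCV(estimator=Lasso(), param_grid=param_grid)
search.fit(X_demo, y_demo)

# Pull the best parameters and fit a plain Lasso with them.
best = search.best_params_
model = Lasso(alpha=best['alpha'], max_iter=best['max_iter'])
model.fit(X_demo, y_demo)

print(best)
# With the default refit=True, best_estimator_ is already refit on the full input,
# so its coefficients should match the manually refitted model.
print(np.allclose(model.coef_, search.best_estimator_.coef_))
```

Fitting a separate Lasso by hand, as the article does, makes the chosen parameters explicit; using search.best_estimator_ directly would be an equivalent shortcut in this sketch.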

Stay tuned for the next installment in this series for more details on how to test and train the algo.

Originally posted on QuantInsti Blog.


Disclosure: Interactive Brokers

Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from QuantInsti and is being posted with its permission. The views expressed in this material are solely those of the author and/or QuantInsti and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.
