Close Navigation
Learn more about IBKR accounts
Machine Learning for Algorithmic Trading in Python: A Complete Guide – Part II

Machine Learning for Algorithmic Trading in Python: A Complete Guide – Part II

Posted September 15, 2023
Chainika Thakar and Varun Divakar
QuantInsti

See Part I for an overview.

Prerequisites for creating machine learning algorithms for trading using Python

Extensive Python libraries and frameworks make it a popular choice for machine learning tasks, enabling developers to implement and experiment with various algorithms, process and analyse data efficiently, and build predictive models.

In order to create the machine learning algorithms for trading using Python, you will need the following prerequisites:

  • Installation of Python packages and libraries meant for machine learning
  • Full-fledged knowledge of steps of machine learning
  • Knowing the application models

Install a few packages and libraries

Python machine learning specifically focuses on using Python for the development and application of machine learning models.

You may add one line to install the packages “pip install numpy” You can install the necessary packages in the Anaconda Prompt using the codes as mentioned below.

  • Scikit-learn for machine learning
  • TensorFlow for deep learning
  • Keras for deep learning
  • PyTorch for neural networks
  • NLTK for natural language processing

Full-fledged knowledge of steps of machine learning

In addition to general Python knowledge, proficiency in Python machine learning necessitates a deeper understanding of machine learning concepts, algorithms, model evaluation, feature engineering, and data preprocessing.

Knowing the application models

The primary focus of Python machine learning is the development and application of models and algorithms for tasks like classification, regression, clustering, recommendation systems, natural language processing, image recognition, and other machine learning applications.


How to use algorithmic trading with machine learning in Python?

Let us see the steps to doing algorithmic trading with machine learning in Python. These steps are:

  • Problem statement
  • Getting the data and making it usable for machine learning algorithm
  • Creating hyperparameter
  • Splitting the data into test and train sets
  • Getting the best-fit parameters to create a new function
  • Making the predictions and checking the performance

Problem Statement

Let’s start by understanding what we are aiming to do. By the end of this machine learning for algorithmic trading with Python tutorial, I will show you how to create an algorithm that can predict the closing price of a day from the previous OHLC (Open, High, Low, Close) data.

I also want to monitor the prediction error along with the size of the input data.

Let us import all the libraries and packages needed to build this machine-learning algorithm.

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import RandomizedSearchCV as rcv
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
import matplotlib.pyplot as plt
from IPython import get_ipython

Import_libraries_ML.py hosted with ❤ by GitHub

Getting the data and making it usable for machine learning algorithm

To create any algorithm, we need data to train the algorithm and then to make predictions on new unseen data. In this machine learning for algorithmic trading with Python tutorial, we will fetch the data from Yahoo.

To accomplish this, we will use the data reader function from the pandas library. This function is extensively used, enabling you to get data from many online sources.

avg_err={}
avg_train_err={}

# To fetch financial data
import yfinance as yf

# Fetch data
AAPL_data= yf.download('AAPL', start='2005-1-1', end='2023-1-1', auto_adjust = True)
df = df[['Open', 'High', 'Low', 'Close']]

Fetch_data_AAPL.py hosted with ❤ by GitHub

We are fetching the data of AAPL(ticker) or APPLE. This stock can be used as a proxy for the performance of the S&P 500 index. We specify the year starting from which we will be pulling the data.

Once the data is in, we will discard any data other than the OHLC, such as volume and adjusted Close, to create our data frame ‘df ’.

Now we need to make our predictions from past data, and these past features will aid the machine learning model trade. So, let’s create new columns in the data frame that contain data with one day lag.

df = AAPL_data[['Open', 'High', 'Low', 'Close']].copy()
df['open']=AAPL_data['Open'].shift(1)
df['high']=AAPL_data['High'].shift(1)
df['low']=AAPL_data['Low'].shift(1)
df['close']=AAPL_data['Close'].shift(1)
df=df.dropna()

Data_one_day_lag.py hosted with ❤ by GitHub

Note: The capital letters are dropped for lower-case letters in the names of new columns.

Creating Hyperparameters

Although the concept of hyperparameters  is worthy of a blog in itself, for now I will just say a few words about them. These are the parameters that the machine learning algorithm can’t learn over but needs to be iterated over. We use them to see which predefined functions or parameters yield the best-fit  function.

imp = SimpleImputer(missing_values=np.nan, strategy='mean')
steps = [('imputation', imp),
('scaler',StandardScaler()),
('lasso',Lasso())]
pipeline =Pipeline(steps)
parameters = {'lasso__alpha':np.arange(0.0001,10,.0001),
'lasso__max_iter':np.random.uniform(100,100000,4)}
reg = rcv(pipeline, parameters,cv=5)

Creating_hyperparameters.py hosted with ❤ by GitHub

In this example, I have used Lasso regression which uses the L1 type of regularisation. This is a type of machine learning model based on regression analysis which is used to predict continuous data.

This type of regularisation is very useful when you are using feature selection. It is capable of reducing the coefficient values to zero. The SimpleImputer function replaces any NaN values that can affect our predictions with mean values, as specified in the code.

The ‘steps’ are a bunch of functions that are incorporated as a part of the Pipeline function. The pipeline is a very efficient tool to carry out multiple operations on the data set. Here we have also passed the Lasso function parameters along with a list of values that can be iterated over.

Although I am not going into details of what exactly these parameters do, they are something worthy of digging deeper into. Finally, I called the randomised search function for performing the cross-validation.

In this example, we used 5-fold cross-validation. In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k-1 subsamples are used as training data.

The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. Cross-validation combines (averages) measures of fit (prediction error) to derive a more accurate estimate of model prediction performance.

Based on the fit parameter, we decide on the best features.

In the next section of the machine learning for algorithmic trading with Python tutorial, we will look at test and train sets.

Stay tuned for Part III to learn how to split the data into test and train sets.

Originally posted on QuantInsti Blog.

Join The Conversation

If you have a general question, it may already be covered in our FAQs. If you have an account-specific question or concern, please reach out to Client Services.

Leave a Reply

Your email address will not be published. Required fields are marked *

Disclosure: Interactive Brokers

Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from QuantInsti and is being posted with its permission. The views expressed in this material are solely those of the author and/or QuantInsti and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.

IBKR Campus Newsletters

This website uses cookies to collect usage information in order to offer a better browsing experience. By browsing this site or by clicking on the "ACCEPT COOKIES" button you accept our Cookie Policy.