Machine Learning for Algorithmic Trading in Python: A Complete Guide – Part II

See Part I for an overview.

Prerequisites for creating machine learning algorithms for trading using Python

Python's extensive libraries and frameworks make it a popular choice for machine learning tasks, enabling developers to implement and experiment with various algorithms, process and analyse data efficiently, and build predictive models.

In order to create the machine learning algorithms for trading using Python, you will need the following prerequisites:

Installation of Python packages and libraries meant for machine learning
Full-fledged knowledge of steps of machine learning
Knowing the application models

Install a few packages and libraries

Python machine learning specifically focuses on using Python for the development and application of machine learning models.

Each package can be installed with a single line, for example “pip install numpy”, run in the Anaconda Prompt. The packages commonly used for machine learning are listed below.

Scikit-learn for machine learning
TensorFlow for deep learning
Keras for deep learning
PyTorch for neural networks
NLTK for natural language processing
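Before moving on, it can help to confirm which of these packages are already available in your environment. The snippet below is a minimal sketch and not part of the original tutorial. The import names it uses (sklearn, tensorflow, keras, torch, nltk) are assumptions based on how these packages are usually imported, and the helper is_installed is a hypothetical name introduced here for illustration. It only checks whether each module can be located, without importing it.

```python
from importlib.util import find_spec

# Import names assumed for the packages listed above
# (the display name and the import name differ for scikit-learn and PyTorch).
packages = {
    "scikit-learn": "sklearn",
    "TensorFlow": "tensorflow",
    "Keras": "keras",
    "PyTorch": "torch",
    "NLTK": "nltk",
}


def is_installed(module_name):
    """Return True if the top-level module can be located (hypothetical helper)."""
    return find_spec(module_name) is not None


status = {name: is_installed(module) for name, module in packages.items()}
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Any package reported as missing can then be installed from the Anaconda Prompt before continuing.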

Full-fledged knowledge of steps of machine learning

In addition to general Python knowledge, proficiency in Python machine learning necessitates a deeper understanding of machine learning concepts, algorithms, model evaluation, feature engineering, and data preprocessing.

Knowing the application models

The primary focus of Python machine learning is the development and application of models and algorithms for tasks like classification, regression, clustering, recommendation systems, natural language processing, image recognition, and other machine learning applications.

How to use algorithmic trading with machine learning in Python?

Let us see the steps for doing algorithmic trading with machine learning in Python. These steps are:

Problem statement
Getting the data and making it usable for machine learning algorithm
Creating hyperparameters
Splitting the data into test and train sets
Getting the best-fit parameters to create a new function
Making the predictions and checking the performance

Problem Statement

Let’s start by understanding what we are aiming to do. By the end of this machine learning for algorithmic trading with Python tutorial, I will show you how to create an algorithm that can predict the closing price of a day from the previous OHLC (Open, High, Low, Close) data.

I also want to monitor the prediction error along with the size of the input data.

Let us import all the libraries and packages needed to build this machine-learning algorithm.

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import RandomizedSearchCV as rcv
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
import matplotlib.pyplot as plt
from IPython import get_ipython

Getting the data and making it usable for machine learning algorithm

To create any algorithm, we need data to train the algorithm and then to make predictions on new unseen data. In this machine learning for algorithmic trading with Python tutorial, we will fetch the data from Yahoo.

To accomplish this, we will use the download function from the yfinance library. It retrieves historical price data from Yahoo Finance for a given ticker and date range.

avg_err={}
avg_train_err={}

# To fetch financial data
import yfinance as yf

# Fetch daily AAPL prices; auto_adjust=True returns adjusted OHLC prices
AAPL_data = yf.download('AAPL', start='2005-01-01', end='2023-01-01', auto_adjust=True)
# Keep only the OHLC columns
df = AAPL_data[['Open', 'High', 'Low', 'Close']]

We are fetching the data for Apple Inc. (ticker: AAPL). As one of the largest constituents of the S&P 500, this stock is sometimes used as a rough proxy for the performance of the index. We specify the start and end dates of the period for which we will be pulling the data.

Once the data is in, we will discard any data other than the OHLC, such as volume, to create our data frame ‘df’.

Now we need to make our predictions from past data, and these past values are the features the machine learning model will learn from. So, let’s create new columns in the data frame that contain the data with a one-day lag.

df = AAPL_data[['Open', 'High', 'Low', 'Close']].copy()
df['open']=AAPL_data['Open'].shift(1)
df['high']=AAPL_data['High'].shift(1)
df['low']=AAPL_data['Low'].shift(1)
df['close']=AAPL_data['Close'].shift(1)
df=df.dropna()

Note: The new lagged columns use lower-case names (‘open’, ‘high’, ‘low’, ‘close’) to distinguish them from the original same-day columns, which keep their capital letters.
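To see what the one-day lag does, here is a minimal sketch on a tiny made-up data frame rather than the AAPL data. The prices and dates are illustrative values only, and the names prices and lagged are introduced here for the example.

```python
import pandas as pd

# Made-up prices for three trading days (illustrative values only)
prices = pd.DataFrame(
    {"Open": [10.0, 11.0, 12.0], "Close": [10.5, 11.5, 12.5]},
    index=pd.to_datetime(["2023-01-02", "2023-01-03", "2023-01-04"]),
)

lagged = prices.copy()
# shift(1) moves each value down one row, so every row holds the previous day's value
lagged["open"] = prices["Open"].shift(1)
lagged["close"] = prices["Close"].shift(1)
# The first row has no previous day, so its lagged values are NaN and the row is dropped
lagged = lagged.dropna()
print(lagged)
```

Each remaining row now pairs the current day's prices (capitalised columns) with the previous day's prices (lower-case columns), which is the layout the model needs to predict today's close from yesterday's data.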

Creating Hyperparameters

Although the concept of hyperparameters is worthy of a blog in itself, for now I will just say a few words about them. These are the parameters that the machine learning algorithm cannot learn from the data during training; instead, they are set beforehand, and we iterate over candidate values. We use them to see which predefined functions or parameter values yield the best-fit function.

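# Replace missing values with the column mean
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
steps = [('imputation', imp),
         ('scaler', StandardScaler()),
         ('lasso', Lasso())]
pipeline = Pipeline(steps)
# Candidate hyperparameter values; max_iter must be an integer
parameters = {'lasso__alpha': np.arange(0.0001, 10, .0001),
              'lasso__max_iter': np.random.randint(100, 100000, 4)}
# Randomised search with 5-fold cross-validation
reg = rcv(pipeline, parameters, cv=5)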

In this example, I have used Lasso regression, which uses L1 regularisation. This is a type of machine learning model based on regression analysis, which is used to predict continuous data.

This type of regularisation is very useful for feature selection, because it is capable of reducing coefficient values all the way to zero. The SimpleImputer class replaces any NaN values that can affect our predictions with mean values, as specified in the code.
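The effect of L1 regularisation on coefficients can be illustrated with a minimal sketch on synthetic data. The data, the choice of alpha=0.5 and the variable names are assumptions made for this illustration and are not taken from the tutorial's AAPL example.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# Only the first feature drives the target; the other three are pure noise
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

# alpha controls the strength of the L1 penalty (illustrative value)
model = Lasso(alpha=0.5)
model.fit(X, y)
print(model.coef_)
```

In this setup the coefficients of the three noise features are driven to exactly zero, while the coefficient of the informative feature is shrunk but remains clearly non-zero. This is why Lasso is often described as performing feature selection.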

The ‘steps’ are a list of named transformers followed by a final estimator, which are passed to the Pipeline class. The pipeline is a very efficient tool to carry out multiple operations on the data set in sequence. Here we have also defined, in ‘parameters’, the Lasso hyperparameters along with the values that can be iterated over.
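As a minimal sketch of how the pipeline pieces fit together, the example below runs the same three steps on a tiny made-up array with one missing value. The data and the value alpha=0.01 are assumptions for illustration only. It also shows the naming convention behind keys such as ‘lasso__alpha’: the step name, two underscores, then the parameter name.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Tiny made-up data set with one missing value in the second column
X = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 6.0], [4.0, 8.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

pipe = Pipeline([
    ("imputation", SimpleImputer(missing_values=np.nan, strategy="mean")),
    ("scaler", StandardScaler()),
    ("lasso", Lasso()),
])

# '<step name>__<parameter>' addresses a parameter of one step
pipe.set_params(lasso__alpha=0.01)
pipe.fit(X, y)

# The fitted imputer fills NaN values with the column means it learned
filled = pipe.named_steps["imputation"].transform(X)
print(filled)
print(pipe.named_steps["lasso"].alpha)
```

The missing entry is replaced by the mean of the other values in its column, and the same double-underscore keys are what the randomised search uses to vary the Lasso settings inside the pipeline.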

Although I am not going into details of what exactly these parameters do, they are something worthy of digging deeper into. Finally, I set up the randomised search (RandomizedSearchCV), which samples parameter combinations and evaluates each of them with cross-validation.

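In this example, we used 5-fold cross-validation. In k-fold cross-validation, the original sample is partitioned into k equal-sized subsamples. The general description of the method partitions the sample randomly, whereas scikit-learn, when given cv=5 for a regression model, uses consecutive folds without shuffling by default. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k-1 subsamples are used as training data.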

The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. Cross-validation combines (averages) measures of fit (prediction error) to derive a more accurate estimate of model prediction performance.
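The fold mechanics can be sketched with scikit-learn's KFold class on ten made-up samples. This is an illustration only: the sample data and the counter validation_counts are introduced here, and KFold is used without shuffling, which is the scikit-learn default when cv is given as an integer for a regression model.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # ten made-up samples
kfold = KFold(n_splits=5)

# Count how many times each sample lands in a validation fold
validation_counts = np.zeros(len(X), dtype=int)
for fold, (train_idx, val_idx) in enumerate(kfold.split(X), start=1):
    validation_counts[val_idx] += 1
    print(f"fold {fold}: train={train_idx.tolist()} validation={val_idx.tolist()}")
```

Every sample appears in the validation set exactly once and in the training set for the other four folds, which matches the description above.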

Based on the cross-validated fit, the search selects the best-performing combination of parameters.
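A minimal sketch of a randomised search on synthetic data is shown below. It is not the tutorial's AAPL run: the data, the alpha range, n_iter=10 and random_state=0 are all assumptions chosen to keep the example small and repeatable.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
# Two informative features and two noise features (made-up relationship)
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=100)

pipe = Pipeline([("scaler", StandardScaler()), ("lasso", Lasso())])
param_distributions = {"lasso__alpha": np.arange(0.01, 1.0, 0.01)}

# Sample 10 alpha values and score each with 5-fold cross-validation
search = RandomizedSearchCV(pipe, param_distributions, n_iter=10, cv=5, random_state=0)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

After fitting, best_params_ holds the sampled combination with the highest mean cross-validated score, and best_score_ holds that score (R squared by default for a regression model).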

In the next part of this machine learning for algorithmic trading with Python tutorial, Part III, we will look at how to split the data into test and train sets.

Originally posted on QuantInsti Blog.

Posted September 15, 2023
