Trading using GPU-based RAPIDS Libraries from Nvidia – Part I

Posted April 4, 2024
José Carlos Gonzáles Tanaka via QuantInsti Blog
QuantInsti

The article “Trading using GPU-based RAPIDS Libraries from Nvidia” first appeared on QuantInsti blog.

Don’t be deceived by the past. In the rapidly evolving domains of data science and financial machine learning, faster calculations and more efficient processing techniques are becoming increasingly important. RAPIDS, a set of open-source software libraries, is gaining popularity for exactly this reason.

RAPIDS leverages GPU capabilities to expedite data science tasks. This post will look at every aspect of RAPIDS, including its libraries, hardware specifications, setup guidelines, useful applications, and drawbacks. Last but not least, as usual, I’m going to offer a trading strategy based on the RAPIDS suite!

We cover:

  • Understanding RAPIDS Libraries
  • RAPIDS Libraries Installation Guide
  • Practical Examples of the RAPIDS Libraries
  • A trading strategy using machine learning and the GPU
  • Limitations of the Up-to-Date Libraries

Understanding RAPIDS Libraries

RAPIDS is a collection of open-source software libraries that speeds up data science and machine learning workflows by running them on GPUs. The libraries are designed to work together, and combining them lets you take fuller advantage of the computational and data analysis capabilities of GPUs.

Let’s look at the main RAPIDS libraries here:

  • cuDF: A GPU-accelerated data frame manipulation and operation tool similar to Pandas but optimised for GPUs. It has a Pandas-like user interface and accelerates processing through GPU parallelism.
  • cuML: This library is used for machine learning tasks. It provides GPU-accelerated algorithms for various tasks, such as clustering, regression, and classification. These algorithms are made to improve performance without compromising accuracy, which makes them suitable for use with large-scale datasets.
  • cuPy: A GPU-accelerated array library whose interface closely mirrors NumPy’s. Because it mimics NumPy’s functionality, array-based code can be moved to the GPU with few changes, which increases computational speed.

Together, these libraries form a single system for data manipulation, analysis, and machine learning that exploits the parallel processing power of GPUs. The resulting speed-up shortens processing times and makes it possible to develop models and analyze data more quickly, which is especially helpful for tasks involving big datasets.

To make the most of GPU-accelerated computing, researchers, machine learning experts, and data scientists must grasp the nuances of the RAPIDS libraries. These libraries provide high-performance computing capabilities along with the ability to speed up and simplify a multitude of data processing tasks.
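
To illustrate the NumPy-like interface described above, here is a minimal sketch of backend-agnostic array code. It assumes only that cuPy exposes the same `asarray`, `log`, and `diff` functions as NumPy. The helper names `pick_array_backend` and `mean_log_return` and the sample prices are hypothetical and are not part of RAPIDS.

```python
import numpy as np


def pick_array_backend():
    """Return CuPy when it imports and a GPU is visible, otherwise NumPy."""
    try:
        import cupy as cp
        cp.cuda.runtime.getDeviceCount()  # raises when no CUDA device is usable
        return cp
    except Exception:
        # Assumption for this sketch: any failure means "use the CPU".
        return np


def mean_log_return(xp, prices):
    """Average log return of a price series, computed with the backend xp."""
    prices = xp.asarray(prices, dtype=xp.float64)
    log_returns = xp.diff(xp.log(prices))
    return float(log_returns.mean())


xp = pick_array_backend()
print(xp.__name__, mean_log_return(xp, [100.0, 101.0, 99.5, 102.0]))
```

The same function body runs on either library because only the module passed in as `xp` changes.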


RAPIDS Libraries Installation Guide

The RAPIDS libraries can be installed using the following steps:

Step 1: System requirements

Before installing, confirm that your system meets the requirements. RAPIDS is built for NVIDIA GPUs, so a compatible GPU is essential. It runs only on Linux-based operating systems; on Windows, you can use WSL2 to run Ubuntu as a virtual machine. Verify that your Linux distribution and version are supported (such as Ubuntu or CentOS), and install NVIDIA drivers that are compatible with your GPU.
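
The checks in this step can be partly automated. The following sketch uses only the Python standard library. The function name `check_rapids_prerequisites` is hypothetical, and the WSL test relies on the assumption that the WSL kernel release string contains "microsoft". It does not prove that your GPU or driver version is supported.

```python
import platform
import shutil


def check_rapids_prerequisites():
    """Run a few coarse checks and return them as a dict of booleans."""
    release = platform.uname().release.lower()
    return {
        # RAPIDS targets Linux (natively or through WSL2)
        "is_linux": platform.system() == "Linux",
        # Assumption: WSL kernels report "microsoft" in their release string
        "is_wsl": "microsoft" in release,
        # nvidia-smi is installed together with the NVIDIA driver
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


for name, ok in check_rapids_prerequisites().items():
    print(f"{name}: {ok}")
```

If `nvidia_smi_found` is False, the NVIDIA driver is probably missing or not on your PATH.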

Step 2: Installing Conda

This guide installs and manages the RAPIDS libraries with Conda, a package and environment manager. Your first step should be to install Miniconda or Anaconda, two Python distributions that include Conda.

Follow the installation guidelines on the official website to download and install Miniconda or Anaconda.

For RAPIDS, create a new Conda environment to keep the setup tidy and isolated. The following command can be used to create an environment with the name “rapids” or any other desired name:

conda create -n rapids python=3.10

Step 3: Install the RAPIDS Libraries

Use the following command to activate the Conda environment after it has been created:

conda activate rapids

Next, use the following command to install RAPIDS libraries:

conda install -c rapidsai -c nvidia -c conda-forge -c defaults rapids=0.21 python=3.10

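This command installs the RAPIDS suite in the active Conda environment. The rapids=0.21 part pins the version of RAPIDS being installed. Not every RAPIDS release is built for every Python and CUDA version, so check the RAPIDS release documentation and adjust the pin if Conda cannot resolve the packages.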

Step 4: Verifying the Installation

Once the installation process is complete, you can verify that RAPIDS libraries have been successfully installed in your Conda environment. Open a Python interpreter within the Conda environment and import the desired libraries (e.g., cuDF, cuML, cuPy) to ensure they are accessible and functioning properly.

import cudf
import cuml
import cupy

If the import statements execute without errors, it indicates the successful installation of RAPIDS libraries.
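
A slightly more informative check is to report the version of each library instead of only importing it. The sketch below assumes each package exposes a `__version__` attribute, which is common but not guaranteed. The function name `installed_versions` is hypothetical.

```python
import importlib


def installed_versions(names=("cudf", "cuml", "cupy")):
    """Map each package name to its version string, or None if it fails to import."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except Exception:
            # The import can fail because the package is missing or no GPU is usable
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version if version else 'not available'}")
```

A value of None for any library means the environment needs attention before you continue.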


Practical Examples of the RAPIDS Libraries

Let’s see how to use the three libraries above. The examples give a glimpse of what you can do with them. As you’ll discover, they behave very much like NumPy, pandas, and scikit-learn, so they are easy to pick up and you’ll start coding quickly.

Ready to have some fun?
Let’s explore!

cuPy Examples

Example 1: In this example, we create two arrays of 10,000 random numbers each and compute their dot product, which returns a single value.

# Import the cupy library
import cupy as cp

# Create 10,000 random numbers
x = cp.random.rand(10000)
y = cp.random.rand(10000)

# Compute the dot product of both arrays
result = cp.dot(x, y)

# Print the result
print(result)

Example 2: Here we create two 2×2 matrices and compute their matrix product. We then print the resulting matrix.

# Import the corresponding library
import cupy as cp

# Define matrices using CuPy arrays
matrix_a = cp.array([[1, 2], [3, 4]])
matrix_b = cp.array([[5, 6], [7, 8]])

# Perform a matrix multiplication using CuPy
result = cp.matmul(matrix_a, matrix_b)
print("Result of Matrix Multiplication:")
print(result)

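As a quick sanity check on Example 2, the same product can be computed on the CPU with NumPy. This is only a cross-check sketch and needs no GPU. Each entry is a row-by-column sum, for example 1×5 + 2×7 = 19 for the top-left element.

```python
import numpy as np

# Same matrices as in the cuPy example, but held in host (CPU) memory
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

expected = np.matmul(matrix_a, matrix_b)
print(expected)
# [[19 22]
#  [43 50]]
```

If cuPy is available, `cp.asnumpy(result)` copies the GPU result to host memory, so it can be compared against `expected` with `np.array_equal`.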

cuDF Examples

Example 1: Next, we create a GPU-based dataframe with two columns, A and B, of three observations each. We add the two columns and save the result in column C. Simple, right?

# Import the corresponding library
import cudf

# Create a GPU DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = cudf.DataFrame(data)

# Perform data manipulation
df['C'] = df['A'] + df['B']
print(df)

Example 2: Here we build a pandas dataframe from a dictionary and print it. We then copy it to GPU memory with the cudf library and print the resulting cuDF dataframe.

# Import the libraries
import pandas as pd
import cudf

# Creating a Pandas DataFrame
pandas_data = {
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['a', 'b', 'c', 'd', 'e']
}

pandas_df = pd.DataFrame(pandas_data)

# Display the Pandas DataFrame
print("Pandas DataFrame:")
print(pandas_df)

# Convert Pandas DataFrame to cuDF DataFrame
cudf_df = cudf.DataFrame.from_pandas(pandas_df)

# Display the cuDF DataFrame
print("cuDF DataFrame:")
print(cudf_df)

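The transfer also works in the other direction. The sketch below copies a pandas dataframe to the GPU, adds a column there, and brings the result back with `to_pandas()`. The function name `add_sum_column_on_gpu` and the new column D are hypothetical. The pandas fallback is an assumption added so the snippet still runs on a machine without RAPIDS.

```python
import pandas as pd

try:
    import cudf  # requires RAPIDS and a supported NVIDIA GPU
except Exception:
    cudf = None  # assumption for this sketch: fall back to plain pandas


def add_sum_column_on_gpu(pandas_df):
    """Return a pandas DataFrame with D = A + B, computed on the GPU when possible."""
    if cudf is None:
        out = pandas_df.copy()
        out["D"] = out["A"] + out["B"]
        return out
    gpu_df = cudf.DataFrame.from_pandas(pandas_df)  # host -> GPU
    gpu_df["D"] = gpu_df["A"] + gpu_df["B"]         # runs on the GPU
    return gpu_df.to_pandas()                       # GPU -> host


pandas_df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [10, 20, 30, 40, 50],
    "C": ["a", "b", "c", "d", "e"],
})
print(add_sum_column_on_gpu(pandas_df))
```

Each transfer between host and GPU memory has a cost, so it usually pays to keep intermediate results on the GPU and convert back only at the end.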

cuML Examples

Example 1: In this example, we fill a cuDF dataframe with two cupy arrays of 1,000 random numbers each and use it to fit a k-means clustering model with the cuml library. We then predict the cluster label of each observation.

# Import the libraries
from cuml.cluster import KMeans
import cudf
import cupy as cp

# Generate sample data
data = cudf.DataFrame()
data['feature1'] = cp.random.rand(1000)
data['feature2'] = cp.random.rand(1000)

# Perform KMeans clustering
kmeans = KMeans(n_clusters=3)
kmeans.fit(data)
labels = kmeans.predict(data)
print(labels)

Example 2: Finally, in this example, we generate a random classification dataset (features and labels) with the cuml library. We split it into train and test sets and fit a random forest classifier on the training data. We then predict on the test features and show only the first 10 predictions.

# Import the libraries
from cuml.datasets import make_classification
from cuml.ensemble import RandomForestClassifier
from cuml.model_selection import train_test_split

# Generate a synthetic classification dataset on the GPU
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=42)

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating and fitting a Random Forest Classifier using cuML
rf_classifier = RandomForestClassifier(n_estimators=100)
rf_classifier.fit(X_train, y_train)

# Predicting on the test set
y_pred = rf_classifier.predict(X_test)

# Displaying sample predictions
print("Sample predictions:")
print(y_pred[:10])

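Printing ten predictions does not tell you how good the model is. A natural next step is to measure accuracy on the test set. The sketch below defines a hypothetical helper, `accuracy`, that works on NumPy arrays and should work unchanged on cuPy arrays, assuming `y_test` and `y_pred` from the example above are both cuPy arrays. The labels shown here are made up for illustration.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float((y_true == y_pred).mean())


# Illustrative labels only; in the example above you would pass y_test and y_pred
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])
print(accuracy(y_true, y_pred))  # 0.8
```

cuML also provides `accuracy_score` in `cuml.metrics`, which is an alternative if your installed version includes it.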

Did you notice?
It’s just like using the CPU-based libraries. The coding is that smooth!

Stay tuned for the second part to learn about a trading strategy using machine learning and the GPU.


Disclosure: Interactive Brokers

Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from QuantInsti and is being posted with its permission. The views expressed in this material are solely those of the author and/or QuantInsti and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.
