In the previous installment, the author discussed Importing the dataset.
Preparing the dataset
dataset[‘H-L’] = dataset[‘High’] – dataset[‘Low’]
dataset[‘O-C’] = dataset[‘Close’] – dataset[‘Open’]
dataset[‘3day MA’] = dataset[‘Close’].shift(1).rolling(window = 3).mean()
dataset[’10day MA’] = dataset[‘Close’].shift(1).rolling(window = 10).mean()
dataset[’30day MA’] = dataset[‘Close’].shift(1).rolling(window = 30).mean()
dataset[‘RSI’] = talib.RSI(dataset[‘Close’].values, timeperiod = 9)
dataset[‘Williams %R’] = talib.WILLR(dataset[‘High’].values, dataset[‘Low’].values, dataset[‘Close’].values, 7)
We then prepare the various input features which will be used by the artificial neural network learning for making the predictions. We define the following input features:
- High minus Low price
- Close minus Open price
- Three day moving average
- Ten day moving average
- 30 day moving average
- Standard deviation for a period of 5 days
- Relative Strength Index
- Williams %R
dataset[‘Price_Rise’] = np.where(dataset[‘Close’].shift(-1) > dataset[‘Close’], 1, 0)
We then define the output value as price rise, which is a binary variable storing 1 when the closing price of tomorrow is greater than the closing price of today.
dataset = dataset.dropna()
Next, we drop all the rows storing NaN values by using the dropna() function.
X = dataset.iloc[:, 4:-1]
y = dataset.iloc[:, -1]
We then create two data frames storing the input and the output variables. The dataframe ‘X’ stores the input features, the columns starting from the fifth column (or index 4) of the dataset till the second last column. The last column will be stored in the dataframe y, which is the value we want to predict, i.e. the price rise.
Splitting the dataset
split = int(len(dataset)*0.8)
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]
In this part of the code, we will split our input and output variables to create the test and train datasets. This is done by creating a variable called split, which is defined to be the integer value of 0.8 times the length of the dataset.
We then slice the X and y variables into four separate data frames: Xtrain, Xtest, ytrain and ytest. This is an essential part of any machine learning algorithm, the training data is used by the model to arrive at the weights of the model. The test dataset is used to see how the model will perform on new data which would be fed into the model. The test dataset also has the actual value for the output, which helps us in understanding how efficient the model is. We will look at the confusion matrix later in the code, which essentially is a measure of how accurate the predictions made by the model are.
In the next installment, the author will demonstrate how Feature Scaling
Visit https://www.quantinsti.com/ for ready-to-use functions as applied in trading and data analysis.
Disclosure: Interactive Brokers
Information posted on IBKR Campus that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.
This material is from QuantInsti and is being posted with permission from QuantInsti. The views expressed in this material are solely those of the author and/or QuantInsti and IBKR is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.
In accordance with EU regulation: The statements in this document shall not be considered as an objective or independent explanation of the matters. Please note that this document (a) has not been prepared in accordance with legal requirements designed to promote the independence of investment research, and (b) is not subject to any prohibition on dealing ahead of the dissemination or publication of investment research.
Any trading symbols displayed are for illustrative purposes only and are not intended to portray recommendations.