# Reinforcement Learning in Trading – Part V

###### Posted December 22, 2020 at 10:09 am
Ishan Shah
QuantInsti

See Part I, Part II, Part III and Part IV to get started.

## Bellman Equation

The Q-value update follows the Bellman equation:

Q(s, a) = R(s, a) + γ · max Q(s′, aᵢ)

In this equation, s is the state, A is the set of available actions at time t, and aᵢ is a specific action from that set. s′ is the next state. R is the reward table. Q is the state-action table, which is constantly updated as we learn more about our system through experience. γ is the discount factor, which weights the value of future rewards.

We will start with the Q-value for the Hold action on 30 July.

1. The first part is the reward for taking that action; as seen in the R-table, it is 0.
2. Let us assume that γ = 0.98. The maximum Q-value across the Sell and Hold actions on the next day, i.e. 31 July, is 1.09.
3. Thus, the Q-value for the Hold action on 30 July is 0 + 0.98 × 1.09 ≈ 1.07.
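The three steps above can be checked with a few lines of Python. The reward, γ, and the 1.09 maximum are the illustrative numbers from the example; the 0.55 Q-value for Hold on 31 July is a made-up filler so that 1.09 is the maximum, as the text states. Note that 0.98 × 1.09 = 1.0682, i.e. 1.07 when rounded to two decimals.

```python
# Bellman update for the Hold action on 30 July (toy numbers from the example)
gamma = 0.98          # discount factor
reward_hold = 0       # reward for Hold, taken from the R-table
# Hypothetical Q-values for 31 July; 1.09 for Sell matches the text,
# 0.55 for Hold is an invented filler value.
next_day_q = {"sell": 1.09, "hold": 0.55}

# Q(s, Hold) = R(s, Hold) + gamma * max over next-day actions
q_hold = reward_hold + gamma * max(next_day_q.values())
print(round(q_hold, 2))  # 1.07
```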

In this way, we will fill the values for the other rows of the Hold column to complete the Q table.

The RL model will now select the Hold action to maximise the Q-value. This is the intuition behind the Q-table, and the process of updating it is called Q-learning. Of course, we considered a scenario with only a few actions and states. In reality, the state space is large, so building a Q-table becomes both time-consuming and resource-intensive.
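As a sketch of how the whole table gets filled by experience, here is a minimal tabular Q-learning loop. The two-state environment, its transition and reward logic, and the step size α are all invented for illustration; they are not the article's price data.

```python
import random

random.seed(0)

ACTIONS = ["hold", "sell"]
gamma = 0.98      # discount factor
alpha = 0.1       # step size for the incremental update
epsilon = 0.1     # exploration probability

# Toy environment: two states; "sell" in state 1 pays 1, everything else pays 0.
def step(state, action):
    reward = 1.0 if (state == 1 and action == "sell") else 0.0
    next_state = 1 - state  # states simply alternate
    return next_state, reward

# Q-table: Q[state][action], initialised to zero
Q = {s: {a: 0.0 for a in ACTIONS} for s in (0, 1)}

state = 0
for _ in range(5000):
    # epsilon-greedy action selection
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(Q[state], key=Q[state].get)
    next_state, reward = step(state, action)
    # incremental Bellman update toward reward + gamma * max next-state Q
    target = reward + gamma * max(Q[next_state].values())
    Q[state][action] += alpha * (target - Q[state][action])
    state = next_state

print(Q)
```

After enough iterations the table reflects the environment: Selling in state 1 earns the immediate reward, so its Q-value ends up above Hold's.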

To overcome this problem, you can use deep neural networks, also called Deep Q-Networks (DQNs). A DQN learns to approximate the Q-table from past experience: given a state as input, it outputs a Q-value for each action, and we select the action with the maximum Q-value.
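The DQN idea can be sketched with a tiny feed-forward network in plain NumPy. The state features, layer sizes, and randomly initialised weights below are illustrative stand-ins, not a trained trading model; the point is only the shape of the computation, namely state in, one Q-value per action out.

```python
import numpy as np

rng = np.random.default_rng(42)

ACTIONS = ["hold", "sell"]
n_features = 4   # e.g. a few price-derived features describing the state
n_hidden = 8

# Randomly initialised weights stand in for a trained network.
W1 = rng.normal(0.0, 0.5, (n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, (n_hidden, len(ACTIONS)))
b2 = np.zeros(len(ACTIONS))

def q_values(state):
    """Forward pass: state features in, one Q-value per action out."""
    h = np.maximum(0.0, state @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

state = rng.normal(size=n_features)      # a made-up state vector
q = q_values(state)
best_action = ACTIONS[int(np.argmax(q))]  # act greedily on predicted Q-values
print(q.shape, best_action)
```

In a real DQN the weights would be fitted by minimising the gap between the network's prediction and the Bellman target, which is what the next part of the series covers.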

Stay tuned for the next installment to learn how to train artificial neural networks.