Asset Classes

Free investment financial education

More Campus Resources

Useful Tools and Information

Language

Multilingual content from IBKR

# Financial Data Manipulation in dplyr for Quant Traders – Part VII

###### Posted July 9, 2021 at 11:30 am
Robot James
Robot Wealth

Join Robot James for a tutorial on how to use `group_by()` dplyr verb for financial data manipulation. See Part IPart IIPart IIIPart IV,  Part V and Part VI for instructions on other dplyr verbs.

### The `group_by()` dplyr verb

Summarising the entire data set isn’t always very useful.

Usually, we want to group by a variable, and then summarise that grouped data.

The `group_by()` function tells the `dplyr` verbs to operate on each group one at a time.

#### Use `summarise()` with `group_by()` to calculate mean traded volume for each stock

If we group by ticker, then call summarise, then `dplyr` will preform the summary calculations separately for each ticker. We will get a row for each ticker.

prices %>%
group_by(ticker) %>%
summarise(meanvolume = mean(volume))

#### Use `summarise()` with multiple `group_by` variables, to calculate the mean traded volume for each stock for each year

In this example we:

• calculate a new variable `year`using `mutate()`
• group by ticker and year
• summarise.

library(lubridate)
prices %>%
mutate(year = year(date)) %>%
group_by(ticker, year) %>%
summarise(meanvolume = mean(volume),
obscount = n())

### Using `group_by()` with `mutate()` to do grouped row-level transformations

We can also use `group_by` with `mutate()` to calculate new variables which are calculated separately for a given variable (or set of variables)

You’ll use this nearly every time you do any quant analysis to calculate periodic returns.

#### Using `group_by` with `mutate()` and `lag()` to calculate daily close-to-close returns

prices %>%
group_by(ticker) %>%
arrange(date) %>%
mutate(c2creturns = close / lag(close) – 1)

## Summary

• Every column is variable
• Every row is an observation

You can then easily use `dplyr` to manipulate that data very efficiently.

There are 6 main functions to master in `dplyr`.

• `filter()`picks outs observations (rows) by some filter criteria
• `arrange()` reorders the observations (rows)
• `select()` picks out the variables (columns)
• `mutate()` creates new variables (columns) by applying transformations to existing variables
• `summarise()` allows you to group and summarise data – reducing the data into a grouped summary with fewer rows.

The `group_by()` causes the verbs above to act on a group at a time, rather than the whole dataset.