R Code: Creating Lagged Xs and Y for Supervised Learning

This post presents a short R function that creates several lagged copies of a time series and binds them to the original series. This is a common preprocessing step when preparing time series data for machine learning or deep learning models.

Creating lagged Xs and y

Lagged copies of a time series are commonly used as input variables for supervised machine learning or deep learning models, with the time-t value as the target. The following R code builds a data frame that combines a set of lagged variables with the time-t values of the series.

In the output, the Xs (lagged variables) come first and y comes last, so the Xs can be selected with the column index “1:nk” rather than “2:(nk+1)”, where nk is the number of lags.
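The column layout described above can be sketched with a small hand-built data frame. The name toy, its column names, and its values are illustrative assumptions only. The drop = FALSE argument is an added precaution so that the selection stays a data frame even when there is a single lag.

```r
# Toy layout built by hand: two lag columns first, y last.
# The names toy, x1, x2 and all values are illustrative assumptions.
toy <- data.frame(
    x1 = c(NA, 10, 20, 30),   # y lagged by 1
    x2 = c(NA, NA, 10, 20),   # y lagged by 2
    y  = c(10, 20, 30, 40)    # time-t values
)
nk <- 2                       # number of lagged columns

# Xs first: the inputs are simply columns 1..nk
X <- toy[, 1:nk, drop = FALSE]
# y is the last column
target <- toy[, nk + 1]

# If y came first instead, the inputs would be toy[, 2:(nk + 1)]
```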

# Create lagged Xs and y for supervised learning
graphics.off(); rm(list = ls())
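# function for creating lagged Xs and y
#   y: numeric vector; k: positive integer lags (< length(y));
#   nmx: prefix for the lagged column names
func_lagged_Xs_y <- function(y, k, nmx = "x"){
    nk <- length(k); ny <- length(y)
    # one NA-filled column per lag, then y as the last column
    df <- as.data.frame(matrix(nrow=ny, ncol=nk))
    colnames(df) <- paste0(nmx,k); df$y <- y
    # lag k[i]: row t holds y[t - k[i]]; the first k[i] rows stay NA
    for(i in seq_len(nk))
        df[(1+k[i]):ny,i] <- y[1:(ny-k[i])]
    return(df)
}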
# sample data
y <- c(31.12, 27.95, 30.67, 27.18, 21.89, 19.90, 21.58, 
       18.69, 20.31, 21.89, 19.29, 20.57, 21.57, 22.87, 
       21.01, 18.63, 17.96, 17.68, 17.54, 16.95, 17.33)
# test cases
Xy1 = func_lagged_Xs_y(y, k=1:10)
Xy2 = func_lagged_Xs_y(y, k=c(1,3,5), nmx = "XX")

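As expected, Xy1 contains ten lag columns (x1 to x10) followed by y, and Xy2 contains XX1, XX3, and XX5 followed by y. In each lag-k column the first k rows are NA, because no earlier observations exist for those rows.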
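To inspect the two results, one option is the sketch below. It repeats a compact copy of the function so that it runs on its own, then checks the shape and the first rows of each result. The na.omit() step and the name Xy2_complete are additions for illustration and are not part of the code above. They show one possible way to drop the leading rows that contain NA before model fitting.

```r
# Compact copy of the function so this block runs on its own
func_lagged_Xs_y <- function(y, k, nmx = "x") {
    nk <- length(k); ny <- length(y)
    df <- as.data.frame(matrix(nrow = ny, ncol = nk))
    colnames(df) <- paste0(nmx, k); df$y <- y
    for (i in seq_len(nk))
        df[(1 + k[i]):ny, i] <- y[1:(ny - k[i])]
    df
}

y <- c(31.12, 27.95, 30.67, 27.18, 21.89, 19.90, 21.58,
       18.69, 20.31, 21.89, 19.29, 20.57, 21.57, 22.87,
       21.01, 18.63, 17.96, 17.68, 17.54, 16.95, 17.33)

Xy1 <- func_lagged_Xs_y(y, k = 1:10)
Xy2 <- func_lagged_Xs_y(y, k = c(1, 3, 5), nmx = "XX")

dim(Xy1)              # 21 rows, 11 columns (x1..x10, then y)
colnames(Xy2)         # "XX1" "XX3" "XX5" "y"

# First six rows of Xy2; row 6 is the first row without NA:
# XX1 = 21.89, XX3 = 30.67, XX5 = 31.12, y = 19.90
print(head(Xy2, 6))

# Assumption for illustration: drop incomplete rows before modeling
Xy2_complete <- na.omit(Xy2)
nrow(Xy2_complete)    # 21 observations minus the largest lag (5) = 16
```

Whether to drop the incomplete rows or to impute them depends on the model. Dropping them costs as many observations as the largest lag.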


