In this article, we will explain how to use the arguments names_sep
and names_pattern
of the function pivot_longer()
from the tidyr
package.
First, let’s create some data:
df <- data.frame(jan_2023 = c(1,1), feb_2023 = c(2,2), mar_2023 = c(3,3)) df # jan_2023 feb_2023 mar_2023 #1 1 2 3 #2 1 2 3
The general syntax of pivot_longer()
is:
1. Using the names_sep argument in pivot_longer()
When calling pivot_longer()
, we can create 2 new columns (instead of 1) by splitting the column names of df using the following arguments:
- First we specify that we want 2 columns by setting
names_to = c('month', 'year')
- Next, we set
names_sep = '_'
to split jan_2023, feb_2023, and mar_2023 into month and year - And
names_transform = list(year = as.numeric)
to transform the year column to numeric instead of character
df2 <- df |> pivot_longer(cols = everything(), names_to = c('month','year'), names_sep = '_', values_to = 'sales', names_transform = list(year = as.numeric)) df2 df2 ## A tibble: 6 x 3 # month year sales # <chr> <dbl> <dbl> #1 jan 2023 1 #2 feb 2023 2 #3 mar 2023 3 #4 jan 2023 1 #5 feb 2023 2 #6 mar 2023 3
2. Using the names_pattern argument in pivot_longer()
We can split the columns in df by providing a regular expression to the names_pattern
argument of pivot_longer()
:
Since we want to split the column names of df (jan_2023, feb_2023, and mar_2023) into 2 columns (month and year), the regular expression should contain 2 groups:
df2 <- df |> pivot_longer(cols = everything(), names_to = c('month','year'), names_pattern = '([A-Za-z]+)_([0-9]+)', values_to = 'sales', names_transform = list(year = as.numeric)) df2 ## A tibble: 6 x 3 # month year sales # <chr> <dbl> <dbl> #1 jan 2023 1 #2 feb 2023 2 #3 mar 2023 3 #4 jan 2023 1 #5 feb 2023 2 #6 mar 2023 3