retaining names of levels of variables while melting

Is there any way to retain names of the original levels of variables that are being melted? For example, in the example below, is there any way to get "alpha", "beta", and "gamma" instead of "1", "2", "3".

I can, of course, rename them but the dataset with which I am working with has tons of levels and so renaming them will be time-consuming and error-prone.

Thanks.

library(data.table)
#> Warning: package 'data.table' was built under R version 3.4.2
set.seed(2334)

# define the dataframe
df <-
  as.data.frame(
    cbind(
      a_alpha = rnorm(10),
      a_beta = rnorm(10),
      a_gamma = rnorm(10),
      b_alpha = rnorm(10),
      b_beta = rnorm(10),
      b_gamma = rnorm(10),
      id = c(1:10)
    )
  )

# check the structure of the wide format data
str(df)
#> 'data.frame':    10 obs. of  7 variables:
#>  $ a_alpha: num  -0.118 1.237 0.809 -0.766 -0.592 ...
#>  $ a_beta : num  0.0019 1.0639 2.336 0.9056 0.6449 ...
#>  $ a_gamma: num  0.5485 0.8345 -0.5977 0.0827 0.2754 ...
#>  $ b_alpha: num  0.209 -0.305 0.434 -0.362 0.412 ...
#>  $ b_beta : num  -1.6404 2.8382 0.0661 0.7249 -0.4421 ...
#>  $ b_gamma: num  -0.144 0.964 -0.763 -1.356 0.995 ...
#>  $ id     : num  1 2 3 4 5 6 7 8 9 10

# convert to long format
df_long <- data.table::melt(
  data.table::setDT(df),
  measure = patterns("^a_", "^b_"),
  value.name = c("a", "b"),
  variable.name = "item"
)

# check the structure of the long format data
str(df_long)
#> Classes 'data.table' and 'data.frame':   30 obs. of  4 variables:
#>  $ id  : num  1 2 3 4 5 6 7 8 9 10 ...
#>  $ item: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ a   : num  -0.118 1.237 0.809 -0.766 -0.592 ...
#>  $ b   : num  0.209 -0.305 0.434 -0.362 0.412 ...
#>  - attr(*, ".internal.selfref")=<externalptr>

# structure of item
levels(df_long$item)
#> [1] "1" "2" "3"

# Question: instead of "1" "2" "3", how to get the "item" factor levels to be: "alpha" "beta" "gamma"

Created on 2018-01-12 by the reprex package (v0.1.1.9000).

1 answer

  • answered 2018-01-13 02:36 A5C1D2H2I1M1N2O1R2T1

    The way I've dealt with this in the past is to use factor after melting the data. However, you'll have to probably do some checks to make sure the data and levels are in the correct order.

    Here's an example:

    set.seed(2334)
    df <- data.table(a_alpha = rnorm(10), a_beta = rnorm(10), a_gamma = rnorm(10),
                     b_alpha = rnorm(10), b_beta = rnorm(10), b_gamma = rnorm(10), 
                     id = c(1:10))
    df_mess <- copy(df)
    setcolorder(df_mess, c(1, 7, 6, 4, 2, 5, 3))
    names(df_mess)
    # [1] "b_alpha" "id"      "a_alpha" "a_beta"  "b_beta"  "b_gamma" "a_gamma"
    
    stubs <- c("^a_", "^b_")
    ## assumes all stubs have same number of cols. Easy to modify
    labs <- grep(stubs[1], names(df_mess), value = TRUE) 
    labs <- gsub(paste(stubs, collapse = "|"), "", labs[order(labs)])
    
    out1 <- melt(df, measure.vars = patterns(stubs), value.name = c("a", "b"), 
                 variable.name = "item")[
                   , item := factor(item, labels = labs)][]
    
    out2a <- melt(df_mess, measure.vars = patterns(stubs), value.name = c("a", "b"), 
                  variable.name = "item")[
                    , item := factor(item, labels = labs)][]
    
    out2b <- melt(setcolorder(df_mess, names(df_mess)[order(names(df_mess))]),
         measure.vars = patterns(stubs), value.name = c("a", "b"), 
         variable.name = "item")[
           , item := factor(item, labels = labs)][]
    
    library(compare)
    compare(out1, out2a)
    # FALSE [TRUE, TRUE, TRUE, FALSE]
    compare(out1, out2b)
    # TRUE
    

    I haven't done enough test cases to say with confidence whether using order on the names and the levs is sufficient for all cases, but so far, I haven't found any exceptions.