Doing j on multiple .SDcols groups in data.table

I want to apply a function to different groups of data.table columns, and do this row-wise. For example, in the following data.table, if V2 and V3 were one group, and V3 and V4 were the other one, I want to normalise row-wise group values, such that in every row, values in those columns that are the same group add up to 1.

library(data.table) 
set.seed(11)
DT <- data.table(V1=LETTERS[1:5],
             V2=sample(5),
             V3=sample(5),
             V4=sample(5),
             V5=sample(5))

I can do this for one group of columns, using SDcols, though I have to convert from integer to numeric first, because data.table doesn't do this automatically, performance reasons:

cols <- c("V2","V3")
DT[,paste0("V",2:5)] <- lapply(DT[,paste0("V",2:5)], as.numeric)
DT[, (cols):=(.SD)/sum(.SD), .SDcols=cols, by=1:nrow(DT)]

I also tried playing around with for (j in cols) set, but although this does the job, it looks very clumsy.

for (j in cols) {
 set(DT, j = j, value = DT[[j]] / DT[, base::sum(.SD), .SDcols=cols, by=1:nrow(DT)][,V1]) 
}

Also, I have too many groups of columns in my original data.table to do this repeatedly for each and every one. I've kind of hit the wall with this now, so any suggestion is very welcome.

1 answer

  • answered 2018-01-13 15:52 akrun

    If there are multiple groups, one option would be to melt the dataset into 'long' format as melt can take multiple measure patterns

    Convert the dataset columns to numeric class

    nm1 <- names(DT)[-1]
    DT[, (nm1) := lapply(.SD, as.numeric), .SDcols = nm1]
    

    Then melt and dcast those columns

    dM <- dcast(melt(DT, measure = patterns("V[2-3]", "V[4-5]"))[, 
            lapply(.SD, function(x) x/sum(x)) , V1, .SDcols = value1:value2], 
                   V1~rowid(V1), value.var = c('value1', 'value2'))[, -1, with = FALSE]
    

    and assign the output back to the columns

    DT[, (nm1) := dM][]    
    #   V1        V2        V3        V4        V5
    #1:  A 0.2857143 0.7142857 0.2500000 0.7500000
    #2:  B 0.5000000 0.5000000 0.5000000 0.5000000
    #3:  C 0.5555556 0.4444444 0.7500000 0.2500000
    #4:  D 0.6666667 0.3333333 0.4444444 0.5555556
    #5:  E 0.5000000 0.5000000 0.5555556 0.4444444