Doing j on multiple .SDcols groups in data.table
I want to apply a function to different groups of data.table columns, and do this rowwise. For example, in the following data.table, if V2 and V3 were one group, and V3 and V4 were the other one, I want to normalise rowwise group values, such that in every row, values in those columns that are the same group add up to 1.
library(data.table)
set.seed(11)
DT < data.table(V1=LETTERS[1:5],
V2=sample(5),
V3=sample(5),
V4=sample(5),
V5=sample(5))
I can do this for one group of columns, using SDcols, though I have to convert from integer to numeric first, because data.table doesn't do this automatically, performance reasons:
cols < c("V2","V3")
DT[,paste0("V",2:5)] < lapply(DT[,paste0("V",2:5)], as.numeric)
DT[, (cols):=(.SD)/sum(.SD), .SDcols=cols, by=1:nrow(DT)]
I also tried playing around with for (j in cols) set
, but although this does the job, it looks very clumsy.
for (j in cols) {
set(DT, j = j, value = DT[[j]] / DT[, base::sum(.SD), .SDcols=cols, by=1:nrow(DT)][,V1])
}
Also, I have too many groups of columns in my original data.table to do this repeatedly for each and every one. I've kind of hit the wall with this now, so any suggestion is very welcome.
1 answer

If there are multiple groups, one option would be to
melt
the dataset into 'long' format asmelt
can take multiplemeasure
patternsConvert the dataset columns to
numeric
classnm1 < names(DT)[1] DT[, (nm1) := lapply(.SD, as.numeric), .SDcols = nm1]
Then
melt
anddcast
those columnsdM < dcast(melt(DT, measure = patterns("V[23]", "V[45]"))[, lapply(.SD, function(x) x/sum(x)) , V1, .SDcols = value1:value2], V1~rowid(V1), value.var = c('value1', 'value2'))[, 1, with = FALSE]
and assign the output back to the columns
DT[, (nm1) := dM][] # V1 V2 V3 V4 V5 #1: A 0.2857143 0.7142857 0.2500000 0.7500000 #2: B 0.5000000 0.5000000 0.5000000 0.5000000 #3: C 0.5555556 0.4444444 0.7500000 0.2500000 #4: D 0.6666667 0.3333333 0.4444444 0.5555556 #5: E 0.5000000 0.5000000 0.5555556 0.4444444