How to recode many data frame columns with the same function
I have a data frame like this:
CriterionVar Var1 Var2 Var3
3 0 0 0
1 0 0 0
2 0 0 0
5 0 0 0
I want to recode the values of Var1, Var2, and Var3 based on the value of CriterionVar. In pseudocode, it would be something like this:
for each row
if (CriterionVar.value >= Var1.index) Var1 = 1
if (CriterionVar.value >= Var2.index) Var2 = 1
if (CriterionVar.value >= Var3.index) Var3 = 1
The recoded data frame would look like this:
CriterionVar Var1 Var2 Var3
3 1 1 1
1 1 0 0
2 1 1 0
5 1 1 1
Obviously, that is not the way to get it done because (1) the number of VarN
columns is determined by a data value, and (2) it's just ugly.
Any help is appreciated.
3 answers

For more general values of CriterionVar, you can use outer to construct a logical matrix, which you can then use for indexing like this:
dat[2:4][outer(dat$CriterionVar, seq_along(names(dat)[-1]), ">=")] <- 1
In this example, this returns
dat
  CriterionVar Var1 Var2 Var3
1            3    1    1    1
2            1    1    0    0
3            2    1    1    0
4            5    1    1    1
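To see why the indexing works, it can help to print the intermediate logical matrix before assigning. The following is a minimal sketch that rebuilds the example data by hand; the names var_index and mask are only illustrative and do not appear in the answer.

```r
# Example data from the question, rebuilt by hand (VarN columns start at 0).
dat <- data.frame(CriterionVar = c(3L, 1L, 2L, 5L),
                  Var1 = 0L, Var2 = 0L, Var3 = 0L)

# Positions of the VarN columns within their block: 1, 2, 3.
var_index <- seq_along(names(dat)[-1])

# mask[i, j] is TRUE when CriterionVar in row i is >= j.
mask <- outer(dat$CriterionVar, var_index, ">=")
print(mask)

# Assign 1 wherever the mask is TRUE; other cells keep their 0.
dat[2:4][mask] <- 1
print(dat)
```

The mask has one row per data row and one column per VarN column, which is what lets it index dat[2:4] cell by cell.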
A second method using col, which returns a matrix of column indices, is a tad more direct:
dat[2:4][dat$CriterionVar >= col(dat[-1])] <- 1
and returns the desired result.
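Both one-liners hard-code the column positions 2:4. Since the question notes that the number of VarN columns can vary, here is a sketch that looks the columns up by name instead. It assumes every column to recode is named "Var" followed by digits and that those columns appear in order; var_cols and col_index are hypothetical names introduced for this example.

```r
# Example data from the question, rebuilt by hand.
dat <- data.frame(CriterionVar = c(3L, 1L, 2L, 5L),
                  Var1 = 0L, Var2 = 0L, Var3 = 0L)

# Assumption: columns to recode are named "Var" plus digits.
var_cols <- grep("^Var[0-9]+$", names(dat))

# col() labels each cell with its column number inside the VarN block.
col_index <- col(dat[var_cols])

# CriterionVar is recycled down each column, so each row is compared
# against the column numbers 1, 2, 3, ...
dat[var_cols][dat$CriterionVar >= col_index] <- 1
print(dat)
```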
data
dat <- structure(list(CriterionVar = c(3L, 1L, 2L, 5L),
                      Var1 = c(0L, 0L, 0L, 0L),
                      Var2 = c(0L, 0L, 0L, 0L),
                      Var3 = c(0L, 0L, 0L, 0L)),
                 .Names = c("CriterionVar", "Var1", "Var2", "Var3"),
                 class = "data.frame", row.names = c(NA, -4L))

df[,-1] = lapply(2:NCOL(df), function(i) as.numeric(df[,1] >= (i-1)))
df
#  CriterionVar Var1 Var2 Var3
#1            3    1    1    1
#2            1    1    0    0
#3            2    1    1    0
#4            5    1    1    1
DATA
df = structure(list(CriterionVar = c(3L, 1L, 2L, 5L),
                    Var1 = c(1, 1, 1, 1),
                    Var2 = c(1, 0, 1, 1),
                    Var3 = c(1, 0, 0, 1)),
               .Names = c("CriterionVar", "Var1", "Var2", "Var3"),
               row.names = c(NA, -4L), class = "data.frame")
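The lapply call builds one 0/1 vector per VarN column. The column in position i of the data frame is compared against i - 1, because the first VarN column sits in position 2. A small sketch, starting from all-zero columns as in the question; the intermediate name recoded is only for illustration.

```r
# Start from the question's layout: VarN columns all zero.
df <- data.frame(CriterionVar = c(3L, 1L, 2L, 5L),
                 Var1 = 0, Var2 = 0, Var3 = 0)

# One comparison per VarN column; position i maps to threshold i - 1.
recoded <- lapply(2:NCOL(df), function(i) as.numeric(df[, 1] >= (i - 1)))
str(recoded)

# Assigning a list to df[, -1] fills the remaining columns in order.
df[, -1] <- recoded
print(df)
```

Because the loop runs over 2:NCOL(df), it adapts to however many VarN columns follow the first column.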

I'm a big proponent of vapply: it's fast, and you know the shape of what it'll return. The only problem is the resulting matrix is usually the "sideways" version of what you want. But t() fixes that easily enough.
n_var_cols <- 3
truncated_criterion <- pmin(dat[["CriterionVar"]], n_var_cols)
row_template <- rep_len(0, n_var_cols)
replace_up_to_index <- function(index) {
  replace(row_template, seq_len(index), 1)
}
over_matrix <- vapply(
  X = truncated_criterion,
  FUN = replace_up_to_index,
  FUN.VALUE = row_template
)
over_matrix <- t(over_matrix)
dat[, -1] <- over_matrix
dat
#   CriterionVar Var1 Var2 Var3
# 1            3    1    1    1
# 2            1    1    0    0
# 3            2    1    1    0
# 4            5    1    1    1
There was some bookkeeping in the first three lines, but nothing too bad. I used pmin() to restrict the criteria values to be no greater than the number of VarN columns.
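To illustrate why the cap matters, here is a short sketch of what replace() does when the index runs past the template. The names grown and capped are hypothetical, chosen for this example.

```r
# Template row with one slot per VarN column.
row_template <- rep_len(0, 3)

# Without the cap, an index of 5 grows the vector beyond three elements,
# so it no longer matches the shape vapply expects from FUN.VALUE.
grown <- replace(row_template, seq_len(5), 1)
length(grown)

# With pmin(), the index is held to the number of VarN columns.
capped <- replace(row_template, seq_len(pmin(5, 3)), 1)
length(capped)
```

In the example data this only affects the last row, where CriterionVar is 5 but there are three VarN columns.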