How to recode many data frame columns with the same function

I have a data frame like this:

CriterionVar Var1 Var2 Var3
3            0    0    0
1            0    0    0
2            0    0    0
5            0    0    0 

I want to recode the values of Var1, Var2, and Var3 based on the value of CriterionVar. In pseudocode, where VarN.index is the number N in the column name, it would be something like this:

for each row
   if (CriterionVar.value >= Var1.index) Var1 = 1
   if (CriterionVar.value >= Var2.index) Var2 = 1
   if (CriterionVar.value >= Var3.index) Var3 = 1

The recoded data frame would look like this:

CriterionVar Var1 Var2 Var3
3            1    1    1
1            1    0    0
2            1    1    0
5            1    1    1

Obviously, writing one line per column is not the way to get it done because (1) the number of VarN columns is determined by a data value, and (2) it's just ugly.

Any help is appreciated.

3 answers

  • answered 2017-08-16 19:25 lmo

    For more general values of CriterionVar, you can use outer to construct a logical matrix and then index with it, like this:

    dat[2:4][outer(dat$CriterionVar, seq_along(names(dat)[-1]), ">=")] <- 1
    

    In this example, this returns

    dat
      CriterionVar Var1 Var2 Var3
    1            3    1    1    1
    2            1    1    0    0
    3            2    1    1    0
    4            5    1    1    1
    
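
    To see what the outer call builds before it is used as an index, here is a minimal sketch on the question's data. The names var_positions and mask are illustrative only, and dat[-1] stands in for dat[2:4] on the assumption that CriterionVar is the first column.

    ```r
    # Same data as in the question, rebuilt by hand so the sketch is self-contained.
    dat <- data.frame(CriterionVar = c(3L, 1L, 2L, 5L),
                      Var1 = 0L, Var2 = 0L, Var3 = 0L)

    # Positions of the VarN columns, counted from 1 (illustrative name).
    var_positions <- seq_len(ncol(dat) - 1)

    # One row per data row, one column per VarN column:
    # entry [i, j] is TRUE when CriterionVar[i] >= j.
    mask <- outer(dat$CriterionVar, var_positions, ">=")
    print(mask)

    # Index the VarN columns with the mask and overwrite only the TRUE cells.
    dat[-1][mask] <- 1L
    print(dat)
    ```

    Because the mask has the same shape as dat[-1], the assignment only touches cells where the criterion reaches the column's position; everything else keeps its original 0.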

    A second method uses col, which returns a matrix of column indices, and is a bit more direct:

    dat[2:4][dat$CriterionVar >= col(dat[-1])] <- 1
    

    and returns the desired result.
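
    Since the question says the number of VarN columns is not fixed, the col approach can be wrapped so that 2:4 is not hard-coded. This is only a sketch: recode_by_criterion is a hypothetical helper, and it assumes every column other than the criterion is a VarN column stored in index order.

    ```r
    # Hypothetical helper: recode all non-criterion columns using col().
    recode_by_criterion <- function(data, criterion = "CriterionVar") {
      # Every column except the criterion is treated as a VarN column.
      var_cols <- setdiff(names(data), criterion)
      # Matrix with the same shape as the VarN block; each cell holds its column index.
      idx <- col(data[var_cols])
      # The criterion vector is recycled down each column of idx.
      data[var_cols][data[[criterion]] >= idx] <- 1L
      data
    }

    # Wider example with four VarN columns (made-up values for illustration).
    wide <- data.frame(CriterionVar = c(0L, 4L, 2L),
                       Var1 = 0L, Var2 = 0L, Var3 = 0L, Var4 = 0L)
    recoded <- recode_by_criterion(wide)
    print(recoded)
    ```

    Selecting the columns by name rather than by position means the helper should also cope with a criterion column that is not first, as long as the VarN columns themselves stay in order.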


    data

    dat <-
    structure(list(CriterionVar = c(3L, 1L, 2L, 5L), Var1 = c(0L, 
    0L, 0L, 0L), Var2 = c(0L, 0L, 0L, 0L), Var3 = c(0L, 0L, 0L, 0L
    )), .Names = c("CriterionVar", "Var1", "Var2", "Var3"), class = "data.frame",
    row.names = c(NA, -4L))
    

  • answered 2017-08-16 19:25 d.b

    # Column i of df is Var(i-1): set it to 1 where CriterionVar >= i - 1, else 0
    df[,-1] = lapply(2:NCOL(df), function(i) as.numeric(df[,1] >= (i-1)))
    df
    #  CriterionVar Var1 Var2 Var3
    #1            3    1    1    1
    #2            1    1    0    0
    #3            2    1    1    0
    #4            5    1    1    1
    

    DATA

    df = structure(list(CriterionVar = c(3L, 1L, 2L, 5L), Var1 = c(0, 
    0, 0, 0), Var2 = c(0, 0, 0, 0), Var3 = c(0, 0, 0, 0)), .Names = c("CriterionVar", 
    "Var1", "Var2", "Var3"), row.names = c(NA, -4L), class = "data.frame")
    

  • answered 2017-08-16 19:25 Nathan Werth

    I'm a big proponent of vapply: it's fast, and you know the shape of what it'll return. The only problem is that the resulting matrix is usually the "sideways" version of what you want (one column per input element). But t() fixes that easily enough.

    n_var_cols <- 3  # number of VarN columns
    truncated_criterion <- pmin(dat[["CriterionVar"]], n_var_cols)  # cap at n_var_cols
    row_template <- rep_len(0, n_var_cols)  # all-zero row, also the vapply template
    
    replace_up_to_index <- function(index) {
      replace(row_template, seq_len(index), 1)
    }
    
    over_matrix <- vapply(
      X         = truncated_criterion,
      FUN       = replace_up_to_index,
      FUN.VALUE = row_template
    )
    over_matrix <- t(over_matrix)
    
    dat[, -1] <- over_matrix
    dat
    #   CriterionVar Var1 Var2 Var3
    # 1            3    1    1    1
    # 2            1    1    0    0
    # 3            2    1    1    0
    # 4            5    1    1    1
    

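    There was some bookkeeping in the first three lines, but nothing too bad. I used pmin() to restrict the criterion values to be no greater than the number of VarN columns. Without it, a criterion of 5 would make replace() return a length-5 vector, and vapply() would stop because the result no longer matches FUN.VALUE.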
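
    If the number of VarN columns is not known in advance, the same steps can be wrapped so the count comes from the data. This is a rough sketch, not a drop-in replacement: recode_with_vapply is a hypothetical name, and it assumes CriterionVar is the first column, holds non-negative whole numbers, and that there are at least two VarN columns (with only one, vapply returns a plain vector and t() gives the wrong shape).

    ```r
    # Hypothetical wrapper around the vapply approach above.
    recode_with_vapply <- function(data) {
      # Derive the column count instead of hard-coding it.
      n_var_cols <- ncol(data) - 1L
      # Cap each criterion so replace() never grows the template.
      truncated_criterion <- pmin(data[[1]], n_var_cols)
      row_template <- rep_len(0, n_var_cols)

      # Each call returns one recoded row; vapply stacks them as columns.
      over_matrix <- vapply(
        X         = truncated_criterion,
        FUN       = function(index) replace(row_template, seq_len(index), 1),
        FUN.VALUE = row_template
      )

      # Transpose so rows of the matrix line up with rows of the data frame.
      data[, -1] <- t(over_matrix)
      data
    }

    dat <- data.frame(CriterionVar = c(3L, 1L, 2L, 5L),
                      Var1 = 0L, Var2 = 0L, Var3 = 0L)
    result <- recode_with_vapply(dat)
    print(result)
    ```

    Taking the count from ncol() keeps the pmin() cap and the template length in step with whatever columns the data frame actually has.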