Apply data frame with list-variable of multivariable functions to a data frame with function arguments

This dataframe contains what I'll call the "data":

library(tidyverse)
df_d <- data_frame(key = c("cat", "cat", "dog", "dog"), 
               value_1 = c(1,2,3,4), 
               value_2 = c(2,4,6,8))

Here is a dataframe that I intend to use as something like a function look-up table. f holds single-variable functions and f2 holds multivariable functions (each takes a length-2 vector x and uses x[1] and x[2]):

df_f <- data_frame(key = c("cat", "dog"),
               f = c(function(x) x^2, function(x) sqrt(x)),
               f2 = c(function(x) (x[1]+x[2])^2, function(x) sqrt(x[1]+x[2])))

I can easily make a dataframe so that any cat row gets the cat functions and any dog row gets the dog functions:

df_both <- left_join(df_d, df_f)

I was able to figure out how to apply each of the f functions to, say, the value_1 column to get:

df_both %>% mutate(result = invoke_map_dbl(f, value_1))        
#> # A tibble: 4 x 6
#>   key   value_1 value_2 f      f2     result
#>   <chr>   <dbl>   <dbl> <list> <list>  <dbl>
#> 1 cat      1.00    2.00 <fn>   <fn>     1.00
#> 2 cat      2.00    4.00 <fn>   <fn>     4.00
#> 3 dog      3.00    6.00 <fn>   <fn>     1.73
#> 4 dog      4.00    8.00 <fn>   <fn>     2.00

My question is: how can I create a column result2 that applies each function in f2 to the input c(value_1, value_2)? If redefining the functions in f2 to be explicit functions of two variables makes things much easier, that's fine too.

Desired output:

#> # A tibble: 4 x 7
#>   key   value_1 value_2 f      f2     result result2
#>   <chr>   <dbl>   <dbl> <list> <list>  <dbl>   <dbl>
#> 1 cat      1.00    2.00 <fn>   <fn>     1.00    9.00
#> 2 cat      2.00    4.00 <fn>   <fn>     4.00   36.0 
#> 3 dog      3.00    6.00 <fn>   <fn>     1.73    3.00
#> 4 dog      4.00    8.00 <fn>   <fn>     2.00    3.46

(Question motivated by an unfortunately self-deleted question from earlier today.)

2 answers

  • answered 2018-03-13 21:23 akrun

    We could use pmap

    df_both %>% 
       mutate(result = invoke_map_dbl(f, value_1), 
              result2 = pmap_dbl(.[c('value_1', 'value_2', 'f2')],  ~(..3)(c(..1, ..2))))
    # A tibble: 4 x 7
    #   key   value_1 value_2 f      f2     result result2
    #   <chr>   <dbl>   <dbl> <list> <list>  <dbl>   <dbl>
    #1 cat      1.00    2.00 <fun>  <fun>    1.00    9.00
    #2 cat      2.00    4.00 <fun>  <fun>    4.00   36.0 
    #3 dog      3.00    6.00 <fun>  <fun>    1.73    3.00
    #4 dog      4.00    8.00 <fun>  <fun>    2.00    3.46
    

    Here, the OP's functions are used unchanged: for each row, the f2 function (..3) is called on the vector c(value_1, value_2), i.e. c(..1, ..2).
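
    A minimal sketch of the same idea with named arguments, which may be easier to read than ..1, ..2 and ..3. The argument names fn, a and b are made up for illustration, tibble() and list() stand in for the OP's data_frame() and c(), and only the f2 column is rebuilt so the snippet runs on its own:

    ```r
    library(dplyr)
    library(purrr)

    # The OP's data, rebuilt so the example is self-contained
    df_d <- tibble(key = c("cat", "cat", "dog", "dog"),
                   value_1 = c(1, 2, 3, 4),
                   value_2 = c(2, 4, 6, 8))

    # The OP's f2: each function takes one length-2 vector
    df_f <- tibble(key = c("cat", "dog"),
                   f2 = list(function(x) (x[1] + x[2])^2,
                             function(x) sqrt(x[1] + x[2])))

    df_both <- left_join(df_d, df_f, by = "key")

    # pmap_dbl() walks the three columns in parallel, one row at a time,
    # and passes the row's function and values positionally to fn, a, b
    res <- df_both %>%
      mutate(result2 = pmap_dbl(list(f2, value_1, value_2),
                                function(fn, a, b) fn(c(a, b))))

    res$result2
    ```

    Passing the columns as an explicit list also avoids relying on the . placeholder inside mutate.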

  • answered 2018-03-13 21:23 Axeman

    "If re-defining the functions in f2 to be explicitly functions of two variables makes things much easier, that's fine too."

    Yes, that would be a more natural situation here, I think. Otherwise data is stored rowwise, and should possibly be reshaped.

    Redefining your functions:

    df_f <- data_frame(key = c("cat", "dog"),
                       f = c(function(x) x^2, function(x) sqrt(x)),
                       f2 = c(function(x, y) (x + y)^2, function(x, y) sqrt(x + y)))
    df_both <- left_join(df_d, df_f)
    

    Now you can again use invoke_map_dbl, passing .x as a list, although you need to turn the lists inside out using transpose:

    mutate(
      df_both,
      result  = invoke_map_dbl(f, value_1),
      result2 = invoke_map_dbl(f2, transpose(list(value_1, value_2)))
    )
    
    # A tibble: 4 x 7
      key   value_1 value_2 f      f2     result result2
      <chr>   <dbl>   <dbl> <list> <list>  <dbl>   <dbl>
    1 cat        1.      2. <fn>   <fn>     1.00    9.00
    2 cat        2.      4. <fn>   <fn>     4.00   36.0 
    3 dog        3.      6. <fn>   <fn>     1.73    3.00
    4 dog        4.      8. <fn>   <fn>     2.00    3.46
    

    A set of three-argument functions would then simply extend to invoke_map_dbl(f3, transpose(list(value_1, value_2, value_3))).
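
    As far as I know, the invoke_map family is deprecated in purrr 1.0.0 and later, so here is a hedged sketch of the same row-wise call written with pmap_dbl instead. It assumes the two-argument f2 functions defined above; fn, x and y are illustrative names, and tibble() stands in for data_frame():

    ```r
    library(dplyr)
    library(purrr)

    df_d <- tibble(key = c("cat", "cat", "dog", "dog"),
                   value_1 = c(1, 2, 3, 4),
                   value_2 = c(2, 4, 6, 8))

    # f2 redefined as explicit two-argument functions
    df_f <- tibble(key = c("cat", "dog"),
                   f2 = list(function(x, y) (x + y)^2,
                             function(x, y) sqrt(x + y)))

    df_both <- left_join(df_d, df_f, by = "key")

    # One call per row: the row's function is applied to the row's two values,
    # so no transpose() step is needed
    res <- df_both %>%
      mutate(result2 = pmap_dbl(list(f2, value_1, value_2),
                                function(fn, x, y) fn(x, y)))

    res$result2
    ```

    A three-argument version would add value_3 to the list and a third parameter to the anonymous function.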

    Note that this kind of approach will not work well on large datasets, since you aren't using vectorization.

    A more scalable alternative may involve nesting, where you at least apply each function once within each group:

    df_both %>% 
      group_by(key) %>% 
      nest() %>% 
      mutate(data = map(
        data, 
        ~mutate(., result = first(f)(value_1), result2 = first(f2)(value_1, value_2))
        )) %>% 
      unnest()
    

    Which gives the same result.
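
    If nest and unnest feel heavy, a grouped mutate is one possible shortcut for the same "one call per group" idea. This is only a sketch, and it assumes that every row within a key carries the same function and that the functions are vectorized over x and y; f2[[1]] simply picks the group's first function:

    ```r
    library(dplyr)

    df_d <- tibble(key = c("cat", "cat", "dog", "dog"),
                   value_1 = c(1, 2, 3, 4),
                   value_2 = c(2, 4, 6, 8))

    # Two-argument, vectorized versions of f2
    df_f <- tibble(key = c("cat", "dog"),
                   f2 = list(function(x, y) (x + y)^2,
                             function(x, y) sqrt(x + y)))

    df_both <- left_join(df_d, df_f, by = "key")

    # Within each key, call the group's function once on whole columns
    res <- df_both %>%
      group_by(key) %>%
      mutate(result2 = f2[[1]](value_1, value_2)) %>%
      ungroup()

    res$result2
    ```

    With two groups this makes two function calls instead of four, and the saving grows with the number of rows per key.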