# Rolling sum from a certain position in a data frame in R

Say I have the following data, dat1;

``````width  from  by
2      1     A
3      1     A
2      2     A
3      2     A
2      1     B
3      1     B
2      2     B
3      2     B
``````

``````x      pos   by
4      1     A
5      2     A
7      3     A
3      4     A
2      1     B
4      2     B
3      3     B
5      4     B
``````

Say I want to create a new column on dat1 of rolling sum values from dat2 where;

1. Our width of this rolling sum is equivalent to the width given in that row

2. Our starting position is equivalent to the from vector value in that row

3. We wish to do it for the A or Bth factor depending on which level is in the row

So far I have that we want

``````rollapply(x = dat2\$x, width = dat1\$width, FUN = sum, align = "left", data = dat2)
``````

So I need to incorporate in the starting position and the factor level for that starting position.

So in this instance I want to get

``````width  from  by   RS
2      1     A    9
3      1     A    16
2      2     A    12
3      2     A    15
``````

etc

Any help would be greatly appreciated. Thanks

1) For each row `i` in `dat1` the anonymous function subsets dat2 to the `by` value in `dat1` and from that picks out the indicated rows of `x` and sums them:

``````transform(dat1, RS = sapply(1:nrow(dat1), function(i)
sum(subset(dat2, dat1\$by[i] == by)[seq(from[i], length = width[i]), "x"])))
``````

giving:

``````  width from by RS
1     2    1  A  9
2     3    1  A 16
3     2    2  A 12
4     3    2  A 15
5     2    1  B  6
6     3    1  B  9
7     2    2  B  7
8     3    2  B 12
``````

2) An alternative would be to calculate the start values and widths for the sequences to sum in `dat2` and then apply that:

``````st <- match(dat1\$by, dat2\$by) + dat1\$from - 1
w <- dat1\$width
Sum <- function(st, w) sum(dat2[seq(st, length = w), "x"])
transform(dat1, RS = mapply(Sum, st, w))
``````

giving:

``````  width from by RS
1     2    1  A  9
2     3    1  A 16
3     2    2  A 12
4     3    2  A 15
5     2    1  B  6
6     3    1  B  9
7     2    2  B  7
8     3    2  B 12
``````

## Note

`dat1` and `dat2` in reproducible form are:

``````Lines1 <- "
width  from  by
2      1     A
3      1     A
2      2     A
3      2     A
2      1     B
3      1     B
2      2     B
3      2     B"

Lines2 <- "
x      pos   by
4      1     A
5      2     A
7      3     A
3      4     A
2      1     B
4      2     B
3      3     B
5      4     B"
``````

## Update

Another option could be using `dplyr` and `join`. The approach would be `join` two dataframes by "by". Then apply `filter` to consider only those rows which `pos` is between `from` and `from+width`. Finally take sum of `x` column.

``````dat1 %>% inner_join(dat2, by = "by") %>%
filter(from <= pos & pos < (from + width) ) %>%
group_by(by, from, width ) %>%
summarise(RS = sum(x)) %>%
select(width, from, by, RS)

# A tibble: 8 x 4
# Groups: by, from [4]
# width  from by       RS
# <int> <int> <chr> <int>
# 1     2     1 A         9
# 2     3     1 A        16
# 3     2     2 A        12
# 4     3     2 A        15
# 5     2     1 B         6
# 6     3     1 B         9
# 7     2     2 B         7
# 8     3     2 B        12
``````

data

``````dat1 <- read.table(text =
"width  from  by
2      1     A
3      1     A
2      2     A
3      2     A
2      1     B
3      1     B
2      2     B
3      2     B", header = TRUE, stringsAsFactors = FALSE)