1-year rolling mean in pandas on a date column

I would like to compute the 1-year rolling average for each row of the DataFrame below:


index   id      date        variation
2313    7034    2018-03-14  4.139148e-06
2314    7034    2018-03-13  4.953194e-07
2315    7034    2018-03-12  2.854749e-06
2316    7034    2018-03-09  3.907458e-06
2317    7034    2018-03-08  1.662412e-06
2318    7034    2018-03-07  1.346433e-06
2319    7034    2018-03-06  8.731700e-06
2320    7034    2018-03-05  7.145597e-06
2321    7034    2018-03-02  4.893283e-06

For example, I would need to calculate:

  • the mean of variation for id 7034 between 2018-03-14 and 2017-08-14
  • the mean of variation for id 7034 between 2018-03-13 and 2017-08-13
  • etc.
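To pin down the target, here is a brute-force sketch of the calculation described above: for each row, the mean variation over rows with the same id whose date falls in the preceding 365 days. The 365-day window and the half-open interval (date - 365 days, date] are assumptions (they match the default closing behaviour of pandas offset windows); the bullet examples span a shorter range, so adjust `window` if that was intended. `trailing_mean` is a hypothetical helper name.

```python
import pandas as pd

# Rows copied from the question (id 7034 only)
df = pd.DataFrame({
    "id": [7034] * 9,
    "date": pd.to_datetime([
        "2018-03-14", "2018-03-13", "2018-03-12", "2018-03-09", "2018-03-08",
        "2018-03-07", "2018-03-06", "2018-03-05", "2018-03-02",
    ]),
    "variation": [4.139148e-06, 4.953194e-07, 2.854749e-06, 3.907458e-06,
                  1.662412e-06, 1.346433e-06, 8.731700e-06, 7.145597e-06,
                  4.893283e-06],
})

def trailing_mean(frame, window=pd.Timedelta(days=365)):
    """Hypothetical helper: for each row, average `variation` over rows with
    the same id whose date lies in (date - window, date]. O(n^2), reference only."""
    out = []
    for _, row in frame.iterrows():
        mask = (
            (frame["id"] == row["id"])
            & (frame["date"] > row["date"] - window)
            & (frame["date"] <= row["date"])
        )
        out.append(frame.loc[mask, "variation"].mean())
    return pd.Series(out, index=frame.index, name="variation_1y")

df["variation_1y"] = trailing_mean(df)
print(df)
```

This is only useful for checking results on small data; the rolling-based approaches scale much better.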

I tried:


but I got the error message:

ValueError: invalid on specified as date, must be a column (if DataFrame) or None

How can I use the pandas rolling() function in this case?
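For reference, that ValueError is raised when the name passed to `on` is not a column of the object `.rolling()` is called on (for example a Series, or a DataFrame whose date has already been moved into the index). A minimal sketch on made-up single-id data, assuming date is kept as a datetime column and a year is approximated as "365D":

```python
import pandas as pd

# Made-up rows for a single id (values chosen so the means are easy to check)
df = pd.DataFrame({
    "id": [7034, 7034, 7034, 7034],
    "date": ["2017-01-10", "2017-06-01", "2018-01-05", "2018-03-14"],
    "variation": [1.0, 2.0, 3.0, 4.0],
})

# `on` must name a datetime-like column of the DataFrame being rolled
df["date"] = pd.to_datetime(df["date"])
# Offset-based windows such as "365D" need the dates in ascending order
df = df.sort_values("date")

# Works: "date" is a column of df itself
df["variation_1y"] = df.rolling("365D", on="date")["variation"].mean()
print(df)

# Would raise a ValueError like the one in the question (wording varies by
# pandas version): the Series df["variation"] has no "date" column
# df["variation"].rolling("365D", on="date").mean()
```

With several ids, the same idea has to be applied per group, which is what the answer below does.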

1 answer

  • answered 2018-03-20 15:20 sacul

    I believe this should work for you:

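    import pandas as pd

    # First make sure that `date` is a datetime object:
    df['date'] = pd.to_datetime(df['date'])
    # `freq` conforms (resamples) the data to yearly periods before rolling;
    # it was deprecated and later removed, so this only runs on older pandas
    df.set_index('date').groupby('id').rolling(window=1, freq='Y').mean()['variation']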

    Using pd.DataFrame.rolling with datetime-based windows works best when the date is the index, which is why I used df.set_index('date') (as in one of the examples in the documentation).

    I can't really test whether it computes the yearly average on your example dataframe, since it contains only one year and only one ID, but it should work.
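    Since the sample has only one id and one year, here is a sketch on made-up data with two ids spanning two years. It swaps `freq` for an offset-based window, assuming "one year" means 365 days ("365D"); rolling() only accepts fixed-length offsets, so a calendar alias such as 'Y' is rejected there. This is an illustration, not the answer's original code.

```python
import pandas as pd

# Made-up data: two ids spanning more than one year
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2017-01-10", "2017-06-01", "2018-03-14",
                            "2017-12-31", "2018-01-01"]),
    "variation": [1.0, 2.0, 4.0, 10.0, 20.0],
})

rolled = (
    df.set_index("date")       # offset windows use the DatetimeIndex
      .sort_index()            # dates must be monotonic within each id
      .groupby("id")["variation"]
      .rolling("365D")         # trailing window (date - 365 days, date]
      .mean()
)
# `rolled` is indexed by (id, date), one value per original row
print(rolled)
```

    For id 1 on 2018-03-14, the 2017-01-10 row falls outside the window, so the mean is (2.0 + 4.0) / 2 = 3.0.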

    Arguably Better Solution:

    [EDIT] As pointed out by Mihai-Andrei Dinculescu, freq is now a deprecated argument. Here is an alternative (and probably more future-proof) way to do what you're looking for:


    See the resample documentation for more details on how this works, and the pandas documentation on offset aliases for the accepted frequency strings.
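    As a sketch of how resample behaves on this kind of data (made-up values; the 'YS' year-start alias is an assumption, not necessarily what the answer used), grouping by id and resampling gives one mean per id per calendar year, rather than a trailing one-year mean for every row:

```python
import pandas as pd

# Made-up data: two ids spanning two calendar years
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2017-01-10", "2017-06-01", "2018-03-14",
                            "2017-12-31", "2018-01-01"]),
    "variation": [1.0, 2.0, 4.0, 10.0, 20.0],
})

yearly = (
    df.set_index("date")           # resample needs a DatetimeIndex
      .groupby("id")["variation"]
      .resample("YS")              # calendar-year bins labelled by year start
      .mean()
)
# `yearly` is indexed by (id, year start), one value per id per year
print(yearly)
```

    Pick this when calendar-year averages are what you need; for a value on every row, an offset-based rolling window is the closer fit.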