1 Year Rolling mean pandas on column date
I would like to compute the 1-year rolling average for each row of the DataFrame below:
    index    id        date     variation
    2313   7034  2018-03-14  4.139148e-06
    2314   7034  2018-03-13  4.953194e-07
    2315   7034  2018-03-12  2.854749e-06
    2316   7034  2018-03-09  3.907458e-06
    2317   7034  2018-03-08  1.662412e-06
    2318   7034  2018-03-07  1.346433e-06
    2319   7034  2018-03-06  8.731700e-06
    2320   7034  2018-03-05  7.145597e-06
    2321   7034  2018-03-02  4.893283e-06
    ...
For example, I would need to calculate:
- mean of variation of 7034 between 2018-03-14 and 2017-08-14
- mean of variation of 7034 between 2018-03-13 and 2017-08-13
but I got the error message:
ValueError: invalid on specified as date, must be a column (if DataFrame) or None
How can I use the pandas `rolling()` function in this case?
I believe this should work for you:
    # First make sure that `date` is a datetime object:
    df['date'] = pd.to_datetime(df['date'])

    df.set_index('date').groupby('id').rolling(window=1, freq='Y').mean()['variation']
`pd.DataFrame.rolling` with datetime works well when the `date` is the index, which is why I used `df.set_index('date')` (as can be seen in one of the documentation's examples).
I can't really test if it works on the year's average on your example dataframe, as there is only one year and only one ID, but it should work.
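As a side note, `rolling()` can also read the dates from a column through its `on` parameter instead of the index. The name passed to `on` has to be an actual column of the frame being rolled, which is presumably what the error message in the question is complaining about. Below is a minimal sketch with made-up values (the `sample` frame and the 2-day window are assumptions chosen only to keep the result easy to check by hand):

```python
import pandas as pd

# Made-up sample rows in the shape of the question's DataFrame.
sample = pd.DataFrame({
    "id": [7034, 7034, 7034],
    "date": ["2018-03-14", "2018-03-13", "2018-03-12"],
    "variation": [4.0, 2.0, 6.0],
})
sample["date"] = pd.to_datetime(sample["date"])

# Offset-based windows need the dates in ascending order.
sample = sample.sort_values("date")

# `on="date"` tells rolling() to use the column instead of the index.
# A 2-day window is used here only so the effect is visible on three rows.
sample["rolling_mean"] = sample.rolling("2D", on="date")["variation"].mean()
```

Each row's window covers that row's date and the preceding day, so a longer offset such as `"365D"` would be the natural replacement for a one-year trailing mean.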
Arguably Better Solution:
[EDIT] As pointed out by Mihai-Andrei Dinculescu, `freq` is now a deprecated argument. Here is an alternative (and probably more future-proof) way to do what you're looking for:
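A minimal sketch of what such an alternative could look like, assuming a trailing offset window (`"365D"`) on a `DatetimeIndex`, grouped by `id`. The sample values and the second id (`7035`) are made up for illustration, and this is not necessarily the exact code the answer had in mind:

```python
import pandas as pd

# Made-up data: two ids, with one row for 7034 more than a year older.
df = pd.DataFrame({
    "id": [7034, 7034, 7034, 7034, 7035, 7035],
    "date": ["2018-03-14", "2018-03-13", "2018-03-12", "2017-03-01",
             "2018-03-14", "2018-03-13"],
    "variation": [4.0, 2.0, 6.0, 10.0, 1.0, 3.0],
})
df["date"] = pd.to_datetime(df["date"])

rolling_mean = (
    df.sort_values(["id", "date"])   # dates must be ascending within each id
      .set_index("date")             # offset windows operate on a DatetimeIndex
      .groupby("id")["variation"]
      .rolling("365D")               # trailing 365-day window ending at each row
      .mean()
)
```

The result is a Series indexed by `(id, date)`. For id 7034, the 2017-03-01 row falls outside the 365-day window of the March 2018 rows, so it does not contribute to their means. Note that `"365D"` is a fixed number of days rather than a calendar year, so leap years are not accounted for.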