Series indicating third weekday in a month

I would like to create a pandas Series, indexed by date, that indicates whether each date is the third Friday of its month.

My idea is to create the Series filled with zeroes first and then change those zeroes to ones wherever the index is the third Friday of a month. This is my approach, which seems to work.

import pandas as pd
import numpy as np

# 8 years of data
dates = pd.date_range("2010-01-01","2017-12-31")

# create series filled with zeroes over those 8 years
mySeries = pd.Series(np.zeros(len(dates)),index=dates)

# iterate over series and change value to 1 if index is a third Friday in a month
# (Series.items() replaces iteritems(), which was removed in pandas 2.0)
for index, value in mySeries.items():
    # the third Friday always falls on day 15..21 of the month
    if index.weekday() == 4 and 14 < index.day < 22:
        mySeries[index] = 1

# if the sum of the series is 96 (1 third Friday per month * 12 months * 8 years) then the solution should be correct
print(sum(mySeries))

I would also be interested in seeing other, simpler solutions.

1 answer

  • answered 2018-03-20 16:13 jezrael

    Use a faster, loop-free solution based on weekday and day: build a boolean mask, convert it to int, and pass it to the Series constructor:

    dates = pd.date_range("2010-01-01","2017-12-31")
    
    days = dates.day
    s1 = pd.Series(((dates.weekday == 4) & (days > 14) & (days < 22)).astype(int), index=dates)
    print (s1.sum())
    96
    
    print (s1.head())
    2010-01-01    0
    2010-01-02    0
    2010-01-03    0
    2010-01-04    0
    2010-01-05    0
    Freq: D, dtype: int32
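
The day-of-month window in the mask works because days 1-7 hold the first occurrence of every weekday, days 8-14 the second, and days 15-21 the third. A minimal sketch that generalizes this to any weekday and occurrence follows; the helper name `nth_weekday_indicator` and its parameters are hypothetical and not part of pandas.

```python
import pandas as pd

def nth_weekday_indicator(dates, weekday, n):
    """Return a 0/1 Series marking the n-th given weekday of each month.

    Hypothetical helper: weekday uses Monday=0 ... Sunday=6, n is 1-based.
    """
    # Days 1-7 map to occurrence 1, days 8-14 to 2, days 15-21 to 3, and so on.
    occurrence = (dates.day - 1) // 7 + 1
    mask = (dates.weekday == weekday) & (occurrence == n)
    return pd.Series(mask.astype(int), index=dates)

dates = pd.date_range("2010-01-01", "2017-12-31")
third_fridays = nth_weekday_indicator(dates, weekday=4, n=3)
print(third_fridays.sum())
```

With `weekday=4` and `n=3` this should reproduce the same mask as `(days > 14) & (days < 22)` combined with the Friday check.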
    

    Timings:

    In [260]: %%timeit
         ...: mySeries = pd.Series(np.zeros(len(dates)),index=dates)
         ...: 
         ...: # iterate over series and change value to 1 if index is a third friday in a month
         ...: for index,value in mySeries.iteritems():
         ...:     if index.weekday() == 4 and 14 < index.day < 22:
         ...:         mySeries[index] = 1
         ...: 
    The slowest run took 5.18 times longer than the fastest. This could mean that an intermediate result is being cached.
    100 loops, best of 3: 2.68 ms per loop
    Compiler time: 0.31 s
    
    In [261]: %%timeit
         ...: days = dates.day
         ...: s1 = pd.Series(((dates.weekday == 4) & (days > 14) & (days < 22)).astype(int), index=dates)
         ...: 
    1000 loops, best of 3: 603 µs per loop
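
Another option, not part of the answer above, is to let pandas generate the third Fridays directly with the week-of-month offset alias `WOM-3FRI` and then test membership. This is only a sketch; the variable names `third_fridays` and `s2` are chosen here for illustration, and it has not been timed against the mask approach.

```python
import pandas as pd

dates = pd.date_range("2010-01-01", "2017-12-31")

# "WOM-3FRI" is the week-of-month alias for the third Friday of each month
third_fridays = pd.date_range("2010-01-01", "2017-12-31", freq="WOM-3FRI")

# mark each daily date that appears among the generated third Fridays
s2 = pd.Series(dates.isin(third_fridays).astype(int), index=dates)
print(s2.sum())
```

The same sanity check as in the question applies: over these 8 years the sum should come out at one per month.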