Function to Combine Several Pandas Series into a Dataframe

I have a Dataframe of financial values (df_price). I compute moving averages on one of the dataframe's columns/series (using a simple function, MA), and then I create a new dataframe (df_indicators), which contains each of the moving averages as a column/series.

I have to do this same basic operation repeatedly for many different indicators and dataframes so I'd like to create a function (call it bundle_indicators) to do it.

Basically, I'd like to call bundle_indicators with three arguments:

  1. A list of the indicator names,
  2. a list (or series) of values for each of the indicators,
  3. df_price so that bundle_indicators can use it's index when creating the dataframe.

I'd like bundle_indicators to return a dataframe with each of the columns/series named after one of the indicators and each row/index representing that indicator's value.

Below is how I currently do it. It runs without errors but I'd like to replace the last paragraph with a function. I've tried everything I can think of but get errors, usually related to passing the arguments. I'd really appreciate anyone's help as I've been at this for quite a while now.

import numpy as np
import pandas as pd


# Create a new dataframe
df_price = pd.DataFrame({
    'Date': ['1993.01.29', '1993.02.01', '1993.02.02', '1993.02.03', '1993.02.04', '1993.02.05', '1993.02.08', '1993.02.09', '1993.02.10', '1993.02.11'], 
    'Open': [43.80, 43.80, 44.05, 44.17, 44.67, 43.80, 44.05, 44.17, 44.67, 44.92], 
    'High': [43.80, 44.05, 44.17, 44.67, 44.92, 43.80, 43.80, 44.05, 44.17, 44.67], 
    'Low': [43.55, 43.80, 43.92, 44.17, 44.55, 43.80, 44.05, 44.17, 44.55, 44.89], 
    'Close': [43.80, 44.05, 44.17, 44.55, 44.89, 43.55, 43.80, 43.92, 44.17, 44.55],
    'Volume': [1007786, 482696, 202220, 531820, 533930, 1007786, 482696, 202220, 531820, 533930]
})


# Moving Average funtion
def MA(lb, frame):
    prices = frame['Close']
    mavg = []

    for i in range(len(prices)):
        if i < lb:
            mavg.append(0)
        else:
            sum_array = prices[(i - lb): (i + 1)] 
            mavg.append(np.mean(sum_array))
    return mavg


# Calculate the moving average for three different lookback periods: 1, 2, 4
mavg_fast = MA(1, df_price)
mavg_med = MA(2, df_price)
mavg_slow = MA(4, df_price)

# Create a new df_indicators dataframe, using df_price's index
# TODO REPLACE THIS WITH A FUNCTION THAT RETURNS A DATAFRAME 
df_indicators = pd.DataFrame({'mavg_fast': mavg_fast}, index = df_price.index)
df_indicators = df_indicators.assign(mavg_med= mavg_med)
df_indicators = df_indicators.assign(mavg_slow = mavg_slow)

print(df_indicators)

2 answers

  • answered 2018-03-20 21:44 MattB

    It looks as though you end up with a list of values the same length as the dataframe. You can assign these values as a new column in the dataframe by simply doing df_price['new_column']=list_of_values This might be what you require.

  • answered 2018-03-20 21:44 Parfait

    Consider building your dataframe with a dictionary comprehension to be passed into DataFrame() call where you iterate through the indicators name and values list elementwise with zip and map keys and values to migrate as columns and rows:

    def bundle_indicators(indicators_name, values_list, df):        
        output = pd.DataFrame({k:v for k,v in zip(indicators_name, values_list)},
                              index=df.index)        
        return output
    
    df_indicators_new = bundle_indicators(['mavg_fast', 'mavg_med', 'mavg_slow'],
                                          [mavg_fast, mavg_med, mavg_slow],
                                          df_price)
    
    print(df_indicators_new)
    #    mavg_fast   mavg_med  mavg_slow
    # 0      0.000   0.000000      0.000
    # 1     43.925   0.000000      0.000
    # 2     44.110  44.006667      0.000
    # 3     44.360  44.256667      0.000
    # 4     44.720  44.536667     44.292
    # 5     44.220  44.330000     44.242
    # 6     43.675  44.080000     44.192
    # 7     43.860  43.756667     44.142
    # 8     44.045  43.963333     44.066
    # 9     44.360  44.213333     43.998
    
    # COMPARISON WITH ORIGINAL OUTPUT
    print(df_indicators.eq(df_indicators_new))
    #    mavg_fast  mavg_med  mavg_slow
    # 0       True      True       True
    # 1       True      True       True
    # 2       True      True       True
    # 3       True      True       True
    # 4       True      True       True
    # 5       True      True       True
    # 6       True      True       True
    # 7       True      True       True
    # 8       True      True       True
    # 9       True      True       True