Looping a function through Pandas dataframe with iterrows

My goal here is to transform the data in one dataframe and output the results to a new dataframe. Here's what I have so far, using a simplified dataframe:

import math
import pandas as pd
data = {'A':[1,4,3,5,7],'B':[0,6,3,0,2],'C':[1,1,3,0,4]} #sample data
df = pd.DataFrame(data) 
transDF = pd.DataFrame() #empty dataframe for results

def Chord(y): #Chord transformation function
    ySUM = sum(a*a for a in y)
    ySUMsqrt = math.sqrt(ySUM)
    yPRIME = []
    for a in y:
        RESULT = a/ySUMsqrt
        yPRIME.append(RESULT)
    return yPRIME

for Yi, row in df.iterrows(): #my attempt at a loop
    Yrow = df.loc[df.index == Yi]
    y = Yrow.values.tolist()
    tfRow = float(Chord(y))
    transDF = transDF.append(tfRow)
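
For reference, `Chord` above implements the chord transformation. Each vector $y = (y_1, \dots, y_n)$ is divided by its Euclidean norm, so for a nonzero $y$ the transformed vector has unit length:

$$
y'_i = \frac{y_i}{\sqrt{\sum_{j=1}^{n} y_j^2}}, \qquad \sum_{i=1}^{n} (y'_i)^2 = 1
$$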

The function itself works if I just feed it a list, but when I try the loop I get an error that says "can't multiply sequence by non-int of type 'list'". I've tried modifying my loop in as many ways as I can think of, but at this point I'm out of ideas. I would greatly appreciate any insight!
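
Here is a minimal sketch of where the error comes from, using the same sample data. `df.loc[df.index == Yi]` selects with a boolean mask and returns a one-row DataFrame, so `.values.tolist()` produces a nested list such as `[[1, 0, 1]]`. Inside `Chord`, each `a` is then the inner list, and `a*a` raises the TypeError. The loop below instead uses the `row` Series that `iterrows()` already yields, collects the results in a plain list, and builds the DataFrame once at the end. `float(Chord(y))` would also fail, because `float()` cannot convert a list. `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0. The compact `Chord` below is a rewrite assumed to be equivalent to the original.

```python
import math

import pandas as pd

data = {'A': [1, 4, 3, 5, 7], 'B': [0, 6, 3, 0, 2], 'C': [1, 1, 3, 0, 4]}
df = pd.DataFrame(data)


def Chord(y):
    # Same computation as the question's Chord: divide each value by the
    # square root of the sum of squares.
    ySUMsqrt = math.sqrt(sum(a * a for a in y))
    return [a / ySUMsqrt for a in y]


# Boolean-mask selection returns a one-row DataFrame, so this is a
# nested list ([[1, 0, 1]]). Passing it to Chord reproduces the TypeError.
nested = df.loc[df.index == 0].values.tolist()
print(nested)

# iterrows() already yields each row as a Series, so use it directly and
# build the result DataFrame once at the end instead of appending.
rows = []
for Yi, row in df.iterrows():
    rows.append(Chord(row.tolist()))

transDF = pd.DataFrame(rows, index=df.index, columns=df.columns)
print(transDF)
```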

2 answers

  • answered 2017-06-17 20:02 Scott Boston

    IIUC, I don't think you need iterrows for this problem.

    import math
    import pandas as pd
    data = {'A':[1,4,3,5,7],'B':[0,6,3,0,2],'C':[1,1,3,0,4]} #sample data
    df = pd.DataFrame(data)
    
    def Chord(y): #Chord transformation function
        ySUM = sum(a*a for a in y)
        ySUMsqrt = math.sqrt(ySUM)
        yPRIME = []
        for a in y:
            RESULT = a/ySUMsqrt
            yPRIME.append(RESULT)
        return yPRIME
    
    transDF = df.apply(Chord)
    print(transDF)
    

    Output:

         A         B        C
    0  0.1  0.000000  0.19245
    1  0.4  0.857143  0.19245
    2  0.3  0.428571  0.57735
    3  0.5  0.000000  0.00000
    4  0.7  0.285714  0.76980
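
    Note that `df.apply` uses `axis=0` by default. `Chord` therefore receives one column at a time, and the output above has each column, not each row, scaled to unit length. A row-wise transform, which is what the question's loop was doing, would need `axis=1`. The sketch below shows that the column-wise result can also be computed without calling a Python function per column. It divides the DataFrame by a Series of per-column norms, which pandas aligns on the column labels.

```python
import math

import numpy as np
import pandas as pd

data = {'A': [1, 4, 3, 5, 7], 'B': [0, 6, 3, 0, 2], 'C': [1, 1, 3, 0, 4]}
df = pd.DataFrame(data)


def Chord(y):
    # Divide each value by the Euclidean norm of y.
    norm = math.sqrt(sum(a * a for a in y))
    return [a / norm for a in y]


# Default axis=0: Chord is called once per column.
via_apply = df.apply(Chord)

# Vectorised equivalent: (df ** 2).sum() is a Series of per-column sums of
# squares, and dividing a DataFrame by a Series aligns on the column labels.
vectorised = df / np.sqrt((df ** 2).sum())

print(np.allclose(via_apply.to_numpy(), vectorised.to_numpy()))  # True
```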
    

  • answered 2017-06-17 20:02 MaxNoe

    Your code is really inefficient. Looping over rows in pandas is almost always unnecessary, and looping over single elements should be even rarer.

    Make use of NumPy's vectorisation!

    import pandas as pd
    import numpy as np
    
    def chord_transform(row):
        # divide the row by its Euclidean norm (square root of the sum of squares)
        return row / np.sqrt(np.sum(row**2))
    
    data = {'A':[1,4,3,5,7],'B':[0,6,3,0,2],'C':[1,1,3,0,4]} #sample data
    df = pd.DataFrame(data)
    df_chord = df.apply(chord_transform, axis=1)
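
    `df.apply(..., axis=1)` still calls a Python function once per row, so it is convenient rather than fully vectorised. Below is a minimal sketch of a fully vectorised row-wise version. It assumes the intended chord transform divides each row by its Euclidean norm, as the question's `Chord` does. `DataFrame.div` with `axis=0` aligns the Series of row norms with the row index. An all-zero row would produce NaN (0 / 0).

```python
import numpy as np
import pandas as pd

data = {'A': [1, 4, 3, 5, 7], 'B': [0, 6, 3, 0, 2], 'C': [1, 1, 3, 0, 4]}
df = pd.DataFrame(data)

# Euclidean norm of every row in one vectorised call: a Series indexed like df.
row_norms = np.sqrt((df ** 2).sum(axis=1))

# div(..., axis=0) aligns row_norms with the row index, so each row is
# divided by its own norm.
df_chord = df.div(row_norms, axis=0)


def chord_transform(row):
    # Row-wise apply version, used here only for comparison.
    return row / np.sqrt(np.sum(row ** 2))


same = np.allclose(df.apply(chord_transform, axis=1).to_numpy(), df_chord.to_numpy())
print(same)  # True

# Every transformed row should now have unit length.
print((df_chord ** 2).sum(axis=1))
```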