Pandas Series from dictionary/ summarized DataFrame

Here is some dummy data to show what my initial table looks like:

import pandas as pd
import numpy as np

df=pd.DataFrame(columns=["Col_A","Col_B","Col_C"])
df.loc['2'] = pd.Series({"Col_A":.001,"Col_C":.4566})
df.loc['3'] = pd.Series({"Col_A":.25, "Col_B":.46})
df.loc['4'] = pd.Series({"Col_A":1, "Col_C":2})
df.loc['5'] = pd.Series({"Col_C":5.5})

Comes out like this:

    Col_A  Col_B   Col_C
2   0.001    NaN   0.4566
3   0.250    0.46   NaN
4   1.000    NaN   2.0000
5     NaN    NaN   5.5000

I would like to create a new dataframe with a summary of the non Null fields like so:

                        Column
Index
  2      'Col_A: 0.001 units; Col_C: 0.4566 units'
  3      'Col_A: 0.250 units; Col_B: 0.46 units'
  4      'Col_A: 1.000 units; Col_C: 2.0000 units'
  5      'Col_C: 5.5000 units'

I got as far as making this:

{'3': [('Col_A', 0.25), ('Col_B', 0.46)], '2': [('Col_A', 0.001), ('Col_C', nan)], '5': [('Col_C', nan)], '4': [('Col_A', 1.0), ('Col_C', nan)]}

But I cant figure out how to put it in the form above.

Thank you for your help.

2 answers

  • answered 2018-03-20 18:32 cᴏʟᴅsᴘᴇᴇᴅ

    Option 1

    1. melt your dataframe (wide-to-long conversion)
    2. Drop NaNs using dropna
    3. combine column names with values using agg + str.join
    4. combine all values by index using groupby + agg + str.join

    (df.reset_index()
       .melt('index')
       .dropna()
       .set_index('index')
       .astype(str)
       .agg(': '.join, 1)
       .add(' units') 
       .groupby(level=0)
       .agg('; '.join)
    )
    
    index
    2    Col_A: 0.001 units; Col_C: 0.4566 units
    3       Col_A: 0.25 units; Col_B: 0.46 units
    4         Col_A: 1.0 units; Col_C: 2.0 units
    5                           Col_C: 5.5 units
    dtype: object
    

    Note that this returns a Series.


    Option 2

    Replacing the melt with stack from Option 1;

    (df.stack()
       .astype(str)
       .reset_index(level=1)
       .agg(': '.join, 1)
       .add(' units')
       .groupby(level=0)
       .agg('; '.join)
    )
    
    2    Col_A: 0.001 units; Col_C: 0.4566 units
    3       Col_A: 0.25 units; Col_B: 0.46 units
    4         Col_A: 1.0 units; Col_C: 2.0 units
    5                           Col_C: 5.5 units
    dtype: object
    

  • answered 2018-03-20 18:32 Wen

    Using stack

    df.apply(lambda x : x.name+":"+x.astype(str)+' units').mask(df.isnull()).stack().groupby(level=0).apply(';'.join)
    Out[856]: 
    2    Col_A:0.001 units;Col_C:0.4566 units
    3       Col_A:0.25 units;Col_B:0.46 units
    4         Col_A:1.0 units;Col_C:2.0 units
    5                         Col_C:5.5 units
    dtype: object