python pandas how to select rows with common string values and find sum, divide each element?
I have a dataframe called Results3 that looks like:
Tag Exp. m/z Intensity a
file1.xls 1000 10000 True
file1.xls 1100 20000 True
file1.xls 1200 30000 True
file2.xls 2000 11000 True
file2.xls 2100 12000 True
file2.xls 2200 13000 True
file2.xls 2300 14000 True
file3.xls 3000 31000 True
file3.xls 3100 123 True
where Tag = str, and Exp. m/z & Intensity are float64.
My goal is to create another column named norm_intensity, where each element in Intensity is divided by the sum of the intensity values for each tag in order to normalize the intensity values. So for example, for the above table Exp. m/z 1000 of file1.xls, the norm_intensity would be: 10000/(10000+20000+30000). Those in file2.xls would be divided by the sum of their intensity values just from file2.xls group.
I tried finding the sum by:
groups_sum = results3.groupby(results3['Tag'])
Intensity_sum = groups_sum.agg({'Intensity':sum})
But then I can't seem to figure out how to go from here. Eventually, I will be binning results3 by:
bins = np.arange(900, 3000, 1)
groups = results3.groupby([np.digitize(results3['Exp. m/z'], bins), 'Tag'])
I hope to have results3 in this format before I do the binning:
Tag Exp. m/z Intensity Norm_Intensity
file1.xls 1000 10000 0.1666
file1.xls 1100 20000 0.3333
file1.xls 1200 30000 0.5
file2.xls 2000 11000 0.22
file2.xls 2100 12000 0.24
file2.xls 2200 13000 0.26
file2.xls 2300 14000 0.28
file3.xls 3000 31000 0.9960
file3.xls 3100 123 0.00395
1 answer

Let's use
groupby
,transform
andsum
:Result3.assign(Norm_Intensity=Result3.groupby('Tag')['Intensity'].transform(lambda x: x/x.sum()))
Output:
Tag Exp. m/z Intensity a Norm_Intensity 0 file1.xls 1000 10000 True 0.166667 1 file1.xls 1100 20000 True 0.333333 2 file1.xls 1200 30000 True 0.500000 3 file2.xls 2000 11000 True 0.220000 4 file2.xls 2100 12000 True 0.240000 5 file2.xls 2200 13000 True 0.260000 6 file2.xls 2300 14000 True 0.280000 7 file3.xls 3000 31000 True 0.996048 8 file3.xls 3100 123 True 0.003952