Pandas dataframe: find closest greater value in a row

I need to generate 5000 random values between 0 and 1 and, for each one, find the closest greater value in the "sum" column and put that row in a new dataframe.

My old dataframe:

Probability  sum  
0.008773     0.008773  
0.008715     0.017488  
0.007244     0.024732  
0.006997     0.031730

So the result will be a new dataframe with 5000 rows taken from the old one.

1 answer

  • answered 2018-03-20 14:51 YOLO

    You can try this:

    import numpy as np
    import pandas as pd
    from io import StringIO

    ## sample data
    sudo = pd.read_fwf(StringIO(u'''
    Probability  sumt  
    0.008773     0.008773  
    0.008715     0.017488  
    0.007244     0.024732  
    0.006997     0.031730
    '''),header=1)
    
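    # get sum values as a plain list (with a column literally named "sum",
    # sudo.sum would return the DataFrame.sum method, so use sudo['sum'] there)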
    sl = sudo.sumt.values.tolist()
    
    # create random sample of 5000 values between 0 and 1
    np.random.seed(10)
    df = pd.DataFrame({'randoms': list(np.random.random(5000))})
    
    # get the closest value in either direction (it may be smaller than the random number)
    df['random_map'] = df['randoms'].apply(lambda x: min(sl, key= lambda y: abs(y - x)))
    
    print(df.head(10))
    
        randoms     random_map
    0   0.771321    0.031730
    1   0.020752    0.017488
    2   0.633648    0.031730
    3   0.748804    0.031730
    4   0.498507    0.031730
    5   0.224797    0.031730
    6   0.198063    0.031730
    7   0.760531    0.031730
    8   0.169111    0.031730
    9   0.088340    0.031730
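
The mapping above picks the nearest value in either direction, so row 1 (0.020752) lands on 0.017488, which is smaller. If the closest greater value is what is needed, one possible sketch uses `np.searchsorted` on the "sum" column. It assumes that column is sorted in ascending order (it looks like a running total of "Probability"), and the names `old`, `new`, `pos` and `has_match` are made up for illustration. The four-row excerpt tops out at 0.031730, so most random draws have no greater value here and are dropped; with a full table whose last "sum" reaches 1, every draw should find a row.

```python
import numpy as np
import pandas as pd

# Four-row excerpt from the question; "sum" is assumed to be sorted ascending.
old = pd.DataFrame({
    "Probability": [0.008773, 0.008715, 0.007244, 0.006997],
    "sum": [0.008773, 0.017488, 0.024732, 0.031730],
})

np.random.seed(10)
randoms = np.random.random(5000)

# side="right" gives, for each random number, the position of the first
# "sum" value that is strictly greater than it.
pos = np.searchsorted(old["sum"].to_numpy(), randoms, side="right")

# A position equal to len(old) means no greater value exists in the table.
has_match = pos < len(old)

# Pull the matching rows (repeats allowed) into a new dataframe,
# keeping the random number that selected each row.
new = old.iloc[pos[has_match]].reset_index(drop=True)
new.insert(0, "random", randoms[has_match])

print(new.head())
```

Because `searchsorted` does a binary search on the whole array at once, this avoids the Python-level `apply` over 5000 values. Use `side="left"` instead if a value equal to the random number should also count as a match.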