Dataframe column based on 4 conditions, nested np.where

The dataframe with which I am working has 4 possible combinations over 2 columns and several hundred groups.

| Group |   Before   |    After   |
|:-----:|:----------:|:----------:|
|   G1  |  Injection |  Injection |
|   G1  |  Injection | Production |
|   G1  | Production |  Injection |
|   G1  | Production | Production |

There are 3 pre-calculated columns which need to be pulled based on the Before/After combination as seen below.

| Group |   Before   |    After   |         Output         |
|:-----:|:----------:|:----------:|:----------------------:|
|   G1  |  Injection |  Injection |        df['DTI']       |
|   G1  |  Injection | Production | df['DTWF'] + df['DTP'] |
|   G1  | Production |  Injection | df['DTWF'] + df['DTI'] |
|   G1  | Production | Production |        df['DTP']       |

I have tried nesting multiple np.where's

np.where(df['Before'] == 'Injection' & df['After'] == 'Injection', df['DTI'],
np.where(....))

Which resulted in:

ValueError: either both or neither of x and y should be given

I also tried np.logical_and:

np.where(np.logical_and(df['Before'] == 'Injection' & df['After'] == 'Injection'), df['DTP'])

Which resulted in:

the truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have reached the upper limit of what I can do and need some ideas!

2 answers

  • answered 2018-03-20 19:17 Floydian

    One way to do this is to use the apply function:

    Assuming your DataFrame is in variable df you can do:

    import pandas as pd
    
    df = pd.DataFrame(data={"Before": ["Injection", "Injection", "Production", "Production"],
                            "After": ["Injection", "Production", "Injection", "Production"]})
    def get_output(x):
        if x['Before'] == 'Injection' and x['After'] == 'Injection':
            return 'DTI'
        elif x['Before'] == 'Injection' and x['After'] == 'Production':
            return 'DTWF + DTP'
        elif x['Before'] == 'Production' and x['After'] == 'Injection':
            return 'DTWF + DTI'
        elif x['Before'] == 'Production' and x['After'] == 'Production':
            return 'DTP'
    
    df['Output'] = df.apply(get_output, axis=1)
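
    The function above returns labels such as 'DTWF + DTP'. If the goal is the actual values from the pre-calculated columns, as in the question's table, a row-wise variant could look like the sketch below. It assumes DTI, DTP and DTWF are numeric columns of df; the sample numbers and the name get_output_value are made up for illustration.

```python
import pandas as pd

# Illustrative data; the numeric values are invented for this sketch.
df = pd.DataFrame({
    "Before": ["Injection", "Injection", "Production", "Production"],
    "After": ["Injection", "Production", "Injection", "Production"],
    "DTI": [1.0, 2.0, 3.0, 4.0],
    "DTP": [10.0, 20.0, 30.0, 40.0],
    "DTWF": [100.0, 200.0, 300.0, 400.0],
})

def get_output_value(row):
    """Pick the value for one row, following the table in the question."""
    before, after = row["Before"], row["After"]
    if before == "Injection" and after == "Injection":
        return row["DTI"]
    if before == "Injection" and after == "Production":
        return row["DTWF"] + row["DTP"]
    if before == "Production" and after == "Injection":
        return row["DTWF"] + row["DTI"]
    if before == "Production" and after == "Production":
        return row["DTP"]
    return float("nan")  # any other combination is left missing

df["Output"] = df.apply(get_output_value, axis=1)
print(df[["Before", "After", "Output"]])
```

    Note that apply with axis=1 calls the function once per row, so it tends to be slow on large frames.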
    

  • answered 2018-03-20 19:17 Graipher

    Before["Injection"] does not do what you think it does. In the code you showed it is not even defined.

    What you probably want is this:

    # df definition, skipping Group because it is not needed here
    df = pd.DataFrame(data={"Before": ["Injection", "Injection", "Production", "Production"], "After": ["Injection", "Production", "Injection", "Production"]})
    
    df["Output"] = "DTI"  # Use one of the cases as default
    df.loc[(df["Before"] == "Injection") & (df["After"] == "Production"), "Output"] = "DTWF + DTP"
    df.loc[(df["Before"] == "Production") & (df["After"] == "Injection"), "Output"] = "DTWF + DTI"
    df.loc[(df["Before"] == "Production") & (df["After"] == "Production"), "Output"] = "DTP"
    print(df)
    #         After      Before      Output
    # 0   Injection   Injection         DTI
    # 1  Production   Injection  DTWF + DTP
    # 2   Injection  Production  DTWF + DTI
    # 3  Production  Production         DTP
    

    If you have many of these combinations, using apply as suggested in the other answer might be more appropriate.
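
    For many combinations, a vectorized alternative to apply is np.select, which takes a list of boolean conditions and a matching list of values. This is a different technique from either answer, shown here only as a sketch. It assumes DTI, DTP and DTWF are numeric columns of df; the sample numbers are made up.

```python
import numpy as np
import pandas as pd

# Illustrative data; the numeric values are invented for this sketch.
df = pd.DataFrame({
    "Before": ["Injection", "Injection", "Production", "Production"],
    "After": ["Injection", "Production", "Injection", "Production"],
    "DTI": [1.0, 2.0, 3.0, 4.0],
    "DTP": [10.0, 20.0, 30.0, 40.0],
    "DTWF": [100.0, 200.0, 300.0, 400.0],
})

# Each comparison is wrapped in parentheses because & binds
# more tightly than == in Python.
conditions = [
    (df["Before"] == "Injection") & (df["After"] == "Injection"),
    (df["Before"] == "Injection") & (df["After"] == "Production"),
    (df["Before"] == "Production") & (df["After"] == "Injection"),
    (df["Before"] == "Production") & (df["After"] == "Production"),
]

# One value expression per condition, in the same order.
choices = [
    df["DTI"],
    df["DTWF"] + df["DTP"],
    df["DTWF"] + df["DTI"],
    df["DTP"],
]

# Rows matching none of the conditions get NaN.
df["Output"] = np.select(conditions, choices, default=np.nan)
print(df[["Before", "After", "Output"]])
```

    Adding a new combination then only means appending one entry to each list.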

    If you have many rows it might make sense to save the boolean indices (e.g. df["Before"] == "Production") to variables and just do

    before_prod = df["Before"] == "Production"
    after_prod = df["After"] == "Production"
    df.loc[before_prod & after_prod, "Output"] = "DTP"
    ...
    

    If you also only have those two states, you can get the second one (almost) for free by using the unary inversion operator ~, which acts as an element-wise NOT on a boolean Series:

    df.loc[before_prod & ~after_prod, "Output"] = "DTWF + DTI"
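
    The same two masks can also drive the nested np.where from the question. The sketch below assumes only the two states exist and that DTI, DTP and DTWF are numeric columns of df; the sample numbers are made up. Every np.where call gets both a "true" and a "false" value, since passing only one of them raises the ValueError quoted in the question.

```python
import numpy as np
import pandas as pd

# Illustrative data; the numeric values are invented for this sketch.
df = pd.DataFrame({
    "Before": ["Injection", "Injection", "Production", "Production"],
    "After": ["Injection", "Production", "Injection", "Production"],
    "DTI": [1.0, 2.0, 3.0, 4.0],
    "DTP": [10.0, 20.0, 30.0, 40.0],
    "DTWF": [100.0, 200.0, 300.0, 400.0],
})

# With only two states, "not Production" means "Injection".
before_prod = df["Before"] == "Production"
after_prod = df["After"] == "Production"

df["Output"] = np.where(
    ~before_prod & ~after_prod, df["DTI"],                   # Injection -> Injection
    np.where(
        ~before_prod & after_prod, df["DTWF"] + df["DTP"],   # Injection -> Production
        np.where(
            before_prod & ~after_prod, df["DTWF"] + df["DTI"],  # Production -> Injection
            df["DTP"],                                       # Production -> Production
        ),
    ),
)
print(df[["Before", "After", "Output"]])
```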