Adding columns to a pandas dataframe based on values of another column

This is part of an ongoing series of issues I'm having trying to condense a csv file with multiple rows for each client based on the number of medical services they received. For each service, they have a row. I've included the dataframe at the bottom.

I'm trying to calculate how many times a client (identified with an ID_profile number) got each type of service and add that to a column named for the type of service. So, if a client got 3 Early Intervention Services, I would add the number "3" to the "eisserv" column. Once that is done, I want to combine all the client rows into one.

Where I'm getting stuck is populating 3 different columns with data based off one column. I am trying to iterate through the rows using some strings for the function to compare to. The function works, but for reasons I can't understand, all the strings change to "25" as the function works.

import pandas as pd
df = pd.read_csv('fakeRWclient.csv')

df['PrimaryServiceCategory'] = df['PrimaryServiceCategory'].map({'Referral for Health Care/Supportive Services': '33', 'Health Education/Risk reduction': '25', 'Early Intervention Services (Parts A and B)': '11'})

df['ServiceDate'] = pd.to_datetime(df['ServiceDate'], format="%m/%d/%Y")
df['id_profile'] = df['id_profile'].apply(str)
df['served'] = df['id_profile']  + " " + df['PrimaryServiceCategory']

df['count'] = df['served'].map(df['served'].value_counts())
eis = "11"
ref = "33"
her = "25"
print("Here are the string values")
print(eis)
print(ref)
print(her)
df['herrserv']=""
df['refserv']=""
df['eisserv']=""
for index in df.itertuples():
    for eis in df['PrimaryServiceCategory']:
        df['eisserv'] = df['count']
    for her in df['PrimaryServiceCategory']:
        df['herrserv'] = df['count']
    for ref in df['PrimaryServiceCategory']:
        df['refserv'] = df['count']
print("Here are the string values")
print(eis)
print(ref)
print(her)

Here is the output:

Here are the string values
11
33
25
Here are the string values
25
25
25
  id_profile ServiceDate PrimaryServiceCategory     served  count  herrserv  
\
0        439  2017-12-05                     25     439 25      1         1   
1     444654  2017-01-25                     25  444654 25      2         2   
2      56454  2017-12-05                     33   56454 33      1         1   
3      56454  2017-01-25                     25   56454 25      2         2   
4     444654  2017-03-01                     25  444654 25      2         2   
5      56454  2017-01-01                     25   56454 25      2         2   
6      12222  2017-01-05                     11   12222 11      1         1   
7      12222  2017-01-30                     25   12222 25      3         3   
8      12222  2017-03-01                     25   12222 25      3         3   
9      12222  2017-03-20                     25   12222 25      3         3   

   refserv  eisserv  
0        1        1  
1        2        2  
2        1        1  
3        2        2  
4        2        2  
5        2        2  
6        1        1  
7        3        3  
8        3        3  
9        3        3  

Why do the string values switch? And is this even the right function to do what I'm hoping to do?

3 answers

  • answered 2018-03-20 15:41 jpp

    You can use pandas.get_dummies after mapping your integers to categories, then merge with your dataframe.

    You can add a 'count' column summing the 3 category counts afterwords.

    df = pd.DataFrame({'id_profile': [439, 444654, 56454, 56454, 444654, 56454, 12222, 12222, 12222, 12222],
                       'ServiceDate': ['2017-12-05', '2017-01-25', '2017-12-05', '2017-01-25', '2017-03-01', '2017-01-01', '2017-01-05', '2017-01-30', '2017-03-01', '2017-03-20'],
                       'PrimaryServiceCategory': [25, 25, 33, 25, 25, 25, 11, 25, 25, 25]})
    
    d = {11: 'eis', 33: 'ref', 25: 'her'}
    df['Service'] = df['PrimaryServiceCategory'].map(d)
    
    df = df.set_index('id_profile')\
           .join(pd.get_dummies(df.drop('PrimaryServiceCategory', 1), columns=['Service'])\
                   .groupby(['id_profile']).sum())
    
    #            ServiceDate  PrimaryServiceCategory Service  Service_eis  \
    # id_profile                                                            
    # 439         2017-12-05                      25     her            0   
    # 12222       2017-01-05                      11     eis            1   
    # 12222       2017-01-30                      25     her            1   
    # 12222       2017-03-01                      25     her            1   
    # 12222       2017-03-20                      25     her            1   
    # 56454       2017-12-05                      33     ref            0   
    # 56454       2017-01-25                      25     her            0   
    # 56454       2017-01-01                      25     her            0   
    # 444654      2017-01-25                      25     her            0   
    # 444654      2017-03-01                      25     her            0   
    
    #             Service_her  Service_ref  
    # id_profile                            
    # 439                   1            0  
    # 12222                 3            0  
    # 12222                 3            0  
    # 12222                 3            0  
    # 12222                 3            0  
    # 56454                 2            1  
    # 56454                 2            1  
    # 56454                 2            1  
    # 444654                2            0  
    # 444654                2            0  
    

  • answered 2018-03-20 15:41 Tony

    I have made changes to your existing code only.

        import pandas as pd
        df = pd.read_csv('fakeRWclient.csv')
    
        df['PrimaryServiceCategory'] = df['PrimaryServiceCategory'].map({'Referral for Health Care/Supportive Services': '33', 'Health Education/Risk reduction': '25', 'Early Intervention Services (Parts A and B)': '11'})
    
        df['ServiceDate'] = pd.to_datetime(df['ServiceDate'], format="%m/%d/%Y")
        df['id_profile'] = df['id_profile'].apply(str)
    
        print(df.groupby('id_profile').PrimaryServiceCategory.count())
    

    Above code will give output like this:

    id_profile
    439       1
    12222     4
    56454     3
    444654    2
    

  • answered 2018-03-20 15:41 Juan Javier Santos Ochoa

    The values of eis, ref and her switch to "25" because you are looping over the variable PrimaryServiceCategory, and the last value in that serie is "25". You are using eis, ref and her as the names of the iterator variable, so they change in every loop. I think this is an inefficient way to do it. It's better if you use groupby and transform:

    df['count'] = df.groupby(['id_profile','PrimaryServiceCategory']).transform('count')