Pandas data frame logic implementation

I have a dataset having columns:

`subscribe_date`, `package_id`, `subscription_name`, `user_id`, `subscription_status`

subscription_status has values cancelled, active, lapsed, expired, revoked, reactivated

Based on the subscription_status value, I have to create a column called churn. A user counts as churned if they ever have a value of "cancelled" or "expired" for their subscription_status.

Some users may appear multiple times with different status values. Such a user should count as churned if any of their rows, at any time, has a subscription_status of "cancelled" or "expired".

Here is my code:

# Set a default value of churn as no
subscriber_data['churn'] = 'no'

# Set churn to yes for all rows whose subscription_status is cancelled or expired
# (.loc avoids the chained assignment subscriber_data['churn'][mask] = ...)
subscriber_data.loc[subscriber_data['subscription_status'].isin(['cancelled', 'expired']), 'churn'] = 'yes'

Now every row is tagged with "yes" or "no", so a user with several rows may have both. How can I proceed so that if a user has both "yes" and "no" values, churn is set to "yes" on all of that user's rows?

Sample data:

subscribe_date   package_id   subscription_name  user_id   subscription_status  churn
10/28/2015 23:29  0903a465-28f7-45b3-9860-12be9deed4ca   14 Day  0002b38f-ec0a-4ee5-8710-9cf54691bb55    cancelled   yes
6/21/2016 21:39  f3a5a639-f4df-4ebd-885d-abea26b37027    30-DayPass  00068201-1d40-4a84-b9bf-f4592aef9ba3    active  no
6/29/2016 19:30  f3a5a639-f4df-4ebd-885d-abea26b37027    30-DayPass  00068201-1d40-4a84-b9bf-f4592aef9ba3    cancelled   yes

1 answer

  • answered 2018-04-14 14:23 ASGM

    You can group the rows by user_id, check whether any row of churn in the group is equal to "yes", and set every row of that group to "yes" if so:

    import numpy as np

    # For each user, flag every row if any of that user's rows has churn == 'yes'
    churned = df.groupby('user_id')['churn'].transform(lambda x: (x == 'yes').any())
    df['churn'] = np.where(churned, 'yes', df['churn'])
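
As a possible alternative, here is a minimal sketch that skips the intermediate per-row "yes"/"no" step and derives churn straight from subscription_status. It is not the approach in the answer above, only a variation on the same group-and-broadcast idea. The sample rows and the shortened user ids (`u1`, `u2`, `u3`) are made up for illustration, and the intermediate names `is_churn_row` and `user_churned` are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows modeled on the question's data (user ids shortened).
subscriber_data = pd.DataFrame({
    "user_id": ["u1", "u2", "u2", "u3"],
    "subscription_status": ["cancelled", "active", "cancelled", "active"],
})

# True for rows whose status counts as churn.
is_churn_row = subscriber_data["subscription_status"].isin(["cancelled", "expired"])

# Broadcast "this user has at least one churn row" back onto every row of that user.
user_churned = is_churn_row.groupby(subscriber_data["user_id"]).transform("any")

# Convert the boolean flag to the "yes"/"no" labels used in the question.
subscriber_data["churn"] = user_churned.map({True: "yes", False: "no"})
```

Both rows of user `u2` end up labeled "yes" because one of them is "cancelled", while `u3`, who only has an "active" row, stays "no". Passing the string `"any"` to `transform` should avoid calling a Python lambda per group, which may matter on a large table.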