Apply a function with multiple arguments on an entire dataframe in Pandas
I have the following dataframe in pandas:
df = pd.DataFrame({'field_1' : ['a', 'b', np.nan, 'a', 'c'], 'field_2': ['c', 'b', 'a', np.nan, 'c']}, index=[1,2,3,4,5])
I want to apply the following function on the entire dataframe that replaces each value with something else.
For example:
def func_replace(value, n):
    if value == 'a':
        return 'This is a'*n
    elif value == 'b':
        return 'This is b'*n
    elif value == 'c':
        return 'This is c'*n
    elif str(value) == 'nan':
        return np.nan
    else:
        return 'The value is not included'
so that the final result would look like this (given n=1):
df = pd.DataFrame({'field_1' : ['This is a', 'This is b', np.nan, 'This is a', 'This is c'], 'field_2': ['This is c', 'This is b', 'This is a', np.nan, 'This is c']}, index=[1,2,3,4,5])
I tried the following:
df.apply(func_replace, args=(1), axis=1)
and a bunch of other options, but I always get an error.
I know that I can write a for loop that goes through every column and uses a lambda function to solve this problem, but I feel that there is an easier option.
I feel the solution is easier than I think, but I just can't figure out the correct syntax.
Any help would be really appreciated.
2 answers

Just modify your function to operate at the level of each value in a Series and use applymap.
df = pd.DataFrame({'field_1' : ['a', 'b', np.nan, 'a', 'c'], 'field_2': ['c', 'b', 'a', np.nan, 'c']}, index=[1,2,3,4,5])
df
Out[35]:
  field_1 field_2
1       a       c
2       b       b
3     NaN       a
4       a     NaN
5       c       c
Now, if we define the function as:
def func_replace(value):
    if value == 'a':
        return 'This is a'
    elif value == 'b':
        return 'This is b'
    elif value == 'c':
        return 'This is c'
    elif str(value) == 'nan':
        return np.nan
    else:
        return 'The value is not included'
Calling this function on each value in the DataFrame is very straightforward:
df.applymap(func_replace)
Out[42]:
     field_1    field_2
1  This is a  This is c
2  This is b  This is b
3        NaN  This is a
4  This is a        NaN
5  This is c  This is c
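The function in this answer no longer takes the n argument from the question. As a minimal sketch of one way to keep it (not necessarily what this answer intended), the extra argument can be bound with a lambda and applied value by value through Series.map. That also avoids relying on applymap, which was deprecated in pandas 2.1 in favour of DataFrame.map. The labels dict is a helper introduced here for illustration only.

```python
import numpy as np
import pandas as pd

def func_replace(value, n):
    # Missing values pass through unchanged
    if pd.isna(value):
        return np.nan
    # 'labels' is an illustrative helper, not part of the original code
    labels = {'a': 'This is a', 'b': 'This is b', 'c': 'This is c'}
    if value in labels:
        return labels[value] * n
    return 'The value is not included'

df = pd.DataFrame({'field_1': ['a', 'b', np.nan, 'a', 'c'],
                   'field_2': ['c', 'b', 'a', np.nan, 'c']},
                  index=[1, 2, 3, 4, 5])

# apply hands over one column at a time; Series.map then visits each value,
# and the inner lambda supplies n
n = 1
result = df.apply(lambda col: col.map(lambda v: func_replace(v, n)))
print(result)
```

On pandas versions where applymap (or DataFrame.map) accepts keyword arguments, passing n as a keyword there should work as well, but the lambda form above does not depend on that.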

I think you need:
def func_replace(df, n):
    df_temp = df.replace({r"[^abc]": "The value is not included"}, regex=True)
    return df_temp.replace(["a", "b", "c"], ["This is a " * n, "This is b " * n, "This is c " * n])

df.apply(func_replace, args=(2,))
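Two details in the call above are easy to miss: args must be a tuple, so the trailing comma in (2,) matters, and with the default axis apply passes each whole column (a Series) to the function rather than a single value. The sketch below illustrates both points. repeat_label is a hypothetical helper written for this example, and unlike the code above it turns values other than a, b, c into NaN instead of 'The value is not included'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'field_1': ['a', 'b', np.nan, 'a', 'c'],
                   'field_2': ['c', 'b', 'a', np.nan, 'c']},
                  index=[1, 2, 3, 4, 5])

# (2) is just the integer 2; (2,) is a one-element tuple
print(type((2)).__name__)   # int
print(type((2,)).__name__)  # tuple

def repeat_label(col, n):
    # col is an entire column (a Series), not a scalar
    mapping = {v: f'This is {v} ' * n for v in ('a', 'b', 'c')}
    # Values missing from the mapping, including NaN, come back as NaN
    return col.map(mapping)

result = df.apply(repeat_label, args=(2,))
print(result)
```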