In this post, I’ll show you how to forward fill missing values with the ffill() function in pandas while keeping the fill inside each group.
```python
df["quantity"] = df.groupby('fruit')['quantity'].transform(lambda x: x.ffill())
```
Further Explanation
Let me break down the problem and then show two different solutions.
The Problem with Forward Filling
Consider the following DataFrame:
We have some historical prices for apples and oranges but nothing for bananas. I would like to do the following transformation:
- Carry forward the previous price if the current price is NaN
- Only carry forward the price if it’s the same fruit
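The original DataFrame isn’t reproduced here, so the sketch below builds a small hypothetical one with the same shape: a `fruit` column, a `date` column, and a `quantity` column holding the prices. The values are made up for illustration; apples and oranges have gaps, and bananas have no prices at all. The `expected` list spells out what the two rules above should produce.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: made-up prices stored in the "quantity" column.
# Apples and oranges have gaps; bananas have no prices at all.
fruit_df = pd.DataFrame({
    "fruit": ["apple"] * 3 + ["banana"] * 3 + ["orange"] * 3,
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"] * 3),
    "quantity": [1.0, np.nan, np.nan, np.nan, np.nan, np.nan, 5.0, np.nan, 7.0],
})

# What the two rules call for: each gap is filled from the same fruit only,
# so the banana rows stay NaN
expected = [1.0, 1.0, 1.0, np.nan, np.nan, np.nan, 5.0, 5.0, 7.0]

print(fruit_df)
```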
If we group the values and apply ffill(), the prices do carry forward, but they also spill over into other groups.
```python
fruit_df.groupby(['fruit', pd.Grouper(key='date', freq='D')])['quantity'].mean().ffill().reset_index()
```
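The banana rows, which had no prices at all, now contain apple prices. That happens because ffill() runs down the whole aggregated Series, and the banana rows come right after the apple rows. Forward filling is an essential function, but we have to be careful when applying it to data that contains groupings.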
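If you want to keep the daily aggregation from the snippet above, one possible fix is to group the aggregated Series again by its `fruit` index level before forward filling, so each fill stays inside its own fruit. The sketch below uses a small hypothetical DataFrame with made-up values and compares the spilling version with the contained one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (made-up values, same shape as the post's example)
fruit_df = pd.DataFrame({
    "fruit": ["apple"] * 3 + ["banana"] * 3 + ["orange"] * 3,
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"] * 3),
    "quantity": [1.0, np.nan, np.nan, np.nan, np.nan, np.nan, 5.0, np.nan, 7.0],
})

# Daily mean per fruit, indexed by (fruit, date)
daily = fruit_df.groupby(["fruit", pd.Grouper(key="date", freq="D")])["quantity"].mean()

# Spills across fruits: ffill() sees one long Series
spilled = daily.ffill().reset_index()

# Stays inside each fruit: ffill() is applied per "fruit" index level
contained = daily.groupby(level="fruit").ffill().reset_index()

print(spilled)
print(contained)
```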
Using a Lambda Function
By passing a lambda function to transform(), we can apply ffill() to each group in our groupby separately. In my example, the prices for bananas still show NaN for every date after the transformation, because there is no earlier banana price to carry forward.
```python
fruit_df["quantity"] = fruit_df.groupby('fruit')['quantity'].transform(lambda x: x.ffill())
fruit_df
```
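As a side note, pandas also provides a groupwise `ffill()` directly on the GroupBy object, which gives the same result without the lambda. The sketch below, on a hypothetical DataFrame with made-up values, compares the two. It sorts by date within each fruit first, since forward filling follows row order.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (made-up values, same shape as the post's example)
fruit_df = pd.DataFrame({
    "fruit": ["apple"] * 3 + ["banana"] * 3 + ["orange"] * 3,
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"] * 3),
    "quantity": [1.0, np.nan, np.nan, np.nan, np.nan, np.nan, 5.0, np.nan, 7.0],
})

# ffill() follows row order, so sort by date within each fruit first
fruit_df = fruit_df.sort_values(["fruit", "date"])

# Built-in groupwise forward fill: no lambda needed
filled = fruit_df.groupby("fruit")["quantity"].ffill()

# The lambda version, for comparison
via_lambda = fruit_df.groupby("fruit")["quantity"].transform(lambda x: x.ffill())

print(filled.equals(via_lambda))
fruit_df["quantity"] = filled
```

Both return a Series aligned to the original rows, so either can be assigned straight back to the `quantity` column.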
Using a For Loop
This approach works by looping through each group and applying the transformation. There’s more code to this approach and it is slower.
```python
import pandas as pd

# Set the index as your group (plus the date)
fruit_df = fruit_df.set_index(['fruit', 'date'])

# Create a list of the unique groups. The fruits in my case
lst = fruit_df.index.get_level_values(level=0).unique()

# Loop through the groups, forward filling each one separately
frames = []
for i in lst:
    frames.append(fruit_df.loc[i].ffill())

# Combine all the results; keys= restores the fruit level that .loc[i] drops
df_new = pd.concat(frames, keys=lst, names=['fruit', 'date'])
```
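To check that the loop and the lambda approach agree, and to see the speed difference on your own machine, here’s a rough sketch. It generates hypothetical data (200 made-up group names, 50 days each, about 30% missing values) and times both versions with `timeit`. The loop sorts the index first so that `.loc` lookups on the MultiIndex stay efficient. Actual timings depend on your data and pandas version, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 200 made-up groups, 50 days each, ~30% missing
rng = np.random.default_rng(0)
groups = [f"fruit_{i}" for i in range(200)]
dates = pd.date_range("2023-01-01", periods=50, freq="D")
fruit_df = pd.DataFrame({
    "fruit": np.repeat(groups, len(dates)),
    "date": np.tile(dates, len(groups)),
    "quantity": rng.random(len(groups) * len(dates)),
})
fruit_df.loc[rng.random(len(fruit_df)) < 0.3, "quantity"] = np.nan


def with_transform(df):
    # Groupwise forward fill via transform and a lambda
    return df.groupby("fruit")["quantity"].transform(lambda x: x.ffill())


def with_loop(df):
    # Loop over groups, forward fill each one, then concatenate
    indexed = df.set_index(["fruit", "date"]).sort_index()
    keys = indexed.index.get_level_values(level=0).unique()
    frames = []
    for key in keys:
        frames.append(indexed.loc[key].ffill())
    return pd.concat(frames, keys=keys, names=["fruit", "date"])["quantity"]


# Align the transform result on (fruit, date) so the two can be compared
transform_result = (
    fruit_df.assign(quantity=with_transform(fruit_df))
    .set_index(["fruit", "date"])["quantity"]
    .sort_index()
)
loop_result = with_loop(fruit_df).sort_index()
print("results match:", transform_result.equals(loop_result))

t_transform = timeit.timeit(lambda: with_transform(fruit_df), number=5)
t_loop = timeit.timeit(lambda: with_loop(fruit_df), number=5)
print(f"transform: {t_transform:.3f}s  loop: {t_loop:.3f}s")
```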
Final Thoughts
Check out more Python tricks in this Colab Notebook or in my recent Python Posts.
Thanks for reading!