How to Apply a Forward Fill ffill() to Groups in Pandas

In this post, I’ll show you how to use the ffill() function in pandas to forward fill missing values within each group only, so that values never carry over from one group into another.

df["quantity"] = df.groupby('fruit')['quantity'].transform(lambda x: x.ffill())
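Here is a self-contained sketch of the one-liner above. The DataFrame is hypothetical (invented fruit, date, and quantity values), and its rows are deliberately out of date order. ffill() fills in row order, so this sketch assumes you sort each group by date first.

```python
import numpy as np
import pandas as pd

# Hypothetical data, deliberately not in date order
df = pd.DataFrame({
    "fruit": ["apple", "orange", "apple", "orange", "apple"],
    "date": pd.to_datetime(["2021-01-03", "2021-01-01", "2021-01-01",
                            "2021-01-02", "2021-01-02"]),
    "quantity": [np.nan, 5.0, 1.0, np.nan, np.nan],
})

# ffill() works in row order, so put each fruit's rows in date order first
df = df.sort_values(["fruit", "date"]).reset_index(drop=True)

# Forward fill within each fruit only
df["quantity"] = df.groupby("fruit")["quantity"].transform(lambda x: x.ffill())
print(df)
```

Without the sort, the apple row dated 2021-01-03 would come first and stay NaN, because no earlier apple row exists to fill from.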

Further Explanation

Let me break down the problem and then show two different solutions.

The Problem with Forward Filling

Consider the following DataFrame:

We have some historical prices for apples and oranges but nothing for bananas. I would like to apply the following transformation:

Carry forward the previous price if the current price is NaN
Only carry forward the price if it’s the same fruit
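To make these two rules concrete, here is a small hypothetical fruit_df with the same columns the code in this post uses (fruit, date, quantity). The values are invented for illustration, with the quantity column standing in for the prices: apples and oranges have some values, bananas have none.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: apples and oranges have some quantities,
# bananas have none at all
fruit_df = pd.DataFrame({
    "fruit": ["apple"] * 3 + ["banana"] * 3 + ["orange"] * 3,
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 3),
    "quantity": [1.0, np.nan, 3.0,
                 np.nan, np.nan, np.nan,
                 5.0, np.nan, np.nan],
})
print(fruit_df)
```

With the two rules applied, the apple gap on 2021-01-02 should become 1.0, the orange gaps should become 5.0, and every banana row should stay NaN.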

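If we group the values and then call ffill() on the result, the values are carried forward, but they spill over from one group into the next. This happens because ffill() runs on the aggregated Series as a whole, not on each group separately.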

fruit_df.groupby(['fruit',pd.Grouper(key='date',freq='D')])['quantity'].mean().ffill().reset_index()

The historical banana prices are now filled with the last known apple price. While forward filling is an essential function, we have to be careful when applying it to data that contains groupings.
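A minimal reproduction of the spill-over, using a small hypothetical DataFrame (invented values, bananas with no data). The grouped result is sorted apple, banana, orange, so ffill() walks straight from the last apple value into the banana rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: bananas have no quantities at all
fruit_df = pd.DataFrame({
    "fruit": ["apple"] * 3 + ["banana"] * 3 + ["orange"] * 3,
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 3),
    "quantity": [1.0, np.nan, 3.0,
                 np.nan, np.nan, np.nan,
                 5.0, np.nan, np.nan],
})

# ffill() is called on the aggregated Series, which ignores group boundaries
spilled = (
    fruit_df.groupby(["fruit", pd.Grouper(key="date", freq="D")])["quantity"]
    .mean()
    .ffill()
    .reset_index()
)
print(spilled[spilled["fruit"] == "banana"])
```

Every banana row picks up 3.0, the last apple value, even though no banana price was ever recorded.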

Using a Lambda Function

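By passing a lambda function to transform(), we can apply ffill() to each group in our groupby separately, so values never cross from one fruit to another. In my example, the prices for bananas still show NaN for every date after the transformation, which is what we want, since there is no earlier banana price to carry forward.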

fruit_df["quantity"] = fruit_df.groupby('fruit')['quantity'].transform(lambda x: x.ffill())
fruit_df
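pandas also provides GroupBy.ffill(), which performs the same per-group forward fill without a lambda. The sketch below compares the two on hypothetical data (invented values) and checks that they agree.

```python
import numpy as np
import pandas as pd

# Hypothetical data: bananas have no quantities at all
fruit_df = pd.DataFrame({
    "fruit": ["apple"] * 3 + ["banana"] * 3 + ["orange"] * 3,
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 3),
    "quantity": [1.0, np.nan, 3.0,
                 np.nan, np.nan, np.nan,
                 5.0, np.nan, np.nan],
})

# Lambda-based approach from this section
via_transform = fruit_df.groupby("fruit")["quantity"].transform(lambda x: x.ffill())

# Built-in shortcut: GroupBy.ffill() also stays within each group
via_groupby_ffill = fruit_df.groupby("fruit")["quantity"].ffill()

print(pd.concat({"transform": via_transform,
                 "groupby_ffill": via_groupby_ffill}, axis=1))
```

Both return a Series aligned to the original rows, so either can be assigned straight back to fruit_df["quantity"].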

Using a For Loop

This approach loops through each group and applies the transformation to one group at a time. It takes more code and is slower, because each group is filled in a Python loop instead of in a single groupby operation.

import pandas as pd

# Set the index to the group (fruit) and the date
fruit_df.set_index(['fruit', 'date'], inplace=True)

# Create a list of the unique groups (the fruits, in my case)
lst = fruit_df.index.get_level_values(level=0).unique()

# Forward fill each group separately and collect the results
filled = []
for i in lst:
    filled.append(fruit_df.loc[i].ffill())

# Combine all the results with a single concat; keys= restores the
# fruit level, which fruit_df.loc[i] drops from each piece
df_new = pd.concat(filled, keys=lst, names=['fruit', 'date'])
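As a sanity check, here is a self-contained sketch that runs the loop on hypothetical data (invented values) and compares it with the vectorized groupby fill. Passing keys to pd.concat matters here: without it, the fruit label would be lost and the dates from different fruits would be indistinguishable.

```python
import numpy as np
import pandas as pd

# Hypothetical data: bananas have no quantities at all
fruit_df = pd.DataFrame({
    "fruit": ["apple"] * 3 + ["banana"] * 3 + ["orange"] * 3,
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 3),
    "quantity": [1.0, np.nan, 3.0,
                 np.nan, np.nan, np.nan,
                 5.0, np.nan, np.nan],
})

# Loop version: index by (fruit, date), then fill each fruit separately
indexed = fruit_df.set_index(["fruit", "date"])
fruits = indexed.index.get_level_values(level=0).unique()
filled = [indexed.loc[f].ffill() for f in fruits]
df_new = pd.concat(filled, keys=fruits, names=["fruit", "date"])

# Vectorized version on the same index, for comparison
expected = indexed.groupby(level="fruit")["quantity"].ffill()

print(df_new["quantity"].equals(expected))
```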

Final Thoughts

Check out more Python tricks in this Colab Notebook or in my recent Python posts.

Thanks for reading!

Posted

February 25, 2021

Python

Paul
