Comparing column names is a useful first step when you want to concatenate two pandas DataFrames that have a lot of columns, because mismatched names produce extra columns filled with NaN.
One way to do this is to convert each DataFrame's column index to a list, then compare the two lists with set operations.
Consider the following two DataFrames:
The two DataFrames have identical columns except for the last column of each.
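For illustration, here is a minimal sketch of two such DataFrames. The column names and values are made up (they are assumptions, not the data used in this post); the only thing that matters is that the last column differs.

```python
import pandas as pd

# Hypothetical example data: names and values are illustrative assumptions.
df1 = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "price": [9.5, 3.0]})
df2 = pd.DataFrame({"id": [3, 4], "name": ["c", "d"], "quantity": [7, 1]})

# Show the column names of each DataFrame as plain lists
print(df1.columns.tolist())
print(df2.columns.tolist())
```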
This simple function converts each DataFrame's column names to a list, then returns a formatted string with the three possible comparisons:
- The columns that are in both DataFrames
- The columns not in the first DataFrame but in the second DataFrame
- The columns not in the second DataFrame but in the first DataFrame
def compare_columns(x, y):
    # Convert each DataFrame's column index to a plain list
    columns1 = x.columns.tolist()
    columns2 = y.columns.tolist()
    # Set operations give the shared and the unmatched column names
    same_columns = list(set(columns1).intersection(columns2))
    columns_not_in_first = list(set(columns2) - set(columns1))
    columns_not_in_second = list(set(columns1) - set(columns2))
    result = f"same columns: {same_columns}\ncolumns not in first dataframe: {columns_not_in_first}\ncolumns not in second dataframe: {columns_not_in_second}"
    return result
Let’s run the function:
print(compare_columns(df1,df2))
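One thing to keep in mind: sets are unordered, so the lists produced by a set-based comparison can come back in a different order than the columns appear in the DataFrames. If order matters, a possible variant is sketched below. The function name `compare_columns_ordered` and the `left` and `right` DataFrames are hypothetical, introduced only for this example, and the function returns a dictionary instead of a string so the results can be reused in code.

```python
import pandas as pd

def compare_columns_ordered(x, y):
    # Sets are used only for fast membership tests
    cols_x, cols_y = set(x.columns), set(y.columns)
    # List comprehensions keep each DataFrame's original column order
    return {
        "same": [c for c in x.columns if c in cols_y],
        "not_in_first": [c for c in y.columns if c not in cols_x],
        "not_in_second": [c for c in x.columns if c not in cols_y],
    }

# Hypothetical empty DataFrames: only the column names matter here
left = pd.DataFrame(columns=["id", "name", "price"])
right = pd.DataFrame(columns=["id", "name", "quantity"])

comparison = compare_columns_ordered(left, right)
print(comparison)
```

Returning a dictionary makes it easy to act on the result, for example by renaming or dropping the unmatched columns before calling `pd.concat`.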
Final Thoughts
Check out more Python tricks in this Colab Notebook or in my recent Python Posts.
Thanks for reading!