How to Compare the Columns of Two Pandas DataFrames

Comparing the columns of two DataFrames is especially useful before concatenating two Pandas DataFrames that have a lot of columns.
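To see why this matters, here is a minimal sketch with two hypothetical DataFrames (`df_left` and `df_right` are illustrative names, not from the original). By default, `pd.concat` performs an outer join on the column labels, so a column that exists in only one DataFrame is filled with `NaN` for the other DataFrame's rows. Passing `join="inner"` instead keeps only the shared columns.

```python
import pandas as pd

# Hypothetical frames: identical except for the last column
df_left = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
df_right = pd.DataFrame({"a": [7, 8], "b": [9, 10], "d": [11, 12]})

# Default join="outer": the result has the union of the columns,
# and "c"/"d" hold NaN in the rows that came from the other frame
combined = pd.concat([df_left, df_right], ignore_index=True)
print(combined)

# join="inner": only the columns present in both frames survive
shared_only = pd.concat([df_left, df_right], join="inner", ignore_index=True)
print(shared_only)
```

Knowing which columns differ ahead of time tells you whether those `NaN` columns are expected or a sign of a naming mismatch.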

One way to do this is to convert each DataFrame's columns to a list, then compare the elements of the two lists.

Consider the following two DataFrames:

The columns of the two DataFrames are identical except for the last column in each.
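A minimal sketch of two DataFrames with this shape, using hypothetical column names and values (only the names `df1` and `df2` come from the function call later in this post):

```python
import pandas as pd

# Hypothetical data: both DataFrames share "name" and "age",
# but the last column differs ("city" vs. "country")
df1 = pd.DataFrame({"name": ["Ann", "Bob"], "age": [34, 29], "city": ["Oslo", "Lima"]})
df2 = pd.DataFrame({"name": ["Cara", "Dev"], "age": [41, 25], "country": ["Peru", "Chile"]})

print(df1.columns.tolist())
print(df2.columns.tolist())
```

Passed to `compare_columns`, these frames would report `name` and `age` as shared, `country` as missing from the first DataFrame, and `city` as missing from the second.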

This simple function converts each DataFrame's columns to a list, then returns the three possible comparisons:

  1. The columns that are in both DataFrames
  2. The columns not in the first DataFrame but in the second DataFrame
  3. The columns not in the second DataFrame but in the first DataFrame

```python
def compare_columns(x, y):
    # Column labels of each DataFrame as plain Python lists
    columns1 = x.columns.tolist()
    columns2 = y.columns.tolist()

    # Set operations find shared and unique columns (order is not guaranteed)
    same_columns = list(set(columns1).intersection(columns2))
    columns_not_in_first = list(set(columns2) - set(columns1))
    columns_not_in_second = list(set(columns1) - set(columns2))

    result = (
        f"same columns: {same_columns}\n"
        f"columns not in first dataframe: {columns_not_in_first}\n"
        f"columns not in second dataframe: {columns_not_in_second}"
    )
    return result
```
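As an alternative to converting to lists, pandas `Index` objects have their own set methods. The sketch below is a hypothetical variant (`compare_columns_index`, `df_a`, and `df_b` are illustrative names, not from the original) that returns the three groups as a tuple of lists instead of a formatted string.

```python
import pandas as pd

def compare_columns_index(x, y):
    """Hypothetical variant of compare_columns using pandas Index set operations."""
    same_columns = x.columns.intersection(y.columns).tolist()
    columns_not_in_first = y.columns.difference(x.columns).tolist()
    columns_not_in_second = x.columns.difference(y.columns).tolist()
    return same_columns, columns_not_in_first, columns_not_in_second

# Empty DataFrames are enough here, since only the column labels matter
df_a = pd.DataFrame(columns=["id", "price", "qty"])
df_b = pd.DataFrame(columns=["id", "price", "discount"])
print(compare_columns_index(df_a, df_b))
```

`Index.difference` sorts its result when the labels can be compared, so the output order is predictable, whereas the iteration order of a plain Python `set` is not guaranteed.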

Let’s run the function:

```python
print(compare_columns(df1, df2))
```

Final Thoughts

Check out more Python tricks in this Colab Notebook or in my recent Python Posts.

Thanks for reading!

