Easily Create One Pandas DataFrame from Mulitple Excel Files

In this post, I’ll show you how you can quickly and easily read and combine multiple excel files into one Pandas DataFrame.

I typically work with clients who send the same data that is broken up across multiple excel files. Sometimes a client breaks up larger data sets over multiple files to make it easier to work with or they send updated data at a future date. Regardless of why, I need to combine them into one DataFrame to start an analysis.

Here are the Python libraries we will need to make the code work:

import pandas as pd
import os
import glob

To make this code dynamic, I use os and glob libraries to find all the excel files in the current directory:

path = 'your/path'
extension = 'xlsx'
os.chdir(path)
excels = glob.glob('*.{}'.format(extension))

Now that we have all the excel files, we can loop through each file using a list comprehension. Each of the DataFrames will be stored in a new variable. There are a number of arguments that can be passed into the pd.read_excel function which can be found here. I typically use the header argument because my header row is seldom the top row.

combined_excels = [pd.read_excel(i)for i in excels]

Finally, to combine all the individual DataFrames, we can leverage pandas concat function.

df = pd.concat(combined_excels)

Here’s the entire code:

import pandas as pd
import numpy as np
import os
import glob

path = 'your/path'
extension = 'xlsx'
os.chdir(path)
excels = glob.glob('*.{}'.format(extension))

combined_excels = [pd.read_excel(i)for i in excels]

df = pd.concat(combined_excels)

Posted

in

by

Tags: