In this post, I’ll show you how to quickly and easily read and combine multiple Excel files into one Pandas DataFrame.
I typically work with clients who send the same data broken up across multiple Excel files. Sometimes a client splits a larger data set over several files to make it easier to work with, or they send updated data at a later date. Regardless of why, I need to combine the files into one DataFrame before starting an analysis.
Here are the Python libraries we will need to make the code work:
import pandas as pd
import os
import glob
To make this code dynamic, I use the os and glob libraries to find all the Excel files in the current directory:
path = 'your/path'
extension = 'xlsx'
os.chdir(path)
excels = glob.glob('*.{}'.format(extension))
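As a side note, if you would rather not change the working directory, you can fold the path into the glob pattern instead. This is just an alternative sketch, assuming the same path and extension variables as above; it is not part of the original workflow:
excels = glob.glob(os.path.join(path, '*.{}'.format(extension)))  # full paths, no os.chdir needed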
Now that we have all the Excel files, we can loop through each one using a list comprehension. Each of the resulting DataFrames is stored in a list. There are a number of arguments that can be passed into the pd.read_excel function, which are covered in the pandas documentation. I typically use the header argument because my header row is seldom the top row.
combined_excels = [pd.read_excel(i) for i in excels]
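For example, if the column names sat on the third row of each sheet, the comprehension might look like the line below. This is only a sketch; the header row index is an assumption about your files, not something from the originals:
combined_excels = [pd.read_excel(i, header=2) for i in excels]  # header=2: the third row holds the column names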
Finally, to combine all the individual DataFrames, we can leverage the pandas concat function.
df = pd.concat(combined_excels)
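One thing to keep in mind is that each file keeps its own row index, so the combined DataFrame can contain duplicate index values. If that matters for your analysis, pd.concat accepts ignore_index=True to build a fresh, continuous index. This is an optional tweak rather than part of the original code:
df = pd.concat(combined_excels, ignore_index=True)  # reset the index across all the combined files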
Here’s the entire code:
import pandas as pd
import os
import glob

path = 'your/path'
extension = 'xlsx'

os.chdir(path)
excels = glob.glob('*.{}'.format(extension))

combined_excels = [pd.read_excel(i) for i in excels]

df = pd.concat(combined_excels)
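If you want a quick sanity check that nothing was dropped along the way, you can compare the row count of the combined DataFrame against the individual files. This is just a suggested check, not part of the original post:
print(len(df), sum(len(x) for x in combined_excels))  # the two numbers should match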