Dynamically Create Pandas DataFrames from a List of Files using a Dictionary and a For Loop

In this post, I’ll show you how to dynamically pull files from a directory, read the content using a For Loop, and finally create DataFrame variables for each file.

I will be using a Google Colab Notebook, which is a free Jupyter Notebook hosted in the cloud. Each notebook comes with a handful of pre-loaded CSV files, and I will use those in the code below.

To start, let’s get a list of the CSV files in our directory:

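import os
import glob
import pandas as pd

path = '/content/sample_data/'  # folder of sample CSVs that ships with Colab
extension = 'csv'
os.chdir(path)  # move into the folder so glob returns bare file names
files = glob.glob('*.{}'.format(extension))  # every file ending in .csv
print(files)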

Running that prints a list of the CSV files in our directory: ['california_housing_train.csv', 'mnist_test.csv', 'mnist_train_small.csv', 'california_housing_test.csv']
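If you would rather not change the working directory with os.chdir, here is a minimal sketch of the same listing step using pathlib instead. This is a swapped-in technique, not the one used above, and list_files is a hypothetical helper name. The results are sorted because the order glob returns files in depends on the filesystem.

```python
from pathlib import Path


def list_files(directory, extension="csv"):
    """Return the sorted names of files in `directory` ending in `.extension`.

    `list_files` is a hypothetical helper for illustration. Unlike the
    os.chdir approach, it leaves the current working directory unchanged.
    """
    return sorted(p.name for p in Path(directory).glob(f"*.{extension}"))


# Example usage with the Colab sample-data folder from this post:
# files = list_files('/content/sample_data/')
```

Because this returns bare file names without changing directory, pd.read_csv would then need the full path (for example Path(directory) / name).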

Now that we have the files, we can loop through each one, read its contents, and create a DataFrame for it. Rather than assigning a separate variable for every file, we can use a Dictionary: each file gets a key-value pair where the value is its DataFrame. Here is how we can do that:

  1. Create an empty Dictionary. This will hold the key-value pairs built in the For Loop
  2. Start a For Loop that iterates over files
  3. Choose the key. In my case, I am using the name of the file
  4. Read the file into a DataFrame using pd.read_csv
  5. Add the key-value pair to the dictionary

file_dict = {}  # maps each file name to its DataFrame

for file in files:
  key = file  # use the file name as the key
  df = pd.read_csv(file)  # works because os.chdir moved us into the folder

  file_dict[key] = df
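The same loop can be condensed into a dictionary comprehension. This sketch is self-contained: it writes two tiny CSV files to a temporary folder (the file names and columns are made up for illustration) and, like the loop above, keys each DataFrame by its file name. read_csvs is a hypothetical helper name.

```python
import os
import tempfile

import pandas as pd


def read_csvs(directory, files):
    """Map each file name to the DataFrame read from it (hypothetical helper)."""
    # Same as the for loop: key = file name, value = pd.read_csv(...)
    return {file: pd.read_csv(os.path.join(directory, file)) for file in files}


with tempfile.TemporaryDirectory() as tmp:
    # Write two small, made-up CSV files so the example runs anywhere
    pd.DataFrame({"x": [1, 2]}).to_csv(os.path.join(tmp, "a.csv"), index=False)
    pd.DataFrame({"y": [3, 4, 5]}).to_csv(os.path.join(tmp, "b.csv"), index=False)

    file_dict = read_csvs(tmp, ["a.csv", "b.csv"])

print(file_dict["a.csv"].shape)  # (2, 1)
print(file_dict["b.csv"].shape)  # (3, 1)
```

The DataFrames are held in memory, so they remain usable after the temporary folder is deleted. In Colab you would call read_csvs('/content/sample_data/', files) with the list returned by glob.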

And that’s it! To view a DataFrame, look it up by its key, for example file_dict['california_housing_train.csv'] or file_dict[files[0]].
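Because the DataFrames live in an ordinary dictionary, you can also work with all of them at once by looping over .items(). Here is a minimal sketch that uses small made-up DataFrames in place of the Colab files:

```python
import pandas as pd

# Stand-ins for the DataFrames built by the loop (made-up data)
file_dict = {
    "first.csv": pd.DataFrame({"a": [1, 2, 3]}),
    "second.csv": pd.DataFrame({"b": [4, 5], "c": [6, 7]}),
}

# Summarize every DataFrame by its key as (rows, columns)
shapes = {name: df.shape for name, df in file_dict.items()}

for name, shape in shapes.items():
    print(f"{name}: {shape[0]} rows x {shape[1]} columns")
# first.csv: 3 rows x 1 columns
# second.csv: 2 rows x 2 columns
```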

Final Thoughts

Check out more Python tricks in this Colab Notebook or in my recent Python Posts.

Thanks for reading!

