In this post, I’ll show you how to dynamically pull files from a directory, read the content using a For Loop, and finally create DataFrame variables for each file.
I will be using a Google Colab Notebook, a free Jupyter Notebook hosted in the cloud. Each notebook comes with a handful of pre-loaded CSV files, which I will use in the code below.
To start, let’s get a list of the CSV files in our directory:
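```python
import os
import glob
import pandas as pd

# Folder holding the sample CSV files that ship with every Colab notebook
path = '/content/sample_data/'
extension = 'csv'

# Move into that folder, then match every file ending in .csv
os.chdir(path)
files = glob.glob('*.{}'.format(extension))
print(files)
```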
Running that yields a list of the CSV files in our directory: `['california_housing_train.csv', 'mnist_test.csv', 'mnist_train_small.csv', 'california_housing_test.csv']`
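If you would rather not change the notebook's working directory with `os.chdir`, a similar listing can be built with the standard library's `pathlib`. This is a swapped-in alternative, not the approach used above, and the helper name `list_csv_files` is hypothetical. The result is sorted because `glob` does not promise any particular order.

```python
from pathlib import Path


def list_csv_files(folder):
    """Return the sorted names of all .csv files directly inside `folder`.

    Hypothetical helper: it lists the files without calling os.chdir,
    so the current working directory is left untouched.
    """
    return sorted(p.name for p in Path(folder).glob('*.csv'))


# Example with the Colab path from this post:
# files = list_csv_files('/content/sample_data/')
```

Since these are bare file names, you would then read each one with `pd.read_csv(Path(folder) / name)` rather than `pd.read_csv(name)`.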
Now that we have the file names, we can loop through them, read each file, and store the result. Rather than creating a separate variable for every file, we can use a Dictionary to assign them dynamically: each file gets one key-value pair, where the key is a name for the file and the value is its DataFrame. Here is how we can do that:
- Create an empty Dictionary. This will hold our key-value pairs from the For Loop.
- Start a For Loop using `files` as our iterable.
- Assign a name to the key. In my case, I am using the name of the file.
- Read the DataFrame using `pd.read_csv`.
- Add the key-value pair to the dictionary.
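```python
# Empty dictionary that will hold one DataFrame per file
file_dict = {}

for file in files:
    # Use the file name as the dictionary key
    key = file
    # Read the CSV file into a DataFrame
    df = pd.read_csv(file)
    # Store the key-value pair
    file_dict[key] = df
```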
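The same loop can also be written as a dictionary comprehension. The sketch below makes one change that is my own assumption, not part of the steps above: it uses each file's name without the `.csv` extension (via `pathlib`'s `Path.stem`) as the key, so you would look up `'california_housing_train'` instead of `'california_housing_train.csv'`. The function name `load_csv_folder` is hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(folder):
    """Read every .csv file in `folder` into a dict of DataFrames.

    Keys are file names without the extension (an assumption made for
    this sketch); values are the DataFrames returned by pd.read_csv.
    """
    return {
        path.stem: pd.read_csv(path)
        for path in sorted(Path(folder).glob('*.csv'))
    }


# Example with the Colab path from this post:
# file_dict = load_csv_folder('/content/sample_data/')
# file_dict['california_housing_train']
```

Passing full paths to `pd.read_csv` means this version works from any working directory.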
And that’s it! To view a DataFrame, pass the key you assigned to it, like `file_dict['california_housing_train.csv']`, or look the key up from the file list with `file_dict[files[0]]`.
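Because every DataFrame lives in one dictionary, you can also loop over `file_dict.items()` to inspect all of them at once. Here is a small sketch that reports each DataFrame's shape; the helper name `summarize_frames` is hypothetical, and the two in-memory DataFrames are stand-ins for the Colab files.

```python
import pandas as pd


def summarize_frames(frames):
    """Return {key: (rows, columns)} for every DataFrame in `frames`."""
    return {key: df.shape for key, df in frames.items()}


# Stand-in data; in the notebook you would pass file_dict instead
example = {
    'small.csv': pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}),
    'tiny.csv': pd.DataFrame({'x': [0]}),
}

for key, (rows, cols) in summarize_frames(example).items():
    print(f'{key}: {rows} rows x {cols} columns')
```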
Final Thoughts
Check out more Python tricks in this Colab Notebook or in my recent Python Posts.
Thanks for reading!