Creating a Random or Test DataFrame in Pandas

Having random or test data is a great way to test out various functions before applying them to actual data. Here are a few ways to generate random or test data in pandas.

Creating a Random DataFrame

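On older pandas releases, running pd.util.testing.makeDataFrame() gives you a 30×4 DataFrame of random floats. The pd.util.testing module was deprecated in pandas 1.0 and removed in pandas 2.0, so this helper is not available on current versions.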

import pandas as pd

df = pd.util.testing.makeDataFrame()

df
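Because pd.util.testing is gone from pandas 2.0 and later, here is a rough stand-in built with NumPy. It is a sketch, not the original helper: it assumes the old function produced 30 rows of standard-normal floats in columns A to D with random string labels as the index, and the seed value is arbitrary.

```python
import string

import numpy as np
import pandas as pd

# Seeded generator so the "random" frame is reproducible (seed is arbitrary)
rng = np.random.default_rng(seed=0)

# 30 random 10-character labels for the index (assumed to mimic the old helper)
letters = list(string.ascii_letters + string.digits)
index = ["".join(rng.choice(letters, size=10)) for _ in range(30)]

# 30 rows x 4 columns of standard-normal floats
df = pd.DataFrame(rng.standard_normal((30, 4)), index=index, columns=list("ABCD"))

print(df.shape)  # (30, 4)
```

If the string index is not needed, dropping the index argument leaves a default integer index.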

Using NumPy

Using NumPy allows us to be more specific with our DataFrame requirements. We can set the range of values, the number of rows, and the number of columns.

DataFrame with One Column

import pandas as pd
import numpy as np

# 1,000 rows of random integers in [0, 100) in a single column
df = pd.DataFrame(np.random.randint(100, size=(1000, 1)), columns=['A'])

DataFrame with Multiple Columns

# Same idea with three columns: the second element of size sets the column count
df2 = pd.DataFrame(np.random.randint(100, size=(1000, 3)), columns=['A', 'B', 'C'])

df2
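To make the range, row count, and column names explicit, the pattern above can be wrapped in a small function. This is only a sketch: make_random_frame and its parameters are hypothetical names, and it uses NumPy's newer default_rng generator with a seed so the test data is repeatable.

```python
import numpy as np
import pandas as pd


def make_random_frame(n_rows, columns, low=0, high=100, seed=None):
    """Hypothetical helper: random integers in [low, high) for each named column."""
    rng = np.random.default_rng(seed)
    values = rng.integers(low, high, size=(n_rows, len(columns)))
    return pd.DataFrame(values, columns=list(columns))


df3 = make_random_frame(1000, ["A", "B", "C"], low=50, high=60, seed=42)

print(df3.shape)  # (1000, 3)
print(df3.min().min() >= 50, df3.max().max() < 60)  # True True
```

Passing the same seed twice returns identical frames, which is handy when a test needs stable input.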

Time Series Data

If you want to create time series data with random numbers, you can combine the pandas.date_range function with the numpy.random.randint function.

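# freq='M' means month-end; pandas 2.2+ deprecates it in favor of 'ME'
dates = pd.date_range(start='1/1/2018', periods=12, freq='M')
# 12 random integers in [0, 12), one per date
rand_nums = np.random.randint(12, size=(12, 1))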

df = pd.DataFrame(rand_nums,index=dates,columns=['A'])

df
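A variation on the same idea, sketched with a seeded generator and two columns. It uses the 'MS' (month-start) frequency rather than month-end, on the assumption that either is fine for test data; the seed and column names are arbitrary.

```python
import numpy as np
import pandas as pd

# Seeded generator so the series is reproducible (seed is arbitrary)
rng = np.random.default_rng(seed=1)

# Twelve month-start dates covering 2018
dates = pd.date_range(start="2018-01-01", periods=12, freq="MS")

# Two columns of random integers in [0, 12), indexed by date
ts = pd.DataFrame(
    rng.integers(0, 12, size=(12, 2)),
    index=dates,
    columns=["A", "B"],
)

print(ts.index[0], ts.index[-1])  # 2018-01-01 00:00:00 2018-12-01 00:00:00
```

Because the index is a DatetimeIndex, date-based selection such as ts.loc["2018-03"] works on the result.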

Economic and Financial Data

I find that economic or stock price data is an easy way to get good time-series datasets. The pandas-datareader package makes it easy to pull this data from free public APIs.

pip install pandas-datareader

In the example below, I am grabbing US GDP from the St. Louis Fed's FRED database.

import pandas_datareader as web

# 'GDP' is the FRED series ID; 'fred' selects the data source
df = web.DataReader('GDP', 'fred')

df

If you want stock prices, you can use the Tiingo API, which requires an API key.

import os
import pandas_datareader as pdr

df = pdr.get_data_tiingo('AAPL', api_key=os.getenv('tiingo_api_key'))
df.head()

My API key is stored in my bash profile. To see how to set that up, you can check out my post – How to Store Passwords in a Bash Profile and Retrieve Them in Python.
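One caveat with os.getenv is that it quietly returns None when the variable is missing, so the failure only shows up later in the API call. Here is a minimal sketch of a stricter lookup. The get_api_key function is a hypothetical helper, and it assumes the variable is named tiingo_api_key as in the example above.

```python
import os


def get_api_key(name="tiingo_api_key"):
    """Hypothetical helper: return the named environment variable or raise."""
    key = os.getenv(name)
    if not key:
        # Fail early with a clear message instead of passing None to the API
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return key


# Example usage (not run here, since it needs the variable to be set):
# df = pdr.get_data_tiingo('AAPL', api_key=get_api_key())
```

This fails at the lookup, so a missing key is reported by name before any request is made.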

Thanks for reading. Happy coding!

