Having random or test data is a great way to test out various functions before applying them to actual data. Here are a few ways to generate random or test data in pandas.
Creating a Random DataFrame
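On older versions of pandas, simply run pd.util.testing.makeDataFrame() and you’ll have a 30×4 DataFrame. Note that pd.util.testing was deprecated in pandas 1.0 and removed in pandas 2.0, so this call only works on older installations.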
import pandas as pd
df = pd.util.testing.makeDataFrame()
df
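On pandas 2.0 and later, where pd.util.testing is gone, a similar frame can be built by hand. The sketch below is a stand-in rather than the pandas implementation: the helper name make_test_frame, the column labels, and the seed parameter are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def make_test_frame(n_rows=30, columns=("A", "B", "C", "D"), seed=None):
    """Hypothetical helper: an n_rows x len(columns) frame of random floats."""
    rng = np.random.default_rng(seed)  # seeded generator for repeatable data
    values = rng.standard_normal(size=(n_rows, len(columns)))
    return pd.DataFrame(values, columns=list(columns))


df = make_test_frame(seed=0)
print(df.shape)  # (30, 4)
```

Passing a seed means the same "random" frame comes back on every run, which is handy when you are debugging a function against it.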
Using NumPy
Using NumPy allows us to be more specific with our DataFrame requirements. We can set the range of values, the number of rows, and the number of columns.
DataFrame with One Column
import pandas as pd
import numpy as np
# 1,000 rows of random integers from 0 to 99 (the upper bound is exclusive)
df = pd.DataFrame(np.random.randint(100, size=(1000, 1)), columns=['A'])
DataFrame with Multiple Columns
# Same idea, but the size tuple now asks for 3 columns
df2 = pd.DataFrame(np.random.randint(100, size=(1000, 3)), columns=['A', 'B', 'C'])
df2
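If you need the same data on every run, or columns of different types, NumPy's newer Generator API is one option. This is a minimal sketch: the seed, the value ranges, and the column names (id, score, group) are assumptions picked for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # fixed seed so reruns give the same data

# 1,000 rows x 3 columns of integers from 10 to 49 (high is exclusive)
df_int = pd.DataFrame(
    rng.integers(low=10, high=50, size=(1000, 3)),
    columns=["A", "B", "C"],
)

# One column per data type: sequential ids, normal floats, categorical labels
df_mixed = pd.DataFrame({
    "id": np.arange(1000),
    "score": rng.normal(loc=70, scale=10, size=1000),
    "group": rng.choice(["x", "y", "z"], size=1000),
})

print(df_int.shape, df_mixed.shape)
```

Building the frame from a dictionary lets each column come from a different distribution, which is closer to what real data looks like.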
Time Series Data
If you want to create time series data with random numbers, you can combine the pandas.date_range and numpy.random.randint functions.
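# 12 month-end dates starting in January 2018
# (pandas 2.2+ prefers the alias 'ME' and warns on 'M')
dates = pd.date_range(start='1/1/2018', periods=12, freq='M')
# 12 random integers from 0 to 11, one per date
rand_nums = np.random.randint(12, size=(12, 1))
df = pd.DataFrame(rand_nums, index=dates, columns=['A'])
df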
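The same recipe can be wrapped in a small function so that you can vary the start date and length. This is only a sketch: the function name random_monthly_series and its parameters are hypothetical, and it uses month-start dates ('MS') instead of month-end, an assumption made because that alias behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd


def random_monthly_series(start="2018-01-01", periods=12, high=12, seed=None):
    """Hypothetical helper: random integers in [0, high) on month-start dates."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range(start=start, periods=periods, freq="MS")
    return pd.DataFrame({"A": rng.integers(high, size=periods)}, index=dates)


ts = random_monthly_series(seed=1)

# Example of testing a time-series function on the fake data:
# sum the monthly values into quarters ('QS' = quarter start)
quarterly = ts.resample("QS").sum()
print(len(ts), len(quarterly))  # 12 4
```

With a DatetimeIndex in place, you can try out methods such as resample or rolling before pointing them at real data.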
Economic and Financial Data
I find that economic or stock price data is an easy way to get realistic time-series datasets. The pandas-datareader package makes it simple to pull this data from free public APIs.
pip install pandas-datareader
In the example below, I am grabbing US GDP from the St. Louis Fed (FRED) database.
import pandas_datareader as web
# 'GDP' is the FRED series ID; with no start or end given, DataReader uses its default date range
df = web.DataReader('GDP', 'fred')
df
If you want stock prices, you can use the Tiingo API, which requires an API key.
import os
import pandas_datareader as pdr
df = pdr.get_data_tiingo('AAPL', api_key=os.getenv('tiingo_api_key'))
df.head()
My API key is stored in my bash profile. To see how to set that up, you can check out my post: How to Store Passwords in a Bash Profile and Retrieve Them in Python.
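One thing to watch for: os.getenv returns None when the variable is missing, and the resulting error from the API can be confusing. A small wrapper can fail early with a clearer message. This is a sketch, and the helper name get_api_key is hypothetical; only the variable name tiingo_api_key comes from the example above.

```python
import os


def get_api_key(name="tiingo_api_key"):
    """Hypothetical helper: read an API key from an environment variable."""
    key = os.getenv(name)
    if not key:
        # Fail here rather than sending an empty key to the API
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return key


# Usage with the Tiingo example above (needs network access and a valid key):
# df = pdr.get_data_tiingo('AAPL', api_key=get_api_key())
```

Remember that a new variable in your bash profile only shows up in terminals opened after you saved it (or after you run source on the profile).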
Thanks for reading. Happy coding!