In this post, I’ll walk you through exporting tickets, users, and organizations from Zendesk using the API and Python for data analysis.
The full code is below followed by an explanation. You can also view the full code in Google Colab.
The main Python package used for this is Zenpy. As a side note, your authentication method might be different from what I have in the creds dictionary at the top of the code.
creds = {
    'email': '',
    'token': '',
    'subdomain': ''
}

# Import the Zenpy class and supporting libraries
from zenpy import Zenpy
import pandas as pd
import datetime as dt
from datetime import timezone
import json
import time

# Create the API client from the credentials above
zenpy_client = Zenpy(**creds)

#TODO change default start date
def export_data(start_date=dt.datetime(2010, 1, 1, tzinfo=timezone.utc)):
    tickets_json = ''
    tickets_df = pd.DataFrame()
    org_df = pd.DataFrame()
    users_df = pd.DataFrame()
    stream_status = False

    while stream_status == False:
        tickets_json = ''  # clears the previous api pull
        result_generator = zenpy_client.tickets.incremental(start_time=start_date, include=['users', 'organizations', 'groups'])
        for ticket in result_generator:
            tickets_json += ticket.decode('utf-8', 'ignore')  # ticket data comes over as bytes so has to be converted

        df = pd.json_normalize(json.loads(tickets_json)['tickets'])
        tickets_df = pd.concat([tickets_df, df])
        df = pd.json_normalize(json.loads(tickets_json)['organizations'])
        org_df = pd.concat([org_df, df])
        df = pd.json_normalize(json.loads(tickets_json)['users'])
        users_df = pd.concat([users_df, df])

        print(str(len(tickets_df)) + " Tickets Imported, " + str(len(org_df)) + " Orgs Imported, " + str(len(users_df)) + " Users Imported")

        start_date = json.loads(tickets_json)['end_time']
        stream_status = json.loads(tickets_json)['end_of_stream']
        print("starting {}".format(dt.datetime.fromtimestamp(start_date).strftime("%b-%d-%Y")))
        time.sleep(1)

        if stream_status == True:
            break

    org_df = org_df.drop_duplicates(subset=['id'])  # duplicates created from side loading
    users_df = users_df.drop_duplicates(subset=['id'])
    return tickets_df, org_df, users_df

# Run the function and export the data as Pandas DataFrames
tickets_df, org_df, users_df = export_data()

# Merge the data
tickets_users_df = pd.merge(tickets_df, users_df, how='inner', left_on='submitter_id', right_on='id', suffixes=['_tickets', '_users'])
tickets_users_orgs_df = pd.merge(tickets_users_df, org_df, how='left', left_on='organization_id_tickets', right_on='id', suffixes=['', '_orgs'])
tickets_users_orgs_df.to_csv('tickets_export.csv')
tickets_users_orgs_df
Breaking this Down
Authenticate your Zendesk Account using Zenpy
Zenpy is a Python package that makes it easy to access the Zendesk API. The link will show you how to install Zenpy and how to authenticate your account. In my code, I am simply using the API token.
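To make the snippets below self-contained, here is a minimal sketch of token-based authentication with Zenpy. The email, token, and subdomain values are placeholders you would replace with your own:

from zenpy import Zenpy

creds = {
    'email': 'agent@example.com',   # placeholder: your Zendesk agent email
    'token': 'your-api-token',      # placeholder: your Zendesk API token
    'subdomain': 'yourcompany'      # placeholder: yourcompany.zendesk.com
}

# Create the client used by every API call in this post
zenpy_client = Zenpy(**creds)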
Zendesk Incremental Exports
The Zendesk Incremental Exports option is how you can bulk export using the API. One caveat is that it only allows you to export 1,000 items at a time. Because of that, you have to hit the API repeatedly until you've returned everything that meets your criteria. The criterion is a timestamp: the API fetches any tickets that were created or changed since the timestamp given. I am using the tickets endpoint. Let's first look at the response we get when we hit the API.
tickets_json = ''
result_generator = zenpy_client.tickets.incremental(start_time=dt.datetime(2021, 1, 1, tzinfo=timezone.utc), include=['users', 'organizations', 'groups'])
for ticket in result_generator:
    tickets_json += ticket.decode('utf-8', 'ignore')  # ticket data comes over as bytes so has to be converted

pd.json_normalize(json.loads(tickets_json))
The JSON response from running the above has a few parts to it. My primary goal is to return a list of tickets. In addition to that, I want to know information about the user, organization, and group. To include that information, you can take advantage of Side-Loading by adding this parameter to your request: include=['users','organizations','groups'].
For some reason, the response I get back from Zenpy is in bytes. So I set up an empty string variable, convert the bytes to a string, and concatenate the result onto that variable. Once I do that, I can parse the string as JSON and take advantage of pd.json_normalize(), a Pandas function that easily converts JSON data to a DataFrame.
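To see what pd.json_normalize() does with nested JSON, here is a tiny self-contained sketch; the sample payload is made up for illustration:

import pandas as pd

# A made-up, simplified ticket payload with a nested 'via' object
sample = {'tickets': [
    {'id': 1, 'status': 'open', 'via': {'channel': 'email'}},
    {'id': 2, 'status': 'solved', 'via': {'channel': 'web'}}
]}

df = pd.json_normalize(sample['tickets'])
print(df.columns.tolist())  # nested keys flatten to dotted columns: ['id', 'status', 'via.channel']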
In my JSON response, there are a number of data items returned (a quick way to inspect them is shown after this list):
- tickets – this is the primary data I am returning from the API request
- count – the number of tickets returned
- users – user data associated with the tickets
- groups – group data associated with the tickets
- organizations – organization data associated with the tickets
- end_of_stream – returns True or False depending on whether this API request returned everything you asked for. Since I wanted all tickets since January 1st, 2021 and there were more than 1,000, the request returned False.
- next_page – the URL to get the next set of results. However, since we are using Zenpy, I am not using this to get the next page
- end_time – this timestamp value lets me know how far the request got before stopping at the 1,000-item limit. This is what I'll pass into the next request to get the next set of tickets
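Putting those fields together, here is a quick way to inspect the control fields after a single call, assuming tickets_json holds the decoded response from the snippet above:

import json

resp = json.loads(tickets_json)

print(resp['count'])          # number of tickets in this page (up to 1,000)
print(resp['end_of_stream'])  # False if there are more pages to fetch
print(resp['end_time'])       # epoch timestamp to pass as the next start_time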
Add Data to DataFrame
Next, I'll convert the tickets, users, and organizations into their own DataFrames and concatenate each onto an initially empty DataFrame. As we loop through the data, each DataFrame gets the new data appended to it.
tickets_df = pd.DataFrame()
org_df = pd.DataFrame()
users_df = pd.DataFrame()

df = pd.json_normalize(json.loads(tickets_json)['tickets'])
tickets_df = pd.concat([tickets_df, df])
df = pd.json_normalize(json.loads(tickets_json)['organizations'])
org_df = pd.concat([org_df, df])
df = pd.json_normalize(json.loads(tickets_json)['users'])
users_df = pd.concat([users_df, df])
The While Loop
Since I don't know how many API calls I need to get all my data, I leverage a while loop to continue hitting the API until end_of_stream returns True. Focusing just on the while loop, here is the code that sets my stream_status variable, which is updated after every API call.
while stream_status == False:
    stream_status = json.loads(tickets_json)['end_of_stream']
    if stream_status == True:
        break
Set up the Next API Call
In the response, we'll capture the end_time to use in the next API call, and we will keep overwriting it as we loop through the API responses. My start_date variable is the same one I defined in the function parameters. I did this so that I could overwrite it with each API call. Once an API call finishes, its end_time becomes the start time for the next call.
start_date = json.loads(tickets_json)['end_time']
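Putting the while loop, the stream check, and the timestamp handoff together, the pagination pattern distills to this skeleton. It is the same logic as the full function above, trimmed to the control flow:

import json
import time

stream_status = False
while stream_status == False:
    tickets_json = ''  # clear the previous API pull
    result_generator = zenpy_client.tickets.incremental(start_time=start_date, include=['users', 'organizations', 'groups'])
    for ticket in result_generator:
        tickets_json += ticket.decode('utf-8', 'ignore')

    resp = json.loads(tickets_json)
    start_date = resp['end_time']          # next call starts where this one ended
    stream_status = resp['end_of_stream']  # True once everything has been returned
    time.sleep(1)                          # small pause to respect rate limits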
Remove Duplicates
The final thing I do is remove the duplicates from the organizations and users DataFrames. Since the API call is ticket-focused, you'll get the organization and user for each ticket. The more tickets you get from the same organization over a long time horizon, the more duplicates you'll get.
org_df = org_df.drop_duplicates(subset=['id'])  # duplicates created from side loading
users_df = users_df.drop_duplicates(subset=['id'])
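One design choice worth noting: drop_duplicates() keeps the first occurrence by default. Because the incremental export returns records in time order, you could pass keep='last' instead to retain the most recent copy of each user or organization. This is a sketch of an alternative, not what the code above does:

# Keep the most recent record for each id rather than the first one seen
org_df = org_df.drop_duplicates(subset=['id'], keep='last')
users_df = users_df.drop_duplicates(subset=['id'], keep='last')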
Merge the Data
Now that we have returned all the data from the API, we can merge it so that we have all the data points in one view.
tickets_users_df = pd.merge(tickets_df, users_df, how='inner', left_on='submitter_id', right_on='id', suffixes=['_tickets', '_users'])
tickets_users_orgs_df = pd.merge(tickets_users_df, org_df, how='left', left_on='organization_id_tickets', right_on='id', suffixes=['', '_orgs'])
tickets_users_orgs_df.to_csv('tickets_export.csv')
tickets_users_orgs_df
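With everything in one DataFrame, quick analyses become one-liners. For example, a sketch; the exact column names like 'name_orgs' and 'id_tickets' depend on the suffixes applied during the merges above:

# Ticket counts by status ('status' comes from the ticket side of the merge)
print(tickets_users_orgs_df['status'].value_counts())

# Tickets per organization, using the side-loaded org name
print(tickets_users_orgs_df.groupby('name_orgs')['id_tickets'].count())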
Final Thoughts
Check out more Python tricks in this Colab Notebook or in my recent Python Posts.
Thanks for reading!