How to Pull Jira using JQL with Python and Pandas

In this post, I’ll walk you through pulling Jira issues from the API using JQL with Python Pandas.

Check out the final complete code in this notebook.

Import Libraries

Here are the libraries we will need:

import requests
from requests.auth import HTTPBasicAuth
import json
import pandas as pd
import datetime as dt
import time
import numpy as np
import re

Add Credentials

Here is where you would enter in your credentials in JSON format:

creds = {
    "url":"",
    "username":"",
    "api_token":""
}

The url variable will look like this: something.atlassian.net. Once you are logged into Jira, you’ll see the url in the address bar.

If you don’t have an API token, you can create one by following these directions.

The Request Function

Here are the variables you will need to change

JQL – Enter in the JQL you would like to use. You can test out your JQL in the Jira search functionality
start_at – The JQL will return a number of results and this parameter is set up to work like the LIMIT function in SQL
end_of_stream – This is set to False and is needed to make the while loop work since we have to keep running the function utnil we get all the issues
issue_lst – Empty list variable to hold the results with each run since we can only retrieve 100 issues at a time
retries – variable that holds that number of retries until a threshold is reached. Max retries is set to 3 times as noted on line 7 below. This is just to make sure you don’t keep hitting the API and returning errors

def get_issues():
  start_at = 0
  end_of_stream = False
  issue_lst = []
  retries = 0

  while (not end_of_stream) and (retries < 4):

    url = f"{creds['url']}rest/api/3/search?startAt={start_at}&maxResults=100"

    headers = {
      "Accept": "application/json"
    }

    auth = HTTPBasicAuth(creds["username"], creds["api_token"])

    query = {
      #TODO: add custom JQL here
      'jql': ''
    }

    response = requests.request(
      "GET",
      url,
      headers=headers,
      params=query,
      auth=auth
    )
    
    if response.status_code == 429: #rate limit hit
      print("Rate limited. Waiting 60 seconds...")
      retries += 1
      time.sleep(60)
    
    elif response.status_code == 401: #unauthorized:
      print("Unathorized. Please update credentials")
      end_of_stream = True
    
    elif response.status_code == 403: #forbidden:
      print("Forbidden")
      end_of_stream = True

    elif response.status_code == 200:
      if response.json()['issues'] == []:
        end_of_stream = True
      else:
        issue_lst.extend(response.json()['issues'])
        start_at = response.json()['startAt'] + response.json()['maxResults']
        # print(start_at)
        # print(end_of_stream)
        total_issues = response.json()['total'] 
        time.sleep(0.2)
    
    else:
      raise Exception(f"Error code {response.status_code}: {response.json()}")
      end_of_stream = True

  return issue_lst

Create a Pandas DataFrame

Now that we have a Pandas DataFrame, we can easily convert the results using pd.json_normalize().

#Run the function
issues = get_issues()

#Create a Pandas DataFrame
df = pd.json_normalize(issues) 
df.head()

Getting Custom Field Names

If you use custom field names, then the get request returns a bunch of data fields with customfield_123. We have to run a separate request to get those:

url = f"{creds['url']}rest/api/3/field"

headers = {
  "Accept": "application/json"
}

auth = HTTPBasicAuth(creds["username"], creds["api_token"])

response = requests.request(
  "GET",
  url,
  headers=headers,
  auth=auth
)

Clean up column Names and Apply to DataFrame

This next code snippet allows us to convert the custom fields names into a DataFrame so that we can replace the column names with a more readable version.

custom_fields_dict = pd.json_normalize(response.json()).query("id.str.contains('customfield')", engine='python')[['id','name']].set_index('id').to_dict()['name']
custom_fields_dict

Now we replace the DataFrame ones:

new_columns = []
for i in df.columns:
  new_columns.append(text_replacement(i))

df.columns = new_columns

df.head()

Final Thoughts

Check out more Python tricks in this Colab Notebook or in my recent Python Posts.

Thanks for reading!

Posted

February 28, 2022

Python

Paul

Tags: