Excel was the first tool I learned when I started my career in investment banking, and I continued to rely on it for many years after. It's a powerful program, and its ubiquity in the workplace makes it a required skill. However, if you want to take your data analysis to the next level, consider learning Python.
I started my career in investment banking, where I spent pretty much all day making reports. Excel and PowerPoint were my tools. After a few months, I was comfortable with the steps needed to update our operational reports. The process was entirely manual: copying data from one Excel file to another, applying the same data manipulations, recreating the same charts, pasting into a PowerPoint slide, and then adding some dull commentary. It's not a very glamorous life.
For the longest time, I didn't know there was an alternative. Everyone was doing it this way, so it must have been the best way to accomplish the task.
Manipulating data in Excel was a constant theme in my next jobs, and a few people recommended that I look into Python. I was also looking for the next step in my career. That next step for me was data science, and its foundation is Python. Here are the biggest reasons why I switched.
Working with Python Packages
Python packages are bundles of code that other Python developers have written and shared. You can think of them like a Chrome extension: they let you leverage code someone else put together to complete a task.
Since Python has been around for a while and is one of the most popular programming languages, a lot of people have developed tools that make life much easier, which means you don't have to start from scratch.
Some of my favorite packages include:
- Pandas and Numpy allow you to work with data just as you would in Excel
- Beautiful Soup for scraping websites
- Requests for interacting with APIs
- Openpyxl for Excel files
- Simple Salesforce for reading, writing, and deleting data on Salesforce
The number of packages is endless. If you search for a task and append "python," you'll probably find that someone has developed a package for it.
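To make the pandas bullet above concrete, here is a minimal sketch of Excel-style work done in pandas. The dataset and column names are made up for illustration:

```python
import pandas as pd

# A small sample dataset, the kind you might otherwise keep in a worksheet
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "product": ["A", "A", "B", "B"],
    "revenue": [100, 150, 200, 50],
})

# Filter rows, like applying an AutoFilter in Excel
east = sales[sales["region"] == "East"]

# Group and sum, like building a pivot table
totals = sales.groupby("region")["revenue"].sum()
print(totals)
```

Everything you would do with filters, VLOOKUPs, and pivot tables has a pandas equivalent, and it's all written down as repeatable code.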
Work with all Data Sources in One Place
I rarely have a project that relies on a single data source. This has always made updating a report a nightmare, because updating typically means running queries and copying and pasting results into an Excel file: all very manual tasks that can introduce errors if the process isn't documented well or if the data changes. Common data sources for me include:
- A SQL database
- A CRM like Salesforce
- Jira
- External data from a customer
- Internal Excel or Google Sheets documents
- Data from websites
- PDF documents
Python makes it easy to link to any of these data sources. For some of them, Python developers have created packages that make connecting simple; sources like Salesforce or Jira expose an API you can connect to.
My typical flow was to run a saved query in Sequel Pro and copy the results into an Excel file. Then I would go to Salesforce, run a report, export the results as a CSV, and paste them into my Excel file. That became a time-consuming process every time I wanted to work with the data. I have found that, while it may be possible to connect to external data sources in Excel, it's not easy.
By connecting directly to a data source, you can refresh the data with the click of a button, or automatically.
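As a sketch of what "directly connecting" looks like, here is pandas reading straight from a SQL database. An in-memory SQLite database stands in for a real one; with MySQL or Postgres you would swap in the appropriate connection:

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for a real database in this example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.99), (2, 24.50)])
conn.commit()

# One line replaces the run-query-then-copy-paste routine
orders = pd.read_sql("SELECT * FROM orders", conn)
print(orders)
```

Rerunning the script pulls fresh data every time; there is nothing to copy and nothing to paste.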
Automation
Any repetitive task is a candidate for automation. This can be anything from refreshing a data source to cleaning data to scraping websites. For the data analysts out there, it means you only have to hook up your data source once and define your data cleaning steps once, and they will be applied to any new data.
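The "define your cleaning steps once" idea can be sketched as a single function that every refresh runs through. The column names and cleaning rules here are hypothetical examples:

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """The same steps run on every refresh: trim text, drop blanks, fix types."""
    df = df.copy()
    df["name"] = df["name"].str.strip().str.title()
    df = df.dropna(subset=["amount"])
    df["amount"] = df["amount"].astype(float)
    return df

# Any new export gets the identical treatment, with no manual steps
raw = pd.DataFrame({"name": ["  alice ", "BOB", "carol"],
                    "amount": ["10", None, "7.5"]})
tidy = clean(raw)
print(tidy)
```

In Excel the equivalent steps live in your head (or a checklist); in Python they live in code that runs the same way every time.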
Working with Large Files
Excel supports at most 1,048,576 rows. This means you can't work with larger datasets without an add-on like Power Pivot or an Access database. This limitation adds time and complexity to any analysis in Excel.
Even if your data is under that row limit, you'll discover that your Excel file becomes huge and is really slow to work with. I've lost countless hours trying to do a VLOOKUP on a large dataset only to have Excel freeze.
In contrast, Python handles datasets far beyond Excel's row limit, and loading a CSV with pandas takes a single line of code:
import pandas as pd
df = pd.read_csv('file_name.csv')
This means that you can get a very large file and run some basic statistics on it within a few minutes.
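For files too big to hold in memory at once, pandas can also stream a CSV in chunks. This sketch builds a 100,000-row CSV in an in-memory buffer just so it runs standalone; in practice you would pass a file path to pd.read_csv:

```python
import io

import pandas as pd

# Simulate a large CSV with an in-memory buffer (a file path works the same way)
buffer = io.StringIO("value\n" + "\n".join(str(i) for i in range(100_000)))

# Stream the file in chunks so even files beyond Excel's row limit
# (or beyond your RAM) can be summarized
total_rows, total_sum = 0, 0
for chunk in pd.read_csv(buffer, chunksize=10_000):
    total_rows += len(chunk)
    total_sum += chunk["value"].sum()

print(total_rows, total_sum)
```

Each chunk is an ordinary DataFrame, so any statistic you can compute on a small dataset you can accumulate across a huge one.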
I once received a CSV from a customer with millions of rows. Attempting to open a CSV that large in Excel results in lost data, because Excel keeps only the first 1,048,576 rows. My solution at the time was to break it up into smaller pieces with a text editor. That didn't go so well.
Google Sheets files can especially become Frankensteins. What starts out as a simple analysis becomes a large, complicated, and messy one, especially if multiple people start working on it. Ultimately the file becomes painfully slow to work with; some Google Sheets I use take ten minutes to fully compute all the changes.
Other minor gripes with Excel
Excel Crashes. All the time.
I have lost years of my life to the times I forgot to save and my Excel file crashed.
Excel really only works well on a PC
The Mac version of Excel is pretty bad. It's missing so many key features.
Python is Free
That’s right. Free. No monthly subscription. So what’s the catch? It takes time to learn Python. It’s not something you’ll learn over a weekend. It can take months to years to learn it well, depending on your background.
Conclusion
If you are passionate about numbers and data analysis, I highly recommend learning Python. There are a ton of resources available to get you started. Personally, I used Codecademy to start my Python journey and highly recommend it. They do an excellent job of explaining the fundamentals and provide stellar interactive lessons. There are a lot of great free resources as well.
Thanks for reading!