How to use Google Sheets as a Basic Web Scrapper

Easily and automatically capture data from websites using some built-in functionality in Google Sheets.

For this post, I am going to use the TSA checkpoint data. This is a really interesting data set that records the number of travelers that are going through the TSA checkpoints and also has the numbers for the previous year.

Google Sheets has a function called IMPORTHTML that makes this all possible in one line of code. The function takes three arguments:

  1. The URL
  2. “Table” or “List”
  3. An index value. This is usually 1 if it’s the first table or list on the page. I play around with this number if it doesn’t return the right data

Taking a quick look at the TSA checkpoint site, you can see that it’s a fairly simple webpage with an HTML table.

All you wold need to do is add this function into a cell to pull the data.

=IMPORTHTML("https://www.tsa.gov/coronavirus/passenger-throughput","table",1)

From here, I added a column to calculate the year over year change in column D.

The data does refresh with new entries on the site. I’m not exactly sure at what interval the data is refreshed but haven’t had a problem with stale data. If you do, I imagine a quick refresh of the page will do the trick.

That’s it! Google Sheets makes it really simple to pull data stored in a simple HTML table or list.


Posted

in

by

Tags: