Can't Keep Me From My Data
One of the reasons the web is really the best platform is that you can inspect the HTML behind every page. Because the content has to display in a web browser, and that browser has developer tools, getting at a page's markup is fairly easy.
While my work is primarily about analyzing data, my role sits squarely within higher education administration. As an administrator, I find that many of the tools I use have a web application as their front end. Even our email, documents, and presentations can be accessed through a web interface.
I'm sure none of this is surprising or news to anyone. Everybody uses these tools and probably doesn't think about it at all. But it does mean you can always get at the data being displayed to you.
I was faced with a web application that had exactly the data I wanted, but no way to download it. So I immediately opened the web inspector and discovered that it isn't an HTML table at all. Instead, it uses CSS Grid to lay out a table. However, the cells each have a particular class, so it's easy to use Beautiful Soup to find all of those tags.
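The gist of the script was something like the sketch below. The file name, cell class, and column count here are placeholders for illustration (the real page had its own class names), and it assumes the page's HTML has been saved out of the browser first.

import csv
from bs4 import BeautifulSoup

NUM_COLUMNS = 5  # placeholder; the real grid defines how many cells make a row

# Parse a copy of the page saved from the browser's web inspector.
with open("page.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

# Every cell in the CSS Grid "table" shares a class, so grab them all.
cells = [tag.get_text(strip=True) for tag in soup.find_all(class_="grid-cell")]

# Chunk the flat list of cells back into rows and write them out.
rows = [cells[i:i + NUM_COLUMNS] for i in range(0, len(cells), NUM_COLUMNS)]
with open("data1.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)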
The one wrinkle I learned pretty quickly was that not all of the data is loaded into the page at once. I think the entries are dynamically added to the DOM as you scroll to them. That meant I had to run my script several times, scrolling further through the list between runs.
That left me with several CSV files named dataX.csv, where X is an integer. These files could also contain duplicate entries, which I wanted to get rid of.
I found the following command that worked perfectly:
(head -n 1 data1.csv && tail -n +2 -q data*.csv | sort -u) > merged_data.csv
The head command grabs the header row from the first file. (Because they were generated by my script, the files all have the same header.) The tail command, with -n +2, starts each file at its second line, which strips out the headers, and -q keeps it from printing the file names in between. Then sort -u sorts the remaining lines and removes the duplicates.
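For anyone who would rather stay in Python, here is a rough equivalent, assuming the same dataX.csv naming. One difference: it keeps the rows in their original order instead of sorting them.

import csv
import glob

seen = set()
with open("merged_data.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for i, name in enumerate(sorted(glob.glob("data*.csv"))):
        with open(name, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            if i == 0:
                writer.writerow(header)  # keep the header from the first file only
            for row in reader:
                key = tuple(row)
                if key not in seen:  # skip duplicate rows across all files
                    seen.add(key)
                    writer.writerow(row)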