October 24, 2025

Tarsnap

I keep a personal task management system, but I don't keep it particularly well. I feel it has been more honoured in the breach than the observance when it comes to keeping track of critical tasks. However, I have kept it going through life events, job and role changes, COVID-19, moving and general chaos.

One consequence of this is that I have some long-standing tasks in that system. These are things I mean to do, but for some reason I just don't. There are usually two reasons why they stay on my list for a long time.

The first is that I really don't want to actually do it or there is some part of it that I don't want to do. If I don't really want to do the task, why does it stick around? Well, I probably don't want to admit to myself that I'm never going to do it.

The second reason is more interesting. This is where I don't actually have a full picture of what the task entails, either because I haven't thought about it enough or because I have misconceptions about it.

That was the case with my task to set up Tarsnap backups. I knew it was an interesting service and that people found it useful. What I didn't know was how to actually use it to make useful backups.

I even purchased a book about it. That helped a bit, but I still didn't know exactly how I should apply it to my use case.

My first attempt was to back up a large amount of data one single time. It worked, but I used up the money in my account too quickly. It was not a lot of money, since the service is inexpensive, but it told me that I had chosen the wrong data for that task. I have since backed up that original data to physical media and stored it at a second location.

I also made the mistake of routing the emails from Tarsnap straight into my email archive. I received, but didn't see, several emails warning me I was running out of money and my content would be deleted. So, lots of lessons learned from that first experience.

I think the reason it took me a while to figure out Tarsnap is that I've never really been responsible for tape backups or regular backups of a work system. Since Tarsnap emulates that kind of system, I didn't really have a good reference. I had used tar, but mostly for unpacking archives, not for backing up.

My father's work had a tape backup system. I remember he had to take home the tape backups on a regular basis, just so they were off site. I remember those tapes; they actually had a metal bottom, so they looked like they could survive anything.

I also remember his work having a teletype machine, which I found really interesting. We would sometimes get the really wide tractor feed paper with alternating sections of green and white. I think we were always clamouring for more paper to write on.

The part about Tarsnap that I didn't get is that the backup tool doesn't dictate how you back up. The consumer-level tools that I'm more familiar with build that in, but with Tarsnap you are responsible for picking a name for each backup that includes the critical information, such as what you are backing up and the date of the backup.
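To make that concrete, here is a minimal sketch of one possible naming scheme. It assumes archives are named label-YYYY-MM-DD, where the label says what is being backed up; that convention and the archive_name and backup helper functions are hypothetical, not something Tarsnap prescribes. The only Tarsnap options used are -c (create an archive) and -f (name it).

```shell
#!/usr/bin/env bash
# Sketch of a dated archive naming scheme for Tarsnap.
# The "<label>-<YYYY-MM-DD>" convention is an assumption; Tarsnap
# accepts any archive name, so the scheme is entirely up to you.
set -euo pipefail

# Print an archive name built from a label describing what is being
# backed up and a date (defaults to today's date).
archive_name() {
    local label="$1"
    local day="${2:-$(date +%Y-%m-%d)}"
    printf '%s-%s\n' "$label" "$day"
}

# Create a Tarsnap archive of the given paths under a dated name.
# -c creates an archive; -f sets the archive's name.
backup() {
    local label="$1"
    shift
    tarsnap -c -f "$(archive_name "$label")" "$@"
}
```

Running backup documents ~/Documents on October 24, 2025 would create an archive named documents-2025-10-24. Archive names need to be unique, so if you back up more than once a day you would want to add the time to the date format.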

Then you need to set up a system to make regular backups: say daily or weekly. It's all up to you to figure out how to prune the old backups according to your schedule.
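Pruning can be sketched the same way. This is a minimal sketch, assuming the hypothetical label-YYYY-MM-DD naming scheme and a simple policy of keeping only the newest few archives for each label; archives_to_prune and prune are illustrative names. It uses tarsnap --list-archives to list archive names and tarsnap -d -f to delete one. Because ISO dates sort correctly as plain text, sorting the names in reverse puts the newest first.

```shell
#!/usr/bin/env bash
# Sketch of a "keep the newest N" pruning policy for Tarsnap archives.
# Assumes names look like LABEL-YYYY-MM-DD (a hypothetical convention).
set -euo pipefail

# Read archive names on stdin and print the ones for LABEL that fall
# outside the newest KEEP. Assumes LABEL contains no whitespace or
# regex metacharacters.
archives_to_prune() {
    local label="$1" keep="$2"
    # grep exits 1 when nothing matches; "|| true" keeps that from
    # failing the pipeline under pipefail.
    { grep -E "^${label}-[0-9]{4}-[0-9]{2}-[0-9]{2}\$" || true; } \
        | sort -r \
        | tail -n +"$((keep + 1))"
}

# Delete old archives for LABEL, keeping the newest KEEP.
prune() {
    local label="$1" keep="$2" name
    # Word splitting is safe here only because the names matched above
    # contain no whitespace.
    for name in $(tarsnap --list-archives | archives_to_prune "$label" "$keep"); do
        tarsnap -d -f "$name"
    done
}
```

For example, prune documents 14 would keep the fourteen most recent documents archives. To make the backups regular, a scheduler such as cron could run a script that calls the backup and prune steps nightly; the schedule and the retention count are both choices you make for yourself.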

The nice thing about Tarsnap is that it handles the compression and de-duplication of your backup data. I had to think a bit about how to back up a database so that it isn't an opaque binary to Tarsnap. I also learned that you want to let the database software ensure that you're not backing up the data mid-transaction.
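Here is a minimal sketch of the database step. The post doesn't say which database is involved, so SQLite is purely an assumption here, and dump_db is a hypothetical helper. It writes a plain-text SQL dump that Tarsnap can then back up like any other file. For PostgreSQL, pg_dump plays the same role and takes its dump from a single consistent snapshot.

```shell
#!/usr/bin/env bash
# Sketch: dump a SQLite database to plain-text SQL before backing it up.
# SQLite is an assumption; the post does not name the database.
set -euo pipefail

# Write a text dump of database $1 to file $2.
# My understanding is that the sqlite3 shell's .dump reads the database
# inside a transaction, so the dump reflects a consistent state rather
# than a half-finished write.
dump_db() {
    local db="$1" out="$2"
    # Write to a temporary file first so a failed dump never replaces
    # the previous good one.
    sqlite3 "$db" .dump > "$out.tmp"
    mv "$out.tmp" "$out"
}
```

For example, dump_db ~/app/app.db ~/backups/app.sql followed by tarsnap -c -f app-db-2025-10-24 ~/backups/app.sql; the paths and archive name are placeholders.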

So, I may have had this task on my list for a long time, but I'm really glad to finally be using Tarsnap for something that is very useful for me.

Maybe that means I'll finish off some other long-standing tasks.

October 23, 2025

CIRPA Day Three

So that is a wrap for the main portion of the 2025 CIRPA virtual conference. You can see my posts on day one and day two for more information. This is my first time being a conference chair, and it's really only now dawning on me that I won't have our regular bi-weekly conference planning meetings to go to.

The day started out with a great discussion of modeling, machine learning and AI in Institutional Research roles during our networking session. We had some modeling experts in the room, and it was really fascinating. I was glad to see that my workhorse model, the logistic regression, is still delivering results for people. It was also hard to hear that a lot of the modeling work IR professionals do is still done off the side of their desks.

I facilitated a sponsored session by Plaid Analytics and Northern Lights College on using an API to leverage an operational reporting tool to create a data warehouse out of their Student Information System. Since I'm putting together a data warehouse myself, it was a really interesting session for me.

And finally, Phillip Wallace brought down the house with his closing keynote: "Beyond Waffle House". I really loved his presentation and thoughtful answers to all our questions.

And with that, all that's left of the CIRPA conference is the President's Reception. I'll see the rest of the CIRPA folks there tomorrow.

CIRPA Day Two

See my post on the first day of the CIRPA conference.

Today was the second day of the 2025 Virtual CIRPA conference. There were some standout highlights in the portions of the conference I was able to attend.

I think the biggest highlight was all the discussion at our Annual General Meeting. CIRPA is doing a pilot project for institutional membership, where a Canadian post-secondary institution can pay one fee to cover memberships for all its staff. Doing this, even for one year, requires approval from the membership.

There were a lot of thoughtful questions and discussion. The motion passed, so CIRPA will test out the institutional membership.

The other highlight for me was the pair of interactive sessions from Thompson Rivers University on a game to help with strategic foresight. When we started planning the conference, we called for creative submissions: something a bit different from the regular session format. Sessions like these make a big difference and let attendees really interact with each other.

I'm looking forward to today's sessions.

October 22, 2025

CIRPA Day One

Yesterday was the first day of the Canadian Institutional Research and Planning Association (CIRPA) 2025 virtual conference. The conference is for professionals who work in Institutional Research and Planning. We are people who count students, report to the public and the government, support data-informed decision making, run surveys and do many other things.

I am the Program Chair of the conference this year, the first time I'm doing anything quite like this. I really have to thank my co-chair Emily for helping me when I stumbled and making sure everything got done.

We have had record-setting attendance this year, and the first day was filled with amazing engagement from attendees. A virtual conference is a little different, since you can't see the rooms filling up. But when we judge by the attendance at each session and the number (and quality) of the questions, we know people are there and engaged. We also had really well-attended networking sessions.

Our opening keynote from Madeline Bonsma-Fisher on understanding cycling network access using data really got our attendees into a data frame of mind. I approached her to speak at the conference because I wanted someone to talk about using data in a different domain than ours, to get people thinking about the possibilities. She also brought it home for me by showing how crossings of the Humber River, where my employer's main campus is located, are key to providing safe cycling networks to destinations on both sides of the river.

I have started saving up for a cargo bike.

I have to give a shout-out to all our speakers on the first day. I didn't get to see all your sessions live, but I'm looking forward to watching the recordings. Based on the discussions from the volunteer team, they all went wonderfully. I always learn a lot at CIRPA, and yesterday was no different.

Doing a virtual conference is a bit different, and I do miss connecting with my colleagues in person. However, as Dr. Nicole Johnson of the Canadian Digital Learning Research Association pointed out in her talk, people may not always prefer online learning to in-person learning, but often the choice is not between in-person learning and online learning. Often it's a choice between online learning and no learning at all.

I was lucky to be able to go to last year's conference in Fredericton, New Brunswick, but this year I probably would not have been able to attend an in-person conference.

I am going to try to make it work for next year's conference, though.

I'm looking forward to seeing all of our conference participants tomorrow.

May 2, 2025

Can't Keep Me From My Data

One of the reasons the web is really the best platform is the ability to inspect the HTML behind every page. Because the content has to display in a web browser and that browser has developer tools, you can get the HTML behind any page fairly easily.

While my work is primarily about analyzing data, my role sits well within higher education administration. As an administrator, many of the tools that I use have a front end that is a web application. Even our email, documents and presentations can be accessed through a web interface.

I'm sure none of this is surprising or news to anyone. Everybody uses these tools and probably doesn't think about it at all. But it does mean you can always get at the data being displayed to you.

I was faced with a web application that had exactly the data I wanted, but no way to download it. So I immediately opened the web inspector and discovered that it wasn't an HTML table. Instead, it used CSS Grid to lay the data out as a table. However, each cell had a particular class, so it was easy to use Beautiful Soup to find all those tags.

The one wrinkle, which I discovered pretty quickly, was that not all the data is loaded into the web page at once. I think the entries are dynamically added to the DOM as you scroll to them. That meant I had to run my script several times, scrolling further through the list each time.

As a result, I had several CSV files named dataX.csv, where X is an integer. Because the portions I scrolled through could overlap, these files could also contain duplicate entries, which I wanted to get rid of. I found the following command, which worked perfectly:

(head -n 1 data1.csv && tail -n +2 -q data*.csv | sort -u) > merged_data.csv

The head command grabs the header from the first file. (Because they were generated by my script, the files all have the same header.) The tail command strips out the header from all the files. Then we sort and remove duplicates.
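One trade-off of that pipeline is that sort -u reorders the rows. If you would rather keep rows in the order they were scraped, a single awk pass can do the same job. This is an alternative sketch, not the command I used; merge_csv is a hypothetical function name, and like the original it compares whole lines, so it assumes no quoted fields contain embedded newlines.

```shell
#!/usr/bin/env bash
# Alternative to the sort -u pipeline: keep the header from the first
# file, skip the header line of every later file, and drop duplicate
# rows while preserving the order in which they first appear.
set -euo pipefail

merge_csv() {
    # FNR resets for each input file while NR counts across all files,
    # so "FNR == 1 && NR != 1" matches the header of every file after
    # the first. seen[$0]++ is 0 (false) the first time a line appears,
    # so !seen[$0]++ prints each distinct line once.
    awk 'FNR == 1 && NR != 1 { next } !seen[$0]++' "$@"
}
```

Usage would be merge_csv data*.csv > merged_data.csv. The rows come out in the order the shell expands the glob.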

March 23, 2025

CIRPA Conference

I was honoured to be asked to be the Program Chair for the 2025 Canadian Institutional Research and Planning Association (CIRPA) conference. It's a chance to work on a great conference and with a wonderful co-chair and volunteer team.

Naturally, I accepted immediately.

This year's conference will be held virtually. Many of my colleagues working in Institutional Research and Planning do not get to attend our annual conference because of high travel expenses, and recent cuts to many professional development budgets have made this worse.

The CIRPA board decided to try bringing a virtual conference into our geographical rotation. We had virtual conferences in 2020 and 2021 and we resumed in-person conferences in 2022.

The virtual conference is a great opportunity to bring people together from across the country. That's one reason I decided a theme of Cross Canada IRP would be appropriate.

You can learn more about the conference at CIRPA.

March 22, 2025

First Post

This project has been in the works for almost a year now. While this is a simple site, I had never set up an actual website before.

I needed to set up a separate simple website, which had me setting up servers, learning how to configure a web server and setting up certificates. With that done, there was no reason not to set up my own site.

My goal was always to own, understand and control the technology stack of my website. I now understand why so many people use the existing platforms. There is so much that goes into setting up even a simple website and if your goal is to get your writing out to the world, fiddling with all the little details will just slow you down.

I'm hoping this blog will give me a chance to write about my interests in higher education, data, board games, tabletop role-playing games and other pursuits.

My goal is to be a part of the smaller web and try to take back some of that technology for my own purposes. As always, the views expressed here are my own personal ones and don't reflect those of my current employer or any past employer.