Reading Excel Workbooks with DuckDB
I've been using DuckDB a lot more lately. It's a database, but one focused on analytics. And it's very simple to install and use.
I've been using it more and more since they added the UI extension. It really lets you just dive right in with a notebook interface. Also, the notebook contents are stored in a database file and are automatically versioned.
I started another data analysis today. Not a very strange occurrence since it is my job. Since I've been working on database stuff and I didn't want to shift gears, I wanted to load the data into DuckDB and work on it there.
It was easy enough to import the Excel sheet, and everything seemed to work. I verified that the data had been imported by checking it against some of the summary stats I had been provided with. All good.
But then I noticed something strange: there seemed to be a lot of missing data. Then I noticed that the table created by reading the Excel file had over 4 million rows. And this from a spreadsheet with under 2 000 rows.
I believe I specified the range incorrectly, so the import just added all those empty rows. Fun times!
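For next time, here is a rough sketch of how the import could be guarded, assuming the DuckDB Python package and the excel extension's `read_xlsx` function. The file name, table name, and cell range are hypothetical placeholders. The `range` and `stop_at_empty` parameters are how I understand `read_xlsx` to work, so check them against the documentation for your DuckDB version. The idea is simply to work out how many rows the range covers before importing, and to fail loudly if the resulting table is much bigger than the spreadsheet.

```python
import re


def rows_in_range(cell_range: str) -> int:
    """Count the rows spanned by an A1-style range such as 'A1:F2000'."""
    match = re.fullmatch(r"[A-Za-z]+(\d+):[A-Za-z]+(\d+)", cell_range)
    if match is None:
        raise ValueError(f"not an A1-style range: {cell_range!r}")
    first, last = (int(group) for group in match.groups())
    return abs(last - first) + 1


def build_import_sql(path: str, table: str, cell_range: str) -> str:
    """Build a CREATE TABLE statement that reads only the given range."""
    # Minimal quoting for illustration only; not hardened against hostile input.
    safe_path = path.replace("'", "''")
    return (
        f"CREATE OR REPLACE TABLE {table} AS "
        f"SELECT * FROM read_xlsx('{safe_path}', "
        f"range = '{cell_range}', "
        # Assumption: stop_at_empty ends the read at the first empty row.
        f"stop_at_empty = true)"
    )


def import_sheet(path: str, table: str, cell_range: str, max_rows: int) -> int:
    """Import the sheet and raise if the table has more rows than expected."""
    import duckdb  # imported here so the helpers above work without DuckDB

    con = duckdb.connect()
    con.execute("INSTALL excel")
    con.execute("LOAD excel")
    con.execute(build_import_sql(path, table, cell_range))
    row_count = con.execute(f"SELECT count(*) FROM {table}").fetchone()[0]
    if row_count > max_rows:
        raise ValueError(
            f"{table} has {row_count} rows, expected at most {max_rows}"
        )
    return row_count


if __name__ == "__main__":
    # A range that runs to the bottom of the sheet covers far more rows
    # than a spreadsheet with under 2 000 rows of actual data.
    print(rows_in_range("A1:F2000"))
    print(rows_in_range("A1:F1048576"))
```

Calling `import_sheet("workbook.xlsx", "raw_data", "A1:F2000", max_rows=2000)` (all hypothetical values) would then either return the row count or stop with an error before I start analysing a table full of empty rows.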