Coffee Space



Thunderbird RSS Corruption


This is a story of RSS feeds. Yes, the supposedly simple RSS format that has been around for many years. How hard can it be to parse a simple RSS XML feed? Apparently so hard that two different RSS feed readers cannot do it reliably.

Problem

A while back I noticed that my RSS feeds would fail to update in Thunderbird, but I didn’t think much of it. Slowly but surely the problem got worse and worse, until some time back in June it stopped pulling feeds entirely. First of all, I would like to make a few quick points:

  1. I run an up-to-date version of Ubuntu and the packaged Thunderbird, so this is supposed to be stable software.
  2. I don’t run Thunderbird with any extensions. This is just using bog-standard functionality.
  3. I doubt I am a power user; I suspect that most people who use RSS feeds use them about as much as I do. If you’re hardcore enough to use RSS, you really use RSS.

Of course I went to Mozilla’s website, was eventually forced to create a Matrix account (because they removed IRC), and sat in the help rooms for a while. Despite being there for several days, I got no help.

Realising no help was coming, I attempted to export the OPML file (a “standard” format for RSS feed lists), and it failed. Huh. I eventually figured out where on disk the RSS feeds were actually stored and found that Thunderbird had corrupted the file. This meant that my RSS feed list was completely lost, and I had no back-ups.

Finally having something to search for, I checked online again and found that other people had asked about the same thing but never got an answer. They eventually just gave up.

In retrospect, I believe I know what caused the corruption: lack of disk space. You would think that Thunderbird would be robust to such a problem, but apparently not.

A Hack

I wasn’t giving up on years of accumulated RSS feeds so quickly. At the very least I wanted to export an OPML somehow. Luckily for me, I found that there was a possible route for recovery:

This is a solution to the problem where, in your ~/.thunderbird/<ID>.default/Mail/Feeds/ directory, you can find a valid feeditems.json but not a valid feeds.json file.

I decided to write a small piece of Java to process the JSON file and export a valid OPML file with the data available.

It’s not pretty, but it works.

And so it’s open sourced over on GitLab! (This is because I don’t appreciate GitHub using my source code to feed its Copilot AI without express permission.)
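The real tool is the one on GitLab; what follows is only a rough sketch of the same idea in Java, not that code. It assumes each entry in feeditems.json carries a "feedURLs" array naming the feed(s) it came from (an assumption about Thunderbird's file layout, so check your own file first). It pulls out the distinct URLs with a deliberately crude regex instead of a JSON library, then writes one OPML 2.0 outline per feed. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: recover feed URLs from feeditems.json and emit OPML.
class FeedsToOpml {
    // ASSUMPTION: items look like ..."feedURLs":["https://example.com/rss.xml"]...
    private static final Pattern FEED_URLS =
        Pattern.compile("\"feedURLs\"\\s*:\\s*\\[([^\\]]*)\\]");
    // A JSON string literal, allowing backslash escapes inside it.
    private static final Pattern QUOTED =
        Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Collects the distinct feed URLs referenced anywhere in the raw JSON text. */
    static Set<String> extractFeedUrls(String json) {
        Set<String> urls = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        Matcher arrays = FEED_URLS.matcher(json);
        while (arrays.find()) {
            Matcher q = QUOTED.matcher(arrays.group(1));
            while (q.find()) {
                // Undo the one JSON escape commonly found in URLs: "\/" -> "/".
                urls.add(q.group(1).replace("\\/", "/"));
            }
        }
        return urls;
    }

    /** Escapes characters that may not appear raw inside an XML attribute. */
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Builds a minimal OPML 2.0 document with one rss outline per feed URL. */
    static String toOpml(Set<String> urls) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<opml version=\"2.0\">\n");
        sb.append("  <head><title>Recovered feeds</title></head>\n");
        sb.append("  <body>\n");
        for (String url : urls) {
            String u = xmlEscape(url);
            // No titles survive in feeditems.json, so the URL doubles as the text.
            sb.append("    <outline type=\"rss\" text=\"").append(u)
              .append("\" xmlUrl=\"").append(u).append("\"/>\n");
        }
        sb.append("  </body>\n</opml>\n");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FeedsToOpml <path/to/feeditems.json>");
            System.exit(1);
        }
        String json = Files.readString(Path.of(args[0]));
        System.out.print(toOpml(extractFeedUrls(json)));
    }
}
```

Something like `java FeedsToOpml feeditems.json > feeds.opml` would then give a file most readers can import. Because each outline's text is just its URL, you may want to rename the feeds after importing.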

Akregator

After looking around, I found Akregator, which seemed to be written in C++ and was therefore quite likely to be performant. I want something that doesn’t eat RAM for breakfast and is just simple. So I set it up, and my RSS feeds sprang back to life:

It has some UI flaws, such as needing a lot of clicking to get anything done, occasional crashes, not-so-great search and no real import options:

These were all things I was willing to put up with for a reliable feed reader, though… Except I then discovered that Akregator was using tonnes of RAM and causing my entire system to lock up completely:

I’ve since been working with the Akregator developers to hunt down and debug the problem.

Going Forwards

I had a crazy idea - I could probably do a better job than both Thunderbird and Akregator:

I know people may say: “Why not fix the bugs in the existing feed readers?” Well, for several reasons:

  1. Features - Ultimately, they want to support tonnes of features to please all users. What I actually want is something simple and RAM efficient, so our goals are entirely different. They want a do-it-all piece of software; I want a simpler piece of software that does one thing well. Both Thunderbird and Akregator, for example, have entire web engines compiled in for processing HTML, CSS and JavaScript. That is what I have a web browser for.
  2. Performance - I want something that could easily run on the PineTab or PinePhone. It has to be CPU, RAM and disk space conscious.
  3. Touch - I believe the UI should be touchscreen friendly; these days we see more touch devices, not fewer. I started working on a touchscreen-conscious window manager a while back now.
  4. Podcasts - Something that recently came to my attention is the demand for a podcast-focused RSS reader, something I hadn’t previously considered. This seems like an entirely reasonable piece of functionality.
  5. HTML - One thing that has always annoyed me is that there is no way to specify HTML tags from which to build up a feed. Many websites offer a list of pages in plain HTML, and it should be possible to say something like: “Turn all <ul> items on this page into separate feed entries”. This would massively increase the number of pages that could have an arbitrary feed created for them!
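
To make the <ul> idea a little more concrete, here is a minimal, hypothetical sketch in Java of what such a rule might do: find every <ul> on a page, treat each <li> inside it as one entry, use the item's text as the title and its first link (if any) as the entry URL. It uses regular expressions only to stay dependency-free, so it assumes simple markup (no nested lists, every <li> explicitly closed); a real reader would want a proper HTML parser. ListToFeed and Entry are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: turn every <li> inside every <ul> into a feed entry.
// Assumes simple markup: no nested lists, and every <li> has a closing </li>.
class ListToFeed {
    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
    private static final Pattern UL = Pattern.compile("<ul[^>]*>(.*?)</ul>", FLAGS);
    private static final Pattern LI = Pattern.compile("<li[^>]*>(.*?)</li>", FLAGS);
    // First double-quoted href attribute inside an <a> tag.
    private static final Pattern HREF =
        Pattern.compile("<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", FLAGS);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** One candidate feed entry; link is null when the item has no <a href>. */
    static final class Entry {
        final String title;
        final String link;

        Entry(String title, String link) {
            this.title = title;
            this.link = link;
        }
    }

    static List<Entry> entries(String html) {
        List<Entry> out = new ArrayList<>();
        Matcher ul = UL.matcher(html);
        while (ul.find()) {
            Matcher li = LI.matcher(ul.group(1));
            while (li.find()) {
                String item = li.group(1);
                Matcher href = HREF.matcher(item);
                String link = href.find() ? href.group(1) : null;
                // Strip tags and collapse whitespace to get a plain-text title.
                String title = TAG.matcher(item).replaceAll("")
                                  .replaceAll("\\s+", " ").trim();
                out.add(new Entry(title, link));
            }
        }
        return out;
    }
}
```

Relative links such as "/a.html" come back as-is; turning them into absolute URLs against the page address would be the next step before writing real feed items.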

Anyway, I think there is space for a new feed reader without stepping on the toes of existing readers. I’ll write another article soon to start laying out the expected features.