Coffee Space


RSS Reader Wishlist

In a previous article I complained about Thunderbird silently corrupting my RSS feeds, and in another I complained about Akregator using far too much RAM. The purpose of this article is to design a new feed reader that solves the problems of the old ones.

I’ve kept the criticism and the wishlist in separate articles because the previous one was meant to be critical, whereas this one is meant to be more positive.

Functionality

Here I want to define some functionality, starting off with the absolute minimum and then some nice-to-have functionality.

Back to Basics

First of all, let’s define the basic functionality:

  • OPML Import/Export - This seems like the easiest, but actually it is not. The OPML file format is loosely defined and, in practice, there seems to be no standard that every reader follows. (I really did try to find one when implementing the Thunderbird recovery tool.)
  • Track Read/Unread - We want to know what is read/unread, especially if we are reading hundreds of RSS feeds a day.
  • Open Content Externally - We don’t want to re-invent the wheel, so content viewing should mostly happen externally. Viewing content ‘internally’ should mostly be considered a preview, rather than the ideal experience.
  • Regular Fetching - It needs to be able to fetch new content periodically and merge that into the existing collection of data.
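
As a rough illustration of the import side, here is a minimal sketch that pulls feed URLs out of an OPML file using only Python's standard library. It assumes the common convention that a feed is an `outline` element carrying an `xmlUrl` attribute, possibly nested inside folder outlines. The function name `read_opml_feeds` and the sample data are made up for this example, and real-world files may deviate from this shape.

```python
import xml.etree.ElementTree as ET

def read_opml_feeds(opml_text):
    """Return (title, url) pairs for every outline that carries an xmlUrl."""
    root = ET.fromstring(opml_text)
    feeds = []
    # Outlines can be nested to any depth (folders), so walk the whole tree.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if not url:
            continue  # a folder or some other non-feed outline
        # Exporters disagree on whether the name lives in "title" or "text".
        title = outline.get("title") or outline.get("text") or url
        feeds.append((title, url))
    return feeds

SAMPLE_OPML = """<?xml version="1.0"?>
<opml version="2.0">
  <head><title>Subscriptions</title></head>
  <body>
    <outline text="Blogs">
      <outline text="Example Blog" type="rss" xmlUrl="https://example.com/feed.xml"/>
    </outline>
    <outline title="Example News" xmlUrl="https://example.org/rss"/>
  </body>
</opml>"""

for title, url in read_opml_feeds(SAMPLE_OPML):
    print(title, url)
```

Being tolerant about where the title lives is deliberate here, since that is exactly the kind of detail that varies between exporters.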

Archive

This will be the ability to archive content, which we define here as:

  • Storage - I want to archive all the content that arrives embedded into the feed itself, as websites go down over time and links rot.
  • Search - It’s no good keeping an archive if you can’t search it, and this requires an indexing method. I think keeping this simple is a good idea.
  • Compression - This will all likely take up quite some disk space, so disk compression is required. It should be simple, fast and offer good compression.
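
To show how simple the indexing could be, here is a minimal sketch of an in-memory inverted index: each word maps to the set of archived items containing it, and a query returns the items that contain every query word. The class name `ArchiveIndex` and the item ids are hypothetical. A real archive would also need to persist the index to disk, which is not shown.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

class ArchiveIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of items containing it

    def add(self, item_id, text):
        for word in tokenize(text):
            self.postings[word].add(item_id)

    def search(self, query):
        """Return the ids of items that contain every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result

index = ArchiveIndex()
index.add("a1", "Thunderbird silently corrupts RSS feeds")
index.add("a2", "Akregator uses too much RAM for RSS feeds")
print(sorted(index.search("rss feeds")))      # ['a1', 'a2']
print(sorted(index.search("akregator rss")))  # ['a2']
```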

HTML

This will involve viewing the actual content of the feed itself:

  • Open externally - This will always be the preferred method for processing content. The less the reader has to do, the simpler it is.
  • Basic - I propose this is the absolute minimum of support, mostly just spitting out plain text. Think the lynx web browser with slightly better formatting.
  • External content - As a security feature, external content will not be loaded. I notice that most RSS feed readers have massive issues with security and tracking. It’s like they are 10 years behind current web browsers and it’s not even a good experience.
  • JavaScript - Absolutely not. There is no way this can be supported safely. Also, I suspect JS will increasingly be displaced by WebAssembly.
  • CSS - This is the one thing I could be tempted to support, but it does open up questions that I’m not sure I want to delve into.
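
The 'basic' level above could look something like the following minimal sketch, which reduces feed HTML to plain text with Python's standard `html.parser`. It drops the contents of `script` and `style` elements and never requests images or other external resources, because it only parses the markup it is given. The class name and the set of tags treated as line breaks are choices made for this example.

```python
from html.parser import HTMLParser

class PlainTextExtractor(HTMLParser):
    SKIP = {"script", "style"}  # content is dropped entirely
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

def html_to_text(html):
    """Return the visible text of an HTML fragment, one block per line."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)

SAMPLE_HTML = (
    "<h1>Title</h1><p>Hello <b>world</b></p>"
    "<script>alert(1)</script>"
    '<img src="http://tracker.example/pixel.gif">'
    "<p>Bye</p>"
)
print(html_to_text(SAMPLE_HTML))
```

The sample prints three lines (Title, Hello world, Bye), and the tracking pixel is never fetched.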

Podcasts

This will be the audio content that arrives (apparently people actually do prefer some specific podcast UI):

  • Open externally - This will always be the preferred method for processing content. The less the reader has to do, the simpler it is.
  • Download/Stream - Some people may choose to stream the content rather than download it, and many audio formats offer this ability. This can ultimately save on precious disk space - especially if you only ever intend to listen once.
  • Play/Pause - Apparently the ability to play/pause audio, and to remember the playback position long-term, is really quite useful. This will likely require some audio library to help with processing the audio.
  • Show notes - Many podcasts come with show notes, which can be useful for looking up references made in the audio itself.
  • Bookmarks - Apparently some audio content comes with bookmarks that can be useful to somebody who wants to skip to the appropriate parts of the audio. This is a stretch goal, as it is really not well supported.

More

  • Text-to-speech reader - Another feature that would be really cool to have is text-to-speech (TTS), an idea I got recently from a Hacker News post.

Implementation

Generally I will be looking towards nothings’ single_file_libs collection as a great resource of simple, but mostly complete, single-header C/C++ implementations. They do what they say on the tin, with little to no dependencies.

I think the best way to implement this would be in two main parts: a command-line program and a separate graphical interface.

This has mostly been inspired by git, where you have the CLI and the GUI (git gui), which are independent of one another.

Basics

  • XML parser - Many parsers exist, including a few single-header ones that are relatively complete.
  • HTTP/S library - Given the need to request RSS feeds over HTTP and HTTPS (both HTTP/1.1) and possibly HTTP/2, I should use a modern library. This will likely be libcurl, which is unfortunately quite a large library but hard to avoid.
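
To illustrate what the XML parser has to deliver once the HTTP library has fetched a feed, here is a minimal sketch in Python (used for brevity, rather than the C libraries mentioned above). It handles plain RSS 2.0 only. Atom feeds use namespaced elements and would need separate handling. The function name and the shape of the returned dictionaries are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def parse_rss_items(feed_xml):
    """Extract the items of an RSS 2.0 document as a list of dictionaries."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        guid = item.findtext("guid", default="").strip()
        items.append({
            "id": guid or link,  # fall back to the link when no guid is given
            "title": item.findtext("title", default="").strip(),
            "link": link,
            "content": item.findtext("description", default=""),
        })
    return items

SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First post</title>
      <link>https://example.com/1</link>
      <guid>post-1</guid>
      <description>Hello &lt;b&gt;world&lt;/b&gt;</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/2</link>
    </item>
  </channel>
</rss>"""

for entry in parse_rss_items(SAMPLE_FEED):
    print(entry["id"], "-", entry["title"])
```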

Database

  • JSON - I’m quite a big fan of JSON for the purpose of being human readable and easy to work with. It’s also a great way to store a tonne of arbitrary information in a relatively well structured way. Unlike a real database, for example, there is no need to predict data types or structures.
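
A minimal sketch of how a JSON store could cover both read/unread tracking and merging newly fetched items follows. The layout (feed URL, then item id, then a small record) and every function name are assumptions for this example, not a fixed design. Fetched items are assumed to be dictionaries with `id`, `title` and `link` keys.

```python
import json

def merge_items(store, feed_url, fetched_items):
    """Add unseen items as unread; leave existing items and their flags alone."""
    feed = store.setdefault(feed_url, {})
    added = 0
    for item in fetched_items:
        if item["id"] not in feed:
            feed[item["id"]] = {
                "title": item["title"],
                "link": item["link"],
                "read": False,
            }
            added += 1
    return added

def mark_read(store, feed_url, item_id):
    store[feed_url][item_id]["read"] = True

def save_store(store, path):
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(store, handle, indent=2, sort_keys=True)

def load_store(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

store = {}
url = "https://example.com/feed.xml"
first = [{"id": "post-1", "title": "First post", "link": "https://example.com/1"}]
second = first + [{"id": "post-2", "title": "Second post", "link": "https://example.com/2"}]

merge_items(store, url, first)
mark_read(store, url, "post-1")
merge_items(store, url, second)  # post-1 stays read, post-2 arrives unread
print(json.dumps(store, indent=2))
```

Keying items by id means a repeated fetch is idempotent: merging the same feed twice adds nothing and never resets a read flag.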

Archiving

  • Compression library - For this I am thinking of using lz4 - the compression ratios are nothing to scoff at. It does tend to work better with larger amounts of data; anything less than about 200 bytes doesn’t seem to be worth the overhead.
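
The size threshold could be handled with a one-byte tag in front of each stored blob, as in this minimal sketch. Note the loud assumption: it uses `zlib` from Python's standard library purely as a stand-in, because lz4 is not part of the standard library. The 200-byte cut-off comes from the rough figure above, and the tag values and function names are made up for this example.

```python
import zlib

MIN_COMPRESS_SIZE = 200  # bytes; below this the overhead is not worth it
RAW = b"\x00"            # tag: payload stored as-is
COMPRESSED = b"\x01"     # tag: payload compressed (zlib here, standing in for lz4)

def pack(data: bytes) -> bytes:
    """Return data prefixed with a tag, compressed only when that helps."""
    if len(data) >= MIN_COMPRESS_SIZE:
        squeezed = zlib.compress(data)
        if len(squeezed) < len(data):
            return COMPRESSED + squeezed
    return RAW + data

def unpack(blob: bytes) -> bytes:
    """Reverse pack(), using the tag byte to pick the right path."""
    tag, body = blob[:1], blob[1:]
    if tag == COMPRESSED:
        return zlib.decompress(body)
    if tag == RAW:
        return body
    raise ValueError("unknown storage tag")

short = b"tiny item"
long_item = b"feed content " * 200

print(len(pack(short)), len(short))          # stored raw, one byte larger
print(len(pack(long_item)), len(long_item))  # stored compressed, much smaller
```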

User Interface

  • UI library - I quite like the look and simplicity of microui: it uses fixed-size memory, is written in ANSI C, offers basic UI elements and can be plugged into pretty much any drawing back-end.

Media

The simplest idea will be to just export content with a given extension and/or mime type to an external program. This is ultimately the most foolproof way of handling media.
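
A minimal sketch of that hand-off: pick an external command from the MIME type (or guess the type from the extension) and build the argument list. The handler table is purely an example configuration. `mpv` and `xdg-open` are just plausible choices on a Linux desktop, and the function name is hypothetical. The sketch only builds the command and leaves launching it, for instance with `subprocess.Popen`, to the caller.

```python
import mimetypes

# Example configuration only: MIME type prefix -> external command.
HANDLERS = {
    "audio/": ["mpv", "--no-video"],
    "video/": ["mpv"],
    "text/html": ["xdg-open"],
}
FALLBACK = ["xdg-open"]

def external_command(target, mime_type=None):
    """Return the argument list that would open target in an external program."""
    if mime_type is None:
        mime_type, _ = mimetypes.guess_type(target)  # guess from the extension
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    mime_type = (mime_type or "").split(";")[0].strip().lower()
    best = None
    for prefix, command in HANDLERS.items():
        # Prefer the longest matching prefix, so specific entries win.
        if mime_type.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, command)
    command = best[1] if best else FALLBACK
    return command + [target]

print(external_command("https://example.com/episode.mp3", "audio/mpeg"))
print(external_command("https://example.com/post", "text/html; charset=utf-8"))
```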

For specific types of media we can try to offer an in-app experience.

HTML

  • Custom rolled library - This is one thing I intend to write myself, as I really want the absolute simplest of markup for the articles.

Podcasts

  • Audio library - This would be one of the easiest to implement, but will greatly increase code complexity. Perhaps a lightweight library like those offered by dr_libs could be used.
  • Piped audio - One way to implement streaming and audio position estimation could be to pipe the audio to a process. One issue to overcome would be in ensuring that there is enough data to keep the process busy, whilst not overfilling the buffer. It’s also not exactly clear how audio control would be performed.
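
One possible shape for the piped approach, as a minimal sketch: audio is written to the player's standard input in small chunks. A blocking write stalls while the OS pipe buffer is full, which keeps the reader from running far ahead of the player. The position is estimated from the bytes sent, assuming a constant bitrate, so it overestimates by whatever is still buffered. The function name is hypothetical, and the 'player' in the demo is a stand-in child process that simply discards its input. Pause and seek control are not addressed.

```python
import io
import subprocess
import sys

CHUNK_SIZE = 4096  # small chunks keep the position estimate fine-grained

def pipe_audio(source, player_cmd, bytes_per_second):
    """Feed source to a player process; return the estimated seconds sent."""
    sent = 0
    player = subprocess.Popen(player_cmd, stdin=subprocess.PIPE)
    try:
        while True:
            chunk = source.read(CHUNK_SIZE)
            if not chunk:
                break
            # Blocks while the pipe is full, providing simple back-pressure.
            player.stdin.write(chunk)
            sent += len(chunk)
    except BrokenPipeError:
        pass  # the player exited early; report what was sent so far
    finally:
        try:
            player.stdin.close()
        except BrokenPipeError:
            pass
        player.wait()
    return sent / bytes_per_second

# Stand-in for a real player: a child process that reads and discards stdin.
SINK = [sys.executable, "-c", "import sys; sys.stdin.buffer.read()"]

fake_audio = io.BytesIO(bytes(16000 * 3))  # 3 seconds at 16000 bytes/second
print(pipe_audio(fake_audio, SINK, bytes_per_second=16000))  # 3.0
```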

Future

Going forwards I will need to see if my issues with Akregator are truly unresolvable. Building an RSS feed reader from scratch, whilst something I can achieve, is quite an undertaking, and not something I currently have time for.