Coffee Space


Move Fast, Break Everything

This article is just a vent about how modern software’s mantra of “move fast and break stuff” has negatively affected me in just the last few weeks.

Ubuntu

Ubuntu had been a great, stable desktop operating system for years. I originally picked the OS for its stability, as I wanted a Linux desktop that simply didn’t get in the way. If I wanted something technically better but a pain in the arse, I would go with Arch. But regularly fixing my OS is simply not good enough when I need to get work done.

Ubuntu was all good and well - and then along came snap. I believe the reason for introducing snap was to simplify the build process for Ubuntu-based systems: a package can pick the dependencies it requires and deploy to all systems, without worrying about package conflicts. There are some problems though:

  1. Snap upgrades and restarts your programs ‘randomly’ - If you want to run snap packages, you don’t really have a choice in this. Docker has something similar. Sure, this means you get up-to-date packages, but the cost is that whatever you happen to be working on can be restarted at random.
  2. Snap program restarts don’t happen on reboot - Even if you switch off or reboot your system, snap will not refresh itself. Windows updates are annoying for sure, but they at least have the courtesy of being performed on restart. Snap just refreshes whenever snap likes, meaning you cannot guarantee that your OS will stay stable for more than a few days at a time.
  3. Snap packages are massive - Avoiding package conflicts means that, in theory, each of your packages can request its own version of each library. For a single program this is not much of an issue, but across many programs it gets really bad. You end up using more RAM and more disk.
  4. Snap security hinders programs - One thing snap does for security is limit the files and folders that a given program can access, and this includes hardware. Whilst this sounds great, you suddenly find that your browser can no longer access the video camera for an important video call, or that you can’t load a file from disk for an important submission.

Anyway, for these reasons and more, I am quite annoyed with the direction Ubuntu is going in. I really want to switch away from this OS now.
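To put rough numbers on the “snap packages are massive” point, here is a toy Python model comparing one shared copy of each library against every application bundling its own. The application names, library names and sizes are all hypothetical, and real snaps soften this by sharing base snaps (such as core22), so treat the bundled figure as the worst case that point describes.

```python
# Toy model (illustrative only): disk used when applications share system
# libraries versus when every package bundles its own copy.
# All package names and sizes below are hypothetical.

def shared_size(apps: dict[str, dict[str, int]]) -> int:
    """Each distinct library is stored once, as with shared system packages."""
    libs: dict[str, int] = {}
    for deps in apps.values():
        libs.update(deps)  # same library name -> stored only once
    return sum(libs.values())


def bundled_size(apps: dict[str, dict[str, int]]) -> int:
    """Every application ships its own copy of every library it depends on."""
    return sum(sum(deps.values()) for deps in apps.values())


if __name__ == "__main__":
    # Hypothetical sizes in MB.
    apps = {
        "browser": {"gtk": 40, "libc": 10, "ffmpeg": 30},
        "editor": {"gtk": 40, "libc": 10},
        "chat": {"gtk": 40, "libc": 10, "ffmpeg": 30},
    }
    print("shared: ", shared_size(apps), "MB")
    print("bundled:", bundled_size(apps), "MB")
```

With these made-up numbers the shared layout needs 80 MB and the bundled one 210 MB, and the gap grows with every extra application that pulls in the same libraries.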

Nvidia

There is no getting away from it: Nvidia hardware is technically pretty good, and these days it offers a significant advantage for certain types of computation. For me this is neural networks, where deep learning experiments are reduced from days to hours.

Before travelling, Ubuntu forced me to update my machine, as it had become too unstable to continue normal operation. Fine. After one apt-get update; apt-get upgrade, the system completely refused to bring up a GUI, and the kernel only came up some of the time. Normally this might not have been much of an issue, but I was now in a foreign location with no WiFi or Ethernet to fix it.

I ended up having to bridge a connection via another device that could connect to the internet - but this still didn’t help. In the end I had to disable Nvidia entirely, so that the Xorg server could start on Intel graphics. I then completely purged and reinstalled the Nvidia graphics drivers, at which point it magically ‘worked’ (with some issues of course, this is still Nvidia).

We were really considering buying into the Nvidia Jetson (or better) platform for embedded deep learning. This, along with a few other experiences, has made us seriously reconsider. The idea of developing for both ARM and Nvidia, via Ubuntu, wakes me up in cold sweats at night.

I’m also in another predicament. I desperately need a new laptop - but the idea of using an Nvidia machine, potentially paired with a backdoored Intel CPU, disgusts me to the point where anything may be better.

ROS2

As part of an autonomous robotics research project, we use ROS2. So far so good, except that it has caused headache after headache.

The good points of using ROS2 are code reuse, standardized communication between modules (messages), and a lot of the groundwork having already been done for you. Perhaps this is where the good points end.

There doesn’t appear to be a standardized practice for deployment, so many people appear to build on the target device. This is all good and well, but come and talk to me when you’ve got to recompile Tensorflow on an old ARM CPU, 10 minutes before your code is supposed to be ready to rock and roll.

Subscribers and publishers are a bit of a mess in general. They are far better than in ROS 1 for sure, but the messages are generally a super pain in the arse to deal with. You can’t guarantee that something you publish is picked up, or that something you subscribe to comes from a specific place. There is some quality of service (QoS) support, but it still appears to need more work.
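The “no guarantee it is picked up” problem is easiest to see with durability. Below is a toy, in-process Python topic loosely modelled on ROS2’s VOLATILE and TRANSIENT_LOCAL durability settings with a KEEP_LAST history depth. It is not the rclpy API: the ToyTopic class and its parameters are hypothetical, and real DDS behaviour also depends on reliability settings and on publisher and subscriber requesting compatible QoS.

```python
from collections import deque
from typing import Callable

Callback = Callable[[str], None]


class ToyTopic:
    """Hypothetical in-process topic (NOT rclpy), loosely modelled on ROS2 QoS.

    durability="volatile": messages published before a subscriber joins are lost.
    durability="transient_local": the publisher keeps the last `depth` messages
    and replays them to subscribers that join late.
    """

    def __init__(self, durability: str = "volatile", depth: int = 10) -> None:
        if durability not in ("volatile", "transient_local"):
            raise ValueError(f"unknown durability: {durability}")
        self.durability = durability
        self.history: deque = deque(maxlen=depth)  # KEEP_LAST-style buffer
        self.subscribers: list[Callback] = []

    def publish(self, msg: str) -> None:
        if self.durability == "transient_local":
            self.history.append(msg)
        # With no subscribers yet, a volatile message simply disappears here.
        for callback in self.subscribers:
            callback(msg)

    def subscribe(self, callback: Callback) -> None:
        self.subscribers.append(callback)
        if self.durability == "transient_local":
            for msg in self.history:  # replay what a late joiner missed
                callback(msg)


if __name__ == "__main__":
    for durability in ("volatile", "transient_local"):
        topic = ToyTopic(durability, depth=2)
        for msg in ("a", "b", "c"):
            topic.publish(msg)  # nobody is listening yet
        received: list[str] = []
        topic.subscribe(received.append)
        print(durability, received)
```

Publishing "a", "b" and "c" before anyone subscribes leaves the volatile subscriber with nothing, whereas the transient-local one with depth 2 receives only "b" and "c". Even the "durable" option silently drops anything older than its history depth.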

Then there is the buggy core code. We spent days debugging executors that enter a spin() loop and never return. How on earth anybody else used these without running into issues is beyond me. I thought that perhaps “it is me, not ROS2”. After speaking to others, it turned out they had run into exactly the same issue - and I believe one of them even created a merge request to fix it, which was not accepted.

Bear in mind, scheduling actions is not that fucking hard. I wrote my own scheduler in a few days, and it works flawlessly - and is far more advanced in motion control (but lacks QoS and callbacks).
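The author’s own scheduler isn’t shown here, so the following is only a hypothetical sketch of the general idea: a priority queue of timed actions whose run_until() always returns once nothing else is due before the deadline, rather than blocking inside a spin() loop. The ActionScheduler name, its methods and the simulated clock are assumptions for illustration; a real robot would drive it from a monotonic clock such as time.monotonic(), and it deliberately has no QoS or callback groups.

```python
import heapq
import itertools
from typing import Callable

Action = Callable[[], None]


class ActionScheduler:
    """Hypothetical minimal scheduler: runs timed actions in deadline order.

    run_until() processes only the actions due by the given deadline and then
    returns, so the caller stays in control instead of blocking in spin().
    """

    def __init__(self) -> None:
        self.now = 0.0  # simulated clock, in seconds
        self._queue: list[tuple[float, int, Action]] = []
        # Tie-breaker so actions due at the same time run in insertion order
        # (and heapq never has to compare two callables).
        self._counter = itertools.count()

    def schedule(self, delay: float, action: Action) -> None:
        """Queue `action` to run `delay` seconds after the current clock time."""
        heapq.heappush(self._queue, (self.now + delay, next(self._counter), action))

    def run_until(self, deadline: float) -> int:
        """Run every action due at or before `deadline`; return how many ran."""
        ran = 0
        while self._queue and self._queue[0][0] <= deadline:
            when, _, action = heapq.heappop(self._queue)
            self.now = when
            action()
            ran += 1
        self.now = max(self.now, deadline)
        return ran


if __name__ == "__main__":
    log: list[str] = []
    scheduler = ActionScheduler()
    scheduler.schedule(0.2, lambda: log.append("stop"))
    scheduler.schedule(0.1, lambda: log.append("turn"))
    scheduler.schedule(0.1, lambda: log.append("beep"))
    print(scheduler.run_until(0.15), log)  # 2 ['turn', 'beep']
    print(scheduler.run_until(1.0), log)   # 1 ['turn', 'beep', 'stop']
```

Keeping the clock explicit is what makes this easy to trust: the same loop can be driven by real time in a control loop, or stepped deterministically in a unit test.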

Tensorflow

As I mentioned, we use Tensorflow for a vision pipeline. I was tasked with getting an old Tensorflow (v1) project to compile. Holy shit. I would go as far as to say it is near un-buildable. You know your process is bad when people write tools specifically to help people build your shit from source. This should set off alarm bells for all involved.

Tensorflow is far too large for what it offers us. Compare it to a project like Darknet, which also uses deep learning (in this case YOLO CNNs) for object detection: Darknet is far lighter and easier to compile. We are talking about a minute vs hours.

Anyway, getting Tensorflow to compile was a super pain in the arse, and I couldn’t bring myself to figure out how to get that shite to perform training on the GPU.

Note: In the future I would really like to see Darknet use OpenGL for acceleration, as it would allow GPU acceleration on embedded devices, which would be pretty cool.

Firefox

I’ve quite enjoyed using the Firefox browser for years now (although this is starting to change). With NoScript and uBlock Origin it becomes quite usable on the modern web 1.

Recently I had to upgrade my OS in order to install Ubuntu updates and Nvidia drivers for CUDA. Ubuntu silently decided to install the snap version of Firefox. As if that wasn’t annoying enough already, I then got a notification that I should close Firefox within X days to allow snap to update it.

I rebooted my entire machine several times, but apparently a complete reboot is not enough for snap to perform an upgrade. It then decided that the best time was not when I rebooted my machine, but instead randomly whilst I was travelling in another country.

This might not have been so bad, except that it completely forgot my tabs and browsing history, as snap forces these into a different location for ‘security’. I lost years of tabs, with no way to recover them 2.
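For anyone in the same position: Firefox’s session files are not plain JSON. sessionstore.jsonlz4 (and the copies in sessionstore-backups/, such as recovery.jsonlz4) use Mozilla’s “mozlz4” format: an 8-byte mozLz40\0 header, a 4-byte little-endian decompressed size, then a raw LZ4 block. For the snap build, the profile typically lives under ~/snap/firefox/common/.mozilla/firefox/ rather than ~/.mozilla/firefox/. The sketch below decodes such a file with a small pure-Python LZ4 block decoder and prints the last history entry of each open tab. The windows/tabs/entries layout it walks is an assumption about current Firefox versions, so treat it as a starting point rather than a recovery tool.

```python
import json
import struct
import sys

MOZLZ4_MAGIC = b"mozLz40\0"


def _read_length(src: bytes, pos: int, length: int) -> tuple[int, int]:
    """Extend a 4-bit length of 15 with extra bytes until one is not 255."""
    if length == 15:
        while True:
            b = src[pos]
            pos += 1
            length += b
            if b != 255:
                break
    return length, pos


def lz4_block_decompress(src: bytes, expected_size: int) -> bytes:
    """Decode a raw LZ4 block (no frame header) following the LZ4 block format."""
    out = bytearray()
    pos = 0
    while pos < len(src):
        token = src[pos]
        pos += 1
        # High nibble: number of literal bytes that follow.
        lit_len, pos = _read_length(src, pos, token >> 4)
        out += src[pos:pos + lit_len]
        pos += lit_len
        if pos >= len(src):  # the last sequence carries literals only
            break
        # Little-endian 16-bit offset back into the already decoded output.
        offset = src[pos] | (src[pos + 1] << 8)
        pos += 2
        if offset == 0 or offset > len(out):
            raise ValueError("corrupt LZ4 block: bad match offset")
        # Low nibble: match length minus the minimum match of 4.
        match_len, pos = _read_length(src, pos, token & 0x0F)
        match_len += 4
        # Copy byte by byte, because a match may overlap the bytes being written.
        start = len(out) - offset
        for i in range(match_len):
            out.append(out[start + i])
    if len(out) != expected_size:
        raise ValueError(f"expected {expected_size} bytes, got {len(out)}")
    return bytes(out)


def read_mozlz4(path: str) -> dict:
    """Load a Firefox .jsonlz4 file (e.g. sessionstore.jsonlz4) as JSON."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(MOZLZ4_MAGIC):
        raise ValueError("not a mozlz4 file")
    (size,) = struct.unpack_from("<I", data, len(MOZLZ4_MAGIC))
    return json.loads(lz4_block_decompress(data[len(MOZLZ4_MAGIC) + 4:], size))


if __name__ == "__main__" and len(sys.argv) > 1:
    session = read_mozlz4(sys.argv[1])
    # Assumed layout: windows -> tabs -> entries (each entry has a "url").
    for window in session.get("windows", []):
        for tab in window.get("tabs", []):
            entries = tab.get("entries", [])
            if entries:
                print(entries[-1].get("url"))
```

Saved as, say, a hypothetical read_session.py, it would be run as python3 read_session.py ~/snap/firefox/common/.mozilla/firefox/<profile>/sessionstore.jsonlz4. Implementing the decoder by hand avoids a third-party dependency; it is short because LZ4 blocks only contain literal runs and back-references.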

Learned Lessons

What I’ve learned specifically about the software:

  1. Ubuntu must go - Snap is a complete deal breaker for me. It is unacceptable for my Linux machine to decide when it potentially wants to break my system with an upgrade. How can I, for example, possibly run long-running experiments on an Ubuntu machine? How could I rely on it for building/deployment? With every single upgrade they manage to break something, and I need to figure out how to get it running again. Snap has single-handedly killed Ubuntu for me.
  2. Nvidia is shite and there is nothing we can do - There is a good reason why Linus Torvalds refuses to merge Nvidia drivers into the kernel - they are written like shit. Working is not the same as good. Even when running an identical driver on two slightly different machines, the experience varies. A friend and I had to install the Nvidia CUDA libraries and drivers, which left his machine unbootable and mine unable to suspend.
  3. ROS2 is not yet mature - And there may still not be a better alternative. Part of me believes that ROS2 would seriously benefit from going back to a simpler time, and really simplifying the idea of a node, subscriber and publisher. Whilst QoS, ActionServer/ActionClient, etc. are theoretically really cool, they are simply broken.
  4. Tensorflow is too large to be stable - The truth of it is, Tensorflow is massive and you as an individual are not nearly big enough for Google to care about as a stakeholder. They have tonnes of different internal stakeholders and it’s near impossible to keep all of them happy. They really do not care if they break your specific application.
  5. Firefox is on its way out - It seems like all the things that used to make Firefox good are now being thrown out of the window. In terms of alternatives, the SerenityOS browser is now Acid 3 compliant, showing that a small bespoke browser can go a long way towards rivaling modern browsers.

What I’ve learned about software development in general:

  1. Minimise points of failure - Generally, if you use fewer and smaller libraries, the chances of something failing tend to be much smaller.
  2. Old is gold - I don’t want software with tonnes of updates; I want software that updates occasionally and reliably. It’s the same for the libraries I use - I don’t want to rewrite the code that uses a library every time I want to pull in the latest changes.
  3. If you didn’t write it, assume it could be broken - This has sadly proven true for tonnes of software I’ve used. I assumed that these large software vendors would have their shit together, but the opposite is actually true.
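The first lesson, minimising points of failure, is really a small piece of probability. If something only works when every one of its n dependencies works, and (a big assumption) each one works independently with probability p, then the whole thing works with probability p^n. The 99% figure below is hypothetical, but it shows how quickly that product collapses.

```python
import math


def p_all_work(probabilities: list[float]) -> float:
    """Chance that every dependency works, assuming failures are independent."""
    return math.prod(probabilities)


if __name__ == "__main__":
    # Hypothetical: every dependency works 99% of the time.
    for n in (5, 50, 500):
        print(f"{n:>3} dependencies: {p_all_work([0.99] * n):.3f}")
```

With these assumptions it prints roughly 0.951, 0.605 and 0.007 for 5, 50 and 500 dependencies. Smaller, simpler libraries help from the other side, by pushing each individual p closer to 1.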

Lastly, fuck Ubuntu. In your race to become the most stable and usable desktop, you broke everything else. I’m done with you.


  1. I have tonnes of other stuff, but that’s an article for another day.↩︎

  2. Yes, I also tried manually copying over the JSON files that are supposed to store the sessions.↩︎