Coffee Space



Cannot Boot, ACPI Error


Problem

Using Ubuntu as a core day-to-day system is generally pretty relaxed. Every now and again some issue crops up, but these are usually easily solvable. Imagine my shock the other day when, after a seemingly harmless upgrade of my machine, I was unable to boot my computer, with initially no error text being displayed and seemingly no way to get into the grub menu.

Solution

I'm running (according to "About This Computer"):

ubuntu 16.04 LTS
Memory:    15.6 GiB
Processor: Intel® Core™ i7-6700HQ CPU @ 2.60GHz × 8
Graphics:  GeForce GTX 960M/PCIe/SSE2
OS type:   64-bit
Disk:      109.1 GB

This should be a relatively well supported setup by now; it is definitely within the realm of a machine that a large number of people will own. One issue I do have is that this laptop is a custom build and didn't come with an OS, so there is no out-of-the-box driver support, although it's basically an MSI steel series machine.

Booting

So I'm sitting there with a black screen, no error text, no hard drive activity and no keys work (including magic keys). Great. Let's get ourselves into grub and see what the issue is. Online suggests holding left shift on boot gets you into grub - cat's arse it does. I found the only way to get this working semi-reliably is to boot the computer, hit control+alt+shift when it looks like it's starting to boot, restart the machine, and then hold left shift on the way back up to get into the grub menu.

Now we go through the motions of trying different kernels with different modes (recovery, upstart, etc.) and different ages. No help. Okay, so at least we know it's unlikely to be a kernel issue, since reverting to an older kernel almost always works.

Graphics

My initial thought is that the graphics card is messing up again; NVidia run a complete shit show when it comes to supporting Linux. Usually a major bug will appear in their drivers and it takes several release versions for it to be fixed. Most of the time I seriously doubt any kind of testing is happening, despite a growing number of their customers now doing coin mining and artificial intelligence on their graphics cards, usually using Linux. It's very disappointing that so little effort is put into maintaining a decent level of operability.

Assuming there is some issue with the graphics card, we press e on the grub entry, add nomodeset to the kernel parameters on the linux line and try to boot.

Hmm, now we get the error:

ACPI : EC: Fail in evaluating the _REG object of EC device. Broken bios is suspected

Fiddling with several kernel parameters I also got an issue complaining about intel_sda and PCI interrupts...

At this point, I'm not really sure whether it's some issue with the NVidia drivers, and I know they are flaky, so I purge everything NVidia related from my system, with no success. At least then I know it's not causing these issues.

BIOS

So maybe somehow the BIOS got corrupted? Sounds unlikely, but I checked it out anyway. I go online to check for a BIOS update file, but there don't seem to be any for my specific system; again, this custom build issue. I really can't be bothered to go through this process and suspect that this issue stems from elsewhere.

ACPI

It seems as though ACPI is mainly used for power management, so I guess to get the computer started we can just ignore it completely? Much searching later, we have some additional kernel parameters:

  • acpi=off
  • acpi=noirq
  • acpi=strict
  • pci=noacpi
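When juggling options like these, it is easy to lose track of which ones a given boot actually used. The running kernel exposes its command line in /proc/cmdline, so a small check along these lines can confirm it. This is only a sketch: the function names `acpi_related_params` and `running_cmdline` are made up for illustration, and only the `acpi=` and `pci=` keys from the list above are picked out.

```python
from pathlib import Path

# Keys of the options listed above; everything else on the command line is ignored.
WATCHED_KEYS = ("acpi", "pci")


def acpi_related_params(cmdline: str) -> dict:
    """Return {key: value} for acpi=... and pci=... options in a kernel command line."""
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in WATCHED_KEYS:
            found[key] = value
    return found


def running_cmdline() -> str:
    """Read the parameters the currently running kernel was booted with (Linux only)."""
    return Path("/proc/cmdline").read_text()
```

On a booted system, `acpi_related_params(running_cmdline())` would return something like `{'acpi': 'off'}` if that option was typed into the grub editor for the current boot, and an empty dict if none of the options were set.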

I tried acpi=off and bingo, we boot! Let me just log in to my account and... Crash! Awesome, at least we got a login screen. From my original adventures with this system and getting everything working, I remembered that the embedded Intel graphics for whatever reason just failed to work correctly, causing random crashes. Rather than debug them when setting up, I wanted to just make use of the far superior graphics card lying inside.

Packages

At this point, I'm starting to wonder whether some packages were corrupted during the update for some unknown reason. I run debsums - nothing. I run the package repair tool via grub - nothing. I manually test important programs by getting them to display some output, and all start correctly. Hmm. It seems as though all of my packages are okay (I had an issue caused by corrupted packages in the past).

Graphics V2

Hmm okay, should just be a simple case of installing NVidia drivers and... Dammit! For some reason, after installing NVidia's provided drivers, the driver doesn't show up as claiming the hardware device in lshw. Reading lots of online material, I found you actually have to add NVidia's PPA and then install a version from there - it seems as if the Ubuntu package manager doesn't contain the complete story.
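For anyone repeating the install-and-purge dance, the "did the driver claim the card?" check can be scripted rather than eyeballed. The sketch below assumes the usual text layout of `lshw -class display`, where each adapter starts with a `*-display` line and a claimed device carries `driver=...` on its `configuration:` line. The function names are my own, and the parsing is deliberately simple rather than a complete lshw parser.

```python
import subprocess


def display_drivers(lshw_text: str) -> list:
    """Return one entry per display device: the driver name, or None if no driver claimed it."""
    drivers = []
    in_display = False
    for raw in lshw_text.splitlines():
        line = raw.strip()
        if line.startswith("*-"):
            # A new device section begins; remember whether it is a display adapter.
            in_display = line.startswith("*-display")
            if in_display:
                drivers.append(None)
        elif in_display and line.startswith("configuration:"):
            for item in line.split()[1:]:
                if item.startswith("driver="):
                    drivers[-1] = item[len("driver="):]
    return drivers


def query_lshw() -> str:
    """Run lshw for display adapters only; needs lshw installed, and root for full detail."""
    result = subprocess.run(
        ["lshw", "-class", "display"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `display_drivers(query_lshw())` after each install attempt gives a quick answer: a `None` entry means that adapter is still unclaimed.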

After installing and purging multiple times, the driver finally claims the hardware. This makes absolutely no sense and I wish anybody doing this good luck. Even worse is that last time, it was a different one that worked. I have absolutely no idea about the rhyme or reason to this.

Graphics V3

We finally log in - awesome! Suspend the laptop, bring it back - ah :/ Now we have white borders around all of the windows that look like graphics memory corruption. Looking online, this is a known issue that NVidia finally got around to sorting in the later versions (that was nice of them (sarcasm)). Updating to a newer version (that previously wouldn't work at all) now solved this issue.

Getting there, or so I thought!

ACPI V2

Now there's no battery management? Wait, now there's no power management at all? When we set acpi=off earlier, we disabled everything to get around the booting issue. Trying the other modes reveals that acpi=strict allows us to boot whilst providing basic power functionality such as battery information, proper low power suspend, etc.
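Parameters typed into the grub editor only last for one boot. I don't describe above how I made the setting stick, but on Ubuntu the usual place is the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub, followed by running update-grub. As a rough sketch of that edit (the helper name `add_default_param` is hypothetical, and it only handles the common double-quoted form of the line):

```python
import re


def add_default_param(grub_defaults: str, param: str) -> str:
    """Append `param` to GRUB_CMDLINE_LINUX_DEFAULT unless it is already present."""
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)

    def patch(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    # Only the DEFAULT line is rewritten; every other line is passed through untouched.
    return pattern.sub(patch, grub_defaults)
```

The function works on the file's text rather than the file itself, so the result can be inspected before anything is written. After saving the new contents to /etc/default/grub as root, `sudo update-grub` regenerates the boot menu so the option applies on every boot.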

Discussion

So to wrap up, this was all the result of doing an upgrade to my system. As a normal user, I probably would have left Linux over this or at the very least reformatted. I have some friends that use Linux regularly who say they just reinstall Linux entirely instead of upgrading! Apparently that is the only method they could find that avoids all of this crap. Well, I'm not so easily put off (fortunately), but I do have to admit that it was extremely tempting to re-flash my drive at one point. This is a really poor show by Linux in general.

I'm half tempted to go the way of my friends: just always assume that the OS is the variable, mount all data on its own drive and maintain a script to install all my favourite settings/programs. At least that way I may be able to avoid some disappointment when it all breaks!

Let me know of your issues below: