Note: this post is long overdue. I upgraded some months ago… but I promised myself I would blog about my self-hosting adventures, so here it is.
You may already know the story… TL;DR:
- I wanted to self-host my web services.
- I bought a Microserver (N54L).
- I installed Debian stable (Wheezy) on it: RAID1 (BIOS) + cryptsetup + LVM (/ and swap; /boot on another disk, unencrypted).
- I installed GNU MediaGoblin, and it works!
- When rebooting, the passphrase to unlock the encrypted disk (so that the system could then find the LVM volumes and mount the partitions) was not accepted. It was accepted, however, after I shut down, unplugged the power, plugged it back in, and turned the machine on.
After searching a bit for information about my problem and not finding anything helpful, I began to think that upgrading to Jessie might fix it (newer versions of the kernel and cryptsetup…). The Jessie freeze was also close, and I thought it would be a good idea to get MediaGoblin working on Jessie now, while I still hadn’t uploaded much content… And I wanted the adventure!
Whatever. I decided to upgrade to Jessie. This is the glory of “free software at home”: you only waste your time (and maybe not even that, because you learn something, at least what not to do).
Upgrading my system to Jessie, and making it boot!
I changed sources.list, updated, did a safe-upgrade, and then the upgrade. Then I rebooted… and the system didn’t boot.
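The sources.list change boils down to replacing the release codename on the “deb” and “deb-src” lines. A minimal Python sketch of that step follows; switch_release is a hypothetical helper, not a Debian tool, and it assumes the file names the release by codename (“wheezy”) rather than by suite (“stable”). The mirror URLs in the sample are only illustrative.

```python
import re

# Hypothetical helper (not the author's script): rewrite the release codename
# on "deb" and "deb-src" lines only. Assumes the file names the release by
# codename ("wheezy"), not by suite ("stable"); comment lines are left untouched.
SOURCE_LINE = re.compile(r"^\s*deb(-src)?\s")


def switch_release(text: str, old: str = "wheezy", new: str = "jessie") -> str:
    codename = re.compile(r"\b" + re.escape(old) + r"\b")
    out = []
    for line in text.splitlines(keepends=True):
        if SOURCE_LINE.match(line):
            # Word boundaries also catch "wheezy/updates" and "wheezy-updates".
            line = codename.sub(new, line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = (
        "deb http://ftp.debian.org/debian wheezy main\n"
        "deb http://security.debian.org/ wheezy/updates main\n"
        "# deb-src http://ftp.debian.org/debian wheezy main\n"
    )
    print(switch_release(sample), end="")
```

After writing the result back to /etc/apt/sources.list, the usual apt-get update and upgrade steps follow.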
What happened? I’m not sure; everything looked “ok” during the upgrade… But now my system was not even asking for the passphrase to unlock the encrypted disk. It was trying to access the LVM volume group as if it were on an unencrypted disk, and failing. The boot process left me at an “initramfs” console where I didn’t know what to do.
I asked for help from @luisgf, the system administrator of mipump.es (a public Pump.io server) and mijabber.es (a public XMPP server). We met over XMPP and, between my “thinking aloud” and his patient listening and advice, we solved the problem, as you will see:
I tried to boot my rescue system (a complete system installed on separate partitions of a different disk), and it booted. Then I tried to unlock the encrypted disk manually (cryptsetup luksOpen /dev/xxx), and it worked: I could list the volume group and the volumes, activate them, and mount the partitions. Yay! My (little) data was safe.
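The manual sequence described above is: open the LUKS container, scan for and activate the volume group, then mount. A small Python sketch (3.8+, for shlex.join) that lists those commands and, only if asked, runs them; the device, mapping, volume group and logical volume names are hypothetical placeholders, and real execution needs root.

```python
import shlex
import subprocess


def unlock_commands(device, mapping, vg, lv, mountpoint):
    """Commands, in order, to open a LUKS container that holds an LVM volume
    group and mount one of its logical volumes. All names are placeholders."""
    return [
        # Asks for the passphrase and exposes the plaintext as /dev/mapper/<mapping>.
        ["cryptsetup", "luksOpen", device, mapping],
        # Scan block devices (now including the opened container) for volume groups.
        ["vgscan"],
        # Activate every logical volume in the volume group.
        ["vgchange", "-ay", vg],
        # Mount one logical volume.
        ["mount", f"/dev/{vg}/{lv}", mountpoint],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names: substitute your own device and volume names.
    run(unlock_commands("/dev/sdX2", "sdX2_crypt", "vg0", "root", "/mnt"))
```

The dry run is the default so the sequence can be reviewed before anything touches the disks.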
I rebooted and tried to do the same in the initramfs console, but cryptsetup was not present in my initramfs.
Then I tried to boot the old Wheezy kernel: it didn’t ask for the passphrase to unlock the disk either, but in that initramfs console cryptsetup worked well. So, after manually unlocking the disk, activating the volumes and mounting the partitions, I could exit the console and the system booted #oleole!
So, how could I get the boot process to ask for the encryption passphrase?
Maybe reinstalling the kernel would be enough… I reinstalled the 3.16 kernel package, which (re)generated /boot/initrd.img-3.16.0-4-amd64, then restarted the system, and the problem was solved. It seems that, the first time, the initrd image was not generated correctly during the upgrade, and I didn’t notice.
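In hindsight, a quick look inside the regenerated image before rebooting would have caught this. Debian’s initramfs-tools ships lsinitramfs, which lists the files inside an initrd; the Python sketch below (hypothetical helpers, not part of any package) searches that listing for the cryptsetup binary. It assumes the binary’s path ends in sbin/cryptsetup, so adjust the check if your listing differs.

```python
import subprocess


def initrd_has_cryptsetup(listing: str) -> bool:
    """True if a file listing of an initramfs contains the cryptsetup binary.
    Assumes the binary's path ends in "sbin/cryptsetup"."""
    return any(line.strip().endswith("sbin/cryptsetup") for line in listing.splitlines())


def check_initrd(path: str) -> bool:
    # lsinitramfs (from Debian's initramfs-tools) prints the image contents,
    # one path per line; check=True raises if the command fails.
    result = subprocess.run(["lsinitramfs", path], capture_output=True, text=True, check=True)
    return initrd_has_cryptsetup(result.stdout)
```

On the server, check_initrd("/boot/initrd.img-3.16.0-4-amd64") should return True for a usable image. Regenerating only the initrd, without reinstalling the kernel package, should also be possible with update-initramfs -u -k 3.16.0-4-amd64.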
Well, problem solved. My system was booting again! There were no other boot problems, and Jessie seemed to run perfectly. Thanks @luisgf for your help!
In addition, since then my passphrase has been accepted on every reboot, so it seems the original problem is gone too.
A note on systemd
After all the noise of the last few months, I was a bit afraid that some of the services running on my system would not start after the migration to systemd.
I had no special tweaks, just two ‘handmade’ init scripts (for MediaGoblin and for NoIP), but I didn’t write them myself (I just searched for systemd init scripts for those services), so if there was a problem there, I was not sure I could solve it. However, everything worked fine after the migration. Thanks, Debian hackers, for making this transition as smooth as possible!
My MediaGoblin was not working, though, and I was not sure why. Maybe I just needed to tune nginx or something after the upgrade… But I was not going to spend time finding out which part of the stack was the culprit, and my MediaGoblin sites were almost empty… So I decided to follow the documentation again and reinstall (maybe an update would have been enough, who knows). I reused the Debian user(s), the PostgreSQL users and databases, and the .ini and nginx configuration files, so it was quick, and it worked.
I have updated my Jessie system several times since then (kernel updates, OpenSSL, PostgreSQL, and other security updates and RC bug fixes, with the corresponding reboots or service restarts), and I haven’t experienced the cryptsetup problem again. The system is working perfectly. I’m very happy.
Using dropbear to remotely provide the cryptsetup password
The last thing I did on my home server was set up dropbear, so I can provide the encryption passphrase remotely and therefore reboot my system remotely. I followed this guide and it worked like a charm.
Some small annoyances and TODO list
- I get some warnings at boot. I don’t think they are important, but anyway, I’m posting them here and will try to figure out what they mean:
[ 0.203617] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
[ 0.214828] ACPI: Dynamic OEM Table Load:
[ 0.214841] ACPI: OEMN 0xFFFF880074642000 000624 (v01 AMD NAHP 00000001 INTL 20051117)
[ 0.226879] \_SB_:_OSC evaluation returned wrong type
[ 0.226883] _OSC request data:1 1f
[ 0.227055] ACPI: Interpreter enabled
[ 0.227062] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20140424/hwxface-580)
[ 0.227067] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20140424/hwxface-580)
[ 0.227070] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S3_] (20140424/hwxface-580)
[ 0.227083] ACPI: (supports S0 S4 S5)
[ 0.227085] ACPI: Using IOAPIC for interrupt routing
[ 0.227298] HEST: Table parsing has been initialized.
[ 0.227301] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
And these:
[ 1.635130] ERST: Failed to get Error Log Address Range.
[ 1.645802] [Firmware Warn]: GHES: Poll interval is 0 for generic hardware error source: 1, disabled.
[ 1.645894] GHES: APEI firmware first mode is enabled by WHEA _OSC.
And these, about the 250 GB disk (it came with the server; it’s not part of the RAID):
[ 3.320913] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 3.321551] ata6.00: failed to enable AA (error_mask=0x1)
[ 3.321593] ata6.00: ATA-8: VB0250EAVER, HPG9, max UDMA/100
[ 3.321595] ata6.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 3.322453] ata6.00: failed to enable AA (error_mask=0x1)
[ 3.322502] ata6.00: configured for UDMA/100
- It would be nice to learn a bit about benchmarking tools and test my system with and without the non-free Radeon VGA driver.
- I need to set up an automated backup system…
A note about RAID
Some people commented on the benefits of software RAID (mainly, not depending on a particular proprietary firmware: what happens if my motherboard dies and I cannot find a compatible replacement?).
Currently I have a RAID 1 (mirror) using the motherboard’s RAID capabilities.
The problem is that, frankly, I am not sure how to migrate the current setup (BIOS RAID + cryptsetup + LVM + partitions) to the new one (software RAID + cryptsetup + LVM + partitions, or would another order be better?).
- Would it be enough to make a Clonezilla backup of each partition, wipe my current setup, boot the Debian installer, create the new setup (software RAID, cryptsetup, LVM and partitions), then stop the installation, boot Clonezilla and restore the partition images?
- Or, even better, can I (safely) remove the RAID in the BIOS, boot my system (let’s say, from the first disk), and create the software RAID with the second disk that appears once the BIOS RAID is gone? (This sounds a bit like science fiction, but who knows!)
- Does it matter “when”, or in which “layer”, I set up the software RAID?
As you can see, there are lots of things to read/think about/try… I hope I can find time for my home server more often!
You can comment on this pump.io thread.