The Brief Death and Resurrection of the Sollarsphere

For those who are unaware, the Sollarsphere is my dumb name for my Pleroma instance at social.ctsollars.com. Recently it suffered some extended downtime. Luckily, I was able to restore it. What follows is a brief post-mortem of the incident. If you like dry, technical reads, please enjoy.

Death by Updates

Up until the incident, social.ctsollars.com ran on a Raspberry Pi 4. The Pi itself sits in a cute little powered case designed to hold an SSD, which serves as the boot device and the root filesystem for the server.
Everything was humming along just fine until one day I decided to perform some long-postponed system updates. The updates themselves seemed to go without issue until I rebooted. It quickly became clear that something was wrong: after several minutes, none of the hosted services were reachable from a web browser and SSH was unresponsive. When that happens, it’s time to roll up the sleeves and hook the machine up to a monitor and a keyboard.

Booting the OS as installed failed. My suspicion was that the system updates had somehow interfered with the boot configuration, causing the Pi to no longer boot from the SSD, which was connected by USB. Normally the way to resolve this is to insert a MicroSD card with a bootable OS installed on it and power up the device; by default, the Pi will prefer to boot from a MicroSD card if one is available. Thus Plan A began to take shape: boot the Pi with an alternative OS and reestablish the boot configuration so that the Pi would boot from the SSD again. It quickly became clear, unfortunately, that in the months since I had last used it, my MicroSD card (which also served as a backup of my software stack and configuration) had become corrupt and was no longer usable in any way. With SSD boot no longer working, and my one correctly configured MicroSD card corrupt, Plan A took a detour: I now had to build a new bootable MicroSD.
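For reference, the boot-configuration side of Plan A would likely have been simple: on a Pi 4 the boot order lives in the bootloader EEPROM rather than on the card itself, and (assuming the rpi-eeprom package is installed) it can be inspected and edited with rpi-eeprom-config. If I’m remembering the encoding correctly, the hex digits of BOOT_ORDER are read right to left, with 1 meaning SD card and 4 meaning USB mass storage:
# rpi-eeprom-config | grep BOOT_ORDER
BOOT_ORDER=0xf41
# rpi-eeprom-config --edit
So the fix would have amounted to little more than confirming that a 4 appears in that value and that the boot files on the SSD were intact.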

Soon I had my Pi booting into good ol’ vanilla Raspberry Pi OS. Pretty quickly it became apparent that something was terribly wrong: neither my keyboard nor my mouse seemed to do anything. After swapping out several keyboards and mice (and a little Googling…) I reluctantly came to the conclusion that the USB ports on the Pi had died. This happens sometimes, apparently. Thus died Plan A.
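(For anyone facing a similar diagnosis, a quick double-check is to list the USB devices and then watch the kernel log while plugging something in; if nothing at all appears, the problem is below the level of drivers and configuration:)
# lsusb
# dmesg -w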

Migration to Another Home

With the USB ports being dead on the Pi, Plan B consisted of the next simplest solution: attempt to get one of my Raspberry Pi 3s to boot to the SSD. The tradeoff here would be ease of migration but with a loss of performance. For an application like Pleroma, which is touted as being very lightweight, the performance loss should not be too big of a problem. A little research, however, revealed that I don’t have a Pi 3 in my possession which supports USB boot. Thus died Plan B.
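(The check that killed Plan B, for the curious: on a Pi 3 you can read the OTP memory from a booted system to see whether the USB boot bit has ever been set. Per the Raspberry Pi documentation, if memory serves, a value of 3020000a on row 17 means USB mass storage boot is enabled:)
# vcgencmd otp_dump | grep 17:
17:1020000a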

Plan C was the last possible hope of recovering the information stored on my SSD: attempt to clone the setup from the SSD onto another server. This plan actually breaks down into several distinct parts:
  1. Copy the PostgreSql data and configuration
  2. Copy the Pleroma files and configuration
  3. Copy the Nginx configuration
Believe it or not, Plan C actually worked, but just barely. Here’s how it went down.

Moving the Database

The plan for moving the database consisted of 2 basic steps:
  1. Install PostgreSql on the target machine
  2. Replace the data files at /var/lib/postgres with the files from the Pi’s SSD
Step one went well. Step two gave me problems. Since the Raspberry Pi uses an ARM architecture, Postgres there was built with different compile-time options than the build on the target machine, which means the on-disk data files are not directly compatible. For my plan to work, I would have had to recompile Postgres with all of the options that the Raspberry Pi’s build used, and even then it probably wouldn’t have worked.
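One way to see the mismatch concretely (a rough sketch; the exact data directory path, and whether pg_controldata is on the PATH, vary by distribution) is to compare what pg_controldata reports on each machine for its own cluster. If values like the catalog version, maximum data alignment, or block size differ, the data files can’t simply be dropped in:
# pg_controldata /var/lib/postgres/data | grep -E 'version|alignment|block size'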

The workaround was to execute steps 1.1 and 1.2 on an extra Raspberry Pi, generate a snapshot, and load that snapshot onto the target machine. Thus:
  1. Install Postgres on a substitute Raspberry Pi
  2. Replace the data files at /var/lib/postgres with the files from the Pi’s SSD
  3. Generate a snapshot file of the Pleroma database
  4. Install Postgres on the target machine
  5. Load the generated snapshot into a new database on the target machine
Steps 1.1 – 1.3 actually went pretty smoothly. Since the architectures and OS packages were nearly identical, all of the files were in the same place on both machines: /var/lib/postgres. One complexity with Postgres over MySql is the requirement to run the psql command as the postgres user. Aside from that, there was a slight discrepancy with the UIDs on the files, but the whole process can be summarized with the following commands:
# service postgresql stop
# rsync -a /mnt/external_ssd/var/lib/postgres/ /var/lib/postgres
# chown -R postgres /var/lib/postgres
# service postgresql start
# su postgres -s $SHELL -c "pg_dump pleroma" > /mnt/external_ssd/pleroma.psql
Steps 1.4 and 1.5 are pretty straightforward, but differ slightly from the commands a MySql administrator might be used to seeing:
# apt-get install postgresql
# su postgres -s $SHELL -c psql
postgres=# CREATE DATABASE pleroma;
postgres=# \q
# su postgres -s $SHELL -c "psql -d pleroma" < /mnt/external_ssd/pleroma.psql
To be honest, I struggled pretty heavily to arrive at the above list of commands. In reality I had performed steps 1 and 2 at the same time, which confused things. One result was that I kept running my Postgres commands as the pleroma user, which produced many frustrating errors. As soon as I imported the snapshot as the postgres user, though, it worked like a charm.
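In hindsight, a one-liner sanity check would have saved some of that frustration: listing the tables in the new database as the postgres user to confirm the snapshot actually landed.
# su postgres -s $SHELL -c "psql -d pleroma -c '\dt'"
If Pleroma’s tables show up in the output, the import worked.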

Moving Pleroma

Moving Pleroma was pretty simple, but consisted of a lot of fiddly little steps. Pleroma’s file base is taken directly from its git repository. It is written in Elixir, runs on the Erlang VM, and uses Elixir’s build tool, Mix, to manage dependencies. It listens on port 4000 by default and runs as its own user, which has its own home directory. Additionally, it is designed to be started and run as a systemd service, for which the Pleroma team has provided configuration files. So the transfer process basically breaks down into the following steps, some of which were taken directly from the official installation instructions:
  1. Install Pleroma dependencies
  2. Copy base files / git repository
  3. Refresh system specific Erlang dependencies
  4. Create a new pleroma user
  5. Copy pleroma configs from /var/lib/pleroma
  6. Add a Postgres User for Pleroma
  7. Set up SystemD service for Pleroma
Step 2.1 was basically taken directly from the official docs:
# apt install curl unzip libncurses5 postgresql postgresql-contrib nginx certbot libmagic-dev
# apt install imagemagick ffmpeg libimage-exiftool-perl
Step 2.2 was pretty obvious, but step 2.3 was something I only learned about toward the end of the process: upon starting Pleroma I was greeted with a few errors related to dependency versions. Luckily it was pretty easy to fix. Steps 2.2 and 2.3 together look like this:
# mkdir /opt/pleroma
# rsync -a /mnt/external_ssd/opt/pleroma/ /opt/pleroma/
# cd /opt/pleroma
# rm -Rf deps
# mix deps.get
Next we add our Pleroma user. Again, copying directly from the docs:
# adduser --system --shell /bin/false --home /opt/pleroma pleroma
Then we’ll copy our pleroma lib files, whatever those are for. We’ll also make sure all of our permissions are correct for the user added during the previous step:
# cp -r /mnt/external_ssd/var/lib/pleroma /var/lib/
# chown -R pleroma /opt/pleroma
# chown -R pleroma /var/lib/pleroma
For step 2.6 we’ll just run the script generated by the original installation process (as shown in the official installation docs). This basically creates the Pleroma database user, sets its password, and gives it access to the correct database.
# su postgres -s $SHELL -lc "psql -f /opt/pleroma/config/setup_db.psql"
We’ll take the last step from the official docs as well:
# cp /opt/pleroma/installation/pleroma.service /etc/systemd/system/pleroma.service
# systemctl start pleroma
# systemctl enable pleroma
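Before moving on, it’s worth a quick check that the service actually came up. Something along these lines works (the port is Pleroma’s default mentioned above, and the API path is the standard Mastodon-compatible instance endpoint):
# systemctl status pleroma
# journalctl -u pleroma -e
# curl http://localhost:4000/api/v1/instance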
As boring as that was to read through, it was even more boring and time-consuming to figure out how it all fit together. It’s not over yet, though: at this point in the process I still had to move the Nginx configuration and do some work on the router.

Configuring Nginx

The only thing about setting up Nginx that isn’t completely obvious at first is that the SSL certificates and keys must be moved over as well. Since I was moving the certificates anyway, I decided I may as well move the whole Certbot configuration, so that my certificates continue to auto-renew. Thus, step 3 breaks down into four parts:
  1. Install Certbot
  2. Overwrite vanilla certbot /etc folder
  3. Install Nginx
  4. Add configuration from source server
This was actually the part that went quickly and as expected. No real roadblocks or lessons learned.
# apt-get install certbot
# rsync -a /mnt/external_ssd/etc/letsencrypt/ /etc/letsencrypt/
# apt-get install nginx
# cp /mnt/external_ssd/etc/nginx/sites-available/pleroma.conf /etc/nginx/sites-available/
# ln -s /etc/nginx/sites-available/pleroma.conf /etc/nginx/sites-enabled/
# nginx -t
# service nginx reload
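Since the whole point of copying /etc/letsencrypt was to keep auto-renewal working, a Certbot dry run is a cheap way to confirm that renewal will actually succeed on the new machine (assuming the copied renewal configs point at paths that exist here):
# certbot renew --dry-run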
At this point, everything was basically working properly. There were some minor issues with the startup of Pleroma that I neither remember nor want to get into, but all of the pieces were in place, and now the Sollarsphere has a new home on a much more robust server. Pleroma now has at its disposal 12 cores of CPU power and 32 GB of RAM, not to mention much more storage space. After some configuration on my router for port forwarding, social.ctsollars.com is back in operation. In review, here are the steps that were followed:
  1. Copy the PostgreSql data and configuration
    1. Install Postgres on a substitute Raspberry Pi
    2. Replace the data files at /var/lib/postgres with the files from the Pi’s SSD
    3. Generate a snapshot file of the Pleroma database
    4. Install Postgres on the target machine
    5. Load the generated snapshot into a new database on the target machine
  2. Copy the Pleroma files and configuration
    1. Install Pleroma dependencies
    2. Copy base files / git repository
    3. Refresh system specific Erlang dependencies
    4. Create a new pleroma user
    5. Copy pleroma configs from /var/lib/pleroma
    6. Add a Postgres User for Pleroma
    7. Set up SystemD service for Pleroma
  3. Copy the Nginx configuration
    1. Install Certbot
    2. Overwrite vanilla certbot /etc folder
    3. Install Nginx
    4. Add configuration from source server
No wonder I put the project off for so long.

Takeaways

There are two takeaways from this whole incident. First and foremost, automated backups should be set up for any system that needs to be available in the long term. Second, if minimizing downtime is an absolute priority, redundancy is a must. For a personal network like mine, redundancy seems like overkill, but I do plan to set up automated backups for the Sollarsphere, and that will probably turn into a future article.
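As a teaser, the core of that backup setup will probably be nothing more exotic than the same two tools used throughout this recovery, run from a nightly cron job. A rough sketch, with /srv/backups standing in for whatever destination I actually end up using:
# su postgres -s $SHELL -c "pg_dump pleroma" > /srv/backups/pleroma-$(date +%F).psql
# rsync -a /var/lib/pleroma /opt/pleroma/config /srv/backups/files/
Dumping through pg_dump rather than copying /var/lib/postgres wholesale also sidesteps the architecture problem that made this recovery harder than it needed to be.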
