Backups

Of course I hope I never need a backup, but just in case...

This is probably the most boring topic for everyone. Backups are meant never to be used, they are a bother to set up, and there are lots of things to think about. That last reason especially is why I'd better document how I set mine up.

The purpose of making a backup

I would like to protect myself against these cases:

  1. Hardware failure, e.g. a disk crash. I have not had that happen yet.
  2. Human error: when I delete something, or make an error editing something.

Currently I do not protect against a site failure, where all computers are no longer available.

Principle

The basic principle is that all valuable data should be stored on two disks, preferably on two computers.

Laptops

The most precious data is stored on a few laptops. They're all running Linux, so a simple daily rsync of /home to my home server suffices to make sure that the data is at least available on two disks. In the past I also had Windows computers, where automating this daily copy was more bothersome. Probably my lack of skills.
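A minimal sketch of such a daily copy could look like the script below. The server name `homeserver`, the target directory, the `DRY_RUN` switch and the `--run` argument are assumptions for illustration, not the actual setup. It also assumes key-based SSH login to the server, so that it can run unattended.

```shell
#!/bin/bash
# Sketch of a daily laptop backup; host name and target path are assumptions.
BACKUP_HOST="${BACKUP_HOST:-homeserver}"          # hypothetical server name
BACKUP_DIR="${BACKUP_DIR:-/data/backup/current}"  # assumed directory on the server

backup_home() {
    # -a preserves permissions and timestamps, -x stays on one file system,
    # --delete removes files on the server that no longer exist on the laptop.
    # The parent of the target directory must already exist on the server.
    local cmd=(rsync -ax --delete /home/ "${BACKUP_HOST}:${BACKUP_DIR}/${HOSTNAME}/home/")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"   # only print the command, copy nothing
    else
        "${cmd[@]}"
    fi
}

# Copy for real only when called with --run, so the file can also be sourced.
if [ "${1:-}" = "--run" ]; then
    backup_home
fi
```

Called from the laptop's crontab with `--run`, this would refresh the copy once a day. Setting `DRY_RUN=1` first shows the rsync command without copying anything.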

Home server

My home server has multiple disks, all formatted with btrfs. I have set up each disk so that a subvolume is always mounted for "normal" use, and the whole disk is mounted as well, so that I can easily make snapshots separately.

The relevant part of /etc/fstab looks as follows:

LABEL=SAMSSSD500 / btrfs subvol=rootfs,auto,noatime 0 1
LABEL=SAMSSSD500 /mnt/SAMSSSD500 btrfs auto,noatime 0 0

LABEL="HGST1000" /mnt/HGST1000 btrfs auto,noatime 0 0
LABEL="HGST1000" /data btrfs subvol=data,auto,noatime 0 0

And mounted it looks like this:

# ls /mnt/SAMSSSD500
rootfs  snaps
# ls /mnt/HGST1000
data  snaps

So there are two disks:

  1. labeled 'SAMSSSD500', with a subvolume 'rootfs', used as the Linux root file system.
  2. labeled 'HGST1000', with a subvolume 'data', under which there are directories for backups.

Both disks have a directory 'snaps' under the root.

Safeguarding the home server's root disk

This can again be done with a simple rsync script:

#!/bin/bash
SEMAPHORE_FILE="/tmp/backup_running.sema"
MAX_TIME=3600   # seconds after which a semaphore file counts as stale

echo "Start of backup"

# Check for stale semaphore file
if [ -f "${SEMAPHORE_FILE}" ]; then
    read -r SEM_TIME < "${SEMAPHORE_FILE}"
    RUNNING_TIME=$(( $(date "+%s") - SEM_TIME ))
    if [[ ${RUNNING_TIME} -gt ${MAX_TIME} ]] ; then
        echo "Removing stale semaphore file ${SEMAPHORE_FILE}."
        rm "${SEMAPHORE_FILE}"
    else
        echo "Other backup still running, ${RUNNING_TIME} seconds, leaving semaphore file ${SEMAPHORE_FILE} in place."
    fi
fi

# Make an rsync copy when there is no other rsync running
if [ ! -f "${SEMAPHORE_FILE}" ]; then
    # Store the start time so that a later run can detect a stale file
    date "+%s" > "${SEMAPHORE_FILE}"
    for d in boot etc home var
    do
        /usr/bin/rsync -avx --exclude-from="${HOME}/bin/backup-exclude.lst" --delete-excluded --delete "/${d}" /data/backup/current/hf/local
    done
    rm "${SEMAPHORE_FILE}"
fi
echo "Backup done"

The script protects against running multiple times at once by using a semaphore file, and it cleans up stale semaphore files. Its main function is of course to rsync from one disk to the other, limited to the relevant directories /boot, /etc, /home, and /var. It uses an 'exclude file' so that not too much is copied.
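The stale-file check is just a subtraction of two time stamps, which is easier to try out when it is isolated in a small function. The sketch below is an illustration, not part of the script above: the function name `semaphore_is_stale` and its optional third argument (a fixed "now") are assumptions added so that the logic can be run with known numbers.

```shell
#!/bin/bash
# Succeeds (exit status 0) when the time stamp stored in the semaphore file
# is more than max_age seconds in the past.
# The optional third argument replaces the current time, for experiments.
semaphore_is_stale() {
    local file="$1" max_age="$2" now="${3:-$(date +%s)}"
    local started
    read -r started < "${file}" || return 1
    (( now - started > max_age ))
}

# Example with fixed numbers: started at 1000, checked at 5000, limit 3600.
sema="$(mktemp)"
echo 1000 > "${sema}"
if semaphore_is_stale "${sema}" 3600 5000; then
    echo "stale"
else
    echo "still running"
fi
rm "${sema}"
```

With these numbers the backup has been "running" for 4000 seconds, which is more than the limit of 3600, so the example prints "stale".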

Sample entries of the exclude file are:

home/hf/.cache
home/hf/.config/chromium
home/hf/.kodi/userdata/Thumbnails
home/hf/Downloads
home/hf/downloads
var/cache/
var/db/
var/lib/
var/log/
var/spool/

Making snapshots

Just making copies on multiple disks is of course not good enough. We also need to be able to go back in time. And we need to do some housekeeping: we cannot keep copies of everything for eternity.

There are many utilities that provide such functionality, and they all have their own constraints and peculiarities. Many only work with a specific disk setup.

Subsnap, a small script that I developed, aims to do away with that, and to be really simple to use. It does not assume any disk layout, and lets sysadmins easily configure the number of daily, weekly, monthly, or yearly backups that they want to retain. It does assume a naming convention for the backups: basename + label + date.

It can be called from a cronjob:

#Mins Hours Days Months Day-of-the-week
10  6  *  *  *  /root/bin/snapfs /mnt/SAMSSSD500/rootfs /mnt/SAMSSSD500/snaps daily 8
#11  6  *  *  1  /root/bin/snapfs /mnt/SAMSSSD500/rootfs /mnt/SAMSSSD500/snaps weekly 14
#12  6  1  *  *  /root/bin/snapfs /mnt/SAMSSSD500/rootfs /mnt/SAMSSSD500/snaps monthly 13

20  5  *  *  *  /root/bin/snapfs /mnt/HGST1000/data /mnt/HGST1000/snaps daily 15
21  5  *  *  1  /root/bin/snapfs /mnt/HGST1000/data /mnt/HGST1000/snaps weekly 14
22  5  1  *  *  /root/bin/snapfs /mnt/HGST1000/data /mnt/HGST1000/snaps monthly 13
23  5  1  1  *  /root/bin/snapfs /mnt/HGST1000/data /mnt/HGST1000/snaps yearly 5

The top entry specifies that every day at 6:10 AM a snapshot labeled 'daily' is made from /mnt/SAMSSSD500/rootfs, which will be stored in /mnt/SAMSSSD500/snaps, and that it retains just eight copies. For today the result would be a snapshot at /mnt/SAMSSSD500/snaps/rootfs-daily-2023-02-02T06:10+01:00. I currently do not see the need to retain weekly or monthly copies of the root file system, because they are kept on the backup disk already.
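The snapshot script itself is not shown here, so the following is only a sketch of how a name of the form basename + label + date could be built. The helper names `snapshot_name` and `snapshot_cmd` are hypothetical, and the time stamp assumes the format that GNU `date -Iminutes` prints. The sketch prints the `btrfs subvolume snapshot -r` command instead of running it, so it can be tried without root rights.

```shell
#!/bin/bash
# Build a snapshot name as basename + label + date,
# e.g. rootfs-daily-2023-02-02T06:10+01:00 (time stamp format is an assumption).
snapshot_name() {
    local source="$1" label="$2"
    local base
    base="$(basename "${source}")"
    printf '%s-%s-%s\n' "${base}" "${label}" "$(date -Iminutes)"
}

# Print (not run) the command that would create a read-only snapshot.
snapshot_cmd() {
    local source="$1" target_dir="$2" label="$3"
    printf 'btrfs subvolume snapshot -r %s %s/%s\n' \
        "${source}" "${target_dir}" "$(snapshot_name "${source}" "${label}")"
}

snapshot_cmd /mnt/SAMSSSD500/rootfs /mnt/SAMSSSD500/snaps daily
```

A useful side effect of this date format is that sorting the snapshot names alphabetically also sorts them by age.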

For the data/backup disk I retain 15 daily (two weeks' worth plus one), 14 weekly (three months plus one), 13 monthly (one year plus one), and five yearly copies.
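The housekeeping behind these retention counts can be sketched as follows. This is an illustration under assumptions, not the actual script: the function name `snapshots_to_prune` is hypothetical, and it relies on snapshot names ending in an ISO-style date, so that the shell's sorted glob expansion lists the oldest snapshot first.

```shell
#!/bin/bash
# Print the snapshots of one label that fall outside the retention count,
# oldest first. Nothing is deleted here.
snapshots_to_prune() {
    local snap_dir="$1" base="$2" label="$3" keep="$4"
    local -a snaps=()
    local s
    # Glob results are sorted by name, which is also by date for this naming.
    for s in "${snap_dir}/${base}-${label}-"*; do
        [ -e "${s}" ] && snaps+=("${s}")
    done
    local excess=$(( ${#snaps[@]} - keep ))
    if (( excess > 0 )); then
        printf '%s\n' "${snaps[@]:0:excess}"
    fi
}

# Example: list the daily snapshots of 'data' beyond the newest 15.
# Each printed path could then be passed to 'btrfs subvolume delete'.
snapshots_to_prune /mnt/HGST1000/snaps data daily 15
```

Because each label is matched separately, pruning the daily snapshots never touches the weekly, monthly, or yearly ones.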
