It's been a few years since I posted about my [NAS upgrade ideas]({{< relref "/" >}}) and one of my major points was how to do regular off-site backups.

> **TL;DR**: I use [Rclone](https://rclone.org/) with AWS S3 as a backend. Versioning enabled and a cron job set up in TrueNAS to run a sync with bandwidth limiting, performance+cost tuning, file filtering and logging of the output.

A long time ago I would just manually do an off-site backup of my data to Google Drive, a laborious and time-intensive process thanks to ADSL internet and the finicky nature of the Google Drive web interface. Not long before my original post about NAS upgrade ideas I migrated to ad-hoc runs of Rclone against key directories, using S3 as a backend for storage. But for something I only needed to do quarterly it was easy to forget for a year and then watch it spend hours catching up. Or I'd forget which NAS shares went in which bucket or subfolder and end up with redundant copies scattered all over the place.

I've been playing with offsite backups again lately and now have it running monthly via [cron](https://wiki.archlinux.org/title/cron) using the capable [TrueNAS](https://www.truenas.com/) web UI. For anyone who plans to do the same, what follows are my experiences and recommendations.

Before I dive in, here's the command executed by cron:

```
rclone sync /mnt/pool-data s3:naslbs --filter-from /mnt/pool-data/conf/filter-lbs.txt -v --bwlimit 300k --size-only --fast-list --log-file /mnt/pool-data/conf/backup.log
```

### Bandwidth Considerations

I live in a country with an artificial scarcity of bandwidth thanks to years of mismanagement by [every elected party](https://dictionary.cambridge.org/dictionary/english/fuckwit). I get 18Mbps upload and in practice this works out to approximately 1.8MB/s to share across all the devices and servers running at home that either access the internet or can be accessed from it.

I want to ensure that whenever scheduled backups take place, my NAS isn't hogging our [internet tube](https://rollcall.com/2018/02/16/flashback-friday-a-series-of-tubes/) and making the internet unusable for my [spousy-boo](https://studionyx.co/blog) or Mastodon server. Rclone instantly became one of my favourite pieces of software for its simple `--bwlimit 300k` option. Regardless of how much data is queued for upload, I'm guaranteed that it won't perceptibly impact our use of the connection.

### S3 Versioning

We've turned versioning on for our S3 bucket, so anytime a file is changed it'll keep a copy of the old version. The main motivator for this is backup protection through a naive [Write Once Read Many (WORM)](https://en.wikipedia.org/wiki/Write_once_read_many) solution: if our files are accidentally deleted or encrypted by a ransomware infection and the cron task runs, we won't lose our offsite backups too. In the modern age of live cloud backups this is critical, since the original 3-2-1[^3-2-1] strategy usually assumed the offsite copy was something like a weekly tape mailed away, which is why modern recommendations are for a [3-2-1-1-0 or 4-3-2 strategy](https://www.backblaze.com/blog/whats-the-diff-3-2-1-vs-3-2-1-1-0-vs-4-3-2/).

Of course, if files are liable to change this isn't appropriate, so we save our NAS backups for files that are considered stable or unlikely to be changed.
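For anyone replicating this, versioning is a property of the bucket rather than an Rclone flag. Here's a rough sketch of enabling and checking it with the AWS CLI, assuming the same `naslbs` bucket from the command above and credentials already configured:

```
# One-off: turn versioning on for the backup bucket
aws s3api put-bucket-versioning \
  --bucket naslbs \
  --versioning-configuration Status=Enabled

# Confirm the setting took
aws s3api get-bucket-versioning --bucket naslbs
```

Recent Rclone releases can also list and restore those old object versions themselves (see the `--s3-versions` and `--s3-version-at` flags in the S3 backend docs), which beats clicking through the AWS console after an incident.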
### Limit Files

Our NAS has a *lot* of stuff, about 8TB of the digital ephemera produced by work and life in the digital age. A lot of this doesn't need to be backed up; for now we're only focusing on:

- Photos
- Tax records
- Personal documents
- Finished projects (art, music, code)
- Media where the original source has shut down

Thankfully the `--filter-from file.txt` option in Rclone makes this a breeze. We start by adding exclusions for files and folders that are just OS-specific junk:

```
- .DS_Store
- ._.DS_Store
- Thumbs.db
- **/.AppleDouble/*
- .AppleDouble/*
```

Then we can go through and add the folders we want backed up, excluding subdirectories that don't match the criteria we set above, and finally we say "exclude everything else":

```
- projects/Assets/**
- projects/in-progress/**
+ projects/**
+ learning/uni/**
+ pictures/**
- *
```

In this case we're excluding an "Assets" folder because the original source for the files is still up and the files are large, uncompressed, and binary, which would cost more than the little benefit of backing them up. If we find out the original service providing them is shutting down, we want to ensure we [continue to own the things we "bought"](https://www.citationneeded.news/we-need-to-talk-about-digital-ownership/), at which point compressing and backing them up would become more important.

We're also excluding our "in-progress" projects because their contents change frequently, which would mean re-uploading them each run and piling up redundant copies, as per the previous section on S3 versioning. For this kind of work we have the projects stored in some form of version control and manage backups in a way that is relevant to that type of work.

### Optimisations And Enhancements

We get verbose logging (`-v`) so we can check the results during and after a run, looking for any problems, and we write the output of each run to disk with `--log-file file.txt`. A future update will probably push the logs to a local logging service and trigger on any WARN/ERROR levels to ping me with the deets. It's also pretty handy to go back through the logs and see just how much each run transferred and how quickly.

The last two flags are ones recommended by Rclone in its [guide on S3 tuning](https://rclone.org/s3/#avoiding-head-requests-to-read-the-modification-time). We use [`--size-only`](https://rclone.org/s3/#avoiding-head-requests-to-read-the-modification-time) to reduce the number of HEAD requests made, saving a bunch of time and requests during backup runs. Any changes to the type of files we back up are likely to cause size changes too, which makes this an easy win. Finally we have [`--fast-list`](https://rclone.org/s3/#avoiding-head-requests-to-read-the-modification-time), which reduces the number of listing requests; while this should save some money it's mostly about saving time when iterating over all our files. We found this cut our "no change" sync times by about 90%.
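One habit worth adding when you change the filter file or flags: a rehearsal run before cron does it for real. A minimal sketch, reusing the same paths and bucket as the command above (`--dry-run` only reports what would be transferred or deleted):

```
# Preview what the next sync would do without touching the bucket
rclone sync /mnt/pool-data s3:naslbs \
  --filter-from /mnt/pool-data/conf/filter-lbs.txt \
  --size-only --fast-list --dry-run -v

# Until the log shipping exists, a quick scan of the last run for problems
grep -i error /mnt/pool-data/conf/backup.log
```

The dry run still makes listing requests against the bucket, so it isn't free, but it's far cheaper than discovering an overly-greedy filter rule after a full upload.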
### Final Notes

Please institute backups at home (and I guess at work). Never underestimate how horrible it feels to lose years of precious memories (or shareholder value) when a hard disk carks it or gets written over because your housemate needed to install the latest Modern Warfare patch.

Even just putting a copy onto an external hard drive is fine, but make sure to check it every 6 months to ensure there's no corruption. An untested backup is a calamity waiting to be discovered (a quick way to spot-check the S3 copy is sketched at the end of this post). If you want to do a low-budget offsite backup, find a friend who also needs to back up their data and just fill and swap your external drives every 6 months.

Also, if anyone wants to ask why I don't have backups in another geographic region: if there's any incident big enough to wipe out both my NAS and the AWS datacentres then I have bigger problems to worry about (finding clean water after the apocalypse).

[^3-2-1]: Strategy: 3 copies of the data, 2 on different media, 1 offsite.
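Speaking of untested backups, the offsite copy itself can be spot-checked without downloading everything. A rough sketch with `rclone check`, reusing the same filter and the `--size-only` trade-off as the sync (so it compares sizes, not full checksums):

```
# Verify everything the filter selects locally also exists in the bucket;
# --one-way only cares that local files exist offsite and won't flag
# anything that exists only in the bucket
rclone check /mnt/pool-data s3:naslbs \
  --filter-from /mnt/pool-data/conf/filter-lbs.txt \
  --size-only --fast-list --one-way -v
```

It exits non-zero if anything is missing or mismatched, so it could slot into another cron entry alongside the sync.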