Backups and Disaster Recovery: The Stuff You Skip Until You Can't

Everyone who self-hosts eventually has the same Tuesday. The database won't start. The Docker volume is corrupted. The VPS provider had a hardware failure and your server is gone. The backup you thought was running hasn't actually run in three weeks because a cron job silently failed and you never checked. This article exists so your Tuesday is a minor inconvenience instead of a catastrophe. The difference is about two hours of setup and a script that runs while you sleep.

The 3-2-1 Rule and Why It Exists

The 3-2-1 backup rule has been around since before cloud computing existed, and it survived because it works. Three copies of your data. Two different storage media or locations. One copy offsite. The rule is a minimum, not a target — but meeting the minimum puts you ahead of most self-hosters, who have exactly one copy of everything sitting on one server in one datacenter.

In practice for a VPS self-hoster, 3-2-1 looks like this: your live data on the VPS (copy one), a backup on a different storage service at the same provider (copy two — better than nothing, worse than offsite), and a backup at a completely different provider in a different physical location (copy three — the one that saves you when everything else fails). The third copy is the one people skip. It's also the one that matters most.

The reason 3-2-1 matters specifically for self-hosters is that you don't have the redundancy that managed platforms build in. When you deploy on Vercel, your data lives in a managed database with automatic replication, point-in-time recovery, and a team responsible for durability. When you deploy on a Hetzner VPS, your data lives on a virtual disk attached to a single physical machine. If that machine has a hardware failure, your data is gone. Hetzner is explicit about this — they do not guarantee data durability on cloud VPS storage. [VERIFY: Hetzner's specific SLA language around VPS data durability — check their terms of service for current wording.] Your backups are your durability guarantee.

What to Back Up (And What Not To)

Not everything on your VPS needs to be backed up. Knowing what to include — and what to leave out — keeps your backups small, fast, and useful.

Back up: Database dumps (not the raw database files — dumps are portable and consistent), docker-compose files and environment variables (the configuration that defines your entire stack), application data volumes (uploads, user files, anything your apps store on disk), SSL certificates if you're not using Let's Encrypt auto-renewal, and any custom scripts or configuration files you've written. These are the things you can't regenerate. If they're gone, they're gone.

Don't back up: Container images (they're pulled from registries — just rebuild), the operating system itself (faster to reprovision a fresh VPS and restore your stack), log files unless you need them for compliance, and cached or temporary data. The distinction matters because backup storage costs money and backup jobs take time. A focused backup of your actual data might be 500MB. A full filesystem backup of a VPS might be 20GB. The 500MB version restores faster, costs less to store, and is easier to verify.

The most important item on the "back up" list is the database, and the most important thing about backing up a database is how you do it. Copying the raw PostgreSQL or MySQL data directory while the database is running is not a reliable backup — the files might be in an inconsistent state mid-write. Use pg_dump for PostgreSQL or mysqldump for MySQL. These tools produce a consistent snapshot regardless of what the database is doing at that moment. It's the difference between a backup and a hope.
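A minimal sketch of what a consistent dump looks like, assuming the database runs in a container named `db` with a database called `app` (container, database, and user names are all placeholders):

```shell
# Dump a PostgreSQL database running in Docker to a dated file.
# --format=custom produces a compressed archive that pg_restore understands.
docker exec db pg_dump -U postgres --format=custom app \
  > "/backups/app-$(date +%F).dump"

# MySQL equivalent: --single-transaction gives a consistent snapshot of
# InnoDB tables without locking them while the app keeps running.
docker exec db mysqldump --single-transaction -u root \
  -p"$MYSQL_ROOT_PASSWORD" app > "/backups/app-$(date +%F).sql"
```

Either way, the dump file is what goes into your backup, not the database's data directory.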

The Tools That Work

Three approaches cover the vast majority of self-hosting backup needs. Pick the one that matches your comfort level.

Restic is the most commonly recommended backup tool in the self-hosting community, and for good reason. It handles encryption, deduplication (only storing the changes since the last backup, not a full copy every time), compression, and remote storage — all in a single binary with no dependencies. It supports Backblaze B2, Cloudflare R2, Amazon S3, SFTP, and local paths as backup destinations. A typical restic workflow: dump your database, point restic at your data directories, and let it figure out what changed since last time. Restores are straightforward — pick a snapshot, restore to a path.
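That workflow, sketched in commands. The repository location, bucket name, and paths are placeholders; the restic commands themselves are the standard ones:

```shell
# One-time setup: create an encrypted repository. Backblaze B2 shown here;
# restic reads the credentials from B2_ACCOUNT_ID / B2_ACCOUNT_KEY.
export B2_ACCOUNT_ID="your-b2-key-id"
export B2_ACCOUNT_KEY="your-b2-key"
export RESTIC_REPOSITORY="b2:my-backup-bucket:server1"
export RESTIC_PASSWORD="a-long-random-passphrase"  # lose this, lose the backups
restic init

# Nightly: back up dumps, data, and compose files. Restic deduplicates,
# so only blocks that changed since the last snapshot are uploaded.
restic backup /backups /srv/app-data /opt/docker

# Restore: list snapshots, then restore one to a target path.
restic snapshots
restic restore latest --target /tmp/restore-test
```

The encryption passphrase deserves the same care as the backups themselves: store it in a password manager, not only on the server being backed up.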

Borgbackup does essentially the same thing as restic — encrypted, deduplicated, compressed backups — with a slightly different architecture. Borg repositories accessed over SSH work best when borg is also installed on the destination server, where the client talks to a `borg serve` process; restic, by contrast, needs nothing installed on the remote side. Borg has been around longer than restic and has a devoted following. In practice, the choice between restic and borg comes down to which documentation you find clearer. Both work. Neither is wrong.

The simple approach: cron + pg_dump + rclone. If restic and borg feel like overkill, a shell script that dumps your database, tars your data volumes, and uses rclone to copy the result to cloud storage is perfectly adequate. Rclone speaks to every major cloud storage provider — S3, B2, R2, Google Cloud Storage, even Google Drive. The script might be 20 lines. It runs on a cron job. It works. The downside compared to restic is no deduplication — you're storing a full copy every time — but for small-to-medium datasets, the storage cost difference is negligible.
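Here is a sketch of that script, assuming an rclone remote named `offsite` has already been set up with `rclone config` (the remote name, paths, and database names are placeholders):

```shell
#!/usr/bin/env bash
set -euo pipefail  # stop on the first failure instead of backing up garbage

STAMP=$(date +%F)
WORKDIR="/backups/$STAMP"
mkdir -p "$WORKDIR"

# 1. Consistent database dump (container and database names are placeholders).
docker exec db pg_dump -U postgres --format=custom app > "$WORKDIR/app.dump"

# 2. Tar the data volumes and the compose file that defines the stack.
tar czf "$WORKDIR/data.tar.gz" /srv/app-data /opt/docker/docker-compose.yml

# 3. Ship it offsite. "offsite" is whatever remote you configured:
#    R2, B2, S3, even Google Drive.
rclone copy "$WORKDIR" "offsite:server-backups/$STAMP"

# 4. Keep a week of local copies, delete anything older.
find /backups -maxdepth 1 -mindepth 1 -type d -mtime +7 -exec rm -rf {} +
```

Wire it to cron with something like `0 3 * * * /usr/local/bin/backup.sh` and it runs nightly at 3 a.m.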

Where to Send Backups

The destination matters as much as the backup itself. The wrong destination turns a backup strategy into an expensive false sense of security.

Cloudflare R2 has become a popular backup destination because it's S3-compatible (most tools work with it out of the box) and charges zero egress fees. The free tier includes 10GB of storage, which is enough for many self-hosters' database dumps and configuration backups. Beyond that, storage is $0.015/GB/month. [VERIFY: Cloudflare R2 current pricing — free tier limits and per-GB rates.] The zero-egress pricing means restoring from R2 doesn't cost extra, which matters in a disaster scenario when you're pulling everything back down.
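Because R2 speaks the S3 protocol, pointing restic at it is a matter of environment variables. The account ID, bucket name, and keys below are placeholders; you create the API token in the Cloudflare dashboard:

```shell
# R2 is S3-compatible, so restic uses its s3: backend.
# Substitute your own account ID, bucket, and R2 API token credentials.
export AWS_ACCESS_KEY_ID="r2-access-key-id"
export AWS_SECRET_ACCESS_KEY="r2-secret-access-key"
export RESTIC_REPOSITORY="s3:https://<account-id>.r2.cloudflarestorage.com/my-backups"
export RESTIC_PASSWORD_FILE=/root/.restic-password  # passphrase kept out of shell history
restic init   # first run only; afterwards just `restic backup ...`
```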

Backblaze B2 is the other common choice. Storage runs about $6/TB/month, and egress is free if you route through Cloudflare via the Bandwidth Alliance. [VERIFY: Backblaze B2 current pricing and Cloudflare Bandwidth Alliance details.] B2 has been around longer than R2, has a solid track record for durability, and integrates with restic and rclone natively.

Hetzner Storage Boxes are cheap — a 1TB box costs around €3.81/month [VERIFY: current Hetzner Storage Box pricing] — and accessible via SFTP, SMB, or rsync. They're convenient if your VPS is already on Hetzner. But here's the critical caveat: backing up your Hetzner VPS to a Hetzner Storage Box means both your live data and your backup live at the same provider. A billing dispute, an account suspension, a provider-level outage, or a datacenter incident could take out both simultaneously. This is the "same provider trap," and it violates the "offsite" principle of 3-2-1.

The solution is straightforward: use your Hetzner Storage Box as copy two, and send copy three somewhere completely different — R2, B2, or a VPS at another provider. Two providers, two billing relationships, two physical locations. If either one fails, you still have your data.
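With rclone, sending the same backup directory to both destinations is two commands. The remote names `storagebox` and `r2` are placeholders for whatever you named them in `rclone config`:

```shell
# Copy two: Hetzner Storage Box over SFTP (same provider, different storage).
rclone copy "/backups/$(date +%F)" "storagebox:server-backups/$(date +%F)"

# Copy three: Cloudflare R2 (different provider, different billing, offsite).
rclone copy "/backups/$(date +%F)" "r2:my-backups/server-backups/$(date +%F)"
```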

Testing Restores: The Step Everyone Skips

A backup you've never restored from is a wish, not a strategy. The self-hosting community repeats this constantly, and the self-hosting community mostly ignores its own advice. The reason is simple — testing restores takes time, it's not urgent, and it feels unnecessary when the backups are running without errors. Then the day comes when you need the backup, and you discover the dumps are empty, or the encryption key is missing, or the restore process takes four hours instead of twenty minutes.

A quarterly restore test is the minimum that qualifies as responsible. The process: spin up a fresh VPS (Hetzner's hourly billing makes this cheap — a few cents for an hour of testing), pull your latest backup, restore it, verify that the database loads and the data is intact, verify that your docker-compose files bring up the full stack, and tear down the test VPS. The whole thing takes an hour if your backups work correctly. If they don't, you'll be grateful you found out during a test instead of during a real failure.
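On the throwaway VPS, the test boils down to a handful of commands. This sketch reuses the placeholder names from earlier (repository URL, container name `db`, database `app`); the `users` table in the final check is likewise a stand-in for whatever table proves your data survived:

```shell
# On a fresh test VPS: pull the latest snapshot from offsite storage.
export RESTIC_REPOSITORY="s3:https://<account-id>.r2.cloudflarestorage.com/my-backups"
export RESTIC_PASSWORD="passphrase-from-your-password-manager"
restic restore latest --target /restore

# Bring up just the database, create the target db, and load the dump.
docker compose -f /restore/opt/docker/docker-compose.yml up -d db
docker exec db createdb -U postgres app
docker exec -i db pg_restore -U postgres -d app < /restore/backups/app.dump

# Bring up the rest of the stack and spot-check that real data came back.
docker compose -f /restore/opt/docker/docker-compose.yml up -d
docker exec db psql -U postgres -d app -c 'SELECT count(*) FROM users;'
```

If the count looks right and the app answers on its port, the test passes. Tear the VPS down and note anything that surprised you.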

Document the restore process. Write down the exact commands, in order, that go from "fresh VPS" to "my stack is running with my data." This document is the most valuable thing in your backup strategy — because when you actually need it, you'll be stressed, possibly sleep-deprived, and in no condition to figure out restore procedures from scratch.

A Real Backup Script

For the cron-and-shell-script crowd, here's what a functional backup workflow looks like. A nightly cron job runs a script that does the following, in order: dumps PostgreSQL databases using pg_dump, creates a tarball of application data volumes and docker-compose files, runs restic to back up the dumps and tarballs to Cloudflare R2 (encrypted and deduplicated), runs restic to prune old snapshots based on a retention policy (keep the last 7 daily, 4 weekly, and 6 monthly snapshots), and logs the result. If any step fails, the script sends a notification — a webhook to a Discord channel or a simple email via an external SMTP service.
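The steps above can be sketched as one script. The Discord webhook URL, container name, and paths are placeholders, and the restic repository credentials are assumed to be set in the environment (for example, sourced from a root-only file):

```shell
#!/usr/bin/env bash
set -euo pipefail
# Assumes RESTIC_REPOSITORY and RESTIC_PASSWORD(_FILE) are already exported.

notify() {  # Discord webhook; the URL is a placeholder for yours.
  curl -fsS -H 'Content-Type: application/json' \
    -d "{\"content\": \"$1\"}" \
    "https://discord.com/api/webhooks/<id>/<token>" >/dev/null || true
}
# On any failing command, report which step died, then exit nonzero.
trap 'notify "Backup FAILED on $(hostname) at step: $STEP"' ERR

STEP="pg_dump"
docker exec db pg_dump -U postgres --format=custom app > /backups/app.dump

STEP="tar"
tar czf /backups/data.tar.gz /srv/app-data /opt/docker/docker-compose.yml

STEP="restic backup"
restic backup /backups

STEP="restic prune"
restic forget --prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6

notify "Backup OK on $(hostname)"
```

The `trap ... ERR` line is what turns a silent failure into a message in your Discord channel, including which step broke.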

The retention policy matters. Without pruning, your backup storage grows forever. With aggressive pruning, you might delete the one backup you need. A sensible default: keep 7 daily snapshots, 4 weekly snapshots, and 6 monthly snapshots. This gives you granular recovery for the past week, weekly recovery for the past month, and monthly recovery for the past six months. Restic and borg both handle retention policies natively — you define the rules once and they prune automatically.

The notification matters too. A backup script that fails silently is worse than no script at all — it gives you false confidence. The minimum viable monitoring is a notification on failure. Better monitoring includes a notification on success (so you notice when notifications stop, which means the script isn't running). Best-case monitoring is a dead man's switch service like Healthchecks.io (free tier: 20 checks) [VERIFY: Healthchecks.io current free tier limits] that alerts you if the backup doesn't check in on schedule.
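The dead man's switch pattern is one line at the end of the backup script. The check UUID is a placeholder you get when creating a check in the Healthchecks.io dashboard:

```shell
# Ping Healthchecks.io when the backup finishes successfully.
# If no ping arrives on schedule, the service alerts you -- which catches
# the worst failure mode: the script not running at all.
curl -fsS --retry 3 "https://hc-ping.com/<your-check-uuid>"

# Optional: report an explicit failure so you're alerted immediately
# instead of waiting for the schedule to lapse.
curl -fsS --retry 3 "https://hc-ping.com/<your-check-uuid>/fail"
```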

The Moment You'll Be Glad You Did This

There is no way to make backup configuration exciting. It is maintenance work that pays off in a scenario you hope never happens. The self-hosting subreddits are full of stories from people who lost everything — months of data, entire projects, irreplaceable databases — because they skipped this step or half-implemented it. The stories from people with working backups are less dramatic: "My VPS died. I spun up a new one, restored from R2, and was back online in 45 minutes." That's the story you want.

Two hours of setup. A cron job. A quarterly test. That's the difference between self-hosting as a reliable infrastructure strategy and self-hosting as an accident waiting for a trigger.


This is part of CustomClanker's Self-Hosting series — the honest cost of running it yourself.