Backup & Disaster Recovery
VeloCMS runs nightly backups of the PocketBase database to Cloudflare R2. The backup pipeline is a scheduled script that authenticates against the PocketBase backup API, creates a named snapshot, downloads the SQLite file, and uploads it to a dedicated R2 bucket with a 30-day retention lifecycle rule. A failed backup run exits with code 1, which Railway cron monitors and alerts on.
Backup architecture
| Component | Technology | Details |
|---|---|---|
| Database backup | PocketBase backup API + R2 upload | scripts/backup-pb.mjs, runs 02:00 UTC daily |
| Media backup | R2 native replication | Cloudflare R2 cross-region replication (manual setup in dashboard) |
| Retention | R2 lifecycle rule | 30 days for database backups, media retained indefinitely |
| Alerting | Railway cron exit-code monitoring | exit 1 triggers Railway incident notification |
| Schedule | Railway cron or GitHub Actions | .github/workflows/nightly-backup.yml |
Nightly backup script
The backup script at scripts/backup-pb.mjs handles the full pipeline: authenticate against PocketBase as superuser, create a named backup, download the SQLite file, and upload it to R2 under a date-stamped key. The script is idempotent: if a backup with today's date already exists in PocketBase (the create request returns HTTP 409), it downloads and uploads the existing one rather than failing.
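As a rough illustration of the naming and idempotency logic described above, the sketch below derives the date-stamped names and treats an HTTP 409 from the create call as "reuse the existing backup". The helper names (`utcDateStamp`, `backupNames`, `ensureBackup`) and the injected `createBackup` function are hypothetical and are not taken from scripts/backup-pb.mjs, which may be organised differently.

```javascript
// Hypothetical helpers for illustration only.

// Format a Date as YYYY-MM-DD in UTC, matching the 02:00 UTC schedule.
function utcDateStamp(date) {
  return date.toISOString().slice(0, 10);
}

// Backup name inside PocketBase and object key inside R2 for a given day.
function backupNames(date) {
  const name = `${utcDateStamp(date)}.db`;
  return { name, r2Key: `pb/${name}` };
}

// Create the backup, treating HTTP 409 as "already exists, reuse it".
// `createBackup` is an injected async function that resolves to { status }.
async function ensureBackup(createBackup, name) {
  const res = await createBackup(name);
  if (res.status === 409) return { name, reused: true };
  if (res.status >= 200 && res.status < 300) return { name, reused: false };
  throw new Error(`backup creation failed with HTTP ${res.status}`);
}
```

Injecting `createBackup` keeps the 409 branch testable without a running PocketBase instance. Any other non-2xx status is thrown, so the caller can turn it into exit code 1.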
# PocketBase credentials
POCKETBASE_URL=https://pb.velocms.org
POCKETBASE_ADMIN_EMAIL=<YOUR_ADMIN_EMAIL>
POCKETBASE_ADMIN_PASSWORD=<YOUR_ADMIN_PASSWORD>
# R2 backup bucket credentials (separate from media bucket)
CLOUDFLARE_R2_ACCOUNT_ID=<YOUR_ACCOUNT_ID>
CLOUDFLARE_R2_ACCESS_KEY_ID=<YOUR_BACKUP_KEY_ID>
CLOUDFLARE_R2_SECRET_ACCESS_KEY=<YOUR_BACKUP_SECRET>
BACKUP_R2_BUCKET=velocms-backups # default if not set
# Run backup now (production)
POCKETBASE_URL=https://pb.velocms.org \
POCKETBASE_ADMIN_EMAIL=admin@example.com \
POCKETBASE_ADMIN_PASSWORD=yourpassword \
CLOUDFLARE_R2_ACCOUNT_ID=abc123 \
CLOUDFLARE_R2_ACCESS_KEY_ID=key123 \
CLOUDFLARE_R2_SECRET_ACCESS_KEY=secret123 \
node scripts/backup-pb.mjs
# Expected output:
# [backup] Created PB backup: 2026-04-29.db
# [backup] Downloaded 4194304 bytes from PocketBase
# [backup] Uploaded pb/2026-04-29.db to velocms-backups (4.0 MB)
# [backup] Done
R2 backup bucket structure
Database backups land in a dedicated R2 bucket (velocms-backups) separate from the media bucket (velocms-media). This separation means a media storage incident cannot corrupt or exhaust the backup bucket. The key structure is pb/YYYY-MM-DD.db for database snapshots. With one backup per day and 30-day retention, the bucket holds roughly 30 files at steady state.
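To make the key scheme and the retention window concrete, here is a minimal sketch that parses the date out of a pb/YYYY-MM-DD.db key and lists keys older than the retention period. It is an assumption-laden illustration: `keyDate` and `keysPastRetention` are hypothetical names, and the actual expiry is enforced by the R2 lifecycle rule rather than by code like this.

```javascript
// Matches keys such as "pb/2026-04-29.db".
const KEY_PATTERN = /^pb\/(\d{4})-(\d{2})-(\d{2})\.db$/;

// Parse the UTC date out of a backup key; returns null for any other key.
function keyDate(key) {
  const m = KEY_PATTERN.exec(key);
  if (!m) return null;
  return new Date(Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3])));
}

// Keys whose date is more than `retentionDays` before `now`.
// These are the objects the lifecycle rule is expected to have removed.
function keysPastRetention(keys, now, retentionDays = 30) {
  const cutoff = now.getTime() - retentionDays * 24 * 60 * 60 * 1000;
  return keys.filter((key) => {
    const d = keyDate(key);
    return d !== null && d.getTime() < cutoff;
  });
}
```

A non-empty result from `keysPastRetention` on a live bucket listing would suggest the lifecycle rule is missing or misconfigured.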
velocms-backups/
  pb/
    2026-04-29.db   # today
    2026-04-28.db   # yesterday
    2026-04-27.db   # ...
    ...             # up to 30 days
Restore procedure
To restore a PocketBase database from backup, download the target .db file from R2 and import it via the PocketBase admin UI. Full restore replaces all data including user accounts, posts, and settings. Do not restore a multi-tenant master database over a running instance without first taking a fresh backup of the current state.
- Download the target backup from R2: aws s3 cp s3://velocms-backups/pb/2026-04-28.db ./restore-2026-04-28.db --endpoint-url https://<ACCOUNT_ID>.r2.cloudflarestorage.com
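- Stop application writes to PocketBase in Railway (Settings → Deploy → put the service in maintenance mode or scale to 0). If you scale to 0, bring PocketBase back up without public traffic before the next step, because the Admin UI is only reachable while the service is running.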
- Open PocketBase Admin UI → Settings → Backups → Restore from file
- Upload the .db file and confirm restore
- Verify data integrity: check post counts, user counts, and the most recent audit log entry
- Restart the PocketBase service
- Run production smoke test: node scripts/production-smoke-test.mjs
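The integrity check in the steps above can be partly automated. The sketch below is one possible shape, assuming the operator (or a script) has already collected the post count, the user count, and the timestamp of the most recent audit log entry from the restored instance. The `stats` object and the `checkRestore` function are hypothetical and are not part of scripts/production-smoke-test.mjs.

```javascript
// Sanity-check a restored database against the backup it came from.
// stats: { posts: number, users: number, latestAuditAt: ISO timestamp string }
// backupDate: the YYYY-MM-DD stamp from the restored key, e.g. "2026-04-28"
function checkRestore(stats, backupDate) {
  const problems = [];
  if (!(stats.posts > 0)) problems.push("no posts found");
  if (!(stats.users > 0)) problems.push("no users found");

  // A snapshot cannot contain audit entries newer than the day it was taken.
  // If it does, the instance is probably still serving the pre-restore data.
  const endOfBackupDay = Date.parse(`${backupDate}T23:59:59Z`);
  const latestAudit = Date.parse(stats.latestAuditAt);
  if (Number.isNaN(latestAudit)) {
    problems.push("latest audit entry has no valid timestamp");
  } else if (latestAudit > endOfBackupDay) {
    problems.push("audit log is newer than the backup");
  }
  return { ok: problems.length === 0, problems };
}
```

A check like this only catches gross failures (empty tables, a restore that did not take effect). It does not replace a manual look at the data.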
Restore drill schedule
A backup that has never been tested is not a backup — it is a hope. Run a restore drill quarterly by downloading the most recent backup, restoring it to a local or staging PocketBase instance, and verifying that the data is readable and complete. Record the drill date and result in wiki/log.md. A drill that fails (corrupt file, missing records, schema mismatch) is a P0 incident, not a footnote.
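For teams that want to track the quarterly cadence in code, the following sketch computes when the next drill is due and formats a log entry. Both helpers (`nextDrillDue`, `drillLogLine`) are hypothetical, "quarterly" is interpreted here as three calendar months, and the line format for wiki/log.md is invented for this example; the project may record drills differently.

```javascript
// Next drill date: three calendar months after the last drill (UTC).
function nextDrillDue(lastDrill) {
  const due = new Date(lastDrill.getTime());
  due.setUTCMonth(due.getUTCMonth() + 3);
  return due;
}

// One line for wiki/log.md. The format is an assumption made for this sketch.
function drillLogLine({ date, backupKey, passed, notes = "" }) {
  const stamp = date.toISOString().slice(0, 10);
  const result = passed ? "PASS" : "FAIL (P0)";
  const suffix = notes ? ` (${notes})` : "";
  return `- ${stamp} restore drill ${backupKey}: ${result}${suffix}`;
}
```

For example, a drill last run on 2026-01-15 would come due again on 2026-04-15.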
# 1. Download latest backup from R2
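# Keys use UTC dates; before the 02:00 UTC run, today's file does not exist yet, so use yesterday's date.
aws s3 cp "s3://velocms-backups/pb/$(date -u +%Y-%m-%d).db" ./drill.db \
  --endpoint-url https://<ACCOUNT_ID>.r2.cloudflarestorage.com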
# 2. Start a local PocketBase instance with the backup
./pocketbase serve --dir ./drill-data
# 3. Import via PocketBase Admin UI (localhost:8090/_/)
# Settings → Backups → Restore from file → upload drill.db
# 4. Spot-check:
curl "http://localhost:8090/api/collections/posts/records?page=1" \
  -H "Authorization: Bearer <admin_token>"
# Expected: JSON with totalItems > 0, no 500 errors
Disaster recovery runbook
| Scenario | First action | Second action | SLA target |
|---|---|---|---|
| PocketBase container crash | Railway auto-restarts container (< 60s) | Check logs in Railway dashboard, run smoke test | < 5 minutes |
| Corrupted PocketBase database | Take current backup (even if corrupted), restore from last clean R2 backup | Run smoke test, notify affected tenants | < 2 hours |
| R2 outage (media unavailable) | Media images return 5xx; posts still readable. No action required — Cloudflare resolves R2 incidents. | Monitor Cloudflare status page, communicate ETA to tenants | Cloudflare SLA |
| Next.js service down | Railway auto-restarts (< 60s). If loop: check last deploy log for build errors. | Roll back to previous deployment in Railway dashboard | < 10 minutes |
| Accidental mass deletion of posts | Do not write any new data. Restore from last night's backup immediately. | Identify root cause before re-enabling write operations | < 4 hours |
| Railway region outage | Monitor Railway status page. No action — Railway handles region failover. | After recovery, run production smoke test | Railway SLA |
GDPR Article 17 — tenant deletion residue
When a tenant requests account deletion, VeloCMS deletes the tenant row, all associated posts, media records, and member records from PocketBase. However, R2 objects (the actual media files) require a separate sweep because PocketBase deletion only removes the record, not the binary. The tenant deletion flow calls a cleanup job that enumerates all R2 keys under the tenant's slug prefix and deletes them.
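The prefix sweep described above might look roughly like the sketch below. It is not the actual cleanup job: `store` is a hypothetical adapter exposing `list({ prefix, cursor })` and `deleteMany(keys)`, and both the `<slug>/` key layout and the slug character set are assumptions. With the real R2 S3-compatible API, the adapter would wrap paginated list and batch delete calls.

```javascript
// Delete every object under "<slug>/", following pagination cursors.
// Returns the number of deleted objects.
async function sweepTenantObjects(store, slug) {
  // Guard against empty or odd slugs that would widen the prefix.
  if (!/^[a-z0-9][a-z0-9-]*$/.test(slug)) {
    throw new Error(`refusing to sweep suspicious slug: ${JSON.stringify(slug)}`);
  }
  // The trailing slash keeps "acme" from also matching "acme-corp/...".
  const prefix = `${slug}/`;
  let cursor;
  let deleted = 0;
  do {
    const page = await store.list({ prefix, cursor });
    if (page.keys.length > 0) {
      await store.deleteMany(page.keys);
      deleted += page.keys.length;
    }
    cursor = page.cursor;
  } while (cursor);
  return deleted;
}
```

The slug guard and the trailing slash are the important parts. An empty slug would otherwise produce a prefix that matches the whole bucket.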
The gap to be aware of: daily backups contain a point-in-time snapshot of the database. A backup taken before deletion will include the deleted tenant's data in the SQLite file. Backups are retained for 30 days. If a tenant exercises their right to erasure (GDPR Art. 17), acknowledge in writing that full erasure from backups takes up to 30 days as older snapshots age out. Document this timeline in your privacy policy.
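The date to quote in that written acknowledgement can be derived from the deletion time. The sketch below assumes the 02:00 UTC schedule and 30-day retention stated earlier. `lastSnapshotBefore` and `erasedFromBackupsBy` are hypothetical helpers. Lifecycle deletion may also lag the nominal expiry, so treat the result as an estimate rather than a guarantee.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Most recent scheduled snapshot at or before `deletedAt`.
// Assumes one run per day at `hourUtc` (02:00 UTC by default).
function lastSnapshotBefore(deletedAt, hourUtc = 2) {
  const snap = new Date(Date.UTC(
    deletedAt.getUTCFullYear(),
    deletedAt.getUTCMonth(),
    deletedAt.getUTCDate(),
    hourUtc
  ));
  if (snap.getTime() > deletedAt.getTime()) {
    snap.setTime(snap.getTime() - DAY_MS);
  }
  return snap;
}

// When the last snapshot that still contains the tenant's data ages out.
function erasedFromBackupsBy(deletedAt, retentionDays = 30) {
  return new Date(lastSnapshotBefore(deletedAt).getTime() + retentionDays * DAY_MS);
}
```

For a deletion at 10:00 UTC on 2026-04-29, the last affected snapshot is that morning's 02:00 run, which expires at 02:00 UTC on 2026-05-29.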
Monitoring backup health
The nightly backup job exits with code 0 on success and code 1 on any failure. Railway cron job monitoring converts a non-zero exit into an alert. Additionally, the admin dashboard can surface backup status by listing R2 keys in the velocms-backups bucket and comparing the most recent key's date to today. A backup older than 36 hours is a warning; older than 48 hours is an alert.
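The freshness thresholds above translate into a small classification function. This sketch takes a list of R2 keys and returns `ok`, `warning`, or `alert`. Because the keys carry only a date, it assumes each backup was written at the scheduled 02:00 UTC; using the object's last-modified timestamp would be more precise. `backupHealth` is a hypothetical name, not an existing dashboard function.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Classify backup freshness from keys like "pb/2026-04-29.db".
function backupHealth(keys, now) {
  const stamps = keys
    .map((k) => /^pb\/(\d{4}-\d{2}-\d{2})\.db$/.exec(k))
    .filter((m) => m !== null)
    .map((m) => m[1])
    .sort(); // ISO dates sort chronologically as plain strings

  // No backups at all is treated as the worst case.
  if (stamps.length === 0) return { status: "alert", latest: null, ageHours: null };

  const latest = stamps[stamps.length - 1];
  // Assumption: the backup was written at 02:00 UTC on its key date.
  const takenAt = Date.parse(`${latest}T02:00:00Z`);
  const ageHours = (now.getTime() - takenAt) / HOUR_MS;
  const status = ageHours > 48 ? "alert" : ageHours > 36 ? "warning" : "ok";
  return { status, latest, ageHours };
}
```

With a healthy nightly schedule the age never exceeds about 24 hours, so the 36-hour warning fires roughly half a day after a missed run.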
Reference
| File / Resource | Purpose |
|---|---|
| scripts/backup-pb.mjs | Nightly backup script — PocketBase API → R2 upload |
| .github/workflows/nightly-backup.yml | GitHub Actions schedule (alternative to Railway cron) |
| scripts/production-smoke-test.mjs | Post-restore smoke test — verifies core endpoints |
| R2 Lifecycle Rules | Cloudflare Dashboard → R2 → velocms-backups → Settings → Object Lifecycle |