local-deep-research/docs/security/database-backup.md
LearningCircuit 819fafe8c2 feat: add automatic database backup system (#3006)
* feat: add automatic database backup system with review fixes

Adds encrypted database backups triggered on login, based on PR #2565
with critical fixes from code review applied.

New backup module:
- BackupService: encrypted backups via sqlcipher_export(), atomic
  rename, per-user locking, disk space validation, backup verification
- BackupScheduler: singleton with ThreadPoolExecutor (max 2 workers),
  non-blocking background backup, atexit shutdown
- Configurable via settings: backup.enabled, backup.max_count (3),
  backup.max_age_days (7)

Review fixes applied (not in original PR):
- Add PRAGMA busy_timeout = 10000 to prevent instant failure on
  concurrent writer lock contention
- Use settings defaults (or 3/7) instead of raising ValueError when
  backup settings are missing (djpetti's review feedback)
- Integrate into _perform_post_login_tasks background thread pattern
- Add stale .tmp file cleanup in _cleanup_old_backups
- Fix stat() TOCTOU in cleanup loop with FileNotFoundError handling
- Enforce directory permissions with os.chmod after mkdir
- Use safe_close() instead of bare .close() in finally blocks
- Fix .gitignore to not ignore backup source code

Includes 94 tests (4523 lines) and security documentation.

* fix: update key derivation API and add crash recovery tests

- Replace _get_key_from_password (private, old 1-arg API) with
  get_key_from_password (public, with db_path for per-DB salt)
  to match current main's key derivation interface
- Add 3 end-to-end crash recovery tests using real SQLCipher:
  1. Full round-trip: backup, delete original, open backup, verify
     all rows and integrity_check pass
  2. Wrong password rejection: backup can't be decrypted with wrong key
  3. Encryption verification: backup file has no plaintext SQLite header
- Tests skip when SQLCipher is not installed (CI Docker image has it)

* feat: purge old-key backups on password change + 9 new tests

Security fix: after a password change, old backups remain encrypted
with the old (potentially compromised) password. Per NIST SP 800-57,
OWASP A02, and patterns from VeraCrypt/Bitwarden/Signal, old backups
should be purged and replaced with a fresh backup using the new key.

Changes:
- Add BackupService.purge_and_refresh() method that deletes all
  existing backups and creates a fresh one with the current password
- Integrate into change_password route (auth/routes.py)
- Add empty-file check to _verify_backup (0-byte files were passing)
- Add gitleaks allowlist entry for auth/routes.py

New tests (9):
- TestPasswordChangeBackupSecurity (3 real SQLCipher tests)
- TestBackupCorruptionDetection (3 real SQLCipher tests)
- TestBackupRetentionEnforcement (3 mocked tests)

* test: rewrite crash recovery test with correct SQLCipher connection API

Fixes from 6-agent verification round:
- Use create_sqlcipher_connection() instead of manual connect+key+pragmas
- Wrap wrong-password checks in pytest.raises around connection factory
- Add @pytest.mark.timeout(120) for CI stability
- Add encryption header check for fresh backup after purge_and_refresh
- Use inline patches, fix docstring step count

* test: add 15 more backup system tests

New test classes:
- TestBackupDiskSpaceAndAtomicity (3): missing source DB, atomic
  rename pattern, size_bytes accuracy
- TestBackupFilePermissionsExtended (1): backup file 0o600 mode
- TestPurgeAndRefreshEdgeCases (6): no existing backups, multiple
  old backups, .tmp cleanup, list ordering, get_latest edge cases
- TestBackupServiceInitValidation (3): boundary values for
  max_backups, max_age_days, empty username

* feat: reduce backup defaults and add pre-migration backup

- Change max_backups default from 3 to 2 and max_age_days from 7 to 2
  to reduce disk usage for databases with large PDF BLOBs while keeping
  a safety net against corruption overwriting the only backup.

- Add synchronous pre-migration backup in open_user_database() that
  triggers before Alembic migrations run. Only fires when
  needs_migration() returns True (version upgrades), not on every login.
  Backup failure is logged as error but does not block migration.

* fix: use get_setting default parameter for backup.enabled

The expression `sm.get_setting("backup.enabled") or True` always
evaluates to True (False or True == True), making it impossible for
users to disable backups. Use the get_setting default parameter
instead, which is the established pattern throughout the codebase.
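
A minimal illustration of the difference. The dictionary-backed `get_setting` here is a hypothetical stand-in, not the project's settings manager:

```python
def get_setting(settings, key, default=None):
    """Hypothetical stand-in for a get_setting(key, default) lookup."""
    return settings.get(key, default)


settings = {"backup.enabled": False}  # the user explicitly disabled backups

# Buggy: `or` falls through on every falsy value, including an explicit False.
buggy = get_setting(settings, "backup.enabled") or True

# Fixed: the default is used only when the key is missing.
fixed = get_setting(settings, "backup.enabled", True)
missing = get_setting({}, "backup.enabled", True)

print(buggy, fixed, missing)
```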

* fix: address review findings from 6-round 30-agent review

Critical fixes:
- Fix _verify_backup() salt mismatch: pass db_path=self.db_path to
  set_sqlcipher_key so backup verification uses the correct per-database
  salt instead of the legacy salt. Without this, all v2 database backups
  fail verification and are silently deleted.
- Fix purge_and_refresh() race condition: hold per-user lock for the
  entire purge+create operation to prevent a concurrent backup from
  writing an old-key backup between purge and fresh backup creation.
- Fix DETACH not in finally: wrap DETACH DATABASE in its own finally
  block so the attached backup file is always released even if
  sqlcipher_export() raises. Remove no-op conn.commit() after DETACH.

Important fixes:
- Fix _cleanup_old_backups/list_backups/get_latest_backup TOCTOU: use
  safe_mtime helper that catches FileNotFoundError in sort key lambda.
- Fix list_backups timezone: use tz=UTC consistent with codebase.
- Fix get_backup_scheduler() thread safety: remove redundant module-
  level singleton; rely on thread-safe __new__.
- Fix docs: replace VACUUM INTO with sqlcipher_export() throughout.
- Fix test_no_raw_sql.py: add backup_service.py to skip list.
- Fix test readonly dir: skip when running as root in Docker.

* fix: address djpetti review + add 6 high-value backup tests

Review feedback (djpetti):
- Restore max_age_days default to 7 (2 days was too aggressive — a
  weekend gap would delete all backups)
- Replace `or 2`/`or 7` fallbacks with `get_setting(key, default)`
  which is the established codebase pattern (30+ uses)
- Keep max_backups=2 for disk space savings

New integration tests (real SQLCipher, in test_backup_crash_recovery_ci.py):
- test_backup_preserves_all_schema_objects: compare sqlite_master
- test_backup_passes_foreign_key_check: PRAGMA foreign_key_check
- test_restored_backup_accepts_new_writes: INSERT/UPDATE + durability

New unit tests (mocked, in test_backup_service.py):
- test_backup_created_when_migration_needed
- test_no_backup_when_no_migration_needed
- test_migration_proceeds_when_backup_raises

* feat: limit backups to one per calendar day to prevent corruption propagation

A corrupted database that overwrites all backups via rapid login cycles
is the primary risk for a 2-backup rotation. Now create_backup() skips
if a backup with today's date prefix already exists in the backup dir.

Exceptions that always create a backup regardless:
- Pre-migration backups (force=True) — schema changes are the highest
  risk moment and must always have a safety net
- purge_and_refresh() on password change — calls _create_backup_impl()
  directly, bypassing the daily check (security requirement)

* fix: sort daily backup glob + wrap DETACH in try/except

- Use max(existing_today, key=lambda p: p.name) instead of
  existing_today[0] for the daily backup limit check, since glob()
  returns results in arbitrary filesystem order.
- Wrap DETACH DATABASE in try/except inside the finally block to
  prevent masking the original sqlcipher_export exception if DETACH
  also fails.

* fix: check purge_and_refresh result instead of logging unconditional success

The return value of svc.purge_and_refresh() was discarded, so a failed
fresh backup after password change logged "Backups refreshed" falsely.
Now checks result.success and logs error if backup creation failed,
making it visible that the user has zero backups after purge.

* test: add daily backup limit tests + add missing warning log

New tests (TestDailyBackupLimit, 3 tests):
- test_skips_when_backup_exists_for_today: verify create_backup skips
  when a backup with today's date already exists
- test_force_bypasses_daily_limit: verify force=True enters
  _create_backup_impl even when today's backup exists
- test_proceeds_normally_for_different_day: verify yesterday's backup
  doesn't trigger the daily skip

Also: add logger.warning for failed .tmp file deletion in
purge_and_refresh (was silently swallowed with bare except pass).

* docs: add disk space warning and disable instructions to backup settings

Update backup.enabled description to mention disk usage and how to
disable. Update docs with clearer disk space guidance noting that
backups can be disabled via settings if space is limited.

* fix: reduce default max_backups from 2 to 1

Encrypted backups are effectively incompressible (AES-256 ciphertext
looks like random data), so each backup takes up the full database
size. With large databases containing PDFs (100s of MB), keeping 2
backups doubles the disk space used for backups.

The daily backup limit already prevents the corruption-overwrite
scenario that was the original justification for 2 backups. Users
who want extra safety can increase max_backups in settings.

* feat: add backup status warnings to research page

Add two dismissible warnings to the existing warning system:
- "Database Backups Disabled" when backup.enabled is False
- "No Backups Found" when enabled but none exist yet

Uses the existing warning_checks infrastructure (yellow alert boxes
on the research page). Backup check uses a lightweight filesystem
glob — no password or encryption needed.

Removes flash-based approach from login (research page doesn't
render flash messages).
2026-03-26 10:52:29 +01:00


Database Backup System

Local Deep Research includes an automatic database backup system that creates an encrypted backup of your user database after a successful login, at most once per calendar day.

Overview

  • Automatic: Backups run in the background after login without blocking the UI
  • Encrypted: Backups use the same encryption as your main database
  • Safe: Uses SQLCipher's sqlcipher_export() for atomic backups that work correctly with WAL mode
  • Configurable: Enable/disable and configure retention via settings
  • Pre-migration: A backup is automatically created before any database schema migration

How It Works

  1. When you log in successfully, a background backup is scheduled
  2. Only one backup per calendar day is created — subsequent logins the same day are skipped to prevent a corrupted database from overwriting all good backups
  3. The backup runs in a separate thread (non-blocking)
  4. Uses sqlcipher_export() to create an encrypted copy preserving all cipher settings
  5. Old backups are automatically cleaned up based on your retention settings
  6. Before database migrations, a backup is always created regardless of the daily limit
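
The once-per-day rule in step 2 could be sketched as a filename check. This is an illustration under assumptions, not the project's code: the function names are hypothetical, and the date prefix is taken from the file naming described in this document.

```python
from datetime import date
from pathlib import Path


def todays_backup(backup_dir, today=None):
    """Return the newest backup dated today, or None if there is none."""
    today = today or date.today()
    prefix = f"ldr_backup_{today:%Y%m%d}_"
    matches = list(Path(backup_dir).glob(f"{prefix}*.db"))
    # glob() returns files in arbitrary order; the timestamped names sort
    # chronologically, so the maximum name is the most recent backup.
    return max(matches, key=lambda p: p.name) if matches else None


def should_create_backup(backup_dir, force=False, today=None):
    """Skip when today already has a backup, unless force is set."""
    return force or todays_backup(backup_dir, today) is None
```

A pre-migration backup would correspond to calling `should_create_backup(..., force=True)`, which bypasses the daily check.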

Backup Location

Backups are stored in:

{data_directory}/encrypted_databases/backups/{user_hash}/

Where {user_hash} is the first 16 hex characters of the SHA-256 hash of your username.

Each backup file is named with a timestamp:

ldr_backup_20250125_143022.db
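
To find your own backup directory, the path and file name can be reconstructed from the description above. This is a sketch with hypothetical helper names; it assumes the username is hashed as UTF-8 bytes, which the documentation does not state.

```python
import hashlib
from datetime import datetime
from pathlib import Path


def backup_dir_for(data_directory, username):
    """Build {data_directory}/encrypted_databases/backups/{user_hash}/."""
    # Assumption: the username is encoded as UTF-8 before hashing.
    digest = hashlib.sha256(username.encode("utf-8")).hexdigest()
    user_hash = digest[:16]  # first 16 hex characters
    return Path(data_directory) / "encrypted_databases" / "backups" / user_hash


def backup_filename(moment):
    """Format a timestamped backup name such as ldr_backup_20250125_143022.db."""
    return f"ldr_backup_{moment:%Y%m%d_%H%M%S}.db"


print(backup_filename(datetime(2025, 1, 25, 14, 30, 22)))
```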

Settings

Configure backup behavior in Settings > Backup:

| Setting | Default | Description |
|---|---|---|
| Enable Auto-Backup | true | Enable/disable automatic backups on login. Disable if disk space is limited. |
| Max Backups | 1 | Maximum number of backup files to keep (1-30) |
| Backup Retention (days) | 7 | Delete backups older than this many days |

Note: Each backup is a full encrypted copy of your database and cannot be compressed. With the default of 1 backup, the backups take up as much disk space as the database itself. Users with large databases (e.g., containing uploaded PDFs) should monitor disk usage; if space is limited, keep Max Backups low or disable backups entirely via the Enable Auto-Backup setting.
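
One way the two retention settings could interact is sketched below. It assumes a backup is removed when it exceeds either limit; the real cleanup logic may differ, and the function names are hypothetical.

```python
import time
from pathlib import Path


def safe_mtime(path):
    """Modification time, or 0.0 if the file vanished in the meantime."""
    try:
        return path.stat().st_mtime
    except FileNotFoundError:
        return 0.0


def cleanup_old_backups(backup_dir, max_count=1, max_age_days=7, now=None):
    """Delete backups beyond max_count or older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    # Newest first, so the first max_count entries are the ones to keep.
    backups = sorted(
        Path(backup_dir).glob("ldr_backup_*.db"), key=safe_mtime, reverse=True
    )
    removed = []
    for index, path in enumerate(backups):
        if index >= max_count or safe_mtime(path) < cutoff:
            path.unlink(missing_ok=True)  # tolerate a concurrent deletion
            removed.append(path.name)
    return removed
```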

Why sqlcipher_export()?

We use SQLCipher's sqlcipher_export() instead of VACUUM INTO or a simple file copy because:

  1. Encryption Preservation: VACUUM INTO does not preserve SQLCipher encryption settings. sqlcipher_export() correctly copies data while maintaining the same encryption key and cipher configuration
  2. WAL Safety: Regular file copy can corrupt databases using WAL (Write-Ahead Logging) mode
  3. Atomic Operation: The backup uses ATTACH + export + DETACH, and is written to a temporary file then atomically renamed
  4. Integrity Verification: Each backup is verified with PRAGMA quick_check before being finalized
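
The temporary-file-then-rename pattern from point 3 can be sketched with the standard library alone. The `export` callable is a hypothetical placeholder for the ATTACH + sqlcipher_export() + DETACH step, and the `.tmp` naming is an assumption.

```python
import os
from pathlib import Path


def write_backup_atomically(final_path, export):
    """Write a backup to a temporary file, then rename it into place.

    `export` is any callable that writes the backup to the given path;
    here it stands in for the SQLCipher export step.
    """
    final_path = Path(final_path)
    tmp_path = final_path.with_suffix(".tmp")
    try:
        export(tmp_path)
        # Verification of tmp_path would go here, before the rename.
        os.replace(tmp_path, final_path)  # atomic on the same filesystem
    except Exception:
        tmp_path.unlink(missing_ok=True)  # never leave a partial file behind
        raise
    return final_path
```

Because the final name only appears after the rename, a reader of the backup directory sees either a complete backup or none at all.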

Restoring from Backup

To restore from a backup:

  1. Stop the application
  2. Locate your backup in the backup directory
  3. Copy it to replace your current database file (keep the .salt file alongside it)
  4. Restart the application
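
Step 3 could be scripted roughly as follows, with the application stopped. Both paths are supplied by you; the safety copy with a `.pre-restore` suffix is an extra precaution added in this sketch and is not part of the documented procedure.

```python
import shutil
from pathlib import Path


def restore_backup(backup_path, database_path):
    """Replace the current database file with a backup copy."""
    backup_path, database_path = Path(backup_path), Path(database_path)
    if not backup_path.is_file():
        raise FileNotFoundError(f"Backup not found: {backup_path}")
    # Precaution added in this sketch: keep the current database aside.
    if database_path.exists():
        aside = database_path.with_name(database_path.name + ".pre-restore")
        shutil.copy2(database_path, aside)
    # The .salt file next to the database is left untouched.
    shutil.copy2(backup_path, database_path)
    return database_path
```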

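Important: A backup is encrypted with the password that was in use when it was created. Changing your password purges the existing backups and creates a fresh one with the new password, so backups in the backup directory normally open with your current password. A backup file that predates a password change (for example, one you copied elsewhere) still requires the old password.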

Troubleshooting

Backups not being created

  1. Check if backups are enabled in Settings
  2. Check the logs for backup-related errors
  3. Verify that enough disk space is free (a backup requires 2x the database size)

Disk space issues

The system checks for available disk space before creating a backup. If you see "Insufficient disk space" errors:

  1. Free up disk space
  2. Reduce the max backup count setting
  3. Reduce the retention days setting
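
To check by hand whether a backup would fit, a rough equivalent of the pre-backup space check might look like this. The 2x factor comes from the troubleshooting note above; the function name and return shape are hypothetical.

```python
import shutil
from pathlib import Path


def has_space_for_backup(db_path, backup_dir, factor=2):
    """Compare free space in backup_dir against factor x the database size."""
    required = Path(db_path).stat().st_size * factor
    free = shutil.disk_usage(backup_dir).free
    return free >= required, required, free
```

Calling it with your database file and backup directory returns whether the backup fits, along with the required and free byte counts.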

Backup verification failed

If you see "Backup verification failed" in logs, the backup may be corrupted. This can happen if:

  1. The disk ran out of space during backup
  2. There was a system crash during backup
  3. The source database is corrupted

In this case, the corrupted backup file is automatically deleted.
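
The verify-then-delete behaviour can be illustrated with Python's built-in sqlite3 module. This is only a stand-in: real backups are SQLCipher-encrypted and need a keyed SQLCipher connection, which this sketch does not attempt, and the function names are hypothetical.

```python
import sqlite3
from pathlib import Path


def verify_backup(path):
    """Return True if the file is non-empty and passes PRAGMA quick_check."""
    path = Path(path)
    if not path.is_file() or path.stat().st_size == 0:
        return False  # a 0-byte file would otherwise open as an empty database
    try:
        conn = sqlite3.connect(str(path))
        try:
            row = conn.execute("PRAGMA quick_check").fetchone()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        return False  # not a readable database at all
    return row is not None and row[0] == "ok"


def verify_or_delete(path):
    """Delete the backup file when verification fails."""
    ok = verify_backup(path)
    if not ok:
        Path(path).unlink(missing_ok=True)
    return ok
```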