# Database Backup System

Local Deep Research includes an automatic database backup system that creates encrypted backups of your user database in the background when you log in (at most one login backup per calendar day).

## Overview

- **Automatic**: Backups run in the background after login without blocking the UI
- **Encrypted**: Backups use the same encryption as your main database
- **Safe**: Uses SQLCipher's `sqlcipher_export()`, which works correctly with WAL mode, and writes each backup to a temporary file that is atomically renamed into place
- **Configurable**: Enable or disable backups and configure retention via settings
- **Pre-migration**: A backup is automatically created before any database schema migration
## How It Works

1. When you log in successfully, a background backup is scheduled
2. Only one login backup is created per calendar day: later logins on the same day skip the backup. This prevents a corrupted database from overwriting all good backups through repeated logins
3. The backup runs in a separate background thread, so logging in is not blocked
4. The backup is created with `sqlcipher_export()`, which produces an encrypted copy that preserves all cipher settings
5. Old backups are automatically cleaned up based on your retention settings
6. Before a database schema migration, a backup is always created, regardless of the daily limit. This backup runs synchronously before the migration starts; if it fails, the error is logged and the migration still proceeds
## Backup Location

Backups are stored in:

```
{data_directory}/encrypted_databases/backups/{user_hash}/
```

Where `{user_hash}` is the first 16 hex characters of the SHA-256 hash of your username.
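You can compute the hash yourself to find your backup folder without starting the application. This is a minimal sketch that assumes the username is hashed as UTF-8 text. `user_backup_dir` is an illustrative name, and `data_directory` must be your actual data directory.

```python
import hashlib
from pathlib import Path


def user_backup_dir(data_directory: Path, username: str) -> Path:
    """Backup folder for one user: .../backups/<first 16 hex chars of SHA-256>."""
    # Assumption: the username is hashed as its UTF-8 bytes.
    user_hash = hashlib.sha256(username.encode("utf-8")).hexdigest()[:16]
    return data_directory / "encrypted_databases" / "backups" / user_hash
```

For example, `user_backup_dir(Path("/data"), "abc")` gives `/data/encrypted_databases/backups/ba7816bf8f01cfea`.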
Each backup file is named with its creation timestamp (`YYYYMMDD_HHMMSS`):

```
ldr_backup_20250125_143022.db
```
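The example name corresponds to 2025-01-25 14:30:22. Because the date is part of the name, the once-per-day rule can be checked with a simple filename glob. The sketch below works under that reading. The helper names are hypothetical, and it does not specify whether the project stamps names in local time or UTC.

```python
from datetime import date, datetime
from pathlib import Path

PREFIX = "ldr_backup_"
STAMP = "%Y%m%d_%H%M%S"


def backup_filename(created: datetime) -> str:
    # e.g. ldr_backup_20250125_143022.db
    return f"{PREFIX}{created.strftime(STAMP)}.db"


def backup_created_at(path: Path) -> datetime:
    # Strip the prefix and the ".db" suffix, then parse the timestamp.
    return datetime.strptime(path.name[len(PREFIX):-len(".db")], STAMP)


def has_backup_for_day(backup_dir: Path, day: date) -> bool:
    # A daily limit only needs a glob on the date part of the name.
    return any(backup_dir.glob(f"{PREFIX}{day.strftime('%Y%m%d')}_*.db"))
```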
## Settings

Configure backup behavior in Settings > Backup:

| Setting | Default | Description |
|---------|---------|-------------|
| **Enable Auto-Backup** | `true` | Enable or disable automatic backups on login. Disable if disk space is limited. |
| **Max Backups** | `1` | Maximum number of backup files to keep (1-30) |
| **Backup Retention (days)** | `7` | Delete backups older than this many days |

**Note**: Each backup is a full encrypted copy of your database. Encrypted data cannot be meaningfully compressed, so each backup takes about as much disk space as the database itself; with the default of 1 backup, backups roughly double the space your data uses. If you have a large database (e.g., one containing uploaded PDFs), monitor disk usage. You can reduce the backup count, or disable backups entirely with the **Enable Auto-Backup** setting, if disk space is limited.
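One plausible reading of the two retention settings is that they apply independently. **Max Backups** keeps at most that many files, newest first. **Backup Retention (days)** drops any file older than the cutoff. This is a sketch, not the project's cleanup routine. `select_backups_to_delete` is a hypothetical helper, and it orders files by modification time.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import List


def select_backups_to_delete(
    backups: List[Path], now: datetime, max_backups: int = 1, max_age_days: int = 7
) -> List[Path]:
    """Return the backups that fall outside the retention settings."""

    def mtime(p: Path) -> float:
        # A file may disappear between glob() and stat(); treat it as oldest.
        try:
            return p.stat().st_mtime
        except FileNotFoundError:
            return 0.0

    newest_first = sorted(backups, key=mtime, reverse=True)
    cutoff = (now - timedelta(days=max_age_days)).timestamp()
    to_delete = newest_first[max_backups:]  # over the count limit
    to_delete += [p for p in newest_first[:max_backups] if mtime(p) < cutoff]  # too old
    return to_delete
```

The caller would then delete each returned path with `path.unlink(missing_ok=True)`, since the file may already be gone.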
## Why sqlcipher_export()?

We use SQLCipher's `sqlcipher_export()` instead of `VACUUM INTO` or a simple file copy because:

1. **Encryption Preservation**: `VACUUM INTO` does not preserve SQLCipher encryption settings. `sqlcipher_export()` copies the data while keeping the same encryption key and cipher configuration
2. **WAL Safety**: Copying only the main file of a database in WAL (Write-Ahead Logging) mode can miss committed changes that still live in the `-wal` file, producing an incomplete or inconsistent backup
3. **Atomic Operation**: The backup is written with ATTACH + `sqlcipher_export()` + DETACH into a temporary file, which is then atomically renamed to its final name, so an interrupted backup never appears under a backup file name
4. **Integrity Verification**: Each backup is verified with `PRAGMA quick_check` before being finalized; empty (0-byte) files also fail verification
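Points 3 and 4 combine into a short export routine. The sketch below is an illustration under stated assumptions, not the project's implementation. It takes a SQLCipher connection that is already open and keyed, meaning any DB-API-style object with `execute()`. The names `export_backup`, `verify_backup`, the `backup` alias and the `.tmp` suffix are all hypothetical. It also relies on SQLCipher encrypting a database attached without a `KEY` clause under the main database's key.

```python
import os
from pathlib import Path
from typing import Any, Callable


def verify_backup(path: Path, connect: Callable[[Path], Any]) -> bool:
    """Fail empty files and files whose PRAGMA quick_check is not 'ok'."""
    if not path.exists() or path.stat().st_size == 0:
        return False
    conn = connect(path)  # for SQLCipher, `connect` must also set the key
    try:
        return conn.execute("PRAGMA quick_check").fetchone()[0] == "ok"
    except Exception:
        # Wrong key, "file is not a database", I/O errors: all count as failure.
        return False
    finally:
        conn.close()


def export_backup(conn: Any, final_path: Path, verify: Callable[[Path], bool]) -> bool:
    """ATTACH + sqlcipher_export() + DETACH into a temp file, then rename it."""
    tmp_path = final_path.with_name(final_path.name + ".tmp")
    # No KEY clause: SQLCipher then encrypts the attached file with the main key.
    conn.execute("ATTACH DATABASE ? AS backup", (str(tmp_path),))
    try:
        conn.execute("SELECT sqlcipher_export('backup')")
    finally:
        try:
            conn.execute("DETACH DATABASE backup")
        except Exception:
            pass  # ignore DETACH errors so they cannot mask an export error
    if not verify(tmp_path):
        tmp_path.unlink(missing_ok=True)
        return False
    os.replace(tmp_path, final_path)  # atomic when both paths share a filesystem
    return True
```

With a plain `sqlite3` connection, the `SELECT sqlcipher_export(...)` call fails with "no such function", because that function exists only in SQLCipher builds. The `finally` block still detaches the temporary file.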
## Restoring from Backup

To restore from a backup:

1. Stop the application
2. Locate your backup in the backup directory
3. Copy it over your current database file, leaving the database's existing `.salt` file in place next to it
4. Restart the application

**Important**: A backup is encrypted with the password that was in use when it was created. When you change your password, existing backups are deleted and a fresh backup is created with the new password. The backup directory therefore normally contains only backups that open with your current password. A backup you copied elsewhere before a password change still requires the old password.
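Steps 2 and 3 can be scripted. The sketch below is hypothetical: the function name and the `.before-restore` suffix are made up for illustration. It assumes the application is already stopped and that you supply the actual database path. It also moves aside any leftover `-wal`/`-shm` files. This precaution is not listed in the steps above, but without it SQLite could replay a stale write-ahead log from the old database into the restored file.

```python
import shutil
from pathlib import Path


def restore_backup(backup_path: Path, db_path: Path) -> Path:
    """Replace db_path with a backup; run only while the application is stopped."""
    # Keep the current database in case the restore needs to be undone.
    previous = db_path.with_name(db_path.name + ".before-restore")
    shutil.copy2(db_path, previous)
    # Leftover WAL/SHM files belong to the old database; move them aside so
    # SQLite does not replay them into the restored file.
    for suffix in ("-wal", "-shm"):
        side = db_path.with_name(db_path.name + suffix)
        if side.exists():
            side.rename(side.with_name(side.name + ".before-restore"))
    shutil.copy2(backup_path, db_path)
    # The database's .salt file is left untouched, as step 3 requires.
    return previous
```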
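## Troubleshooting

### Backups not being created

1. Check whether backups are enabled in Settings
2. Check the logs for backup-related errors
3. Verify that there is enough free disk space (about twice the database size)
4. Remember that only one login backup is created per calendar day; if a backup already exists for today, later logins skip it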
### Disk space issues

The system checks for available disk space before creating a backup. If you see "Insufficient disk space" errors:

1. Free up disk space
2. Reduce the **Max Backups** setting
3. Reduce the **Backup Retention (days)** setting
4. If space remains tight, disable backups with the **Enable Auto-Backup** setting
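A free-space check of this kind can be written with the standard library. This is a minimal sketch that assumes a threshold of roughly twice the database size, matching the guidance under *Backups not being created*. `has_space_for_backup` and its `factor` parameter are illustrative, not the project's code.

```python
import shutil
from pathlib import Path


def has_space_for_backup(db_path: Path, backup_dir: Path, factor: float = 2.0) -> bool:
    """True if backup_dir's filesystem has at least factor x the database size free."""
    needed = db_path.stat().st_size * factor
    free = shutil.disk_usage(backup_dir).free  # bytes available on that filesystem
    return free >= needed
```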
### Backup verification failed

If you see "Backup verification failed" in logs, the backup may be corrupted. This can happen if:

1. The disk ran out of space during backup
2. There was a system crash during backup
3. The source database is corrupted

In this case, the corrupted backup file is automatically deleted.