mirror of
https://github.com/LearningCircuit/local-deep-research.git
synced 2026-06-15 19:46:56 +03:00
* feat: add Alembic migrations for database schema management

  Replace manual migration system with Alembic for proper schema versioning.
  This is a clean rebase of PR #1534, porting only the genuinely new Alembic
  infrastructure and adapting it to main's current codebase.

  New files:
  - alembic_runner.py: programmatic migration runner with security checks
  - migrations/: Alembic env + 2 migration versions (initial schema, progress cols)
  - test_alembic_migrations.py: comprehensive test suite (143 tests)

  Modified:
  - initialize.py: replace _run_migrations/_add_column_if_not_exists with Alembic
  - test_initialize_functions.py: remove tests for deleted functions
  - test_database_initialization.py: users table is auth-only, skip in per-user DBs
  - pyproject.toml: add alembic~=1.17 dependency

  Closes #644, closes #1108
  Supersedes #1534

* test: add backward compatibility tests for Alembic migration upgrade path

  Tests simulate real-world upgrade scenarios:
  - Pre-Alembic database (created with old create_all) upgrades cleanly
  - Old database missing progress columns gets them via migration 0002
  - All user data (settings, research history, tasks) survives upgrade
  - Upgrade + re-initialize is idempotent (no data loss, no table changes)
  - Schema of upgraded legacy DB matches freshly created Alembic DB
  - Users table (auth-only) excluded from fresh DBs but preserved in legacy
  - Sequential migration path (0001 → data → 0002) works correctly

* test: close integration gaps for Alembic migration CI coverage

  - Add encrypted DB + Alembic integration tests verifying create_user_database
    and open_user_database produce databases with alembic_version at head revision
  - Add import smoke test for migration modules
  - Add migration-tests job to backwards-compatibility workflow
    (release-gate only, skipped on PRs)
  - Expand backwards-compatibility paths trigger to include alembic_runner.py,
    migrations/**, and test_alembic_migrations.py
  - Wire backwards-compatibility workflow into release-gate as reusable
    workflow call with summary reporting

* feat: add 0003 migration for research table indexes

  Add migration 0003 that creates 9 performance indexes on research_tasks and
  research_history tables, matching PR #2015's model-level declarations.
  Ensures existing databases get indexes that new databases create via
  create_all(). Uses existence checks and if_not_exists for idempotency.

  Includes 26 tests covering upgrade, downgrade, idempotency, data
  preservation, and edge cases. Updates existing test assertions for new
  head revision (0003).

* test: add deep integration tests for Alembic migration machinery

  Prove migration machinery works beyond table/column name checks:
  - Operations API can create/drop tables on migrated engine
  - Migration 0002 adds columns with correct nullable/defaults to old schema
  - ORM CRUD (Create/Read/Update/Delete) works after initialize_database()
  - Downgrade/upgrade roundtrip preserves column properties + ORM works

* feat: add warning logs when database migrations are applied

  Log at WARNING level (not just INFO) when migrations change the schema:
  - Before migration: warns if database has no history or is outdated
  - After migration: warns with the revision transition (e.g. 0001 -> 0002)

  Schema migrations on user databases are significant events that operators
  should notice in logs without needing DEBUG level enabled.

* test: add 6 migration safety guard tests

  Add TestMigrationSafetyGuards class with structural checks that catch
  common Alembic migration pitfalls:
  1. Schema drift detection (compare_metadata vs ORM models)
  2. Single head revision enforcement (no branch conflicts)
  3. Stairway up-down-up per revision (parametrized)
  4. Substantive downgrade verification (AST-parsed)
  5. All models registered on metadata (pkgutil walk)
  6. No residual tables after downgrade (parametrized)

  Total: 10 test items (3 parametrized × 2 + 4 standalone).

* test: add 5 targeted migration safety tests

  Add deterministic schema, importability, downgrade data-loss, env.py
  offline guard, and revision-ID/filename match tests.

* feat: add 0004 migration to move legacy app_settings keys to Alembic

  Adds migration 0004_migrate_legacy_app_settings that renames 17 legacy
  settings keys in the app_settings table to their canonical names from
  default_settings.json. This replaces the runtime re-scope blocks that were
  previously in settings_routes.py.

  - Uses parameterized SQL queries for safety
  - Handles missing app_settings table gracefully
  - Downgrade is intentional no-op (deleted keys have no consumers)
  - 19 dedicated tests covering all mappings, idempotency, edge cases
  - Updates head assertions across existing test files (0003→0004)

* fix: make migration failures visible instead of silently swallowing them

  run_migrations() was catching all exceptions and returning False, which
  every caller ignored. Failed migrations left the database needing schema
  changes with no indication to the user, causing cryptic errors later.

  Now run_migrations() re-raises the original exception after logging. The
  database is safe: engine.begin() auto-rolls back the transaction. Callers
  in encrypted_db.py catch and log at ERROR level (upgraded from WARNING)
  with clear context about retry-on-next-login behavior.

* fix: 3 migration bugs (env.py engine fallback, 0004 missing guard, race condition)

  1. env.py: Remove engine fallback that bypassed transaction safety by
     opening a new connection outside the caller's engine.begin() block.
     Now requires connection via config.attributes["connection"] only.
  2. 0004 migration: Add has_table("settings") guard matching 0002/0003
     pattern; prevents OperationalError on databases missing the table.
  3. encrypted_db: Move self.connections[username] store to AFTER
     initialize_database() completes in both create_user_database() and
     open_user_database(), preventing other threads from seeing a
     mid-migration database.

  Also removes dead config.attributes["engine"] assignment in
  alembic_runner.py and updates corresponding test assertion.

* fix: resolve ruff S103 and mypy errors in alembic PR

  - Add noqa: S103 to intentional os.chmod(0o666) in permission validation test
  - Fix mypy errors in initialize.py by typing schema_info as dict[str, Any]

* chore: update env.py docstring to reflect connection-only design

* fix: address deep review findings (misleading logs, CI gaps, missing tests)

  1. encrypted_db.py: Fix misleading "retried on next login" log messages.
     The cached engine prevents retry; corrected to "next process restart".
  2. backwards-compatibility.yml: Add test_migration_0003_indexes.py and
     test_migration_0004_app_settings.py to both path triggers and the
     migration-tests job. Pin harden-runner to v2.16.0 (was stale v2.14.2).
  3. 0003 migration: Fix docstring that incorrectly claimed indexes "match
     model-level declarations"; they exist only in the migration.
  4. test_alembic_migrations.py: Add two tests for recent bug fixes:
     - test_env_online_mode_requires_connection (RuntimeError guard)
     - test_migration_0004_skips_without_settings_table (has_table guard)

* fix: replace silent except pass with logger.debug in 0001 downgrade

  Pre-commit hook 'check-silent-exceptions' requires at least a
  logger.debug() call instead of bare 'except Exception: pass'.
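
The table-existence guard and `if_not_exists` idempotency that the commit message describes for migrations 0002 to 0004 can be sketched with only the standard-library `sqlite3` module. This is a minimal illustration under stated assumptions, not the project's migration code: the helper names `has_table` and `upgrade`, the `status` column, and the index name `ix_research_history_status` are all hypothetical.

```python
import sqlite3


def has_table(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if a table called `name` exists in this SQLite database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def upgrade(conn: sqlite3.Connection) -> bool:
    """Create a (hypothetical) index; return False if the table is missing."""
    # Guard: skip cleanly instead of raising OperationalError on a
    # database that never had the table.
    if not has_table(conn, "research_history"):
        return False
    # IF NOT EXISTS makes the statement safe to run more than once.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS ix_research_history_status "
        "ON research_history (status)"
    )
    return True


# A database without the table: the upgrade is skipped.
empty_db = sqlite3.connect(":memory:")
skipped = upgrade(empty_db)

# A database with the table: running the upgrade twice is harmless.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE research_history (id INTEGER PRIMARY KEY, status TEXT)")
first_run = upgrade(db)
second_run = upgrade(db)
index_names = [row[1] for row in db.execute("PRAGMA index_list('research_history')")]
```

After both runs `index_names` holds a single entry, so the second call neither fails nor duplicates the index. A real Alembic migration would express the same two checks through its own inspection and operations APIs rather than raw SQL.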
166 lines
5.5 KiB
Python
"""Test database initialization module"""

import os
import tempfile
from pathlib import Path
import pytest
from sqlalchemy import create_engine, inspect
from sqlalchemy.orm import sessionmaker

from local_deep_research.database.initialize import (
    initialize_database,
    check_database_schema,
)
from local_deep_research.database.models import (
    Setting,
)


class TestDatabaseInitialization:
    """Test the centralized database initialization"""

    @pytest.fixture
    def temp_db(self):
        """Create a temporary database for testing"""
        with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp:
            db_path = tmp.name

        engine = create_engine(f"sqlite:///{db_path}")
        yield engine, db_path

        # Cleanup
        engine.dispose()
        if Path(db_path).exists():
            os.unlink(db_path)

    def test_initialize_core_tables(self, temp_db):
        """Test that core tables are created correctly"""
        engine, db_path = temp_db

        # Initialize database
        Session = sessionmaker(bind=engine)
        with Session() as session:
            initialize_database(engine, session)

        # Check that core tables exist
        inspector = inspect(engine)
        tables = inspector.get_table_names()

        # Verify essential tables (users is auth-only, not in per-user DBs)
        expected_core_tables = [
            "settings",
            "research",
            "research_history",
            "journals",
            "app_logs",  # Correct table name
            "queued_researches",  # Correct table name
            "search_cache",
            "token_usage",
            "research_ratings",
        ]

        for table in expected_core_tables:
            assert table in tables, f"Table '{table}' should exist"

    def test_initialize_with_news_tables(self, temp_db):
        """Test that news tables are created when requested"""
        engine, db_path = temp_db

        # Initialize database with news tables
        Session = sessionmaker(bind=engine)
        with Session() as session:
            initialize_database(engine, session)

        # Check schema
        schema_info = check_database_schema(engine)

        # News tables might not be available if the news module isn't imported
        # Just check that initialization doesn't fail
        assert len(schema_info["tables"]) > 0
        assert len(schema_info["missing_tables"]) == 0

    def test_idempotent_initialization(self, temp_db):
        """Test that initialization is idempotent (can be run multiple times)"""
        engine, db_path = temp_db
        Session = sessionmaker(bind=engine)

        # Initialize twice
        with Session() as session:
            initialize_database(engine, session)

        # Get initial table count
        inspector = inspect(engine)
        initial_tables = set(inspector.get_table_names())

        # Initialize again
        with Session() as session:
            initialize_database(engine, session)

        # Check that no duplicate tables were created. Use a fresh inspector:
        # an Inspector caches reflection results, so reusing the first one
        # would return the earlier table list unchanged.
        final_tables = set(inspect(engine).get_table_names())
        assert initial_tables == final_tables

    def test_check_database_schema(self, temp_db):
        """Test the schema checking function"""
        engine, db_path = temp_db

        # Check schema before initialization
        schema_info = check_database_schema(engine)
        assert len(schema_info["tables"]) == 0
        assert len(schema_info["missing_tables"]) > 0

        # Initialize database
        Session = sessionmaker(bind=engine)
        with Session() as session:
            initialize_database(engine, session)

        # Check schema after initialization
        schema_info = check_database_schema(engine)
        assert len(schema_info["tables"]) > 0
        assert len(schema_info["missing_tables"]) == 0

    def test_settings_initialization(self, temp_db):
        """Test that settings can be initialized"""
        engine, db_path = temp_db

        # Initialize database with settings
        Session = sessionmaker(bind=engine)
        with Session() as session:
            initialize_database(engine, session)

            # Check if settings table is queryable
            count = session.query(Setting).count()
            assert count >= 0  # Should not raise an error

    def test_partial_table_creation(self, temp_db):
        """Test that initialization completes even with existing tables"""
        engine, db_path = temp_db

        # Create only one table manually (Setting has no foreign keys)
        from local_deep_research.database.models import Setting

        Setting.__table__.create(engine)

        # Verify only that table exists
        inspector = inspect(engine)
        initial_tables = inspector.get_table_names()
        assert "settings" in initial_tables
        assert len(initial_tables) == 1  # Only settings table

        # Initialize database (should create all missing tables)
        Session = sessionmaker(bind=engine)
        with Session() as session:
            initialize_database(engine, session)

        # Verify many more tables now exist (need fresh inspector)
        final_inspector = inspect(engine)
        final_tables = final_inspector.get_table_names()
        assert "settings" in final_tables  # Original table still there
        assert len(final_tables) > 20  # Many more tables created

        # Verify some key tables
        assert "research" in final_tables
        assert "journals" in final_tables
        assert "app_logs" in final_tables
        assert "token_usage" in final_tables