3 Commits

Author SHA1 Message Date
LearningCircuit
4fde036dfd feat: alembic (#2348)
* feat: add Alembic migrations for database schema management

Replace manual migration system with Alembic for proper schema versioning.
This is a clean rebase of PR #1534, porting only the genuinely new Alembic
infrastructure and adapting it to main's current codebase.

New files:
- alembic_runner.py: programmatic migration runner with security checks
- migrations/: Alembic env + 2 migration versions (initial schema, progress cols)
- test_alembic_migrations.py: comprehensive test suite (143 tests)

Modified:
- initialize.py: replace _run_migrations/_add_column_if_not_exists with Alembic
- test_initialize_functions.py: remove tests for deleted functions
- test_database_initialization.py: users table is auth-only, skip in per-user DBs
- pyproject.toml: add alembic~=1.17 dependency

Closes #644, closes #1108
Supersedes #1534

* test: add backward compatibility tests for Alembic migration upgrade path

Tests simulate real-world upgrade scenarios:
- Pre-Alembic database (created with old create_all) upgrades cleanly
- Old database missing progress columns gets them via migration 0002
- All user data (settings, research history, tasks) survives upgrade
- Upgrade + re-initialize is idempotent (no data loss, no table changes)
- Schema of upgraded legacy DB matches freshly created Alembic DB
- Users table (auth-only) excluded from fresh DBs but preserved in legacy
- Sequential migration path (0001 → data → 0002) works correctly

* test: close integration gaps for Alembic migration CI coverage

- Add encrypted DB + Alembic integration tests verifying
  create_user_database and open_user_database produce databases
  with alembic_version at head revision
- Add import smoke test for migration modules
- Add migration-tests job to backwards-compatibility workflow
  (release-gate only, skipped on PRs)
- Expand backwards-compatibility paths trigger to include
  alembic_runner.py, migrations/**, and test_alembic_migrations.py
- Wire backwards-compatibility workflow into release-gate as
  reusable workflow call with summary reporting

* feat: add 0003 migration for research table indexes

Add migration 0003 that creates 9 performance indexes on research_tasks
and research_history tables, matching PR #2015's model-level declarations.

Ensures existing databases get indexes that new databases create via
create_all(). Uses existence checks and if_not_exists for idempotency.

Includes 26 tests covering upgrade, downgrade, idempotency, data
preservation, and edge cases. Updates existing test assertions for
new head revision (0003).
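
A minimal sketch of the "existence check plus if_not_exists" pattern described above. It is not the migration's actual code: it uses the standard-library sqlite3 module instead of Alembic operations, and the index name `ix_demo_status` and the `status` column are hypothetical.

```python
import sqlite3


def index_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if an index with this name is already in the schema."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'index' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def create_index_idempotent(
    conn: sqlite3.Connection, table: str, name: str, column: str
) -> bool:
    """Create the index unless it already exists; return True if created."""
    if index_exists(conn, name):
        return False
    # IF NOT EXISTS is a second safety net in case the check above is bypassed.
    # Identifiers are hard-coded by the caller here, never user input.
    conn.execute(f'CREATE INDEX IF NOT EXISTS "{name}" ON "{table}" ("{column}")')
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE research_history (id INTEGER PRIMARY KEY, status TEXT)")
first = create_index_idempotent(conn, "research_history", "ix_demo_status", "status")
second = create_index_idempotent(conn, "research_history", "ix_demo_status", "status")
```

Running the helper twice leaves one index in place, which is the property the idempotency tests are described as checking.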

* test: add deep integration tests for Alembic migration machinery

Prove migration machinery works beyond table/column name checks:
- Operations API can create/drop tables on migrated engine
- Migration 0002 adds columns with correct nullable/defaults to old schema
- ORM CRUD (Create/Read/Update/Delete) works after initialize_database()
- Downgrade/upgrade roundtrip preserves column properties + ORM works

* feat: add warning logs when database migrations are applied

Log at WARNING level (not just INFO) when migrations change the schema:
- Before migration: warns if database has no history or is outdated
- After migration: warns with the revision transition (e.g. 0001 -> 0002)

Schema migrations on user databases are significant events that operators
should notice in logs without needing DEBUG level enabled.
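
The logging behavior described above could look roughly like the sketch below. The function name `upgrade_with_warnings`, the logger name, and the message wording are assumptions for illustration; only the WARNING level and the before/after structure come from the commit message.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("migration_demo")  # hypothetical logger name


def upgrade_with_warnings(
    current: Optional[str], head: str, apply: Callable[[], None]
) -> bool:
    """Run `apply` if the schema is behind `head`; return True if it ran."""
    if current == head:
        logger.info("Database schema is up to date (%s)", head)
        return False
    # Before migration: warn about missing history or an outdated revision.
    if current is None:
        logger.warning("Database has no migration history; upgrading to %s", head)
    else:
        logger.warning("Database is outdated (%s); upgrading to %s", current, head)
    apply()
    # After migration: warn with the revision transition.
    logger.warning("Database migrated: %s -> %s", current or "none", head)
    return True
```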

* test: add 6 migration safety guard tests

Add TestMigrationSafetyGuards class with structural checks that catch
common Alembic migration pitfalls:

1. Schema drift detection (compare_metadata vs ORM models)
2. Single head revision enforcement (no branch conflicts)
3. Stairway up-down-up per revision (parametrized)
4. Substantive downgrade verification (AST-parsed)
5. All models registered on metadata (pkgutil walk)
6. No residual tables after downgrade (parametrized)

Total: 10 test items (2 parametrized tests × 3 revisions each + 4 standalone).
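
One way an AST-parsed "substantive downgrade" check (item 4) might work is sketched below. The function name and the definition of "substantive" (anything other than `pass`, a docstring, or a bare constant) are assumptions, not the test suite's actual rules.

```python
import ast


def downgrade_is_substantive(source: str) -> bool:
    """Return True if the module defines a downgrade() that does real work."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == "downgrade":
            meaningful = [
                stmt
                for stmt in node.body
                if not isinstance(stmt, ast.Pass)
                # Docstrings and a bare `...` are expression statements
                # wrapping a constant; they do not change the schema.
                and not (
                    isinstance(stmt, ast.Expr)
                    and isinstance(stmt.value, ast.Constant)
                )
            ]
            return bool(meaningful)
    return False  # no downgrade() defined at module level
```

Because the source is only parsed and never executed, names such as `op` do not need to be importable for the check to run.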

* test: add 5 targeted migration safety tests

Add deterministic schema, importability, downgrade data-loss,
env.py offline guard, and revision-ID/filename match tests.

* feat: add 0004 migration to move legacy app_settings keys to Alembic

Adds migration 0004_migrate_legacy_app_settings that renames 17 legacy
settings keys in the app_settings table to their canonical names from
default_settings.json. This replaces the runtime re-scope blocks that
were previously in settings_routes.py.

- Uses parameterized SQL queries for safety
- Handles missing app_settings table gracefully
- Downgrade is intentional no-op (deleted keys have no consumers)
- 19 dedicated tests covering all mappings, idempotency, edge cases
- Updates head assertions across existing test files (0003→0004)
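
A sketch of a key-rename step with parameterized SQL and a missing-table guard, using the standard-library sqlite3 module rather than Alembic. The mapping entry, the `key`/`value` column names, and the choice to delete a legacy row when the canonical key already exists are all assumptions for illustration.

```python
import sqlite3
from typing import Dict

# Hypothetical mapping; the real migration is described as covering 17 keys.
LEGACY_KEYS: Dict[str, str] = {"legacy.example": "canonical.example"}


def rename_legacy_keys(conn: sqlite3.Connection, mapping: Dict[str, str]) -> int:
    """Rename legacy keys in app_settings; return the number of rows renamed."""
    has_table = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        ("app_settings",),
    ).fetchone()
    if has_table is None:
        return 0  # nothing to migrate on databases without the table

    renamed = 0
    for old, new in mapping.items():
        canonical = conn.execute(
            'SELECT 1 FROM app_settings WHERE "key" = ?', (new,)
        ).fetchone()
        if canonical is not None:
            # Assumed policy: the canonical value wins, the legacy row is dropped.
            conn.execute('DELETE FROM app_settings WHERE "key" = ?', (old,))
        else:
            cur = conn.execute(
                'UPDATE app_settings SET "key" = ? WHERE "key" = ?', (new, old)
            )
            renamed += cur.rowcount
    return renamed
```

A second run finds the canonical key already present and renames nothing, so the step can be applied repeatedly.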

* fix: make migration failures visible instead of silently swallowing them

run_migrations() was catching all exceptions and returning False, which
every caller ignored. Failed migrations left the database needing schema
changes with no indication to the user, causing cryptic errors later.

Now run_migrations() re-raises the original exception after logging.
The database is safe — engine.begin() auto-rolls back the transaction.
Callers in encrypted_db.py catch and log at ERROR level (upgraded from
WARNING) with clear context about retry-on-next-login behavior.
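
The "log, then re-raise, and rely on the transaction to roll back" behavior can be sketched as follows. This swaps in sqlite3's connection context manager for SQLAlchemy's engine.begin(), and the `steps` parameter is a hypothetical stand-in for the Alembic upgrade call.

```python
import logging
import sqlite3
from typing import Callable, Iterable

logger = logging.getLogger("migration_demo")  # hypothetical logger name


def run_migrations(
    conn: sqlite3.Connection,
    steps: Iterable[Callable[[sqlite3.Connection], None]],
) -> None:
    """Apply all steps in one transaction; log and re-raise on failure."""
    try:
        # `with conn` commits on success and rolls back if an exception escapes.
        with conn:
            for step in steps:
                step(conn)
    except Exception:
        logger.exception("Database migration failed")
        raise  # surface the original error instead of returning False
```

Callers then decide how to react, for example by catching the exception and logging at ERROR level with their own context.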

* fix: 3 migration bugs — env.py engine fallback, 0004 missing guard, race condition

1. env.py: Remove engine fallback that bypassed transaction safety by
   opening a new connection outside the caller's engine.begin() block.
   Now requires connection via config.attributes["connection"] only.

2. 0004 migration: Add has_table("settings") guard matching 0002/0003
   pattern — prevents OperationalError on databases missing the table.

3. encrypted_db: Move self.connections[username] store to AFTER
   initialize_database() completes in both create_user_database() and
   open_user_database(), preventing other threads from seeing a
   mid-migration database.

Also removes dead config.attributes["engine"] assignment in
alembic_runner.py and updates corresponding test assertion.
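
The ordering fix in item 3 amounts to publishing the connection only after initialization has finished. A minimal sketch, assuming a hypothetical `ConnectionRegistry` class; the lock is an addition for the example and is not mentioned in the commit message.

```python
import threading
from typing import Callable, Dict


class ConnectionRegistry:
    """Hypothetical stand-in for the per-user connection cache."""

    def __init__(self, initialize: Callable[[object], None]) -> None:
        self._initialize = initialize
        self._lock = threading.Lock()
        self.connections: Dict[str, object] = {}

    def open(self, username: str, engine: object) -> object:
        # Run schema initialization first. If it raises, nothing has been
        # stored, so other threads never see a half-migrated database.
        self._initialize(engine)
        with self._lock:
            self.connections[username] = engine
        return engine
```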

* fix: resolve ruff S103 and mypy errors in alembic PR

- Add noqa: S103 to intentional os.chmod(0o666) in permission validation test
- Fix mypy errors in initialize.py by typing schema_info as dict[str, Any]

* chore: update env.py docstring to reflect connection-only design

* fix: address deep review findings — misleading logs, CI gaps, missing tests

1. encrypted_db.py: Fix misleading "retried on next login" log messages.
   The cached engine prevents retry; corrected to "next process restart".

2. backwards-compatibility.yml: Add test_migration_0003_indexes.py and
   test_migration_0004_app_settings.py to both path triggers and the
   migration-tests job. Pin harden-runner to v2.16.0 (was stale v2.14.2).

3. 0003 migration: Fix docstring that incorrectly claimed indexes "match
   model-level declarations" — they exist only in the migration.

4. test_alembic_migrations.py: Add two tests for recent bug fixes:
   - test_env_online_mode_requires_connection (RuntimeError guard)
   - test_migration_0004_skips_without_settings_table (has_table guard)

* fix: replace silent except pass with logger.debug in 0001 downgrade

Pre-commit hook 'check-silent-exceptions' requires at least a
logger.debug() call instead of bare 'except Exception: pass'.
2026-03-21 15:35:31 +01:00
LearningCircuit
0c6635ecc2 feat: Add pre-commit hook to enforce pathlib usage (issue #640) (#656)
* feat: Add pre-commit hook to enforce pathlib usage (issue #640)

- Created check-pathlib-usage.py pre-commit hook using AST parsing
- Detects os.path usage and suggests pathlib alternatives
- Fixed os.path.normpath usage in auth/routes.py to use PurePosixPath
- Added hook configuration to .pre-commit-config.yaml

The hook provides helpful suggestions for replacing os.path calls with
their pathlib equivalents for better cross-platform compatibility.

Co-Authored-By: djpetti <djpetti@users.noreply.github.com>

* feat: Add missing pathlib pre-commit hook script

Co-Authored-By: djpetti <djpetti@users.noreply.github.com>

* refactor: Migrate core src modules from os.path to pathlib

- Fixed web/app_factory.py, config/llm_config.py, metrics/token_counter.py
- Fixed utilities/es_utils.py, web/routes/benchmark_routes.py
- Fixed web/routes/settings_routes.py, web_search_engines/engines/search_engine_local.py
- Replaced os.path.join() with Path() / syntax
- Replaced os.path.exists() with Path().exists()
- Replaced os.path.basename() with Path().name
- Replaced os.path.dirname() with Path().parent

Part of the migration to modern pathlib API for better cross-platform
compatibility and cleaner code.

Co-Authored-By: djpetti <djpetti@users.noreply.github.com>

* refactor: Migrate from os.path to pathlib in src and tests (issue #640)

Replaced os.path usage with pathlib.Path throughout:
- src/local_deep_research/benchmarks: All os.path.join, exists, dirname, basename, abspath replaced
- tests directory: Complete migration of all test files
- Improved cross-platform compatibility and code readability
- Kept os.path.expandvars in env_settings.py (no pathlib equivalent)

Part of pre-commit hook enforcement for pathlib usage.
Remaining work: examples/ and scripts/ directories.

Co-Authored-By: djpetti

* fix: Complete migration from os.path to pathlib.Path (issue #640)

Completed manual migration of all os.path usage to pathlib.Path across:
- scripts/ directory (3 files)
- examples/ directory (25 files total)
  - examples/benchmarks/ (8 files)
  - examples/optimization/ (16 files)
  - examples/show_env_vars.py
- src/local_deep_research/settings/env_settings.py

Changes made:
- Replaced os.path.join() with Path() / syntax
- Replaced os.path.exists() with Path().exists()
- Replaced os.path.dirname() with Path().parent
- Replaced os.path.basename() with Path().name or Path().stem
- Replaced os.path.abspath() with Path().resolve()
- Replaced os.makedirs() with Path().mkdir(parents=True, exist_ok=True)
- Added pathlib import where needed

Note: Kept os.path.expandvars in env_settings.py as there is no pathlib
equivalent. Added comment explaining this limitation.
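
For reference, the replacements listed above look like this in practice. The file and directory names are made up for the example.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # os.path.join(tmp, "reports", "result.json")
    target = Path(tmp) / "reports" / "result.json"
    # os.makedirs(os.path.dirname(target), exist_ok=True)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("{}")

    exists = target.exists()      # os.path.exists(target)
    name = target.name            # os.path.basename(target)
    stem = target.stem            # basename without the final suffix
    parent = target.parent        # os.path.dirname(target)
    # os.path.abspath(target); note that resolve() also follows symlinks.
    absolute = target.resolve()

# No pathlib equivalent: expand variables first, then wrap the result.
expanded = Path(os.path.expandvars("$HOME/data"))
```

Note that `Path.resolve()` is not an exact match for `os.path.abspath()`, because it also resolves symlinks; `Path.absolute()` is closer when symlinks must be left untouched.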

This completes the pathlib migration for issue #640.

Co-Authored-By: djpetti

* fix: Allow os.path.expandvars in pathlib pre-commit hook

Updated the check-pathlib-usage.py pre-commit hook to skip checking
os.path.expandvars since it has no pathlib equivalent.

Changes:
- Added exception for expandvars in both visit_Attribute and visit_Call methods
- Added comment in equivalents dictionary noting expandvars is allowed
- This allows env_settings.py to use os.path.expandvars without failing checks

This resolves the pre-commit CI failure while maintaining the pathlib
enforcement for all other os.path methods.
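
A rough sketch of how an AST-based check with an expandvars exception might be structured. It is not the contents of check-pathlib-usage.py; the class and function names are hypothetical, and it only handles the plain `os.path.<name>` spelling (not `from os import path` or aliases).

```python
import ast
from typing import List, Tuple

ALLOWED = {"expandvars"}  # no pathlib equivalent, so it is not reported


class OsPathVisitor(ast.NodeVisitor):
    """Collect (line, name) pairs for os.path.<name> attribute accesses."""

    def __init__(self) -> None:
        self.violations: List[Tuple[int, str]] = []

    def visit_Attribute(self, node: ast.Attribute) -> None:
        inner = node.value
        if (
            isinstance(inner, ast.Attribute)
            and inner.attr == "path"
            and isinstance(inner.value, ast.Name)
            and inner.value.id == "os"
            and node.attr not in ALLOWED
        ):
            self.violations.append((node.lineno, f"os.path.{node.attr}"))
        self.generic_visit(node)


def find_os_path_usage(source: str) -> List[Tuple[int, str]]:
    visitor = OsPathVisitor()
    visitor.visit(ast.parse(source))
    return visitor.violations
```

A pre-commit wrapper would typically run this over each staged file and exit non-zero when the list is non-empty.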

Co-Authored-By: djpetti

---------

Co-authored-by: djpetti
2025-08-17 22:52:35 +02:00
LearningCircuit
6c93348873 Fix broken database migration system (issue #638) (#646)
* fix: Fix broken database migration system (issue #638)

- Remove broken migrations.py with incorrect imports and missing functions
- Create centralized database initialization module as temporary solution
- Update encrypted_db.py to use centralized initialization
- Remove broken setup_predefined_settings reference from settings_routes
- Add comprehensive tests for database initialization
- Add tests to GitHub Actions CI workflow

This provides a working database initialization system until Alembic
migrations are implemented (see issue #644).

* fix: Address PR review feedback - remove unnecessary error handling

- Remove try/except for news models import (they're in the same codebase)
- Remove ImportError catch for SettingsManager (programming errors should surface)
- Simplify code by using single Base class for all models
- Remove include_news parameter as all tables use the same Base

Per djpetti's review: These were examples of being overzealous with error handling.
Internal imports should fail loudly if there are issues.
2025-08-16 13:28:59 -04:00