5 Commits

Author SHA1 Message Date
LearningCircuit
8e11dcf729 refactor(db): remove per-thread NullPool engines to fix FD leak (#3441)
Previously DatabaseManager kept a dedicated per-(username, thread_id)
NullPool engine in `_thread_engines` for background-thread metric
writes, alongside the per-user QueuePool engine in `connections`.
Orphaned entries leaked SQLCipher+WAL file handles (3 FDs per active
connection) when @thread_cleanup did not fire, eventually exhausting
the 1024 FD soft limit and causing werkzeug's per-request selector to
fail on every request.

Route metric writes through the shared per-user QueuePool engine, which
is already created with check_same_thread=False and is safe to use
from background threads. FD usage is now bounded by
pool_size + max_overflow per user instead of scaling with background
thread count.

Also:
- Bump pool_size to 20 and max_overflow to 40, and add pool_timeout=10, to
  absorb concurrent research + HTTP + metric writers against the shared pool.
- Add pool_checked_out observability to the periodic Resource monitor.
- Delete ~200 lines of thread-engine bookkeeping:
  cleanup_thread_engines, cleanup_dead_thread_engines,
  maybe_sweep_dead_engines, cleanup_all_thread_engines, _sweep_lock,
  _last_sweep_time, _thread_engine_lock, _thread_engines.
- Force QueuePool on the SQLCipher integration-test fixture so
  concurrent-write tests exercise real pooling (not StaticPool).
- Update docs/architecture.md and web/database/README.md.
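
The FD bound described above can be worked through with the numbers given in this message. This is a back-of-the-envelope sketch, not code from the repository: `max_fds_per_user` is a hypothetical helper, and it assumes the figure of 3 FDs per connection applies to every pooled connection.

```python
# Hypothetical helper: worst-case FD usage for one user's pooled engine.
FDS_PER_CONNECTION = 3  # SQLCipher+WAL handles per active connection (from the message above)


def max_fds_per_user(pool_size: int, max_overflow: int) -> int:
    """Upper bound on file descriptors held by one per-user QueuePool."""
    return (pool_size + max_overflow) * FDS_PER_CONNECTION


worst_case = max_fds_per_user(pool_size=20, max_overflow=40)
print(worst_case)  # 180
```

The bound depends only on the pool settings, not on how many background threads are running, which is the point of the change.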

Known follow-up: parallel_constrained_strategy.py uses max_workers=100
which could spike pool pressure under worst-case load; sessions are
short-lived so sustained contention is unlikely, and pool_timeout=10
will surface it as errors rather than deadlock.
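
To illustrate "errors rather than deadlock", here is a toy model of a bounded pool with a checkout timeout. It uses only the standard library; `BoundedPool`, `checkout`, and `checkin` are hypothetical names and this is not how SQLAlchemy's QueuePool is implemented.

```python
import threading


class BoundedPool:
    """Toy stand-in for a connection pool with a checkout timeout."""

    def __init__(self, capacity: int, timeout: float) -> None:
        self._slots = threading.BoundedSemaphore(capacity)
        self._timeout = timeout

    def checkout(self) -> None:
        # Wait at most `timeout` seconds for a free slot, then fail loudly.
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError("pool exhausted: no connection available in time")

    def checkin(self) -> None:
        self._slots.release()


pool = BoundedPool(capacity=2, timeout=0.05)
pool.checkout()
pool.checkout()
try:
    pool.checkout()  # third caller: every slot is taken
    outcome = "acquired"
except TimeoutError:
    outcome = "timed out"

pool.checkin()
pool.checkout()  # succeeds once a slot has been returned
print(outcome)  # timed out
```

A caller that cannot get a slot fails after the timeout instead of blocking forever, so sustained contention shows up as visible errors.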

1996 passed, 8 skipped across tests/database and tests/web/auth.
2026-04-12 20:02:22 +02:00
LearningCircuit
6d8ea23d95 fix(config): sync ruff version between pre-commit and pyproject.toml (#3221)
* fix(config): sync ruff version between pre-commit and pyproject.toml

Bump ruff in .pre-commit-config.yaml from v0.14.10 to v0.15.8 to match
the ~=0.15 specifier in pyproject.toml dev dependencies. This eliminates
behavioral differences between developers running pre-commit hooks vs
ruff directly.

* style(tests): fix ruff lambda formatting for newer ruff version (#3238)

Apply ruff's updated lambda formatting rules to test files. The newer
ruff version prefers parenthesized return values over parenthesized
assignments for multi-line lambda expressions.

* style: reformat remaining files for ruff 0.15 lambda style

Run `ruff format .` to catch all files still using the 0.14 lambda
formatting style. The ruff 0.15 2026 style keeps lambda parameters
on a single line and parenthesizes the body for line breaks.
2026-03-28 11:37:41 +01:00
LearningCircuit
76524cc4de fix: correct runtime bugs and CI failure masking (from #2039) (#2118)
* fix: remove continue-on-error masking and fix runtime bugs

- Remove continue-on-error from pytest step so test failures are visible
- Add always() to coverage badge step so it runs regardless of test outcome
- Fix calculate_combined_score call to use single metrics dict (evaluator.py)
- Fix validate_url to expect bool return, not tuple (api_routes.py)
- Fix patched_get_setting signature to match real get_setting_from_snapshot
  to avoid TypeError with pytest-xdist (test_custom_context.py)

* fix: add missing salt parameter to _get_key_from_password test calls

The function signature changed to require 3 args (password, salt,
kdf_iterations) but tests were still calling with 2 args. Add
LEGACY_PBKDF2_SALT as the salt parameter in all test calls, and update
mock assertion to include db_path=None for the public wrapper.

* fix: auth rate limiting, username validation, and PDF upload test failures

- Rate limiting tests: move limiter.enabled=True after create_app() so
  it is not overridden by init_app which reads DISABLE_RATE_LIMITING env
- test_single_character_username: expect 400 since validation requires
  >= 3 characters (the validation is intentional)
- test_upload_too_many_files_rejected: use MAX_FILES_PER_REQUEST constant
  (200) instead of hardcoded 101, and fix assertion to accept "errors"
  or "status" keys in the response

* fix: remove region/gdpr/country/data_location test references from provider tests

These attributes were intentionally removed from ProviderInfo. Delete
test methods that assert on region, gdpr_compliant, country, and
data_location. Clean mock setups that set these removed attributes.
Also relax is_cloud assertion to allow None in integration test.

* fix: resolve 4 settings test failures

- Regenerate stale golden master snapshot (deleted and re-created)
- Fix test_integer_setting assertions to match actual default of 64
- Add None guard in InMemorySettingsManager._get_typed_value to prevent
  str(None) -> "None" conversion for API key defaults
- Wrap DB query in get_all_settings with try/except SQLAlchemyError for
  graceful degradation when database is unavailable
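
The None guard can be sketched as follows. This is an assumption-laden illustration: `get_typed_value` and the `ui_type` names are hypothetical and do not reproduce the real `InMemorySettingsManager._get_typed_value`.

```python
from typing import Any, Optional

# Hypothetical mapping from a setting's declared type to a converter.
CONVERTERS = {"text": str, "number": float}


def get_typed_value(raw: Any, ui_type: str) -> Optional[Any]:
    """Convert a raw setting to its declared type, passing None through."""
    if raw is None:
        # Without this guard, str(None) would produce the literal string "None",
        # which would then look like a real (but bogus) API key.
        return None
    converter = CONVERTERS.get(ui_type)
    return converter(raw) if converter else raw


print(repr(str(None)))                       # 'None'
print(repr(get_typed_value(None, "text")))   # None
print(repr(get_typed_value(64, "text")))     # '64'
```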

* fix: resolve 7 miscellaneous test failures (category 8)

1. History routes: fix mock chain to include .limit().offset() matching
   the actual query chain in get_history()
2. Bootstrap colors: replace #28a745/#ffc107 fallbacks with non-Bootstrap
   colors #2e7d32/#f9a825 in embedding_settings.js
3. test_request_import_removed: delete incorrect test - request IS used
   for request.args.get() pagination parameters
4. test_research_count_uses_debug: add logger.debug for research count
   in get_history()
5. Scheduler unbound var: move clear_settings_context import before try
   block so it is available in the finally block
6. Raw SQL false positive: add auth_db.py to skip list - it uses
   SQLAlchemy DDL (CreateTable/CreateIndex), not raw SQL
7. Connection log message: fix assertion to match actual log message
   "Failed to create thread connection" instead of "connection failed"
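
Item 5 (the unbound variable in the scheduler) follows a general Python pattern, sketched below. A plain assignment stands in for the import, since both bind a local name; `clear_context`, `failing_step`, and the `run_job_*` functions are hypothetical and not taken from the scheduler code.

```python
from typing import Callable, List


def clear_context(log: List[str]) -> None:
    """Hypothetical cleanup hook standing in for clear_settings_context."""
    log.append("cleared")


def failing_step() -> None:
    raise RuntimeError("job failed early")


def run_job_broken(log: List[str]) -> None:
    try:
        failing_step()
        cleanup = clear_context  # bound too late: never reached on failure
    finally:
        cleanup(log)  # raises UnboundLocalError, hiding the RuntimeError


def run_job_fixed(log: List[str]) -> None:
    cleanup = clear_context  # bound before the try block
    try:
        failing_step()
    finally:
        cleanup(log)  # the name is always available here


def outcome(job: Callable[[List[str]], None]) -> str:
    log: List[str] = []
    try:
        job(log)
    except Exception as exc:
        return f"{type(exc).__name__}, log={log}"
    return f"ok, log={log}"


broken_result = outcome(run_job_broken)
fixed_result = outcome(run_job_fixed)
print(broken_result)  # UnboundLocalError, log=[]
print(fixed_result)   # RuntimeError, log=['cleared']
```

In the broken variant the cleanup never runs and the original error is replaced by an unrelated one; in the fixed variant the cleanup runs and the original error propagates.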

* fix: resolve remaining test failures for clean CI

Root causes fixed:
- test_custom_context.py: permanently monkey-patched get_setting_from_snapshot
  without cleanup, polluting all subsequent tests in the same xdist worker.
  Added try/finally to restore original functions.
- conftest.py: setup_database_for_all_tests used session_mocker (session-scoped)
  causing patches to leak for the entire test session. Changed to mocker
  (function-scoped).
- test_auth_rate_limiting.py: limiter.init_app() returns early when disabled,
  skipping storage/handler setup. Re-call init_app() after enabling.
- Deleted 4 obsolete search engine test files (duplicated in web_search_engines/)
- Fixed DDG availability check to try instantiation not just import
- Fixed flaky stampede concurrency test threshold
- Fixed contradictory settings DB error test to match graceful degradation behavior
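
The try/finally restoration used for the test_custom_context.py fix can be sketched like this. `settings_module` is a hypothetical stand-in for the real module, the `(key, default)` signature is an assumption, and `"example.key"` is a made-up setting name.

```python
import types

# Hypothetical stand-in for the module that owns get_setting_from_snapshot.
settings_module = types.SimpleNamespace(
    get_setting_from_snapshot=lambda key, default=None: default
)


def call_with_patched_setting(test_body, patched_value):
    """Run test_body with the lookup patched, then always restore it."""
    original = settings_module.get_setting_from_snapshot
    settings_module.get_setting_from_snapshot = (
        lambda key, default=None: patched_value
    )
    try:
        return test_body()
    finally:
        # Restore even if test_body raises, so later tests in the same
        # worker process see the real function again.
        settings_module.get_setting_from_snapshot = original


inside = call_with_patched_setting(
    lambda: settings_module.get_setting_from_snapshot("example.key"), "patched"
)
after = settings_module.get_setting_from_snapshot("example.key", "default")
print(inside, after)  # patched default
```

pytest's `monkeypatch` fixture gives the same guarantee automatically; the manual form is shown only because it mirrors the fix described above.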
2026-02-22 23:21:08 +01:00
LearningCircuit
65b2383eb7 Revert "Revert "fix: add SQLCipher 4.x compatibility for cipher pragma ordering"" (#1867)
* Revert "Revert "fix: add SQLCipher 4.x compatibility for cipher pragma orderi…"

This reverts commit d72b7ae668.

* ci: add backwards compatibility workflow for SQLCipher encryption

Add a CI workflow that verifies database encryption backwards compatibility:

- Runs encryption constants tests on PRs touching database files
- Runs full PyPI version compatibility test on main/releases
- Triggers on schedule (weekly) to catch dependency drift
- Path-filtered to only run when relevant files change

This prevents regressions like salt or KDF parameter changes that would
break existing encrypted databases.

* ci: add backwards compatibility to security release gate

- Add workflow_call trigger to backwards-compatibility.yml
- Include backwards-compatibility check in security-release-gate.yml
- Run full PyPI compatibility test on releases (not just encryption constants)

This ensures database encryption changes can't break existing user databases
in a release.

* test: increase timeouts for PyPI backwards compatibility test

- Add pytest.mark.timeout(600) for the test function
- Increase pip install timeout from 300s to 600s

The package has many dependencies (numpy, torch, etc.) which take
significant time to install in a fresh venv.

* ci: fix shellcheck SC2129 in backwards-compatibility workflow

Group consecutive GITHUB_OUTPUT redirects into a single block
to satisfy actionlint/shellcheck SC2129 style check.

* fix: comprehensive SQLCipher security and correctness improvements

- Fix critical PRAGMA ordering: cipher_default_* before key for new DBs,
  cipher_* after key for existing DBs (per Zetetic SQLCipher 4.x docs)
- Replace unbounded @cache with @lru_cache(maxsize=8) to prevent memory leaks
- Add thread safety: RLock for connections dict, Lock for cipher_default globals
- Pre-derive hex keys before closures to avoid capturing plaintext passwords
- Add connection cleanup on failure (close conn, remove partial DB files)
- Add engine.dispose() on failed open_user_database()
- Whitespace password validation (reject " " as password)
- Remove cipher_memory_security=OFF (SQLCipher >=4.5 defaults to OFF)
- CI-aware KDF minimum: 100k production, relaxed in test environments
- Add set_sqlcipher_key_from_hex(), get_sqlcipher_version() utilities
- Replace direct db_manager.connections dict access with is_user_connected()
- Update harden-runner to v2.14.1, improve CI error handling
- Fix all except Exception: pass patterns to re-raise AssertionError in tests
- Update all test files for renamed functions and correct PRAGMA ordering
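
The PRAGMA ordering rule in the first bullet can be sketched as a function that returns statements in the order they would be issued. This is an illustration only: `sqlcipher_statements` is a hypothetical helper, and the pragma names are written as I understand them from the SQLCipher 4.x documentation, so check them against the Zetetic docs before relying on them.

```python
from typing import List


def sqlcipher_statements(
    key_statement: str, creating_new: bool, page_size: int, kdf_iter: int
) -> List[str]:
    """Return the key and cipher PRAGMAs in the order described above."""
    if creating_new:
        # New database: cipher_default_* settings go before the key.
        return [
            f"PRAGMA cipher_default_page_size = {page_size}",
            f"PRAGMA cipher_default_kdf_iter = {kdf_iter}",
            key_statement,
        ]
    # Existing database: key first, then the per-connection cipher_* settings.
    return [
        key_statement,
        f"PRAGMA cipher_page_size = {page_size}",
        f"PRAGMA kdf_iter = {kdf_iter}",
    ]


key_stmt = "PRAGMA key = 'placeholder'"  # placeholder, not a real key
new_db = sqlcipher_statements(key_stmt, True, 4096, 256000)
existing_db = sqlcipher_statements(key_stmt, False, 4096, 256000)
print(new_db.index(key_stmt), existing_db.index(key_stmt))  # 2 0
```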

* fix: address race condition in get_session() and exception swallowing in test

Move sessionmaker + session creation inside _connections_lock to prevent
race with close_user_database() disposing the engine. Also fix exception
handler in test_sqlcipher_integration.py to re-raise AssertionError.
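
The race fix can be modelled with fake engine and session classes. The method and attribute names below echo the ones mentioned in this message, but the bodies are hypothetical and do not reproduce the real DatabaseManager.

```python
import threading
from typing import Dict, Optional


class FakeEngine:
    """Stand-in for a SQLAlchemy engine; records whether it was disposed."""

    def __init__(self) -> None:
        self.disposed = False

    def dispose(self) -> None:
        self.disposed = True


class FakeSession:
    def __init__(self, engine: FakeEngine) -> None:
        if engine.disposed:
            raise RuntimeError("session created on a disposed engine")
        self.engine = engine


class Manager:
    def __init__(self) -> None:
        self._connections: Dict[str, FakeEngine] = {}
        self._connections_lock = threading.RLock()

    def open_user_database(self, username: str) -> None:
        with self._connections_lock:
            self._connections[username] = FakeEngine()

    def get_session(self, username: str) -> Optional[FakeSession]:
        with self._connections_lock:
            engine = self._connections.get(username)
            if engine is None:
                return None
            # Created while the lock is held, so close_user_database()
            # cannot dispose the engine between lookup and session creation.
            return FakeSession(engine)

    def close_user_database(self, username: str) -> None:
        with self._connections_lock:
            engine = self._connections.pop(username, None)
            if engine is not None:
                engine.dispose()


manager = Manager()
manager.open_user_database("alice")
session = manager.get_session("alice")
manager.close_user_database("alice")
print(session is not None, manager.get_session("alice"))  # True None
```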

* fix: address review issues in SQLCipher 4.x compatibility PR

- Update 3 broken web test files (test_queue_manager, test_decorators,
  test_session_cleanup) to mock is_user_connected()/get_session() instead
  of the removed connections.get() API
- Wire backwards-compatibility.yml into security-release-gate.yml so
  encryption compat checks actually run during releases
- Add missing apply_sqlcipher_pragmas() call in _check_encryption_available()
  to properly set kdf_iter after key on the test connection
- Replace generic CI/TESTING env var checks with PYTEST_CURRENT_TEST and
  LDR_TEST_MODE to prevent accidental KDF weakening in production
- Add timeout-minutes to both backwards-compatibility CI jobs
2026-02-11 02:26:58 +00:00
LearningCircuit
67866d99c4 refactor: extract shared SQLCipher connection factory method (#1967)
* refactor: extract shared SQLCipher connection factory method

Extract _make_sqlcipher_connection() method to replace 3 nearly identical
inner functions in create_user_database(), open_user_database(), and
create_thread_safe_session_for_metrics(). Each call site now uses a
lambda that passes the appropriate parameters.

* test: add tests for _make_sqlcipher_connection factory method

Verify that the factory method extracted from the duplicate inline
connection creators is used in all three engine creation paths.

* fix: restore error logging in metrics path and replace superficial tests

- Restore logger.exception() for thread connection failures in
  create_thread_safe_session_for_metrics by wrapping factory call in a
  proper function instead of a bare lambda
- Replace inspect.getsource string-matching tests with real behavioral
  tests that mock SQLCipher and verify: parameter passing, call order,
  error handling, and metrics error logging

* fix: close cursor and connection on exception in factory method

_make_sqlcipher_connection leaked the cursor and connection when
set_sqlcipher_key, verify, or pragma calls raised an exception.
Wrap the initialization sequence in try/except to ensure both cursor
and connection are closed before re-raising.

Add 3 tests verifying cleanup on verification failure, key error,
and pragma error.

* fix: correct SQLCipher pragma ordering and improve factory method

Cipher pragmas (page_size, hmac_algorithm, kdf_iter) must be set
before the first query (verification SELECT 1) because that query
triggers page decryption with the active cipher settings. With
non-default settings the previous ordering caused verification to fail.

Also adds @staticmethod, return type annotation, improved docstring,
and additional test coverage for edge cases.

* fix: make cleanup exception-safe and add edge-case tests

Wrap cursor.close() in try/except during error cleanup so conn.close()
always runs even if cursor.close() throws. Add tests for cleanup
exception safety, ImportError propagation, and connect() failure.
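
The exception-safe cleanup can be sketched as below. `make_connection` is a hypothetical simplification of `_make_sqlcipher_connection`, and the fake cursor and connection classes exist only to exercise the failure path; closing the cursor on success is an assumption of this sketch.

```python
from typing import Any, Callable


def make_connection(
    connect: Callable[[], Any], initialize: Callable[[Any], None]
) -> Any:
    """Open a connection, run initialize(cursor), and clean up on failure."""
    conn = connect()  # if connect() itself raises, there is nothing to clean up
    cursor = None
    try:
        cursor = conn.cursor()
        initialize(cursor)  # key, cipher pragmas, verification, etc.
    except Exception:
        if cursor is not None:
            try:
                cursor.close()
            except Exception:
                pass  # a failing cursor.close() must not skip conn.close()
        conn.close()
        raise
    cursor.close()
    return conn


class FlakyCursor:
    def close(self) -> None:
        raise RuntimeError("cursor.close() failed")


class RecordingConnection:
    def __init__(self) -> None:
        self.closed = False

    def cursor(self) -> FlakyCursor:
        return FlakyCursor()

    def close(self) -> None:
        self.closed = True


def failing_init(cursor: Any) -> None:
    raise ValueError("verification failed")


conn = RecordingConnection()
try:
    make_connection(lambda: conn, failing_init)
    error = None
except ValueError as exc:
    error = str(exc)
print(conn.closed, error)  # True verification failed
```

The connection is closed even though the cursor's own `close()` raised, and the original initialization error is the one the caller sees.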

* test: add coverage for performance pragma and verify exception paths

Add tests for two untested error paths in _make_sqlcipher_connection:
- apply_performance_pragmas() raising cleans up cursor and connection
- verify_sqlcipher_connection() raising (vs returning False) triggers
  cleanup and prevents performance pragmas from running
2026-02-10 07:58:42 +01:00