local-deep-research

mirror of https://github.com/LearningCircuit/local-deep-research.git synced 2026-06-16 03:51:07 +03:00

Author	SHA1	Message	Date
LearningCircuit	8e11dcf729	refactor(db): remove per-thread NullPool engines to fix FD leak (#3441) Previously DatabaseManager kept a dedicated per-(username, thread_id) NullPool engine in `_thread_engines` for background-thread metric writes, alongside the per-user QueuePool engine in `connections`. Orphaned entries leaked SQLCipher+WAL file handles (3 FDs per active connection) when @thread_cleanup did not fire, eventually exhausting the 1024 FD soft limit and causing werkzeug's per-request selector to fail on every request. Route metric writes through the shared per-user QueuePool engine, which is already created with check_same_thread=False and is safe to use from background threads. FD usage is now bounded by pool_size + max_overflow per user instead of scaling with background thread count. Also: - Bump pool_size=20, max_overflow=40, add pool_timeout=10 to absorb concurrent research + HTTP + metric writers against the shared pool. - Add pool_checked_out observability to the periodic Resource monitor. - Delete ~200 lines of thread-engine bookkeeping: cleanup_thread_engines, cleanup_dead_thread_engines, maybe_sweep_dead_engines, cleanup_all_thread_engines, _sweep_lock, _last_sweep_time, _thread_engine_lock, _thread_engines. - Force QueuePool on the SQLCipher integration-test fixture so concurrent-write tests exercise real pooling (not StaticPool). - Update docs/architecture.md and web/database/README.md. Known follow-up: parallel_constrained_strategy.py uses max_workers=100 which could spike pool pressure under worst-case load; sessions are short-lived so sustained contention is unlikely, and pool_timeout=10 will surface it as errors rather than deadlock. 1996 passed, 8 skipped across tests/database and tests/web/auth.	2026-04-12 20:02:22 +02:00
LearningCircuit	6d8ea23d95	fix(config): sync ruff version between pre-commit and pyproject.toml (#3221) * fix(config): sync ruff version between pre-commit and pyproject.toml Bump ruff in .pre-commit-config.yaml from v0.14.10 to v0.15.8 to match the ~=0.15 specifier in pyproject.toml dev dependencies. This eliminates behavioral differences between developers running pre-commit hooks vs ruff directly. * style(tests): fix ruff lambda formatting for newer ruff version (#3238) Apply ruff's updated lambda formatting rules to test files. The newer ruff version prefers parenthesized return values over parenthesized assignments for multi-line lambda expressions. * style: reformat remaining files for ruff 0.15 lambda style Run `ruff format .` to catch all files still using the 0.14 lambda formatting style. The ruff 0.15 2026 style keeps lambda parameters on a single line and parenthesizes the body for line breaks.	2026-03-28 11:37:41 +01:00
LearningCircuit	76524cc4de	fix: correct runtime bugs and CI failure masking (from #2039) (#2118) * fix: remove continue-on-error masking and fix runtime bugs - Remove continue-on-error from pytest step so test failures are visible - Add always() to coverage badge step so it runs regardless of test outcome - Fix calculate_combined_score call to use single metrics dict (evaluator.py) - Fix validate_url to expect bool return, not tuple (api_routes.py) - Fix patched_get_setting signature to match real get_setting_from_snapshot to avoid TypeError with pytest-xdist (test_custom_context.py) * fix: add missing salt parameter to _get_key_from_password test calls The function signature changed to require 3 args (password, salt, kdf_iterations) but tests were still calling with 2 args. Add LEGACY_PBKDF2_SALT as the salt parameter in all test calls, and update mock assertion to include db_path=None for the public wrapper. * fix: auth rate limiting, username validation, and PDF upload test failures - Rate limiting tests: move limiter.enabled=True after create_app() so it is not overridden by init_app which reads DISABLE_RATE_LIMITING env - test_single_character_username: expect 400 since validation requires >= 3 characters (the validation is intentional) - test_upload_too_many_files_rejected: use MAX_FILES_PER_REQUEST constant (200) instead of hardcoded 101, and fix assertion to accept "errors" or "status" keys in the response * fix: remove region/gdpr/country/data_location test references from provider tests These attributes were intentionally removed from ProviderInfo. Delete test methods that assert on region, gdpr_compliant, country, and data_location. Clean mock setups that set these removed attributes. Also relax is_cloud assertion to allow None in integration test.
* fix: resolve 4 settings test failures - Regenerate stale golden master snapshot (deleted and re-created) - Fix test_integer_setting assertions to match actual default of 64 - Add None guard in InMemorySettingsManager._get_typed_value to prevent str(None) -> "None" conversion for API key defaults - Wrap DB query in get_all_settings with try/except SQLAlchemyError for graceful degradation when database is unavailable * fix: resolve 7 miscellaneous test failures (category 8) 1. History routes: fix mock chain to include .limit().offset() matching the actual query chain in get_history() 2. Bootstrap colors: replace #28a745/#ffc107 fallbacks with non-Bootstrap colors #2e7d32/#f9a825 in embedding_settings.js 3. test_request_import_removed: delete incorrect test - request IS used for request.args.get() pagination parameters 4. test_research_count_uses_debug: add logger.debug for research count in get_history() 5. Scheduler unbound var: move clear_settings_context import before try block so it is available in the finally block 6. Raw SQL false positive: add auth_db.py to skip list - it uses SQLAlchemy DDL (CreateTable/CreateIndex), not raw SQL 7. Connection log message: fix assertion to match actual log message "Failed to create thread connection" instead of "connection failed" * fix: resolve remaining test failures for clean CI Root causes fixed: - test_custom_context.py: permanently monkey-patched get_setting_from_snapshot without cleanup, polluting all subsequent tests in the same xdist worker. Added try/finally to restore original functions. - conftest.py: setup_database_for_all_tests used session_mocker (session-scoped) causing patches to leak for the entire test session. Changed to mocker (function-scoped). - test_auth_rate_limiting.py: limiter.init_app() returns early when disabled, skipping storage/handler setup. Re-call init_app() after enabling. 
- Deleted 4 obsolete search engine test files (duplicated in web_search_engines/) - Fixed DDG availability check to try instantiation not just import - Fixed flaky stampede concurrency test threshold - Fixed contradicting settings DB error test to match graceful degradation behavior	2026-02-22 23:21:08 +01:00
LearningCircuit	65b2383eb7	Revert "Revert "fix: add SQLCipher 4.x compatibility for cipher pragma ordering"" (#1867) * Revert "Revert "fix: add SQLCipher 4.x compatibility for cipher pragma orderi…" This reverts commit `d72b7ae668`. * ci: add backwards compatibility workflow for SQLCipher encryption Add a CI workflow that verifies database encryption backwards compatibility: - Runs encryption constants tests on PRs touching database files - Runs full PyPI version compatibility test on main/releases - Triggers on schedule (weekly) to catch dependency drift - Path-filtered to only run when relevant files change This prevents regressions like salt or KDF parameter changes that would break existing encrypted databases. * ci: add backwards compatibility to security release gate - Add workflow_call trigger to backwards-compatibility.yml - Include backwards-compatibility check in security-release-gate.yml - Run full PyPI compatibility test on releases (not just encryption constants) This ensures database encryption changes can't break existing user databases in a release. * test: increase timeouts for PyPI backwards compatibility test - Add pytest.mark.timeout(600) for the test function - Increase pip install timeout from 300s to 600s The package has many dependencies (numpy, torch, etc.) which take significant time to install in a fresh venv. * ci: fix shellcheck SC2129 in backwards-compatibility workflow Group consecutive GITHUB_OUTPUT redirects into a single block to satisfy actionlint/shellcheck SC2129 style check.
* fix: comprehensive SQLCipher security and correctness improvements - Fix critical PRAGMA ordering: cipher_default_* before key for new DBs, cipher_* after key for existing DBs (per Zetetic SQLCipher 4.x docs) - Replace unbounded @cache with @lru_cache(maxsize=8) to prevent memory leaks - Add thread safety: RLock for connections dict, Lock for cipher_default globals - Pre-derive hex keys before closures to avoid capturing plaintext passwords - Add connection cleanup on failure (close conn, remove partial DB files) - Add engine.dispose() on failed open_user_database() - Whitespace password validation (reject " " as password) - Remove cipher_memory_security=OFF (SQLCipher >=4.5 defaults to OFF) - CI-aware KDF minimum: 100k production, relaxed in test environments - Add set_sqlcipher_key_from_hex(), get_sqlcipher_version() utilities - Replace direct db_manager.connections dict access with is_user_connected() - Update harden-runner to v2.14.1, improve CI error handling - Fix all except Exception: pass patterns to re-raise AssertionError in tests - Update all test files for renamed functions and correct PRAGMA ordering * fix: address race condition in get_session() and exception swallowing in test Move sessionmaker + session creation inside _connections_lock to prevent race with close_user_database() disposing the engine. Also fix exception handler in test_sqlcipher_integration.py to re-raise AssertionError. 
* fix: address review issues in SQLCipher 4.x compatibility PR - Update 3 broken web test files (test_queue_manager, test_decorators, test_session_cleanup) to mock is_user_connected()/get_session() instead of the removed connections.get() API - Wire backwards-compatibility.yml into security-release-gate.yml so encryption compat checks actually run during releases - Add missing apply_sqlcipher_pragmas() call in _check_encryption_available() to properly set kdf_iter after key on the test connection - Replace generic CI/TESTING env var checks with PYTEST_CURRENT_TEST and LDR_TEST_MODE to prevent accidental KDF weakening in production - Add timeout-minutes to both backwards-compatibility CI jobs	2026-02-11 02:26:58 +00:00
LearningCircuit	67866d99c4	refactor: extract shared SQLCipher connection factory method (#1967) * refactor: extract shared SQLCipher connection factory method Extract _make_sqlcipher_connection() method to replace 3 nearly identical inner functions in create_user_database(), open_user_database(), and create_thread_safe_session_for_metrics(). Each call site now uses a lambda that passes the appropriate parameters. * test: add tests for _make_sqlcipher_connection factory method Verifies factory method extracted from duplicate inline connection creators and is used in all three engine creation paths. * fix: restore error logging in metrics path and replace superficial tests - Restore logger.exception() for thread connection failures in create_thread_safe_session_for_metrics by wrapping factory call in a proper function instead of a bare lambda - Replace inspect.getsource string-matching tests with real behavioral tests that mock SQLCipher and verify: parameter passing, call order, error handling, and metrics error logging * fix: close cursor and connection on exception in factory method _make_sqlcipher_connection leaked the cursor and connection when set_sqlcipher_key, verify, or pragma calls raised an exception. Wrap the initialization sequence in try/except to ensure both cursor and connection are closed before re-raising. Add 3 tests verifying cleanup on verification failure, key error, and pragma error. * fix: correct SQLCipher pragma ordering and improve factory method Cipher pragmas (page_size, hmac_algorithm, kdf_iter) must be set before the first query (verification SELECT 1) because that query triggers page decryption with the active cipher settings. With non-default settings the previous ordering caused verification to fail. Also adds @staticmethod, return type annotation, improved docstring, and additional test coverage for edge cases.
* fix: make cleanup exception-safe and add edge-case tests Wrap cursor.close() in try/except during error cleanup so conn.close() always runs even if cursor.close() throws. Add tests for cleanup exception safety, ImportError propagation, and connect() failure. * test: add coverage for performance pragma and verify exception paths Add tests for two untested error paths in _make_sqlcipher_connection: - apply_performance_pragmas() raising cleans up cursor and connection - verify_sqlcipher_connection() raising (vs returning False) triggers cleanup and prevents performance pragmas from running	2026-02-10 07:58:42 +01:00
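
The cleanup behaviour described in commit 67866d99c4 (close the cursor and the connection when any initialization step raises, and keep closing the connection even if closing the cursor itself fails) can be sketched roughly as below. This is a minimal illustration, not the project's `_make_sqlcipher_connection`: the function name `make_connection`, its `connect` and `init_steps` parameters, and the use of the standard-library `sqlite3` module in place of SQLCipher are all assumptions made for the example.

```python
import sqlite3
from typing import Any, Callable, Iterable


def make_connection(
    connect: Callable[[], Any],
    init_steps: Iterable[Callable[[Any], Any]],
) -> Any:
    """Open a connection, run init steps in order, clean up if any step fails.

    `connect` and `init_steps` are hypothetical stand-ins for the real
    sequence named in the commit (set key, cipher pragmas, verification
    query, performance pragmas).
    """
    conn = connect()
    cursor = None
    try:
        cursor = conn.cursor()
        for step in init_steps:
            step(cursor)  # each step receives the shared cursor
        cursor.close()
        return conn
    except Exception:
        if cursor is not None:
            try:
                cursor.close()
            except Exception:
                pass  # a failing cursor.close() must not skip conn.close()
        conn.close()
        raise  # re-raise the original initialization error


# Usage with plain sqlite3 (assumption: SQLCipher is not needed for the sketch).
conn = make_connection(
    lambda: sqlite3.connect(":memory:", check_same_thread=False),
    [lambda cur: cur.execute("SELECT 1").fetchone()],
)
conn.close()
```

Passing the steps in as callables keeps the ordering explicit at the call site, which matters here because the commit notes that cipher pragmas have to run before the first query.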

5 Commits
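
Commit 76524cc4de traces part of the cross-test pollution to a monkey-patch of `get_setting_from_snapshot` that was never undone, and fixes it with try/finally. A rough sketch of that pattern follows. The `settings` namespace and the `run_with_patched_setting` helper are hypothetical names invented for this illustration and do not come from the repository.

```python
import types

# Hypothetical stand-in for the module whose function a test replaces.
settings = types.SimpleNamespace(
    get_setting_from_snapshot=lambda key, default=None: default
)


def run_with_patched_setting(fake, body):
    """Temporarily replace settings.get_setting_from_snapshot while body() runs."""
    original = settings.get_setting_from_snapshot
    settings.get_setting_from_snapshot = fake
    try:
        return body()
    finally:
        # Always restore, even if body() raises, so later tests that share
        # the same worker process see the real function again.
        settings.get_setting_from_snapshot = original


result = run_with_patched_setting(
    lambda key, default=None: "patched",
    lambda: settings.get_setting_from_snapshot("some.key"),
)
```

In a pytest suite the built-in `monkeypatch` fixture gives the same guarantee with function scope, which matches the commit's related change from a session-scoped to a function-scoped mocker.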