mirror of
https://github.com/LearningCircuit/local-deep-research.git
synced 2026-06-15 19:46:56 +03:00
b1cbc6fbe085d56effd5dbf499cf8071759d60f6
5 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
901d0db8d9 |
fix(examples): make mock LLM example truly offline + reject search.tool='none' (#3520)
* ci: skip live-network LLM examples, keep compile-checked
  The basic_custom_llm.py and advanced_custom_llm.py examples execute a real research pipeline that hits Wikipedia and PubMed. Under the job's 60s timeout they flake whenever those services are slow (seen on #3467 and elsewhere). Drop the two exec steps from llm-example-tests, add both files to the compile-check block so syntax/import regressions are still caught, and leave mock_llm_example.py running since it exercises the same integration path offline.
* fix(examples): make mock LLM example truly offline + reject search.tool='none'
  The mock example claimed to run "offline" with `search.tool: "none"`, but the factory silently fell back to the `auto` engine and dispatched real searches to PubMed/Wikipedia, which is why the `LLM Example Tests` CI job still timed out at 60s after #3478 removed the other two examples.
  - `MockRetriever` is now a proper `langchain_core.retrievers.BaseRetriever` at module scope, so `RetrieverSearchEngine.run`'s `.invoke(query)` call actually works (previously the inner class was only discovered via the broad `except Exception` path and returned no results).
  - The three `main()` tests that used `search.tool: "none"` now register the mock retriever and use `search.tool: "mock_retriever"`.
  - `create_search_engine` rejects the literal string `"none"` with a `ValueError` so this silent-fallback class of bug cannot recur.
  End-to-end run of the example in the project venv completes in ~13s with no network traffic; previously this timed out at 60s hitting NCBI and en.wikipedia.org.
* test(factory): regression coverage for search.tool='none' guard
  Asserts that create_search_engine('none', ...) raises ValueError rather than silently falling through to the 'auto' engine, the exact failure mode that broke the mock LLM example in CI. |
||
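The guard described in commit 901d0db8d9 can be sketched roughly as follows. This is a minimal illustration, not the project's factory: `pick_search_engine`, its parameters, and the fallback-to-`auto` behavior for unknown names are assumptions made for the example; only the idea of rejecting the literal string `"none"` with a `ValueError` comes from the commit message.

```python
def pick_search_engine(name, registered_engines, default="auto"):
    """Hypothetical stand-in for an engine lookup with a fallback.

    Assumption: unknown names fall back to `default`, which is the kind of
    silent fallback the commit message describes for "none".
    """
    # Reject the literal string "none" up front so it can never reach
    # the fallback branch and trigger real network searches.
    if name == "none":
        raise ValueError(
            "search.tool='none' is not a valid engine; register a mock "
            "retriever and select it by name instead"
        )
    if name in registered_engines:
        return name
    return default


engines = {"auto", "mock_retriever"}
print(pick_search_engine("mock_retriever", engines))
```

A regression test in the spirit of the `test(factory)` commit would then assert that passing `"none"` raises instead of returning the default engine.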
|
|
12160e26e1 |
chore(lint): add ruff rules for logging, performance, exceptions, and print detection (#3211)
* chore(lint): add ruff rules for logging, performance, exceptions, and print detection
  Add wave 2 lint rules: G, PERF, RET, TRY, T20, C4, ERA. All existing violations are suppressed via ignore/per-file-ignores so this config change is merge-safe. Follow-up PRs will fix violations and remove the ignore entries incrementally.
* fix(lint): exempt pre-commit hooks from T201 print rule (#3270)
  Pre-commit hooks are CLI scripts where print is the intended output interface, same as scripts/ and cli/ directories already exempted.
* fix(lint): fix all low-count ruff violations instead of suppressing them (#3275)
* fix(lint): replace manual dict-building loops with dict comprehensions (PERF403)
* fix(lint): replace bare Exception raises with specific built-in types (TRY002)
  Replace all `raise Exception(...)` in production code with appropriate built-in exception types: RuntimeError for operational/state failures, ValueError for invalid data, and ConnectionError for HTTP errors.
* fix(lint): resolve TRY004 and PERF402 ruff violations
  Use TypeError instead of ValueError for isinstance/issubclass type checks (TRY004), and replace manual for-loop list copies with list.extend() (PERF402).
* fix(lint): fix all low-count ruff violations instead of suppressing them
  Fix all violations for 15 ruff rules that had ≤10 occurrences each, rather than suppressing them with ignore directives:
  - TRY002: raise-vanilla-class → use specific built-in exceptions
  - TRY004: type-check-without-type-error → use TypeError
  - C408: unnecessary-collection-call → use dict/list literals
  - C401: unnecessary-generator-set → use set comprehensions
  - C416: unnecessary-comprehension → use list()/set()
  - C414: unnecessary-double-cast-or-process → simplify
  - PERF403: manual-dict-comprehension → use dict comprehensions
  - PERF102: incorrect-dict-iterator → use .values()/.keys()
  - PERF402: manual-list-copy → use list.extend()
  - RET503/RET506/RET507/RET508: superfluous else after return/raise/continue/break
  - RET501/RET502: unnecessary/implicit return None
  Adds per-file-ignores for tests/ and examples/ where these patterns are acceptable (e.g. bare Exception in tests, dict() calls in fixtures).
* fix(lint): enforce E722, ERA001, RET505 and fix pre-commit RET503 gap (#3276)
  Remove three rules from the global ignore list by fixing all violations:
  - E722 (bare except), 6 violations in tests: Replace `except:` with `except Exception:` to avoid swallowing KeyboardInterrupt and SystemExit.
  - ERA001 (commented-out code), 25 violations: Delete 18 true positives (dead variables, disabled debug logs, commented-out imports). Add `# noqa: ERA001` to 7 false positives (template instructions, type annotations, documentation comments).
  - RET505 (superfluous else after return), 413 violations: Auto-fix all occurrences. Also fixes 5 cascading RET506/RET507 violations exposed by the RET505 removals.
  - Pre-commit hooks gap: Add RET503 to `.pre-commit-hooks/**` per-file-ignores alongside T201.
* fix(lint): enforce RET504 and TRY301: fix all violations (#3279)
* fix(lint): enforce RET504: collapse unnecessary assign-before-return
  Auto-fix all 46 RET504 violations via ruff unsafe-fixes: collapse `result = expr; return result` into `return expr`. Remove RET504 from global ignore list. Add to tests/examples per-file-ignores where intermediate variables aid test clarity. Also removes TRY301 from global ignore (violations fixed in next commit).
* fix(lint): enforce TRY301: fix raises inside broad try/except blocks
  Structural fixes for 65 TRY301 violations:
  Security-critical fixes:
  - url_validator.py: move 6 validation raises before try block, replace isinstance-based re-raise with specific except clause
  - path_validator.py: move validation outside try block
  - env_settings.py: separate parsing (try) from validation (outside)
  Route/service fixes:
  - research_routes.py: replace raise-then-catch with direct error return
  - mcp/server.py: move all 7 tool validations before try blocks
  - news/api.py: move validation before try, noqa for db-session raises
  - notifications: move rate limit and URL validation before try blocks
  - iterative_refinement_strategy.py: move JSON validation after try
  Added noqa for intentional patterns: re-raise in except handlers, nested function definitions, db-session-dependent checks, rate limit re-raises for base class retry logic.
* merge: resolve conflicts between wave2 lint branch and main
  Resolve 14 merge conflicts by always starting from main's version and re-applying lint fixes on top:
  - mcp_strategy.py, ollama.py, security_settings.py, delete_routes.py: Take main's code, re-apply RET505 (remove else: after return)
  - mcp/server.py (3 conflicts): Take main's ValidationError handlers and set_settings_context, re-apply TRY301 fixes, fix sensitive data logging
  - research_routes.py: Take main, fix duplicate block (merge artifact)
  - settings_routes.py: Take main's default-settings fallback feature
  - meta_search_engine.py, parallel_search_engine.py: Take main's get_available_engines delegation, delete unreachable code
  - search_engine_ddg.py, search_engine_google_pse.py: Take main's sanitization, re-apply RET506 (if not elif after raise)
  - rag_routes.py: Accept main's deletion (route moved to delete_routes)
  - encryption_check.py: Accept main's deletion (dead code)
  - test_storage_coverage.py: Remove broken test classes referencing undefined stubs
  - pre-commit hooks: extend per-file-ignores for ERA001, RET504
* fix: revert ValueError→TypeError changes that break tests and API contracts
  Revert TRY004 fixes in 3 files where changing ValueError to TypeError would break existing tests and HTTP status code contracts:
  - card_factory.py: 5 tests assert pytest.raises(ValueError)
  - base_rater.py: flask_api.py catches ValueError for HTTP 400 responses; TypeError would fall through to HTTP 500
  - full_search.py: test asserts pytest.raises(ValueError)
  Add # noqa: TRY004 to suppress the lint rule on these lines.
* fix: move benchmark_data check back inside try block
  The ValueError for missing benchmark_data must be inside the try/except so the except handler can mark the run as FAILED in the database. Without this, the exception propagates unhandled in a daemon thread, leaving the benchmark run stuck in RUNNING state permanently.
* chore(lint): remove ERA rule and suppress TRY004 globally
  Remove ERA (eradicate, i.e. commented-out code detection) from ruff select:
  - 28% false positive rate in our codebase (7 of 25 violations)
  - No major Python project enables it (Django, FastAPI, Pydantic, Airflow)
  - Ruff itself doesn't use it; autofix was demoted to manual-only
  - 172 noqa suppressions provided zero enforcement value
  Suppress TRY004 (type-check-without-type-error) globally:
  - Ruff maintainer agreed the autofix "can change functionality"
  - We already had to revert 3 TypeError changes that broke tests and HTTP 400→500 API contracts
  - Django, Flask, pandas all use isinstance + ValueError routinely
  - Pylint has no equivalent rule; near-zero PyPI adoption
  Remove all 173 # noqa: ERA001 and 49 # noqa: TRY004 comments from the codebase, as they are no longer needed with rules disabled/suppressed.
* fix: resolve mypy errors, failing MCP test, and TRY301 noqa
  - search_engine_factory.py: restore typed intermediate variable to fix mypy no-any-return (RET504 collapse lost the type annotation)
  - search_engine_pubchem.py: add explicit list[str] type annotation
  - test_edge_cases.py: fix assertion that expected engine name in sanitized error message
  - mcp/server.py: add noqa: TRY301 to validation raises inside try blocks (from main's new merge code) |
||
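To make the TRY301 restructuring in commit 12160e26e1 concrete, here is a small before/after sketch of "separate parsing (try) from validation (outside)". The function names and the port-parsing scenario are hypothetical and chosen only for illustration; they are not taken from `env_settings.py`.

```python
def parse_port_legacy(raw):
    """Pattern TRY301 flags: the range check raises inside the try block,
    and the handler below swallows it together with parse errors."""
    try:
        port = int(raw)
        if not 0 < port < 65536:
            raise ValueError("port out of range")  # raised and caught locally
        return port
    except ValueError:
        return None  # caller cannot tell "not a number" from "out of range"


def parse_port(raw):
    """Restructured: only the parsing step sits inside the try block."""
    try:
        port = int(raw)
    except ValueError as exc:
        raise ValueError(f"not an integer: {raw!r}") from exc
    # Validation happens outside the try, so its error is never caught
    # by the parsing handler above.
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


print(parse_port("8080"))
```

The benchmark_data revert in the same commit shows the opposite case: when the handler has a job to do (marking a run as FAILED), keeping the raise inside the try block is the intended behavior and a `noqa` is the better choice.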
|
|
3087eba843 |
fix: remove || true from LLM example tests (#2913)
* fix: remove || true from LLM example tests
  The four LLM example test steps in docker-tests.yml silently swallowed all failures with `|| true`, providing zero signal on whether the examples actually work. The tests already set LDR_USE_FALLBACK_LLM=true and have a 60s timeout, so they should succeed in CI.
* fix: declare CustomLLM fields as Pydantic class attributes
  The __init__ approach fails with Pydantic v2 because setting undeclared fields via __setattr__ raises ValueError. Declaring them as class-level fields lets Pydantic handle initialization natively.
* fix: add settings_snapshot creation to detailed_research()
  detailed_research() was missing the settings_snapshot creation that quick_summary() and generate_report() already have, causing a RuntimeError when called outside a Flask app context.
* fix: declare MockLLM and ScenarioMockLLM fields as Pydantic attributes
  Same Pydantic v2 compatibility fix as basic_custom_llm.py: fields must be declared at class level, not set in __init__.
* fix: pass response_map as keyword argument to MockLLM
  Pydantic models don't accept positional arguments.
* fix: Pydantic v2 compat for advanced_custom_llm + fix workflow refs
  - Convert RetryLLM, ConfigurableLLM, DomainExpertLLM to use Pydantic class-level field declarations instead of __init__
  - Replace workflow references to non-existent switch_providers.py and custom_research_example.py with advanced_custom_llm.py
* fix: pass base_llm as keyword argument to RetryLLM
* fix: code review fixes for Pydantic v2 compat and research context
  - Add Optional[] to MockLLM nullable field types (Pydantic v2 rejects None for non-Optional annotations)
  - Use local variable for RetryLLM exponential backoff instead of mutating self.retry_delay across calls
  - Pass research_id and research_context to _init_search_system() in detailed_research(), matching the pattern in quick_summary()
* fix: simplify detailed_research() to use settings_snapshot only
  Remove redundant provider/api_key/temperature/etc parameters; these should be configured via settings_snapshot, not individual params. Just create a default snapshot when none is provided.
* fix: remove silent settings_snapshot fallback from detailed_research()
  Callers must explicitly pass settings_snapshot; silently creating a default hides errors.
* fix: revert init_kwargs injection in detailed_research()
  Remove the research_id/research_context injection we added, since this was us papering over missing caller-side responsibility. Restore the original call pattern.
* fix: use explicit settings_snapshot in all example scripts
  - Add auto-creation of default settings_snapshot with info log in detailed_research() when none is provided
  - Update all example scripts to create and pass settings_snapshot explicitly via create_settings_snapshot(), demonstrating the correct programmatic API pattern
* feat: add API stability smoke test to CI
  Add api_smoke_test.py that verifies the public API surface hasn't changed: imports, function signatures, settings utilities, and LDRClient interface. Also add test_direct_import.py to CI. These tests catch breaking API changes early. The test file includes a prominent warning that it should NOT be modified to accommodate API changes; the API change should be reverted instead.
* feat: add CI testing for all example files
  - Create examples/_ci_helpers.py with shared CIMockLLM for CI testing
  - Add LDR_CI_TEST=1 mode to simple_programmatic, advanced_features, and search_strategies examples for full execution with mock LLM
  - Refactor simple_programmatic_example.py to use main() guard
  - Add py_compile checks for 9 examples with external dependencies
  - Add show_env_vars.py execution to CI
  - Total: 19 example files now covered (was 5)
* fix: move show_env_vars.py to compile check (uses removed API method)
* fix: address code review findings
  - Add if __name__ guard to api_smoke_test.py (prevents pytest crash)
  - Fix wasted _get_settings() call in advanced_features demonstrate_report_generation
  - Move unused settings_snapshot creation inside non-CI branch in simple_programmatic
  - Add missing files to compile checks (run_benchmark.py, elasticsearch/search_example.py, _ci_helpers.py)
* fix: add examples/** to change detection filter + job timeout
  - Add examples/** to the llm path filter so PRs touching only example files trigger the LLM Example Tests job
  - Add timeout-minutes: 20 to llm-example-tests job (was missing, unlike all other jobs)
* fix: revert programmatic examples to original, use compile checks instead
  Revert simple_programmatic_example.py, advanced_features_example.py, and search_strategies_example.py to their original state. Examples should be clean user-facing documentation, not polluted with CI infrastructure. Use py_compile checks for these files instead of full execution with mock LLM injection.
* fix: rename api_smoke_test to api_public_contract_guardrail
  Rename to signal that this file protects the public API and must not be modified to accommodate breaking changes. Added DO NOT MODIFY comments at every test section so AI agents scanning inline comments will see the restriction even without reading the docstring.
* fix: remove dead _ci_helpers.py (no example imports it)
* fix: raise ValueError instead of fallback when settings_snapshot missing
  detailed_research() now raises a clear error when settings_snapshot is not provided, instead of silently creating a default one. Callers must explicitly pass create_settings_snapshot(...) so they know what configuration they're getting. quick_summary() and generate_report() are not affected; they build the snapshot from their explicit provider/api_key/temperature params.
* fix: add warnings to all API functions when no config provided
  All three public API functions (quick_summary, generate_report, detailed_research) now log a warning when called without explicit configuration (no settings_snapshot, no provider, no settings). They still work using defaults + environment variables, but the warning alerts callers that they may not get expected results. |
||
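One of the review fixes in commit 3087eba843 is to keep the exponential-backoff delay in a local variable rather than mutating `self.retry_delay` across calls. The sketch below illustrates that idea with a plain Python class; `RetryingCaller`, its constructor parameters, and the injectable `sleep` hook are hypothetical and are not the example's actual `RetryLLM`.

```python
import time


class RetryingCaller:
    """Hypothetical retry wrapper illustrating per-call backoff state."""

    def __init__(self, func, max_retries=3, retry_delay=0.5, sleep=time.sleep):
        self.func = func
        self.max_retries = max_retries
        self.retry_delay = retry_delay  # base delay, never modified
        self.sleep = sleep              # injectable so tests need not wait

    def call(self, *args, **kwargs):
        # Local copy: doubling this does not leak into later calls.
        delay = self.retry_delay
        for attempt in range(self.max_retries):
            try:
                return self.func(*args, **kwargs)
            except Exception:
                if attempt == self.max_retries - 1:
                    raise  # out of retries, surface the last error
                self.sleep(delay)
                delay *= 2
```

If the doubling were applied to `self.retry_delay` instead, every failed call would leave the instance with a longer starting delay, so a long-lived wrapper would get slower over time even after the backend recovered.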
|
|
2eaaf12109 |
feat: Implement per-user encrypted databases with comprehensive auth system
BREAKING CHANGE: Data files now stored in platform-specific user directories with SQLCipher encryption. Users must register/login to access the application.

## Major Features

### Security & Authentication
- Implemented complete multi-user authentication system with Flask-Login
- Per-user SQLCipher encrypted databases (falls back to SQLite with warnings)
- Secure session management with proper CSRF protection
- Password hashing with bcrypt for user credentials
- Complete isolation between user data - no cross-user access possible
- Thread-safe database connections with proper session management

### Database Architecture
- Migrated from single shared database to per-user encrypted databases
- Centralized auth database for user management
- User-specific databases for research data, settings, and metrics
- Automatic database initialization on user registration
- Platform-specific data directories using platformdirs library
- Removed all hardcoded paths and personal information

### User Experience
- Registration page with data privacy acknowledgment
- Login/logout functionality with session persistence
- Automatic redirect to login for unauthenticated access
- Research queue system with 3 concurrent research limit per user
- Real-time queue position updates
- Comprehensive error handling with user-friendly messages

### API & Routes
- All API endpoints now require authentication
- Updated routes: /auth/register, /auth/login, /auth/logout, /auth/check
- Protected research submission and history endpoints
- Proper JSON error responses for API routes
- CSRF token validation for state-changing operations

### Testing
- Added 53 Puppeteer tests for UI authentication flows
- Comprehensive auth integration tests (248 Python test files)
- Multi-user concurrent access testing
- Queue system testing with position tracking
- Database migration and encryption tests

### Configuration
- Single LDR_DATA_DIR environment variable for data location
- LDR_ALLOW_UNENCRYPTED environment variable for development
- Updated Docker configuration for proper volume mounting
- Removed multiple environment variables for simplicity

### Documentation
- Added DATA_MIGRATION_GUIDE.md for upgrade instructions
- Added SQLCIPHER_INSTALL.md for encryption setup
- Updated environment configuration documentation
- Professional error messages throughout

## Technical Improvements
- Replaced raw SQL with SQLAlchemy ORM throughout
- Proper database session management with context managers
- Thread-local storage for database connections
- Automatic cleanup of stale sessions
- Rate limiting infrastructure for future use
- Comprehensive logging with loguru

## Files Changed
- 322 files modified/added
- 248 Python files (core functionality and tests)
- 53 JavaScript files (Puppeteer tests)
- 6 Markdown files (documentation)
- No binary files, screenshots, or database files included
- All test credentials properly marked with pragma comments

This migration ensures each user's research data is completely isolated and encrypted, providing enterprise-grade security for sensitive research operations. |
||
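The "3 concurrent research limit per user" with queue positions from commit 2eaaf12109 could be modeled along the following lines. This is an in-memory sketch under stated assumptions: the class name, method names, and return values are invented for illustration, it is not thread-safe, and it does not reflect how the project stores its queue.

```python
from collections import defaultdict, deque


class UserResearchQueue:
    """Hypothetical per-user queue with a cap on concurrently running items."""

    def __init__(self, max_concurrent=3):
        self.max_concurrent = max_concurrent
        self._active = defaultdict(set)     # user -> running research ids
        self._waiting = defaultdict(deque)  # user -> queued ids, FIFO

    def submit(self, user, research_id):
        """Start the research if the user has a free slot, else queue it."""
        if len(self._active[user]) < self.max_concurrent:
            self._active[user].add(research_id)
            return "running"
        self._waiting[user].append(research_id)
        return "queued"

    def position(self, user, research_id):
        """1-based position in the user's waiting line, or None if not queued."""
        waiting = list(self._waiting[user])
        if research_id in waiting:
            return waiting.index(research_id) + 1
        return None

    def finish(self, user, research_id):
        """Release a slot and promote the next queued item, if any."""
        self._active[user].discard(research_id)
        if self._waiting[user] and len(self._active[user]) < self.max_concurrent:
            promoted = self._waiting[user].popleft()
            self._active[user].add(promoted)
            return promoted
        return None
```

Keeping the limit per user, rather than global, means one user's backlog never delays another user's first submission, which matches the isolation goal stated in the commit.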
|
|
3d102da08d |
feat: Add custom LLM integration support (#507)
* feat: Add custom LLM integration support
  - Add LLM registry system for managing custom language models
  - Support both LLM instances and factory functions
  - Add llms parameter to API functions (quick_summary, detailed_research, generate_report)
  - Create comprehensive test suite with 38 tests covering:
    - Registry functionality
    - Integration with get_llm()
    - API integration
    - Edge cases (streaming, errors, concurrency)
    - Benchmark compatibility
  - Add CI/CD workflow for LLM tests
  - Include example implementations and documentation
  - Thread-safe implementation with proper cleanup
  This feature allows users to pass custom LangChain-compatible LLMs to the research system, similar to how custom retrievers work. Users can register LLMs programmatically and use them via the provider parameter.
* Fix PR review issues for custom LLM integration
  - Replace print statements with loguru logger in advanced_custom_llm.py
  - Add docstrings for ConfigurableLLM parameters
  - Remove hardcoded confidence value, use descriptive text instead
  - Fix clear-text logging of sensitive medical information
  - Move LLM and retriever registration to _init_search_system to centralize logic
  - Use logger.exception instead of logger.error for better error tracking |
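A registry that accepts both LLM instances and factory functions, as commit 3d102da08d describes, might look roughly like this. The sketch is an assumption-laden illustration: `SimpleLLMRegistry` and its methods are hypothetical names, and treating "has an `invoke` attribute" as the marker of an instance is a choice made for this example, not necessarily how the project tells the two apart.

```python
import threading


class SimpleLLMRegistry:
    """Hypothetical thread-safe registry for LLM instances or factories."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def register(self, name, llm_or_factory):
        with self._lock:
            self._entries[name] = llm_or_factory

    def unregister(self, name):
        with self._lock:
            self._entries.pop(name, None)

    def clear(self):
        with self._lock:
            self._entries.clear()

    def get(self, name, **factory_kwargs):
        with self._lock:
            if name not in self._entries:
                raise KeyError(f"no LLM registered under {name!r}")
            entry = self._entries[name]
        # Assumption: objects exposing `invoke` are ready-made instances;
        # any other callable is treated as a factory and called with kwargs.
        if hasattr(entry, "invoke"):
            return entry
        if callable(entry):
            return entry(**factory_kwargs)
        raise TypeError(f"entry {name!r} is neither an LLM nor a factory")


class EchoLLM:
    """Tiny stand-in for a LangChain-compatible model (illustrative only)."""

    def __init__(self, prefix="echo"):
        self.prefix = prefix

    def invoke(self, prompt):
        return f"{self.prefix}: {prompt}"


registry = SimpleLLMRegistry()
registry.register("fixed", EchoLLM())
registry.register("factory", lambda prefix="made": EchoLLM(prefix=prefix))
print(registry.get("factory", prefix="custom").invoke("hi"))
```

Calling the factory outside the lock keeps slow model construction from blocking other threads that only need a lookup.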