local-deep-research

mirror of https://github.com/LearningCircuit/local-deep-research.git synced 2026-06-16 03:51:07 +03:00

Author	SHA1	Message	Date
LearningCircuit	533a878769	docs: fix troubleshooting link casing (#3854 ) Follow-up to #3852: the actual file is docs/troubleshooting.md (lowercase). The uppercase reference 404s on case-sensitive filesystems and on github.com.	2026-05-08 01:34:35 +02:00
Aqil Aziz	3bf78baf07	docs: fix API example links (#3852 )	2026-05-08 01:30:25 +02:00
LearningCircuit	12160e26e1	chore(lint): add ruff rules for logging, performance, exceptions, and print detection (#3211 ) * chore(lint): add ruff rules for logging, performance, exceptions, and print detection Add wave 2 lint rules: G, PERF, RET, TRY, T20, C4, ERA. All existing violations are suppressed via ignore/per-file-ignores so this config change is merge-safe. Follow-up PRs will fix violations and remove the ignore entries incrementally. * fix(lint): exempt pre-commit hooks from T201 print rule (#3270) Pre-commit hooks are CLI scripts where print is the intended output interface, same as scripts/ and cli/ directories already exempted. * fix(lint): fix all low-count ruff violations instead of suppressing them (#3275) * fix(lint): replace manual dict-building loops with dict comprehensions (PERF403) * fix(lint): replace bare Exception raises with specific built-in types (TRY002) Replace all `raise Exception(...)` in production code with appropriate built-in exception types: RuntimeError for operational/state failures, ValueError for invalid data, and ConnectionError for HTTP errors. * fix(lint): resolve TRY004 and PERF402 ruff violations Use TypeError instead of ValueError for isinstance/issubclass type checks (TRY004), and replace manual for-loop list copies with list.extend() (PERF402). * fix(lint): fix all low-count ruff violations instead of suppressing them Fix all violations for 15 ruff rules that had ≤10 occurrences each, rather than suppressing them with ignore directives: - TRY002: raise-vanilla-class → use specific built-in exceptions - TRY004: type-check-without-type-error → use TypeError - C408: unnecessary-collection-call → use dict/list literals - C401: unnecessary-generator-set → use set comprehensions - C416: unnecessary-comprehension → use list()/set() - C414: unnecessary-double-cast-or-process → simplify - PERF403: manual-dict-comprehension → use dict comprehensions - PERF102: incorrect-dict-iterator → use .values()/.keys() - PERF402: manual-list-copy → use list.extend() - RET503/RET506/RET507/RET508: superfluous else after return/raise/continue/break - RET501/RET502: unnecessary/implicit return None Adds per-file-ignores for tests/ and examples/ where these patterns are acceptable (e.g. bare Exception in tests, dict() calls in fixtures). * fix(lint): enforce E722, ERA001, RET505 and fix pre-commit RET503 gap (#3276) Remove three rules from the global ignore list by fixing all violations: E722 (bare except) — 6 violations in tests: Replace `except:` with `except Exception:` to avoid swallowing KeyboardInterrupt and SystemExit. ERA001 (commented-out code) — 25 violations: Delete 18 true positives (dead variables, disabled debug logs, commented-out imports). Add `# noqa: ERA001` to 7 false positives (template instructions, type annotations, documentation comments). RET505 (superfluous else after return) — 413 violations: Auto-fix all occurrences. Also fixes 5 cascading RET506/RET507 violations exposed by the RET505 removals. Pre-commit hooks gap: Add RET503 to `.pre-commit-hooks/*` per-file-ignores alongside T201. fix(lint): enforce RET504 and TRY301 — fix all violations (#3279) * fix(lint): enforce RET504 — collapse unnecessary assign-before-return Auto-fix all 46 RET504 violations via ruff unsafe-fixes: collapse `result = expr; return result` into `return expr`. Remove RET504 from global ignore list. Add to tests/examples per-file-ignores where intermediate variables aid test clarity. Also removes TRY301 from global ignore (violations fixed in next commit). * fix(lint): enforce TRY301 — fix raises inside broad try/except blocks Structural fixes for 65 TRY301 violations: Security-critical fixes: - url_validator.py: move 6 validation raises before try block, replace isinstance-based re-raise with specific except clause - path_validator.py: move validation outside try block - env_settings.py: separate parsing (try) from validation (outside) Route/service fixes: - research_routes.py: replace raise-then-catch with direct error return - mcp/server.py: move all 7 tool validations before try blocks - news/api.py: move validation before try, noqa for db-session raises - notifications: move rate limit and URL validation before try blocks - iterative_refinement_strategy.py: move JSON validation after try Added noqa for intentional patterns: re-raise in except handlers, nested function definitions, db-session-dependent checks, rate limit re-raises for base class retry logic. * merge: resolve conflicts between wave2 lint branch and main Resolve 14 merge conflicts by always starting from main's version and re-applying lint fixes on top: - mcp_strategy.py, ollama.py, security_settings.py, delete_routes.py: Take main's code, re-apply RET505 (remove else: after return) - mcp/server.py (3 conflicts): Take main's ValidationError handlers and set_settings_context, re-apply TRY301 fixes, fix sensitive data logging - research_routes.py: Take main, fix duplicate block (merge artifact) - settings_routes.py: Take main's default-settings fallback feature - meta_search_engine.py, parallel_search_engine.py: Take main's get_available_engines delegation, delete unreachable code - search_engine_ddg.py, search_engine_google_pse.py: Take main's sanitization, re-apply RET506 (if not elif after raise) - rag_routes.py: Accept main's deletion (route moved to delete_routes) - encryption_check.py: Accept main's deletion (dead code) - test_storage_coverage.py: Remove broken test classes referencing undefined stubs - pre-commit hooks: extend per-file-ignores for ERA001, RET504 * fix: revert ValueError→TypeError changes that break tests and API contracts Revert TRY004 fixes in 3 files where changing ValueError to TypeError would break existing tests and HTTP status code contracts: - card_factory.py: 5 tests assert pytest.raises(ValueError) - base_rater.py: flask_api.py catches ValueError for HTTP 400 responses; TypeError would fall through to HTTP 500 - full_search.py: test asserts pytest.raises(ValueError) Add # noqa: TRY004 to suppress the lint rule on these lines. * fix: move benchmark_data check back inside try block The ValueError for missing benchmark_data must be inside the try/except so the except handler can mark the run as FAILED in the database. Without this, the exception propagates unhandled in a daemon thread, leaving the benchmark run stuck in RUNNING state permanently. * chore(lint): remove ERA rule and suppress TRY004 globally Remove ERA (eradicate — commented-out code detection) from ruff select: - 28% false positive rate in our codebase (7 of 25 violations) - No major Python project enables it (Django, FastAPI, Pydantic, Airflow) - Ruff itself doesn't use it; autofix was demoted to manual-only - 172 noqa suppressions provided zero enforcement value Suppress TRY004 (type-check-without-type-error) globally: - Ruff maintainer agreed the autofix "can change functionality" - We already had to revert 3 TypeError changes that broke tests and HTTP 400→500 API contracts - Django, Flask, pandas all use isinstance + ValueError routinely - Pylint has no equivalent rule; near-zero PyPI adoption Remove all 173 # noqa: ERA001 and 49 # noqa: TRY004 comments from the codebase — no longer needed with rules disabled/suppressed. * fix: resolve mypy errors, failing MCP test, and TRY301 noqa - search_engine_factory.py: restore typed intermediate variable to fix mypy no-any-return (RET504 collapse lost the type annotation) - search_engine_pubchem.py: add explicit list[str] type annotation - test_edge_cases.py: fix assertion that expected engine name in sanitized error message - mcp/server.py: add noqa: TRY301 to validation raises inside try blocks (from main's new merge code)	2026-03-29 17:01:23 +02:00
LearningCircuit	5e748e8155	fix: comprehensive file descriptor leak prevention (#1860 ) * feat: extend resource leak hook to detect database session leaks The pre-commit hook now detects unsafe usage of get_auth_db_session() and suggests using the auth_db_session() context manager instead. This prevents database session leaks when exceptions occur. Changes: - Add FUNCTIONS_REQUIRING_CONTEXT to detect function calls that return resources needing cleanup - Fix nested try/finally detection for close() calls - Update user_exists() in encrypted_db.py to use context manager - Update example files to use auth_db_session() context manager * fix: prevent session use after close and add search engine cleanup - Move config dict creation inside with block in api_routes.py to prevent using SettingsManager after database session is closed (was causing errors) - Remove redundant session.close() call that was after context manager exit - Add close() method and context manager support to BaseSearchEngine so search engines with HTTP sessions can be properly cleaned up	2026-01-31 18:24:19 -05:00
LearningCircuit	3be8341f66	fix: enable localhost HTTP for development without TESTING flag Implement dynamic cookie security that allows localhost HTTP connections to work out of the box while maintaining security for production: - Add WSGI middleware (SecureCookieMiddleware) for dynamic Secure flag - Localhost HTTP (127.0.0.1, ::1): No Secure flag (local traffic is safe) - Proxied requests (X-Forwarded-For): Always add Secure flag (production) - Non-localhost HTTP: Add Secure flag (requires HTTPS by design) - TESTING mode: Never add Secure flag (for CI/development) Security: Prevents X-Forwarded-For spoofing by checking for header presence rather than value - any proxy header triggers Secure flag. Also includes: - Update HTTP examples with clear "LOCALHOST ONLY" documentation - Add helpful CSRF error message explaining the security model - Add comprehensive cookie security tests (9 tests) - Add cookie security tests to CI workflow	2025-12-07 13:59:32 +01:00
LearningCircuit	7a73ee26b9	docs: fix incorrect API endpoint paths in documentation (#1210 ) Updates documentation and examples to use the correct API endpoints: - /api/start_research (was /research/api/start) - /api/research/{id}/status (was /research/api/research/{id}/status) - /api/report/{id} (was /research/api/research/{id}/result) - /api/terminate/{id} (was /research/api/research/{id}/terminate) Fixes #1205	2025-12-02 19:54:46 +00:00
LearningCircuit	bdcb934cbe	refactor: remove curl examples and improve HTTP API examples organization - Remove curl_examples.sh as authentication is too complex for simple curl commands - Move complex examples to advanced/ subfolder for better organization - Keep simple_working_example.py prominent as the recommended starting point - Add comprehensive CI test for HTTP examples - Update documentation to highlight the working example and learning path - Improve user experience by focusing on Python examples with automatic auth	2025-11-01 01:19:44 +01:00
LearningCircuit	ddcd962a7e	feat: enhance HTTP API examples with retry logic and automatic user creation Major improvements to HTTP API examples: - Add intelligent retry logic for fetching research results (up to 2 minutes) - Implement automatic user creation for out-of-the-box functionality - Fix API endpoint usage (/api/start_research instead of /research/api/start) - Add proper CSRF token handling and authentication flow - Create comprehensive documentation with environment variable configuration - Add progress monitoring and detailed status reporting - Include remote Ollama and SearXNG configuration examples - Provide multiple example scripts for different use cases - Use pathlib.Path instead of os.path for modern Python practice Examples now work completely out of the box without manual user setup and include proper error handling and user guidance throughout the process.	2025-10-31 23:48:01 +01:00
LearningCircuit	ccd809dbe3	fix: Correct API endpoint and authentication in examples and documentation Fixes critical issues with HTTP API documentation and examples that were causing authentication failures and "endpoint not found" errors for users. ## Changes Made ### 🔧 Fixed API Endpoint - Updated examples to use correct endpoint: `/api/start_research` - Previously examples used wrong endpoint: `/research/api/start` ### 🔐 Fixed Authentication Flow - Updated login examples to use form data (not JSON) - Added proper CSRF token handling for login - Fixed authentication flow to work with v2.0+ security ### 📚 Documentation Updates - Updated `examples/api_usage/README.md` with working example - Fixed `examples/api_usage/http/simple_http_example.py` - Added comprehensive `working_api_example.py` with proper error handling ### 🧪 Testing Tools Added - Created `tests/api_tests/test_research_api_debug.py` for debugging API issues - Added comprehensive test suite for authentication and API endpoints ## Impact This fixes the most common issue reported by users trying to use the HTTP API, where they get "Failed to start research" errors due to incorrect endpoint usage and authentication problems. ## Testing - ✅ Tested with fresh user registration and login - ✅ Verified correct API endpoint works properly - ✅ Confirmed authentication flow works end-to-end - ✅ Added comprehensive debugging tools for future issues Resolves user reports of API authentication failures and endpoint errors.	2025-10-31 22:58:27 +01:00
LearningCircuit	1677ba9c00	fix: Change research_id type hints from int to str Fix type hints in http_api_examples.py to use str instead of int for research_id parameters	2025-07-30 23:45:54 +02:00
LearningCircuit	62928db777	feat: Implement per-user encrypted databases with comprehensive security overhaul This major release introduces fundamental security and architectural improvements to Local Deep Research, transitioning from a single-user system to a secure multi-user platform with encrypted databases and proper authentication. ## 🔐 Security & Authentication - Per-user encrypted databases: Each user now has their own SQLCipher-encrypted database with AES-256 encryption, protecting API keys and research data - Mandatory authentication: All API endpoints and programmatic access now require user authentication - Session-based security: Implemented secure session management with CSRF protection for all state-changing operations - Password-based encryption: User passwords serve as database encryption keys (no recovery mechanism - intentional security feature) ## 🏗️ Architecture Changes - Thread-safe design: Complete overhaul of settings and database access to ensure thread safety across all operations - Settings snapshots: New immutable settings snapshot pattern prevents race conditions in concurrent operations - In-memory queue tracking: Replaced unencrypted service.db with memory-only queue tracking to eliminate PII storage risks - Optimized middleware: Reduced middleware overhead by 70% through intelligent request filtering and caching ## 📊 Database Structure - Migrated from single shared database to per-user encrypted databases - New models: User, UserSettings, UserActiveResearch, AuthSession - Removed global models that could leak data between users - All sensitive data (API keys, research history) now user-scoped ## 🧪 Testing & Quality - Added 200+ new tests covering authentication, encryption, and thread safety - New Puppeteer UI tests for end-to-end authentication flows - Comprehensive OpenAI API key configuration tests - LangChain integration tests for custom LLMs and retrievers - All tests updated to work with new authentication system ## 📚 Documentation - New migration guide for v0.x to v1.0 upgrade - SQLCipher installation guide for all platforms - Troubleshooting guide for OpenAI API configuration - Updated all examples to demonstrate authenticated usage - Comprehensive API documentation with authentication examples ## 🔧 Technical Implementation - SQLCipher integration with hex-encoded password handling - Thread-local session storage preventing cross-contamination - Context-aware database sessions with proper cleanup - Automatic session lifecycle management - Rate limiting now per-user instead of global ## 💥 Breaking Changes - All API access now requires authentication - Database structure completely changed (migration required) - Settings API redesigned for thread safety - Removed direct database access methods - Changed research ID type from integer to UUID ## 📦 Dependencies - Added: pysqlcipher3 for database encryption - Added: Additional auth-related dependencies - Updated: All major dependencies to latest versions ## 🚀 Performance Improvements - Middleware optimization reduces overhead by 70% - Cached settings reduce database queries by 90% - Thread-local sessions eliminate lock contention - Smarter request routing skips auth for static assets This release represents a complete security overhaul making LDR suitable for production multi-user deployments while maintaining full backward compatibility through migration guides and extensive documentation.	2025-07-03 02:17:44 +02:00
LearningCircuit	2eaaf12109	feat: Implement per-user encrypted databases with comprehensive auth system BREAKING CHANGE: Data files now stored in platform-specific user directories with SQLCipher encryption. Users must register/login to access the application. ## Major Features ### Security & Authentication - Implemented complete multi-user authentication system with Flask-Login - Per-user SQLCipher encrypted databases (falls back to SQLite with warnings) - Secure session management with proper CSRF protection - Password hashing with bcrypt for user credentials - Complete isolation between user data - no cross-user access possible - Thread-safe database connections with proper session management ### Database Architecture - Migrated from single shared database to per-user encrypted databases - Centralized auth database for user management - User-specific databases for research data, settings, and metrics - Automatic database initialization on user registration - Platform-specific data directories using platformdirs library - Removed all hardcoded paths and personal information ### User Experience - Registration page with data privacy acknowledgment - Login/logout functionality with session persistence - Automatic redirect to login for unauthenticated access - Research queue system with 3 concurrent research limit per user - Real-time queue position updates - Comprehensive error handling with user-friendly messages ### API & Routes - All API endpoints now require authentication - Updated routes: /auth/register, /auth/login, /auth/logout, /auth/check - Protected research submission and history endpoints - Proper JSON error responses for API routes - CSRF token validation for state-changing operations ### Testing - Added 53 Puppeteer tests for UI authentication flows - Comprehensive auth integration tests (248 Python test files) - Multi-user concurrent access testing - Queue system testing with position tracking - Database migration and encryption tests ### Configuration - Single LDR_DATA_DIR environment variable for data location - LDR_ALLOW_UNENCRYPTED environment variable for development - Updated Docker configuration for proper volume mounting - Removed multiple environment variables for simplicity ### Documentation - Added DATA_MIGRATION_GUIDE.md for upgrade instructions - Added SQLCIPHER_INSTALL.md for encryption setup - Updated environment configuration documentation - Professional error messages throughout ## Technical Improvements - Replaced raw SQL with SQLAlchemy ORM throughout - Proper database session management with context managers - Thread-local storage for database connections - Automatic cleanup of stale sessions - Rate limiting infrastructure for future use - Comprehensive logging with loguru ## Files Changed - 322 files modified/added - 248 Python files (core functionality and tests) - 53 JavaScript files (Puppeteer tests) - 6 Markdown files (documentation) - No binary files, screenshots, or database files included - All test credentials properly marked with pragma comments This migration ensures each user's research data is completely isolated and encrypted, providing enterprise-grade security for sensitive research operations.	2025-06-29 11:32:48 +02:00
LearningCircuit	d8d982d338	Feature/langchain retriever integration (#502 ) * feat: Add LangChain retriever integration for vector store support - Add RetrieverRegistry for dynamic retriever registration - Create RetrieverSearchEngine wrapper for LangChain BaseRetriever - Integrate retrievers with search factory and config system - Add retrievers parameter to all API functions - Include comprehensive test suite and examples - Support thread-safe operations and multiple retrievers This allows users to pass any LangChain retriever (FAISS, Pinecone, Vertex AI, etc.) to LDR and use it as a search engine seamlessly. * refactor: Organize API examples into structured folders - Create api_usage/ directory with programmatic/ and http/ subdirectories - Move existing examples to appropriate folders - Add comprehensive HTTP API examples (simple and advanced) - Add curl examples for command-line usage - Add simple programmatic example for quick start - Include README explaining when to use each API type * chore: Remove old example files from root examples directory Files have been moved to examples/api_usage/programmatic/ * fix: Address PR review comments - Replace logger.error with logger.exception for better error tracking - Default retriever name to class name if not provided	2025-06-19 08:44:21 -04:00

13 Commits