local-deep-research

mirror of https://github.com/LearningCircuit/local-deep-research.git synced 2026-06-15 19:46:56 +03:00

Author	SHA1	Message	Date
LearningCircuit	83f632e069	fix: treat empty environment variables as unset to fix provider selection (#3362) * fix: treat empty environment variables as unset to fix provider selection When deploying via Docker/Unraid templates, all environment variables are created even when left blank (e.g. LDR_LLM_ANTHROPIC_API_KEY=""). The check_env_setting() function previously treated these empty strings as valid overrides, which caused provider settings to be blanked out and prevented proper provider selection on fresh installs. Empty env vars are now treated as unset, allowing database defaults to take effect normally. Fixes #3339 * fix(tests): update test to match empty env var behavior Update test_env_override_empty_string to assert that empty environment variables are treated as unset (returning DB value) rather than overriding with empty string. This aligns with the fix for #3339. * docs: add ecosystem context for empty env var handling decision Document that treating empty environment variables as unset is standard practice across major projects (botocore, viper, Turborepo, Go stdlib, Docker Compose) with references to the PR discussion. * feat: add warning log for empty env vars, fix references, add tests and docs - Log warning when empty env vars are detected (helps users diagnose Unraid/Docker template issues) - Replace misleading viper/Docker Compose references with CPython official docs and Pallets/Click PR #2223 - Add unit tests: empty string returns None, warning is logged, provider/model/multiple keys handled - Add integration tests: empty string with no DB value, checkbox, number settings - Document empty env var behavior in unraid.md, docker-compose-guide.md, and env_configuration.md * docs: recommend DISABLED instead of Web UI for blocking settings Users can set env vars to a non-empty invalid value like "DISABLED" to explicitly block a key, which is simpler than navigating the UI.	2026-04-05 12:19:44 +02:00
LearningCircuit	add97b1793	docs: polish installation docs after migration (#2889) * docs: move detailed installation instructions from README to dedicated pages README Installation Options section (~200 lines) replaced with a compact table linking to docs/installation.md (hub page), docs/install-pip.md (dedicated pip guide), and existing docker-compose and Unraid guides. No content lost — everything is now in focused doc files. * docs: trim redundant pip section in installation hub page The pip section in docs/installation.md duplicated nearly all of the Quick Install content from docs/install-pip.md. Replace with a brief summary + single install command + link to the dedicated guide, consistent with the hub-and-spoke pattern used by the Unraid section. Addresses review feedback from djpetti on PR #2819. * docs: restore missing installation info from README migration - Add NVIDIA Container Toolkit full install commands (Ubuntu/Debian) with distro note for RHEL/Fedora/Arch to docs/installation.md - Add GPU docker-compose alias convenience tip - Add DIY docker-compose configuration guidance (GPU driver, context length, keep alive, model selection) - Add Windows PDF export warning (Pango/WeasyPrint) to docs/install-pip.md - Fix SQLCipher wording: pre-built wheels available, not "requires system-level libraries" - Restore ldr-web command instead of python -m invocation * docs: follow-up polish for installation docs migration - Restructure README Quick Start with clear Option 1/2/3 labels - Update deprecated LDR_ALLOW_UNENCRYPTED to LDR_BOOTSTRAP_ALLOW_UNENCRYPTED - Add "Open http://localhost:5000" to install-pip.md after ldr-web step - Add back-link from install-pip.md to installation overview - Add Docker/Docker Compose install prerequisite links to installation.md - Cross-link NVIDIA toolkit commands from docker-compose-guide to installation.md - Use double quotes for volume spec in Docker Run for cross-platform compat * docs: restore original Quick Start ordering (Docker Run first)	2026-03-20 11:26:48 +01:00
LearningCircuit	abbd19584a	docs: move detailed install instructions from README to dedicated pages (#2819) * docs: move detailed installation instructions from README to dedicated pages README Installation Options section (~200 lines) replaced with a compact table linking to docs/installation.md (hub page), docs/install-pip.md (dedicated pip guide), and existing docker-compose and Unraid guides. No content lost — everything is now in focused doc files. * docs: trim redundant pip section in installation hub page The pip section in docs/installation.md duplicated nearly all of the Quick Install content from docs/install-pip.md. Replace with a brief summary + single install command + link to the dedicated guide, consistent with the hub-and-spoke pattern used by the Unraid section. Addresses review feedback from djpetti on PR #2819. * docs: restore missing installation info from README migration - Add NVIDIA Container Toolkit full install commands (Ubuntu/Debian) with distro note for RHEL/Fedora/Arch to docs/installation.md - Add GPU docker-compose alias convenience tip - Add DIY docker-compose configuration guidance (GPU driver, context length, keep alive, model selection) - Add Windows PDF export warning (Pango/WeasyPrint) to docs/install-pip.md - Fix SQLCipher wording: pre-built wheels available, not "requires system-level libraries" - Restore ldr-web command instead of python -m invocation	2026-03-20 10:46:32 +01:00
LearningCircuit	0b23d58e85	docs: thread lifecycle, FD budget, and resource exhaustion (#2605) * fix: prevent file descriptor exhaustion from dead thread engine accumulation Three root causes addressed: 1. Dead thread engine accumulation (primary): _thread_engines grows unboundedly as crashed/terminated threads leave orphaned NullPool engines. Add cleanup_dead_thread_engines() that sweeps entries for threads no longer in threading.enumerate(). Integrate via throttled sweep in teardown_appcontext (every 60s) and periodic sweep in the queue processor loop (every 6 iterations). 2. Generic downloader stream=True leak (secondary): generic.py used stream=True but never read or closed the response body, holding connections open. Removed stream=True since only status_code and headers are inspected. 3. Docker default 1024 FD limit (contributing): Add nofile ulimit (65536) to docker-compose.yml so the container has headroom for WAL mode databases, thread pools, and connection pools. * fix: address review findings — sweep lock, credential cleanup, flaky test - Add _sweep_lock to prevent TOCTOU race on _last_sweep_time in maybe_sweep_dead_engines() (concurrent teardowns could all pass the interval check) - Move alive_ids computation inside _thread_engine_lock to prevent race between snapshot and engine dict mutation - Sweep dead _thread_credentials (plaintext passwords) alongside engines in processor_v2.py and app_factory.py teardown - Fix flaky test_sweeps_after_interval: replace time.sleep(0.15) with _last_sweep_time backdating - Add tests for credential sweep and module-level cleanup_dead_threads() * fix: close search engine sessions after research, fix stream=True leak properly Three improvements to the FD exhaustion fix: 1. generic.py: Restore stream=True (removing it is unsafe — GenericDownloader handles ALL URLs and would download multi-GB files into memory). Use context manager instead to ensure the streamed connection is properly closed on all return paths, preventing socket FD leaks. 2. research_service.py: Add use_search.close() and system.close() in finally block of run_research_process(). Search engine HTTP sessions (e.g. SemanticScholar's SafeSession) were never explicitly closed after research, relying on non-deterministic GC for cleanup. 3. search_system.py + strategies: Add close() method to AdvancedSearchSystem and BaseSearchStrategy, with overrides in ConstraintParallelStrategy and ConcurrentDualConfidenceStrategy to shut down persistent ThreadPoolExecutors. Also adds detailed design comments throughout the codebase documenting: - Why NullPool engines don't leak FDs (memory leak only) - Why stream=True must NOT be removed from the diagnostic block - The dual sweep trigger architecture (request-driven + queue-driven) - Thread ID recycling limitations - Search engine lifecycle and cleanup responsibilities Fixes flaky test_removes_dead_thread_entries by using threading.Barrier to prevent thread ID recycling during test. * fix: unregister user from news scheduler on logout The logout handler never called scheduler.unregister_user(), causing: - Passwords to persist in scheduler memory for up to 48 hours - Orphaned APScheduler jobs to keep running after logout - Orphaned jobs to re-create QueuePool engines (~10 FDs each) after close_user_database() disposed the original, contributing to FD leaks Add scheduler unregistration before close_user_database() so running jobs can finish gracefully while the DB engine is still available. Add design comment documenting the logout cleanup order. * test: remove ineffective patch in logout scheduler test The `routes.get_news_scheduler` patch was ineffective because the logout handler imports `get_news_scheduler` dynamically inside the function body, so the name never enters the routes module namespace. The `create=True` flag masked this by silently creating a new attribute. The real patch on `subscription_manager.scheduler.get_news_scheduler` is sufficient. * fix: remove nofile ulimit override from docker-compose.yml Docker containers inherit ulimits from the Docker daemon, which typically runs with LimitNOFILE=infinity (1073741816+). Setting nofile to 65536 could actually lower the limit for most users, hurting large installations. The FD leak root causes are already fixed in this PR (dead-thread engine sweep, session close, scheduler unregister), so the safety net is unnecessary. Let users and their Docker daemon config control this. * fix: add try-except to strategy executor shutdown, elevate scheduler unregister log level - Wrap executor.shutdown(wait=False) in try-except in strategy close() methods for consistency with parallel_search_engine.py pattern - Change logger.debug → logger.warning for scheduler unregister failure on logout, since failure means password stays in scheduler memory * docs: add comments explaining non-obvious design decisions from deep review - SQLCipher WAL FD cost (1-3 FDs per connection, multiplied by users) - Logout cleanup ordering: why unregister before close, known race window - shutdown(wait=False): why non-blocking, safety via double-cleanup pattern * docs: add thread lifecycle, FD budget, and resource exhaustion documentation Knowledge captured from PR #2591 deep review (5 rounds of verification): - architecture.md: Thread & Resource Lifecycle section with cleanup layers, mermaid diagram, FD budget table, and key files reference - troubleshooting.md: Resource Exhaustion section with diagnosis commands and solutions for FD exhaustion - docker-compose-guide.md: Resource Limits note explaining nofile/memlock - web/database/README.md: Thread Safety & Connection Model section - Cross-references added between all 4 docs - Updated Areas for Improvement (container optimization → resource observability) - Added encrypted_db.py and thread_local_session.py to Key Source Files	2026-03-08 16:22:17 +01:00
LearningCircuit	33119ae2a4	refactor: remove deprecated settings-based local search engines (#2344) * refactor: remove deprecated settings-based local search engines The old settings-based local engines (research_papers, project_docs, personal_notes, local_all) are fully superseded by the database-backed Collection system with CollectionSearchEngine and LibraryRAGSearchEngine. - Delete LocalAllSearchEngine and LocalSearchEngine classes - Remove 58 settings entries from default_settings.json - Remove local engine registration from search_engines_config.py - Remove local_search_engines() function - Clean up LocalEmbeddingManager: remove 14 dead methods and unused attrs - Remove Docker volume mounts for local_collections - Update security whitelist, rate limiter, bearer config - Remove dead force_reindex code path in research_functions.py - Update docs to reference Collections UI - Remove/update all associated tests - Regenerate golden master settings * fix: address review comments from djpetti - Revert unintentional formatting change in theme options (keep compact inline format) - Restore unicode arrow character (→) that was escaped to \u2192 by JSON serializer - Rename search_engine_local.py → local_embedding_manager.py since it only contains LocalEmbeddingManager now (no search engines) - Remove unused chunk_size, chunk_overlap, cache_dir params from LocalEmbeddingManager - Update all imports and references across codebase	2026-02-28 16:00:13 +01:00
LearningCircuit	890c84e534	docs: link auto-generated Configuration Reference across docs & fix stale env var docs (#2472) - Add "Config Reference" link to Settings page "Learn & Get Help" bar - Overhaul docs/env_configuration.md: remove stale Dynaconf references, fix wrong double-underscore env var format, remove documented-as-fixed bug, replace duplicate tables with links to CONFIGURATION.md - Fix broken case-sensitive link in docs/deployment/unraid.md - Add CONFIGURATION.md cross-references to 12 docs' "See Also" sections - Update .env.template with correct LDR_-prefixed variable names - Add config reference comment to docker-compose.yml environment block	2026-02-28 13:46:34 +01:00
LearningCircuit	f24feefc86	security: make allow_registrations env-var-only (#2164) * security: make allow_registrations non-editable, env-var-only Set editable=false and visible=false so users cannot toggle registration through the UI. The setting is still controllable via the LDR_APP_ALLOW_REGISTRATIONS environment variable. * update description and keep visible for user awareness * security: enforce editable flag on bulk settings save endpoints The `editable: false` flag was only checked by the individual PUT /settings/api/<key> endpoint. Both bulk save endpoints (save_all_settings and save_settings) bypassed it, allowing any authenticated user to modify non-editable settings like allow_registrations via crafted requests. Add editable filtering to both bulk endpoints so non-editable settings are silently skipped (with a warning log), matching the UI behavior where these fields are never rendered as inputs. * security: harden registration protection — DELETE check, env var warning, docs Three follow-up hardening fixes from security review: 1. Add editable check to DELETE /settings/api/<key> endpoint — previously only checked is_blocked_setting(), allowing deletion of non-editable settings which could reset them to permissive defaults. 2. Warn on unrecognized LDR_APP_ALLOW_REGISTRATIONS values — parse_boolean uses HTML checkbox semantics where any non-empty non-falsy string is True. Values like "disabled" or "none" silently enable registrations. Now logs a clear warning with accepted values. 3. Document LDR_APP_ALLOW_REGISTRATIONS in docker-compose.yml and the Docker Compose guide so operators deploying publicly can discover it.	2026-02-14 02:46:00 +00:00
LearningCircuit	7a41cd28d7	docs: fix inaccuracies in docker-compose-guide.md (#1868) * docs: fix inaccuracies in docker-compose-guide.md - Use LDR_LLM_MODEL instead of non-existent MODEL env var - Add /v1 suffix to LM Studio URL for OpenAI-compatible API - Remove port 8080 from troubleshooting (SearXNG is internal-only) * docs: add warning about env variable hard overrides Environment variables cause settings to become read-only in the UI. Users should prefer the web UI for settings they may want to change.	2026-02-01 01:34:28 +00:00
LearningCircuit	18de862c45	docs: add missing docker-compose-guide.md (#1863) Create the Docker Compose guide that was referenced in README.md but didn't exist. The guide consolidates Docker Compose setup information including quick start commands, configuration options, Cookie Cutter approach, and troubleshooting tips. Fixes #1817	2026-01-31 19:18:48 -05:00
Daniel Petti	30e1dddcdc	Delete some outdated docs and link to the wiki.	2025-05-10 12:58:36 -04:00
LearningCircuit	ad3dbd39d3	added docker-compose.yml	2025-03-26 01:08:09 +01:00
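
The empty-environment-variable behavior described in commit 83f632e069 can be illustrated with a minimal sketch. It is not the project's code: `read_env_override`, `env_name_for`, and `resolve_setting` are hypothetical names loosely modeled on the `check_env_setting()` function the commit mentions, and the dotted-key to `LDR_`-prefixed mapping is an assumption inferred from names such as `LDR_LLM_ANTHROPIC_API_KEY`.

```python
import logging
import os
from typing import Mapping, Optional

logger = logging.getLogger(__name__)


def env_name_for(key: str) -> str:
    """Map a dotted setting key to an LDR_-prefixed name (assumed convention)."""
    return "LDR_" + key.replace(".", "_").upper()


def read_env_override(
    key: str, environ: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the env override for `key`, or None if it is unset or empty."""
    env = os.environ if environ is None else environ
    name = env_name_for(key)
    value = env.get(name)
    if value is None:
        return None
    if value == "":
        # Docker/Unraid templates create blank variables; treat them as unset
        # and log a warning so the user can diagnose the template.
        logger.warning("Environment variable %s is empty; ignoring it", name)
        return None
    return value


def resolve_setting(
    key: str, db_value: str, environ: Optional[Mapping[str, str]] = None
) -> str:
    """Prefer a non-empty env override, otherwise fall back to the DB value."""
    override = read_env_override(key, environ)
    return db_value if override is None else override
```

With this shape, a blank `LDR_LLM_PROVIDER` falls through to the stored default, while a non-empty value such as `DISABLED` still acts as an explicit override, which matches the workaround the commit recommends for blocking a key.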

11 Commits