Architecture Overview
This document provides a comprehensive overview of Local Deep Research's system architecture.
Table of Contents
- System Components
- Entry Points
- Research Execution Flow
- Module Responsibilities
- Threading Model
- Configuration System
- Key Interfaces
System Components
graph TB
subgraph "Entry Points"
CLI[ldr CLI]
WEB[ldr-web Flask App]
API[REST API /api/v1]
end
subgraph "Core Research Engine"
SS[SearchSystem<br/>search_system.py]
SSF[StrategyFactory<br/>search_system_factory.py]
RG[ReportGenerator<br/>report_generator.py]
end
subgraph "Search Layer"
STRAT[32 Search Strategies<br/>advanced_search_system/strategies/]
ENG[30+ Search Engines<br/>web_search_engines/engines/]
RL[Rate Limiter<br/>rate_limiting/]
end
subgraph "Data Layer"
DB[(SQLCipher DB<br/>Per-User Encrypted)]
MODELS[20+ ORM Models<br/>database/models/]
CACHE[Memory Cache]
end
subgraph "LLM Layer"
PROV[LLM Providers<br/>Ollama, OpenAI, etc.]
EMB[Embeddings<br/>embeddings/]
RERANK[Reranker<br/>reranker/]
end
CLI --> SS
WEB --> SS
API --> SS
SS --> SSF
SS --> RG
SSF --> STRAT
STRAT --> ENG
ENG --> RL
SS --> PROV
SS --> DB
MODELS --> DB
RG --> PROV
ENG --> CACHE
Entry Points
Web Application (ldr-web)
Location: src/local_deep_research/web/app.py
The primary user interface. Launches a Flask server with SocketIO for real-time updates.
graph LR
A[Browser] -->|HTTP/WS| B[Flask App]
B --> C[Blueprints]
C --> D[research_routes]
C --> E[api_routes]
C --> F[settings_routes]
C --> G[auth_routes]
B -->|Real-time| H[SocketIO]
Key files:
- web/app.py - Main entry, starts server
- web/app_factory.py - Flask app creation with middleware
- web/routes/ - Blueprint route handlers
- web/services/ - Business logic services
CLI (ldr)
Location: src/local_deep_research/main.py
Command-line interface for headless research operations.
REST API (/api/v1)
Location: src/local_deep_research/web/api.py
Programmatic access for integrations.
| Endpoint | Method | Purpose |
|---|---|---|
| /api/v1/quick_summary | POST | Quick research summary |
| /api/v1/generate_report | POST | Full research report |
| /api/v1/analyze_documents | POST | Search local collections |
| /api/v1/health | GET | Health check |
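As a rough illustration of calling one of these endpoints from Python, the sketch below builds a POST request for /api/v1/quick_summary with the standard library only. The base URL (`http://localhost:5000`) and the `query` field in the JSON body are assumptions made for illustration; this document does not specify the request schema or any authentication requirements, so check the API reference before relying on them.

```python
import json
import urllib.request

# Assumed address of a locally running ldr-web instance (hypothetical).
BASE_URL = "http://localhost:5000"


def build_quick_summary_request(
    query: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/quick_summary."""
    # The "query" field name is an assumption; the real schema may differ.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/quick_summary",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_quick_summary_request("What is SQLCipher?")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

The request is only constructed here, so the snippet runs without a server; sending it is left as a commented-out step.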
Research Execution Flow
sequenceDiagram
participant User
participant Web as Flask App
participant SS as SearchSystem
participant SF as StrategyFactory
participant Strat as Strategy
participant Eng as SearchEngine
participant LLM as LLM Provider
participant DB as Database
User->>Web: Submit Query
Web->>SS: start_research()
SS->>SF: create_strategy(name)
SF-->>SS: Strategy instance
loop Research Iterations
SS->>Strat: analyze_topic(query)
Strat->>LLM: Generate questions
LLM-->>Strat: Questions
loop Per Question
Strat->>Eng: search(question)
Eng-->>Strat: Results
end
Strat->>LLM: Synthesize findings
LLM-->>Strat: Synthesis
Strat-->>SS: Findings
end
SS->>DB: Save results
SS-->>Web: Research complete
Web-->>User: Results via SocketIO
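The iteration loop in the sequence diagram can be sketched in a few lines of Python. This is a simplified illustration, not the project's implementation: `run_research` and the three callables passed into it are hypothetical stand-ins for the strategy, the search engine, and the LLM provider.

```python
from typing import Callable, Dict, List

SearchResult = Dict[str, str]


def run_research(
    query: str,
    generate_questions: Callable[[str, List[str]], List[str]],
    search: Callable[[str], List[SearchResult]],
    synthesize: Callable[[str, List[SearchResult]], str],
    iterations: int = 2,
) -> List[str]:
    """Mirror the diagram: questions -> per-question search -> synthesis."""
    findings: List[str] = []
    for _ in range(iterations):
        # Strat->>LLM: generate questions (earlier findings give context)
        questions = generate_questions(query, findings)
        results: List[SearchResult] = []
        # Inner loop: one engine call per question
        for question in questions:
            results.extend(search(question))
        # Strat->>LLM: synthesize this iteration's results into a finding
        findings.append(synthesize(query, results))
    return findings


# Stubs standing in for the LLM and the search engine.
def fake_questions(query: str, findings: List[str]) -> List[str]:
    return [f"{query} (round {len(findings) + 1}, q{i})" for i in (1, 2)]


def fake_search(question: str) -> List[SearchResult]:
    return [{"title": question, "link": "https://example.com", "snippet": "..."}]


def fake_synthesize(query: str, results: List[SearchResult]) -> str:
    return f"{len(results)} results for {query}"


findings = run_research("topic", fake_questions, fake_search, fake_synthesize)
print(findings)
```

Passing the collaborators in as callables keeps the control flow visible and makes the loop easy to exercise without a real LLM or network access.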
Research Status Lifecycle
stateDiagram-v2
[*] --> QUEUED : Concurrency limit reached
[*] --> IN_PROGRESS : Slots available
QUEUED --> IN_PROGRESS : Worker picks up task
QUEUED --> SUSPENDED : User terminates
IN_PROGRESS --> COMPLETED : Research succeeds
IN_PROGRESS --> FAILED : Unrecoverable error
IN_PROGRESS --> SUSPENDED : User terminates
COMPLETED --> [*]
FAILED --> [*]
SUSPENDED --> [*]
note right of FAILED
Set by processor_v2 and
research_service on errors
end note
note left of SUSPENDED
Set when user clicks
terminate/stop
end note
Unused statuses: PENDING (declared as a model default but never set by any creation path), ERROR (never set; predates FAILED), and CANCELLED (unused by research; used by benchmarks) exist in ResearchStatus for backward compatibility.
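The lifecycle above can be expressed as a small transition table. The sketch below is illustrative only: the enum is a stand-in limited to the five statuses in the diagram, its string values are assumptions, and `can_transition` and `initial_status` are hypothetical helpers rather than functions from the codebase.

```python
from enum import Enum


class ResearchStatus(Enum):
    """Stand-in enum; the string values are assumptions."""

    QUEUED = "queued"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"
    SUSPENDED = "suspended"


# Edges copied from the state diagram; terminal states have no successors.
ALLOWED_TRANSITIONS = {
    ResearchStatus.QUEUED: {ResearchStatus.IN_PROGRESS, ResearchStatus.SUSPENDED},
    ResearchStatus.IN_PROGRESS: {
        ResearchStatus.COMPLETED,
        ResearchStatus.FAILED,
        ResearchStatus.SUSPENDED,
    },
    ResearchStatus.COMPLETED: set(),
    ResearchStatus.FAILED: set(),
    ResearchStatus.SUSPENDED: set(),
}


def initial_status(slots_available: bool) -> ResearchStatus:
    """New research starts immediately if a slot is free, else it is queued."""
    return ResearchStatus.IN_PROGRESS if slots_available else ResearchStatus.QUEUED


def can_transition(current: ResearchStatus, new: ResearchStatus) -> bool:
    """Return True if the diagram contains an edge from current to new."""
    return new in ALLOWED_TRANSITIONS[current]


print(can_transition(ResearchStatus.QUEUED, ResearchStatus.IN_PROGRESS))
print(can_transition(ResearchStatus.COMPLETED, ResearchStatus.IN_PROGRESS))
```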
Module Responsibilities
Core Modules
| Module | Location | Responsibility |
|---|---|---|
| SearchSystem | search_system.py | Orchestrates research, coordinates strategies and engines |
| StrategyFactory | search_system_factory.py | Creates strategy instances based on configuration |
| ReportGenerator | report_generator.py | Generates structured reports from research findings |
| CitationHandler | citation_handler.py | Processes and validates citations |
Search System
| Module | Location | Responsibility |
|---|---|---|
| BaseSearchEngine | web_search_engines/search_engine_base.py | Abstract base for all search engines |
| SearchEngineFactory | web_search_engines/search_engine_factory.py | Creates engine instances |
| RateLimitTracker | web_search_engines/rate_limiting/tracker.py | Adaptive rate limiting |
| RetrieverRegistry | web_search_engines/retriever_registry.py | LangChain retriever integration |
Strategy System
| Module | Location | Responsibility |
|---|---|---|
| BaseSearchStrategy | advanced_search_system/strategies/base_strategy.py | Abstract base for strategies |
| FindingsRepository | advanced_search_system/findings/ | Accumulates research findings |
| QuestionGenerator | advanced_search_system/questions/ | Generates research questions |
Web Application
| Module | Location | Responsibility |
|---|---|---|
| SocketIOService | web/services/socket_service.py | Real-time communication |
| ResearchService | web/services/research_service.py | Research execution |
| QueueManager | web/queue/ | Background task queue |
| SessionManager | web/auth/session_manager.py | User session handling |
Data Layer
| Module | Location | Responsibility |
|---|---|---|
| Models | database/models/ | SQLAlchemy ORM models |
| SessionContext | database/session_context.py | Thread-safe DB sessions |
| EncryptedDB | database/encrypted_db.py | SQLCipher integration |
LLM Integration
| Module | Location | Responsibility |
|---|---|---|
| LLM Providers | llm/providers/implementations/ | Provider-specific LLM wrappers |
| AutoDiscovery | llm/providers/auto_discovery.py | Dynamic provider detection |
| LLMRegistry | llm/llm_registry.py | Custom LLM registration |
Threading Model
graph TB
subgraph "Main Thread"
FLASK[Flask Server]
SOCKETIO[SocketIO Handler]
end
subgraph "Research Threads"
RT1[Research Thread 1]
RT2[Research Thread 2]
RTN[Research Thread N]
end
subgraph "Queue Processor"
QP[Queue Processor Thread]
end
subgraph "Thread-Local Storage"
TC[Thread Context<br/>Settings Snapshot]
DBS[DB Session<br/>Per-User]
end
FLASK --> RT1
FLASK --> RT2
FLASK --> RTN
RT1 --> TC
RT2 --> TC
RTN --> TC
QP --> RT1
RT1 -.-> SOCKETIO
RT2 -.-> SOCKETIO
Key Threading Concepts:
- Thread Context (config/thread_settings.py)
  - Each research thread has its own settings snapshot
  - Prevents race conditions on configuration changes
- Per-User DB Sessions (database/session_context.py)
  - Each user has an isolated SQLCipher database
  - Sessions are thread-local via context manager
- Queue Processing (web/queue/)
  - Background queue for long-running research
  - Processes items from the QueuedResearch table
- SocketIO Updates
  - Research threads emit progress via SocketIO
  - Uses threading async mode (not asyncio)
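To make the thread-context idea concrete, here is a minimal sketch built on `threading.local`. It is not the code in config/thread_settings.py: the helper names `set_settings_context` and `get_setting` and the `llm.model` key are assumptions chosen for illustration.

```python
import copy
import threading
from typing import Any, Dict

# Each thread sees its own attributes on this object.
_thread_context = threading.local()


def set_settings_context(snapshot: Dict[str, Any]) -> None:
    """Attach a settings snapshot to the current thread (hypothetical helper)."""
    _thread_context.settings = snapshot


def get_setting(key: str, default: Any = None) -> Any:
    """Read a setting from the current thread's snapshot, if one is set."""
    settings = getattr(_thread_context, "settings", {})
    return settings.get(key, default)


def research_worker(snapshot: Dict[str, Any], results: Dict[str, Any]) -> None:
    set_settings_context(snapshot)
    results["seen"] = get_setting("llm.model")


live_settings = {"llm.model": "model-a"}
snapshot = copy.deepcopy(live_settings)  # frozen copy for this research run
live_settings["llm.model"] = "model-b"   # later change must not leak in

results: Dict[str, Any] = {}
worker = threading.Thread(target=research_worker, args=(snapshot, results))
worker.start()
worker.join()

print(results["seen"])            # the worker still sees the snapshot value
print(get_setting("llm.model"))   # the main thread never set a context
```

The worker reads `model-a` even though the live settings changed to `model-b`, which is the race the snapshot is meant to prevent; the main thread has no context and gets the default.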
Configuration System
graph TB
subgraph "Configuration Sources"
ENV[Environment Variables]
DB[(Database Settings<br/>per-user)]
JSON[Default Settings<br/>default_settings.json]
end
subgraph "Settings Management"
SM[SettingsManager<br/>manager.py]
end
subgraph "Runtime Access"
SNAP[Settings Snapshot<br/>Thread-safe copy]
TC[Thread Context<br/>thread_settings.py]
end
ENV --> SM
JSON --> SM
DB --> SM
SM --> SNAP
SNAP --> TC
Configuration Flow:
- Defaults - Default settings loaded from JSON
- Environment - Environment variables override defaults
- Database - User settings loaded from encrypted per-user DB
- Snapshot - Thread-safe copy created for each research
- Access - Code reads from snapshot via thread context
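The layering described in this flow can be sketched as a simple dictionary merge. The sketch applies the layers in the order listed above, which is an assumption: this document does not state whether environment variables or database values win when both define the same key. `build_settings_snapshot` and the setting keys are hypothetical and only illustrate the shape of the process.

```python
import copy
from typing import Any, Dict, Mapping


def build_settings_snapshot(
    defaults: Mapping[str, Any],
    env_overrides: Mapping[str, Any],
    user_settings: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge layers in listed order (later wins) and return a detached copy."""
    merged: Dict[str, Any] = {}
    # Assumed precedence: defaults < environment < per-user database.
    for layer in (defaults, env_overrides, user_settings):
        merged.update(layer)
    # Deep copy so later edits to any source layer cannot affect the snapshot.
    return copy.deepcopy(merged)


# Illustrative keys only; real setting names may differ.
defaults = {"llm.temperature": 0.7, "search.max_results": 10, "app.debug": False}
env_overrides = {"app.debug": True}
user_settings = {"search.max_results": 25}

snapshot = build_settings_snapshot(defaults, env_overrides, user_settings)
user_settings["search.max_results"] = 50  # changed after the snapshot was taken
print(snapshot)
```

Because the snapshot is a copy, a research run that has already started keeps the values it began with.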
Key Settings Categories:
| Category | Examples |
|---|---|
| llm.* | Provider, model, temperature, API keys |
| search.* | Engine selection, max results, rate limits |
| app.* | Debug mode, logging, UI preferences |
| notifications.* | Email, webhook configurations |
Key Interfaces
Search Engine Interface
All search engines implement BaseSearchEngine:
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseSearchEngine(ABC):
    # Classification flags
    is_public: bool = True
    is_generic: bool = True
    is_scientific: bool = False
    is_local: bool = False
    is_news: bool = False
    is_code: bool = False
    is_lexical: bool = False
    needs_llm_relevance_filter: bool = False

    @abstractmethod
    def run(self, query: str) -> List[Dict[str, Any]]:
        """Execute search and return results."""
        # Returns: [{"title": ..., "link": ..., "snippet": ...}]
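As an illustration of how an engine might implement this interface, the sketch below defines a tiny keyword-matching engine over in-memory documents. The base class here is a stand-in that copies only what is shown above (the real class likely has more to it), and `InMemoryKeywordEngine` is hypothetical. Because it matches on keywords, it sets both `is_lexical` and `needs_llm_relevance_filter`.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseSearchEngine(ABC):
    """Stand-in mirroring the interface shown in this document."""

    is_public: bool = True
    is_generic: bool = True
    is_scientific: bool = False
    is_local: bool = False
    is_news: bool = False
    is_code: bool = False
    is_lexical: bool = False
    needs_llm_relevance_filter: bool = False

    @abstractmethod
    def run(self, query: str) -> List[Dict[str, Any]]:
        """Execute search and return results."""


class InMemoryKeywordEngine(BaseSearchEngine):
    """Hypothetical engine: naive keyword match over a list of documents."""

    is_public = False
    is_generic = False
    is_local = True
    is_lexical = True                  # keyword-based matching
    needs_llm_relevance_filter = True  # ranking is poor, so ask for filtering

    def __init__(self, documents: List[Dict[str, str]]) -> None:
        self.documents = documents

    def run(self, query: str) -> List[Dict[str, Any]]:
        terms = query.lower().split()
        results = []
        for doc in self.documents:
            text = f"{doc['title']} {doc['snippet']}".lower()
            if any(term in text for term in terms):
                results.append(
                    {"title": doc["title"], "link": doc["link"], "snippet": doc["snippet"]}
                )
        return results


engine = InMemoryKeywordEngine(
    [
        {"title": "SQLCipher basics", "link": "https://example.com/a", "snippet": "Encrypted SQLite"},
        {"title": "Flask routing", "link": "https://example.com/b", "snippet": "Blueprints"},
    ]
)
hits = engine.run("sqlcipher encryption")
print([hit["title"] for hit in hits])
```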
Strategy Interface
All strategies implement BaseSearchStrategy:
from abc import ABC, abstractmethod
from typing import Dict


class BaseSearchStrategy(ABC):
    def __init__(self, search, model, all_links_of_system,
                 settings_snapshot, **kwargs):
        ...

    @abstractmethod
    def analyze_topic(self, query: str) -> Dict:
        """Execute research strategy."""
        # Returns: {
        #     "findings": [...],
        #     "iterations": int,
        #     "questions": {...},
        #     "formatted_findings": str,
        #     "current_knowledge": {...}
        # }
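A minimal strategy that satisfies this interface might look like the following sketch. The base class is a stand-in, the attribute names stored in `__init__` are assumptions, and `SingleSearchStrategy` and `FakeEngine` are hypothetical. The point is only to show the shape of the dictionary that `analyze_topic` returns; a real strategy would also call the LLM to generate questions and synthesize findings.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseSearchStrategy(ABC):
    """Stand-in for the real base class; attribute names are assumptions."""

    def __init__(self, search, model, all_links_of_system,
                 settings_snapshot, **kwargs):
        self.search = search
        self.model = model
        self.all_links_of_system = all_links_of_system
        self.settings_snapshot = settings_snapshot

    @abstractmethod
    def analyze_topic(self, query: str) -> Dict:
        """Execute research strategy."""


class SingleSearchStrategy(BaseSearchStrategy):
    """Hypothetical strategy: one search call, no LLM involvement."""

    def analyze_topic(self, query: str) -> Dict:
        results = self.search.run(query)
        # Record every link seen so the caller can build citations later.
        self.all_links_of_system.extend(r["link"] for r in results)
        return {
            "findings": [{"question": query, "results": results}],
            "iterations": 1,
            "questions": {1: [query]},
            "formatted_findings": "\n".join(r["title"] for r in results),
            "current_knowledge": {"summary": f"{len(results)} result(s)"},
        }


class FakeEngine:
    """Stub engine returning one fixed result."""

    def run(self, query: str) -> List[Dict[str, str]]:
        return [{"title": "Example", "link": "https://example.com", "snippet": query}]


links: List[str] = []
strategy = SingleSearchStrategy(
    FakeEngine(), model=None, all_links_of_system=links, settings_snapshot={}
)
outcome = strategy.analyze_topic("sqlcipher")
print(outcome["iterations"], links)
```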
LLM Provider Interface
All providers extend OpenAICompatibleProvider:
from langchain_core.language_models import BaseChatModel


class OpenAICompatibleProvider:
    provider_name: str
    api_key_setting: str
    url_setting: str
    default_base_url: str
    default_model: str

    @classmethod
    def create_llm(cls, model_name, temperature, **kwargs) -> BaseChatModel:
        """Create LangChain LLM instance."""
Directory Structure
src/local_deep_research/
├── search_system.py # Main orchestrator
├── search_system_factory.py # Strategy factory
├── report_generator.py # Report generation
├── citation_handler.py # Citation processing
│
├── web/ # Flask application
│ ├── app.py # Entry point
│ ├── app_factory.py # App creation
│ ├── routes/ # Blueprint handlers
│ ├── services/ # Business logic
│ ├── queue/ # Task queue
│ └── auth/ # Authentication
│
├── advanced_search_system/ # Search strategies
│ ├── strategies/ # 32 strategy implementations
│ ├── questions/ # Question generation
│ ├── findings/ # Findings management
│ └── ...
│
├── web_search_engines/ # Search engines
│ ├── engines/ # 30+ engine implementations
│ ├── search_engine_base.py # Abstract base
│ ├── search_engine_factory.py
│ └── rate_limiting/ # Adaptive rate limiting
│
├── database/ # Data layer
│ ├── models/ # 20+ ORM models
│ ├── session_context.py # Session management
│ └── encrypted_db.py # SQLCipher
│
├── llm/ # LLM integration
│ ├── providers/ # Provider implementations
│ └── llm_registry.py # Custom LLM registration
│
├── config/ # Configuration
│ ├── llm_config.py # LLM setup
│ ├── search_config.py # Search setup
│ └── thread_settings.py # Thread context
│
├── settings/ # Settings management
│ └── manager.py # SettingsManager
│
└── api/ # Programmatic API
    ├── client.py # HTTP client
    └── research_functions.py # Direct functions
See Also
- Database Schema - Detailed data model documentation
- Semantic Search - Indexing pipeline, search modes, and three-tier merge algorithm
- Extension Guide - How to add custom components
- Troubleshooting - Common issues and solutions