local-deep-research/docs/architecture/OVERVIEW.md
LearningCircuit 061cd83dd4 feat: add is_lexical flag to auto-enable LLM relevance filtering for keyword-based engines (#3403)
2026-04-06 23:04:47 +02:00


Architecture Overview

This document provides a comprehensive overview of Local Deep Research's system architecture.



System Components

graph TB
    subgraph "Entry Points"
        CLI[ldr CLI]
        WEB[ldr-web Flask App]
        API[REST API /api/v1]
    end

    subgraph "Core Research Engine"
        SS[SearchSystem<br/>search_system.py]
        SSF[StrategyFactory<br/>search_system_factory.py]
        RG[ReportGenerator<br/>report_generator.py]
    end

    subgraph "Search Layer"
        STRAT[32 Search Strategies<br/>advanced_search_system/strategies/]
        ENG[30+ Search Engines<br/>web_search_engines/engines/]
        RL[Rate Limiter<br/>rate_limiting/]
    end

    subgraph "Data Layer"
        DB[(SQLCipher DB<br/>Per-User Encrypted)]
        MODELS[20+ ORM Models<br/>database/models/]
        CACHE[Memory Cache]
    end

    subgraph "LLM Layer"
        PROV[LLM Providers<br/>Ollama, OpenAI, etc.]
        EMB[Embeddings<br/>embeddings/]
        RERANK[Reranker<br/>reranker/]
    end

    CLI --> SS
    WEB --> SS
    API --> SS

    SS --> SSF
    SS --> RG
    SSF --> STRAT
    STRAT --> ENG
    ENG --> RL

    SS --> PROV
    SS --> DB
    MODELS --> DB

    RG --> PROV
    ENG --> CACHE

Entry Points

Web Application (ldr-web)

Location: src/local_deep_research/web/app.py

The primary user interface. It launches a Flask server with SocketIO for real-time progress updates.

graph LR
    A[Browser] -->|HTTP/WS| B[Flask App]
    B --> C[Blueprints]
    C --> D[research_routes]
    C --> E[api_routes]
    C --> F[settings_routes]
    C --> G[auth_routes]
    B -->|Real-time| H[SocketIO]

Key files:

  • web/app.py - Main entry, starts server
  • web/app_factory.py - Flask app creation with middleware
  • web/routes/ - Blueprint route handlers
  • web/services/ - Business logic services

CLI (ldr)

Location: src/local_deep_research/main.py

Command-line interface for headless research operations.

REST API (/api/v1)

Location: src/local_deep_research/web/api.py

Programmatic access for integrations.

| Endpoint | Method | Purpose |
|---|---|---|
| /api/v1/quick_summary | POST | Quick research summary |
| /api/v1/generate_report | POST | Full research report |
| /api/v1/analyze_documents | POST | Search local collections |
| /api/v1/health | GET | Health check |
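As a rough illustration of programmatic access, the sketch below builds a POST request to the quick-summary endpoint but does not send it. It uses only the standard library. The host and port (`http://localhost:5000`) and the JSON body field `query` are assumptions that this document does not specify. Authentication, which a deployed instance may require, is omitted.

```python
import json
import urllib.request

# Assumed local address of an ldr-web instance; adjust for your deployment.
BASE_URL = "http://localhost:5000"


def build_quick_summary_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/quick_summary.

    The payload shape {"query": ...} is an assumption for illustration.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/quick_summary",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_quick_summary_request("What is SQLCipher?")
# To actually send it: urllib.request.urlopen(req, timeout=300)
```

Keeping request construction separate from sending makes it easy to add authentication headers or swap in another HTTP client.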

Research Execution Flow

sequenceDiagram
    participant User
    participant Web as Flask App
    participant SS as SearchSystem
    participant SF as StrategyFactory
    participant Strat as Strategy
    participant Eng as SearchEngine
    participant LLM as LLM Provider
    participant DB as Database

    User->>Web: Submit Query
    Web->>SS: start_research()
    SS->>SF: create_strategy(name)
    SF-->>SS: Strategy instance

    loop Research Iterations
        SS->>Strat: analyze_topic(query)
        Strat->>LLM: Generate questions
        LLM-->>Strat: Questions

        loop Per Question
            Strat->>Eng: search(question)
            Eng-->>Strat: Results
        end

        Strat->>LLM: Synthesize findings
        LLM-->>Strat: Synthesis
        Strat-->>SS: Findings
    end

    SS->>DB: Save results
    SS-->>Web: Research complete
    Web-->>User: Results via SocketIO
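The two nested loops in the diagram can be sketched in plain Python. `StubLLM`, `StubEngine`, their method names and the fixed iteration count are hypothetical stand-ins. The engine stub exposes `run()` to match the Search Engine Interface. Only the control flow mirrors the diagram: generate questions, search each one, then synthesize once per iteration.

```python
from typing import Dict, List


class StubLLM:
    """Hypothetical LLM stand-in: canned questions and a naive synthesis."""

    def generate_questions(self, query: str, iteration: int) -> List[str]:
        return [f"{query} (aspect {iteration}.{i})" for i in range(2)]

    def synthesize(self, query: str, results: List[Dict[str, str]]) -> str:
        return f"{len(results)} results about {query!r}"


class StubEngine:
    """Hypothetical search engine stand-in: one result per question."""

    def run(self, question: str) -> List[Dict[str, str]]:
        return [{"title": question, "link": "https://example.org", "snippet": "..."}]


def analyze_topic(query: str, llm: StubLLM, engine: StubEngine, iterations: int = 2) -> Dict:
    findings = []
    for iteration in range(1, iterations + 1):            # loop: Research Iterations
        results = []
        for question in llm.generate_questions(query, iteration):
            results.extend(engine.run(question))           # loop: Per Question
        findings.append(llm.synthesize(query, results))    # Synthesize findings
    return {"findings": findings, "iterations": iterations}


report = analyze_topic("SQLCipher", StubLLM(), StubEngine())
```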

Research Status Lifecycle

stateDiagram-v2
    [*] --> QUEUED : Concurrency limit reached
    [*] --> IN_PROGRESS : Slots available

    QUEUED --> IN_PROGRESS : Worker picks up task
    QUEUED --> SUSPENDED : User terminates

    IN_PROGRESS --> COMPLETED : Research succeeds
    IN_PROGRESS --> FAILED : Unrecoverable error
    IN_PROGRESS --> SUSPENDED : User terminates

    COMPLETED --> [*]
    FAILED --> [*]
    SUSPENDED --> [*]

    note right of FAILED
        Set by processor_v2 and
        research_service on errors
    end note

    note left of SUSPENDED
        Set when user clicks
        terminate/stop
    end note

Unused statuses: ResearchStatus also defines three values that research does not use, kept for backward compatibility: PENDING (declared as the model default but never set by any creation path), ERROR (never set; predates FAILED), and CANCELLED (not used by research, but used by benchmarks).
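The lifecycle above can be encoded as a small transition table, including the initial choice between QUEUED and IN_PROGRESS. This is a sketch only. The enum re-declares just the active statuses, with hypothetical lowercase values. `initial_status`, `can_transition` and `transition` are illustrative helpers, not functions from the codebase.

```python
from enum import Enum


class ResearchStatus(str, Enum):
    # Hypothetical values; the real enum also carries PENDING, ERROR and CANCELLED.
    QUEUED = "queued"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"
    SUSPENDED = "suspended"


ALLOWED = {
    ResearchStatus.QUEUED: {ResearchStatus.IN_PROGRESS, ResearchStatus.SUSPENDED},
    ResearchStatus.IN_PROGRESS: {
        ResearchStatus.COMPLETED,
        ResearchStatus.FAILED,
        ResearchStatus.SUSPENDED,
    },
    # Terminal states: no outgoing transitions.
    ResearchStatus.COMPLETED: set(),
    ResearchStatus.FAILED: set(),
    ResearchStatus.SUSPENDED: set(),
}


def initial_status(active: int, limit: int) -> ResearchStatus:
    """New research is queued only when the concurrency limit is reached."""
    return ResearchStatus.QUEUED if active >= limit else ResearchStatus.IN_PROGRESS


def can_transition(current: ResearchStatus, new: ResearchStatus) -> bool:
    return new in ALLOWED[current]


def transition(current: ResearchStatus, new: ResearchStatus) -> ResearchStatus:
    if not can_transition(current, new):
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```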


Module Responsibilities

Core Modules

| Module | Location | Responsibility |
|---|---|---|
| SearchSystem | search_system.py | Orchestrates research, coordinates strategies and engines |
| StrategyFactory | search_system_factory.py | Creates strategy instances based on configuration |
| ReportGenerator | report_generator.py | Generates structured reports from research findings |
| CitationHandler | citation_handler.py | Processes and validates citations |

Search System

| Module | Location | Responsibility |
|---|---|---|
| BaseSearchEngine | web_search_engines/search_engine_base.py | Abstract base for all search engines |
| SearchEngineFactory | web_search_engines/search_engine_factory.py | Creates engine instances |
| RateLimitTracker | web_search_engines/rate_limiting/tracker.py | Adaptive rate limiting |
| RetrieverRegistry | web_search_engines/retriever_registry.py | LangChain retriever integration |
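When SearchEngineFactory creates an engine, one decision it makes is whether to enable LLM relevance filtering. According to the change that introduced `needs_llm_relevance_filter`, the priority is:

1. An explicit per-engine setting.
2. The engine's `needs_llm_relevance_filter` class attribute.
3. The global `skip_relevance_filter`, which only affects unclassified engines.

The function below is a hypothetical sketch of that precedence, not the factory's code. The argument names are assumptions, and so is the fallback when nothing is set (filter unless globally skipped).

```python
from typing import Optional


def should_filter(
    per_engine_setting: Optional[bool],
    needs_llm_relevance_filter: bool,
    skip_relevance_filter: bool,
) -> bool:
    """Hypothetical sketch of the relevance-filter precedence."""
    # 1. An explicit per-engine setting always wins.
    if per_engine_setting is not None:
        return per_engine_setting
    # 2. Keyword-based engines opt in via the class attribute,
    #    so the global skip cannot switch filtering off for them.
    if needs_llm_relevance_filter:
        return True
    # 3. Unclassified engines fall through to the global skip (assumed default).
    return not skip_relevance_filter
```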

Strategy System

| Module | Location | Responsibility |
|---|---|---|
| BaseSearchStrategy | advanced_search_system/strategies/base_strategy.py | Abstract base for strategies |
| FindingsRepository | advanced_search_system/findings/ | Accumulates research findings |
| QuestionGenerator | advanced_search_system/questions/ | Generates research questions |

Web Application

| Module | Location | Responsibility |
|---|---|---|
| SocketIOService | web/services/socket_service.py | Real-time communication |
| ResearchService | web/services/research_service.py | Research execution |
| QueueManager | web/queue/ | Background task queue |
| SessionManager | web/auth/session_manager.py | User session handling |

Data Layer

| Module | Location | Responsibility |
|---|---|---|
| Models | database/models/ | SQLAlchemy ORM models |
| SessionContext | database/session_context.py | Thread-safe DB sessions |
| EncryptedDB | database/encrypted_db.py | SQLCipher integration |
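The session pattern behind SessionContext (one database per user, accessed through a context manager that commits on success and rolls back on error) can be sketched with the standard library. This is a hypothetical illustration. `user_db_session` and the `<username>.db` file layout are assumptions, and plain `sqlite3` stands in for SQLCipher, so nothing here is encrypted.

```python
import sqlite3
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def user_db_session(data_dir: Path, username: str) -> Iterator[sqlite3.Connection]:
    """Hypothetical per-user session: one database file per user.

    Plain sqlite3 stands in for SQLCipher, so the file is NOT encrypted.
    """
    conn = sqlite3.connect(data_dir / f"{username}.db")
    try:
        yield conn
        conn.commit()      # reached only if the with-block raised nothing
    except Exception:
        conn.rollback()    # discard partial writes, then re-raise
        raise
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    with user_db_session(data_dir, "alice") as db:
        db.execute("CREATE TABLE notes (body TEXT)")
        db.execute("INSERT INTO notes VALUES ('hi')")
    with user_db_session(data_dir, "bob") as db:   # bob sees none of alice's tables
        bob_tables = db.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
    with user_db_session(data_dir, "alice") as db:
        alice_rows = db.execute("SELECT body FROM notes").fetchall()
```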

LLM Integration

| Module | Location | Responsibility |
|---|---|---|
| LLM Providers | llm/providers/implementations/ | Provider-specific LLM wrappers |
| AutoDiscovery | llm/providers/auto_discovery.py | Dynamic provider detection |
| LLMRegistry | llm/llm_registry.py | Custom LLM registration |
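Custom LLM registration follows a registry pattern: callers register a name together with a factory, and a lookup by name builds a model instance. `SimpleLLMRegistry`, `register`, `is_registered` and `create` are hypothetical names and do not describe the API of llm_registry.py. The lock is there because, as the Threading Model below describes, several research threads may look up models concurrently.

```python
import threading
from typing import Any, Callable, Dict


class SimpleLLMRegistry:
    """Hypothetical thread-safe mapping from a name to an LLM factory."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[..., Any]] = {}
        self._lock = threading.Lock()

    def register(self, name: str, factory: Callable[..., Any]) -> None:
        with self._lock:
            self._factories[name] = factory

    def is_registered(self, name: str) -> bool:
        with self._lock:
            return name in self._factories

    def create(self, name: str, **kwargs: Any) -> Any:
        with self._lock:
            factory = self._factories.get(name)
        if factory is None:
            raise KeyError(f"no LLM registered under {name!r}")
        return factory(**kwargs)   # build the model outside the lock


registry = SimpleLLMRegistry()
# A dict stands in for a real chat model object.
registry.register("echo", lambda temperature=0.7: {"model": "echo", "temperature": temperature})
```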

Threading Model

graph TB
    subgraph "Main Thread"
        FLASK[Flask Server]
        SOCKETIO[SocketIO Handler]
    end

    subgraph "Research Threads"
        RT1[Research Thread 1]
        RT2[Research Thread 2]
        RTN[Research Thread N]
    end

    subgraph "Queue Processor"
        QP[Queue Processor Thread]
    end

    subgraph "Thread-Local Storage"
        TC[Thread Context<br/>Settings Snapshot]
        DBS[DB Session<br/>Per-User]
    end

    FLASK --> RT1
    FLASK --> RT2
    FLASK --> RTN

    RT1 --> TC
    RT2 --> TC
    RTN --> TC

    QP --> RT1

    RT1 -.-> SOCKETIO
    RT2 -.-> SOCKETIO

Key Threading Concepts:

  1. Thread Context (config/thread_settings.py)

    • Each research thread has its own settings snapshot
    • Prevents race conditions on configuration changes
  2. Per-User DB Sessions (database/session_context.py)

    • Each user has an isolated SQLCipher database
    • Sessions are thread-local via context manager
  3. Queue Processing (web/queue/)

    • Background queue for long-running research
    • Processes items from QueuedResearch table
  4. SocketIO Updates

    • Research threads emit progress via SocketIO
    • Uses threading async mode (not asyncio)

Configuration System

graph TB
    subgraph "Configuration Sources"
        ENV[Environment Variables]
        DB[(Database Settings<br/>per-user)]
        JSON[Default Settings<br/>default_settings.json]
    end

    subgraph "Settings Management"
        SM[SettingsManager<br/>manager.py]
    end

    subgraph "Runtime Access"
        SNAP[Settings Snapshot<br/>Thread-safe copy]
        TC[Thread Context<br/>thread_settings.py]
    end

    ENV --> SM
    JSON --> SM
    DB --> SM
    SM --> SNAP
    SNAP --> TC

Configuration Flow:

  1. Defaults - Default settings loaded from JSON
  2. Environment - Environment variables override defaults
  3. Database - User settings loaded from encrypted per-user DB
  4. Snapshot - Thread-safe copy created for each research
  5. Access - Code reads from snapshot via thread context
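Steps 1 to 4 amount to a layered merge in which later sources win, followed by a read-only snapshot. The sketch applies the layers in the order listed above. It assumes a hypothetical `LDR_` environment-variable naming scheme (for example, `llm.model` maps to `LDR_LLM_MODEL`). The real SettingsManager may use a different naming scheme and precedence, so treat both as assumptions.

```python
import os
from types import MappingProxyType
from typing import Any, Dict, Mapping, Optional


def env_name(key: str) -> str:
    # Hypothetical mapping: "llm.model" -> "LDR_LLM_MODEL".
    return "LDR_" + key.replace(".", "_").upper()


def build_snapshot(
    defaults: Dict[str, Any],
    user_db: Dict[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> Mapping[str, Any]:
    environ = os.environ if environ is None else environ
    merged = dict(defaults)                    # 1. defaults (from JSON)
    for key in defaults:                       # 2. environment overrides defaults
        if env_name(key) in environ:
            merged[key] = environ[env_name(key)]   # note: env values are strings
    merged.update(user_db)                     # 3. per-user DB values
    return MappingProxyType(merged)            # 4. read-only snapshot


snapshot = build_snapshot(
    defaults={"llm.model": "default-model", "search.max_results": 10},
    user_db={"search.max_results": 20},
    environ={"LDR_LLM_MODEL": "env-model"},
)
```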

Key Settings Categories:

| Category | Examples |
|---|---|
| llm.* | Provider, model, temperature, API keys |
| search.* | Engine selection, max results, rate limits |
| app.* | Debug mode, logging, UI preferences |
| notifications.* | Email, webhook configurations |

Key Interfaces

Search Engine Interface

All search engines implement BaseSearchEngine:

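from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseSearchEngine(ABC):
    # Classification flags
    is_public: bool = True
    is_generic: bool = True     # broad coverage; does not imply good ranking
    is_scientific: bool = False
    is_local: bool = False
    is_news: bool = False
    is_code: bool = False
    is_lexical: bool = False    # informational: keyword/lexical search engine
    # Behavioral flag: the factory reads this to auto-enable LLM relevance filtering
    needs_llm_relevance_filter: bool = False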

    @abstractmethod
    def run(self, query: str) -> List[Dict[str, Any]]:
        """Execute search and return results."""
        # Returns: [{"title": ..., "link": ..., "snippet": ...}]
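A new keyword-based engine would subclass BaseSearchEngine, set the classification flags that apply, and implement `run()`. In the sketch below, the trimmed base class re-declares only what the example needs so that it runs on its own. `ToyKeywordEngine` and its in-memory corpus are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseSearchEngine(ABC):
    # Trimmed copy of the interface above so this example is self-contained.
    is_lexical: bool = False
    needs_llm_relevance_filter: bool = False

    @abstractmethod
    def run(self, query: str) -> List[Dict[str, Any]]:
        """Execute search and return results."""


class ToyKeywordEngine(BaseSearchEngine):
    """Hypothetical engine doing naive keyword matching over a fixed corpus."""

    is_lexical = True                  # keyword matching, no semantic ranking
    needs_llm_relevance_filter = True  # ask the factory to enable LLM filtering

    def __init__(self, corpus: Dict[str, str]):
        self.corpus = corpus  # maps link -> document text

    def run(self, query: str) -> List[Dict[str, Any]]:
        terms = query.lower().split()
        results = []
        for link, text in self.corpus.items():
            if all(term in text.lower() for term in terms):
                results.append({"title": text[:40], "link": link, "snippet": text[:200]})
        return results


engine = ToyKeywordEngine({
    "https://example.org/a": "SQLCipher encrypts SQLite databases",
    "https://example.org/b": "Flask is a web framework",
})
hits = engine.run("sqlite encrypts")
```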

Strategy Interface

All strategies implement BaseSearchStrategy:

from abc import ABC, abstractmethod
from typing import Dict


class BaseSearchStrategy(ABC):
    def __init__(self, search, model, all_links_of_system,
                 settings_snapshot, **kwargs):
        ...

    @abstractmethod
    def analyze_topic(self, query: str) -> Dict:
        """Execute research strategy."""
        # Returns: {
        #     "findings": [...],
        #     "iterations": int,
        #     "questions": {...},
        #     "formatted_findings": str,
        #     "current_knowledge": {...}
        # }
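A strategy only has to return a dict with the keys listed above. The sketch below is a hypothetical single-pass strategy. The trimmed base class, the `search.run()` call, the `_FixedSearch` stub and the values placed under each key are assumptions for illustration, since this overview documents only the key names.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseSearchStrategy(ABC):
    # Trimmed stand-in for the real base class, for a self-contained example.
    def __init__(self, search, model, all_links_of_system: List[Dict[str, Any]],
                 settings_snapshot: Dict[str, Any], **kwargs):
        self.search = search
        self.model = model
        self.all_links_of_system = all_links_of_system
        self.settings_snapshot = settings_snapshot

    @abstractmethod
    def analyze_topic(self, query: str) -> Dict:
        """Execute research strategy."""


class SinglePassStrategy(BaseSearchStrategy):
    """Hypothetical strategy: one search for the query itself, no follow-ups."""

    def analyze_topic(self, query: str) -> Dict:
        results = self.search.run(query)
        self.all_links_of_system.extend(results)  # shared link list (assumed use)
        snippets = "\n".join(r["snippet"] for r in results)
        return {
            "findings": [{"phase": "initial", "content": snippets}],
            "iterations": 1,
            "questions": {0: [query]},
            "formatted_findings": snippets,
            "current_knowledge": {"summary": snippets},
        }


class _FixedSearch:
    """Hypothetical search stub returning one canned result."""

    def run(self, query: str) -> List[Dict[str, Any]]:
        return [{"title": "t", "link": "https://example.org", "snippet": "s1"}]


links: List[Dict[str, Any]] = []
output = SinglePassStrategy(_FixedSearch(), None, links, {}).analyze_topic("q")
```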

LLM Provider Interface

All providers extend OpenAICompatibleProvider:

from langchain_core.language_models import BaseChatModel


class OpenAICompatibleProvider:
    provider_name: str
    api_key_setting: str
    url_setting: str
    default_base_url: str
    default_model: str

    @classmethod
    def create_llm(cls, model_name, temperature, **kwargs) -> BaseChatModel:
        """Create LangChain LLM instance."""
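Because providers are configured through class attributes, a new OpenAI-compatible backend can largely be declared with attributes alone. The sketch below shows a hypothetical subclass. It also adds a hypothetical `resolve_base_url` helper that prefers a user-configured URL from a settings snapshot and falls back to `default_base_url`. The setting keys, the URL and the fallback rule are assumptions. `create_llm` is omitted because its real body depends on LangChain.

```python
from typing import Any, Mapping


class OpenAICompatibleProvider:
    # Trimmed stand-in for the real base class.
    provider_name: str
    api_key_setting: str
    url_setting: str
    default_base_url: str
    default_model: str

    @classmethod
    def resolve_base_url(cls, settings: Mapping[str, Any]) -> str:
        # Hypothetical helper: a non-empty user setting wins, else the class default.
        return settings.get(cls.url_setting) or cls.default_base_url


class ExampleLocalProvider(OpenAICompatibleProvider):
    """Hypothetical provider for a local OpenAI-compatible server."""

    provider_name = "Example Local"
    api_key_setting = "llm.example_local.api_key"
    url_setting = "llm.example_local.url"
    default_base_url = "http://localhost:8000/v1"
    default_model = "example-model"
```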

Directory Structure

src/local_deep_research/
├── search_system.py           # Main orchestrator
├── search_system_factory.py   # Strategy factory
├── report_generator.py        # Report generation
├── citation_handler.py        # Citation processing
│
├── web/                       # Flask application
│   ├── app.py                # Entry point
│   ├── app_factory.py        # App creation
│   ├── routes/               # Blueprint handlers
│   ├── services/             # Business logic
│   ├── queue/                # Task queue
│   └── auth/                 # Authentication
│
├── advanced_search_system/    # Search strategies
│   ├── strategies/           # 32 strategy implementations
│   ├── questions/            # Question generation
│   ├── findings/             # Findings management
│   └── ...
│
├── web_search_engines/        # Search engines
│   ├── engines/              # 30+ engine implementations
│   ├── search_engine_base.py # Abstract base
│   ├── search_engine_factory.py
│   └── rate_limiting/        # Adaptive rate limiting
│
├── database/                  # Data layer
│   ├── models/               # 20+ ORM models
│   ├── session_context.py    # Session management
│   └── encrypted_db.py       # SQLCipher
│
├── llm/                       # LLM integration
│   ├── providers/            # Provider implementations
│   └── llm_registry.py       # Custom LLM registration
│
├── config/                    # Configuration
│   ├── llm_config.py         # LLM setup
│   ├── search_config.py      # Search setup
│   └── thread_settings.py    # Thread context
│
├── settings/                  # Settings management
│   └── manager.py            # SettingsManager
│
└── api/                       # Programmatic API
    ├── client.py             # HTTP client
    └── research_functions.py # Direct functions

See Also