Extension Guide
This guide explains how to extend Local Deep Research with custom components.
Table of Contents
- Adding Custom Search Engines
- Adding Custom Search Strategies
- Using LangChain Retrievers
- Adding Custom LLM Providers
- Registering Custom LLMs
Adding Custom Search Engines
Search engines are responsible for fetching results from external sources. All engines extend BaseSearchEngine.
Basic Search Engine
Create a new file in src/local_deep_research/web_search_engines/engines/:
# search_engine_custom.py
from typing import Any, Dict, List, Optional

from langchain_core.language_models import BaseLLM
from loguru import logger

from ..search_engine_base import BaseSearchEngine


class CustomSearchEngine(BaseSearchEngine):
    """Custom search engine implementation."""

    # Classification flags - set appropriately for your engine
    is_public = True  # Searches public internet
    is_generic = False  # Specialized (vs general web search)
    is_scientific = False  # Academic/scientific content
    is_local = False  # Local document search
    is_news = False  # News content
    is_code = False  # Code repositories
    is_lexical = False  # Uses keyword/lexical search (informational)
    needs_llm_relevance_filter = False  # Set True to auto-enable LLM relevance filtering

    def __init__(
        self,
        max_results: int = 10,
        credential: Optional[str] = None,
        llm: Optional[BaseLLM] = None,
        max_filtered_results: Optional[int] = None,
        **kwargs,
    ):
        """
        Initialize the search engine.

        Args:
            max_results: Maximum number of results to return
            credential: API credential for the service (if required)
            llm: Language model for relevance filtering
            max_filtered_results: Max results after filtering
            **kwargs: Additional parameters
        """
        super().__init__(
            llm=llm,
            max_filtered_results=max_filtered_results,
            max_results=max_results,
        )
        self.credential = credential

    def _get_previews(self, query: str) -> List[Dict[str, Any]]:
        """
        Get preview results (first phase of two-phase retrieval).

        Args:
            query: Search query

        Returns:
            List of preview dictionaries with keys:
            - id: Unique identifier
            - title: Result title
            - snippet: Brief description/summary
            - link: URL to the content
            - source: Source name (e.g., "CustomEngine")
        """
        logger.info(f"Searching custom engine for: {query}")

        # Apply rate limiting before request
        self._last_wait_time = self.rate_tracker.apply_rate_limit(self.engine_type)

        # Your search implementation here
        results = self._call_api(query)

        previews = []
        for item in results:
            previews.append({
                "id": item["id"],
                "title": item["title"],
                "snippet": item["description"],
                "link": item["url"],
                "source": "CustomEngine",
            })
        return previews

    def _get_full_content(
        self, relevant_items: List[Dict[str, Any]]
    ) -> List[Dict[str, Any]]:
        """
        Get full content for relevant items (second phase).

        Args:
            relevant_items: Items that passed relevance filtering

        Returns:
            Items enriched with full content
        """
        results = []
        for item in relevant_items:
            # Apply rate limiting
            self._last_wait_time = self.rate_tracker.apply_rate_limit(self.engine_type)

            # Fetch full content
            full_content = self._fetch_content(item["link"])

            result = item.copy()
            result["content"] = full_content
            result["full_content"] = full_content
            results.append(result)
        return results

    def _call_api(self, query: str) -> List[Dict]:
        """Your API implementation."""
        # Implement your search logic here
        pass

    def _fetch_content(self, url: str) -> str:
        """Fetch full content from URL."""
        # Implement content fetching
        pass
Registering the Engine
Option 1: Register in engine_registry.py (Required)
Add the engine to src/local_deep_research/web_search_engines/engine_registry.py so the system knows how to load it. The registry maps engine names to their Python module and class:
# In engine_registry.py — ENGINE_REGISTRY dict
"custom_engine": EngineEntry(
module_path=".engines.search_engine_custom",
class_name="CustomSearchEngine",
),
Module paths must be relative (starting with .) and listed in the security whitelist (ALLOWED_MODULE_PATHS in module_whitelist.py).
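The two conditions above (relative path, whitelisted) can be sketched as one check. This is only an illustration of the rule, not the loader's actual code: is_loadable_module_path is a hypothetical helper, and the whitelist contents shown here are made up for the example.

```python
# Hypothetical whitelist contents; the real set lives in module_whitelist.py.
ALLOWED_MODULE_PATHS = frozenset({".engines.search_engine_custom"})


def is_loadable_module_path(module_path: str, allowed=ALLOWED_MODULE_PATHS) -> bool:
    """Accept only relative module paths that are explicitly whitelisted."""
    return module_path.startswith(".") and module_path in allowed
```

Under this sketch, an absolute path such as "os.path" is rejected even before the whitelist is consulted, and a relative path that was never added to the whitelist is rejected as well.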
Option 1b: Configure user-facing settings (Optional)
After registering in the engine registry, you can expose user-configurable settings via the settings database:
# Key: search.engine.web.custom_engine
config = {
"requires_api_key": True,
"requires_llm": False,
"description": "Custom search engine for specific use case",
"strengths": ["Feature 1", "Feature 2"],
"weaknesses": ["Limitation 1"],
"reliability": 0.8,
"default_params": {
"max_results": 10
}
}
Option 2: Modify Factory (For Core Engines)
Add to search_engine_factory.py:
def create_search_engine(engine_name: str, ...) -> BaseSearchEngine:
    # ... existing code ...
    if engine_name.lower() == "custom_engine":
        from .engines.search_engine_custom import CustomSearchEngine
        return CustomSearchEngine(
            max_results=max_results,
            # CustomSearchEngine names this parameter `credential`; passing
            # api_key= would be swallowed by **kwargs and ignored.
            credential=api_key,
            llm=llm,
            **kwargs
        )
Search Engine Best Practices
- Always apply rate limiting before API calls:
    self._last_wait_time = self.rate_tracker.apply_rate_limit(self.engine_type)
- Set classification flags accurately; they affect engine selection. For keyword-based engines without ML ranking, set is_lexical = True and needs_llm_relevance_filter = True, and the factory will auto-enable LLM relevance filtering.
- Handle errors gracefully: return an empty list on failure, don't crash.
- Use logging for debugging:
    from loguru import logger
    logger.info(f"Searching for: {query}")
    logger.error(f"API error: {e}")
- Support snippet-only mode by checking the config:
    from ...config import search_config
    if search_config.SEARCH_SNIPPETS_ONLY:
        return relevant_items  # Skip full content
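The "handle errors gracefully" advice can be sketched as a small wrapper around the engine's API call. This is an illustrative sketch only: safe_previews and the fetch callable are hypothetical names, not part of BaseSearchEngine, and the standard logging module stands in for loguru so the snippet runs without extra dependencies.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


def safe_previews(
    fetch: Callable[[str], List[Dict[str, Any]]], query: str
) -> List[Dict[str, Any]]:
    """Run a search call and return [] instead of raising on failure.

    `fetch` is a hypothetical stand-in for an engine's own API call.
    """
    try:
        results = fetch(query)
    except Exception as exc:  # broad on purpose: one failing engine must not abort research
        logger.error("Search failed for %r: %s", query, exc)
        return []
    # Treat a missing result (None) the same as "no hits".
    return list(results or [])
```

Inside an engine, the same pattern would wrap the body of _get_previews, so callers always receive a list.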
Adding Custom Search Strategies
Strategies define how research is conducted - question generation, iteration, and synthesis.
Basic Strategy
Create a new file in src/local_deep_research/advanced_search_system/strategies/:
# my_custom_strategy.py
from typing import Dict, List, Optional

from loguru import logger

from .base_strategy import BaseSearchStrategy


class MyCustomStrategy(BaseSearchStrategy):
    """Custom search strategy implementation."""

    def __init__(
        self,
        search=None,
        model=None,
        all_links_of_system=None,
        settings_snapshot=None,
        max_iterations: int = 3,
        **kwargs,
    ):
        """
        Initialize the strategy.

        Args:
            search: Search engine instance
            model: LLM for question generation and synthesis
            all_links_of_system: Shared list for discovered links
            settings_snapshot: Configuration snapshot
            max_iterations: Maximum research iterations
            **kwargs: Additional parameters
        """
        super().__init__(
            all_links_of_system=all_links_of_system,
            settings_snapshot=settings_snapshot,
        )
        self.search = search
        self.model = model
        self.max_iterations = max_iterations

    def analyze_topic(self, query: str) -> Dict:
        """
        Execute the research strategy.

        Args:
            query: Research query

        Returns:
            Dict with:
            - findings: List of research findings
            - iterations: Number of iterations completed
            - questions: Dict of questions by iteration
            - formatted_findings: Formatted output string
            - current_knowledge: Accumulated knowledge dict
            - error: Optional error message
        """
        logger.info(f"Starting custom strategy for: {query}")

        findings = []
        current_knowledge = {}

        try:
            for iteration in range(1, self.max_iterations + 1):
                # Update progress
                self._update_progress(
                    f"Iteration {iteration}/{self.max_iterations}",
                    progress_percent=int(iteration / self.max_iterations * 100),
                    metadata={"iteration": iteration}
                )

                # Generate questions for this iteration
                questions = self._generate_questions(query, current_knowledge)
                self.questions_by_iteration[iteration] = questions

                # Search for each question
                for question in questions:
                    results = self._search(question)
                    findings.extend(results)

                    # Track links
                    for result in results:
                        if result.get("link"):
                            self.all_links_of_system.append(result["link"])

                # Synthesize findings
                current_knowledge = self._synthesize(findings)

                # Check if we should stop early
                if self._should_stop(current_knowledge):
                    logger.info(f"Early stopping at iteration {iteration}")
                    break

            # Format final output
            formatted = self._format_findings(findings, current_knowledge)

            return {
                "findings": findings,
                "iterations": iteration,
                "questions": self.questions_by_iteration,
                "formatted_findings": formatted,
                "current_knowledge": current_knowledge,
            }
        except Exception as e:
            logger.error(f"Strategy error: {e}")
            return {
                "findings": findings,
                "iterations": 0,
                "questions": self.questions_by_iteration,
                "formatted_findings": "",
                "current_knowledge": current_knowledge,
                "error": str(e),
            }

    def _generate_questions(self, query: str, knowledge: Dict) -> List[str]:
        """Generate research questions using the LLM."""
        prompt = f"""Given the query: {query}
        And current knowledge: {knowledge}
        Generate 3 specific research questions."""
        response = self.model.invoke(prompt)
        # Parse response into questions
        return self._parse_questions(response.content)

    def _search(self, question: str) -> List[Dict]:
        """Execute search for a question."""
        return self.search.run(question)

    def _synthesize(self, findings: List[Dict]) -> Dict:
        """Synthesize findings into knowledge."""
        # Implement synthesis logic
        return {"summary": "...", "key_points": [...]}

    def _should_stop(self, knowledge: Dict) -> bool:
        """Check if research should stop early."""
        # Implement stopping criteria
        return False

    def _format_findings(self, findings: List[Dict], knowledge: Dict) -> str:
        """Format findings as output string."""
        # Implement formatting
        return "Formatted research results..."

    def _parse_questions(self, content: str) -> List[str]:
        """Parse LLM response into question list."""
        # Implement parsing
        return content.strip().split("\n")
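The _parse_questions placeholder splits on newlines, which keeps blank lines and any list numbering the model emits. A slightly more defensive parser might look like the sketch below. parse_questions is a hypothetical standalone helper, and the numbering and bullet patterns it strips are an assumption about how the LLM formats its reply.

```python
import re
from typing import List

# Leading list markers such as "1.", "2)", "-", "*" or "•" (assumed formats).
_LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s*")


def parse_questions(content: str, limit: int = 3) -> List[str]:
    """Turn a raw LLM reply into at most `limit` clean question strings."""
    questions = []
    for line in content.splitlines():
        cleaned = _LIST_MARKER.sub("", line).strip()
        if cleaned:  # skip blank lines
            questions.append(cleaned)
    return questions[:limit]
```

The limit of 3 mirrors the "Generate 3 specific research questions" prompt; a strategy that asks for a different number would pass its own limit.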
Registering the Strategy
Add to search_system_factory.py:
def create_strategy(strategy_name: str, ...) -> BaseSearchStrategy:
    strategy_name_lower = strategy_name.lower()

    # ... existing strategies ...

    elif strategy_name_lower in ["my-custom", "mycustom", "custom"]:
        from .advanced_search_system.strategies.my_custom_strategy import (
            MyCustomStrategy,
        )
        return MyCustomStrategy(
            search=search,
            model=model,
            all_links_of_system=all_links_of_system,
            settings_snapshot=settings_snapshot,
            **kwargs
        )
Strategy Best Practices
- Use progress callbacks to update the UI:
    self._update_progress("Searching...", progress_percent=50)
- Track all discovered links in self.all_links_of_system
- Store questions by iteration in self.questions_by_iteration
- Access settings via the snapshot:
    max_results = self.get_setting("search.max_results", default=10)
- Handle errors gracefully: return partial results with an error message
Using LangChain Retrievers
The easiest way to add custom search is through LangChain retrievers.
Registering a Retriever
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from local_deep_research.web_search_engines.retriever_registry import retriever_registry
# Create your retriever
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(documents, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
# Register globally
retriever_registry.register("my_documents", retriever)
# Now use in research
from local_deep_research.api import quick_summary
result = quick_summary(
query="What does the documentation say about X?",
search_tool="my_documents", # Use registered retriever
programmatic_mode=True
)
Passing Retrievers Directly
from local_deep_research.api import quick_summary
# Create retriever
retriever = my_vectorstore.as_retriever()
# Pass directly to API
result = quick_summary(
query="Search my documents",
retrievers={"private_docs": retriever},
search_tool="private_docs",
programmatic_mode=True
)
Registry Methods
from local_deep_research.web_search_engines.retriever_registry import retriever_registry
# Register
retriever_registry.register("name", retriever)
retriever_registry.register_multiple({"a": ret1, "b": ret2})
# Query
retriever_registry.get("name")
retriever_registry.is_registered("name")
retriever_registry.list_registered()
# Remove
retriever_registry.unregister("name")
retriever_registry.clear()
Adding Custom LLM Providers
LLM providers wrap language model APIs for use in LDR.
Basic Provider
Create in src/local_deep_research/llm/providers/implementations/:
# my_provider.py
from typing import Dict, Optional

from langchain_core.language_models import BaseChatModel
from langchain_openai import ChatOpenAI

from ..openai_base import OpenAICompatibleProvider


class MyProvider(OpenAICompatibleProvider):
    """Custom LLM provider."""

    provider_name = "My Provider"
    api_key_setting = "llm.my_provider.api_key"
    url_setting = "llm.my_provider.url"
    default_base_url = "https://api.myprovider.com/v1"
    default_model = "my-model-v1"
    # Optional: set to True if missing key should fall back to a placeholder
    # rather than raising ValueError.
    api_key_optional = False

    @classmethod
    def create_llm(
        cls,
        model_name: Optional[str] = None,
        temperature: float = 0.7,
        settings_snapshot: Optional[Dict] = None,
        **kwargs
    ) -> BaseChatModel:
        """
        Create LLM instance.

        Args:
            model_name: Model to use
            temperature: Sampling temperature
            settings_snapshot: Configuration
            **kwargs: Additional parameters

        Returns:
            LangChain chat model instance
        """
        from ....config.thread_settings import get_setting_from_snapshot

        # Resolve API key via the base helper. Raises ValueError when
        # required and missing, returns the unified placeholder when
        # api_key_optional=True and the key is unset.
        api_key = cls.resolve_api_key_or_placeholder(settings_snapshot)

        # Get base URL
        base_url = get_setting_from_snapshot(
            cls.url_setting,
            cls.default_base_url,
            settings_snapshot=settings_snapshot,
        )

        return ChatOpenAI(
            model=model_name or cls.default_model,
            temperature=temperature,
            api_key=api_key,
            base_url=base_url,
            **kwargs
        )

    @classmethod
    def list_models(cls, settings_snapshot: Optional[Dict] = None) -> list[str]:
        """List available models."""
        return ["my-model-v1", "my-model-v2", "my-model-large"]
Register in Auto-Discovery
Drop the provider class file into
src/local_deep_research/llm/providers/implementations/. Auto-discovery
will scan that directory at import time and register every class whose
name ends with Provider, subclasses BaseLLMProvider, and has
provider_name set to a real value (i.e., overridden away from the
"unknown" default). Setting provider_name = "unknown" — or leaving
it unset on the class — will cause the class to be silently filtered
out of auto-discovery, which is a common gotcha when copying an
existing provider as a template.
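The three filter conditions described above can be sketched as a single predicate. This is not the code in auto_discovery.py: BaseLLMProvider here is a local stand-in class and is_discoverable is a hypothetical name, used only to show why a class copied from a template can vanish without an error.

```python
class BaseLLMProvider:
    """Local stand-in for the real base class (assumption for this sketch)."""

    provider_name = "unknown"


def is_discoverable(obj) -> bool:
    """Mirror the documented filter: name suffix, base class, real provider_name."""
    return (
        isinstance(obj, type)
        and issubclass(obj, BaseLLMProvider)
        and obj.__name__.endswith("Provider")
        and getattr(obj, "provider_name", "unknown") != "unknown"
    )


class MyProvider(BaseLLMProvider):
    provider_name = "My Provider"  # passes all three checks


class ForgottenProvider(BaseLLMProvider):
    pass  # inherits "unknown", so it would be filtered out


class MyBackend(BaseLLMProvider):
    provider_name = "My Backend"  # name lacks the "Provider" suffix, also filtered out
```

A quick check such as is_discoverable(MyProvider) in a test is a cheap way to catch the "copied a template and forgot provider_name" mistake early.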
Optional cloud-metadata registration in auto_discovery.py:
PROVIDER_METADATA = {
# ... existing providers ...
"my_provider": ProviderMetadata(
provider_id="my_provider",
provider_name="My Provider",
company_name="My Company",
region="US",
country="United States",
data_location="US",
gdpr_compliant=False,
is_cloud=True,
),
}
Registering Custom LLMs
For programmatic use, register LLMs directly:
from langchain_openai import ChatOpenAI
from local_deep_research.llm.llm_registry import register_llm, get_llm_from_registry
# Create custom LLM
custom_llm = ChatOpenAI(
model="gpt-4",
temperature=0.5,
api_key="...",
)
# Register it
register_llm("my_gpt4", custom_llm)
# Use in research
from local_deep_research.api import quick_summary
result = quick_summary(
query="Research topic",
llms={"my_gpt4": custom_llm}, # Or use registered name
provider_name="my_gpt4",
programmatic_mode=True
)
Factory Functions
You can also register factory functions:
def create_my_llm(temperature=0.7):
    return ChatOpenAI(model="gpt-4", temperature=temperature)

register_llm("my_factory", create_my_llm)

# Will be called when needed
llm = get_llm_from_registry("my_factory")
Registry caveat
The built-in providers (ollama, openai, anthropic, ...) live in the same
registry, auto-registered at import time. clear_llm_registry() removes
them too, and get_llm() has no other construction path — every provider
will raise "was not registered by auto-discovery" until you restore them:
from local_deep_research.llm.providers import discover_providers
discover_providers(force_refresh=True)
Prefer unregister_llm("<your name>") over clear_llm_registry() to
remove only your own registrations.
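To make the caveat concrete, here is a toy registry that mimics the behaviour described above. It is a sketch under assumptions, not the llm_registry implementation: ToyRegistry and its methods are hypothetical, and the "built-in" names are just sample strings.

```python
class ToyRegistry:
    """Minimal name -> LLM mapping shared by built-in and user entries."""

    def __init__(self, builtins):
        self._builtins = dict(builtins)
        self._entries = dict(builtins)  # built-ins share the same store

    def register(self, name, llm):
        self._entries[name] = llm

    def unregister(self, name):
        self._entries.pop(name, None)  # removes one entry only

    def clear(self):
        self._entries.clear()  # wipes built-ins as well

    def rediscover(self):
        # Plays the role of discover_providers(force_refresh=True).
        self._entries.update(self._builtins)

    def names(self):
        return sorted(self._entries)


registry = ToyRegistry({"ollama": object(), "openai": object()})
registry.register("my_gpt4", object())

registry.unregister("my_gpt4")
after_unregister = registry.names()  # built-ins still present

registry.clear()
after_clear = registry.names()  # nothing left, built-ins included

registry.rediscover()
after_rediscover = registry.names()  # built-ins restored
```

The point of the sketch is the asymmetry: unregistering a single name leaves the built-ins untouched, while clearing requires an explicit rediscovery step before any built-in provider can be used again.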
See Also
- Architecture Overview - System architecture
- Database Schema - Data models
- Full Configuration Reference - All settings and environment variables
- Troubleshooting - Common issues
- API Quickstart - Using the API