mirror of https://github.com/LearningCircuit/local-deep-research.git synced 2026-06-15 19:46:56 +03:00

LearningCircuit 890c84e534 docs: link auto-generated Configuration Reference across docs & fix stale env var docs (#2472 )

- Add "Config Reference" link to Settings page "Learn & Get Help" bar
- Overhaul docs/env_configuration.md: remove stale Dynaconf references,
  fix wrong double-underscore env var format, remove documented-as-fixed
  bug, replace duplicate tables with links to CONFIGURATION.md
- Fix broken case-sensitive link in docs/deployment/unraid.md
- Add CONFIGURATION.md cross-references to 12 docs' "See Also" sections
- Update .env.template with correct LDR_-prefixed variable names
- Add config reference comment to docker-compose.yml environment block

2026-02-28 13:46:34 +01:00


Search Engines Guide

Local Deep Research integrates with multiple search engines to provide comprehensive research capabilities. This guide covers all available search engines, their specializations, and configuration details.

Note: This documentation is maintained by the community and may contain inaccuracies. While we strive to keep it up-to-date, please verify critical information and report any errors via GitHub Issues.

Overview

LDR supports three categories of search engines:

Free Search Engines - No API key required
Premium Search Engines - Require API keys but offer enhanced features
Custom Sources - Your own documents and databases

Search Engine Selection

Auto Engine (Recommended)

The auto search engine intelligently selects the most appropriate engines based on your query:

from local_deep_research.api import quick_summary

result = quick_summary(
    query="What are the latest advances in quantum computing?",
    search_tool="auto",  # Let LDR choose the engines for this query
)

Free Search Engines

Academic Search Engines

arXiv

Specialization: Scientific papers and preprints
Best for: Physics, mathematics, computer science, biology
Results: Direct access to research papers
Rate Limit: Moderate - automatic retry on limits

PubMed

Specialization: Biomedical and life science literature
Best for: Medical research, clinical studies, biology
Results: Abstracts and links to full papers
Rate Limit: Generous - rarely hits limits

Semantic Scholar

Specialization: Academic literature across all fields
Best for: Cross-disciplinary research, citation networks
Results: Paper summaries with citation context
Rate Limit: Moderate - adaptive rate limiting handles this

General Purpose

Wikipedia

Specialization: General knowledge and encyclopedic information
Best for: Background information, concepts, facts
Results: Well-structured article content
Rate Limit: Very generous

SearXNG (Highly Recommended)

Specialization: Meta-search engine aggregating multiple sources
Best for: Comprehensive web search with privacy
Results: Aggregated results from Google, Bing, DuckDuckGo, etc.

Setup:

docker pull searxng/searxng
docker run -d -p 8080:8080 --name searxng searxng/searxng

Configuration: Set URL to http://localhost:8080 in settings

DuckDuckGo

Specialization: Privacy-focused web search
Best for: General web queries without tracking
Results: Web pages, instant answers
Rate Limit: Strict - use SearXNG for better reliability

Technical Search

GitHub

Specialization: Code repositories and documentation
Best for: Finding code examples, libraries, technical solutions
Results: Repository information, code snippets, issues
Rate Limit: Moderate when unauthenticated

Elasticsearch

Specialization: Custom search within your Elasticsearch cluster
Best for: Searching your own indexed data
Configuration: See Elasticsearch Setup Guide

Historical Search

Wayback Machine

Specialization: Historical web content
Best for: Finding deleted content, tracking changes over time
Results: Archived web pages with timestamps
Rate Limit: Moderate

News Search

The Guardian

Specialization: News articles and journalism
Best for: Current events, news analysis
Results: Recent news articles
Note: Requires API key (free tier available at https://open-platform.theguardian.com/)

Wikinews

Specialization: Open and collaboratively-written news articles on a wide range of topics
Best for: Historical and recent news, general news coverage, quick overviews
Results: News articles written by volunteers with verified sources

Premium Search Engines

Tavily

Specialization: AI-optimized search for LLM applications
Best for: High-quality, relevant results for AI research
Pricing: Free tier available, paid plans for higher volume

Configuration:

# In .env file or web interface
LDR_SEARCH_ENGINE_TAVILY_API_KEY=your-key-here

Google (via SerpAPI)

Specialization: Comprehensive web search
Best for: Most current and comprehensive results
Pricing: Paid service with free trial

Configuration:

LDR_SEARCH_ENGINE_WEB_SERPAPI_API_KEY=your-key-here

Google Programmable Search Engine

Specialization: Customizable Google search
Best for: Searching specific sites or topics
Pricing: Free tier with limits

Configuration:

LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_API_KEY=your-key-here
LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_ENGINE_ID=your-engine-id

Brave Search

Specialization: Independent search index with privacy focus
Best for: Web search without big tech tracking
Pricing: Free tier available

Configuration:

LDR_SEARCH_ENGINE_WEB_BRAVE_API_KEY=your-key-here
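
Before starting a run, it can help to confirm which premium-engine variables are actually set. The following is a minimal sketch, not part of LDR: the `PREMIUM_ENGINE_VARS` grouping and the `missing_keys` helper are hypothetical, and only the variable names are taken from this guide.

```python
import os
from typing import Mapping

# Variable names copied from this guide; the grouping and helper are hypothetical.
PREMIUM_ENGINE_VARS = {
    "tavily": ["LDR_SEARCH_ENGINE_TAVILY_API_KEY"],
    "serpapi": ["LDR_SEARCH_ENGINE_WEB_SERPAPI_API_KEY"],
    "google_pse": [
        "LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_API_KEY",
        "LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_ENGINE_ID",
    ],
    "brave": ["LDR_SEARCH_ENGINE_WEB_BRAVE_API_KEY"],
}


def missing_keys(env: Mapping[str, str] = os.environ) -> dict[str, list[str]]:
    """Return, per engine, the variables that are unset or empty."""
    report = {}
    for engine, names in PREMIUM_ENGINE_VARS.items():
        missing = [name for name in names if not env.get(name)]
        if missing:
            report[engine] = missing
    return report


if __name__ == "__main__":
    for engine, names in missing_keys().items():
        print(f"{engine}: missing {', '.join(names)}")
```

An engine that does not appear in the report has all of its variables set. Keys entered through the web interface are not visible to this check, since it only reads the process environment.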

Custom Sources

Local Documents

Specialization: Search your private documents
Supported formats: PDF, TXT, MD, DOCX, CSV, and more
Configuration: See Configuring Local Search
Setup:
1. Go to Settings → Search for "local"
2. Add document collection paths
3. Choose embedding model (CPU or Ollama)
4. Run a search; the first one indexes the documents

LangChain Retrievers

Specialization: Any vector store or database
Supported: FAISS, Chroma, Pinecone, Weaviate, Elasticsearch
Configuration: See LangChain Integration Guide

Search Performance Comparison

More stars are better in every column; under Rate Limits, more stars mean more generous limits.

Engine	Speed	Quality	Privacy	Rate Limits
SearXNG	★★★★★	★★★★☆	★★★★★	★★★★★
Wikipedia	★★★★☆	★★★★☆	★★★★★	★★★★★
arXiv	★★★★☆	★★★★★	★★★★★	★★★☆☆
PubMed	★★★★☆	★★★★★	★★★★★	★★★★☆
Tavily	★★★★☆	★★★★★	★★★☆☆	★★★★☆
Google (SerpAPI)	★★★★☆	★★★★★	★★☆☆☆	★★★★★
Local Documents	★★★☆☆	★★★★★	★★★★★	★★★★★
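
The ratings above can also be treated as data when shortlisting engines in a script. This is a minimal sketch with the star counts transcribed from the table; the `RATINGS` structure and the `engines_with` helper are hypothetical and not part of LDR.

```python
# Star counts (1-5) transcribed from the comparison table in this guide.
# Order of each tuple: speed, quality, privacy, rate limits.
RATINGS = {
    "SearXNG": (5, 4, 5, 5),
    "Wikipedia": (4, 4, 5, 5),
    "arXiv": (4, 5, 5, 3),
    "PubMed": (4, 5, 5, 4),
    "Tavily": (4, 5, 3, 4),
    "Google (SerpAPI)": (4, 5, 2, 5),
    "Local Documents": (3, 5, 5, 5),
}
COLUMNS = ("speed", "quality", "privacy", "rate_limits")


def engines_with(**minimums: int) -> list[str]:
    """Return engines whose rating meets every given minimum, in table order."""
    selected = []
    for name, stars in RATINGS.items():
        scores = dict(zip(COLUMNS, stars))
        if all(scores[column] >= floor for column, floor in minimums.items()):
            selected.append(name)
    return selected


print(engines_with(privacy=5, quality=5))
# → ['arXiv', 'PubMed', 'Local Documents']
```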

Rate Limiting and Reliability

LDR includes intelligent adaptive rate limiting that:

Learns optimal wait times for each engine
Automatically retries failed requests
Prevents your IP from being blocked
Maintains high reliability
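
To make the idea concrete, one common way to adapt wait times is to lengthen the delay after a rate-limit error and shorten it gradually after successes. The sketch below shows that generic pattern only; it is not LDR's actual implementation, and the class name, factors, and bounds are assumptions chosen for illustration.

```python
from collections import defaultdict


class AdaptiveWait:
    """Per-engine wait estimate: back off on rate limits, relax on success.

    Hypothetical illustration; LDR's real rate limiter may work differently.
    """

    def __init__(self, initial=1.0, minimum=0.1, maximum=60.0,
                 backoff=2.0, relax=0.9):
        self.minimum = minimum
        self.maximum = maximum
        self.backoff = backoff
        self.relax = relax
        self._wait = defaultdict(lambda: initial)

    def wait_for(self, engine: str) -> float:
        """Seconds to sleep before the next request to this engine."""
        return self._wait[engine]

    def record(self, engine: str, rate_limited: bool) -> None:
        """Update the estimate after a request finishes."""
        factor = self.backoff if rate_limited else self.relax
        updated = self._wait[engine] * factor
        self._wait[engine] = min(self.maximum, max(self.minimum, updated))


limiter = AdaptiveWait()
limiter.record("duckduckgo", rate_limited=True)   # 1.0 -> 2.0
limiter.record("duckduckgo", rate_limited=True)   # 2.0 -> 4.0
limiter.record("duckduckgo", rate_limited=False)  # 4.0 -> 3.6
print(round(limiter.wait_for("duckduckgo"), 2))
```

Each engine keeps its own estimate, so a strict engine backing off does not slow down requests to a generous one.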

Managing Rate Limits

# Check rate limit status
python -m local_deep_research.web_search_engines.rate_limiting status

# Reset rate limits if needed
python -m local_deep_research.web_search_engines.rate_limiting reset

Search Strategies

LDR supports multiple search strategies that determine how queries are processed:

source-based: Single query, fast results
focused_iteration: Iterative refinement for accuracy (recommended)

Best Practices

For General Research: Use auto or searxng
For Academic Research: Combine arxiv, pubmed, and semantic_scholar
For Technical Questions: Use github with searxng
For Maximum Privacy: Use searxng with local Ollama models
For Best Quality: Use tavily or Google with focused_iteration strategy

Troubleshooting

SearXNG Not Working

Verify container is running: docker ps | grep searxng
Check URL in settings: http://localhost:8080
Test directly: curl http://localhost:8080
Check the logs: docker logs searxng or view them in the LDR web UI
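
The direct test above can also be run from Python, for example inside a setup script. This is a minimal sketch using only the standard library; the `searxng_reachable` helper is hypothetical, and the default URL is the one used in this guide. It only confirms that something answers over HTTP at that address, not that the instance is configured correctly.

```python
import http.client
import urllib.error
import urllib.request


def searxng_reachable(base_url: str = "http://localhost:8080",
                      timeout: float = 5.0) -> bool:
    """Return True if an HTTP server answers at base_url with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


if __name__ == "__main__":
    print("SearXNG reachable:", searxng_reachable())
```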

Rate Limit Errors

Wait a few minutes and retry
Use auto engine for automatic fallbacks
Consider adding premium engines for higher limits
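
If you select engines yourself rather than using auto, a simple fallback chain gives similar resilience. The sketch below is a generic pattern and does not describe how the auto engine works internally; `RateLimited`, `search_with_fallback`, and the stub engines are all hypothetical names used for illustration.

```python
from typing import Callable, Iterable

SearchFn = Callable[[str], list[str]]


class RateLimited(Exception):
    """Raised by a (hypothetical) engine wrapper when the provider throttles us."""


def search_with_fallback(query: str, engines: Iterable[tuple[str, SearchFn]]):
    """Try engines in order; return (name, results) from the first that yields hits."""
    for name, search in engines:
        try:
            results = search(query)
        except RateLimited:
            continue  # throttled: move on to the next engine
        if results:
            return name, results
    return None, []


# Stub engines standing in for real search calls.
def throttled(query: str) -> list[str]:
    raise RateLimited("too many requests")


def empty(query: str) -> list[str]:
    return []


def working(query: str) -> list[str]:
    return [f"result for {query!r}"]


engine, hits = search_with_fallback(
    "quantum computing",
    [("duckduckgo", throttled), ("wikipedia", empty), ("searxng", working)],
)
print(engine, hits)
```

Ordering the chain from strictest to most generous rate limit means the fallback is usually the engine least likely to fail.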

No Results Found

Try different search engines
Broaden your query
Check internet connectivity
Verify API keys for premium engines

Advanced Configuration

Configuring Search Engines

You can enable/disable specific search engines and adjust their reliability parameters in the settings. This affects which engines are used in 'auto' mode and how the system handles rate limiting.

Parallel Search Configuration

Adjust parallel search settings:

from local_deep_research.api import quick_summary

result = quick_summary(
    query="your query",
    max_parallel_searches=5,  # Number of engines to query simultaneously
    timeout_per_engine=30,    # Seconds to wait for each engine
)
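
To illustrate what the two settings control, here is a generic sketch of querying several engines concurrently with a per-engine time budget. It is not LDR's code: `search_in_parallel` and the stub engines are hypothetical, and only the parameter names mirror the call above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def search_in_parallel(query, engines, max_parallel_searches=5, timeout_per_engine=30):
    """Run every engine callable on the query concurrently.

    `engines` maps a name to a callable that takes the query and returns a list.
    Engines that raise, or that have not finished within `timeout_per_engine`
    seconds of being submitted, are left out of the result.
    """
    results = {}
    pool = ThreadPoolExecutor(max_workers=max_parallel_searches)
    try:
        started = time.monotonic()
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        for name, future in futures.items():
            remaining = max(0.0, started + timeout_per_engine - time.monotonic())
            try:
                results[name] = future.result(timeout=remaining)
            except Exception:  # timed out or the engine itself failed
                continue
    finally:
        # Do not block on stragglers; queued work that never started is cancelled.
        pool.shutdown(wait=False, cancel_futures=True)
    return results


# Stub engines standing in for real search calls.
def fast(query):
    return [f"fast hit for {query}"]


def slow(query):
    time.sleep(0.5)
    return ["too late"]


found = search_in_parallel("q", {"fast": fast, "slow": slow}, timeout_per_engine=0.1)
print(found)
```

Because the budget is measured from submission, an engine that has to queue behind others (when there are more engines than `max_parallel_searches`) gets less effective time; raising the worker count avoids that.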
