- Add "Config Reference" link to Settings page "Learn & Get Help" bar - Overhaul docs/env_configuration.md: remove stale Dynaconf references, fix wrong double-underscore env var format, remove documented-as-fixed bug, replace duplicate tables with links to CONFIGURATION.md - Fix broken case-sensitive link in docs/deployment/unraid.md - Add CONFIGURATION.md cross-references to 12 docs' "See Also" sections - Update .env.template with correct LDR_-prefixed variable names - Add config reference comment to docker-compose.yml environment block
8.8 KiB
Search Engines Guide
Local Deep Research integrates with multiple search engines to provide comprehensive research capabilities. This guide covers all available search engines, their specializations, and configuration details.
Note
: This documentation is maintained by the community and may contain inaccuracies. While we strive to keep it up-to-date, please verify critical information and report any errors via GitHub Issues.
Overview
LDR supports three categories of search engines:
- Free Search Engines - No API key required
- Premium Search Engines - Require API keys but offer enhanced features
- Custom Sources - Your own documents and databases
Search Engine Selection
Auto Engine (Recommended)
The auto search engine intelligently selects the most appropriate engines based on your query:
result = quick_summary(
query="What are the latest advances in quantum computing?",
search_tool="auto" # Automatically selects best engines
)
Free Search Engines
Academic Search Engines
arXiv
- Specialization: Scientific papers and preprints
- Best for: Physics, mathematics, computer science, biology
- Results: Direct access to research papers
- Rate Limit: Moderate - automatic retry on limits
PubMed
- Specialization: Biomedical and life science literature
- Best for: Medical research, clinical studies, biology
- Results: Abstracts and links to full papers
- Rate Limit: Generous - rarely hits limits
Semantic Scholar
- Specialization: Academic literature across all fields
- Best for: Cross-disciplinary research, citation networks
- Results: Paper summaries with citation context
- Rate Limit: Moderate - adaptive rate limiting handles this
General Purpose
Wikipedia
- Specialization: General knowledge and encyclopedic information
- Best for: Background information, concepts, facts
- Results: Well-structured article content
- Rate Limit: Very generous
SearXNG (Highly Recommended)
- Specialization: Meta-search engine aggregating multiple sources
- Best for: Comprehensive web search with privacy
- Results: Aggregated results from Google, Bing, DuckDuckGo, etc.
- Setup:
docker pull searxng/searxng docker run -d -p 8080:8080 --name searxng searxng/searxng - Configuration: Set URL to
http://localhost:8080in settings
DuckDuckGo
- Specialization: Privacy-focused web search
- Best for: General web queries without tracking
- Results: Web pages, instant answers
- Rate Limit: Strict - use SearXNG for better reliability
Technical Search
GitHub
- Specialization: Code repositories and documentation
- Best for: Finding code examples, libraries, technical solutions
- Results: Repository information, code snippets, issues
- Rate Limit: Moderate when unauthenticated
Elasticsearch
- Specialization: Custom search within your Elasticsearch cluster
- Best for: Searching your own indexed data
- Configuration: See Elasticsearch Setup Guide
Historical Search
Wayback Machine
- Specialization: Historical web content
- Best for: Finding deleted content, tracking changes over time
- Results: Archived web pages with timestamps
- Rate Limit: Moderate
News Search
The Guardian
- Specialization: News articles and journalism
- Best for: Current events, news analysis
- Results: Recent news articles
- Note: Requires API key (free tier available at https://open-platform.theguardian.com/)
Wikinews
- Specialization: Open and collaboratively-written news articles on a wide range of topics
- Best for: Historical and recent news, general news coverage, quick overviews
- Results: News articles written by volunteers with verified sources
Premium Search Engines
Tavily
- Specialization: AI-optimized search for LLM applications
- Best for: High-quality, relevant results for AI research
- Pricing: Free tier available, paid plans for higher volume
- Configuration:
# In .env file or web interface LDR_SEARCH_ENGINE_TAVILY_API_KEY=your-key-here
Google (via SerpAPI)
- Specialization: Comprehensive web search
- Best for: Most current and comprehensive results
- Pricing: Paid service with free trial
- Configuration:
LDR_SEARCH_ENGINE_WEB_SERPAPI_API_KEY=your-key-here
Google Programmable Search Engine
- Specialization: Customizable Google search
- Best for: Searching specific sites or topics
- Pricing: Free tier with limits
- Configuration:
LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_API_KEY=your-key-here LDR_SEARCH_ENGINE_WEB_GOOGLE_PSE_ENGINE_ID=your-engine-id
Brave Search
- Specialization: Independent search index with privacy focus
- Best for: Web search without big tech tracking
- Pricing: Free tier available
- Configuration:
LDR_SEARCH_ENGINE_WEB_BRAVE_API_KEY=your-key-here
Custom Sources
Local Documents
- Specialization: Search your private documents
- Supported formats: PDF, TXT, MD, DOCX, CSV, and more
- Configuration: See Configuring Local Search
- Setup:
- Go to Settings → Search for "local"
- Add document collection paths
- Choose embedding model (CPU or Ollama)
- First search will index documents
LangChain Retrievers
- Specialization: Any vector store or database
- Supported: FAISS, Chroma, Pinecone, Weaviate, Elasticsearch
- Configuration: See LangChain Integration Guide
Search Performance Comparison
| Engine | Speed | Quality | Privacy | Rate Limits |
|---|---|---|---|---|
| SearXNG | ★★★★★ | ★★★★☆ | ★★★★★ | ★★★★★ |
| Wikipedia | ★★★★☆ | ★★★★☆ | ★★★★★ | ★★★★★ |
| arXiv | ★★★★☆ | ★★★★★ | ★★★★★ | ★★★☆☆ |
| PubMed | ★★★★☆ | ★★★★★ | ★★★★★ | ★★★★☆ |
| Tavily | ★★★★☆ | ★★★★★ | ★★★☆☆ | ★★★★☆ |
| Google (SerpAPI) | ★★★★☆ | ★★★★★ | ★★☆☆☆ | ★★★★★ |
| Local Documents | ★★★☆☆ | ★★★★★ | ★★★★★ | ★★★★★ |
Rate Limiting and Reliability
LDR includes intelligent adaptive rate limiting that:
- Learns optimal wait times for each engine
- Automatically retries failed requests
- Prevents your IP from being blocked
- Maintains high reliability
Managing Rate Limits
# Check rate limit status
python -m local_deep_research.web_search_engines.rate_limiting status
# Reset rate limits if needed
python -m local_deep_research.web_search_engines.rate_limiting reset
Search Strategies
LDR supports multiple search strategies that determine how queries are processed:
- source-based: Single query, fast results
- focused_iteration: Iterative refinement for accuracy (recommended)
Best Practices
- For General Research: Use
autoorsearxng - For Academic Research: Combine
arxiv,pubmed, andsemantic_scholar - For Technical Questions: Use
githubwithsearxng - For Maximum Privacy: Use
searxngwith local Ollama models - For Best Quality: Use
tavilyor Google withfocused_iterationstrategy
Troubleshooting
SearXNG Not Working
- Verify container is running:
docker ps | grep searxng - Check URL in settings:
http://localhost:8080 - Test directly:
curl http://localhost:8080 - Check the logs:
docker logs searxngor view them in the LDR web UI
Rate Limit Errors
- Wait a few minutes and retry
- Use
autoengine for automatic fallbacks - Consider adding premium engines for higher limits
No Results Found
- Try different search engines
- Broaden your query
- Check internet connectivity
- Verify API keys for premium engines
Advanced Configuration
Configuring Search Engines
You can enable/disable specific search engines and adjust their reliability parameters in the settings. This affects which engines are used in 'auto' mode and how the system handles rate limiting.
Parallel Search Configuration
Adjust parallel search settings:
result = quick_summary(
query="your query",
max_parallel_searches=5, # Number of engines to query simultaneously
timeout_per_engine=30 # Seconds per engine
)