CLI Tools Reference
Local Deep Research includes command-line tools for benchmarking, rate limit management, and running the MCP server.
Benchmarking CLI
Run benchmarks to evaluate search quality and compare configurations.
Basic Usage
python -m local_deep_research.benchmarks.cli.benchmark_commands <command> [options]
Commands
simpleqa - Run SimpleQA Benchmark
Tests factual question answering accuracy.
python -m local_deep_research.benchmarks.cli.benchmark_commands simpleqa [options]
Options:
| Option | Default | Description |
|---|---|---|
| `--examples` | 100 | Number of questions to test |
| `--iterations` | 3 | Search iterations per question |
| `--questions` | 3 | Questions per iteration |
| `--search-tool` | searxng | Search engine to use |
| `--search-strategy` | source_based | Strategy (source_based, standard, rapid, parallel, iterdrag) |
| `--search-model` | (default) | LLM model for research |
| `--search-provider` | (default) | LLM provider |
| `--eval-model` | (default) | Model for answer evaluation |
| `--eval-provider` | (default) | Provider for evaluation |
| `--output-dir` | ~/.local-deep-research/benchmark_results | Results directory |
| `--human-eval` | false | Use human evaluation |
| `--no-eval` | false | Skip evaluation phase |
| `--custom-dataset` | - | Path to custom dataset |
Example:
# Run 50 examples with Ollama
python -m local_deep_research.benchmarks.cli.benchmark_commands simpleqa \
--examples 50 \
--search-provider ollama \
--search-model llama3.2
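When benchmark runs are scripted, it can help to assemble the argument list in Python rather than concatenating shell strings. The following is a minimal sketch, assuming only the flags documented in the options table above; the helper name `build_simpleqa_command` is hypothetical and not part of the project.

```python
import sys

# Module path taken from the usage line in this document.
BENCHMARK_MODULE = "local_deep_research.benchmarks.cli.benchmark_commands"


def build_simpleqa_command(examples=100, iterations=3, search_tool="searxng",
                           search_provider=None, search_model=None):
    """Return the argv list for a SimpleQA run (hypothetical helper).

    Nothing is executed here; the defaults mirror the documented ones.
    """
    cmd = [
        sys.executable, "-m", BENCHMARK_MODULE, "simpleqa",
        "--examples", str(examples),
        "--iterations", str(iterations),
        "--search-tool", search_tool,
    ]
    # Provider and model are only passed when set, so the CLI defaults apply.
    if search_provider is not None:
        cmd += ["--search-provider", search_provider]
    if search_model is not None:
        cmd += ["--search-model", search_model]
    return cmd


cmd = build_simpleqa_command(examples=50, search_provider="ollama",
                             search_model="llama3.2")
print(" ".join(cmd[1:]))

# To launch the run (requires the package to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues with model names or paths that contain spaces.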
browsecomp - Run BrowseComp Benchmark
Tests complex reasoning and multi-step research.
python -m local_deep_research.benchmarks.cli.benchmark_commands browsecomp [options]
Same options as simpleqa.
Example:
# Run BrowseComp with the iterdrag strategy
python -m local_deep_research.benchmarks.cli.benchmark_commands browsecomp \
--examples 20 \
--search-strategy iterdrag \
--iterations 5
compare - Compare Configurations
Compare multiple search configurations on the same dataset.
python -m local_deep_research.benchmarks.cli.benchmark_commands compare [options]
Options:
| Option | Default | Description |
|---|---|---|
| `--dataset` | simpleqa | Dataset to use (simpleqa, browsecomp) |
| `--examples` | 20 | Examples per configuration |
| `--output-dir` | ~/.local-deep-research/benchmark_results/comparison | Results directory |
Example:
# Compare configurations
python -m local_deep_research.benchmarks.cli.benchmark_commands compare \
--dataset simpleqa \
--examples 30
list - List Available Benchmarks
python -m local_deep_research.benchmarks.cli.benchmark_commands list
Shows available benchmark datasets and their descriptions.
Rate Limiting CLI
Monitor and manage the adaptive rate limiting system.
Basic Usage
python -m local_deep_research.web_search_engines.rate_limiting.cli <command> [options]
Commands
status - Show Rate Limit Statistics
View current rate limit data for search engines.
# All engines
python -m local_deep_research.web_search_engines.rate_limiting.cli status
# Specific engine
python -m local_deep_research.web_search_engines.rate_limiting.cli status --engine DuckDuckGoSearchEngine
Output columns:
| Column | Description |
|---|---|
| Engine | Search engine name |
| Base Wait | Current wait time in seconds |
| Range | Min-max wait times |
| Success | Success rate percentage |
| Attempts | Total request attempts |
| Updated | Last update timestamp |
Example output:
Rate Limit Statistics:
--------------------------------------------------------------------------------
Engine                   Base Wait   Range          Success   Attempts   Updated
--------------------------------------------------------------------------------
DuckDuckGoSearchEngine   2.50        1.0s - 5.0s    95.2%     150        12-26 14:30
ArXivSearchEngine        0.50        0.5s - 1.0s    99.8%     85         12-26 12:15
reset - Reset Engine Rate Limits
Clear learned rate limit data for an engine.
python -m local_deep_research.web_search_engines.rate_limiting.cli reset --engine <engine_name>
Example:
# Reset DuckDuckGo rate limits
python -m local_deep_research.web_search_engines.rate_limiting.cli reset --engine DuckDuckGoSearchEngine
Reset an engine when:
- Its learned rate limits are too conservative
- The engine's API has changed
- You are switching environments
export - Export Rate Limit Data
Export rate limit statistics in various formats.
# Table format (default)
python -m local_deep_research.web_search_engines.rate_limiting.cli export
# CSV format
python -m local_deep_research.web_search_engines.rate_limiting.cli export --format csv
# JSON format
python -m local_deep_research.web_search_engines.rate_limiting.cli export --format json
Formats:
| Format | Use Case |
|---|---|
| `table` | Human-readable display |
| `csv` | Spreadsheet import |
| `json` | Programmatic processing |
Example CSV output:
engine_type,base_wait_seconds,min_wait_seconds,max_wait_seconds,last_updated,total_attempts,success_rate
DuckDuckGoSearchEngine,2.5,1.0,5.0,1703612400,150,0.952
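The CSV export is easy to post-process. The sketch below parses the sample row shown above and lists engines whose success rate falls under a threshold. It assumes that `last_updated` is a Unix timestamp in seconds and that `success_rate` is a fraction between 0 and 1, which is what the sample suggests but is not confirmed here; the function names are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone

# Sample copied from the example CSV output in this document.
SAMPLE_EXPORT = """\
engine_type,base_wait_seconds,min_wait_seconds,max_wait_seconds,last_updated,total_attempts,success_rate
DuckDuckGoSearchEngine,2.5,1.0,5.0,1703612400,150,0.952
"""


def load_rate_limits(csv_text):
    """Parse exported CSV text into a list of typed dictionaries."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "engine": record["engine_type"],
            "base_wait": float(record["base_wait_seconds"]),
            "attempts": int(record["total_attempts"]),
            "success_rate": float(record["success_rate"]),
            # Assumption: last_updated is seconds since the Unix epoch.
            "updated": datetime.fromtimestamp(
                float(record["last_updated"]), tz=timezone.utc
            ),
        })
    return rows


def engines_below(rows, threshold):
    """Return engine names whose success rate is below the threshold."""
    return [row["engine"] for row in rows if row["success_rate"] < threshold]


rows = load_rate_limits(SAMPLE_EXPORT)
for row in rows:
    print(row["engine"], row["base_wait"], row["updated"].isoformat())
print("Below 99%:", engines_below(rows, 0.99))
```

In practice the text would come from redirecting the `export --format csv` output to a file and reading that file instead of `SAMPLE_EXPORT`.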
cleanup - Remove Old Data
Clean up rate limit data older than a specified number of days.
python -m local_deep_research.web_search_engines.rate_limiting.cli cleanup --days <days>
Example:
# Remove data older than 30 days
python -m local_deep_research.web_search_engines.rate_limiting.cli cleanup --days 30
# Remove data older than 7 days
python -m local_deep_research.web_search_engines.rate_limiting.cli cleanup --days 7
MCP Server CLI
Run the MCP server for Claude Desktop integration.
Basic Usage
# Via entry point
ldr-mcp
# Via module
python -m local_deep_research.mcp
The server communicates over STDIO (stdin/stdout for JSON-RPC, stderr for logs). It is designed to be launched by Claude Desktop, not run interactively.
Environment Variables
Set via Claude Desktop config env block or shell environment:
| Variable | Description | Example |
|---|---|---|
| `LDR_LLM_PROVIDER` | LLM provider | openai, ollama, anthropic |
| `LDR_LLM_MODEL` | Model name | gpt-4, llama3:8b |
| `LDR_LLM_OPENAI_API_KEY` | OpenAI API key | sk-... |
| `LDR_SEARCH_TOOL` | Default search engine | auto, arxiv, wikipedia |
| `LDR_SEARCH_SEARCH_STRATEGY` | Default strategy | source-based, focused-iteration |
See MCP Server Guide for full documentation.
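As an illustration of the env block, the sketch below generates a Claude Desktop server entry from the variables in the table above. It assumes the usual `mcpServers` layout of the Claude Desktop config file; the server name `local-deep-research` and the helper `build_claude_desktop_entry` are hypothetical, so check the MCP Server Guide for the exact recommended configuration.

```python
import json


def build_claude_desktop_entry(provider, model, search_tool="auto", api_key=None):
    """Build a config fragment that launches ldr-mcp (hypothetical helper)."""
    env = {
        "LDR_LLM_PROVIDER": provider,
        "LDR_LLM_MODEL": model,
        "LDR_SEARCH_TOOL": search_tool,
    }
    # Only include the API key when one is supplied (e.g. for OpenAI).
    if api_key is not None:
        env["LDR_LLM_OPENAI_API_KEY"] = api_key
    return {
        "mcpServers": {
            # Assumption: the server name is arbitrary and chosen by the user.
            "local-deep-research": {
                "command": "ldr-mcp",
                "env": env,
            }
        }
    }


config = build_claude_desktop_entry("ollama", "llama3:8b", search_tool="wikipedia")
print(json.dumps(config, indent=2))
```

The printed JSON can be merged into an existing Claude Desktop config; avoid committing a file that contains an API key.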
Common Engine Names
When using the rate limiting CLI, use these engine class names:
| Engine | Class Name |
|---|---|
| DuckDuckGo | DuckDuckGoSearchEngine |
| SearXNG | SearXNGSearchEngine |
| Brave | BraveSearchEngine |
| arXiv | ArXivSearchEngine |
| PubMed | PubMedSearchEngine |
| Semantic Scholar | SemanticScholarSearchEngine |
| Wikipedia | WikipediaSearchEngine |
| GitHub | GitHubSearchEngine |
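For scripts that accept friendly engine names, the table above can be encoded as a lookup. This is a minimal sketch: the mapping is copied from the table, while `rate_limit_command` is a hypothetical helper that only builds the argument list for the documented `status` and `reset` commands and does not execute anything.

```python
import sys

# Module path taken from the usage line in this document.
RATE_LIMIT_MODULE = "local_deep_research.web_search_engines.rate_limiting.cli"

# Friendly name (lowercased) -> engine class name, as listed in the table.
ENGINE_CLASS_NAMES = {
    "duckduckgo": "DuckDuckGoSearchEngine",
    "searxng": "SearXNGSearchEngine",
    "brave": "BraveSearchEngine",
    "arxiv": "ArXivSearchEngine",
    "pubmed": "PubMedSearchEngine",
    "semantic scholar": "SemanticScholarSearchEngine",
    "wikipedia": "WikipediaSearchEngine",
    "github": "GitHubSearchEngine",
}


def rate_limit_command(action, engine):
    """Return the argv list for `status` or `reset` on one engine."""
    if action not in ("status", "reset"):
        raise ValueError(f"unsupported action: {action!r}")
    key = engine.strip().lower()
    if key not in ENGINE_CLASS_NAMES:
        raise KeyError(f"unknown engine: {engine!r}")
    return [sys.executable, "-m", RATE_LIMIT_MODULE, action,
            "--engine", ENGINE_CLASS_NAMES[key]]


print(" ".join(rate_limit_command("reset", "DuckDuckGo")[1:]))
```

Rejecting unknown names up front gives a clearer error than passing a misspelled class name through to the CLI.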
Troubleshooting
Benchmark Not Starting
- Verify LLM provider is configured
- Check search engine is available
- Ensure sufficient disk space for results
Rate Limit Data Missing
- Run some searches first to generate data
- Check database file exists
- Try `status` without the `--engine` flag
Export Permission Error
- Check write permissions on output directory
- Use a different output directory
See Also
- Architecture Overview - System architecture
- Troubleshooting - Common issues
- BENCHMARKING.md - Detailed benchmark documentation