mirror of https://github.com/LearningCircuit/local-deep-research.git synced 2026-06-15 19:46:56 +03:00

LearningCircuit d3570c355a refactor: remove dead benchmark and citation functions (#3187)

* refactor: remove dead benchmark and citation functions

* cleanup: drop orphan cli.py stub, orphaned tests, stale docs

Follow-up to #3187 addressing djpetti's review and the failing
All Pytest Tests + Coverage check.

- Delete benchmarks/cli.py entirely. The file was already shadowed by the
  benchmarks/cli/ package (same import path), so the deprecation stub was
  unreachable dead code.
- Remove test classes that imported now-deleted functions:
  check_system_resources, plot_parameter_importance, plot_quality_vs_speed,
  CitationFormatter._to_superscript. This is what the pytest lane was
  failing on.
- Update docs/cli-tools.md and benchmarks/metrics/README.md to drop
  references to the removed CLI module and plot helpers.

2026-04-26 10:58:31 +02:00

7.8 KiB

CLI Tools Reference

Local Deep Research includes command-line tools for benchmarking, rate limit management, and running the MCP server.

Benchmarking CLI
Rate Limiting CLI
MCP Server CLI

Benchmarking CLI

Run benchmarks to evaluate search quality and compare configurations.

Basic Usage

python -m local_deep_research.benchmarks.cli.benchmark_commands <command> [options]

Commands

`simpleqa` - Run SimpleQA Benchmark

Tests factual question answering accuracy.

python -m local_deep_research.benchmarks.cli.benchmark_commands simpleqa [options]

Options:

Option	Default	Description
`--examples`	100	Number of questions to test
`--iterations`	3	Search iterations per question
`--questions`	3	Questions per iteration
`--search-tool`	searxng	Search engine to use
`--search-strategy`	source_based	Strategy (source_based, standard, rapid, parallel, iterdrag)
`--search-model`	(default)	LLM model for research
`--search-provider`	(default)	LLM provider
`--eval-model`	(default)	Model for answer evaluation
`--eval-provider`	(default)	Provider for evaluation
`--output-dir`	~/.local-deep-research/benchmark_results	Results directory
`--human-eval`	false	Use human evaluation
`--no-eval`	false	Skip evaluation phase
`--custom-dataset`	-	Path to custom dataset

Example:

# Run 50 examples with Ollama
python -m local_deep_research.benchmarks.cli.benchmark_commands simpleqa \
  --examples 50 \
  --search-provider ollama \
  --search-model llama3.2
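
For scripted runs, the same invocation can be assembled programmatically. The following is a minimal sketch that uses only flags from the options table above; `build_simpleqa_command` is a hypothetical helper written for this example and is not part of the package.

```python
import shlex
import sys


def build_simpleqa_command(examples=100, provider=None, model=None, strategy=None):
    """Assemble the argv list for a simpleqa run (hypothetical helper)."""
    cmd = [
        sys.executable, "-m",
        "local_deep_research.benchmarks.cli.benchmark_commands",
        "simpleqa",
        "--examples", str(examples),
    ]
    # Pass optional flags only when set, so the CLI defaults apply otherwise.
    if provider:
        cmd += ["--search-provider", provider]
    if model:
        cmd += ["--search-model", model]
    if strategy:
        cmd += ["--search-strategy", strategy]
    return cmd


cmd = build_simpleqa_command(examples=50, provider="ollama", model="llama3.2")
print(shlex.join(cmd))  # inspect the command line before running it
```

Building the command as a list rather than a single string avoids shell quoting problems; the list can be passed directly to `subprocess.run(cmd, check=True)` once it looks right.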

`browsecomp` - Run BrowseComp Benchmark

Tests complex reasoning and multi-step research.

python -m local_deep_research.benchmarks.cli.benchmark_commands browsecomp [options]

Accepts the same options as `simpleqa`.

Example:

# Run BrowseComp with the iterdrag strategy
python -m local_deep_research.benchmarks.cli.benchmark_commands browsecomp \
  --examples 20 \
  --search-strategy iterdrag \
  --iterations 5

`compare` - Compare Configurations

Compare multiple search configurations on the same dataset.

python -m local_deep_research.benchmarks.cli.benchmark_commands compare [options]

Options:

Option	Default	Description
`--dataset`	simpleqa	Dataset to use (simpleqa, browsecomp)
`--examples`	20	Examples per configuration
`--output-dir`	~/.local-deep-research/benchmark_results/comparison	Results directory

Example:

# Compare configurations
python -m local_deep_research.benchmarks.cli.benchmark_commands compare \
  --dataset simpleqa \
  --examples 30

`list` - List Available Benchmarks

python -m local_deep_research.benchmarks.cli.benchmark_commands list

Shows available benchmark datasets and their descriptions.

Rate Limiting CLI

Monitor and manage the adaptive rate limiting system.

Basic Usage

python -m local_deep_research.web_search_engines.rate_limiting.cli <command> [options]

Commands

`status` - Show Rate Limit Statistics

View current rate limit data for search engines.

# All engines
python -m local_deep_research.web_search_engines.rate_limiting.cli status

# Specific engine
python -m local_deep_research.web_search_engines.rate_limiting.cli status --engine DuckDuckGoSearchEngine

Output columns:

Column	Description
Engine	Search engine name
Base Wait	Current wait time in seconds
Range	Min-max wait times
Success	Success rate percentage
Attempts	Total request attempts
Updated	Last update timestamp

Example output:

Rate Limit Statistics:
--------------------------------------------------------------------------------
Engine               Base Wait    Range                Success    Attempts   Updated
--------------------------------------------------------------------------------
DuckDuckGoSearchEngine 2.50        1.0s - 5.0s         95.2%      150        12-26 14:30
ArXivSearchEngine      0.50        0.5s - 1.0s         99.8%      85         12-26 12:15

`reset` - Reset Engine Rate Limits

Clear learned rate limit data for an engine.

python -m local_deep_research.web_search_engines.rate_limiting.cli reset --engine <engine_name>

Example:

# Reset DuckDuckGo rate limits
python -m local_deep_research.web_search_engines.rate_limiting.cli reset --engine DuckDuckGoSearchEngine

Use this when:

Rate limits are too conservative
An engine's API has changed
You are switching environments

`export` - Export Rate Limit Data

Export rate limit statistics in various formats.

# Table format (default)
python -m local_deep_research.web_search_engines.rate_limiting.cli export

# CSV format
python -m local_deep_research.web_search_engines.rate_limiting.cli export --format csv

# JSON format
python -m local_deep_research.web_search_engines.rate_limiting.cli export --format json

Formats:

Format	Use Case
`table`	Human-readable display
`csv`	Spreadsheet import
`json`	Programmatic processing

Example CSV output:

engine_type,base_wait_seconds,min_wait_seconds,max_wait_seconds,last_updated,total_attempts,success_rate
DuckDuckGoSearchEngine,2.5,1.0,5.0,1703612400,150,0.952
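
The CSV columns shown above lend themselves to simple post-processing. As a rough sketch, the snippet below parses the sample row and lists engines whose success rate falls under a threshold. The helper name `engines_below` and the threshold values are illustrative assumptions, and the sample text stands in for however the export output is captured.

```python
import csv
import io

# Sample copied from the example CSV output above; in practice, substitute
# the captured output of the `export --format csv` command.
SAMPLE = """\
engine_type,base_wait_seconds,min_wait_seconds,max_wait_seconds,last_updated,total_attempts,success_rate
DuckDuckGoSearchEngine,2.5,1.0,5.0,1703612400,150,0.952
"""


def engines_below(csv_text, threshold):
    """Return (engine, success_rate) pairs with a success rate under threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["engine_type"], float(row["success_rate"]))
        for row in reader
        if float(row["success_rate"]) < threshold
    ]


print(engines_below(SAMPLE, 0.99))  # the sample engine is below 0.99
print(engines_below(SAMPLE, 0.90))  # nothing is below 0.90 in the sample
```

Note that `success_rate` is a fraction in the CSV (0.952), while the `status` table shows it as a percentage (95.2%).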

`cleanup` - Remove Old Data

Clean up rate limit data older than a specified number of days.

python -m local_deep_research.web_search_engines.rate_limiting.cli cleanup --days <days>

Example:

# Remove data older than 30 days
python -m local_deep_research.web_search_engines.rate_limiting.cli cleanup --days 30

# Remove data older than 7 days
python -m local_deep_research.web_search_engines.rate_limiting.cli cleanup --days 7

MCP Server CLI

Run the MCP server for Claude Desktop integration.

Basic Usage

# Via entry point
ldr-mcp

# Via module
python -m local_deep_research.mcp

The server communicates over STDIO (stdin/stdout for JSON-RPC, stderr for logs). It is designed to be launched by Claude Desktop, not run interactively.

Environment Variables

Set via Claude Desktop config env block or shell environment:

Variable	Description	Example
`LDR_LLM_PROVIDER`	LLM provider	`openai`, `ollama`, `anthropic`
`LDR_LLM_MODEL`	Model name	`gpt-4`, `llama3:8b`
`LDR_LLM_OPENAI_API_KEY`	OpenAI API key	`sk-...`
`LDR_SEARCH_TOOL`	Default search engine	`auto`, `arxiv`, `wikipedia`
`LDR_SEARCH_SEARCH_STRATEGY`	Default strategy	`source-based`, `focused-iteration`

See MCP Server Guide for full documentation.
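
To illustrate the env block mentioned above, the sketch below generates a Claude Desktop server entry as JSON. The `mcpServers` layout follows Claude Desktop's config file format; the server label `local-deep-research` and the chosen values are assumptions for this example, taken from the sample column of the table. Treat the MCP Server Guide as the authoritative reference.

```python
import json

# The server label is an arbitrary name chosen for this sketch; the command
# is the `ldr-mcp` entry point described under Basic Usage.
config = {
    "mcpServers": {
        "local-deep-research": {
            "command": "ldr-mcp",
            "env": {
                "LDR_LLM_PROVIDER": "ollama",
                "LDR_LLM_MODEL": "llama3:8b",
                "LDR_SEARCH_TOOL": "auto",
            },
        }
    }
}

# Print the JSON so it can be merged into the Claude Desktop config by hand.
config_json = json.dumps(config, indent=2)
print(config_json)
```

Printing rather than writing the file keeps the sketch from overwriting an existing config that may already define other servers.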

Common Engine Names

When using the rate limiting CLI, use these engine class names:

Engine	Class Name
DuckDuckGo	`DuckDuckGoSearchEngine`
SearXNG	`SearXNGSearchEngine`
Brave	`BraveSearchEngine`
arXiv	`ArXivSearchEngine`
PubMed	`PubMedSearchEngine`
Semantic Scholar	`SemanticScholarSearchEngine`
Wikipedia	`WikipediaSearchEngine`
GitHub	`GitHubSearchEngine`

Troubleshooting

Benchmark Not Starting

Verify LLM provider is configured
Check search engine is available
Ensure sufficient disk space for results

Rate Limit Data Missing

Run some searches first to generate data
Check database file exists
Try `status` without the `--engine` flag

Export Permission Error

Check write permissions on output directory
Use a different output directory
