mirror of
https://github.com/LearningCircuit/local-deep-research.git
synced 2026-06-16 03:51:07 +03:00
* fix: detect redundant exception args in logger.exception pre-commit hook
The regex-based check only caught f-string interpolation ({e}) but missed
%-style formatting (logger.exception("..%s..", e)), str(e)/repr(e) as
arguments, and {str(e)} inside f-strings. Replaced the regex with an
AST-based check that reliably catches all forms.
* fix: remove redundant exception vars from logger.exception calls
logger.exception() automatically includes the full traceback, so
passing the exception variable (e, exc, err, etc.) is redundant.
Removed 143 instances across 71 files including f-string {e},
%-style formatting, and bare exception arg patterns.
* fix: enhance hook with string concat detection and fix remaining violations
- Extract exception var names to class constant EXCEPTION_VAR_NAMES
- Add _string_concat_references_exception() to detect "Error: " + str(e)
- Fix 7 remaining files in examples/ and tests/ that failed CI
* fix: restore missing space in notification manager log message
Benchmarks for Local Deep Research
This directory contains scripts for running benchmarks to evaluate Local Deep Research's performance.
Available Benchmarks
SimpleQA
The SimpleQA benchmark evaluates factual question answering capabilities.
python run_simpleqa.py --examples 10 --iterations 3 --questions 3
Options:
--examples: Number of examples to run (default: 10)--iterations: Number of search iterations (default: 3)--questions: Questions per iteration (default: 3)--search-tool: Search tool to use (default: "searxng")--output-dir: Directory to save results (default: "benchmark_results")--no-eval: Skip evaluation--human-eval: Use human evaluation--eval-model: Model to use for evaluation--eval-provider: Provider to use for evaluation
BrowseComp
The BrowseComp benchmark evaluates web browsing comprehension and complex question answering.
python run_browsecomp.py --examples 5 --iterations 3 --questions 3
Options:
--examples: Number of examples to run (default: 2)--iterations: Number of search iterations (default: 1)--questions: Questions per iteration (default: 1)--search-tool: Search tool to use (default: "searxng")--output-dir: Directory to save results (default: "browsecomp_results")
See browsecomp_benchmark_readme.md for more information on how BrowseComp works.
Running All Benchmarks
To run both benchmarks and compare results:
# Run SimpleQA with default settings
python run_simpleqa.py
# Run BrowseComp with increased iterations and questions
python run_browsecomp.py --iterations 3 --questions 3
Evaluating Results
Results are saved in the specified output directories and include:
- Raw results (JSONL format)
- Evaluation results (JSONL format)
- Summary reports (Markdown format)
The scripts will also print a summary of the results to the console, including accuracy metrics.