local-deep-research/examples/optimization/example_multi_benchmark.py
LearningCircuit 12160e26e1 chore(lint): add ruff rules for logging, performance, exceptions, and print detection (#3211)
* chore(lint): add ruff rules for logging, performance, exceptions, and print detection

Add wave 2 lint rules: G, PERF, RET, TRY, T20, C4, ERA. All existing
violations are suppressed via ignore/per-file-ignores so this config
change is merge-safe. Follow-up PRs will fix violations and remove the
ignore entries incrementally.

* fix(lint): exempt pre-commit hooks from T201 print rule (#3270)

Pre-commit hooks are CLI scripts where print is the intended output
interface, same as scripts/ and cli/ directories already exempted.

* fix(lint): fix all low-count ruff violations instead of suppressing them (#3275)

* fix(lint): replace manual dict-building loops with dict comprehensions (PERF403)

* fix(lint): replace bare Exception raises with specific built-in types (TRY002)

Replace all `raise Exception(...)` in production code with appropriate
built-in exception types: RuntimeError for operational/state failures,
ValueError for invalid data, and ConnectionError for HTTP errors.

* fix(lint): resolve TRY004 and PERF402 ruff violations

Use TypeError instead of ValueError for isinstance/issubclass type
checks (TRY004), and replace manual for-loop list copies with
list.extend() (PERF402).

* fix(lint): fix all low-count ruff violations instead of suppressing them

Fix all violations for 15 ruff rules that had ≤10 occurrences each,
rather than suppressing them with ignore directives:

- TRY002: raise-vanilla-class → use specific built-in exceptions
- TRY004: type-check-without-type-error → use TypeError
- C408: unnecessary-collection-call → use dict/list literals
- C401: unnecessary-generator-set → use set comprehensions
- C416: unnecessary-comprehension → use list()/set()
- C414: unnecessary-double-cast-or-process → simplify
- PERF403: manual-dict-comprehension → use dict comprehensions
- PERF102: incorrect-dict-iterator → use .values()/.keys()
- PERF402: manual-list-copy → use list.extend()
- RET503/RET506/RET507/RET508: superfluous else after return/raise/continue/break
- RET501/RET502: unnecessary/implicit return None

Adds per-file-ignores for tests/ and examples/ where these patterns
are acceptable (e.g. bare Exception in tests, dict() calls in fixtures).
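
A few of the rewrites listed above, sketched on made-up data. The variable names are illustrative only and are not taken from the repository:

```python
pairs = [("a", 1), ("b", 2)]

# PERF403: a dict comprehension replaces a loop that assigns one key at a time.
doubled = {key: value * 2 for key, value in pairs}

# PERF102: iterate .values() when the keys are not used in the loop body.
total = 0
for value in doubled.values():
    total += value

# PERF402: list.extend() replaces a loop that appends every element.
merged = [0]
merged.extend(doubled.values())

# C408: use a literal instead of calling dict().
empty = {}

# C401: a set comprehension replaces set(<generator>).
key_lengths = {len(key) for key, _ in pairs}

print(doubled, total, merged, empty, key_lengths)
```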

* fix(lint): enforce E722, ERA001, RET505 and fix pre-commit RET503 gap (#3276)

Remove three rules from the global ignore list by fixing all violations:

E722 (bare except) — 6 violations in tests:
  Replace `except:` with `except Exception:` to avoid swallowing
  KeyboardInterrupt and SystemExit.
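
A minimal sketch of why the narrower clause matters. `run_guarded`, `fail`, and `interrupt` are hypothetical helpers for illustration, not code from the test suite:

```python
def run_guarded(action):
    """Run action, swallowing ordinary errors but not interpreter-level signals."""
    try:
        action()
    except Exception:
        # Unlike a bare `except:`, this does not catch KeyboardInterrupt or
        # SystemExit, because both derive from BaseException rather than Exception.
        return "handled"
    return "ok"


def fail():
    raise ValueError("ordinary error")


def interrupt():
    raise KeyboardInterrupt


print(run_guarded(fail))  # ordinary errors are still caught

try:
    run_guarded(interrupt)
except KeyboardInterrupt:
    print("KeyboardInterrupt propagated to the caller")
```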

ERA001 (commented-out code) — 25 violations:
  Delete 18 true positives (dead variables, disabled debug logs,
  commented-out imports). Add `# noqa: ERA001` to 7 false positives
  (template instructions, type annotations, documentation comments).

RET505 (superfluous else after return) — 413 violations:
  Auto-fix all occurrences. Also fixes 5 cascading RET506/RET507
  violations exposed by the RET505 removals.

Pre-commit hooks gap:
  Add RET503 to `.pre-commit-hooks/**` per-file-ignores alongside T201.

* fix(lint): enforce RET504 and TRY301 — fix all violations (#3279)

* fix(lint): enforce RET504 — collapse unnecessary assign-before-return

Auto-fix all 46 RET504 violations via ruff unsafe-fixes: collapse
`result = expr; return result` into `return expr`.

Remove RET504 from global ignore list. Add to tests/examples
per-file-ignores where intermediate variables aid test clarity.
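
For reference, the shape of the RET504 collapse, using hypothetical function names:

```python
def total_before(values):
    # Flagged by RET504: the name exists only to be returned on the next line.
    result = sum(values)
    return result


def total_after(values):
    # Collapsed form: return the expression directly.
    return sum(values)


print(total_before([1, 2, 3]) == total_after([1, 2, 3]))
```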

Also removes TRY301 from global ignore (violations fixed in next commit).

* fix(lint): enforce TRY301 — fix raises inside broad try/except blocks

Structural fixes for 65 TRY301 violations:

Security-critical fixes:
- url_validator.py: move 6 validation raises before try block,
  replace isinstance-based re-raise with specific except clause
- path_validator.py: move validation outside try block
- env_settings.py: separate parsing (try) from validation (outside)

Route/service fixes:
- research_routes.py: replace raise-then-catch with direct error return
- mcp/server.py: move all 7 tool validations before try blocks
- news/api.py: move validation before try, noqa for db-session raises
- notifications: move rate limit and URL validation before try blocks
- iterative_refinement_strategy.py: move JSON validation after try

Added noqa for intentional patterns: re-raise in except handlers,
nested function definitions, db-session-dependent checks, rate limit
re-raises for base class retry logic.
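
The "separate parsing (try) from validation (outside)" pattern can be sketched as follows. The functions and the port-range check are assumptions made for illustration; they do not reproduce the code in env_settings.py or the other files listed:

```python
import json


def load_port_before(raw: str) -> int:
    try:
        port = json.loads(raw)["port"]
        if not 0 < port < 65536:
            raise ValueError("port out of range")  # TRY301: raised inside the try
    except Exception as exc:
        # The broad handler also catches the validation error and rewraps it.
        raise RuntimeError("could not load config") from exc
    return port


def load_port_after(raw: str) -> int:
    try:
        # Only the parsing step, which can genuinely fail, stays in the try.
        port = json.loads(raw)["port"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        raise RuntimeError("could not load config") from exc
    if not 0 < port < 65536:
        # Validation now runs outside the try, so its error reaches the caller unchanged.
        raise ValueError("port out of range")
    return port
```

With the check outside the try block, a caller can tell a malformed payload (RuntimeError) apart from a well-formed but invalid value (ValueError).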

* merge: resolve conflicts between wave2 lint branch and main

Resolve 14 merge conflicts by always starting from main's version
and re-applying lint fixes on top:

- mcp_strategy.py, ollama.py, security_settings.py, delete_routes.py:
  Take main's code, re-apply RET505 (remove else: after return)
- mcp/server.py (3 conflicts): Take main's ValidationError handlers
  and set_settings_context, re-apply TRY301 fixes, fix sensitive
  data logging
- research_routes.py: Take main, fix duplicate block (merge artifact)
- settings_routes.py: Take main's default-settings fallback feature
- meta_search_engine.py, parallel_search_engine.py: Take main's
  get_available_engines delegation, delete unreachable code
- search_engine_ddg.py, search_engine_google_pse.py: Take main's
  sanitization, re-apply RET506 (if not elif after raise)
- rag_routes.py: Accept main's deletion (route moved to delete_routes)
- encryption_check.py: Accept main's deletion (dead code)
- test_storage_coverage.py: Remove broken test classes referencing
  undefined stubs
- pre-commit hooks: extend per-file-ignores for ERA001, RET504

* fix: revert ValueError→TypeError changes that break tests and API contracts

Revert TRY004 fixes in 3 files where changing ValueError to TypeError
would break existing tests and HTTP status code contracts:

- card_factory.py: 5 tests assert pytest.raises(ValueError)
- base_rater.py: flask_api.py catches ValueError for HTTP 400 responses;
  TypeError would fall through to HTTP 500
- full_search.py: test asserts pytest.raises(ValueError)

Add # noqa: TRY004 to suppress the lint rule on these lines.
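
A sketch of the status-code contract at stake. `handle_request` and both rater functions are hypothetical stand-ins, not the code in flask_api.py or base_rater.py:

```python
def rate_with_value_error(payload):
    if not isinstance(payload, dict):
        # Kept as ValueError so existing handlers and tests still match.
        raise ValueError("rating payload must be a dict")  # noqa: TRY004
    return payload.get("score", 0)


def rate_with_type_error(payload):
    if not isinstance(payload, dict):
        # What the TRY004 fix would have produced.
        raise TypeError("rating payload must be a dict")
    return payload.get("score", 0)


def handle_request(rater, payload):
    """Map ValueError to HTTP 400 and any other exception to HTTP 500."""
    try:
        return 200, rater(payload)
    except ValueError as exc:
        return 400, str(exc)
    except Exception as exc:
        return 500, str(exc)


print(handle_request(rate_with_value_error, "bad")[0])
print(handle_request(rate_with_type_error, "bad")[0])
```

TypeError is not a subclass of ValueError, so the second call skips the 400 branch and lands in the generic handler.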

* fix: move benchmark_data check back inside try block

The ValueError for missing benchmark_data must be inside the try/except
so the except handler can mark the run as FAILED in the database.
Without this, the exception propagates unhandled in a daemon thread,
leaving the benchmark run stuck in RUNNING state permanently.
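
A minimal sketch of the failure mode, assuming a plain dict in place of the database and a hypothetical `run_benchmark` worker:

```python
import threading

run_status = {}  # stand-in for the database table of benchmark runs


def run_benchmark(run_id, benchmark_data):
    run_status[run_id] = "RUNNING"
    try:
        if benchmark_data is None:
            # Raised inside the try so the handler below can record the failure.
            raise ValueError("benchmark_data is required")  # noqa: TRY301
        run_status[run_id] = "COMPLETED"
    except Exception:
        # Without this handler seeing the error, the status would stay RUNNING.
        run_status[run_id] = "FAILED"


worker = threading.Thread(target=run_benchmark, args=("run-1", None), daemon=True)
worker.start()
worker.join()
print(run_status["run-1"])
```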

* chore(lint): remove ERA rule and suppress TRY004 globally

Remove ERA (eradicate — commented-out code detection) from ruff select:
- 28% false positive rate in our codebase (7 of 25 violations)
- No major Python project enables it (Django, FastAPI, Pydantic, Airflow)
- Ruff itself doesn't use it; autofix was demoted to manual-only
- 172 noqa suppressions provided zero enforcement value

Suppress TRY004 (type-check-without-type-error) globally:
- Ruff maintainer agreed the autofix "can change functionality"
- We already had to revert 3 TypeError changes that broke tests
  and HTTP 400→500 API contracts
- Django, Flask, pandas all use isinstance + ValueError routinely
- Pylint has no equivalent rule; near-zero PyPI adoption

Remove all 173 # noqa: ERA001 and 49 # noqa: TRY004 comments
from the codebase — no longer needed with rules disabled/suppressed.

* fix: resolve mypy errors, failing MCP test, and TRY301 noqa

- search_engine_factory.py: restore typed intermediate variable to fix
  mypy no-any-return (RET504 collapse lost the type annotation)
- search_engine_pubchem.py: add explicit list[str] type annotation
- test_edge_cases.py: fix assertion that expected engine name in
  sanitized error message
- mcp/server.py: add noqa: TRY301 to validation raises inside try
  blocks (from main's new merge code)
2026-03-29 17:01:23 +02:00

195 lines
6.6 KiB
Python

"""
Example of multi-benchmark optimization using weighted benchmarks.

This script demonstrates how to use the optimization system with both
SimpleQA and BrowseComp benchmarks with custom weights.
"""

import os
import sys
from datetime import datetime
from pathlib import Path
from typing import Any, Dict

# Print current directory and python path for debugging
print(f"Current directory: {os.getcwd()}")
print(f"Python path: {sys.path}")

# Add appropriate paths
sys.path.insert(0, str(Path(__file__).parent.parent.resolve()))

try:
    # Try to import from the local module structure
    from src.local_deep_research.benchmarks.optimization.optuna_optimizer import (
        optimize_for_quality,
        optimize_for_speed,
        optimize_parameters,
    )

    print("Successfully imported using src.local_deep_research path")
except ImportError:
    print("First import attempt failed, trying with direct import...")
    try:
        # Try to import directly
        from local_deep_research.benchmarks.optimization.optuna_optimizer import (
            optimize_for_quality,
            optimize_for_speed,
            optimize_parameters,
        )

        print("Successfully imported using local_deep_research path")
    except ImportError as e:
        print(f"Import error: {e}")
        print("Creating simulation functions for demonstration only...")

        # Create simulation functions if imports fail
        def optimize_parameters(*args, **kwargs):
            benchmark_weights = kwargs.get(
                "benchmark_weights", {"simpleqa": 1.0}
            )
            print(
                f"SIMULATION: optimize_parameters called with benchmark_weights={benchmark_weights}"
            )

            # Return different results based on the benchmark weights
            if (
                "browsecomp" in benchmark_weights
                and benchmark_weights["browsecomp"] >= 1.0
            ):
                # BrowseComp only
                return {
                    "iterations": 4,
                    "questions_per_iteration": 5,
                    "search_strategy": "parallel",
                }, 0.78
            if (
                "browsecomp" in benchmark_weights
                and benchmark_weights["browsecomp"] > 0
            ):
                # Mixed weights
                return {
                    "iterations": 2,
                    "questions_per_iteration": 2,
                    "search_strategy": "iterdrag",
                }, 0.81
            # SimpleQA only (default)
            return {
                "iterations": 3,
                "questions_per_iteration": 2,
                "search_strategy": "standard",
            }, 0.75

        def optimize_for_quality(*args, **kwargs):
            benchmark_weights = kwargs.get(
                "benchmark_weights", {"simpleqa": 1.0}
            )
            print(
                f"SIMULATION: optimize_for_quality called with benchmark_weights={benchmark_weights}"
            )
            return {
                "iterations": 4,
                "questions_per_iteration": 1,
                "search_strategy": "iterdrag",
            }, 0.85

        def optimize_for_speed(*args, **kwargs):
            benchmark_weights = kwargs.get(
                "benchmark_weights", {"simpleqa": 1.0}
            )
            print(
                f"SIMULATION: optimize_for_speed called with benchmark_weights={benchmark_weights}"
            )
            return {
                "iterations": 2,
                "questions_per_iteration": 2,
                "search_strategy": "rapid",
            }, 0.67


# Loguru automatically handles logging configuration


def print_optimization_results(params: Dict[str, Any], score: float):
    """Print optimization results in a nicely formatted way."""
    print("\n" + "=" * 50)
    print(" OPTIMIZATION RESULTS ")
    print("=" * 50)
    print(f"SCORE: {score:.4f}")
    print("\nBest Parameters:")
    for param, value in params.items():
        print(f"  {param}: {value}")
    print("=" * 50 + "\n")


def main():
    """Run the multi-benchmark optimization examples."""
    # Create a timestamp-based directory for results
    from datetime import timezone

    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    output_dir = f"optimization_demo_{timestamp}"
    os.makedirs(output_dir, exist_ok=True)

    # Research query for optimization examples
    query = "Recent advancements in renewable energy"

    # Example 1: SimpleQA only (default)
    print("\n🔍 Running optimization with SimpleQA benchmark only...")
    params1, score1 = optimize_parameters(
        query=query,
        n_trials=3,  # Using a small number for quick demonstration
        output_dir=str(Path(output_dir) / "simpleqa_only"),
    )
    print_optimization_results(params1, score1)

    # Example 2: BrowseComp only
    print("\n🔍 Running optimization with BrowseComp benchmark only...")
    params2, score2 = optimize_parameters(
        query=query,
        n_trials=3,  # Using a small number for quick demonstration
        output_dir=str(Path(output_dir) / "browsecomp_only"),
        benchmark_weights={"browsecomp": 1.0},
    )
    print_optimization_results(params2, score2)

    # Example 3: 60/40 weighted combination (SimpleQA/BrowseComp)
    print("\n🔍 Running optimization with 60% SimpleQA and 40% BrowseComp...")
    params3, score3 = optimize_parameters(
        query=query,
        n_trials=5,  # Using a small number for quick demonstration
        output_dir=str(Path(output_dir) / "weighted_combination"),
        benchmark_weights={
            "simpleqa": 0.6,  # 60% weight for SimpleQA
            "browsecomp": 0.4,  # 40% weight for BrowseComp
        },
    )
    print_optimization_results(params3, score3)

    # Example 4: Quality-focused with both benchmarks
    print("\n🔍 Running quality-focused optimization with both benchmarks...")
    params4, score4 = optimize_for_quality(
        query=query,
        n_trials=3,
        output_dir=str(Path(output_dir) / "quality_focused"),
        benchmark_weights={"simpleqa": 0.6, "browsecomp": 0.4},
    )
    print_optimization_results(params4, score4)

    # Example 5: Speed-focused with both benchmarks
    print("\n🔍 Running speed-focused optimization with both benchmarks...")
    params5, score5 = optimize_for_speed(
        query=query,
        n_trials=3,
        output_dir=str(Path(output_dir) / "speed_focused"),
        benchmark_weights={"simpleqa": 0.5, "browsecomp": 0.5},
    )
    print_optimization_results(params5, score5)

    print(f"\nAll optimization results saved to: {output_dir}")
    print("View the results directory for detailed logs and visualizations.")


if __name__ == "__main__":
    main()