Mirror of https://github.com/LearningCircuit/local-deep-research.git, synced 2026-06-16 12:02:34 +03:00
195 lines
6.6 KiB
Python
"""
Example of multi-benchmark optimization using weighted benchmarks.

This script demonstrates how to use the optimization system with the
SimpleQA and BrowseComp benchmarks, each on its own and combined with
custom weights.
"""

import os
import sys
from datetime import datetime
from pathlib import Path
from typing import Any, Dict


# Print current directory and Python path for debugging
print(f"Current directory: {os.getcwd()}")
print(f"Python path: {sys.path}")

# Prepend the directory above this script's folder to sys.path so the
# src.local_deep_research import below can resolve from a source checkout
sys.path.insert(0, str(Path(__file__).parent.parent.resolve()))

try:
    # Try to import from the local module structure
    from src.local_deep_research.benchmarks.optimization.optuna_optimizer import (
        optimize_for_quality,
        optimize_for_speed,
        optimize_parameters,
    )

    print("Successfully imported using src.local_deep_research path")
except ImportError:
    print("First import attempt failed, trying with direct import...")
    try:
        # Try to import directly
        from local_deep_research.benchmarks.optimization.optuna_optimizer import (
            optimize_for_quality,
            optimize_for_speed,
            optimize_parameters,
        )

        print("Successfully imported using local_deep_research path")
    except ImportError as e:
        print(f"Import error: {e}")
        print("Creating simulation functions for demonstration only...")

        # Create simulation functions if imports fail
        def optimize_parameters(*args, **kwargs):
            benchmark_weights = kwargs.get(
                "benchmark_weights", {"simpleqa": 1.0}
            )
            print(
                f"SIMULATION: optimize_parameters called with benchmark_weights={benchmark_weights}"
            )

            # Return different results based on the benchmark weights
            if (
                "browsecomp" in benchmark_weights
                and benchmark_weights["browsecomp"] >= 1.0
            ):
                # BrowseComp only
                return {
                    "iterations": 4,
                    "questions_per_iteration": 5,
                    "search_strategy": "parallel",
                }, 0.78
            if (
                "browsecomp" in benchmark_weights
                and benchmark_weights["browsecomp"] > 0
            ):
                # Mixed weights
                return {
                    "iterations": 2,
                    "questions_per_iteration": 2,
                    "search_strategy": "iterdrag",
                }, 0.81
            # SimpleQA only (default)
            return {
                "iterations": 3,
                "questions_per_iteration": 2,
                "search_strategy": "standard",
            }, 0.75

        def optimize_for_quality(*args, **kwargs):
            benchmark_weights = kwargs.get(
                "benchmark_weights", {"simpleqa": 1.0}
            )
            print(
                f"SIMULATION: optimize_for_quality called with benchmark_weights={benchmark_weights}"
            )
            return {
                "iterations": 4,
                "questions_per_iteration": 1,
                "search_strategy": "iterdrag",
            }, 0.85

        def optimize_for_speed(*args, **kwargs):
            benchmark_weights = kwargs.get(
                "benchmark_weights", {"simpleqa": 1.0}
            )
            print(
                f"SIMULATION: optimize_for_speed called with benchmark_weights={benchmark_weights}"
            )
            return {
                "iterations": 2,
                "questions_per_iteration": 2,
                "search_strategy": "rapid",
            }, 0.67


# Loguru automatically handles logging configuration


def print_optimization_results(params: Dict[str, Any], score: float):
    """Print optimization results in a nicely formatted way."""
    print("\n" + "=" * 50)
    print(" OPTIMIZATION RESULTS ")
    print("=" * 50)
    print(f"SCORE: {score:.4f}")
    print("\nBest Parameters:")
    for param, value in params.items():
        print(f" {param}: {value}")
    print("=" * 50 + "\n")


def main():
    """Run the multi-benchmark optimization examples."""
    # Create a timestamp-based directory for results
    from datetime import timezone

    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    output_dir = f"optimization_demo_{timestamp}"
    os.makedirs(output_dir, exist_ok=True)

    # Research query for optimization examples
    query = "Recent advancements in renewable energy"

    # Example 1: SimpleQA only (default)
    print("\n🔍 Running optimization with SimpleQA benchmark only...")
    params1, score1 = optimize_parameters(
        query=query,
        n_trials=3,  # Using a small number for quick demonstration
        output_dir=str(Path(output_dir) / "simpleqa_only"),
    )
    print_optimization_results(params1, score1)

    # Example 2: BrowseComp only
    print("\n🔍 Running optimization with BrowseComp benchmark only...")
    params2, score2 = optimize_parameters(
        query=query,
        n_trials=3,  # Using a small number for quick demonstration
        output_dir=str(Path(output_dir) / "browsecomp_only"),
        benchmark_weights={"browsecomp": 1.0},
    )
    print_optimization_results(params2, score2)

    # Example 3: 60/40 weighted combination (SimpleQA/BrowseComp)
    print("\n🔍 Running optimization with 60% SimpleQA and 40% BrowseComp...")
    params3, score3 = optimize_parameters(
        query=query,
        n_trials=5,  # Using a small number for quick demonstration
        output_dir=str(Path(output_dir) / "weighted_combination"),
        benchmark_weights={
            "simpleqa": 0.6,  # 60% weight for SimpleQA
            "browsecomp": 0.4,  # 40% weight for BrowseComp
        },
    )
    print_optimization_results(params3, score3)

    # Example 4: Quality-focused with both benchmarks
    print("\n🔍 Running quality-focused optimization with both benchmarks...")
    params4, score4 = optimize_for_quality(
        query=query,
        n_trials=3,
        output_dir=str(Path(output_dir) / "quality_focused"),
        benchmark_weights={"simpleqa": 0.6, "browsecomp": 0.4},
    )
    print_optimization_results(params4, score4)

    # Example 5: Speed-focused with both benchmarks
    print("\n🔍 Running speed-focused optimization with both benchmarks...")
    params5, score5 = optimize_for_speed(
        query=query,
        n_trials=3,
        output_dir=str(Path(output_dir) / "speed_focused"),
        benchmark_weights={"simpleqa": 0.5, "browsecomp": 0.5},
    )
    print_optimization_results(params5, score5)

    print(f"\nAll optimization results saved to: {output_dir}")
    print("View the results directory for detailed logs and visualizations.")


if __name__ == "__main__":
    main()
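The script passes a `benchmark_weights` mapping such as `{"simpleqa": 0.6, "browsecomp": 0.4}`, but it does not show how those weights turn per-benchmark scores into the single value the optimizer maximizes. One common choice is a normalized weighted average. The sketch below assumes that scheme; `combine_benchmark_scores` is a hypothetical helper, not part of `optuna_optimizer`, and the real implementation may combine scores differently. The scores in the usage lines are made-up numbers for illustration.

```python
from typing import Dict


def combine_benchmark_scores(
    scores: Dict[str, float], weights: Dict[str, float]
) -> float:
    """Collapse per-benchmark scores into one objective value.

    Hypothetical helper for illustration: a normalized weighted average.
    It is an assumption, not code taken from optuna_optimizer.
    """
    # Only benchmarks with a positive weight contribute to the objective.
    active = {name: w for name, w in weights.items() if w > 0}
    if not active:
        raise ValueError("at least one benchmark weight must be positive")

    # Every weighted benchmark needs a score to combine.
    missing = sorted(set(active) - set(scores))
    if missing:
        raise KeyError(f"no score for weighted benchmark(s): {missing}")

    # Dividing by the weight total makes {"simpleqa": 3, "browsecomp": 2}
    # equivalent to {"simpleqa": 0.6, "browsecomp": 0.4}.
    total = sum(active.values())
    return sum(w * scores[name] for name, w in active.items()) / total


if __name__ == "__main__":
    # Made-up per-benchmark scores, weighted 60/40 as in Example 3 of the script.
    per_benchmark = {"simpleqa": 0.90, "browsecomp": 0.40}
    weights = {"simpleqa": 0.6, "browsecomp": 0.4}
    print(round(combine_benchmark_scores(per_benchmark, weights), 4))  # prints 0.7
```

Under this assumption, weights of 0.6/0.4 and 3/2 produce the same objective, a weight of 0 (or an absent key) drops that benchmark entirely, and because all weights are positive the combined score always lies between the lowest and highest individual benchmark scores.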