mirror of
https://github.com/LearningCircuit/local-deep-research.git
synced 2026-06-15 19:46:56 +03:00
* chore(lint): add ruff rules for logging, performance, exceptions, and print detection

  Add wave 2 lint rules: G, PERF, RET, TRY, T20, C4, ERA. All existing violations are suppressed via ignore/per-file-ignores so this config change is merge-safe. Follow-up PRs will fix violations and remove the ignore entries incrementally.

* fix(lint): exempt pre-commit hooks from T201 print rule (#3270)

  Pre-commit hooks are CLI scripts where print is the intended output interface, same as scripts/ and cli/ directories already exempted.

* fix(lint): fix all low-count ruff violations instead of suppressing them (#3275)

* fix(lint): replace manual dict-building loops with dict comprehensions (PERF403)

* fix(lint): replace bare Exception raises with specific built-in types (TRY002)

  Replace all `raise Exception(...)` in production code with appropriate built-in exception types: RuntimeError for operational/state failures, ValueError for invalid data, and ConnectionError for HTTP errors.

* fix(lint): resolve TRY004 and PERF402 ruff violations

  Use TypeError instead of ValueError for isinstance/issubclass type checks (TRY004), and replace manual for-loop list copies with list.extend() (PERF402).

* fix(lint): fix all low-count ruff violations instead of suppressing them

  Fix all violations for 15 ruff rules that had ≤10 occurrences each, rather than suppressing them with ignore directives:
  - TRY002: raise-vanilla-class → use specific built-in exceptions
  - TRY004: type-check-without-type-error → use TypeError
  - C408: unnecessary-collection-call → use dict/list literals
  - C401: unnecessary-generator-set → use set comprehensions
  - C416: unnecessary-comprehension → use list()/set()
  - C414: unnecessary-double-cast-or-process → simplify
  - PERF403: manual-dict-comprehension → use dict comprehensions
  - PERF102: incorrect-dict-iterator → use .values()/.keys()
  - PERF402: manual-list-copy → use list.extend()
  - RET503/RET506/RET507/RET508: superfluous else after return/raise/continue/break
  - RET501/RET502: unnecessary/implicit return None

  Adds per-file-ignores for tests/ and examples/ where these patterns are acceptable (e.g. bare Exception in tests, dict() calls in fixtures).

* fix(lint): enforce E722, ERA001, RET505 and fix pre-commit RET503 gap (#3276)

  Remove three rules from the global ignore list by fixing all violations:

  E722 (bare except), 6 violations in tests: Replace `except:` with `except Exception:` to avoid swallowing KeyboardInterrupt and SystemExit.

  ERA001 (commented-out code), 25 violations: Delete 18 true positives (dead variables, disabled debug logs, commented-out imports). Add `# noqa: ERA001` to 7 false positives (template instructions, type annotations, documentation comments).

  RET505 (superfluous else after return), 413 violations: Auto-fix all occurrences. Also fixes 5 cascading RET506/RET507 violations exposed by the RET505 removals.

  Pre-commit hooks gap: Add RET503 to `.pre-commit-hooks/**` per-file-ignores alongside T201.

* fix(lint): enforce RET504 and TRY301 - fix all violations (#3279)

* fix(lint): enforce RET504 - collapse unnecessary assign-before-return

  Auto-fix all 46 RET504 violations via ruff unsafe-fixes: collapse `result = expr; return result` into `return expr`. Remove RET504 from global ignore list. Add to tests/examples per-file-ignores where intermediate variables aid test clarity. Also removes TRY301 from global ignore (violations fixed in next commit).

* fix(lint): enforce TRY301 - fix raises inside broad try/except blocks

  Structural fixes for 65 TRY301 violations:

  Security-critical fixes:
  - url_validator.py: move 6 validation raises before try block, replace isinstance-based re-raise with specific except clause
  - path_validator.py: move validation outside try block
  - env_settings.py: separate parsing (try) from validation (outside)

  Route/service fixes:
  - research_routes.py: replace raise-then-catch with direct error return
  - mcp/server.py: move all 7 tool validations before try blocks
  - news/api.py: move validation before try, noqa for db-session raises
  - notifications: move rate limit and URL validation before try blocks
  - iterative_refinement_strategy.py: move JSON validation after try

  Added noqa for intentional patterns: re-raise in except handlers, nested function definitions, db-session-dependent checks, rate limit re-raises for base class retry logic.

* merge: resolve conflicts between wave2 lint branch and main

  Resolve 14 merge conflicts by always starting from main's version and re-applying lint fixes on top:
  - mcp_strategy.py, ollama.py, security_settings.py, delete_routes.py: Take main's code, re-apply RET505 (remove else: after return)
  - mcp/server.py (3 conflicts): Take main's ValidationError handlers and set_settings_context, re-apply TRY301 fixes, fix sensitive data logging
  - research_routes.py: Take main, fix duplicate block (merge artifact)
  - settings_routes.py: Take main's default-settings fallback feature
  - meta_search_engine.py, parallel_search_engine.py: Take main's get_available_engines delegation, delete unreachable code
  - search_engine_ddg.py, search_engine_google_pse.py: Take main's sanitization, re-apply RET506 (if not elif after raise)
  - rag_routes.py: Accept main's deletion (route moved to delete_routes)
  - encryption_check.py: Accept main's deletion (dead code)
  - test_storage_coverage.py: Remove broken test classes referencing undefined stubs
  - pre-commit hooks: extend per-file-ignores for ERA001, RET504

* fix: revert ValueError→TypeError changes that break tests and API contracts

  Revert TRY004 fixes in 3 files where changing ValueError to TypeError would break existing tests and HTTP status code contracts:
  - card_factory.py: 5 tests assert pytest.raises(ValueError)
  - base_rater.py: flask_api.py catches ValueError for HTTP 400 responses; TypeError would fall through to HTTP 500
  - full_search.py: test asserts pytest.raises(ValueError)

  Add # noqa: TRY004 to suppress the lint rule on these lines.

* fix: move benchmark_data check back inside try block

  The ValueError for missing benchmark_data must be inside the try/except so the except handler can mark the run as FAILED in the database. Without this, the exception propagates unhandled in a daemon thread, leaving the benchmark run stuck in RUNNING state permanently.

* chore(lint): remove ERA rule and suppress TRY004 globally

  Remove ERA (eradicate, commented-out code detection) from ruff select:
  - 28% false positive rate in our codebase (7 of 25 violations)
  - No major Python project enables it (Django, FastAPI, Pydantic, Airflow)
  - Ruff itself doesn't use it; autofix was demoted to manual-only
  - 172 noqa suppressions provided zero enforcement value

  Suppress TRY004 (type-check-without-type-error) globally:
  - Ruff maintainer agreed the autofix "can change functionality"
  - We already had to revert 3 TypeError changes that broke tests and HTTP 400→500 API contracts
  - Django, Flask, pandas all use isinstance + ValueError routinely
  - Pylint has no equivalent rule; near-zero PyPI adoption

  Remove all 173 # noqa: ERA001 and 49 # noqa: TRY004 comments from the codebase, as they are no longer needed with rules disabled/suppressed.

* fix: resolve mypy errors, failing MCP test, and TRY301 noqa

  - search_engine_factory.py: restore typed intermediate variable to fix mypy no-any-return (RET504 collapse lost the type annotation)
  - search_engine_pubchem.py: add explicit list[str] type annotation
  - test_edge_cases.py: fix assertion that expected engine name in sanitized error message
  - mcp/server.py: add noqa: TRY301 to validation raises inside try blocks (from main's new merge code)

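The TRY301 restructuring mentioned in the commit log above ("separate parsing (try) from validation (outside)") can be sketched roughly as follows. This is only an illustration of the general pattern: `parse_port_before` and `parse_port_after` are hypothetical names chosen for this sketch, not functions from the repository, and the port-range check is an assumed example of a validation rule.

```python
def parse_port_before(raw: str) -> int:
    """Pattern flagged by TRY301: the validation raise sits inside the try."""
    try:
        port = int(raw)
        if not 0 < port < 65536:
            # This raise is caught by the handler below, so callers only
            # ever see the generic RuntimeError.
            raise ValueError(f"port out of range: {port}")
        return port
    except ValueError as exc:
        raise RuntimeError(f"invalid port setting: {raw!r}") from exc


def parse_port_after(raw: str) -> int:
    """Restructured: only parsing is in the try; validation happens outside."""
    try:
        port = int(raw)
    except ValueError as exc:
        raise RuntimeError(f"invalid port setting: {raw!r}") from exc
    # Validation now raises its own, specific error to the caller.
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port
```

With the second form, a caller can tell an unparseable value (RuntimeError) apart from a parseable but out-of-range one (ValueError). As the later "move benchmark_data check back inside try block" entry shows, this is not always the desired behavior when the handler also performs cleanup.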
435 lines
16 KiB
Python
Executable File
#!/usr/bin/env python3
"""
Simple HTTP API Example for Local Deep Research v1.0+

This example shows how to use the LDR API with authentication.
Works completely out of the box with automatic user creation.

================================================================================
IMPORTANT - LOCALHOST ONLY
================================================================================
This example ONLY works when connecting via localhost:
  ✅ http://localhost:5000
  ✅ http://127.0.0.1:5000

It will NOT work via http://192.168.x.x:5000 or other non-localhost addresses.

WHY: Session cookies require HTTPS for non-localhost (security).

SOLUTIONS for non-localhost:
  1. HTTPS with reverse proxy (production)
  2. SSH tunnel: ssh -L 5000:localhost:5000 user@server
  3. TESTING=1 env var (INSECURE - dev only!)

WARNING: TESTING=1 disables cookie security. Never use in production.
================================================================================
"""

import requests
import time
import sys
from bs4 import BeautifulSoup
from pathlib import Path

# Add the src directory to Python path for programmatic user creation
sys.path.insert(0, str(Path(__file__).parent.parent.parent.parent / "src"))

from local_deep_research.database.encrypted_db import DatabaseManager
from local_deep_research.database.models import User
from local_deep_research.database.auth_db import auth_db_session

# Configuration
API_URL = "http://localhost:5000"


def create_test_user():
    """Create a test user programmatically."""
    username = f"testuser_{int(time.time())}"
    password = "testpassword123"

    print(f"Creating test user: {username}")

    try:
        # Create user in auth database
        with auth_db_session() as session:
            new_user = User(username=username)
            session.add(new_user)
            session.commit()

        # Create encrypted database for user
        db_manager = DatabaseManager()
        db_manager.create_user_database(username, password)

        print(f"✅ User created successfully: {username}")
        return username, password

    except Exception as e:
        print(f"❌ Failed to create user: {e}")
        return None, None


def main():
    print("=== LDR HTTP API Example ===")
    print("🎯 This example works completely out of the box!\n")

    print("⚠️ IMPORTANT NOTES:")
    print(" • This script may take several minutes to complete")
    print(" • Research progress can be monitored in the server logs")
    print(" • Server logs are available at: /tmp/ldr_server.log")
    print(
        " • Use 'tail -f /tmp/ldr_server.log' to monitor progress in real-time"
    )
    print(" • Results will be available at the URL shown when complete\n")

    # Check if server is running
    try:
        response = requests.get(f"{API_URL}/", timeout=5)
        if response.status_code != 200:
            print("❌ Server is not responding correctly")
            print("\n📋 HOW TO START THE SERVER:")
            print(" • Option 1: python -m local_deep_research.web.app")
            print(
                " • Option 2: bash scripts/dev/restart_server.sh (recommended)"
            )
            print(
                " • Note: restart_server.sh will kill existing server process"
            )
            sys.exit(1)
        print("✅ Server is running")
    except Exception:
        print(
            "❌ Cannot connect to server. Please make sure it's running on http://localhost:5000"
        )
        print("\n📋 HOW TO START THE SERVER:")
        print(" • Option 1: python -m local_deep_research.web.app")
        print(" • Option 2: bash scripts/dev/restart_server.sh (recommended)")
        print(" • Note: restart_server.sh will kill existing server process")
        sys.exit(1)

    # Create test user automatically
    username, password = create_test_user()
    if not username:
        print("❌ Failed to create test user")
        sys.exit(1)

    # Create a session to persist cookies
    session = requests.Session()
    print(f"\nTesting with user: {username}")

    # Step 1: Login
    print("\n1. Authenticating...")

    # Get login page and CSRF token
    login_page = session.get(f"{API_URL}/auth/login")
    soup = BeautifulSoup(login_page.text, "html.parser")
    csrf_input = soup.find("input", {"name": "csrf_token"})
    # soup.find() returns None when the field is missing, so guard before .get()
    login_csrf = csrf_input.get("value") if csrf_input else None

    if not login_csrf:
        print("❌ Could not get CSRF token from login page")
        sys.exit(1)

    # Login with form data (not JSON)
    login_response = session.post(
        f"{API_URL}/auth/login",
        data={
            "username": username,
            "password": password,
            "csrf_token": login_csrf,
        },
        allow_redirects=False,
    )

    if login_response.status_code not in [200, 302]:
        print(f"❌ Login failed: {login_response.text}")
        print("\nPlease ensure:")
        print("- The server is running: python -m local_deep_research.web.app")
        sys.exit(1)

    print("✅ Login successful")

    # Step 2: Get CSRF token
    print("\n2. Getting CSRF token...")
    csrf_response = session.get(f"{API_URL}/auth/csrf-token")
    csrf_token = csrf_response.json()["csrf_token"]
    headers = {"X-CSRF-Token": csrf_token}
    print("✅ CSRF token obtained")

    # Initialize research_id to None
    research_id = None

    # Example 1: Quick Summary (using the start endpoint)
    print("\n=== Example 1: Quick Summary ===")
    print(
        "📝 This example demonstrates starting a research query and polling for results"
    )
    print("⏱️ This typically takes 1-3 minutes to complete\n")

    research_request = {
        "query": "What is machine learning?",
        "model": None,  # Will use default from settings
        "search_engines": ["wikipedia"],  # Fast for demo
        "iterations": 1,
        "questions_per_iteration": 2,
    }

    # Start research - CORRECT ENDPOINT
    print("🚀 Starting research...")
    start_response = session.post(
        f"{API_URL}/api/start_research", json=research_request, headers=headers
    )

    if start_response.status_code != 200:
        print(f"❌ Failed to start research: {start_response.text}")
        sys.exit(1)

    research_data = start_response.json()
    research_id = research_data["research_id"]
    print("✅ Research started successfully!")
    print(f"🆔 Research ID: {research_id}")
    print("📊 Monitor progress in server logs: tail -f /tmp/ldr_server.log")
    print(f"🌐 Results will be available at: {API_URL}/results/{research_id}\n")

    # Poll for results
    print("⏳ Waiting for research to complete...")
    print(
        "⚠️ NOTE: This will poll for up to 3 minutes to ensure research completes"
    )
    print(
        " If it fails, the research may still be running - check the results URL\n"
    )

    poll_count = 0
    max_polls = 18  # Maximum 3 minutes (18 * 10 seconds)
    completed = False

    while poll_count < max_polls:
        status_response = session.get(
            f"{API_URL}/api/research/{research_id}/status"
        )
        # Count every attempt, including failed status checks, so the loop
        # always terminates after max_polls iterations
        poll_count += 1

        if status_response.status_code == 200:
            status = status_response.json()
            current_status = status.get("status", "unknown")
            progress = status.get("progress", 0)

            elapsed_time = poll_count * 10  # 10 seconds per poll
            print(
                f" Check {poll_count} ({elapsed_time}s): Status = {current_status} (Progress: {progress}%)"
            )

            if current_status == "completed":
                print("🎉 Research completed successfully!")
                completed = True
                break
            if current_status == "failed":
                print(
                    f"❌ Research failed: {status.get('error', 'Unknown error')}"
                )
                print(
                    "📋 Check server logs for details: tail -f /tmp/ldr_server.log"
                )
                sys.exit(1)
            elif current_status in ["queued", "in_progress"]:
                # Continue polling
                pass
            else:
                print(f"⚠️ Unexpected status: {current_status}")

        else:
            print(
                f"⚠️ Status check failed with code: {status_response.status_code}"
            )

        time.sleep(10)  # Wait 10 seconds between polls

    # Report a timeout only if the loop ended without a "completed" status
    # (a completion on the final poll must not be reported as a timeout)
    if not completed:
        print("⏰ 3-minute timeout reached - research is still running")
        print("💡 This is normal for complex research queries!")
        print(f"📊 Check results later at: {API_URL}/results/{research_id}")
        print("📋 Monitor progress with: tail -f /tmp/ldr_server.log")
        print(
            "🔍 The script will still try to fetch results (may be incomplete)"
        )

    # Get results
    results_response = session.get(f"{API_URL}/api/report/{research_id}")

    if results_response.status_code == 200:
        results = results_response.json()
        # Use .get() so a missing "summary" key does not raise KeyError;
        # fall back to "content", the key Example 4 reads from the same endpoint
        summary = results.get("summary") or results.get("content", "")
        print(f"\n📝 Summary: {summary[:300]}...")
        print(f"📚 Sources: {len(results.get('sources', []))} found")
        print(f"🔍 Findings: {len(results.get('findings', []))} findings")

    # Example 2: Check Settings
    print("\n=== Example 2: Current Settings ===")
    settings_response = session.get(f"{API_URL}/settings/api")

    if settings_response.status_code == 200:
        settings = settings_response.json()["settings"]

        # Show some key settings
        llm_provider = settings.get("llm.provider", {}).get("value", "Not set")
        llm_model = settings.get("llm.model", {}).get("value", "Not set")

        print(f"LLM Provider: {llm_provider}")
        print(f"LLM Model: {llm_model}")

    # Example 3: Get Research History
    print("\n=== Example 3: Research History ===")
    history_response = session.get(f"{API_URL}/history/api")

    if history_response.status_code == 200:
        history = history_response.json()
        items = history.get("items", history.get("history", []))

        print(f"Found {len(items)} research items")
        for item in items[:3]:  # Show first 3
            print(
                f"- {item.get('query', 'Unknown query')} ({item.get('created_at', 'Unknown date')})"
            )

    # Example 4: Get and Display Research Results (with retry logic)
    print("\n=== Example 4: Research Results ===")
    if research_id:
        print(f"📄 Fetching research results for ID: {research_id}")
        print(
            "🔄 Will retry until results are available (up to 2 additional minutes)\n"
        )

        # Retry fetching results until available
        results_retries = 0
        max_results_retries = 12  # 2 minutes (12 * 10 seconds)

        while results_retries < max_results_retries:
            results_response = session.get(
                f"{API_URL}/api/report/{research_id}"
            )

            if results_response.status_code == 200:
                # Results are available, parse and display them
                results = results_response.json()

                content = results.get("content", "")
                sources = results.get("sources", [])
                findings = results.get("findings", [])

                print(
                    f"✅ Results retrieved successfully after {(results_retries + 1) * 10} seconds!"
                )
                print("\n📝 RESEARCH SUMMARY:")
                print("=" * 50)
                if content:
                    # Show first 500 characters of the summary
                    summary_preview = (
                        content[:500] + "..." if len(content) > 500 else content
                    )
                    print(summary_preview)
                else:
                    print("No summary content available")

                print(f"\n📚 SOURCES FOUND: {len(sources)}")
                for i, source in enumerate(
                    sources[:3], 1
                ):  # Show first 3 sources
                    title = source.get("title", "Unknown Title")
                    url = source.get("url", "No URL")
                    print(f" {i}. {title}")
                    print(f" {url}")

                if len(sources) > 3:
                    print(f" ... and {len(sources) - 3} more sources")

                print(f"\n🔍 KEY FINDINGS: {len(findings)}")
                for i, finding in enumerate(
                    findings[:3], 1
                ):  # Show first 3 findings
                    finding_text = finding.get("text", "No finding text")
                    finding_preview = (
                        finding_text[:150] + "..."
                        if len(finding_text) > 150
                        else finding_text
                    )
                    print(f" {i}. {finding_preview}")

                if len(findings) > 3:
                    print(f" ... and {len(findings) - 3} more findings")

                print(
                    f"\n🌐 View full results at: {API_URL}/results/{research_id}"
                )
                print("=" * 50)
                print("🎉 Results displayed successfully!")
                break  # Exit retry loop - success!

            if results_response.status_code == 404:
                results_retries += 1
                elapsed_time = results_retries * 10
                print(
                    f" Retry {results_retries}/{max_results_retries} ({elapsed_time}s): Results not ready yet, waiting..."
                )
                time.sleep(10)  # Wait 10 seconds before retrying

            else:
                print(
                    f"❌ Failed to fetch results: {results_response.status_code}"
                )
                print(f"Response: {results_response.text[:200]}")
                break  # Exit retry loop - error

        # Handle case where max retries reached
        if results_retries >= max_results_retries:
            print(
                f"\n⏰ Maximum retry time reached ({max_results_retries * 10} seconds)"
            )
            print("💡 This is normal for complex research queries!")
            print(f"📊 Check results later at: {API_URL}/results/{research_id}")
            print("📋 Monitor progress with: tail -f /tmp/ldr_server.log")
            print(
                "🔍 The research is still running - results will be available when complete"
            )
    else:
        print(
            "⚠️ No research ID available - research may not have started properly"
        )

    # Logout
    print("\n5. Logging out...")
    session.post(f"{API_URL}/auth/logout", headers=headers)
    print("✅ Logged out successfully")


if __name__ == "__main__":
    print("🎯 Simple LDR HTTP API Example - Works out of the box!")
    print("⚡ This script creates a user automatically and tests the API")
    print(
        "⏱️ Total runtime: Up to 3 minutes polling + 2 minutes results retry + research time"
    )
    print(
        "🔄 Automatically retries fetching results until available (up to 2 minutes)\n"
    )

    print("📋 REQUIREMENTS:")
    print(" • LDR server running")
    print(" • Beautiful Soup: pip install beautifulsoup4\n")

    print("🚀 START THE SERVER:")
    print(" • Option 1: python -m local_deep_research.web.app")
    print(" • Option 2: bash scripts/dev/restart_server.sh (recommended)")
    print(" • Note: restart_server.sh will kill existing server process\n")

    print("📊 MONITORING:")
    print(" • Server logs: tail -f /tmp/ldr_server.log")
    print(" • This script polls for up to 3 minutes")
    print(" • If research takes longer, script shows where to check results\n")

    print("⏰ TIMING INFO:")
    print(" • Script polls for 3 minutes to let research complete")
    print(" • Then retries fetching results for up to 2 additional minutes")
    print(" • Research typically completes in 2-10 minutes")
    print(" • Script displays results automatically when available")
    print(
        " • If timeout reached, results URL provided for checking completion\n"
    )

    main()