mirror of https://github.com/LearningCircuit/local-deep-research.git synced 2026-06-13 10:34:06 +03:00

LearningCircuit 7d8fdee7dd fix: unify SettingsManagers, fix env var bugs (#2070)

* fix: unify SettingsManagers, fix env var bugs, delete duplicate

Two parallel SettingsManager implementations existed (settings/manager.py
and web/services/settings_manager.py) that diverged accidentally, each
with different bugs. This unifies them into a single implementation.

Bug fixes in settings/manager.py:
- get_setting() now checks env vars when setting is not in DB (was
  jumping straight to return default, ignoring env override)
- get_all_settings() now type-converts env overrides through
  get_typed_setting_value() (was storing raw strings like "true"
  instead of True)
- create_or_update_setting() now correctly checks db_setting.editable
  (was checking input dict's .editable which caused AttributeError)
- Added missing ui_element types: textarea, multiselect
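
A minimal sketch of the env-var fallback described in the first two bullets. The helper names (`env_key_for`, `parse_bool`, `get_setting_with_env`) are hypothetical, and both the key-to-variable mapping (an `LDR_` prefix with dots replaced by underscores) and the lookup order are assumptions for illustration; the real manager converts values through get_typed_setting_value().

```python
import os


def env_key_for(key: str) -> str:
    # Assumed mapping for illustration: "app.debug" -> "LDR_APP_DEBUG"
    return "LDR_" + key.replace(".", "_").upper()


def parse_bool(raw: str) -> bool:
    # Convert a raw env string such as "true" into a real boolean
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def get_setting_with_env(key, db_values, default=None, to_type=str, environ=None):
    """Return the env override if present, else the DB value, else the default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(env_key_for(key))
    if raw is not None:
        # Type-convert the override instead of returning the raw string
        return to_type(raw)
    if key in db_values:
        return db_values[key]
    return default
```

With `environ={"LDR_APP_DEBUG": "true"}` and `to_type=parse_bool`, the lookup yields the boolean `True` rather than the string `"true"`, even when the key is absent from the DB values.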

Features added to settings/manager.py:
- get_bool_setting() method (required by rag_routes.py)
- default_settings now loads all 18 JSON files via rglob (was only
  loading 1 file with 370 settings, now loads 526)

All production and test imports updated from web.services.settings_manager
to settings.manager. Duplicate web/services/settings_manager.py deleted.

314 tests pass across 7 test files. 9 new tests cover bug fixes.

* test: add 29 tests for unified SettingsManager coverage gaps (#2071)

Cover create_or_update_setting (8 tests), default_settings property (4),
_ensure_settings_initialized (2), new UI element types textarea/multiselect/
range (4), _emit_settings_changed error resilience (3), plus edge cases
for get_setting check_env=False, get_all_settings with locked settings,
get_bool_setting with integers, parse_boolean edge cases, and env override
type conversion for text settings.

* fix: add missing abstract methods, env var defaults override, and type bug (#2074)

- Add get_bool_setting() and get_settings_snapshot() abstract methods to
  ISettingsManager base class so the interface contract is complete
- Fix create_or_update_setting: use setting_obj.type directly instead of
  SettingType[setting_obj.type.upper()] which fails when type is already
  a SettingType enum from the Pydantic model
- Add env var override in get_all_settings() defaults loop so settings
  not yet in DB can still be overridden via LDR_* environment variables
- Fix test_get_all_settings_db_error to expect defaults on DB failure
  (graceful degradation after unification)

* refactor: deduplicate provider availability checks and settings wrapper (#2054) (#2068)

- Delegate 5 provider availability functions in llm_config.py to their
  existing provider class is_available() methods (OpenAI, Anthropic,
  CustomOpenAIEndpoint, Ollama, LMStudio)
- Extract _get_or_create_status() helper in queue_service.py to
  eliminate duplicated QueueStatus lookup-or-create pattern
- Centralize get_llm_setting_from_snapshot() in thread_settings.py,
  replacing 6 identical copy-pasted wrappers across provider files
- Update test mock targets to reflect new delegation pattern

* fix: add missing abstract method implementations to InMemorySettingsManager

InMemorySettingsManager was missing get_bool_setting() and
get_settings_snapshot() implementations required by the ISettingsManager
ABC, causing TypeError on instantiation and cascading failures in
LLM unit tests, REST API tests, and Puppeteer auth tests.
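
The failure mode is standard Python ABC behaviour and can be reproduced in isolation. The class names below are stand-ins, not the project's classes:

```python
from abc import ABC, abstractmethod


class SettingsManagerInterface(ABC):
    """Stand-in for an interface with two abstract methods."""

    @abstractmethod
    def get_bool_setting(self, key, default=False): ...

    @abstractmethod
    def get_settings_snapshot(self): ...


class IncompleteManager(SettingsManagerInterface):
    # Implements only one of the two abstract methods
    def get_bool_setting(self, key, default=False):
        return default


class CompleteManager(IncompleteManager):
    # Supplying the missing method makes the class instantiable
    def get_settings_snapshot(self):
        return {}


error = None
try:
    IncompleteManager()
except TypeError as exc:
    # Raised at instantiation time, before any method is called
    error = str(exc)
```

Because the error is raised at construction, every test that builds the manager fails, which matches the cascading failures described above.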

* fix: convert web SettingType to database SettingType in create_or_update_setting

The PR changed `type=SettingType[setting_obj.type.upper()]` to
`type=setting_obj.type`, but setting_obj.type is a web model SettingType
(str, Enum) while Setting.type expects the database SettingType (enum.Enum).
This causes a 500 error when creating new settings via PUT endpoint.

Use `.name` for cleaner enum-to-enum conversion instead of `.upper()`.
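
A sketch of the name-based conversion between two separate enum classes. The class and member names are hypothetical; only the `.name` lookup is the point:

```python
import enum


class WebSettingType(str, enum.Enum):
    # Stand-in for the web model enum (str, Enum)
    APP = "app"
    LLM = "llm"


class DbSettingType(enum.Enum):
    # Stand-in for the database enum (plain enum.Enum)
    APP = "app"
    LLM = "llm"


def to_db_type(web_type: WebSettingType) -> DbSettingType:
    # Look up the database member that shares the web member's name
    return DbSettingType[web_type.name]
```

This relies on both enums using the same member names; a name missing from the target enum raises `KeyError`.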

* fix: add multiselect type conversion and warn on untyped env overrides (#2080)

Address review feedback from @djpetti on PR #2070:

1. Replace multiselect `lambda x: x` with `_parse_multiselect()` that
   properly handles env var strings — parses JSON arrays (e.g.
   '["markdown","latex"]') and comma-separated values (e.g.
   'markdown,latex') while passing through lists from SQLAlchemy
   unchanged.

2. Log a warning when get_setting() encounters an env var override for
   a setting not in defaults, returning the raw string without type
   conversion. This surfaces settings that should be added to a
   defaults JSON file to get proper type information.
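
The parsing rules in item 1 could look roughly like the sketch below. This illustrates the described behaviour and is not the project's `_parse_multiselect()`; the function name and the handling of empty strings are assumptions.

```python
import json


def parse_multiselect_sketch(value):
    """Accept a list, a JSON array string, or a comma-separated string."""
    if isinstance(value, list):
        # Lists (e.g. already-decoded DB values) pass through unchanged
        return value
    if isinstance(value, str):
        text = value.strip()
        if text.startswith("["):
            try:
                parsed = json.loads(text)
                if isinstance(parsed, list):
                    return parsed
            except json.JSONDecodeError:
                pass  # Fall back to comma splitting below
        # Comma-separated values; empty items are dropped (assumed behaviour)
        return [item.strip() for item in text.split(",") if item.strip()]
    return value
```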

Tests: 14 new tests (111 total in test_settings_manager.py, 0 failures)

* test: add tests for consolidated UI element-to-type mapping

Verifies single canonical _UI_ELEMENT_TO_SETTING_TYPE is reused by
both InMemorySettingsManager and SettingsManager.

2026-02-11 06:59:07 +01:00

API Quick Start

Overview

Local Deep Research provides both an HTTP REST API and a programmatic Python API. Since version 2.0, authentication is required for all API endpoints, and the system uses per-user encrypted databases.

Simplest Usage - Python Client

The easiest way to use the API is with the built-in client that handles all authentication complexity:

from local_deep_research.api import LDRClient, quick_query

# One-liner for quick research
summary = quick_query("username", "password", "What is DNA?")
print(summary)

# Or use the client for multiple operations
client = LDRClient()
client.login("username", "password")
result = client.quick_research("What is machine learning?")
print(result["summary"])

No need to worry about CSRF tokens, HTML parsing, or session management!

Authentication

Web UI Authentication

The API requires authentication through the web interface first:

1. Start the server:
   ```
   python -m local_deep_research.web.app
   ```
2. Open http://localhost:5000 in your browser.
3. Register a new account or log in.
4. Your session cookie will then be used for API authentication.

HTTP API Authentication

For HTTP API requests, you need to:

1. Authenticate through the login endpoint.
2. Include the session cookie in subsequent requests.
3. Include a CSRF token for state-changing operations.

Example authentication flow:

import requests
from bs4 import BeautifulSoup

# Create a session to persist cookies
session = requests.Session()

# 1. Get login page and extract CSRF token for login
login_page = session.get("http://localhost:5000/auth/login")
soup = BeautifulSoup(login_page.text, 'html.parser')
csrf_input = soup.find('input', {'name': 'csrf_token'})
login_csrf = csrf_input.get('value') if csrf_input else None

# 2. Login with form data (not JSON) including CSRF
login_response = session.post(
    "http://localhost:5000/auth/login",
    data={
        "username": "your_username",
        "password": "your_password",
        "csrf_token": login_csrf
    }
)

if login_response.status_code in [200, 302]:
    print("Login successful")
    # Session cookie is automatically stored
else:
    print(f"Login failed: {login_response.text}")

# 3. Get CSRF token for API requests
csrf_response = session.get("http://localhost:5000/auth/csrf-token")
csrf_token = csrf_response.json()["csrf_token"]

# 4. Make API requests with CSRF header
headers = {"X-CSRF-Token": csrf_token}
api_response = session.post(
    "http://localhost:5000/api/start_research",
    json={
        "query": "What is quantum computing?",
        "model": "gpt-3.5-turbo",
        "search_engines": ["searxng"],
    },
    headers=headers
)

Programmatic API Access

The programmatic API now requires a settings snapshot for proper context:

from local_deep_research.api import quick_summary
from local_deep_research.settings import SettingsManager
from local_deep_research.database.session_context import get_user_db_session

# Get user session and settings
with get_user_db_session(username="your_username", password="your_password") as session:
    settings_manager = SettingsManager(session)
    settings_snapshot = settings_manager.get_all_settings()

    # Use the API with settings snapshot
    result = quick_summary(
        query="What is machine learning?",
        settings_snapshot=settings_snapshot,
        iterations=2,
        questions_per_iteration=3
    )

    print(result["summary"])

API Endpoints

Research Endpoints

Research endpoints are under /api/:

POST /api/start_research - Start new research
GET /api/research/{id}/status - Check research status
GET /api/report/{id} - Get research results
POST /api/terminate/{id} - Stop running research
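
As a small illustration of how these paths fit together, the sketch below builds the per-research URLs and stops a running job. The helper names are hypothetical, and `session` is assumed to be an authenticated `requests.Session` (or any object with a compatible `post` method); the CSRF header follows the authentication example earlier in this guide.

```python
def research_urls(base_url: str, research_id) -> dict:
    """Build the per-research endpoint URLs listed above."""
    base = base_url.rstrip("/")
    return {
        "status": f"{base}/api/research/{research_id}/status",
        "report": f"{base}/api/report/{research_id}",
        "terminate": f"{base}/api/terminate/{research_id}",
    }


def terminate_research(session, base_url, research_id, csrf_token):
    # Terminating is a state-changing POST, so it carries the CSRF header
    url = research_urls(base_url, research_id)["terminate"]
    return session.post(url, headers={"X-CSRF-Token": csrf_token})
```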

Settings Endpoints

Settings endpoints are under /settings/api/:

GET /settings/api - Get all settings
GET /settings/api/{key} - Get specific setting
PUT /settings/api/{key} - Update setting
GET /settings/api/available-models - Get available LLM providers
GET /settings/api/available-search-engines - Get search engines
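
Reading and updating a single setting might be wrapped as follows. This is a sketch under assumptions: the helper names are hypothetical, `session` is an authenticated `requests.Session`, and the `{"value": ...}` request body is an assumed payload shape that should be checked against the server before relying on it.

```python
def get_setting(session, base_url, key):
    # Read-only request: the session cookie is enough
    return session.get(f"{base_url}/settings/api/{key}")


def update_setting(session, base_url, key, value, csrf_token):
    # PUT changes state, so it carries the CSRF header
    return session.put(
        f"{base_url}/settings/api/{key}",
        json={"value": value},  # assumed payload shape
        headers={"X-CSRF-Token": csrf_token},
    )
```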

History Endpoints

GET /history/api - Get research history
GET /history/api/{id} - Get specific research details

Important Changes from v1.x

Authentication Required: All API endpoints now require authentication
Settings Snapshot: Programmatic API calls need settings_snapshot parameter
Per-User Databases: Each user has their own encrypted database
CSRF Protection: State-changing requests require CSRF token
New Endpoint Structure: Research APIs are under /api/ (e.g., /api/start_research)

Example: Complete Research Flow

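import requests
import time
from bs4 import BeautifulSoup

# Set up a session; the login form expects form data (not JSON) plus its CSRF token
session = requests.Session()
login_page = session.get("http://localhost:5000/auth/login")
soup = BeautifulSoup(login_page.text, "html.parser")
csrf_input = soup.find("input", {"name": "csrf_token"})
login_csrf = csrf_input.get("value") if csrf_input else None
session.post(
    "http://localhost:5000/auth/login",
    data={"username": "user", "password": "pass", "csrf_token": login_csrf}
)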

# Get CSRF token
csrf = session.get("http://localhost:5000/auth/csrf-token").json()["csrf_token"]
headers = {"X-CSRF-Token": csrf}

# Start research
research = session.post(
    "http://localhost:5000/api/start_research",
    json={
        "query": "Latest advances in quantum computing",
        "model": "gpt-3.5-turbo",
        "search_engines": ["arxiv", "wikipedia"],
        "iterations": 3
    },
    headers=headers
).json()

research_id = research["research_id"]

# Poll for results
while True:
    status = session.get(
        f"http://localhost:5000/api/research/{research_id}/status"
    ).json()

    if status["status"] in ["completed", "failed"]:
        break

    print(f"Progress: {status.get('progress', 'unknown')}")
    time.sleep(5)

# Get final results
results = session.get(
    f"http://localhost:5000/api/report/{research_id}"
).json()

print(f"Summary: {results['summary']}")
print(f"Sources: {len(results['sources'])}")

Rate Limiting

The API includes adaptive rate limiting:

Default: 60 requests per minute per user
Automatic retry with exponential backoff
Rate limits are per-user, not per-IP
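
On the client side, a caller can apply the same idea when it receives a 429. The sketch below is one possible approach, not part of the library: `call_with_backoff` and its parameters are hypothetical, and `send` is any zero-argument callable returning an object with a `status_code` attribute (for example a lambda wrapping `session.get(...)`).

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry send() while it returns HTTP 429, doubling the wait each time."""
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Waits of base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * (2 ** attempt))
    # Still rate limited after the final attempt; hand back the last response
    return response
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.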

Error Handling

Common error responses:

401: Not authenticated - login required
403: CSRF token missing or invalid
404: Resource not found
429: Rate limit exceeded
500: Server error

Always check response status and handle errors appropriately.
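
One way to centralize that check is a small helper that turns the status codes above into exceptions. `ApiError`, `ERROR_HINTS`, and `check_response` are hypothetical names for this sketch; `response` is any object with a `status_code` attribute, such as a `requests.Response`.

```python
ERROR_HINTS = {
    401: "Not authenticated - login required",
    403: "CSRF token missing or invalid",
    404: "Resource not found",
    429: "Rate limit exceeded",
    500: "Server error",
}


class ApiError(Exception):
    """Raised when the API answers with an error status."""

    def __init__(self, status_code, message):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


def check_response(response):
    # Pass successful responses through; raise for 4xx/5xx
    if response.status_code < 400:
        return response
    hint = ERROR_HINTS.get(response.status_code, "Unexpected error")
    raise ApiError(response.status_code, hint)
```

Callers can then branch on `ApiError.status_code`, for example logging in again on 401 or fetching a fresh CSRF token on 403.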

Next Steps

See examples/api_usage for complete examples
Check docs/CUSTOM_LLM_INTEGRATION.md for custom LLM setup
Read docs/LANGCHAIN_RETRIEVER_INTEGRATION.md for custom retrievers
