mirror of
https://github.com/LearningCircuit/local-deep-research.git
synced 2026-06-15 19:46:56 +03:00
main
6880 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
63853b7923 |
test(themes): generate theme CSS from registry, not the gitignored file (#4599)
The theme tests' themes_css_content fixtures read static/css/themes.css directly, but that file is auto-generated and gitignored (PR #3400). On a fresh checkout / in CI it is absent, so the tests errored with FileNotFoundError (or failed --bg-primary extraction against a separately built copy, e.g. the sepia light-theme assertion that was red on every PR). Switch all themes_css_content fixtures to _generate_combined_css() (the theme registry — the single source of truth the app actually serves, already used by TestThemeDefinitions). Verified: full theme suite passes (155) with static/css/themes.css absent. |
||
|
|
6d94587f01 |
fix(settings): fix text_separators JSON decoding, add UI validation and fallback handling (#4231)
* fix(settings): fix text_separators JSON decoding, add UI validation and fallback handling * test(settings): cover #4230 corrupt-input recovery; drop unrelated whitelist entries - Add a test that feeds str(list) Python-repr corruption (the exact shape produced by the prior bug) and asserts ast.literal_eval recovers a list whose first element is a *real* newline character (ord == 10), not a literal backslash-n sequence. - Revert two unrelated KNOWN_MISSING_DEFAULTS additions (embeddings.ollama.num_ctx, web.enable_javascript_rendering): the integrity test passes without them, so they belong to a separate scope-limited change. * fix: Use Loguru brace formatting instead of printf-style * some fixes, cleanups * refactor: extract default local search text separators to global constants * refactor: use global default separators constants in test files * remove an initial fix, some linting and formatting * test: add helper unit tests for text separator parsing * fix(settings): restore null-guard on text_separators load else branch The PR's new text-separators load branch stringifies the value in the else path. Guard it with `settings.text_separators != null` so a null or undefined value never writes the literal string "null" into the textarea, leaving the field empty instead. * fix(settings): drop runtime text_separators healing; rely on migration #4298 Per djpetti's CHANGES_REQUESTED on #4231, the healing of corrupted local_search_text_separators DB rows should be a migration, not on-the-fly runtime recovery. Migration #4298 ("0010 heal local_search_text_separators Python-repr rows") already heals the existing corrupt data, so the runtime recovery code is now dead and is removed: - settings/manager.py: remove the heal-on-write block in get_typed_setting_value that self-upgraded a list/dict value stored with ui_element="text" to "json" (djpetti seconded removing this at manager.py:131). _infer_ui_element is still used elsewhere and is kept. - research_library/routes/rag_routes.py: in _get_text_separators, drop the ast.literal_eval recovery branch and the "keep raw value for UI validation" path. A value that is not valid JSON now falls back to DEFAULT_LOCAL_SEARCH_TEXT_SEPARATORS (graceful default, not healing). The local `import ast` is gone. configure_rag's write path is unchanged in behavior; only its now-misleading "type healing" comment is corrected. - research_library/services/rag_service_factory.py: in _get_default_text_separators, drop the same ast.literal_eval branch; a non-JSON string falls back to the defaults, and the final non-list guard still applies. KEPT: the accepted forward fix — storing/handling separators as proper JSON, the json.loads happy path, passing through actual list values, the DEFAULT_LOCAL_SEARCH_TEXT_SEPARATORS[_JSON] constants, and the JS UI validation/load handling. Tests: removed the two manager tests asserting the text->json self-upgrade; converted the ast-recovery tests (rag_routes, gaps_coverage, factory) to assert the new graceful default-fallback for corrupt/Python-repr values. --------- Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com> Co-authored-by: Daniel Petti <djpetti@gmail.com> |
||
|
|
e0bd258c1e |
test(ui-tests): real assertions for 10 settings-interaction tests (dead-field audit PR 1) (#4589)
* test(ui-tests): real assertions for 10 settings-interaction tests Rewrites the dead-field settings tests in test_settings_interactions_ci.js flagged by the dead-field audit. Each previously skipped permanently or weakly/always-passed because it probed stale/wrong selectors and never waited for the client-side render. The dashboard renders #settings-content asynchronously from GET /settings/api, so every test now waits for '#settings-content .ldr-settings-item' before asserting the real selectors (components/settings_form.html) and handlers (components/settings.js): - search filtering: filtered subset shrinks on no-match and restores on clear - tab navigation: clicking .ldr-settings-tab[data-tab=llm] gains .active (+ re-render) - specific tabs: assert the real data-tab set (llm/search/report/app/notifications) - text/number/checkbox/select: assert real .ldr-settings-* control selectors - toggle works: click a real .ldr-settings-checkbox and assert checked flips - auto-save: assert .ldr-save-success after changing a setting - descriptions: assert non-empty .ldr-input-help Source-verified against the templates + settings.js. Not yet executed in CI: the settings-core shard is label-gated (needs a test:ui-full-shards run). * test(ui-tests): harden settings tests per review (clear-input, tab re-render, autosave readability) Addresses the non-blocking review recommendations on #4589: - settingsSearchFiltering: clear the search via $eval value='' + dispatch 'input' (portable across headless targets) instead of triple-click+Backspace; comment the intentional waitForFunction .catch. - tabNavigationWorks: assert the 'all' tab is the active default up front, and require the rendered item set to SHRINK on switch to 'llm' (llm is a strict subset of all) — a deterministic 'content actually re-rendered' check that avoids the first-200-chars false-fail the reviewer flagged. - autoSaveFunctionality: flip to sawSuccess=false default for readability. Left the 'known-safe toggle' suggestion: boolean-setting saves don't hit validation, and a real label CI run is the right place to surface any flake. * fix(ui-tests): dropdownSelectSettings accepts custom dropdowns (settings-core shard fix) The settings-core shard failed on this test: it asserted select.ldr-settings-select, but the default config's dropdown settings (llm.provider / llm.model / search.tool) render as custom .ldr-custom-dropdown widgets (components/custom_dropdown.html) — so #settings-content has zero plain <select>. Assert the real dropdown controls (custom-dropdown OR plain <select>, with options required only for a plain select). Verified locally against a running server: 17/17 settings-interaction tests now pass (was 16/17); #settings-content has 2 custom-dropdowns and 0 plain selects. |
||
|
|
c65d59a65f |
fix(security): resolve open code-scanning alerts (pypdf DoS, exception exposure, migration SQL, Actions injection, devskim FP) (#4601)
* fix(deps): bump pypdf to ~=6.12 to patch DoS advisories pypdf 6.10.2 is vulnerable to two DoS issues (GHSA-248m-82v9-q6g6, GHSA-cj93-chg6-vgv8), both fixed in 6.12.0. pypdf is a direct, actively-used dependency that parses untrusted remote PDFs (arXiv and research-library downloaders), so the DoS path is reachable. Widen the pin to ~=6.12 and re-lock (reuse strategy) -> pypdf 6.13.2. Only pypdf moves in the lockfile. * fix(rag): stop leaking raw exception detail to clients CodeQL #7864 flagged information exposure via exception text in the test-embedding handler. That path is already sanitized+logged, but two sibling paths leaked raw str(e) to the browser: - SSE doc-error stream (yielded straight to the client) - background-index task error stored in DB and returned by the index-status endpoint Wrap both in sanitize_error_message(). Also drop the raw 'Detail:' from the internal-LDR error branch, which can carry paths/SQL the pattern-based sanitizer won't catch (full trace stays in server logs). * fix(migrations): add bearer suppression for static-table SQL in 0013 Bearer #7868-#7871 flag f-string SQL in 0013, but the only interpolated value is a table name bound to a hardcoded literal tuple ("benchmark_runs"/"benchmark_configs") - identifiers can't be bind params. Sibling migration 0008 carries the same SQL with a '# bearer:disable python_lang_sql_injection' directive; 0013 only had the noqa half. Add the matching directive. * fix(ci): use env-var indirection in ai-code-reviewer zizmor #7865 (template-injection): three GitHub-context expressions are inlined into run: blocks. Move github.event.action into the step env as EVENT_ACTION and reference the already-defined PR_NUMBER/REPO_FULL_NAME env vars instead of re-inlining ${{ ... }}, matching the BASE_REF pattern this workflow already uses. * test(chat): reword docstring to clear devskim FP #7874 DevSkim's certificate-validation rule matched the word 'encrypted' in a test docstring; the file has no TLS code at all. Reword to 'plaintext (no-SQLCipher)' to clear the false positive without polluting the docstring with an inline suppression. --------- Co-authored-by: LearningCircuit <185462206+LearningCircuit@users.noreply.github.com> |
||
|
|
a761417b6d |
refactor: remove redundant per-site <think>-stripping; rely on central wrapper (#4343)
Now that #4342 makes ProcessingLLMWrapper strip <think> tags and normalize shape for every get_llm LLM, the scattered per-site stripping is redundant and (per the project's 'fix bugs, don't add fallbacks' principle) hides bugs / hurts debugging. Removed ~25 redundant remove_think_tags(<fresh-invoke>.content) calls -> direct .content across: 8 search strategies (adaptive/recursive/smart decomposition, browsecomp_optimized, evidence_based v1+v2, iterative_reasoning, constrained_search), constraint_analyzer, evidence/evaluator, followup_context_manager, research_functions; and reverted #4336's get_llm_response_text routing in synthesize_findings (restored its original shape-guard) and standard_knowledge. Kept (load-bearing, NOT redundant): the citation BaseCitationHandler._invoke_text helper (handles injected/test LLMs), the empty-answer guards (real regression fix — now log a warning so the fallback is visible), the get_llm_response_text normalizer at injected/ varied-input sites (filters, search engines, news), and the agent/bind_tools paths that bypass the wrapper. Added a 'single authorized strip point' note to ProcessingLLMWrapper._normalize_response and remove_think_tags so devs don't re-scatter stripping. Verified the wrap-invariant (all get_llm paths wrap; only mcp/langgraph bind_tools bypass) before removing. mypy 552 clean; ruff clean; 3818 passed across 145 LLM-layer test files. |
||
|
|
f19aacadc2 |
fix(library): batch convert_all_research to bound memory on large history (#4560) (#4585)
* fix(library): batch convert_all_research to avoid loading all report bodies at once (#4560) * test(library): assert convert_all_research loads report rows in bounded batches (#4560) |
||
|
|
bc6a11800f |
fix(models): use_alter on ResearchResource.document_id to break circular FK (#3776)
Documents and ResearchResource have bidirectional FKs:
- Document.resource_id → research_resources.id
- ResearchResource.document_id → documents.id
SQLAlchemy can't topologically sort the two tables and emits a
SAWarning on every Base.metadata.sorted_tables iteration:
SAWarning: Cannot correctly sort tables; there are unresolvable
cycles between tables "documents, research_resources"
This has been firing on every cold start of the migration runner
since v1.4.0 (2026-03-25). Cosmetic but noisy.
Fix: declare ResearchResource.document_id with use_alter=True and a
deterministic constraint name. SQLAlchemy then emits this one FK as a
post-CREATE ALTER TABLE rather than inline, which breaks the
dependency cycle for ordering purposes while still creating the
constraint at the database level.
Migration 0005 already chose not to enforce this FK at the DB level on
existing DBs (SQLite batch-alter limitation in 2026-04), so this only
changes how create_all() emits the FK for fresh installs going forward.
No migration is needed for existing DBs.
Adds a regression test that asserts create_all() no longer emits the
circular-FK SAWarning.
Bumps to 1.6.10 (stacks behind #3772 which uses 1.6.9).
Co-authored-by: LearningCircuit <185462206+LearningCircuit@users.noreply.github.com>
|
||
|
|
3e6fa7ba25 |
security: add rel=noopener noreferrer to external target=_blank links (#4142)
Closes #4129. Prevents tabnabbing (window.opener access) and Referer leakage to Discord, Reddit, GitHub, DOI, arXiv, PubMed, PMC, and document source URLs. Internal same-origin /library/document/{id}/pdf|txt links left as-is. |
||
|
|
d850343306 |
fix(metrics): drive rate-limiting analytics from RateLimitEstimate (dead panel) (#4576)
* fix(metrics): drive rate-limiting analytics from RateLimitEstimate
The /metrics rate-limiting panel read everything from the RateLimitAttempt
table, but production code stopped writing that table in commit
|
||
|
|
e7eb3bd574 |
feat(dev): add run_test_instance.sh launcher for a disposable LDR dev server (#4382)
* feat(dev): add run_test_instance.sh launcher for a disposable LDR dev server
Sets LDR_DATA_DIR to a temp directory before exec'ing restart_server.sh,
so the dev server's encrypted user DBs, auth DB, library, logs, and
research_outputs all land outside ~/.local/share/local-deep-research and
can be wiped in one shot.
Usage:
scripts/dev/run_test_instance.sh # start (existing test data kept)
scripts/dev/run_test_instance.sh --reset # wipe the test data dir first
LDR_DATA_DIR=$HOME/ldr-test scripts/dev/run_test_instance.sh # override path
Defaults LDR_DATA_DIR=/tmp/ldr-test (often tmpfs → wiped on reboot for
free). --reset refuses to rm empty/'/'/$HOME so a misconfigured env var
can't turn the flag into deleting a real location. Verified locally:
safety guard rejects $HOME (exit 1), unknown arg prints usage (exit 2),
--reset on a sandbox path wipes only that path.
Solves the problem of accumulated test-instance DBs (~19 GB observed)
without any code change — config/paths.py already honors LDR_DATA_DIR.
CI is unaffected: tests already isolate via per-test tmp_path fixtures
that monkeypatch.setenv LDR_DATA_DIR.
* harden run_test_instance.sh: canonicalized reset guard + --debug flag
- Reset guard now canonicalizes LDR_DATA_DIR with realpath -m before the
unsafe-path check, so a symlinked or relative path can't slip a real
location past it. Also refuses /tmp and the real data dir
(~/.local/share/local-deep-research), and tolerates an unset HOME under
set -u via ${HOME:-}.
- Add --debug flag that execs restart_server_debug.sh instead of
restart_server.sh (debug logging is a natural fit for a disposable
instance with throwaway data); combinable with --reset.
- Replace sed line-range usage extraction with a self-contained heredoc
so editing the header comment can't silently break --help.
|
||
|
|
035c1314f9 |
fix(settings): remove duplicate LLAMACPP entry from model-provider dropdown (#4594)
api_get_available_models built its provider list from get_discovered_provider_options() and THEN appended a hardcoded LLAMACPP entry 'as a complex local provider not yet migrated'. But LlamaCppProvider is auto-discovered (provider_name='llama.cpp'), so it was already in the list — the append produced a duplicate LLAMACPP entry in the dropdown. Remove the hardcoded append and rely on auto-discovery. Add a regression test asserting LLAMACPP appears exactly once (and no provider value is duplicated) in the route's provider_options. |
||
|
|
c397063322 |
refactor(auth): centralize research DB-password guard in password_utils (#4577)
* refactor(auth): centralize research DB-password guard in password_utils The direct (/start_research), follow-up, and chat research entry points each duplicated the same encryption-aware guard: fetch get_user_password, and if an encrypted DB has no password, log + return a session-expired 401. Three near-identical copies that could drift. Add resolve_user_password(username) -> (password, session_expired) to web/auth/password_utils.py (next to get_user_password). It makes the encryption-aware decision and logs it once, centrally. Each route keeps its own error-response body (their frontends expect different shapes: status/message vs success/error) but now shares the decision + logging. Removes the now-unused db_manager imports from research_routes and chat.routes. Behavior is unchanged (same 401s, same bodies). Tests that mocked the relocated symbols are updated to patch the helper or its source; adds unit tests for resolve_user_password's three outcomes. * docs(auth): widen password_utils module docstring to cover resolve_user_password The module now hosts both get_user_password and the resolve_user_password guard; the docstring said only 'retrieving user passwords'. (Review nit from the multi-agent review of #4577.) |
||
|
|
e5fad1d5e7 |
docs: mark get_available_providers cache note as removed in #4590 (#4595)
The resource-cleanup audit record referenced `@cache` on get_available_providers at config/llm_config.py:158, but that function (a dead duplicate of the provider auto-discovery path) was removed in #4590. Annotate the audit bullet so the stale file:line reference no longer misleads, while preserving the historical 'not a leak' conclusion. |
||
|
|
cc56890127 |
refactor(llm): remove dead provider-availability surface from llm_config (#4590)
The eight is_<provider>_available() shims and get_available_providers() in config/llm_config.py duplicated the class-based provider system: each shim was a pure pass-through to ProviderClass.is_available(), and get_available_providers() hardcoded the provider list a second time with display strings. Both were production-dead — the live provider dropdown uses the auto-discovery path (get_discovered_provider_options), and the only non-test reference to get_available_providers was a .cache_clear() in settings_service that never read the result. This removes that layer so the class system + registry are the single source of truth: - delete the 8 is_*_available shims + get_available_providers (+ its @cache, a latent unhashable-dict footgun) from llm_config.py - drop the no-op LLM provider cache-clear block from settings_service.invalidate_settings_caches() - repoint behavioral availability tests to ProviderClass.is_available() - delete the shim/get_available_providers tests; trim the inert test doubles in conftest/mock_modules (kept get_llm + VALID_PROVIDERS) - add guard tests: import-time auto-discovery registration (fresh interpreter) + the live get_discovered_provider_options path get_llm(), the egress policy gate, wrap_llm_without_think_tags(), and the noqa discover_providers import (the sole import-time discovery trigger) are untouched. Implements DRY-review finding L27. |
||
|
|
3a71c3dcbc |
fix(db): don't load report_content/chunk_text into memory in list queries (#4560) (#4574)
* fix(db): don't load report_content/chunk_text into memory in list queries (#4560) * docs(changelog): add fragment for #4574 * test(history): update get_history mocks to projected flat Rows + add #4560 projection guard * fix(db): project columns in /api/history get_history too (#4560) * test(db): add #4560 projection guards for /api/history and list_reports * test(history): make /api/history projection guard robust to query reordering |
||
|
|
e2d97d1e81 |
test(settings): regression-test bulk warning recalc for egress keys (#4463) (#4582)
#4463 added policy.egress_scope, llm.require_local_endpoint, and embeddings.require_local to WARNING_AFFECTING_KEYS but shipped no test that the BULK save_all_settings path actually recalculates warnings for them — the exact copy-paste-drift gap that caused the original bug (the keys were only in api_update_setting's list). Add a parametrized test over the three keys, modeled on the existing test_warning_affecting_keys. Mutation-checked: removing a key from WARNING_AFFECTING_KEYS fails the matching parametrization. |
||
|
|
2698702167 |
refactor(benchmark): extract YAML export helpers into a unit-tested module (#4584)
yamlEscape / formatSettingValue / formatSettingsSnapshot lived inline in benchmark_results.html, so the escaping that guards every benchmark YAML download was untestable. Move them to static/js/utils/yaml_export.js (loaded via <script src>, browser globals + module.exports shim — same dual-use pattern as url-validator.js et al.) and add tests/js/utils/yaml_export.test.js (16 cases). No behavior change; the only diff to the functions is eslint cleanup (unnecessary regex-class escapes, unused catch binding) that the inline <script> never got scanned for. Follow-up to the #4583 review. |
||
|
|
27c0e14387 |
fix(benchmark): quote + escape free-form string fields in YAML export (#4583)
model, model_provider, search_engine, dataset and the evaluator's model/provider/endpoint_url were interpolated raw into the downloaded YAML. A value with a YAML-special char (colon-space, #, quote, newline) could produce malformed YAML or inject an unintended key. Route them through the existing yamlEscape() helper (double-quoted), matching how the settings snapshot already serializes strings. Numeric fields and the controlled date_tested/ldr_version are left unquoted (quoting a date would change its YAML type). (AI review follow-up to #4579) |
||
|
|
bcc0108192 |
fix(news): make subscription status the single source of truth for active/due (#4492)
* fix(news): make subscription status the single source of truth for active/due Consolidates the three duplicated subscription 'run' paths (H4) and the triplicated active/overdue query (H5) from the DRY review. - NewsSubscription.active_filter / due_filter: one authoritative predicate keyed on the status column. Fixes a live bug where a subscription created as paused kept is_active=True (the column default create_subscription never overwrites) and was still run by the scheduler. The check-overdue route's copy also omitted the next_refresh-is-not-None guard. - run_subscription_now now reads the subscription's saved model/provider/ strategy/search_engine from the ORM row instead of the trimmed get_subscriptions() dict (which dropped them), and advances the refresh schedule on success so an overdue subscription is not immediately re-run. - Shared request-payload + refresh-schedule arithmetic extracted to news/subscription_runner.py, used by both HTTP run paths and the scheduler. Tests: new tests/news/test_subscription_runner.py pins the predicate bug and the helper behavior; existing scheduler/flask tests updated to the status-based query and run-now ORM flow. * fix(news): repair broken subscription-completion refresh update + include config in get_subscriptions Follow-ups from review (#4450) found while consolidating the run paths: - research_service's post-completion subscription refresh update called SQLSubscriptionStorage.get(), which references columns the model does not have (user_id/refresh_count/results_count) and so always raised AttributeError -- silently swallowed by the surrounding try/except. The block (duplicated in the quick and detailed paths) never updated anything. Replaced with a direct ORM load + advance_refresh_schedule(); dropped the also-broken increment_stats() call (the model has no such columns; total_runs is computed by counting research history). - get_subscriptions() dropped model_provider/model/search_strategy/ search_engine/custom_endpoint/status from its dict, so the subscriptions UI and any caller silently saw None for a subscription's saved config. Added them. Tests updated to the ORM-based completion path and asserting the new get_subscriptions fields. * docs(changelog): note completion-refresh and get_subscriptions fixes * refactor(news): address review recommendations on #4492 - Extract advance_refresh_schedule_by_id() and use it in both research_service post-completion blocks, which had become near-identical duplicates -- the exact drift class this PR exists to prevent. - Strengthen the active-subscriptions query test: compare the captured SQLAlchemy predicate against NewsSubscription.active_filter() via .compare() instead of a bare assert_called(). - Move the timezone import in subscription_runner.py to module top. - Reword the is_active inline comment so it no longer contradicts the legacy-mirror block comment above it. - Tests for the new by-id helper (found/missing/int-id). * chore(news): remove dead, broken subscription_manager storage subsystem (#4500) The news.subscription_manager storage subsystem (SQLSubscriptionStorage, SearchSubscription, TopicSubscription, BaseSubscription + factories), the abstract SubscriptionStorage interface, and StorageManager's get_user_subscriptions / get_user_stats were entirely unreached by live code: all real subscription functionality runs through news.api + the scheduler. The code was also broken against the current NewsSubscription model -- it referenced phantom columns (user_id/refresh_count/results_count) and filtered status against enum objects, so it raised AttributeError whenever invoked. Its only live caller (research_service's post-completion refresh update) was fixed in the preceding commit to use the ORM directly. Removes ~16k lines (incl. 18 mock-based test files that gave false coverage of the broken code), drops the unused SearchSubscription/TopicSubscription package exports, and surgically trims the storage-interface/StorageManager tests that also exercised the removed pieces. Verified dead from 6 angles (no instantiation, no callers, no dynamic/string usage, only /health uses StorageManager and only via get_user_feed). Full news suite green; repo-wide collection clean. |
||
|
|
841c9c33e9 |
fix(benchmark): gate evaluator endpoint_url behind the settings opt-in (#4579)
The default (summary) YAML download is meant to be safe to share — #3844 moved the full settings snapshot behind an Export opt-in for exactly that reason — but it still emitted the evaluator's endpoint_url unconditionally, which can be a private/internal host. Gate it behind the same includeSettings opt-in so the default summary stays shareable and reproducibility-minded users still get it on demand. |
||
|
|
41c02cd19d |
chore(ui-tests): drop dead-field history CRUD tests (unseedable in CI) (#4580)
historyItemDelete and clearAllHistoryButton in test_crud_operations_ci.js were permanent SKIPs: they scan /history for a delete / clear-all affordance and skip on an empty DB. Unlike the collection/subscription/document tests (#4174 / #4151 / #4187), they cannot be made real — a history item only exists by running research (start_research creates it IN_PROGRESS, which the delete endpoint refuses to delete), and the UI-test shards run with no LLM/mocks, so no research completes there (test_research_workflow_ci.js skips its 'completed research' assertions for the same reason). The history page is already covered in CI by test_history_page_ci.js, which also skips these item-dependent bits. Remove both tests, their run() registrations, the now-empty HistoryCrudTests object, and its export — mirroring the earlier collectionEditButton removal of a permanent SKIP for a feature untestable in this context. The real delete / clear-all features are unchanged in the app. |
||
|
|
5d9b12ed87 |
fix(settings): redact remaining GET endpoints + extend empty-password guard to /save_settings (#3988)
* feat(benchmark): record LDR version + redacted settings snapshot at start
The YAML download stamps `ldr_version` and `date_tested` from the
*current* app at download time (`new Date()` and `meta[name=app-version]`
in `benchmark_results.html`). A v1.6.5 run downloaded on v1.6.7 shows
v1.6.7. We hit this debugging the SimpleQA 95.7%/1m54s reference run —
couldn't tell which LDR version produced it. The YAML also exposes only
~3 settings (temperature, context_window, max_tokens), losing the ~30-50
settings that actually affect benchmark behaviour, so cross-run
comparisons are impossible.
Record the active LDR version and a full settings snapshot on the
`BenchmarkRun` row at start time. Migration 0010 adds two nullable
columns; existing rows survive with NULL and the YAML path falls back
to "unknown (pre-0010 run)". `start_benchmark` already captures the
settings snapshot for the background thread (it just discards it after);
we redact secrets via a new `DataSanitizer.redact_settings_snapshot`
helper and persist alongside `__version__`. Snapshot capture is wrapped
defensively because `get_all_settings()` raises uncaught `LookupError`
on stale enum-mismatch rows (we hit that on the user's old DB earlier).
The plain `DataSanitizer.redact()` does not handle the nested-with-
metadata snapshot shape `{key: {value, ui_element, ...}}` — the outer
compound key isn't in the sensitive-name set and the inner `value` key
isn't either, so secrets leak. The new helper checks
`ui_element == "password"` plus key-suffix defense-in-depth and
preserves metadata so YAML diffs still show "this key existed."
`/api/history` gains the small fields (`ldr_version`, `start_time`);
`/export` gains a `metadata` block with the full settings snapshot, so
the per-page list response stays small (~80 bytes/run added) while the
download click pays the ~184 KB snapshot cost. The YAML builder now
always calls `/export`, uses `metadata.started_at` for `date_tested`
(preserving the field name that the public HuggingFace leaderboard
parser at ldr-benchmarks/scripts/build_leaderboards.py reads), and
appends a `settings:` block via a new type-aware `formatSettingValue`.
* docs: add news fragment for #3844
User-visible change: YAML download now reflects the version and
settings active when the benchmark ran, not at download time.
* fix(search): handle Wikipedia 429s and empty summary-fetch results
Two production bugs surfaced during the benchmark verification of
PR #3793 + this snapshot work, plus the stale doc that should ship
alongside.
Wikipedia 429 cascade
=====================
The `wikipedia` PyPI library calls `Response.json()` unconditionally,
so when MediaWiki rate-limits us (HTTP 429 with an HTML body), every
following `wikipedia.summary()` in the same batch raises
`JSONDecodeError`. The previous handler logged a full traceback per
title — we observed 14+ stack traces for a single rate-limited query
during yesterday's SimpleQA benchmark on Roman emperors / Japanese
manga.
Catch `(json.JSONDecodeError, requests.exceptions.JSONDecodeError)`
specifically, log a single warning, and break out of the per-title
loop (every remaining title in the batch will hit the same throttle).
Same treatment for the outer `wikipedia.search()` call.
Summary-fetch empty-content / empty-summary guards
==================================================
With `summary_focus_query` now the default fetch mode (PR #3793),
`_make_summary_fetch_tool` runs against every fetched URL — including
paywalls, JS-only SPAs, and 200-but-blank pages. Two failure paths:
1. Empty page content: the LLM was being invoked to summarise an
empty string (waste of a round-trip), and the URL was being
registered in the citation collector with an empty snippet —
`_register_in_collector` caches by URL, so the agent could never
re-fetch the URL under a different focus.
2. Empty LLM summary: same cache poisoning, this time after the
model decided nothing on the page matched the focus.
Both paths now short-circuit to `NOT RELEVANT` BEFORE registration,
leaving the URL re-fetchable. Empty content also skips the LLM call
entirely.
Stale docs/CONFIGURATION.md
===========================
The auto-generated config table still listed `search.fetch.mode`
default as `full` (the pre-#3793 value). Regenerated via
`python scripts/generate_config_docs.py`.
* docs: add news fragment for the bundled #3844 bugfixes
* fix(settings): catch LookupError in get_all_settings + redact /settings/api response
Two related settings hardenings discovered during PR #3793 / #3844 verification.
LookupError fallback in SettingsManager.get_all_settings
========================================================
The DB-query handler at manager.py:780 caught only SQLAlchemyError. When
a row's `type` column holds an enum value no longer in `SettingType`
(e.g. legacy 'CHAT'-typed rows from a removed feature), SQLAlchemy
hydrates the row and raises `LookupError: 'CHAT' is not among the
defined enum values` — which propagated and crashed every caller of
`get_all_settings`: the `/settings/api` endpoint, benchmark start,
research start, MCP entry points. We hit this on a real user's old DB.
Add `LookupError` to the except clause so the snapshot falls back to
defaults-only on a stale row instead of crashing. PR #3844 already
defensively wrapped the benchmark-start path; this fixes the underlying
problem so the workaround there becomes belt-and-suspenders.
Redact /settings/api response
=============================
`api_get_all_settings` returned the full nested-with-metadata snapshot
to the frontend, including env-overridden API keys (`llm.openai.api_key`,
`search.engine.web.serper.api_key`, etc.) — anyone who could
authenticate could grab plaintext keys via a single GET. Now wraps the
response in `DataSanitizer.redact_settings_snapshot` (introduced in
PR #3844) so password-typed values come back as `[REDACTED]`.
Verified safe to ship: no JS caller round-trips the bulk endpoint's
values into a form (the settings form is server-rendered from the
template directly), so redacting the JSON does not break the UI. The
template still pre-fills password inputs with raw values — that's a
separate template-level UX concern (filed as out-of-scope follow-up).
Depends on PR #3844 for the `redact_settings_snapshot` helper.
* docs: news fragments for PR #3947
* fix(settings): close password-input HTML leak + protect against [REDACTED] write-back
Three coordinated changes that close the remaining password-leak surface
left after PR #3947 (which redacted the JSON API but didn't touch the
render layer or the write paths).
Template (settings_form.html)
=============================
The Jinja2 password branch rendered `value="{{ setting.value }}"`, so
anyone with View Source on the settings page could read plaintext API
keys. Now renders the input with `value=""`, a placeholder that
telegraphs configuration state ("(saved — type to change)" vs
"(not configured)"), and `autocomplete="new-password"` to prevent
browser password-manager autofill of stale entries.
JS render (settings.js)
=======================
Two coordinated changes:
1. The `renderSetting` default branch wrote `value="${setting.value}"`
into password inputs. After PR #3947 redacts the /settings/api
response, that value is the literal string "[REDACTED]" — a
subsequent save without typing would persist that string as the
API key. Now carves out a `password` branch matching the Jinja2
template (empty value, placeholder, autocomplete).
2. `originalSettings` was seeded from the same redacted response.
Now seeded with empty string for password fields so the auto-save
`hasChanged` guard compares against "" rather than "[REDACTED]" —
the first real keystroke flips correctly, no spurious dirty state.
Backend safety net (settings_routes.py)
=======================================
`save_all_settings` and `api_update_setting` now treat an empty value
on a password-typed setting as a no-op (idempotent 200 with message,
not a write). Defense-in-depth so a buggy frontend, a stale browser
tab, or a direct cURL submit cannot wipe a stored API key with empty
string. Same predicate (`ui_element == "password"`) used consistently
across all three layers.
Out of scope: an explicit "Clear" button. To unset a password setting,
clear the source env var or use settings import. If the UX demand
surfaces, the right design is a separate `DELETE /settings/api/<key>`
endpoint, not a sentinel-string write.
Stacked on PR #3947.
* docs: news fragments for PR #3954
* fix(settings): redact remaining GET endpoints + extend empty-password guard to /save_settings
Three small extensions to the password-defense work in PR #3947 / PR #3954
that close the remaining authenticated leak paths and write-corruption
paths in the settings module.
api_get_db_setting (singular GET /settings/api/<key>): redact value when
ui_element=="password", in both DB-row and defaults branches. Same
predicate as DataSanitizer.redact_settings_snapshot. Metadata stays
intact so the front-end can render the right input control.
get_bulk_settings (GET /settings/api/bulk): caller-supplied keys[] is
unrestricted, so a caller can request keys[]=llm.openai.api_key. Now
redacts via the suffix-only check against DEFAULT_SENSITIVE_KEYS (the
{value, exists} response shape has no ui_element field at this layer).
save_settings (POST /save_settings): JS-disabled form-encoded fallback.
PR #3954 added the empty-password no-op guard to save_all_settings and
api_update_setting; this extends the same guard so the no-JS path can't
wipe a stored secret with an empty form-POST submit.
Stacked on PR #3954.
* docs: news fragment for remaining settings password leaks
* chore(migrations): renumber 0010 → 0013 (inherits from #3844 stack)
Same renumber as #3844/#3954. Rename:
0010_benchmark_run_version_and_snapshot.py → 0013_…
Update revision/down_revision and test refs.
* chore(changelog): rename fragment to avoid 'password' in filename
`changelog.d/+settings-remaining-password-leaks.security.md` tripped the
file-whitelist-check (its regex flags any filename containing 'password'
/ 'secret' / 'token' / .key / .pem etc.). The towncrier fragment is just
a release-note text file — no secrets — but the heuristic doesn't know
that. Rename to drop 'password' from the filename; 'leaks' + the
`.security.md` suffix still convey the topic, and the file body is
unchanged.
* fix(db): renumber benchmark migration 0013→0014 (resolve collision with 0013_remove_meta_search_engines)
Same renumber as the benchmark / password-leak PRs: this branch carries
the benchmark migration at 0013, colliding with main's
0013_remove_meta_search_engines. Renumber to 0014 (down_revision 0013),
rename + re-anchor the stale 0010-named migration test to 0014, complete
the alembic expected-files list.
* docs(db): finish 0013→0014 renumber — update migration log + pre-migration comments
Leftover '0010' references in the renamed benchmark migration (log line,
docstring) and benchmark_routes.py pre-migration-row comments now read
'0014'. Comment/log text only — no behavior change.
* fix(settings): block [REDACTED] write-back + redact passwords in save response
Same two gaps as the password-template-leak PR, plus this branch's extra
save_settings (no-JS form POST) guard site:
1. All three empty-password no-op guards (save_all_settings, save_settings,
api_update_setting) checked value == "" only, but GET /settings/api
redacts passwords to the literal "[REDACTED]" sentinel — the comments
say both must be idempotent. A stale-tab / automation round-trip
overwrote the real key with the literal sentinel. Guards now treat the
sentinel as a no-op too.
2. The POST /save_all_settings response echoed raw setting.value, shipping
plaintext API keys back to the browser. Now redacted via the same helper.
Adds DataSanitizer.REDACTION_TEXT as the single source of truth for the
sentinel. Tests: [REDACTED] no-op guard + save-response redaction.
* fix(settings): late-bind redaction sentinel, redact created password echo, test no-JS guard
Review of the diff-to-main surfaced three more items:
1. _redact_password_value's redaction_text default was the hardcoded
literal "[REDACTED]", which would drift from DataSanitizer.REDACTION_TEXT
(the single source of truth this PR adds) if it ever changed — the exact
failure mode the constant exists to prevent. Now late-binds to the
constant when not overridden.
2. api_update_setting's 201 create response echoed db_setting.value in
plaintext for a newly-created password setting. Now redacted via
_redact_password_value.
3. The save_settings (no-JS) guard had no sentinel test; added empty +
[REDACTED] no-op tests.
Also fixes remaining stale 'migration 0010' refs to 0014.
* fix(settings): unify GET redaction + write guards on one sensitivity predicate
Adversarial review found the read/write asymmetry (redactor masks on
ui_element==password OR sensitive suffix; write guards only checked
ui_element) plus a bulk-GET suffix-only gap. Route everything through the
shared DataSanitizer.is_sensitive_setting(key, ui_element):
- the 3 write-back guards (save_all_settings, save_settings,
api_update_setting) now skip /sentinel writes for ANY secret, not
just ui_element==password — so a redacted GET round-trip can't write
the sentinel over a sensitive-suffix secret;
- _redact_password_value takes the key and uses the predicate (single-key
GET + create-echo now redact sensitive-suffix values too);
- get_bulk_settings' local suffix helper delegates to the shared predicate;
- redact_settings_snapshot uses it internally.
No current setting triggers the asymmetry (all secrets are
ui_element=password AND sensitive-suffix); this is forward-proofing in
the single-source style the PR already uses for REDACTION_TEXT. Tests:
predicate units, redactor/predicate agreement, save_all_settings no-op
incl. the sensitive-suffix-non-password case.
* refactor(security): move single-value redaction into DataSanitizer.redact_value
The GET endpoints' redaction logic lived as a private _redact_password_value
helper (and a thin _is_sensitive_key wrapper) in the web routes module —
security primitives in the presentation layer. Promote it to
DataSanitizer.redact_value, the single-value counterpart of
redact_settings_snapshot, and route the singular GET, the bulk GET, AND
redact_settings_snapshot itself through it so all read paths mask identically
(one predicate, one empty-value rule, one sentinel). Adds direct unit tests
for the primitive.
---------
Co-authored-by: LearningCircuit <185462206+LearningCircuit@users.noreply.github.com>
|
||
|
|
b37bacd606 |
refactor(security): centralize error scrubbing in BaseSearchEngine._scrub_error helper (#4572)
* refactor(security): centralize error scrubbing in _scrub_error helper
Every search-engine catch site previously hand-copied the dual-scrub
idiom redact_secrets(sanitize_error_message(str(e)), <secret>) — ~60
copies across 16 files. That duplication caused real drift: the
Elasticsearch password omission and the base-class log-vs-DB
inconsistency both came from copies falling out of sync.
Add BaseSearchEngine._scrub_error(error) plus a declarative
_secret_attrs class attribute (default ('api_key',); overridden by
Elasticsearch ('_api_key','_password'), Paperless ('api_token'),
Brave ('_brave_api_key')). The helper pulls every declared secret, so
a site can no longer drop one. Convert all call sites in base.py and
15 engines to self._scrub_error(...).
Side effect: 6 older engines (serper, serpapi, mojeek, exa, tavily,
scaleserp) that logged via redact_secrets(str(e), ...) alone now get
the regex sanitize pass too.
Adds TestScrubError locking the contract (incl. the multi-secret case
that would have caught the ES password drift).
* refactor(security): type-hint _scrub_error + test empty _secret_attrs
Address AI review nits: annotate error param as Union[BaseException, str];
add a contract test that an empty _secret_attrs still runs the regex pass
without error (missing attrs -> None, which redact_secrets skips).
* fix(security): address 24-agent review findings on _scrub_error
- Harden _scrub_error (the new single chokepoint, runs in except blocks):
guard str(error) so an exception whose __str__ raises can't crash the
handler; coerce truthy non-string _secret_attrs values to str so a
misconfigured secret can't trip redact_secrets' len() check.
- Tighten _secret_attrs annotation to tuple[str, ...].
- SerpAPI: store self.api_key so the literal-redaction pass has a safety
net (it was the only engine with no literal net); drop stale comment.
- Add credential-leak regression tests: exa + serper (header keys, opaque
sentinel that only the literal pass can scrub) and pubmed (params/URL).
- Extend TestScrubError: non-string secret + raising-__str__ don't crash.
- Soften changelog wording.
|
||
|
|
d98513da60 |
ci(forensics): request arrival/duration logging for #4431 navigation-stall hunts (#4536)
* ci(forensics): log request arrival + duration in CI/TESTING (#4431) The UI shards' 60s navigation timeouts leave a silent window in the server logs, but the app only logs explicit events — so a silent window can't distinguish "the request never reached the server" (listen backlog / docker-proxy / browser socket pool starved by engine.io polls) from "the request reached Flask and hung" (lock, DB pool, GIL). Add an outermost WSGI middleware that logs every request's arrival and WSGI-call duration (slow completions as warnings), enabled only when CI or TESTING is set. The next failing shard run's server-log artifact will pin down which side of that fork #4431 lives on. Refs #4431 * fix(chat): connect Socket.IO lazily on /chat/ to stop dev-server freeze (#4431) (#4544) * fix(chat): connect Socket.IO lazily on /chat/ to stop dev-server freeze (#4431) Root cause (proven by the request-timing forensics in this stack): the chat page eagerly opens a Socket.IO connection on every /chat/ load. The UI tests — and real users — navigate to/from /chat/ constantly, producing a churn of engine.io connect/disconnect cycles. On the werkzeug dev server (Flask-SocketIO threading mode, no eventlet/gevent) that churn under CPU pressure freezes the entire WSGI request pipeline for ~60s: during the window the instrumented server logs ZERO request arrivals, which is the flaky "Navigation timeout 60000ms" the UI shards hit. (Confirmed locally: 2-core-pinned server + /chat/ churn reproduces 31–63s arrival gaps and the same engine.io write()-before-start_response errors seen in CI.) Note: transports:['websocket'] — the originally-suspected mitigation — does NOT help; measured 84 vs 86 engine.io write-errors under identical churn, because the errors are websocket-driven, not polling-driven. The fix has to remove the churn itself, not switch transport. Chat doesn't need the socket until a research actually streams: chat.js calls subscribeToResearch (which lazily initializes the socket via socket.js's existing `if (!socket) initializeSocket()` path) on send and when resuming an in-progress research, and sets up an HTTP polling backup (pollForCompletion) regardless. So defer the connect on /chat/ instead of opening it on page load. Other realtime pages (/research, /progress, /benchmark) keep eager connect — they aren't navigated in the same churny way and aren't the source of the flake. Scope: targets the chat-core/chat-lifecycle shard freezes, which are the ones with instrumented proof. The class is ultimately resolved by the FastAPI/uvicorn migration (#3299). Validated by code analysis + CI (the env's sqlcipher3 segfaults under concurrent local churn made a clean local runtime A/B impossible; the freeze itself only reproduces faithfully on CI's 2-core + docker-proxy). Refs #4431 * test(socket): cover lazy /chat/ connect gating (#4431) Vitest unit tests for the auto-connect gating: io() is NOT called on page load for /chat/ (lazy), IS called for /progress//research (eager, unchanged), and subscribeToResearch lazily initializes the socket on a chat page. Deterministic (jsdom + fake timers), no server needed — the runtime freeze only reproduces on CI's 2-core topology. --------- Co-authored-by: LearningCircuit <185462206+LearningCircuit@users.noreply.github.com> * ci(forensics): dump thread stacks during the freeze (#4431) The arrival log proves the pipeline freezes (zero arrivals for ~60s) but not WHAT it's stuck on. socket.io was ruled out (lazy-connect removed it, the freeze persisted), and the local repro needs artificial PARALLEL=10 concurrency that doesn't match CI's single-test execution — so only a dump from CI itself can identify the real cause. Arm faulthandler.dump_traceback_later as a dead-man's switch, re-armed on every request arrival, so during a freeze it dumps ALL thread stacks to stderr. It runs on a dedicated C timer thread, so it fires even under GIL starvation. The next failing shard's server-log artifact will show whether the werkzeug accept loop, a lock, a SQLCipher/DB call, or the background scheduler is holding the pipeline. CI/TESTING-gated. * test(forensics): FakeLogger.debug for freeze-dump arm-failure path (#4431) * ci(forensics): don't arm freeze thread-dump under pytest (#4431) create_app() runs thousands of times under pytest with CI=true, so arming the repeating faulthandler dump in each spewed stack traces across the whole pytest run. Gate _arm_freeze_dump() on "pytest" not in sys.modules so only the real long-running UI-shard server arms it. (The pytest job's flakiness is a pre-existing SQLCipher-xdist worker crash, unrelated, but this removes the noise + any doubt.) * fix(logging): non-blocking stderr sink to stop request-pipeline freeze (#4431) ROOT CAUSE (forensics-backed). The werkzeug threading dev server logs synchronously: loguru's emit() holds the handler's _protected_lock while writing to the sink. The stderr sink had no enqueue, so every log call blocks on stderr I/O under that lock. When stderr back-pressures (a slow / full `docker logs` pipe in CI) the lock-holder stalls mid-write and ALL logging threads — i.e. every request thread, since every request logs — pile up behind the lock, freezing the whole request pipeline for ~60s. That is the flaky UI-shard "Navigation timeout 60000ms": the instrumented server records ZERO request arrivals across the window, and a faulthandler thread-dump captured 3 of 5 server threads parked in loguru's _protected_lock under load. Socket.IO was a red herring (the freeze reproduces with zero socket.io activity). FIX. Add enqueue=True to the stderr sink: loguru hands records to an in-memory queue and a single background thread does the write, so a log call never blocks on stderr while holding the lock. The database/progress sinks are left synchronous — they capture per-request context (username, password, research_id) in the emitting thread and can't move to loguru's worker thread. LOCAL VALIDATION (2-core-pinned server + heavy /chat/ churn): before: 31-63s request-arrival gaps, server segfaults, 3/5 threads stuck in the loguru lock, watchdog nav 31s after: 12s worst gap (one instance), no segfault, 0 threads in the loguru lock, watchdog nav 3.0s, churn completes 185 logging-related unit tests still pass. Refs #4431. * fix(forensics): sanitize CR/LF in logged request paths; test cleanup (#4431) Address AI-review: a crafted PATH_INFO/QUERY_STRING with newlines could inject fake [req] log lines (the forensics output is grep'd). Strip CR/LF before logging. Adds a test for it; drops an unused binding in the chat lazy-connect vitest. * fix(benchmarks): defer matplotlib import to stop request-pipeline freeze (#4431) ROOT CAUSE (captured by the freeze thread-dump): the 60s UI-shard navigation freeze is a slow matplotlib import on the server's import path. benchmarks/__init__.py eagerly pulls optimization + comparison submodules, which imported matplotlib at module level (optuna_optimizer: `from optuna.visualization import ...`; comparison evaluator: `import matplotlib.pyplot`). matplotlib's import is heavy and, under the 2-core CI runner's GIL/CPU starvation, stretches to ~60s while holding the import lock — freezing the whole werkzeug request pipeline (zero request arrivals across the window = the "Navigation timeout 60000ms"). The faulthandler dead-man's switch caught the main thread mid-import: optuna/visualization/matplotlib/_contour.py -> matplotlib/__init__.py. FIX: import matplotlib + optuna.visualization lazily, only inside the benchmark visualization methods that plot (never on a request path). `import local_deep_research.benchmarks` is now matplotlib-free (verified). Module-level None placeholders keep the names patchable; a guarded loader fills them on first real visualization without clobbering test mocks. Verified: benchmarks/benchmark_bp import no longer loads matplotlib; full benchmarks suite (1413 tests) passes. Refs #4431. * chore(#4431): address AI review — precise /chat/ match + changelog - autoInitSocket: match '/chat' and '/chat/<id>' precisely instead of a loose .includes('/chat/') that would also catch paths like /chat-archive/. - Add changelog.d/4536.bugfix.md documenting the #4431 fix and the enqueue=True async-stderr behavior change (possible log loss on crash, ordering differences). --------- Co-authored-by: LearningCircuit <185462206+LearningCircuit@users.noreply.github.com> |
||
|
|
26cea7dac9 |
chore: deprecate (but keep) dormant UploadBatch model (#4178)
* chore: deprecate UploadBatch model (step 1 of 2)
A multi-round dead-code audit confirmed UploadBatch is dormant: no
code path creates rows, no path sets Document.upload_batch_id, and
no caller reads either. The docstring has noted this since 2026-05
without follow-through.
Removing the model is split into two PRs to keep deployed encrypted
user DBs consistent:
step 1 (this PR): mark the class deprecated. Constructing an
UploadBatch emits DeprecationWarning. Schema is
unchanged so any DB created from this version is
bit-identical to the previous version.
step 2 (follow-up): delete the class, drop Document.upload_batch_id,
add an Alembic migration to drop the
upload_batches table.
Production code never constructs UploadBatch, so the warning only
shows up in the dedicated TestUploadBatchModel tests — those are
silenced with a class-level pytest.mark.filterwarnings.
* docs: add 'do not delete' note to UploadBatch with full rationale
Expand the deprecation docstring to record why the full removal
(class + table) is blocked, so future contributors don't re-attempt
the same investigation:
- 0001_initial_schema.py uses Base.metadata.create_all(), making the
ORM the source of truth for "schema at every revision"
- two enforcement tests require the migrated DB to match
Base.metadata exactly
- two round-trip tests require downgrade to restore the pre-upgrade
schema
- those constraints together mean the ORM class can't be removed
without first refactoring 0001 to an explicit table list
Captures the audit's cost/benefit conclusion (~3 KB per DB vs real
migration risk) so the next person doesn't redo the investigation
from scratch.
* test: narrow UploadBatch filterwarnings to avoid masking unrelated DeprecationWarnings
|
||
|
|
3bf056c0ee |
fix(ui-tests): real assertions for document upload + delete + bulk-select (#4187)
* fix(ui-tests): real assertions for document upload + delete + bulk-select Extends the per-test seeding pattern (#4174, #4180) to the three document CRUD tests, all of which were previously permanent SKIPs: - \`documentUploadFormExists\` — was checking \`/library\` for upload UI. That route is the research-library listing (filter + results), not an upload form, and has never had upload UI. Direct upload lives at \`/library/collections/<id>/upload\` (rag_routes.py). Now seeds a throwaway collection, navigates to the real upload page, and asserts the form has a multiple-file input plus a submit button. Cleans up. - \`documentDeleteConfirmation\` — list-page rows call \`DeleteManager.deleteDocument\`, which drives the Bootstrap \`#deleteConfirmModal\` (components/delete_confirmation_modal.html). The old query (\`.modal, .confirm-dialog, [role="alertdialog"]\`) also looked for a modal, but only on an empty DB where no rows exist. Now seeds a collection, uploads a tiny .txt fixture via the real multipart endpoint (\`POST /library/api/collections/<id>/upload\`), navigates to \`/library/?collection=<id>\` so only the fixture row is in scope, clicks .ldr-btn-delete-doc, and waits for Bootstrap to apply the \`.show\` class to #deleteConfirmModal. Asserts the modal becomes visible and reports its title. - \`bulkDeleteSelection\` — Same setup (seed collection + upload one fixture document). The library listing in this app has no per-row checkboxes and no bulk-action affordance, so the test still SKIPs, but with an accurate diagnostic ("feature not implemented for this view") instead of the old misleading "No bulk selection checkboxes found", which masked the difference between "no docs" and "no bulk UI." Inline fixture helpers (\`seedCollection\`, \`deleteCollection\`, \`uploadFixtureDocument\`) are added to this file. Kept local rather than promoted to \`test_lib/\` until a second consumer outside the document tests appears — the seeding pattern is still proving out across #4174 / #4180 / this PR; extracting too early would lock in defaults that may shift. Test plan (CI=true, server with LDR_DISABLE_RATE_LIMITING=true): - \`node test_crud_operations_ci.js\` — Documents group now reports: "Upload form ok (file input multiple=true, ... submit="🚀 Upload Files")" "#deleteConfirmModal opens on document delete click (title="Delete \\"ldr-ui-test.txt\\"?")" "Library listing has no per-row checkboxes and no bulk-action button (feature not implemented for this view)" - Pass count goes 8 -> 10; two previously-permanent SKIPs are now real PASSes against the actual UI contract. - Cleanup verified for both seeded collections via finally-block DELETE. Stacked on #4180. Auto-retargets to main when the ancestor chain (#4145 -> #4151 -> #4174 -> #4180) merges. * refactor(ui-tests): share collection seed/cleanup helpers + harden cleanup guard After #4174 merged, collection seeding lived in two places: inline in collectionDeleteConfirmation and in the seedCollection helper this PR adds for the document tests. Point collectionDeleteConfirmation at the shared seedCollection/deleteCollection helpers (seedCollection now returns {id, name} so the test keeps its detail-page load-wait), removing the duplication. Also harden deleteCollection: wrap the page.evaluate in a Node-side try/catch so a torn-down page during teardown can't throw and mask a test result — carrying forward the cleanup guard from #4174, which the document helper was missing. Subscription tests are left inline (their seed/cleanup is a distinct resource, not duplicated by an existing helper); unifying those can be a follow-up. * test(ui-tests): make document tests self-contained for CSRF seeding The three document tests called seedCollection(page) as their first line, reading the CSRF meta from whatever page the previous test left behind. In the normal run order that's an app page, but it's a fragile cross-test dependency (breaks under test isolation, reordering, or a prior failure that leaves the page elsewhere). Navigate to /library/collections first, matching what collectionDeleteConfirmation already does, so seedCollection and uploadFixtureDocument always have the CSRF token available. |
||
|
|
7c358b4676 |
fix(chat): guard chat research against missing DB password (#4457) (#4571)
* fix(chat): guard chat research against missing DB password (#4457) Chat-triggered research (send_message -> trigger_research) fetched the user's DB password and passed it to the background worker without checking it, unlike the /start_research and follow-up routes which return 401 when an encrypted DB has no password. When the password is gone but the session cookie is still valid (session-password TTL expiry, or a server/container restart), a chat research would run to completion while every background metric write (token + search) was silently dropped — leaving the metrics dashboard empty even though the research succeeded. This is one of the systemic 'all metrics empty' paths behind #4457 (distinct from the no-usage-data cause fixed in #4473). Add the same encryption-aware guard the other entry points use, before any rows are committed or a worker is spawned. Unencrypted installs are unaffected (guard is gated on db_manager.has_encryption). * docs(changelog): add fragment for chat research password guard (#4571) |
||
|
|
ca0976fec1 |
fix(ui-tests): drop dead-field subscription tests, retarget at #subscription-strategy (#4151)
After PR #4145 pointed the subscription tests at the real /news/subscriptions/new page, two CRUD tests still SKIPped because their target fields don't exist on that form: - \`subscriptionFrequencyOptions\` looked for a "frequency" dropdown. The form has no scheduling input at all (only iteration / question counts). Removed — there is nothing analogous to point it at. - \`subscriptionTypeOptions\` looked for a generic "type" / "category" dropdown. The closest real equivalent is the research-strategy <select id="subscription-strategy">, which has 5+ named options. Renamed to \`subscriptionStrategyOptions\` and re-targeted there. Net effect: - Subscription Strategy Options: previously SKIPped; now reports "Strategy options (5): Source-Based..., Focused Iteration - Quick..., MCP ReAct..., LangGraph Agent..." (a real PASS against a real contract). - Subscription Frequency Options: gone (was always SKIPping forever). Stacked on #4145 because it edits the same file. |
||
|
|
4c6667102a |
chore: add Session type hints to remaining db_session parameters (#4575)
Follow-up to #4487, which typed db_session in a first batch of files but left ~25 parameters across 8 files untyped. This completes the sweep so every real db_session parameter carries a sqlalchemy.orm.Session hint. - Required parameters: db_session: Session - Decorator/optional-injection parameters: db_session: Optional[Session] = None (e.g. the @with_user_session()-decorated Flask route handlers in settings_routes.py, where db_session is injected as a kwarg) Docstring usage examples (session_context.py, route_decorators.py) are intentionally left untouched. No behavior change; type annotations only. ruff, mypy, and pre-commit all pass. |
||
|
|
77c9c10e96 |
fix(rag): dedup collection indexing across SSE route and background worker (#4502)
* fix(rag): dedup collection indexing across SSE route and background worker The SSE route (index_collection) and the background worker (_background_index_worker) both index a collection and had drifted: - the background worker never stored embedding_dimension, so collections indexed via the background path lost that metadata; - the SSE route skipped the force-reindex cleanup (delete old chunks + FAISS indices) the worker performs, so a streamed force-reindex could leave stale, mixed-model vectors. Extract the embedding-metadata storage, the force-reindex cleanup, and the document-to-index query into three shared helpers used by both paths, so both now persist embedding_dimension and both clean up on force-reindex. The two callers keep their own loops since their progress mechanisms differ (SSE stream with heartbeats vs TaskMetadata polling). Adds unit tests for the helpers (incl. the embedding_dimension regression). * refactor(rag): address review recommendations on #4502 - Drop the tautological does-not-commit test for the metadata helper (it takes no session, so transaction ownership is enforced by the signature); note this at the class instead. - Document why the CascadeHelper import stays function-local (hoisting would import the whole deletion package at route-module import time for a force-reindex-only path). |
||
|
|
2b96e1d536 |
refactor: remove dead SEARCH_SNIPPETS_ONLY checks (DRY PR 2/23) (#4459)
* refactor: remove dead SEARCH_SNIPPETS_ONLY checks from 9 search engines (#4450) search_config.SEARCH_SNIPPETS_ONLY was never defined anywhere in the codebase, so all hasattr() checks always returned False. The base class run() already gates _get_full_content with self.search_snippets_only. Remove 13 dead conditional blocks across 9 engines: - Pattern A: Early-return guards in _get_full_content() (8 engines) - Pattern B: Inverted if-wrappers around live code (arxiv, github, wayback x3) - Pattern C: Variable overrides (pubmed, wikipedia) Also remove 7 now-unused `from ...config import search_config` imports and delete/update 42 tests that were testing dead code paths. * test(pubmed): drive snippets-only test via get_full_text, not removed SEARCH_SNIPPETS_ONLY This PR removed the dead module-level SEARCH_SNIPPETS_ONLY switch from the pubmed engine, but test_snippets_only_mode_uses_abstract still patched `{pubmed_module}.search_config` — a name no longer imported there, so the patch raised AttributeError. Rewrite the test to exercise the real snippets-only path (get_full_text=False → no PMC lookup, content is the abstract). * test(library): drop stale SEARCH_SNIPPETS_ONLY scaffolding The library engine's dead SEARCH_SNIPPETS_ONLY check was removed in this PR, but test_search_engine_library_coverage.py still patched the now-gone module constant. The test passed only by coincidence (the fixture item had no metadata.document_id, so _get_full_content was a no-op), and its name/docstring implied a snippet-only path the engine no longer has. Replace it with a test of the real remaining behavior (items without a document_id are skipped and returned unchanged) and remove the unused _patch_search_config helper. |
||
|
|
ef3fd3bfde |
fix(settings): close password-input HTML leak + protect against [REDACTED] write-back (#3954)
* feat(benchmark): record LDR version + redacted settings snapshot at start
The YAML download stamps `ldr_version` and `date_tested` from the
*current* app at download time (`new Date()` and `meta[name=app-version]`
in `benchmark_results.html`). A v1.6.5 run downloaded on v1.6.7 shows
v1.6.7. We hit this debugging the SimpleQA 95.7%/1m54s reference run —
couldn't tell which LDR version produced it. The YAML also exposes only
~3 settings (temperature, context_window, max_tokens), losing the ~30-50
settings that actually affect benchmark behaviour, so cross-run
comparisons are impossible.
Record the active LDR version and a full settings snapshot on the
`BenchmarkRun` row at start time. Migration 0010 adds two nullable
columns; existing rows survive with NULL and the YAML path falls back
to "unknown (pre-0010 run)". `start_benchmark` already captures the
settings snapshot for the background thread (it just discards it after);
we redact secrets via a new `DataSanitizer.redact_settings_snapshot`
helper and persist alongside `__version__`. Snapshot capture is wrapped
defensively because `get_all_settings()` raises uncaught `LookupError`
on stale enum-mismatch rows (we hit that on the user's old DB earlier).
The plain `DataSanitizer.redact()` does not handle the nested-with-
metadata snapshot shape `{key: {value, ui_element, ...}}` — the outer
compound key isn't in the sensitive-name set and the inner `value` key
isn't either, so secrets leak. The new helper checks
`ui_element == "password"` plus key-suffix defense-in-depth and
preserves metadata so YAML diffs still show "this key existed."
`/api/history` gains the small fields (`ldr_version`, `start_time`);
`/export` gains a `metadata` block with the full settings snapshot, so
the per-page list response stays small (~80 bytes/run added) while the
download click pays the ~184 KB snapshot cost. The YAML builder now
always calls `/export`, uses `metadata.started_at` for `date_tested`
(preserving the field name that the public HuggingFace leaderboard
parser at ldr-benchmarks/scripts/build_leaderboards.py reads), and
appends a `settings:` block via a new type-aware `formatSettingValue`.
* docs: add news fragment for #3844
User-visible change: YAML download now reflects the version and
settings active when the benchmark ran, not at download time.
* fix(search): handle Wikipedia 429s and empty summary-fetch results
Two production bugs surfaced during the benchmark verification of
PR #3793 + this snapshot work, plus the stale doc that should ship
alongside.
Wikipedia 429 cascade
=====================
The `wikipedia` PyPI library calls `Response.json()` unconditionally,
so when MediaWiki rate-limits us (HTTP 429 with an HTML body), every
following `wikipedia.summary()` in the same batch raises
`JSONDecodeError`. The previous handler logged a full traceback per
title — we observed 14+ stack traces for a single rate-limited query
during yesterday's SimpleQA benchmark on Roman emperors / Japanese
manga.
Catch `(json.JSONDecodeError, requests.exceptions.JSONDecodeError)`
specifically, log a single warning, and break out of the per-title
loop (every remaining title in the batch will hit the same throttle).
Same treatment for the outer `wikipedia.search()` call.
Summary-fetch empty-content / empty-summary guards
==================================================
With `summary_focus_query` now the default fetch mode (PR #3793),
`_make_summary_fetch_tool` runs against every fetched URL — including
paywalls, JS-only SPAs, and 200-but-blank pages. Two failure paths:
1. Empty page content: the LLM was being invoked to summarise an
empty string (waste of a round-trip), and the URL was being
registered in the citation collector with an empty snippet —
`_register_in_collector` caches by URL, so the agent could never
re-fetch the URL under a different focus.
2. Empty LLM summary: same cache poisoning, this time after the
model decided nothing on the page matched the focus.
Both paths now short-circuit to `NOT RELEVANT` BEFORE registration,
leaving the URL re-fetchable. Empty content also skips the LLM call
entirely.
Stale docs/CONFIGURATION.md
===========================
The auto-generated config table still listed `search.fetch.mode`
default as `full` (the pre-#3793 value). Regenerated via
`python scripts/generate_config_docs.py`.
* docs: add news fragment for the bundled #3844 bugfixes
* fix(settings): catch LookupError in get_all_settings + redact /settings/api response
Two related settings hardenings discovered during PR #3793 / #3844 verification.
LookupError fallback in SettingsManager.get_all_settings
========================================================
The DB-query handler at manager.py:780 caught only SQLAlchemyError. When
a row's `type` column holds an enum value no longer in `SettingType`
(e.g. legacy 'CHAT'-typed rows from a removed feature), SQLAlchemy
hydrates the row and raises `LookupError: 'CHAT' is not among the
defined enum values` — which propagated and crashed every caller of
`get_all_settings`: the `/settings/api` endpoint, benchmark start,
research start, MCP entry points. We hit this on a real user's old DB.
Add `LookupError` to the except clause so the snapshot falls back to
defaults-only on a stale row instead of crashing. PR #3844 already
defensively wrapped the benchmark-start path; this fixes the underlying
problem so the workaround there becomes belt-and-suspenders.
Redact /settings/api response
=============================
`api_get_all_settings` returned the full nested-with-metadata snapshot
to the frontend, including env-overridden API keys (`llm.openai.api_key`,
`search.engine.web.serper.api_key`, etc.) — anyone who could
authenticate could grab plaintext keys via a single GET. Now wraps the
response in `DataSanitizer.redact_settings_snapshot` (introduced in
PR #3844) so password-typed values come back as `[REDACTED]`.
Verified safe to ship: no JS caller round-trips the bulk endpoint's
values into a form (the settings form is server-rendered from the
template directly), so redacting the JSON does not break the UI. The
template still pre-fills password inputs with raw values — that's a
separate template-level UX concern (filed as out-of-scope follow-up).
Depends on PR #3844 for the `redact_settings_snapshot` helper.
* docs: news fragments for PR #3947
* fix(settings): close password-input HTML leak + protect against [REDACTED] write-back
Three coordinated changes that close the remaining password-leak surface
left after PR #3947 (which redacted the JSON API but didn't touch the
render layer or the write paths).
Template (settings_form.html)
=============================
The Jinja2 password branch rendered `value="{{ setting.value }}"`, so
anyone with View Source on the settings page could read plaintext API
keys. Now renders the input with `value=""`, a placeholder that
telegraphs configuration state ("(saved — type to change)" vs
"(not configured)"), and `autocomplete="new-password"` to prevent
browser password-manager autofill of stale entries.
JS render (settings.js)
=======================
Two coordinated changes:
1. The `renderSetting` default branch wrote `value="${setting.value}"`
into password inputs. After PR #3947 redacts the /settings/api
response, that value is the literal string "[REDACTED]" — a
subsequent save without typing would persist that string as the
API key. Now carves out a `password` branch matching the Jinja2
template (empty value, placeholder, autocomplete).
2. `originalSettings` was seeded from the same redacted response.
Now seeded with empty string for password fields so the auto-save
`hasChanged` guard compares against "" rather than "[REDACTED]" —
the first real keystroke flips correctly, no spurious dirty state.
Backend safety net (settings_routes.py)
=======================================
`save_all_settings` and `api_update_setting` now treat an empty value
on a password-typed setting as a no-op (idempotent 200 with message,
not a write). Defense-in-depth so a buggy frontend, a stale browser
tab, or a direct cURL submit cannot wipe a stored API key with empty
string. Same predicate (`ui_element == "password"`) used consistently
across all three layers.
Out of scope: an explicit "Clear" button. To unset a password setting,
clear the source env var or use settings import. If the UX demand
surfaces, the right design is a separate `DELETE /settings/api/<key>`
endpoint, not a sentinel-string write.
Stacked on PR #3947.
* docs: news fragments for PR #3954
* chore(migrations): renumber 0010 → 0013 (inherits from #3844 stack)
Same renumber as #3844 (this PR was stacked on it). Rename:
0010_benchmark_run_version_and_snapshot.py → 0013_…
Update revision/down_revision and test refs accordingly.
* test: mark benchmark export metadata test as intentional no-SUT-import
Same fix as #3844 (this PR inherits the file from the stack).
* fix(db): renumber benchmark migration 0013→0014 (resolve collision with 0013_remove_meta_search_engines)
Same renumber as the benchmark PR: this branch carries the benchmark
migration at 0013, colliding with main's 0013_remove_meta_search_engines.
Renumber to 0014 (down_revision 0013), rename + re-anchor the stale
0010-named migration test to 0014, complete the alembic expected-files
list.
* docs(db): finish 0013→0014 renumber — update migration log + pre-migration comments
Leftover '0010' references in the renamed benchmark migration (log line,
docstring) and benchmark_routes.py pre-migration-row comments now read
'0014'. Comment/log text only — no behavior change.
* fix(settings): block [REDACTED] write-back + redact passwords in save response
Two gaps found reviewing the password-leak fix:
1. The empty-password no-op guard (save_all_settings + api_update_setting)
only checked value == "", but GET /settings/api redacts passwords to
the literal "[REDACTED]" sentinel — its own comment says both must be
idempotent. A stale-tab / automation / cURL round-trip therefore
overwrote the real API key with the literal "[REDACTED]". Guard now
treats the sentinel as a no-op too.
2. The POST /save_all_settings response echoed the full settings dict
built from raw setting.value, shipping plaintext API keys back to the
browser — the exact exposure the GET redaction prevents. The echoed
dict is now redacted via the same helper.
Adds DataSanitizer.REDACTION_TEXT as the single source of truth for the
sentinel so the guards can't drift from what the redactor emits. Tests:
[REDACTED] no-op guard + save-response redaction (no plaintext leaks).
* style: ruff format the save-response redaction test
* fix(settings): guard save_settings no-JS path + redact created password echo
Review of the diff-to-main surfaced two more gaps on top of the earlier
[REDACTED]/save-response fixes:
1. save_settings (the no-JS form-POST fallback) had NO password write-back
guard — submitting an empty (or sentinel) password via a plain form
POST would clear the stored API key. Added the same guard the JSON
paths use, with two tests (empty + sentinel no-op).
2. api_update_setting's 201 create response echoed db_setting.value in
plaintext for a newly-created password setting. Redacted it to the
sentinel, matching every other settings response.
Also fixes the remaining stale 'migration 0010' refs (model comment +
benchmark test docstrings/name) to 0014.
* fix(settings): share one sensitivity predicate between GET redactor and write guards
Adversarial review found a read/write asymmetry: redact_settings_snapshot
masks a value when ui_element=='password' OR the key has a sensitive
suffix, but the write-back no-op guards only checked ui_element. A secret
registered with a non-password ui_element but a '.api_key'-style suffix
would be redacted on GET, then a save round-trip would write the
'[REDACTED]' sentinel straight over the real secret. No current setting
triggers it (all secrets use ui_element=password AND a sensitive suffix),
so this is forward-proofing — but it's the same single-source principle
the PR already uses for REDACTION_TEXT, applied to the predicate.
Adds DataSanitizer.is_sensitive_setting(key, ui_element); routes the
redactor and all three write guards (+ the create-echo) through it, so
read and write protection can never diverge. Tests: predicate unit
tests, redactor/predicate agreement, and save_all_settings no-op for
empty/sentinel incl. the sensitive-suffix-non-password case.
* chore(gitleaks): allowlist is_secret variable FP in data_sanitizer (#3954)
The generic-secret rule flags `is_secret = DataSanitizer.is_sensitive_setting(...)`
in redact_settings_snapshot, reading 'secret = <token>' as a credential. It is a
boolean variable assignment, not a secret. Add both the commit-less fingerprint
(survives squash-merge to keep gitleaks-main green) and the exact reported
fingerprint (suppresses the PR's historical-commit scan).
* test(settings): mark password-render test no-sut-import (renders template, not a module)
* fix(settings): align api_update_setting guard with the other two (isinstance check)
api_update_setting's empty-secret no-op guard omitted the isinstance(value, str)
check that save_all_settings and save_settings both have. No behavior change
(value in ("", REDACTION_TEXT) already returns False for non-strings) — this is
consistency across the three parallel guards so the difference doesn't read as
intentional. (AI review)
---------
Co-authored-by: LearningCircuit <185462206+LearningCircuit@users.noreply.github.com>
|
||
|
|
346ed9c0ea |
chore: add Session type hints to db_session parameters across remaining files (#4487)
Annotate db_session parameters in: - web/queue/processor_v2.py: _commit_with_safe_rollback, _reclaim_stranded_queue_rows - web/services/research_sources_service.py: _find_existing_paper - web/auth/decorators.py: get_current_db_session (return type) - news/web.py: load_user_settings |
||
|
|
817816603a |
fix(search): strip URLs before the _is_valid_search_result gate (searxng + mojeek) (#4573)
* fix(searxng): strip the title-href fallback URL too The SSRF housekeeping PR (#3931) added .strip() to the primary href branch in _get_search_results but missed the title-href fallback. Unlike the other (cosmetic) strip sites, this one is load-bearing: the extracted url is gated by _is_valid_search_result()'s http(s):// prefix check *before* it reaches the SSRF validator's internal strip, so a fallback href with leading whitespace would silently drop an otherwise-valid result. * fix(mojeek): strip URL before the _is_valid_search_result gate Mojeek has the same load-bearing pattern as the searxng fallback: it reads result.get("url") unstripped and passes it straight into _is_valid_search_result(), whose http(s):// prefix check runs before the SSRF validator's internal strip. A url with leading whitespace would be silently dropped. Mojeek is a JSON API (lower probability than HTML scrapes) but the bug class is identical. Use the (x or "").strip() form so an explicit None doesn't crash .strip(); stripping here also cleans the url that propagates into the preview id/link. * refactor(search): extract URL-strip into BaseSearchEngine._clean_result_url The two engines fixed in this PR (searxng title-href fallback, mojeek) and the cosmetic strip sites from #3931 (searxng, tavily previews) all repeated the same '(x or "").strip()' / 'str(href).strip()' idiom with a multi-line comment explaining why the strip matters. Replace the six inline sites with a single shared static helper, _clean_result_url(value), on BaseSearchEngine — next to the existing URL helper _extract_display_link. The load-bearing rationale now lives once in the helper's docstring instead of being copy-pasted at each call site. Behavior is identical: None/falsy -> "", any other value -> str(value).strip(). Add TestCleanResultUrl unit coverage for the helper. * test(search): add end-to-end regression tests for the strip-before-gate fix The helper unit tests cover _clean_result_url in isolation, but nothing asserted the actual bug fix end-to-end: that a whitespace-padded URL now survives _is_valid_search_result() instead of being silently dropped. Add two regression tests that feed a padded URL through the real extraction path (would return 0 results before the fix): - mojeek: padded url in the JSON API response - searxng: padded title-href fallback (BeautifulSoup preserves the whitespace, so the prefix gate is genuinely exercised) Addresses the AI reviewer's non-blocking note on PR #4573. |
||
|
|
956afb7caa |
fix(ui): render friendly engine name in agent-thinking tool_call panel (#4570)
The agent-thinking panel's tool_call renderer in progress.js built its
display text only from data.tool (the stable id, e.g. "web_search"),
producing "Using web_search" for the LangGraph strategy and defeating
the purpose of this PR in that surface.
Prefer the human-readable data.message the strategy already builds
(LangGraph: '🔍 Searching DuckDuckGo: "..."'; MCP: 'ACTION: Using
DuckDuckGo - "..."'), which embeds both the friendly engine name and
the query. When data.message is present, render it verbatim and skip
the args/query append to avoid duplicating the query (MCP also supplies
data.arguments). Fall back to 'Using ${tool}' (+ args) only when no
message was supplied, so the panel never renders blank.
Expose updateAgentThinking on window.progressComponent (already used for
testing) and add vitest coverage for both strategies and the fallback.
|
||
|
|
571141e472 |
fix(llm): address audit findings from PR #3670 (#3767)
* fix(llm): address audit findings on PR #3670 PR #3670 was merged before these audit findings could land in it. This follow-up addresses one true blocker, three PR-introduced regressions, one pre-existing inconsistency, and one test-quality issue surfaced by mutation analysis. Blocker - Add `llamacpp` to the `llm.provider` UI dropdown options. Without it users could not select the very provider this PR migrated them to via the UI — only via manual config edit. Updated the OpenAI-Compatible Endpoint label to reference vLLM/TGI now that llama.cpp has its own first-class option. PR-introduced regressions - Sanitize externally-derived `topic` strings (from news content) before interpolating them into log warnings in news/recommender/topic_based.py using the project's existing `sanitize_for_log` helper. Prevents log injection via crafted news titles. - Wrap `httpx.ConnectError` in `ProcessingLLMWrapper.invoke()` to a friendly `ValueError` for local HTTP providers (llamacpp, lmstudio, ollama, openai_endpoint), naming the provider and pointing at the URL setting. Without this, a user whose llama-server / LM Studio / Ollama is not running gets a raw stack trace. - Add `docs/release_notes/1.6.3.md` documenting the three breaking changes (gemma default removal, llamacpp HTTP migration, removed llamacpp_* settings keys). 1.6.3 was released without notes. Pre-existing inconsistency - Add empty-`llm.model` preflight to followup_research/routes.py before the ResearchHistory row is created and the worker thread is spawned — mirroring the existing check in research_routes.start_research at ~line 365. Returns HTTP 400 with the same `Model is required` message. Test quality (mutation analysis) - Strengthen test_returns_none_when_llm_not_configured to distinguish the targeted `except ValueError` path from the outer `except Exception` swallow. Patches the loguru logger and asserts `warning` was called (specific path) and `exception` was NOT (catch-all path). Without this, removing the targeted block would still let the test pass. Misc - README "Pre-1.7 installs auto-filled gemma3:12b" → "Pre-1.6.3" since this PR shipped in 1.6.3. * fix(tests): reformat settings JSON with ensure_ascii=False Same fix as PR #3959 — the prior merge commit serialized both default_settings.json and golden_master_settings.json with the default ensure_ascii=True (em-dashes written as — escapes). The test_golden_master test serializes the SettingsManager output with ensure_ascii=False (raw — characters), so the file-to-output comparison fails on every em-dash. Re-serialize both files with the test's exact options: indent=2, sort_keys=True, default=str, ensure_ascii=False. No key changes — pure format fix. * fix(review): address review findings on audit PR Follow-up to the self-review of this PR. Net changes: - news/topic_based.py: sanitize the `topic` interpolated in the PolicyDeniedError warning branch — the only externally-derived log site the original sweep missed (all 6 sites now use sanitize_for_log). - config/llm_config.py: remove _maybe_raise_actionable_local_server_error and its invoke/ainvoke call sites. It was redundant with the existing friendly-error layer (#4027, "Site B" in run_research_process) that already rewrites connection failures for these providers in the exact Start/Follow-up Research flows, and its str()-substring gate only fired for ollama (the openai-SDK providers raise APIConnectionError "Connection error.", matching none of the patterns). llm_config.py is now identical to main. - followup_research/routes.py: drop the stale "line ~365" reference and note the success/error key convention. - tests: add an empty-llm.model 400 regression test for /api/followup/start (asserts no ResearchHistory row / worker thread); replace the fragile `"AI" in warning_msg` assertion with the interpolated `'AI'` topic. - changelog.d/3767.{security,bugfix}.md: towncrier fragments for this PR's user-facing changes. - docs/release_notes/1.6.3.md: describe the friendly local-server error via the actual #4027 mechanism instead of the removed method. Also merges current main into the branch (was 21 commits behind); verified the merge preserves the #4466 credential-redaction in _log_llm_error. * fix(review): address carried-over AI-review recommendations - test_topic_based: decouple the topic-interpolation assertion from the hard-coded "AI" literal. The topic now carries a CRLF log-injection payload and the assertion derives the expected substring from sanitize_for_log(topic), additionally proving the raw CR/LF never reach the log line (the actual log-injection guarantee). - release_notes/1.6.3.md: drop the contradictory "(pending)" / fold-into- next-release framing. The filename is 1.6.3.md and 1.6.3 already shipped, so this is the retroactive 1.6.3 release notes — retitled to match the 1.6.8.md style. --------- Co-authored-by: LearningCircuit <185462206+LearningCircuit@users.noreply.github.com> |
||
|
|
fbc1a442da |
chore(security): SSRF housekeeping cleanups (#3931)
Two unrelated cleanup items batched as housekeeping: 1. Delete dead fetch_huggingface_pricing in pricing_fetcher.py. The method made an aiohttp GET to https://huggingface.co/pricing, logged that "HTML parsing would be needed", and returned None regardless. Never called in production code (cost_calculator.py and metrics_routes.py instantiate PricingFetcher but don't call this method). aiohttp/session infrastructure stays — tests still exercise the async context manager. 2. Add defensive .strip() to search-engine URL extraction in search_engine_searxng.py and search_engine_tavily.py. Search engines could theoretically return URLs with surrounding whitespace; the SSRF validator strips internally so this is purely defensive, but it also keeps URLs clean for logs and downstream consumers. |
||
|
|
958c1adde3 |
refactor(rag): categorize /test-embedding errors instead of guessing (#4208 follow-up) (#4220)
* refactor(rag): categorize /test-embedding errors instead of guessing (#4208 follow-up) The /api/rag/test-embedding endpoint caught every exception and ran a keyword-match heuristic over the error text to guess whether the user had picked an LLM by mistake. This routinely mis-categorized real failures — including the NoSettingsContextError fixed in #4208, which surfaced as "try a dedicated embedding model" because the underlying message happened to contain "json". Replace the heuristic with three categories based on the exception type: - Internal LDR errors (NoSettingsContextError, KeyError, AttributeError, TypeError, RuntimeError) → message says "internal LDR error", includes the type name, and asks the user to report it on GitHub. The user can't fix these by changing their model. - Upstream provider errors (modules under openai/httpx/requests/ urllib3/anthropic) → "The provider returned an error: <message>". - Anything else → verbatim "Embedding test failed for model 'X': <message>". No more guessing what the error means. The "you picked an LLM by mistake" message is gone, but the UI already labels models as (Embedding) vs (LLM — may not support embeddings) in the dropdown and shows a hint below the selection, so the post-test warning was largely redundant. Tests for the three previous heuristic branches (does-not-support, list-index-out-of-range, generic) are updated to assert the new verbatim/categorized behavior. New tests cover the internal-bug and upstream-provider categories explicitly. * refactor(rag): scope internal-error attribution to LDR modules + redact secrets Polish on the #4208 follow-up categorizer: - Categorize 'internal LDR error' by the module the exception CLASS is defined in (local_deep_research.*) instead of a frozenset of builtin names. type(exc).__module__ is the definition site, not the raise site, so a bare KeyError/TypeError/RuntimeError says nothing about whether LDR or a provider/langchain frame produced it. On this path a malformed-but-200 response from an OpenAI-compatible server can surface a builtin out of langchain's parser; the old name set told the user to 'report it on GitHub' for that upstream problem. Builtins now fall through to verbatim. - Scrub the detail text with sanitize_error_message before returning it, so an upstream error that echoes an API key (URL/auth header) can't be reflected verbatim to the browser. Defense-in-depth (endpoint is @login_required and only returns the user's own key to their own browser). - Tests: lock in builtins-fall-through (RuntimeError, KeyError), keep the LDR-module internal case (NoSettingsContextError), add a redaction test, and fix now-stale test names/docstrings. * test(rag): guard upstream builtin-subclass routing; hoist sanitizer import Addresses AI-review follow-ups on the categorizer polish: - Add test_upstream_subclass_of_builtin_is_provider_error: a KeyError subclass whose __module__ is an upstream package (openai._response) must route to the provider branch, not the internal-LDR branch. Locks in the _module_matches behavior the module-based design relies on. - Move sanitize_error_message to a top-level import next to the other ...security import. log_sanitizer has no LDR deps, so there is no circular import to avoid and the lazy import was unnecessary. |
||
|
|
0c8f917e2f |
fix(ui-tests): real assertions for collection delete + subscription toggle/delete via API seeding (#4174)
* fix(ui-tests): point collection + mobile tests at the real flow Background: \`/collections\` returns 404 — the real list page is at \`/library/collections\`. Seven test sites navigated to the 404, so the assertions never ran and every collection-related test reported SKIP. Worse, the assertions were written for a hypothetical modal flow that the app never had: \`#create-collection-btn\` is an \`<a href="/library/collections/create">\` link and the unused \`#create-collection-modal\` in the template is dead code (its \`showCreateCollectionModal()\` is defined but never called). This patch realigns two test files with what the app actually does. CRUD collection tests (\`test_crud_operations_ci.js\`): - \`createCollectionFormOpens\` — now verifies (a) the list page at \`/library/collections\` advertises an anchor to \`.../create\` and (b) the create page renders a form with a required name input and a submit button. No more dependency on click-then-modal behavior that doesn't exist. - \`createCollectionFormValidation\` — navigates directly to \`/library/collections/create\`, attempts empty submit, asserts the browser's HTML5 \`:invalid\` blocks submission and we stay on the create page. Verifies the actual contract (the form uses \`required\`), not heuristic error-class scraping. - \`collectionEditButton\` / \`collectionDeleteConfirmation\` — URL only (\`/collections\` → \`/library/collections\`). These still skip when zero collections exist; that's the right behavior pending fixture seeding (separate concern). Mobile modal tests (\`test_mobile_interactions_ci.js\`): - \`modalOpensOnMobile\` / \`modalClosesOnBackdropTap\` / \`modalCloseButtonAccessible\` — deleted. They tested modal-on-click behavior at a 404 route; the app never opened a modal there, so they were silent no-ops. The intent (mobile-fit create UX) is now covered by: - \`createFormFitsMobile\` — sets a 375x667 mobile viewport, navigates to \`/library/collections/create\`, and asserts the form fits the screen width and the required name input is at least 30px tall (touch-target). Same testing intent, real flow. - \`modalScrollableContent\` — unchanged; already targeted \`/settings/\` and works. Test plan (CI=true, server with LDR_DISABLE_RATE_LIMITING=true): - \`node test_crud_operations_ci.js\` — 5 passed / 10 skipped / 0 failed (previously 3 passed / 12 skipped — gained two real assertions on the collection create flow). - \`node test_mobile_interactions_ci.js\` — 3 passed / 5 skipped / 0 failed (Create Form Fits Mobile reports a 269px form on a 375px screen with a 52px name input). - \`mobile\` and \`api-crud\` shards via \`run_all_tests.js --shard=\` — both 100% green. * fix(ui-tests): point subscription tests at the real flow Same treatment as the collection rewrite in #4127, applied to the subscription create flow. The Create Subscription button on /news/subscriptions is a plain <button> whose JS handler runs \`window.location.href = '/news/subscriptions/new'\` (see static/js/pages/subscriptions.js) — it is page navigation, not a modal. The previous tests relied on findActionButton's click accidentally landing on a page with some form and passed by coincidence, not by asserting the actual contract. test_crud_operations_ci.js: - \`createSubscriptionFormOpens\` — verifies the list page exposes the create button, then navigates directly to /news/subscriptions/new and asserts the form has the required query textarea (#subscription-query) plus a submit button. - \`subscriptionFormValidation\` — navigates directly to /new, makes the query empty, clicks submit, asserts the browser keeps us on /new with HTML5 :invalid on the required textarea. Same browser-contract pattern as the collection rewrite. test_news_subscriptions_ci.js: - \`subscriptionFormModal\` -> \`subscriptionFormPage\` — renamed and rewritten. The "modal" label was wrong: the app never opens a modal here. Now asserts the create page is reachable and the query textarea is present + required. - \`subscriptionFormFields\` — was permanently SKIPped because it looked for fields named "query"/"frequency" etc. The actual form uses \`subscription-*\` ids (#subscription-query, #subscription-model, #subscription-search-engine, #subscription-strategy, #subscription-active). Selectors updated; this is now a real PASS instead of a silent SKIP. Note: \`subscriptionFrequencyOptions\` and \`subscriptionTypeOptions\` in CRUD still SKIP — those fields ("frequency" dropdown, "type" selector) do not exist on the real form (the closest equivalents are \`subscription-iterations\` and \`subscription-strategy\`). Either the test names or the app schema would need to change; out of scope here. Test plan (CI=true, server with LDR_DISABLE_RATE_LIMITING=true): - \`node test_crud_operations_ci.js\` — 5 / 10 / 0 (same totals, but Subscription Create Form Opens + Validation now report the real contract: "TEXTAREA #subscription-query required=true" / ":invalid blocks submit"). - \`node test_news_subscriptions_ci.js\` — 8 / 5 / 0 (was 7 / 6 / 0 — Subscription Form Fields is now PASS instead of SKIP). - \`api-crud\` and \`history-news\` shards via \`run_all_tests.js --shard=\` — both 100% green. Stacked on #4127 (collection-flow). Will rebase / fast-forward when that merges. * fix(ui-tests): real assertion for collection delete; drop dead edit test After #4127 / #4145 / #4151 worked through the other dead-field patterns in test_crud_operations_ci.js, the two collection list-page tests were still SKIPping for the same reasons: - \`collectionEditButton\` looked for an edit button on each card on /library/collections. The list page renders each card as a plain anchor link to the detail page — there is *no* per-card edit affordance. And the detail page has no edit UI either: \`grep -rn "edit\\|rename" src/.../templates/pages/collection_*.html\` returns nothing. Collections simply cannot be edited in the app. **Removed** the test — it was a permanent SKIP for a feature that does not exist. - \`collectionDeleteConfirmation\` looked for an inline delete button on the list page. There isn't one. Delete lives on the detail page (\`#delete-collection-btn\`), and the confirmation is a *native* \`window.confirm()\` prompt (collection_details.js::deleteCollection), not a DOM modal — so even fixing the URL would not help: the old \`document.querySelector('.modal, ...)\` would never find a native dialog. **Rewritten** with end-to-end seeding: 1. POST /library/api/collections to create a throwaway fixture. 2. Navigate to /library/collections/<id> and click #delete-collection-btn. 3. Capture the native confirm() via \`page.on('dialog')\` and \`dialog.dismiss()\` so the seeded collection survives for cleanup. 4. Assert dialog.type === 'confirm' and the message includes "Are you sure" / "delete". 5. DELETE the seeded collection via the API. This is the first per-test API-seeding fixture in the UI suite. It sets the pattern for the equivalent subscription / document / history-item tests, which still SKIP for the same kind of reason (empty DB + mismatch with native confirm()). I deliberately did not expand scope into those here — single concern per PR. Test plan (CI=true, server with LDR_DISABLE_RATE_LIMITING=true): - \`node test_crud_operations_ci.js\` — Collection Delete Confirmation now reports: "confirm() prompt fired: \\"Are you sure you want to delete \\"ldr-ui-test-collection-<ts>\\"..."" - Net: 14 tests (was 15: collectionEditButton removed). PASS count goes 5 -> 6; the new PASS replaces a permanent SKIP. Stacked on #4145. * fix(ui-tests): real subscription toggle + delete tests via API seeding (#4180) Extends the per-test fixture pattern from #4174 to two more permanently-SKIPping tests in test_crud_operations_ci.js: - \`subscriptionToggleStatus\` — was looking for buttons matching \`class*="toggle"\` / \`class*="pause"\` / \`class*="resume"\` / \`.toggle-status\` / \`input[type=checkbox]\`. The real card markup (pages/subscriptions.js::renderSubscriptionCard) uses \`<button class="btn btn-sm ldr-btn-icon" title="Pause|Resume">\`, none of which match the old selectors. Combined with an empty test DB, the test was a permanent SKIP for a real product feature (pause/resume) with zero current coverage. Now: seed an active subscription via POST /news/api/subscribe, click the Pause button, wait for the title to flip to "Resume", and assert the status badge text flipped to "paused". This exercises the actual pause API round-trip end-to-end. - \`subscriptionDeleteConfirmation\` — same shape as the collection delete rewrite (#4174). The card delete button calls deleteSubscriptionDirect, which guards on a native window.confirm() — not a DOM modal. The old query \`.modal, .confirm-dialog, [role="alertdialog"]\` never matched anything even on a populated list. Now: seed an inactive subscription, click the .btn-danger inside the card, capture the confirm() via page.on('dialog'), and assert the message includes "Are you sure" / "delete". Dismiss to preserve the fixture for explicit cleanup. One infrastructure pothole worth noting: navigateTo() in test_lib/test_utils.js skips when already on the same path. Both tests do "navigate to /news/subscriptions → seed → navigate to /news/subscriptions again" — the second navigate is a no-op, so the seeded card never renders. Worked around with \`page.reload({ waitUntil: 'domcontentloaded' })\`. This is the "Bug 5" from the earlier code-reviewer audit; addressing navigateTo() at the helper level is a separate discussion. Test plan (CI=true, server with LDR_DISABLE_RATE_LIMITING=true): - \`node test_crud_operations_ci.js\` reports: "Pause → Resume toggle works (active → paused, button retitled)" "confirm() prompt fired: \\"Are you sure you want to delete the subscription \\"ldr-ui-test-delete-<ts>\\"...\\"" - Pass count goes 6 -> 8; both new PASSes were permanent SKIPs. - Cleanup verified in both tests via finally-block DELETE. Stacked on #4174 (collection delete flow). * fix(ui-tests): guard cleanup page.evaluate; correct toggle endpoint comment The three rewritten CRUD tests ran their best-effort fixture cleanup as an unguarded `await page.evaluate(...)` inside `finally`. The inner try/catch only swallows the in-page fetch error; if the page is detached during cleanup the evaluate call itself throws from the finally block, discarding the real test result and reporting a misleading error. Wrap each cleanup evaluate in a Node-side try/catch so cleanup can never mask the test outcome. Also correct the subscriptionToggleStatus comment: the toggle hits PUT /news/api/subscriptions/<id> (toggleSubscription) and re-renders the list in place — there is no /pause endpoint and no full list reload. * fix(ui-tests): wait for collection detail load before clicking delete navigateTo only waits for domcontentloaded, but the collection detail page populates collectionData via an async fetch fired on DOMContentLoaded. Because deleteCollection() dereferences collectionData.name before calling confirm(), clicking #delete-collection-btn before that fetch lands throws a TypeError and no dialog fires — the test then reports a false "No confirm() dialog" failure. This is a real flake on a contended CI runner. Wait for #collection-name to swap in from its "Loading..." placeholder before clicking, mirroring the waitForSelector-on-seeded-card guard the subscription toggle/delete tests already use. |
||
|
|
4916dd5817 |
feat(security): validate LLM provider base_url against SSRF rules (#3929)
* feat(security): validate LLM provider base_url against SSRF rules LangChain provider SDKs (ChatOpenAI, ChatOllama) construct an internal httpx client from the configured base_url and route every inference call through it — bypassing safe_requests entirely. An authenticated user with permission to edit llm.<provider>.url could otherwise point inference traffic at internal services or cloud-credential endpoints. Adds ssrf_validator.assert_base_url_safe(base_url, *, setting_key) that wraps validate_url(allow_localhost=True, allow_private_ips=True) and raises ValueError on rejection. ALWAYS_BLOCKED_METADATA_IPS still fires under those permissive flags, so cloud metadata stays blocked. Call-site changes: - openai_base.py:create_llm + _create_llm_instance — gated on cls.url_setting (subclasses with hardcoded default_base_url like Anthropic / OpenAI / OpenRouter / xAI / IONOS skip validation; no operator URL to attack) - ollama.py:list_models_for_api — graceful: swallow ValueError, return [] - ollama.py:create_llm — strict: let ValueError propagate so the research-query caller surfaces a clear config error - ollama.py:is_available — graceful: swallow, return False so the model-list UI degrades cleanly setting_key (not provider_name) used in error messages so operators see the actual settings dot-path. Two pre-existing tests updated to use http://localhost:5000/v1 instead of http://custom:5000/v1 since the placeholder hostname doesn't resolve under the new SSRF guard. 19 new tests in tests/llm/test_provider_base_url_ssrf.py covering metadata-IP rejection, localhost/RFC1918 acceptance, edge cases, per-provider integration, validate-before-instantiate ordering, and public API re-exports. Supersedes #3778 (which addressed only openai_base.list_models_for_api inline; this PR covers all 5 sites with a shared helper). * test(security): decouple LLM config-passthrough tests from live DNS The new openai_base SSRF guard (assert_base_url_safe) does a live DNS lookup and fail-closes when a host does not resolve. The config-passthrough tests in test_api_key_configuration.py configured public/placeholder endpoint URLs and mocked ChatOpenAI but not the SSRF guard, so: - test_custom_endpoint_url_configuration failed in CI because its placeholder host (custom-llm-provider.com) does not resolve. - The three openrouter.ai sibling tests passed only because that domain resolves in CI — a latent live-DNS dependency. Patch assert_base_url_safe to a passthrough in these four config tests, matching the convention in tests/llm/test_provider_base_url_ssrf.py (test_pure_provider_skips_validation patches the same symbol). The URL/api_key passthrough assertions stay meaningful; SSRF enforcement is covered separately by test_provider_base_url_ssrf.py. No production code or the SSRF validator is changed. * fix(security): guard remaining LLM base_url SSRF gaps Two base_url construction sites still bypassed the assert_base_url_safe guard this PR introduces (same vuln class: a user-configurable URL handed to an SDK client / ChatOpenAI that uses its own httpx transport, bypassing safe_requests): - OpenAICompatibleProvider.list_models_for_api() (base class used by CustomOpenAIEndpoint / LMStudio / LlamaCpp) instantiated OpenAI(base_url=...) without guarding. Add a guard symmetric with the create_llm guard: only when base_url and cls.url_setting; on ValueError log a warning and return [] so model-listing degrades gracefully (no 500). - OpenAIProvider.create_llm() (url_setting = None, overrides create_llm without calling super, so the base guard never runs) set openai_params[openai_api_base] from llm.openai.api_base unguarded. Guard it and let ValueError propagate (fail-fast at inference construction, matching ollama.create_llm). Add focused tests mirroring the existing SSRF tests: list_models_for_api returns [] and logs on a blocked metadata URL (and never builds the SDK client); OpenAIProvider.create_llm raises on a blocked api_base; localhost endpoints still pass both paths. Mutation-verified both new negative tests. * test(security): decouple openai api_base passthrough test from live SSRF guard test_create_llm_passes_api_base_when_set sets llm.openai.api_base to a non-resolving placeholder and asserts it is passed through to ChatOpenAI. The new assert_base_url_safe guard in OpenAIProvider.create_llm does a live DNS lookup and fail-closes on the unresolvable host, breaking this config-passthrough test. Patch assert_base_url_safe to a passthrough at its import site in the openai implementation module (side_effect=lambda url, **_kwargs: url), matching the convention already used in tests/test_api_key_configuration.py and tests/llm/test_provider_base_url_ssrf.py. The test still asserts the api_base passthrough behavior, so it stays meaningful. |
||
|
|
374d075f1b |
feat(benchmark): record LDR version + redacted settings snapshot at benchmark start (#3844)
* feat(benchmark): record LDR version + redacted settings snapshot at start
The YAML download stamps `ldr_version` and `date_tested` from the
*current* app at download time (`new Date()` and `meta[name=app-version]`
in `benchmark_results.html`). A v1.6.5 run downloaded on v1.6.7 shows
v1.6.7. We hit this debugging the SimpleQA 95.7%/1m54s reference run —
couldn't tell which LDR version produced it. The YAML also exposes only
~3 settings (temperature, context_window, max_tokens), losing the ~30-50
settings that actually affect benchmark behaviour, so cross-run
comparisons are impossible.
Record the active LDR version and a full settings snapshot on the
`BenchmarkRun` row at start time. Migration 0010 adds two nullable
columns; existing rows survive with NULL and the YAML path falls back
to "unknown (pre-0010 run)". `start_benchmark` already captures the
settings snapshot for the background thread (it just discards it after);
we redact secrets via a new `DataSanitizer.redact_settings_snapshot`
helper and persist alongside `__version__`. Snapshot capture is wrapped
defensively because `get_all_settings()` raises uncaught `LookupError`
on stale enum-mismatch rows (we hit that on the user's old DB earlier).
The plain `DataSanitizer.redact()` does not handle the nested-with-
metadata snapshot shape `{key: {value, ui_element, ...}}` — the outer
compound key isn't in the sensitive-name set and the inner `value` key
isn't either, so secrets leak. The new helper checks
`ui_element == "password"` plus key-suffix defense-in-depth and
preserves metadata so YAML diffs still show "this key existed."
`/api/history` gains the small fields (`ldr_version`, `start_time`);
`/export` gains a `metadata` block with the full settings snapshot, so
the per-page list response stays small (~80 bytes/run added) while the
download click pays the ~184 KB snapshot cost. The YAML builder now
always calls `/export`, uses `metadata.started_at` for `date_tested`
(preserving the field name that the public HuggingFace leaderboard
parser at ldr-benchmarks/scripts/build_leaderboards.py reads), and
appends a `settings:` block via a new type-aware `formatSettingValue`.
* docs: add news fragment for #3844
User-visible change: YAML download now reflects the version and
settings active when the benchmark ran, not at download time.
* fix(search): handle Wikipedia 429s and empty summary-fetch results
Two production bugs surfaced during the benchmark verification of
PR #3793 + this snapshot work, plus the stale doc that should ship
alongside.
Wikipedia 429 cascade
=====================
The `wikipedia` PyPI library calls `Response.json()` unconditionally,
so when MediaWiki rate-limits us (HTTP 429 with an HTML body), every
following `wikipedia.summary()` in the same batch raises
`JSONDecodeError`. The previous handler logged a full traceback per
title — we observed 14+ stack traces for a single rate-limited query
during yesterday's SimpleQA benchmark on Roman emperors / Japanese
manga.
Catch `(json.JSONDecodeError, requests.exceptions.JSONDecodeError)`
specifically, log a single warning, and break out of the per-title
loop (every remaining title in the batch will hit the same throttle).
Same treatment for the outer `wikipedia.search()` call.
Summary-fetch empty-content / empty-summary guards
==================================================
With `summary_focus_query` now the default fetch mode (PR #3793),
`_make_summary_fetch_tool` runs against every fetched URL — including
paywalls, JS-only SPAs, and 200-but-blank pages. Two failure paths:
1. Empty page content: the LLM was being invoked to summarise an
empty string (waste of a round-trip), and the URL was being
registered in the citation collector with an empty snippet —
`_register_in_collector` caches by URL, so the agent could never
re-fetch the URL under a different focus.
2. Empty LLM summary: same cache poisoning, this time after the
model decided nothing on the page matched the focus.
Both paths now short-circuit to `NOT RELEVANT` BEFORE registration,
leaving the URL re-fetchable. Empty content also skips the LLM call
entirely.
Stale docs/CONFIGURATION.md
===========================
The auto-generated config table still listed `search.fetch.mode`
default as `full` (the pre-#3793 value). Regenerated via
`python scripts/generate_config_docs.py`.
* docs: add news fragment for the bundled #3844 bugfixes
* fix(settings): catch LookupError in get_all_settings + redact /settings/api response (#3947)
* fix(settings): catch LookupError in get_all_settings + redact /settings/api response
Two related settings hardenings discovered during PR #3793 / #3844 verification.
LookupError fallback in SettingsManager.get_all_settings
========================================================
The DB-query handler at manager.py:780 caught only SQLAlchemyError. When
a row's `type` column holds an enum value no longer in `SettingType`
(e.g. legacy 'CHAT'-typed rows from a removed feature), SQLAlchemy
hydrates the row and raises `LookupError: 'CHAT' is not among the
defined enum values` — which propagated and crashed every caller of
`get_all_settings`: the `/settings/api` endpoint, benchmark start,
research start, MCP entry points. We hit this on a real user's old DB.
Add `LookupError` to the except clause so the snapshot falls back to
defaults-only on a stale row instead of crashing. PR #3844 already
defensively wrapped the benchmark-start path; this fixes the underlying
problem so the workaround there becomes belt-and-suspenders.
Redact /settings/api response
=============================
`api_get_all_settings` returned the full nested-with-metadata snapshot
to the frontend, including env-overridden API keys (`llm.openai.api_key`,
`search.engine.web.serper.api_key`, etc.) — anyone who could
authenticate could grab plaintext keys via a single GET. Now wraps the
response in `DataSanitizer.redact_settings_snapshot` (introduced in
PR #3844) so password-typed values come back as `[REDACTED]`.
Verified safe to ship: no JS caller round-trips the bulk endpoint's
values into a form (the settings form is server-rendered from the
template directly), so redacting the JSON does not break the UI. The
template still pre-fills password inputs with raw values — that's a
separate template-level UX concern (filed as out-of-scope follow-up).
Depends on PR #3844 for the `redact_settings_snapshot` helper.
* docs: news fragments for PR #3947
* chore(migrations): renumber 0010 → 0013 to follow current head
Main has since landed 0010_add_chat_tables. Rename PR's migration:
0010_benchmark_run_version_and_snapshot.py → 0013_benchmark_run_version_and_snapshot.py
Update revision id (0010→0013) and down_revision (0009→0012, assuming
#3772 lands as 0011 and #3716 as 0012; user can re-renumber if order differs).
Update test references to the new revision number.
Prep commit; next commit merges main.
* test: mark benchmark export metadata test as intentional no-SUT-import
The test exercises the export endpoint indirectly through helpers
imported from a sibling coverage suite (_make_app,
_make_export_query_router, _patch_auth_and_db), which in turn import
from local_deep_research. The check-shadow-tests hook (added in #4356)
doesn't follow re-export chains, so it sees no direct
`local_deep_research` import and flags this as a shadow test.
Add the `# allow: no-sut-import — …` marker per the hook's convention.
* fix(db): renumber benchmark migration 0013→0014 (resolve collision with 0013_remove_meta_search_engines)
Main's 0013 is 0013_remove_meta_search_engines (#4534); this branch's
benchmark migration was still numbered 0013 (down_revision 0012), a
revision-id collision and a second head at 0012. Renumber to 0014 with
down_revision 0013 so the chain is linear:
0012 → 0013_remove_meta → 0014_benchmark.
Also update the stale test (named/asserting 0010 from before an earlier
0010→0013 renumber): rename to test_migration_0014_*, anchor the
hand-built legacy DB at 0013 and upgrade the single 0014 delta, and
complete the expected-files list in test_alembic_migrations.py
(0011/0012/0013_remove were missing).
* docs(db): finish 0013→0014 renumber — update migration log + pre-migration comments
Leftover '0010' references in the renamed benchmark migration (log line,
docstring) and in benchmark_routes.py comments describing pre-migration
rows now read '0014', matching the renumbered revision. Comment/log text
only — no behavior change.
* docs(benchmark): fix remaining stale '0010' migration refs to 0014
Model comment + two benchmark test docstrings/names still referenced
'migration 0010' after the benchmark migration was renumbered to 0014.
Comment/test-name only — renamed test_pre_0010_row_returns_null... to
...0014... for accuracy.
* fix(benchmark): YAML version fallback must not stamp current app version on old runs
AI review caught that the download JS fell back to the current app
version for pre-migration rows:
meta.ldr_version || run.ldr_version || appVersionMeta || 'unknown ...'
Both ldr_version fields are null for pre-0014 rows, so the chain fell
through to appVersionMeta (today's version) — reproducing the exact
'wrong version on an old run' bug this feature exists to fix. Drop
appVersionMeta from the chain; pre-0014 rows now read
'unknown (pre-0014 run)'.
Also fixes the stale '0010' references I missed in non-.py files
(benchmark_results.html ×3, changelog.d/3844.feature.md, a benchmark
test comment) — the renumber sweep had only covered .py. The real
0010_add_chat_tables references are untouched.
And clarifies the best-effort settings-snapshot catch in start_benchmark:
the enum case is handled inside get_all_settings; the broad outer catch
is the resilience backstop and logs a full traceback (exception class
included) — kept deliberately so a snapshot-capture failure never blocks
starting a benchmark.
* feat(benchmark): make settings snapshot opt-in in YAML download
The benchmark YAML always embedded the full run-time settings snapshot.
Even with secrets redacted, that exposes internal endpoint URLs and local
paths — a problem for the feature's own public-leaderboard use case. Make
it opt-in:
- Default download = summary only (model/provider/strategy/config/version/
date), all still run-time-sourced. No settings: block.
- New Export dropdown (replaces the two YAML buttons, mirrors results.html)
with an 'Include settings snapshot' checkbox (default off). When checked,
both the summary and +examples downloads append the redacted snapshot.
- downloadBenchmarkYAML gains an includeSettings flag gating the
formatSettingsSnapshot append; the /export fetch passes ?include_settings=1
only when opted in.
- Backend export endpoint skips loading/returning the ~184KB settings_snapshot
column unless ?include_settings is set, so the default download doesn't
transfer a blob it would discard.
- Changelog reflects opt-in; tests: export-endpoint opt-in/omitted-by-default,
Puppeteer Export-dropdown presence + default-off checkbox.
Pre-0014 rows still render 'settings: null' when opted in (unchanged).
* refactor(benchmark): normalize include_settings query parsing (AI review)
Case-insensitive .lower() in ("1","true") instead of listing "True"
explicitly. Behaviour unchanged for the frontend (sends ?include_settings=1).
* style(benchmark): ruff format include_settings parse
* refactor(benchmark): narrow settings-snapshot catch to recoverable errors (AI review)
Catch only (SQLAlchemyError, LookupError, ValueError) around
get_all_settings instead of bare Exception. The expected stale-enum
failure still degrades to an empty snapshot (ldr_version preserved),
but a genuine programming error now propagates to the outer handler
that logs a full traceback and marks the run FAILED, rather than being
silently swallowed into a degraded benchmark.
---------
Co-authored-by: LearningCircuit <185462206+LearningCircuit@users.noreply.github.com>
|
||
|
|
99041bf3d2 |
docs(resource-cleanup): record asyncio.run loop does not reproduce #3816 (#4143)
Closed PR #3930 explored an opt-in manual smoke suite that ran await ainvoke() in a loop against a real Ollama to reproduce the eventpoll-FD accumulation from #3816. It did not reproduce — a single-thread asyncio.run(ainvoke) loop deterministically closes its loop's selector each call, so eventpoll FDs do not accumulate. Reliable reproduction needs sustained concurrent load (multi-worker harness over a shared loop). Recording the empirical finding under "Intentionally not done" so the next leak hunter does not re-implement the same suite and rediscover the same negative result. Points to the existing in-CI tests in tests/utilities/test_close_base_llm.py as the regression coverage for the close-chain introspection branches. |
||
|
|
784864fd31 |
fix(security): extract sanitize_error_message to standalone function + strengthen patterns (#4466)
* refactor(security): extract sanitize_error_message to standalone function Extract the regex-based credential sanitizer from BaseSearchEngine._sanitize_error_message to a standalone sanitize_error_message() function in security/log_sanitizer.py so non-engine code (LLM config, error handling) can reuse it. Two fixes to the existing patterns: - Add `api-key` to the URL param alternation (Guardian uses this) - Change sk-/pk- char class to include hyphens for modern key formats (sk-proj-..., sk-ant-api03-...) * fix(security): run URL-credentials redaction before URL-param redaction If a query-param value is itself a URL with embedded credentials (?api-key=https://user:pass@host), the param pattern consumed the 'https' scheme first, so the URL-credentials pattern no longer matched and user:pass leaked into the sanitized message. Running the credentials pattern first fixes this for every param spelling (the key=/token= variants had the same pre-existing interaction). The reverse order is safe: anything the credentials pattern consumes is replaced by [REDACTED] literals, so it cannot expose text the param pattern would have caught. Found by adversarial fuzzing (200k inputs) of old vs. new patterns. * chore: fix stale docstring example, drop duplicate test The _sanitize_error_message docstring example used an 11-char key that the 20+-char pattern never matched, and showed the wrong replacement token. test_none_like_empty_passthrough was byte-identical to test_empty_string_passthrough. * fix(security): harden credential leak vectors across 5 code paths (#4468) * fix(security): harden credential leak vectors across 5 code paths Fix remaining credential leak vectors found during PR #4452 audit: - PubMed: Replace 5 bare logger.exception() calls with dual-scrub (sanitize_error_message + redact_secrets + logger.warning) - Base class run(): Sanitize error_message at 3 locations before it flows to SearchTracker.record_search (database persistence) - LLM config: Replace _log_llm_error's logger.exception (which explicitly re-logged URLs without redaction) with logger.warning + dual-scrub - Google PSE: Replace bare raise in _validate_connection with type(e)(safe_msg) from None to sanitize the re-raised exception - OpenAI compat errors: Replace {exc!s} with sanitize_error_message(str(exc)) in the Details suffix * fix(security): harden credential leak vectors across 7 more engines (#4474) Replace logger.exception with dual-scrub pattern (sanitize_error_message + redact_secrets + logger.warning) in GitHub, Elasticsearch, Paperless, Semantic Scholar, Brave, NASA ADS, and Guardian search engines. Store credentials on self where needed for redaction access. * fix(security): redact ES password in query_string/DSL search catch blocks search_by_query_string and search_by_dsl passed only _api_key to redact_secrets, while the other three Elasticsearch catch blocks also pass _password. A basic-auth password surfaced in a client error (not in https://user:pass@host URL form, which sanitize_error_message already catches) would leak via these two paths. Pass _password too, making all five ES catch blocks consistent. Adds a regression test that fails without this change. * test(security): set settings_snapshot on DB-persist test engine test_error_message_sanitized_before_db_persist builds its engine via __new__, bypassing __init__ (which always sets settings_snapshot or {}). run()'s egress-verification path reads self.settings_snapshot; when a sibling test arms the global egress audit net during a parallel CI run, this raised AttributeError, turning the green-in-isolation test red in the full suite. Set settings_snapshot = {} on the fixture to match __init__ and remove the ordering dependency. * fix(security): dual-scrub the log path in BaseSearchEngine.run() The three run() exception handlers dual-scrubbed the DB-bound error_message but the log-bound safe_msg used redact_secrets alone (literal values only, no regex). A foreign credential in the exception (e.g. https://user:pass@host where user:pass is not the engine's own api_key) was therefore redacted in the DB but leaked into the log. Apply sanitize_error_message to safe_msg in all three blocks so the log path matches the DB path. Adds a regression test that fails without the fix (foreign URL credential, caught only by the regex layer, not by the literal pass). Note: the same single-scrub-log pattern exists in several engine and LLM-provider catch blocks (all pre-existing on main, not added here); those are left for the _scrub_error base-helper follow-up that will make every call site dual-scrub by construction. |
||
|
|
33f410ed3d |
fix(security): force diagnose=False on the file sink too (#4182) (#4568)
Final piece of the #4182 logging chokepoint. The optional file sink (LDR_ENABLE_FILE_LOGGING) still honored LDR_LOGURU_DIAGNOSE, so an operator with debug + diagnose + file logging all enabled could persist frame-local credentials — including the unrecoverable SQLCipher master password (TRUST.md §5) — into an unencrypted log file. It now runs with diagnose=False, matching the DB and frontend sinks; frame-local exception dumps render only to the ephemeral stderr sink. This makes the code match the policy the comment above the sink block already states ('ONLY on the ephemeral, local stderr sink'). Renames test_diagnose_propagates_to_file_sink -> test_file_sink_never_gets_diagnose and updates it to assert the file sink is diagnose=False even when the opt-in is on. Mutation-verified: reverting the file sink to diagnose=diagnose fails the test. |
||
|
|
6762ccfaf5 |
fix(library): don't quarantine healthy FAISS index when embedding provider is down (#4512)
* fix(library): don't quarantine healthy FAISS index when embedding provider is down
load_or_create_faiss_index ran the embedding dimension probe
(embed_query('dimension_check')) inside the try-block whose except
assumes 'corrupt index file'. When the embedding provider was merely
unreachable (e.g. Ollama not running), the probe raised, the except
quarantined a perfectly healthy index (renamed to .corrupt-*) and
silently replaced it with an empty in-memory one. Once the provider
came back, the collection searched an empty index - zero results with
no hint why - until the user manually reindexed (and quarantined files
may be pruned later).
Hoist the probe out of the try-block so a provider outage propagates
as a clear error and the index files stay untouched. Quarantine still
applies to genuine FAISS.load_local failures (torn .pkl, malformed
pickle), which do not depend on the provider.
Found while diagnosing #4503 (research error 'Error searching
collection' with Ollama down).
* docs(changelog): add fragment for #4512
|
||
|
|
b0f5c1f5fa |
test(db): lock in LDR_TEST_MODE boolean parsing for the KDF floor (#4565)
* test(db): lock in LDR_TEST_MODE boolean parsing for the KDF floor Follow-up coverage from the #4564 review. _get_min_kdf_iterations() now parses LDR_TEST_MODE via to_bool(); add direct unit tests that pin both ends of that behaviour: - falsey values (0/false/False/no/off) must NOT relax the floor — this is the exact regression #4564 fixed, and these fail on the old bare-truthiness code - the full truthy set (true/TRUE/yes/on/enabled) must relax it, guarding against a future narrowing to a 3-value parser Also move the integration test's requested KDF (50000) off the registry min_value boundary (1000) so it exercises the KDF-floor clamp rather than the registry's range validation, and stays robust if that min is raised. Test-only; no behaviour change. * test(db): clarify load-bearing PYTEST_CURRENT_TEST delenv, add invalid-value case Addresses review feedback on the test-coverage PR: - comment why each test clears PYTEST_CURRENT_TEST (the function relaxes the KDF floor on its presence too, which is always set under pytest, so clearing it is required to isolate the LDR_TEST_MODE behaviour) - add an unrecognised string ("banana") to the falsey set so an arbitrary truthy-looking value is proven NOT to relax the floor |
||
|
|
aced529e70 |
fix(settings): collapse sections by default on mobile (#4032)
* fix(settings): collapse sections by default on mobile The Settings page renders ~400 controls across ~57 sections. On mobile this produced a page taller than 16,384 px — Puppeteer's screenshot ceiling and a genuine UX disaster for the user, who had to scroll forever to find anything. `initAccordions` now starts every section collapsed at the `(max-width: 767px)` breakpoint; desktop keeps the previous "all expanded" default so power-user workflows are unchanged. Active search overrides the collapse — when the user types in the filter, surviving sections are force-expanded so matches aren't hidden behind a tap. The AJAX save handlers attach per-input listeners on input/select/ textarea elements directly, not via FormData on the form, so `display: none` on the collapsed section body does not break saves. Page height on mobile (Pixel 5 viewport) drops from > 16,384 px to ~7,300 px in local measurement. CSS: tighten section-header layout on mobile back to a row (was stacking title above chevron in `settings.css:1419`, which buried the chevron affordance), enforce 44 px touch target, and give the toggle icon a 32×32 hit area. Tests: - New `settings-mobile-collapse.spec.js` pins initial state per viewport, page-height ceiling, search-forces-expand, and tap-toggle. - `settings-filter-behavior.spec.js` updated to wait with `state: 'attached'` since the rows are now DOM-attached but hidden on mobile before the first user tap. * docs(changelog): add 4032 bugfix entry * refactor(settings): drive chevron rotation via CSS, extract mobile breakpoint Address reviewer feedback on PR #4032: - Remove inline `icon.style.transform` mutations from initAccordions (init, expand, and click-handler branches). The `collapsed` class is already toggled on the header in all three paths, and settings.css:234 rotates the `.ldr-settings-toggle-icon` container based on that class — so the inline transforms were redundant and were compounding with the container rotation to point the chevron at -180deg instead of -90deg. - Extract the 767px mobile threshold into `MOBILE_BREAKPOINT_PX` and expose it via `window.__LDR_MOBILE_BREAKPOINT_PX` so the Playwright spec reads from the same source. CSS still hardcodes 767 in its own `@media` rules (CSS can't read JS values); a comment in settings-mobile-fix.css warns future editors to keep the literals in sync. * ci(playwright): run settings-mobile-collapse spec in WebKit suite The new settings-mobile-collapse.spec.js lived in tests/ui_tests/playwright/ but no CI job ran it: the PR-level UI checks use the Puppeteer run_all_tests.js runner, and the only workflow that invokes `npx playwright test` against that dir (this nightly WebKit suite) filters to an explicit allowlist that did not include the new spec. So the mobile-collapse fix had zero CI coverage. Add settings-mobile-collapse to both the Desktop Safari and Mobile Safari filters. The spec branches on isMobileViewport(), so it is meaningful on both: Desktop Safari asserts sections stay expanded, Mobile Safari asserts they collapse by default, the page clears the 16384px ceiling, and tap toggles. |
||
|
|
1b81e80293 |
fix(security): close password frame-locals leak class at the sink + targeted sweep (#4182) (#4555)
* fix(security): close password frame-locals leak class at sink + targeted sweep (#4182) Found during review of #4553: the diagnose=True frame-locals leak class spans ~16 more files beyond the #4530/#4540 core sweep. Most are unrelated error handlers in big multi-purpose functions that merely happen to have a password param in scope, so blanket-converting them would strip useful tracebacks. Address it at two levels instead: 1. Sink-level chokepoint (utilities/log_utils.py): force diagnose=False on the database sink (persists into the user's encrypted DB) and the frontend sink (ships to the browser). diagnose stays available only on the local stderr/file sinks the operator explicitly opted into. This stops loguru from ever rendering frame-local credentials into a durable or remote sink, covering every credential-bearing handler app-wide regardless of per-site logging discipline. 2. Targeted per-site sweep of the credential-centric session helpers where the master password is the operative secret: metrics/search_tracker, metrics/token_counter, database/library_init, database/backup/backup_executor, web_search_engines/rate_limiting/tracker, research_library/.../research_history_indexer. logger.exception -> logger.warning (15 sites; all static messages, nothing to redact). Added these modules to test_password_redaction_invariant's SWEPT_MODULES. Big route/service handlers (research_service, research_routes, rag_routes, auth, news, ...) are intentionally NOT blanket-converted: they keep full tracebacks for unrelated errors and are protected by the sink-level guard. * test(log_utils): add sink-diagnose policy test + update existing diagnose tests - New TestDiagnoseSinkPolicy in test_log_utils_coverage: asserts the DB and frontend sinks are diagnose=False even when LDR_LOGURU_DIAGNOSE is opted in, while stderr honors the opt-in. Mutation-verified. - Update three test_log_utils_extended tests that asserted diagnose propagated to *all* sinks; they now assert the new policy (stderr/file honor the opt-in; DB + frontend forced off). * test(context-overflow): match overflow warning by anchor, not call position The targeted #4182 sweep converted token_counter._save_to_db's except from logger.exception to logger.warning. The metrics-save path runs after overflow detection in these tests, so the overflow warning is no longer the *last* logger.warning call. Match it by its stable 'Context overflow detected' anchor across all warning calls via a small helper, instead of asserting on mock_logger.warning.call_args[0] (the last call). * test(log_utils): key file-sink diagnose assertion by sink identity, not call index Addresses #4555 review nit: test_diagnose_propagates_to_file_sink asserted on calls[3] assuming the file sink is always the 4th logger.add() call. Switch to a sink-keyed lookup (stderr / database_sink / frontend sink by identity; file sink as the remaining Path sink) so a future reorder in config_logger can't silently break the assertion. |
||
|
|
703efd3a05 |
fix(security): parse LDR_TEST_MODE as a boolean for the KDF floor (#4564)
_get_min_kdf_iterations() relaxes the SQLCipher KDF minimum to a test
value when LDR_TEST_MODE is set, but it used a bare truthiness check on
the raw env string. Any non-empty value — including LDR_TEST_MODE=0 or
=false — therefore enabled the relaxed floor, silently weakening DB
encryption for someone who set the flag explicitly to turn test mode
OFF. Parse it via the existing to_bool() helper so only genuinely-truthy
values (1/true/yes/on/enabled) relax the floor.
PYTEST_CURRENT_TEST stays presence-based: pytest sets it to a
descriptive non-boolean string, so its mere presence signals a test run.
Adds a regression test asserting LDR_TEST_MODE in {0,false,no,off} keeps
the production minimum (low requested values clamp to the default).
|
||
|
|
66754e8378 |
fix(metrics): record LLM calls when provider reports no token usage (#4473)
* fix(metrics): allow token metrics for non-encrypted databases (#4457) The background thread path in _save_to_db() unconditionally required user_password from research_context. For non-encrypted databases, get_user_password() returns None (no password exists), so all token metrics were silently dropped with a warning log. Research ran fine but the metrics pages showed "No token usage data yet." Make the password check conditional on db_manager.has_encryption. Also add ModelUsage upsert to the thread-safe writer (previously only the MainThread path updated ModelUsage aggregates). * fix(metrics): record LLM calls even when provider reports no token usage Root cause of #4457: when the LLM provider returns no usage data (OpenAI-compatible servers omitting 'usage' on streamed responses, proxies stripping it, Ollama omitting prompt_eval_count for fully-cached prompts), on_llm_end skipped _save_to_db entirely. Zero TokenUsage rows meant every metrics page (token use, model use, context-overflow, Recent LLM Calls, Developer Insights) rendered as if LDR had never been used, while researches completed normally. Verified end-to-end against an isolated encrypted-DB instance with a fake Ollama server: with usage fields, metrics record on both main and v1.7.0; without them, researches complete but zero rows are written. With this fix the call is recorded with zero token counts (model, provider, phase, response time and context-overflow estimates are still real data) plus a warning log naming the provider. * feat(metrics): opt-in stream_usage setting for openai_endpoint provider Adds llm.openai_endpoint.stream_usage (default off). When enabled, ChatOpenAI sends stream_options.include_usage so streamed responses report real token counts instead of being recorded as zero-count calls. Off by default and opt-in only: some OpenAI-compatible gateways (xAI, Databricks AI Gateway, Azure 'on your data') reject requests containing stream_options with a 400, which would break the call entirely. This mirrors upstream langchain-openai, which auto-enables stream_usage only for the official api.openai.com base URL for the same reason. The zero-usage warning log now points openai_endpoint users at the setting when their server supports it (LM Studio 0.3.18+, llama.cpp, vLLM, OpenRouter). * fix(metrics): encryption-aware search/rate-limit metric writes; honor period in research count Two follow-ups to the #4457 family of silent metrics drops: 1. search_tracker.py and rate_limiting/tracker.py still hard-required user_password before any metrics DB access (6 sites), silently dropping search metrics and rate-limit estimates for non-encrypted databases — the same bug class fixed for token metrics earlier in this PR. Added metrics_password_available() in thread_metrics and gated all sites on it. 2. _get_metrics_from_encrypted_db filtered the ResearchHistory count with period values 'today'/'week'/'month', but the API only ever sends '7d'/'30d'/'3m'/'1y' — no branch matched, so 'total researches' on /metrics ignored the selected period and always showed all-time. Extracted get_period_cutoff() in query_utils (shared with get_time_filter_condition) and compare isoformat strings against the TEXT created_at column. * fix(metrics): widen set_user_password to Optional[str] for unencrypted DBs mypy: the encryption-aware gates now legitimately pass password=None through to set_user_password for non-encrypted databases; get_session() is the layer that enforces a real password when encryption is on. * fix(metrics): honor 3m/1y period in all analytics; dedupe no-usage warning Review follow-ups for #4473: - F4: ratings, link and strategy analytics (metrics_routes.py) computed their day window via get_period_days, whose map only knew '90d'/'365d'. The main dashboard sends '3m'/'1y', so picking '3 months' or '1 year' silently fell back to 30 days. Made PERIOD_DAYS_MAP a superset of both UI vocabularies (the standalone link-analytics page still sends 90d/365d) and routed get_period_cutoff through get_period_days so there is a single period vocabulary across every metrics endpoint. - F2: the 'provider returned no token usage' warning fired on every LLM call; a 50-call research logged 50 identical lines. Now logged once per research session via a _warned_no_usage flag — calls are still recorded every time, only the warning is deduplicated. Note: the earlier-flagged ModelUsage None-model concern (F1) was verified a non-issue — on_llm_start always resolves current_model/provider to 'unknown', so None never reaches the nullable=False columns. * revert(metrics): drop encryption-aware password relaxation (false premise) The has_encryption-aware password checks were built on the premise that non-encrypted databases have no password. That premise is false: unencrypted-mode users still register/login with a real password, which is stored in session_password_store and returned by get_user_password() normally — research_context['user_password'] is None only on session loss, the same failure mode as encrypted mode. The relaxation was also half-broken: open_user_database/create_user_database validate the key unconditionally and raise on None even for unencrypted DBs, so a None password only survived on an engine cache hit and failed on cache miss. The rest of the app already handles unencrypted-no-password with a placeholder string (database_middleware 'dummy', session_context UNENCRYPTED_DB_PLACEHOLDER), never None. Reverts the password-check changes in thread_metrics.get_session, token_counter._save_to_db, search_tracker and rate_limiting/tracker back to requiring a password (matching origin/main), and removes the metrics_password_available helper and its tests. Keeps the real, verified fixes: zero-count recording when the provider reports no usage (the actual #4457 fix), the ModelUsage upsert parity in the background-thread path, the opt-in stream_usage setting, and the unified period vocabulary. * test(metrics): patch get_setting_from_snapshot at source for stream_usage tests Main refactored openai_base.py to import get_setting_from_snapshot inside its methods (function-local) to avoid a circular import, so the module-level attribute my stream_usage tests patched no longer exists and CI failed with AttributeError. Patch the source module (config.thread_settings.get_setting_from_snapshot) instead, matching the updated pre-existing test_create_llm_uses_custom_url_from_settings. |