mirror of
https://github.com/LearningCircuit/local-deep-research.git
synced 2026-06-15 19:46:56 +03:00
ba0912056c5b78bf2c8a1fb15eef202d72fec401
6474 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
ba0912056c |
test(llm_utils): pin daemon-thread contract for in-loop async close (#4078)
* test(llm_utils): pin daemon-thread contract for in-loop async close
The existing ``tests/utilities/test_close_base_llm.py`` already covers the sync + async + in-loop + timeout + idempotence + FD-growth cases for ``_close_base_llm``. Two narrow contracts remained unpinned:
- **Daemon flag**: the cleanup thread at llm_utils.py:154-159 must be ``daemon=True`` or a stuck ``aclose()`` would hold up Python interpreter shutdown. The comment at llm_utils.py:140-143 documents this requirement but no test asserted it.
- **In-loop close marks ``_ldr_closed`` even when inner aclose raises**: the cleanup thread runs ``asyncio.run(aclose())`` inside a ``try/except Exception`` (lines 146-152). When ``aclose`` raises, the thread exits cleanly and the main thread sees ``t.is_alive() == False``, then sets ``_ldr_closed = True`` (line 178). The pre-existing ``test_swallows_async_close_exception`` covered this invariant for the no-loop branch only.
New class ``TestInLoopCleanupThreadContract`` adds two tests:
- ``test_cleanup_thread_is_daemon_so_shutdown_is_not_blocked``: patches ``threading.Thread`` with a subclass that captures the constructor kwargs; verifies ``daemon=True`` and a stable name prefix (``"ldr"``).
- ``test_in_loop_close_marks_closed_even_when_inner_aclose_raises``: invokes ``_close_base_llm`` inside ``asyncio.run`` with an ``aclose`` that raises; asserts ``_ldr_closed`` is set anyway.
Mutation-checked:
- Flipping ``daemon=True`` to ``daemon=False`` → the daemon test fails.
- Removing the ``async_httpx._ldr_closed = True`` line from the in-loop completion path (llm_utils.py:178) → 3 tests fail: both new cases AND the existing ``test_closes_async_inside_running_loop_via_thread`` / ``test_in_loop_close_is_idempotent``. The fact that the existing in-loop idempotence test already covered the happy-path mark is reassuring; my new test covers the exception-path mark.
0 production changes. 24 close-base-llm tests pass (was 22).
* test(llm_utils): replace line-number refs with symbol-based ones
AI reviewer flagged that the docstrings on the new tests in PR #4078 cite specific line numbers in ``llm_utils.py`` (e.g. ``llm_utils.py:154-159``, ``:140-143``, ``:173-178``) which will become stale on any refactor of the target module. Replace with stable symbol / branch-name references:
- ``llm_utils.py:154-159`` (Thread construction site) → "the ``else: # A loop is running in this thread`` block that spawns a ``ldr-async-llm-close`` thread"
- ``llm_utils.py:140-143`` (docstring warning) → "the docstring of ``_close_base_llm`` ... when motivating the brief daemon thread"
- ``llm_utils.py:146-152`` (try/except around asyncio.run) → "the cleanup thread's ``_close_in_thread`` runs ``asyncio.run(aclose())`` inside a ``try/except Exception``"
- ``llm_utils.py:178`` (the sentinel-set line) → "the ``else`` branch that sets ``_ldr_closed = True``"
No behavior change; both tests still pass and still pin the same contracts. Follow-up to a recommendation in the AI Code Reviewer comment on PR #4078.
v1.6.11 |
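The in-loop close contract described in this commit can be illustrated with a minimal sketch. It is not the project's ``_close_base_llm``; ``close_async_client`` and ``FakeAsyncClient`` are hypothetical names, and only the thread name ``ldr-async-llm-close`` and the ``_ldr_closed`` sentinel are taken from the commit message. The sketch assumes the same shape: when a loop is already running, ``aclose()`` is driven by ``asyncio.run`` on a short-lived daemon thread, exceptions are swallowed, and the sentinel is set once the thread has exited.

```python
import asyncio
import threading


class FakeAsyncClient:
    """Hypothetical stand-in for an async client whose aclose() fails."""

    async def aclose(self):
        raise RuntimeError("simulated aclose failure")


def close_async_client(client, timeout=2.0):
    """Close ``client`` from sync code, even when a loop runs in this thread.

    Returns the cleanup thread, or None when no thread was needed.
    """
    if getattr(client, "_ldr_closed", False):
        return None  # idempotent: already closed

    def _close_in_thread():
        try:
            asyncio.run(client.aclose())
        except Exception:
            pass  # best-effort cleanup; never propagate

    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: asyncio.run can be called directly.
        _close_in_thread()
        client._ldr_closed = True
        return None

    # A loop is running in this thread, so asyncio.run would raise here.
    # daemon=True keeps a stuck aclose() from blocking interpreter shutdown.
    t = threading.Thread(
        target=_close_in_thread, name="ldr-async-llm-close", daemon=True
    )
    t.start()
    t.join(timeout)
    if not t.is_alive():
        client._ldr_closed = True  # set even though aclose() raised
    return t


async def _demo():
    client = FakeAsyncClient()
    return client, close_async_client(client)


client, thread = asyncio.run(_demo())
```

A test in the spirit of the two new cases would inspect ``thread.daemon`` and ``client._ldr_closed`` after the call.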
||
|
|
8b98dfc237 |
test(ui): chunk mobile-nav overlap DOM walk; drop WebKit skip (#4060) (#4076)
* test(ui): chunk mobile-nav overlap DOM walk; drop WebKit skip (#4060)
The mobile-nav overlap assertion in all-pages-mobile.spec.js previously ran a single ~60-line page.evaluate that walked every interactive element on the page. On Mobile Safari this occasionally raced WebKit's context-close ("Target page, context or browser has been closed"), so the test was wrapped in a WebKit-only test.skip fallback (#4060).
Split the work so no single evaluate runs long:
1. Tiny evaluate fetches the nav rect.
2. Tiny evaluate fetches the interactive-element count.
3. Loop evaluates batches of 50 elements, short-circuiting once we have enough overlap hits to report.
Each evaluate is now well under the threshold that triggered the WebKit race, so the WebKit-only skip and the dual error-message catch are removed. If a real overlap regresses, WebKit fails loudly alongside Chromium/Firefox, which was the goal of the issue.
* test(ui): extract findElementsBehindMobileNav helper + per-batch cap
Review followup for #4076:
- Hoist the chunked overlap walk into findElementsBehindMobileNav so the test body reads as intent ("find overlaps, assert none") instead of evaluate plumbing.
- Pass the remaining maxReported budget into each batch and break the inner loop once it's hit, so a batch with many overlap candidates doesn't serialize hits we'd discard anyway.
Skipped from the same review: snapshotting the NodeList via evaluateHandle and re-deriving the nav rect per batch. Both target theoretical issues (drift, staleness) on pages that are static at the assertion point, and the absolute perf cost of the current shape is microseconds, so it is not worth the API complexity until a real symptom appears. |
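The batching logic (fixed-size batches, a remaining-budget cap passed into each batch, and an early exit once enough hits are collected) can be sketched independently of Playwright. The real helper is JavaScript and runs inside ``page.evaluate``; the Python below is only an illustration of the control flow, and ``Rect``, ``overlaps_in_batch`` and ``find_elements_behind_nav`` are hypothetical names.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    top: float
    bottom: float


def overlaps_in_batch(
    rects: List[Rect], start: int, end: int, nav: Rect, budget: int
) -> List[int]:
    """Stand-in for one small evaluate call: scan rects[start:end] only."""
    hits = []
    for i in range(start, min(end, len(rects))):
        r = rects[i]
        if r.bottom > nav.top and r.top < nav.bottom:
            hits.append(i)
            if len(hits) >= budget:
                break  # per-batch cap: skip hits that would be discarded
    return hits


def find_elements_behind_nav(
    rects: List[Rect], nav: Rect, batch_size: int = 50, max_reported: int = 5
) -> List[int]:
    """Walk the elements in batches, stopping once max_reported is reached."""
    hits: List[int] = []
    for start in range(0, len(rects), batch_size):
        remaining = max_reported - len(hits)
        if remaining <= 0:
            break  # short-circuit: enough overlaps to report
        hits.extend(
            overlaps_in_batch(rects, start, start + batch_size, nav, remaining)
        )
    return hits


# 120 stacked elements, each 10 units tall; the nav covers 1000..1100.
elements = [Rect(i * 10, i * 10 + 10) for i in range(120)]
reported = find_elements_behind_nav(elements, Rect(1000, 1100))
```

Each call to ``overlaps_in_batch`` does a bounded amount of work, which is the property the commit relies on to stay under the WebKit race threshold.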
||
|
|
1c33f1dc07 |
fix(ui-tests): match create/new/add buttons with word boundaries (#4069)
* fix(ui-tests): match create/new/add buttons with word boundaries
Selector helpers in several UI tests called `text.includes('new')`,
which matches the substring "new" inside "News". On
/news/subscriptions, the first hit was the `<a class="btn">Back to News
Feed</a>` link instead of the `#create-subscription-btn`, so
`SubscriptionCrudTests.createSubscriptionFormOpens` clicked the wrong
control and failed because no form opened.
Switch the matchers in the affected helpers to
`\b(?:create|new|add)\b` (plus `subscribe` where it was already in the
list). Word boundaries keep real targets like "Create Subscription",
"New Folder", and "Add Subscription" while skipping "News Feed".
* refactor(ui-tests): extract findActionButton helper
Code-review follow-up. The buttons.find(...) + word-boundary regex
block was duplicated in 9 call sites across 4 files, which is the same
copy-paste that let the original "new" → "News" bug hide in multiple
places.
Extract a single helper into test_lib/test_utils.js:
findActionButton(page, { selectors, keywords, click })
Defaults to `selectors='button, a.btn, .btn'` and
`keywords=['create','new','add']`, returns `{ found, text }`.
Drops the inconsistent extra `subscribe` keyword from the subscription
CRUD test — verified on the current /news/subscriptions page that no
button is labeled "Subscribe"; the primary control is "Create
Subscription", which is matched by the default keyword list. This
collapses the subscription tests to the same keyword set as the rest.
Net change: 60 insertions / 127 deletions. Reran the 4 affected
shards (mobile, library, history-news, api-crud) end-to-end at 100%,
and confirmed the message now reports the correct button text
(e.g. "Create Collection") rather than the previous false-positive
match.
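The difference between the substring match and the word-boundary match can be reproduced in a few lines. The project's helper is JavaScript; this Python sketch only mirrors the regex and the ``{ found, text }`` return shape described above, and ``find_action_label`` is a hypothetical name operating on plain label strings rather than on a page.

```python
import re


def find_action_label(labels, keywords=("create", "new", "add")):
    """Return the first label containing one of the keywords as a whole word."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    for text in labels:
        if pattern.search(text):
            return {"found": True, "text": text}
    return {"found": False, "text": None}


labels = ["Back to News Feed", "Create Subscription"]

# Old behaviour: substring matching picks the wrong control.
substring_hit = next(t for t in labels if "new" in t.lower())

# New behaviour: "News" has no word boundary after "New", so it is skipped.
boundary_hit = find_action_label(labels)
```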
|
||
|
|
6e37c248e4 |
test(error_handling): pin load-bearing branches in openai_compat_errors (#4074)
* test(error_handling): pin load-bearing branches in openai_compat_errors
The existing test file covers all seven dispatch tokens and the four main helpers (319 LOC), but two load-bearing implementation choices were only documented in comments, not asserted. Adds seven tests that catch the most likely regressions.
Pinned behaviors:
- ``TestDispatchOrderingTimeoutBeforeConnection``: ``APITimeoutError`` is checked BEFORE ``APIConnectionError`` at openai_compat_errors.py:87. This matters because ``openai.APITimeoutError`` subclasses ``APIConnectionError`` in openai>=1.x, so reordering the two branches would mislabel every timeout as ``openai_connection_refused``. The comment at lines 85-86 documents this; the new test pins it. An ``issubclass`` sanity check on the openai class hierarchy means the test fails first (with a clear message) if openai ever reorganises these classes, instead of just silently producing the wrong token.
- ``TestWalkCauseChainPreference::test_cause_preferred_over_context_when_both_set``: at openai_compat_errors.py:60, ``_walk_cause`` does ``cur.__cause__ or cur.__context__`` so explicit ``raise X from Y`` chains take priority over implicit ``__context__`` chains. The test constructs a wrapper with both set and asserts the deepest reached is the ``__cause__`` root.
Edge cases:
- ``TestStripCredentialsEdgeCases::test_ipv6_host_brackets_preserved``: ``urlparse`` returns ``hostname`` without brackets; the implementation reassembles ``netloc`` from hostname + port. The test verifies an IPv6 URL still has its host marker (brackets or bare ``::1``) after redaction.
- ``test_userinfo_stripped_with_ipv6_host``: combined userinfo + IPv6 host; the userinfo must be removed regardless of host format.
- ``test_url_with_no_netloc_passed_through``: bare paths hit the ``if not parsed.netloc:`` short-circuit and are returned as-is.
- ``TestFriendlyErrorNoneArgs``: ``friendly_openai_compatible_error`` uses ``provider or "<unknown provider>"`` and ``model or "<unspecified>"`` to keep the surfaced message legible when the caller doesn't know the values. Two tests pin both placeholders.
Mutation-checked during development:
- Swapping the timeout / connection-refused branches → both timeout tests fail.
- Changing ``cur.__cause__ or cur.__context__`` to ``cur.__context__ or cur.__cause__`` → the cause-preference test fails.
No production code changes. 34 tests pass (was 27).
* fix(error_handling): preserve IPv6 brackets in _strip_credentials
The AI Code Reviewer on this PR (#4074) flagged that the ``test_ipv6_host_brackets_preserved`` assertion was too loose:
    assert "[::1]" in result or "::1" in result
When the implementation strips brackets, the result is ``http://::1:8080/v1``, which still contains the substring ``::1``, so the test passes despite producing an invalid URL. Tightening to ``assert result == "http://[::1]:8080/v1"`` surfaced the underlying bug: ``_strip_credentials`` was indeed losing the brackets.
Root cause: ``urllib.parse.urlparse`` exposes ``hostname`` without the surrounding brackets that mark IPv6 hosts. The previous ``netloc`` reassembly used the bracketless hostname directly, so the rebuilt URL became ``http://::1:8080/v1``, which is ambiguous about where the host ends and the port begins, and rejected by downstream HTTP libraries.
Fix: when reassembling ``netloc``, re-add brackets around any host that contains ``:`` (i.e. IPv6). IPv4 hosts never contain ``:`` so this heuristic is safe.
Also tightens both new IPv6 tests to assert the full expected URL rather than a loose substring match.
Mutation-checked: reverting the bracket re-add flips both ``TestStripCredentialsEdgeCases::test_ipv6_host_brackets_preserved`` and ``test_userinfo_stripped_with_ipv6_host`` to failure. |
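The bracket fix can be sketched with the standard library alone. This is not the project's ``_strip_credentials``; ``strip_credentials`` below is a hypothetical reimplementation that follows the description: short-circuit when there is no ``netloc``, rebuild ``netloc`` from ``hostname`` and ``port``, and re-add brackets around any host containing ``:``.

```python
from urllib.parse import urlparse, urlunparse


def strip_credentials(url: str) -> str:
    """Drop any userinfo from ``url`` while keeping IPv6 hosts bracketed."""
    parsed = urlparse(url)
    if not parsed.netloc:
        return url  # bare path: nothing to redact
    host = parsed.hostname or ""
    if ":" in host:
        # urlparse exposes IPv6 hostnames without brackets; restore them so
        # the host/port boundary stays unambiguous.
        host = f"[{host}]"
    netloc = host if parsed.port is None else f"{host}:{parsed.port}"
    return urlunparse(parsed._replace(netloc=netloc))


redacted = strip_credentials("http://user:pw@[::1]:8080/v1")
```

Asserting the full URL, as the tightened tests do, is what distinguishes the bracketed result from the invalid ``http://::1:8080/v1`` form.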
||
|
|
d346a8fe2d |
test(scheduler): credential lifecycle coverage and weak-test cleanup (#4065)
* test(quality): strengthen weak scheduler tests
Four existing scheduler tests asserted on mock call_counts or swallowed
all exceptions without asserting anything, so they passed even when the
underlying production code was broken. Two frozen-dataclass tests used
``try/except AttributeError: pass`` blocks that silently pass if NO
exception is raised — the opposite of the intent.
Rewrites (no production code changes):
- ``test_scheduler_extended.py::test_logs_processing_start`` — previously
mocked the logger inside a ``try/except Exception: pass`` and ended
with the comment ``# Should have logged something`` and no
``assert``. The new version exercises the "no session info" early
return path and asserts on the exact entry-banner log line
(background.py:688) via ``mock_logger.info.assert_any_call(...)``.
- ``test_scheduler_extended.py::test_queries_overdue_subscriptions``
→ renamed to ``test_returns_early_when_credentials_missing``.
The previous version called the method with a user who had no
credentials, wrapped the call in a bare ``try/except Exception: pass``,
and asserted nothing. The new version patches
``get_user_db_session`` and asserts it is NOT called — proving the
credential-missing guard short-circuits before any DB work.
- ``test_scheduler_extended.py::test_handles_scheduler_exception`` +
``test_handles_job_lookup_error_on_remove`` — both replaced by
``test_unregister_swallows_job_lookup_error``. The originals tested
the mocks themselves (``mock.remove_job.side_effect = JobLookupError;
try: mock.remove_job(...); except JobLookupError: pass``) rather
than the scheduler. The replacement exercises ``unregister_user``
with two stale scheduled jobs, asserts both ``remove_job`` calls
were attempted, and asserts the user is fully cleaned up
(sessions + credentials) — pinning the JobLookupError swallow at
background.py:463-464.
- ``test_scheduler_extended.py::test_X_check_subscription`` paths —
the two tests that wrapped ``_check_subscription`` in
``try/except Exception: pass`` and asserted nothing now patch
``get_user_db_session`` and assert it is NOT called when the
user is missing from ``user_sessions``.
- ``test_scheduler_document_behavior.py::test_cannot_modify_enabled``
and ``test_cannot_modify_interval`` — replaced
``try/except AttributeError: pass`` blocks with
``pytest.raises((AttributeError, FrozenInstanceError))``. The tuple
is forward-compatible: ``FrozenInstanceError`` subclasses
``AttributeError`` and Python's behavior here has shifted between
versions. Using ``pytest.raises`` ensures the test fails if NO
exception is raised.
- ``test_scheduler_extended.py::test_is_frozen`` — same fix
(``try/except AttributeError: pass`` → ``pytest.raises``).
All 604 scheduler tests pass after these rewrites.
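The frozen-dataclass rewrites hinge on one point: a ``try/except ...: pass`` block passes when nothing is raised, while ``pytest.raises`` fails. The sketch below shows that difference with the standard library only, so ``strict_check`` is a hand-written stand-in for the ``pytest.raises`` contract, and ``FrozenConfig`` / ``MutableConfig`` are hypothetical classes rather than the scheduler's own.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FrozenConfig:
    enabled: bool = True


@dataclass
class MutableConfig:
    """Simulated regression: frozen=True was dropped."""

    enabled: bool = True


def weak_check(cfg) -> bool:
    """Old pattern: succeeds whether or not the assignment is rejected."""
    try:
        cfg.enabled = False
    except AttributeError:
        pass
    return True


def strict_check(cfg) -> bool:
    """Contract that pytest.raises enforces: no exception means failure."""
    try:
        cfg.enabled = False
    except (AttributeError, FrozenInstanceError):
        return True
    return False
```

With the mutable class, ``weak_check`` still reports success while ``strict_check`` reports failure, which is the regression the old tests could not see.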
* test(scheduler): add credential lifecycle coverage
The scheduler at src/local_deep_research/scheduler/background.py
(1808 LOC) has ~600 tests in tests/news/test_scheduler_*.py, but
credential-lifecycle scenarios that are most fragile (per the
project memory file project_user_db_encryption_blocks_background_jobs.md)
were not covered. Adds eight test methods pinning these branches.
Each test documents the production line(s) it pins and the mutation
that would flip it. Mutation-checked during development:
- Removing ``self._credential_store.clear(username)`` from
``unregister_user`` (background.py:468) → fails
``test_unregister_user_clears_credential``.
- Removing the ``set_search_context({...})`` call at
background.py:837-844 → fails
``test_search_context_set_before_processing_each_research``.
- Removing the ``set_setting("document_scheduler.last_run", ...)``
call at background.py:1082-1084 → fails
``test_last_run_not_advanced_when_db_open_fails`` via its
happy-path contrast assertion.
Coverage added:
- ``TestCredentialExpiryAndIsolation``
- ``test_credential_expiry_between_two_retrieves_in_same_job`` —
a long-running job that retrieves credentials twice spanning the
TTL boundary sees ``pw → None``. Pins
credential_store_base.py:73-75 (lazy-delete on expired retrieve)
via SchedulerCredentialStore at background.py:50-53. The base
class TTL tests at tests/database/test_credential_store_ttl.py
cover single-retrieve boundary; this covers multi-call.
- ``test_unregister_user_clears_credential`` — pins
background.py:454-468 plus credential_store_base.py:98-107.
A snapshot caller already holds the password as a Python local
so it survives the clear; the next retrieve sees nothing.
- ``test_cross_user_credential_isolation`` — parametrized across
alice/bob/charlie. Pins the username-keyed dispatch.
- ``test_clear_is_idempotent_and_safe_on_unknown_user`` — pins the
``if key in self._store`` guard in
credential_store_base.py:106. Removing the guard would make the
second clear and the ghost clear raise ``KeyError``.
- ``TestTtlWrapperBehavior``
- ``test_ttl_boundary_store_expire_store_cycle`` — full
store → expire → store → expire cycle through the
SchedulerCredentialStore wrapper. Pins the
``ttl_hours * 3600`` conversion at background.py:42 and the
``expires_at`` recomputation on each store at
credential_store_base.py:47.
- ``test_ttl_hours_zero_expires_at_next_clock_tick`` — pins the
absence of validation in the constructor and the strict ``>``
in credential_store_base.py:73. Contract test: anyone adding
``if ttl_hours <= 0: raise ValueError`` must update this test.
- ``TestDocSchedulerCredentialLifecycle``
- ``test_last_run_not_advanced_when_db_open_fails`` — verifies the
intentional design from PR #3288 / commit
|
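The TTL behaviours that the credential lifecycle tests pin (the ``ttl_hours * 3600`` conversion, ``expires_at`` recomputed on every store, lazy delete on an expired retrieve, the strict ``>`` comparison, and the guarded ``clear``) can be sketched as a small in-memory store. ``TtlCredentialStore`` and its injectable ``clock`` parameter are assumptions for illustration; the project's ``SchedulerCredentialStore`` and base class are not shown in the source and may differ.

```python
import time


class TtlCredentialStore:
    """Hypothetical username-keyed store with a time-to-live per entry."""

    def __init__(self, ttl_hours, clock=time.time):
        self._ttl_seconds = ttl_hours * 3600  # no validation, by contract
        self._clock = clock
        self._store = {}

    def store(self, username, password):
        # expires_at is recomputed on every store call.
        self._store[username] = (password, self._clock() + self._ttl_seconds)

    def retrieve(self, username):
        entry = self._store.get(username)
        if entry is None:
            return None
        password, expires_at = entry
        if self._clock() > expires_at:  # strict: still valid at the boundary
            del self._store[username]  # lazy delete on expired retrieve
            return None
        return password

    def clear(self, username):
        if username in self._store:  # guard keeps clear idempotent
            del self._store[username]


now = [0.0]
creds = TtlCredentialStore(ttl_hours=1, clock=lambda: now[0])
creds.store("alice", "pw")
first = creds.retrieve("alice")   # inside the TTL
now[0] = 3600.5
second = creds.retrieve("alice")  # past the TTL: entry is dropped
```

The pair ``first`` / ``second`` mirrors the ``pw → None`` sequence a long-running job would observe across the TTL boundary.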
||
|
|
02e197da86 |
fix(security): redact Google API key from list_models error log (#4070)
* fix(security): redact Google API key from list_models error log
The Google Gemini provider's ``list_models_for_api`` (at src/local_deep_research/llm/providers/implementations/google.py:56) constructs the request URL with the API key as a ``?key=...`` query parameter, per Google's documented API (https://ai.google.dev/api/rest). When ``safe_get(url, ...)`` raised for any reason (connection error, timeout, 401, etc.), the underlying ``requests``/``urllib3`` exception message included the full URL, *with* the key. The except handler then called ``logger.exception(...)``, which writes the traceback (including the exception's ``__str__``) to every loguru sink: stderr, the database log sink, and the frontend progress sink.
Reproduced under the project's production loguru config (``diagnose=False, backtrace=False``): the line ``requests.exceptions.ConnectionError: ...key=sk-LEAKED-VALUE-99999`` appeared in the captured log output.
Fix: catch the exception explicitly, replace the key value in the message with ``***REDACTED***``, and log via ``logger.warning`` so the exception chain is not attached.
Bundled regression tests in tests/security/test_api_key_leakage.py:
- ``test_no_leak_when_safe_get_raises_with_url_in_message``: the primary repro path. Patches ``safe_get`` to raise a ``ConnectionError`` whose message embeds the key, then asserts the sentinel is absent from ``loguru_caplog.text``.
- ``test_no_leak_when_safe_get_raises_generic_runtime_error``: same redaction also runs on non-requests exceptions whose ``str()`` contains the key.
- ``test_non_200_response_does_not_leak_key``: pins the existing status-code-only warning at lines 88-90 (which already doesn't include the URL).
- ``test_repr_does_not_expose_stored_passwords`` / ``test_clear_entry_on_missing_does_not_leak_state``: defense in depth on the credential store.
- ``test_friendly_error_strips_credentials_from_base_url``: pins the existing ``_strip_credentials`` userinfo redaction in ``error_handling/openai_compat_errors.py`` so a future change that removed it would be caught.
Mutation-checked: restoring the old ``except Exception: logger.exception(...)`` flips the two Google leak tests to failure.
* security: extract redact_secrets() utility from inline replace
The previous commit fixed the Google API-key leak with an inline ``msg.replace(api_key, "***REDACTED***")`` in google.py. That is a one-off: every other provider, route handler, or error path that needs to scrub a known secret value would have to repeat the same pattern.
Extract a single utility into ``security/log_sanitizer.py`` next to the existing ``sanitize_for_log`` / ``strip_control_chars`` helpers:
    def redact_secrets(
        message: str,
        *secrets: Optional[str],
        min_length: int = 8,
        token: str = "***REDACTED***",
    ) -> str
Variadic; skips falsy and sub-min-length values to avoid corrupting normal message content; exposes ``min_length`` and ``token`` for callers who need to override. google.py now uses it instead of the inline replace.
Unit tests in ``tests/security/test_log_sanitizer.py``:
- Happy path: single secret, multiple secrets, all-occurrences.
- Guards: ``None`` ignored, empty string ignored, sub-min-length ignored, custom min_length override.
- Boundaries: no secrets, empty message, message without any secret.
- Custom token override.
- Realistic provider key shapes (OpenAI, Google, Anthropic).
- Literal-substring-match contract (URL-encoded forms are NOT redacted unless the caller passes them).
google.py refactor captures the redacted message in a local before the ``logger.warning`` call so the ``check-sensitive-logging`` pre-commit hook (which AST-checks for exception-variable references in non-exception log calls) does not flag the line. The hook's recommended ``logger.exception`` would defeat the entire point of the fix.
The existing six leakage tests in ``tests/security/test_api_key_leakage.py`` remain unchanged: they assert the leakage contract, not the implementation, so the refactor flows underneath them.
* review: lift redact_secrets to module-level + tighten silence test
Two small follow-ups to the AI reviewer's points:
1. google.py: move ``from ....security.log_sanitizer import redact_secrets`` out of the except handler to module-level. The nested import has no circular-import or lazy-load justification here (ollama.py already imports ``from ....security import safe_get`` at module level), and lifting it eliminates the theoretical case where an ImportError raised while handling the provider exception would carry the leaked-URL ConnectionError up via ``__context__``. Also rewrites the inline comment so the two rationales (redact + drop exc_info; capture in a local for the check-sensitive-logging pre-commit hook) are no longer broken up by the import statement.
2. test_clear_entry_on_missing_does_not_leak_state was passing trivially because ``CredentialStoreBase.clear_entry`` is silent on every code path, so the old assertion ``_LEAKED_KEY not in loguru_caplog.text`` would have held even if the test never exercised the method. Renamed to test_clear_entry_does_not_log_store_state and replaced with ``assert not loguru_caplog.records`` so the contract being pinned is silence itself: a future ``logger.debug(f"store contents: {self._store}")`` regression would be caught immediately. Now exercises both the missing-key and present-key paths and seeds a second credential so a _store-dict dump would also leak it.
Mutation-checked: monkey-patching clear_entry to add a debug log containing self._store flips the new test to failed; the live implementation still passes. All 6 tests in tests/security/test_api_key_leakage.py pass against the real code. |
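A minimal sketch of a function with the signature quoted above, written from the description only (variadic secrets, falsy and sub-``min_length`` values skipped, literal substring replacement). The body is an assumption; the project's ``security/log_sanitizer.py`` may be implemented differently.

```python
from typing import Optional


def redact_secrets(
    message: str,
    *secrets: Optional[str],
    min_length: int = 8,
    token: str = "***REDACTED***",
) -> str:
    """Replace every literal occurrence of each secret in ``message``."""
    for secret in secrets:
        # Skip None, empty strings, and short values that could match
        # ordinary words and corrupt the message.
        if not secret or len(secret) < min_length:
            continue
        message = message.replace(secret, token)
    return message


leaky = "ConnectionError: GET https://example.invalid/models?key=sk-LEAKED-VALUE-99999"
safe = redact_secrets(leaky, "sk-LEAKED-VALUE-99999", None, "")
```

Because matching is literal, a URL-encoded form of the key would pass through unless the caller supplies that form as an additional secret.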
||
|
|
0fe3c8c5de |
chore(security): suppress CVE-2026-8328 (ftplib.ftpcp SSRF) until 3.14.6 (#4072)
Grype alerts on CVE-2026-8328 against python:3.14.5-slim. The vulnerability is an SSRF in the undocumented ftplib.ftpcp() helper — the same PASV-trust class as CVE-2021-4189, whose original 2021 fix only patched ftplib.FTP and left ftpcp() unprotected. Upstream merged the fix to the CPython 3.14 branch on 2026-05-13 (python/cpython#149793), three days after Python 3.14.5 was tagged. No 3.14.6 release exists yet, so a base-image bump isn't an option. Not exploitable here: `grep -rn "ftplib\|ftpcp" src/` returns zero hits, and no transitive dependency imports ftplib either, so ftpcp() is unreachable from this image. Added to .grype.yaml in the existing python3.14 block alongside the other CPython CVEs awaiting the next 3.14.x point release. The suppression auto-cleans when the next Python bump picks up 3.14.6+. |
||
|
|
da0d18ed25 |
fix(release): set towncrier name to skip package import (#4071)
The release job uses a sparse checkout that omits src/ and runs a standalone `pip install towncrier`. Towncrier 24.8 still calls `get_project_name()` even when --version is passed on the CLI, and the existing [tool.towncrier] config pointed at the `local_deep_research` package, so the build crashed with ModuleNotFoundError before rendering any fragments. Set `name = "local-deep-research"` so towncrier short-circuits the import path (build.py:195-197). Drop the now-misleading `package`/`package_dir` fields — `--version` is always passed, `directory = "changelog.d"` is explicit, and nothing else inside towncrier still needs them. Fix the workflow comment that misattributed the bypass to --version. Verified by rendering changelog.d/*.md fragments against this pyproject.toml in a fresh directory with no src/ present. |
||
|
|
b0008045df |
fix(security): extend IMDS absolute-block to Apprise plugin schemes (#4063)
NotificationURLValidator only ran the cloud-metadata IP guard in the http/https branch, so URLs like signal://169.254.169.254/+1/+1 (and the same for gotify, ntfy, mattermost, rocketchat, matrix, json, xml, form, mailto) reached Apprise, which then POSTs against that host under HTTP. Behind the operator-only LDR_NOTIFICATIONS_ALLOW_OUTBOUND gate, but a residual gap inconsistent with the absolute-block invariant SECURITY.md documents.
Refactored host extraction out of the http/https branch and added an IMDS-only check for plugin schemes (allow_private_ips=True semantics in _is_private_ip leaves only ALWAYS_BLOCKED_METADATA_IPS and NAT64-wrapped metadata active). LAN/loopback reach for self-hosted plugin endpoints (the #4006 use case) is unchanged.
Test coverage:
- 100 parametrized cases: 10 plugin schemes x 5 metadata IPs x 2 allow_private_ips values
- mailto://user@IMDS/recipient regression
- positive: signal/gotify LAN + signal localhost still allowed
- positive: token-host schemes (discord/slack/telegram/pushover/teams) unaffected
- DNS-resolved hostname pointing at IMDS rejected (single-resolve attacker; full rebinding TOCTOU remains documented residual risk) |
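The scheme-independent host check can be sketched with ``urllib.parse`` and ``ipaddress``. This is an illustration only: ``targets_metadata_host`` and ``METADATA_IPS`` are hypothetical, the set contains just the well-known 169.254.169.254 address as an assumption (the contents of the project's ALWAYS_BLOCKED_METADATA_IPS are not shown), and NAT64 wrapping and DNS resolution are deliberately left out.

```python
import ipaddress
from urllib.parse import urlparse

# Assumption: a single well-known metadata address stands in for the
# project's full block list.
METADATA_IPS = {ipaddress.ip_address("169.254.169.254")}


def targets_metadata_host(url: str) -> bool:
    """Return True when the URL's host is a literal metadata IP.

    The host is extracted the same way for every scheme, so plugin
    schemes such as signal:// get the same check as http://.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host) in METADATA_IPS
    except ValueError:
        return False  # a hostname, not an IP literal; resolution not covered


blocked = targets_metadata_host("signal://169.254.169.254/+1/+1")
```

Private and loopback hosts are intentionally not rejected here, matching the stated goal of keeping LAN reach for self-hosted plugin endpoints.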
||
|
|
6f18a711d2 |
docs(resource-cleanup): expand Wave 7 with full audit ledger (#4054)
* docs(resource-cleanup): expand Wave 7 with full audit ledger
Replaces the brief "follow-up gaps" bullet list with the full ledger of what the broader audit during #4047 actually examined, split into four scannable subsections:
- Checked and confirmed clean: non-Ollama LLM providers, HTTP session lifecycle, subprocess/pidfd, asyncio loops, file handles, SocketIO connect/disconnect.
- Flagged then verified NOT a real FD leak: OllamaEmbeddings (uses the deprecated langchain_community class with no httpx client), auth_db + journal_quality engines escaping shutdown_databases (bounded pools, not growing), LibraryRAGService in three RAG SSE endpoints (RAM churn, no FDs: FAISS uses pickle.load, embeddings hold no FDs per the item above, SentenceTransformer mmaps are process-wide singletons).
- Minor findings: daemon threads without explicit shutdown, abandoned-research cleanup on socket disconnect; both reaped at process exit, not steady-state leaks.
- Future-proofing note: ``langchain_community.embeddings.OllamaEmbeddings`` is deprecated; the replacement ``langchain_ollama.OllamaEmbeddings`` DOES carry ``_client`` and ``_async_client`` (verified by direct introspection), so when LDR migrates the in-running-loop eventpoll leak class will reappear for embeddings unless ``_close_base_llm`` is generalized.
Direct introspection done at audit time confirms each verdict: ``[a for a in dir(e) if 'client' in a.lower()]`` returned ``[]`` for the deprecated class and a non-empty list for the new class.
This ledger saves the next contributor from re-running the same agent sweep when investigating a future FD spike. No code changes.
* docs(resource-cleanup): add Round-8 pidfd finding (fixed by #3971)
The Wave 7 ledger covered the eventpoll-FD investigation but didn't mention the residual pidfd accumulation we discovered post-merge.
A follow-up Round-8 investigation (8 parallel agents, 2 rounds + direct /proc inspection on a live prerelease container) traced ~3.6 pidfds/hour, steady-state ~29, to:
    _check_subscription
    → quick_summary
    → FullSearchResults.batch_fetch_and_extract
    → AutoHTMLDownloader fallback
    → PlaywrightHTMLDownloader._fetch_with_playwright
    → sync_playwright().start()
    → asyncio.create_subprocess_exec(node-driver)  # opens pidfd
    → driver fails (Chromium not installed in production ldr stage)
    → pidfd not closed on the failed-child exit
CPython 3.14 ruled out as a confounder: subprocess.py uses waitpid(WNOHANG) polling, never opens pidfds. Only asyncio.create_subprocess_* and multiprocessing.Process can open them on Linux + Python 3.9+ via PidfdChildWatcher.
PR #3971 (already merged) addresses this from a different angle: it makes web.enable_javascript_rendering default false, so AutoHTMLDownloader short-circuits before invoking Playwright. No subprocess spawned → no pidfd opened. Original motivation for #3971 was the confusing tracebacks reported in #3826; the FD-leak finding is the second motivation, captured here so a future reader sees both.
The new bullet sits in Section B (flagged-then-verified-then-fixed) because the leak was real but is now resolved upstream.
* docs(resource-cleanup): add FD-leak debugging playbook + CI considerations
Add a new "Debugging FD leaks — playbook for the next one" section between the History (Waves 1-7) and "Intentionally not done" parts of the doc, capturing the diagnostic flow we developed across Waves 6 and 7 so future contributors don't re-derive it from scratch.
Includes:
- Symptoms that justify treating an issue as an FD leak (OSError 24, static-asset MIME errors, High FD count warnings, healthcheck hangs).
- Host-side and inside-container snapshot scripts that work even when the container is too FD-starved for docker exec (host-side via sudo + /proc/$P/fd) and through the entrypoint's UID drop (--user 0 to docker exec).
- Lookup table mapping each anon_inode / socket / pipe / REG flavor to its likely Python-level source and the path to deep-dive (e.g. /proc/PID/fdinfo/N's Pid: line for pidfds).
- A pinpointing recipe per FD type: eventpoll (asyncio/httpx), pidfd (asyncio.create_subprocess / multiprocessing.Process), WAL/SHM (SQLCipher engine.dispose).
- Pointer to the existing in-codebase instrumentation: _count_open_fds, the periodic Resource monitor log, fd_monitor.py, and the RUN_MANUAL_SMOKE-gated tests/manual_smoke/test_fd_smoke.py harness.
- Honest discussion of why an automated per-PR FD-growth assertion is hard (transient FDs, CI-environment subprocess noise, namespace differences, slow-drip leaks needing hours of uptime) and what a nightly long-run job would look like if the team chooses to invest in one.
- A "which Wave fixed which leak class" reference table so the next reporter can recognize a class and skip to the relevant precedent.
No code changes. Pure documentation.
* docs(resource-cleanup): add development-time detection + bpftrace recipes
Extend the FD-leak debugging playbook with two industry-standard techniques that would have caught Waves 6 and 7 earlier, drawn from upstream Python docs and the wider production-tracing literature:
1. **bpftrace syscall-level pinpointing** (in the per-FD-type section). Trace pidfd_open / epoll_create1 / etc. on the host targeting the container's host PID; produces a histogram of every user stack that triggered the syscall, ranked by frequency. The hot stacks are the culprits. Would have caught the Playwright pidfd leak in seconds.
2. **Development-time detection (new subsection 4a)**: catches leaks at test time before they ship:
   - PYTHONASYNCIODEBUG=1 + -W default::ResourceWarning. Per the asyncio dev docs, unclosed transports emit ResourceWarning at GC time; the filter actually displays them. Would have surfaced the Wave 7 in-running-loop skip in any test that exercised ainvoke + safe_close on ChatOllama.
   - python -X dev for a one-flag local dev mode bundling ResourceWarning + asyncio debug + warnings as default.
   - pyproject.toml [tool.pytest.ini_options] examples for both "display" and "error" filter modes (with a caveat that error mode needs a targeted subset, not the whole suite, because third-party libs also emit ResourceWarning).
   - psutil's num_fds / open_files / connections as the cross-platform alternative to /proc/self/fd for unit tests on macOS dev environments.
   - tracemalloc + objgraph as the next-level tool when a leak is reproducible: diff allocations before/after, then render the reference chain holding the leaked wrapper alive.
No code changes. The new tooling is recommendations only; no mandatory pytest config change in this commit. Future work could enable PYTHONASYNCIODEBUG=1 in the CI test environment if the overhead is acceptable. Citations to docs.python.org are inline for the load-bearing ResourceWarning claim.
* test(fd-canary): pin asyncio.create_subprocess pidfd lifecycle in CI
Add ``TestAsyncioSubprocessFDBaseline`` to ``tests/utilities/test_close_base_llm.py`` with two regression tests that run on every PR:
1. ``test_no_fd_growth_across_asyncio_subprocess_cycles``: spawns ``/bin/true`` via ``asyncio.create_subprocess_exec`` 10 times and asserts total FD count delta ≤ +2. Pins the pidfd FD class against the child-watcher leak shape.
2. ``test_no_fd_growth_when_subprocess_fails_to_exec``: same shape but with a deliberately-missing binary, mirroring the *exact* Wave-7 production failure mode (Playwright's Node.js driver being spawned, kernel returning ENOENT because Chromium wasn't installed, child watcher still expected to clean up the pidfd it opened *before* the failed exec).
Why this is the right level
---------------------------
LDR's own code does NOT call ``asyncio.create_subprocess_*`` (verified in R8C1). The production leak came from a transitive dependency (Playwright). So we cannot test LDR's call sites directly, because there are none.
Instead these tests pin the *platform baseline*: on this Python version, repeated asyncio subprocess cycles must not leak FDs. If a future Python upgrade, a child-watcher change, or a new direct asyncio.create_subprocess call in LDR breaks the close semantics, the next PR's CI fails on these tests — which is the canary signal we want. Linux-only via ``sys.platform != "linux"`` skip. pidfd_open is a Linux syscall; macOS uses a different watcher and Windows uses ProactorEventLoop. Both 'pass by virtue of nothing to leak', so restricting to Linux keeps the signal sharp (a failure on Linux is actionable; a pass on macOS is uninformative). Same +2 FD slack we use for the eventpoll canary above. A real 1-FD-per-iter leak across 10 iterations would land at delta=10, well past the threshold. Doc reference ------------- Updated ``docs/developing/resource-cleanup.md`` "Existing instrumentation" section to enumerate all four in-CI FD-growth canaries (two eventpoll, two pidfd) so future contributors see at a glance what's already guarded and where to extend coverage when a new leak class is found. |
||
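The platform-baseline canary described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the repository's test: the helper names are hypothetical, `sys.executable -c pass` stands in for `/bin/true` as an assumption for portability, and the function returns `None` where `/proc/self/fd` is unavailable (the commit skips non-Linux platforms).

```python
import asyncio
import os
import sys
from typing import Optional

FD_DIR = "/proc/self/fd"  # Linux-only listing of this process's open descriptors


def count_open_fds() -> int:
    """Return how many file descriptors this process currently holds."""
    return len(os.listdir(FD_DIR))


async def _spawn_cycles(cycles: int) -> None:
    """Spawn and reap a trivial child process `cycles` times."""
    for _ in range(cycles):
        # Assumption: a no-op Python child replaces /bin/true from the commit.
        proc = await asyncio.create_subprocess_exec(sys.executable, "-c", "pass")
        await proc.wait()


def fd_delta_after_subprocess_cycles(cycles: int = 10) -> Optional[int]:
    """FD growth across repeated asyncio subprocess cycles, or None off Linux."""
    if not os.path.isdir(FD_DIR):
        return None
    before = count_open_fds()
    asyncio.run(_spawn_cycles(cycles))
    return count_open_fds() - before
```

A caller would compare the result against the +2 slack named in the commit; a leak of one FD per iteration over 10 cycles would show up as a delta near 10.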
|
|
15a3df4aff |
fix(content-fetcher): disable JS rendering by default (#3826) (#3971)
* fix(content-fetcher): disable JS rendering by default (#3826)
The default Docker production image intentionally ships without
Chromium (Dockerfile lines 286-287), so the AutoHTMLDownloader's
Crawl4AI/Playwright fallback can never succeed for the majority of
users -- it just spawns a fresh Chromium per fetch, fails, and logs
a confusing traceback. In the issue reporter's run, 11 such failed
fallbacks fired per research on api.github.com JSON URLs.
Add a user-facing setting `web.enable_javascript_rendering` (default
false). When disabled, AutoHTMLDownloader skips the JS fallback and
returns the static result. Power users running outside Docker who
have set up Chromium can flip the toggle in the UI.
The setting is plumbed through:
- AutoHTMLDownloader.__init__ -- new enable_js_rendering=True ctor
arg (preserves direct-caller behaviour); download() and
download_with_result() short-circuit the JS fallback when False.
- ContentFetcher -- new enable_js_rendering=False kwarg passed
through to the HTML/DOI downloaders.
- build_fetch_tool / _make_full_fetch_tool / _make_summary_fetch_tool
-- accept settings_snapshot, read the bool via
get_bool_setting_from_snapshot (so the toggle works on ToolNode
worker threads where threading.local does not propagate), pass
enable_js_rendering into ContentFetcher.
- LangGraphAgentStrategy -- forwards self.settings_snapshot to
build_fetch_tool at both top-level and sub-agent callsites.
- pipeline.fetch_and_extract / batch_fetch_and_extract -- new
enable_js_rendering=False kwarg passed through.
- FullSearchResults -- new settings_snapshot kwarg, reads the bool
and passes it to batch_fetch_and_extract from both call paths
(run() and _get_full_content()).
- BaseSearchEngine -- forwards self.settings_snapshot when
constructing FullSearchResults.
Existing direct callers (tests, internal lazy-init in
_get_playwright_downloader) keep the implicit-on contract via the
True ctor default; the disable-by-default decision happens at the
factory layer.
* fix(content-fetcher): tighten JS-rendering disable per review
Address two findings from the code review of
|
||
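The split between a permissive constructor default and a disable-by-default factory, as described in the commit above, might look like the sketch below. The class and function names are hypothetical stand-ins (not the project's `AutoHTMLDownloader` or `build_fetch_tool`), and the settings snapshot is assumed to be a flat dict keyed by setting name.

```python
from typing import Callable, Mapping, Optional

Fetcher = Callable[[str], Optional[str]]


class AutoDownloaderSketch:
    """Tries a static fetch first and optionally falls back to JS rendering."""

    def __init__(
        self,
        static_fetch: Fetcher,
        js_fetch: Fetcher,
        enable_js_rendering: bool = True,  # direct callers keep the old behaviour
    ) -> None:
        self._static_fetch = static_fetch
        self._js_fetch = js_fetch
        self._enable_js_rendering = enable_js_rendering

    def download(self, url: str) -> Optional[str]:
        result = self._static_fetch(url)
        # Short-circuit: when JS rendering is off, return the static result as-is.
        if result is not None or not self._enable_js_rendering:
            return result
        return self._js_fetch(url)


def build_downloader_sketch(
    settings_snapshot: Mapping[str, object],
    static_fetch: Fetcher,
    js_fetch: Fetcher,
) -> AutoDownloaderSketch:
    """Factory layer: this is where the default flips to 'disabled'."""
    # Assumption: the snapshot maps the setting key directly to its value.
    enabled = bool(settings_snapshot.get("web.enable_javascript_rendering", False))
    return AutoDownloaderSketch(static_fetch, js_fetch, enable_js_rendering=enabled)
```

Keeping the constructor default at `True` while the factory reads the setting means existing direct callers are unaffected, which matches the intent stated in the commit message.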
|
|
5d60f3d00e |
chore(labels): add 'code-ready' as a human-only signal label (#4068)
Introduces a new repository label, ``code-ready``, that communicates a human reviewer's judgement that a PR's code changes look technically ready (i.e. the implementation, tests, docs and review nits are all addressed) while CI and an approving codeowner review may still be outstanding. The label is meant to bridge the gap between "needs review" and "auto-merge": a maintainer can apply it after walking the diff to signal that the code side is good, even though merge is still blocked on CI runs finishing or an approver clicking the button.
Critically, this label must be **applied manually only**, never by automation. The motivation is judgement, not heuristics: a workflow that flips it based on "all CI green" or "no unresolved comments" would dilute the signal and undermine the human-in-the-loop intent. The labels.yml entry is grouped under a new "Human-only signal labels" section with an explicit comment saying so, and the label description itself includes "Apply manually — never auto-applied" so the rule is visible everywhere the label surfaces.
Verified before adding:
* No existing workflow (``pr-triage.yml``, ``label-fixed-in-dev.yml``, ``advanced-search-reminder.yml``, ``sync-main-to-dev.yml``, ``danger-zone-alert.yml``, ``compose-published-smoke.yml``) applies ``code-ready``. Each workflow's ``addLabels(...)`` calls use a closed set of specific label names, so no heuristic ever resolves to ``code-ready``.
* No naming collision with existing labels (``code-ready`` is new; ``auto-merge``, ``needs-codeowner-review``, ``awaiting-codeowner`` are distinct concepts).
* Label created live on GitHub via ``gh label create`` before this commit; this PR brings ``labels.yml`` into source-of-truth sync.
Color: ``006b75`` (teal), distinct from the existing yellow/green review-state palette so it reads as a separate axis from the codeowner-review lifecycle. |
||
|
|
2723331f67 |
chore(ci): cut workflow-status.md regen diff noise (#4066)
The auto-regenerated workflow-status.md on every version-bump PR produced ~15 rows of churn that wasn't signal:
- Status emoji column flipped between ✅ / · / ⏳ depending on which event last ran (e.g. backwards-compatibility flipped ✅→· because the most recent run was a skipped workflow_call, not because it regressed). The live badge column to its right is the source of truth for current status anyway, and run history lives in GitHub Actions itself. Drop the column.
- Last activity buckets oscillated across this week / last week / 2 weeks ago for healthy daily/weekly workflows. Coarsen to last 30 days / 1-3 months ago / 3-6 months ago / long ago / never so a healthy workflow sits in one bucket indefinitely.
Net effect: regenerations in steady state produce zero diff. Real signal (new stale/disabled workflows, aging past the 30d bucket) still surfaces. |
||
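The coarsened "last activity" buckets from the commit above could be computed along these lines. This is an illustrative sketch only: the function name is hypothetical, and the 90-day and 180-day cut-offs are assumptions standing in for "3 months" and "6 months", since the commit does not state the exact boundaries.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def activity_bucket(last_run: Optional[datetime], now: datetime) -> str:
    """Map a workflow's last run time to one of the coarse buckets."""
    if last_run is None:
        return "never"
    age = now - last_run
    if age <= timedelta(days=30):
        return "last 30 days"
    if age <= timedelta(days=90):  # assumed boundary for "3 months"
        return "1-3 months ago"
    if age <= timedelta(days=180):  # assumed boundary for "6 months"
        return "3-6 months ago"
    return "long ago"


if __name__ == "__main__":
    now = datetime(2026, 6, 15, tzinfo=timezone.utc)
    # A daily workflow stays in the same bucket from one regeneration to the next.
    print(activity_bucket(now - timedelta(days=1), now))
    print(activity_bucket(now - timedelta(days=8), now))
```

Because a healthy daily or weekly workflow never leaves the first bucket, regenerating the file produces no diff for it, which is the steady-state property the commit is after.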
|
|
8597e429cc |
Improve UI tests + CI: artifact uploads, WebKit skip narrowing, settle-wait migrations (#4061)
* ci(responsive): restore artifact uploads and fix dead post-results gate The Responsive UI workflow lost its per-viewport artifact uploads (the explanatory comment around lines 206-209), so PR/release failures were un-debuggable - no screenshots, no test output. The downstream `post-results` job was also gated on `github.event_name == 'pull_request'`, which can never be true because the workflow has no `pull_request` trigger; the combined-report aggregator therefore never ran. Restore the upload step using `if: always()` + `if-no-files-found: ignore` (so server-startup failures still upload logs and quiet runs don't fail the step) and rewrite the `post-results` gate to `if: always()`. Artifact name matches the existing `ui-test-results-*` pattern expected by the combined-report glob. * test(playwright): narrow WebKit closed-context skip to webkit only (#4060) The catch at all-pages-mobile.spec.js:372 was previously calling `test.skip(true, ...)`, which skipped the test for every browser - so any non-WebKit error path also silently bailed out of the mobile-nav overlap assertion. Only Mobile Safari / WebKit is known to hit the `Target page, context or browser has been closed` race, so gate the skip on `browserName === 'webkit'`. Other browsers now re-throw and surface the regression. Also broaden the matched error message to include `Execution context was destroyed`, the alternate wording the same upstream race uses in newer Playwright versions. Skip annotation references issue #4060 so the skip is grep-able and can be removed when the underlying race is fixed or the DOM walk is restructured. * test(ui): add waitForStable helper to auth_helper.js Replaces ad-hoc `await delay(N)` sleeps used to "let the UI settle" after an action. The helper waits for a selector to be visible, then waits for its bounding box to stop changing across requestAnimationFrame ticks (bounded to 3s in-page). The final `idleMs` pause is configurable. 
JSDoc explicitly notes when NOT to use it: don't replace `delay()` calls that exercise wall-clock behavior (e.g. a 10s timer the app is supposed to respect). Those tests need real elapsed time, not a settle wait. Exported as a sibling of `safeClick` to keep Puppeteer test imports tidy. * test(ui): replace settle-delays with state-based waits in two puppeteer tests `test_research_cancellation.js` had 7 hardcoded `await delay(...)` calls and `test_form_validation_aria_ci.js` had 19. The vast majority were "give the UI a moment to settle" pauses with no real signal attached, so they slowed CI and quietly hid races whenever the runner was a beat slower than the chosen delay. For each call: - post-`navigateTo` 500ms sleeps -> `waitForSelector('#query', { visible: true })` - post-validation-trigger sleeps -> `waitForFunction` polling the `ldr-field-invalid` class to appear (or clear, when the test expects validation to pass) - post-focus 100ms -> `waitForFunction(() => document.activeElement?.id === 'query')` - post-cancel-click sleeps -> `waitForFunction` polling for `cancel|stop|suspend` to appear in the status text - post-typing 200ms -> `waitForFunction` polling for the typed value to land The one delay we kept: the explicit 10-second wait in the mid-stage cancellation test (`test_research_cancellation.js`), which deliberately exercises elapsed-time behavior of the research progress flow. That is not a settle wait and must stay wall-clock. Polling waits all use `.catch(() => {})` to preserve existing behavior when a selector or state never appears (the assertions further down handle the failure case more informatively than a hung wait would). * docs(pr-template): document label-gated CI workflows Several heavy E2E workflows are label-gated and silently no-op on PRs without the right label - new contributors had no way to know. Add a "CI test coverage" section to the PR template enumerating each gated workflow and the label that triggers it. 
No CI behavior change; documentation only. * test(form-validation): make waitForQueryReady detect validator attachment Local smoke-test (9 tests, ran against `scripts/dev/restart_server.sh`) exposed two latent races that the prior `await delay(500)` had been quietly hiding: 1. `waitForQueryReady` returned as soon as `#query` was visible, but the FormValidator class is registered against the field a tick later (research.js setupEventListeners). Waiting for the `.ldr-field-error` sibling that addValidation() inserts is the actual signal that the validator is wired and the submit handler will take the early-return path on an empty query. 2. `noLoadingUiOnEmptySubmit` ran after `errorClearsOnValidSubmit`, which typed a real query and triggered a real submit (the fetch fails but creates `.ldr-loading-overlay` first). `navigateTo` skipped the re-navigation because we were already on `/`, so the stale overlay carried over. Force a real `page.goto` for this test so it asserts about a fresh page, not the leftover state of the previous test. After the fix the suite passes 9/9 in ~1s (vs ~4.5s with the old delays). * chore(labels): rewrite test-trigger label descriptions for AI reviewer auto-apply The Friendly AI code reviewer (.github/workflows/ai-code-reviewer.yml) auto-applies labels based on the labels' descriptions in the repo. The existing test:puppeteer / test:e2e / ldr_research / ldr_research_static descriptions were passive ("Triggers Puppeteer E2E tests on this PR"), which doesn't guide the reviewer on *when* to apply them. Rewrite them in the same imperative, bias-toward-action style used by benchmark-needed ("Apply if a change risks degrading performance — when in doubt, add it. 
Run compare_configurations()"): - test:puppeteer + test:e2e — apply for any PR touching the web stack - ldr_research / ldr_research_static — apply for substantive code/arch changes, with the static variant biased even more toward "run it" since it uses the cheaper model Also add the test:* labels to labels.yml so they become version-controlled (previously they existed only on GitHub, created out-of-band). label-sync is additive and will overwrite the GitHub descriptions on next main push. |
||
|
|
ec91c5c716 |
fix(pdf): render CJK characters in exported PDFs (#4055) (#4058)
* fix(pdf): render CJK characters in exported PDFs (#4055) The PDF stylesheet hard-coded a Latin-only font stack, so WeasyPrint silently dropped Chinese/Japanese/Korean glyphs from downloads even when they rendered fine in the HTML view. Add Noto Sans CJK / Microsoft YaHei / SimSun fallbacks for both body and monospace families, and install fonts-noto-cjk in the Docker runtime stage so the slim base image actually has glyph coverage. Non-Docker installs still need a CJK font package on the host. * fix(pdf): broaden CJK font fallbacks + document host requirement Extend the PDF CSS font stack to cover macOS (PingFang, Hiragino, Apple SD Gothic Neo) and additional Windows families (Microsoft JhengHei, Yu Gothic, Malgun Gothic), so pip installs on those platforms render CJK without any user action. Document the per-distro CJK font install command in install-pip.md and add a new FAQ entry. Linux pip/server hosts still need fonts-noto-cjk installed manually — there is no in-code way to fix that without bundling ~20 MB of fonts into the wheel. * test(pdf): assert CJK glyph embedding end-to-end (#4055) Round-trip CJK text through markdown → PDF → pypdf extract_text so CI fails if fonts-noto-cjk is ever removed from the Docker runtime image. The pytest-tests job runs inside that image, so the test sees the installed fonts; bare hosts without CJK fonts skip the assertion via an fc-list gate. Does not catch CSS-fallback-stack regressions on its own: fontconfig auto-substitutes a CJK family on Linux even for a Latin-only stack. The CSS fallbacks still matter on Windows/macOS, which CI does not exercise — documented in the test docstring. |
||
|
|
41ee83c54c |
test(security): SSRF edge-case coverage and weak-test cleanup (#4062)
* test(quality): strengthen weak SSRF tests Several existing SSRF tests asserted on mock call_count / call_args or range-membership tautologies rather than the validator's real behavior. A regression in the underlying production code could pass these tests. Rewrites (no production code changes): - test_ssrf_redirect_bypass.py: replace ``test_each_hop_validated`` in both ``safe_get`` and ``safe_post`` variants. The previous version asserted only on ``mock_validate_url.call_count == 3``; the new version exercises the real validator with a third hop pointing at ``http://10.0.0.5/internal`` (a private IP literal) and asserts that ``ValueError`` is raised before the third request is fetched. - test_ssrf_redirect_bypass.py: replace ``test_send_respects_*`` for ``allow_localhost`` and ``allow_private_ips``. Previously these patched ``validate_url`` to return True and verified the kwargs; the rewrites use the real validator with IP literals so a regression in is_ip_blocked's flag handling would surface. Adds ``test_send_blocks_loopback_without_allow_localhost`` to prove the flag is actually gating behavior, not just being passed through. - test_ssrf_debug_hardening.py: rewrite three of four ``TestFullSearchSSRFValidation`` tests to drop the ``validate_url`` mock. Real validator blocks the metadata-IP literal (``169.254.169.254``) directly; the public hostname uses a DNS mock. - test_ssrf_validator_high_value.py: rewrite ``TestGetSafeUrl`` pass-through and unsafe-default tests to use the real validator (DNS mock for public host; literal RFC1918 IP for unsafe case). - test_ssrf_validator_behavior.py: replace ``TestBlockedIPRanges`` range-containment tautologies with ``TestPrivateIpRangesBehavior``, a single parametrized test that asserts ``is_ip_blocked`` returns True for an interior address of every entry in ``PRIVATE_IP_RANGES`` (18 cases, covering all 15 ranges plus their wraps). Removing any entry from ``ip_ranges.py`` is now detected by a specific failure. 
- test_ssrf_validator_extended.py: remove ``test_is_frozenset`` — a type-only check on ``ALWAYS_BLOCKED_METADATA_IPS``. The canonical exact-membership test already lives in ``test_ssrf_validator_high_value.py::TestConstants``. Each rewrite was mutation-checked: e.g. removing per-hop validation from ``safe_requests.py`` causes the redirect tests to fail with ``StopIteration`` (third hop attempted), and removing a range entry from ``ip_ranges.py`` flips the corresponding ``TestPrivateIpRangesBehavior`` case to failure with the range label in the assertion message. Net: 5 files modified, +130 lines / -68 lines, 565 SSRF tests pass. * test(security): add SSRF validator edge-case coverage Adds eight new test classes pinning previously-uncovered branches of ``src/local_deep_research/security/ssrf_validator.py`` and ``src/local_deep_research/security/ip_ranges.py``. Each class documents the production line(s) it exercises and the mutation it would catch. - TestUnspecifiedIPv4Blocked — ``validate_url`` end-to-end coverage for ``0.0.0.0/8`` (ip_ranges.py:24). Existing tests covered only ``is_ip_blocked``; this pins the full parser → IP-literal → block path. Parametrized across three interior addresses. - TestDnsResolutionNonGaierror — the generic ``except Exception`` handler at ssrf_validator.py:310-312 fires when ``getaddrinfo`` raises anything that is not a ``gaierror`` (PermissionError from a restricted environment, OSError, RuntimeError). Asserts the ``"Error during hostname resolution"`` log line and a False return. - TestRfcForbiddenControlChars — RFC_FORBIDDEN_URL_CHARS_RE (ssrf_validator.py:63) contains ``\\x00-\\x1f\\x7f``. Backslash and ``\\x00`` were already heavily tested; this parametrizes the run ends ``\\x01``, ``\\x1f``, and ``\\x7f`` (DEL). - TestAlternateIpHexForm — single-DWORD hex (``0x7f000001``) is not parseable by ``ipaddress.ip_address``, so the validator falls through to DNS via the ``except ValueError: pass`` at ssrf_validator.py:269-271. 
Mocked DNS returns the canonical ``127.0.0.1``, which the post-DNS check rejects. - TestPortEdgeCases — ``:65536`` exercises the urllib3 ``LocationParseError`` branch; ``:0`` parses but the host ``127.0.0.1`` is still IP-blocked. - TestMultipleAtSignsContract — locks in that urllib3's ``parse_url`` resolves ``http://user:pass@127.0.0.1@1.1.1.1/`` to host ``1.1.1.1`` per RFC 3986 last-``@`` rule, and that the validator agrees. If urllib3 ever changes this, the parser-differential defense at ssrf_validator.py:224-228 needs re-validation; this test surfaces the drift. - TestUserinfoContainsIpShape — documents that ``http://127.0.0.1@evil.com/`` is NOT a bypass: urllib3 reports host ``evil.com``, requests connects there, the ``127.0.0.1`` is userinfo only. Pins the urllib3 contract. - TestIpv6ZoneIdBlocked — ``[fe80::1%eth0]`` and the percent-encoded ``[fe80::1%25eth0]`` form. Python 3.9+ accepts zone IDs in ``ipaddress.ip_address`` so the validator catches this directly via ``fe80::/10`` in PRIVATE_IP_RANGES (ip_ranges.py:22) rather than via DNS gaierror. Also removes the previous ``TestDocumentation`` class which contained a single ``@pytest.mark.skip`` placeholder with no assertions; the security-model documentation lives in the module docstring. Mutation checks performed during development: - Remove ``0.0.0.0/8`` from PRIVATE_IP_RANGES → 4 tests fail (the 3 TestUnspecifiedIPv4Blocked cases plus the 0.0.0.5 case in the parametrized TestPrivateIpRangesBehavior added in the preceding commit). Restored. - Narrow RFC_FORBIDDEN_URL_CHARS_RE to drop ``\\x7f`` → 1 test fails (the ``\\x7f`` parametrize case). Restored. - Remove per-hop validation from ``safe_requests.py`` → the ``test_each_hop_validated`` rewrites from the preceding commit fail. Restored. Net: +16 new parametrized test cases across 8 classes; 565 SSRF tests pass; no production code changes. |
||
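Several of the edge cases listed in the commit above can be reproduced with the standard library alone. The sketch below is an assumption-laden stand-in, not the project's validator: `SAMPLE_BLOCKED_RANGES` is a small illustrative subset rather than the real `PRIVATE_IP_RANGES`, the function names are hypothetical, and `urllib.parse.urlsplit` is used in place of urllib3's `parse_url`.

```python
import ipaddress
from urllib.parse import urlsplit

# Illustrative subset only; the real range list has more entries.
SAMPLE_BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "fe80::/10")
]


def is_ip_blocked_sketch(ip_text: str) -> bool:
    """True if the IP literal falls inside any sample blocked range.

    Raises ValueError when `ip_text` is not a parseable IP literal.
    """
    ip = ipaddress.ip_address(ip_text)
    return any(ip in network for network in SAMPLE_BLOCKED_RANGES)


def classify_host(url: str) -> str:
    """Return 'blocked', 'allowed', or 'needs-dns' for the URL's host."""
    host = urlsplit(url).hostname or ""
    try:
        blocked = is_ip_blocked_sketch(host)
    except ValueError:
        # Not an IP literal (a hostname, or a form such as 0x7f000001):
        # a real validator would resolve it and check the resulting addresses.
        return "needs-dns"
    return "blocked" if blocked else "allowed"


if __name__ == "__main__":
    for candidate in (
        "http://10.0.0.5/internal",
        "http://0x7f000001/",
        "http://127.0.0.1@evil.com/",
        "http://user:pass@127.0.0.1@1.1.1.1/",
    ):
        print(candidate, "->", urlsplit(candidate).hostname, classify_host(candidate))
```

The last two inputs show the userinfo behaviour the commit pins: everything before the final `@` is treated as userinfo, so an IP-shaped string there does not change which host is contacted.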
|
|
d88ba602ca |
test(e2e): regression for hidden context_window blocking Start Research (#3909) (#4059)
PR #4051 fixed the bug where the context_window number input had a step="512" HTML5 constraint while living in a display:none container. Any stored value not on the 512-grid (e.g. the reporter's 25000) failed validation; because the field cannot be focused while hidden, the browser silently aborted submit and the Start Research button appeared to do nothing.
Add a Puppeteer test that pins the behavior so the constraint can't silently come back. The test:
1. Loads the research page (cloud-provider default keeps the context_window container hidden).
2. Sets #context_window.value = "25000", the exact stored value from the reporter's bug.
3. Asserts the container is hidden (precondition for the regression).
4. Asserts research-form.checkValidity() returns true.
No actual research is submitted: checkValidity() exercises the same HTML5 validation path that drives the silent-abort bug, without consuming LLM credits or interacting with the rest of the e2e flow.
If step="512" (or any other constraint that 25000 violates) is ever re-added to the input, the test fails with a clear message pointing back to PR #4051. |
||
|
|
8de5d971d6 |
refactor(settings): use specialized exception classes for env settings (#3838)
Improved error observability and alignment with TRY003 standards by replacing generic ValueError with specialized exception classes. Updated type hints to use Path | str and Sequence as suggested in review. Co-authored-by: Daniel Petti <djpetti@gmail.com> |
||
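The commit above does not show the new classes, so the following is only a generic sketch of the pattern it names: replacing a bare `ValueError` with a dedicated exception whose message is built inside the class, which is what the TRY003 lint rule encourages. Every name here is hypothetical.

```python
from typing import Mapping


class EnvSettingError(Exception):
    """Base class for problems with environment-provided settings (hypothetical)."""


class InvalidEnvSettingValueError(EnvSettingError):
    """Raised when an environment variable cannot be converted to its type."""

    def __init__(self, key: str, raw_value: str, expected: str) -> None:
        self.key = key
        self.raw_value = raw_value
        self.expected = expected
        # The message is assembled here, not at each raise site.
        super().__init__(f"{key}={raw_value!r} is not a valid {expected}")


def parse_int_env(key: str, environ: Mapping[str, str]) -> int:
    """Read an integer setting, raising a specialized error on bad input."""
    raw = environ[key]
    try:
        return int(raw)
    except ValueError as exc:
        raise InvalidEnvSettingValueError(key, raw, "integer") from exc
```

Callers can then catch `EnvSettingError` specifically and inspect `key` or `raw_value`, rather than pattern-matching on a message string.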
|
|
47d370c45d |
fix(notifications): allow signal:// Apprise scheme (#4006) (#4056)
Signal notification URLs (signal://host:port/from/to) are rejected by NotificationURLValidator because `signal` is missing from ALLOWED_SCHEMES. The user-facing error is "Test failed: Invalid notification service URL", which is the Unsupported-protocol path at notification_validator.py:246.
Apprise (the library LDR delegates to) ships a Signal notification plugin that targets a signal-api-rest container. For non-http schemes the validator intentionally skips the private-IP host check (notification_validator.py:270) and lets Apprise do its own URL parsing, so adding signal does not weaken the SSRF posture; the LAN-host pattern in the bug report (signal://192.168.50.20:8739/…) round-trips to Apprise unchanged.
Adds two regression tests:
- test_apprise_signal_url_accepted: end-to-end validate_service_url against a LAN-IP Signal URL.
- TestClassConstants gets one extra assert that "signal" is in ALLOWED_SCHEMES, keeping the contract list aligned with the other Apprise schemes the file exercises.
Closes #4006
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> |
||
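The validation rule described in the commit above (an allow-list of schemes, with the private-IP host check applied only to http/https) can be sketched as below. This is not the project's `NotificationURLValidator`: the function name and the short scheme list are assumptions, and `/from/to` in the usage example is the placeholder path from the commit's own URL pattern.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical, shortened allow-list; the real ALLOWED_SCHEMES is longer.
ALLOWED_SCHEMES_SKETCH = frozenset({"http", "https", "mailto", "signal"})
HTTP_SCHEMES = frozenset({"http", "https"})


def validate_service_url_sketch(url: str) -> bool:
    """Accept a notification URL if its scheme is allowed and, for HTTP(S), its
    host is not a private IP literal."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in ALLOWED_SCHEMES_SKETCH:
        return False  # the "unsupported protocol" path
    if scheme in HTTP_SCHEMES:
        host = parts.hostname or ""
        try:
            if ipaddress.ip_address(host).is_private:
                return False
        except ValueError:
            pass  # a hostname, not an IP literal; DNS checks are out of scope here
    # Non-HTTP schemes skip the host check and are handed to the notifier as-is.
    return True


if __name__ == "__main__":
    print(validate_service_url_sketch("signal://192.168.50.20:8739/from/to"))
    print(validate_service_url_sketch("http://192.168.50.20/hook"))
```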
|
|
e77b48c813 |
test(e2e): tolerate brief LLM output in research export test (#4053)
The 'should export and display research output' test contains an explicit narrative (test_deep_functionality.js:518-540) describing how the CI release pipeline's small free-tier LLM (Gemini 2.5 Flash Lite via OpenRouter) occasionally returns very brief, non-markdown output even when the research workflow completes end-to-end, and that this should be treated as a transient upstream content-quality flake, not a code regression. That branch logs a warning instead of failing.
But the trailing assertion at the end of the same test still hard-checks 'expect(resultContent.length).to.be.greaterThan(100)', which directly contradicts the documented tolerance: an 89-char LLM response (real example from CI run #2385) makes the assertion fail despite the workflow mechanics having been validated.
Drop the length assertion and keep only 'expect(resultContent).to.not.be.null', which still catches the real regression (results page didn't render) without flaking on upstream LLM brevity. |
||
|
|
35290b2d13 |
fix(research-form): relax context_window step so Start Research submits (#4051)
The context_window input has min=512 max=131072 step=512 and lives in a display:none container that is only revealed for local providers. Any stored value not aligned to the 512-step grid (e.g. 25000) fails HTML5 validation; because the field is not focusable while hidden, the browser silently aborts submission with no log line — the Start Research button appears to do nothing. Lower the step to 1 so any in-range integer is accepted. min/max still bound the value and the saved setting is unchanged. Fixes #3909 |
||
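The arithmetic behind the silent validation failure in the commit above is worth spelling out. For a number input with a `min` attribute, HTML's step check measures the value's offset from `min`, so the sketch below models that rule in Python. The function names are hypothetical, and the real check is performed by the browser, not by application code.

```python
def violates_step(value: int, minimum: int, step: int) -> bool:
    """True if `value` is off the step grid anchored at `minimum`."""
    return (value - minimum) % step != 0


def is_valid_number_input(value: int, minimum: int, maximum: int, step: int) -> bool:
    """Model of range plus step validation for a number input."""
    return minimum <= value <= maximum and not violates_step(value, minimum, step)


if __name__ == "__main__":
    # Before the fix: min=512, max=131072, step=512.
    # (25000 - 512) = 24488, and 24488 % 512 = 424, so the value is rejected.
    print(is_valid_number_input(25000, 512, 131072, 512))
    # After the fix: step=1 accepts any in-range integer.
    print(is_valid_number_input(25000, 512, 131072, 1))
```

With `step=512` only values such as 24576 or 25088 pass; lowering the step to 1 leaves `min` and `max` as the only constraints, which is the behaviour the commit describes.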
|
|
1ab65609db |
ci(release): drop credential persistence on cleanup-changelog checkout (#4050)
The `Checkout the release commit` step in the `cleanup-changelog` job defaulted to `persist-credentials: true`, leaving the job's GITHUB_TOKEN in `.git/config` for the duration of the run. If any later step in this job reads `.git/config` (artifact upload, third-party action that prints/dumps the repo state, etc.), the token leaks. Closes the only open `zizmor/artipacked` finding (code-scanning alert #4655). No functional impact: the only step that needs to push is `peter-evans/create-pull-request`, which already takes an explicit `token:` input and does not rely on the persisted git credential helper. Also dismissed code-scanning alert #7763 (CVE-2026-3298) via the GitHub API — that CVE is Windows-only per PSF advisory; this image is Linux, which Grype's package-version matcher does not account for. Alert #7764 (CVE-2026-7210) is left open as a tracking signal until Python 3.14.6 ships upstream (current latest is 3.14.5; no patched image exists yet). |
||
|
|
a2f7f6ead6 |
fix(ci): drop environment: ci from reusable workflow (#4049)
The `environment: ci` declaration on the research job has no functional
value for LDR — the `ci` Environment has zero protection rules and zero
environment-scoped secrets (verified via gh api). All required secrets
(OPENROUTER_API_KEY, SERPER_API_KEY) are repo-level.
The decorative env attachment becomes a problem for any external repo
that calls this reusable workflow: GitHub silently auto-creates an empty
`ci` Environment in the caller's repo, polluting their environments
namespace.
Dynamic environment via expression (e.g. `environment: ${{ inputs.env || '' }}`)
isn't a viable alternative — `actions/runner` Issue #2610 documents that
expression-in-environment doesn't reliably evaluate input context, and an
empty-string value still auto-creates an empty-named environment.
Simplest correct fix is to delete the line. LDR's own callers
(issue-research.yml, e2e-research-test.yml) keep working unchanged
because they never depended on env-attached functionality. External
callers no longer get the env-pollution side effect.
This unblocks a follow-up `ldr-automations` toolkit repo that will
expose meta-reusable workflows wrapping this one for other projects.
|
||
|
|
d6d9ceffac | chore: auto-bump version to 1.6.11 (#3961) | ||
|
|
3d0b7bb5f9 |
review: hoist asyncio+threading imports to module level + Wave 7 doc (#4048)
Addresses the AI Code Review nit on #4047: ``import threading`` (and the sibling ``import asyncio``) lived inside the ``_close_base_llm`` function body. There's no circular-import or optional-dependency reason to defer them; moving them to the top of the module improves readability and static analysis. Also extends ``docs/developing/resource-cleanup.md`` with a Wave 7 entry documenting: - The in-running-loop ``aclose`` skip bug (this PR's fix). - The healthcheck ``pidfd`` leak (Dockerfile change in the same PR). - The three gaps the broader audit during this PR surfaced as follow-up rather than in-scope work: ``OllamaEmbeddings`` httpx (same FD class as ChatOllama, no close path in langchain wrappers), ``auth_db`` / ``journal_quality`` engines escaping ``shutdown_databases``, and three RAG SSE endpoints constructing ``LibraryRAGService`` before the generator without a ``finally`` close. Also captures the negative results from the audit (non-Ollama providers safe via shared lru_cache, no subprocess pidfd risk, no raw event-loop creation, all ``open()`` calls inside ``with``) so a future contributor reading the history sees what was checked and ruled out. |
||
|
|
04de8597ec |
fix(llm,docker): close ChatOllama async httpx client when called from a running loop + healthcheck timeout (#4047)
* fix(llm): close ChatOllama async httpx client even when called from a running loop
Regression of #3816 with #3855's coverage gap. ``_close_base_llm`` used to skip the async-client close when ``asyncio.get_running_loop()`` succeeded and document that the loop owner would close instead, but no loop-owner cleanup code exists in the project, so the inner ``httpx.AsyncClient`` (and its ``epoll_create`` FD) was silently abandoned. Long-running deployments accumulated ``anon_inode:[eventpoll]`` FDs until the process hit its ``ulimit -n``.
The skip path fires under the default ``langgraph-agent`` strategy too: LangGraph dispatches some tool steps via asyncio internally, so close calls reached from a sync ``finally`` can still land inside a live loop.
Cleanup now runs in a brief daemon thread that owns its own loop, so ``asyncio.run(aclose())`` works regardless of the caller's loop state. A bounded 5-second ``join`` keeps it from blocking shutdown when the Ollama server is unresponsive; if the join times out, ``_ldr_closed`` is left unset so a later call retries the close, and a WARNING surfaces in logs so the leak is visible instead of silent.
Adds:
- A regression unit test (``test_closes_async_inside_running_loop_via_thread``) that calls ``_close_base_llm`` from inside an ``asyncio.run`` driver and asserts ``aclose`` actually ran.
- An FD-growth guard (``test_no_fd_growth_when_closed_inside_running_loop``) modeled on the existing ``test_no_fd_growth_across_repeated_close_cycles`` but exercising the in-loop close path.
- An idempotency test and a timeout test for the new thread path.
* fix(docker): add timeout to healthcheck urlopen so failed checks don't leak children
``urllib.request.urlopen('http://localhost:5000/api/v1/health')`` had no ``timeout=`` argument, so when the app slowed down (FD exhaustion, slow DB checkpoint, anything else) the call hung forever. Docker's ``--timeout=10s`` only SIGKILLs the ``sh -c`` wrapper; the Python child got reparented to PID 1 and kept hanging on the urlopen, each one contributing a ``pidfd`` and a TCP socket against the app's listen socket. On a stuck container we observed 21 live + 113 zombie healthcheck pythons and 64 ``pidfd`` FDs on PID 1.
``timeout=8`` lets urlopen return/raise inside Docker's 10s budget so the child exits cleanly and gets reaped. Pairs with the eventpoll-FD fix in ``_close_base_llm``: that one removed the dominant 91% of the leak, this one removes the 6% remainder and the zombie pile-up.
Adds a towncrier fragment covering both fixes. |
||
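The close-from-a-running-loop technique in the commit above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the project's `_close_base_llm`: the function name, the thread name, and the fake client are hypothetical, and the WARNING log on timeout is reduced to a boolean return value.

```python
import asyncio
import threading


def close_async_client_sketch(client, timeout: float = 5.0) -> bool:
    """Close `client` via its async `aclose()`, even if the caller is inside a
    running event loop. Returns False if the close did not finish in time."""
    if getattr(client, "_ldr_closed", False):
        return True  # idempotent: already closed

    def _run() -> None:
        try:
            # The thread owns a fresh loop, so asyncio.run works regardless of
            # whether the calling thread already has a loop running.
            asyncio.run(client.aclose())
        except Exception:
            pass  # a failing close must not propagate out of cleanup

    worker = threading.Thread(target=_run, name="ldr-async-close", daemon=True)
    worker.start()
    worker.join(timeout)  # bounded wait so an unresponsive server cannot hang us
    if worker.is_alive():
        return False  # leave the flag unset so a later call retries
    client._ldr_closed = True
    return True


class FakeAsyncClient:
    """Stand-in for an object exposing an async aclose(), for demonstration."""

    def __init__(self, delay: float = 0.0) -> None:
        self.delay = delay
        self.closed = False

    async def aclose(self) -> None:
        await asyncio.sleep(self.delay)
        self.closed = True
```

The thread is marked `daemon=True` so a stuck `aclose()` cannot hold up interpreter shutdown. Note that the bounded `join` blocks the calling thread for up to `timeout` seconds, which is the trade-off the commit accepts in exchange for not abandoning the client.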
|
|
1651587d9c |
chore(alembic-runner): drop stale isolation_level="IMMEDIATE" references (#4039)
Two docstring/comment references in `alembic_runner.py` cite SQLCipher's `isolation_level="IMMEDIATE"` as the reason the head short-circuit matters. Production engines actually use `isolation_level=""` (deferred):
- `src/local_deep_research/database/encrypted_db.py:378` (user-DB engine)
- `src/local_deep_research/database/encrypted_db.py:450` (encrypted engine)
The `IMMEDIATE` default in `_make_sqlcipher_connection` (line 280) is the helper-function default, but the production callers override it to "" to avoid login-path contention.
The short-circuit is still load-bearing: `engine.begin()` opens a write transaction regardless of isolation level, and SQLite takes a RESERVED lock as soon as the first DML lands inside. Only the cited mechanism was wrong.
Rewords both comments to reflect the actual lock-acquisition rule (RESERVED on first DML), independent of the driver isolation_level. Pure documentation change, no behavior delta. Existing short-circuit tests still pass. |
||
|
|
a6287a4362 |
fix(security): pin towncrier to exact version and bump Python to 3.14.5 (#4046)
* fix(security): resolve Scorecard pin alerts and bump Python to 3.14.5
- Pin `pip install towncrier` to a single version with `--hash` (both occurrences in release.yml), resolving Scorecard Pinned-Dependencies alerts #7761 and #7762.
- Bump the Dockerfile base image from python:3.14.4-slim to 3.14.5-slim (with new pinned manifest digest). 3.14.5 bundles libexpat 2.8.0 (gh-149017), which is required to mitigate CVE-2026-7210 — Grype alert #7760.
* chore(release): drop hash-pins on towncrier, keep exact version pin
Per review feedback: hash-pinning a build-time CLI like towncrier adds maintenance burden without meaningful supply-chain benefit. The rest of this repo already uses exact-version pins (`pdm==2.26.2`, `pyyaml==6.0.3`, etc.) which Scorecard's PinnedDependenciesID rule accepts — the original alerts fired only because `~=24.8` is a fuzzy version range. |
||
|
|
f664221ce4 |
chore(observability): surface WAL-dispose failures + document LDR_APP_DEBUG sensitivity (#4042)
Two small follow-ups from the #3976 investigation. connection_cleanup.py: bump dispose-failure log from debug to warning. The 30-min periodic pool dispose at web/auth/connection_cleanup.py:154-171 is the workaround for ADR-0004's SQLCipher + WAL handle leak. Pre-fix, _checkpoint_wal/engine.dispose() failures were swallowed at logger.debug, hiding silent drift. Now surfaces at WARNING with the exception TYPE NAME only (matches the _report_silent_exception pattern in utilities/log_utils.py:146-194, which deliberately drops the exception value to avoid leaking sensitive locals through the sensitive-logging hook). New test test_dispose_failures_surface_as_warnings locks in: - the warning fires and names the user + exception type - the exception's message text does NOT leak docs/CONFIGURATION.md: document that LDR_APP_DEBUG=true also enables Loguru diagnose=True on every sink, which materialises local-variable values into exception traces. Those traces can include credentials, decrypted user content, and other sensitive locals. Documentation-only. Refs: #3976 |
||
|
|
f928f4cc5c |
🤖 Update dependencies (#4043)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> |
||
|
|
2808f0fa9d |
feat(benchmarks): add statistical functions module (#4029)
* feat(benchmarks): add statistical functions module for benchmark evaluation * test(benchmarks): add unit tests for statistics module * fix(benchmarks): add input validation to statistical functions * feat(benchmarks): wire Wilson CI into metrics, reports, and live progress |
||
|
|
074285a26d |
fix(release): enrich AI release notes + render changelog in release flow (#4035)
* fix(release): enrich AI release notes + render changelog in release flow
Fixes the v1.6.10 release notes degradation where:
1. docs/release_notes/1.6.10.md was never created (no automation rendered
changelog.d/ fragments before/at release time)
2. AI summary call returned 2xx but empty content with finish_reason=length
create-release job now:
- Sparse-checks-out changelog.d/ + pyproject.toml, installs towncrier
(no PDM needed — towncrier reads pyproject directly), renders
docs/release_notes/<version>.md before composing the release body.
Guards against an empty fragment directory.
- Fetches every merged PR's title + body in a single GraphQL round-trip
and feeds them to the model.
- Fetches the full diff between the previous /releases/latest tag and
the new tag via the compare API, filters lockfiles/generated docs/
SBOM/static assets/binary patches, caps at 700k chars, strips NUL
bytes before jq --rawfile.
- Bumps AI_MAX_TOKENS default 4000 -> 64000 (matches the AI code
reviewer's working budget). Adds AI_REASONING_MAX_TOKENS=16000 so
Kimi K2 Thinking cannot burn the entire output budget on reasoning
tokens — the root cause of v1.6.10's empty .content.
- Adds .reasoning to the response-parsing fallback chain after
.content and .reasoning_content. OpenRouter normalizes Moonshot's
thinking trace to .reasoning (not .reasoning_content), which is why
v1.6.10's diagnostic showed message keys "content, reasoning,
reasoning_details" with no usable extraction path.
- Enforces a 750k char overall prompt cap so PR descriptions + diff
can't blow Kimi's 262k token context window.
- Truncates the final release body to 124,400 chars to stay under
GitHub's documented 125k release-body limit (HTTP 422 otherwise;
gh CLI does not pre-validate).
- Rewrites the SUMMARY_PROMPT to ask for a helpful narrative (not a
TL;DR), with length sized to the material.
New cleanup-changelog job opens a PR on main with the consumed fragments
+ rendered release-notes file, since the create-release runner is
throwaway. Branch protection on main allows the PR (0 required reviews,
0 required checks).
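The response-parsing fallback chain listed above (`.content`, then `.reasoning_content`, then `.reasoning`) might look like this in Python. The workflow itself appears to do this with jq, so this is only an illustrative sketch; `extract_text` is a hypothetical name.

```python
from typing import Any, Mapping

# Field order taken from the commit message above.
_FALLBACK_KEYS = ("content", "reasoning_content", "reasoning")


def extract_text(message: Mapping[str, Any]) -> str:
    """Return the first non-empty string field, or "" if none is usable."""
    for key in _FALLBACK_KEYS:
        value = message.get(key)
        if isinstance(value, str) and value.strip():
            return value
    return ""
```

With a message shaped like the v1.6.10 diagnostic (empty `content`, populated `reasoning`), the last key in the chain is the one that yields text.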
* chore(release): persist 1.6.10 changelog render + clear consumed fragments
The v1.6.10 release shipped without docs/release_notes/1.6.10.md because
no automation rendered changelog.d/ fragments at release time (see
release.yml change in this PR for the fix going forward). Persists the
render now so 1.6.11's release does not re-consume the same fragments.
Renders the v1.6.10 release_notes file from the 30 fragments that were
in changelog.d/ at v1.6.10 cut time, and removes those fragments from
changelog.d/. The rendered content also backs the v1.6.10 GitHub
release body update.
* fix(release): address AI review findings (UTF-8, race, GraphQL cap)
- UTF-8 character-aware truncation. Replace `head -c` (byte-oriented,
splits multi-byte UTF-8 mid-sequence) with Python-based character
truncation for the diff (700k), prompt (750k), and release body
(124,400) caps. Matters because towncrier renders emoji section
headers (💥/🔒/✨/🐛) that appear in diffs of docs/release_notes/;
mid-emoji splits produce invalid UTF-8 that jq --rawfile then
refuses to encode and the GitHub Release API rejects with HTTP 422.
- cleanup-changelog race fix. Pin checkout to ${{ github.sha }}
instead of `ref: main`. If a PR with new fragments merged into main
between create-release and cleanup-changelog, `ref: main` would
consume those new fragments into THIS release's docs/release_notes
file and delete them prematurely — stealing them from the next
release. github.sha is the commit the workflow ran against, so the
set of fragments matches what create-release rendered.
- GraphQL query node-count cap. Limit PR-description batch to 100 PRs
per query and log a warning if a release exceeds that (LDR's typical
release is ~20-30 PRs, well under). Unbounded fan-out could trip
GitHub's GraphQL complexity ceiling on a huge release.
- Compare API 300-file warning. Log when .files[] hits the 300-file
boundary so a future release's missing-file diff can be diagnosed
quickly without rerunning. The cap is a documented GitHub limit.
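The difference between byte-oriented and character-aware truncation described in the first bullet above can be shown in a few lines. This is a sketch with hypothetical helper names, not the script used in release.yml.

```python
def truncate_chars(text: str, limit: int) -> str:
    """Cut after ``limit`` characters; never splits a code point."""
    return text if len(text) <= limit else text[:limit]


def truncate_bytes(text: str, limit: int) -> bytes:
    """Cut after ``limit`` bytes, as `head -c` does; may split a code point."""
    return text.encode("utf-8")[:limit]


sample = "fix 🐛 bug"  # the emoji occupies 4 bytes in UTF-8

safe = truncate_chars(sample, 5)    # keeps the whole emoji
broken = truncate_bytes(sample, 6)  # stops halfway through the emoji

try:
    broken.decode("utf-8")
    byte_cut_is_valid = True
except UnicodeDecodeError:
    byte_cut_is_valid = False
```

Here `byte_cut_is_valid` ends up `False`, which is the invalid-UTF-8 condition the commit says jq and the GitHub Release API reject.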
* fix(release): address review2 — PR cap, trap leak, base pin, prompt clarity
- Raise PR-fetch cap 100 → 200. v1.6.10 had 144 unique PRs (LDR's
dependency-bump traffic is heavy); the previous 100 cap would have
silently dropped ~30% of PR descriptions from the AI prompt. The
750k-char overall prompt cap still protects context window.
- Hoist COMPARE_JSON mktemp above the trap registration so the temp
file is cleaned up even if jq throws under set -e between mktemp
and the manual rm. ${DIFF_FILE}.clean (the NUL-strip staging path)
also added to the trap; rm -f tolerates the missing-file case.
- Pin base: main on peter-evans/create-pull-request. On tag-triggered
runs github.sha may not sit on main HEAD, and the action's
default-branch resolution could pick a non-main base. We always
want the cleanup PR to target main.
- Clarify SUMMARY_PROMPT section markers. The prior text said inputs
are "separated by `----- SECTION -----` markers" using SECTION as a
placeholder; a literal-minded model could look for that exact
string and find none. Now lists the actual marker forms explicitly.
- Add PREV_TAG == RELEASE_TAG guard. On a workflow re-run after the
release exists, /releases/latest returns the just-created tag,
making the diff empty. Falls back to the second-most-recent stable
release.
* fix(release): jq --arg for re-run guard + surface jq errors + doc updates
Workflow fixes from a final pass:
- Re-run guard now passes RELEASE_TAG to jq via `--arg rel` instead of
shell-interpolating it into the program text. RELEASE_TAG is already
validated as bare semver upstream so this is defense-in-depth, but
--arg keeps shell quoting and jq quoting fully separated regardless
of what RELEASE_TAG ever ends up containing.
- Compare-API jq pipeline no longer swallows stderr or masks the exit
code. Previously `jq ... 2>/dev/null || true` would silently produce
an empty diff and a "Diff size: 0 bytes" log line on any jq failure,
giving a maintainer no actionable signal. Now an explicit if-not
check logs a WARNING with jq's stderr intact and ensures the diff
file is empty.
Doc updates for the new release flow:
- changelog.d/README.md: drop the obsolete "maintainer runs `pdm run
towncrier build`" instructions; describe the automated render +
follow-up cleanup PR. Keep the local --draft / --keep preview tips
for fragment iteration.
- docs/RELEASE_GUIDE.md: rewrite the maintainer flow (steps 1-3 of the
old "Render + bump + commit both" sequence are obsolete — the
workflow handles rendering now). Add the cleanup PR merge as a final
checklist item. Update the body composition description from "AI
TL;DR" to AI narrative with diff + PR-body inputs.
* style(release): fix comment indent typo from prior edit
|
||
|
|
e6432db8bd |
fix(embeddings): correct OpenAIEmbeddingsProvider.requires_api_key to False (#4036)
Follow-up to #4026. After that PR the provider supports keyless OpenAI-compatible local servers (LM Studio, vLLM, llama.cpp) — an API key is needed only for the OpenAI cloud path. The class-level ``requires_api_key = True`` was therefore stale; any future UI consumer that gates an "API key required" badge on it would mislead users on local servers. Drop the explicit override so the attribute inherits ``False`` from BaseEmbeddingProvider. The cloud-needs-key rule is still enforced at runtime in ``is_available`` and ``create_embeddings`` when no base_url is configured, so nothing about the active behavior changes. No behavior change for current callers — there is no embedding-side consumer of this attribute today; the fix is to make a latent semantic inaccuracy not bite the first future consumer. |
||
|
|
df8657adb5 |
Feat/deepseek provider (#3432)
* feat: deepseek provider
* fix: address review comments on deepseek provider
- Fix typo in import (loggere -> removed unused import)
- Fix typo in model name (deepseek-reasonser -> deepseek-reasoner)
- Fix base URL (api.deepseek.com/api/v1 -> api.deepseek.com/v1)
- Remove standalone functions; auto-discovery handles registration
- Add requires_auth_for_models to match other cloud providers
- Add deepseek_settings.json for the llm.deepseek.api_key default setting
---------
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
Co-authored-by: Daniel Petti <djpetti@gmail.com> |
||
|
|
9ad3910452 |
fix(search): keep cross-engine filter fallback within evaluated context (#3866)
* fix(search): keep cross-engine filter fallback within evaluated context * style(search): apply ruff format for context fallback fix |
||
|
|
2ca4f02e6a |
docs(developing): add prerelease Docker image testing section (#4034)
Document the two Docker Hub tags published by prerelease-docker.yml (the immutable prerelease-vX.Y.Z-<sha> tag and the floating :prerelease tag added in #4005) and provide a copy-pasteable docker-compose service that runs the RC alongside production on port 5001 with isolated volumes, so a broken migration in the candidate cannot damage a production SQLCipher database. |
||
|
|
243d2b2a7f |
fix(embeddings): allow OpenAI-compatible local endpoints (#3883) (#4026)
* fix(embeddings): allow OpenAI-compatible local endpoints (#3883) Adds the OPENAI member to the EmbeddingProvider enum, registers the embeddings.openai.* settings so the UI can surface the configuration form, and widens the provider's availability + create_embeddings path to accept a base_url-only configuration (LM Studio, vLLM, llama.cpp). The model-list lookup now routes through the configured base_url so discovery hits the local server instead of api.openai.com. No DB migration is required: the embedding_model_type column is declared with values_callable, so SQLite renders it as plain VARCHAR with no CHECK constraint — adding the OPENAI enum value is a pure Python-side change. Fixes #3883 * test(settings): regenerate golden master for new embeddings.openai.* keys Picks up the four embeddings.openai.* keys (api_key, base_url, model, dimensions) registered by settings_openai_embeddings.json in this PR. Generated via scripts/dev/regenerate_golden_master.py — no manual edits. * fix(embeddings): annotate openai params dict for mypy invariance The params dict at openai.py:121 holds heterogeneous values: str for model/api_key/base_url, int for dimensions. mypy infers Dict[str, str] from the initial literal and rejects the int assignment plus the **params unpack into OpenAIEmbeddings (6+ errors at line 133, "dict is invariant"). Explicit Dict[str, Any] annotation resolves it — same shape this file already uses for client_kwargs at line 197. --------- Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com> Co-authored-by: Daniel Petti <djpetti@gmail.com> |
||
|
|
8c59082c30 |
feat(errors): friendly runtime messages for OpenAI-compatible endpoints (#4027)
* feat(errors): friendly runtime messages for OpenAI-compatible endpoints (#3878) Wrap Site B in research_service.run_research_process so that when a request to an OpenAI-compatible LLM endpoint (LM Studio / vLLM / llama.cpp server / OpenRouter / custom endpoint) fails at runtime, the surfaced error names the provider, configured base URL, and model. The helper lives in error_handling/openai_compat_errors.py and: * walks __cause__/__context__ to find the underlying openai.* / httpx.* class through any LangChain wrapper (cycle-guarded); * dispatches to seven new tokens that slot into the existing "Error type: <code>" convention: openai_connection_refused, openai_timeout, openai_auth, openai_permission_denied, openai_model_not_found, openai_bad_request, openai_unknown; * always appends the original exc!s as a "Details:" suffix so no information is lost; * strips userinfo from base URLs before display (no API-key leaks when a user embeds the key in the URL). Sites B and C and ErrorReporter all learn the new tokens; existing Ollama and ad-hoc connection branches are untouched, so non-OpenAI-compatible providers see no behaviour change. Tests construct openai / httpx exceptions directly (no network) and cover all five acceptance criteria from the issue plus the seven token round-trips through ErrorReporter. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * review: address djpetti feedback on PR #4027 - mention --network=host in _DOCKER_HINT - hoist openai/httpx imports to module top (drop risk-averse try/except) - hoist openai_compat_errors import to research_service.py top * deps: promote openai and httpx to direct dependencies error_handling/openai_compat_errors.py imports openai and httpx at module top-level, but both were only present transitively via langchain-openai. Pin them as direct deps so a future langchain-openai refactor cannot break the error_handling module at import time. 
--------- Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Daniel Petti <djpetti@gmail.com> Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com> |
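Two of the techniques named in this commit, the cycle-guarded walk over `__cause__`/`__context__` and the stripping of userinfo from base URLs, can be sketched with the standard library alone. The function names below are hypothetical and the real helper in `error_handling/openai_compat_errors.py` may be structured differently.

```python
from urllib.parse import urlsplit, urlunsplit


def iter_exception_chain(exc):
    """Yield ``exc`` and every exception reachable via __cause__/__context__."""
    seen = set()
    stack = [exc]
    while stack:
        current = stack.pop()
        if current is None or id(current) in seen:
            continue  # the id() set is the cycle guard
        seen.add(id(current))
        yield current
        stack.extend([current.__context__, current.__cause__])


def find_in_chain(exc, types):
    """Return the first chained exception matching ``types``, else None."""
    for candidate in iter_exception_chain(exc):
        if isinstance(candidate, types):
            return candidate
    return None


def strip_userinfo(url: str) -> str:
    """Drop any ``user:password@`` part so credentials never reach a message."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal needs its brackets back
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

A caller would pass the wrapper exception raised by the LLM client plus a tuple of low-level exception classes, then format the message using the sanitized URL.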
||
|
|
b20786c62c |
test(migrations): pin invariants from PR #4000 multi-round review (#4033)
Adds three regression tests that each fail on `main` (pre-fix) and pass with the runner-level changes in this PR. Surfaced by a 30+ subagent multi-round review of the existing test coverage; deferred dozens of proposed tests that overlapped with existing coverage or tested SQLite/Alembic internals rather than our code.
1. `test_run_migrations_skips_upgrade_when_at_head` — extended. Mocks now cover not just `command.upgrade` but also the new `_drop_orphan_alembic_temp_tables` and `_disable_fk_for_migration` helpers. Pins that the short-circuit happens BEFORE engine.connect() and the FK toggle. If a future refactor moves the short-circuit below the orphan-cleanup or FK toggle, this test fails — the existing command.upgrade mock alone would not catch that.
2. `test_run_migrations_drops_multiple_orphan_temp_tables` — new. Seeds three orphan `_alembic_tmp_*` tables and asserts all are cleaned in one pass. Targets the loop body in `_drop_orphan_alembic_temp_tables`; the existing single-orphan test would still pass if the loop ever short-circuited after the first iteration.
3. `test_drop_orphan_temp_tables_no_op_when_none_present` — new. Direct unit test on `_drop_orphan_alembic_temp_tables` against a clean DB. Pins the `if not temp_tables: return` early-return guard — a future refactor that unconditionally logs/scans would be caught.
Out of scope (verified by Round 5 cross-verification):
- foreign_key_check after upgrade: already covered (lines 4632, 4831).
- Data preservation 0001→head: already covered (lines 1518, 1871).
- Run twice no-op: covered by `test_idempotent_migrations` (line 194). |
||
|
|
3ade4b4103 |
🤖 Update dependencies (#4031)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> |
||
|
|
c2a47a83b3 |
fix(db): unblock multi-migration upgrades blocked by FK mismatch + orphan _alembic_tmp_* tables (#4000)
* fix(db): unblock multi-migration upgrades — toggle FK + scrub orphan temp tables outside the alembic transaction
Closes #3990 and unblocks #3817. Real users at revisions 0001–0005
upgrading to 0009 hit two failure modes that left their account
unable to log in:
1. **`foreign key mismatch — "download_attempts" referencing "download_tracker"`** (#3990)
Migration 0007's defensive `PRAGMA foreign_keys = OFF` is silently a
no-op once the sqlite3/sqlcipher3 driver has auto-begun the migration
transaction (per sqlite.org/pragma.html#pragma_foreign_keys). With the
chained 0002–0006 upgrade, earlier migrations issue DML before 0007
runs, freezing FK in the connect-time ON state for the rest of the
upgrade. The orphan-scrub `DELETE FROM download_attempts ...` then
fails with "foreign key mismatch" because the pre-fix
`download_tracker.url_hash` lacks the UNIQUE backing the FK requires
for the cascade machinery to compile.
The fix issues `PRAGMA foreign_keys = OFF` in
`alembic_runner.run_migrations` BEFORE opening the migration
transaction (via `exec_driver_sql`, which doesn't trigger driver
auto-begin), then re-enables FK on the same connection after the
upgrade commits and before the connection returns to the pool — so
subsequent checkouts see the production-default ON state.
2. **`table _alembic_tmp_journals already exists`** (#3817)
`op.batch_alter_table` rebuilds a table by creating
`_alembic_tmp_<table>`, copying data, dropping the original, and
renaming. On a clean run alembic drops the temp table automatically.
If a previous attempt failed in a way that bypassed transaction
rollback (e.g., an older migration runner that auto-committed each
migration), the temp table persists and the next attempt fails with
"table _alembic_tmp_* already exists".
The fix drops orphan `_alembic_tmp_*` tables in
`alembic_runner.run_migrations` before opening the migration
transaction. This runs at the SQLite level under autocommit; if a
concurrent run_migrations is mid-batch_alter_table, our DROP blocks
on the SQLite write lock until the rename consumes the temp table,
making our DROP IF EXISTS a no-op — the race is benign.
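The first failure mode above rests on SQLite ignoring `PRAGMA foreign_keys` while a transaction is open. A small reproduction with the standard `sqlite3` module (not sqlcipher3, and not the project's runner) looks like this; `fk_state` and `demo` are illustrative names.

```python
import sqlite3


def fk_state(conn: sqlite3.Connection) -> int:
    """Return 1 when foreign-key enforcement is on, 0 when it is off."""
    return conn.execute("PRAGMA foreign_keys").fetchone()[0]


def demo() -> tuple:
    # isolation_level=None means the driver never auto-begins,
    # so the only transaction is the explicit BEGIN below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")

    conn.execute("BEGIN")
    conn.execute("PRAGMA foreign_keys = OFF")  # silently ignored in a transaction
    inside = fk_state(conn)
    conn.execute("COMMIT")

    conn.execute("PRAGMA foreign_keys = OFF")  # takes effect outside a transaction
    outside = fk_state(conn)
    conn.close()
    return inside, outside
```

`demo()` returns `(1, 0)`: the toggle only works once no transaction is pending, which is why the fix moves it ahead of the migration transaction.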
Tests: two new fixture-driven regression tests
(`TestUpgradeFromBuggyV16xUserDbProductionEngine`,
`TestOrphanAlembicTempTableCleanup`) reproduce the production failure
modes verbatim — `isolation_level=""` matching the sqlcipher3 engine
in `encrypted_db.py`, FK ON at connect via the same event handler
`apply_performance_pragmas` installs, and a chained 0005→head
upgrade so DML auto-begins before 0007. Both tests fail without the
runner fix with the exact production error messages and pass with it.
Migration 0007's misleading comment ("no DML has opened the implicit
transaction yet") is also corrected — that statement was true when
the migration was written against a single-revision test fixture but
never held for real multi-migration upgrades.
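The orphan-table cleanup from item 2 above can be approximated with plain `sqlite3`. This is a sketch under assumptions: the function name is hypothetical, and the real `_drop_orphan_alembic_temp_tables` works on the SQLAlchemy connection rather than a raw `sqlite3.Connection`.

```python
import sqlite3


def drop_orphan_temp_tables(conn: sqlite3.Connection) -> list[str]:
    """Drop every leftover ``_alembic_tmp_*`` table and return their names."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        # '_' is a LIKE wildcard, so escape it to match literal underscores.
        "WHERE type = 'table' AND name LIKE '\\_alembic\\_tmp\\_%' ESCAPE '\\'"
    ).fetchall()
    names = [row[0] for row in rows]
    for name in names:
        # Names come from the catalog, but quote them anyway.
        quoted = '"' + name.replace('"', '""') + '"'
        conn.execute(f"DROP TABLE IF EXISTS {quoted}")
    return names
```

Returning the dropped names makes the early-return case (nothing to clean) easy to assert on: the list is simply empty.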
* test(no-raw-sql): allow alembic_runner.py — same exception class as initialize.py
`alembic_runner.py` is migration infrastructure (drops orphan
`_alembic_tmp_*` tables in #3817, toggles `PRAGMA foreign_keys` in
#3990). The single `DROP TABLE IF EXISTS` f-string trips the
`["\']DROP\s+TABLE\s+'` regex in the raw-SQL guard. Add the file to
the same exclusion list `database/initialize.py` lives in — both are
catalog-derived DDL on migration infrastructure, not application
code touching user-controllable SQL.
Precedent: commit
v1.6.10
|
||
|
|
048e58905a |
chore(deps): bump urllib3 to 2.7 for CVE-2026-44431 and CVE-2026-44432 (#4028)
Fixes two high-severity vulnerabilities:
- CVE-2026-44431: sensitive headers forwarded across origins in proxied low-level redirects
- CVE-2026-44432: decompression-bomb safeguards bypassed in streaming API |
||
|
|
f7f427bff7 |
feat(citation): source-tagged citations with global counter (#4012)
* feat(citation): source-tagged citations with global counter
Add ``CitationMode.SOURCE_TAGGED_HYPERLINKS`` and set it as the
default ``report.citation_format``.
## What changes for users
Reports now render citations as ``[arxiv-1]``, ``[openai.com-2]``,
``[arxiv-3]`` — the source tag identifies *what kind* of source each
citation is, while the number is the original bibliography-order
global counter. Compared to the previous ``DOMAIN_ID_*`` modes, the
suffix is **not** a per-domain counter, so labels never collide and
clicking from inline text to the source list is unambiguous.
Source-tag resolution order:
1. ``URLClassifier``-recognised academic sources use the short enum
value: ``arxiv``, ``pubmed``, ``pmc``, ``semantic_scholar``,
``biorxiv``, ``medrxiv``, ``doi``.
2. Generic web URLs fall back to the cleaned domain
(``nytimes.com``, ``openai.com``) via the existing
``_extract_domain``.
3. Empty or non-http(s) URLs (``file://``, local-RAG hits) tag as
``local`` and render without a hyperlink so the markdown stays
clean. A future PR can plumb collection names through the RAG
metadata pipeline to replace the uniform ``local`` fallback —
noted in the helper docstring.
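The resolution order above can be illustrated with a simplified stand-in. The host table below is an assumption standing in for `URLClassifier`, and the `www.` stripping is a guess at what `_extract_domain` does; only the three-step order and the `[tag-N]` shape come from the commit message.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for URLClassifier: host -> short source tag.
_ACADEMIC_HOSTS = {
    "arxiv.org": "arxiv",
    "pubmed.ncbi.nlm.nih.gov": "pubmed",
    "biorxiv.org": "biorxiv",
    "medrxiv.org": "medrxiv",
}


def source_label(url: str) -> str:
    """Academic tag first, then cleaned domain, then ``local``."""
    parsed = urlparse(url or "")
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return "local"  # empty, file://, or otherwise non-web URL
    host = parsed.hostname.lower()
    if host.startswith("www."):
        host = host[4:]
    return _ACADEMIC_HOSTS.get(host, host)


def tag_citation(number: int, url: str) -> str:
    """Keep the global bibliography number and prefix it with the source tag."""
    return f"[{source_label(url)}-{number}]"
```

Because `number` is passed through untouched, two arXiv sources at positions 1 and 3 stay `arxiv-1` and `arxiv-3` instead of being renumbered per domain.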
## What does NOT change
* The agent still emits plain ``[N]`` citations — the LLM prompt and
``SearchResultsCollector`` are untouched. This is purely a
display-layer transform applied after generation.
* All other modes are preserved unchanged. Users on
``domain_id_hyperlinks`` etc. keep their current behaviour.
* The global counter mechanism in
``SearchResultsCollector.add_results`` (``index = len(_all_links) +
1``) was already correct — the new mode just stops the formatter
from throwing that number away.
## Files
* ``citation_formatter.py``: new enum value, new
``_format_source_tagged_hyperlinks`` method, ``_extract_source_label``
helper (URLClassifier → domain → ``local`` fallback chain), and
``_is_linkable_url`` helper so file:// / empty URLs render as
``[local-N]`` rather than ``[[local-N]](file:///...)``.
* ``research_service.py`` & ``scheduler/background.py``: add the new
value to the string→enum dispatch maps. Existing Python fallbacks
are deliberately left as-is.
* ``default_settings.json``: add the new option (placed first to
signal it as the default), flip ``value`` from
``"number_hyperlinks"`` to ``"source_tagged_hyperlinks"``, expand
the description.
* ``golden_master_settings.json``: regenerated via
``scripts/dev/regenerate_golden_master.py``.
## Tests
* ``test_source_tagged_hyperlinks_preserves_global_counter`` — the
core property: ``arxiv-1, openai.com-2, arxiv-3`` (not per-domain
re-numbering). Covers individual citations *and* comma-separated
groups ``[1, 2, 3]`` → three tagged links concatenated.
* ``test_source_tagged_hyperlinks_known_academic_sources`` — arxiv,
pubmed, semantic_scholar, biorxiv tags.
* ``test_source_tagged_hyperlinks_local_url_falls_back`` — both
``file://`` URLs and missing-URL citations render as plain
``[local-N]`` without a hyperlink.
* ``test_enum_member_count`` and ``test_*_value`` in
``test_citation_formatter_high_value.py`` updated for the new
member.
* feat(citation): use collection name for local-RAG citations + changelog
Builds on the source-tagged citation work in this PR. Two pieces:
## Collection-name plumbing for local documents
Previously, RAG / library hits all rendered as ``[local-N]`` because
the formatter only saw the URL/title round-trip and had no signal
about which collection a hit came from. Now the rendered sources
block carries an optional ``Collection:`` line per source, and the
formatter parses it back so library hits surface their (slugified)
collection name as the citation tag.
Concrete pipeline:
1. ``LibraryRAGSearchEngine`` already puts ``collection_name`` into
``result["metadata"]`` (existing — no change).
2. ``utilities/search_utilities.format_links_to_markdown`` now
tracks ``canon_to_collection`` alongside ``canon_to_title`` and
appends `` Collection: <name>`` after the ``URL:`` line when
the metadata carries one. First non-empty wins per canonical URL
(mirrors how title/quality work).
3. ``CitationFormatter._parse_collections`` extracts
``{citation_num: name}`` via a multiline regex anchored on the
``[N]`` header so a Collection: line attached to ``[1]`` cannot
leak into ``[2]``.
4. ``_extract_source_label`` gains an optional ``collection``
parameter that wins outright when supplied. Otherwise the existing
fallback chain (URLClassifier → domain → ``local``) is unchanged.
5. ``_slugify_collection`` normalises free-form collection names
into compact inline-safe tags: ``"My Papers"`` → ``my-papers``,
``"team/finance"`` → ``team-finance``, edge cases degrade to
``local`` rather than empty.
Result: a research mixing web hits and library hits now renders as
e.g. ``[arxiv-1]``, ``[my-papers-2]``, ``[openai.com-3]``,
``[team-finance-4]`` — readers can see at a glance what kind of
source each citation is.
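The slugification in step 5 above could be written along these lines. It is a sketch that reproduces the two examples given in the commit message; how the real `_slugify_collection` treats non-ASCII characters is not stated, so collapsing them to hyphens here is an assumption.

```python
import re


def slugify_collection(name: str) -> str:
    """Turn a free-form collection name into a compact inline-safe tag."""
    # Lowercase, then collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    # Names that slug to nothing degrade to the generic tag.
    return slug or "local"
```

Falling back to `local` keeps an all-punctuation or whitespace-only name from producing an empty citation tag such as `[-2]`.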
## Changelog fragment
Adds ``changelog.d/4012.feature.md`` per the towncrier convention
documented in ``changelog.d/README.md``. Describes the new default
citation format and notes that all previous modes remain available
via ``report.citation_format``.
## Tests
* ``test_source_tagged_hyperlinks_uses_collection_name`` — mixed
web + library report renders with the right tags and no
cross-contamination.
* ``test_source_tagged_hyperlinks_collection_slugify_edge_cases`` —
pins slugifier behaviour on whitespace, slashes, casing, unicode,
and empty-after-slug edge cases.
* ``test_source_tagged_hyperlinks_missing_collection_falls_back`` —
library URL without a ``Collection:`` line keeps the previous
``local-N`` behaviour (compat with hand-rolled sources blocks).
* ``test_source_tagged_hyperlinks_collection_line_isolation`` —
regression guard for the regex anchoring: a ``Collection:`` line
on ``[1]`` must not affect ``[2]``.
* Four ``TestFormatLinksToMarkdownCollections`` tests cover the
renderer side: emit on metadata present, omit on metadata absent,
omit on metadata without ``collection_name``, first non-empty
wins on URL dedup.
1173 tests pass across ``tests/text_optimization/``,
``tests/utilities/`` (search utilities), and ``tests/settings/``.
``mypy`` clean on both touched source files.
* chore(citation): don't flip default to source_tagged yet
Per maintainer call: ship the new ``source_tagged_hyperlinks`` mode
as an opt-in only — keep ``number_hyperlinks`` as the default for
``report.citation_format`` for now. The mode stays available in the
settings dropdown for users who want to try it; we can flip the
default in a later release once it has soaked.
Changes:
* ``default_settings.json``: revert ``value`` to ``"number_hyperlinks"``;
move the new option from first to second-to-last in the dropdown so
the ordering doesn't read as "this is the default"; rewrite the
description to lead with the existing modes.
* ``golden_master_settings.json``: regenerate to track the JSON value.
* ``changelog.d/4012.feature.md``: reword from "new default" to
"new option, opt-in via the setting".
No code change to the formatter, the new mode, the collection
plumbing, or any of the 8 new tests added earlier in this PR.
|
||
|
|
d2a0889014 |
test: fix flaky rate-limit-triggered failures in rag upload coverage (#3943)
`tests/research_library/routes/test_rag_routes_upload_coverage.py`'s `TestUploadToCollection` tests pass in isolation but the last three (test_upload_pdf_storage_failure_continues, test_upload_auto_index_triggered, test_upload_auto_index_no_password) flake to `429 TOO MANY REQUESTS` when run as part of the wider research_library test suite locally (LDR_DISABLE_RATE_LIMITING/DISABLE_RATE_LIMITING is unset). The `@upload_rate_limit_user`/`@upload_rate_limit_ip` decorators applied to `upload_to_collection` at module import time close over the real Limiter instance, so the existing fixture's symbol patches cannot undo them — by the time those tests run, earlier tests in the same pytest process have already consumed the per-user 10/minute budget against the shared in-memory storage. Add `patch.object(real_limiter, "enabled", False)` to the fixture so the real limiter is short-circuited for the duration of each test (and restored automatically on exit). CI is unaffected (it sets `DISABLE_RATE_LIMITING=true` at the workflow env, so the limiter is already disabled there). |
||
|
|
b5ca512d5d |
feat(hooks): add pre-commit hook to validate settings key namespaces (#4025)
* feat(hooks): add pre-commit hook to validate settings key namespaces Parses ALLOWED_SETTING_PREFIXES and BLOCKED_SETTING_PREFIXES from settings_routes.py via AST (single source of truth) and checks hardcoded settings keys in Python (AST) and JavaScript (regex) files. Prevents the class of bug where a new settings key is added but its prefix is missing from the allow list. |
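Reading the prefix constants via AST instead of importing the module might look like the sketch below. It assumes the constants are plain literal tuples or lists assigned at module top level and that blocked prefixes take precedence over allowed ones; neither detail is confirmed by the commit message, and both function names are hypothetical.

```python
import ast


def load_prefixes(source: str, name: str) -> tuple:
    """Find a top-level ``name = (...)`` assignment and evaluate its literal."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if name in targets:
                # literal_eval only accepts literals, so no code is executed.
                return tuple(ast.literal_eval(node.value))
    raise LookupError(f"{name} not found")


def key_allowed(key: str, allowed: tuple, blocked: tuple) -> bool:
    """Assumed rule: a blocked prefix always wins, then an allowed one must match."""
    if any(key.startswith(prefix) for prefix in blocked):
        return False
    return any(key.startswith(prefix) for prefix in allowed)
```

Parsing rather than importing keeps the hook fast and avoids pulling the web application's dependencies into the pre-commit environment.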
||
|
|
37bd58ba6b |
fix(settings): allow local_search_ namespace for embedding settings (#4024)
The security namespace gate added in
|
||
|
|
964c774292 |
chore(deps): bump puppeteer from 24.43.0 to 24.43.1 in /tests/puppeteer (#4021)
Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 24.43.0 to 24.43.1. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/CHANGELOG.md) - [Commits](https://github.com/puppeteer/puppeteer/compare/puppeteer-v24.43.0...puppeteer-v24.43.1) --- updated-dependencies: - dependency-name: puppeteer dependency-version: 24.43.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Prashant Sharma <prashant.shar51@gmail.com> |
||
|
|
59e3bac836 |
chore(deps): bump puppeteer from 24.43.0 to 24.43.1 in /tests (#4020)
Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 24.43.0 to 24.43.1. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/CHANGELOG.md) - [Commits](https://github.com/puppeteer/puppeteer/compare/puppeteer-v24.43.0...puppeteer-v24.43.1) --- updated-dependencies: - dependency-name: puppeteer dependency-version: 24.43.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Prashant Sharma <prashant.shar51@gmail.com> |
||
|
|
1351a0cde7 |
chore(deps-dev): bump puppeteer in /tests/api_tests_with_login (#4019)
Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 24.43.0 to 24.43.1. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/CHANGELOG.md) - [Commits](https://github.com/puppeteer/puppeteer/compare/puppeteer-v24.43.0...puppeteer-v24.43.1) --- updated-dependencies: - dependency-name: puppeteer dependency-version: 24.43.1 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Prashant Sharma <prashant.shar51@gmail.com> |
||
|
|
67114f8066 |
chore(deps): bump puppeteer from 24.43.0 to 24.43.1 in /tests/ui_tests (#4018)
Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 24.43.0 to 24.43.1. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/CHANGELOG.md) - [Commits](https://github.com/puppeteer/puppeteer/compare/puppeteer-v24.43.0...puppeteer-v24.43.1) --- updated-dependencies: - dependency-name: puppeteer dependency-version: 24.43.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Prashant Sharma <prashant.shar51@gmail.com> |