Commit Graph

6474 Commits

Author SHA1 Message Date
LearningCircuit
ba0912056c test(llm_utils): pin daemon-thread contract for in-loop async close (#4078)
* test(llm_utils): pin daemon-thread contract for in-loop async close

The existing ``tests/utilities/test_close_base_llm.py`` already covers
the sync + async + in-loop + timeout + idempotence + FD-growth cases
for ``_close_base_llm``. Two narrow contracts remained unpinned:

- **Daemon flag** — the cleanup thread at llm_utils.py:154-159 must
  be ``daemon=True`` or a stuck ``aclose()`` would hold up Python
  interpreter shutdown. The comment at llm_utils.py:140-143 documents
  this requirement but no test asserted it.

- **In-loop close marks ``_ldr_closed`` even when inner aclose
  raises** — the cleanup thread runs ``asyncio.run(aclose())`` inside
  a ``try/except Exception`` (lines 146-152). When ``aclose`` raises,
  the thread exits cleanly and the main thread sees
  ``t.is_alive() == False``, then sets ``_ldr_closed = True`` (line
  178). The pre-existing ``test_swallows_async_close_exception``
  covered this invariant for the no-loop branch only.

New class ``TestInLoopCleanupThreadContract`` adds two tests:

- ``test_cleanup_thread_is_daemon_so_shutdown_is_not_blocked`` —
  patches ``threading.Thread`` with a subclass that captures the
  constructor kwargs; verifies ``daemon=True`` and a stable name
  prefix (``"ldr"``).
- ``test_in_loop_close_marks_closed_even_when_inner_aclose_raises``
  — invokes ``_close_base_llm`` inside ``asyncio.run`` with an
  ``aclose`` that raises; asserts ``_ldr_closed`` is set anyway.

Mutation-checked:
- Flipping ``daemon=True`` to ``daemon=False`` → the daemon test fails.
- Removing the ``async_httpx._ldr_closed = True`` line from the
  in-loop completion path (llm_utils.py:178) → 3 tests fail: both new
  cases AND the existing ``test_closes_async_inside_running_loop_via_thread``
  / ``test_in_loop_close_is_idempotent``. The fact that the existing
  in-loop idempotence test already covered the happy-path mark is
  reassuring; my new test covers the exception-path mark.

0 production changes. 24 close-base-llm tests pass (was 22).
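
The constructor-capture technique can be sketched as follows. This is
an illustration only: ``spawn_cleanup`` and ``capture_thread_kwargs``
are hypothetical stand-ins, not the real ``_close_base_llm`` or the
real test; only the thread name ``ldr-async-llm-close`` and the
``daemon=True`` requirement come from this change.

```python
import threading
from unittest import mock


def spawn_cleanup(work):
    """Hypothetical stand-in for the in-loop branch of the close helper."""
    t = threading.Thread(target=work, name="ldr-async-llm-close", daemon=True)
    t.start()
    t.join(timeout=5)
    return t


def capture_thread_kwargs(func, *args):
    """Run func with threading.Thread swapped for a recording subclass."""
    captured = {}
    real_thread = threading.Thread

    class RecordingThread(real_thread):
        def __init__(self, *a, **kw):
            captured.update(kw)  # remember the constructor kwargs
            super().__init__(*a, **kw)

    with mock.patch.object(threading, "Thread", RecordingThread):
        func(*args)
    return captured


captured = capture_thread_kwargs(spawn_cleanup, lambda: None)
```

Because the subclass still calls the real constructor, the thread runs
normally and the test can assert on ``captured`` afterwards.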

* test(llm_utils): replace line-number refs with symbol-based ones

AI reviewer flagged that the docstrings on the new tests in PR #4078
cite specific line numbers in ``llm_utils.py`` (e.g.
``llm_utils.py:154-159``, ``:140-143``, ``:173-178``) which will
become stale on any refactor of the target module.

Replace with stable symbol / branch-name references:

- ``llm_utils.py:154-159`` (Thread construction site) →
  "the ``else: # A loop is running in this thread`` block that
  spawns a ``ldr-async-llm-close`` thread"
- ``llm_utils.py:140-143`` (docstring warning) →
  "the docstring of ``_close_base_llm`` ... when motivating the
  brief daemon thread"
- ``llm_utils.py:146-152`` (try/except around asyncio.run) →
  "the cleanup thread's ``_close_in_thread`` runs
  ``asyncio.run(aclose())`` inside a ``try/except Exception``"
- ``llm_utils.py:178`` (the sentinel-set line) →
  "the ``else`` branch that sets ``_ldr_closed = True``"

No behavior change; both tests still pass and still pin the same
contracts. Follow-up to a recommendation in the AI Code Reviewer
comment on PR #4078.
v1.6.11
2026-05-17 11:50:06 +02:00
LearningCircuit
8b98dfc237 test(ui): chunk mobile-nav overlap DOM walk; drop WebKit skip (#4060) (#4076)
* test(ui): chunk mobile-nav overlap DOM walk; drop WebKit skip (#4060)

The mobile-nav overlap assertion in all-pages-mobile.spec.js previously
ran a single ~60-line page.evaluate that walked every interactive
element on the page. On Mobile Safari this occasionally raced WebKit's
context-close ("Target page, context or browser has been closed"), so
the test was wrapped in a WebKit-only test.skip fallback (#4060).

Split the work so no single evaluate runs long:
  1. Tiny evaluate fetches the nav rect.
  2. Tiny evaluate fetches the interactive-element count.
  3. Loop evaluates batches of 50 elements, short-circuiting once we
     have enough overlap hits to report.

Each evaluate is now well under the threshold that triggered the
WebKit race, so the WebKit-only skip and the dual error-message catch
are removed. If a real overlap regresses, WebKit fails loudly
alongside Chromium/Firefox — which was the goal of the issue.

* test(ui): extract findElementsBehindMobileNav helper + per-batch cap

Review followup for #4076:

- Hoist the chunked overlap walk into findElementsBehindMobileNav so
  the test body reads as intent ("find overlaps, assert none") instead
  of evaluate plumbing.
- Pass the remaining maxReported budget into each batch and break the
  inner loop once it's hit, so a batch with many overlap candidates
  doesn't serialize hits we'd discard anyway.
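
A rough Python sketch of the batching and budget logic (the real
helper is JavaScript running through Playwright evaluates; ``Rect``,
``overlaps`` and the default ``max_reported`` value here are
assumptions made for illustration):

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def overlaps(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share any area."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def find_elements_behind_nav(elements: List[Rect], nav: Rect,
                             batch_size: int = 50,
                             max_reported: int = 5) -> List[int]:
    """Return indexes of overlapping elements, at most max_reported."""
    hits: List[int] = []
    for start in range(0, len(elements), batch_size):
        remaining = max_reported - len(hits)
        if remaining <= 0:
            break  # budget spent: skip the remaining batches entirely
        batch = elements[start:start + batch_size]
        for index, rect in enumerate(batch, start=start):
            if overlaps(rect, nav):
                hits.append(index)
                remaining -= 1
                if remaining == 0:
                    break  # per-batch cap: stop collecting inside the batch
    return hits
```

Each outer iteration corresponds to one short evaluate call, so no
single call has to walk the whole page.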

Skipped from the same review: snapshotting the NodeList via
evaluateHandle and re-deriving the nav rect per batch. Both target
theoretical issues (drift, staleness) on pages that are static at the
assertion point, and the absolute perf cost of the current shape is
microseconds — not worth the API complexity until a real symptom
appears.
2026-05-17 10:42:58 +02:00
LearningCircuit
1c33f1dc07 fix(ui-tests): match create/new/add buttons with word boundaries (#4069)
* fix(ui-tests): match create/new/add buttons with word boundaries

Selector helpers in several UI tests called `text.includes('new')`,
which matches the substring "new" inside "News". On
/news/subscriptions, the first hit was the `<a class="btn">Back to News
Feed</a>` link instead of the `#create-subscription-btn`, so
`SubscriptionCrudTests.createSubscriptionFormOpens` clicked the wrong
control and failed because no form opened.

Switch the matchers in the affected helpers to
`\b(?:create|new|add)\b` (plus `subscribe` where it was already in the
list). Word boundaries keep real targets like "Create Subscription",
"New Folder", and "Add Subscription" while skipping "News Feed".

* refactor(ui-tests): extract findActionButton helper

Code-review follow-up. The buttons.find(...) + word-boundary regex
block was duplicated in 9 call sites across 4 files, which is the same
copy-paste that let the original "new" → "News" bug hide in multiple
places.

Extract a single helper into test_lib/test_utils.js:

  findActionButton(page, { selectors, keywords, click })

Defaults to `selectors='button, a.btn, .btn'` and
`keywords=['create','new','add']`, returns `{ found, text }`.

Drops the inconsistent extra `subscribe` keyword from the subscription
CRUD test — verified on the current /news/subscriptions page that no
button is labeled "Subscribe"; the primary control is "Create
Subscription", which is matched by the default keyword list. This
collapses the subscription tests to the same keyword set as the rest.

Net change: 60 insertions / 127 deletions. Reran the 4 affected
shards (mobile, library, history-news, api-crud) end-to-end at 100%,
and confirmed the message now reports the correct button text
(e.g. "Create Collection") rather than the previous false-positive
match.
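
The difference between the substring check and the word-boundary
matcher, sketched in Python for illustration (the helpers themselves
are JavaScript; ``\b`` behaves the same way for these ASCII labels, and
the function names below are hypothetical):

```python
import re

# Same pattern as the commit, compiled case-insensitively.
ACTION_RE = re.compile(r"\b(?:create|new|add)\b", re.IGNORECASE)


def naive_match(text: str) -> bool:
    """Old behaviour: plain substring test, so 'News' matches 'new'."""
    return "new" in text.lower()


def boundary_match(text: str) -> bool:
    """New behaviour: the keyword must stand alone as a word."""
    return ACTION_RE.search(text) is not None
```

With these definitions, "Back to News Feed" passes the naive check but
not the word-boundary one, while "New Folder" passes both.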
2026-05-17 10:20:36 +02:00
LearningCircuit
6e37c248e4 test(error_handling): pin load-bearing branches in openai_compat_errors (#4074)
* test(error_handling): pin load-bearing branches in openai_compat_errors

The existing test file covers all seven dispatch tokens and the four
main helpers (319 LOC), but two load-bearing implementation choices
were only documented in comments — not asserted. Adds seven tests
that catch the most likely regressions.

Pinned behaviors:

- ``TestDispatchOrderingTimeoutBeforeConnection`` — ``APITimeoutError``
  is checked BEFORE ``APIConnectionError`` at openai_compat_errors.py:87.
  This matters because ``openai.APITimeoutError`` subclasses
  ``APIConnectionError`` in openai>=1.x, so reordering the two branches
  would mislabel every timeout as ``openai_connection_refused``. The
  comment at lines 85-86 documents this; the new test pins it. A
  ``issubclass`` sanity check on the openai class hierarchy means the
  test fails first (with a clear message) if openai ever reorganises
  these classes, instead of just silently producing the wrong token.

- ``TestWalkCauseChainPreference::test_cause_preferred_over_context_when_both_set``
  — at openai_compat_errors.py:60, ``_walk_cause`` does
  ``cur.__cause__ or cur.__context__`` so explicit ``raise X from Y``
  chains take priority over implicit ``__context__`` chains. The test
  constructs a wrapper with both set and asserts the deepest reached
  is the ``__cause__`` root.

Edge cases:

- ``TestStripCredentialsEdgeCases::test_ipv6_host_brackets_preserved``
  — ``urlparse`` returns ``hostname`` without brackets; the
  implementation reassembles ``netloc`` from hostname + port. The
  test verifies an IPv6 URL still has its host marker (brackets or
  bare ``::1``) after redaction.

- ``test_userinfo_stripped_with_ipv6_host`` — combined userinfo +
  IPv6 host; the userinfo must be removed regardless of host format.

- ``test_url_with_no_netloc_passed_through`` — bare paths hit the
  ``if not parsed.netloc:`` short-circuit and are returned as-is.

- ``TestFriendlyErrorNoneArgs`` — ``friendly_openai_compatible_error``
  uses ``provider or "<unknown provider>"`` and
  ``model or "<unspecified>"`` to keep the surfaced message legible
  when the caller doesn't know the values. Two tests pin both
  placeholders.

Mutation-checked during development:

- Swapping the timeout / connection-refused branches → both timeout
  tests fail.

- Changing ``cur.__cause__ or cur.__context__`` to
  ``cur.__context__ or cur.__cause__`` → the cause-preference test
  fails.

No production code changes. 34 tests pass (was 27).
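
Both pinned behaviors can be illustrated without the openai package.
In this sketch the exception classes, the returned labels and the
function names are hypothetical stand-ins; only the subclass
relationship and the ``__cause__ or __context__`` expression are taken
from the description above.

```python
class ConnectionFailure(Exception):
    """Stand-in for a connection error base class."""


class TimeoutFailure(ConnectionFailure):
    """Stand-in for a timeout that subclasses the connection error."""


def classify(exc: BaseException) -> str:
    # The subclass must be tested first; otherwise the isinstance check
    # on the base class would also swallow every timeout.
    if isinstance(exc, TimeoutFailure):
        return "timeout"
    if isinstance(exc, ConnectionFailure):
        return "connection_refused"
    return "unknown"


def deepest_cause(exc: BaseException) -> BaseException:
    """Follow the chain, preferring explicit __cause__ over __context__."""
    seen = {id(exc)}
    cur = exc
    while True:
        nxt = cur.__cause__ or cur.__context__
        if nxt is None or id(nxt) in seen:  # end of chain or a cycle
            return cur
        seen.add(id(nxt))
        cur = nxt
```

Swapping the two ``isinstance`` branches, or the two operands of
``or``, changes the results in exactly the way the mutation checks
above describe.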

* fix(error_handling): preserve IPv6 brackets in _strip_credentials

The AI Code Reviewer on this PR (#4074) flagged that the
``test_ipv6_host_brackets_preserved`` assertion was too loose:

    assert "[::1]" in result or "::1" in result

When the implementation strips brackets, the result is
``http://::1:8080/v1`` — which still contains the substring ``::1``,
so the test passes despite producing an invalid URL. Tightening to
``assert result == "http://[::1]:8080/v1"`` surfaced the underlying
bug: ``_strip_credentials`` was indeed losing the brackets.

Root cause: ``urllib.parse.urlparse`` exposes ``hostname`` without
the surrounding brackets that mark IPv6 hosts. The previous
``netloc`` reassembly used the bracketless hostname directly, so
the rebuilt URL became ``http://::1:8080/v1`` — ambiguous about
where the host ends and the port begins, and rejected by downstream
HTTP libraries.

Fix: when reassembling ``netloc``, re-add brackets around any host
that contains ``:`` (i.e. IPv6). IPv4 hosts never contain ``:`` so
this heuristic is safe.

Also tightens both new IPv6 tests to assert the full expected URL
rather than a loose substring match.

Mutation-checked: reverting the bracket re-add flips both
``TestStripCredentialsEdgeCases::test_ipv6_host_brackets_preserved``
and ``test_userinfo_stripped_with_ipv6_host`` to failure.
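
A minimal sketch of the bracket re-add, assuming a simplified
``strip_credentials`` that is not the project's actual
``_strip_credentials`` (error handling for malformed ports and other
details of the real function are left out):

```python
from urllib.parse import urlparse, urlunparse


def strip_credentials(url: str) -> str:
    """Drop userinfo from a URL while keeping IPv6 hosts bracketed."""
    parsed = urlparse(url)
    if not parsed.netloc:
        return url  # bare paths are passed through unchanged
    host = parsed.hostname or ""
    if ":" in host:
        # urlparse strips the brackets from IPv6 literals; put them back
        host = f"[{host}]"
    netloc = host if parsed.port is None else f"{host}:{parsed.port}"
    return urlunparse(parsed._replace(netloc=netloc))
```

Asserting the full rebuilt URL, rather than a substring, is what makes
a missing bracket visible.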
2026-05-17 10:06:10 +02:00
LearningCircuit
d346a8fe2d test(scheduler): credential lifecycle coverage and weak-test cleanup (#4065)
* test(quality): strengthen weak scheduler tests

Four existing scheduler tests asserted on mock call_counts or swallowed
all exceptions without asserting anything, so they passed even when the
underlying production code was broken. Two frozen-dataclass tests used
``try/except AttributeError: pass`` blocks that silently pass if NO
exception is raised — the opposite of the intent.

Rewrites (no production code changes):

- ``test_scheduler_extended.py::test_logs_processing_start`` — previously
  mocked the logger inside a ``try/except Exception: pass`` and ended
  with the comment ``# Should have logged something`` and no
  ``assert``. The new version exercises the "no session info" early
  return path and asserts on the exact entry-banner log line
  (background.py:688) via ``mock_logger.info.assert_any_call(...)``.

- ``test_scheduler_extended.py::test_queries_overdue_subscriptions``
  → renamed to ``test_returns_early_when_credentials_missing``.
  The previous version called the method with a user who had no
  credentials, wrapped the call in a bare ``try/except Exception: pass``,
  and asserted nothing. The new version patches
  ``get_user_db_session`` and asserts it is NOT called — proving the
  credential-missing guard short-circuits before any DB work.

- ``test_scheduler_extended.py::test_handles_scheduler_exception`` +
  ``test_handles_job_lookup_error_on_remove`` — both replaced by
  ``test_unregister_swallows_job_lookup_error``. The originals tested
  the mocks themselves (``mock.remove_job.side_effect = JobLookupError;
  try: mock.remove_job(...); except JobLookupError: pass``) rather
  than the scheduler. The replacement exercises ``unregister_user``
  with two stale scheduled jobs, asserts both ``remove_job`` calls
  were attempted, and asserts the user is fully cleaned up
  (sessions + credentials) — pinning the JobLookupError swallow at
  background.py:463-464.

- ``test_scheduler_extended.py::test_X_check_subscription`` paths —
  the two tests that wrapped ``_check_subscription`` in
  ``try/except Exception: pass`` and asserted nothing now patch
  ``get_user_db_session`` and assert it is NOT called when the
  user is missing from ``user_sessions``.

- ``test_scheduler_document_behavior.py::test_cannot_modify_enabled``
  and ``test_cannot_modify_interval`` — replaced
  ``try/except AttributeError: pass`` blocks with
  ``pytest.raises((AttributeError, FrozenInstanceError))``. The tuple
  is forward-compatible: ``FrozenInstanceError`` subclasses
  ``AttributeError`` and Python's behavior here has shifted between
  versions. Using ``pytest.raises`` ensures the test fails if NO
  exception is raised.

- ``test_scheduler_extended.py::test_is_frozen`` — same fix
  (``try/except AttributeError: pass`` → ``pytest.raises``).

All 604 scheduler tests pass after these rewrites.
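
Why the ``try/except ...: pass`` pattern is weak can be shown with a
small stand-alone sketch. The dataclasses and both helpers are
hypothetical; ``strict_check`` mirrors what ``pytest.raises`` enforces
without requiring pytest here.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FrozenConfig:
    enabled: bool = True


@dataclass
class MutableConfig:
    enabled: bool = True


def weak_check(obj) -> bool:
    """Old pattern: succeeds whether or not the assignment raises."""
    try:
        obj.enabled = False
    except AttributeError:
        pass
    return True


def strict_check(obj) -> bool:
    """pytest.raises-style: succeeds only if the assignment raises."""
    try:
        obj.enabled = False
    except (AttributeError, FrozenInstanceError):
        return True
    return False
```

A dataclass that accidentally lost ``frozen=True`` still satisfies
``weak_check`` but not ``strict_check``.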

* test(scheduler): add credential lifecycle coverage

The scheduler at src/local_deep_research/scheduler/background.py
(1808 LOC) has ~600 tests in tests/news/test_scheduler_*.py, but
credential-lifecycle scenarios that are most fragile (per the
project memory file project_user_db_encryption_blocks_background_jobs.md)
were not covered. Adds eight test methods pinning these branches.

Each test documents the production line(s) it pins and the mutation
that would flip it. Mutation-checked during development:

- Removing ``self._credential_store.clear(username)`` from
  ``unregister_user`` (background.py:468) → fails
  ``test_unregister_user_clears_credential``.
- Removing the ``set_search_context({...})`` call at
  background.py:837-844 → fails
  ``test_search_context_set_before_processing_each_research``.
- Removing the ``set_setting("document_scheduler.last_run", ...)``
  call at background.py:1082-1084 → fails
  ``test_last_run_not_advanced_when_db_open_fails`` via its
  happy-path contrast assertion.

Coverage added:

- ``TestCredentialExpiryAndIsolation``
  - ``test_credential_expiry_between_two_retrieves_in_same_job`` —
    a long-running job that retrieves credentials twice spanning the
    TTL boundary sees ``pw → None``. Pins
    credential_store_base.py:73-75 (lazy-delete on expired retrieve)
    via SchedulerCredentialStore at background.py:50-53. The base
    class TTL tests at tests/database/test_credential_store_ttl.py
    cover single-retrieve boundary; this covers multi-call.
  - ``test_unregister_user_clears_credential`` — pins
    background.py:454-468 plus credential_store_base.py:98-107.
    A snapshot caller already holds the password as a Python local
    so it survives the clear; the next retrieve sees nothing.
  - ``test_cross_user_credential_isolation`` — parametrized across
    alice/bob/charlie. Pins the username-keyed dispatch.
  - ``test_clear_is_idempotent_and_safe_on_unknown_user`` — pins the
    ``if key in self._store`` guard in
    credential_store_base.py:106. Removing the guard would make the
    second clear and the ghost clear raise ``KeyError``.

- ``TestTtlWrapperBehavior``
  - ``test_ttl_boundary_store_expire_store_cycle`` — full
    store → expire → store → expire cycle through the
    SchedulerCredentialStore wrapper. Pins the
    ``ttl_hours * 3600`` conversion at background.py:42 and the
    ``expires_at`` recomputation on each store at
    credential_store_base.py:47.
  - ``test_ttl_hours_zero_expires_at_next_clock_tick`` — pins the
    absence of validation in the constructor and the strict ``>``
    in credential_store_base.py:73. Contract test: anyone adding
    ``if ttl_hours <= 0: raise ValueError`` must update this test.

- ``TestDocSchedulerCredentialLifecycle``
  - ``test_last_run_not_advanced_when_db_open_fails`` — verifies the
    intentional design from PR #3288 / commit 405226638. The
    ``set_setting("document_scheduler.last_run", ...)`` call at
    background.py:1082-1084 is OUTSIDE try/finally on purpose: if
    upstream setup fails (DB open, SettingsManager init), last_run
    must stay put so the next tick retries. The test has two
    contrasting blocks — unhappy path (DB open raises → last_run NOT
    advanced) and happy path (DB open succeeds → last_run IS
    advanced) — so neither assertion is trivially satisfied.
  - ``test_search_context_set_before_processing_each_research`` —
    pins the fix from PR #3289 / commit 1a0d46e69. Without
    ``set_search_context``, downloads bypass per-thread rate
    limiting because the context is missing. Asserts every required
    field is passed (research_id, username, user_password,
    research_phase=document_scheduler).

The new file relies on the global ``reset_all_singletons`` autouse
fixture at tests/conftest.py:76-94 (which resets the singleton and
calls ``.stop()``) and does not introduce a redundant local
fixture. All deferred imports (get_user_db_session,
set_search_context, SettingsManager) are patched at their source
module.

10 new test cases (8 test methods; the cross-user test expands to 3
cases via parametrize). All 604 scheduler-suite tests pass.
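
The TTL semantics pinned above (hours converted to seconds,
``expires_at`` recomputed on every store, strict ``>`` on retrieve,
lazy delete, guarded clear) can be sketched as below. This is an
assumption-laden illustration, not the code in
credential_store_base.py; the class name and the injectable ``clock``
parameter are hypothetical.

```python
import time


class TtlCredentialStore:
    def __init__(self, ttl_hours, clock=time.time):
        self._ttl_seconds = ttl_hours * 3600  # no validation, by contract
        self._clock = clock
        self._store = {}

    def store(self, username, password):
        # expires_at is recomputed on every store
        self._store[username] = (password, self._clock() + self._ttl_seconds)

    def retrieve(self, username):
        entry = self._store.get(username)
        if entry is None:
            return None
        password, expires_at = entry
        if self._clock() > expires_at:  # strict: still valid at the boundary
            del self._store[username]   # lazy delete on expired retrieve
            return None
        return password

    def clear(self, username):
        if username in self._store:     # guard keeps clear idempotent
            del self._store[username]
```

Injecting the clock lets a test step across the TTL boundary between
two retrieves without sleeping.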

* test(scheduler): tidy credential lifecycle test file

Two cleanup items from the AI Code Review on PR #4065:

- Remove the vestigial ``_ = FrozenInstanceError`` guard at the bottom
  of the module along with the matching top-level import. The import
  was carried over from an earlier iteration and is no longer used.

- Extract the mock ``get_user_db_session`` builder that was duplicated
  across ``test_last_run_not_advanced_when_db_open_fails`` and
  ``test_search_context_set_before_processing_each_research`` into a
  ``_make_db_session_with_research`` helper. Saves ~20 lines and gives
  the two happy-path setups a single place to evolve.

No behavior changes; all 10 tests in the file still pass.
2026-05-17 10:03:56 +02:00
LearningCircuit
02e197da86 fix(security): redact Google API key from list_models error log (#4070)
* fix(security): redact Google API key from list_models error log

The Google Gemini provider's ``list_models_for_api`` (at
src/local_deep_research/llm/providers/implementations/google.py:56)
constructs the request URL with the API key as a ``?key=...`` query
parameter, per Google's documented API (https://ai.google.dev/api/rest).

When ``safe_get(url, ...)`` raised — for any reason: connection error,
timeout, 401, etc. — the underlying ``requests``/``urllib3`` exception
message included the full URL, *with* the key. The except handler then
called ``logger.exception(...)``, which writes the traceback (including
the exception's ``__str__``) to every loguru sink: stderr, the database
log sink, and the frontend progress sink.

Reproduced under the project's production loguru config
(``diagnose=False, backtrace=False``): the line
``requests.exceptions.ConnectionError: ...key=sk-LEAKED-VALUE-99999``
appeared in the captured log output.

Fix: catch the exception explicitly, replace the key value in the
message with ``***REDACTED***``, and log via ``logger.warning`` so the
exception chain is not attached.

Bundled regression tests in tests/security/test_api_key_leakage.py:

- ``test_no_leak_when_safe_get_raises_with_url_in_message`` — the
  primary repro path. Patches ``safe_get`` to raise a
  ``ConnectionError`` whose message embeds the key, then asserts the
  sentinel is absent from ``loguru_caplog.text``.
- ``test_no_leak_when_safe_get_raises_generic_runtime_error`` — same
  redaction also runs on non-requests exceptions whose ``str()``
  contains the key.
- ``test_non_200_response_does_not_leak_key`` — pins the existing
  status-code-only warning at lines 88-90 (which already doesn't
  include the URL).
- ``test_repr_does_not_expose_stored_passwords`` /
  ``test_clear_entry_on_missing_does_not_leak_state`` — defense in
  depth on the credential store.
- ``test_friendly_error_strips_credentials_from_base_url`` — pins the
  existing ``_strip_credentials`` userinfo redaction in
  ``error_handling/openai_compat_errors.py`` so a future change that
  removed it would be caught.

Mutation-checked: restoring the old ``except Exception: logger.exception(...)``
flips the two Google leak tests to failure.

* security: extract redact_secrets() utility from inline replace

The previous commit fixed the Google API-key leak with an inline
``msg.replace(api_key, "***REDACTED***")`` in google.py. That is a
one-off — every other provider, route handler, or error path that
needs to scrub a known secret value would have to repeat the same
pattern.

Extract a single utility into ``security/log_sanitizer.py`` next to
the existing ``sanitize_for_log`` / ``strip_control_chars`` helpers:

    def redact_secrets(
        message: str,
        *secrets: Optional[str],
        min_length: int = 8,
        token: str = "***REDACTED***",
    ) -> str

Variadic; skips falsy and sub-min-length values to avoid corrupting
normal message content; exposes ``min_length`` and ``token`` for
callers who need to override. google.py now uses it instead of the
inline replace.
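
One possible body for that signature, written from the description
above rather than copied from ``security/log_sanitizer.py`` (treat it
as a sketch of the contract, not the shipped implementation):

```python
from typing import Optional


def redact_secrets(
    message: str,
    *secrets: Optional[str],
    min_length: int = 8,
    token: str = "***REDACTED***",
) -> str:
    """Replace every literal occurrence of each known secret with token."""
    for secret in secrets:
        # Skip None, empty, and very short values: replacing those could
        # corrupt ordinary message text.
        if not secret or len(secret) < min_length:
            continue
        message = message.replace(secret, token)
    return message
```

Matching is literal, so a URL-encoded form of the key is left alone
unless the caller passes that form as an additional secret.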

Unit tests in ``tests/security/test_log_sanitizer.py``:

- Happy path: single secret, multiple secrets, all-occurrences.
- Guards: ``None`` ignored, empty string ignored, sub-min-length
  ignored, custom min_length override.
- Boundaries: no secrets, empty message, message without any secret.
- Custom token override.
- Realistic provider key shapes (OpenAI, Google, Anthropic).
- Literal-substring-match contract (URL-encoded forms are NOT
  redacted unless the caller passes them).

google.py refactor captures the redacted message in a local before
the ``logger.warning`` call so the ``check-sensitive-logging``
pre-commit hook (which AST-checks for exception-variable references
in non-exception log calls) does not flag the line. The hook's
recommended ``logger.exception`` would defeat the entire point of
the fix.

The existing six leakage tests in
``tests/security/test_api_key_leakage.py`` remain unchanged — they
assert the leakage contract, not the implementation, so the refactor
flows underneath them.

* review: lift redact_secrets to module-level + tighten silence test

Two small follow-ups to the AI reviewer's points:

1. google.py: move ``from ....security.log_sanitizer import
   redact_secrets`` out of the except handler to module-level. The
   nested import has no circular-import or lazy-load justification
   here (ollama.py already imports ``from ....security import
   safe_get`` at module level), and lifting it eliminates the
   theoretical case where an ImportError raised while handling the
   provider exception would carry the leaked-URL ConnectionError up
   via ``__context__``. Also rewrites the inline comment so the two
   rationales (redact + drop exc_info; capture in a local for the
   check-sensitive-logging pre-commit hook) are no longer broken up
   by the import statement.

2. test_clear_entry_on_missing_does_not_leak_state was passing
   trivially because ``CredentialStoreBase.clear_entry`` is silent
   on every code path — the old assertion ``_LEAKED_KEY not in
   loguru_caplog.text`` would have held even if the test never
   exercised the method. Renamed to test_clear_entry_does_not_log_
   store_state and replaced with ``assert not loguru_caplog.records``
   so the contract being pinned is silence itself: a future
   ``logger.debug(f"store contents: {self._store}")`` regression
   would be caught immediately. Now exercises both the missing-key
   and present-key paths and seeds a second credential so a
   _store-dict dump would also leak it.

Mutation-checked: monkey-patching clear_entry to add a debug log
containing self._store flips the new test to failed; the live
implementation still passes. All 6 tests in
tests/security/test_api_key_leakage.py pass against the real code.
2026-05-17 02:40:10 +02:00
LearningCircuit
0fe3c8c5de chore(security): suppress CVE-2026-8328 (ftplib.ftpcp SSRF) until 3.14.6 (#4072)
Grype alerts on CVE-2026-8328 against python:3.14.5-slim. The
vulnerability is an SSRF in the undocumented ftplib.ftpcp() helper —
the same PASV-trust class as CVE-2021-4189, whose original 2021 fix
only patched ftplib.FTP and left ftpcp() unprotected.

Upstream merged the fix to the CPython 3.14 branch on 2026-05-13
(python/cpython#149793), three days after Python 3.14.5 was tagged.
No 3.14.6 release exists yet, so a base-image bump isn't an option.

Not exploitable here: `grep -rn "ftplib\|ftpcp" src/` returns zero
hits, and no transitive dependency imports ftplib either, so
ftpcp() is unreachable from this image.

Added to .grype.yaml in the existing python3.14 block alongside the
other CPython CVEs awaiting the next 3.14.x point release. The
suppression auto-cleans when the next Python bump picks up 3.14.6+.
2026-05-17 02:32:58 +02:00
LearningCircuit
da0d18ed25 fix(release): set towncrier name to skip package import (#4071)
The release job uses a sparse checkout that omits src/ and runs a
standalone `pip install towncrier`. Towncrier 24.8 still calls
`get_project_name()` even when --version is passed on the CLI,
and the existing [tool.towncrier] config pointed at the
`local_deep_research` package, so the build crashed with
ModuleNotFoundError before rendering any fragments.

Set `name = "local-deep-research"` so towncrier short-circuits the
import path (build.py:195-197). Drop the now-misleading
`package`/`package_dir` fields — `--version` is always passed,
`directory = "changelog.d"` is explicit, and nothing else inside
towncrier still needs them. Fix the workflow comment that
misattributed the bypass to --version.

Verified by rendering changelog.d/*.md fragments against this
pyproject.toml in a fresh directory with no src/ present.
2026-05-17 02:30:51 +02:00
LearningCircuit
b0008045df fix(security): extend IMDS absolute-block to Apprise plugin schemes (#4063)
NotificationURLValidator only ran the cloud-metadata IP guard in the
http/https branch, so URLs like signal://169.254.169.254/+1/+1 (and
the same for gotify, ntfy, mattermost, rocketchat, matrix, json, xml,
form, mailto) reached Apprise, which then POSTs to that host over
HTTP. The path sits behind the operator-only
LDR_NOTIFICATIONS_ALLOW_OUTBOUND gate, but it was a residual gap
inconsistent with the absolute-block invariant that SECURITY.md
documents.

Refactored host extraction out of the http/https branch and added an
IMDS-only check for plugin schemes (allow_private_ips=True semantics in
_is_private_ip leaves only ALWAYS_BLOCKED_METADATA_IPS and NAT64-wrapped
metadata active). LAN/loopback reach for self-hosted plugin endpoints
(the #4006 use case) is unchanged.

Test coverage:
- 100 parametrized cases: 10 plugin schemes x 5 metadata IPs x 2
  allow_private_ips values
- mailto://user@IMDS/recipient regression
- positive: signal/gotify LAN + signal localhost still allowed
- positive: token-host schemes (discord/slack/telegram/pushover/teams)
  unaffected
- DNS-resolved hostname pointing at IMDS rejected (single-resolve
  attacker; full rebinding TOCTOU remains documented residual risk)
2026-05-17 02:17:30 +02:00
LearningCircuit
6f18a711d2 docs(resource-cleanup): expand Wave 7 with full audit ledger (#4054)
* docs(resource-cleanup): expand Wave 7 with full audit ledger

Replaces the brief "follow-up gaps" bullet list with the full ledger
of what the broader audit during #4047 actually examined, split into
four scannable subsections:

- Checked and confirmed clean: non-Ollama LLM providers, HTTP session
  lifecycle, subprocess/pidfd, asyncio loops, file handles, SocketIO
  connect/disconnect.
- Flagged then verified NOT a real FD leak: OllamaEmbeddings (uses
  the deprecated langchain_community class with no httpx client),
  auth_db + journal_quality engines escaping shutdown_databases
  (bounded pools, not growing), LibraryRAGService in three RAG SSE
  endpoints (RAM churn, no FDs — FAISS uses pickle.load, embeddings
  hold no FDs per the item above, SentenceTransformer mmaps are
  process-wide singletons).
- Minor findings: daemon threads without explicit shutdown,
  abandoned-research cleanup on socket disconnect — both reaped at
  process exit, not steady-state leaks.
- Future-proofing note: ``langchain_community.embeddings.OllamaEmbeddings``
  is deprecated; the replacement ``langchain_ollama.OllamaEmbeddings``
  DOES carry ``_client`` and ``_async_client`` (verified by direct
  introspection), so when LDR migrates, the in-running-loop eventpoll
  leak class will reappear for embeddings unless ``_close_base_llm``
  is generalized.

Direct introspection done at audit time confirms each verdict:
``[a for a in dir(e) if 'client' in a.lower()]`` returned ``[]`` for
the deprecated class and a non-empty list for the new class. This
ledger saves the next contributor from re-running the same agent
sweep when investigating a future FD spike.
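
The introspection one-liner, shown against two hypothetical stand-in
classes so it runs without either langchain package installed (the
attribute names mirror the ones reported above; the classes themselves
are invented for the example):

```python
class LegacyEmbeddings:
    """Stand-in for an embeddings class that holds no HTTP client."""

    def __init__(self):
        self.model = "example-model"


class ClientEmbeddings:
    """Stand-in for an embeddings class that holds sync and async clients."""

    def __init__(self):
        self._client = object()
        self._async_client = object()


def client_attrs(obj):
    """List attribute names that mention 'client', as in the audit."""
    return [a for a in dir(obj) if "client" in a.lower()]
```

An empty list suggests there is no client object to close; a non-empty
one marks the class as a candidate for the same cleanup treatment.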

No code changes.

* docs(resource-cleanup): add Round-8 pidfd finding (fixed by #3971)

The Wave 7 ledger covered the eventpoll-FD investigation but didn't
mention the residual pidfd accumulation we discovered post-merge. A
follow-up Round-8 investigation (8 parallel agents, 2 rounds + direct
/proc inspection on a live prerelease container) traced ~3.6
pidfds/hour, steady-state ~29, to:

  _check_subscription → quick_summary
    → FullSearchResults.batch_fetch_and_extract
    → AutoHTMLDownloader fallback
    → PlaywrightHTMLDownloader._fetch_with_playwright
    → sync_playwright().start()
    → asyncio.create_subprocess_exec(node-driver)  # opens pidfd
    → driver fails (Chromium not installed in production ldr stage)
    → pidfd not closed on the failed-child exit

CPython 3.14 ruled out as a confounder: subprocess.py uses
waitpid(WNOHANG) polling, never opens pidfds. Only
asyncio.create_subprocess_* and multiprocessing.Process can open them
on Linux + Python 3.9+ via PidfdChildWatcher.

PR #3971 (already merged) addresses this from a different angle: it
makes web.enable_javascript_rendering default false, so
AutoHTMLDownloader short-circuits before invoking Playwright. No
subprocess spawned → no pidfd opened. Original motivation for #3971
was the confusing tracebacks reported in #3826; the FD-leak finding
is the second motivation, captured here so a future reader sees both.

The new bullet sits in Section B (flagged-then-verified-then-fixed)
because the leak was real but is now resolved upstream.

* docs(resource-cleanup): add FD-leak debugging playbook + CI considerations

Add a new "Debugging FD leaks — playbook for the next one" section
between the History (Waves 1-7) and "Intentionally not done" parts of
the doc, capturing the diagnostic flow we developed across Waves 6
and 7 so future contributors don't re-derive it from scratch.

Includes:

- Symptoms that justify treating an issue as an FD leak (OSError 24,
  static-asset MIME errors, High FD count warnings, healthcheck
  hangs).
- Host-side and inside-container snapshot scripts that work even when
  the container is too FD-starved for docker exec (host-side via
  sudo + /proc/$P/fd) and through the entrypoint's UID drop
  (--user 0 to docker exec).
- Lookup table mapping each anon_inode / socket / pipe / REG flavor
  to its likely Python-level source and the path to deep-dive (e.g.
  /proc/PID/fdinfo/N's Pid: line for pidfds).
- A pinpointing recipe per FD type — eventpoll (asyncio/httpx),
  pidfd (asyncio.create_subprocess / multiprocessing.Process),
  WAL/SHM (SQLCipher engine.dispose).
- Pointer to the existing in-codebase instrumentation: _count_open_fds,
  the periodic Resource monitor log, fd_monitor.py, and the
  RUN_MANUAL_SMOKE-gated tests/manual_smoke/test_fd_smoke.py harness.
- Honest discussion of why an automated per-PR FD-growth assertion is
  hard (transient FDs, CI-environment subprocess noise, namespace
  differences, slow-drip leaks needing hours of uptime) and what a
  nightly long-run job would look like if the team chooses to invest
  in one.
- A "which Wave fixed which leak class" reference table so the next
  reporter can recognize a class and skip to the relevant precedent.
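
A minimal sketch of the snapshot-and-classify step described in the list above, assuming a Linux host with `/proc` mounted. The names `classify_fd_target` and `snapshot_fds` are hypothetical and are not the helpers in the playbook or in `fd_monitor.py`; the coarse class names are illustrative.

```python
import os
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Map a /proc/<pid>/fd symlink target to a coarse FD class."""
    if target.startswith("anon_inode:"):
        # e.g. "anon_inode:[eventpoll]" or "anon_inode:[pidfd]"
        return target[len("anon_inode:"):].strip("[]")
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    # Regular files, devices, and anything else with a filesystem path.
    return "file"


def snapshot_fds(pid="self"):
    """Count a process's open FDs by class (Linux only; needs /proc)."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The FD was closed between listdir() and readlink().
            continue
        counts[classify_fd_target(target)] += 1
    return counts


if __name__ == "__main__" and os.path.isdir("/proc/self/fd"):
    print(snapshot_fds())
```

Taking two snapshots some minutes apart and subtracting the counters shows which class is growing, which is the cue for picking the matching per-FD-type recipe.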

No code changes. Pure documentation.

* docs(resource-cleanup): add development-time detection + bpftrace recipes

Extend the FD-leak debugging playbook with two industry-standard
techniques that would have caught Waves 6 and 7 earlier, drawn from
upstream Python docs and the wider production-tracing literature:

1. **bpftrace syscall-level pinpointing** (in the per-FD-type
   section). Trace pidfd_open / epoll_create1 / etc. on the host
   targeting the container's host PID; produces a histogram of every
   user stack that triggered the syscall, ranked by frequency. The
   hot stacks are the culprits. Would have caught the Playwright
   pidfd leak in seconds.

2. **Development-time detection (new subsection 4a)** — catches
   leaks at test time before they ship:
   - PYTHONASYNCIODEBUG=1 + -W default::ResourceWarning. Per the
     asyncio dev docs, unclosed transports emit ResourceWarning at GC
     time; the filter actually displays them. Would have surfaced
     the Wave 7 in-running-loop skip in any test that exercised
     ainvoke + safe_close on ChatOllama.
   - python -X dev for a one-flag local dev mode bundling
     ResourceWarning + asyncio debug + warnings as default.
   - pyproject.toml [tool.pytest.ini_options] examples for both
     "display" and "error" filter modes (with a caveat that error
     mode needs a targeted subset, not the whole suite, because
     third-party libs also emit ResourceWarning).
   - psutil's num_fds / open_files / connections as the
     cross-platform alternative to /proc/self/fd for unit tests on
     macOS dev environments.
   - tracemalloc + objgraph as the next-level tool when a leak is
     reproducible — diff allocations before/after, then render the
     reference chain holding the leaked wrapper alive.
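
As an illustration of the ResourceWarning mechanism the list above relies on, the sketch below leaks a plain file handle and records the warning. It assumes CPython and uses an ordinary file rather than an asyncio transport to stay self-contained; `leaked_file_warnings` is a hypothetical name.

```python
import gc
import os
import warnings


def leaked_file_warnings(path):
    """Open a file, drop it without close(), return the ResourceWarnings seen."""
    with warnings.catch_warnings(record=True) as caught:
        # Equivalent in spirit to running with -W default::ResourceWarning:
        # without a filter like this the warning is ignored by default.
        warnings.simplefilter("always", ResourceWarning)
        handle = open(path, "rb")
        del handle    # last reference dropped; CPython finalizes it here
        gc.collect()  # covers implementations without reference counting
    return [w for w in caught if issubclass(w.category, ResourceWarning)]


if __name__ == "__main__":
    for warning in leaked_file_warnings(os.devnull):
        print(warning.category.__name__, warning.message)
```

The same filter is what makes an unclosed transport visible in a test run; turning the filter into `"error"` would make the leak fail the test instead of printing.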

No code changes. The new tooling is recommendations only; no
mandatory pytest config change in this commit. Future work could
enable PYTHONASYNCIODEBUG=1 in the CI test environment if the
overhead is acceptable.

Citations to docs.python.org are inline for the load-bearing
ResourceWarning claim.

* test(fd-canary): pin asyncio.create_subprocess pidfd lifecycle in CI

Add ``TestAsyncioSubprocessFDBaseline`` to
``tests/utilities/test_close_base_llm.py`` with two regression tests
that run on every PR:

1. ``test_no_fd_growth_across_asyncio_subprocess_cycles`` — spawns
   ``/bin/true`` via ``asyncio.create_subprocess_exec`` 10 times and
   asserts total FD count delta ≤ +2. Pins the pidfd FD class against
   the child-watcher leak shape.

2. ``test_no_fd_growth_when_subprocess_fails_to_exec`` — same shape
   but with a deliberately-missing binary, mirroring the *exact*
   Wave-7 production failure mode (Playwright's Node.js driver being
   spawned, kernel returning ENOENT because Chromium wasn't
   installed, child watcher still expected to clean up the pidfd it
   opened *before* the failed exec).

Why this is the right level
---------------------------
LDR's own code does NOT call ``asyncio.create_subprocess_*`` (verified
in R8C1). The production leak came from a transitive dependency
(Playwright). So we cannot test LDR's call sites directly — there are
none. Instead these tests pin the *platform baseline*: on this Python
version, repeated asyncio subprocess cycles must not leak FDs. If a
future Python upgrade, a child-watcher change, or a new direct
asyncio.create_subprocess call in LDR breaks the close semantics, the
next PR's CI fails on these tests — which is the canary signal we
want.

Linux-only via ``sys.platform != "linux"`` skip. pidfd_open is a
Linux syscall; macOS uses a different watcher and Windows uses
ProactorEventLoop. Both 'pass by virtue of nothing to leak', so
restricting to Linux keeps the signal sharp (a failure on Linux is
actionable; a pass on macOS is uninformative).

Same +2 FD slack we use for the eventpoll canary above. A real
1-FD-per-iter leak across 10 iterations would land at delta=10,
well past the threshold.
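
The shape of such a canary can be sketched as follows. This is an assumption-laden illustration, not the test added in this commit: `count_open_fds` and `fd_delta_after_spawns` are hypothetical names, and the warm-up spawn is a choice made here so that one-time event-loop setup is not counted as growth.

```python
import asyncio
import os
import sys


def count_open_fds():
    """Count this process's open FDs via /proc (Linux) or /dev/fd."""
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    return len(os.listdir(fd_dir))


async def fd_delta_after_spawns(command, cycles=10):
    """Spawn `command` repeatedly and return the FD-count growth."""
    # Warm-up spawn so one-time loop/watcher setup is not counted.
    warmup = await asyncio.create_subprocess_exec(*command)
    await warmup.wait()

    before = count_open_fds()
    for _ in range(cycles):
        proc = await asyncio.create_subprocess_exec(*command)
        await proc.wait()
    return count_open_fds() - before


if __name__ == "__main__" and sys.platform == "linux":
    delta = asyncio.run(fd_delta_after_spawns(("/bin/true",)))
    assert delta <= 2, f"FD count grew by {delta} across subprocess cycles"
```

Measuring inside a single running loop keeps the loop's own epoll and self-pipe FDs out of the delta, so only per-spawn residue shows up.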

Doc reference
-------------
Updated ``docs/developing/resource-cleanup.md`` "Existing
instrumentation" section to enumerate all four in-CI FD-growth
canaries (two eventpoll, two pidfd) so future contributors see at a
glance what's already guarded and where to extend coverage when a
new leak class is found.
2026-05-16 20:01:04 +02:00
LearningCircuit
15a3df4aff fix(content-fetcher): disable JS rendering by default (#3826) (#3971)
* fix(content-fetcher): disable JS rendering by default (#3826)

The default Docker production image intentionally ships without
Chromium (Dockerfile lines 286-287), so the AutoHTMLDownloader's
Crawl4AI/Playwright fallback can never succeed for the majority of
users -- it just spawns a fresh Chromium per fetch, fails, and logs
a confusing traceback. In the issue reporter's run, 11 such failed
fallbacks fired per research on api.github.com JSON URLs.

Add a user-facing setting `web.enable_javascript_rendering` (default
false). When disabled, AutoHTMLDownloader skips the JS fallback and
returns the static result. Power users running outside Docker who
have set up Chromium can flip the toggle in the UI.

The setting is plumbed through:

- AutoHTMLDownloader.__init__ -- new enable_js_rendering=True ctor
  arg (preserves direct-caller behaviour); download() and
  download_with_result() short-circuit the JS fallback when False.
- ContentFetcher -- new enable_js_rendering=False kwarg passed
  through to the HTML/DOI downloaders.
- build_fetch_tool / _make_full_fetch_tool / _make_summary_fetch_tool
  -- accept settings_snapshot, read the bool via
  get_bool_setting_from_snapshot (so the toggle works on ToolNode
  worker threads where threading.local does not propagate), pass
  enable_js_rendering into ContentFetcher.
- LangGraphAgentStrategy -- forwards self.settings_snapshot to
  build_fetch_tool at both top-level and sub-agent callsites.
- pipeline.fetch_and_extract / batch_fetch_and_extract -- new
  enable_js_rendering=False kwarg passed through.
- FullSearchResults -- new settings_snapshot kwarg, reads the bool
  and passes it to batch_fetch_and_extract from both call paths
  (run() and _get_full_content()).
- BaseSearchEngine -- forwards self.settings_snapshot when
  constructing FullSearchResults.

Existing direct callers (tests, internal lazy-init in
_get_playwright_downloader) keep the implicit-on contract via the
True ctor default; the disable-by-default decision happens at the
factory layer.
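
The gate described above can be sketched as follows. `GatedDownloader`, `MIN_USEFUL_CHARS`, and `run_once` are hypothetical stand-ins written for illustration; the real `AutoHTMLDownloader` has its own fallback heuristics, which are not reproduced here.

```python
from typing import Callable, Optional
from unittest.mock import Mock

MIN_USEFUL_CHARS = 200  # hypothetical "static result is too short" threshold


class GatedDownloader:
    """Hypothetical stand-in: a static fetch with an optional JS fallback."""

    def __init__(
        self,
        static_fetch: Callable[[str], Optional[str]],
        js_fetch: Callable[[str], Optional[str]],
        *,
        enable_js_rendering: bool,
    ) -> None:
        self._static_fetch = static_fetch
        self._js_fetch = js_fetch
        self.enable_js_rendering = enable_js_rendering

    def download(self, url: str) -> Optional[str]:
        content = self._static_fetch(url)
        if content is not None and len(content) >= MIN_USEFUL_CHARS:
            return content  # static result is good enough
        if not self.enable_js_rendering:
            # Gate: return the static result as-is and never start a browser.
            return content
        return self._js_fetch(url) or content


def run_once(enable_js_rendering: bool):
    """Drive the sketch with mocks; return (result, js_fetch call count)."""
    static_fetch = Mock(return_value='{"short": "json"}')
    js_fetch = Mock(return_value="<html>rendered</html>")
    downloader = GatedDownloader(
        static_fetch, js_fetch, enable_js_rendering=enable_js_rendering
    )
    result = downloader.download("https://example.invalid/data")
    return result, js_fetch.call_count
```

With the flag off, a short JSON response comes back unchanged and the fallback callable is never invoked, which is the behaviour the setting is meant to guarantee on images without Chromium.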

* fix(content-fetcher): tighten JS-rendering disable per review

Address two findings from the code review of 1cd1d116c:

1. Remove unreachable ``try/except NoSettingsContextError`` wrappers
   around ``get_bool_setting_from_snapshot`` in both
   ``_read_js_rendering_setting`` helpers. ``get_setting_from_snapshot``
   only raises that exception when ``default is None``; we always pass
   ``default=False``, so the except blocks were structurally
   unreachable and also added a silent-fallback layer that conflicts
   with the project's no-fallbacks rule.

2. Flip ``AutoHTMLDownloader.__init__`` default from
   ``enable_js_rendering=True`` to ``False``. This makes the
   constructor consistent with every other layer (``ContentFetcher``,
   ``fetch_and_extract``, ``batch_fetch_and_extract``,
   ``FullSearchResults``, and the user-facing setting itself), so
   future direct callers cannot accidentally re-enable JS rendering by
   omitting the kwarg. The two existing direct callers that do exercise
   the JS-rendering fallback (the SPA-trigger unit test and the
   extraction performance benchmark) now opt in explicitly.

* test(content-fetcher): add cross-layer integration coverage for #3826

The change spans 7 source files and introduces a new kwarg on 5
constructors / functions; per-module unit tests catch regressions
within a layer but not at the boundaries. Add integration tests
that pin down the wiring between layers:

* ``tests/research_library/downloaders/test_extraction_pipeline.py``
  -- ``TestFetchAndExtractJSRenderingPlumbing`` (5 tests). Asserts
  ``fetch_and_extract`` and ``batch_fetch_and_extract`` forward
  ``enable_js_rendering`` into the ``AutoHTMLDownloader`` constructor
  for the default (False), explicit-True, and explicit-False cases.

* ``tests/web_search_engines/engines/test_full_search.py``
  -- ``TestJSRenderingForwardingFromSettingsSnapshot`` (9 tests).
  Asserts ``FullSearchResults`` reads
  ``web.enable_javascript_rendering`` from its ``settings_snapshot``
  and forwards the boolean to every ``batch_fetch_and_extract`` call,
  on both code paths (``run()`` and ``_get_full_content()``). Also
  pins the new ``settings_snapshot`` ctor kwarg's default and storage.

* ``tests/web_search_engines/test_search_engine_base.py``
  -- ``TestInitFullSearchForwardsSettingsSnapshot`` (2 tests).
  Asserts ``BaseSearchEngine._init_full_search`` forwards
  ``self.settings_snapshot`` when constructing ``FullSearchResults``,
  closing the last unverified hop in the layer chain.

All 227 tests in the touched modules pass.

* fix(content-fetcher): pin _read_js_rendering_setting return to bool

CI mypy-analysis flagged
``web_search_engines/engines/full_search.py:23`` with
``Returning Any from function declared to return "bool"``
([no-any-return]). ``get_bool_setting_from_snapshot`` is internally
typed as ``Any`` because it routes through the generic snapshot
accessor, so returning its result directly leaks ``Any`` past a
``-> bool`` signature when ``warn_return_any`` is on.

Wrap the call in ``bool(...)`` to coerce to a definite ``bool``.
Apply the same change to the sibling helper in
``advanced_search_system/tools/fetch/__init__.py`` (which mypy does
not currently check because the package is in the ``ignore until
cleaned up`` override list, but the pattern is identical and the
fix keeps the two helpers consistent).
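
A minimal sketch of the coercion pattern, assuming a generic accessor typed as returning `Any`. Both `get_setting` and `read_flag` are hypothetical; the sketch does not reproduce any string coercion the real `get_bool_setting_from_snapshot` may perform.

```python
from typing import Any, Mapping, Optional


def get_setting(
    snapshot: Optional[Mapping[str, Any]], key: str, default: Any
) -> Any:
    """Hypothetical generic accessor; its declared return type is Any."""
    if snapshot is None:
        return default
    return snapshot.get(key, default)


def read_flag(snapshot: Optional[Mapping[str, Any]]) -> bool:
    # Returning get_setting(...) directly would be reported by mypy as
    # no-any-return when warn_return_any is on; bool(...) gives the
    # checker a definite bool.
    return bool(get_setting(snapshot, "web.enable_javascript_rendering", False))
```

Note that `bool("false")` is `True`, so this wrapper is only safe when the accessor already yields a real boolean.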

* test(content-fetcher): assert no browser spawn when JS rendering disabled (#3975)

Adds an end-to-end-style regression test for issue #3826 / PR #3971
that mocks the actual library symbols Crawl4AI and Playwright are
imported from -- ``crawl4ai.AsyncWebCrawler`` (line 112 lazy import in
``_fetch_with_crawl4ai``) and ``playwright.sync_api.sync_playwright``
(line 216 lazy import in ``_fetch_with_playwright``) -- and asserts
both have zero call count when ``AutoHTMLDownloader`` /
``ContentFetcher`` runs with ``enable_js_rendering=False``.

This complements the existing unit tests that patch
``_get_playwright_downloader`` to assert it isn't called. The new
tests are stronger: if anyone ever adds a code path that bypasses
``self.enable_js_rendering`` and reaches into Crawl4AI/Playwright via
a different import path, these tests catch it.

Five regression tests across two classes:

* ``test_no_browser_when_static_returns_short_content`` -- the bug
  trigger (JSON / no-content response that previously fell through to
  JS rendering).
* ``test_no_browser_when_spa_signals_present`` -- the SPA-signal
  branch.
* ``test_no_browser_when_static_returns_none`` -- the static-fetch
  None-return branch.
* ``test_no_browser_in_download_with_result_path`` -- pins the
  ``download_with_result()`` gate (the path the agent actually uses
  via ``ContentFetcher.fetch``).
* ``test_no_browser_when_content_fetcher_disabled`` -- same
  guarantee at the public ``ContentFetcher`` boundary.

Inverse-check confirmed locally: with ``enable_js_rendering=True`` and
the same mocks, ``AsyncWebCrawler`` is invoked (call_count = 1), so
the tests fail closed if the gate ever regresses.

* refactor(content-fetcher): share JS-rendering toggle helper, gate MCP download (#3974)

Builds on PR #3971 (disable JS rendering by default) with two
clean-up items the post-merge review surfaced:

1. The ``_read_js_rendering_setting`` helper was duplicated as a
   private function in ``advanced_search_system/tools/fetch/__init__.py``
   and ``web_search_engines/engines/full_search.py``. Importing
   underscore-prefixed names from another package is a smell. Extract
   it into ``utilities/js_rendering.py`` as a public
   ``read_js_rendering_setting`` and update both callsites to import
   from there.

2. ``mcp_strategy.py:1146`` (the MCP ``download_content`` tool) was
   constructing ``ContentFetcher(timeout=...)`` without forwarding the
   ``enable_js_rendering`` kwarg. It happened to work because the
   ``ContentFetcher`` ctor defaults to ``False``, but the path was
   fragile if the default ever changed and silently ignored the user's
   setting choice. Read the bool from ``self.settings_snapshot`` (which
   ``MCPSearchStrategy.__init__`` already accepts and stores) via the
   shared helper, and pass it explicitly.

New tests:
* ``tests/utilities/test_js_rendering.py`` — unit tests for the helper
  (default, true/false from snapshot, string coercion, return-type
  pinning).
* ``tests/mcp/test_mcp_strategy.py:TestDownloadContentJSRendering`` —
  three regression tests asserting the MCP download tool forwards the
  setting from the snapshot to ``ContentFetcher`` (off, on, default
  off when no snapshot).

* test(settings): teach integrity checker to recognise get_bool_setting_from_snapshot

PR #3974 introduced ``utilities/js_rendering.py`` which consumes
``web.enable_javascript_rendering`` via ``get_bool_setting_from_snapshot``.
The static-analysis regex in ``test_no_orphaned_settings`` already
handles ``get_setting_from_snapshot`` and ``get_bool_setting`` but had
no pattern for the ``get_bool_setting_from_snapshot`` combination, so
the new setting was incorrectly flagged as orphaned.

Add the missing pattern so the test recognises the consumer. The
setting is a real, used config knob and must not be added to
``KNOWN_UNUSED``.

* docs(content-fetcher): how to enable JS rendering + honest benchmark note

User-facing settings copy + code-level parameter docstrings were
silent about (a) how to actually enable JS rendering after the
default flip and (b) what evidence backs disabling-by-default. Add
both, with explicit honesty about the empirical limits.

Concretely:

* ``changelog.d/3826.bugfix.md`` (new) — towncrier fragment so the
  next release notes mention the default change. The
  ``recommend-release-notes`` pre-commit hook had nudged us about
  this; we hadn't addressed it.
* ``web.enable_javascript_rendering`` setting description in
  ``defaults/default_settings.json`` — adds the explicit "to enable:
  install Chromium via ``playwright install --with-deps chromium``"
  step, then a transparent caveat about the evidence: our
  Chromium-on vs Chromium-off benchmark comparisons were mostly
  accidental (some dev instances had it installed, routine Docker
  runs did not), and JS rendering did not measurably improve
  research quality. Most regular benchmark runs are on Docker
  without Chromium anyway.
* ``ContentFetcher.__init__`` docstring (``content_fetcher/fetcher.py``)
  — same caveat at the code level so future readers see the same
  framing.
* ``fetch_and_extract`` / ``batch_fetch_and_extract`` docstrings
  (``research_library/downloaders/extraction/pipeline.py``) — same
  note, since these are the two functions a non-Docker user might
  call directly.
* ``utilities/js_rendering.py`` module docstring — expanded with a
  "Why disabled by default" section covering both the #3826
  mechanism and the benchmark observation.
* ``tests/settings/golden_master_settings.json`` — regenerated.

No code-behaviour change. Tests still pass (781 across settings,
js_rendering helper, fetcher, and extraction pipeline). Honest
framing of "mostly accidental, limited, no measurable improvement"
intentionally avoids overclaiming.

* chore(content-fetcher): address AI-review recommendations

Two non-blocking nits from the AI code review on PR #3971:

1. Dead parameter ``mocker_patch`` on the
   ``TestInitFullSearchForwardsSettingsSnapshot._make_engine`` helper
   in ``tests/web_search_engines/test_search_engine_base.py`` — never
   referenced inside the function body. Drop the param and update the
   two call sites; the docstring now points to where mocking actually
   happens (in the caller's ``with patch(...)`` block).

2. Add an inline comment on
   ``AutoHTMLDownloader.__init__``'s ``enable_js_rendering=False``
   default explaining the rationale (production Docker image ships
   without Chromium, see issue #3826). The ``ContentFetcher`` docstring
   already covers this for the layer above, but a direct caller of
   ``AutoHTMLDownloader`` (e.g. a test or downstream library) would
   otherwise have to chase the explanation through commit history.

No behaviour change. 62 tests pass (test_search_engine_base.py +
test_playwright_html.py).
2026-05-16 14:20:14 +02:00
LearningCircuit
5d60f3d00e chore(labels): add 'code-ready' as a human-only signal label (#4068)
Introduces a new repository label, ``code-ready``, that communicates a
human reviewer's judgement that a PR's code changes look technically
ready — i.e. the implementation, tests, docs and review nits are all
addressed — while CI and an approving codeowner review may still be
outstanding. The label is meant to bridge the gap between "needs
review" and "auto-merge": a maintainer can apply it after walking the
diff to signal that the code side is good, even though merge is still
blocked on CI runs finishing or an approver clicking the button.

Critically, this label must be **applied manually only**, never by
automation. The motivation is judgement, not heuristics — a workflow
that flips it based on "all CI green" or "no unresolved comments"
would dilute the signal and undermine the human-in-the-loop intent.
The labels.yml entry is grouped under a new "Human-only signal
labels" section with an explicit comment saying so, and the label
description itself includes "Apply manually — never auto-applied" so
the rule is visible everywhere the label surfaces.

Verified before adding:
* No existing workflow (``pr-triage.yml``, ``label-fixed-in-dev.yml``,
  ``advanced-search-reminder.yml``, ``sync-main-to-dev.yml``,
  ``danger-zone-alert.yml``, ``compose-published-smoke.yml``) applies
  ``code-ready``. Each workflow's ``addLabels(...)`` calls use a
  closed set of specific label names — no heuristic ever resolves to
  ``code-ready``.
* No naming collision with existing labels (``code-ready`` is new;
  ``auto-merge``, ``needs-codeowner-review``, ``awaiting-codeowner``
  are distinct concepts).
* Label created live on GitHub via ``gh label create`` before this
  commit; this PR brings ``labels.yml`` into source-of-truth sync.

Color: ``006b75`` (teal) — distinct from the existing yellow/green
review-state palette so it reads as a separate axis from the
codeowner-review lifecycle.
2026-05-16 14:18:09 +02:00
LearningCircuit
2723331f67 chore(ci): cut workflow-status.md regen diff noise (#4066)
The auto-regenerated workflow-status.md on every version-bump PR
produced ~15 rows of churn that wasn't signal:

- Status emoji column flipped between several glyphs (· among them)
  depending on which event last ran (e.g. backwards-compatibility
  flipped to · because
  the most recent run was a skipped workflow_call, not because it
  regressed). The live badge column to its right is the source of
  truth for current status anyway, and run history lives in GitHub
  Actions itself. Drop the column.
- Last activity buckets oscillated across this week / last week / 2
  weeks ago for healthy daily/weekly workflows. Coarsen to last 30
  days / 1-3 months ago / 3-6 months ago / long ago / never so a
  healthy workflow sits in one bucket indefinitely.
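
The coarsened bucketing can be sketched as below. The bucket labels come from the message above; the exact day boundaries (30, 90, 180) and the name `activity_bucket` are assumptions made for illustration, not the generator's actual code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def activity_bucket(last_run: Optional[datetime], now: datetime) -> str:
    """Map a workflow's last-run time to a coarse, slow-changing label."""
    if last_run is None:
        return "never"
    age_days = (now - last_run).days
    if age_days <= 30:   # assumed boundary
        return "last 30 days"
    if age_days <= 90:   # assumed boundary
        return "1-3 months ago"
    if age_days <= 180:  # assumed boundary
        return "3-6 months ago"
    return "long ago"


if __name__ == "__main__":
    now = datetime(2026, 5, 16, tzinfo=timezone.utc)
    for days in (1, 7, 29):
        # A daily or weekly workflow stays in one bucket across regenerations.
        print(days, activity_bucket(now - timedelta(days=days), now))
```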

Net effect: regenerations in steady state produce zero diff. Real
signal (new stale/disabled workflows, aging past the 30d bucket)
still surfaces.
2026-05-16 13:20:21 +02:00
LearningCircuit
8597e429cc Improve UI tests + CI: artifact uploads, WebKit skip narrowing, settle-wait migrations (#4061)
* ci(responsive): restore artifact uploads and fix dead post-results gate

The Responsive UI workflow lost its per-viewport artifact uploads (the
explanatory comment around lines 206-209), so PR/release failures were
un-debuggable - no screenshots, no test output. The downstream
`post-results` job was also gated on `github.event_name == 'pull_request'`,
which can never be true because the workflow has no `pull_request` trigger;
the combined-report aggregator therefore never ran.

Restore the upload step using `if: always()` + `if-no-files-found: ignore`
(so server-startup failures still upload logs and quiet runs don't fail
the step) and rewrite the `post-results` gate to `if: always()`. Artifact
name matches the existing `ui-test-results-*` pattern expected by the
combined-report glob.

* test(playwright): narrow WebKit closed-context skip to webkit only (#4060)

The catch at all-pages-mobile.spec.js:372 was previously calling
`test.skip(true, ...)`, which skipped the test for every browser - so any
non-WebKit error path also silently bailed out of the mobile-nav overlap
assertion. Only Mobile Safari / WebKit is known to hit the
`Target page, context or browser has been closed` race, so gate the skip
on `browserName === 'webkit'`. Other browsers now re-throw and surface the
regression.

Also broaden the matched error message to include
`Execution context was destroyed`, the alternate wording the same upstream
race uses in newer Playwright versions.

Skip annotation references issue #4060 so the skip is grep-able and can be
removed when the underlying race is fixed or the DOM walk is restructured.

* test(ui): add waitForStable helper to auth_helper.js

Replaces ad-hoc `await delay(N)` sleeps used to "let the UI settle" after
an action. The helper waits for a selector to be visible, then waits for
its bounding box to stop changing across requestAnimationFrame ticks
(bounded to 3s in-page). The final `idleMs` pause is configurable.

JSDoc explicitly notes when NOT to use it: don't replace `delay()` calls
that exercise wall-clock behavior (e.g. a 10s timer the app is supposed to
respect). Those tests need real elapsed time, not a settle wait.

Exported as a sibling of `safeClick` to keep Puppeteer test imports tidy.

* test(ui): replace settle-delays with state-based waits in two puppeteer tests

`test_research_cancellation.js` had 7 hardcoded `await delay(...)` calls
and `test_form_validation_aria_ci.js` had 19. The vast majority were
"give the UI a moment to settle" pauses with no real signal attached, so
they slowed CI and quietly hid races whenever the runner was a beat slower
than the chosen delay.

For each call:
- post-`navigateTo` 500ms sleeps -> `waitForSelector('#query', { visible: true })`
- post-validation-trigger sleeps -> `waitForFunction` polling the
  `ldr-field-invalid` class to appear (or clear, when the test expects
  validation to pass)
- post-focus 100ms -> `waitForFunction(() => document.activeElement?.id === 'query')`
- post-cancel-click sleeps -> `waitForFunction` polling for `cancel|stop|suspend`
  to appear in the status text
- post-typing 200ms -> `waitForFunction` polling for the typed value to land

The one delay we kept: the explicit 10-second wait in the mid-stage
cancellation test (`test_research_cancellation.js`), which deliberately
exercises elapsed-time behavior of the research progress flow. That is
not a settle wait and must stay wall-clock.

Polling waits all use `.catch(() => {})` to preserve existing
behavior when a selector or state never appears (the assertions further
down handle the failure case more informatively than a hung wait would).

* docs(pr-template): document label-gated CI workflows

Several heavy E2E workflows are label-gated and silently no-op on PRs
without the right label - new contributors had no way to know. Add a "CI
test coverage" section to the PR template enumerating each gated workflow
and the label that triggers it.

No CI behavior change; documentation only.

* test(form-validation): make waitForQueryReady detect validator attachment

Local smoke-test (9 tests, run against `scripts/dev/restart_server.sh`)
exposed two latent races that the prior `await delay(500)` had been
quietly hiding:

1. `waitForQueryReady` returned as soon as `#query` was visible, but the
   FormValidator class is registered against the field a tick later
   (research.js setupEventListeners). Waiting for the `.ldr-field-error`
   sibling that addValidation() inserts is the actual signal that the
   validator is wired and the submit handler will take the early-return
   path on an empty query.

2. `noLoadingUiOnEmptySubmit` ran after `errorClearsOnValidSubmit`, which
   typed a real query and triggered a real submit (the fetch fails but
   creates `.ldr-loading-overlay` first). `navigateTo` skipped the
   re-navigation because we were already on `/`, so the stale overlay
   carried over. Force a real `page.goto` for this test so it asserts
   about a fresh page, not the leftover state of the previous test.

After the fix the suite passes 9/9 in ~1s (vs ~4.5s with the old delays).

* chore(labels): rewrite test-trigger label descriptions for AI reviewer auto-apply

The Friendly AI code reviewer (.github/workflows/ai-code-reviewer.yml)
auto-applies labels based on the labels' descriptions in the repo. The
existing test:puppeteer / test:e2e / ldr_research / ldr_research_static
descriptions were passive ("Triggers Puppeteer E2E tests on this PR"),
which doesn't guide the reviewer on *when* to apply them.

Rewrite them in the same imperative, bias-toward-action style used by
benchmark-needed ("Apply if a change risks degrading performance — when
in doubt, add it. Run compare_configurations()"):

- test:puppeteer + test:e2e — apply for any PR touching the web stack
- ldr_research / ldr_research_static — apply for substantive code/arch
  changes, with the static variant biased even more toward "run it"
  since it uses the cheaper model

Also add the test:* labels to labels.yml so they become version-controlled
(previously they existed only on GitHub, created out-of-band). label-sync
is additive and will overwrite the GitHub descriptions on next main push.
2026-05-16 13:17:28 +02:00
LearningCircuit
ec91c5c716 fix(pdf): render CJK characters in exported PDFs (#4055) (#4058)
* fix(pdf): render CJK characters in exported PDFs (#4055)

The PDF stylesheet hard-coded a Latin-only font stack, so WeasyPrint
silently dropped Chinese/Japanese/Korean glyphs from downloads even
when they rendered fine in the HTML view. Add Noto Sans CJK /
Microsoft YaHei / SimSun fallbacks for both body and monospace
families, and install fonts-noto-cjk in the Docker runtime stage so
the slim base image actually has glyph coverage.

Non-Docker installs still need a CJK font package on the host.

* fix(pdf): broaden CJK font fallbacks + document host requirement

Extend the PDF CSS font stack to cover macOS (PingFang, Hiragino,
Apple SD Gothic Neo) and additional Windows families (Microsoft
JhengHei, Yu Gothic, Malgun Gothic), so pip installs on those
platforms render CJK without any user action.

Document the per-distro CJK font install command in install-pip.md
and add a new FAQ entry. Linux pip/server hosts still need
fonts-noto-cjk installed manually — there is no in-code way to fix
that without bundling ~20 MB of fonts into the wheel.

* test(pdf): assert CJK glyph embedding end-to-end (#4055)

Round-trip CJK text through markdown → PDF → pypdf extract_text so CI
fails if fonts-noto-cjk is ever removed from the Docker runtime image.
The pytest-tests job runs inside that image, so the test sees the
installed fonts; bare hosts without CJK fonts skip the assertion via
an fc-list gate.

Does not catch CSS-fallback-stack regressions on its own: fontconfig
auto-substitutes a CJK family on Linux even for a Latin-only stack.
The CSS fallbacks still matter on Windows/macOS, which CI does not
exercise — documented in the test docstring.
2026-05-16 13:12:28 +02:00
LearningCircuit
41ee83c54c test(security): SSRF edge-case coverage and weak-test cleanup (#4062)
* test(quality): strengthen weak SSRF tests

Several existing SSRF tests asserted on mock call_count / call_args or
range-membership tautologies rather than the validator's real behavior.
A regression in the underlying production code could pass these tests.

Rewrites (no production code changes):

- test_ssrf_redirect_bypass.py: replace ``test_each_hop_validated``
  in both ``safe_get`` and ``safe_post`` variants. The previous version
  asserted only on ``mock_validate_url.call_count == 3``; the new
  version exercises the real validator with a third hop pointing at
  ``http://10.0.0.5/internal`` (a private IP literal) and asserts that
  ``ValueError`` is raised before the third request is fetched.

- test_ssrf_redirect_bypass.py: replace ``test_send_respects_*`` for
  ``allow_localhost`` and ``allow_private_ips``. Previously these
  patched ``validate_url`` to return True and verified the kwargs; the
  rewrites use the real validator with IP literals so a regression in
  is_ip_blocked's flag handling would surface. Adds
  ``test_send_blocks_loopback_without_allow_localhost`` to prove the
  flag is actually gating behavior, not just being passed through.

- test_ssrf_debug_hardening.py: rewrite three of four
  ``TestFullSearchSSRFValidation`` tests to drop the ``validate_url``
  mock. Real validator blocks the metadata-IP literal
  (``169.254.169.254``) directly; the public hostname uses a DNS mock.

- test_ssrf_validator_high_value.py: rewrite ``TestGetSafeUrl``
  pass-through and unsafe-default tests to use the real validator
  (DNS mock for public host; literal RFC1918 IP for unsafe case).

- test_ssrf_validator_behavior.py: replace ``TestBlockedIPRanges``
  range-containment tautologies with ``TestPrivateIpRangesBehavior``,
  a single parametrized test that asserts ``is_ip_blocked`` returns
  True for an interior address of every entry in ``PRIVATE_IP_RANGES``
  (18 cases, covering all 15 ranges plus their wraps). Removing any
  entry from ``ip_ranges.py`` is now detected by a specific failure.

- test_ssrf_validator_extended.py: remove ``test_is_frozenset`` — a
  type-only check on ``ALWAYS_BLOCKED_METADATA_IPS``. The canonical
  exact-membership test already lives in
  ``test_ssrf_validator_high_value.py::TestConstants``.

Each rewrite was mutation-checked: e.g. removing per-hop validation
from ``safe_requests.py`` causes the redirect tests to fail with
``StopIteration`` (third hop attempted), and removing a range entry
from ``ip_ranges.py`` flips the corresponding ``TestPrivateIpRangesBehavior``
case to failure with the range label in the assertion message.
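
The interior-address check and its mutation test can be sketched with the standard library alone. `SAMPLE_RANGES`, `sketch_is_ip_blocked`, and `interior_address` are hypothetical; the list is an illustrative subset, not the project's `PRIVATE_IP_RANGES`.

```python
import ipaddress

# Illustrative subset only; the real list lives in ip_ranges.py.
SAMPLE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "169.254.0.0/16",
        "fe80::/10",
    )
]


def sketch_is_ip_blocked(ip: str, ranges=SAMPLE_RANGES) -> bool:
    """Return True when the address falls inside any listed range."""
    addr = ipaddress.ip_address(ip)
    # Membership across IP versions is simply False, so no version check.
    return any(addr in net for net in ranges)


def interior_address(net) -> str:
    """Pick an address inside the network (network address + 1)."""
    return str(net.network_address + 1)


if __name__ == "__main__":
    for net in SAMPLE_RANGES:
        assert sketch_is_ip_blocked(interior_address(net)), f"{net} not blocked"
    # Mutation check: drop 10.0.0.0/8 and its interior address slips through.
    assert not sketch_is_ip_blocked("10.0.0.1", SAMPLE_RANGES[1:])
```

Deriving the probe address from each entry, rather than hard-coding it, is what ties a failure to one specific range.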

Net: 5 files modified, +130 lines / -68 lines, 565 SSRF tests pass.

* test(security): add SSRF validator edge-case coverage

Adds eight new test classes pinning previously-uncovered branches of
``src/local_deep_research/security/ssrf_validator.py`` and
``src/local_deep_research/security/ip_ranges.py``. Each class documents
the production line(s) it exercises and the mutation it would catch.

- TestUnspecifiedIPv4Blocked — ``validate_url`` end-to-end coverage for
  ``0.0.0.0/8`` (ip_ranges.py:24). Existing tests covered only
  ``is_ip_blocked``; this pins the full parser → IP-literal → block
  path. Parametrized across three interior addresses.

- TestDnsResolutionNonGaierror — the generic ``except Exception``
  handler at ssrf_validator.py:310-312 fires when ``getaddrinfo``
  raises anything that is not a ``gaierror`` (PermissionError from a
  restricted environment, OSError, RuntimeError). Asserts the
  ``"Error during hostname resolution"`` log line and a False return.

- TestRfcForbiddenControlChars — RFC_FORBIDDEN_URL_CHARS_RE
  (ssrf_validator.py:63) contains ``\\x00-\\x1f\\x7f``. Backslash and
  ``\\x00`` were already heavily tested; this parametrizes the run
  ends ``\\x01``, ``\\x1f``, and ``\\x7f`` (DEL).

- TestAlternateIpHexForm — single-DWORD hex (``0x7f000001``) is not
  parseable by ``ipaddress.ip_address``, so the validator falls
  through to DNS via the ``except ValueError: pass`` at
  ssrf_validator.py:269-271. Mocked DNS returns the canonical
  ``127.0.0.1``, which the post-DNS check rejects.

- TestPortEdgeCases — ``:65536`` exercises the urllib3
  ``LocationParseError`` branch; ``:0`` parses but the host
  ``127.0.0.1`` is still IP-blocked.

- TestMultipleAtSignsContract — locks in that urllib3's
  ``parse_url`` resolves ``http://user:pass@127.0.0.1@1.1.1.1/`` to
  host ``1.1.1.1`` per RFC 3986 last-``@`` rule, and that the
  validator agrees. If urllib3 ever changes this, the
  parser-differential defense at ssrf_validator.py:224-228 needs
  re-validation; this test surfaces the drift.

- TestUserinfoContainsIpShape — documents that
  ``http://127.0.0.1@evil.com/`` is NOT a bypass: urllib3 reports
  host ``evil.com``, requests connects there, the ``127.0.0.1`` is
  userinfo only. Pins the urllib3 contract.

- TestIpv6ZoneIdBlocked — ``[fe80::1%eth0]`` and the
  percent-encoded ``[fe80::1%25eth0]`` form. Python 3.9+ accepts
  zone IDs in ``ipaddress.ip_address`` so the validator catches
  this directly via ``fe80::/10`` in PRIVATE_IP_RANGES (ip_ranges.py:22)
  rather than via DNS gaierror.
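
The two parsing behaviours that TestAlternateIpHexForm and TestIpv6ZoneIdBlocked rely on can be reproduced with the standard library alone. This is a minimal sketch, assuming Python 3.9 or newer (needed for zone IDs); ``parse_ip_literal`` and ``LINK_LOCAL_V6`` are hypothetical names and are not taken from ``ssrf_validator.py``.

```python
import ipaddress

# The link-local IPv6 range named in the commit message.
LINK_LOCAL_V6 = ipaddress.ip_network("fe80::/10")


def parse_ip_literal(host):
    """Return an address object, or None when host is not an IP literal."""
    try:
        return ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: a validator would fall through to DNS here.
        return None


# Single-DWORD hex is rejected by the parser, so only a post-DNS check can catch it.
hex_form = parse_ip_literal("0x7f000001")

# A scoped (zone ID) address parses directly and still matches fe80::/10.
scoped = parse_ip_literal("fe80::1%eth0")
scoped_is_link_local = scoped is not None and scoped in LINK_LOCAL_V6
```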

Also removes the previous ``TestDocumentation`` class which contained
a single ``@pytest.mark.skip`` placeholder with no assertions; the
security-model documentation lives in the module docstring.

Mutation checks performed during development:

- Remove ``0.0.0.0/8`` from PRIVATE_IP_RANGES → 4 tests fail (the 3
  TestUnspecifiedIPv4Blocked cases plus the 0.0.0.5 case in the
  parametrized TestPrivateIpRangesBehavior added in the preceding
  commit). Restored.
- Narrow RFC_FORBIDDEN_URL_CHARS_RE to drop ``\\x7f`` → 1 test fails
  (the ``\\x7f`` parametrize case). Restored.
- Remove per-hop validation from ``safe_requests.py`` → the
  ``test_each_hop_validated`` rewrites from the preceding commit
  fail. Restored.

Net: 16 new parametrized test cases across 8 classes; 565 SSRF
tests pass; no production code changes.
2026-05-16 10:02:42 +02:00
LearningCircuit
d88ba602ca test(e2e): regression for hidden context_window blocking Start Research (#3909) (#4059)
PR #4051 fixed the bug where the context_window number input had a
step="512" HTML5 constraint while living in a display:none container.
Any stored value not on the 512-grid (e.g. the reporter's 25000) failed
validation; because the field cannot be focused while hidden, the
browser silently aborted submit and the Start Research button appeared
to do nothing.

Add a Puppeteer test that pins the behavior so the constraint can't
silently come back. The test:

1. Loads the research page (cloud-provider default keeps the
   context_window container hidden).
2. Sets #context_window.value = "25000" — the exact stored value from
   the reporter's bug.
3. Asserts the container is hidden (precondition for the regression).
4. Asserts research-form.checkValidity() returns true.

No actual research is submitted — checkValidity() exercises the same
HTML5 validation path that drives the silent-abort bug, without
consuming LLM credits or interacting with the rest of the e2e flow.
If step="512" (or any other constraint that 25000 violates) is ever
re-added to the input, the test fails with a clear message pointing
back to PR #4051.
2026-05-16 10:00:44 +02:00
Rin
8de5d971d6 refactor(settings): use specialized exception classes for env settings (#3838)
Improved error observability and alignment with TRY003 standards by replacing generic ValueError with specialized exception classes. Updated type hints to use Path | str and Sequence as suggested in review.

Co-authored-by: Daniel Petti <djpetti@gmail.com>
2026-05-15 23:56:21 +00:00
Rahul
47d370c45d fix(notifications): allow signal:// Apprise scheme (#4006) (#4056)
Signal notification URLs (signal://host:port/from/to) are rejected by
NotificationURLValidator because `signal` is missing from
ALLOWED_SCHEMES. The user-facing error is "Test failed: Invalid
notification service URL", which is the Unsupported-protocol path at
notification_validator.py:246.

Apprise (the library LDR delegates to) ships a Signal notification
plugin that targets a signal-api-rest container. For non-http schemes
the validator intentionally skips the private-IP host check
(notification_validator.py:270) and lets Apprise do its own URL
parsing, so adding signal does not weaken the SSRF posture — the
LAN-host pattern in the bug report (signal://192.168.50.20:8739/…)
round-trips to Apprise unchanged.

Adds two regression tests:
- test_apprise_signal_url_accepted: end-to-end validate_service_url
  against a LAN-IP Signal URL.
- TestClassConstants gets one extra assert that "signal" is in
  ALLOWED_SCHEMES, keeping the contract list aligned with the
  other Apprise schemes the file exercises.

Closes #4006

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 19:07:13 -04:00
LearningCircuit
e77b48c813 test(e2e): tolerate brief LLM output in research export test (#4053)
The 'should export and display research output' test contains an
explicit narrative (test_deep_functionality.js:518-540) describing how
the CI release pipeline's small free-tier LLM (Gemini 2.5 Flash Lite
via OpenRouter) occasionally returns very brief, non-markdown output
even when the research workflow completes end-to-end — and that this
should be treated as a transient upstream content-quality flake, not a
code regression. That branch logs a warning instead of failing.

But the trailing assertion at the end of the same test still hard-checks
'expect(resultContent.length).to.be.greaterThan(100)', which directly
contradicts the documented tolerance — an 89-char LLM response (real
example from CI run #2385) makes the assertion fail despite the workflow
mechanics having been validated.

Drop the length assertion and keep only
'expect(resultContent).to.not.be.null', which still catches the real
regression (results page didn't render) without flaking on upstream LLM
brevity.
2026-05-15 01:46:14 +02:00
LearningCircuit
35290b2d13 fix(research-form): relax context_window step so Start Research submits (#4051)
The context_window input has min=512 max=131072 step=512 and lives in a
display:none container that is only revealed for local providers. Any
stored value not aligned to the 512-step grid (e.g. 25000) fails HTML5
validation; because the field is not focusable while hidden, the browser
silently aborts submission with no log line — the Start Research button
appears to do nothing.

Lower the step to 1 so any in-range integer is accepted. min/max still
bound the value and the saved setting is unchanged.
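
The arithmetic behind the failure can be approximated in a few lines. This sketch models the HTML5 step check for an integer input whose step base is its ``min`` attribute; ``step_mismatch`` and ``in_range`` are hypothetical helper names, and browsers implement the real check with more edge cases than shown here.

```python
def step_mismatch(value, minimum, step):
    """Approximate stepMismatch: the value must sit on the grid min + k * step."""
    return (value - minimum) % step != 0


def in_range(value, minimum, maximum):
    """Approximate the rangeUnderflow / rangeOverflow checks."""
    return minimum <= value <= maximum


# With step=512 the stored value 25000 is off the grid, so validation fails.
old_rule_rejects = step_mismatch(25000, 512, 512)

# With step=1 every in-range integer is on the grid.
new_rule_rejects = step_mismatch(25000, 512, 1)
still_bounded = in_range(25000, 512, 131072)
```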

Fixes #3909
2026-05-15 01:30:55 +02:00
LearningCircuit
1ab65609db ci(release): drop credential persistence on cleanup-changelog checkout (#4050)
The `Checkout the release commit` step in the `cleanup-changelog` job
defaulted to `persist-credentials: true`, leaving the job's GITHUB_TOKEN
in `.git/config` for the duration of the run. If any later step in this
job reads `.git/config` (artifact upload, third-party action that
prints/dumps the repo state, etc.), the token leaks. Closes the only
open `zizmor/artipacked` finding (code-scanning alert #4655).

No functional impact: the only step that needs to push is
`peter-evans/create-pull-request`, which already takes an explicit
`token:` input and does not rely on the persisted git credential helper.

Also dismissed code-scanning alert #7763 (CVE-2026-3298) via the GitHub
API — that CVE is Windows-only per PSF advisory; this image is Linux,
which Grype's package-version matcher does not account for. Alert #7764
(CVE-2026-7210) is left open as a tracking signal until Python 3.14.6
ships upstream (current latest is 3.14.5; no patched image exists yet).
2026-05-15 01:20:17 +02:00
LearningCircuit
a2f7f6ead6 fix(ci): drop environment: ci from reusable workflow (#4049)
The `environment: ci` declaration on the research job has no functional
value for LDR — the `ci` Environment has zero protection rules and zero
environment-scoped secrets (verified via gh api). All required secrets
(OPENROUTER_API_KEY, SERPER_API_KEY) are repo-level.

The decorative env attachment becomes a problem for any external repo
that calls this reusable workflow: GitHub silently auto-creates an empty
`ci` Environment in the caller's repo, polluting their environments
namespace.

Dynamic environment via expression (e.g. `environment: ${{ inputs.env || '' }}`)
isn't a viable alternative — `actions/runner` Issue #2610 documents that
expression-in-environment doesn't reliably evaluate input context, and an
empty-string value still auto-creates an empty-named environment.

Simplest correct fix is to delete the line. LDR's own callers
(issue-research.yml, e2e-research-test.yml) keep working unchanged
because they never depended on env-attached functionality. External
callers no longer get the env-pollution side effect.

This unblocks a follow-up `ldr-automations` toolkit repo that will
expose meta-reusable workflows wrapping this one for other projects.
2026-05-15 01:11:15 +02:00
github-actions[bot]
d6d9ceffac chore: auto-bump version to 1.6.11 (#3961) 2026-05-14 23:07:03 +02:00
LearningCircuit
3d0b7bb5f9 review: hoist asyncio+threading imports to module level + Wave 7 doc (#4048)
Addresses the AI Code Review nit on #4047: ``import threading`` (and
the sibling ``import asyncio``) lived inside the ``_close_base_llm``
function body. There's no circular-import or optional-dependency
reason to defer them; moving them to the top of the module improves
readability and static analysis.

Also extends ``docs/developing/resource-cleanup.md`` with a Wave 7
entry documenting:

- The in-running-loop ``aclose`` skip bug (this PR's fix).
- The healthcheck ``pidfd`` leak (Dockerfile change in the same PR).
- The three gaps the broader audit during this PR surfaced as
  follow-up rather than in-scope work: ``OllamaEmbeddings`` httpx (same
  FD class as ChatOllama, no close path in langchain wrappers),
  ``auth_db`` / ``journal_quality`` engines escaping
  ``shutdown_databases``, and three RAG SSE endpoints constructing
  ``LibraryRAGService`` before the generator without a ``finally``
  close.

Also captures the negative results from the audit (non-Ollama
providers safe via shared lru_cache, no subprocess pidfd risk, no
raw event-loop creation, all ``open()`` calls inside ``with``) so a
future contributor reading the history sees what was checked and
ruled out.
2026-05-14 22:58:57 +02:00
LearningCircuit
04de8597ec fix(llm,docker): close ChatOllama async httpx client when called from a running loop + healthcheck timeout (#4047)
* fix(llm): close ChatOllama async httpx client even when called from a running loop

Regression of #3816 with #3855's coverage gap. ``_close_base_llm`` used to
skip the async-client close when ``asyncio.get_running_loop()`` succeeded
and document that the loop owner would close instead — but no loop-owner
cleanup code exists in the project, so the inner ``httpx.AsyncClient``
(and its ``epoll_create`` FD) was silently abandoned. Long-running
deployments accumulated ``anon_inode:[eventpoll]`` FDs until the process
hit its ``ulimit -n``.

The skip path fires under the default ``langgraph-agent`` strategy too:
LangGraph dispatches some tool steps via asyncio internally, so close
calls reached from a sync ``finally`` can still land inside a live loop.

Cleanup now runs in a brief daemon thread that owns its own loop, so
``asyncio.run(aclose())`` works regardless of the caller's loop state. A
bounded 5-second ``join`` keeps it from blocking shutdown when the
Ollama server is unresponsive; if the join times out, ``_ldr_closed`` is
left unset so a later call retries the close, and a WARNING surfaces in
logs so the leak is visible instead of silent.
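
A minimal sketch of the approach described above, under stated assumptions: ``FakeAsyncClient``, ``close_async_client`` and the ``_sketch_closed`` marker are hypothetical stand-ins, not the project's ``_close_base_llm`` or ``_ldr_closed``. It shows only the mechanism: a daemon thread that owns its own event loop, a bounded ``join``, and a marker that is set only when the close finished.

```python
import asyncio
import threading


class FakeAsyncClient:
    """Hypothetical stand-in for an object that exposes an async aclose()."""

    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True


def close_async_client(client, timeout=5.0):
    """Close the client from any context; return True if the close completed."""
    if getattr(client, "_sketch_closed", False):
        return True  # idempotent: nothing left to do

    def _run_close():
        try:
            # asyncio.run() creates a fresh loop in this thread, so it works
            # even when the calling thread already has a running loop.
            asyncio.run(client.aclose())
        except Exception:
            pass  # a failing close must not crash the cleanup thread

    worker = threading.Thread(target=_run_close, name="sketch-aclose", daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        return False  # timed out: leave the marker unset so a later call retries
    client._sketch_closed = True
    return True


async def close_inside_running_loop(client):
    """Driver that calls the sync helper while an event loop is running."""
    return close_async_client(client)
```

The ``join`` blocks the calling thread for at most ``timeout`` seconds, which is the trade-off the commit accepts in exchange for not abandoning the client.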

Adds:
- A regression unit test (``test_closes_async_inside_running_loop_via_thread``)
  that calls ``_close_base_llm`` from inside an ``asyncio.run`` driver and
  asserts ``aclose`` actually ran.
- An FD-growth guard (``test_no_fd_growth_when_closed_inside_running_loop``)
  modeled on the existing ``test_no_fd_growth_across_repeated_close_cycles``
  but exercising the in-loop close path.
- An idempotency test and a timeout test for the new thread path.

* fix(docker): add timeout to healthcheck urlopen so failed checks don't leak children

``urllib.request.urlopen('http://localhost:5000/api/v1/health')`` had no
``timeout=`` argument, so when the app slowed down (FD exhaustion, slow
DB checkpoint, anything else) the call hung forever. Docker's
``--timeout=10s`` only SIGKILLs the ``sh -c`` wrapper; the python child
got reparented to PID 1 and kept hanging on the urlopen, each one
contributing a ``pidfd`` and a TCP socket against the app's listen
socket. On a stuck container we observed 21 live + 113 zombie
healthcheck pythons and 64 ``pidfd`` FDs on PID 1.

``timeout=8`` lets urlopen return/raise inside Docker's 10s budget so
the child exits cleanly and gets reaped.
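
For illustration, a healthcheck probe with an explicit timeout might look like the sketch below. The ``healthcheck`` function and its injectable ``opener`` parameter are assumptions made for this example (the injection only keeps the sketch testable without a network); the actual Dockerfile command is not reproduced here.

```python
import urllib.request


def healthcheck(url, timeout=8.0, opener=urllib.request.urlopen):
    """Return 0 when the endpoint answers HTTP 200 within timeout seconds, else 1."""
    try:
        # The timeout bounds the whole blocking call, so the process can exit
        # before an outer 10-second budget kills its parent shell.
        with opener(url, timeout=timeout) as response:
            return 0 if response.status == 200 else 1
    except Exception:
        # Connection errors, HTTP errors and timeouts all count as unhealthy.
        return 1
```

A script entry point would typically pass the result to ``sys.exit`` so Docker sees a non-zero status on failure.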

Pairs with the eventpoll-FD fix in ``_close_base_llm``: that one
removed the dominant 91% of the leak, this one removes the 6%
remainder and the zombie pile-up.

Adds a towncrier fragment covering both fixes.
2026-05-14 22:50:40 +02:00
LearningCircuit
1651587d9c chore(alembic-runner): drop stale isolation_level="IMMEDIATE" references (#4039)
Two docstring/comment references in `alembic_runner.py` cite SQLCipher's
`isolation_level="IMMEDIATE"` as the reason the head short-circuit
matters. Production engines actually use `isolation_level=""` (deferred):
- `src/local_deep_research/database/encrypted_db.py:378` (user-DB engine)
- `src/local_deep_research/database/encrypted_db.py:450` (encrypted engine)

The `IMMEDIATE` default in `_make_sqlcipher_connection` (line 280) is
the helper-function default, but the production callers override it
to "" to avoid login-path contention.

The short-circuit is still load-bearing — `engine.begin()` opens a
write transaction regardless of isolation level, and SQLite takes a
RESERVED lock as soon as the first DML lands inside. Only the cited
mechanism was wrong. Rewords both comments to reflect the actual
lock-acquisition rule (RESERVED on first DML), independent of the
driver isolation_level.
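
The lock-acquisition rule can be observed with the standard-library ``sqlite3`` module. This is a sketch under assumptions: it uses plain SQLite rather than SQLCipher, the default rollback-journal mode, and a hypothetical ``reserved_lock_demo`` helper. A second connection probes for the write lock before and after the first connection's first DML statement.

```python
import os
import sqlite3
import tempfile


def reserved_lock_demo():
    """Return (writer_ok_before_dml, writer_ok_after_dml) for a second connection."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # isolation_level=None: the module issues no implicit BEGIN.
    # timeout=0: fail immediately instead of waiting on a busy database.
    first = sqlite3.connect(path, timeout=0, isolation_level=None)
    second = sqlite3.connect(path, timeout=0, isolation_level=None)
    first.execute("CREATE TABLE t (x INTEGER)")

    def second_can_write():
        try:
            second.execute("BEGIN IMMEDIATE")  # requests the RESERVED lock up front
        except sqlite3.OperationalError:
            return False
        second.execute("ROLLBACK")
        return True

    first.execute("BEGIN")  # deferred transaction: no lock held yet
    before = second_can_write()
    first.execute("INSERT INTO t VALUES (1)")  # first DML takes RESERVED
    after = second_can_write()

    first.execute("ROLLBACK")
    first.close()
    second.close()
    return before, after
```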

Pure documentation change — no behavior delta. Existing short-circuit
tests still pass.
2026-05-14 17:29:35 +02:00
LearningCircuit
a6287a4362 fix(security): pin towncrier to exact version and bump Python to 3.14.5 (#4046)
* fix(security): resolve Scorecard pin alerts and bump Python to 3.14.5

- Pin `pip install towncrier` to a single version with `--hash` (both
  occurrences in release.yml), resolving Scorecard Pinned-Dependencies
  alerts #7761 and #7762.
- Bump the Dockerfile base image from python:3.14.4-slim to 3.14.5-slim
  (with new pinned manifest digest). 3.14.5 bundles libexpat 2.8.0
  (gh-149017), which is required to mitigate CVE-2026-7210 — Grype
  alert #7760.

* chore(release): drop hash-pins on towncrier, keep exact version pin

Per review feedback: hash-pinning a build-time CLI like towncrier adds
maintenance burden without meaningful supply-chain benefit. The rest of
this repo already uses exact-version pins (`pdm==2.26.2`, `pyyaml==6.0.3`,
etc.) which Scorecard's PinnedDependenciesID rule accepts — the original
alerts fired only because `~=24.8` is a fuzzy version range.
2026-05-14 17:24:19 +02:00
LearningCircuit
f664221ce4 chore(observability): surface WAL-dispose failures + document LDR_APP_DEBUG sensitivity (#4042)
Two small follow-ups from the #3976 investigation.

connection_cleanup.py: bump dispose-failure log from debug to warning.
The 30-min periodic pool dispose at web/auth/connection_cleanup.py:154-171
is the workaround for ADR-0004's SQLCipher + WAL handle leak. Pre-fix,
_checkpoint_wal/engine.dispose() failures were swallowed at logger.debug,
hiding silent drift. Now surfaces at WARNING with the exception TYPE NAME
only (matches the _report_silent_exception pattern in utilities/log_utils.py:146-194,
which deliberately drops the exception value to avoid leaking sensitive locals
through the sensitive-logging hook).

New test test_dispose_failures_surface_as_warnings locks in:
- the warning fires and names the user + exception type
- the exception's message text does NOT leak

docs/CONFIGURATION.md: document that LDR_APP_DEBUG=true also enables
Loguru diagnose=True on every sink, which materialises local-variable
values into exception traces. Those traces can include credentials,
decrypted user content, and other sensitive locals. Documentation-only.

Refs: #3976
2026-05-14 15:26:33 +02:00
github-actions[bot]
f928f4cc5c 🤖 Update dependencies (#4043)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-05-14 14:04:25 +02:00
Ishitta
2808f0fa9d feat(benchmarks): add statistical functions module (#4029)
* feat(benchmarks): add statistical functions module for benchmark evaluation

* test(benchmarks): add unit tests for statistics module

* fix(benchmarks): add input validation to statistical functions

* feat(benchmarks): wire Wilson CI into metrics, reports, and live progress
2026-05-14 09:00:04 +00:00
LearningCircuit
074285a26d fix(release): enrich AI release notes + render changelog in release flow (#4035)
* fix(release): enrich AI release notes + render changelog in release flow

Fixes the v1.6.10 release notes degradation where:
  1. docs/release_notes/1.6.10.md was never created (no automation rendered
     changelog.d/ fragments before/at release time)
  2. AI summary call returned 2xx but empty content with finish_reason=length

create-release job now:
  - Sparse-checks-out changelog.d/ + pyproject.toml, installs towncrier
    (no PDM needed — towncrier reads pyproject directly), renders
    docs/release_notes/<version>.md before composing the release body.
    Guards against an empty fragment directory.
  - Fetches every merged PR's title + body in a single GraphQL round-trip
    and feeds them to the model.
  - Fetches the full diff between the previous /releases/latest tag and
    the new tag via the compare API, filters lockfiles/generated docs/
    SBOM/static assets/binary patches, caps at 700k chars, strips NUL
    bytes before jq --rawfile.
  - Bumps AI_MAX_TOKENS default 4000 -> 64000 (matches the AI code
    reviewer's working budget). Adds AI_REASONING_MAX_TOKENS=16000 so
    Kimi K2 Thinking cannot burn the entire output budget on reasoning
    tokens — the root cause of v1.6.10's empty .content.
  - Adds .reasoning to the response-parsing fallback chain after
    .content and .reasoning_content. OpenRouter normalizes Moonshot's
    thinking trace to .reasoning (not .reasoning_content), which is why
    v1.6.10's diagnostic showed message keys "content, reasoning,
    reasoning_details" with no usable extraction path.
  - Enforces a 750k char overall prompt cap so PR descriptions + diff
    can't blow Kimi's 262k token context window.
  - Truncates the final release body to 124,400 chars to stay under
    GitHub's documented 125k release-body limit (HTTP 422 otherwise;
    gh CLI does not pre-validate).
  - Rewrites the SUMMARY_PROMPT to ask for a helpful narrative (not a
    TL;DR), with length sized to the material.

New cleanup-changelog job opens a PR on main with the consumed fragments
+ rendered release-notes file, since the create-release runner is
throwaway. Branch protection on main allows the PR (0 required reviews,
0 required checks).

* chore(release): persist 1.6.10 changelog render + clear consumed fragments

The v1.6.10 release shipped without docs/release_notes/1.6.10.md because
no automation rendered changelog.d/ fragments at release time (see
release.yml change in this PR for the fix going forward). Persists the
render now so 1.6.11's release does not re-consume the same fragments.

Renders the v1.6.10 release_notes file from the 30 fragments that were
in changelog.d/ at v1.6.10 cut time, and removes those fragments from
changelog.d/. The rendered content also backs the v1.6.10 GitHub
release body update.

* fix(release): address AI review findings (UTF-8, race, GraphQL cap)

- UTF-8 character-aware truncation. Replace `head -c` (byte-oriented,
  splits multi-byte UTF-8 mid-sequence) with Python-based character
  truncation for the diff (700k), prompt (750k), and release body
  (124,400) caps. Matters because towncrier renders emoji section
  headers (💥/🔒//🐛) that appear in diffs of docs/release_notes/;
  mid-emoji splits produce invalid UTF-8 that jq --rawfile then
  refuses to encode and the GitHub Release API rejects with HTTP 422.

- cleanup-changelog race fix. Pin checkout to ${{ github.sha }}
  instead of `ref: main`. If a PR with new fragments merged into main
  between create-release and cleanup-changelog, `ref: main` would
  consume those new fragments into THIS release's docs/release_notes
  file and delete them prematurely — stealing them from the next
  release. github.sha is the commit the workflow ran against, so the
  set of fragments matches what create-release rendered.

- GraphQL query node-count cap. Limit PR-description batch to 100 PRs
  per query and log a warning if a release exceeds that (LDR's typical
  release is ~20-30 PRs, well under). Unbounded fan-out could trip
  GitHub's GraphQL complexity ceiling on a huge release.

- Compare API 300-file warning. Log when .files[] hits the 300-file
  boundary so a future release's missing-file diff can be diagnosed
  quickly without rerunning. The cap is a documented GitHub limit.
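
The difference between byte-oriented and character-aware truncation from the first item above can be shown in a few lines. This is a sketch with hypothetical helper names (``truncate_bytes``, ``truncate_chars``, ``is_valid_utf8``); the workflow's actual truncation code is not reproduced here.

```python
def truncate_bytes(text, limit):
    """Cut by bytes, as head -c does; may split a multi-byte character."""
    return text.encode("utf-8")[:limit]


def truncate_chars(text, limit):
    """Cut by code points, so the result is always valid text."""
    return text[:limit]


def is_valid_utf8(data):
    """Report whether a byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


sample = "🐛 Bug fixes"  # the emoji occupies four bytes in UTF-8
byte_cut = truncate_bytes(sample, 2)  # stops in the middle of the emoji
char_cut = truncate_chars(sample, 2)  # keeps the emoji and the space
```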

* fix(release): address review2 — PR cap, trap leak, base pin, prompt clarity

- Raise PR-fetch cap 100 → 200. v1.6.10 had 144 unique PRs (LDR's
  dependency-bump traffic is heavy); the previous 100 cap would have
  silently dropped ~30% of PR descriptions from the AI prompt. The
  750k-char overall prompt cap still protects context window.

- Hoist COMPARE_JSON mktemp above the trap registration so the temp
  file is cleaned up even if jq throws under set -e between mktemp
  and the manual rm. ${DIFF_FILE}.clean (the NUL-strip staging path)
  is also added to the trap; rm -f tolerates the missing-file case.

- Pin base: main on peter-evans/create-pull-request. On tag-triggered
  runs github.sha may not sit on main HEAD, and the action's
  default-branch resolution could pick a non-main base. We always
  want the cleanup PR to target main.

- Clarify SUMMARY_PROMPT section markers. The prior text said inputs
  are "separated by `----- SECTION -----` markers" using SECTION as a
  placeholder; a literal-minded model could look for that exact
  string and find none. Now lists the actual marker forms explicitly.

- Add PREV_TAG == RELEASE_TAG guard. On a workflow re-run after the
  release exists, /releases/latest returns the just-created tag,
  making the diff empty. Falls back to the second-most-recent stable
  release.

* fix(release): jq --arg for re-run guard + surface jq errors + doc updates

Workflow fixes from a final pass:

- Re-run guard now passes RELEASE_TAG to jq via `--arg rel` instead of
  shell-interpolating it into the program text. RELEASE_TAG is already
  validated as bare semver upstream so this is defense-in-depth, but
  --arg keeps shell quoting and jq quoting fully separated regardless
  of what RELEASE_TAG ever ends up containing.

- Compare-API jq pipeline no longer swallows stderr or masks the exit
  code. Previously `jq ... 2>/dev/null || true` would silently produce
  an empty diff and a "Diff size: 0 bytes" log line on any jq failure,
  giving a maintainer no actionable signal. Now an explicit if-not
  check logs a WARNING with jq's stderr intact and ensures the diff
  file is empty.

Doc updates for the new release flow:

- changelog.d/README.md: drop the obsolete "maintainer runs `pdm run
  towncrier build`" instructions; describe the automated render +
  follow-up cleanup PR. Keep the local --draft / --keep preview tips
  for fragment iteration.

- docs/RELEASE_GUIDE.md: rewrite the maintainer flow (steps 1-3 of the
  old "Render + bump + commit both" sequence are obsolete — the
  workflow handles rendering now). Add the cleanup PR merge as a final
  checklist item. Update the body composition description from "AI
  TL;DR" to AI narrative with diff + PR-body inputs.

* style(release): fix comment indent typo from prior edit
2026-05-14 10:17:31 +02:00
LearningCircuit
e6432db8bd fix(embeddings): correct OpenAIEmbeddingsProvider.requires_api_key to False (#4036)
Follow-up to #4026. After that PR the provider supports keyless
OpenAI-compatible local servers (LM Studio, vLLM, llama.cpp) — an
API key is needed only for the OpenAI cloud path. The class-level
``requires_api_key = True`` was therefore stale; any future UI consumer
that gates an "API key required" badge on it would mislead users on
local servers.

Drop the explicit override so the attribute inherits ``False`` from
BaseEmbeddingProvider. The cloud-needs-key rule is still enforced at
runtime in ``is_available`` and ``create_embeddings`` when no base_url
is configured, so nothing about the active behavior changes.

No behavior change for current callers: there is no embedding-side
consumer of this attribute today. The fix keeps a latent semantic
inaccuracy from biting the first future consumer.
2026-05-14 09:08:26 +02:00
kwhyte7
df8657adb5 Feat/deepseek provider (#3432)
* feat: deepseek provider

* fix: address review comments on deepseek provider

- Fix typo in import (loggere -> removed unused import)
- Fix typo in model name (deepseek-reasonser -> deepseek-reasoner)
- Fix base URL (api.deepseek.com/api/v1 -> api.deepseek.com/v1)
- Remove standalone functions; auto-discovery handles registration
- Add requires_auth_for_models to match other cloud providers
- Add deepseek_settings.json for the llm.deepseek.api_key default setting

---------

Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
Co-authored-by: Daniel Petti <djpetti@gmail.com>
2026-05-13 23:53:35 +00:00
qWait
9ad3910452 fix(search): keep cross-engine filter fallback within evaluated context (#3866)
* fix(search): keep cross-engine filter fallback within evaluated context

* style(search): apply ruff format for context fallback fix
2026-05-13 23:17:36 +00:00
LearningCircuit
2ca4f02e6a docs(developing): add prerelease Docker image testing section (#4034)
Document the two Docker Hub tags published by prerelease-docker.yml
(the immutable prerelease-vX.Y.Z-<sha> tag and the floating :prerelease
tag added in #4005) and provide a copy-pasteable docker-compose service
that runs the RC alongside production on port 5001 with isolated
volumes, so a broken migration in the candidate cannot damage a
production SQLCipher database.
2026-05-14 00:24:00 +02:00
Leoy
243d2b2a7f fix(embeddings): allow OpenAI-compatible local endpoints (#3883) (#4026)
* fix(embeddings): allow OpenAI-compatible local endpoints (#3883)

Adds the OPENAI member to the EmbeddingProvider enum, registers the
embeddings.openai.* settings so the UI can surface the configuration
form, and widens the provider's availability + create_embeddings path
to accept a base_url-only configuration (LM Studio, vLLM, llama.cpp).
The model-list lookup now routes through the configured base_url so
discovery hits the local server instead of api.openai.com.

No DB migration is required: the embedding_model_type column is
declared with values_callable, so SQLite renders it as plain VARCHAR
with no CHECK constraint — adding the OPENAI enum value is a pure
Python-side change.

Fixes #3883

* test(settings): regenerate golden master for new embeddings.openai.* keys

Picks up the four embeddings.openai.* keys (api_key, base_url, model,
dimensions) registered by settings_openai_embeddings.json in this PR.

Generated via scripts/dev/regenerate_golden_master.py — no manual edits.

* fix(embeddings): annotate openai params dict for mypy invariance

The params dict at openai.py:121 holds heterogeneous values:
str for model/api_key/base_url, int for dimensions. mypy infers
Dict[str, str] from the initial literal and rejects the int assignment
plus the **params unpack into OpenAIEmbeddings (6+ errors at line 133,
"dict is invariant").

Explicit Dict[str, Any] annotation resolves it — same shape this file
already uses for client_kwargs at line 197.
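
A minimal sketch of the annotation pattern, assuming a hypothetical ``build_params`` helper rather than the code at ``openai.py:121``: without the explicit annotation, mypy infers ``Dict[str, str]`` from the initial literal and flags the later ``int`` assignment.

```python
from typing import Any, Dict, Optional


def build_params(
    model: str,
    base_url: Optional[str] = None,
    dimensions: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments whose values have different types."""
    # Explicit annotation: values may be str or int, so Any is the value type.
    params: Dict[str, Any] = {"model": model}
    if base_url:
        params["base_url"] = base_url
    if dimensions is not None:
        params["dimensions"] = dimensions  # int value, accepted thanks to Any
    return params
```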

---------

Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
Co-authored-by: Daniel Petti <djpetti@gmail.com>
2026-05-13 21:11:43 +02:00
d 🔹
8c59082c30 feat(errors): friendly runtime messages for OpenAI-compatible endpoints (#4027)
* feat(errors): friendly runtime messages for OpenAI-compatible endpoints (#3878)

Wrap Site B in research_service.run_research_process so that when a request
to an OpenAI-compatible LLM endpoint (LM Studio / vLLM / llama.cpp server /
OpenRouter / custom endpoint) fails at runtime, the surfaced error names the
provider, configured base URL, and model.

The helper lives in error_handling/openai_compat_errors.py and:

  * walks __cause__/__context__ to find the underlying openai.* / httpx.*
    class through any LangChain wrapper (cycle-guarded);
  * dispatches to seven new tokens that slot into the existing
    "Error type: <code>" convention: openai_connection_refused,
    openai_timeout, openai_auth, openai_permission_denied,
    openai_model_not_found, openai_bad_request, openai_unknown;
  * always appends the original exc!s as a "Details:" suffix so no
    information is lost;
  * strips userinfo from base URLs before display (no API-key leaks when a
    user embeds the key in the URL).
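
Two of the behaviours listed above, the cycle-guarded walk over ``__cause__``/``__context__`` and the userinfo stripping, can be sketched with the standard library. ``walk_exception_chain`` and ``strip_userinfo`` are hypothetical names for this illustration; the helper in ``error_handling/openai_compat_errors.py`` may be structured differently.

```python
from urllib.parse import urlsplit, urlunsplit


def walk_exception_chain(exc):
    """Yield exc and its chained causes, stopping if the chain loops back."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        yield exc
        # Prefer the explicit cause, fall back to the implicit context.
        exc = exc.__cause__ or exc.__context__


def strip_userinfo(url):
    """Return the URL without any user:password@ part in its authority."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # re-bracket IPv6 literals
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def chain_type_names(exc):
    """Convenience helper: class names along the chain, outermost first."""
    return [type(e).__name__ for e in walk_exception_chain(exc)]
```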

Sites B and C and ErrorReporter all learn the new tokens; existing Ollama
and ad-hoc connection branches are untouched, so non-OpenAI-compatible
providers see no behaviour change.

Tests construct openai / httpx exceptions directly (no network) and cover
all five acceptance criteria from the issue plus the seven token round-trips
through ErrorReporter.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: address djpetti feedback on PR #4027

- mention --network=host in _DOCKER_HINT
- hoist openai/httpx imports to module top (drop risk-averse try/except)
- hoist openai_compat_errors import to research_service.py top

* deps: promote openai and httpx to direct dependencies

error_handling/openai_compat_errors.py imports openai and httpx at module
top-level, but both were only present transitively via langchain-openai.
Pin them as direct deps so a future langchain-openai refactor cannot break
the error_handling module at import time.

---------

Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Daniel Petti <djpetti@gmail.com>
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
2026-05-13 20:50:35 +02:00
LearningCircuit
b20786c62c test(migrations): pin invariants from PR #4000 multi-round review (#4033)
Adds three regression tests that each fail on `main` (pre-fix) and pass
with the runner-level changes in this PR. Surfaced by a 30+ subagent
multi-round review of the existing test coverage; deferred dozens of
proposed tests that overlapped with existing coverage or tested
SQLite/Alembic internals rather than our code.

1. `test_run_migrations_skips_upgrade_when_at_head` — extended.
   Mocks now cover not just `command.upgrade` but also the new
   `_drop_orphan_alembic_temp_tables` and `_disable_fk_for_migration`
   helpers. Pins that the short-circuit happens BEFORE engine.connect()
   and the FK toggle. If a future refactor moves the short-circuit
   below the orphan-cleanup or FK toggle, this test fails — the existing
   command.upgrade mock alone would not catch that.

2. `test_run_migrations_drops_multiple_orphan_temp_tables` — new.
   Seeds three orphan `_alembic_tmp_*` tables and asserts all are
   cleaned in one pass. Targets the loop body in
   `_drop_orphan_alembic_temp_tables`; the existing single-orphan test
   would still pass if the loop ever short-circuited after the first
   iteration.

3. `test_drop_orphan_temp_tables_no_op_when_none_present` — new.
   Direct unit test on `_drop_orphan_alembic_temp_tables` against a
   clean DB. Pins the `if not temp_tables: return` early-return guard —
   a future refactor that unconditionally logs/scans would be caught.
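
The behaviour pinned by tests 2 and 3 can be sketched against an in-memory SQLite database. This is an illustration under assumptions, not the project's `_drop_orphan_alembic_temp_tables`: the function name, return value, and table names below are hypothetical.

```python
import sqlite3

TEMP_PREFIX = "_alembic_tmp_"


def drop_orphan_temp_tables(conn: sqlite3.Connection) -> list[str]:
    """Drop every leftover _alembic_tmp_* table; return the names dropped."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    # Filter in Python: '_' is a single-character wildcard in SQL LIKE.
    temp_tables = [name for (name,) in rows if name.startswith(TEMP_PREFIX)]
    if not temp_tables:
        return []  # early-return guard: clean DB, nothing to do
    for name in temp_tables:
        quoted = '"' + name.replace('"', '""') + '"'
        conn.execute(f"DROP TABLE IF EXISTS {quoted}")
    return temp_tables


conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
conn.execute("CREATE TABLE journals (id INTEGER PRIMARY KEY)")
for suffix in ("journals", "papers", "settings"):
    conn.execute(f"CREATE TABLE {TEMP_PREFIX}{suffix} (id INTEGER)")

dropped = drop_orphan_temp_tables(conn)   # all three orphans in one pass
remaining = [
    n for (n,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
]
second = drop_orphan_temp_tables(conn)    # clean DB: no-op
```

Seeding three orphans rather than one is what distinguishes a correct loop from one that stops after the first iteration.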

Out of scope (verified by Round 5 cross-verification):
- foreign_key_check after upgrade: already covered (lines 4632, 4831).
- Data preservation 0001→head: already covered (lines 1518, 1871).
- Run twice no-op: covered by `test_idempotent_migrations` (line 194).
2026-05-13 20:11:08 +02:00
github-actions[bot]
3ade4b4103 🤖 Update dependencies (#4031)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-05-13 19:23:23 +02:00
LearningCircuit
c2a47a83b3 fix(db): unblock multi-migration upgrades blocked by FK mismatch + orphan _alembic_tmp_* tables (#4000)
* fix(db): unblock multi-migration upgrades — toggle FK + scrub orphan temp tables outside the alembic transaction

Closes #3990 and unblocks #3817. Real users at revisions 0001–0005
upgrading to 0009 hit two failure modes that left them unable to
log in:

1. **`foreign key mismatch — "download_attempts" referencing "download_tracker"`** (#3990)

   Migration 0007's defensive `PRAGMA foreign_keys = OFF` is silently a
   no-op once the sqlite3/sqlcipher3 driver has auto-begun the migration
   transaction (per sqlite.org/pragma.html#pragma_foreign_keys). With the
   chained 0002–0006 upgrade, earlier migrations issue DML before 0007
   runs, freezing FK in the connect-time ON state for the rest of the
   upgrade. The orphan-scrub `DELETE FROM download_attempts ...` then
   fails with "foreign key mismatch" because the pre-fix
   `download_tracker.url_hash` lacks the UNIQUE backing the FK requires
   for the cascade machinery to compile.

   The fix issues `PRAGMA foreign_keys = OFF` in
   `alembic_runner.run_migrations` BEFORE opening the migration
   transaction (via `exec_driver_sql`, which doesn't trigger driver
   auto-begin), then re-enables FK on the same connection after the
   upgrade commits and before the connection returns to the pool — so
   subsequent checkouts see the production-default ON state.

2. **`table _alembic_tmp_journals already exists`** (#3817)

   `op.batch_alter_table` rebuilds a table by creating
   `_alembic_tmp_<table>`, copying data, dropping the original, and
   renaming. On a clean run alembic drops the temp table automatically.
   If a previous attempt failed in a way that bypassed transaction
   rollback (e.g., an older migration runner that auto-committed each
   migration), the temp table persists and the next attempt fails with
   "table _alembic_tmp_* already exists".

   The fix drops orphan `_alembic_tmp_*` tables in
   `alembic_runner.run_migrations` before opening the migration
   transaction. This runs at the SQLite level under autocommit; if a
   concurrent run_migrations is mid-batch_alter_table, our DROP blocks
   on the SQLite write lock until the rename consumes the temp table,
   making our DROP IF EXISTS a no-op — the race is benign.
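
The root cause of failure mode 1 can be reproduced with the standard-library `sqlite3` module. This sketch uses a plain autocommit connection and explicit `BEGIN`/`COMMIT` rather than SQLAlchemy's `exec_driver_sql`, so it illustrates the SQLite rule only, not the runner's actual code.

```python
import sqlite3

# isolation_level=None puts the driver in autocommit mode, so transactions
# are opened only by the explicit BEGIN below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")   # connect-time default in production

conn.execute("BEGIN")
conn.execute("PRAGMA foreign_keys = OFF")  # silently ignored inside a transaction
inside = conn.execute("PRAGMA foreign_keys").fetchone()[0]
conn.execute("COMMIT")

conn.execute("PRAGMA foreign_keys = OFF")  # no open transaction: takes effect
outside = conn.execute("PRAGMA foreign_keys").fetchone()[0]

conn.execute("PRAGMA foreign_keys = ON")   # restore before reuse
restored = conn.execute("PRAGMA foreign_keys").fetchone()[0]
```

Here `inside` stays at 1 while `outside` becomes 0, which is why the toggle has to happen before the migration transaction opens and be undone on the same connection afterwards.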

Tests: two new fixture-driven regression tests
(`TestUpgradeFromBuggyV16xUserDbProductionEngine`,
`TestOrphanAlembicTempTableCleanup`) reproduce the production failure
modes verbatim — `isolation_level=""` matching the sqlcipher3 engine
in `encrypted_db.py`, FK ON at connect via the same event handler
`apply_performance_pragmas` installs, and a chained 0005→head
upgrade so DML auto-begins before 0007. Both tests fail without the
runner fix with the exact production error messages and pass with it.

Migration 0007's misleading comment ("no DML has opened the implicit
transaction yet") is also corrected — that statement was true when
the migration was written against a single-revision test fixture but
never held for real multi-migration upgrades.

* test(no-raw-sql): allow alembic_runner.py — same exception class as initialize.py

`alembic_runner.py` is migration infrastructure (drops orphan
`_alembic_tmp_*` tables in #3817, toggles `PRAGMA foreign_keys` in
#3990). The single `DROP TABLE IF EXISTS` f-string trips the
`["\']DROP\s+TABLE\s+'` regex in the raw-SQL guard. Add the file to
the same exclusion list `database/initialize.py` lives in — both are
catalog-derived DDL on migration infrastructure, not application
code touching user-controllable SQL.

Precedent: commit 0b82064fd added `database/initialize.py` with the
same justification.

The catalog-derived identifier in `_drop_orphan_alembic_temp_tables`
already carries `# noqa: S608` and `# bearer:disable` markers, so
static analysis (ruff/bearer) still flags any new violations in the
file — the test exclusion only suppresses the project-local raw-SQL
guard.
v1.6.10
2026-05-13 00:13:02 +02:00
LearningCircuit
048e58905a chore(deps): bump urllib3 to 2.7 for CVE-2026-44431 and CVE-2026-44432 (#4028)
Fixes two high-severity vulnerabilities:
- CVE-2026-44431: sensitive headers forwarded across origins in proxied low-level redirects
- CVE-2026-44432: decompression-bomb safeguards bypassed in streaming API
2026-05-13 00:08:50 +02:00
LearningCircuit
f7f427bff7 feat(citation): source-tagged citations with global counter (#4012)
* feat(citation): source-tagged citations with global counter

Add ``CitationMode.SOURCE_TAGGED_HYPERLINKS`` and set it as the
default ``report.citation_format``.

## What changes for users

Reports now render citations as ``[arxiv-1]``, ``[openai.com-2]``,
``[arxiv-3]`` — the source tag identifies *what kind* of source each
citation is, while the number is the original bibliography-order
global counter. Compared to the previous ``DOMAIN_ID_*`` modes, the
suffix is **not** a per-domain counter, so labels never collide and
clicking from inline text to the source list is unambiguous.

Source-tag resolution order:

1. ``URLClassifier``-recognised academic sources use the short enum
   value: ``arxiv``, ``pubmed``, ``pmc``, ``semantic_scholar``,
   ``biorxiv``, ``medrxiv``, ``doi``.
2. Generic web URLs fall back to the cleaned domain
   (``nytimes.com``, ``openai.com``) via the existing
   ``_extract_domain``.
3. Empty or non-http(s) URLs (``file://``, local-RAG hits) tag as
   ``local`` and render without a hyperlink so the markdown stays
   clean. A future PR can plumb collection names through the RAG
   metadata pipeline to replace the uniform ``local`` fallback —
   noted in the helper docstring.
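
A rough sketch of the resolution order above. The host table stands in for ``URLClassifier`` and the function names are hypothetical; the real ``_extract_source_label`` and ``_extract_domain`` may normalise URLs differently.

```python
from urllib.parse import urlsplit

# Stand-in for URLClassifier: a few academic hosts mapped to short tags.
ACADEMIC_HOSTS = {
    "arxiv.org": "arxiv",
    "pubmed.ncbi.nlm.nih.gov": "pubmed",
    "biorxiv.org": "biorxiv",
}


def source_label(url: str) -> str:
    """Academic tag, else cleaned domain, else 'local'."""
    parts = urlsplit(url or "")
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return "local"  # empty, file://, or other non-web URL
    host = parts.hostname.removeprefix("www.")
    return ACADEMIC_HOSTS.get(host, host)


def render_citation(number: int, url: str) -> str:
    """Keep the global counter; only the tag in front of it varies."""
    label = source_label(url)
    tag = f"{label}-{number}"
    if label == "local":
        return f"[{tag}]"          # no hyperlink for unlinkable sources
    return f"[[{tag}]]({url})"


rendered = [
    render_citation(1, "https://arxiv.org/abs/2401.00001"),
    render_citation(2, "https://www.openai.com/blog"),
    render_citation(3, "file:///tmp/notes.pdf"),
]
```

Because the number comes from the caller and is never recomputed per tag, two arxiv citations can never share a label.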

## What does NOT change

* The agent still emits plain ``[N]`` citations — the LLM prompt and
  ``SearchResultsCollector`` are untouched. This is purely a
  display-layer transform applied after generation.
* All other modes are preserved unchanged. Users on
  ``domain_id_hyperlinks`` etc. keep their current behaviour.
* The global counter mechanism in
  ``SearchResultsCollector.add_results`` (``index = len(_all_links) +
  1``) was already correct — the new mode just stops the formatter
  from throwing that number away.

## Files

* ``citation_formatter.py``: new enum value, new
  ``_format_source_tagged_hyperlinks`` method, ``_extract_source_label``
  helper (URLClassifier → domain → ``local`` fallback chain), and
  ``_is_linkable_url`` helper so file:// / empty URLs render as
  ``[local-N]`` rather than ``[[local-N]](file:///...)``.
* ``research_service.py`` & ``scheduler/background.py``: add the new
  value to the string→enum dispatch maps. Existing Python fallbacks
  are deliberately left as-is.
* ``default_settings.json``: add the new option (placed first to
  signal it as the default), flip ``value`` from
  ``"number_hyperlinks"`` to ``"source_tagged_hyperlinks"``, expand
  the description.
* ``golden_master_settings.json``: regenerated via
  ``scripts/dev/regenerate_golden_master.py``.

## Tests

* ``test_source_tagged_hyperlinks_preserves_global_counter`` — the
  core property: ``arxiv-1, openai.com-2, arxiv-3`` (not per-domain
  re-numbering). Covers individual citations *and* comma-separated
  groups ``[1, 2, 3]`` → three tagged links concatenated.
* ``test_source_tagged_hyperlinks_known_academic_sources`` — arxiv,
  pubmed, semantic_scholar, biorxiv tags.
* ``test_source_tagged_hyperlinks_local_url_falls_back`` — both
  ``file://`` URLs and missing-URL citations render as plain
  ``[local-N]`` without a hyperlink.
* ``test_enum_member_count`` and ``test_*_value`` in
  ``test_citation_formatter_high_value.py`` updated for the new
  member.

* feat(citation): use collection name for local-RAG citations + changelog

Builds on the source-tagged citation work in this PR. Two pieces:

## Collection-name plumbing for local documents

Previously, RAG / library hits all rendered as ``[local-N]`` because
the formatter only saw the URL/title round-trip and had no signal
about which collection a hit came from. Now the rendered sources
block carries an optional ``Collection:`` line per source, and the
formatter parses it back so library hits surface their (slugified)
collection name as the citation tag.

Concrete pipeline:

1. ``LibraryRAGSearchEngine`` already puts ``collection_name`` into
   ``result["metadata"]`` (existing — no change).
2. ``utilities/search_utilities.format_links_to_markdown`` now
   tracks ``canon_to_collection`` alongside ``canon_to_title`` and
   appends ``   Collection: <name>`` after the ``URL:`` line when
   the metadata carries one. First non-empty wins per canonical URL
   (mirrors how title/quality work).
3. ``CitationFormatter._parse_collections`` extracts
   ``{citation_num: name}`` via a multiline regex anchored on the
   ``[N]`` header so a Collection: line attached to ``[1]`` cannot
   leak into ``[2]``.
4. ``_extract_source_label`` gains an optional ``collection``
   parameter that wins outright when supplied. Otherwise the existing
   fallback chain (URLClassifier → domain → ``local``) is unchanged.
5. ``_slugify_collection`` normalises free-form collection names
   into compact inline-safe tags: ``"My Papers"`` → ``my-papers``,
   ``"team/finance"`` → ``team-finance``, edge cases degrade to
   ``local`` rather than empty.
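
Step 3's header-anchored parsing can be sketched as follows. The layout of the sources block (a ``[N]`` header followed by indented ``URL:`` and ``Collection:`` lines) is an assumption based on the description above, and ``parse_collections`` is a hypothetical stand-in for ``CitationFormatter._parse_collections``.

```python
import re

HEADER = re.compile(r"^\[(\d+)\]", re.MULTILINE)
COLLECTION = re.compile(r"^\s*Collection:\s*(.+?)\s*$", re.MULTILINE)


def parse_collections(sources: str) -> dict[int, str]:
    """Map citation number -> collection name, one [N] block at a time."""
    result: dict[int, str] = {}
    headers = list(HEADER.finditer(sources))
    for i, header in enumerate(headers):
        # A block ends where the next [N] header starts, so a Collection:
        # line can only ever be attributed to its own citation.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(sources)
        match = COLLECTION.search(sources, header.end(), end)
        if match:
            result[int(header.group(1))] = match.group(1)
    return result


SOURCES = (
    "[1] Internal notes\n"
    "   URL: file:///library/notes.pdf\n"
    "   Collection: My Papers\n"
    "[2] Some preprint\n"
    "   URL: https://arxiv.org/abs/2401.00001\n"
)
collections = parse_collections(SOURCES)
```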

Result: a report mixing web hits and library hits now renders as
e.g. ``[arxiv-1]``, ``[my-papers-2]``, ``[openai.com-3]``,
``[team-finance-4]`` — readers can see at a glance what kind of
source each citation is.
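
For illustration, a slugifier with the documented behaviour might look like the sketch below. It is an assumption, not the project's ``_slugify_collection``: in particular it drops non-ASCII characters, which may not match how the real helper treats unicode.

```python
import re


def slugify_collection(name: str) -> str:
    """Lowercase, collapse non-alphanumeric runs to '-', fall back to 'local'."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "local"  # never return an empty tag


tags = [slugify_collection(n) for n in ("My Papers", "team/finance", "  ", "///")]
```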

## Changelog fragment

Adds ``changelog.d/4012.feature.md`` per the towncrier convention
documented in ``changelog.d/README.md``. Describes the new default
citation format and notes that all previous modes remain available
via ``report.citation_format``.

## Tests

* ``test_source_tagged_hyperlinks_uses_collection_name`` — mixed
  web + library report renders with the right tags and no
  cross-contamination.
* ``test_source_tagged_hyperlinks_collection_slugify_edge_cases`` —
  pins slugifier behaviour on whitespace, slashes, casing, unicode,
  and empty-after-slug edge cases.
* ``test_source_tagged_hyperlinks_missing_collection_falls_back`` —
  library URL without a ``Collection:`` line keeps the previous
  ``local-N`` behaviour (compat with hand-rolled sources blocks).
* ``test_source_tagged_hyperlinks_collection_line_isolation`` —
  regression guard for the regex anchoring: a ``Collection:`` line
  on ``[1]`` must not affect ``[2]``.
* Four ``TestFormatLinksToMarkdownCollections`` tests cover the
  renderer side: emit on metadata present, omit on metadata absent,
  omit on metadata without ``collection_name``, first non-empty
  wins on URL dedup.

1173 tests pass across ``tests/text_optimization/``,
``tests/utilities/`` (search utilities), and ``tests/settings/``.
``mypy`` clean on both touched source files.

* chore(citation): don't flip default to source_tagged yet

Per maintainer call: ship the new ``source_tagged_hyperlinks`` mode
as an opt-in only — keep ``number_hyperlinks`` as the default for
``report.citation_format`` for now. The mode stays available in the
settings dropdown for users who want to try it; we can flip the
default in a later release once it has soaked.

Changes:

* ``default_settings.json``: revert ``value`` to ``"number_hyperlinks"``;
  move the new option from first to second-to-last in the dropdown so
  the ordering doesn't read as "this is the default"; rewrite the
  description to lead with the existing modes.
* ``golden_master_settings.json``: regenerate to track the JSON value.
* ``changelog.d/4012.feature.md``: reword from "new default" to
  "new option, opt-in via the setting".

No code change to the formatter, the new mode, the collection
plumbing, or any of the 8 new tests added earlier in this PR.
2026-05-12 23:26:53 +02:00
LearningCircuit
d2a0889014 test: fix flaky rate-limit-triggered failures in rag upload coverage (#3943)
`tests/research_library/routes/test_rag_routes_upload_coverage.py`'s
`TestUploadToCollection` tests pass in isolation but the last three
(test_upload_pdf_storage_failure_continues, test_upload_auto_index_triggered,
test_upload_auto_index_no_password) flake to `429 TOO MANY REQUESTS` when
run as part of the wider research_library test suite locally
(LDR_DISABLE_RATE_LIMITING/DISABLE_RATE_LIMITING is unset). The
`@upload_rate_limit_user`/`@upload_rate_limit_ip` decorators applied to
`upload_to_collection` at module import time close over the real Limiter
instance, so the existing fixture's symbol patches cannot undo them — by
the time those tests run, earlier tests in the same pytest process have
already consumed the per-user 10/minute budget against the shared
in-memory storage.

Add `patch.object(real_limiter, "enabled", False)` to the fixture so the
real limiter is short-circuited for the duration of each test (and
restored automatically on exit). CI is unaffected (it sets
`DISABLE_RATE_LIMITING=true` in the workflow env, so the limiter is
already disabled there).
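
The mechanism can be shown with a toy limiter. `FakeLimiter` below is a hypothetical stand-in, not Flask-Limiter; it only mimics the two properties that matter here: the decorator closes over the instance at definition time, and an `enabled` flag is consulted on every call.

```python
from unittest.mock import patch


class FakeLimiter:
    """Toy limiter: shared in-memory budget plus an `enabled` switch."""

    def __init__(self, budget: int) -> None:
        self.enabled = True
        self.budget = budget
        self.used = 0

    def limit(self, func):
        def wrapper(*args, **kwargs):
            if self.enabled:
                if self.used >= self.budget:
                    return 429
                self.used += 1
            return func(*args, **kwargs)
        return wrapper


limiter = FakeLimiter(budget=2)


@limiter.limit            # bound to the real instance at import time
def upload() -> int:
    return 200


earlier_tests = [upload() for _ in range(3)]   # budget exhausted by the third call

with patch.object(limiter, "enabled", False):  # short-circuit the real instance
    during_patch = upload()

after_patch = upload()                         # flag restored on exit
```

Rebinding the module-level name `limiter` would not help, because `wrapper` already holds a reference to the original object; flipping an attribute on that object does.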
2026-05-12 22:49:14 +02:00
LearningCircuit
b5ca512d5d feat(hooks): add pre-commit hook to validate settings key namespaces (#4025)
* feat(hooks): add pre-commit hook to validate settings key namespaces

Parses ALLOWED_SETTING_PREFIXES and BLOCKED_SETTING_PREFIXES from
settings_routes.py via AST (single source of truth) and checks
hardcoded settings keys in Python (AST) and JavaScript (regex) files.
Prevents the class of bug where a new settings key is added but its
prefix is missing from the allow list.
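
A sketch of the AST-based approach, assuming the prefix constants are plain tuple literals. Apart from `local_search_`, the prefix values in `SOURCE` are made up for the example, and `read_prefixes` / `is_allowed` are hypothetical names rather than the hook's real functions.

```python
import ast

# Stand-in for the contents of settings_routes.py.
SOURCE = '''
ALLOWED_SETTING_PREFIXES = ("llm.", "search.", "local_search_")
BLOCKED_SETTING_PREFIXES = ("security.",)
'''


def read_prefixes(source: str, name: str) -> tuple[str, ...]:
    """Read a module-level tuple of string literals without importing the module."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == name:
                    return tuple(ast.literal_eval(node.value))
    raise LookupError(name)


def is_allowed(key: str, allowed: tuple[str, ...], blocked: tuple[str, ...]) -> bool:
    return key.startswith(allowed) and not key.startswith(blocked)


allowed = read_prefixes(SOURCE, "ALLOWED_SETTING_PREFIXES")
blocked = read_prefixes(SOURCE, "BLOCKED_SETTING_PREFIXES")
```

Parsing instead of importing keeps the hook fast and free of the application's import-time dependencies.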
2026-05-12 21:33:40 +02:00
LearningCircuit
37bd58ba6b fix(settings): allow local_search_ namespace for embedding settings (#4024)
The security namespace gate added in 6430dd4 blocked creation of
local_search_* setting keys (embedding model, chunk size, etc.) because
the prefix was missing from ALLOWED_SETTING_PREFIXES.
2026-05-12 20:56:51 +02:00
dependabot[bot]
964c774292 chore(deps): bump puppeteer from 24.43.0 to 24.43.1 in /tests/puppeteer (#4021)
Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 24.43.0 to 24.43.1.
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/CHANGELOG.md)
- [Commits](https://github.com/puppeteer/puppeteer/compare/puppeteer-v24.43.0...puppeteer-v24.43.1)

---
updated-dependencies:
- dependency-name: puppeteer
  dependency-version: 24.43.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Prashant Sharma <prashant.shar51@gmail.com>
2026-05-12 19:29:41 +02:00
dependabot[bot]
59e3bac836 chore(deps): bump puppeteer from 24.43.0 to 24.43.1 in /tests (#4020)
Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 24.43.0 to 24.43.1.
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/CHANGELOG.md)
- [Commits](https://github.com/puppeteer/puppeteer/compare/puppeteer-v24.43.0...puppeteer-v24.43.1)

---
updated-dependencies:
- dependency-name: puppeteer
  dependency-version: 24.43.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Prashant Sharma <prashant.shar51@gmail.com>
2026-05-12 19:29:11 +02:00
dependabot[bot]
1351a0cde7 chore(deps-dev): bump puppeteer in /tests/api_tests_with_login (#4019)
Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 24.43.0 to 24.43.1.
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/CHANGELOG.md)
- [Commits](https://github.com/puppeteer/puppeteer/compare/puppeteer-v24.43.0...puppeteer-v24.43.1)

---
updated-dependencies:
- dependency-name: puppeteer
  dependency-version: 24.43.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Prashant Sharma <prashant.shar51@gmail.com>
2026-05-12 19:28:37 +02:00
dependabot[bot]
67114f8066 chore(deps): bump puppeteer from 24.43.0 to 24.43.1 in /tests/ui_tests (#4018)
Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 24.43.0 to 24.43.1.
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/CHANGELOG.md)
- [Commits](https://github.com/puppeteer/puppeteer/compare/puppeteer-v24.43.0...puppeteer-v24.43.1)

---
updated-dependencies:
- dependency-name: puppeteer
  dependency-version: 24.43.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Prashant Sharma <prashant.shar51@gmail.com>
2026-05-12 19:28:12 +02:00