local-deep-research

mirror of https://github.com/LearningCircuit/local-deep-research.git synced 2026-06-15 19:46:56 +03:00

Author	SHA1	Message	Date
LearningCircuit	ba0912056c	test(llm_utils): pin daemon-thread contract for in-loop async close (#4078 ) * test(llm_utils): pin daemon-thread contract for in-loop async close The existing ``tests/utilities/test_close_base_llm.py`` already covers the sync + async + in-loop + timeout + idempotence + FD-growth cases for ``_close_base_llm``. Two narrow contracts remained unpinned: - Daemon flag — the cleanup thread at llm_utils.py:154-159 must be ``daemon=True`` or a stuck ``aclose()`` would hold up Python interpreter shutdown. The comment at llm_utils.py:140-143 documents this requirement but no test asserted it. - In-loop close marks ``_ldr_closed`` even when inner aclose raises — the cleanup thread runs ``asyncio.run(aclose())`` inside a ``try/except Exception`` (lines 146-152). When ``aclose`` raises, the thread exits cleanly and the main thread sees ``t.is_alive() == False``, then sets ``_ldr_closed = True`` (line 178). The pre-existing ``test_swallows_async_close_exception`` covered this invariant for the no-loop branch only. New class ``TestInLoopCleanupThreadContract`` adds two tests: - ``test_cleanup_thread_is_daemon_so_shutdown_is_not_blocked`` — patches ``threading.Thread`` with a subclass that captures the constructor kwargs; verifies ``daemon=True`` and a stable name prefix (``"ldr"``). - ``test_in_loop_close_marks_closed_even_when_inner_aclose_raises`` — invokes ``_close_base_llm`` inside ``asyncio.run`` with an ``aclose`` that raises; asserts ``_ldr_closed`` is set anyway. Mutation-checked: - Flipping ``daemon=True`` to ``daemon=False`` → the daemon test fails. - Removing the ``async_httpx._ldr_closed = True`` line from the in-loop completion path (llm_utils.py:178) → 3 tests fail: both new cases AND the existing ``test_closes_async_inside_running_loop_via_thread`` / ``test_in_loop_close_is_idempotent``. The fact that the existing in-loop idempotence test already covered the happy-path mark is reassuring; my new test covers the exception-path mark. 
0 production changes. 24 close-base-llm tests pass (was 22). * test(llm_utils): replace line-number refs with symbol-based ones AI reviewer flagged that the docstrings on the new tests in PR #4078 cite specific line numbers in ``llm_utils.py`` (e.g. ``llm_utils.py:154-159``, ``:140-143``, ``:173-178``) which will become stale on any refactor of the target module. Replace with stable symbol / branch-name references: - ``llm_utils.py:154-159`` (Thread construction site) → "the ``else: # A loop is running in this thread`` block that spawns a ``ldr-async-llm-close`` thread" - ``llm_utils.py:140-143`` (docstring warning) → "the docstring of ``_close_base_llm`` ... when motivating the brief daemon thread" - ``llm_utils.py:146-152`` (try/except around asyncio.run) → "the cleanup thread's ``_close_in_thread`` runs ``asyncio.run(aclose())`` inside a ``try/except Exception``" - ``llm_utils.py:178`` (the sentinel-set line) → "the ``else`` branch that sets ``_ldr_closed = True``" No behavior change; both tests still pass and still pin the same contracts. Follow-up to a recommendation in the AI Code Reviewer comment on PR #4078. v1.6.11	2026-05-17 11:50:06 +02:00
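The #4078 message describes a close routine that has to work whether or not an event loop is already running in the calling thread. Below is a minimal sketch of that pattern. It assumes a client object that exposes an async `aclose()` and accepts an `_ldr_closed` sentinel attribute. `close_async_client` and `FakeAsyncClient` are hypothetical names, and this is not the project's `_close_base_llm`.

```python
import asyncio
import threading
from typing import Optional


class FakeAsyncClient:
    """Hypothetical stand-in for an async HTTP client with ``aclose()``."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.aclose_calls = 0
        self._ldr_closed = False

    async def aclose(self) -> None:
        self.aclose_calls += 1
        if self.fail:
            raise RuntimeError("simulated aclose() failure")


def close_async_client(client, timeout: float = 5.0) -> Optional[threading.Thread]:
    """Close ``client`` from sync code; returns the cleanup thread if one was used."""
    if getattr(client, "_ldr_closed", False):
        return None  # idempotent: a second call does nothing

    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread, so asyncio.run() is safe here.
        try:
            asyncio.run(client.aclose())
        except Exception:
            pass  # close errors are swallowed; the client is discarded anyway
        client._ldr_closed = True
        return None

    # A loop is running in this thread, so asyncio.run() would raise. Run the
    # close on a short-lived thread that owns its own event loop instead.
    def _close_in_thread() -> None:
        try:
            asyncio.run(client.aclose())
        except Exception:
            pass  # the thread must exit cleanly even when aclose() raises

    # daemon=True: a stuck aclose() must not block interpreter shutdown.
    t = threading.Thread(target=_close_in_thread, name="ldr-async-llm-close", daemon=True)
    t.start()
    t.join(timeout)  # blocks the caller (and its loop) for at most `timeout` seconds
    if not t.is_alive():
        client._ldr_closed = True  # mark closed only once the thread has finished
    return t
```

Returning the thread makes the daemon contract easy to check in a sketch like this one. The real test instead patches `threading.Thread` to capture the constructor kwargs.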
LearningCircuit	8b98dfc237	test(ui): chunk mobile-nav overlap DOM walk; drop WebKit skip (#4060 ) (#4076 ) * test(ui): chunk mobile-nav overlap DOM walk; drop WebKit skip (#4060) The mobile-nav overlap assertion in all-pages-mobile.spec.js previously ran a single ~60-line page.evaluate that walked every interactive element on the page. On Mobile Safari this occasionally raced WebKit's context-close ("Target page, context or browser has been closed"), so the test was wrapped in a WebKit-only test.skip fallback (#4060). Split the work so no single evaluate runs long: 1. Tiny evaluate fetches the nav rect. 2. Tiny evaluate fetches the interactive-element count. 3. Loop evaluates batches of 50 elements, short-circuiting once we have enough overlap hits to report. Each evaluate is now well under the threshold that triggered the WebKit race, so the WebKit-only skip and the dual error-message catch are removed. If a real overlap regresses, WebKit fails loudly alongside Chromium/Firefox — which was the goal of the issue. * test(ui): extract findElementsBehindMobileNav helper + per-batch cap Review followup for #4076: - Hoist the chunked overlap walk into findElementsBehindMobileNav so the test body reads as intent ("find overlaps, assert none") instead of evaluate plumbing. - Pass the remaining maxReported budget into each batch and break the inner loop once it's hit, so a batch with many overlap candidates doesn't serialize hits we'd discard anyway. Skipped from the same review: snapshotting the NodeList via evaluateHandle and re-deriving the nav rect per batch. Both target theoretical issues (drift, staleness) on pages that are static at the assertion point, and the absolute perf cost of the current shape is microseconds — not worth the API complexity until a real symptom appears.	2026-05-17 10:42:58 +02:00
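The #4076 change splits one long DOM walk into small batches and passes a shrinking report budget into each batch. The real helper (`findElementsBehindMobileNav`) is Playwright JavaScript. The sketch below shows only its control flow in Python: each `_scan_batch` call stands in for one small `page.evaluate`. `find_overlaps_in_batches` and `_scan_batch` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def _scan_batch(batch: Sequence[T], is_overlap: Callable[[T], bool], budget: int) -> List[T]:
    """One 'small evaluate': scan a batch, stopping once `budget` hits are found."""
    hits: List[T] = []
    for item in batch:
        if is_overlap(item):
            hits.append(item)
            if len(hits) >= budget:
                break  # per-batch cap: don't collect hits we would discard
    return hits


def find_overlaps_in_batches(
    items: Sequence[T],
    is_overlap: Callable[[T], bool],
    batch_size: int = 50,
    max_reported: int = 5,
) -> Tuple[List[T], int]:
    """Return (overlapping items, number of batches scanned)."""
    hits: List[T] = []
    batches = 0
    for start in range(0, len(items), batch_size):
        remaining = max_reported - len(hits)
        if remaining <= 0:
            break  # short-circuit: we already have enough hits to report
        batches += 1
        hits.extend(_scan_batch(items[start:start + batch_size], is_overlap, remaining))
    return hits, batches
```

With 200 elements and a batch size of 50, a clean page costs four short calls rather than one long one. A page with early overlaps stops after the first batch.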
LearningCircuit	1c33f1dc07	fix(ui-tests): match create/new/add buttons with word boundaries (#4069 ) * fix(ui-tests): match create/new/add buttons with word boundaries Selector helpers in several UI tests called `text.includes('new')`, which matches the substring "new" inside "News". On /news/subscriptions, the first hit was the `<a class="btn">Back to News Feed</a>` link instead of the `#create-subscription-btn`, so `SubscriptionCrudTests.createSubscriptionFormOpens` clicked the wrong control and failed because no form opened. Switch the matchers in the affected helpers to `\b(?:create|new|add)\b` (plus `subscribe` where it was already in the list). Word boundaries keep real targets like "Create Subscription", "New Folder", and "Add Subscription" while skipping "News Feed". * refactor(ui-tests): extract findActionButton helper Code-review follow-up. The buttons.find(...) + word-boundary regex block was duplicated in 9 call sites across 4 files, which is the same copy-paste that let the original "new" → "News" bug hide in multiple places. Extract a single helper into test_lib/test_utils.js: findActionButton(page, { selectors, keywords, click }) Defaults to `selectors='button, a.btn, .btn'` and `keywords=['create','new','add']`, returns `{ found, text }`. Drops the inconsistent extra `subscribe` keyword from the subscription CRUD test — verified on the current /news/subscriptions page that no button is labeled "Subscribe"; the primary control is "Create Subscription", which is matched by the default keyword list. This collapses the subscription tests to the same keyword set as the rest. Net change: 60 insertions / 127 deletions. Reran the 4 affected shards (mobile, library, history-news, api-crud) end-to-end at 100%, and confirmed the message now reports the correct button text (e.g. "Create Collection") rather than the previous false-positive match.	2026-05-17 10:20:36 +02:00
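The #4069 fix replaces a substring check with a word-boundary regex. The real helper, `findActionButton`, is JavaScript. The Python sketch below mirrors only its label-matching rule and runs over plain strings rather than page elements. `find_action_label` is a hypothetical name.

```python
import re
from typing import Iterable, Optional, Sequence

DEFAULT_KEYWORDS = ("create", "new", "add")


def find_action_label(labels: Iterable[str], keywords: Sequence[str] = DEFAULT_KEYWORDS) -> Optional[str]:
    """Return the first label containing a whole-word keyword, else None."""
    # \b on both sides: "new" matches "New Folder" but not "News Feed" or "Renew".
    alternation = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    for label in labels:
        if pattern.search(label):
            return label
    return None
```

Note that `|` is the unescaped alternation operator here. Writing `\|` would make the regex match a literal pipe character instead.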
LearningCircuit	6e37c248e4	test(error_handling): pin load-bearing branches in openai_compat_errors (#4074 ) * test(error_handling): pin load-bearing branches in openai_compat_errors The existing test file covers all seven dispatch tokens and the four main helpers (319 LOC), but two load-bearing implementation choices were only documented in comments — not asserted. Adds seven tests that catch the most likely regressions. Pinned behaviors: - ``TestDispatchOrderingTimeoutBeforeConnection`` — ``APITimeoutError`` is checked BEFORE ``APIConnectionError`` at openai_compat_errors.py:87. This matters because ``openai.APITimeoutError`` subclasses ``APIConnectionError`` in openai>=1.x, so reordering the two branches would mislabel every timeout as ``openai_connection_refused``. The comment at lines 85-86 documents this; the new test pins it. An ``issubclass`` sanity check on the openai class hierarchy means the test fails first (with a clear message) if openai ever reorganises these classes, instead of silently producing the wrong token. - ``TestWalkCauseChainPreference::test_cause_preferred_over_context_when_both_set`` — at openai_compat_errors.py:60, ``_walk_cause`` does ``cur.__cause__ or cur.__context__`` so explicit ``raise X from Y`` chains take priority over implicit ``__context__`` chains. The test constructs a wrapper with both set and asserts the deepest reached is the ``__cause__`` root. Edge cases: - ``TestStripCredentialsEdgeCases::test_ipv6_host_brackets_preserved`` — ``urlparse`` returns ``hostname`` without brackets; the implementation reassembles ``netloc`` from hostname + port. The test verifies an IPv6 URL still has its host marker (brackets or bare ``::1``) after redaction. - ``test_userinfo_stripped_with_ipv6_host`` — combined userinfo + IPv6 host; the userinfo must be removed regardless of host format. - ``test_url_with_no_netloc_passed_through`` — bare paths hit the ``if not parsed.netloc:`` short-circuit and are returned as-is. 
- ``TestFriendlyErrorNoneArgs`` — ``friendly_openai_compatible_error`` uses ``provider or "<unknown provider>"`` and ``model or "<unspecified>"`` to keep the surfaced message legible when the caller doesn't know the values. Two tests pin both placeholders. Mutation-checked during development: - Swapping the timeout / connection-refused branches → both timeout tests fail. - Changing ``cur.__cause__ or cur.__context__`` to ``cur.__context__ or cur.__cause__`` → the cause-preference test fails. No production code changes. 34 tests pass (was 27). * fix(error_handling): preserve IPv6 brackets in _strip_credentials The AI Code Reviewer on this PR (#4074) flagged that the ``test_ipv6_host_brackets_preserved`` assertion was too loose: assert "[::1]" in result or "::1" in result When the implementation strips brackets, the result is ``http://::1:8080/v1`` — which still contains the substring ``::1``, so the test passes despite producing an invalid URL. Tightening to ``assert result == "http://[::1]:8080/v1"`` surfaced the underlying bug: ``_strip_credentials`` was indeed losing the brackets. Root cause: ``urllib.parse.urlparse`` exposes ``hostname`` without the surrounding brackets that mark IPv6 hosts. The previous ``netloc`` reassembly used the bracketless hostname directly, so the rebuilt URL became ``http://::1:8080/v1`` — ambiguous about where the host ends and the port begins, and rejected by downstream HTTP libraries. Fix: when reassembling ``netloc``, re-add brackets around any host that contains ``:`` (i.e. IPv6). IPv4 hosts never contain ``:`` so this heuristic is safe. Also tightens both new IPv6 tests to assert the full expected URL rather than a loose substring match. Mutation-checked: reverting the bracket re-add flips both ``TestStripCredentialsEdgeCases::test_ipv6_host_brackets_preserved`` and ``test_userinfo_stripped_with_ipv6_host`` to failure.	2026-05-17 10:06:10 +02:00
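The #4074 bracket fix comes down to one rule: `.hostname` drops the IPv6 brackets, so any host containing `:` must be re-bracketed before `netloc` is rebuilt. The sketch below applies that rule. It is illustrative rather than the project's `_strip_credentials`. It uses `urlsplit` where the commit mentions `urlparse`, and `strip_credentials` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url: str) -> str:
    """Drop user:password@ from a URL while keeping IPv6 hosts bracketed."""
    parts = urlsplit(url)
    if not parts.netloc:
        return url  # bare paths have nothing to strip
    host = parts.hostname or ""  # note: .hostname is lowercased and unbracketed
    if ":" in host:
        host = f"[{host}]"  # only IPv6 literals contain ':', so re-add the brackets
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

Without the bracket step, the first test case below would produce `http://::1:8080/v1`, where the boundary between host and port is ambiguous.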
LearningCircuit	d346a8fe2d	test(scheduler): credential lifecycle coverage and weak-test cleanup (#4065 ) * test(quality): strengthen weak scheduler tests Four existing scheduler tests asserted on mock call_counts or swallowed all exceptions without asserting anything, so they passed even when the underlying production code was broken. Two frozen-dataclass tests used ``try/except AttributeError: pass`` blocks that silently pass if NO exception is raised — the opposite of the intent. Rewrites (no production code changes): - ``test_scheduler_extended.py::test_logs_processing_start`` — previously mocked the logger inside a ``try/except Exception: pass`` and ended with the comment ``# Should have logged something`` and no ``assert``. The new version exercises the "no session info" early return path and asserts on the exact entry-banner log line (background.py:688) via ``mock_logger.info.assert_any_call(...)``. - ``test_scheduler_extended.py::test_queries_overdue_subscriptions`` → renamed to ``test_returns_early_when_credentials_missing``. The previous version called the method with a user who had no credentials, wrapped the call in a bare ``try/except Exception: pass``, and asserted nothing. The new version patches ``get_user_db_session`` and asserts it is NOT called — proving the credential-missing guard short-circuits before any DB work. - ``test_scheduler_extended.py::test_handles_scheduler_exception`` + ``test_handles_job_lookup_error_on_remove`` — both replaced by ``test_unregister_swallows_job_lookup_error``. The originals tested the mocks themselves (``mock.remove_job.side_effect = JobLookupError; try: mock.remove_job(...); except JobLookupError: pass``) rather than the scheduler. The replacement exercises ``unregister_user`` with two stale scheduled jobs, asserts both ``remove_job`` calls were attempted, and asserts the user is fully cleaned up (sessions + credentials) — pinning the JobLookupError swallow at background.py:463-464. 
- ``test_scheduler_extended.py::test_X_check_subscription`` paths — the two tests that wrapped ``_check_subscription`` in ``try/except Exception: pass`` and asserted nothing now patch ``get_user_db_session`` and assert it is NOT called when the user is missing from ``user_sessions``. - ``test_scheduler_document_behavior.py::test_cannot_modify_enabled`` and ``test_cannot_modify_interval`` — replaced ``try/except AttributeError: pass`` blocks with ``pytest.raises((AttributeError, FrozenInstanceError))``. The tuple is forward-compatible: ``FrozenInstanceError`` subclasses ``AttributeError`` and Python's behavior here has shifted between versions. Using ``pytest.raises`` ensures the test fails if NO exception is raised. - ``test_scheduler_extended.py::test_is_frozen`` — same fix (``try/except AttributeError: pass`` → ``pytest.raises``). All 604 scheduler tests pass after these rewrites. * test(scheduler): add credential lifecycle coverage The scheduler at src/local_deep_research/scheduler/background.py (1808 LOC) has ~600 tests in tests/news/test_scheduler_.py, but credential-lifecycle scenarios that are most fragile (per the project memory file project_user_db_encryption_blocks_background_jobs.md) were not covered. Adds eight test methods pinning these branches. Each test documents the production line(s) it pins and the mutation that would flip it. Mutation-checked during development: - Removing ``self._credential_store.clear(username)`` from ``unregister_user`` (background.py:468) → fails ``test_unregister_user_clears_credential``. - Removing the ``set_search_context({...})`` call at background.py:837-844 → fails ``test_search_context_set_before_processing_each_research``. - Removing the ``set_setting("document_scheduler.last_run", ...)`` call at background.py:1082-1084 → fails ``test_last_run_not_advanced_when_db_open_fails`` via its happy-path contrast assertion. 
Coverage added: - ``TestCredentialExpiryAndIsolation`` - ``test_credential_expiry_between_two_retrieves_in_same_job`` — a long-running job that retrieves credentials twice spanning the TTL boundary sees ``pw → None``. Pins credential_store_base.py:73-75 (lazy-delete on expired retrieve) via SchedulerCredentialStore at background.py:50-53. The base class TTL tests at tests/database/test_credential_store_ttl.py cover the single-retrieve boundary; this covers multi-call. - ``test_unregister_user_clears_credential`` — pins background.py:454-468 plus credential_store_base.py:98-107. A snapshot caller already holds the password as a Python local so it survives the clear; the next retrieve sees nothing. - ``test_cross_user_credential_isolation`` — parametrized across alice/bob/charlie. Pins the username-keyed dispatch. - ``test_clear_is_idempotent_and_safe_on_unknown_user`` — pins the ``if key in self._store`` guard in credential_store_base.py:106. Removing the guard would make the second clear and the ghost clear raise ``KeyError``. - ``TestTtlWrapperBehavior`` - ``test_ttl_boundary_store_expire_store_cycle`` — full store → expire → store → expire cycle through the SchedulerCredentialStore wrapper. Pins the ``ttl_hours * 3600`` conversion at background.py:42 and the ``expires_at`` recomputation on each store at credential_store_base.py:47. - ``test_ttl_hours_zero_expires_at_next_clock_tick`` — pins the absence of validation in the constructor and the strict ``>`` in credential_store_base.py:73. Contract test: anyone adding ``if ttl_hours <= 0: raise ValueError`` must update this test. - ``TestDocSchedulerCredentialLifecycle`` - ``test_last_run_not_advanced_when_db_open_fails`` — verifies the intentional design from PR #3288 / commit `405226638`. The ``set_setting("document_scheduler.last_run", ...)`` call at background.py:1082-1084 is OUTSIDE try/finally on purpose: if upstream setup fails (DB open, SettingsManager init), last_run must stay put so the next tick retries. 
The test has two contrasting blocks — unhappy path (DB open raises → last_run NOT advanced) and happy path (DB open succeeds → last_run IS advanced) — so neither assertion is trivially satisfied. - ``test_search_context_set_before_processing_each_research`` — pins the fix from PR #3289 / commit `1a0d46e69`. Without ``set_search_context``, downloads bypass per-thread rate limiting because the context is missing. Asserts every required field is passed (research_id, username, user_password, research_phase=document_scheduler). The new file relies on the global ``reset_all_singletons`` autouse fixture at tests/conftest.py:76-94 (which resets the singleton and calls ``.stop()``) and does not introduce a redundant local fixture. All deferred imports (get_user_db_session, set_search_context, SettingsManager) are patched at their source module. 10 new test cases (8 specs, 3 from parametrize on cross-user). All 604 scheduler-suite tests pass. * test(scheduler): tidy credential lifecycle test file Two cleanup items from the AI Code Review on PR #4065: - Remove the vestigial ``_ = FrozenInstanceError`` guard at the bottom of the module along with the matching top-level import. The import was carried over from an earlier iteration and is no longer used. - Extract the mock ``get_user_db_session`` builder that was duplicated across ``test_last_run_not_advanced_when_db_open_fails`` and ``test_search_context_set_before_processing_each_research`` into a ``_make_db_session_with_research`` helper. Saves ~20 lines and gives the two happy-path setups a single place to evolve. No behavior changes; all 10 tests in the file still pass.	2026-05-17 10:03:56 +02:00
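Several of the #4065 tests pin the same small TTL-store contract: an hours-to-seconds conversion, a strict `>` expiry check, a lazy delete on an expired retrieve, and an idempotent `clear`. The sketch below reproduces that contract with an injectable clock so the boundary cases can be tested deterministically. `TtlCredentialStore` and its method names are hypothetical. This is not the project's `CredentialStoreBase` or `SchedulerCredentialStore`.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCredentialStore:
    """Username-keyed password store whose entries expire after ``ttl_hours``."""

    def __init__(self, ttl_hours: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_seconds = ttl_hours * 3600  # hours -> seconds; 0 is accepted, not rejected
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}

    def store(self, username: str, password: str) -> None:
        # expires_at is recomputed on every store, so re-storing extends the TTL.
        self._store[username] = (password, self._clock() + self._ttl_seconds)

    def retrieve(self, username: str) -> Optional[str]:
        entry = self._store.get(username)
        if entry is None:
            return None
        password, expires_at = entry
        if self._clock() > expires_at:  # strict '>': still valid exactly at the boundary
            del self._store[username]  # lazy delete on an expired retrieve
            return None
        return password

    def clear(self, username: str) -> None:
        if username in self._store:  # guard makes repeated or unknown clears safe
            del self._store[username]
```

With `ttl_hours=0`, an entry remains readable at the same clock reading and expires at the next tick, which is the behavior the zero-TTL contract test describes.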
LearningCircuit	02e197da86	fix(security): redact Google API key from list_models error log (#4070 ) * fix(security): redact Google API key from list_models error log The Google Gemini provider's ``list_models_for_api`` (at src/local_deep_research/llm/providers/implementations/google.py:56) constructs the request URL with the API key as a ``?key=...`` query parameter, per Google's documented API (https://ai.google.dev/api/rest). When ``safe_get(url, ...)`` raised — for any reason: connection error, timeout, 401, etc. — the underlying ``requests``/``urllib3`` exception message included the full URL, with the key. The except handler then called ``logger.exception(...)``, which writes the traceback (including the exception's ``__str__``) to every loguru sink: stderr, the database log sink, and the frontend progress sink. Reproduced under the project's production loguru config (``diagnose=False, backtrace=False``): the line ``requests.exceptions.ConnectionError: ...key=sk-LEAKED-VALUE-99999`` appeared in the captured log output. Fix: catch the exception explicitly, replace the key value in the message with ``*REDACTED``, and log via ``logger.warning`` so the exception chain is not attached. Bundled regression tests in tests/security/test_api_key_leakage.py: - ``test_no_leak_when_safe_get_raises_with_url_in_message`` — the primary repro path. Patches ``safe_get`` to raise a ``ConnectionError`` whose message embeds the key, then asserts the sentinel is absent from ``loguru_caplog.text``. - ``test_no_leak_when_safe_get_raises_generic_runtime_error`` — same redaction also runs on non-requests exceptions whose ``str()`` contains the key. - ``test_non_200_response_does_not_leak_key`` — pins the existing status-code-only warning at lines 88-90 (which already doesn't include the URL). - ``test_repr_does_not_expose_stored_passwords`` / ``test_clear_entry_on_missing_does_not_leak_state`` — defense in depth on the credential store. 
- ``test_friendly_error_strips_credentials_from_base_url`` — pins the existing ``_strip_credentials`` userinfo redaction in ``error_handling/openai_compat_errors.py`` so a future change that removed it would be caught. Mutation-checked: restoring the old ``except Exception: logger.exception(...)`` flips the two Google leak tests to failure. security: extract redact_secrets() utility from inline replace The previous commit fixed the Google API-key leak with an inline ``msg.replace(api_key, "*REDACTED")`` in google.py. That is a one-off — every other provider, route handler, or error path that needs to scrub a known secret value would have to repeat the same pattern. Extract a single utility into ``security/log_sanitizer.py`` next to the existing ``sanitize_for_log`` / ``strip_control_chars`` helpers: def redact_secrets( message: str, *secrets: Optional[str], min_length: int = 8, token: str = "*REDACTED", ) -> str Variadic; skips falsy and sub-min-length values to avoid corrupting normal message content; exposes ``min_length`` and ``token`` for callers who need to override. google.py now uses it instead of the inline replace. Unit tests in ``tests/security/test_log_sanitizer.py``: - Happy path: single secret, multiple secrets, all-occurrences. - Guards: ``None`` ignored, empty string ignored, sub-min-length ignored, custom min_length override. - Boundaries: no secrets, empty message, message without any secret. - Custom token override. - Realistic provider key shapes (OpenAI, Google, Anthropic). - Literal-substring-match contract (URL-encoded forms are NOT redacted unless the caller passes them). The google.py refactor captures the redacted message in a local before the ``logger.warning`` call so the ``check-sensitive-logging`` pre-commit hook (which AST-checks for exception-variable references in non-exception log calls) does not flag the line. The hook's recommended ``logger.exception`` would defeat the entire point of the fix. 
The existing six leakage tests in ``tests/security/test_api_key_leakage.py`` remain unchanged — they assert the leakage contract, not the implementation, so the refactor flows underneath them. review: lift redact_secrets to module-level + tighten silence test Two small follow-ups to the AI reviewer's points: 1. google.py: move ``from ....security.log_sanitizer import redact_secrets`` out of the except handler to module-level. The nested import has no circular-import or lazy-load justification here (ollama.py already imports ``from ....security import safe_get`` at module level), and lifting it eliminates the theoretical case where an ImportError raised while handling the provider exception would carry the leaked-URL ConnectionError up via ``__context__``. Also rewrites the inline comment so the two rationales (redact + drop exc_info; capture in a local for the check-sensitive-logging pre-commit hook) are no longer broken up by the import statement. 2. test_clear_entry_on_missing_does_not_leak_state was passing trivially because ``CredentialStoreBase.clear_entry`` is silent on every code path — the old assertion ``_LEAKED_KEY not in loguru_caplog.text`` would have held even if the test never exercised the method. Renamed to test_clear_entry_does_not_log_store_state and replaced with ``assert not loguru_caplog.records`` so the contract being pinned is silence itself: a future ``logger.debug(f"store contents: {self._store}")`` regression would be caught immediately. Now exercises both the missing-key and present-key paths and seeds a second credential so a _store-dict dump would also leak it. Mutation-checked: monkey-patching clear_entry to add a debug log containing self._store makes the new test fail; the live implementation still passes. All 6 tests in tests/security/test_api_key_leakage.py pass against the real code.	2026-05-17 02:40:10 +02:00
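The #4070 message describes `redact_secrets` as a variadic helper that skips falsy and sub-`min_length` values and does literal substring replacement. The sketch below follows that description. The default `token` value here is an illustrative placeholder and may differ from the project's, and the longest-first ordering is an assumption added so that overlapping secrets are replaced whole.

```python
from typing import Optional


def redact_secrets(
    message: str,
    *secrets: Optional[str],
    min_length: int = 8,
    token: str = "[REDACTED]",  # illustrative placeholder, not necessarily the project's token
) -> str:
    """Replace every literal occurrence of each known secret in ``message``."""
    # Skip falsy and short values: replacing e.g. "abc" would corrupt normal text.
    usable = {s for s in secrets if s and len(s) >= min_length}
    # Longest first, so a secret that contains another is replaced whole.
    for secret in sorted(usable, key=len, reverse=True):
        message = message.replace(secret, token)
    return message
```

In the error handler, the caller would compute `redact_secrets(str(exc), api_key)` into a local and then log that local at warning level. That way the raw exception, whose message embeds the URL, is never attached to the log record.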
LearningCircuit	0fe3c8c5de	chore(security): suppress CVE-2026-8328 (ftplib.ftpcp SSRF) until 3.14.6 (#4072 ) Grype alerts on CVE-2026-8328 against python:3.14.5-slim. The vulnerability is an SSRF in the undocumented ftplib.ftpcp() helper — the same PASV-trust class as CVE-2021-4189, whose original 2021 fix only patched ftplib.FTP and left ftpcp() unprotected. Upstream merged the fix to the CPython 3.14 branch on 2026-05-13 (python/cpython#149793), three days after Python 3.14.5 was tagged. No 3.14.6 release exists yet, so a base-image bump isn't an option. Not exploitable here: `grep -rn "ftplib\\|ftpcp" src/` returns zero hits, and no transitive dependency imports ftplib either, so ftpcp() is unreachable from this image. Added to .grype.yaml in the existing python3.14 block alongside the other CPython CVEs awaiting the next 3.14.x point release. The suppression auto-cleans when the next Python bump picks up 3.14.6+.	2026-05-17 02:32:58 +02:00
LearningCircuit	da0d18ed25	fix(release): set towncrier name to skip package import (#4071 ) The release job uses a sparse checkout that omits src/ and runs a standalone `pip install towncrier`. Towncrier 24.8 still calls `get_project_name()` even when --version is passed on the CLI, and the existing [tool.towncrier] config pointed at the `local_deep_research` package, so the build crashed with ModuleNotFoundError before rendering any fragments. Set `name = "local-deep-research"` so towncrier short-circuits the import path (build.py:195-197). Drop the now-misleading `package`/`package_dir` fields — `--version` is always passed, `directory = "changelog.d"` is explicit, and nothing else inside towncrier still needs them. Fix the workflow comment that misattributed the bypass to --version. Verified by rendering changelog.d/*.md fragments against this pyproject.toml in a fresh directory with no src/ present.	2026-05-17 02:30:51 +02:00
LearningCircuit	b0008045df	fix(security): extend IMDS absolute-block to Apprise plugin schemes (#4063 ) NotificationURLValidator only ran the cloud-metadata IP guard in the http/https branch, so URLs like signal://169.254.169.254/+1/+1 (and the same for gotify, ntfy, mattermost, rocketchat, matrix, json, xml, form, mailto) reached Apprise — which then POSTs against that host under HTTP. The path was gated behind the operator-only LDR_NOTIFICATIONS_ALLOW_OUTBOUND setting, but it was still a residual gap, inconsistent with the absolute-block invariant documented in SECURITY.md. Refactored host extraction out of the http/https branch and added an IMDS-only check for plugin schemes (allow_private_ips=True semantics in _is_private_ip leaves only ALWAYS_BLOCKED_METADATA_IPS and NAT64-wrapped metadata active). LAN/loopback reach for self-hosted plugin endpoints (the #4006 use case) is unchanged. Test coverage: - 100 parametrized cases: 10 plugin schemes x 5 metadata IPs x 2 allow_private_ips values - mailto://user@IMDS/recipient regression - positive: signal/gotify LAN + signal localhost still allowed - positive: token-host schemes (discord/slack/telegram/pushover/teams) unaffected - DNS-resolved hostname pointing at IMDS rejected (single-resolve attacker; full rebinding TOCTOU remains documented residual risk)	2026-05-17 02:17:30 +02:00
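The #4063 fix pulls host extraction out of the http/https branch so that every scheme gets the metadata check. The sketch below shows a scheme-agnostic version of that check, including IPv4-mapped and NAT64-wrapped (`64:ff9b::/96`) addresses. The three metadata IPs are illustrative and may not match the project's `ALWAYS_BLOCKED_METADATA_IPS`. DNS resolution of hostnames is deliberately omitted, and `targets_metadata_ip` is a hypothetical name.

```python
import ipaddress
from typing import Union
from urllib.parse import urlsplit

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

# Illustrative metadata endpoints only; the project's blocklist may differ.
METADATA_IPS = {
    ipaddress.ip_address("169.254.169.254"),  # link-local IMDS (AWS, GCP, Azure)
    ipaddress.ip_address("fd00:ec2::254"),  # AWS IMDS over IPv6
    ipaddress.ip_address("100.100.100.200"),  # Alibaba Cloud metadata
}
NAT64_PREFIX = ipaddress.ip_network("64:ff9b::/96")  # RFC 6052 well-known prefix


def _unwrap(addr: IPAddress) -> IPAddress:
    """Return the embedded IPv4 address for IPv4-mapped or NAT64-wrapped IPv6."""
    if isinstance(addr, ipaddress.IPv6Address):
        if addr.ipv4_mapped is not None:
            return addr.ipv4_mapped
        if addr in NAT64_PREFIX:
            return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)  # low 32 bits
    return addr


def targets_metadata_ip(url: str) -> bool:
    """Does the URL's host literal point at a metadata IP, whatever the scheme?"""
    host = urlsplit(url).hostname  # also parses signal://, gotify://, mailto://user@host/
    if not host:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a name, not an IP literal; resolving it is out of scope here
    return addr in METADATA_IPS or _unwrap(addr) in METADATA_IPS
```

Because the check only considers metadata addresses, LAN and loopback hosts such as `gotify://192.168.1.10/...` still pass, which matches the unchanged #4006 use case.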
LearningCircuit	6f18a711d2	docs(resource-cleanup): expand Wave 7 with full audit ledger (#4054 ) * docs(resource-cleanup): expand Wave 7 with full audit ledger Replaces the brief "follow-up gaps" bullet list with the full ledger of what the broader audit during #4047 actually examined, split into four scannable subsections: - Checked and confirmed clean: non-Ollama LLM providers, HTTP session lifecycle, subprocess/pidfd, asyncio loops, file handles, SocketIO connect/disconnect. - Flagged then verified NOT a real FD leak: OllamaEmbeddings (uses the deprecated langchain_community class with no httpx client), auth_db + journal_quality engines escaping shutdown_databases (bounded pools, not growing), LibraryRAGService in three RAG SSE endpoints (RAM churn, no FDs — FAISS uses pickle.load, embeddings hold no FDs per the item above, SentenceTransformer mmaps are process-wide singletons). - Minor findings: daemon threads without explicit shutdown, abandoned-research cleanup on socket disconnect — both reaped at process exit, not steady-state leaks. - Future-proofing note: ``langchain_community.embeddings.OllamaEmbeddings`` is deprecated; the replacement ``langchain_ollama.OllamaEmbeddings`` DOES carry ``_client`` and ``_async_client`` (verified by direct introspection), so when LDR migrates the in-running-loop eventpoll leak class will reappear for embeddings unless ``_close_base_llm`` is generalized. Direct introspection done at audit time confirms each verdict: ``[a for a in dir(e) if 'client' in a.lower()]`` returned ``[]`` for the deprecated class and a non-empty list for the new class. This ledger saves the next contributor from re-running the same agent sweep when investigating a future FD spike. No code changes. * docs(resource-cleanup): add Round-8 pidfd finding (fixed by #3971) The Wave 7 ledger covered the eventpoll-FD investigation but didn't mention the residual pidfd accumulation we discovered post-merge. 
A follow-up Round-8 investigation (8 parallel agents, 2 rounds + direct /proc inspection on a live prerelease container) traced ~3.6 pidfds/hour, steady-state ~29, to: _check_subscription → quick_summary → FullSearchResults.batch_fetch_and_extract → AutoHTMLDownloader fallback → PlaywrightHTMLDownloader._fetch_with_playwright → sync_playwright().start() → asyncio.create_subprocess_exec(node-driver) # opens pidfd → driver fails (Chromium not installed in production ldr stage) → pidfd not closed on the failed-child exit CPython 3.14 ruled out as a confounder: subprocess.py uses waitpid(WNOHANG) polling, never opens pidfds. Only asyncio.create_subprocess_* and multiprocessing.Process can open them on Linux + Python 3.9+ via PidfdChildWatcher. PR #3971 (already merged) addresses this from a different angle: it makes web.enable_javascript_rendering default false, so AutoHTMLDownloader short-circuits before invoking Playwright. No subprocess spawned → no pidfd opened. Original motivation for #3971 was the confusing tracebacks reported in #3826; the FD-leak finding is the second motivation, captured here so a future reader sees both. The new bullet sits in Section B (flagged-then-verified-then-fixed) because the leak was real but is now resolved upstream. * docs(resource-cleanup): add FD-leak debugging playbook + CI considerations Add a new "Debugging FD leaks — playbook for the next one" section between the History (Waves 1-7) and "Intentionally not done" parts of the doc, capturing the diagnostic flow we developed across Waves 6 and 7 so future contributors don't re-derive it from scratch. Includes: - Symptoms that justify treating an issue as an FD leak (OSError 24, static-asset MIME errors, High FD count warnings, healthcheck hangs). - Host-side and inside-container snapshot scripts that work even when the container is too FD-starved for docker exec (host-side via sudo + /proc/$P/fd) and through the entrypoint's UID drop (--user 0 to docker exec). 
- Lookup table mapping each anon_inode / socket / pipe / REG flavor to its likely Python-level source and the path to deep-dive (e.g. /proc/PID/fdinfo/N's Pid: line for pidfds). - A pinpointing recipe per FD type — eventpoll (asyncio/httpx), pidfd (asyncio.create_subprocess / multiprocessing.Process), WAL/SHM (SQLCipher engine.dispose). - Pointer to the existing in-codebase instrumentation: _count_open_fds, the periodic Resource monitor log, fd_monitor.py, and the RUN_MANUAL_SMOKE-gated tests/manual_smoke/test_fd_smoke.py harness. - Honest discussion of why an automated per-PR FD-growth assertion is hard (transient FDs, CI-environment subprocess noise, namespace differences, slow-drip leaks needing hours of uptime) and what a nightly long-run job would look like if the team chooses to invest in one. - A "which Wave fixed which leak class" reference table so the next reporter can recognize a class and skip to the relevant precedent. No code changes. Pure documentation. * docs(resource-cleanup): add development-time detection + bpftrace recipes Extend the FD-leak debugging playbook with two industry-standard techniques that would have caught Waves 6 and 7 earlier, drawn from upstream Python docs and the wider production-tracing literature: 1. bpftrace syscall-level pinpointing (in the per-FD-type section). Trace pidfd_open / epoll_create1 / etc. on the host targeting the container's host PID; produces a histogram of every user stack that triggered the syscall, ranked by frequency. The hot stacks are the culprits. Would have caught the Playwright pidfd leak in seconds. 2. Development-time detection (new subsection 4a) — catches leaks at test time before they ship: - PYTHONASYNCIODEBUG=1 + -W default::ResourceWarning. Per the asyncio dev docs, unclosed transports emit ResourceWarning at GC time; the filter actually displays them. Would have surfaced the Wave 7 in-running-loop skip in any test that exercised ainvoke + safe_close on ChatOllama. 
- python -X dev for a one-flag local dev mode bundling ResourceWarning + asyncio debug + warnings as default. - pyproject.toml [tool.pytest.ini_options] examples for both "display" and "error" filter modes (with a caveat that error mode needs a targeted subset, not the whole suite, because third-party libs also emit ResourceWarning). - psutil's num_fds / open_files / connections as the cross-platform alternative to /proc/self/fd for unit tests on macOS dev environments. - tracemalloc + objgraph as the next-level tool when a leak is reproducible — diff allocations before/after, then render the reference chain holding the leaked wrapper alive. No code changes. The new tooling is recommendations only; no mandatory pytest config change in this commit. Future work could enable PYTHONASYNCIODEBUG=1 in the CI test environment if the overhead is acceptable. Citations to docs.python.org are inline for the load-bearing ResourceWarning claim. * test(fd-canary): pin asyncio.create_subprocess pidfd lifecycle in CI Add ``TestAsyncioSubprocessFDBaseline`` to ``tests/utilities/test_close_base_llm.py`` with two regression tests that run on every PR: 1. ``test_no_fd_growth_across_asyncio_subprocess_cycles`` — spawns ``/bin/true`` via ``asyncio.create_subprocess_exec`` 10 times and asserts total FD count delta ≤ +2. Pins the pidfd FD class against the child-watcher leak shape. 2. ``test_no_fd_growth_when_subprocess_fails_to_exec`` — same shape but with a deliberately-missing binary, mirroring the exact Wave-7 production failure mode (Playwright's Node.js driver being spawned, kernel returning ENOENT because Chromium wasn't installed, child watcher still expected to clean up the pidfd it opened before the failed exec). Why this is the right level --------------------------- LDR's own code does NOT call ``asyncio.create_subprocess_*`` (verified in R8C1). The production leak came from a transitive dependency (Playwright). So we cannot test LDR's call sites directly — there are none.
Instead these tests pin the platform baseline: on this Python version, repeated asyncio subprocess cycles must not leak FDs. If a future Python upgrade, a child-watcher change, or a new direct asyncio.create_subprocess call in LDR breaks the close semantics, the next PR's CI fails on these tests — which is the canary signal we want. Linux-only via ``sys.platform != "linux"`` skip. pidfd_open is a Linux syscall; macOS uses a different watcher and Windows uses ProactorEventLoop. Both 'pass by virtue of nothing to leak', so restricting to Linux keeps the signal sharp (a failure on Linux is actionable; a pass on macOS is uninformative). Same +2 FD slack we use for the eventpoll canary above. A real 1-FD-per-iter leak across 10 iterations would land at delta=10, well past the threshold. Doc reference ------------- Updated ``docs/developing/resource-cleanup.md`` "Existing instrumentation" section to enumerate all four in-CI FD-growth canaries (two eventpoll, two pidfd) so future contributors see at a glance what's already guarded and where to extend coverage when a new leak class is found.	2026-05-16 20:01:04 +02:00
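The two pidfd canaries described above boil down to a before/after FD count around repeated asyncio subprocess cycles. The sketch below is an approximation, not the code in `tests/utilities/test_close_base_llm.py`: the helper names are made up, it spawns `sys.executable -c pass` instead of `/bin/true` so it needs no particular binary, and it counts entries in `/proc/self/fd`, so it only gives a meaningful signal on Linux.

```python
import asyncio
import os
import sys


def count_open_fds() -> int:
    """Count this process's open file descriptors (Linux only, via /proc)."""
    return len(os.listdir("/proc/self/fd"))


async def _spawn_cycles(iterations: int, missing_binary: bool) -> None:
    for _ in range(iterations):
        if missing_binary:
            # Mirrors the failed-exec shape: exec fails with ENOENT, which
            # asyncio surfaces to the caller as FileNotFoundError.
            try:
                await asyncio.create_subprocess_exec("/nonexistent/ldr-canary-binary")
            except FileNotFoundError:
                continue
        else:
            proc = await asyncio.create_subprocess_exec(sys.executable, "-c", "pass")
            # Reap the child so the child watcher can release whatever it opened.
            await proc.wait()


def fd_delta_across_subprocess_cycles(iterations: int = 10, missing_binary: bool = False) -> int:
    """Return the change in open-FD count after `iterations` spawn cycles."""
    before = count_open_fds()
    asyncio.run(_spawn_cycles(iterations, missing_binary))
    return count_open_fds() - before
```

In a pytest suite these checks would sit behind the same `sys.platform != "linux"` skip the commit describes; the +2 slack absorbs transient descriptors, while a one-FD-per-cycle leak would push the delta to about 10.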
LearningCircuit	15a3df4aff	fix(content-fetcher): disable JS rendering by default (#3826 ) (#3971 ) * fix(content-fetcher): disable JS rendering by default (#3826) The default Docker production image intentionally ships without Chromium (Dockerfile lines 286-287), so the AutoHTMLDownloader's Crawl4AI/Playwright fallback can never succeed for the majority of users -- it just spawns a fresh Chromium per fetch, fails, and logs a confusing traceback. In the issue reporter's run, 11 such failed fallbacks fired per research on api.github.com JSON URLs. Add a user-facing setting `web.enable_javascript_rendering` (default false). When disabled, AutoHTMLDownloader skips the JS fallback and returns the static result. Power users running outside Docker who have set up Chromium can flip the toggle in the UI. The setting is plumbed through: - AutoHTMLDownloader.__init__ -- new enable_js_rendering=True ctor arg (preserves direct-caller behaviour); download() and download_with_result() short-circuit the JS fallback when False. - ContentFetcher -- new enable_js_rendering=False kwarg passed through to the HTML/DOI downloaders. - build_fetch_tool / _make_full_fetch_tool / _make_summary_fetch_tool -- accept settings_snapshot, read the bool via get_bool_setting_from_snapshot (so the toggle works on ToolNode worker threads where threading.local does not propagate), pass enable_js_rendering into ContentFetcher. - LangGraphAgentStrategy -- forwards self.settings_snapshot to build_fetch_tool at both top-level and sub-agent callsites. - pipeline.fetch_and_extract / batch_fetch_and_extract -- new enable_js_rendering=False kwarg passed through. - FullSearchResults -- new settings_snapshot kwarg, reads the bool and passes it to batch_fetch_and_extract from both call paths (run() and _get_full_content()). - BaseSearchEngine -- forwards self.settings_snapshot when constructing FullSearchResults. 
Existing direct callers (tests, internal lazy-init in _get_playwright_downloader) keep the implicit-on contract via the True ctor default; the disable-by-default decision happens at the factory layer. * fix(content-fetcher): tighten JS-rendering disable per review Address two findings from the code review of `1cd1d116c`: 1. Remove unreachable ``try/except NoSettingsContextError`` wrappers around ``get_bool_setting_from_snapshot`` in both ``_read_js_rendering_setting`` helpers. ``get_setting_from_snapshot`` only raises that exception when ``default is None``; we always pass ``default=False``, so the except blocks were structurally unreachable and also added a silent-fallback layer that conflicts with the project's no-fallbacks rule. 2. Flip ``AutoHTMLDownloader.__init__`` default from ``enable_js_rendering=True`` to ``False``. This makes the constructor consistent with every other layer (``ContentFetcher``, ``fetch_and_extract``, ``batch_fetch_and_extract``, ``FullSearchResults``, and the user-facing setting itself), so future direct callers cannot accidentally re-enable JS rendering by omitting the kwarg. The two existing direct callers that do exercise the JS-rendering fallback (the SPA-trigger unit test and the extraction performance benchmark) now opt in explicitly. * test(content-fetcher): add cross-layer integration coverage for #3826 The change spans 7 source files and introduces a new kwarg on 5 constructors / functions; per-module unit tests catch regressions within a layer but not at the boundaries. Add integration tests that pin down the wiring between layers: * ``tests/research_library/downloaders/test_extraction_pipeline.py`` -- ``TestFetchAndExtractJSRenderingPlumbing`` (5 tests). Asserts ``fetch_and_extract`` and ``batch_fetch_and_extract`` forward ``enable_js_rendering`` into the ``AutoHTMLDownloader`` constructor for the default (False), explicit-True, and explicit-False cases. 
* ``tests/web_search_engines/engines/test_full_search.py`` -- ``TestJSRenderingForwardingFromSettingsSnapshot`` (9 tests). Asserts ``FullSearchResults`` reads ``web.enable_javascript_rendering`` from its ``settings_snapshot`` and forwards the boolean to every ``batch_fetch_and_extract`` call, on both code paths (``run()`` and ``_get_full_content()``). Also pins the new ``settings_snapshot`` ctor kwarg's default and storage. * ``tests/web_search_engines/test_search_engine_base.py`` -- ``TestInitFullSearchForwardsSettingsSnapshot`` (2 tests). Asserts ``BaseSearchEngine._init_full_search`` forwards ``self.settings_snapshot`` when constructing ``FullSearchResults``, closing the last unverified hop in the layer chain. All 227 tests in the touched modules pass. * fix(content-fetcher): pin _read_js_rendering_setting return to bool CI mypy-analysis flagged ``web_search_engines/engines/full_search.py:23`` with ``Returning Any from function declared to return "bool"`` ([no-any-return]). ``get_bool_setting_from_snapshot`` is internally typed as ``Any`` because it routes through the generic snapshot accessor, so returning its result directly leaks ``Any`` past a ``-> bool`` signature when ``warn_return_any`` is on. Wrap the call in ``bool(...)`` to coerce to a definite ``bool``. Apply the same change to the sibling helper in ``advanced_search_system/tools/fetch/__init__.py`` (which mypy does not currently check because the package is in the ``ignore until cleaned up`` override list, but the pattern is identical and the fix keeps the two helpers consistent). 
* test(content-fetcher): assert no browser spawn when JS rendering disabled (#3975) Adds an end-to-end-style regression test for issue #3826 / PR #3971 that mocks the actual library symbols Crawl4AI and Playwright are imported from -- ``crawl4ai.AsyncWebCrawler`` (line 112 lazy import in ``_fetch_with_crawl4ai``) and ``playwright.sync_api.sync_playwright`` (line 216 lazy import in ``_fetch_with_playwright``) -- and asserts both have zero call count when ``AutoHTMLDownloader`` / ``ContentFetcher`` runs with ``enable_js_rendering=False``. This complements the existing unit tests that patch ``_get_playwright_downloader`` to assert it isn't called. The new tests are stronger: if anyone ever adds a code path that bypasses ``self.enable_js_rendering`` and reaches into Crawl4AI/Playwright via a different import path, these tests catch it. Five regression tests across two classes: * ``test_no_browser_when_static_returns_short_content`` -- the bug trigger (JSON / no-content response that previously fell through to JS rendering). * ``test_no_browser_when_spa_signals_present`` -- the SPA-signal branch. * ``test_no_browser_when_static_returns_none`` -- the static-fetch None-return branch. * ``test_no_browser_in_download_with_result_path`` -- pins the ``download_with_result()`` gate (the path the agent actually uses via ``ContentFetcher.fetch``). * ``test_no_browser_when_content_fetcher_disabled`` -- same guarantee at the public ``ContentFetcher`` boundary. Inverse-check confirmed locally: with ``enable_js_rendering=True`` and the same mocks, ``AsyncWebCrawler`` is invoked (call_count = 1), so the tests fail closed if the gate ever regresses. * refactor(content-fetcher): share JS-rendering toggle helper, gate MCP download (#3974) Builds on PR #3971 (disable JS rendering by default) with two clean-up items the post-merge review surfaced: 1. 
The ``_read_js_rendering_setting`` helper was duplicated as a private function in ``advanced_search_system/tools/fetch/__init__.py`` and ``web_search_engines/engines/full_search.py``. Importing underscore-prefixed names from another package is a smell. Extract it into ``utilities/js_rendering.py`` as a public ``read_js_rendering_setting`` and update both callsites to import from there. 2. ``mcp_strategy.py:1146`` (the MCP ``download_content`` tool) was constructing ``ContentFetcher(timeout=...)`` without forwarding the ``enable_js_rendering`` kwarg. It happened to work because the ``ContentFetcher`` ctor defaults to ``False``, but the path was fragile if the default ever changed and silently ignored the user's setting choice. Read the bool from ``self.settings_snapshot`` (which ``MCPSearchStrategy.__init__`` already accepts and stores) via the shared helper, and pass it explicitly. New tests: * ``tests/utilities/test_js_rendering.py`` — unit tests for the helper (default, true/false from snapshot, string coercion, return-type pinning). * ``tests/mcp/test_mcp_strategy.py:TestDownloadContentJSRendering`` — three regression tests asserting the MCP download tool forwards the setting from the snapshot to ``ContentFetcher`` (off, on, default off when no snapshot). * test(settings): teach integrity checker to recognise get_bool_setting_from_snapshot PR #3974 introduced ``utilities/js_rendering.py`` which consumes ``web.enable_javascript_rendering`` via ``get_bool_setting_from_snapshot``. The static-analysis regex in ``test_no_orphaned_settings`` already handles ``get_setting_from_snapshot`` and ``get_bool_setting`` but had no pattern for the ``get_bool_setting_from_snapshot`` combination, so the new setting was incorrectly flagged as orphaned. Add the missing pattern so the test recognises the consumer. The setting is a real, used config knob and must not be added to ``KNOWN_UNUSED``. 
* docs(content-fetcher): how to enable JS rendering + honest benchmark note User-facing settings copy + code-level parameter docstrings were silent about (a) how to actually enable JS rendering after the default flip and (b) what evidence backs disabling-by-default. Add both, with explicit honesty about the empirical limits. Concretely: * ``changelog.d/3826.bugfix.md`` (new) — towncrier fragment so the next release notes mention the default change. The ``recommend-release-notes`` pre-commit hook had nudged us about this; we hadn't addressed it. * ``web.enable_javascript_rendering`` setting description in ``defaults/default_settings.json`` — adds the explicit "to enable: install Chromium via ``playwright install --with-deps chromium``" step, then a transparent caveat about the evidence: our Chromium-on vs Chromium-off benchmark comparisons were mostly accidental (some dev instances had it installed, routine Docker runs did not), and JS rendering did not measurably improve research quality. Most regular benchmark runs are on Docker without Chromium anyway. * ``ContentFetcher.__init__`` docstring (``content_fetcher/fetcher.py``) — same caveat at the code level so future readers see the same framing. * ``fetch_and_extract`` / ``batch_fetch_and_extract`` docstrings (``research_library/downloaders/extraction/pipeline.py``) — same note, since these are the two functions a non-Docker user might call directly. * ``utilities/js_rendering.py`` module docstring — expanded with a "Why disabled by default" section covering both the #3826 mechanism and the benchmark observation. * ``tests/settings/golden_master_settings.json`` — regenerated. No code-behaviour change. Tests still pass (781 across settings, js_rendering helper, fetcher, and extraction pipeline). Honest framing of "mostly accidental, limited, no measurable improvement" intentionally avoids overclaiming. 
* chore(content-fetcher): address AI-review recommendations Two non-blocking nits from the AI code review on PR #3971: 1. Dead parameter ``mocker_patch`` on the ``TestInitFullSearchForwardsSettingsSnapshot._make_engine`` helper in ``tests/web_search_engines/test_search_engine_base.py`` — never referenced inside the function body. Drop the param and update the two call sites; the docstring now points to where mocking actually happens (in the caller's ``with patch(...)`` block). 2. Add an inline comment on ``AutoHTMLDownloader.__init__``'s ``enable_js_rendering=False`` default explaining the rationale (production Docker image ships without Chromium, see issue #3826). The ``ContentFetcher`` docstring already covers this for the layer above, but a direct caller of ``AutoHTMLDownloader`` (e.g. a test or downstream library) would otherwise have to chase the explanation through commit history. No behaviour change. 62 tests pass (test_search_engine_base.py + test_playwright_html.py).	2026-05-16 14:20:14 +02:00
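The layering described in this commit (a snapshot reader that defaults to off and always returns a real `bool`, plus a downloader that skips the browser fallback when the toggle is off) can be sketched as below. Everything here is a hypothetical stand-in: the snapshot is modelled as a flat dict keyed by `web.enable_javascript_rendering`, the string-coercion rules are assumed, and `StaticFirstDownloader` only imitates the gate, not `AutoHTMLDownloader`'s real heuristics or `get_bool_setting_from_snapshot`'s internals.

```python
from typing import Any, Callable, Mapping, Optional

SETTING_KEY = "web.enable_javascript_rendering"
_TRUE_STRINGS = {"1", "true", "yes", "on"}  # assumed coercion rules


def read_js_rendering_setting(snapshot: Optional[Mapping[str, Any]]) -> bool:
    """Return the toggle as a definite bool, defaulting to False (disabled)."""
    if not snapshot or SETTING_KEY not in snapshot:
        return False
    value = snapshot[SETTING_KEY]
    if isinstance(value, str):
        # bool("false") is True, so string values need explicit parsing.
        return value.strip().lower() in _TRUE_STRINGS
    return bool(value)


class StaticFirstDownloader:
    """Hypothetical downloader: static fetch first, optional JS fallback."""

    def __init__(
        self,
        fetch_static: Callable[[str], Optional[str]],
        fetch_with_js: Callable[[str], Optional[str]],
        enable_js_rendering: bool = False,  # off unless the caller opts in
        min_length: int = 200,
    ) -> None:
        self._fetch_static = fetch_static
        self._fetch_with_js = fetch_with_js
        self.enable_js_rendering = enable_js_rendering
        self.min_length = min_length

    def download(self, url: str) -> Optional[str]:
        content = self._fetch_static(url)
        if content is not None and len(content) >= self.min_length:
            return content
        if not self.enable_js_rendering:
            # Short-circuit: return the static result and never start a browser.
            return content
        return self._fetch_with_js(url)
```

Reading the toggle once at the factory layer and passing it as an explicit constructor argument keeps the decision intact on worker threads where, per the commit, `threading.local` state does not propagate.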
LearningCircuit	5d60f3d00e	chore(labels): add 'code-ready' as a human-only signal label (#4068 ) Introduces a new repository label, ``code-ready``, that communicates a human reviewer's judgement that a PR's code changes look technically ready — i.e. the implementation, tests, docs and review nits are all addressed — while CI and an approving codeowner review may still be outstanding. The label is meant to bridge the gap between "needs review" and "auto-merge": a maintainer can apply it after walking the diff to signal that the code side is good, even though merge is still blocked on CI runs finishing or an approver clicking the button. Critically, this label must be applied manually only, never by automation. The motivation is judgement, not heuristics — a workflow that flips it based on "all CI green" or "no unresolved comments" would dilute the signal and undermine the human-in-the-loop intent. The labels.yml entry is grouped under a new "Human-only signal labels" section with an explicit comment saying so, and the label description itself includes "Apply manually — never auto-applied" so the rule is visible everywhere the label surfaces. Verified before adding: * No existing workflow (``pr-triage.yml``, ``label-fixed-in-dev.yml``, ``advanced-search-reminder.yml``, ``sync-main-to-dev.yml``, ``danger-zone-alert.yml``, ``compose-published-smoke.yml``) applies ``code-ready``. Each workflow's ``addLabels(...)`` calls use a closed set of specific label names — no heuristic ever resolves to ``code-ready``. * No naming collision with existing labels (``code-ready`` is new; ``auto-merge``, ``needs-codeowner-review``, ``awaiting-codeowner`` are distinct concepts). * Label created live on GitHub via ``gh label create`` before this commit; this PR brings ``labels.yml`` into source-of-truth sync. Color: ``006b75`` (teal) — distinct from the existing yellow/green review-state palette so it reads as a separate axis from the codeowner-review lifecycle.	2026-05-16 14:18:09 +02:00
LearningCircuit	2723331f67	chore(ci): cut workflow-status.md regen diff noise (#4066 ) The auto-regenerated workflow-status.md on every version-bump PR produced ~15 rows of churn that wasn't signal: - Status emoji column flipped between ✅ / · / ⏳ depending on which event last ran (e.g. backwards-compatibility flipped ✅→· because the most recent run was a skipped workflow_call, not because it regressed). The live badge column to its right is the source of truth for current status anyway, and run history lives in GitHub Actions itself. Drop the column. - Last activity buckets oscillated across this week / last week / 2 weeks ago for healthy daily/weekly workflows. Coarsen to last 30 days / 1-3 months ago / 3-6 months ago / long ago / never so a healthy workflow sits in one bucket indefinitely. Net effect: regenerations in steady state produce zero diff. Real signal (new stale/disabled workflows, aging past the 30d bucket) still surfaces.	2026-05-16 13:20:21 +02:00
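A minimal sketch of the coarsened "Last activity" buckets, assuming cut-offs at 30, 90 and 180 days (the commit names the labels but not the exact boundaries, and the function name is hypothetical):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def activity_bucket(last_run: Optional[datetime], now: datetime) -> str:
    """Map a workflow's last run time to a coarse, slowly changing label."""
    if last_run is None:
        return "never"
    age = now - last_run
    if age <= timedelta(days=30):
        return "last 30 days"
    if age <= timedelta(days=90):
        return "1-3 months ago"
    if age <= timedelta(days=180):
        return "3-6 months ago"
    return "long ago"
```

Because any daily or weekly workflow always lands in `last 30 days`, regenerating the table in steady state produces identical text.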
LearningCircuit	8597e429cc	Improve UI tests + CI: artifact uploads, WebKit skip narrowing, settle-wait migrations (#4061 ) * ci(responsive): restore artifact uploads and fix dead post-results gate The Responsive UI workflow lost its per-viewport artifact uploads (the explanatory comment around lines 206-209), so PR/release failures were un-debuggable - no screenshots, no test output. The downstream `post-results` job was also gated on `github.event_name == 'pull_request'`, which can never be true because the workflow has no `pull_request` trigger; the combined-report aggregator therefore never ran. Restore the upload step using `if: always()` + `if-no-files-found: ignore` (so server-startup failures still upload logs and quiet runs don't fail the step) and rewrite the `post-results` gate to `if: always()`. Artifact name matches the existing `ui-test-results-` pattern expected by the combined-report glob. * test(playwright): narrow WebKit closed-context skip to webkit only (#4060) The catch at all-pages-mobile.spec.js:372 was previously calling `test.skip(true, ...)`, which skipped the test for every browser - so any non-WebKit error path also silently bailed out of the mobile-nav overlap assertion. Only Mobile Safari / WebKit is known to hit the `Target page, context or browser has been closed` race, so gate the skip on `browserName === 'webkit'`. Other browsers now re-throw and surface the regression. Also broaden the matched error message to include `Execution context was destroyed`, the alternate wording the same upstream race uses in newer Playwright versions. Skip annotation references issue #4060 so the skip is grep-able and can be removed when the underlying race is fixed or the DOM walk is restructured. * test(ui): add waitForStable helper to auth_helper.js Replaces ad-hoc `await delay(N)` sleeps used to "let the UI settle" after an action.
The helper waits for a selector to be visible, then waits for its bounding box to stop changing across requestAnimationFrame ticks (bounded to 3s in-page). The final `idleMs` pause is configurable. JSDoc explicitly notes when NOT to use it: don't replace `delay()` calls that exercise wall-clock behavior (e.g. a 10s timer the app is supposed to respect). Those tests need real elapsed time, not a settle wait. Exported as a sibling of `safeClick` to keep Puppeteer test imports tidy. * test(ui): replace settle-delays with state-based waits in two puppeteer tests `test_research_cancellation.js` had 7 hardcoded `await delay(...)` calls and `test_form_validation_aria_ci.js` had 19. The vast majority were "give the UI a moment to settle" pauses with no real signal attached, so they slowed CI and quietly hid races whenever the runner was a beat slower than the chosen delay. For each call: - post-`navigateTo` 500ms sleeps -> `waitForSelector('#query', { visible: true })` - post-validation-trigger sleeps -> `waitForFunction` polling the `ldr-field-invalid` class to appear (or clear, when the test expects validation to pass) - post-focus 100ms -> `waitForFunction(() => document.activeElement?.id === 'query')` - post-cancel-click sleeps -> `waitForFunction` polling for `cancel|stop|suspend` to appear in the status text - post-typing 200ms -> `waitForFunction` polling for the typed value to land. The one delay we kept: the explicit 10-second wait in the mid-stage cancellation test (`test_research_cancellation.js`), which deliberately exercises elapsed-time behavior of the research progress flow. That is not a settle wait and must stay wall-clock. Polling waits all use `.catch(() => {})` to preserve existing behavior when a selector or state never appears (the assertions further down handle the failure case more informatively than a hung wait would).
* docs(pr-template): document label-gated CI workflows Several heavy E2E workflows are label-gated and silently no-op on PRs without the right label - new contributors had no way to know. Add a "CI test coverage" section to the PR template enumerating each gated workflow and the label that triggers it. No CI behavior change; documentation only. * test(form-validation): make waitForQueryReady detect validator attachment Local smoke-test (9 tests, ran against `scripts/dev/restart_server.sh`) exposed two latent races that the prior `await delay(500)` had been quietly hiding: 1. `waitForQueryReady` returned as soon as `#query` was visible, but the FormValidator class is registered against the field a tick later (research.js setupEventListeners). Waiting for the `.ldr-field-error` sibling that addValidation() inserts is the actual signal that the validator is wired and the submit handler will take the early-return path on an empty query. 2. `noLoadingUiOnEmptySubmit` ran after `errorClearsOnValidSubmit`, which typed a real query and triggered a real submit (the fetch fails but creates `.ldr-loading-overlay` first). `navigateTo` skipped the re-navigation because we were already on `/`, so the stale overlay carried over. Force a real `page.goto` for this test so it asserts about a fresh page, not the leftover state of the previous test. After the fix the suite passes 9/9 in ~1s (vs ~4.5s with the old delays). * chore(labels): rewrite test-trigger label descriptions for AI reviewer auto-apply The Friendly AI code reviewer (.github/workflows/ai-code-reviewer.yml) auto-applies labels based on the labels' descriptions in the repo. The existing test:puppeteer / test:e2e / ldr_research / ldr_research_static descriptions were passive ("Triggers Puppeteer E2E tests on this PR"), which doesn't guide the reviewer on when to apply them. 
Rewrite them in the same imperative, bias-toward-action style used by benchmark-needed ("Apply if a change risks degrading performance — when in doubt, add it. Run compare_configurations()"): - test:puppeteer + test:e2e — apply for any PR touching the web stack - ldr_research / ldr_research_static — apply for substantive code/arch changes, with the static variant biased even more toward "run it" since it uses the cheaper model Also add the test:* labels to labels.yml so they become version-controlled (previously they existed only on GitHub, created out-of-band). label-sync is additive and will overwrite the GitHub descriptions on next main push.	2026-05-16 13:17:28 +02:00
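`waitForStable` itself is JavaScript in `auth_helper.js`; the Python sketch below only illustrates the settle-detection idea (sample a value such as a bounding box on each tick, succeed once it stops changing, give up after a bounded timeout). The function name, the three-sample rule and the 16 ms interval are assumptions, not the helper's actual parameters.

```python
import time
from typing import Callable, Hashable, Optional


def wait_for_stable(
    sample: Callable[[], Optional[Hashable]],
    stable_samples: int = 3,
    interval_s: float = 0.016,  # roughly one animation frame at 60 Hz
    timeout_s: float = 3.0,
) -> bool:
    """Return True once `sample()` yields the same non-None value
    `stable_samples` times in a row; False if `timeout_s` elapses first."""
    deadline = time.monotonic() + timeout_s
    previous: Optional[Hashable] = None
    streak = 0
    while time.monotonic() < deadline:
        current = sample()
        if current is None:
            streak = 0  # element not visible yet
        elif current == previous:
            streak += 1
        else:
            streak = 1  # value changed: start a new streak
        if streak >= stable_samples:
            return True
        previous = current
        time.sleep(interval_s)
    return False
```

As the JSDoc warns, a wait like this replaces settle sleeps only; tests that depend on real elapsed time still need a wall-clock delay.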
LearningCircuit	ec91c5c716	fix(pdf): render CJK characters in exported PDFs (#4055 ) (#4058 ) * fix(pdf): render CJK characters in exported PDFs (#4055) The PDF stylesheet hard-coded a Latin-only font stack, so WeasyPrint silently dropped Chinese/Japanese/Korean glyphs from downloads even when they rendered fine in the HTML view. Add Noto Sans CJK / Microsoft YaHei / SimSun fallbacks for both body and monospace families, and install fonts-noto-cjk in the Docker runtime stage so the slim base image actually has glyph coverage. Non-Docker installs still need a CJK font package on the host. * fix(pdf): broaden CJK font fallbacks + document host requirement Extend the PDF CSS font stack to cover macOS (PingFang, Hiragino, Apple SD Gothic Neo) and additional Windows families (Microsoft JhengHei, Yu Gothic, Malgun Gothic), so pip installs on those platforms render CJK without any user action. Document the per-distro CJK font install command in install-pip.md and add a new FAQ entry. Linux pip/server hosts still need fonts-noto-cjk installed manually — there is no in-code way to fix that without bundling ~20 MB of fonts into the wheel. * test(pdf): assert CJK glyph embedding end-to-end (#4055) Round-trip CJK text through markdown → PDF → pypdf extract_text so CI fails if fonts-noto-cjk is ever removed from the Docker runtime image. The pytest-tests job runs inside that image, so the test sees the installed fonts; bare hosts without CJK fonts skip the assertion via an fc-list gate. Does not catch CSS-fallback-stack regressions on its own: fontconfig auto-substitutes a CJK family on Linux even for a Latin-only stack. The CSS fallbacks still matter on Windows/macOS, which CI does not exercise — documented in the test docstring.	2026-05-16 13:12:28 +02:00
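The `fc-list` gate used to skip the CJK assertion on bare hosts could be written along these lines. The helper name, the `:lang=zh` pattern and the timeout are illustrative choices, not the repository's test code.

```python
import shutil
import subprocess


def has_cjk_font(lang: str = "zh") -> bool:
    """Best-effort check that fontconfig knows a font covering `lang`.

    Returns False when fc-list is missing or fails, so tests can skip
    instead of erroring on hosts without fontconfig.
    """
    fc_list = shutil.which("fc-list")
    if fc_list is None:
        return False
    try:
        result = subprocess.run(
            [fc_list, f":lang={lang}", "family"],  # list matching font families
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and bool(result.stdout.strip())
```

A test would then use `pytest.mark.skipif(not has_cjk_font(), reason=...)`. As the commit notes, a passing check on Linux says nothing about the CSS fallback stack, because fontconfig substitutes a CJK family on its own.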
LearningCircuit	41ee83c54c	test(security): SSRF edge-case coverage and weak-test cleanup (#4062 ) * test(quality): strengthen weak SSRF tests Several existing SSRF tests asserted on mock call_count / call_args or range-membership tautologies rather than the validator's real behavior. A regression in the underlying production code could pass these tests. Rewrites (no production code changes): - test_ssrf_redirect_bypass.py: replace ``test_each_hop_validated`` in both ``safe_get`` and ``safe_post`` variants. The previous version asserted only on ``mock_validate_url.call_count == 3``; the new version exercises the real validator with a third hop pointing at ``http://10.0.0.5/internal`` (a private IP literal) and asserts that ``ValueError`` is raised before the third request is fetched. - test_ssrf_redirect_bypass.py: replace ``test_send_respects_`` for ``allow_localhost`` and ``allow_private_ips``. Previously these patched ``validate_url`` to return True and verified the kwargs; the rewrites use the real validator with IP literals so a regression in is_ip_blocked's flag handling would surface. Adds ``test_send_blocks_loopback_without_allow_localhost`` to prove the flag is actually gating behavior, not just being passed through. - test_ssrf_debug_hardening.py: rewrite three of four ``TestFullSearchSSRFValidation`` tests to drop the ``validate_url`` mock. Real validator blocks the metadata-IP literal (``169.254.169.254``) directly; the public hostname uses a DNS mock. - test_ssrf_validator_high_value.py: rewrite ``TestGetSafeUrl`` pass-through and unsafe-default tests to use the real validator (DNS mock for public host; literal RFC1918 IP for unsafe case). 
- test_ssrf_validator_behavior.py: replace ``TestBlockedIPRanges`` range-containment tautologies with ``TestPrivateIpRangesBehavior``, a single parametrized test that asserts ``is_ip_blocked`` returns True for an interior address of every entry in ``PRIVATE_IP_RANGES`` (18 cases, covering all 15 ranges plus their wraps). Removing any entry from ``ip_ranges.py`` is now detected by a specific failure. - test_ssrf_validator_extended.py: remove ``test_is_frozenset`` — a type-only check on ``ALWAYS_BLOCKED_METADATA_IPS``. The canonical exact-membership test already lives in ``test_ssrf_validator_high_value.py::TestConstants``. Each rewrite was mutation-checked: e.g. removing per-hop validation from ``safe_requests.py`` causes the redirect tests to fail with ``StopIteration`` (third hop attempted), and removing a range entry from ``ip_ranges.py`` flips the corresponding ``TestPrivateIpRangesBehavior`` case to failure with the range label in the assertion message. Net: 5 files modified, +130 lines / -68 lines, 565 SSRF tests pass. * test(security): add SSRF validator edge-case coverage Adds eight new test classes pinning previously-uncovered branches of ``src/local_deep_research/security/ssrf_validator.py`` and ``src/local_deep_research/security/ip_ranges.py``. Each class documents the production line(s) it exercises and the mutation it would catch. - TestUnspecifiedIPv4Blocked — ``validate_url`` end-to-end coverage for ``0.0.0.0/8`` (ip_ranges.py:24). Existing tests covered only ``is_ip_blocked``; this pins the full parser → IP-literal → block path. Parametrized across three interior addresses. - TestDnsResolutionNonGaierror — the generic ``except Exception`` handler at ssrf_validator.py:310-312 fires when ``getaddrinfo`` raises anything that is not a ``gaierror`` (PermissionError from a restricted environment, OSError, RuntimeError). Asserts the ``"Error during hostname resolution"`` log line and a False return.
- TestRfcForbiddenControlChars — RFC_FORBIDDEN_URL_CHARS_RE (ssrf_validator.py:63) contains ``\\x00-\\x1f\\x7f``. Backslash and ``\\x00`` were already heavily tested; this parametrizes the run ends ``\\x01``, ``\\x1f``, and ``\\x7f`` (DEL). - TestAlternateIpHexForm — single-DWORD hex (``0x7f000001``) is not parseable by ``ipaddress.ip_address``, so the validator falls through to DNS via the ``except ValueError: pass`` at ssrf_validator.py:269-271. Mocked DNS returns the canonical ``127.0.0.1``, which the post-DNS check rejects. - TestPortEdgeCases — ``:65536`` exercises the urllib3 ``LocationParseError`` branch; ``:0`` parses but the host ``127.0.0.1`` is still IP-blocked. - TestMultipleAtSignsContract — locks in that urllib3's ``parse_url`` resolves ``http://user:pass@127.0.0.1@1.1.1.1/`` to host ``1.1.1.1`` per RFC 3986 last-``@`` rule, and that the validator agrees. If urllib3 ever changes this, the parser-differential defense at ssrf_validator.py:224-228 needs re-validation; this test surfaces the drift. - TestUserinfoContainsIpShape — documents that ``http://127.0.0.1@evil.com/`` is NOT a bypass: urllib3 reports host ``evil.com``, requests connects there, the ``127.0.0.1`` is userinfo only. Pins the urllib3 contract. - TestIpv6ZoneIdBlocked — ``[fe80::1%eth0]`` and the percent-encoded ``[fe80::1%25eth0]`` form. Python 3.9+ accepts zone IDs in ``ipaddress.ip_address`` so the validator catches this directly via ``fe80::/10`` in PRIVATE_IP_RANGES (ip_ranges.py:22) rather than via DNS gaierror. Also removes the previous ``TestDocumentation`` class which contained a single ``@pytest.mark.skip`` placeholder with no assertions; the security-model documentation lives in the module docstring. Mutation checks performed during development: - Remove ``0.0.0.0/8`` from PRIVATE_IP_RANGES → 4 tests fail (the 3 TestUnspecifiedIPv4Blocked cases plus the 0.0.0.5 case in the parametrized TestPrivateIpRangesBehavior added in the preceding commit). Restored. 
- Narrow RFC_FORBIDDEN_URL_CHARS_RE to drop ``\\x7f`` → 1 test fails (the ``\\x7f`` parametrize case). Restored. - Remove per-hop validation from ``safe_requests.py`` → the ``test_each_hop_validated`` rewrites from the preceding commit fail. Restored. Net: +16 new parametrized test cases across 8 classes; 565 SSRF tests pass; no production code changes.	2026-05-16 10:02:42 +02:00
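The idea behind `TestPrivateIpRangesBehavior` (one case per range, probing an interior address so that deleting any range fails a named case) can be sketched with the standard `ipaddress` module. The range list below is an illustrative subset, not `PRIVATE_IP_RANGES`, and `is_ip_blocked` here is a stand-in rather than the production function.

```python
import ipaddress
from typing import Iterable, List, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Illustrative subset only; the real list lives in security/ip_ranges.py.
EXAMPLE_BLOCKED_RANGES: List[IPNetwork] = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "0.0.0.0/8",
        "10.0.0.0/8",
        "127.0.0.0/8",
        "169.254.0.0/16",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "::1/128",
        "fc00::/7",
        "fe80::/10",
    )
]


def is_ip_blocked(ip: str, ranges: Iterable[IPNetwork] = EXAMPLE_BLOCKED_RANGES) -> bool:
    """Stand-in validator: True if `ip` falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr.version == net.version and addr in net for net in ranges)


def interior_address(net: IPNetwork) -> str:
    """Pick an address from the middle of `net` rather than its edge."""
    return str(net[net.num_addresses // 2])


def unblocked_ranges(expected: Iterable[IPNetwork], blocklist: Iterable[IPNetwork]) -> List[str]:
    """Ranges whose interior address the blocklist fails to block (want: [])."""
    blocklist = list(blocklist)
    return [str(net) for net in expected if not is_ip_blocked(interior_address(net), blocklist)]
```

Dropping `0.0.0.0/8` from the blocklist makes `unblocked_ranges` report exactly that range, which is the per-entry mutation signal the commit describes.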
LearningCircuit	d88ba602ca	test(e2e): regression for hidden context_window blocking Start Research (#3909 ) (#4059 ) PR #4051 fixed the bug where the context_window number input had a step="512" HTML5 constraint while living in a display:none container. Any stored value not on the 512-grid (e.g. the reporter's 25000) failed validation; because the field cannot be focused while hidden, the browser silently aborted submit and the Start Research button appeared to do nothing. Add a Puppeteer test that pins the behavior so the constraint can't silently come back. The test: 1. Loads the research page (cloud-provider default keeps the context_window container hidden). 2. Sets #context_window.value = "25000" — the exact stored value from the reporter's bug. 3. Asserts the container is hidden (precondition for the regression). 4. Asserts research-form.checkValidity() returns true. No actual research is submitted — checkValidity() exercises the same HTML5 validation path that drives the silent-abort bug, without consuming LLM credits or interacting with the rest of the e2e flow. If step="512" (or any other constraint that 25000 violates) is ever re-added to the input, the test fails with a clear message pointing back to PR #4051.	2026-05-16 10:00:44 +02:00
Rin	8de5d971d6	refactor(settings): use specialized exception classes for env settings (#3838 ) Improved error observability and alignment with TRY003 standards by replacing generic ValueError with specialized exception classes. Updated type hints to use Path | str and Sequence as suggested in review. Co-authored-by: Daniel Petti <djpetti@gmail.com>	2026-05-15 23:56:21 +00:00
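A sketch of the TRY003 pattern this refactor follows: the message is built inside a dedicated exception class rather than at each `raise` site. `EnvSettingError`, `InvalidEnvSettingValueError`, and `parse_choice` are hypothetical names, and subclassing `ValueError` (so existing `except ValueError` callers keep working) is an assumption, not something the commit states.

```python
from typing import Sequence


class EnvSettingError(ValueError):
    """Hypothetical base class for environment-setting errors."""


class InvalidEnvSettingValueError(EnvSettingError):
    def __init__(self, key: str, value: str, allowed: Sequence[str]) -> None:
        # TRY003: the long message lives here, not at every raise site.
        super().__init__(f"{key}={value!r} is not one of {list(allowed)}")
        self.key = key
        self.value = value


def parse_choice(key: str, value: str, allowed: Sequence[str]) -> str:
    """Return value if it is allowed, else raise the specialized error."""
    if value not in allowed:
        raise InvalidEnvSettingValueError(key, value, allowed)
    return value
```

Carrying `key` and `value` as attributes lets callers log or branch on the failing setting without parsing the message text.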
Rahul	47d370c45d	fix(notifications): allow signal:// Apprise scheme (#4006 ) (#4056 ) Signal notification URLs (signal://host:port/from/to) are rejected by NotificationURLValidator because `signal` is missing from ALLOWED_SCHEMES. The user-facing error is "Test failed: Invalid notification service URL", which is the Unsupported-protocol path at notification_validator.py:246. Apprise (the library LDR delegates to) ships a Signal notification plugin that targets a signal-api-rest container. For non-http schemes the validator intentionally skips the private-IP host check (notification_validator.py:270) and lets Apprise do its own URL parsing, so adding signal does not weaken the SSRF posture — the LAN-host pattern in the bug report (signal://192.168.50.20:8739/…) round-trips to Apprise unchanged. Adds two regression tests: - test_apprise_signal_url_accepted: end-to-end validate_service_url against a LAN-IP Signal URL. - TestClassConstants gets one extra assert that "signal" is in ALLOWED_SCHEMES, keeping the contract list aligned with the other Apprise schemes the file exercises. Closes #4006 Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 19:07:13 -04:00
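The validation routing described above can be sketched as follows. `ALLOWED_SCHEMES` here is a hypothetical subset (the real list in notification_validator.py is larger), and `route_notification_url` is an illustrative helper, not the project's `validate_service_url`.

```python
from urllib.parse import urlsplit

# Hypothetical subset; the real ALLOWED_SCHEMES list is larger.
ALLOWED_SCHEMES = frozenset({"http", "https", "discord", "mailto", "signal"})
HTTP_SCHEMES = frozenset({"http", "https"})


def route_notification_url(url: str) -> str:
    """Return which validation path a notification URL would take."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        # Unsupported-protocol path: "Invalid notification service URL".
        return "unsupported-protocol"
    if scheme in HTTP_SCHEMES:
        return "private-ip-check"  # the SSRF host check applies to http(s) only
    return "delegate-to-apprise"  # Apprise parses non-http schemes itself
```

Because `signal` takes the non-http branch, a LAN host such as `192.168.50.20` reaches Apprise unchanged, which is why adding it does not weaken the http(s) SSRF check.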
LearningCircuit	e77b48c813	test(e2e): tolerate brief LLM output in research export test (#4053 ) The 'should export and display research output' test contains an explicit narrative (test_deep_functionality.js:518-540) describing how the CI release pipeline's small free-tier LLM (Gemini 2.5 Flash Lite via OpenRouter) occasionally returns very brief, non-markdown output even when the research workflow completes end-to-end — and that this should be treated as a transient upstream content-quality flake, not a code regression. That branch logs a warning instead of failing. But the trailing assertion at the end of the same test still hard-checks 'expect(resultContent.length).to.be.greaterThan(100)', which directly contradicts the documented tolerance — an 89-char LLM response (real example from CI run #2385) makes the assertion fail despite the workflow mechanics having been validated. Drop the length assertion and keep only 'expect(resultContent).to.not.be.null', which still catches the real regression (results page didn't render) without flaking on upstream LLM brevity.	2026-05-15 01:46:14 +02:00
LearningCircuit	35290b2d13	fix(research-form): relax context_window step so Start Research submits (#4051 ) The context_window input has min=512 max=131072 step=512 and lives in a display:none container that is only revealed for local providers. Any stored value not aligned to the 512-step grid (e.g. 25000) fails HTML5 validation; because the field is not focusable while hidden, the browser silently aborts submission with no log line — the Start Research button appears to do nothing. Lower the step to 1 so any in-range integer is accepted. min/max still bound the value and the saved setting is unchanged. Fixes #3909	2026-05-15 01:30:55 +02:00
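A rough Python model of the HTML number-input constraints behind this bug. `number_input_is_valid` is a hypothetical helper, not browser or project code; it assumes the HTML rule that the step base is the `min` attribute when `min` is set.

```python
from typing import Union


def number_input_is_valid(value: int, min_: int, max_: int, step: Union[int, str]) -> bool:
    """Approximate <input type=number> range and step checks."""
    if value < min_ or value > max_:
        return False  # rangeUnderflow / rangeOverflow
    if step == "any":
        return True
    # Step base is min, so the value must sit on the grid min, min+step, ...
    return (value - min_) % step == 0  # otherwise stepMismatch
```

With `step=512` the stored 25000 is off-grid ((25000 - 512) is not a multiple of 512), so `checkValidity()` fails; with `step=1` any integer between min and max passes, while min/max still bound the value.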
LearningCircuit	1ab65609db	ci(release): drop credential persistence on cleanup-changelog checkout (#4050 ) The `Checkout the release commit` step in the `cleanup-changelog` job defaulted to `persist-credentials: true`, leaving the job's GITHUB_TOKEN in `.git/config` for the duration of the run. If any later step in this job reads `.git/config` (artifact upload, third-party action that prints/dumps the repo state, etc.), the token leaks. Closes the only open `zizmor/artipacked` finding (code-scanning alert #4655). No functional impact: the only step that needs to push is `peter-evans/create-pull-request`, which already takes an explicit `token:` input and does not rely on the persisted git credential helper. Also dismissed code-scanning alert #7763 (CVE-2026-3298) via the GitHub API — that CVE is Windows-only per PSF advisory; this image is Linux, which Grype's package-version matcher does not account for. Alert #7764 (CVE-2026-7210) is left open as a tracking signal until Python 3.14.6 ships upstream (current latest is 3.14.5; no patched image exists yet).	2026-05-15 01:20:17 +02:00
LearningCircuit	a2f7f6ead6	fix(ci): drop environment: ci from reusable workflow (#4049 ) The `environment: ci` declaration on the research job has no functional value for LDR — the `ci` Environment has zero protection rules and zero environment-scoped secrets (verified via gh api). All required secrets (OPENROUTER_API_KEY, SERPER_API_KEY) are repo-level. The decorative env attachment becomes a problem for any external repo that calls this reusable workflow: GitHub silently auto-creates an empty `ci` Environment in the caller's repo, polluting their environments namespace. Dynamic environment via expression (e.g. `environment: ${{ inputs.env || '' }}`) isn't a viable alternative — `actions/runner` Issue #2610 documents that expression-in-environment doesn't reliably evaluate input context, and an empty-string value still auto-creates an empty-named environment. The simplest correct fix is to delete the line. LDR's own callers (issue-research.yml, e2e-research-test.yml) keep working unchanged because they never depended on env-attached functionality. External callers no longer get the env-pollution side effect. This unblocks a follow-up `ldr-automations` toolkit repo that will expose meta-reusable workflows wrapping this one for other projects.	2026-05-15 01:11:15 +02:00
github-actions[bot]	d6d9ceffac	chore: auto-bump version to 1.6.11 (#3961 )	2026-05-14 23:07:03 +02:00
LearningCircuit	3d0b7bb5f9	review: hoist asyncio+threading imports to module level + Wave 7 doc (#4048 ) Addresses the AI Code Review nit on #4047: ``import threading`` (and the sibling ``import asyncio``) lived inside the ``_close_base_llm`` function body. There's no circular-import or optional-dependency reason to defer them; moving them to the top of the module improves readability and static analysis. Also extends ``docs/developing/resource-cleanup.md`` with a Wave 7 entry documenting: - The in-running-loop ``aclose`` skip bug (this PR's fix). - The healthcheck ``pidfd`` leak (Dockerfile change in the same PR). - The three gaps the broader audit during this PR surfaced as follow-up rather than in-scope work: ``OllamaEmbeddings`` httpx (same FD class as ChatOllama, no close path in langchain wrappers), ``auth_db`` / ``journal_quality`` engines escaping ``shutdown_databases``, and three RAG SSE endpoints constructing ``LibraryRAGService`` before the generator without a ``finally`` close. Also captures the negative results from the audit (non-Ollama providers safe via shared lru_cache, no subprocess pidfd risk, no raw event-loop creation, all ``open()`` calls inside ``with``) so a future contributor reading the history sees what was checked and ruled out.	2026-05-14 22:58:57 +02:00
LearningCircuit	04de8597ec	fix(llm,docker): close ChatOllama async httpx client when called from a running loop + healthcheck timeout (#4047 ) * fix(llm): close ChatOllama async httpx client even when called from a running loop Regression of #3816 with #3855's coverage gap. ``_close_base_llm`` used to skip the async-client close when ``asyncio.get_running_loop()`` succeeded and document that the loop owner would close instead — but no loop-owner cleanup code exists in the project, so the inner ``httpx.AsyncClient`` (and its ``epoll_create`` FD) was silently abandoned. Long-running deployments accumulated ``anon_inode:[eventpoll]`` FDs until the process hit its ``ulimit -n``. The skip path fires under the default ``langgraph-agent`` strategy too: LangGraph dispatches some tool steps via asyncio internally, so close calls reached from a sync ``finally`` can still land inside a live loop. Cleanup now runs in a brief daemon thread that owns its own loop, so ``asyncio.run(aclose())`` works regardless of the caller's loop state. A bounded 5-second ``join`` keeps it from blocking shutdown when the Ollama server is unresponsive; if the join times out, ``_ldr_closed`` is left unset so a later call retries the close, and a WARNING surfaces in logs so the leak is visible instead of silent. Adds: - A regression unit test (``test_closes_async_inside_running_loop_via_thread``) that calls ``_close_base_llm`` from inside an ``asyncio.run`` driver and asserts ``aclose`` actually ran. - An FD-growth guard (``test_no_fd_growth_when_closed_inside_running_loop``) modeled on the existing ``test_no_fd_growth_across_repeated_close_cycles`` but exercising the in-loop close path. - An idempotency test and a timeout test for the new thread path. 
* fix(docker): add timeout to healthcheck urlopen so failed checks don't leak children ``urllib.request.urlopen('http://localhost:5000/api/v1/health')`` had no ``timeout=`` argument, so when the app slowed down (FD exhaustion, slow DB checkpoint, anything else) the call hung forever. Docker's ``--timeout=10s`` only SIGKILLs the ``sh -c`` wrapper; the python child got reparented to PID 1 and kept hanging on the urlopen, each one contributing a ``pidfd`` and a TCP socket against the app's listen socket. On a stuck container we observed 21 live + 113 zombie healthcheck pythons and 64 ``pidfd`` FDs on PID 1. ``timeout=8`` lets urlopen return/raise inside Docker's 10s budget so the child exits cleanly and gets reaped. Pairs with the eventpoll-FD fix in ``_close_base_llm``: that one removed the dominant 91% of the leak, this one removes the 6% remainder and the zombie pile-up. Adds a towncrier fragment covering both fixes.	2026-05-14 22:50:40 +02:00
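A minimal sketch of the "close in a short-lived daemon thread with its own loop" pattern described above. `close_async_client` and the `_ldr_closed` flag usage are modeled on the commit text, not copied from llm_utils.py; for simplicity this sketch always uses the helper thread, whereas the described fix only needs it when a loop is already running.

```python
import asyncio
import logging
import threading

logger = logging.getLogger(__name__)


def close_async_client(client, timeout: float = 5.0) -> bool:
    """Run client.aclose() on a fresh loop in a daemon thread; True if closed."""
    if getattr(client, "_ldr_closed", False):
        return True  # idempotent: nothing left to close

    def _run() -> None:
        try:
            # asyncio.run() creates a loop owned by this thread, so it never
            # collides with a loop the caller may already be running.
            asyncio.run(client.aclose())
        except Exception:
            logger.debug("aclose() raised; treating the client as closed")

    # daemon=True: a stuck aclose() must not hold up interpreter shutdown.
    t = threading.Thread(target=_run, name="ldr-async-close", daemon=True)
    t.start()
    t.join(timeout)  # blocks the caller (and any loop it runs) at most `timeout` s
    if t.is_alive():
        logger.warning("async client close timed out; will retry on next call")
        return False  # flag left unset so a later call retries
    client._ldr_closed = True
    return True
```

Marking the client closed even when `aclose()` raises mirrors the exception-path contract pinned in #4078; only a timeout leaves the flag unset.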
LearningCircuit	1651587d9c	chore(alembic-runner): drop stale isolation_level="IMMEDIATE" references (#4039 ) Two docstring/comment references in `alembic_runner.py` cite SQLCipher's `isolation_level="IMMEDIATE"` as the reason the head short-circuit matters. Production engines actually use `isolation_level=""` (deferred): - `src/local_deep_research/database/encrypted_db.py:378` (user-DB engine) - `src/local_deep_research/database/encrypted_db.py:450` (encrypted engine) The `IMMEDIATE` default in `_make_sqlcipher_connection` (line 280) is the helper-function default, but the production callers override it to "" to avoid login-path contention. The short-circuit is still load-bearing — `engine.begin()` opens a write transaction regardless of isolation level, and SQLite takes a RESERVED lock as soon as the first DML lands inside. Just the cited mechanism was wrong. Rewords both comments to reflect the actual lock-acquisition rule (RESERVED on first DML), independent of the driver isolation_level. Pure documentation change — no behavior delta. Existing short-circuit tests still pass.	2026-05-14 17:29:35 +02:00
LearningCircuit	a6287a4362	fix(security): pin towncrier to exact version and bump Python to 3.14.5 (#4046 ) * fix(security): resolve Scorecard pin alerts and bump Python to 3.14.5 - Pin `pip install towncrier` to a single version with `--hash` (both occurrences in release.yml), resolving Scorecard Pinned-Dependencies alerts #7761 and #7762. - Bump the Dockerfile base image from python:3.14.4-slim to 3.14.5-slim (with new pinned manifest digest). 3.14.5 bundles libexpat 2.8.0 (gh-149017), which is required to mitigate CVE-2026-7210 — Grype alert #7760. * chore(release): drop hash-pins on towncrier, keep exact version pin Per review feedback: hash-pinning a build-time CLI like towncrier adds maintenance burden without meaningful supply-chain benefit. The rest of this repo already uses exact-version pins (`pdm==2.26.2`, `pyyaml==6.0.3`, etc.) which Scorecard's PinnedDependenciesID rule accepts — the original alerts fired only because `~=24.8` is a fuzzy version range.	2026-05-14 17:24:19 +02:00
LearningCircuit	f664221ce4	chore(observability): surface WAL-dispose failures + document LDR_APP_DEBUG sensitivity (#4042 ) Two small follow-ups from the #3976 investigation. connection_cleanup.py: bump dispose-failure log from debug to warning. The 30-min periodic pool dispose at web/auth/connection_cleanup.py:154-171 is the workaround for ADR-0004's SQLCipher + WAL handle leak. Pre-fix, _checkpoint_wal/engine.dispose() failures were swallowed at logger.debug, hiding silent drift. Now surfaces at WARNING with the exception TYPE NAME only (matches the _report_silent_exception pattern in utilities/log_utils.py:146-194, which deliberately drops the exception value to avoid leaking sensitive locals through the sensitive-logging hook). New test test_dispose_failures_surface_as_warnings locks in: - the warning fires and names the user + exception type - the exception's message text does NOT leak docs/CONFIGURATION.md: document that LDR_APP_DEBUG=true also enables Loguru diagnose=True on every sink, which materialises local-variable values into exception traces. Those traces can include credentials, decrypted user content, and other sensitive locals. Documentation-only. Refs: #3976	2026-05-14 15:26:33 +02:00
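A sketch of the "exception type name only" warning pattern described above. `dispose_pool` and the logger name are hypothetical; the point is that the log line carries `type(exc).__name__` and never `str(exc)`, so sensitive locals in the message cannot leak.

```python
import logging

logger = logging.getLogger("connection_cleanup_sketch")


def dispose_pool(username: str, dispose) -> None:
    """Call dispose(); on failure, warn with the exception type name only."""
    try:
        dispose()
    except Exception as exc:
        # Deliberately omit str(exc): its text may contain secrets.
        logger.warning(
            "Pool dispose failed for user %s: %s", username, type(exc).__name__
        )
```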
github-actions[bot]	f928f4cc5c	🤖 Update dependencies (#4043 ) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2026-05-14 14:04:25 +02:00
Ishitta	2808f0fa9d	feat(benchmarks): add statistical functions module (#4029 ) * feat(benchmarks): add statistical functions module for benchmark evaluation * test(benchmarks): add unit tests for statistics module * fix(benchmarks): add input validation to statistical functions * feat(benchmarks): wire Wilson CI into metrics, reports, and live progress	2026-05-14 09:00:04 +00:00
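For reference, a standard Wilson score interval for a binomial proportion such as benchmark accuracy. `wilson_interval` is a hypothetical name; the commit does not show the module's actual signature, and the input validation here only approximates what it describes.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval; z=1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be in [0, n]")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    # Clamp tiny floating-point excursions outside [0, 1].
    return max(0.0, center - half), min(1.0, center + half)
```

Unlike the normal-approximation interval, Wilson stays inside [0, 1] and does not collapse to zero width at 0% or 100% accuracy, which matters for small benchmark runs.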
LearningCircuit	074285a26d	fix(release): enrich AI release notes + render changelog in release flow (#4035 ) * fix(release): enrich AI release notes + render changelog in release flow Fixes the v1.6.10 release notes degradation where: 1. docs/release_notes/1.6.10.md was never created (no automation rendered changelog.d/ fragments before/at release time) 2. AI summary call returned 2xx but empty content with finish_reason=length create-release job now: - Sparse-checks-out changelog.d/ + pyproject.toml, installs towncrier (no PDM needed — towncrier reads pyproject directly), renders docs/release_notes/<version>.md before composing the release body. Guards against an empty fragment directory. - Fetches every merged PR's title + body in a single GraphQL round-trip and feeds them to the model. - Fetches the full diff between the previous /releases/latest tag and the new tag via the compare API, filters lockfiles/generated docs/ SBOM/static assets/binary patches, caps at 700k chars, strips NUL bytes before jq --rawfile. - Bumps AI_MAX_TOKENS default 4000 -> 64000 (matches the AI code reviewer's working budget). Adds AI_REASONING_MAX_TOKENS=16000 so Kimi K2 Thinking cannot burn the entire output budget on reasoning tokens — the root cause of v1.6.10's empty .content. - Adds .reasoning to the response-parsing fallback chain after .content and .reasoning_content. OpenRouter normalizes Moonshot's thinking trace to .reasoning (not .reasoning_content), which is why v1.6.10's diagnostic showed message keys "content, reasoning, reasoning_details" with no usable extraction path. - Enforces a 750k char overall prompt cap so PR descriptions + diff can't blow Kimi's 262k token context window. - Truncates the final release body to 124,400 chars to stay under GitHub's documented 125k release-body limit (HTTP 422 otherwise; gh CLI does not pre-validate). - Rewrites the SUMMARY_PROMPT to ask for a helpful narrative (not a TL;DR), with length sized to the material. 
New cleanup-changelog job opens a PR on main with the consumed fragments + rendered release-notes file, since the create-release runner is throwaway. Branch protection on main allows the PR (0 required reviews, 0 required checks). * chore(release): persist 1.6.10 changelog render + clear consumed fragments The v1.6.10 release shipped without docs/release_notes/1.6.10.md because no automation rendered changelog.d/ fragments at release time (see release.yml change in this PR for the fix going forward). Persists the render now so 1.6.11's release does not re-consume the same fragments. Renders the v1.6.10 release_notes file from the 30 fragments that were in changelog.d/ at v1.6.10 cut time, and removes those fragments from changelog.d/. The rendered content also backs the v1.6.10 GitHub release body update. * fix(release): address AI review findings (UTF-8, race, GraphQL cap) - UTF-8 character-aware truncation. Replace `head -c` (byte-oriented, splits multi-byte UTF-8 mid-sequence) with Python-based character truncation for the diff (700k), prompt (750k), and release body (124,400) caps. Matters because towncrier renders emoji section headers (💥/🔒/✨/🐛) that appear in diffs of docs/release_notes/; mid-emoji splits produce invalid UTF-8 that jq --rawfile then refuses to encode and the GitHub Release API rejects with HTTP 422. - cleanup-changelog race fix. Pin checkout to ${{ github.sha }} instead of `ref: main`. If a PR with new fragments merged into main between create-release and cleanup-changelog, `ref: main` would consume those new fragments into THIS release's docs/release_notes file and delete them prematurely — stealing them from the next release. github.sha is the commit the workflow ran against, so the set of fragments matches what create-release rendered. - GraphQL query node-count cap. Limit PR-description batch to 100 PRs per query and log a warning if a release exceeds that (LDR's typical release is ~20-30 PRs, well under). 
Unbounded fan-out could trip GitHub's GraphQL complexity ceiling on a huge release. - Compare API 300-file warning. Log when .files[] hits the 300-file boundary so a future release's missing-file diff can be diagnosed quickly without rerunning. The cap is a documented GitHub limit. * fix(release): address review2 — PR cap, trap leak, base pin, prompt clarity - Raise PR-fetch cap 100 → 200. v1.6.10 had 144 unique PRs (LDR's dependency-bump traffic is heavy); the previous 100 cap would have silently dropped ~30% of PR descriptions from the AI prompt. The 750k-char overall prompt cap still protects context window. - Hoist COMPARE_JSON mktemp above the trap registration so the temp file is cleaned up even if jq throws under set -e between mktemp and the manual rm. ${DIFF_FILE}.clean (the NUL-strip staging path) also added to the trap; rm -f tolerates the missing-file case. - Pin base: main on peter-evans/create-pull-request. On tag-triggered runs github.sha may not sit on main HEAD, and the action's default-branch resolution could pick a non-main base. We always want the cleanup PR to target main. - Clarify SUMMARY_PROMPT section markers. The prior text said inputs are "separated by `----- SECTION -----` markers" using SECTION as a placeholder; a literal-minded model could look for that exact string and find none. Now lists the actual marker forms explicitly. - Add PREV_TAG == RELEASE_TAG guard. On a workflow re-run after the release exists, /releases/latest returns the just-created tag, making the diff empty. Falls back to the second-most-recent stable release. * fix(release): jq --arg for re-run guard + surface jq errors + doc updates Workflow fixes from a final pass: - Re-run guard now passes RELEASE_TAG to jq via `--arg rel` instead of shell-interpolating it into the program text. 
RELEASE_TAG is already validated as bare semver upstream so this is defense-in-depth, but --arg keeps shell quoting and jq quoting fully separated regardless of what RELEASE_TAG ever ends up containing. - Compare-API jq pipeline no longer swallows stderr or masks the exit code. Previously `jq ... 2>/dev/null || true` would silently produce an empty diff and a "Diff size: 0 bytes" log line on any jq failure, giving a maintainer no actionable signal. Now an explicit if-not check logs a WARNING with jq's stderr intact and ensures the diff file is empty. Doc updates for the new release flow: - changelog.d/README.md: drop the obsolete "maintainer runs `pdm run towncrier build`" instructions; describe the automated render + follow-up cleanup PR. Keep the local --draft / --keep preview tips for fragment iteration. - docs/RELEASE_GUIDE.md: rewrite the maintainer flow (steps 1-3 of the old "Render + bump + commit both" sequence are obsolete — the workflow handles rendering now). Add the cleanup PR merge as a final checklist item. Update the body composition description from "AI TL;DR" to AI narrative with diff + PR-body inputs. * style(release): fix comment indent typo from prior edit	2026-05-14 10:17:31 +02:00
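A small Python illustration of why the workflow switched from `head -c` to character-aware truncation. The helper names are hypothetical; the caps are the values quoted in the commit message. Slicing a `str` cuts by code point, so the result is always valid UTF-8, while a byte cut can land inside a multi-byte emoji.

```python
DIFF_CHAR_CAP = 700_000
PROMPT_CHAR_CAP = 750_000
BODY_CHAR_CAP = 124_400  # stays under GitHub's 125k release-body limit


def truncate_chars(text: str, max_chars: int) -> str:
    """Character-aware cut: slicing a str never splits a UTF-8 sequence."""
    return text[:max_chars]


def strip_nul(text: str) -> str:
    """Drop NUL characters, as the workflow does before `jq --rawfile`."""
    return text.replace("\x00", "")


def head_c_output_decodes(text: str, max_bytes: int) -> bool:
    """Mimic `head -c N` on UTF-8 input; report whether the result still decodes."""
    try:
        text.encode("utf-8")[:max_bytes].decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A code-point cut can still separate a base letter from a following combining mark, but it never produces the invalid UTF-8 that `jq --rawfile` and the Releases API reject.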
LearningCircuit	e6432db8bd	fix(embeddings): correct OpenAIEmbeddingsProvider.requires_api_key to False (#4036 ) Follow-up to #4026. After that PR the provider supports keyless OpenAI-compatible local servers (LM Studio, vLLM, llama.cpp) — an API key is needed only for the OpenAI cloud path. The class-level ``requires_api_key = True`` was therefore stale; any future UI consumer that gates an "API key required" badge on it would mislead users on local servers. Drop the explicit override so the attribute inherits ``False`` from BaseEmbeddingProvider. The cloud-needs-key rule is still enforced at runtime in ``is_available`` and ``create_embeddings`` when no base_url is configured, so nothing about the active behavior changes. No behavior change for current callers — there is no embedding-side consumer of this attribute today; the fix is to make a latent semantic inaccuracy not bite the first future consumer.	2026-05-14 09:08:26 +02:00
kwhyte7	df8657adb5	Feat/deepseek provider (#3432 ) * feat: deepseek provider * fix: address review comments on deepseek provider - Fix typo in import (loggere -> removed unused import) - Fix typo in model name (deepseek-reasonser -> deepseek-reasoner) - Fix base URL (api.deepseek.com/api/v1 -> api.deepseek.com/v1) - Remove standalone functions; auto-discovery handles registration - Add requires_auth_for_models to match other cloud providers - Add deepseek_settings.json for the llm.deepseek.api_key default setting --------- Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com> Co-authored-by: Daniel Petti <djpetti@gmail.com>	2026-05-13 23:53:35 +00:00
qWait	9ad3910452	fix(search): keep cross-engine filter fallback within evaluated context (#3866 ) * fix(search): keep cross-engine filter fallback within evaluated context * style(search): apply ruff format for context fallback fix	2026-05-13 23:17:36 +00:00
LearningCircuit	2ca4f02e6a	docs(developing): add prerelease Docker image testing section (#4034 ) Document the two Docker Hub tags published by prerelease-docker.yml (the immutable prerelease-vX.Y.Z-<sha> tag and the floating :prerelease tag added in #4005) and provide a copy-pasteable docker-compose service that runs the RC alongside production on port 5001 with isolated volumes, so a broken migration in the candidate cannot damage a production SQLCipher database.	2026-05-14 00:24:00 +02:00
Leoy	243d2b2a7f	fix(embeddings): allow OpenAI-compatible local endpoints (#3883 ) (#4026 ) * fix(embeddings): allow OpenAI-compatible local endpoints (#3883) Adds the OPENAI member to the EmbeddingProvider enum, registers the embeddings.openai.* settings so the UI can surface the configuration form, and widens the provider's availability + create_embeddings path to accept a base_url-only configuration (LM Studio, vLLM, llama.cpp). The model-list lookup now routes through the configured base_url so discovery hits the local server instead of api.openai.com. No DB migration is required: the embedding_model_type column is declared with values_callable, so SQLite renders it as plain VARCHAR with no CHECK constraint — adding the OPENAI enum value is a pure Python-side change. Fixes #3883 * test(settings): regenerate golden master for new embeddings.openai.* keys Picks up the four embeddings.openai.* keys (api_key, base_url, model, dimensions) registered by settings_openai_embeddings.json in this PR. Generated via scripts/dev/regenerate_golden_master.py — no manual edits. * fix(embeddings): annotate openai params dict for mypy invariance The params dict at openai.py:121 holds heterogeneous values: str for model/api_key/base_url, int for dimensions. mypy infers Dict[str, str] from the initial literal and rejects the int assignment plus the **params unpack into OpenAIEmbeddings (6+ errors at line 133, "dict is invariant"). Explicit Dict[str, Any] annotation resolves it — same shape this file already uses for client_kwargs at line 197. --------- Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com> Co-authored-by: Daniel Petti <djpetti@gmail.com>	2026-05-13 21:11:43 +02:00
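A sketch of the availability rule and the mypy fix described above. `build_embedding_params` is a hypothetical helper, not the provider's `create_embeddings`, and it does not show how the project actually passes these keyword arguments to langchain's `OpenAIEmbeddings`.

```python
from typing import Any, Dict, Optional


def build_embedding_params(
    model: str,
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    dimensions: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical kwargs builder for an OpenAI-compatible embeddings client."""
    if not base_url and not api_key:
        # Only the OpenAI cloud path (no base_url) needs a key.
        raise ValueError("an API key is required when no base_url is configured")
    # Dict[str, Any], not the inferred Dict[str, str]: dimensions is an int,
    # and dict is invariant in its value type under mypy.
    params: Dict[str, Any] = {"model": model}
    if api_key:
        params["api_key"] = api_key
    if base_url:
        params["base_url"] = base_url
    if dimensions is not None:
        params["dimensions"] = dimensions
    return params
```

A base_url-only configuration (LM Studio, vLLM, llama.cpp) passes, while a bare cloud configuration without a key is rejected.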
d 🔹	8c59082c30	feat(errors): friendly runtime messages for OpenAI-compatible endpoints (#4027 ) * feat(errors): friendly runtime messages for OpenAI-compatible endpoints (#3878) Wrap Site B in research_service.run_research_process so that when a request to an OpenAI-compatible LLM endpoint (LM Studio / vLLM / llama.cpp server / OpenRouter / custom endpoint) fails at runtime, the surfaced error names the provider, configured base URL, and model. The helper lives in error_handling/openai_compat_errors.py and: * walks __cause__/__context__ to find the underlying openai.* / httpx.* class through any LangChain wrapper (cycle-guarded); * dispatches to seven new tokens that slot into the existing "Error type: <code>" convention: openai_connection_refused, openai_timeout, openai_auth, openai_permission_denied, openai_model_not_found, openai_bad_request, openai_unknown; * always appends the original exc!s as a "Details:" suffix so no information is lost; * strips userinfo from base URLs before display (no API-key leaks when a user embeds the key in the URL). Sites B and C and ErrorReporter all learn the new tokens; existing Ollama and ad-hoc connection branches are untouched, so non-OpenAI-compatible providers see no behaviour change. Tests construct openai / httpx exceptions directly (no network) and cover all five acceptance criteria from the issue plus the seven token round-trips through ErrorReporter. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * review: address djpetti feedback on PR #4027 - mention --network=host in _DOCKER_HINT - hoist openai/httpx imports to module top (drop risk-averse try/except) - hoist openai_compat_errors import to research_service.py top * deps: promote openai and httpx to direct dependencies error_handling/openai_compat_errors.py imports openai and httpx at module top-level, but both were only present transitively via langchain-openai. 
Pin them as direct deps so a future langchain-openai refactor cannot break the error_handling module at import time. --------- Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Daniel Petti <djpetti@gmail.com> Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>	2026-05-13 20:50:35 +02:00
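Two of the helper behaviors described in #4027 can be sketched with the standard library alone: a cycle-guarded walk of `__cause__`/`__context__`, and stripping userinfo from a base URL before display. `find_in_chain` and `redact_userinfo` are hypothetical names, not the functions in error_handling/openai_compat_errors.py.

```python
from typing import Optional, Tuple, Type
from urllib.parse import urlsplit, urlunsplit


def find_in_chain(
    exc: BaseException, types: Tuple[Type[BaseException], ...]
) -> Optional[BaseException]:
    """Return the first exception in the cause/context chain matching types."""
    seen = set()
    cur: Optional[BaseException] = exc
    while cur is not None and id(cur) not in seen:  # id() set guards cycles
        if isinstance(cur, types):
            return cur
        seen.add(id(cur))
        # Prefer the explicit cause ("raise ... from e"), else implicit context.
        cur = cur.__cause__ or cur.__context__
    return None


def redact_userinfo(url: str) -> str:
    """Drop any user:password@ prefix so embedded keys are never displayed."""
    parts = urlsplit(url)
    if "@" not in parts.netloc:
        return url
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit(parts._replace(netloc=host))
```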
LearningCircuit	b20786c62c	test(migrations): pin invariants from PR #4000 multi-round review (#4033 ) Adds three regression tests that each fail on `main` (pre-fix) and pass with the runner-level changes in this PR. Surfaced by a 30+ subagent multi-round review of the existing test coverage; deferred dozens of proposed tests that overlapped with existing coverage or tested SQLite/Alembic internals rather than our code. 1. `test_run_migrations_skips_upgrade_when_at_head` — extended. Mocks now cover not just `command.upgrade` but also the new `_drop_orphan_alembic_temp_tables` and `_disable_fk_for_migration` helpers. Pins that the short-circuit happens BEFORE engine.connect() and the FK toggle. If a future refactor moves the short-circuit below the orphan-cleanup or FK toggle, this test fails — the existing command.upgrade mock alone would not catch that. 2. `test_run_migrations_drops_multiple_orphan_temp_tables` — new. Seeds three orphan `_alembic_tmp_*` tables and asserts all are cleaned in one pass. Targets the loop body in `_drop_orphan_alembic_temp_tables`; the existing single-orphan test would still pass if the loop ever short-circuited after the first iteration. 3. `test_drop_orphan_temp_tables_no_op_when_none_present` — new. Direct unit test on `_drop_orphan_alembic_temp_tables` against a clean DB. Pins the `if not temp_tables: return` early-return guard — a future refactor that unconditionally logs/scans would be caught. Out of scope (verified by Round 5 cross-verification): - foreign_key_check after upgrade: already covered (lines 4632, 4831). - Data preservation 0001→head: already covered (lines 1518, 1871). - Run twice no-op: covered by `test_idempotent_migrations` (line 194).	2026-05-13 20:11:08 +02:00
github-actions[bot]	3ade4b4103	🤖 Update dependencies (#4031 ) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2026-05-13 19:23:23 +02:00
LearningCircuit	c2a47a83b3	fix(db): unblock multi-migration upgrades blocked by FK mismatch + orphan _alembic_tmp_* tables (#4000 ) * fix(db): unblock multi-migration upgrades — toggle FK + scrub orphan temp tables outside the alembic transaction Closes #3990 and unblocks #3817. Real users at revisions 0001–0005 upgrading to 0009 hit two failure modes that left their account unable to log in: 1. `foreign key mismatch — "download_attempts" referencing "download_tracker"` (#3990) Migration 0007's defensive `PRAGMA foreign_keys = OFF` is silently a no-op once the sqlite3/sqlcipher3 driver has auto-begun the migration transaction (per sqlite.org/pragma.html#pragma_foreign_keys). With the chained 0002–0006 upgrade, earlier migrations issue DML before 0007 runs, freezing FK in the connect-time ON state for the rest of the upgrade. The orphan-scrub `DELETE FROM download_attempts ...` then fails with "foreign key mismatch" because the pre-fix `download_tracker.url_hash` lacks the UNIQUE backing the FK requires for the cascade machinery to compile. The fix issues `PRAGMA foreign_keys = OFF` in `alembic_runner.run_migrations` BEFORE opening the migration transaction (via `exec_driver_sql`, which doesn't trigger driver auto-begin), then re-enables FK on the same connection after the upgrade commits and before the connection returns to the pool — so subsequent checkouts see the production-default ON state. 2. `table _alembic_tmp_journals already exists` (#3817) `op.batch_alter_table` rebuilds a table by creating `_alembic_tmp_<table>`, copying data, dropping the original, and renaming. On a clean run alembic drops the temp table automatically. If a previous attempt failed in a way that bypassed transaction rollback (e.g., an older migration runner that auto-committed each migration), the temp table persists and the next attempt fails with "table _alembic_tmp_* already exists". 
The fix drops orphan `_alembic_tmp_` tables in `alembic_runner.run_migrations` before opening the migration transaction. This runs at the SQLite level under autocommit; if a concurrent run_migrations is mid-batch_alter_table, our DROP blocks on the SQLite write lock until the rename consumes the temp table, making our DROP IF EXISTS a no-op — the race is benign. Tests: two new fixture-driven regression tests (`TestUpgradeFromBuggyV16xUserDbProductionEngine`, `TestOrphanAlembicTempTableCleanup`) reproduce the production failure modes verbatim — `isolation_level=""` matching the sqlcipher3 engine in `encrypted_db.py`, FK ON at connect via the same event handler `apply_performance_pragmas` installs, and a chained 0005→head upgrade so DML auto-begins before 0007. Both tests fail without the runner fix with the exact production error messages and pass with it. Migration 0007's misleading comment ("no DML has opened the implicit transaction yet") is also corrected — that statement was true when the migration was written against a single-revision test fixture but never held for real multi-migration upgrades. test(no-raw-sql): allow alembic_runner.py — same exception class as initialize.py `alembic_runner.py` is migration infrastructure (drops orphan `_alembic_tmp_*` tables in #3817, toggles `PRAGMA foreign_keys` in #3990). The single `DROP TABLE IF EXISTS` f-string trips the `["\']DROP\s+TABLE\s+'` regex in the raw-SQL guard. Add the file to the same exclusion list `database/initialize.py` lives in — both are catalog-derived DDL on migration infrastructure, not application code touching user-controllable SQL. Precedent: commit `0b82064fd` added `database/initialize.py` with the same justification. The catalog-derived identifier in `_drop_orphan_alembic_temp_tables` already carries `# noqa: S608` and `# bearer:disable` markers, so static analysis (ruff/bearer) still flags any new violations in the file — the test exclusion only suppresses the project-local raw-SQL guard. 
v1.6.10	2026-05-13 00:13:02 +02:00
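The ordering described in #4000 can be sketched with the standard-library `sqlite3` module instead of alembic and SQLAlchemy. This is a minimal sketch under stated assumptions:

- `run_upgrade`, `drop_orphan_temp_tables` and `fk_enabled` are hypothetical names, not the project's `alembic_runner` API.
- The connection is opened with `isolation_level=None`, so no statement implicitly begins a transaction.
- `upgrade` stands in for the chained migrations.

Both the orphan scrub and `PRAGMA foreign_keys = OFF` run before `BEGIN`, because SQLite ignores that pragma while a transaction is open.

```python
import sqlite3
from typing import Callable


def fk_enabled(conn: sqlite3.Connection) -> int:
    # 1 when foreign-key enforcement is on for this connection, 0 when off.
    return conn.execute("PRAGMA foreign_keys").fetchone()[0]


def drop_orphan_temp_tables(conn: sqlite3.Connection) -> list[str]:
    # GLOB treats "_" literally; in LIKE, "_" would match any single character.
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name GLOB '_alembic_tmp_*'"
        )
    ]
    for name in names:
        # Catalog-derived identifier, quoted with embedded quotes doubled.
        quoted = '"' + name.replace('"', '""') + '"'
        conn.execute(f"DROP TABLE IF EXISTS {quoted}")
    return names


def run_upgrade(
    conn: sqlite3.Connection, upgrade: Callable[[sqlite3.Connection], None]
) -> None:
    """Hypothetical runner; assumes conn uses isolation_level=None."""
    drop_orphan_temp_tables(conn)  # autocommit: runs outside any transaction
    conn.execute("PRAGMA foreign_keys = OFF")  # takes effect: no BEGIN yet
    try:
        conn.execute("BEGIN")
        try:
            upgrade(conn)
        except Exception:
            if conn.in_transaction:
                conn.execute("ROLLBACK")
            raise
        conn.execute("COMMIT")
    finally:
        # Restore the production default before the connection is reused.
        conn.execute("PRAGMA foreign_keys = ON")
```

Issuing `BEGIN` and then `PRAGMA foreign_keys = OFF` on the same connection leaves `fk_enabled(conn)` at 1. That silent no-op is what #3990 hit. The `GLOB` filter is a deliberate choice. With `LIKE '_alembic_tmp_%'`, the underscores are wildcards, so an unrelated table such as `xalembicxtmpxkeep` would also be dropped.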
LearningCircuit	048e58905a	chore(deps): bump urllib3 to 2.7 for CVE-2026-44431 and CVE-2026-44432 (#4028 ) Fixes two high-severity vulnerabilities: - CVE-2026-44431: sensitive headers forwarded across origins in proxied low-level redirects - CVE-2026-44432: decompression-bomb safeguards bypassed in streaming API	2026-05-13 00:08:50 +02:00
LearningCircuit	f7f427bff7	feat(citation): source-tagged citations with global counter (#4012 ) * feat(citation): source-tagged citations with global counter Add ``CitationMode.SOURCE_TAGGED_HYPERLINKS`` and set it as the default ``report.citation_format``. ## What changes for users Reports now render citations as ``[arxiv-1]``, ``[openai.com-2]``, ``[arxiv-3]`` — the source tag identifies what kind of source each citation is, while the number is the original bibliography-order global counter. Compared to the previous ``DOMAIN_ID_*`` modes, the suffix is *not* a per-domain counter, so labels never collide and clicking from inline text to the source list is unambiguous. Source-tag resolution order: 1. ``URLClassifier``-recognised academic sources use the short enum value: ``arxiv``, ``pubmed``, ``pmc``, ``semantic_scholar``, ``biorxiv``, ``medrxiv``, ``doi``. 2. Generic web URLs fall back to the cleaned domain (``nytimes.com``, ``openai.com``) via the existing ``_extract_domain``. 3. Empty or non-http(s) URLs (``file://``, local-RAG hits) tag as ``local`` and render without a hyperlink so the markdown stays clean. A future PR can plumb collection names through the RAG metadata pipeline to replace the uniform ``local`` fallback — noted in the helper docstring. ## What does NOT change * The agent still emits plain ``[N]`` citations — the LLM prompt and ``SearchResultsCollector`` are untouched. This is purely a display-layer transform applied after generation. * All other modes are preserved unchanged. Users on ``domain_id_hyperlinks`` etc. keep their current behaviour. * The global counter mechanism in ``SearchResultsCollector.add_results`` (``index = len(_all_links) + 1``) was already correct — the new mode just stops the formatter from throwing that number away. 
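The display-layer transform can be sketched as a regex pass over plain `[N]` and `[1, 2, 3]` citations. This is an illustration, not `CitationFormatter`. Its assumptions:

- The `ACADEMIC_HOSTS` table is a hypothetical stand-in for `URLClassifier`.
- Domain cleanup only strips a leading `www.`.
- Citation numbers are already mapped to URLs in a dict.

The number in each tag is the citation's own global index, never a per-source counter.

```python
import re
from urllib.parse import urlparse

# Hypothetical host -> tag table standing in for URLClassifier (assumption).
ACADEMIC_HOSTS = {
    "arxiv.org": "arxiv",
    "pubmed.ncbi.nlm.nih.gov": "pubmed",
    "www.semanticscholar.org": "semantic_scholar",
    "www.biorxiv.org": "biorxiv",
}

# Matches "[3]" and comma-separated groups such as "[1, 2, 3]".
CITATION_GROUP = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def is_linkable(url: str) -> bool:
    return urlparse(url).scheme in ("http", "https")


def source_label(url: str) -> str:
    # Fallback chain: academic tag -> cleaned domain -> "local".
    if not is_linkable(url):
        return "local"
    host = (urlparse(url).hostname or "").lower()
    if host in ACADEMIC_HOSTS:
        return ACADEMIC_HOSTS[host]
    return host.removeprefix("www.") or "local"


def format_source_tagged(text: str, sources: dict[int, str]) -> str:
    def render(num: int) -> str:
        url = sources.get(num, "")
        label = f"[{source_label(url)}-{num}]"  # keeps the global number
        # Non-http(s) or missing URLs stay plain text, without a hyperlink.
        return f"[{label}]({url})" if is_linkable(url) else label

    def replace(match: re.Match) -> str:
        nums = [int(n) for n in match.group(1).split(",")]
        return "".join(render(n) for n in nums)  # a group becomes adjacent links

    return CITATION_GROUP.sub(replace, text)
```

Take three sources: an arXiv URL at 1, an openai.com URL at 2 and another arXiv URL at 3. The group `[2, 3]` becomes two concatenated links tagged `openai.com-2` and `arxiv-3`. A `file://` or missing URL renders as plain `[local-N]`.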
## Files * ``citation_formatter.py``: new enum value, new ``_format_source_tagged_hyperlinks`` method, ``_extract_source_label`` helper (URLClassifier → domain → ``local`` fallback chain), and ``_is_linkable_url`` helper so file:// / empty URLs render as ``[local-N]`` rather than ``[[local-N]](file:///...)``. * ``research_service.py`` & ``scheduler/background.py``: add the new value to the string→enum dispatch maps. Existing Python fallbacks are deliberately left as-is. * ``default_settings.json``: add the new option (placed first to signal it as the default), flip ``value`` from ``"number_hyperlinks"`` to ``"source_tagged_hyperlinks"``, expand the description. * ``golden_master_settings.json``: regenerated via ``scripts/dev/regenerate_golden_master.py``. ## Tests * ``test_source_tagged_hyperlinks_preserves_global_counter`` — the core property: ``arxiv-1, openai.com-2, arxiv-3`` (not per-domain re-numbering). Covers individual citations and comma-separated groups ``[1, 2, 3]`` → three tagged links concatenated. * ``test_source_tagged_hyperlinks_known_academic_sources`` — arxiv, pubmed, semantic_scholar, biorxiv tags. * ``test_source_tagged_hyperlinks_local_url_falls_back`` — both ``file://`` URLs and missing-URL citations render as plain ``[local-N]`` without a hyperlink. * ``test_enum_member_count`` and ``test__value`` in ``test_citation_formatter_high_value.py`` updated for the new member. feat(citation): use collection name for local-RAG citations + changelog Builds on the source-tagged citation work in this PR. Two pieces: ## Collection-name plumbing for local documents Previously, RAG / library hits all rendered as ``[local-N]`` because the formatter only saw the URL/title round-trip and had no signal about which collection a hit came from. Now the rendered sources block carries an optional ``Collection:`` line per source, and the formatter parses it back so library hits surface their (slugified) collection name as the citation tag. Concrete pipeline: 1. 
``LibraryRAGSearchEngine`` already puts ``collection_name`` into ``result["metadata"]`` (existing — no change). 2. ``utilities/search_utilities.format_links_to_markdown`` now tracks ``canon_to_collection`` alongside ``canon_to_title`` and appends `` Collection: <name>`` after the ``URL:`` line when the metadata carries one. First non-empty wins per canonical URL (mirrors how title/quality work). 3. ``CitationFormatter._parse_collections`` extracts ``{citation_num: name}`` via a multiline regex anchored on the ``[N]`` header so a Collection: line attached to ``[1]`` cannot leak into ``[2]``. 4. ``_extract_source_label`` gains an optional ``collection`` parameter that wins outright when supplied. Otherwise the existing fallback chain (URLClassifier → domain → ``local``) is unchanged. 5. ``_slugify_collection`` normalises free-form collection names into compact inline-safe tags: ``"My Papers"`` → ``my-papers``, ``"team/finance"`` → ``team-finance``, edge cases degrade to ``local`` rather than empty. Result: a research mixing web hits and library hits now renders as e.g. ``[arxiv-1]``, ``[my-papers-2]``, ``[openai.com-3]``, ``[team-finance-4]`` — readers can see at a glance what kind of source each citation is. ## Changelog fragment Adds ``changelog.d/4012.feature.md`` per the towncrier convention documented in ``changelog.d/README.md``. Describes the new default citation format and notes that all previous modes remain available via ``report.citation_format``. ## Tests * ``test_source_tagged_hyperlinks_uses_collection_name`` — mixed web + library report renders with the right tags and no cross-contamination. * ``test_source_tagged_hyperlinks_collection_slugify_edge_cases`` — pins slugifier behaviour on whitespace, slashes, casing, unicode, and empty-after-slug edge cases. * ``test_source_tagged_hyperlinks_missing_collection_falls_back`` — library URL without a ``Collection:`` line keeps the previous ``local-N`` behaviour (compat with hand-rolled sources blocks). 
* ``test_source_tagged_hyperlinks_collection_line_isolation`` — regression guard for the regex anchoring: a ``Collection:`` line on ``[1]`` must not affect ``[2]``. * Four ``TestFormatLinksToMarkdownCollections`` tests cover the renderer side: emit on metadata present, omit on metadata absent, omit on metadata without ``collection_name``, first non-empty wins on URL dedup. 1173 tests pass across ``tests/text_optimization/``, ``tests/utilities/`` (search utilities), and ``tests/settings/``. ``mypy`` clean on both touched source files. * chore(citation): don't flip default to source_tagged yet Per maintainer call: ship the new ``source_tagged_hyperlinks`` mode as an opt-in only — keep ``number_hyperlinks`` as the default for ``report.citation_format`` for now. The mode stays available in the settings dropdown for users who want to try it; we can flip the default in a later release once it has soaked. Changes: * ``default_settings.json``: revert ``value`` to ``"number_hyperlinks"``; move the new option from first to second-to-last in the dropdown so the ordering doesn't read as "this is the default"; rewrite the description to lead with the existing modes. * ``golden_master_settings.json``: regenerate to track the JSON value. * ``changelog.d/4012.feature.md``: reword from "new default" to "new option, opt-in via the setting". No code change to the formatter, the new mode, the collection plumbing, or any of the 8 new tests added earlier in this PR.	2026-05-12 23:26:53 +02:00
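The collection plumbing can be sketched the same way. All of the following are hypothetical assumptions:

- Each source entry starts with a `[N]` header line and may carry an indented `Collection: <name>` line.
- Slugs are ASCII-only.
- `parse_collections`, `slugify_collection` and `citation_tag` are illustrative names, not the formatter's private helpers.

Isolation comes from splitting the block into per-header entries first, and only then searching each entry for its `Collection:` line.

```python
import re

# One match per "[N]" entry: the body runs until the next "[M]" header line or the end.
HEADER_ENTRY = re.compile(r"^\[(\d+)\](.*?)(?=^\[\d+\]|\Z)", re.M | re.S)
COLLECTION_LINE = re.compile(r"^[ \t]*Collection:[ \t]*(.+?)[ \t]*$", re.M)


def parse_collections(sources_block: str) -> dict[int, str]:
    """Map citation number -> raw collection name (entries without one are omitted)."""
    found: dict[int, str] = {}
    for entry in HEADER_ENTRY.finditer(sources_block):
        # Search only inside this entry's body, so [1]'s line cannot leak into [2].
        line = COLLECTION_LINE.search(entry.group(2))
        if line:
            found[int(entry.group(1))] = line.group(1)
    return found


def slugify_collection(name: str) -> str:
    # Assumption: ASCII-only slugs; any other run of characters collapses to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", name.strip().lower()).strip("-")
    return slug or "local"  # degrade to "local" rather than an empty tag


def citation_tag(num: int, fallback: str, collections: dict[int, str]) -> str:
    # A collection name, when present, wins outright over the URL-derived tag.
    name = collections.get(num)
    return slugify_collection(name) if name else fallback
```

Here, `"My Papers"` becomes `my-papers` and `"team/finance"` becomes `team-finance`. An entry without a `Collection:` line keeps whatever fallback tag the URL chain produced.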
LearningCircuit	d2a0889014	test: fix flaky rate-limit-triggered failures in rag upload coverage (#3943 ) `tests/research_library/routes/test_rag_routes_upload_coverage.py`'s `TestUploadToCollection` tests pass in isolation but the last three (test_upload_pdf_storage_failure_continues, test_upload_auto_index_triggered, test_upload_auto_index_no_password) flake to `429 TOO MANY REQUESTS` when run as part of the wider research_library test suite locally (LDR_DISABLE_RATE_LIMITING/DISABLE_RATE_LIMITING is unset). The `@upload_rate_limit_user`/`@upload_rate_limit_ip` decorators applied to `upload_to_collection` at module import time close over the real Limiter instance, so the existing fixture's symbol patches cannot undo them — by the time those tests run, earlier tests in the same pytest process have already consumed the per-user 10/minute budget against the shared in-memory storage. Add `patch.object(real_limiter, "enabled", False)` to the fixture so the real limiter is short-circuited for the duration of each test (and restored automatically on exit). CI is unaffected (it sets `DISABLE_RATE_LIMITING=true` at the workflow env, so the limiter is already disabled there).	2026-05-12 22:49:14 +02:00
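A toy limiter shows why patching the limiter instance works when patching module symbols does not. `ToyLimiter` and its `limit_calls` decorator are hypothetical and only mimic the shape of the problem; they are not Flask-Limiter's API. The one detail taken from the commit is toggling an `enabled` attribute with `unittest.mock.patch.object`, which restores the original value when the `with` block exits.

```python
import functools
from unittest.mock import patch


class ToyLimiter:
    """Hypothetical fixed-budget limiter; not Flask-Limiter's API."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.enabled = True
        self.hits = 0  # shared budget, like in-memory storage across tests

    def limit_calls(self, func):
        # The wrapper closes over this instance at decoration (import) time,
        # so rebinding a module-level name later cannot remove it.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if self.enabled:
                self.hits += 1
                if self.hits > self.limit:
                    return 429
            return func(*args, **kwargs)

        return wrapper


real_limiter = ToyLimiter(limit=2)


@real_limiter.limit_calls
def upload_to_collection() -> int:
    return 200
```

After two calls the budget is spent, so a third call returns 429. Inside `with patch.object(real_limiter, "enabled", False):`, every call returns 200. The flag is back to `True` once the block exits.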
LearningCircuit	b5ca512d5d	feat(hooks): add pre-commit hook to validate settings key namespaces (#4025 ) * feat(hooks): add pre-commit hook to validate settings key namespaces Parses ALLOWED_SETTING_PREFIXES and BLOCKED_SETTING_PREFIXES from settings_routes.py via AST (single source of truth) and checks hardcoded settings keys in Python (AST) and JavaScript (regex) files. Prevents the class of bug where a new settings key is added but its prefix is missing from the allow list.	2026-05-12 21:33:40 +02:00
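The Python half of the hook can be sketched with the standard `ast` module. It reads the two prefix constants without importing the routes module, collects string literals passed to setting accessors, and flags keys outside the allow list. Several details are assumptions:

- The accessor names (`get_setting`, `set_setting`).
- The prefix values used in the usage example.
- The rule that a blocked prefix overrides an allowed one.

The JavaScript regex scan is not shown.

```python
import ast

PREFIX_NAMES = ("ALLOWED_SETTING_PREFIXES", "BLOCKED_SETTING_PREFIXES")
SETTING_CALLS = {"get_setting", "set_setting"}  # hypothetical accessor names


def load_prefixes(routes_source: str) -> dict[str, tuple[str, ...]]:
    """Read the prefix constants from source text without importing it."""
    found: dict[str, tuple[str, ...]] = {}
    for node in ast.walk(ast.parse(routes_source)):
        if isinstance(node, ast.Assign):
            targets, value = node.targets, node.value
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            targets, value = [node.target], node.value
        else:
            continue
        for target in targets:
            if isinstance(target, ast.Name) and target.id in PREFIX_NAMES:
                # Assumes a literal tuple/list/set of strings.
                found[target.id] = tuple(ast.literal_eval(value))
    return found


def find_setting_keys(source: str) -> list[str]:
    """Collect string literals passed as the first argument to accessors."""
    keys: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        first = node.args[0]
        if (
            name in SETTING_CALLS
            and isinstance(first, ast.Constant)
            and isinstance(first.value, str)
        ):
            keys.append(first.value)
    return keys


def disallowed_keys(source: str, prefixes: dict[str, tuple[str, ...]]) -> list[str]:
    allowed = prefixes.get("ALLOWED_SETTING_PREFIXES", ())
    blocked = prefixes.get("BLOCKED_SETTING_PREFIXES", ())
    # str.startswith accepts a tuple; an empty tuple never matches.
    return [
        key
        for key in find_setting_keys(source)
        if key.startswith(blocked) or not key.startswith(allowed)
    ]
```

Suppose the allow list is `("llm.", "search.")` with no `local_search_` entry. A call such as `get_setting("local_search_chunk_size")` is then flagged, which is the class of bug fixed in #4024.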
LearningCircuit	37bd58ba6b	fix(settings): allow local_search_ namespace for embedding settings (#4024 ) The security namespace gate added in `6430dd4` blocked creation of local_search_* setting keys (embedding model, chunk size, etc.) because the prefix was missing from ALLOWED_SETTING_PREFIXES.	2026-05-12 20:56:51 +02:00
dependabot[bot]	964c774292	chore(deps): bump puppeteer from 24.43.0 to 24.43.1 in /tests/puppeteer (#4021 ) Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 24.43.0 to 24.43.1. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/CHANGELOG.md) - [Commits](https://github.com/puppeteer/puppeteer/compare/puppeteer-v24.43.0...puppeteer-v24.43.1) --- updated-dependencies: - dependency-name: puppeteer dependency-version: 24.43.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Prashant Sharma <prashant.shar51@gmail.com>	2026-05-12 19:29:41 +02:00
dependabot[bot]	59e3bac836	chore(deps): bump puppeteer from 24.43.0 to 24.43.1 in /tests (#4020 ) Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 24.43.0 to 24.43.1. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/CHANGELOG.md) - [Commits](https://github.com/puppeteer/puppeteer/compare/puppeteer-v24.43.0...puppeteer-v24.43.1) --- updated-dependencies: - dependency-name: puppeteer dependency-version: 24.43.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Prashant Sharma <prashant.shar51@gmail.com>	2026-05-12 19:29:11 +02:00
dependabot[bot]	1351a0cde7	chore(deps-dev): bump puppeteer in /tests/api_tests_with_login (#4019 ) Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 24.43.0 to 24.43.1. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/CHANGELOG.md) - [Commits](https://github.com/puppeteer/puppeteer/compare/puppeteer-v24.43.0...puppeteer-v24.43.1) --- updated-dependencies: - dependency-name: puppeteer dependency-version: 24.43.1 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Prashant Sharma <prashant.shar51@gmail.com>	2026-05-12 19:28:37 +02:00
dependabot[bot]	67114f8066	chore(deps): bump puppeteer from 24.43.0 to 24.43.1 in /tests/ui_tests (#4018 ) Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 24.43.0 to 24.43.1. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/CHANGELOG.md) - [Commits](https://github.com/puppeteer/puppeteer/compare/puppeteer-v24.43.0...puppeteer-v24.43.1) --- updated-dependencies: - dependency-name: puppeteer dependency-version: 24.43.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Prashant Sharma <prashant.shar51@gmail.com>	2026-05-12 19:28:12 +02:00


6474 Commits