local-deep-research

mirror of https://github.com/LearningCircuit/local-deep-research.git synced 2026-06-15 19:46:56 +03:00

Author	SHA1	Message	Date
LearningCircuit	da0d18ed25	fix(release): set towncrier name to skip package import (#4071) The release job uses a sparse checkout that omits src/ and runs a standalone `pip install towncrier`. Towncrier 24.8 still calls `get_project_name()` even when --version is passed on the CLI, and the existing [tool.towncrier] config pointed at the `local_deep_research` package, so the build crashed with ModuleNotFoundError before rendering any fragments. Set `name = "local-deep-research"` so towncrier short-circuits the import path (build.py:195-197). Drop the now-misleading `package`/`package_dir` fields — `--version` is always passed, `directory = "changelog.d"` is explicit, and nothing else inside towncrier still needs them. Fix the workflow comment that misattributed the bypass to --version. Verified by rendering changelog.d/*.md fragments against this pyproject.toml in a fresh directory with no src/ present.	2026-05-17 02:30:51 +02:00
LearningCircuit	5d60f3d00e	chore(labels): add 'code-ready' as a human-only signal label (#4068) Introduces a new repository label, ``code-ready``, that communicates a human reviewer's judgement that a PR's code changes look technically ready — i.e. the implementation, tests, docs and review nits are all addressed — while CI and an approving codeowner review may still be outstanding. The label is meant to bridge the gap between "needs review" and "auto-merge": a maintainer can apply it after walking the diff to signal that the code side is good, even though merge is still blocked on CI runs finishing or an approver clicking the button. Critically, this label must be applied manually only, never by automation. The motivation is judgement, not heuristics — a workflow that flips it based on "all CI green" or "no unresolved comments" would dilute the signal and undermine the human-in-the-loop intent. The labels.yml entry is grouped under a new "Human-only signal labels" section with an explicit comment saying so, and the label description itself includes "Apply manually — never auto-applied" so the rule is visible everywhere the label surfaces. Verified before adding: * No existing workflow (``pr-triage.yml``, ``label-fixed-in-dev.yml``, ``advanced-search-reminder.yml``, ``sync-main-to-dev.yml``, ``danger-zone-alert.yml``, ``compose-published-smoke.yml``) applies ``code-ready``. Each workflow's ``addLabels(...)`` calls use a closed set of specific label names — no heuristic ever resolves to ``code-ready``. * No naming collision with existing labels (``code-ready`` is new; ``auto-merge``, ``needs-codeowner-review``, ``awaiting-codeowner`` are distinct concepts). * Label created live on GitHub via ``gh label create`` before this commit; this PR brings ``labels.yml`` into source-of-truth sync. Color: ``006b75`` (teal) — distinct from the existing yellow/green review-state palette so it reads as a separate axis from the codeowner-review lifecycle.	2026-05-16 14:18:09 +02:00
LearningCircuit	8597e429cc	Improve UI tests + CI: artifact uploads, WebKit skip narrowing, settle-wait migrations (#4061) * ci(responsive): restore artifact uploads and fix dead post-results gate The Responsive UI workflow lost its per-viewport artifact uploads (the explanatory comment around lines 206-209), so PR/release failures were un-debuggable - no screenshots, no test output. The downstream `post-results` job was also gated on `github.event_name == 'pull_request'`, which can never be true because the workflow has no `pull_request` trigger; the combined-report aggregator therefore never ran. Restore the upload step using `if: always()` + `if-no-files-found: ignore` (so server-startup failures still upload logs and quiet runs don't fail the step) and rewrite the `post-results` gate to `if: always()`. Artifact name matches the existing `ui-test-results-` pattern expected by the combined-report glob. test(playwright): narrow WebKit closed-context skip to webkit only (#4060) The catch at all-pages-mobile.spec.js:372 was previously calling `test.skip(true, ...)`, which skipped the test for every browser - so any non-WebKit error path also silently bailed out of the mobile-nav overlap assertion. Only Mobile Safari / WebKit is known to hit the `Target page, context or browser has been closed` race, so gate the skip on `browserName === 'webkit'`. Other browsers now re-throw and surface the regression. Also broaden the matched error message to include `Execution context was destroyed`, the alternate wording the same upstream race uses in newer Playwright versions. Skip annotation references issue #4060 so the skip is grep-able and can be removed when the underlying race is fixed or the DOM walk is restructured. * test(ui): add waitForStable helper to auth_helper.js Replaces ad-hoc `await delay(N)` sleeps used to "let the UI settle" after an action. 
The helper waits for a selector to be visible, then waits for its bounding box to stop changing across requestAnimationFrame ticks (bounded to 3s in-page). The final `idleMs` pause is configurable. JSDoc explicitly notes when NOT to use it: don't replace `delay()` calls that exercise wall-clock behavior (e.g. a 10s timer the app is supposed to respect). Those tests need real elapsed time, not a settle wait. Exported as a sibling of `safeClick` to keep Puppeteer test imports tidy. * test(ui): replace settle-delays with state-based waits in two puppeteer tests `test_research_cancellation.js` had 7 hardcoded `await delay(...)` calls and `test_form_validation_aria_ci.js` had 19. The vast majority were "give the UI a moment to settle" pauses with no real signal attached, so they slowed CI and quietly hid races whenever the runner was a beat slower than the chosen delay. For each call: - post-`navigateTo` 500ms sleeps -> `waitForSelector('#query', { visible: true })` - post-validation-trigger sleeps -> `waitForFunction` polling the `ldr-field-invalid` class to appear (or clear, when the test expects validation to pass) - post-focus 100ms -> `waitForFunction(() => document.activeElement?.id === 'query')` - post-cancel-click sleeps -> `waitForFunction` polling for `cancel|stop|suspend` to appear in the status text - post-typing 200ms -> `waitForFunction` polling for the typed value to land The one delay we kept: the explicit 10-second wait in the mid-stage cancellation test (`test_research_cancellation.js`), which deliberately exercises elapsed-time behavior of the research progress flow. That is not a settle wait and must stay wall-clock. Polling waits all use `.catch(() => {})` to preserve existing behavior when a selector or state never appears (the assertions further down handle the failure case more informatively than a hung wait would). 
* docs(pr-template): document label-gated CI workflows Several heavy E2E workflows are label-gated and silently no-op on PRs without the right label - new contributors had no way to know. Add a "CI test coverage" section to the PR template enumerating each gated workflow and the label that triggers it. No CI behavior change; documentation only. * test(form-validation): make waitForQueryReady detect validator attachment Local smoke-test (9 tests, ran against `scripts/dev/restart_server.sh`) exposed two latent races that the prior `await delay(500)` had been quietly hiding: 1. `waitForQueryReady` returned as soon as `#query` was visible, but the FormValidator class is registered against the field a tick later (research.js setupEventListeners). Waiting for the `.ldr-field-error` sibling that addValidation() inserts is the actual signal that the validator is wired and the submit handler will take the early-return path on an empty query. 2. `noLoadingUiOnEmptySubmit` ran after `errorClearsOnValidSubmit`, which typed a real query and triggered a real submit (the fetch fails but creates `.ldr-loading-overlay` first). `navigateTo` skipped the re-navigation because we were already on `/`, so the stale overlay carried over. Force a real `page.goto` for this test so it asserts about a fresh page, not the leftover state of the previous test. After the fix the suite passes 9/9 in ~1s (vs ~4.5s with the old delays). * chore(labels): rewrite test-trigger label descriptions for AI reviewer auto-apply The Friendly AI code reviewer (.github/workflows/ai-code-reviewer.yml) auto-applies labels based on the labels' descriptions in the repo. The existing test:puppeteer / test:e2e / ldr_research / ldr_research_static descriptions were passive ("Triggers Puppeteer E2E tests on this PR"), which doesn't guide the reviewer on when to apply them. 
Rewrite them in the same imperative, bias-toward-action style used by benchmark-needed ("Apply if a change risks degrading performance — when in doubt, add it. Run compare_configurations()"): - test:puppeteer + test:e2e — apply for any PR touching the web stack - ldr_research / ldr_research_static — apply for substantive code/arch changes, with the static variant biased even more toward "run it" since it uses the cheaper model Also add the test:* labels to labels.yml so they become version-controlled (previously they existed only on GitHub, created out-of-band). label-sync is additive and will overwrite the GitHub descriptions on next main push.	2026-05-16 13:17:28 +02:00
LearningCircuit	1ab65609db	ci(release): drop credential persistence on cleanup-changelog checkout (#4050) The `Checkout the release commit` step in the `cleanup-changelog` job defaulted to `persist-credentials: true`, leaving the job's GITHUB_TOKEN in `.git/config` for the duration of the run. If any later step in this job reads `.git/config` (artifact upload, third-party action that prints/dumps the repo state, etc.), the token leaks. Closes the only open `zizmor/artipacked` finding (code-scanning alert #4655). No functional impact: the only step that needs to push is `peter-evans/create-pull-request`, which already takes an explicit `token:` input and does not rely on the persisted git credential helper. Also dismissed code-scanning alert #7763 (CVE-2026-3298) via the GitHub API — that CVE is Windows-only per PSF advisory; this image is Linux, which Grype's package-version matcher does not account for. Alert #7764 (CVE-2026-7210) is left open as a tracking signal until Python 3.14.6 ships upstream (current latest is 3.14.5; no patched image exists yet).	2026-05-15 01:20:17 +02:00
LearningCircuit	a2f7f6ead6	fix(ci): drop environment: ci from reusable workflow (#4049) The `environment: ci` declaration on the research job has no functional value for LDR — the `ci` Environment has zero protection rules and zero environment-scoped secrets (verified via gh api). All required secrets (OPENROUTER_API_KEY, SERPER_API_KEY) are repo-level. The decorative env attachment becomes a problem for any external repo that calls this reusable workflow: GitHub silently auto-creates an empty `ci` Environment in the caller's repo, polluting their environments namespace. Dynamic environment via expression (e.g. `environment: ${{ inputs.env || '' }}`) isn't a viable alternative — `actions/runner` Issue #2610 documents that expression-in-environment doesn't reliably evaluate input context, and an empty-string value still auto-creates an empty-named environment. Simplest correct fix is to delete the line. LDR's own callers (issue-research.yml, e2e-research-test.yml) keep working unchanged because they never depended on env-attached functionality. External callers no longer get the env-pollution side effect. This unblocks a follow-up `ldr-automations` toolkit repo that will expose meta-reusable workflows wrapping this one for other projects.	2026-05-15 01:11:15 +02:00
LearningCircuit	a6287a4362	fix(security): pin towncrier to exact version and bump Python to 3.14.5 (#4046) * fix(security): resolve Scorecard pin alerts and bump Python to 3.14.5 - Pin `pip install towncrier` to a single version with `--hash` (both occurrences in release.yml), resolving Scorecard Pinned-Dependencies alerts #7761 and #7762. - Bump the Dockerfile base image from python:3.14.4-slim to 3.14.5-slim (with new pinned manifest digest). 3.14.5 bundles libexpat 2.8.0 (gh-149017), which is required to mitigate CVE-2026-7210 — Grype alert #7760. * chore(release): drop hash-pins on towncrier, keep exact version pin Per review feedback: hash-pinning a build-time CLI like towncrier adds maintenance burden without meaningful supply-chain benefit. The rest of this repo already uses exact-version pins (`pdm==2.26.2`, `pyyaml==6.0.3`, etc.) which Scorecard's PinnedDependenciesID rule accepts — the original alerts fired only because `~=24.8` is a fuzzy version range.	2026-05-14 17:24:19 +02:00
LearningCircuit	074285a26d	fix(release): enrich AI release notes + render changelog in release flow (#4035) * fix(release): enrich AI release notes + render changelog in release flow Fixes the v1.6.10 release notes degradation where: 1. docs/release_notes/1.6.10.md was never created (no automation rendered changelog.d/ fragments before/at release time) 2. AI summary call returned 2xx but empty content with finish_reason=length create-release job now: - Sparse-checks-out changelog.d/ + pyproject.toml, installs towncrier (no PDM needed — towncrier reads pyproject directly), renders docs/release_notes/<version>.md before composing the release body. Guards against an empty fragment directory. - Fetches every merged PR's title + body in a single GraphQL round-trip and feeds them to the model. - Fetches the full diff between the previous /releases/latest tag and the new tag via the compare API, filters lockfiles/generated docs/ SBOM/static assets/binary patches, caps at 700k chars, strips NUL bytes before jq --rawfile. - Bumps AI_MAX_TOKENS default 4000 -> 64000 (matches the AI code reviewer's working budget). Adds AI_REASONING_MAX_TOKENS=16000 so Kimi K2 Thinking cannot burn the entire output budget on reasoning tokens — the root cause of v1.6.10's empty .content. - Adds .reasoning to the response-parsing fallback chain after .content and .reasoning_content. OpenRouter normalizes Moonshot's thinking trace to .reasoning (not .reasoning_content), which is why v1.6.10's diagnostic showed message keys "content, reasoning, reasoning_details" with no usable extraction path. - Enforces a 750k char overall prompt cap so PR descriptions + diff can't blow Kimi's 262k token context window. - Truncates the final release body to 124,400 chars to stay under GitHub's documented 125k release-body limit (HTTP 422 otherwise; gh CLI does not pre-validate). - Rewrites the SUMMARY_PROMPT to ask for a helpful narrative (not a TL;DR), with length sized to the material. 
New cleanup-changelog job opens a PR on main with the consumed fragments + rendered release-notes file, since the create-release runner is throwaway. Branch protection on main allows the PR (0 required reviews, 0 required checks). * chore(release): persist 1.6.10 changelog render + clear consumed fragments The v1.6.10 release shipped without docs/release_notes/1.6.10.md because no automation rendered changelog.d/ fragments at release time (see release.yml change in this PR for the fix going forward). Persists the render now so 1.6.11's release does not re-consume the same fragments. Renders the v1.6.10 release_notes file from the 30 fragments that were in changelog.d/ at v1.6.10 cut time, and removes those fragments from changelog.d/. The rendered content also backs the v1.6.10 GitHub release body update. * fix(release): address AI review findings (UTF-8, race, GraphQL cap) - UTF-8 character-aware truncation. Replace `head -c` (byte-oriented, splits multi-byte UTF-8 mid-sequence) with Python-based character truncation for the diff (700k), prompt (750k), and release body (124,400) caps. Matters because towncrier renders emoji section headers (💥/🔒/✨/🐛) that appear in diffs of docs/release_notes/; mid-emoji splits produce invalid UTF-8 that jq --rawfile then refuses to encode and the GitHub Release API rejects with HTTP 422. - cleanup-changelog race fix. Pin checkout to ${{ github.sha }} instead of `ref: main`. If a PR with new fragments merged into main between create-release and cleanup-changelog, `ref: main` would consume those new fragments into THIS release's docs/release_notes file and delete them prematurely — stealing them from the next release. github.sha is the commit the workflow ran against, so the set of fragments matches what create-release rendered. - GraphQL query node-count cap. Limit PR-description batch to 100 PRs per query and log a warning if a release exceeds that (LDR's typical release is ~20-30 PRs, well under). 
Unbounded fan-out could trip GitHub's GraphQL complexity ceiling on a huge release. - Compare API 300-file warning. Log when .files[] hits the 300-file boundary so a future release's missing-file diff can be diagnosed quickly without rerunning. The cap is a documented GitHub limit. * fix(release): address review2 — PR cap, trap leak, base pin, prompt clarity - Raise PR-fetch cap 100 → 200. v1.6.10 had 144 unique PRs (LDR's dependency-bump traffic is heavy); the previous 100 cap would have silently dropped ~30% of PR descriptions from the AI prompt. The 750k-char overall prompt cap still protects context window. - Hoist COMPARE_JSON mktemp above the trap registration so the temp file is cleaned up even if jq throws under set -e between mktemp and the manual rm. ${DIFF_FILE}.clean (the NUL-strip staging path) also added to the trap; rm -f tolerates the missing-file case. - Pin base: main on peter-evans/create-pull-request. On tag-triggered runs github.sha may not sit on main HEAD, and the action's default-branch resolution could pick a non-main base. We always want the cleanup PR to target main. - Clarify SUMMARY_PROMPT section markers. The prior text said inputs are "separated by `----- SECTION -----` markers" using SECTION as a placeholder; a literal-minded model could look for that exact string and find none. Now lists the actual marker forms explicitly. - Add PREV_TAG == RELEASE_TAG guard. On a workflow re-run after the release exists, /releases/latest returns the just-created tag, making the diff empty. Falls back to the second-most-recent stable release. * fix(release): jq --arg for re-run guard + surface jq errors + doc updates Workflow fixes from a final pass: - Re-run guard now passes RELEASE_TAG to jq via `--arg rel` instead of shell-interpolating it into the program text. 
RELEASE_TAG is already validated as bare semver upstream so this is defense-in-depth, but --arg keeps shell quoting and jq quoting fully separated regardless of what RELEASE_TAG ever ends up containing. - Compare-API jq pipeline no longer swallows stderr or masks the exit code. Previously `jq ... 2>/dev/null || true` would silently produce an empty diff and a "Diff size: 0 bytes" log line on any jq failure, giving a maintainer no actionable signal. Now an explicit if-not check logs a WARNING with jq's stderr intact and ensures the diff file is empty. Doc updates for the new release flow: - changelog.d/README.md: drop the obsolete "maintainer runs `pdm run towncrier build`" instructions; describe the automated render + follow-up cleanup PR. Keep the local --draft / --keep preview tips for fragment iteration. - docs/RELEASE_GUIDE.md: rewrite the maintainer flow (steps 1-3 of the old "Render + bump + commit both" sequence are obsolete — the workflow handles rendering now). Add the cleanup PR merge as a final checklist item. Update the body composition description from "AI TL;DR" to AI narrative with diff + PR-body inputs. * style(release): fix comment indent typo from prior edit	2026-05-14 10:17:31 +02:00
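The byte-versus-character truncation problem described in the UTF-8 fix of commit 074285a26d can be illustrated with a short Python sketch. This is not the code from release.yml: the helper names `truncate_chars` and `truncate_bytes` and the sample string are hypothetical, chosen only to show why a byte-oriented cut such as `head -c` can split a multi-byte sequence while a character-based cut cannot.

```python
def truncate_chars(text: str, max_chars: int) -> str:
    """Cut on character boundaries (hypothetical helper, not from the repo)."""
    return text[:max_chars]


def truncate_bytes(text: str, max_bytes: int) -> bytes:
    """Mimic `head -c`: cut the UTF-8 encoding at a raw byte offset."""
    return text.encode("utf-8")[:max_bytes]


sample = "ab🐛cd"  # the emoji occupies 4 bytes in UTF-8

# A byte cut at offset 4 lands inside the emoji, so strict decoding fails.
byte_cut = truncate_bytes(sample, 4)
try:
    byte_cut.decode("utf-8")
    byte_cut_valid = True
except UnicodeDecodeError:
    byte_cut_valid = False

# A character cut keeps the emoji whole and re-encodes cleanly.
char_cut = truncate_chars(sample, 3)
char_cut_bytes = char_cut.encode("utf-8")

print("byte cut is valid UTF-8:", byte_cut_valid)
print("char cut:", char_cut)
```

The character-based variant counts code points rather than bytes, so a character cap and a byte cap are not interchangeable; the commit states its caps (700k, 750k, 124,400) in characters.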
LearningCircuit	96e6548553	fix(ci): grant research job the perms its reusable needs (#3987 follow-up) (#4016) Every run of e2e-research-test.yml and issue-research.yml since the refactor has terminated as startup_failure with zero jobs, because the calling `research` job had no `permissions:` block. The reusable's `research` job declares `permissions: contents: read`, but reusable permissions can only be the same or lower than the caller's — and the caller's empty `{}` inheritance meant the reusable's request exceeded what was granted, so GitHub refused to load the workflow. Add an explicit permissions block to the calling `research` job in both workflows: - contents: read (for actions/checkout in the reusable) - actions: write (for actions/upload-artifact@v5+ which now requires this scope to upload artifacts) The user-visible symptom was: `gh issue` with the `ldr_research` label did nothing — the workflow ran for ~1 second, failed at startup, produced no comment. Same for PR labels post-merge. Tested locally with actionlint and zizmor — both clean. Real verification needs a labeled PR/issue after merge.	2026-05-11 23:52:38 +02:00
LearningCircuit	fa88bb908f	ci(prerelease-docker): publish floating :prerelease tag for each RC (#4005) The workflow now re-points :prerelease at every new RC manifest in addition to publishing the versioned prerelease-vX.Y.Z-<sha> tag. Testers can pin compose to :prerelease and `docker compose pull` to fetch the latest RC without manually bumping the tag each cycle. Versioned tags remain available for reproducible testing.	2026-05-11 19:33:06 +02:00
dependabot[bot]	ee0ad19256	chore(deps): bump google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml (#4009) Bumps [google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml](https://github.com/google/osv-scanner-action) from 2.3.5 to 2.3.8. - [Release notes](https://github.com/google/osv-scanner-action/releases) - [Commits](`c518547040...9a49870895`) --- updated-dependencies: - dependency-name: google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml dependency-version: 2.3.8 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-11 17:46:10 +02:00
LearningCircuit	9755a900eb	ci(research): extract reusable LDR-research workflow + add issue-trigger caller (#3987) * ci(research): extract reusable LDR-research workflow + add issue-trigger caller Three triggers will end up calling the same install-and-run-LDR plumbing (PR diff today, issue body now, Reddit posts later). Factor out the middle of the workflow into a reusable workflow so we don't have to maintain the same logic in three places, and add the issue-trigger caller on top of it. Changes: - .github/workflows/ldr-research-reusable.yml (new) — workflow_call workflow that takes a fully-assembled query and returns a comment-ready markdown blob via artifact. Inputs include forward-compat knobs the future Reddit caller will need (max-query-length, max-sources, comment-footer override, include-sources-section, output-truncate-chars). - .github/workflows/e2e-research-test.yml — refactored from a single job to three jobs (build-query → research-via-reusable → post-comment). Behaviour is preserved: same headers, same footer, same diff truncation at MAX_DIFF_SIZE, same label-removal on completion. - .github/workflows/issue-research.yml (new) — triggers on `issues: types: [labeled]` gated by the same `ldr_research` label the PR workflow uses (GitHub event-type gating means they don't conflict). Output has two sections: "For the reporter" (cautious framing) and "For maintainers" (raw research context). Issue body is sanitized (control-char strip, 4000-char truncation) and never reaches a shell. - scripts/ldr-research.py — renamed from ldr-diff-research.py (`git mv`, history preserved). Drops --mode, --static-query, --max-diff-size: query now comes from stdin only and the caller workflow does prompt assembly. Output JSON shape: {research, sources, findings, iterations}. - .github/labels.yml — register ldr_research and ldr_research_static so they exist canonically rather than via on-the-fly creation. 
Reddit research is a follow-up PR; this PR ships the abstraction shape it will need. * docs(ci): regenerate workflow status dashboard for new LDR workflows The check-structure CI gate requires every workflow file to have a row in docs/ci/workflow-status.md. Regenerate to add rows for the two new workflows added in this PR. The live-status flips on unrelated rows (gitleaks, ossf-scorecard, responsive-ui-tests-enhanced, osv-scanner) are accurate snapshots of current status — the auto-regen workflow keeps them fresh on its own schedule. * ci(research): address review feedback — label cleanup, delimiter, artifact Three small follow-ups from the AI review on this PR: 1. Label cleanup on build-query failure. The post-comment job had `if: always() && needs.research.result != 'skipped'`, which meant that if build-query failed, research was skipped and the entire post-comment job (including the label-removal step) was skipped too — leaving a stuck `ldr_research` label on the PR/issue. Switch to `if: always()`; the download and post steps already self-guard with `needs.research.outputs.success == 'true'`, so only the label-removal step runs in the failure path. 2. Randomized GHA output delimiter. `__LDR_QUERY_EOF__` was a fixed string; a query containing that exact line could prematurely terminate the multi-line output. Use $$/$RANDOM/nanosecond as the delimiter base. Defense-in-depth — collision was already astronomically unlikely. 3. Optional `artifact-suffix` input on the reusable workflow. Until now the artifact name was `ldr-research-{run_id}-{run_attempt}-{github.job}`, which collides if a caller invokes the reusable multiple times in one run. The Reddit follow-up will use a matrix call, so add a caller-provided suffix now and sanitize it to artifact-safe chars. Existing callers don't pass it; default empty preserves today's name. * ci(research): fix per-line truncation in reusable workflow Two follow-ups from the second review pass: 1. 
The awk-based backstop truncation in `Write query to file` was per-line (operating on $0 / length($0)), not total. A long multi-line query with many short lines would silently bypass the max-query-length cap. Swap for a wc -c + head -c approach that truncates total bytes. Verified locally that a 114-byte multi-line input with all-short-lines is now correctly truncated to ~100 bytes. 2. Remove the unused EXIT_CODE capture in `Run LDR Research`. The step relies on JSON validation for error detection; capturing $? without using it was just dead code inherited from the original workflow.	2026-05-11 00:44:16 +02:00
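The per-line versus total truncation bug fixed in commit 9755a900eb can be sketched in Python. The workflow itself uses awk, `wc -c` and `head -c`; the two functions below are hypothetical stand-ins written for illustration, and the 100-byte limit and sample query are assumptions rather than values taken from the workflow.

```python
def truncate_per_line(query: str, limit: int) -> str:
    """Buggy shape: each line is capped on its own, like awk acting on $0."""
    return "\n".join(line[:limit] for line in query.split("\n"))


def truncate_total(query: str, limit: int) -> bytes:
    """Fixed shape: cap the whole payload in bytes, like `head -c LIMIT`."""
    return query.encode("utf-8")[:limit]


# Many short lines: no single line exceeds the limit, but the total does.
query = "\n".join(["short line"] * 20)
limit = 100

per_line = truncate_per_line(query, limit)
total = truncate_total(query, limit)

print("original bytes:", len(query.encode("utf-8")))
print("per-line result bytes:", len(per_line.encode("utf-8")))
print("total-cap result bytes:", len(total))
```

With the per-line shape the query passes through unchanged because every line is shorter than the limit, which is the bypass the commit describes; the total-byte cap bounds the payload regardless of how it is split into lines.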
LearningCircuit	c6dfc6dc8e	ci(workflows): build Vite frontend bundle before UI tests (#3989) The responsive-ui-tests-enhanced and puppeteer-e2e-tests workflows both started the Flask app without running `npm run build` first. `dist/` is gitignored, so the page rendered with the empty fallback from `vite_helper._fallback_assets()` — no bundled `styles.css`. Tests ran against a partially-unstyled UI, and CSS source changes between PRs were invisible to the responsive baseline. (playwright-webkit-tests.yml already does this — these two were the outliers.) Add two steps before the existing test setup in each workflow: - name: Install root frontend dependencies run: npm ci - name: Build Vite frontend bundle run: npm run build The existing `tests/ui_tests/npm ci` and `tests/puppeteer/npm install` steps still run separately to install the Puppeteer/Chromium test deps. Costs roughly 30s of build time per workflow run. Unblocks CSS-only PRs from being meaningfully validated by the responsive baseline.	2026-05-10 19:27:09 +02:00
LearningCircuit	e2150c3165	fix(ci): use release environment for prerelease-docker secrets (#3983) Switch the four prerelease-docker.yml jobs from `environment: prerelease` to `environment: release` so they pick up the same DOCKER_USERNAME / DOCKER_PASSWORD already known to work for docker-publish.yml. Avoids duplicating environment secret configuration on the new prerelease environment introduced in #3969. The dispatch-time approval gate in release.yml still uses `environment: prerelease`, so the two checkboxes in the review modal remain independent — this only affects which secret store the downstream build jobs read from.	2026-05-10 17:27:29 +02:00
LearningCircuit	91b68acafd	docs(ci): auto-generated workflow status dashboard (#3966) * docs(ci): add auto-generated workflow status dashboard Adds `docs/ci/workflow-status.md` — a single page that surfaces every GitHub Actions workflow in the repo, grouped by role, with action items (disabled / stale / manual-only) at the top. Live status badges link to each workflow's runs page. Auto-generated from the workflow YAML files + the GitHub API by `scripts/generate_workflow_status.py`. Why: the GitHub Actions tab is chronological-mixed (poor "is anything red right now?" view), and the static workflow table in `CI_CD_INFRASTRUCTURE.md` drifts when workflows are added/renamed (PR #3963 fixed three factually wrong header claims for exactly this reason). A reference page that mechanically reflects current state + identifies dormant workflows answers both gaps. What's surfaced today (verified live): - Disabled: `nuclei.yml` (caller commented out in `release-gate.yml:177`). - Stale: `update-precommit-hooks.yml` — its weekly Friday cron has been failing for 10+ consecutive weeks (since at least 2026-03-06). This was discovered by the dashboard, not previously tracked. - Manual-only: `check-config-docs.yml`, `sync-main-to-dev.yml` (both intentionally manual; the dashboard shows them so they're not forgotten). Generator design notes: - Resolves reusable workflows correctly: `gh run list --workflow=X.yml` is empty for `workflow_call`-only workflows. The script walks the call graph (release.yml → release-gate.yml → semgrep.yml etc.), fetches the parent run's job list, and matches by job key parsed from the caller YAML (not by name heuristic — `gitleaks-scan` ↔ `gitleaks-main.yml` would otherwise collide with `gitleaks.yml`). - Picks "primary trigger" per workflow so e.g. `codeql.yml` (PR + push + cron + workflow_call) gets its glyph from the gated daily run, not a stale PR run. 
- Stale check walks the recent runs list to find last success — a workflow that ran red yesterday and green a week ago is not stale. - Manual edits outside the `<!-- BEGIN/END GENERATED -->` markers are preserved on regeneration; the timestamp lives inside the markers so post-marker content is fully user-owned. - Preflights `gh auth status` and rate limit before any per-workflow call — fails fast with actionable message instead of partial output. CI integration: - `.github/workflows/check-workflow-status.yml` runs `--check-structure` on PRs touching workflows, the dashboard, or the generator. Pure structural check (no API calls, no live data) — fast and deterministic. Live regeneration stays on demand. Cost: ~340 GitHub API calls per regeneration, ~45 sec wall-clock, ~6.8% of the 5000/hr authenticated quota. * fixup(ci): review-pass corrections to workflow status dashboard Surfaced by three rounds of code-review + correctness + security agents on the original PR. Four small fixes; no behavioral change to the generated dashboard's content. 1. Recognize commented job keys — `JOB_KEY_RE` now accepts an optional `# ` prefix. Previously, when an entire job block was commented out (e.g. `release-gate.yml:175-181` for nuclei), the commented `uses:` line inherited the previous active job's key (`gitleaks-scan`) instead of the correct `nuclei-scan`. Latent — commented entries are filtered out before reaching gated-run lookup — but would misattribute status if someone partially uncommented a block (uncommented just the `uses:` line). 2. Pin pyyaml to ==6.0.3 in the CI workflow. The repo convention is exact `==` pins (95% of `pip install` calls in workflows); the only floating range was the one introduced by this PR. Matches pdm.lock. 3. Validate marker order in `merge_with_existing`. If a manual edit leaves the BEGIN/END markers reversed (e.g. mid-merge-conflict), bail to a clean overwrite instead of splicing interleaved garbage. 4. 
Remove `_coerce_jq_stream` — unused helper left behind from an earlier iteration. Zero call sites; no behavior change. Verified by re-running the generator + `--check-structure`. The rendered dashboard's only diff vs prior commit is the regeneration timestamp and live "Last activity" cells (expected — those reflect recent runs since the previous regen). * feat(ci): bucketed activity labels + auto-regen on version bump Two changes that together make the dashboard's diffs meaningful instead of noisy. 1. Coarse activity buckets. Replace exact UTC timestamps in every "Last activity / Last manual run / Last successful run" cell with one of: `this week`, `last week`, `2 weeks ago`, `3 weeks ago`, `last month`, `2 months ago`, `3+ months ago`, `long ago`, `never`. Calendar-day boundaries (no time-of-day jitter) so two regenerations on the same date produce zero diff when nothing actually drifted. Verified: same-day re-runs after stable workflow state → empty diff. Also drop the redundant `Days idle` columns from Stale and Manual-only tables (the bucket label already says it), and round the "Last regenerated" footer to a date. Why: a daily-running healthy workflow used to bump its timestamp every regen (noise). Now it stays in `this week` indefinitely, and the only diffs that land in a version-bump PR are real bucket transitions — exactly the "this slipped from last week to last month — something might be wrong" signal the dashboard exists for. 2. Auto-regenerate on version bump. Add a step to `version_check.yml` right after the existing `generate_config_docs.py` regen. Same pattern as the config docs precedent — the dashboard refresh rides along with each version-bump PR and is reviewable in the same diff. Costs ~340 GitHub API calls per run (well under the GITHUB_TOKEN 1000/hr workflow-runs limit). Adds `actions: read` to the job permissions block; uses `pyyaml==6.0.3` matching pdm.lock. 
* feat(ci): drop regen timestamp; add health banner; fix in-progress false-stale Three follow-ups to keep version-bump diffs strictly meaningful, plus two correctness fixes uncovered by repeated stability testing. 1. Drop the "Last regenerated" date. Git history is authoritative for "when this snapshot was taken"; embedding a date here forced a single-line diff every regeneration even when nothing else drifted. 2. Aggregated health banner at the top of the generated region: `63 workflows: 1 disabled · 1 stale · 2 manual-only · 59 active` Counts only change when a workflow shifts between {disabled, stale, manual, active} — same level of diff-stability as the per-row buckets. 3. `?event=schedule` for own-cron workflow badges. Verified effective by SHA-comparing badge bodies for workflows with multi-event run history. Makes the badge for e.g. `gitleaks.yml`, `fuzz.yml`, `osv-scanner.yml` reflect cron health specifically, rather than whichever PR ran last. The runs-page link uses the matching `?query=event%3Aschedule` so a click lands on the filtered run list. 4. Fix false-stale during in-flight release runs. Previously, when release.yml was running, gates reachable via release.yml (puppeteer-e2e-tests, ci-gate, etc.) would briefly flip to "stale" because `fetch_last_gated_run` returned the in-progress run first and `last_success` couldn't see past it. Now the function walks all 5 caller runs and returns both the latest match (for activity) and the latest successful match (for staleness), avoiding the flip. 5. Map all GitHub conclusion enum values. A `gitleaks.yml` run completed with `action_required` between two test regens; the glyph table didn't have it and rendered `?`. Added every documented value (`neutral`, `timed_out`, `stale`, `action_required`) and changed the unknown-fallback from `?` to em-dash, so future GitHub-side enum additions don't introduce a false-positive diff. Verified: two same-day regens after workflow state has settled now produce zero diff. 
* ci(version-bump): make workflow-status regen non-blocking Add `continue-on-error: true` to the dashboard regeneration step in version_check.yml. The regen calls ~340 GitHub API endpoints and would otherwise block the entire version-bump PR if any of them transiently fail (rate-limit hit, GitHub Actions outage, etc.). The failure mode should be "dashboard stays at the previous snapshot until next successful regen", not "release pipeline is blocked". The sibling `generate_config_docs.py` step doesn't need this — it's purely local with no external API dependency.	2026-05-10 15:58:32 +02:00
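The marker handling described in the dashboard commit above (manual edits outside the generated region are preserved, and reversed markers trigger a clean overwrite) can be sketched roughly as follows. This is a minimal sketch and not the code in `scripts/generate_workflow_status.py`: only the function name `merge_with_existing` and the bail-to-overwrite rule come from the commit message, while the exact marker strings and the function body are assumptions.

```python
# Assumed marker spellings; the commit only says `<!-- BEGIN/END GENERATED -->`.
BEGIN_MARKER = "<!-- BEGIN GENERATED -->"
END_MARKER = "<!-- END GENERATED -->"


def merge_with_existing(existing: str, generated: str) -> str:
    """Splice `generated` between the markers found in `existing`.

    Text before BEGIN and after END is kept untouched. If a marker is
    missing, or END appears before BEGIN, return a clean overwrite.
    """
    fresh = f"{BEGIN_MARKER}\n{generated}\n{END_MARKER}"
    begin = existing.find(BEGIN_MARKER)
    end = existing.find(END_MARKER)
    if begin == -1 or end == -1 or end < begin:
        # Markers absent or reversed (e.g. mid-merge-conflict): overwrite.
        return fresh + "\n"
    head = existing[:begin]
    tail = existing[end + len(END_MARKER):]
    return head + fresh + tail
```

Validating the order before slicing matters because slicing with `end < begin` would otherwise interleave old and new text.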
LearningCircuit	632bb176fc	fix(ci): scope prerelease-docker jobs to prerelease environment (#3978 ) The prerelease-docker workflow's jobs declared no environment, so the DOCKER_USERNAME / DOCKER_PASSWORD secrets stored on the new `prerelease` environment (added in #3969) were invisible to them and the Docker Hub login step failed with "Username and password required" (run 25627724313). Add `environment: prerelease` to all four jobs, mirroring how docker-publish.yml scopes every job to `environment: release`. This makes the environment secrets visible and applies the same reviewer gate that already protects the real publish workflow.	2026-05-10 14:17:08 +02:00
LearningCircuit	28b1732259	test(ui): replace flake-prone delays, fix local-DX bug, correct stale CI comment (#3972 ) * test(ui): replace fixed delays in metrics_dashboard with proper waits Five hardcoded `await delay(N)` calls in tests/ui_tests/test_metrics_dashboard.js became `page.waitForResponse(...)` and `page.waitForSelector(...)` plus a short `waitForFunction` for the SPA-route check. Each replacement waits for the real condition (an API response or a DOM element) instead of a fixed sleep, so the test stops racing and gets faster on machines that finish the work quickly. Verified 10/10 runs against a live local server: all pass at ~19.7s wall time (previously ~25s, with the fixed delays alone accounting for 12s of that). Concrete sites: * line 79 → wait for `/api/start_research` response, then `waitForFunction` on the URL change * line 164 → wait for `/api/metrics` response (10s ceiling) * line 290 → wait for `period=7d` response (5s ceiling) * lines 334, 352 → wait for the metrics dashboard selector after navigation * test(ui): handle puppeteer's fullPage screenshot ceiling gracefully Running test_responsive_ui_comprehensive.js locally without CI=true used to fail on the Settings page with `Protocol error (Page.captureScreenshot): Page is too large` — Puppeteer/Chromium's fullPage screenshot caps at 16384px, and the Settings page rendered at 375px wide blows past that limit. The error bubbled up to testPage's catch block and marked the whole page as failed. CI environments avoided the problem because the diagnostic-screenshot calls are guarded by `!process.env.CI`, so local devs couldn't reproduce CI's pass. The screenshots are diagnostic, not the test target. Added a `safeScreenshot(opts)` helper that catches the documented "Page is too large" / `captureScreenshot` protocol errors and falls back to a viewport-only capture so the run continues. 
Replaced all 9 fullPage screenshot call sites in this file with the helper; the safeScreenshot method itself still uses `page.screenshot` directly (the only place that should). Verified 5/5 runs locally pass (mobile viewport, no CI=true) at ~31s wall time; CI=true behavior is unchanged. * ci(workflows): correct stale concurrency-comment in responsive-ui-tests The comment block above `permissions:` claimed the workflow "triggers on both pull_request and workflow_call." That was true historically but became wrong when #2248 removed the pull_request trigger to keep this heavy matrix build (mobile + desktop, ~20 min each) off the PR gate. The comment was added later in #3600 with the stale wording, so anyone reading it has been misled about when this workflow actually runs. Rewrite to describe current behavior accurately: runs via workflow_call from release.yml's responsive-test-gate and via workflow_dispatch only. The concurrency-history note (PR #3554 / #3599) is preserved. No functional change — just the comment. * test(ui): filter benign navigation-abort race in benchmark page test `Benchmark Results Page › page loads without critical errors` was flaky with `Error loading benchmark history: TypeError: Failed to fetch`. The describe-scoped `beforeEach` already navigates to `/benchmark/results`, then the test re-navigates with listeners attached. The first navigation's in-flight history fetch gets aborted by the second navigation and surfaces as `Failed to fetch` — a benign race, not a real bug. Add `Failed to fetch` to the existing filter list (next to favicon, 404, and Failed to load resource), with an inline comment explaining why. Verified 5/5 clean runs locally; previously hit 1 flaky / 4 clean.	2026-05-10 13:46:34 +02:00
LearningCircuit	1315b679e0	ci(research): switch E2E research workflow to langgraph-agent strategy (#3965 ) * ci(research): switch E2E research workflow to langgraph-agent strategy The ldr_research label runs scripts/ldr-diff-research.py, which until now didn't pass a search_strategy and so fell through to the quick_summary default of source_based. Switch to the agentic langgraph-agent strategy so the workflow exercises the autonomous research path. - Adds --strategy CLI arg and LDR_STRATEGY env var, default langgraph-agent (consistent with the existing --provider / --search-tool / --iterations pattern). - Workflow exposes LDR_STRATEGY: vars.LDR_STRATEGY || 'langgraph-agent' so the choice is overridable per-repo via Variables. - Notes in the script docstring that LDR_ITERATIONS=1 is a no-op for the langgraph strategy (which reads langgraph_agent.max_iterations from settings instead). * ci(research): consolidate model var to LDR_RESEARCH_MODEL The workflow had two model variables — vars.LDR_MODEL for diff mode and vars.LDR_STATIC_MODEL for static mode — selected by a small set-model step. Collapse to a single LDR_RESEARCH_MODEL variable shared by both labels, mirroring the AI reviewer's vars.AI_MODEL pattern. - Default: google/gemini-2.0-flash-001 (the value the script was already falling through to). - Override via Settings → Variables → New repository variable → name: LDR_RESEARCH_MODEL. - The set-model step is removed; the workflow now passes the env var through directly. - Script reads LDR_RESEARCH_MODEL instead of LDR_MODEL. Note: existing repo variables LDR_MODEL and LDR_STATIC_MODEL become orphaned by this rename and can be deleted from repo settings. * ci(research): stop overriding strategy iterations from the workflow Previously the workflow set LDR_ITERATIONS=1 and the script forwarded that as iterations= in kwargs. 
For source_based that capped research at one iteration; for langgraph-agent it was effectively a no-op (langgraph reads max_iterations, not iterations) but the wiring was misleading. - Drop LDR_ITERATIONS from the workflow env block. - Make --iterations default to None in the script and only forward it to quick_summary when explicitly set on the CLI. - Each strategy now uses its own setting-driven default unless overridden — for langgraph-agent that means langgraph_agent.max_iterations (default 50) flows through unchanged. * ci(research): split research model into MAIN + CHEAP per label Bring back per-label model selection with cleaner names: - ldr_research → vars.LDR_RESEARCH_MODEL (deep PR analysis, user-configurable) - ldr_research_static → vars.LDR_RESEARCH_CHEAP_MODEL (regression smoke, kept cheap) Both default to google/gemini-2.0-flash-001 if unset, so existing behaviour stays identical until you actually configure cheap-model. The script and its env-var contract are unchanged — the workflow just picks which value to feed into LDR_RESEARCH_MODEL based on the applied label.	2026-05-10 13:10:02 +02:00
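The `--strategy` and `--iterations` behaviour described in the commit above can be illustrated with a small argparse sketch. It is not the actual contents of `scripts/ldr-diff-research.py`: the helper names `build_parser` and `build_kwargs` are hypothetical, and only the flag names, the `LDR_STRATEGY` variable, the `langgraph-agent` default, and the `search_strategy` / `iterations` keyword names are taken from the commit message.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags named in the commit."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--strategy",
        # The env var overrides the built-in default; the CLI flag overrides both.
        default=os.environ.get("LDR_STRATEGY", "langgraph-agent"),
    )
    # None means "not set", so each strategy keeps its own settings default.
    parser.add_argument("--iterations", type=int, default=None)
    return parser


def build_kwargs(args: argparse.Namespace) -> dict:
    """Forward `iterations` only when it was given explicitly."""
    kwargs = {"search_strategy": args.strategy}
    if args.iterations is not None:
        kwargs["iterations"] = args.iterations
    return kwargs
```

Leaving `iterations` out of the kwargs entirely, instead of passing a sentinel value, is what lets a strategy such as langgraph-agent fall back to its own `langgraph_agent.max_iterations` setting.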
LearningCircuit	8871d0fdab	ci(release): split prerelease docker into its own environment (#3969 ) Today the trigger-prerelease-docker job and the create-release / trigger-workflows jobs all gate on `environment: release`. GitHub's "Review deployments" modal collapses every pending job in the same environment under one checkbox, so approving `release` approves the prerelease test AND the actual publish at once. There is no UI affordance to test the prerelease docker build first and decide on the release afterward. Move trigger-prerelease-docker to a new `prerelease` environment so the review modal shows two independent checkboxes. Maintainers can now: - Approve `prerelease` only, test the docker image, then approve `release`. - Reject `prerelease` and approve `release` to skip the prerelease step. - Approve `prerelease`, test, then cancel the run to abandon the release. Requires a one-time GitHub Settings change: create a `prerelease` environment with the same required reviewers as `release`. PAT_TOKEN is a repo secret, so no environment-secret copy is needed. create-release and trigger-workflows remain on `release` — unchanged.	2026-05-10 12:10:32 +02:00
LearningCircuit	b8602b8f10	fix(security): suppress alerts #7743 #7744 #7745 (audited false positives) (#3968 ) - #7743 (zizmor/dangerous-triggers): welcome-first-time.yml Adds inline `# zizmor: ignore[dangerous-triggers]` with rationale. pull_request_target is required so fork PRs receive a writable token for the welcome comment. The workflow never checks out PR content, never executes fork-controlled scripts, and only reads `sender.login` (operator-trusted GitHub event metadata). The comment body is a static template with no PR-controlled interpolation. This is one of the safe, audited use cases of pull_request_target. - #7744 / #7745 (Bearer/javascript_lang_dangerous_insert_html): context-overflow.js innerHTML at lines 453 and 501. Adds `// bearer:disable javascript_lang_dangerous_insert_html` alongside the existing `eslint-disable` comments. All user-controlled values are already routed through escapeHtml; numeric fields go through formatNumber; CSS classes and badges are hardcoded literals. This matches the convention used in collection_details.js, embedding_settings.js, and other JS components in the repo.	2026-05-10 11:53:35 +02:00
LearningCircuit	0f65961fa0	docs(ci): cross-link compose-integration-test ↔ compose-published-smoke (#3963 ) Make the relationship between the two compose workflows explicit so future contributors don't try to add the build-override variant to release-gate's daily cron (PR #3962 attempted this and was reverted). - compose-integration-test.yml: add a "do not move into the daily cron" paragraph pointing readers at compose-published-smoke.yml as the workflow that already covers ongoing drift between main's compose.yml and the published image. - compose-published-smoke.yml: fix three stale claims in the header: 1. "per-PR / release-gate test" — neither is true (no pull_request trigger, not in release-gate.yml; runs only at release time + manual dispatch). 2. "PR compose changes already covered by the per-PR integration test" — there is no per-PR integration test; replace with the accurate reason (this test can only fail on drift already on main). 3. cron-offset comment referenced a "daily compose-integration-test (03:00)" that does not exist; offset is only against release-gate. Comment-only change. No workflow behavior changes.	2026-05-10 08:29:00 +02:00
LearningCircuit	e6d72faa14	test(embedding-settings): regression spec for model dropdown reset (#3863 ) (#3949 ) * test(embedding-settings): regression spec for model dropdown reset (#3863) Adds a Playwright spec that mocks the embedding-settings backend (`/library/api/rag/models`, `/library/api/rag/settings`, `/settings/api/local_search_embedding_model`, `/settings/api/embeddings.ollama.url`) so the page renders without a real Ollama or Sentence-Transformers backend. Three tests cover the bug from #3863 and the surrounding contract: 1. selecting a non-top model auto-saves it and persists across reload — exercises the change-listener path that the per-field auto-save relies on. 2. Ollama URL change does not reset the selected model — the load-bearing regression test. Reverting the preserve+restore patch in `updateModelOptions()` was confirmed to make this test fail with the same symptom the issue reporter saw (dropdown snaps to the index-0 model). 3. "Save Default Settings" button is gone — guards against an accidental re-introduction of the redundant button that originally triggered the bug. Mock route registration order matters here: the catch-all `/settings/api/` is registered first so the specific PUT mocks for `local_search_embedding_model` etc. (registered later) win Playwright's last-registered-wins precedence. * test(embedding-settings): address review findings on regression spec Round-2 review of PR #3949 surfaced that the spec was not actually wired into any CI workflow and had a couple of correctness/flakiness issues. - Add `embedding-settings-dropdown` to the curated Safari filename filter in `.github/workflows/playwright-webkit-tests.yml` (both Desktop Safari and Mobile Safari runs) so the daily release-gate Playwright run picks up this regression spec. - Replace `waitForTimeout(500)` with a deterministic poll on a new `state.modelsFetches` counter that ticks on each `/library/api/rag/models` GET. 
Once the count rises past the pre-action baseline, the post-save `loadAvailableModels()` call has completed and any spurious save the bug would trigger has already fired. - Gate the `local_search_embedding_model` and `local_search_embedding_provider` PUT mocks on request method, mirroring the `embeddings.ollama.url` handler. A stray GET hitting these handlers would otherwise push `undefined` into `state.modelSaves`/`providerSaves`. - Skip the entire describe block on mobile projects (`test.skip(({ isMobile }) => isMobile, ...)`). This is a desktop form-state regression test, not a layout test — it doesn't need to run across 12 device profiles. - Reframe test #1's docstring: it's a happy-path smoke test for the per-field auto-save contract, not a regression test for #3863. The old comment claimed it would catch the bug, which I confirmed empirically it does not (the dropdown-rebuild path that #3863 exploited isn't on the model-pick-and-reload flow). - Add a new test for the provider-change path (`updateModelOptions` is also reached from the provider-change handler at line 325 of `embedding_settings.js`). The model-shared-across-providers fixture forces the preserve+restore branch in `updateModelOptions` to fire and asserts the selection survives. Verified locally: this test fails alongside the Ollama-URL test when the patch is reverted. * test(embedding-settings): extract mocks helper + cover text_separators reset Round-2 review left two advisory items: extract the mock infrastructure to a shared helper, and cover the text_separators reset behavior added in the follow-up commit on PR #3940 (`40678b2`). Both are addressed here. - Move BASE_MODELS_PAYLOAD, defaultSettings, and mockEmbeddingApis to `tests/ui_tests/playwright/tests/helpers/embedding-settings-mocks.js`, matching the CommonJS pattern used by `mobile-utils.js` so additional embedding-page specs can reuse the same backend mocks. 
Also export DEFAULT_TEXT_SEPARATORS so callers can assert against it without duplicating the literal. - Add `state.textSeparatorsSaves` and a method-gated PUT route mock for `/settings/api/local_search_text_separators` to capture saves. - New test: clearing the text_separators textarea and blurring persists the default array. Sanity-checked locally by reverting the empty-textarea branch in `embedding_settings.js:395-415` to its pre-#3940 `if (!rawValue) return;` state — the new test fails (no PUT fired); restoring the fix makes it pass again. * ci: drop embedding-settings-dropdown from Mobile Safari filter The spec uses `test.skip(({ isMobile }) => isMobile, ...)` at the describe level, so on Mobile Safari Playwright would load the file and skip every test — harmless but noisy. Keep it in the Desktop Safari filter only, where it actually runs.	2026-05-10 08:11:19 +02:00
LearningCircuit	bc527f7aa9	ci(workflows): migrate to LDR_DISABLE_RATE_LIMITING canonical name (#3945 ) * ci(workflows): migrate to LDR_DISABLE_RATE_LIMITING canonical name (#3936 follow-up) Flips the 9 remaining `DISABLE_RATE_LIMITING=true` workflow uses to the canonical `LDR_DISABLE_RATE_LIMITING=true` name introduced in #3936, so CI no longer trips its own deprecation warning. Also closes a latent test-isolation gap in `test_enabled_by_default` that did not pop the canonical var, which would have started failing as soon as a developer or workflow exported it. * test(auth): flip remaining DISABLE_RATE_LIMITING uses to LDR_ prefix Picked up from the closed #3944. The test_auth_routes fixture used the legacy env var, and test_auth_rate_limiting carried a stale comment referencing it. Both now use the canonical LDR_DISABLE_RATE_LIMITING introduced in #3936, matching the workflow flips in this PR. * test(env_registry): isolate TestIsRateLimitingEnabled from canonical env var CI now exports LDR_DISABLE_RATE_LIMITING=true (per the workflow flips in this PR). Two tests in TestIsRateLimitingEnabled use patch.dict(os.environ, {"DISABLE_RATE_LIMITING": ...}) without clearing the canonical key, so the canonical var bled in from the outer process and short-circuited before the legacy code path: - test_enabled_when_flag_false: expected True (legacy=false), got False because canonical=true wins - test_legacy_form_emits_deprecation_warning_once: expected one warning, got zero because canonical short-circuit skips legacy Add a class-level autouse clean_env fixture that strips both env-var forms (mirroring the one in test_env_registry_extended.py). The remaining tests in this class were silently coincidence-passing under the bug because they expect False and canonical=true also gives False. Verified by exporting LDR_DISABLE_RATE_LIMITING=true and running the two test files: 65 passed.	2026-05-10 08:10:28 +02:00
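The isolation bug described in the commit above (a canonical variable exported by the outer process short-circuiting a test of the legacy variable) can be reproduced in a few lines. This is a sketch under stated assumptions: `is_rate_limiting_enabled` here is a hypothetical stand-in for the project's function, written only to show "canonical wins, legacy is the fallback", and it uses `patch.dict(..., clear=True)` rather than the autouse `clean_env` fixture the commit actually added.

```python
import os
from unittest.mock import patch

CANONICAL = "LDR_DISABLE_RATE_LIMITING"
LEGACY = "DISABLE_RATE_LIMITING"


def is_rate_limiting_enabled() -> bool:
    """Hypothetical stand-in: the canonical var wins, legacy is a fallback."""
    value = os.environ.get(CANONICAL)
    if value is None:
        value = os.environ.get(LEGACY, "false")
    return value.strip().lower() != "true"


def leaky_check() -> bool:
    # Only the legacy key is patched; a canonical value set by CI leaks in.
    with patch.dict(os.environ, {LEGACY: "false"}):
        return is_rate_limiting_enabled()


def isolated_check() -> bool:
    # clear=True empties os.environ for the duration of the block, so
    # neither env-var form can bleed in from the outer process.
    with patch.dict(os.environ, {LEGACY: "false"}, clear=True):
        return is_rate_limiting_enabled()
```

With `LDR_DISABLE_RATE_LIMITING=true` exported, `leaky_check()` reports rate limiting as disabled even though the test asked for the legacy form to be `false`, which matches the failure the commit describes; `isolated_check()` is unaffected, and `patch.dict` restores the original environment on exit.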
LearningCircuit	5e3f37a7ce	fix(ci): grant pull-requests:write to welcome-first-time workflow (#3950 ) createComment on a PR via /issues/{n}/comments returns 403 with only `issues: write`. GitHub now requires `pull-requests: write` when the issue resource is actually a PR — the API response's `x-accepted-github-permissions: issues=write; pull_requests=write` indicates both are needed (issues for plain issues, pull_requests for PRs). All five recent runs of this workflow have failed for this reason; adding the permission unblocks the welcome comment.	2026-05-09 22:28:23 +02:00
LearningCircuit	5a0ca57ded	feat(ci): welcome first-time contributors with a single comment (3/5) (#3859 ) * feat(ci): welcome message on a contributor's first PR Adds .github/workflows/welcome-first-time.yml using actions/first-interaction@v3.1.0 (pinned by SHA). Posts a single comment on a contributor's first PR pointing at CONTRIBUTING.md and our review-process docs. Uses pull_request_target so forked PRs receive a writable token; the action only posts a fixed message (no checkout, no shell execution), so the security surface is minimal. Permissions limited to pull-requests: write. PR 3 of 5 introducing PR triage automation. Independent of the other PRs in the series. * feat(ci): rewrite welcome workflow with per-author check + starter pack Replaces actions/first-interaction (which has no author filter on isFirstPullRequest, so it would never fire on a repo with prior PRs) with a github-script that uses issues.listForRepo?creator=<user> to detect a contributor's actual first PR. Fixes the missing issues:write permission needed by issues.createComment (PR comments route through the issues API). Adds a bot filter (consistent with the PR triage workflow) so dependabot/renovate PRs don't trigger a human-facing welcome. Expands the welcome message into a starter pack: install guide, dev guide, pre-commit hook setup (inline commands), architecture overview, tests README, FAQ, troubleshooting, security policy, and Discord. All links use absolute URLs to files that exist on main.	2026-05-09 18:47:54 +02:00
LearningCircuit	8cc0184cbe	feat(ci): auto-apply triage labels on PR open and review (2/5) (#3858 ) * feat(ci): auto-apply triage labels on PR open and review events Adds .github/workflows/pr-triage.yml that: - on PR opened: applies external-contributor / first-time-contributor / bot / needs-codeowner-review based on author_association - on synchronize: flips awaiting-author → awaiting-codeowner when author pushes new commits - on review submitted by a codeowner: clears or applies lifecycle labels based on the review state (approved / changes_requested) - on review dismissed: re-applies needs-codeowner-review if a previous changes-requested review was withdrawn Codeowner detection accepts the hardcoded global-owners list OR any reviewer with OWNER/MEMBER/COLLABORATOR association (covers team-based codeowners that aren't direct repo members). Uses pull_request (not pull_request_target) so fork PRs run with a read-only token; label calls 403 silently for forks. Acceptable trade vs the security cost of running pull_request_target with secrets on fork code. Maintainers can apply labels manually for fork PRs. Updates CODEOWNERS with a comment noting the global-owners list is mirrored in pr-triage.yml; both must stay in sync. PR 2 of 5 introducing PR triage automation. Depends on labels being synced first via PR 1 (#3857). * fix(ci): tighten codeowner check, prune permissions, extend bot list - Drop the OWNER/MEMBER/COLLABORATOR fallback in isCodeownerReview; rely on the hardcoded CODEOWNERS list. The fallback was designed for team-based codeowners but this repo has no such setup, and the fallback would become a security-relevant mislabel if branch protection adopts require_code_owner_reviews=true. - Trim job permissions to issues:write only — pull-requests:write and contents:read were unnecessary (issues:write covers PR labels since PRs are issues internally). Matches label-fixed-in-dev.yml precedent. 
- Add mseep-ai and Nexus-Digital-Automations to KNOWN_BOTS — both appear in repo PR history without the [bot] suffix. * fix(ci): clear awaiting-author on approval, gate dismissed handler Two label-state bugs surfaced by the AI code reviewer on the previous revision (#3858 review pass): - Approval branch now also removes awaiting-author. Without this, a codeowner who switches from changes_requested to approved purely via comments (no intervening author push, so no synchronize event to flip the label) leaves awaiting-author stuck on the PR. - Dismissed branch now requires that the dismissed review was a codeowner's changes_requested review. Otherwise any non-codeowner review being dismissed while awaiting-author happens to be set would incorrectly flip the PR back to needs-codeowner-review while the real codeowner request is still active. * chore(ci): pre-commit check that pr-triage.yml CODEOWNERS matches .github/CODEOWNERS Eliminates the manual sync hazard flagged in the PR review: the hardcoded JS array in pr-triage.yml must mirror the global owners line in .github/CODEOWNERS, and the only thing keeping them in sync was a pair of comments. This adds a small Python pre-commit hook that parses both files and fails if their owner sets disagree (case-insensitive, order-independent). Triggers on edits to either file. Same shape as check-version-sync.py. * fix(ci): swallow 403 on fork-PR label calls so the run stays green The previous comment promised fork PRs would "403 silently" but addLabels never caught the error — every fork contribution would have shown a red check on the triage workflow. This adds a narrow 403 catch (scoped to the documented fork-PR case via pull_request's read-only token) shared by addLabels and removeLabel, with a console log so the no-op is visible in the run output. Other status codes still throw. Behavior matches the original design intent; comment is now accurate. Flagged by AI Code Review on the previous revision. 
* fix(ci): drop dead state check in dismissed-review handler GitHub mutates review.state to "dismissed" on pull_request_review action=dismissed events (github/docs#20216), so the previous guard `review.state !== 'changes_requested'` always returned early. The awaiting-author -> needs-codeowner-review flip never executed. Use the awaiting-author label as the discriminator instead — it's only set by a codeowner's changes_requested review, so its presence is reliable proof the dismissal is the one we care about. Dismissals of approval/comment reviews are no-ops because the label won't be present.	2026-05-09 15:57:04 +02:00
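The CODEOWNERS sync check mentioned in the commit above (parse both files, fail if the owner sets disagree, case-insensitive and order-independent) might look roughly like this. It is a sketch, not the repository's hook: the function names are hypothetical, and the assumption that the workflow holds the list in a JS array assigned to a name ending in `CODEOWNERS` is mine.

```python
import re


def owners_from_codeowners(text: str) -> set[str]:
    """Return the owners on the global (`*`) line, lower-cased, without `@`."""
    for line in text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop trailing comments
        if parts and parts[0] == "*":
            return {p.lstrip("@").lower() for p in parts[1:]}
    return set()


def owners_from_workflow(text: str) -> set[str]:
    """Return entries of an assumed `CODEOWNERS = [...]` JS array, lower-cased."""
    match = re.search(r"CODEOWNERS\s*=\s*\[(.*?)\]", text, re.DOTALL)
    if match is None:
        return set()
    names = re.findall(r"['\"]([^'\"]+)['\"]", match.group(1))
    return {name.lstrip("@").lower() for name in names}


def owners_in_sync(codeowners_text: str, workflow_text: str) -> bool:
    """Compare as sets so neither order nor letter case matters."""
    return owners_from_codeowners(codeowners_text) == owners_from_workflow(workflow_text)
```

A pre-commit hook built on this would read `.github/CODEOWNERS` and `.github/workflows/pr-triage.yml`, call `owners_in_sync`, and exit non-zero when it returns `False`.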
dependabot[bot]	649ead1079	chore(deps): bump github/codeql-action from 4.35.3 to 4.35.4 (#3919 ) Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.35.3 to 4.35.4. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](`e46ed2cbd0...68bde559de`) --- updated-dependencies: - dependency-name: github/codeql-action dependency-version: 4.35.4 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-09 14:43:47 +02:00
dependabot[bot]	fca32f072b	chore(deps): bump anthropics/claude-code-action from 1.0.107 to 1.0.119 (#3918 ) Bumps [anthropics/claude-code-action](https://github.com/anthropics/claude-code-action) from 1.0.107 to 1.0.119. - [Release notes](https://github.com/anthropics/claude-code-action/releases) - [Commits](`567fe954a4...476e359e62`) --- updated-dependencies: - dependency-name: anthropics/claude-code-action dependency-version: 1.0.119 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-09 14:43:20 +02:00
dependabot[bot]	e76c323813	chore(deps): bump actions/dependency-review-action from 4.9.0 to 5.0.0 (#3915 ) Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.9.0 to 5.0.0. - [Release notes](https://github.com/actions/dependency-review-action/releases) - [Commits](`2031cfc080...a1d282b36b`) --- updated-dependencies: - dependency-name: actions/dependency-review-action dependency-version: 5.0.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-09 13:22:14 +02:00
dependabot[bot]	a21c30bbe4	chore(deps): bump anchore/scan-action from 7.3.2 to 7.4.0 (#3917 ) Bumps [anchore/scan-action](https://github.com/anchore/scan-action) from 7.3.2 to 7.4.0. - [Release notes](https://github.com/anchore/scan-action/releases) - [Changelog](https://github.com/anchore/scan-action/blob/main/RELEASE.md) - [Commits](`7037fa0118...e1165082ff`) --- updated-dependencies: - dependency-name: anchore/scan-action dependency-version: 7.4.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-09 12:41:57 +02:00
dependabot[bot]	1aaff5cac9	chore(deps): bump sigstore/cosign-installer from 4.1.1 to 4.1.2 (#3916 ) Bumps [sigstore/cosign-installer](https://github.com/sigstore/cosign-installer) from 4.1.1 to 4.1.2. - [Release notes](https://github.com/sigstore/cosign-installer/releases) - [Commits](`cad07c2e89...6f9f177880`) --- updated-dependencies: - dependency-name: sigstore/cosign-installer dependency-version: 4.1.2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-09 12:41:35 +02:00
LearningCircuit	3066a9b2c5	chore(deps): cover audited test dirs in dependabot config (#3913 ) Two test directories audited by .github/workflows/npm-audit.yml were missing from .github/dependabot.yml: - /tests/ui_tests/playwright - /tests/accessibility_tests So they only received Dependabot security alerts (via GitHub's GHSA scanner) and never the routine weekly version-bump PRs. That gap is why basic-ftp in tests/accessibility_tests had to be patched manually in #3896 instead of arriving as a normal Dependabot update. Add both as daily npm trackers, matching the cadence of the other test-dir entries.	2026-05-09 11:57:04 +02:00
LearningCircuit	7065b6b1b4	ci: weekly published-image smoke test with auto-issue on failure (#3890 ) * ci: weekly smoke of main's compose against the published Docker Hub image Complements compose-integration-test.yml (#3886). That workflow builds the LDR image from the working tree — it tests "this PR's code's compose with this PR's code's image". This new workflow tests "main's compose with the currently-published localdeepresearch/local-deep-research:latest" — the exact artefact users get when they follow the README quickstart: curl -O .../docker-compose.yml && docker compose up -d The drift between those two is real. Whenever a compose change lands on main but the image hasn't been republished (which happens between every release), users following the quickstart can hit a broken stack — the same class of bug as #3874, but only visible against the published image. Cadence: weekly Monday 05:00 UTC. The failure modes are slow-moving and weekly burns ~1/4 the CI minutes a daily run would. The PR-time / release gate test in #3886 covers the per-change cases. On schedule failure, opens (or comments on) a tracking issue with run URL, container digests, and a triage checklist. Stable title prefix dedups across weeks; manual workflow_dispatch runs do NOT auto-create issues (those are for ad-hoc testing). Reuses the same wait/probe/teardown logic as compose-integration-test.yml, intentionally not factored into a composite action — two workflows, ~50 lines of shared shell, refactoring for DRY would cost more than it saves right now and the loops will diverge as we tune them. * ci: require LDR healthy in published-image smoke test Same fix as #3886 commit `b7ea510f`. LDR has a Dockerfile-level HEALTHCHECK (Dockerfile:306, probes /api/v1/health), so docker inspect returns the health status not the state — checking for ldr_h='running' never matched even though the stack was healthy. Require 'healthy' for all three services to match the actual signal. 
* ci(compose-published-smoke): mirror AI review fixes from #3886 Same pipefail + service-based container resolution + safer probe pattern as #3886's commit `dbf6b83d`. Both workflows share the same wait/probe logic and should stay in sync. Per AI review on #3886: - Replace hardcoded ollama_service / searxng container names with docker compose ps -q <service> resolution everywhere. - set -euo pipefail throughout (no behavior change in the green path). - HTTP probe uses case statement on captured -w code, not grep on a pipe — pipefail-safe and gives better retry diagnostics. * ci(compose-published-smoke): address AI review — service names, --no-build, gh = form Three findings from AI review: 1. Digest capture loop iterated `ollama_service` (compose's container_name) while everything else uses service names + cid_for. Worked today only because container_name happens to match. Refactor to use cid_for for all three services — same pattern as the wait/fail-fast loops, name-drift safe. Also folds the separate ldr_id block into the same loop. 2. Add `--no-build` to `docker compose up -d`. No-op today (no service has a `build:` directive), but defends against a future compose change that adds one — this workflow specifically tests the published image, and silently building from source would invalidate the test. 3. Switch gh CLI calls to `--body="$BODY"` (= form) so a body that ever starts with `-` can't be misparsed as a flag. Hygiene; current bodies are all controlled heredocs. * ci(compose-published-smoke): drop curl -f to preserve HTTP error codes AI review #4 finding (legit, even though the same review's findings #1-#3 were stale — those were already fixed in `bc7ca8504`). curl -f makes the request fail-silently on HTTP 4xx/5xx, which suppresses the -w output. 
Combined with `|| echo "000"`, this means a 404, a 503, and a real network error all collapsed into "000" — erasing the diagnostic signal we most want when triaging a failure ("LDR is up but serving an error page" vs "LDR is unreachable"). Drop -f. Now: - HTTP 200/30x → match → success - HTTP 4xx/5xx → captured as the real code, logged on retry, never matches → eventual timeout with the actual code in the logs - Network failure → "" → || echo "000" → logged as "000" Body is still discarded via -o /dev/null; nothing else changes. * ci(compose-published-smoke): make digest logging unconditional and explicit AI review on #3890 flagged that "digests are only emitted on failure" — not quite right (the dump step is `if: always()` and `cat`s digests on every run), but the framing inside the digest block was misleading: the heading said "## Image digests at time of failure", which is wrong on green runs and in the workflow log. Two small changes: 1. Drop the failure-specific phrasing in the heading. The auto-issue body still contextualizes "below" as failure-time when actually filed, so no information is lost there. 2. Wrap the cat in `::group::Image digests` for a properly labeled, collapsible log group on every run — matches the style of the "docker compose ps" and per-service log groups above. Audit/bisection ergonomics: every successful weekly run now leaves a clearly labeled record of which image SHAs the test exercised. * ci(compose-published-smoke): || true on teardown to prevent false drift alerts AI review on #3890 finding: if `docker compose down` fails (daemon hiccup, hung container) after a successful smoke test, the job exits non-zero, which makes the auto-issue step's `if: failure()` trigger — opening a drift-tracking issue despite the stack having passed health and HTTP checks. False positive. Append `|| true` so teardown errors don't propagate. 
CI runners are ephemeral so any leftover state vanishes with the runner regardless; the only downside of swallowing here is hiding diagnostic info we don't need (we already dumped logs and digests in the previous step).	2026-05-09 11:08:41 +02:00
LearningCircuit	b829c2d65a	ci(compose-integration): hardening follow-ups (--no-build + drop curl -f) (#3898 ) * ci(compose-integration): add --no-build to docker compose up Defense-in-depth flag from AI review on #3890 (where the same fix landed in commit `bc7ca8504` alongside two other fixes). No-op today since no service in docker-compose.yml has a `build:` directive, but defends against a future compose change that adds one. This gate's whole purpose is to test the locally-built LDR image (tagged in the previous step) plus the pulled ollama/searxng images — silently building from source on `up` would invalidate the cache strategy and the test. * ci(compose-integration): drop curl -f to preserve HTTP error codes Mirrors the fix on #3890 (commit `e785f464b`). With -f, curl exits non-zero on HTTP 4xx/5xx AND suppresses -w output, so the `|| echo "000"` sentinel collapsed 404 / 503 / real-network-failure all into the same "000" — the most interesting diagnostic (LDR up but serving an error page) was lost. Without -f, every HTTP response gives us its real code in retry logs; only true network failures fall through to "000". Body still discarded via -o /dev/null.	2026-05-09 10:59:18 +02:00
dependabot[bot]	bf43e7c328	chore(deps): bump actions/setup-node from 4.4.0 to 6.4.0 (#3814 ) Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4.4.0 to 6.4.0. - [Release notes](https://github.com/actions/setup-node/releases) - [Commits](https://github.com/actions/setup-node/compare/v4.4.0...48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e) --- updated-dependencies: - dependency-name: actions/setup-node dependency-version: 6.4.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>	2026-05-09 09:45:05 +02:00
dependabot[bot]	12c01cd44c	chore(deps): bump actions/github-script from 8.0.0 to 9.0.0 (#3812 ) Bumps [actions/github-script](https://github.com/actions/github-script) from 8.0.0 to 9.0.0. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](https://github.com/actions/github-script/compare/v8...3a2844b7e9c422d3c10d287c895573f7108da1b3) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: 9.0.0 dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>	2026-05-09 09:44:27 +02:00
LearningCircuit	4540adaac2	ci: full docker-compose integration test + drop ollama model pre-pull (#3886 ) * ci: add full docker-compose integration test to release gate Brings up the bundled docker-compose.yml end-to-end (searxng + ollama + local-deep-research) and asserts the whole stack reaches healthy/serving. This is the test that would have caught #3874 (cap_drop: ALL breaking SearXNG) before users hit it — and the same class of bug whenever an upstream image bumps its capability or healthcheck requirements. Cost is bounded by scoping triggers carefully: - pull_request: only when compose / Dockerfile / entrypoint scripts change - schedule: daily at 03:00 UTC (offset from release-gate at 02:00) - workflow_call: invoked from release-gate.yml so a release can't bypass it We override MODEL=tinyllama:1.1b for the test (~640 MB) instead of the default gemma3:12b (~7-8 GB). Users tune MODEL via env the same way; the compose config under test is otherwise identical to what ships. Wait loop fails fast on container exits rather than burning the full 12 min budget, and dumps logs from all three services on any failure for triage. * ci: skip ollama model pull in compose integration test The integration test verifies "compose up + healthy + LDR serves" — it does not run inference. After #3885 the ollama healthcheck is `ollama list` (model-agnostic), so pulling a model only adds ~1-2 min and a flake source (Ollama Hub registry transients) without exercising anything the test checks. Layer a small override (.github/compose-ci-override.yml) that replaces the ollama service's entrypoint with `ollama serve`. The base docker-compose.yml is otherwise unchanged — capabilities, networking, healthchecks, depends_on, ports all come from the file users actually run. Wait budget drops 12 min → 6 min accordingly. End-to-end inference, if we ever add it, belongs in a separate workflow that's transparent about the cost and runs less frequently. 
* fix(docker): drop ollama model pre-pull from compose The bundled compose's ollama service overrode the image entrypoint with scripts/ollama_entrypoint.sh ${MODEL:-gemma3:12b}, which pre-pulled a multi-GB model on every fresh start. That had three problems: 1. Users running LM Studio / OpenAI / llama.cpp don't use ollama at all, but every fresh boot still pulled gemma3:12b (~7-8 GB). 2. First-time setup wasted 5-10 min on a model selection the user may not even want — gemma3:12b is a strong opinion baked into the compose. 3. CI integration tests (#3886) had to layer an override file just to skip the pull, since the model isn't relevant to "stack-comes-up" smoke testing. Drop the entrypoint override entirely. The ollama image's default entrypoint is `ollama serve`; that's all we need. The healthcheck introduced in #3885 already probes the daemon (model-agnostic) so this slots in cleanly. Also drops the now-unused `ldr_scripts:/scripts` mount on the ollama service. Behavior change for ollama users: the model is no longer pre-pulled on boot. They pull explicitly (`docker exec ollama_service ollama pull X`) or LDR pulls on first use. The first-research wait is the same total time, just deferred to when the user actually triggers it instead of blocking compose-up. In #3886, removes the .github/compose-ci-override.yml workaround now that the compose itself doesn't pull a model. The integration test runs against the compose users actually run, with no test-only overrides. The scripts/ollama_entrypoint.sh file is left in place — it's no longer referenced from compose but may be useful for users who want a pre-pull in their own deployments. Cleaning that up can be a separate follow-up once we're sure no one depends on it. * ci: drop redundant pre-pull step in compose integration test `docker compose up -d` already pulls any image it doesn't have locally (default pull_policy: missing). 
The separate `docker compose pull ollama searxng` step was just for log clarity; it does the same work twice. The LDR image is locally built and tagged in the previous step, so `up -d` sees it's present and uses it as-is — no risk of compose yanking our local image. * ci: require LDR healthy (not just running) in compose integration test Previous condition checked `ldr_h = "running"` but LDR has a Dockerfile-level HEALTHCHECK at Dockerfile:306 (probing /api/v1/health), so docker inspect returns the health status, not the state — i.e. "healthy", never "running". The wait loop never matched and timed out at 6 min despite the stack being healthy the whole time. CI run for evidence: log line "[23:33:04] ollama=healthy searxng=healthy ldr=healthy" repeats for ~5 min. Fix: require "healthy" for all three. ollama and searxng have compose-level healthchecks; LDR has a Dockerfile-level one. The status() helper already returns Health.Status when a healthcheck exists, so requiring "healthy" is the right signal for all three. Also retires the "LDR has no healthcheck" follow-up note from the PR body — that was based on me checking the compose only, not the Dockerfile. * ci(compose-integration-test): drop pull_request and schedule triggers Per the original design (and the conversation thread on #3886), this test should only run via release-gate.yml. release-gate fires daily on its own cron + on every release + on manual dispatch, which is exactly the coverage we want. Removing the pull_request trigger means PRs that touch docker-compose.yml no longer pay 3-6 min per run for a test whose feedback isn't actionable at PR time anyway. Removing the standalone daily schedule avoids duplicating release-gate's own daily run. The successful run on commit `b7ea510f` confirmed the stack-up + healthy + HTTP-probe path works end-to-end before this trigger constraint. 
* ci: move compose integration test from release-gate to release The previous wiring put compose-integration-test.yml inside release-gate.yml, which fires daily on its own cron at 02:00 UTC. That meant the integration test ran daily — not the design intent. The failure modes this test catches (compose / image changes that break the bundled stack) are tied to actual release events, not time, so the daily cycle is wasted CI minutes. Move it to release.yml as a peer gate alongside ci-gate, e2e-test-gate, compat-test-gate. Same pattern: needs version-check, gated on should_release == 'true', wired into the build job's needs/if as a required gate. Now runs only on release events (push to main with new version, tag push, manual dispatch). Removed the equivalent block from release-gate.yml. Updated the workflow file's header comment to reflect the new placement. * ci(compose-integration): share ldr-prod cache scope with docker-tests When the release pipeline runs, docker-tests.yml (called from ci-gate) builds the LDR production image and writes layers to the GHA cache under scope=ldr-prod. The compose integration gate runs in parallel and was building from scratch on a separate scope=compose-integration cache — fine in isolation, but it meant ~3-5 min of redundant build work when ldr-prod's layers were already warm. Read from both scopes (compose-integration first, then ldr-prod) and keep writing only to our own scope so we don't disturb the cross-workflow cache. Falls back to a fresh build if neither has layers (brand-new branch, scope rotation, etc.) — no behavior change in that case. Also explicit `target: ldr` to guard against future Dockerfile changes where the default stage could become something other than the production target. docker-tests.yml's ldr-prod build uses the same target, so the cache layers line up. No coupling between gates — if docker-tests.yml fails or its cache is cold, this gate still works (just slower, like before). 
* ci(compose-integration): address AI review — pipefail + service-based resolution Three findings from the AI review on #3886: 1. Replace hardcoded container names with `docker compose ps -q <service>`. Previously inspected containers via the literal `ollama_service` / `searxng` strings (compose's `container_name:` values), while `docker compose logs` already used service names. If those drift, the wait loop would silently time out and log collection would miss the service. New `cid_for()` helper resolves container IDs by service name everywhere — single source of truth, name-drift safe. 2. Add `set -euo pipefail` to the wait step (no functional change since it has no pipes, but consistent with hygiene). 3. Refactor the HTTP probe so it doesn't pipe curl into grep. Capturing the code via `-w "%{http_code}"` + case statement removes the pipe entirely, avoiding the curl-failure-masked-by-grep problem the review flagged. Sentinel "000" on curl error gets logged on retry for better debug signal. pipefail is now safe to enable here. Fourth finding (`ldr_scripts` volume orphaned): not actually orphaned — the LDR service still mounts it (docker-compose.yml:124), the top-level declaration at :240 stays. Acknowledged in PR thread. No behavior change in the green path; failure-mode error messages are slightly clearer ("Container for service <svc>" instead of bare name).	2026-05-09 03:08:46 +02:00
LearningCircuit	d8034e27a4	feat(ci): declarative label set for PR triage (1/5) (#3857 ) * feat(ci): introduce declarative labels for PR triage Add .github/labels.yml + labels-sync.yml workflow (EndBug/label-sync@v2) managing 7 new labels for PR triage: 4 persistent (external-contributor, first-time-contributor, bot, needs-rework) and 3 lifecycle (needs-codeowner-review, awaiting-author, awaiting-codeowner) that will be toggled per-PR by a follow-up workflow. Sync is additive (delete-other-labels: false) so the existing 75 labels are not touched. Workflow runs only on push to main when labels.yml changes, plus workflow_dispatch for manual sync. First PR of a 5-PR series introducing PR triage automation. * fix(ci): pin actions and harden labels-sync workflow Adds the missing contents:read permission (without it, actions/checkout fails because explicit permissions: zeroes out unspecified scopes). Brings the workflow into line with repo conventions used by every other label/issues-write workflow: - SHA-pin actions/checkout (v6.0.2), step-security/harden-runner (v2.19.1), and EndBug/label-sync (v2.3.3); enforced by validate-image-pinning.yml. - Add harden-runner first step with egress-policy: audit (matches 54/57 workflows including label-fixed-in-dev.yml). - Move permissions to job scope; top-level permissions: {} for OSSF Scorecard. - Add timeout-minutes: 5 (matches label-fixed-in-dev.yml). - Use sparse-checkout for labels.yml only with persist-credentials: false. - Document the deliberate omission of concurrency: (regression #3554/#3599).	2026-05-08 21:50:01 +02:00
LearningCircuit	fd94bf8945	chore(release): remove dead alias and ghost label refs from release.yml (#3871 ) Three label entries in the changelog generator config no longer correspond to active labels: - `docs` (line 29): alias for `documentation`, which is already in the same category. Being deleted as part of label cleanup. - `CI/CD` (line 33): alias for `ci-cd`, same category. Same cleanup. - `ai-review-requested` (line 79): ghost reference. Added in `c0683b5ce` (PR #1131) for a planned auto-trigger AI review feature that was never implemented. The real AI-review trigger uses `ai_code_review`, which remains in the same 🔄 Branch Syncs & Automation category.	2026-05-08 20:39:23 +02:00
Aqil Aziz	3bf78baf07	docs: fix API example links (#3852 )	2026-05-08 01:30:25 +02:00
dependabot[bot]	56290b15c0	chore(deps): bump step-security/harden-runner from 2.19.0 to 2.19.1 (#3811 ) Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.19.0 to 2.19.1. - [Release notes](https://github.com/step-security/harden-runner/releases) - [Commits](`8d3c67de8e...a5ad31d6a1`) --- updated-dependencies: - dependency-name: step-security/harden-runner dependency-version: 2.19.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-06 08:01:44 +02:00
dependabot[bot]	dee75bd2a5	chore(deps): bump github/codeql-action from 4.35.2 to 4.35.3 (#3815 ) Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.35.2 to 4.35.3. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](`95e58e9a2c...e46ed2cbd0`) --- updated-dependencies: - dependency-name: github/codeql-action dependency-version: 4.35.3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-06 08:00:42 +02:00
LearningCircuit	c245a27090	feat(ci): add prerelease Docker image workflow for pre-release testing (#3761 ) * feat(ci): add prerelease Docker image workflow for pre-release testing Build a prerelease Docker image (prerelease-v{version}-{short_sha}) after all quality gates pass, in parallel with the release approval step. The image is pushed to Docker Hub for local testing before the official release is published. Old prerelease tags are auto-deleted (best-effort) when the production release completes. - New prerelease-docker.yml: standalone workflow triggered by repository_dispatch - release.yml: add short_sha output and trigger-prerelease-docker job - docker-publish.yml: add best-effort cleanup of prerelease-* tags * fix(ci): address review feedback on prerelease docker workflow - Group >> "$GITHUB_STEP_SUMMARY" redirects (clears actionlint SC2129 pre-commit failure). - Bump pinned actions to match the rest of the release pipeline: step-security/harden-runner v2.17.0 -> v2.19.0 (4 sites), aquasecurity/trivy-action v0.35.0 -> v0.36.0 (2 sites). - Scope prerelease tag cleanup in docker-publish.yml to prerelease-v${RELEASE_VERSION}-* so concurrent prereleases for other versions and any unrelated prerelease-* tags survive. - Correct Trivy SARIF comment (artifact-only, not GitHub Security tab). * fix(ci): bump harden-runner pin in trigger-prerelease-docker to v2.19.0 The new job inherited an older v2.17.0 pin while the rest of release.yml (and docker-publish.yml) is uniformly on v2.19.0. Align them.	2026-05-02 11:52:25 +02:00
LearningCircuit	b632ca8ec4	feat(release): migrate to towncrier news fragments (#3773 ) * feat(hooks): version-check staged release-notes against current release When a release-notes file is staged, compare its version against the current latest GitHub release (preferred) or __version__.py (fallback, when gh CLI is unavailable). Warn if the staged version isn't ahead. Semantics chosen so that __version__.py being ahead of releases (the normal pre-release state) does NOT trigger a false alarm: - vs latest release: file MUST be > release. Equality means the version was already published — almost always a stale/duplicated notes file. Warn. - vs __version__.py (fallback): file MUST be >= version.py. Equality is correct — that's the upcoming release. Only file < version.py is suspicious. Warn. The warning includes the source ("latest GitHub release" or "__version__.py (gh unavailable)") and suggests the next patch / minor / major versions. Robustness: - gh call has a 5s timeout and falls back gracefully on missing binary, network failure, or no releases yet. - Files with non-versioned names (e.g., a hypothetical README.md inside docs/release_notes/) are skipped silently. - Hook still always exits 0 — non-blocking nudge, never fails the commit. * feat(release): migrate to towncrier news fragments LDR's PR throughput (~12 PRs/day, releases every 1–2 days, ~25–50 PRs per release) made the shared docs/release_notes/<version>.md model unworkable — every contributor was racing to edit the same file, and the file's name kept moving as the version did. Replace it with the standard towncrier flow used by Twisted, urllib3, and pip: - Each PR drops one fragment under news/<id>.<category>.md, where <id> is the PR/issue number and category is one of: breaking, security, feature, bugfix, removal, misc. Orphan fragments (no PR/issue) use +<slug>.<category>.md. 
- At release prep time the maintainer runs: pdm run towncrier build --version <X.Y.Z> --yes which renders fragments into docs/CHANGELOG.md and deletes them. - The release workflow extracts the just-rendered section from docs/CHANGELOG.md (via awk) and uses it as the human-narrative input to the published release body, alongside the AI TL;DR and the auto-generated PR list. Existing docs/release_notes/{0.2.0,0.4.0,1.6.0,1.6.8,1.7.0}.md stay untouched as historical record. The pre-commit hook is rewritten to nudge for news/ fragments instead of the old shared file. The version-check and staging-marker scanner from PR #3768/#3773 are dropped — fragments don't carry versions in their names, and towncrier's structural model removes the staging-marker class of bug entirely. Filename validation (category in allowlist, name matches expected pattern) is added so typo'd categories don't silently vanish from the rendered output. Includes news/3773.feature.md as the first fragment using the new convention. * fix(release): allowlist news/ fragments in .gitignore The repo's whitelist-style .gitignore (`` then `!<allow>`) was silently ignoring news/<id>.<category>.md fragments, so the towncrier migration's first fragment didn't make it into the previous commit. Add `!news//.md` next to the existing docs/ / examples/ allowlist entries and re-add news/3773.feature.md. * refactor(release): use per-version files instead of CHANGELOG.md Replace the towncrier-on-CHANGELOG.md flow with per-version output files at docs/release_notes/<version>.md, matching the existing historical convention and dropping the awk extraction step from the release workflow. Towncrier doesn't support per-version filenames in [tool.towncrier] config, so the maintainer now runs scripts/release/render-notes.sh <version> at release prep time. The wrapper: 1. Calls `towncrier build --draft --version <X.Y.Z>` to render fragments to stdout (no file mutation). 2. Captures the output into docs/release_notes/<X.Y.Z>.md. 
3. `git rm`s the consumed fragments (deletion staged for commit). 4. Stages the new release-notes file. Workflow changes: - Sparse-checkout reverts from docs/CHANGELOG.md to docs/release_notes - Body composition replaces awk section extraction with `cat docs/release_notes/${RELEASE_VERSION}.md` — simpler, matches the layout of historical pre-towncrier release notes (1.6.0.md etc.). pyproject.toml changes: - filename now points to docs/release_notes/_pending.md as a guarded placeholder. Only sees writes if a maintainer bypasses the wrapper script — clearly named so the mistake is recoverable. - title_format=false suppresses the inline `## <version> (<date>)` header. The release page already shows the version as title, and per-version files don't need an inline version header either. * fix(hooks): align staged-notice text with per-version-file flow The previous commit refactored to per-version files, but the pre-commit hook still pointed contributors at the old docs/CHANGELOG.md target. Update the staged-notice text to reference docs/release_notes/<version>.md and the wrapper script that produces it. * fix(release): correct stale CHANGELOG.md comment and avoid orphan target file Two follow-ups from review of the towncrier migration: - pyproject.toml: the [tool.towncrier] block comment still described the old `pdm run towncrier build --version <X.Y.Z> --yes` → docs/CHANGELOG.md flow that was abandoned in `1120cb8` in favor of per-version files via scripts/release/render-notes.sh. A maintainer reading only pyproject.toml would have followed wrong instructions. - render-notes.sh: `pdm run towncrier build --draft > "$TARGET"` truncates $TARGET before towncrier runs. If towncrier exits non-zero (set -e kills the script), the zero-byte file persists and the next attempt hits the line-35 overwrite guard. Render to a temp file first and `mv` on success — bash trap cleans up on failure. 
* docs(release): document news-fragment flow for contributors and maintainers The towncrier migration shipped without updating the surfaces that contributors and maintainers actually read: - .pre-commit-config.yaml: hook description still pointed at docs/release_notes/, not news/. - CONTRIBUTING.md: PR process never mentioned news fragments at all, so contributors only learned about them via the pre-commit nudge or by stumbling on news/README.md. - docs/RELEASE_GUIDE.md: the maintainer release flow listed only "bump version → merge", with no step for running scripts/release/render-notes.sh. Following the old checklist literally would let news/ fragments accumulate forever and never appear in releases (the workflow tolerates a missing per-version file but logs a warning and the hand-written narrative is lost). Add a contributor-facing item to the PR checklist, a maintainer-facing "Render news fragments" step before the version bump, and a dedicated "Release-notes flow" section that explains the contributor + maintainer sides, the script's guarantees, and how to preview locally. Also clarify in the "How Releases Work" overview that the release body is composed from three sources (AI TL;DR + per-version notes + auto PR list), not just an auto-generated changelog. * docs(release): clarify the version-bump trigger for automated releases The previous wording ("Releases are fully automated when PRs are merged to the main branch" / "Trigger: Any merge to main branch") was misleading — it implied every merge cuts a release. The actual trigger is more specific: the `version-check` job reads `__version__.py` and only proceeds when the resulting tag does not yet exist as a GitHub release. So in practice "merge a PR that bumps __version__.py" is what triggers an end-to-end release; non-bump PRs merge normally and short-circuit the pipeline. 
Also flesh out the "No duplicates" line to name the actual mechanism (`should_release=false` skipping every downstream gate) so a maintainer reading the doc can map it to the workflow code. * refactor(release): use towncrier native per-version output, rename news/ to changelog.d/ Three concerns rolled into one commit because they form a single coherent simplification: 1) Drop the wrapper script (scripts/release/render-notes.sh). Towncrier 24.x supports per-version-file output natively via `single_file = false` + a `{version}`-templated `filename`. The wrapper's `--draft > target` trick was only needed when `filename` had to be a fixed path. With the templated filename, the maintainer runs `pdm run towncrier build --version <X.Y.Z> --yes` directly: towncrier writes docs/release_notes/<X.Y.Z>.md, `git rm`s the consumed fragments, and `git add`s the new file — same end state as the wrapper, fewer moving parts, no bash, no extglob hazard, no temp-file dance, no mktemp portability concerns. The wrapper's two guards (refuse-overwrite, refuse-empty) guarded against unlikely operational mistakes that are easy to spot in `git status`; the simplification is worth the small loss. 2) Rename news/ → changelog.d/. The product has its own `news` feature (news.html, news-subscriptions, /news/api routes), so a top-level `news/` for release-engineering plumbing is genuinely confusing — code search and contributor onboarding mix the two concepts. `changelog.d/` is the de-facto Python community standard (attrs, hypothesis, Sentry, pyca/cryptography, structlog all use it). The .d/ suffix signals "directory of fragments that get assembled" — a long-standing Unix convention. Renaming now is cheap (one fragment); renaming later compounds. 3) Pre-commit hook tightening (from external code review): - Dedupe categories: hook now reads [[tool.towncrier.type]] from pyproject.toml at runtime via tomllib (3.11+ stdlib), so adding a category in pyproject.toml is automatically picked up. 
Falls back to the canonical six on parse failure (non-blocking hook). - Gate ANSI color escapes on sys.stdout.isatty() so CI logs and non-VT Windows terminals don't render `\033[36m` as visible garbage. Workflow comments, .gitignore allowlist, CONTRIBUTING.md, RELEASE_GUIDE.md, and changelog.d/README.md all updated in lock-step.	2026-05-02 11:20:06 +02:00
LearningCircuit	982d36fb96	fix(release): bump AI summary timeout + diagnose empty content (#3783 ) * fix(release): bump AI summary timeout + diagnose empty content The v1.6.8 release run hit two related failures in the AI TL;DR step: curl: (28) Operation timed out after 120001 milliseconds WARNING: AI response 2xx but no .choices[0].message.content — skipping summary The 120s --max-time was too tight for kimi-k2-thinking (and likely other thinking models) on a multi-PR release prompt. The retry succeeded HTTP-wise but returned a response without any extractable content, so the release shipped without a TL;DR. Three changes: 1. Default --max-time from 120s to 300s, configurable via vars.AI_RELEASE_SUMMARY_MAX_TIME. Releases never fail because of the AI step (the whole block is best-effort), but giving thinking models five minutes is realistic. 2. Fall back to `.choices[0].message.reasoning_content` when `.content` is empty. Some providers route thinking-model output into the reasoning field. Cheap try-and-fall-back; no harm if the field is missing. 3. When BOTH fields are empty, dump the response shape (top-level keys, message keys, finish_reason, error field) to step logs. Bounded to ~4 lines, but enough to debug the next failure without rerunning. Behavior unchanged when the call succeeds normally. * fix(release): bump AI summary timeout default to 15 min 900s (15 min) covers thinking models on multi-PR release prompts with comfortable headroom. Still configurable via vars.AI_RELEASE_SUMMARY_MAX_TIME.	2026-05-02 01:38:24 +02:00
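The content fallback and the bounded diagnostics from the commit above might look like this when written out. The real step lives in a workflow shell script; this Python version is only a sketch under that assumption, and the function names `extract_summary` and `describe_shape` are hypothetical.

```python
from typing import Optional


def extract_summary(response: dict) -> Optional[str]:
    """Prefer .choices[0].message.content, then fall back to reasoning_content."""
    choices = response.get("choices") or []
    message = (choices[0].get("message") if choices else None) or {}
    for field in ("content", "reasoning_content"):
        value = message.get(field)
        if isinstance(value, str) and value.strip():
            return value
    return None  # caller skips the TL;DR; the release itself must not fail


def describe_shape(response: dict) -> list[str]:
    """Summarize the response in four short lines for the step log."""
    choices = response.get("choices") or []
    first = choices[0] if choices else {}
    message = first.get("message") or {}
    return [
        f"top-level keys: {sorted(response)}",
        f"message keys: {sorted(message)}",
        f"finish_reason: {first.get('finish_reason')!r}",
        f"error: {response.get('error')!r}",
    ]
```

A caller would log `describe_shape(response)` only when `extract_summary(response)` returns `None`, which keeps the successful path unchanged.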
LearningCircuit	0f0707abea	feat(release): prepend docs/release_notes/<version>.md to release body (#3768 ) * feat(release): prepend docs/release_notes/<version>.md to GitHub release body When a release is cut, look for docs/release_notes/${RELEASE_VERSION}.md and prepend its prose to the GitHub release body. The auto-generated, label-categorized PR list (from .github/release.yml + GitHub's generate-notes API) is appended below a horizontal-rule + "## What's Changed" heading. If the md file is missing, the workflow falls back silently to auto-notes only — no failure. The pre-commit hook recommend-release-notes.py already nudges contributors to stage entries under docs/release_notes/ for substantial changes, so this wires the end-to-end flow: contributor writes prose → release publishes prose-first body. * fix(release): address review findings on notes-prepend logic Three fixes from the review pass: 1. Drop the manual "## What's Changed" heading. GitHub's generate-notes API already emits it as the first line of the auto-body (verified against v1.6.0). Manually inserting another produced a duplicate heading in the published release. 2. Validate RELEASE_VERSION against a strict semver regex before using it in a filesystem path. Defense-in-depth — RELEASE_VERSION already comes from a Git refname or __version__.py, but neither path validates strictly enough to rule out path traversal. 3. Wrap the gh api generate-notes call so its failure aborts the step. `set -e` does NOT exit on a failing command substitution inside an assignment — without this the workflow would silently publish an empty/partial release body on a transient API error. * feat(hooks): show release-notes staged notice with format tips Two changes to recommend-release-notes.py: 1. Always inform the committer when a file under docs/release_notes/ is staged (was: silent). Notes contributors that the file ends up in the GitHub release body via .github/workflows/release.yml — not just archived as docs. 2. 
Embed format tips in both the staged notice and the missing-notes reminder: no leading `# H1` (release title renders separately), use `## sections`, mark BREAKING explicitly with an `### Impact` subsection, link PRs, and strip staging markers before tagging. * feat(release): prepend AI-generated TL;DR to release body Adds a best-effort AI summary at the very top of the release body, above the hand-written notes and the auto-generated PR list. Uses OpenRouter with vars.AI_MODEL (same convention as ai-code-reviewer.yml, default moonshotai/kimi-k2-thinking). Behavior: - Builds a prompt from the hand-written notes (if any) plus the auto-generated PR list, asking the model for a 30-second TL;DR starting with `## TL;DR`. - Up to 2 attempts (1 retry) with a 5s backoff to absorb transient API hiccups. - If OPENROUTER_API_KEY is unset, the call fails, or the response is unparseable, the step skips the AI section silently. Releases must never fail because of an LLM hiccup. Body order on the published release: 1. AI TL;DR (if generated) 2. Hand-written docs/release_notes/<version>.md (if present) 3. Horizontal rule (only if 1 or 2 was emitted) 4. Auto-generated `## What's Changed` PR list Configuration knobs (all optional, all have sensible defaults): - secrets.OPENROUTER_API_KEY — enables the AI step - vars.AI_MODEL — model id (default kimi-k2-thinking) - vars.AI_RELEASE_SUMMARY_TEMPERATURE (default 0.3) - vars.AI_RELEASE_SUMMARY_MAX_TOKENS (default 4000) * feat(hooks): warn when staged release-notes contain staging markers When a file under docs/release_notes/ is staged, scan its content (via git show :path) for in-progress staging language and print a yellow warning listing each hit by line number. Non-blocking — the hook still always exits 0. Markers checked: - "(pending)" - "Staging notes" - "Fold into the next tagged version" These would otherwise publish verbatim into the GitHub release body, since release.yml prepends the file as-is. 
The warning lets a contributor catch leftover staging text before they push the release commit. * fix(release): clarify semver-regex comment + remove prompt heading contradiction Two follow-ups from the latest AI review: 1. The semver-regex comment was misleading. Updated to spell out that RELEASE_VERSION is the bare semver (no `v` prefix) because the build job's "Determine version" step strips `refs/tags/v` and reads __version__.py (bare "1.6.7"), so a leading `v` reaching this check means an upstream contract change and should hard-fail. Behavior of the regex itself is unchanged — it correctly rejects `v1.7.0`. 2. The SUMMARY_PROMPT had a self-contradiction: "Open with the literal heading `## TL;DR`" (H2) vs "no headings above level 3" (which would forbid H2). Reworded to be explicit: use `## TL;DR`, `###` for subsections, no `#` and nothing deeper than `###`.	2026-05-01 21:55:05 +02:00
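Two pieces of the commit above lend themselves to a small sketch: the strict check on the bare version before it is used in a file path, and the stated body order (AI TL;DR, hand-written notes, horizontal rule only if either was emitted, then the auto-generated list). The regex and function names below are assumptions for illustration; the workflow's actual pattern is not quoted in the message and may differ, for example by allowing pre-release suffixes.

```python
import re
from typing import Optional

# Assumed pattern: bare MAJOR.MINOR.PATCH, no leading "v", no path characters.
SEMVER_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def is_bare_semver(version: str) -> bool:
    """True only when the whole string is a bare semver such as '1.6.7'."""
    return SEMVER_RE.fullmatch(version) is not None


def compose_release_body(
    auto_notes: str,
    ai_summary: Optional[str] = None,
    hand_written: Optional[str] = None,
) -> str:
    """Assemble the release body in the order the commit message describes."""
    parts = [p.strip() for p in (ai_summary, hand_written) if p and p.strip()]
    if parts:
        parts.append("---")  # rule only when something precedes the auto list
    parts.append(auto_notes.strip())
    return "\n\n".join(parts) + "\n"
```

Using `fullmatch` rather than a `$`-anchored search means a trailing newline or a `v1.7.0` prefix is rejected outright, which matches the hard-fail intent described for a leading `v`.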
LearningCircuit	89aa0228b8	chore(labels): update GitHub labels for release automation clarity (#3764) - Remove emoji variant `maintenance 🔧` from workflows and release.yml in favor of plain `maintenance` label - Replace deleted `automated`/`version-bump` labels in version_check.yml with `automation`/`maintenance` - Add 7 previously-uncategorized labels to release.yml categories: css, ui-ux, accessibility, snappy → Frontend Changes; dev-bugfix → Bug Fixes; dev-enhancement → New Features; developer-experience → Code Quality & Refactoring - Update workflows README documentation Accompanied by GitHub label operations (via gh CLI, not in this commit): - Created 4 missing labels: bugfix, docs, CI/CD, ai-review-requested - Deleted 17 junk/duplicate labels - Updated descriptions for all 72 labels with release note priority info	2026-05-01 19:19:11 +02:00
LearningCircuit	1abd8b9138	fix(ci): use pdm lock instead of pdm update in dependency workflow (#3755) The workflow used `pdm update -u --no-sync --no-self` which caused `pdm lock --check` to fail in pre-commit CI. The `-u` (--unconstrained) flag modifies pyproject.toml with relaxed version constraints, but the workflow only commits pdm.lock, discarding those pyproject.toml changes. When `pdm lock --check` re-resolves against the original pyproject.toml, it produces a different lockfile, causing the check to fail. Using `pdm lock` fixes this because it uses the same resolution code path as `pdm lock --check`, respects pyproject.toml constraints, and never modifies pyproject.toml.	2026-05-01 12:40:04 +02:00
LearningCircuit	ff1cda3e7c	fix(ci): give gh CLI repo context in monitor-publish (#3742) monitor-publish has no checkout step, so the gh CLI cannot infer the repository, and it does not fall back to GITHUB_REPOSITORY. Every `gh run list` call therefore failed with "failed to determine base repo", the error was swallowed by `2>/dev/null`, and the polling loop saw an empty result for all 40 minutes. Result: every release produced a false "Partial publish failure" issue showing both Docker and PyPI as `timed_out`, even though both publishes succeeded. Set GH_REPO from github.repository, and stop hiding gh's stderr so future failures are visible in the runner log instead of silent.	2026-04-30 01:18:59 +02:00
LearningCircuit	5a3705c7d2	ci: temporarily disable nuclei DAST scan from release gate (#3720) Nuclei scan is causing significant slowdowns. Commenting it out of the release gate so the current release can proceed. The scan workflow file (.github/workflows/nuclei.yml) is left intact for easy re-enablement.	2026-04-28 20:38:22 +02:00
LearningCircuit	37335c907d	ci(playwright-webkit): drop checks: write to satisfy Scorecard (#3704) Removes the `checks: write` job-level permission from both the desktop-safari and mobile-safari jobs in playwright-webkit-tests.yml. The permission was only needed by the EnricoMi/publish-unit-test-result-action "Publish Test Results" step in each job, which is also removed. Test results remain available via the "Upload Playwright Report" artifact step (already uploads test-results/results.xml). Failing tests still fail the job in the "Run ... Safari Tests" step, so the release-pipeline gate is unchanged. Closes Scorecard alerts #7715, #7716.	2026-04-28 01:21:58 +02:00


1252 Commits