Commit Graph

1252 Commits

Author SHA1 Message Date
LearningCircuit
da0d18ed25 fix(release): set towncrier name to skip package import (#4071)
The release job uses a sparse checkout that omits src/ and runs a
standalone `pip install towncrier`. Towncrier 24.8 still calls
`get_project_name()` even when --version is passed on the CLI,
and the existing [tool.towncrier] config pointed at the
`local_deep_research` package, so the build crashed with
ModuleNotFoundError before rendering any fragments.

Set `name = "local-deep-research"` so towncrier short-circuits the
import path (build.py:195-197). Drop the now-misleading
`package`/`package_dir` fields — `--version` is always passed,
`directory = "changelog.d"` is explicit, and nothing else inside
towncrier still needs them. Fix the workflow comment that
misattributed the bypass to --version.

Verified by rendering changelog.d/*.md fragments against this
pyproject.toml in a fresh directory with no src/ present.
2026-05-17 02:30:51 +02:00
LearningCircuit
5d60f3d00e chore(labels): add 'code-ready' as a human-only signal label (#4068)
Introduces a new repository label, ``code-ready``, that communicates a
human reviewer's judgement that a PR's code changes look technically
ready — i.e. the implementation, tests, docs and review nits are all
addressed — while CI and an approving codeowner review may still be
outstanding. The label is meant to bridge the gap between "needs
review" and "auto-merge": a maintainer can apply it after walking the
diff to signal that the code side is good, even though merge is still
blocked on CI runs finishing or an approver clicking the button.

Critically, this label must be **applied manually only**, never by
automation. The motivation is judgement, not heuristics — a workflow
that flips it based on "all CI green" or "no unresolved comments"
would dilute the signal and undermine the human-in-the-loop intent.
The labels.yml entry is grouped under a new "Human-only signal
labels" section with an explicit comment saying so, and the label
description itself includes "Apply manually — never auto-applied" so
the rule is visible everywhere the label surfaces.

Verified before adding:
* No existing workflow (``pr-triage.yml``, ``label-fixed-in-dev.yml``,
  ``advanced-search-reminder.yml``, ``sync-main-to-dev.yml``,
  ``danger-zone-alert.yml``, ``compose-published-smoke.yml``) applies
  ``code-ready``. Each workflow's ``addLabels(...)`` calls use a
  closed set of specific label names — no heuristic ever resolves to
  ``code-ready``.
* No naming collision with existing labels (``code-ready`` is new;
  ``auto-merge``, ``needs-codeowner-review``, ``awaiting-codeowner``
  are distinct concepts).
* Label created live on GitHub via ``gh label create`` before this
  commit; this PR brings ``labels.yml`` into source-of-truth sync.

Color: ``006b75`` (teal) — distinct from the existing yellow/green
review-state palette so it reads as a separate axis from the
codeowner-review lifecycle.
2026-05-16 14:18:09 +02:00
LearningCircuit
8597e429cc Improve UI tests + CI: artifact uploads, WebKit skip narrowing, settle-wait migrations (#4061)
* ci(responsive): restore artifact uploads and fix dead post-results gate

The Responsive UI workflow lost its per-viewport artifact uploads (the
explanatory comment around lines 206-209), so PR/release failures were
un-debuggable - no screenshots, no test output. The downstream
`post-results` job was also gated on `github.event_name == 'pull_request'`,
which can never be true because the workflow has no `pull_request` trigger;
the combined-report aggregator therefore never ran.

Restore the upload step using `if: always()` + `if-no-files-found: ignore`
(so server-startup failures still upload logs and quiet runs don't fail
the step) and rewrite the `post-results` gate to `if: always()`. Artifact
name matches the existing `ui-test-results-*` pattern expected by the
combined-report glob.

* test(playwright): narrow WebKit closed-context skip to webkit only (#4060)

The catch at all-pages-mobile.spec.js:372 was previously calling
`test.skip(true, ...)`, which skipped the test for every browser - so any
non-WebKit error path also silently bailed out of the mobile-nav overlap
assertion. Only Mobile Safari / WebKit is known to hit the
`Target page, context or browser has been closed` race, so gate the skip
on `browserName === 'webkit'`. Other browsers now re-throw and surface the
regression.

Also broaden the matched error message to include
`Execution context was destroyed`, the alternate wording the same upstream
race uses in newer Playwright versions.

Skip annotation references issue #4060 so the skip is grep-able and can be
removed when the underlying race is fixed or the DOM walk is restructured.

* test(ui): add waitForStable helper to auth_helper.js

Replaces ad-hoc `await delay(N)` sleeps used to "let the UI settle" after
an action. The helper waits for a selector to be visible, then waits for
its bounding box to stop changing across requestAnimationFrame ticks
(bounded to 3s in-page). The final `idleMs` pause is configurable.

JSDoc explicitly notes when NOT to use it: don't replace `delay()` calls
that exercise wall-clock behavior (e.g. a 10s timer the app is supposed to
respect). Those tests need real elapsed time, not a settle wait.

Exported as a sibling of `safeClick` to keep Puppeteer test imports tidy.

* test(ui): replace settle-delays with state-based waits in two puppeteer tests

`test_research_cancellation.js` had 7 hardcoded `await delay(...)` calls
and `test_form_validation_aria_ci.js` had 19. The vast majority were
"give the UI a moment to settle" pauses with no real signal attached, so
they slowed CI and quietly hid races whenever the runner was a beat slower
than the chosen delay.

For each call:
- post-`navigateTo` 500ms sleeps -> `waitForSelector('#query', { visible: true })`
- post-validation-trigger sleeps -> `waitForFunction` polling the
  `ldr-field-invalid` class to appear (or clear, when the test expects
  validation to pass)
- post-focus 100ms -> `waitForFunction(() => document.activeElement?.id === 'query')`
- post-cancel-click sleeps -> `waitForFunction` polling for `cancel|stop|suspend`
  to appear in the status text
- post-typing 200ms -> `waitForFunction` polling for the typed value to land

The one delay we kept: the explicit 10-second wait in the mid-stage
cancellation test (`test_research_cancellation.js`), which deliberately
exercises elapsed-time behavior of the research progress flow. That is
not a settle wait and must stay wall-clock.

Polling waits all use `.catch(() => {})` to preserve existing
behavior when a selector or state never appears (the assertions further
down handle the failure case more informatively than a hung wait would).

* docs(pr-template): document label-gated CI workflows

Several heavy E2E workflows are label-gated and silently no-op on PRs
without the right label - new contributors had no way to know. Add a "CI
test coverage" section to the PR template enumerating each gated workflow
and the label that triggers it.

No CI behavior change; documentation only.

* test(form-validation): make waitForQueryReady detect validator attachment

Local smoke-test (9 tests, run against `scripts/dev/restart_server.sh`)
exposed two latent races that the prior `await delay(500)` had been
quietly hiding:

1. `waitForQueryReady` returned as soon as `#query` was visible, but the
   FormValidator class is registered against the field a tick later
   (research.js setupEventListeners). Waiting for the `.ldr-field-error`
   sibling that addValidation() inserts is the actual signal that the
   validator is wired and the submit handler will take the early-return
   path on an empty query.

2. `noLoadingUiOnEmptySubmit` ran after `errorClearsOnValidSubmit`, which
   typed a real query and triggered a real submit (the fetch fails but
   creates `.ldr-loading-overlay` first). `navigateTo` skipped the
   re-navigation because we were already on `/`, so the stale overlay
   carried over. Force a real `page.goto` for this test so it asserts
   about a fresh page, not the leftover state of the previous test.

After the fix the suite passes 9/9 in ~1s (vs ~4.5s with the old delays).

* chore(labels): rewrite test-trigger label descriptions for AI reviewer auto-apply

The Friendly AI code reviewer (.github/workflows/ai-code-reviewer.yml)
auto-applies labels based on the labels' descriptions in the repo. The
existing test:puppeteer / test:e2e / ldr_research / ldr_research_static
descriptions were passive ("Triggers Puppeteer E2E tests on this PR"),
which doesn't guide the reviewer on *when* to apply them.

Rewrite them in the same imperative, bias-toward-action style used by
benchmark-needed ("Apply if a change risks degrading performance — when
in doubt, add it. Run compare_configurations()"):

- test:puppeteer + test:e2e — apply for any PR touching the web stack
- ldr_research / ldr_research_static — apply for substantive code/arch
  changes, with the static variant biased even more toward "run it"
  since it uses the cheaper model

Also add the test:* labels to labels.yml so they become version-controlled
(previously they existed only on GitHub, created out-of-band). label-sync
is additive and will overwrite the GitHub descriptions on next main push.
2026-05-16 13:17:28 +02:00
LearningCircuit
1ab65609db ci(release): drop credential persistence on cleanup-changelog checkout (#4050)
The `Checkout the release commit` step in the `cleanup-changelog` job
defaulted to `persist-credentials: true`, leaving the job's GITHUB_TOKEN
in `.git/config` for the duration of the run. If any later step in this
job reads `.git/config` (artifact upload, third-party action that
prints/dumps the repo state, etc.), the token leaks. Closes the only
open `zizmor/artipacked` finding (code-scanning alert #4655).

No functional impact: the only step that needs to push is
`peter-evans/create-pull-request`, which already takes an explicit
`token:` input and does not rely on the persisted git credential helper.

Also dismissed code-scanning alert #7763 (CVE-2026-3298) via the GitHub
API — that CVE is Windows-only per PSF advisory; this image is Linux,
which Grype's package-version matcher does not account for. Alert #7764
(CVE-2026-7210) is left open as a tracking signal until Python 3.14.6
ships upstream (current latest is 3.14.5; no patched image exists yet).
2026-05-15 01:20:17 +02:00
LearningCircuit
a2f7f6ead6 fix(ci): drop environment: ci from reusable workflow (#4049)
The `environment: ci` declaration on the research job has no functional
value for LDR — the `ci` Environment has zero protection rules and zero
environment-scoped secrets (verified via gh api). All required secrets
(OPENROUTER_API_KEY, SERPER_API_KEY) are repo-level.

The decorative env attachment becomes a problem for any external repo
that calls this reusable workflow: GitHub silently auto-creates an empty
`ci` Environment in the caller's repo, polluting their environments
namespace.

Dynamic environment via expression (e.g. `environment: ${{ inputs.env || '' }}`)
isn't a viable alternative — `actions/runner` Issue #2610 documents that
expression-in-environment doesn't reliably evaluate input context, and an
empty-string value still auto-creates an empty-named environment.

Simplest correct fix is to delete the line. LDR's own callers
(issue-research.yml, e2e-research-test.yml) keep working unchanged
because they never depended on env-attached functionality. External
callers no longer get the env-pollution side effect.

This unblocks a follow-up `ldr-automations` toolkit repo that will
expose meta-reusable workflows wrapping this one for other projects.
2026-05-15 01:11:15 +02:00
LearningCircuit
a6287a4362 fix(security): pin towncrier to exact version and bump Python to 3.14.5 (#4046)
* fix(security): resolve Scorecard pin alerts and bump Python to 3.14.5

- Pin `pip install towncrier` to a single version with `--hash` (both
  occurrences in release.yml), resolving Scorecard Pinned-Dependencies
  alerts #7761 and #7762.
- Bump the Dockerfile base image from python:3.14.4-slim to 3.14.5-slim
  (with new pinned manifest digest). 3.14.5 bundles libexpat 2.8.0
  (gh-149017), which is required to mitigate CVE-2026-7210 — Grype
  alert #7760.

* chore(release): drop hash-pins on towncrier, keep exact version pin

Per review feedback: hash-pinning a build-time CLI like towncrier adds
maintenance burden without meaningful supply-chain benefit. The rest of
this repo already uses exact-version pins (`pdm==2.26.2`, `pyyaml==6.0.3`,
etc.) which Scorecard's PinnedDependenciesID rule accepts — the original
alerts fired only because `~=24.8` is a fuzzy version range.
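
To illustrate the distinction drawn above between an exact pin and a fuzzy range, here is a minimal sketch. The `is_exact_pin` helper and its regex are hypothetical and only approximate the idea; they are not Scorecard's actual rule.

```python
import re

# Hypothetical helper: matches "name==X.Y.Z" with no wildcard or range operator.
_EXACT_PIN = re.compile(r"^[A-Za-z0-9._-]+==\d+(\.\d+)*$")


def is_exact_pin(requirement: str) -> bool:
    """Return True when the requirement names exactly one version."""
    return bool(_EXACT_PIN.match(requirement.strip()))


if __name__ == "__main__":
    for req in ("pdm==2.26.2", "pyyaml==6.0.3", "towncrier~=24.8", "towncrier"):
        print(f"{req}: {is_exact_pin(req)}")
```

Under this sketch `pdm==2.26.2` counts as pinned, while `towncrier~=24.8` and a bare `towncrier` do not.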
2026-05-14 17:24:19 +02:00
LearningCircuit
074285a26d fix(release): enrich AI release notes + render changelog in release flow (#4035)
* fix(release): enrich AI release notes + render changelog in release flow

Fixes the v1.6.10 release notes degradation where:
  1. docs/release_notes/1.6.10.md was never created (no automation rendered
     changelog.d/ fragments before/at release time)
  2. AI summary call returned 2xx but empty content with finish_reason=length

create-release job now:
  - Sparse-checks-out changelog.d/ + pyproject.toml, installs towncrier
    (no PDM needed — towncrier reads pyproject directly), renders
    docs/release_notes/<version>.md before composing the release body.
    Guards against an empty fragment directory.
  - Fetches every merged PR's title + body in a single GraphQL round-trip
    and feeds them to the model.
  - Fetches the full diff between the previous /releases/latest tag and
    the new tag via the compare API, filters lockfiles/generated docs/
    SBOM/static assets/binary patches, caps at 700k chars, strips NUL
    bytes before jq --rawfile.
  - Bumps AI_MAX_TOKENS default 4000 -> 64000 (matches the AI code
    reviewer's working budget). Adds AI_REASONING_MAX_TOKENS=16000 so
    Kimi K2 Thinking cannot burn the entire output budget on reasoning
    tokens — the root cause of v1.6.10's empty .content.
  - Adds .reasoning to the response-parsing fallback chain after
    .content and .reasoning_content. OpenRouter normalizes Moonshot's
    thinking trace to .reasoning (not .reasoning_content), which is why
    v1.6.10's diagnostic showed message keys "content, reasoning,
    reasoning_details" with no usable extraction path.
  - Enforces a 750k char overall prompt cap so PR descriptions + diff
    can't blow Kimi's 262k token context window.
  - Truncates the final release body to 124,400 chars to stay under
    GitHub's documented 125k release-body limit (HTTP 422 otherwise;
    gh CLI does not pre-validate).
  - Rewrites the SUMMARY_PROMPT to ask for a helpful narrative (not a
    TL;DR), with length sized to the material.
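
The response-parsing fallback chain described above can be sketched roughly as follows. This is an illustration in Python, not the workflow's actual jq expression; `extract_text` and `FALLBACK_KEYS` are hypothetical names, and the sample message is made up.

```python
from typing import Any, Optional

# Order taken from the commit text: .content, then .reasoning_content, then .reasoning.
FALLBACK_KEYS = ("content", "reasoning_content", "reasoning")


def extract_text(message: dict[str, Any]) -> Optional[str]:
    """Return the first non-empty string field in fallback order, else None."""
    for key in FALLBACK_KEYS:
        value = message.get(key)
        if isinstance(value, str) and value.strip():
            return value
    return None


if __name__ == "__main__":
    # Shape resembling the v1.6.10 diagnostic: empty content, text under "reasoning".
    message = {"content": "", "reasoning": "Draft notes", "reasoning_details": []}
    print(extract_text(message))
```

Treating an empty or whitespace-only string as missing is what lets the chain move past a 2xx response whose `content` is empty.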

New cleanup-changelog job opens a PR on main with the consumed fragments
+ rendered release-notes file, since the create-release runner is
throwaway. Branch protection on main allows the PR (0 required reviews,
0 required checks).

* chore(release): persist 1.6.10 changelog render + clear consumed fragments

The v1.6.10 release shipped without docs/release_notes/1.6.10.md because
no automation rendered changelog.d/ fragments at release time (see
release.yml change in this PR for the fix going forward). Persists the
render now so 1.6.11's release does not re-consume the same fragments.

Renders the v1.6.10 release_notes file from the 30 fragments that were
in changelog.d/ at v1.6.10 cut time, and removes those fragments from
changelog.d/. The rendered content also backs the v1.6.10 GitHub
release body update.

* fix(release): address AI review findings (UTF-8, race, GraphQL cap)

- UTF-8 character-aware truncation. Replace `head -c` (byte-oriented,
  splits multi-byte UTF-8 mid-sequence) with Python-based character
  truncation for the diff (700k), prompt (750k), and release body
  (124,400) caps. Matters because towncrier renders emoji section
  headers (💥/🔒//🐛) that appear in diffs of docs/release_notes/;
  mid-emoji splits produce invalid UTF-8 that jq --rawfile then
  refuses to encode and the GitHub Release API rejects with HTTP 422.

- cleanup-changelog race fix. Pin checkout to ${{ github.sha }}
  instead of `ref: main`. If a PR with new fragments merged into main
  between create-release and cleanup-changelog, `ref: main` would
  consume those new fragments into THIS release's docs/release_notes
  file and delete them prematurely — stealing them from the next
  release. github.sha is the commit the workflow ran against, so the
  set of fragments matches what create-release rendered.

- GraphQL query node-count cap. Limit PR-description batch to 100 PRs
  per query and log a warning if a release exceeds that (LDR's typical
  release is ~20-30 PRs, well under). Unbounded fan-out could trip
  GitHub's GraphQL complexity ceiling on a huge release.

- Compare API 300-file warning. Log when .files[] hits the 300-file
  boundary so a future release's missing-file diff can be diagnosed
  quickly without rerunning. The cap is a documented GitHub limit.
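
The UTF-8 truncation point in the first bullet above can be demonstrated with a small sketch. The function names are hypothetical, and this is not the workflow's actual code; it only contrasts a byte-oriented cap (like `head -c`) with a character-oriented one.

```python
def truncate_chars(text: str, max_chars: int) -> str:
    """Cap by Unicode code points, so no multi-byte sequence is split."""
    return text[:max_chars]


def truncate_bytes(text: str, max_bytes: int) -> bytes:
    """Byte-oriented cap, analogous to `head -c`; may cut mid-character."""
    return text.encode("utf-8")[:max_bytes]


if __name__ == "__main__":
    header = "## 🐛 Bug fixes"
    # Character cap: the result is still valid text.
    print(truncate_chars(header, 4))
    # Byte cap: the 4-byte emoji is split after its first byte.
    cut = truncate_bytes(header, 4)
    try:
        cut.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("invalid UTF-8:", exc.reason)
```

A code-point cap always yields valid UTF-8 when re-encoded, although it can still separate combining sequences; that is a cosmetic issue rather than an encoding error.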

* fix(release): address review2 — PR cap, trap leak, base pin, prompt clarity

- Raise PR-fetch cap 100 → 200. v1.6.10 had 144 unique PRs (LDR's
  dependency-bump traffic is heavy); the previous 100 cap would have
  silently dropped ~30% of PR descriptions from the AI prompt. The
  750k-char overall prompt cap still protects context window.

- Hoist COMPARE_JSON mktemp above the trap registration so the temp
  file is cleaned up even if jq throws under set -e between mktemp
  and the manual rm. ${DIFF_FILE}.clean (the NUL-strip staging path)
  also added to the trap; rm -f tolerates the missing-file case.

- Pin base: main on peter-evans/create-pull-request. On tag-triggered
  runs github.sha may not sit on main HEAD, and the action's
  default-branch resolution could pick a non-main base. We always
  want the cleanup PR to target main.

- Clarify SUMMARY_PROMPT section markers. The prior text said inputs
  are "separated by `----- SECTION -----` markers" using SECTION as a
  placeholder; a literal-minded model could look for that exact
  string and find none. Now lists the actual marker forms explicitly.

- Add PREV_TAG == RELEASE_TAG guard. On a workflow re-run after the
  release exists, /releases/latest returns the just-created tag,
  making the diff empty. Falls back to the second-most-recent stable
  release.
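
The re-run guard in the last bullet above amounts to skipping the release's own tag when choosing the diff base. A minimal sketch, assuming the caller already has the stable release tags ordered newest first; `pick_previous_tag` is a hypothetical name, not a function in the workflow.

```python
from typing import Optional, Sequence


def pick_previous_tag(
    stable_tags_newest_first: Sequence[str], release_tag: str
) -> Optional[str]:
    """Choose the diff base: the newest stable tag that is not the release itself."""
    for tag in stable_tags_newest_first:
        if tag != release_tag:
            return tag
    return None


if __name__ == "__main__":
    # First run: the newest existing release is the previous one.
    print(pick_previous_tag(["v1.6.10", "v1.6.9"], "v1.6.11"))
    # Re-run: the newest release is the one just created, so skip it.
    print(pick_previous_tag(["v1.6.11", "v1.6.10"], "v1.6.11"))
```

Both calls print `v1.6.10`, so a re-run diffs against the same base as the original run.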

* fix(release): jq --arg for re-run guard + surface jq errors + doc updates

Workflow fixes from a final pass:

- Re-run guard now passes RELEASE_TAG to jq via `--arg rel` instead of
  shell-interpolating it into the program text. RELEASE_TAG is already
  validated as bare semver upstream so this is defense-in-depth, but
  --arg keeps shell quoting and jq quoting fully separated regardless
  of what RELEASE_TAG ever ends up containing.

- Compare-API jq pipeline no longer swallows stderr or masks the exit
  code. Previously `jq ... 2>/dev/null || true` would silently produce
  an empty diff and a "Diff size: 0 bytes" log line on any jq failure,
  giving a maintainer no actionable signal. Now an explicit if-not
  check logs a WARNING with jq's stderr intact and ensures the diff
  file is empty.

Doc updates for the new release flow:

- changelog.d/README.md: drop the obsolete "maintainer runs `pdm run
  towncrier build`" instructions; describe the automated render +
  follow-up cleanup PR. Keep the local --draft / --keep preview tips
  for fragment iteration.

- docs/RELEASE_GUIDE.md: rewrite the maintainer flow (steps 1-3 of the
  old "Render + bump + commit both" sequence are obsolete — the
  workflow handles rendering now). Add the cleanup PR merge as a final
  checklist item. Update the body composition description from "AI
  TL;DR" to AI narrative with diff + PR-body inputs.

* style(release): fix comment indent typo from prior edit
2026-05-14 10:17:31 +02:00
LearningCircuit
96e6548553 fix(ci): grant research job the perms its reusable needs (#3987 follow-up) (#4016)
Every run of e2e-research-test.yml and issue-research.yml since the
refactor has terminated as startup_failure with zero jobs, because the
calling `research` job had no `permissions:` block. The reusable's
`research` job declares `permissions: contents: read`, but reusable
permissions can only be the same or lower than the caller's — and the
caller's empty `{}` inheritance meant the reusable's request exceeded
what was granted, so GitHub refused to load the workflow.

Add an explicit permissions block to the calling `research` job in
both workflows:
  - contents: read   (for actions/checkout in the reusable)
  - actions: write   (for actions/upload-artifact@v5+ which now
                      requires this scope to upload artifacts)

The user-visible symptom: applying the `ldr_research` label to an
issue did nothing. The workflow ran for ~1 second, failed at
startup, and produced no comment. Same for PR labels post-merge.

Tested locally with actionlint and zizmor — both clean. Real
verification needs a labeled PR/issue after merge.
2026-05-11 23:52:38 +02:00
LearningCircuit
fa88bb908f ci(prerelease-docker): publish floating :prerelease tag for each RC (#4005)
The workflow now re-points :prerelease at every new RC manifest in
addition to publishing the versioned prerelease-vX.Y.Z-<sha> tag.
Testers can pin compose to :prerelease and `docker compose pull` to
fetch the latest RC without manually bumping the tag each cycle.

Versioned tags remain available for reproducible testing.
2026-05-11 19:33:06 +02:00
dependabot[bot]
ee0ad19256 chore(deps): bump google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml (#4009)
Bumps [google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml](https://github.com/google/osv-scanner-action) from 2.3.5 to 2.3.8.
- [Release notes](https://github.com/google/osv-scanner-action/releases)
- [Commits](c518547040...9a49870895)

---
updated-dependencies:
- dependency-name: google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml
  dependency-version: 2.3.8
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-11 17:46:10 +02:00
LearningCircuit
9755a900eb ci(research): extract reusable LDR-research workflow + add issue-trigger caller (#3987)
* ci(research): extract reusable LDR-research workflow + add issue-trigger caller

Three triggers will end up calling the same install-and-run-LDR
plumbing (PR diff today, issue body now, Reddit posts later). Factor
out the middle of the workflow into a reusable workflow so we don't
have to maintain the same logic in three places, and add the
issue-trigger caller on top of it.

Changes:

- .github/workflows/ldr-research-reusable.yml (new) — workflow_call
  workflow that takes a fully-assembled query and returns a
  comment-ready markdown blob via artifact. Inputs include
  forward-compat knobs the future Reddit caller will need
  (max-query-length, max-sources, comment-footer override,
  include-sources-section, output-truncate-chars).

- .github/workflows/e2e-research-test.yml — refactored from a single
  job to three jobs (build-query → research-via-reusable →
  post-comment). Behaviour is preserved: same headers, same footer,
  same diff truncation at MAX_DIFF_SIZE, same label-removal on
  completion.

- .github/workflows/issue-research.yml (new) — triggers on
  `issues: types: [labeled]` gated by the same `ldr_research` label
  the PR workflow uses (GitHub event-type gating means they don't
  conflict). Output has two sections: "For the reporter" (cautious
  framing) and "For maintainers" (raw research context). Issue body
  is sanitized (control-char strip, 4000-char truncation) and never
  reaches a shell.

- scripts/ldr-research.py — renamed from ldr-diff-research.py
  (`git mv`, history preserved). Drops --mode, --static-query,
  --max-diff-size: query now comes from stdin only and the caller
  workflow does prompt assembly. Output JSON shape: {research,
  sources, findings, iterations}.

- .github/labels.yml — register ldr_research and ldr_research_static
  so they exist canonically rather than via on-the-fly creation.

Reddit research is a follow-up PR; this PR ships the abstraction
shape it will need.

* docs(ci): regenerate workflow status dashboard for new LDR workflows

The check-structure CI gate requires every workflow file to have a row
in docs/ci/workflow-status.md. Regenerate to add rows for the two new
workflows added in this PR. The live-status flips on unrelated rows
(gitleaks, ossf-scorecard, responsive-ui-tests-enhanced, osv-scanner)
are accurate snapshots of current status — the auto-regen workflow
keeps them fresh on its own schedule.

* ci(research): address review feedback — label cleanup, delimiter, artifact

Three small follow-ups from the AI review on this PR:

1. Label cleanup on build-query failure. The post-comment job had
   `if: always() && needs.research.result != 'skipped'`, which meant
   that if build-query failed, research was skipped and the entire
   post-comment job (including the label-removal step) was skipped
   too — leaving a stuck `ldr_research` label on the PR/issue.
   Switch to `if: always()`; the download and post steps already
   self-guard with `needs.research.outputs.success == 'true'`, so
   only the label-removal step runs in the failure path.

2. Randomized GHA output delimiter. `__LDR_QUERY_EOF__` was a fixed
   string; a query containing that exact line could prematurely
   terminate the multi-line output. Use $$/$RANDOM/nanosecond as the
   delimiter base. Defense-in-depth — collision was already
   astronomically unlikely.

3. Optional `artifact-suffix` input on the reusable workflow. Until
   now the artifact name was
   `ldr-research-{run_id}-{run_attempt}-{github.job}`, which
   collides if a caller invokes the reusable multiple times in one
   run. The Reddit follow-up will use a matrix call, so add a
   caller-provided suffix now and sanitize it to artifact-safe
   chars. Existing callers don't pass it; default empty preserves
   today's name.
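
For item 2 above, the idea of a randomized delimiter for a multi-line GitHub Actions output can be sketched like this. It is an assumption-laden illustration: the workflow builds its delimiter in shell from `$$`/`$RANDOM`/nanoseconds, whereas this sketch swaps in Python's `secrets.token_hex`; `format_multiline_output` and the `LDR_EOF_` prefix are hypothetical.

```python
import secrets


def format_multiline_output(name: str, value: str) -> str:
    """Render a value in the `name<<DELIM ... DELIM` form used for multi-line outputs."""
    delimiter = f"LDR_EOF_{secrets.token_hex(16)}"
    # Regenerate in the practically impossible case that the value contains the delimiter line.
    while delimiter in value.splitlines():
        delimiter = f"LDR_EOF_{secrets.token_hex(16)}"
    return f"{name}<<{delimiter}\n{value}\n{delimiter}\n"


if __name__ == "__main__":
    # A query that contains the old fixed delimiter no longer ends the output early.
    print(format_multiline_output("query", "line one\n__LDR_QUERY_EOF__\nline two"), end="")
```

The returned text would be appended to the file named by `GITHUB_OUTPUT`; the sketch only builds the string.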

* ci(research): fix per-line truncation in reusable workflow

Two follow-ups from the second review pass:

1. The awk-based backstop truncation in `Write query to file` was
   per-line (operating on $0 / length($0)), not total. A long
   multi-line query with many short lines would silently bypass the
   max-query-length cap. Swap for a wc -c + head -c approach that
   truncates total bytes. Verified locally that a 114-byte
   multi-line input with all-short-lines is now correctly truncated
   to ~100 bytes.

2. Remove the unused EXIT_CODE capture in `Run LDR Research`. The
   step relies on JSON validation for error detection; capturing
   $? without using it was just dead code inherited from the
   original workflow.
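
The per-line versus total truncation bug from item 1 above can be reproduced in a few lines. This is a sketch in Python rather than the workflow's awk and `head -c`; both function names are hypothetical.

```python
def truncate_per_line(text: str, max_len: int) -> str:
    """The buggy shape: each line is capped on its own, so the total is unbounded."""
    return "\n".join(line[:max_len] for line in text.split("\n"))


def truncate_total(text: str, max_bytes: int) -> bytes:
    """The fixed shape: cap the whole payload, like `wc -c` followed by `head -c`."""
    data = text.encode("utf-8")
    return data if len(data) <= max_bytes else data[:max_bytes]


if __name__ == "__main__":
    query = "\n".join(["short line"] * 12)  # many short lines, 131 bytes in total
    print(len(truncate_per_line(query, 100).encode("utf-8")))  # cap bypassed
    print(len(truncate_total(query, 100)))  # cap enforced
```

Because no single line exceeds the limit, the per-line version returns the input unchanged, which is how a long multi-line query slipped past the cap.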
2026-05-11 00:44:16 +02:00
LearningCircuit
c6dfc6dc8e ci(workflows): build Vite frontend bundle before UI tests (#3989)
The responsive-ui-tests-enhanced and puppeteer-e2e-tests workflows
both started the Flask app *without* running `npm run build` first.
`dist/` is gitignored, so the page rendered with the empty fallback
from `vite_helper._fallback_assets()` — no bundled `styles.css`. Tests
ran against a partially-unstyled UI, and CSS source changes between
PRs were invisible to the responsive baseline.

(playwright-webkit-tests.yml already does this — these two were the
outliers.)

Add two steps before the existing test setup in each workflow:

  - name: Install root frontend dependencies
    run: npm ci
  - name: Build Vite frontend bundle
    run: npm run build

The existing `tests/ui_tests/npm ci` and `tests/puppeteer/npm install`
steps still run separately to install the Puppeteer/Chromium test deps.

Costs roughly 30s of build time per workflow run. Unblocks CSS-only
PRs from being meaningfully validated by the responsive baseline.
2026-05-10 19:27:09 +02:00
LearningCircuit
e2150c3165 fix(ci): use release environment for prerelease-docker secrets (#3983)
Switch the four prerelease-docker.yml jobs from `environment: prerelease`
to `environment: release` so they pick up the same DOCKER_USERNAME /
DOCKER_PASSWORD already known to work for docker-publish.yml. Avoids
duplicating environment secret configuration on the new prerelease
environment introduced in #3969.

The dispatch-time approval gate in release.yml still uses
`environment: prerelease`, so the two checkboxes in the review modal
remain independent — this only affects which secret store the
downstream build jobs read from.
2026-05-10 17:27:29 +02:00
LearningCircuit
91b68acafd docs(ci): auto-generated workflow status dashboard (#3966)
* docs(ci): add auto-generated workflow status dashboard

Adds `docs/ci/workflow-status.md` — a single page that surfaces every
GitHub Actions workflow in the repo, grouped by role, with action items
(disabled / stale / manual-only) at the top. Live status badges link to
each workflow's runs page. Auto-generated from the workflow YAML files +
the GitHub API by `scripts/generate_workflow_status.py`.

Why: the GitHub Actions tab interleaves all workflows chronologically (a poor
"is anything red right now?" view), and the static workflow table in
`CI_CD_INFRASTRUCTURE.md` drifts when workflows are added/renamed (PR
#3963 fixed three factually wrong header claims for exactly this
reason). A reference page that mechanically reflects current state +
identifies dormant workflows answers both gaps.

What's surfaced today (verified live):
- **Disabled**: `nuclei.yml` (caller commented out in
  `release-gate.yml:177`).
- **Stale**: `update-precommit-hooks.yml` — its weekly Friday cron has
  been **failing for 10+ consecutive weeks** (since at least 2026-03-06).
  This was discovered by the dashboard, not previously tracked.
- **Manual-only**: `check-config-docs.yml`, `sync-main-to-dev.yml`
  (both intentionally manual; the dashboard shows them so they're not
  forgotten).

Generator design notes:
- Resolves reusable workflows correctly: `gh run list --workflow=X.yml`
  is empty for `workflow_call`-only workflows. The script walks the
  call graph (release.yml → release-gate.yml → semgrep.yml etc.),
  fetches the parent run's job list, and matches by **job key** parsed
  from the caller YAML (not by name heuristic — `gitleaks-scan` ↔
  `gitleaks-main.yml` would otherwise collide with `gitleaks.yml`).
- Picks "primary trigger" per workflow so e.g. `codeql.yml` (PR + push +
  cron + workflow_call) gets its glyph from the gated daily run, not a
  stale PR run.
- Stale check walks the *recent* runs list to find last success — a
  workflow that ran red yesterday and green a week ago is not stale.
- Manual edits outside the `<!-- BEGIN/END GENERATED -->` markers are
  preserved on regeneration; the timestamp lives inside the markers so
  post-marker content is fully user-owned.
- Preflights `gh auth status` and rate limit before any per-workflow
  call, failing fast with an actionable message instead of partial output.
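
The stale check from the design notes above can be sketched as follows. This is not the generator's code: the `Run` type, the function names, and the 30-day threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional


@dataclass
class Run:
    finished: date
    conclusion: str  # e.g. "success" or "failure"


def last_success(runs: Iterable[Run]) -> Optional[date]:
    """Most recent successful run date, regardless of list order."""
    dates = [r.finished for r in runs if r.conclusion == "success"]
    return max(dates) if dates else None


def is_stale(runs: Iterable[Run], today: date, max_age_days: int = 30) -> bool:
    """Stale when no success falls inside the window (the threshold is an assumption)."""
    success = last_success(runs)
    return success is None or (today - success) > timedelta(days=max_age_days)


if __name__ == "__main__":
    today = date(2026, 5, 16)
    # Red yesterday, green a week ago: not stale.
    recent = [Run(date(2026, 5, 15), "failure"), Run(date(2026, 5, 9), "success")]
    print(is_stale(recent, today))
```

Looking for the last success, rather than only the latest run, keeps a single red run from flagging an otherwise healthy workflow.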

CI integration:
- `.github/workflows/check-workflow-status.yml` runs
  `--check-structure` on PRs touching workflows, the dashboard, or the
  generator. Pure structural check (no API calls, no live data) — fast
  and deterministic. Live regeneration stays on demand.

Cost: ~340 GitHub API calls per regeneration, ~45 sec wall-clock,
~6.8% of the 5000/hr authenticated quota.

* fixup(ci): review-pass corrections to workflow status dashboard

Surfaced by three rounds of code-review + correctness + security agents
on the original PR. Four small fixes; no behavioral change to the
generated dashboard's content.

1. **Recognize commented job keys** — `JOB_KEY_RE` now accepts an
   optional `# ` prefix. Previously, when an entire job block was
   commented out (e.g. `release-gate.yml:175-181` for nuclei), the
   commented `uses:` line inherited the *previous* active job's key
   (`gitleaks-scan`) instead of the correct `nuclei-scan`. Latent —
   commented entries are filtered out before reaching gated-run lookup
   — but would misattribute status if someone partially uncommented a
   block (uncommented just the `uses:` line).

2. **Pin pyyaml to ==6.0.3** in the CI workflow. The repo convention is
   exact `==` pins (95% of `pip install` calls in workflows); the only
   floating range was the one introduced by this PR. Matches pdm.lock.

3. **Validate marker order** in `merge_with_existing`. If a manual edit
   leaves the BEGIN/END markers reversed (e.g. mid-merge-conflict), bail
   to a clean overwrite instead of splicing interleaved garbage.

4. **Remove `_coerce_jq_stream`** — unused helper left behind from an
   earlier iteration. Zero call sites; no behavior change.

Verified by re-running the generator + `--check-structure`. The
rendered dashboard's only diff vs prior commit is the regeneration
timestamp and live "Last activity" cells (expected — those reflect
recent runs since the previous regen).
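
The marker-order validation from fix 3 might look something like the sketch below. The exact marker strings and the function body are assumptions; only the name `merge_with_existing` and the "bail to a clean overwrite" behavior come from the message above.

```python
# Assumed marker strings; the real file may spell them differently.
BEGIN = "<!-- BEGIN GENERATED -->"
END = "<!-- END GENERATED -->"


def merge_with_existing(existing: str, generated: str) -> str:
    """Splice `generated` between the markers found in `existing`.

    Falls back to a clean overwrite when a marker is missing or when the
    END marker appears before the BEGIN marker.
    """
    begin = existing.find(BEGIN)
    end = existing.find(END)
    if begin == -1 or end == -1 or end < begin:
        # Markers unusable: emit only the generated region.
        return f"{BEGIN}\n{generated}\n{END}\n"
    head = existing[:begin]              # user-owned text before the region
    tail = existing[end + len(END):]     # user-owned text after the region
    return f"{head}{BEGIN}\n{generated}\n{END}{tail}"


doc = f"intro\n{BEGIN}\nold table\n{END}\nmy notes\n"
print(merge_with_existing(doc, "new table"))
```

Checking `end < begin` before slicing is what prevents the splice from interleaving user text with generated text when the markers are reversed.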

* feat(ci): bucketed activity labels + auto-regen on version bump

Two changes that together make the dashboard's diffs meaningful instead
of noisy.

1. **Coarse activity buckets.** Replace exact UTC timestamps in every
   "Last activity / Last manual run / Last successful run" cell with one
   of: `this week`, `last week`, `2 weeks ago`, `3 weeks ago`,
   `last month`, `2 months ago`, `3+ months ago`, `long ago`, `never`.
   Calendar-day boundaries (no time-of-day jitter) so two regenerations
   on the same date produce **zero diff** when nothing actually drifted.
   Verified: same-day re-runs after stable workflow state → empty diff.

   Also drop the redundant `Days idle` columns from Stale and
   Manual-only tables (the bucket label already says it), and round the
   "Last regenerated" footer to a date.

   Why: a daily-running healthy workflow used to bump its timestamp
   every regen (noise). Now it stays in `this week` indefinitely, and
   the only diffs that land in a version-bump PR are real bucket
   transitions — exactly the "this slipped from last week to last month
   — something might be wrong" signal the dashboard exists for.

2. **Auto-regenerate on version bump.** Add a step to `version_check.yml`
   right after the existing `generate_config_docs.py` regen. Same
   pattern as the config docs precedent — the dashboard refresh rides
   along with each version-bump PR and is reviewable in the same diff.

   Costs ~340 GitHub API calls per run (well under the GITHUB_TOKEN
   1000/hr workflow-runs limit). Adds `actions: read` to the job
   permissions block; uses `pyyaml==6.0.3` matching pdm.lock.
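
A minimal sketch of the bucketing in change 1, assuming day-count cut-offs that are NOT taken from the generator (only the label strings come from the list above). It works on `date` values rather than timestamps, which is what removes time-of-day jitter.

```python
from datetime import date
from typing import Optional

# Exclusive upper bounds in whole days. These cut-offs are illustrative
# assumptions; the real generator may draw the lines elsewhere.
_BUCKETS = [
    (7, "this week"),
    (14, "last week"),
    (21, "2 weeks ago"),
    (28, "3 weeks ago"),
    (60, "last month"),
    (90, "2 months ago"),
    (365, "3+ months ago"),
]


def activity_bucket(last_run: Optional[date], today: date) -> str:
    """Map the calendar-day age of the last run to a coarse label."""
    if last_run is None:
        return "never"
    age_days = (today - last_run).days
    for upper, label in _BUCKETS:
        if age_days < upper:
            return label
    return "long ago"


today = date(2026, 5, 10)
print(activity_bucket(date(2026, 5, 9), today))
print(activity_bucket(None, today))
```

Because the label depends only on the two dates, regenerating twice on the same day with unchanged run history yields the same string in every cell.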

* feat(ci): drop regen timestamp; add health banner; fix in-progress false-stale

Three follow-ups to keep version-bump diffs strictly meaningful, plus
two correctness fixes uncovered by repeated stability testing.

1. **Drop the "Last regenerated" date.** Git history is authoritative
   for "when this snapshot was taken"; embedding a date here forced a
   single-line diff every regeneration even when nothing else drifted.

2. **Aggregated health banner** at the top of the generated region:
   `**63 workflows:** 1 disabled · 1 stale · 2 manual-only · 59 active`
   Counts only change when a workflow shifts between
   {disabled, stale, manual, active} — same level of diff-stability as
   the per-row buckets.

3. **`?event=schedule` for own-cron workflow badges.** Verified
   effective by SHA-comparing badge bodies for workflows with
   multi-event run history. Makes the badge for e.g. `gitleaks.yml`,
   `fuzz.yml`, `osv-scanner.yml` reflect cron health specifically,
   rather than whichever PR ran last. The runs-page link uses the
   matching `?query=event%3Aschedule` so a click lands on the
   filtered run list.

4. **Fix false-stale during in-flight release runs.** Previously,
   when release.yml was running, gates reachable via release.yml
   (puppeteer-e2e-tests, ci-gate, etc.) would briefly flip to "stale"
   because `fetch_last_gated_run` returned the in-progress run first
   and `last_success` couldn't see past it. Now the function walks
   all 5 caller runs and returns both the latest match (for activity)
   and the latest successful match (for staleness), avoiding the flip.

5. **Map all GitHub conclusion enum values.** A `gitleaks.yml` run
   completed with `action_required` between two test regens; the
   glyph table didn't have it and rendered `?`. Added every
   documented value (`neutral`, `timed_out`, `stale`, `action_required`)
   and changed the unknown-fallback from `?` to em-dash, so future
   GitHub-side enum additions don't introduce a false-positive diff.

Verified: two same-day regens after workflow state has settled now
produce **zero diff**.
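
The conclusion mapping from fix 5 could be sketched as below. The dictionary keys are GitHub's documented run conclusions; the glyphs themselves and the names `CONCLUSION_GLYPHS` and `glyph_for` are placeholders invented for this example.

```python
from typing import Optional

# Keys: documented workflow-run conclusions. Values: placeholder glyphs.
CONCLUSION_GLYPHS = {
    "success": "+",
    "failure": "x",
    "cancelled": "/",
    "skipped": ">",
    "neutral": "o",
    "timed_out": "t",
    "stale": "~",
    "action_required": "!",
}
UNKNOWN_GLYPH = "\u2014"  # em dash, used for any value not in the table


def glyph_for(conclusion: Optional[str]) -> str:
    """Return the glyph for a conclusion, or the fallback for unknown/None."""
    return CONCLUSION_GLYPHS.get(conclusion or "", UNKNOWN_GLYPH)


print(glyph_for("action_required"))
print(glyph_for("some_future_value"))
```

A neutral fallback keeps an unrecognised conclusion from reading as an alarming `?` cell in the next regeneration diff.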

* ci(version-bump): make workflow-status regen non-blocking

Add `continue-on-error: true` to the dashboard regeneration step in
version_check.yml. The regen calls ~340 GitHub API endpoints and would
otherwise block the entire version-bump PR if any of them transiently
fail (rate-limit hit, GitHub Actions outage, etc.). The failure mode
should be "dashboard stays at the previous snapshot until next
successful regen", not "release pipeline is blocked".

The sibling `generate_config_docs.py` step doesn't need this — it's
purely local with no external API dependency.
2026-05-10 15:58:32 +02:00
LearningCircuit
632bb176fc fix(ci): scope prerelease-docker jobs to prerelease environment (#3978)
The prerelease-docker workflow's jobs declared no environment, so the
DOCKER_USERNAME / DOCKER_PASSWORD secrets stored on the new `prerelease`
environment (added in #3969) were invisible to them and the Docker Hub
login step failed with "Username and password required" (run
25627724313).

Add `environment: prerelease` to all four jobs, mirroring how
docker-publish.yml scopes every job to `environment: release`. This
makes the environment secrets visible and applies the same reviewer
gate that already protects the real publish workflow.
2026-05-10 14:17:08 +02:00
LearningCircuit
28b1732259 test(ui): replace flake-prone delays, fix local-DX bug, correct stale CI comment (#3972)
* test(ui): replace fixed delays in metrics_dashboard with proper waits

Five hardcoded `await delay(N)` calls in tests/ui_tests/test_metrics_dashboard.js
became `page.waitForResponse(...)` and `page.waitForSelector(...)` plus a
short `waitForFunction` for the SPA-route check. Each replacement waits for
the real condition (an API response or a DOM element) instead of a fixed
sleep, so the test stops racing and gets faster on machines that finish
the work quickly.

Verified 10/10 runs against a live local server: all pass at ~19.7s
wall time (previously ~25s, of which the fixed delays alone accounted
for 12s).

Concrete sites:
* line 79  → wait for `/api/start_research` response, then `waitForFunction`
            on the URL change
* line 164 → wait for `/api/metrics` response (10s ceiling)
* line 290 → wait for `period=7d` response (5s ceiling)
* lines 334, 352 → wait for the metrics dashboard selector after navigation

* test(ui): handle puppeteer's fullPage screenshot ceiling gracefully

Running test_responsive_ui_comprehensive.js locally without CI=true used
to fail on the Settings page with `Protocol error (Page.captureScreenshot):
Page is too large` — Puppeteer/Chromium's fullPage screenshot caps at
16384px, and the Settings page rendered at 375px wide blows past that
limit. The error bubbled up to testPage's catch block and marked the
whole page as failed. CI environments avoided the problem because the
diagnostic-screenshot calls are guarded by `!process.env.CI`, so local
devs couldn't reproduce CI's pass.

The screenshots are diagnostic, not the test target. Added a
`safeScreenshot(opts)` helper that catches the documented "Page is too
large" / `captureScreenshot` protocol errors and falls back to a
viewport-only capture so the run continues. Replaced all 9 fullPage
screenshot call sites in this file with the helper; the safeScreenshot
method itself still uses `page.screenshot` directly (the only place
that should).

Verified 5/5 runs locally pass (mobile viewport, no CI=true) at ~31s
wall time; CI=true behavior is unchanged.

* ci(workflows): correct stale concurrency-comment in responsive-ui-tests

The comment block above `permissions:` claimed the workflow "triggers on
both pull_request and workflow_call." That was true historically but
became wrong when #2248 removed the pull_request trigger to keep this
heavy matrix build (mobile + desktop, ~20 min each) off the PR gate.
The comment was added later in #3600 with the stale wording, so anyone
reading it has been misled about when this workflow actually runs.

Rewrite to describe current behavior accurately: runs via workflow_call
from release.yml's responsive-test-gate and via workflow_dispatch only.
The concurrency-history note (PR #3554 / #3599) is preserved.

No functional change — just the comment.

* test(ui): filter benign navigation-abort race in benchmark page test

`Benchmark Results Page › page loads without critical errors` was
flaky with `Error loading benchmark history: TypeError: Failed to
fetch`. The describe-scoped `beforeEach` already navigates to
`/benchmark/results`, then the test re-navigates with listeners
attached. The first navigation's in-flight history fetch gets aborted
by the second navigation and surfaces as `Failed to fetch` — a benign
race, not a real bug.

Add `Failed to fetch` to the existing filter list (next to favicon,
404, and Failed to load resource), with an inline comment explaining
why. Verified 5/5 clean runs locally; previously hit 1 flaky / 4 clean.
2026-05-10 13:46:34 +02:00
LearningCircuit
1315b679e0 ci(research): switch E2E research workflow to langgraph-agent strategy (#3965)
* ci(research): switch E2E research workflow to langgraph-agent strategy

The ldr_research label runs scripts/ldr-diff-research.py, which until
now didn't pass a search_strategy and so fell through to the
quick_summary default of source_based. Switch to the agentic
langgraph-agent strategy so the workflow exercises the autonomous
research path.

- Adds --strategy CLI arg and LDR_STRATEGY env var, default
  langgraph-agent (consistent with the existing --provider /
  --search-tool / --iterations pattern).
- Workflow exposes LDR_STRATEGY: vars.LDR_STRATEGY || 'langgraph-agent'
  so the choice is overridable per-repo via Variables.
- Notes in the script docstring that LDR_ITERATIONS=1 is a no-op for
  the langgraph strategy (which reads langgraph_agent.max_iterations
  from settings instead).

* ci(research): consolidate model var to LDR_RESEARCH_MODEL

The workflow had two model variables — vars.LDR_MODEL for diff mode and
vars.LDR_STATIC_MODEL for static mode — selected by a small set-model
step. Collapse to a single LDR_RESEARCH_MODEL variable shared by both
labels, mirroring the AI reviewer's vars.AI_MODEL pattern.

- Default: google/gemini-2.0-flash-001 (the value the script was
  already falling through to).
- Override via Settings → Variables → New repository variable
  → name: LDR_RESEARCH_MODEL.
- The set-model step is removed; the workflow now passes the env var
  through directly.
- Script reads LDR_RESEARCH_MODEL instead of LDR_MODEL.

Note: existing repo variables LDR_MODEL and LDR_STATIC_MODEL become
orphaned by this rename and can be deleted from repo settings.

* ci(research): stop overriding strategy iterations from the workflow

Previously the workflow set LDR_ITERATIONS=1 and the script forwarded
that as iterations= in kwargs. For source_based that capped research at
one iteration; for langgraph-agent it was effectively a no-op (langgraph
reads max_iterations, not iterations) but the wiring was misleading.

- Drop LDR_ITERATIONS from the workflow env block.
- Make --iterations default to None in the script and only forward it
  to quick_summary when explicitly set on the CLI.
- Each strategy now uses its own setting-driven default unless
  overridden — for langgraph-agent that means langgraph_agent.max_iterations
  (default 50) flows through unchanged.
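
The flag handling described in the last three messages can be sketched like this. It is a hedged illustration: `parse_args` and `build_kwargs` are hypothetical helper names, and the real script may structure this differently. Only `--strategy`, `--iterations`, `LDR_STRATEGY`, and the `search_strategy` / `iterations` keyword names come from the text above.

```python
import argparse
import os
from typing import Any, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of the research CLI flags")
    # The env var supplies the default; an explicit --strategy still wins.
    parser.add_argument(
        "--strategy",
        default=os.environ.get("LDR_STRATEGY", "langgraph-agent"),
    )
    # None means "not set on the CLI", so the strategy keeps its own default.
    parser.add_argument("--iterations", type=int, default=None)
    return parser.parse_args(argv)


def build_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    """Build keyword arguments, forwarding iterations only when given."""
    kwargs: Dict[str, Any] = {"search_strategy": args.strategy}
    if args.iterations is not None:
        kwargs["iterations"] = args.iterations
    return kwargs


print(build_kwargs(parse_args(["--strategy", "source_based", "--iterations", "3"])))
```

Leaving `iterations` out of the keyword arguments entirely, rather than passing `None`, lets each strategy fall back to its own settings-driven value.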

* ci(research): split research model into MAIN + CHEAP per label

Bring back per-label model selection with cleaner names:

- ldr_research        → vars.LDR_RESEARCH_MODEL  (deep PR analysis,
                                                   user-configurable)
- ldr_research_static → vars.LDR_RESEARCH_CHEAP_MODEL  (regression
                                                        smoke, kept cheap)

Both default to google/gemini-2.0-flash-001 if unset, so existing
behaviour stays identical until you actually configure cheap-model.
The script and its env-var contract are unchanged — the workflow
just picks which value to feed into LDR_RESEARCH_MODEL based on the
applied label.
2026-05-10 13:10:02 +02:00
LearningCircuit
8871d0fdab ci(release): split prerelease docker into its own environment (#3969)
Today the trigger-prerelease-docker job and the create-release /
trigger-workflows jobs all gate on `environment: release`. GitHub's
"Review deployments" modal collapses every pending job in the same
environment under one checkbox, so approving `release` approves the
prerelease test AND the actual publish at once. There is no UI affordance
to test the prerelease docker build first and decide on the release
afterward.

Move trigger-prerelease-docker to a new `prerelease` environment so the
review modal shows two independent checkboxes. Maintainers can now:
- Approve `prerelease` only, test the docker image, then approve `release`.
- Reject `prerelease` and approve `release` to skip the prerelease step.
- Approve `prerelease`, test, then cancel the run to abandon the release.

Requires a one-time GitHub Settings change: create a `prerelease`
environment with the same required reviewers as `release`. PAT_TOKEN is a
repo secret, so no environment-secret copy is needed.

create-release and trigger-workflows remain on `release` — unchanged.
2026-05-10 12:10:32 +02:00
LearningCircuit
b8602b8f10 fix(security): suppress alerts #7743 #7744 #7745 (audited false positives) (#3968)
- #7743 (zizmor/dangerous-triggers): welcome-first-time.yml
  Adds inline `# zizmor: ignore[dangerous-triggers]` with rationale.
  pull_request_target is required so fork PRs receive a writable token
  for the welcome comment. The workflow never checks out PR content,
  never executes fork-controlled scripts, and only reads
  `sender.login` (operator-trusted GitHub event metadata). The
  comment body is a static template with no PR-controlled
  interpolation. This is one of the safe, audited use cases of
  pull_request_target.

- #7744 / #7745 (Bearer/javascript_lang_dangerous_insert_html):
  context-overflow.js innerHTML at lines 453 and 501.
  Adds `// bearer:disable javascript_lang_dangerous_insert_html`
  alongside the existing `eslint-disable` comments. All
  user-controlled values are already routed through escapeHtml;
  numeric fields go through formatNumber; CSS classes and badges
  are hardcoded literals. This matches the convention used in
  collection_details.js, embedding_settings.js, and other JS
  components in the repo.
2026-05-10 11:53:35 +02:00
LearningCircuit
0f65961fa0 docs(ci): cross-link compose-integration-test ↔ compose-published-smoke (#3963)
Make the relationship between the two compose workflows explicit so future
contributors don't try to add the build-override variant to release-gate's
daily cron (PR #3962 attempted this and was reverted).

- compose-integration-test.yml: add a "do not move into the daily cron"
  paragraph pointing readers at compose-published-smoke.yml as the
  workflow that already covers ongoing drift between main's compose.yml
  and the published image.
- compose-published-smoke.yml: fix three stale claims in the header:
    1. "per-PR / release-gate test" — neither is true (no pull_request
       trigger, not in release-gate.yml; runs only at release time +
       manual dispatch).
    2. "PR compose changes already covered by the per-PR integration
       test" — there is no per-PR integration test; replace with the
       accurate reason (this test can only fail on drift already on main).
    3. cron-offset comment referenced a "daily compose-integration-test
       (03:00)" that does not exist; offset is only against release-gate.

Comment-only change. No workflow behavior changes.
2026-05-10 08:29:00 +02:00
LearningCircuit
e6d72faa14 test(embedding-settings): regression spec for model dropdown reset (#3863) (#3949)
* test(embedding-settings): regression spec for model dropdown reset (#3863)

Adds a Playwright spec that mocks the embedding-settings backend
(`/library/api/rag/models`, `/library/api/rag/settings`,
`/settings/api/local_search_embedding_model`,
`/settings/api/embeddings.ollama.url`) so the page renders without a real
Ollama or Sentence-Transformers backend. Three tests cover the bug from
#3863 and the surrounding contract:

1. selecting a non-top model auto-saves it and persists across reload —
   exercises the change-listener path that the per-field auto-save relies
   on.
2. Ollama URL change does not reset the selected model — the
   load-bearing regression test. Reverting the preserve+restore patch in
   `updateModelOptions()` was confirmed to make this test fail with the
   same symptom the issue reporter saw (dropdown snaps to the index-0
   model).
3. "Save Default Settings" button is gone — guards against an accidental
   re-introduction of the redundant button that originally triggered the
   bug.

Mock route registration order matters here: the catch-all
`**/settings/api/**` is registered first so the specific PUT mocks for
`local_search_embedding_model` etc. (registered later) win Playwright's
last-registered-wins precedence.

* test(embedding-settings): address review findings on regression spec

Round-2 review of PR #3949 surfaced that the spec was not actually wired
into any CI workflow and had a couple of correctness/flakiness issues.

- Add `embedding-settings-dropdown` to the curated Safari filename filter
  in `.github/workflows/playwright-webkit-tests.yml` (both Desktop Safari
  and Mobile Safari runs) so the daily release-gate Playwright run picks
  up this regression spec.
- Replace `waitForTimeout(500)` with a deterministic poll on a new
  `state.modelsFetches` counter that ticks on each `/library/api/rag/models`
  GET. Once the count rises past the pre-action baseline, the post-save
  `loadAvailableModels()` call has completed and any spurious save the
  bug would trigger has already fired.
- Gate the `local_search_embedding_model` and `local_search_embedding_provider`
  PUT mocks on request method, mirroring the `embeddings.ollama.url`
  handler. A stray GET hitting these handlers would otherwise push
  `undefined` into `state.modelSaves`/`providerSaves`.
- Skip the entire describe block on mobile projects (`test.skip(({ isMobile })
  => isMobile, ...)`). This is a desktop form-state regression test, not
  a layout test — it doesn't need to run across 12 device profiles.
- Reframe test #1's docstring: it's a happy-path smoke test for the
  per-field auto-save contract, not a regression test for #3863. The old
  comment claimed it would catch the bug; I confirmed empirically that
  it does not (the dropdown-rebuild path that #3863 exploited isn't on
  the model-pick-and-reload flow).
- Add a new test for the provider-change path (`updateModelOptions` is
  also reached from the provider-change handler at line 325 of
  `embedding_settings.js`). The model-shared-across-providers fixture
  forces the preserve+restore branch in `updateModelOptions` to fire and
  asserts the selection survives. Verified locally: this test fails
  alongside the Ollama-URL test when the patch is reverted.

* test(embedding-settings): extract mocks helper + cover text_separators reset

Round-2 review left two advisory items: extract the mock infrastructure
to a shared helper, and cover the text_separators reset behavior added in
the follow-up commit on PR #3940 (40678b2). Both are addressed here.

- Move BASE_MODELS_PAYLOAD, defaultSettings, and mockEmbeddingApis to
  `tests/ui_tests/playwright/tests/helpers/embedding-settings-mocks.js`,
  matching the CommonJS pattern used by `mobile-utils.js` so additional
  embedding-page specs can reuse the same backend mocks. Also export
  DEFAULT_TEXT_SEPARATORS so callers can assert against it without
  duplicating the literal.
- Add `state.textSeparatorsSaves` and a method-gated PUT route mock for
  `/settings/api/local_search_text_separators` to capture saves.
- New test: clearing the text_separators textarea and blurring persists
  the default array. Sanity-checked locally by reverting the
  empty-textarea branch in `embedding_settings.js:395-415` to its
  pre-#3940 `if (!rawValue) return;` state — the new test fails (no PUT
  fired); restoring the fix makes it pass again.

* ci: drop embedding-settings-dropdown from Mobile Safari filter

The spec uses `test.skip(({ isMobile }) => isMobile, ...)` at the
describe level, so on Mobile Safari Playwright would load the file and
skip every test — harmless but noisy. Keep it in the Desktop Safari
filter only, where it actually runs.
2026-05-10 08:11:19 +02:00
LearningCircuit
bc527f7aa9 ci(workflows): migrate to LDR_DISABLE_RATE_LIMITING canonical name (#3945)
* ci(workflows): migrate to LDR_DISABLE_RATE_LIMITING canonical name (#3936 follow-up)

Flips the 9 remaining `DISABLE_RATE_LIMITING=true` workflow uses to the
canonical `LDR_DISABLE_RATE_LIMITING=true` name introduced in #3936, so
CI no longer trips its own deprecation warning. Also closes a latent
test-isolation gap in `test_enabled_by_default` that did not pop the
canonical var, which would have started failing as soon as a developer
or workflow exported it.

* test(auth): flip remaining DISABLE_RATE_LIMITING uses to LDR_ prefix

Picked up from the closed #3944. The test_auth_routes fixture used the
legacy env var, and test_auth_rate_limiting carried a stale comment
referencing it. Both now use the canonical LDR_DISABLE_RATE_LIMITING
introduced in #3936, matching the workflow flips in this PR.

* test(env_registry): isolate TestIsRateLimitingEnabled from canonical env var

CI now exports LDR_DISABLE_RATE_LIMITING=true (per the workflow flips
in this PR). Two tests in TestIsRateLimitingEnabled use
patch.dict(os.environ, {"DISABLE_RATE_LIMITING": ...}) without
clearing the canonical key, so the canonical var bled in from the
outer process and short-circuited before the legacy code path:

  - test_enabled_when_flag_false: expected True (legacy=false), got
    False because canonical=true wins
  - test_legacy_form_emits_deprecation_warning_once: expected one
    warning, got zero because canonical short-circuit skips legacy

Add a class-level autouse clean_env fixture that strips both env-var
forms (mirroring the one in test_env_registry_extended.py). The
remaining tests in this class had been passing only by coincidence:
they expect False, and canonical=true also yields False.

Verified by exporting LDR_DISABLE_RATE_LIMITING=true and running the
two test files: 65 passed.
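
The leak and its fix can be reproduced with the standard library alone. In this sketch `is_rate_limiting_enabled` is a simplified stand-in for the real function, and `clean_env` is written as a context manager instead of a pytest fixture so the example runs without pytest. Both are assumptions for illustration.

```python
import os
from contextlib import contextmanager
from unittest.mock import patch

CANONICAL = "LDR_DISABLE_RATE_LIMITING"
LEGACY = "DISABLE_RATE_LIMITING"


def is_rate_limiting_enabled() -> bool:
    """Simplified stand-in: the canonical variable wins over the legacy one."""
    for name in (CANONICAL, LEGACY):
        value = os.environ.get(name)
        if value is not None:
            return value.strip().lower() != "true"
    return True


@contextmanager
def clean_env():
    """Strip both env-var forms; patch.dict restores them on exit."""
    with patch.dict(os.environ):
        os.environ.pop(CANONICAL, None)
        os.environ.pop(LEGACY, None)
        yield


with patch.dict(os.environ, {CANONICAL: "true"}):  # simulate CI's export
    # Without isolation the canonical value leaks into the test.
    with patch.dict(os.environ, {LEGACY: "false"}):
        leaked = is_rate_limiting_enabled()
    # With isolation only the legacy value set by the test is visible.
    with clean_env(), patch.dict(os.environ, {LEGACY: "false"}):
        isolated = is_rate_limiting_enabled()

print(leaked, isolated)
```

`patch.dict` only adds or overrides the keys it is given, so a variable exported by the outer process stays visible unless the test removes it first.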
2026-05-10 08:10:28 +02:00
LearningCircuit
5e3f37a7ce fix(ci): grant pull-requests:write to welcome-first-time workflow (#3950)
createComment on a PR via /issues/{n}/comments returns 403 with only
`issues: write`. GitHub now requires `pull-requests: write` when the
issue resource is actually a PR — the API response's
`x-accepted-github-permissions: issues=write; pull_requests=write`
indicates both are needed (issues for plain issues, pull_requests
for PRs). All five recent runs of this workflow have failed for this
reason; adding the permission unblocks the welcome comment.
2026-05-09 22:28:23 +02:00
LearningCircuit
5a0ca57ded feat(ci): welcome first-time contributors with a single comment (3/5) (#3859)
* feat(ci): welcome message on a contributor's first PR

Adds .github/workflows/welcome-first-time.yml using
actions/first-interaction@v3.1.0 (pinned by SHA). Posts a single comment
on a contributor's first PR pointing at CONTRIBUTING.md and our
review-process docs.

Uses pull_request_target so forked PRs receive a writable token; the
action only posts a fixed message (no checkout, no shell execution),
so the security surface is minimal. Permissions limited to
pull-requests: write.

PR 3 of 5 introducing PR triage automation. Independent of the other
PRs in the series.

* feat(ci): rewrite welcome workflow with per-author check + starter pack

Replaces actions/first-interaction (which has no author filter on
isFirstPullRequest, so it would never fire on a repo with prior PRs)
with a github-script that uses issues.listForRepo?creator=<user> to
detect a contributor's actual first PR.

Fixes the missing issues:write permission needed by issues.createComment
(PR comments route through the issues API).

Adds a bot filter (consistent with the PR triage workflow) so
dependabot/renovate PRs don't trigger a human-facing welcome.

Expands the welcome message into a starter pack: install guide, dev
guide, pre-commit hook setup (inline commands), architecture overview,
tests README, FAQ, troubleshooting, security policy, and Discord. All
links use absolute URLs to files that exist on main.
2026-05-09 18:47:54 +02:00
LearningCircuit
8cc0184cbe feat(ci): auto-apply triage labels on PR open and review (2/5) (#3858)
* feat(ci): auto-apply triage labels on PR open and review events

Adds .github/workflows/pr-triage.yml that:
- on PR opened: applies external-contributor / first-time-contributor /
  bot / needs-codeowner-review based on author_association
- on synchronize: flips awaiting-author → awaiting-codeowner when author
  pushes new commits
- on review submitted by a codeowner: clears or applies lifecycle labels
  based on the review state (approved / changes_requested)
- on review dismissed: re-applies needs-codeowner-review if a previous
  changes-requested review was withdrawn

Codeowner detection accepts the hardcoded global-owners list OR any
reviewer with OWNER/MEMBER/COLLABORATOR association (covers team-based
codeowners that aren't direct repo members).

Uses pull_request (not pull_request_target) so fork PRs run with a
read-only token; label calls 403 silently for forks. That is an
acceptable trade-off against the security cost of running
pull_request_target with secrets on fork code. Maintainers can apply
labels manually for fork PRs.

Updates CODEOWNERS with a comment noting the global-owners list is
mirrored in pr-triage.yml; both must stay in sync.

PR 2 of 5 introducing PR triage automation. Depends on labels being
synced first via PR 1 (#3857).

* fix(ci): tighten codeowner check, prune permissions, extend bot list

- Drop the OWNER/MEMBER/COLLABORATOR fallback in isCodeownerReview;
  rely on the hardcoded CODEOWNERS list. The fallback was designed for
  team-based codeowners but this repo has no such setup, and the
  fallback would become a security-relevant mislabel if branch
  protection adopts require_code_owner_reviews=true.
- Trim job permissions to issues:write only — pull-requests:write and
  contents:read were unnecessary (issues:write covers PR labels since
  PRs are issues internally). Matches label-fixed-in-dev.yml precedent.
- Add mseep-ai and Nexus-Digital-Automations to KNOWN_BOTS — both
  appear in repo PR history without the [bot] suffix.

* fix(ci): clear awaiting-author on approval, gate dismissed handler

Two label-state bugs surfaced by the AI code reviewer on the previous
revision (#3858 review pass):

- Approval branch now also removes awaiting-author. Without this, a
  codeowner who switches from changes_requested to approved purely via
  comments (no intervening author push, so no synchronize event to
  flip the label) leaves awaiting-author stuck on the PR.

- Dismissed branch now requires that the dismissed review was a
  codeowner's changes_requested review. Otherwise any non-codeowner
  review being dismissed while awaiting-author happens to be set would
  incorrectly flip the PR back to needs-codeowner-review while the
  real codeowner request is still active.

* chore(ci): pre-commit check that pr-triage.yml CODEOWNERS matches .github/CODEOWNERS

Eliminates the manual sync hazard flagged in the PR review: the
hardcoded JS array in pr-triage.yml must mirror the global owners
line in .github/CODEOWNERS, and the only thing keeping them in sync
was a pair of comments. This adds a small Python pre-commit hook
that parses both files and fails if their owner sets disagree
(case-insensitive, order-independent). Triggers on edits to either
file. Same shape as check-version-sync.py.
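
A rough sketch of such a sync check, assuming the workflow holds the owners in a JS array literal assigned to a constant named `CODEOWNERS` (the constant name and all function names here are guesses, not the hook's actual code):

```python
import re
from typing import Set


def owners_from_codeowners(text: str) -> Set[str]:
    """Collect owners from the global `*` line of a CODEOWNERS file."""
    for line in text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if parts and parts[0] == "*":
            return {p.lstrip("@").lower() for p in parts[1:]}
    return set()


def owners_from_workflow(text: str, const_name: str = "CODEOWNERS") -> Set[str]:
    """Collect quoted names from a JS array literal (constant name assumed)."""
    match = re.search(rf"{re.escape(const_name)}\s*=\s*\[(.*?)\]", text, re.DOTALL)
    if match is None:
        return set()
    names = re.findall(r"['\"]([^'\"]+)['\"]", match.group(1))
    return {name.lstrip("@").lower() for name in names}


def in_sync(codeowners_text: str, workflow_text: str) -> bool:
    """Case-insensitive, order-independent comparison of the two owner sets."""
    return owners_from_codeowners(codeowners_text) == owners_from_workflow(workflow_text)


codeowners = "# global owners\n* @Alice @bob\n"
workflow = "const CODEOWNERS = ['bob', 'alice'];"
print(in_sync(codeowners, workflow))
```

Comparing lowercase sets is what makes the check indifferent to ordering and capitalisation in either file.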

* fix(ci): swallow 403 on fork-PR label calls so the run stays green

The previous comment promised fork PRs would "403 silently" but
addLabels never caught the error — every fork contribution would have
shown a red check on the triage workflow. This adds a narrow 403 catch
(scoped to the documented fork-PR case via pull_request's read-only
token) shared by addLabels and removeLabel, with a console log so the
no-op is visible in the run output. Other status codes still throw.

Behavior matches the original design intent; comment is now accurate.
Flagged by AI Code Review on the previous revision.

* fix(ci): drop dead state check in dismissed-review handler

GitHub mutates review.state to "dismissed" on pull_request_review
action=dismissed events (github/docs#20216), so the previous guard
`review.state !== 'changes_requested'` always returned early. The
awaiting-author -> needs-codeowner-review flip never executed.

Use the awaiting-author label as the discriminator instead — it's
only set by a codeowner's changes_requested review, so its presence
is reliable proof the dismissal is the one we care about. Dismissals
of approval/comment reviews are no-ops because the label won't be
present.
2026-05-09 15:57:04 +02:00
dependabot[bot]
649ead1079 chore(deps): bump github/codeql-action from 4.35.3 to 4.35.4 (#3919)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.35.3 to 4.35.4.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](e46ed2cbd0...68bde559de)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 4.35.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-09 14:43:47 +02:00
dependabot[bot]
fca32f072b chore(deps): bump anthropics/claude-code-action from 1.0.107 to 1.0.119 (#3918)
Bumps [anthropics/claude-code-action](https://github.com/anthropics/claude-code-action) from 1.0.107 to 1.0.119.
- [Release notes](https://github.com/anthropics/claude-code-action/releases)
- [Commits](567fe954a4...476e359e62)

---
updated-dependencies:
- dependency-name: anthropics/claude-code-action
  dependency-version: 1.0.119
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-09 14:43:20 +02:00
dependabot[bot]
e76c323813 chore(deps): bump actions/dependency-review-action from 4.9.0 to 5.0.0 (#3915)
Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.9.0 to 5.0.0.
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](2031cfc080...a1d282b36b)

---
updated-dependencies:
- dependency-name: actions/dependency-review-action
  dependency-version: 5.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-09 13:22:14 +02:00
dependabot[bot]
a21c30bbe4 chore(deps): bump anchore/scan-action from 7.3.2 to 7.4.0 (#3917)
Bumps [anchore/scan-action](https://github.com/anchore/scan-action) from 7.3.2 to 7.4.0.
- [Release notes](https://github.com/anchore/scan-action/releases)
- [Changelog](https://github.com/anchore/scan-action/blob/main/RELEASE.md)
- [Commits](7037fa0118...e1165082ff)

---
updated-dependencies:
- dependency-name: anchore/scan-action
  dependency-version: 7.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-09 12:41:57 +02:00
dependabot[bot]
1aaff5cac9 chore(deps): bump sigstore/cosign-installer from 4.1.1 to 4.1.2 (#3916)
Bumps [sigstore/cosign-installer](https://github.com/sigstore/cosign-installer) from 4.1.1 to 4.1.2.
- [Release notes](https://github.com/sigstore/cosign-installer/releases)
- [Commits](cad07c2e89...6f9f177880)

---
updated-dependencies:
- dependency-name: sigstore/cosign-installer
  dependency-version: 4.1.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-09 12:41:35 +02:00
LearningCircuit
3066a9b2c5 chore(deps): cover audited test dirs in dependabot config (#3913)
Two test directories audited by .github/workflows/npm-audit.yml were
missing from .github/dependabot.yml:

- /tests/ui_tests/playwright
- /tests/accessibility_tests

So they only received Dependabot security alerts (via GitHub's GHSA
scanner) and never the routine weekly version-bump PRs. That gap is
why basic-ftp in tests/accessibility_tests had to be patched manually
in #3896 instead of arriving as a normal Dependabot update.

Add both as daily npm trackers, matching the cadence of the other
test-dir entries.
2026-05-09 11:57:04 +02:00
LearningCircuit
7065b6b1b4 ci: weekly published-image smoke test with auto-issue on failure (#3890)
* ci: weekly smoke of main's compose against the published Docker Hub image

Complements compose-integration-test.yml (#3886). That workflow builds the
LDR image from the working tree — it tests "this PR's code's compose with
this PR's code's image". This new workflow tests "main's compose with the
currently-published localdeepresearch/local-deep-research:latest" — the
exact artefact users get when they follow the README quickstart:

    curl -O .../docker-compose.yml && docker compose up -d

The drift between those two is real. Whenever a compose change lands on
main but the image hasn't been republished (which happens between every
release), users following the quickstart can hit a broken stack — the same
class of bug as #3874, but only visible against the published image.

Cadence: weekly Monday 05:00 UTC. The failure modes are slow-moving and
weekly burns ~1/7 the CI minutes a daily run would. The PR-time / release
gate test in #3886 covers the per-change cases.

On schedule failure, opens (or comments on) a tracking issue with run URL,
container digests, and a triage checklist. Stable title prefix dedups
across weeks; manual workflow_dispatch runs do NOT auto-create issues
(those are for ad-hoc testing).
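
The open-or-comment decision described above can be sketched as a small pure function. This is only an illustration of the rule, not the workflow's actual code; the function name, the return values, and the example title prefix are all hypothetical.

```python
def issue_action(event_name: str, open_issue_titles: list, title_prefix: str) -> str:
    """Decide what to do with the tracking issue after a failed run.

    Hypothetical helper: the real workflow makes this decision in its own steps.
    """
    if event_name != "schedule":
        # Manual workflow_dispatch runs are ad-hoc and never file issues.
        return "skip"
    if any(title.startswith(title_prefix) for title in open_issue_titles):
        # An issue with the stable prefix is already open: add a comment.
        return "comment"
    return "create"
```

Matching on a prefix rather than the full title is what lets one issue collect failures across several weeks.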

Reuses the same wait/probe/teardown logic as compose-integration-test.yml,
intentionally not factored into a composite action — two workflows, ~50
lines of shared shell, refactoring for DRY would cost more than it saves
right now and the loops will diverge as we tune them.

* ci: require LDR healthy in published-image smoke test

Same fix as #3886 commit b7ea510f. LDR has a Dockerfile-level HEALTHCHECK
(Dockerfile:306, probes /api/v1/health), so docker inspect returns the
health status not the state — checking for ldr_h='running' never matched
even though the stack was healthy. Require 'healthy' for all three
services to match the actual signal.

* ci(compose-published-smoke): mirror AI review fixes from #3886

Same pipefail + service-based container resolution + safer probe pattern
as #3886's commit dbf6b83d. Both workflows share the same wait/probe
logic and should stay in sync.

Per AI review on #3886:
- Replace hardcoded ollama_service / searxng container names with
  docker compose ps -q <service> resolution everywhere.
- set -euo pipefail throughout (no behavior change in the green path).
- HTTP probe uses case statement on captured -w code, not grep on a
  pipe — pipefail-safe and gives better retry diagnostics.

* ci(compose-published-smoke): address AI review — service names, --no-build, gh = form

Three findings from AI review:

1. Digest capture loop iterated `ollama_service` (compose's container_name)
   while everything else uses service names + cid_for. Worked today only
   because container_name happens to match. Refactor to use cid_for for all
   three services — same pattern as the wait/fail-fast loops, name-drift
   safe. Also folds the separate ldr_id block into the same loop.

2. Add `--no-build` to `docker compose up -d`. No-op today (no service has a
   `build:` directive), but defends against a future compose change that
   adds one — this workflow specifically tests the *published* image, and
   silently building from source would invalidate the test.

3. Switch gh CLI calls to `--body="$BODY"` (= form) so a body that ever
   starts with `-` can't be misparsed as a flag. Hygiene; current bodies
   are all controlled heredocs.

* ci(compose-published-smoke): drop curl -f to preserve HTTP error codes

AI review #4 finding (legit, even though the same review's findings #1-#3
were stale — those were already fixed in bc7ca8504).

curl -f makes the request fail silently on HTTP 4xx/5xx, which suppresses
the -w output. Combined with `|| echo "000"`, this means a 404, a 503, or a
real network error all collapsed into "000" — erasing the diagnostic
signal we most want when triaging a failure ("LDR is up but serving an
error page" vs "LDR is unreachable").

Drop -f. Now:
  - HTTP 200/30x → match → success
  - HTTP 4xx/5xx → captured as the real code, logged on retry, never
    matches → eventual timeout with the actual code in the logs
  - Network failure → "" → || echo "000" → logged as "000"

Body is still discarded via -o /dev/null; nothing else changes.
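
The three outcomes listed above can be sketched in Python as a pure function. This models the decision logic only; the workflow itself uses a shell case statement, and the name `classify_probe` and its return labels are assumptions made for illustration.

```python
def classify_probe(http_code: str) -> str:
    """Classify the status code captured from curl's -w "%{http_code}" output.

    Hypothetical helper mirroring the list above; not the workflow's real code.
    """
    if http_code in ("", "000"):
        # No usable code was captured: treat as a network-level failure.
        return "unreachable"
    if http_code == "200" or (len(http_code) == 3 and http_code.startswith("30")):
        # HTTP 200 or any 30x counts as the service being up.
        return "success"
    # Any other real HTTP status (4xx/5xx) is kept as-is for the retry log.
    return "http-error"
```

Keeping "http-error" distinct from "unreachable" is the diagnostic signal the change is meant to preserve.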

* ci(compose-published-smoke): make digest logging unconditional and explicit

AI review on #3890 flagged that "digests are only emitted on failure" —
not quite right (the dump step is `if: always()` and `cat`s digests on
every run), but the framing inside the digest block was misleading: the
heading said "## Image digests at time of failure", which is wrong on
green runs and in the workflow log.

Two small changes:

1. Drop the failure-specific phrasing in the heading. The auto-issue
   body still contextualizes "below" as failure-time when actually
   filed, so no information is lost there.

2. Wrap the cat in `::group::Image digests` for a properly labeled,
   collapsible log group on every run — matches the style of the
   "docker compose ps" and per-service log groups above.

Audit/bisection ergonomics: every successful weekly run now leaves a
clearly labeled record of which image SHAs the test exercised.

* ci(compose-published-smoke): || true on teardown to prevent false drift alerts

AI review on #3890 finding: if `docker compose down` fails (daemon hiccup,
hung container) after a successful smoke test, the job exits non-zero,
which makes the auto-issue step's `if: failure()` trigger — opening a
drift-tracking issue despite the stack having passed health and HTTP
checks. False positive.

Append `|| true` so teardown errors don't propagate. CI runners are
ephemeral so any leftover state vanishes with the runner regardless;
the only downside of swallowing here is hiding diagnostic info we don't
need (we already dumped logs and digests in the previous step).
2026-05-09 11:08:41 +02:00
LearningCircuit
b829c2d65a ci(compose-integration): hardening follow-ups (--no-build + drop curl -f) (#3898)
* ci(compose-integration): add --no-build to docker compose up

Defense-in-depth flag from AI review on #3890 (where the same fix landed
in commit bc7ca8504 alongside two other fixes). No-op today since no
service in docker-compose.yml has a `build:` directive, but defends
against a future compose change that adds one. This gate's whole purpose
is to test the locally-built LDR image (tagged in the previous step)
plus the pulled ollama/searxng images — silently building from source
on `up` would invalidate the cache strategy and the test.

* ci(compose-integration): drop curl -f to preserve HTTP error codes

Mirrors the fix on #3890 (commit e785f464b). With -f, curl exits non-zero
on HTTP 4xx/5xx AND suppresses -w output, so the `|| echo "000"` sentinel
collapsed 404 / 503 / real-network-failure all into the same "000" — the
most interesting diagnostic (LDR up but serving an error page) was lost.

Without -f, every HTTP response gives us its real code in retry logs; only
true network failures fall through to "000". Body still discarded via
-o /dev/null.
2026-05-09 10:59:18 +02:00
dependabot[bot]
bf43e7c328 chore(deps): bump actions/setup-node from 4.4.0 to 6.4.0 (#3814)
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4.4.0 to 6.4.0.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](https://github.com/actions/setup-node/compare/v4.4.0...48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-version: 6.4.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
2026-05-09 09:45:05 +02:00
dependabot[bot]
12c01cd44c chore(deps): bump actions/github-script from 8.0.0 to 9.0.0 (#3812)
Bumps [actions/github-script](https://github.com/actions/github-script) from 8.0.0 to 9.0.0.
- [Release notes](https://github.com/actions/github-script/releases)
- [Commits](https://github.com/actions/github-script/compare/v8...3a2844b7e9c422d3c10d287c895573f7108da1b3)

---
updated-dependencies:
- dependency-name: actions/github-script
  dependency-version: 9.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com>
2026-05-09 09:44:27 +02:00
LearningCircuit
4540adaac2 ci: full docker-compose integration test + drop ollama model pre-pull (#3886)
* ci: add full docker-compose integration test to release gate

Brings up the bundled docker-compose.yml end-to-end (searxng + ollama +
local-deep-research) and asserts the whole stack reaches healthy/serving.
This is the test that would have caught #3874 (cap_drop: ALL breaking
SearXNG) before users hit it — and the same class of bug whenever an
upstream image bumps its capability or healthcheck requirements.

Cost is bounded by scoping triggers carefully:

- pull_request: only when compose / Dockerfile / entrypoint scripts change
- schedule: daily at 03:00 UTC (offset from release-gate at 02:00)
- workflow_call: invoked from release-gate.yml so a release can't bypass it

We override MODEL=tinyllama:1.1b for the test (~640 MB) instead of the
default gemma3:12b (~7-8 GB). Users tune MODEL via env the same way; the
compose config under test is otherwise identical to what ships.

Wait loop fails fast on container exits rather than burning the full 12 min
budget, and dumps logs from all three services on any failure for triage.

* ci: skip ollama model pull in compose integration test

The integration test verifies "compose up + healthy + LDR serves" — it does
not run inference. After #3885 the ollama healthcheck is `ollama list`
(model-agnostic), so pulling a model only adds ~1-2 min and a flake source
(Ollama Hub registry transients) without exercising anything the test
checks.

Layer a small override (.github/compose-ci-override.yml) that replaces the
ollama service's entrypoint with `ollama serve`. The base docker-compose.yml
is otherwise unchanged — capabilities, networking, healthchecks, depends_on,
ports all come from the file users actually run.

Wait budget drops 12 min → 6 min accordingly.

End-to-end inference, if we ever add it, belongs in a separate workflow
that's transparent about the cost and runs less frequently.

* fix(docker): drop ollama model pre-pull from compose

The bundled compose's ollama service overrode the image entrypoint with
scripts/ollama_entrypoint.sh ${MODEL:-gemma3:12b}, which pre-pulled a
multi-GB model on every fresh start. That had three problems:

1. Users running LM Studio / OpenAI / llama.cpp don't use ollama at all,
   but every fresh boot still pulled gemma3:12b (~7-8 GB).
2. First-time setup wasted 5-10 min on a model selection the user may
   not even want — gemma3:12b is a strong opinion baked into the compose.
3. CI integration tests (#3886) had to layer an override file just to
   skip the pull, since the model isn't relevant to "stack-comes-up"
   smoke testing.

Drop the entrypoint override entirely. The ollama image's default
entrypoint is `ollama serve`; that's all we need. The healthcheck
introduced in #3885 already probes the daemon (model-agnostic) so this
slots in cleanly. Also drops the now-unused `ldr_scripts:/scripts`
mount on the ollama service.

Behavior change for ollama users: the model is no longer pre-pulled on
boot. They pull explicitly (`docker exec ollama_service ollama pull X`)
or LDR pulls on first use. The first-research wait is the same total
time, just deferred to when the user actually triggers it instead of
blocking compose-up.

In #3886, removes the .github/compose-ci-override.yml workaround now
that the compose itself doesn't pull a model. The integration test
runs against the compose users actually run, with no test-only overrides.

The scripts/ollama_entrypoint.sh file is left in place — it's no longer
referenced from compose but may be useful for users who want a pre-pull
in their own deployments. Cleaning that up can be a separate follow-up
once we're sure no one depends on it.

* ci: drop redundant pre-pull step in compose integration test

`docker compose up -d` already pulls any image it doesn't have locally
(default pull_policy: missing). The separate `docker compose pull ollama
searxng` step was just for log clarity; it does the same work twice.

The LDR image is locally built and tagged in the previous step, so
`up -d` sees it's present and uses it as-is — no risk of compose
yanking our local image.

* ci: require LDR healthy (not just running) in compose integration test

Previous condition checked `ldr_h = "running"` but LDR has a Dockerfile-level
HEALTHCHECK at Dockerfile:306 (probing /api/v1/health), so docker inspect
returns the health status, not the state — i.e. "healthy", never "running".
The wait loop never matched and timed out at 6 min despite the stack being
healthy the whole time. CI run for evidence: log line
"[23:33:04] ollama=healthy searxng=healthy ldr=healthy" repeats for ~5 min.

Fix: require "healthy" for all three. ollama and searxng have compose-level
healthchecks; LDR has a Dockerfile-level one. The status() helper already
returns Health.Status when a healthcheck exists, so requiring "healthy" is
the right signal for all three.
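
A minimal Python sketch of the behavior described above, assuming the input is the "State" object from `docker inspect` JSON output. The real status() helper is shell in the workflow; the function bodies and the `stack_ready` name here are illustrative assumptions.

```python
def status(state: dict) -> str:
    """Return Health.Status when a healthcheck exists, else State.Status.

    `state` is assumed to be the "State" object from `docker inspect` output.
    """
    health = state.get("Health")
    if health and health.get("Status"):
        return health["Status"]
    return state.get("Status", "unknown")


def stack_ready(states: dict) -> bool:
    """True only when every service reports "healthy" (hypothetical helper)."""
    return all(status(s) == "healthy" for s in states.values())
```

With a healthcheck present, a running container reports "healthy" (or "starting" / "unhealthy"), never "running", which is why the old comparison could not match.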

Also retires the "LDR has no healthcheck" follow-up note from the PR body —
that was based on my checking only the compose file, not the Dockerfile.

* ci(compose-integration-test): drop pull_request and schedule triggers

Per the original design (and the conversation thread on #3886), this test
should only run via release-gate.yml. release-gate fires daily on its own
cron + on every release + on manual dispatch, which is exactly the
coverage we want.

Removing the pull_request trigger means PRs that touch docker-compose.yml
no longer pay 3-6 min per run for a test whose feedback isn't actionable
at PR time anyway. Removing the standalone daily schedule avoids
duplicating release-gate's own daily run.

The successful run on commit b7ea510f confirmed the stack-up + healthy
+ HTTP-probe path works end-to-end before this trigger constraint.

* ci: move compose integration test from release-gate to release

The previous wiring put compose-integration-test.yml inside release-gate.yml,
which fires daily on its own cron at 02:00 UTC. That meant the integration
test ran daily — not the design intent. The failure modes this test catches
(compose / image changes that break the bundled stack) are tied to actual
release events, not time, so the daily cycle is wasted CI minutes.

Move it to release.yml as a peer gate alongside ci-gate, e2e-test-gate,
compat-test-gate. Same pattern: needs version-check, gated on
should_release == 'true', wired into the build job's needs/if as a
required gate. Now runs only on release events (push to main with new
version, tag push, manual dispatch).

Removed the equivalent block from release-gate.yml. Updated the workflow
file's header comment to reflect the new placement.

* ci(compose-integration): share ldr-prod cache scope with docker-tests

When the release pipeline runs, docker-tests.yml (called from ci-gate)
builds the LDR production image and writes layers to the GHA cache under
scope=ldr-prod. The compose integration gate runs in parallel and was
building from scratch on a separate scope=compose-integration cache —
fine in isolation, but it meant ~3-5 min of redundant build work when
ldr-prod's layers were already warm.

Read from both scopes (compose-integration first, then ldr-prod) and
keep writing only to our own scope so we don't disturb the cross-workflow
cache. Falls back to a fresh build if neither has layers (brand-new
branch, scope rotation, etc.) — no behavior change in that case.

Also set an explicit `target: ldr` to guard against future Dockerfile changes
where the default stage could become something other than the production
target. docker-tests.yml's ldr-prod build uses the same target, so the
cache layers line up.

No coupling between gates — if docker-tests.yml fails or its cache is
cold, this gate still works (just slower, like before).

* ci(compose-integration): address AI review — pipefail + service-based resolution

Three findings from the AI review on #3886:

1. Replace hardcoded container names with `docker compose ps -q <service>`.
   Previously inspected containers via the literal `ollama_service` /
   `searxng` strings (compose's `container_name:` values), while
   `docker compose logs` already used service names. If those drift, the
   wait loop would silently time out and log collection would miss the
   service. New `cid_for()` helper resolves container IDs by service name
   everywhere — single source of truth, name-drift safe.

2. Add `set -euo pipefail` to the wait step (no functional change since
   it has no pipes, but consistent with hygiene).

3. Refactor the HTTP probe so it doesn't pipe curl into grep. Capturing
   the code via `-w "%{http_code}"` + case statement removes the pipe
   entirely, avoiding the curl-failure-masked-by-grep problem the review
   flagged. Sentinel "000" on curl error gets logged on retry for better
   debug signal. pipefail is now safe to enable here.

Fourth finding (`ldr_scripts` volume orphaned): not actually orphaned —
the LDR service still mounts it (docker-compose.yml:124), the top-level
declaration at :240 stays. Acknowledged in PR thread.

No behavior change in the green path; failure-mode error messages are
slightly clearer ("Container for service <svc>" instead of bare name).
2026-05-09 03:08:46 +02:00
LearningCircuit
d8034e27a4 feat(ci): declarative label set for PR triage (1/5) (#3857)
* feat(ci): introduce declarative labels for PR triage

Add .github/labels.yml + labels-sync.yml workflow (EndBug/label-sync@v2)
managing 7 new labels for PR triage: 4 persistent (external-contributor,
first-time-contributor, bot, needs-rework) and 3 lifecycle
(needs-codeowner-review, awaiting-author, awaiting-codeowner) that will
be toggled per-PR by a follow-up workflow.

Sync is additive (delete-other-labels: false) so the existing 75
labels are not touched. Workflow runs only on push to main when
labels.yml changes, plus workflow_dispatch for manual sync.

First PR of a 5-PR series introducing PR triage automation.

* fix(ci): pin actions and harden labels-sync workflow

Adds the missing contents:read permission (without it, actions/checkout
fails because explicit permissions: zeroes out unspecified scopes).
Brings the workflow into line with repo conventions used by every other
label/issues-write workflow:

- SHA-pin actions/checkout (v6.0.2), step-security/harden-runner (v2.19.1),
  and EndBug/label-sync (v2.3.3); enforced by validate-image-pinning.yml.
- Add harden-runner first step with egress-policy: audit (matches 54/57
  workflows including label-fixed-in-dev.yml).
- Move permissions to job scope; top-level permissions: {} for OSSF Scorecard.
- Add timeout-minutes: 5 (matches label-fixed-in-dev.yml).
- Use sparse-checkout for labels.yml only with persist-credentials: false.
- Document the deliberate omission of concurrency: (regression #3554/#3599).
2026-05-08 21:50:01 +02:00
LearningCircuit
fd94bf8945 chore(release): remove dead alias and ghost label refs from release.yml (#3871)
Three label entries in the changelog generator config no longer correspond
to active labels:

- `docs` (line 29): alias for `documentation`, which is already in the
  same category. Being deleted as part of label cleanup.
- `CI/CD` (line 33): alias for `ci-cd`, same category. Same cleanup.
- `ai-review-requested` (line 79): ghost reference. Added in c0683b5ce
  (PR #1131) for a planned auto-trigger AI review feature that was
  never implemented. The real AI-review trigger uses `ai_code_review`,
  which remains in the same 🔄 Branch Syncs & Automation category.
2026-05-08 20:39:23 +02:00
Aqil Aziz
3bf78baf07 docs: fix API example links (#3852)
2026-05-08 01:30:25 +02:00
dependabot[bot]
56290b15c0 chore(deps): bump step-security/harden-runner from 2.19.0 to 2.19.1 (#3811)
Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.19.0 to 2.19.1.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](8d3c67de8e...a5ad31d6a1)

---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-version: 2.19.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-06 08:01:44 +02:00
dependabot[bot]
dee75bd2a5 chore(deps): bump github/codeql-action from 4.35.2 to 4.35.3 (#3815)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.35.2 to 4.35.3.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](95e58e9a2c...e46ed2cbd0)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 4.35.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-06 08:00:42 +02:00
LearningCircuit
c245a27090 feat(ci): add prerelease Docker image workflow for pre-release testing (#3761)
* feat(ci): add prerelease Docker image workflow for pre-release testing

Build a prerelease Docker image (prerelease-v{version}-{short_sha}) after
all quality gates pass, in parallel with the release approval step. The
image is pushed to Docker Hub for local testing before the official release
is published. Old prerelease tags are auto-deleted (best-effort) when the
production release completes.

- New prerelease-docker.yml: standalone workflow triggered by repository_dispatch
- release.yml: add short_sha output and trigger-prerelease-docker job
- docker-publish.yml: add best-effort cleanup of prerelease-* tags

* fix(ci): address review feedback on prerelease docker workflow

- Group >> "$GITHUB_STEP_SUMMARY" redirects (clears actionlint SC2129
  pre-commit failure).
- Bump pinned actions to match the rest of the release pipeline:
  step-security/harden-runner v2.17.0 -> v2.19.0 (4 sites),
  aquasecurity/trivy-action v0.35.0 -> v0.36.0 (2 sites).
- Scope prerelease tag cleanup in docker-publish.yml to
  prerelease-v${RELEASE_VERSION}-* so concurrent prereleases for other
  versions and any unrelated prerelease-* tags survive.
- Correct Trivy SARIF comment (artifact-only, not GitHub Security tab).
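
The scoped tag cleanup from the list above can be sketched as a prefix filter. This is an illustration only; the function name is hypothetical and the real cleanup runs inside docker-publish.yml.

```python
def tags_to_delete(all_tags: list, release_version: str) -> list:
    """Select only the prerelease tags that belong to the version being released.

    Hypothetical helper illustrating the prerelease-v${RELEASE_VERSION}-* scope.
    """
    prefix = f"prerelease-v{release_version}-"
    return [tag for tag in all_tags if tag.startswith(prefix)]
```

Tags for other versions and unrelated prerelease-* tags fall outside the prefix and are left alone.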

* fix(ci): bump harden-runner pin in trigger-prerelease-docker to v2.19.0

The new job inherited an older v2.17.0 pin while the rest of release.yml
(and docker-publish.yml) is uniformly on v2.19.0. Align them.
2026-05-02 11:52:25 +02:00
LearningCircuit
b632ca8ec4 feat(release): migrate to towncrier news fragments (#3773)
* feat(hooks): version-check staged release-notes against current release

When a release-notes file is staged, compare its version against the
current latest GitHub release (preferred) or __version__.py (fallback,
when gh CLI is unavailable). Warn if the staged version isn't ahead.

Semantics chosen so that __version__.py being ahead of releases (the
normal pre-release state) does NOT trigger a false alarm:

- vs latest release: file MUST be > release. Equality means the
  version was already published — almost always a stale/duplicated
  notes file. Warn.
- vs __version__.py (fallback): file MUST be >= version.py. Equality
  is correct — that's the upcoming release. Only file < version.py
  is suspicious. Warn.
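
The two comparison rules above can be sketched as follows. This assumes plain numeric X.Y.Z versions; the function names and the `source` labels are hypothetical and not taken from the hook.

```python
def parse_version(text: str) -> tuple:
    """Parse "X.Y.Z" (optionally prefixed with "v") into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def should_warn(staged: str, reference: str, source: str) -> bool:
    """Decide whether the staged release-notes version looks stale.

    source is "release" (latest GitHub release) or "version_file"
    (the __version__.py fallback). Both labels are assumptions of this sketch.
    """
    staged_v, ref_v = parse_version(staged), parse_version(reference)
    if source == "release":
        # Must be strictly ahead: equality means the version was already published.
        return staged_v <= ref_v
    if source == "version_file":
        # Equality is fine here: that is the upcoming release.
        return staged_v < ref_v
    raise ValueError(f"unknown source: {source!r}")
```

Comparing integer tuples rather than strings keeps 1.10.0 ahead of 1.9.0.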

The warning includes the source ("latest GitHub release" or
"__version__.py (gh unavailable)") and suggests the next patch /
minor / major versions.

Robustness:
- gh call has a 5s timeout and falls back gracefully on missing
  binary, network failure, or no releases yet.
- Files with non-versioned names (e.g., a hypothetical README.md
  inside docs/release_notes/) are skipped silently.
- Hook still always exits 0 — non-blocking nudge, never fails the
  commit.

* feat(release): migrate to towncrier news fragments

LDR's PR throughput (~12 PRs/day, releases every 1–2 days, ~25–50 PRs
per release) made the shared docs/release_notes/<version>.md model
unworkable — every contributor was racing to edit the same file, and
the file's name kept moving as the version did.

Replace it with the standard towncrier flow used by Twisted, urllib3,
and pip:

- Each PR drops one fragment under news/<id>.<category>.md, where
  <id> is the PR/issue number and category is one of: breaking,
  security, feature, bugfix, removal, misc. Orphan fragments
  (no PR/issue) use +<slug>.<category>.md.
- At release prep time the maintainer runs:
    pdm run towncrier build --version <X.Y.Z> --yes
  which renders fragments into docs/CHANGELOG.md and deletes them.
- The release workflow extracts the just-rendered section from
  docs/CHANGELOG.md (via awk) and uses it as the human-narrative
  input to the published release body, alongside the AI TL;DR and
  the auto-generated PR list.

Existing docs/release_notes/{0.2.0,0.4.0,1.6.0,1.6.8,1.7.0}.md stay
untouched as historical record.

The pre-commit hook is rewritten to nudge for news/ fragments
instead of the old shared file. The version-check and staging-marker
scanner from PR #3768/#3773 are dropped — fragments don't carry
versions in their names, and towncrier's structural model removes
the staging-marker class of bug entirely. Filename validation
(category in allowlist, name matches expected pattern) is added so
typo'd categories don't silently vanish from the rendered output.
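
A minimal sketch of that filename validation, assuming the naming rules listed earlier in this message. The regex, the slug character set, and the function name are assumptions; the hook's actual pattern is not shown here.

```python
import re

# Category allowlist as listed in this commit message.
CATEGORIES = {"breaking", "security", "feature", "bugfix", "removal", "misc"}

# <id>.<category>.md for PR/issue fragments, +<slug>.<category>.md for orphans.
# The slug character set below is an assumption made for this sketch.
FRAGMENT_RE = re.compile(r"^(?:\d+|\+[A-Za-z0-9_-]+)\.([a-z]+)\.md$")


def is_valid_fragment(filename: str) -> bool:
    """Return True when the name matches the pattern and the category is allowlisted."""
    match = FRAGMENT_RE.match(filename)
    return bool(match) and match.group(1) in CATEGORIES
```

A name such as 3773.feat.md matches the shape but fails the allowlist, which is the typo case the validation is meant to surface.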

Includes news/3773.feature.md as the first fragment using the new
convention.

* fix(release): allowlist news/ fragments in .gitignore

The repo's whitelist-style .gitignore (`*` then `!<allow>`) was
silently ignoring news/<id>.<category>.md fragments, so the towncrier
migration's first fragment didn't make it into the previous commit.
Add `!news/**/*.md` next to the existing docs/ / examples/ allowlist
entries and re-add news/3773.feature.md.

* refactor(release): use per-version files instead of CHANGELOG.md

Replace the towncrier-on-CHANGELOG.md flow with per-version output
files at docs/release_notes/<version>.md, matching the existing
historical convention and dropping the awk extraction step from the
release workflow.

Towncrier doesn't support per-version filenames in [tool.towncrier]
config, so the maintainer now runs scripts/release/render-notes.sh
<version> at release prep time. The wrapper:

  1. Calls `towncrier build --draft --version <X.Y.Z>` to render
     fragments to stdout (no file mutation).
  2. Captures the output into docs/release_notes/<X.Y.Z>.md.
  3. `git rm`s the consumed fragments (deletion staged for commit).
  4. Stages the new release-notes file.

Workflow changes:
  - Sparse-checkout reverts from docs/CHANGELOG.md to docs/release_notes
  - Body composition replaces awk section extraction with `cat
    docs/release_notes/${RELEASE_VERSION}.md` — simpler, matches the
    layout of historical pre-towncrier release notes (1.6.0.md etc.).

pyproject.toml changes:
  - filename now points to docs/release_notes/_pending.md as a guarded
    placeholder. Only sees writes if a maintainer bypasses the wrapper
    script — clearly named so the mistake is recoverable.
  - title_format=false suppresses the inline `## <version> (<date>)`
    header. The release page already shows the version as title, and
    per-version files don't need an inline version header either.

* fix(hooks): align staged-notice text with per-version-file flow

The previous commit refactored to per-version files, but the
pre-commit hook still pointed contributors at the old
docs/CHANGELOG.md target. Update the staged-notice text to reference
docs/release_notes/<version>.md and the wrapper script that produces it.

* fix(release): correct stale CHANGELOG.md comment and avoid orphan target file

Two follow-ups from review of the towncrier migration:

- pyproject.toml: the [tool.towncrier] block comment still described
  the old `pdm run towncrier build --version <X.Y.Z> --yes` →
  docs/CHANGELOG.md flow that was abandoned in 1120cb8 in favor of
  per-version files via scripts/release/render-notes.sh. A maintainer
  reading only pyproject.toml would have followed wrong instructions.

- render-notes.sh: `pdm run towncrier build --draft > "$TARGET"`
  truncates $TARGET *before* towncrier runs. If towncrier exits
  non-zero (set -e kills the script), the zero-byte file persists and
  the next attempt hits the line-35 overwrite guard. Render to a temp
  file first and `mv` on success — bash trap cleans up on failure.
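
The render-to-temp-then-move pattern can be sketched in Python as below. render-notes.sh itself is bash; this is only an analogue of the idea, and `write_atomically` and its `render` callback are hypothetical names.

```python
import os
import tempfile
from typing import Callable


def write_atomically(target: str, render: Callable[[], str]) -> None:
    """Write render()'s output to target only if render() succeeds."""
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(render())
        # Rename within the same directory, so target never appears half-written.
        os.replace(tmp_path, target)
    except BaseException:
        # Analogue of the bash trap: remove the temp file, leave target untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because the target path is only created by the final rename, a failed render leaves no zero-byte file behind to trip an overwrite guard on the next attempt.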

* docs(release): document news-fragment flow for contributors and maintainers

The towncrier migration shipped without updating the surfaces that
contributors and maintainers actually read:

- .pre-commit-config.yaml: hook description still pointed at
  docs/release_notes/, not news/.
- CONTRIBUTING.md: PR process never mentioned news fragments at all,
  so contributors only learned about them via the pre-commit nudge
  or by stumbling on news/README.md.
- docs/RELEASE_GUIDE.md: the maintainer release flow listed only
  "bump version → merge", with no step for running
  scripts/release/render-notes.sh. Following the old checklist
  literally would let news/ fragments accumulate forever and never
  appear in releases (the workflow tolerates a missing per-version
  file but logs a warning and the hand-written narrative is lost).

Add a contributor-facing item to the PR checklist, a maintainer-facing
"Render news fragments" step before the version bump, and a dedicated
"Release-notes flow" section that explains the contributor + maintainer
sides, the script's guarantees, and how to preview locally.

Also clarify in the "How Releases Work" overview that the release body
is composed from three sources (AI TL;DR + per-version notes + auto PR
list), not just an auto-generated changelog.

* docs(release): clarify the version-bump trigger for automated releases

The previous wording ("Releases are fully automated when PRs are
merged to the main branch" / "Trigger: Any merge to main branch") was
misleading — it implied every merge cuts a release. The actual
trigger is more specific: the `version-check` job reads
`__version__.py` and only proceeds when the resulting tag does not
yet exist as a GitHub release. So in practice "merge a PR that bumps
__version__.py" is what triggers an end-to-end release; non-bump PRs
merge normally and short-circuit the pipeline.

Also flesh out the "No duplicates" line to name the actual mechanism
(`should_release=false` skipping every downstream gate) so a
maintainer reading the doc can map it to the workflow code.
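
The gate described above reduces to a membership check. The sketch below is an assumption-laden illustration, not the workflow code: `should_release` is a hypothetical function name, and the `v` tag prefix is inferred from the tag format used elsewhere in this repository's release flow.

```python
def should_release(version: str, existing_release_tags: set[str]) -> bool:
    """Return True only when no release exists yet for this version."""
    # Assumption: release tags are the bare version with a leading "v".
    return f"v{version}" not in existing_release_tags
```

A merge that leaves `__version__.py` untouched yields a tag that already exists, so the function returns False and the downstream jobs would be skipped.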

* refactor(release): use towncrier native per-version output, rename news/ to changelog.d/

Three concerns rolled into one commit because they form a single
coherent simplification:

1) Drop the wrapper script (scripts/release/render-notes.sh).

   Towncrier 24.x supports per-version-file output natively via
   `single_file = false` + a `{version}`-templated `filename`. The
   wrapper's `--draft > target` trick was only needed when `filename`
   had to be a fixed path. With the templated filename, the maintainer
   runs `pdm run towncrier build --version <X.Y.Z> --yes` directly:
   towncrier writes docs/release_notes/<X.Y.Z>.md, `git rm`s the
   consumed fragments, and `git add`s the new file — same end state as
   the wrapper, fewer moving parts, no bash, no extglob hazard, no
   temp-file dance, no mktemp portability concerns.

   The wrapper's two guards (refuse-overwrite, refuse-empty) protected
   against unlikely operational mistakes that are easy to spot in
   `git status`; the simplification is worth the small loss.

2) Rename news/ → changelog.d/.

   The product has its own `news` feature (news.html, news-subscriptions,
   /news/api routes), so a top-level `news/` for release-engineering
   plumbing is genuinely confusing — code search and contributor
   onboarding mix the two concepts. `changelog.d/` is the de-facto
   Python community standard (attrs, hypothesis, Sentry, pyca/cryptography,
   structlog all use it). The .d/ suffix signals "directory of fragments
   that get assembled" — a long-standing Unix convention. Renaming now
   is cheap (one fragment); renaming later compounds.

3) Pre-commit hook tightening (from external code review):
   - Dedupe categories: hook now reads [[tool.towncrier.type]] from
     pyproject.toml at runtime via tomllib (3.11+ stdlib), so adding
     a category in pyproject.toml is automatically picked up. Falls
     back to the canonical six on parse failure (non-blocking hook).
   - Gate ANSI color escapes on sys.stdout.isatty() so CI logs and
     non-VT Windows terminals don't render `\033[36m` as visible
     garbage.

Workflow comments, .gitignore allowlist, CONTRIBUTING.md, RELEASE_GUIDE.md,
and changelog.d/README.md all updated in lock-step.
2026-05-02 11:20:06 +02:00
LearningCircuit
982d36fb96 fix(release): bump AI summary timeout + diagnose empty content (#3783)
* fix(release): bump AI summary timeout + diagnose empty content

The v1.6.8 release run hit two related failures in the AI TL;DR step:

  curl: (28) Operation timed out after 120001 milliseconds
  WARNING: AI response 2xx but no .choices[0].message.content
           — skipping summary

The 120s --max-time was too tight for kimi-k2-thinking (and likely
other thinking models) on a multi-PR release prompt. The retry
succeeded HTTP-wise but returned a response without any extractable
content, so the release shipped without a TL;DR.

Three changes:

1. Raise the default --max-time from 120s to 300s, configurable via
   vars.AI_RELEASE_SUMMARY_MAX_TIME. Releases never fail because of
   the AI step (the whole block is best-effort), but giving thinking
   models five minutes is realistic.

2. Fall back to `.choices[0].message.reasoning_content` when
   `.content` is empty. Some providers route thinking-model output
   into the reasoning field. Cheap try-and-fall-back; no harm if the
   field is missing.

3. When BOTH fields are empty, dump the response shape (top-level
   keys, message keys, finish_reason, error field) to step logs.
   Bounded to ~4 lines, but enough to debug the next failure
   without rerunning.

Behavior unchanged when the call succeeds normally.

* fix(release): bump AI summary timeout default to 15 min

900s (15 min) covers thinking models on multi-PR release prompts
with comfortable headroom. Still configurable via
vars.AI_RELEASE_SUMMARY_MAX_TIME.
2026-05-02 01:38:24 +02:00
LearningCircuit
0f0707abea feat(release): prepend docs/release_notes/<version>.md to release body (#3768)
* feat(release): prepend docs/release_notes/<version>.md to GitHub release body

When a release is cut, look for docs/release_notes/${RELEASE_VERSION}.md
and prepend its prose to the GitHub release body. The auto-generated,
label-categorized PR list (from .github/release.yml + GitHub's
generate-notes API) is appended below a horizontal-rule + "## What's
Changed" heading. If the md file is missing, the workflow falls back
silently to auto-notes only — no failure.

The pre-commit hook recommend-release-notes.py already nudges
contributors to stage entries under docs/release_notes/ for substantial
changes, so this wires the end-to-end flow: contributor writes prose →
release publishes prose-first body.

* fix(release): address review findings on notes-prepend logic

Three fixes from the review pass:

1. Drop the manual "## What's Changed" heading. GitHub's
   generate-notes API already emits it as the first line of the
   auto-body (verified against v1.6.0). Manually inserting another
   produced a duplicate heading in the published release.

2. Validate RELEASE_VERSION against a strict semver regex before
   using it in a filesystem path. Defense-in-depth — RELEASE_VERSION
   already comes from a Git refname or __version__.py, but neither
   path validates strictly enough to rule out path traversal.

3. Wrap the gh api generate-notes call so its failure aborts the
   step. `set -e` does NOT exit on a failing command substitution
   inside an assignment — without this the workflow would silently
   publish an empty/partial release body on a transient API error.
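
Fix 2 amounts to validating the version before it reaches a path join. The Python sketch below shows the idea under stated assumptions: the pattern is an illustrative strict `MAJOR.MINOR.PATCH` regex, not necessarily the one in release.yml, and `notes_path` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed pattern: bare MAJOR.MINOR.PATCH, no "v" prefix, no leading zeros.
SEMVER = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def notes_path(release_version: str, root: Path = Path("docs/release_notes")) -> Path:
    """Build the per-version notes path, rejecting anything that is not bare semver."""
    # fullmatch rejects trailing newlines and path separators that search() would allow.
    if not SEMVER.fullmatch(release_version):
        raise ValueError(f"not a bare semver string: {release_version!r}")
    return root / f"{release_version}.md"
```

With this check in place, an input such as `../../etc/passwd` raises before any filesystem access happens.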

* feat(hooks): show release-notes staged notice with format tips

Two changes to recommend-release-notes.py:

1. Always inform the committer when a file under docs/release_notes/
   is staged (was: silent). The notice tells contributors that the file
   ends up in the GitHub release body via .github/workflows/release.yml
   rather than just being archived as docs.

2. Embed format tips in both the staged notice and the missing-notes
   reminder: no leading `# H1` (release title renders separately), use
   `## sections`, mark BREAKING explicitly with an `### Impact`
   subsection, link PRs, and strip staging markers before tagging.

* feat(release): prepend AI-generated TL;DR to release body

Adds a best-effort AI summary at the very top of the release body,
above the hand-written notes and the auto-generated PR list. Uses
OpenRouter with vars.AI_MODEL (same convention as ai-code-reviewer.yml,
default moonshotai/kimi-k2-thinking).

Behavior:
- Builds a prompt from the hand-written notes (if any) plus the
  auto-generated PR list, asking the model for a 30-second TL;DR
  starting with `## TL;DR`.
- Up to 2 attempts (1 retry) with a 5s backoff to absorb transient
  API hiccups.
- If OPENROUTER_API_KEY is unset, the call fails, or the response
  is unparseable, the step skips the AI section silently. Releases
  must never fail because of an LLM hiccup.
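
The retry behavior above (two attempts, 5s backoff, silent skip on failure) could be sketched like this. `call_with_retry` and its `fetch` callback are hypothetical; the workflow itself issues the request with curl.

```python
import time


def call_with_retry(fetch, attempts: int = 2, backoff_seconds: float = 5.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times; return its result, or None if all fail."""
    for attempt in range(1, attempts + 1):
        try:
            result = fetch()
            if result:
                return result
        except Exception:
            # Best-effort step: swallow the error and retry or give up.
            pass
        if attempt < attempts:
            sleep(backoff_seconds)
    # None tells the caller to skip the AI section instead of failing the release.
    return None
```

Injecting `sleep` makes the backoff observable without waiting in real time.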

Body order on the published release:
  1. AI TL;DR (if generated)
  2. Hand-written docs/release_notes/<version>.md (if present)
  3. Horizontal rule (only if 1 or 2 was emitted)
  4. Auto-generated `## What's Changed` PR list
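
The body order above can be expressed as a short composition function. This is an illustrative sketch with a hypothetical name (`compose_release_body`); it assumes the auto-generated text already carries its own `## What's Changed` heading.

```python
def compose_release_body(auto_notes: str, ai_tldr: str = "", hand_notes: str = "") -> str:
    """Join the optional sections and the auto-generated PR list in release order."""
    # Keep only the optional sections that actually have content.
    parts = [p.strip() for p in (ai_tldr, hand_notes) if p and p.strip()]
    if parts:
        # The horizontal rule appears only when something precedes the PR list.
        parts.append("---")
    parts.append(auto_notes.strip())
    return "\n\n".join(parts) + "\n"
```

When neither optional section is present, the result is the auto-generated notes alone, with no stray rule at the top.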

Configuration knobs (all optional, all have sensible defaults):
- secrets.OPENROUTER_API_KEY — enables the AI step
- vars.AI_MODEL — model id (default kimi-k2-thinking)
- vars.AI_RELEASE_SUMMARY_TEMPERATURE (default 0.3)
- vars.AI_RELEASE_SUMMARY_MAX_TOKENS (default 4000)

* feat(hooks): warn when staged release-notes contain staging markers

When a file under docs/release_notes/ is staged, scan its content
(via git show :path) for in-progress staging language and print a
yellow warning listing each hit by line number. Non-blocking — the
hook still always exits 0.

Markers checked:
- "(pending)"
- "Staging notes"
- "Fold into the next tagged version"

These would otherwise publish verbatim into the GitHub release body,
since release.yml prepends the file as-is. The warning lets a
contributor catch leftover staging text before they push the
release commit.
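
A scan of this kind can be sketched as below. It is not the hook's code: `find_staging_markers` is a hypothetical name, matching is assumed to be case-sensitive substring search, and the text is assumed to have been read from the index (for example via `git show :path`) before being passed in.

```python
STAGING_MARKERS = (
    "(pending)",
    "Staging notes",
    "Fold into the next tagged version",
)


def find_staging_markers(text: str) -> list[tuple[int, str]]:
    """Return (line_number, marker) pairs for every marker found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for marker in STAGING_MARKERS:
            if marker in line:
                hits.append((lineno, marker))
    return hits
```

A caller would print one warning per hit and still exit 0, keeping the hook non-blocking.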

* fix(release): clarify semver-regex comment + remove prompt heading contradiction

Two follow-ups from the latest AI review:

1. The semver-regex comment was misleading. Updated to spell out that
   RELEASE_VERSION is the bare semver (no `v` prefix) because the build
   job's "Determine version" step strips `refs/tags/v` and reads
   __version__.py (bare "1.6.7"), so a leading `v` reaching this check
   means an upstream contract change and should hard-fail. Behavior of
   the regex itself is unchanged — it correctly rejects `v1.7.0`.

2. The SUMMARY_PROMPT had a self-contradiction: "Open with the literal
   heading `## TL;DR`" (H2) vs "no headings above level 3" (which would
   forbid H2). Reworded to be explicit: use `## TL;DR`, `###` for
   subsections, no `#` and nothing deeper than `###`.
2026-05-01 21:55:05 +02:00
LearningCircuit
89aa0228b8 chore(labels): update GitHub labels for release automation clarity (#3764)
- Remove emoji variant `maintenance 🔧` from workflows and release.yml
  in favor of plain `maintenance` label
- Replace deleted `automated`/`version-bump` labels in version_check.yml
  with `automation`/`maintenance`
- Add 7 previously-uncategorized labels to release.yml categories:
  css, ui-ux, accessibility, snappy → Frontend Changes
  dev-bugfix → Bug Fixes
  dev-enhancement → New Features
  developer-experience → Code Quality & Refactoring
- Update workflows README documentation

Accompanied by GitHub label operations (via gh CLI, not in this commit):
- Created 4 missing labels: bugfix, docs, CI/CD, ai-review-requested
- Deleted 17 junk/duplicate labels
- Updated descriptions for all 72 labels with release note priority info
2026-05-01 19:19:11 +02:00
LearningCircuit
1abd8b9138 fix(ci): use pdm lock instead of pdm update in dependency workflow (#3755)
The workflow used `pdm update -u --no-sync --no-self` which caused
`pdm lock --check` to fail in pre-commit CI. The `-u` (--unconstrained)
flag modifies pyproject.toml with relaxed version constraints, but the
workflow only commits pdm.lock — discarding those pyproject.toml changes.
When `pdm lock --check` re-resolves against the original pyproject.toml,
it produces a different lockfile, causing the check to fail.

Using `pdm lock` fixes this because it uses the same resolution code path
as `pdm lock --check`, respects pyproject.toml constraints, and never
modifies pyproject.toml.
2026-05-01 12:40:04 +02:00
LearningCircuit
ff1cda3e7c fix(ci): give gh CLI repo context in monitor-publish (#3742)
monitor-publish has no checkout step, so the gh CLI cannot infer the
repository — and it does not fall back to GITHUB_REPOSITORY. Every
`gh run list` call therefore failed with "failed to determine base repo",
the error was swallowed by `2>/dev/null`, and the polling loop saw an
empty result for all 40 minutes. Result: every release produced a false
"Partial publish failure" issue showing both Docker and PyPI as
`timed_out`, even though both publishes succeeded.

Set GH_REPO from github.repository, and stop hiding gh's stderr so
future failures are visible in the runner log instead of silent.
2026-04-30 01:18:59 +02:00
LearningCircuit
5a3705c7d2 ci: temporarily disable nuclei DAST scan from release gate (#3720)
The Nuclei scan is causing significant slowdowns. Comment it out
of the release gate so the current release can proceed. The scan
workflow file (.github/workflows/nuclei.yml) is left intact for
easy re-enablement.
2026-04-28 20:38:22 +02:00
LearningCircuit
37335c907d ci(playwright-webkit): drop checks: write to satisfy Scorecard (#3704)
Removes the `checks: write` job-level permission from both the
desktop-safari and mobile-safari jobs in playwright-webkit-tests.yml.
The permission was only needed by the EnricoMi/publish-unit-test-result-action
"Publish Test Results" step in each job, which is also removed.

Test results remain available via the "Upload Playwright Report"
artifact step (already uploads test-results/results.xml). Failing
tests still fail the job in the "Run ... Safari Tests" step, so the
release-pipeline gate is unchanged.

Closes Scorecard alerts #7715, #7716.
2026-04-28 01:21:58 +02:00