mirror of
https://github.com/LearningCircuit/local-deep-research.git
synced 2026-06-15 19:46:56 +03:00
ba0912056c5b78bf2c8a1fb15eef202d72fec401
1252 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
da0d18ed25 |
fix(release): set towncrier name to skip package import (#4071)
The release job uses a sparse checkout that omits src/ and runs a standalone `pip install towncrier`. Towncrier 24.8 still calls `get_project_name()` even when --version is passed on the CLI, and the existing [tool.towncrier] config pointed at the `local_deep_research` package, so the build crashed with ModuleNotFoundError before rendering any fragments. Set `name = "local-deep-research"` so towncrier short-circuits the import path (build.py:195-197). Drop the now-misleading `package`/`package_dir` fields — `--version` is always passed, `directory = "changelog.d"` is explicit, and nothing else inside towncrier still needs them. Fix the workflow comment that misattributed the bypass to --version. Verified by rendering changelog.d/*.md fragments against this pyproject.toml in a fresh directory with no src/ present. |
||
|
|
5d60f3d00e |
chore(labels): add 'code-ready' as a human-only signal label (#4068)
Introduces a new repository label, ``code-ready``, that communicates a human reviewer's judgement that a PR's code changes look technically ready (i.e. the implementation, tests, docs and review nits are all addressed) while CI and an approving codeowner review may still be outstanding.
The label is meant to bridge the gap between "needs review" and "auto-merge": a maintainer can apply it after walking the diff to signal that the code side is good, even though merge is still blocked on CI runs finishing or an approver clicking the button.
Critically, this label must be **applied manually only**, never by automation. The motivation is judgement, not heuristics: a workflow that flips it based on "all CI green" or "no unresolved comments" would dilute the signal and undermine the human-in-the-loop intent. The labels.yml entry is grouped under a new "Human-only signal labels" section with an explicit comment saying so, and the label description itself includes "Apply manually — never auto-applied" so the rule is visible everywhere the label surfaces.
Verified before adding:
* No existing workflow (``pr-triage.yml``, ``label-fixed-in-dev.yml``, ``advanced-search-reminder.yml``, ``sync-main-to-dev.yml``, ``danger-zone-alert.yml``, ``compose-published-smoke.yml``) applies ``code-ready``. Each workflow's ``addLabels(...)`` calls use a closed set of specific label names, so no heuristic ever resolves to ``code-ready``.
* No naming collision with existing labels (``code-ready`` is new; ``auto-merge``, ``needs-codeowner-review``, ``awaiting-codeowner`` are distinct concepts).
* Label created live on GitHub via ``gh label create`` before this commit; this PR brings ``labels.yml`` into source-of-truth sync.
Color: ``006b75`` (teal), distinct from the existing yellow/green review-state palette so it reads as a separate axis from the codeowner-review lifecycle. |
||
|
|
8597e429cc |
Improve UI tests + CI: artifact uploads, WebKit skip narrowing, settle-wait migrations (#4061)
* ci(responsive): restore artifact uploads and fix dead post-results gate
The Responsive UI workflow lost its per-viewport artifact uploads (the explanatory comment around lines 206-209), so PR/release failures were un-debuggable - no screenshots, no test output. The downstream `post-results` job was also gated on `github.event_name == 'pull_request'`, which can never be true because the workflow has no `pull_request` trigger; the combined-report aggregator therefore never ran.
Restore the upload step using `if: always()` + `if-no-files-found: ignore` (so server-startup failures still upload logs and quiet runs don't fail the step) and rewrite the `post-results` gate to `if: always()`. Artifact name matches the existing `ui-test-results-*` pattern expected by the combined-report glob.
* test(playwright): narrow WebKit closed-context skip to webkit only (#4060)
The catch at all-pages-mobile.spec.js:372 was previously calling `test.skip(true, ...)`, which skipped the test for every browser - so any non-WebKit error path also silently bailed out of the mobile-nav overlap assertion. Only Mobile Safari / WebKit is known to hit the `Target page, context or browser has been closed` race, so gate the skip on `browserName === 'webkit'`. Other browsers now re-throw and surface the regression.
Also broaden the matched error message to include `Execution context was destroyed`, the alternate wording the same upstream race uses in newer Playwright versions.
Skip annotation references issue #4060 so the skip is grep-able and can be removed when the underlying race is fixed or the DOM walk is restructured.
* test(ui): add waitForStable helper to auth_helper.js
Replaces ad-hoc `await delay(N)` sleeps used to "let the UI settle" after an action. The helper waits for a selector to be visible, then waits for its bounding box to stop changing across requestAnimationFrame ticks (bounded to 3s in-page). The final `idleMs` pause is configurable.
JSDoc explicitly notes when NOT to use it: don't replace `delay()` calls that exercise wall-clock behavior (e.g. a 10s timer the app is supposed to respect). Those tests need real elapsed time, not a settle wait.
Exported as a sibling of `safeClick` to keep Puppeteer test imports tidy.
* test(ui): replace settle-delays with state-based waits in two puppeteer tests
`test_research_cancellation.js` had 7 hardcoded `await delay(...)` calls and `test_form_validation_aria_ci.js` had 19. The vast majority were "give the UI a moment to settle" pauses with no real signal attached, so they slowed CI and quietly hid races whenever the runner was a beat slower than the chosen delay.
For each call:
- post-`navigateTo` 500ms sleeps -> `waitForSelector('#query', { visible: true })`
- post-validation-trigger sleeps -> `waitForFunction` polling the `ldr-field-invalid` class to appear (or clear, when the test expects validation to pass)
- post-focus 100ms -> `waitForFunction(() => document.activeElement?.id === 'query')`
- post-cancel-click sleeps -> `waitForFunction` polling for `cancel|stop|suspend` to appear in the status text
- post-typing 200ms -> `waitForFunction` polling for the typed value to land
The one delay we kept: the explicit 10-second wait in the mid-stage cancellation test (`test_research_cancellation.js`), which deliberately exercises elapsed-time behavior of the research progress flow. That is not a settle wait and must stay wall-clock.
Polling waits all use `.catch(() => {})` to preserve existing behavior when a selector or state never appears (the assertions further down handle the failure case more informatively than a hung wait would).
* docs(pr-template): document label-gated CI workflows
Several heavy E2E workflows are label-gated and silently no-op on PRs without the right label - new contributors had no way to know. Add a "CI test coverage" section to the PR template enumerating each gated workflow and the label that triggers it.
No CI behavior change; documentation only.
* test(form-validation): make waitForQueryReady detect validator attachment
Local smoke-test (9 tests, ran against `scripts/dev/restart_server.sh`) exposed two latent races that the prior `await delay(500)` had been quietly hiding:
1. `waitForQueryReady` returned as soon as `#query` was visible, but the FormValidator class is registered against the field a tick later (research.js setupEventListeners). Waiting for the `.ldr-field-error` sibling that addValidation() inserts is the actual signal that the validator is wired and the submit handler will take the early-return path on an empty query.
2. `noLoadingUiOnEmptySubmit` ran after `errorClearsOnValidSubmit`, which typed a real query and triggered a real submit (the fetch fails but creates `.ldr-loading-overlay` first). `navigateTo` skipped the re-navigation because we were already on `/`, so the stale overlay carried over. Force a real `page.goto` for this test so it asserts about a fresh page, not the leftover state of the previous test.
After the fix the suite passes 9/9 in ~1s (vs ~4.5s with the old delays).
* chore(labels): rewrite test-trigger label descriptions for AI reviewer auto-apply
The Friendly AI code reviewer (.github/workflows/ai-code-reviewer.yml) auto-applies labels based on the labels' descriptions in the repo. The existing test:puppeteer / test:e2e / ldr_research / ldr_research_static descriptions were passive ("Triggers Puppeteer E2E tests on this PR"), which doesn't guide the reviewer on *when* to apply them.
Rewrite them in the same imperative, bias-toward-action style used by benchmark-needed ("Apply if a change risks degrading performance — when in doubt, add it. Run compare_configurations()"):
- test:puppeteer + test:e2e: apply for any PR touching the web stack
- ldr_research / ldr_research_static: apply for substantive code/arch changes, with the static variant biased even more toward "run it" since it uses the cheaper model
Also add the test:* labels to labels.yml so they become version-controlled (previously they existed only on GitHub, created out-of-band). label-sync is additive and will overwrite the GitHub descriptions on next main push. |
||
|
|
1ab65609db |
ci(release): drop credential persistence on cleanup-changelog checkout (#4050)
The `Checkout the release commit` step in the `cleanup-changelog` job defaulted to `persist-credentials: true`, leaving the job's GITHUB_TOKEN in `.git/config` for the duration of the run. If any later step in this job reads `.git/config` (artifact upload, third-party action that prints/dumps the repo state, etc.), the token leaks. Closes the only open `zizmor/artipacked` finding (code-scanning alert #4655). No functional impact: the only step that needs to push is `peter-evans/create-pull-request`, which already takes an explicit `token:` input and does not rely on the persisted git credential helper. Also dismissed code-scanning alert #7763 (CVE-2026-3298) via the GitHub API — that CVE is Windows-only per PSF advisory; this image is Linux, which Grype's package-version matcher does not account for. Alert #7764 (CVE-2026-7210) is left open as a tracking signal until Python 3.14.6 ships upstream (current latest is 3.14.5; no patched image exists yet). |
||
|
|
a2f7f6ead6 |
fix(ci): drop environment: ci from reusable workflow (#4049)
The `environment: ci` declaration on the research job has no functional
value for LDR — the `ci` Environment has zero protection rules and zero
environment-scoped secrets (verified via gh api). All required secrets
(OPENROUTER_API_KEY, SERPER_API_KEY) are repo-level.
The decorative env attachment becomes a problem for any external repo
that calls this reusable workflow: GitHub silently auto-creates an empty
`ci` Environment in the caller's repo, polluting their environments
namespace.
Dynamic environment via expression (e.g. `environment: ${{ inputs.env || '' }}`)
isn't a viable alternative — `actions/runner` Issue #2610 documents that
expression-in-environment doesn't reliably evaluate input context, and an
empty-string value still auto-creates an empty-named environment.
Simplest correct fix is to delete the line. LDR's own callers
(issue-research.yml, e2e-research-test.yml) keep working unchanged
because they never depended on env-attached functionality. External
callers no longer get the env-pollution side effect.
This unblocks a follow-up `ldr-automations` toolkit repo that will
expose meta-reusable workflows wrapping this one for other projects.
|
||
|
|
a6287a4362 |
fix(security): pin towncrier to exact version and bump Python to 3.14.5 (#4046)
* fix(security): resolve Scorecard pin alerts and bump Python to 3.14.5
- Pin `pip install towncrier` to a single version with `--hash` (both occurrences in release.yml), resolving Scorecard Pinned-Dependencies alerts #7761 and #7762.
- Bump the Dockerfile base image from python:3.14.4-slim to 3.14.5-slim (with new pinned manifest digest). 3.14.5 bundles libexpat 2.8.0 (gh-149017), which is required to mitigate CVE-2026-7210 (Grype alert #7760).
* chore(release): drop hash-pins on towncrier, keep exact version pin
Per review feedback: hash-pinning a build-time CLI like towncrier adds maintenance burden without meaningful supply-chain benefit. The rest of this repo already uses exact-version pins (`pdm==2.26.2`, `pyyaml==6.0.3`, etc.) which Scorecard's PinnedDependenciesID rule accepts; the original alerts fired only because `~=24.8` is a fuzzy version range. |
||
|
|
074285a26d |
fix(release): enrich AI release notes + render changelog in release flow (#4035)
* fix(release): enrich AI release notes + render changelog in release flow
Fixes the v1.6.10 release notes degradation where:
1. docs/release_notes/1.6.10.md was never created (no automation rendered
changelog.d/ fragments before/at release time)
2. AI summary call returned 2xx but empty content with finish_reason=length
create-release job now:
- Sparse-checks-out changelog.d/ + pyproject.toml, installs towncrier
(no PDM needed — towncrier reads pyproject directly), renders
docs/release_notes/<version>.md before composing the release body.
Guards against an empty fragment directory.
- Fetches every merged PR's title + body in a single GraphQL round-trip
and feeds them to the model.
- Fetches the full diff between the previous /releases/latest tag and
the new tag via the compare API, filters lockfiles/generated docs/
SBOM/static assets/binary patches, caps at 700k chars, strips NUL
bytes before jq --rawfile.
- Bumps AI_MAX_TOKENS default 4000 -> 64000 (matches the AI code
reviewer's working budget). Adds AI_REASONING_MAX_TOKENS=16000 so
Kimi K2 Thinking cannot burn the entire output budget on reasoning
tokens — the root cause of v1.6.10's empty .content.
- Adds .reasoning to the response-parsing fallback chain after
.content and .reasoning_content. OpenRouter normalizes Moonshot's
thinking trace to .reasoning (not .reasoning_content), which is why
v1.6.10's diagnostic showed message keys "content, reasoning,
reasoning_details" with no usable extraction path.
- Enforces a 750k char overall prompt cap so PR descriptions + diff
can't blow Kimi's 262k token context window.
- Truncates the final release body to 124,400 chars to stay under
GitHub's documented 125k release-body limit (HTTP 422 otherwise;
gh CLI does not pre-validate).
- Rewrites the SUMMARY_PROMPT to ask for a helpful narrative (not a
TL;DR), with length sized to the material.
New cleanup-changelog job opens a PR on main with the consumed fragments
+ rendered release-notes file, since the create-release runner is
throwaway. Branch protection on main allows the PR (0 required reviews,
0 required checks).
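The response-parsing fallback chain described above can be sketched in Python. This is a minimal illustration and not the workflow's actual parsing logic: the function name `extract_text` and the sample message are hypothetical, and only the key order (`content`, then `reasoning_content`, then `reasoning`) is taken from the commit message.

```python
from typing import Any, Optional

# Key order taken from the commit message; everything else is illustrative.
FALLBACK_KEYS = ("content", "reasoning_content", "reasoning")


def extract_text(message: dict[str, Any]) -> Optional[str]:
    """Return the first non-empty string found along the fallback chain."""
    for key in FALLBACK_KEYS:
        value = message.get(key)
        if isinstance(value, str) and value.strip():
            return value
    return None


# Hypothetical message shaped like the v1.6.10 diagnostic: empty content,
# with the thinking trace stored under "reasoning".
sample = {"content": "", "reasoning": "Summary of changes...", "reasoning_details": []}
print(extract_text(sample))
```

Treating an empty or whitespace-only `content` as missing is the point of the sketch: a 2xx response with empty content would otherwise look like a successful extraction.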
* chore(release): persist 1.6.10 changelog render + clear consumed fragments
The v1.6.10 release shipped without docs/release_notes/1.6.10.md because
no automation rendered changelog.d/ fragments at release time (see
release.yml change in this PR for the fix going forward). Persists the
render now so 1.6.11's release does not re-consume the same fragments.
Renders the v1.6.10 release_notes file from the 30 fragments that were
in changelog.d/ at v1.6.10 cut time, and removes those fragments from
changelog.d/. The rendered content also backs the v1.6.10 GitHub
release body update.
* fix(release): address AI review findings (UTF-8, race, GraphQL cap)
- UTF-8 character-aware truncation. Replace `head -c` (byte-oriented,
splits multi-byte UTF-8 mid-sequence) with Python-based character
truncation for the diff (700k), prompt (750k), and release body
(124,400) caps. Matters because towncrier renders emoji section
headers (💥/🔒/✨/🐛) that appear in diffs of docs/release_notes/;
mid-emoji splits produce invalid UTF-8 that jq --rawfile then
refuses to encode and the GitHub Release API rejects with HTTP 422.
- cleanup-changelog race fix. Pin checkout to ${{ github.sha }}
instead of `ref: main`. If a PR with new fragments merged into main
between create-release and cleanup-changelog, `ref: main` would
consume those new fragments into THIS release's docs/release_notes
file and delete them prematurely — stealing them from the next
release. github.sha is the commit the workflow ran against, so the
set of fragments matches what create-release rendered.
- GraphQL query node-count cap. Limit PR-description batch to 100 PRs
per query and log a warning if a release exceeds that (LDR's typical
release is ~20-30 PRs, well under). Unbounded fan-out could trip
GitHub's GraphQL complexity ceiling on a huge release.
- Compare API 300-file warning. Log when .files[] hits the 300-file
boundary so a future release's missing-file diff can be diagnosed
quickly without rerunning. The cap is a documented GitHub limit.
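The difference between byte-oriented and character-aware truncation can be shown with a short Python sketch. The helper names and the sample header are assumptions made for illustration; the workflow's own truncation code is not reproduced here.

```python
def truncate_chars(text: str, limit: int) -> str:
    """Truncate by Unicode code points, so no character is split."""
    return text[:limit]


def truncate_bytes(text: str, limit: int) -> bytes:
    """Byte-oriented cut, comparable to `head -c`; may split a character."""
    return text.encode("utf-8")[:limit]


# Hypothetical section header with an emoji, as towncrier might render it.
header = "## 🐛 Bug fixes"

# Character-aware cut: the result is always encodable as UTF-8.
safe = truncate_chars(header, 4)
safe.encode("utf-8")

# Byte-oriented cut at 5 bytes lands inside the 4-byte emoji sequence.
cut = truncate_bytes(header, 5)
try:
    cut.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False

print(repr(safe), valid)
```

Slicing a Python `str` counts code points, not grapheme clusters, so a multi-code-point emoji sequence could still be cut between code points. The output remains valid UTF-8 either way, which is the property the release pipeline depends on.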
* fix(release): address review2 — PR cap, trap leak, base pin, prompt clarity
- Raise PR-fetch cap 100 → 200. v1.6.10 had 144 unique PRs (LDR's
dependency-bump traffic is heavy); the previous 100 cap would have
silently dropped ~30% of PR descriptions from the AI prompt. The
750k-char overall prompt cap still protects context window.
- Hoist COMPARE_JSON mktemp above the trap registration so the temp
file is cleaned up even if jq throws under set -e between mktemp
and the manual rm. ${DIFF_FILE}.clean (the NUL-strip staging path)
also added to the trap; rm -f tolerates the missing-file case.
- Pin base: main on peter-evans/create-pull-request. On tag-triggered
runs github.sha may not sit on main HEAD, and the action's
default-branch resolution could pick a non-main base. We always
want the cleanup PR to target main.
- Clarify SUMMARY_PROMPT section markers. The prior text said inputs
are "separated by `----- SECTION -----` markers" using SECTION as a
placeholder; a literal-minded model could look for that exact
string and find none. Now lists the actual marker forms explicitly.
- Add PREV_TAG == RELEASE_TAG guard. On a workflow re-run after the
release exists, /releases/latest returns the just-created tag,
making the diff empty. Falls back to the second-most-recent stable
release.
* fix(release): jq --arg for re-run guard + surface jq errors + doc updates
Workflow fixes from a final pass:
- Re-run guard now passes RELEASE_TAG to jq via `--arg rel` instead of
shell-interpolating it into the program text. RELEASE_TAG is already
validated as bare semver upstream so this is defense-in-depth, but
--arg keeps shell quoting and jq quoting fully separated regardless
of what RELEASE_TAG ever ends up containing.
- Compare-API jq pipeline no longer swallows stderr or masks the exit
code. Previously `jq ... 2>/dev/null || true` would silently produce
an empty diff and a "Diff size: 0 bytes" log line on any jq failure,
giving a maintainer no actionable signal. Now an explicit if-not
check logs a WARNING with jq's stderr intact and ensures the diff
file is empty.
Doc updates for the new release flow:
- changelog.d/README.md: drop the obsolete "maintainer runs `pdm run
towncrier build`" instructions; describe the automated render +
follow-up cleanup PR. Keep the local --draft / --keep preview tips
for fragment iteration.
- docs/RELEASE_GUIDE.md: rewrite the maintainer flow (steps 1-3 of the
old "Render + bump + commit both" sequence are obsolete — the
workflow handles rendering now). Add the cleanup PR merge as a final
checklist item. Update the body composition description from "AI
TL;DR" to AI narrative with diff + PR-body inputs.
* style(release): fix comment indent typo from prior edit
|
||
|
|
96e6548553 |
fix(ci): grant research job the perms its reusable needs (#3987 follow-up) (#4016)
Every run of e2e-research-test.yml and issue-research.yml since the
refactor has terminated as startup_failure with zero jobs, because the
calling `research` job had no `permissions:` block. The reusable's
`research` job declares `permissions: contents: read`, but reusable
permissions can only be the same or lower than the caller's — and the
caller's empty `{}` inheritance meant the reusable's request exceeded
what was granted, so GitHub refused to load the workflow.
Add an explicit permissions block to the calling `research` job in
both workflows:
- contents: read (for actions/checkout in the reusable)
- actions: write (for actions/upload-artifact@v5+ which now
requires this scope to upload artifacts)
The user-visible symptom was: `gh issue` with the `ldr_research`
label did nothing — the workflow ran for ~1 second, failed at
startup, produced no comment. Same for PR labels post-merge.
Tested locally with actionlint and zizmor — both clean. Real
verification needs a labeled PR/issue after merge.
|
||
|
|
fa88bb908f |
ci(prerelease-docker): publish floating :prerelease tag for each RC (#4005)
The workflow now re-points :prerelease at every new RC manifest in addition to publishing the versioned prerelease-vX.Y.Z-<sha> tag. Testers can pin compose to :prerelease and `docker compose pull` to fetch the latest RC without manually bumping the tag each cycle. Versioned tags remain available for reproducible testing. |
||
|
|
ee0ad19256 |
chore(deps): bump google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml (#4009)
Bumps [google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml](https://github.com/google/osv-scanner-action) from 2.3.5 to 2.3.8.
- [Release notes](https://github.com/google/osv-scanner-action/releases)
- [Commits](
|
||
|
|
9755a900eb |
ci(research): extract reusable LDR-research workflow + add issue-trigger caller (#3987)
* ci(research): extract reusable LDR-research workflow + add issue-trigger caller
Three triggers will end up calling the same install-and-run-LDR
plumbing (PR diff today, issue body now, Reddit posts later). Factor
out the middle of the workflow into a reusable workflow so we don't
have to maintain the same logic in three places, and add the
issue-trigger caller on top of it.
Changes:
- .github/workflows/ldr-research-reusable.yml (new) — workflow_call
workflow that takes a fully-assembled query and returns a
comment-ready markdown blob via artifact. Inputs include
forward-compat knobs the future Reddit caller will need
(max-query-length, max-sources, comment-footer override,
include-sources-section, output-truncate-chars).
- .github/workflows/e2e-research-test.yml — refactored from a single
job to three jobs (build-query → research-via-reusable →
post-comment). Behaviour is preserved: same headers, same footer,
same diff truncation at MAX_DIFF_SIZE, same label-removal on
completion.
- .github/workflows/issue-research.yml (new) — triggers on
`issues: types: [labeled]` gated by the same `ldr_research` label
the PR workflow uses (GitHub event-type gating means they don't
conflict). Output has two sections: "For the reporter" (cautious
framing) and "For maintainers" (raw research context). Issue body
is sanitized (control-char strip, 4000-char truncation) and never
reaches a shell.
- scripts/ldr-research.py — renamed from ldr-diff-research.py
(`git mv`, history preserved). Drops --mode, --static-query,
--max-diff-size: query now comes from stdin only and the caller
workflow does prompt assembly. Output JSON shape: {research,
sources, findings, iterations}.
- .github/labels.yml — register ldr_research and ldr_research_static
so they exist canonically rather than via on-the-fly creation.
Reddit research is a follow-up PR; this PR ships the abstraction
shape it will need.
* docs(ci): regenerate workflow status dashboard for new LDR workflows
The check-structure CI gate requires every workflow file to have a row
in docs/ci/workflow-status.md. Regenerate to add rows for the two new
workflows added in this PR. The live-status flips on unrelated rows
(gitleaks, ossf-scorecard, responsive-ui-tests-enhanced, osv-scanner)
are accurate snapshots of current status — the auto-regen workflow
keeps them fresh on its own schedule.
* ci(research): address review feedback — label cleanup, delimiter, artifact
Three small follow-ups from the AI review on this PR:
1. Label cleanup on build-query failure. The post-comment job had
`if: always() && needs.research.result != 'skipped'`, which meant
that if build-query failed, research was skipped and the entire
post-comment job (including the label-removal step) was skipped
too — leaving a stuck `ldr_research` label on the PR/issue.
Switch to `if: always()`; the download and post steps already
self-guard with `needs.research.outputs.success == 'true'`, so
only the label-removal step runs in the failure path.
2. Randomized GHA output delimiter. `__LDR_QUERY_EOF__` was a fixed
string; a query containing that exact line could prematurely
terminate the multi-line output. Use $$/$RANDOM/nanosecond as the
delimiter base. Defense-in-depth — collision was already
astronomically unlikely.
3. Optional `artifact-suffix` input on the reusable workflow. Until
now the artifact name was
`ldr-research-{run_id}-{run_attempt}-{github.job}`, which
collides if a caller invokes the reusable multiple times in one
run. The Reddit follow-up will use a matrix call, so add a
caller-provided suffix now and sanitize it to artifact-safe
chars. Existing callers don't pass it; default empty preserves
today's name.
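The randomized-delimiter idea from item 2 can be sketched in Python. Note the swap: the commit derives its delimiter from `$$`/`$RANDOM`/nanoseconds in shell, whereas this sketch uses `secrets.token_hex`. The function name and the `LDR_EOF_` prefix are hypothetical; only the `name<<DELIMITER` multi-line output syntax is GitHub Actions' own.

```python
import secrets


def format_multiline_output(name: str, value: str) -> str:
    """Build a `name<<DELIM ... DELIM` record with a per-call random delimiter."""
    delimiter = f"LDR_EOF_{secrets.token_hex(16)}"
    # Regenerate in the very unlikely case that the value contains the delimiter.
    while delimiter in value:
        delimiter = f"LDR_EOF_{secrets.token_hex(16)}"
    return f"{name}<<{delimiter}\n{value}\n{delimiter}\n"


# A query containing the old fixed delimiter no longer terminates the output early.
query = "first line\n__LDR_QUERY_EOF__\nlast line"
record = format_multiline_output("query", query)
print(record)
```

In a workflow step, the returned record would be appended to the file named by the `GITHUB_OUTPUT` environment variable.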
* ci(research): fix per-line truncation in reusable workflow
Two follow-ups from the second review pass:
1. The awk-based backstop truncation in `Write query to file` was
per-line (operating on $0 / length($0)), not total. A long
multi-line query with many short lines would silently bypass the
max-query-length cap. Swap for a wc -c + head -c approach that
truncates total bytes. Verified locally that a 114-byte
multi-line input with all-short-lines is now correctly truncated
to ~100 bytes.
2. Remove the unused EXIT_CODE capture in `Run LDR Research`. The
step relies on JSON validation for error detection; capturing
$? without using it was just dead code inherited from the
original workflow.
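The per-line versus total truncation bug in item 1 can be reproduced with a small Python sketch. Both function names are hypothetical, and the sketch stands in for the awk and `wc -c` + `head -c` shell steps rather than reproducing them.

```python
def truncate_per_line(text: str, limit: int) -> str:
    """Flawed approach: caps each line separately, so the total is unbounded."""
    return "\n".join(line[:limit] for line in text.split("\n"))


def truncate_total(text: str, limit: int) -> str:
    """Caps total UTF-8 bytes, comparable to `wc -c` followed by `head -c`."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text
    # errors="ignore" drops a trailing partial character, if the cut split one.
    return data[:limit].decode("utf-8", errors="ignore")


# Twelve short lines: every line is under the cap, but the total is not.
query = "\n".join(["short line"] * 12)
print(len(truncate_per_line(query, 100)), len(truncate_total(query, 100)))
```

Unlike plain `head -c`, the sketch drops a partial trailing character, so the result can be slightly shorter than the cap for non-ASCII input.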
|
||
|
|
c6dfc6dc8e |
ci(workflows): build Vite frontend bundle before UI tests (#3989)
The responsive-ui-tests-enhanced and puppeteer-e2e-tests workflows
both started the Flask app *without* running `npm run build` first.
`dist/` is gitignored, so the page rendered with the empty fallback
from `vite_helper._fallback_assets()` — no bundled `styles.css`. Tests
ran against a partially-unstyled UI, and CSS source changes between
PRs were invisible to the responsive baseline.
(playwright-webkit-tests.yml already does this — these two were the
outliers.)
Add two steps before the existing test setup in each workflow:
    - name: Install root frontend dependencies
      run: npm ci
    - name: Build Vite frontend bundle
      run: npm run build
The existing `tests/ui_tests/npm ci` and `tests/puppeteer/npm install`
steps still run separately to install the Puppeteer/Chromium test deps.
Costs roughly 30s of build time per workflow run. Unblocks CSS-only
PRs from being meaningfully validated by the responsive baseline.
|
||
|
|
e2150c3165 |
fix(ci): use release environment for prerelease-docker secrets (#3983)
Switch the four prerelease-docker.yml jobs from `environment: prerelease` to `environment: release` so they pick up the same DOCKER_USERNAME / DOCKER_PASSWORD already known to work for docker-publish.yml. Avoids duplicating environment secret configuration on the new prerelease environment introduced in #3969. The dispatch-time approval gate in release.yml still uses `environment: prerelease`, so the two checkboxes in the review modal remain independent — this only affects which secret store the downstream build jobs read from. |
||
|
|
91b68acafd |
docs(ci): auto-generated workflow status dashboard (#3966)
* docs(ci): add auto-generated workflow status dashboard
Adds `docs/ci/workflow-status.md`, a single page that surfaces every GitHub Actions workflow in the repo, grouped by role, with action items (disabled / stale / manual-only) at the top. Live status badges link to each workflow's runs page. Auto-generated from the workflow YAML files + the GitHub API by `scripts/generate_workflow_status.py`.
Why: the GitHub Actions tab is chronological-mixed (poor "is anything red right now?" view), and the static workflow table in `CI_CD_INFRASTRUCTURE.md` drifts when workflows are added/renamed (PR #3963 fixed three factually wrong header claims for exactly this reason). A reference page that mechanically reflects current state + identifies dormant workflows answers both gaps.
What's surfaced today (verified live):
- **Disabled**: `nuclei.yml` (caller commented out in `release-gate.yml:177`).
- **Stale**: `update-precommit-hooks.yml`, whose weekly Friday cron has been **failing for 10+ consecutive weeks** (since at least 2026-03-06). This was discovered by the dashboard, not previously tracked.
- **Manual-only**: `check-config-docs.yml`, `sync-main-to-dev.yml` (both intentionally manual; the dashboard shows them so they're not forgotten).
Generator design notes:
- Resolves reusable workflows correctly: `gh run list --workflow=X.yml` is empty for `workflow_call`-only workflows. The script walks the call graph (release.yml → release-gate.yml → semgrep.yml etc.), fetches the parent run's job list, and matches by **job key** parsed from the caller YAML (not by name heuristic, since `gitleaks-scan` ↔ `gitleaks-main.yml` would otherwise collide with `gitleaks.yml`).
- Picks "primary trigger" per workflow so e.g. `codeql.yml` (PR + push + cron + workflow_call) gets its glyph from the gated daily run, not a stale PR run.
- Stale check walks the *recent* runs list to find last success: a workflow that ran red yesterday and green a week ago is not stale.
- Manual edits outside the `<!-- BEGIN/END GENERATED -->` markers are preserved on regeneration; the timestamp lives inside the markers so post-marker content is fully user-owned.
- Preflights `gh auth status` and rate limit before any per-workflow call, so it fails fast with an actionable message instead of partial output.
CI integration:
- `.github/workflows/check-workflow-status.yml` runs `--check-structure` on PRs touching workflows, the dashboard, or the generator. Pure structural check (no API calls, no live data), so it is fast and deterministic. Live regeneration stays on demand.
Cost: ~340 GitHub API calls per regeneration, ~45 sec wall-clock, ~6.8% of the 5000/hr authenticated quota.
* fixup(ci): review-pass corrections to workflow status dashboard
Surfaced by three rounds of code-review + correctness + security agents on the original PR. Four small fixes; no behavioral change to the generated dashboard's content.
1. **Recognize commented job keys**: `JOB_KEY_RE` now accepts an optional `# ` prefix. Previously, when an entire job block was commented out (e.g. `release-gate.yml:175-181` for nuclei), the commented `uses:` line inherited the *previous* active job's key (`gitleaks-scan`) instead of the correct `nuclei-scan`. Latent (commented entries are filtered out before reaching gated-run lookup) but would misattribute status if someone partially uncommented a block (uncommented just the `uses:` line).
2. **Pin pyyaml to ==6.0.3** in the CI workflow. The repo convention is exact `==` pins (95% of `pip install` calls in workflows); the only floating range was the one introduced by this PR. Matches pdm.lock.
3. **Validate marker order** in `merge_with_existing`. If a manual edit leaves the BEGIN/END markers reversed (e.g. mid-merge-conflict), bail to a clean overwrite instead of splicing interleaved garbage.
4. **Remove `_coerce_jq_stream`**, an unused helper left behind from an earlier iteration. Zero call sites; no behavior change.
Verified by re-running the generator + `--check-structure`. The rendered dashboard's only diff vs prior commit is the regeneration timestamp and live "Last activity" cells (expected, since those reflect recent runs since the previous regen).
* feat(ci): bucketed activity labels + auto-regen on version bump
Two changes that together make the dashboard's diffs meaningful instead of noisy.
1. **Coarse activity buckets.** Replace exact UTC timestamps in every "Last activity / Last manual run / Last successful run" cell with one of: `this week`, `last week`, `2 weeks ago`, `3 weeks ago`, `last month`, `2 months ago`, `3+ months ago`, `long ago`, `never`. Calendar-day boundaries (no time-of-day jitter) so two regenerations on the same date produce **zero diff** when nothing actually drifted. Verified: same-day re-runs after stable workflow state → empty diff. Also drop the redundant `Days idle` columns from Stale and Manual-only tables (the bucket label already says it), and round the "Last regenerated" footer to a date. Why: a daily-running healthy workflow used to bump its timestamp every regen (noise). Now it stays in `this week` indefinitely, and the only diffs that land in a version-bump PR are real bucket transitions, which is exactly the "this slipped from last week to last month, something might be wrong" signal the dashboard exists for.
2. **Auto-regenerate on version bump.** Add a step to `version_check.yml` right after the existing `generate_config_docs.py` regen. Same pattern as the config docs precedent: the dashboard refresh rides along with each version-bump PR and is reviewable in the same diff. Costs ~340 GitHub API calls per run (well under the GITHUB_TOKEN 1000/hr workflow-runs limit). Adds `actions: read` to the job permissions block; uses `pyyaml==6.0.3` matching pdm.lock.
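The coarse-bucket idea can be sketched in Python. The labels come from the commit message, but the day thresholds, the `BUCKETS` table, and the function name `activity_bucket` are assumptions made for illustration; the generator's real boundaries are not stated in the log.

```python
from datetime import date
from typing import Optional

# Upper bounds in whole days. These thresholds are assumptions made for
# illustration; the commit message lists the labels but not the boundaries.
BUCKETS = [
    (7, "this week"),
    (14, "last week"),
    (21, "2 weeks ago"),
    (28, "3 weeks ago"),
    (60, "last month"),
    (90, "2 months ago"),
    (365, "3+ months ago"),
]


def activity_bucket(last_run: Optional[date], today: date) -> str:
    """Map a last-run date to a coarse label using calendar-day differences."""
    if last_run is None:
        return "never"
    days = (today - last_run).days  # dates only, so no time-of-day jitter
    for upper, label in BUCKETS:
        if days < upper:
            return label
    return "long ago"


today = date(2026, 6, 15)
print(activity_bucket(date(2026, 6, 14), today))
print(activity_bucket(date(2026, 5, 1), today))
print(activity_bucket(None, today))
```

Because the function takes dates rather than timestamps, two calls on the same calendar day return the same label for an unchanged workflow, which is the zero-diff property the commit describes.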
* feat(ci): drop regen timestamp; add health banner; fix in-progress false-stale
Three follow-ups to keep version-bump diffs strictly meaningful, plus two correctness fixes uncovered by repeated stability testing.
1. **Drop the "Last regenerated" date.** Git history is authoritative for "when this snapshot was taken"; embedding a date here forced a single-line diff every regeneration even when nothing else drifted.
2. **Aggregated health banner** at the top of the generated region: `**63 workflows:** 1 disabled · 1 stale · 2 manual-only · 59 active` Counts only change when a workflow shifts between {disabled, stale, manual, active} — same level of diff-stability as the per-row buckets.
3. **`?event=schedule` for own-cron workflow badges.** Verified effective by SHA-comparing badge bodies for workflows with multi-event run history. Makes the badge for e.g. `gitleaks.yml`, `fuzz.yml`, `osv-scanner.yml` reflect cron health specifically, rather than whichever PR ran last. The runs-page link uses the matching `?query=event%3Aschedule` so a click lands on the filtered run list.
4. **Fix false-stale during in-flight release runs.** Previously, when release.yml was running, gates reachable via release.yml (puppeteer-e2e-tests, ci-gate, etc.) would briefly flip to "stale" because `fetch_last_gated_run` returned the in-progress run first and `last_success` couldn't see past it. Now the function walks all 5 caller runs and returns both the latest match (for activity) and the latest successful match (for staleness), avoiding the flip.
5. **Map all GitHub conclusion enum values.** A `gitleaks.yml` run completed with `action_required` between two test regens; the glyph table didn't have it and rendered `?`. Added every documented value (`neutral`, `timed_out`, `stale`, `action_required`) and changed the unknown-fallback from `?` to em-dash, so future GitHub-side enum additions don't introduce a false-positive diff.
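The fix in item 4 amounts to tracking two results while walking the caller runs. The sketch below shows that idea under simplifying assumptions: runs are plain dicts ordered newest-first, every run in the list already matches the gated job, and the helper name `pick_latest_and_success` is hypothetical. The real `fetch_last_gated_run` also has to fetch and filter the runs.

```python
from typing import Iterable, Optional, Tuple

# Assumed run shape: {"id": int, "status": str, "conclusion": Optional[str]}
Run = dict


def pick_latest_and_success(
    runs: Iterable[Run],
) -> Tuple[Optional[Run], Optional[Run]]:
    """Return (latest run, latest successful run) from a newest-first list."""
    latest: Optional[Run] = None
    latest_success: Optional[Run] = None
    for run in runs:
        if latest is None:
            latest = run  # drives the "last activity" cell
        if latest_success is None and run.get("conclusion") == "success":
            latest_success = run  # drives the staleness decision
        if latest is not None and latest_success is not None:
            break
    return latest, latest_success
```

With an in-progress run at the head of the list, the first element still reports current activity, while staleness is judged against the older successful run behind it.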
Verified: two same-day regens after workflow state has settled now produce **zero diff**.
* ci(version-bump): make workflow-status regen non-blocking
Add `continue-on-error: true` to the dashboard regeneration step in version_check.yml. The regen calls ~340 GitHub API endpoints and would otherwise block the entire version-bump PR if any of them transiently fail (rate-limit hit, GitHub Actions outage, etc.). The failure mode should be "dashboard stays at the previous snapshot until next successful regen", not "release pipeline is blocked".
The sibling `generate_config_docs.py` step doesn't need this — it's purely local with no external API dependency. |
||
|
|
632bb176fc |
fix(ci): scope prerelease-docker jobs to prerelease environment (#3978)
The prerelease-docker workflow's jobs declared no environment, so the DOCKER_USERNAME / DOCKER_PASSWORD secrets stored on the new `prerelease` environment (added in #3969) were invisible to them and the Docker Hub login step failed with "Username and password required" (run 25627724313).
Add `environment: prerelease` to all four jobs, mirroring how docker-publish.yml scopes every job to `environment: release`. This makes the environment secrets visible and applies the same reviewer gate that already protects the real publish workflow. |
||
|
|
28b1732259 |
test(ui): replace flake-prone delays, fix local-DX bug, correct stale CI comment (#3972)
* test(ui): replace fixed delays in metrics_dashboard with proper waits
Five hardcoded `await delay(N)` calls in tests/ui_tests/test_metrics_dashboard.js
became `page.waitForResponse(...)` and `page.waitForSelector(...)` plus a
short `waitForFunction` for the SPA-route check. Each replacement waits for
the real condition (an API response or a DOM element) instead of a fixed
sleep, so the test stops racing and gets faster on machines that finish
the work quickly.
Verified 10/10 runs against a live local server: all pass at ~19.7s
wall time (previously ~25s, of which the fixed delays alone accounted
for 12s).
Concrete sites:
* line 79 → wait for `/api/start_research` response, then `waitForFunction`
on the URL change
* line 164 → wait for `/api/metrics` response (10s ceiling)
* line 290 → wait for `period=7d` response (5s ceiling)
* lines 334, 352 → wait for the metrics dashboard selector after navigation
* test(ui): handle puppeteer's fullPage screenshot ceiling gracefully
Running test_responsive_ui_comprehensive.js locally without CI=true used
to fail on the Settings page with `Protocol error (Page.captureScreenshot):
Page is too large` — Puppeteer/Chromium's fullPage screenshot caps at
16384px, and the Settings page rendered at 375px wide blows past that
limit. The error bubbled up to testPage's catch block and marked the
whole page as failed. CI environments avoided the problem because the
diagnostic-screenshot calls are guarded by `!process.env.CI`, so local
devs couldn't reproduce CI's pass.
The screenshots are diagnostic, not the test target. Added a
`safeScreenshot(opts)` helper that catches the documented "Page is too
large" / `captureScreenshot` protocol errors and falls back to a
viewport-only capture so the run continues. Replaced all 9 fullPage
screenshot call sites in this file with the helper; the safeScreenshot
method itself still uses `page.screenshot` directly (the only place
that should).
Verified 5/5 runs locally pass (mobile viewport, no CI=true) at ~31s
wall time; CI=true behavior is unchanged.
* ci(workflows): correct stale concurrency-comment in responsive-ui-tests
The comment block above `permissions:` claimed the workflow "triggers on
both pull_request and workflow_call." That was true historically but
became wrong when #2248 removed the pull_request trigger to keep this
heavy matrix build (mobile + desktop, ~20 min each) off the PR gate.
The comment was added later in #3600 with the stale wording, so anyone
reading it has been misled about when this workflow actually runs.
Rewrite to describe current behavior accurately: runs via workflow_call
from release.yml's responsive-test-gate and via workflow_dispatch only.
The concurrency-history note (PR #3554 / #3599) is preserved.
No functional change — just the comment.
* test(ui): filter benign navigation-abort race in benchmark page test
`Benchmark Results Page › page loads without critical errors` was
flaky with `Error loading benchmark history: TypeError: Failed to
fetch`. The describe-scoped `beforeEach` already navigates to
`/benchmark/results`, then the test re-navigates with listeners
attached. The first navigation's in-flight history fetch gets aborted
by the second navigation and surfaces as `Failed to fetch` — a benign
race, not a real bug.
Add `Failed to fetch` to the existing filter list (next to favicon,
404, and Failed to load resource), with an inline comment explaining
why. Verified 5/5 clean runs locally; previously hit 1 flaky / 4 clean.
|
||
|
|
1315b679e0 |
ci(research): switch E2E research workflow to langgraph-agent strategy (#3965)
* ci(research): switch E2E research workflow to langgraph-agent strategy
The ldr_research label runs scripts/ldr-diff-research.py, which until
now didn't pass a search_strategy and so fell through to the
quick_summary default of source_based. Switch to the agentic
langgraph-agent strategy so the workflow exercises the autonomous
research path.
- Adds --strategy CLI arg and LDR_STRATEGY env var, default
langgraph-agent (consistent with the existing --provider /
--search-tool / --iterations pattern).
- Workflow exposes LDR_STRATEGY: vars.LDR_STRATEGY || 'langgraph-agent'
so the choice is overridable per-repo via Variables.
- Notes in the script docstring that LDR_ITERATIONS=1 is a no-op for
the langgraph strategy (which reads langgraph_agent.max_iterations
from settings instead).
* ci(research): consolidate model var to LDR_RESEARCH_MODEL
The workflow had two model variables — vars.LDR_MODEL for diff mode and
vars.LDR_STATIC_MODEL for static mode — selected by a small set-model
step. Collapse to a single LDR_RESEARCH_MODEL variable shared by both
labels, mirroring the AI reviewer's vars.AI_MODEL pattern.
- Default: google/gemini-2.0-flash-001 (the value the script was
already falling through to).
- Override via Settings → Variables → New repository variable
→ name: LDR_RESEARCH_MODEL.
- The set-model step is removed; the workflow now passes the env var
through directly.
- Script reads LDR_RESEARCH_MODEL instead of LDR_MODEL.
Note: existing repo variables LDR_MODEL and LDR_STATIC_MODEL become
orphaned by this rename and can be deleted from repo settings.
* ci(research): stop overriding strategy iterations from the workflow
Previously the workflow set LDR_ITERATIONS=1 and the script forwarded
that as iterations= in kwargs. For source_based that capped research at
one iteration; for langgraph-agent it was effectively a no-op (langgraph
reads max_iterations, not iterations) but the wiring was misleading.
- Drop LDR_ITERATIONS from the workflow env block.
- Make --iterations default to None in the script and only forward it
to quick_summary when explicitly set on the CLI.
- Each strategy now uses its own setting-driven default unless
overridden — for langgraph-agent that means langgraph_agent.max_iterations
(default 50) flows through unchanged.
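The argument handling described in these commits can be sketched as follows. The `--strategy` and `--iterations` flags, the `LDR_STRATEGY` variable, and the `search_strategy` / `iterations` keyword names come from the messages above; the helper names `parse_args` and `build_kwargs` are hypothetical, and the actual call into `quick_summary` is left out.

```python
import argparse
import os
from typing import Any, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # The CLI flag wins; otherwise the env var; otherwise the agentic default.
    parser.add_argument(
        "--strategy",
        default=os.environ.get("LDR_STRATEGY", "langgraph-agent"),
    )
    # None means "not set on the CLI", so the strategy keeps its own default.
    parser.add_argument("--iterations", type=int, default=None)
    return parser.parse_args(argv)


def build_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    """Build keyword arguments, forwarding iterations only when given."""
    kwargs: Dict[str, Any] = {"search_strategy": args.strategy}
    if args.iterations is not None:
        kwargs["iterations"] = args.iterations
    return kwargs
```

Leaving `iterations` out of the keyword arguments, rather than passing a placeholder value, is what lets each strategy fall back to its own settings-driven limit.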
* ci(research): split research model into MAIN + CHEAP per label
Bring back per-label model selection with cleaner names:
- ldr_research → vars.LDR_RESEARCH_MODEL (deep PR analysis,
user-configurable)
- ldr_research_static → vars.LDR_RESEARCH_CHEAP_MODEL (regression
smoke, kept cheap)
Both default to google/gemini-2.0-flash-001 if unset, so existing
behaviour stays identical until you actually configure cheap-model.
The script and its env-var contract are unchanged — the workflow
just picks which value to feed into LDR_RESEARCH_MODEL based on the
applied label.
|
||
|
|
8871d0fdab |
ci(release): split prerelease docker into its own environment (#3969)
Today the trigger-prerelease-docker job and the create-release / trigger-workflows jobs all gate on `environment: release`. GitHub's "Review deployments" modal collapses every pending job in the same environment under one checkbox, so approving `release` approves the prerelease test AND the actual publish at once. There is no UI affordance to test the prerelease docker build first and decide on the release afterward.
Move trigger-prerelease-docker to a new `prerelease` environment so the review modal shows two independent checkboxes. Maintainers can now:
- Approve `prerelease` only, test the docker image, then approve `release`.
- Reject `prerelease` and approve `release` to skip the prerelease step.
- Approve `prerelease`, test, then cancel the run to abandon the release.
Requires a one-time GitHub Settings change: create a `prerelease` environment with the same required reviewers as `release`. PAT_TOKEN is a repo secret, so no environment-secret copy is needed.
create-release and trigger-workflows remain on `release` — unchanged. |
||
|
|
b8602b8f10 |
fix(security): suppress alerts #7743 #7744 #7745 (audited false positives) (#3968)
- #7743 (zizmor/dangerous-triggers): welcome-first-time.yml
  Adds inline `# zizmor: ignore[dangerous-triggers]` with rationale. pull_request_target is required so fork PRs receive a writable token for the welcome comment. The workflow never checks out PR content, never executes fork-controlled scripts, and only reads `sender.login` (operator-trusted GitHub event metadata). The comment body is a static template with no PR-controlled interpolation. This is one of the safe, audited use cases of pull_request_target.
- #7744 / #7745 (Bearer/javascript_lang_dangerous_insert_html): context-overflow.js innerHTML at lines 453 and 501.
  Adds `// bearer:disable javascript_lang_dangerous_insert_html` alongside the existing `eslint-disable` comments. All user-controlled values are already routed through escapeHtml; numeric fields go through formatNumber; CSS classes and badges are hardcoded literals. This matches the convention used in collection_details.js, embedding_settings.js, and other JS components in the repo. |
||
|
|
0f65961fa0 |
docs(ci): cross-link compose-integration-test ↔ compose-published-smoke (#3963)
Make the relationship between the two compose workflows explicit so future contributors don't try to add the build-override variant to release-gate's daily cron (PR #3962 attempted this and was reverted).
- compose-integration-test.yml: add a "do not move into the daily cron" paragraph pointing readers at compose-published-smoke.yml as the workflow that already covers ongoing drift between main's compose.yml and the published image.
- compose-published-smoke.yml: fix three stale claims in the header:
  1. "per-PR / release-gate test" — neither is true (no pull_request trigger, not in release-gate.yml; runs only at release time + manual dispatch).
  2. "PR compose changes already covered by the per-PR integration test" — there is no per-PR integration test; replace with the accurate reason (this test can only fail on drift already on main).
  3. cron-offset comment referenced a "daily compose-integration-test (03:00)" that does not exist; offset is only against release-gate.
Comment-only change. No workflow behavior changes. |
||
|
|
e6d72faa14 |
test(embedding-settings): regression spec for model dropdown reset (#3863) (#3949)
* test(embedding-settings): regression spec for model dropdown reset (#3863)
Adds a Playwright spec that mocks the embedding-settings backend
(`/library/api/rag/models`, `/library/api/rag/settings`,
`/settings/api/local_search_embedding_model`,
`/settings/api/embeddings.ollama.url`) so the page renders without a real
Ollama or Sentence-Transformers backend. Three tests cover the bug from
#3863 and the surrounding contract:
1. selecting a non-top model auto-saves it and persists across reload —
exercises the change-listener path that the per-field auto-save relies
on.
2. Ollama URL change does not reset the selected model — the
load-bearing regression test. Reverting the preserve+restore patch in
`updateModelOptions()` was confirmed to make this test fail with the
same symptom the issue reporter saw (dropdown snaps to the index-0
model).
3. "Save Default Settings" button is gone — guards against an accidental
re-introduction of the redundant button that originally triggered the
bug.
Mock route registration order matters here: the catch-all
`**/settings/api/**` is registered first so the specific PUT mocks for
`local_search_embedding_model` etc. (registered later) win Playwright's
last-registered-wins precedence.
* test(embedding-settings): address review findings on regression spec
Round-2 review of PR #3949 surfaced that the spec was not actually wired
into any CI workflow and had a couple of correctness/flakiness issues.
- Add `embedding-settings-dropdown` to the curated Safari filename filter
in `.github/workflows/playwright-webkit-tests.yml` (both Desktop Safari
and Mobile Safari runs) so the daily release-gate Playwright run picks
up this regression spec.
- Replace `waitForTimeout(500)` with a deterministic poll on a new
`state.modelsFetches` counter that ticks on each `/library/api/rag/models`
GET. Once the count rises past the pre-action baseline, the post-save
`loadAvailableModels()` call has completed and any spurious save the
bug would trigger has already fired.
- Gate the `local_search_embedding_model` and `local_search_embedding_provider`
PUT mocks on request method, mirroring the `embeddings.ollama.url`
handler. A stray GET hitting these handlers would otherwise push
`undefined` into `state.modelSaves`/`providerSaves`.
- Skip the entire describe block on mobile projects (`test.skip(({ isMobile })
=> isMobile, ...)`). This is a desktop form-state regression test, not
a layout test — it doesn't need to run across 12 device profiles.
- Reframe test #1's docstring: it's a happy-path smoke test for the
per-field auto-save contract, not a regression test for #3863. The old
comment claimed it would catch the bug, which I confirmed empirically
it does not (the dropdown-rebuild path that #3863 exploited isn't on
the model-pick-and-reload flow).
- Add a new test for the provider-change path (`updateModelOptions` is
also reached from the provider-change handler at line 325 of
`embedding_settings.js`). The model-shared-across-providers fixture
forces the preserve+restore branch in `updateModelOptions` to fire and
asserts the selection survives. Verified locally: this test fails
alongside the Ollama-URL test when the patch is reverted.
* test(embedding-settings): extract mocks helper + cover text_separators reset
Round-2 review left two advisory items: extract the mock infrastructure
to a shared helper, and cover the text_separators reset behavior added in
the follow-up commit on PR #3940 (
|
||
|
|
bc527f7aa9 |
ci(workflows): migrate to LDR_DISABLE_RATE_LIMITING canonical name (#3945)
* ci(workflows): migrate to LDR_DISABLE_RATE_LIMITING canonical name (#3936 follow-up)
Flips the 9 remaining `DISABLE_RATE_LIMITING=true` workflow uses to the canonical `LDR_DISABLE_RATE_LIMITING=true` name introduced in #3936, so CI no longer trips its own deprecation warning. Also closes a latent test-isolation gap in `test_enabled_by_default` that did not pop the canonical var, which would have started failing as soon as a developer or workflow exported it.
* test(auth): flip remaining DISABLE_RATE_LIMITING uses to LDR_ prefix
Picked up from the closed #3944. The test_auth_routes fixture used the legacy env var, and test_auth_rate_limiting carried a stale comment referencing it. Both now use the canonical LDR_DISABLE_RATE_LIMITING introduced in #3936, matching the workflow flips in this PR.
* test(env_registry): isolate TestIsRateLimitingEnabled from canonical env var
CI now exports LDR_DISABLE_RATE_LIMITING=true (per the workflow flips in this PR). Two tests in TestIsRateLimitingEnabled use patch.dict(os.environ, {"DISABLE_RATE_LIMITING": ...}) without clearing the canonical key, so the canonical var bled in from the outer process and short-circuited before the legacy code path:
- test_enabled_when_flag_false: expected True (legacy=false), got False because canonical=true wins
- test_legacy_form_emits_deprecation_warning_once: expected one warning, got zero because canonical short-circuit skips legacy
Add a class-level autouse clean_env fixture that strips both env-var forms (mirroring the one in test_env_registry_extended.py). The remaining tests in this class were silently coincidence-passing under the bug because they expect False and canonical=true also gives False.
Verified by exporting LDR_DISABLE_RATE_LIMITING=true and running the two test files: 65 passed. |
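The precedence the tests above depend on can be illustrated with a small sketch. The two variable names and the observed behaviour (enabled by default, canonical wins, legacy warns once) are taken from the commit message; the function body, the set of accepted truthy strings, and passing the environment as a mapping instead of reading `os.environ` are assumptions.

```python
import warnings
from typing import Mapping, Set

CANONICAL = "LDR_DISABLE_RATE_LIMITING"
LEGACY = "DISABLE_RATE_LIMITING"
_warned: Set[str] = set()  # remembers which deprecation warnings were emitted


def _truthy(value: str) -> bool:
    # Assumed set of truthy spellings.
    return value.strip().lower() in {"1", "true", "yes"}


def is_rate_limiting_enabled(env: Mapping[str, str]) -> bool:
    """Rate limiting is on unless one of the disable flags is truthy."""
    if CANONICAL in env:
        # The canonical name short-circuits; the legacy name is never read.
        return not _truthy(env[CANONICAL])
    if LEGACY in env:
        if LEGACY not in _warned:
            warnings.warn(
                f"{LEGACY} is deprecated; use {CANONICAL}",
                DeprecationWarning,
                stacklevel=2,
            )
            _warned.add(LEGACY)
        return not _truthy(env[LEGACY])
    return True
```

This makes the isolation bug easy to see: a test that sets only the legacy key while the canonical key leaks in from the outer process never reaches the legacy branch, so it sees neither the legacy value nor the warning.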
||
|
|
5e3f37a7ce |
fix(ci): grant pull-requests:write to welcome-first-time workflow (#3950)
createComment on a PR via /issues/{n}/comments returns 403 with only
`issues: write`. GitHub now requires `pull-requests: write` when the
issue resource is actually a PR — the API response's
`x-accepted-github-permissions: issues=write; pull_requests=write`
indicates both are needed (issues for plain issues, pull_requests
for PRs). All five recent runs of this workflow have failed for this
reason; adding the permission unblocks the welcome comment.
|
||
|
|
5a0ca57ded |
feat(ci): welcome first-time contributors with a single comment (3/5) (#3859)
* feat(ci): welcome message on a contributor's first PR
Adds .github/workflows/welcome-first-time.yml using actions/first-interaction@v3.1.0 (pinned by SHA). Posts a single comment on a contributor's first PR pointing at CONTRIBUTING.md and our review-process docs.
Uses pull_request_target so forked PRs receive a writable token; the action only posts a fixed message (no checkout, no shell execution), so the security surface is minimal. Permissions limited to pull-requests: write.
PR 3 of 5 introducing PR triage automation. Independent of the other PRs in the series.
* feat(ci): rewrite welcome workflow with per-author check + starter pack
Replaces actions/first-interaction (which has no author filter on isFirstPullRequest, so it would never fire on a repo with prior PRs) with a github-script that uses issues.listForRepo?creator=<user> to detect a contributor's actual first PR.
Fixes the missing issues:write permission needed by issues.createComment (PR comments route through the issues API).
Adds a bot filter (consistent with the PR triage workflow) so dependabot/renovate PRs don't trigger a human-facing welcome.
Expands the welcome message into a starter pack: install guide, dev guide, pre-commit hook setup (inline commands), architecture overview, tests README, FAQ, troubleshooting, security policy, and Discord. All links use absolute URLs to files that exist on main. |
||
|
|
8cc0184cbe |
feat(ci): auto-apply triage labels on PR open and review (2/5) (#3858)
* feat(ci): auto-apply triage labels on PR open and review events
Adds .github/workflows/pr-triage.yml that:
- on PR opened: applies external-contributor / first-time-contributor / bot / needs-codeowner-review based on author_association
- on synchronize: flips awaiting-author → awaiting-codeowner when author pushes new commits
- on review submitted by a codeowner: clears or applies lifecycle labels based on the review state (approved / changes_requested)
- on review dismissed: re-applies needs-codeowner-review if a previous changes-requested review was withdrawn
Codeowner detection accepts the hardcoded global-owners list OR any reviewer with OWNER/MEMBER/COLLABORATOR association (covers team-based codeowners that aren't direct repo members).
Uses pull_request (not pull_request_target) so fork PRs run with a read-only token; label calls 403 silently for forks. Acceptable trade vs the security cost of running pull_request_target with secrets on fork code. Maintainers can apply labels manually for fork PRs.
Updates CODEOWNERS with a comment noting the global-owners list is mirrored in pr-triage.yml; both must stay in sync.
PR 2 of 5 introducing PR triage automation. Depends on labels being synced first via PR 1 (#3857).
* fix(ci): tighten codeowner check, prune permissions, extend bot list
- Drop the OWNER/MEMBER/COLLABORATOR fallback in isCodeownerReview; rely on the hardcoded CODEOWNERS list. The fallback was designed for team-based codeowners but this repo has no such setup, and the fallback would become a security-relevant mislabel if branch protection adopts require_code_owner_reviews=true.
- Trim job permissions to issues:write only — pull-requests:write and contents:read were unnecessary (issues:write covers PR labels since PRs are issues internally). Matches label-fixed-in-dev.yml precedent.
- Add mseep-ai and Nexus-Digital-Automations to KNOWN_BOTS — both appear in repo PR history without the [bot] suffix.
* fix(ci): clear awaiting-author on approval, gate dismissed handler
Two label-state bugs surfaced by the AI code reviewer on the previous revision (#3858 review pass):
- Approval branch now also removes awaiting-author. Without this, a codeowner who switches from changes_requested to approved purely via comments (no intervening author push, so no synchronize event to flip the label) leaves awaiting-author stuck on the PR.
- Dismissed branch now requires that the dismissed review was a codeowner's changes_requested review. Otherwise any non-codeowner review being dismissed while awaiting-author happens to be set would incorrectly flip the PR back to needs-codeowner-review while the real codeowner request is still active.
* chore(ci): pre-commit check that pr-triage.yml CODEOWNERS matches .github/CODEOWNERS
Eliminates the manual sync hazard flagged in the PR review: the hardcoded JS array in pr-triage.yml must mirror the global owners line in .github/CODEOWNERS, and the only thing keeping them in sync was a pair of comments. This adds a small Python pre-commit hook that parses both files and fails if their owner sets disagree (case-insensitive, order-independent). Triggers on edits to either file. Same shape as check-version-sync.py.
* fix(ci): swallow 403 on fork-PR label calls so the run stays green
The previous comment promised fork PRs would "403 silently" but addLabels never caught the error — every fork contribution would have shown a red check on the triage workflow. This adds a narrow 403 catch (scoped to the documented fork-PR case via pull_request's read-only token) shared by addLabels and removeLabel, with a console log so the no-op is visible in the run output. Other status codes still throw.
Behavior matches the original design intent; comment is now accurate. Flagged by AI Code Review on the previous revision.
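The CODEOWNERS sync hook described above compares two owner sets. A rough sketch of that comparison follows, assuming the workflow holds the owners in a JS array literal assigned to a name `CODEOWNERS` and that the global owners sit on the `*` line of `.github/CODEOWNERS`. All function names here are hypothetical; the real hook's parsing is not shown in this log.

```python
import re
from typing import Set


def owners_from_codeowners(text: str) -> Set[str]:
    """Owners on the global `*` line of a CODEOWNERS file, lowercased."""
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "*":
            return {p.lstrip("@").lower() for p in parts[1:]}
    return set()


def owners_from_workflow(text: str) -> Set[str]:
    """Owners in a JS array literal assigned to CODEOWNERS (assumed name)."""
    match = re.search(r"CODEOWNERS\s*=\s*\[(.*?)\]", text, re.DOTALL)
    if not match:
        return set()
    names = re.findall(r"['\"]([^'\"]+)['\"]", match.group(1))
    return {name.lower() for name in names}


def owners_in_sync(codeowners_text: str, workflow_text: str) -> bool:
    # Set equality gives the order-independent, case-insensitive comparison.
    return owners_from_codeowners(codeowners_text) == owners_from_workflow(
        workflow_text
    )
```

A pre-commit entry point would read both files, call `owners_in_sync`, and exit non-zero on a mismatch.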
* fix(ci): drop dead state check in dismissed-review handler
GitHub mutates review.state to "dismissed" on pull_request_review action=dismissed events (github/docs#20216), so the previous guard `review.state !== 'changes_requested'` always returned early. The awaiting-author -> needs-codeowner-review flip never executed.
Use the awaiting-author label as the discriminator instead — it's only set by a codeowner's changes_requested review, so its presence is reliable proof the dismissal is the one we care about. Dismissals of approval/comment reviews are no-ops because the label won't be present. |
||
|
|
649ead1079 |
chore(deps): bump github/codeql-action from 4.35.3 to 4.35.4 (#3919)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.35.3 to 4.35.4.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](
|
||
|
|
fca32f072b |
chore(deps): bump anthropics/claude-code-action from 1.0.107 to 1.0.119 (#3918)
Bumps [anthropics/claude-code-action](https://github.com/anthropics/claude-code-action) from 1.0.107 to 1.0.119.
- [Release notes](https://github.com/anthropics/claude-code-action/releases)
- [Commits](
|
||
|
|
e76c323813 |
chore(deps): bump actions/dependency-review-action from 4.9.0 to 5.0.0 (#3915)
Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4.9.0 to 5.0.0.
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](
|
||
|
|
a21c30bbe4 |
chore(deps): bump anchore/scan-action from 7.3.2 to 7.4.0 (#3917)
Bumps [anchore/scan-action](https://github.com/anchore/scan-action) from 7.3.2 to 7.4.0.
- [Release notes](https://github.com/anchore/scan-action/releases)
- [Changelog](https://github.com/anchore/scan-action/blob/main/RELEASE.md)
- [Commits](
|
||
|
|
1aaff5cac9 |
chore(deps): bump sigstore/cosign-installer from 4.1.1 to 4.1.2 (#3916)
Bumps [sigstore/cosign-installer](https://github.com/sigstore/cosign-installer) from 4.1.1 to 4.1.2.
- [Release notes](https://github.com/sigstore/cosign-installer/releases)
- [Commits](
|
||
|
|
3066a9b2c5 |
chore(deps): cover audited test dirs in dependabot config (#3913)
Two test directories audited by .github/workflows/npm-audit.yml were missing from .github/dependabot.yml:
- /tests/ui_tests/playwright
- /tests/accessibility_tests
So they only received Dependabot security alerts (via GitHub's GHSA scanner) and never the routine weekly version-bump PRs. That gap is why basic-ftp in tests/accessibility_tests had to be patched manually in #3896 instead of arriving as a normal Dependabot update.
Add both as daily npm trackers, matching the cadence of the other test-dir entries. |
||
|
|
7065b6b1b4 |
ci: weekly published-image smoke test with auto-issue on failure (#3890)
* ci: weekly smoke of main's compose against the published Docker Hub image
Complements compose-integration-test.yml (#3886). That workflow builds the LDR image from the working tree — it tests "this PR's code's compose with this PR's code's image". This new workflow tests "main's compose with the currently-published localdeepresearch/local-deep-research:latest" — the exact artefact users get when they follow the README quickstart:
    curl -O .../docker-compose.yml && docker compose up -d
The drift between those two is real. Whenever a compose change lands on main but the image hasn't been republished (which happens between every release), users following the quickstart can hit a broken stack — the same class of bug as #3874, but only visible against the published image.
Cadence: weekly Monday 05:00 UTC. The failure modes are slow-moving and weekly burns ~1/4 the CI minutes a daily run would. The PR-time / release gate test in #3886 covers the per-change cases.
On schedule failure, opens (or comments on) a tracking issue with run URL, container digests, and a triage checklist. Stable title prefix dedups across weeks; manual workflow_dispatch runs do NOT auto-create issues (those are for ad-hoc testing).
Reuses the same wait/probe/teardown logic as compose-integration-test.yml, intentionally not factored into a composite action — two workflows, ~50 lines of shared shell, refactoring for DRY would cost more than it saves right now and the loops will diverge as we tune them.
* ci: require LDR healthy in published-image smoke test
Same fix as #3886 commit |
||
|
|
b829c2d65a |
ci(compose-integration): hardening follow-ups (--no-build + drop curl -f) (#3898)
* ci(compose-integration): add --no-build to docker compose up
Defense-in-depth flag from AI review on #3890 (where the same fix landed in commit |
||
|
|
bf43e7c328 |
chore(deps): bump actions/setup-node from 4.4.0 to 6.4.0 (#3814)
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4.4.0 to 6.4.0.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](https://github.com/actions/setup-node/compare/v4.4.0...48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e)
---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-version: 6.4.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com> |
||
|
|
12c01cd44c |
chore(deps): bump actions/github-script from 8.0.0 to 9.0.0 (#3812)
Bumps [actions/github-script](https://github.com/actions/github-script) from 8.0.0 to 9.0.0.
- [Release notes](https://github.com/actions/github-script/releases)
- [Commits](https://github.com/actions/github-script/compare/v8...3a2844b7e9c422d3c10d287c895573f7108da1b3)
---
updated-dependencies:
- dependency-name: actions/github-script
  dependency-version: 9.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: LearningCircuit <185559241+LearningCircuit@users.noreply.github.com> |
||
|
|
4540adaac2 |
ci: full docker-compose integration test + drop ollama model pre-pull (#3886)
* ci: add full docker-compose integration test to release gate
Brings up the bundled docker-compose.yml end-to-end (searxng + ollama +
local-deep-research) and asserts the whole stack reaches healthy/serving.
This is the test that would have caught #3874 (cap_drop: ALL breaking
SearXNG) before users hit it — and the same class of bug whenever an
upstream image bumps its capability or healthcheck requirements.
Cost is bounded by scoping triggers carefully:
- pull_request: only when compose / Dockerfile / entrypoint scripts change
- schedule: daily at 03:00 UTC (offset from release-gate at 02:00)
- workflow_call: invoked from release-gate.yml so a release can't bypass it
We override MODEL=tinyllama:1.1b for the test (~640 MB) instead of the
default gemma3:12b (~7-8 GB). Users tune MODEL via env the same way; the
compose config under test is otherwise identical to what ships.
Wait loop fails fast on container exits rather than burning the full 12 min
budget, and dumps logs from all three services on any failure for triage.
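The fail-fast wait loop mentioned above is presumably shell in the workflow itself; the same control flow is sketched here in Python for illustration. The function name `wait_for_stack`, the state strings, and the injected `get_states` callable (standing in for whatever inspects the containers) are all assumptions.

```python
import time
from typing import Callable, Dict


def wait_for_stack(
    get_states: Callable[[], Dict[str, str]],
    timeout_s: float,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once every service reports "healthy".

    Raises RuntimeError as soon as any service reports "exited", and
    TimeoutError when the time budget runs out.
    """
    deadline = clock() + timeout_s
    while True:
        states = get_states()
        exited = sorted(name for name, s in states.items() if s == "exited")
        if exited:
            # Fail fast: an exited container will not become healthy later.
            raise RuntimeError(f"containers exited: {', '.join(exited)}")
        if states and all(s == "healthy" for s in states.values()):
            return True
        if clock() >= deadline:
            raise TimeoutError(f"stack not healthy after {timeout_s}s: {states}")
        sleep(interval_s)
```

Checking for exited containers before checking the deadline is what avoids burning the whole budget on a stack that has already died; the caller can dump service logs when either exception is raised.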
* ci: skip ollama model pull in compose integration test
The integration test verifies "compose up + healthy + LDR serves" — it does
not run inference. After #3885 the ollama healthcheck is `ollama list`
(model-agnostic), so pulling a model only adds ~1-2 min and a flake source
(Ollama Hub registry transients) without exercising anything the test
checks.
Layer a small override (.github/compose-ci-override.yml) that replaces the
ollama service's entrypoint with `ollama serve`. The base docker-compose.yml
is otherwise unchanged — capabilities, networking, healthchecks, depends_on,
ports all come from the file users actually run.
Wait budget drops 12 min → 6 min accordingly.
End-to-end inference, if we ever add it, belongs in a separate workflow
that's transparent about the cost and runs less frequently.
* fix(docker): drop ollama model pre-pull from compose
The bundled compose's ollama service overrode the image entrypoint with
scripts/ollama_entrypoint.sh ${MODEL:-gemma3:12b}, which pre-pulled a
multi-GB model on every fresh start. That had three problems:
1. Users running LM Studio / OpenAI / llama.cpp don't use ollama at all,
but every fresh boot still pulled gemma3:12b (~7-8 GB).
2. First-time setup wasted 5-10 min on a model selection the user may
not even want — gemma3:12b is a strong opinion baked into the compose.
3. CI integration tests (#3886) had to layer an override file just to
skip the pull, since the model isn't relevant to "stack-comes-up"
smoke testing.
Drop the entrypoint override entirely. The ollama image's default
entrypoint is `ollama serve`; that's all we need. The healthcheck
introduced in #3885 already probes the daemon (model-agnostic) so this
slots in cleanly. Also drops the now-unused `ldr_scripts:/scripts`
mount on the ollama service.
Behavior change for ollama users: the model is no longer pre-pulled on
boot. They pull explicitly (`docker exec ollama_service ollama pull X`)
or LDR pulls on first use. The first-research wait is the same total
time, just deferred to when the user actually triggers it instead of
blocking compose-up.
In #3886, removes the .github/compose-ci-override.yml workaround now
that the compose itself doesn't pull a model. The integration test
runs against the compose users actually run, with no test-only overrides.
The scripts/ollama_entrypoint.sh file is left in place — it's no longer
referenced from compose but may be useful for users who want a pre-pull
in their own deployments. Cleaning that up can be a separate follow-up
once we're sure no one depends on it.
* ci: drop redundant pre-pull step in compose integration test
`docker compose up -d` already pulls any image it doesn't have locally
(default pull_policy: missing). The separate `docker compose pull ollama
searxng` step was just for log clarity; it does the same work twice.
The LDR image is locally built and tagged in the previous step, so
`up -d` sees it's present and uses it as-is — no risk of compose
yanking our local image.
* ci: require LDR healthy (not just running) in compose integration test
Previous condition checked `ldr_h = "running"` but LDR has a Dockerfile-level
HEALTHCHECK at Dockerfile:306 (probing /api/v1/health), so docker inspect
returns the health status, not the state — i.e. "healthy", never "running".
The wait loop never matched and timed out at 6 min despite the stack being
healthy the whole time. CI run for evidence: log line
"[23:33:04] ollama=healthy searxng=healthy ldr=healthy" repeats for ~5 min.
Fix: require "healthy" for all three. ollama and searxng have compose-level
healthchecks; LDR has a Dockerfile-level one. The status() helper already
returns Health.Status when a healthcheck exists, so requiring "healthy" is
the right signal for all three.
Also retires the "LDR has no healthcheck" follow-up note from the PR body —
that was based on me checking the compose only, not the Dockerfile.
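The difference between container state and health status described above can be sketched as follows. This is a minimal illustration that assumes the dictionary shape of `docker inspect` JSON output (`State.Status` and `State.Health.Status`); the function names `container_status` and `stack_ready` are hypothetical and are not the workflow's actual `status()` helper.

```python
from typing import Any, Mapping


def container_status(inspect: Mapping[str, Any]) -> str:
    """Return Health.Status when a healthcheck exists, else State.Status."""
    state = inspect.get("State") or {}
    health = state.get("Health") or {}
    # A container with a HEALTHCHECK reports starting/healthy/unhealthy here.
    if health.get("Status"):
        return health["Status"]
    # Without a healthcheck only the lifecycle state (e.g. "running") is known.
    return state.get("Status", "unknown")


def stack_ready(containers: Mapping[str, Mapping[str, Any]]) -> bool:
    """Ready only when every service reports "healthy"."""
    return all(container_status(c) == "healthy" for c in containers.values())
```

Under this sketch, a service that has a healthcheck never reports "running", which is why a wait condition comparing against "running" would not match.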
* ci(compose-integration-test): drop pull_request and schedule triggers
Per the original design (and the conversation thread on #3886), this test
should only run via release-gate.yml. release-gate fires daily on its own
cron + on every release + on manual dispatch, which is exactly the
coverage we want.
Removing the pull_request trigger means PRs that touch docker-compose.yml
no longer pay 3-6 min per run for a test whose feedback isn't actionable
at PR time anyway. Removing the standalone daily schedule avoids
duplicating release-gate's own daily run.
The successful run on commit
|
||
|
|
d8034e27a4 |
feat(ci): declarative label set for PR triage (1/5) (#3857)
* feat(ci): introduce declarative labels for PR triage
Add .github/labels.yml + labels-sync.yml workflow (EndBug/label-sync@v2)
managing 7 new labels for PR triage: 4 persistent (external-contributor,
first-time-contributor, bot, needs-rework) and 3 lifecycle
(needs-codeowner-review, awaiting-author, awaiting-codeowner) that will
be toggled per-PR by a follow-up workflow.
Sync is additive (delete-other-labels: false) so the existing 75
labels are not touched. Workflow runs only on push to main when
labels.yml changes, plus workflow_dispatch for manual sync.
First PR of a 5-PR series introducing PR triage automation.
* fix(ci): pin actions and harden labels-sync workflow
Adds the missing contents:read permission (without it, actions/checkout
fails because explicit permissions: zeroes out unspecified scopes).
Brings the workflow into line with repo conventions used by every other
label/issues-write workflow:
- SHA-pin actions/checkout (v6.0.2), step-security/harden-runner (v2.19.1),
and EndBug/label-sync (v2.3.3); enforced by validate-image-pinning.yml.
- Add harden-runner first step with egress-policy: audit (matches 54/57
workflows including label-fixed-in-dev.yml).
- Move permissions to job scope; top-level permissions: {} for OSSF Scorecard.
- Add timeout-minutes: 5 (matches label-fixed-in-dev.yml).
- Use sparse-checkout for labels.yml only with persist-credentials: false.
- Document the deliberate omission of concurrency: (regression #3554/#3599).
|
||
|
|
fd94bf8945 |
chore(release): remove dead alias and ghost label refs from release.yml (#3871)
Three label entries in the changelog generator config no longer correspond
to active labels:
- `docs` (line 29): alias for `documentation`, which is already in the
same category. Being deleted as part of label cleanup.
- `CI/CD` (line 33): alias for `ci-cd`, same category. Same cleanup.
- `ai-review-requested` (line 79): ghost reference. Added in
|
||
|
|
3bf78baf07 | docs: fix API example links (#3852) | ||
|
|
56290b15c0 |
chore(deps): bump step-security/harden-runner from 2.19.0 to 2.19.1 (#3811)
Bumps [step-security/harden-runner](https://github.com/step-security/harden-runner) from 2.19.0 to 2.19.1.
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](
|
||
|
|
dee75bd2a5 |
chore(deps): bump github/codeql-action from 4.35.2 to 4.35.3 (#3815)
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.35.2 to 4.35.3.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](
|
||
|
|
c245a27090 |
feat(ci): add prerelease Docker image workflow for pre-release testing (#3761)
* feat(ci): add prerelease Docker image workflow for pre-release testing
Build a prerelease Docker image (prerelease-v{version}-{short_sha}) after
all quality gates pass, in parallel with the release approval step. The
image is pushed to Docker Hub for local testing before the official release
is published. Old prerelease tags are auto-deleted (best-effort) when the
production release completes.
- New prerelease-docker.yml: standalone workflow triggered by repository_dispatch
- release.yml: add short_sha output and trigger-prerelease-docker job
- docker-publish.yml: add best-effort cleanup of prerelease-* tags
* fix(ci): address review feedback on prerelease docker workflow
- Group >> "$GITHUB_STEP_SUMMARY" redirects (clears actionlint SC2129
pre-commit failure).
- Bump pinned actions to match the rest of the release pipeline:
step-security/harden-runner v2.17.0 -> v2.19.0 (4 sites),
aquasecurity/trivy-action v0.35.0 -> v0.36.0 (2 sites).
- Scope prerelease tag cleanup in docker-publish.yml to
prerelease-v${RELEASE_VERSION}-* so concurrent prereleases for other
versions and any unrelated prerelease-* tags survive.
- Correct Trivy SARIF comment (artifact-only, not GitHub Security tab).
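The scoped cleanup described in the list above might look roughly like this. It is a sketch only: the function name `tags_to_delete` and the sample tag names are hypothetical, and the real workflow performs the deletion against Docker Hub rather than on an in-memory list.

```python
from typing import Iterable, List


def tags_to_delete(tags: Iterable[str], release_version: str) -> List[str]:
    """Select only the prerelease tags that belong to the given release version."""
    # The trailing dash keeps 1.6.8 from also matching 1.6.80.
    prefix = f"prerelease-v{release_version}-"
    return [tag for tag in tags if tag.startswith(prefix)]
```

Prerelease tags for other versions, and unrelated `prerelease-*` tags, fall outside the prefix and are left alone.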
* fix(ci): bump harden-runner pin in trigger-prerelease-docker to v2.19.0
The new job inherited an older v2.17.0 pin while the rest of release.yml
(and docker-publish.yml) is uniformly on v2.19.0. Align them.
|
||
|
|
b632ca8ec4 |
feat(release): migrate to towncrier news fragments (#3773)
* feat(hooks): version-check staged release-notes against current release
When a release-notes file is staged, compare its version against the
current latest GitHub release (preferred) or __version__.py (fallback,
when gh CLI is unavailable). Warn if the staged version isn't ahead.
Semantics chosen so that __version__.py being ahead of releases (the
normal pre-release state) does NOT trigger a false alarm:
- vs latest release: file MUST be > release. Equality means the
version was already published — almost always a stale/duplicated
notes file. Warn.
- vs __version__.py (fallback): file MUST be >= version.py. Equality
is correct — that's the upcoming release. Only file < version.py
is suspicious. Warn.
The warning includes the source ("latest GitHub release" or
"__version__.py (gh unavailable)") and suggests the next patch /
minor / major versions.
Robustness:
- gh call has a 5s timeout and falls back gracefully on missing
binary, network failure, or no releases yet.
- Files with non-versioned names (e.g., a hypothetical README.md
inside docs/release_notes/) are skipped silently.
- Hook still always exits 0 — non-blocking nudge, never fails the
commit.
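The comparison rules above can be sketched as follows. This is an illustrative reconstruction, not the hook's code: the function names and the `source` values ("release", "version_file") are assumptions, and versions are assumed to be plain `X.Y.Z` strings.

```python
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse a bare X.Y.Z string into a comparable tuple."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def should_warn(staged: str, reference: str, source: str) -> bool:
    """Apply the rule matching where the reference version came from."""
    s, r = parse_version(staged), parse_version(reference)
    if source == "release":
        # Against a published release the file must be strictly ahead.
        return s <= r
    # Against __version__.py equality is the expected upcoming release.
    return s < r


def suggest_next(reference: str) -> List[str]:
    """Next patch, minor and major versions after the reference."""
    major, minor, patch = parse_version(reference)
    return [
        f"{major}.{minor}.{patch + 1}",
        f"{major}.{minor + 1}.0",
        f"{major + 1}.0.0",
    ]
```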
* feat(release): migrate to towncrier news fragments
LDR's PR throughput (~12 PRs/day, releases every 1–2 days, ~25–50 PRs
per release) made the shared docs/release_notes/<version>.md model
unworkable — every contributor was racing to edit the same file, and
the file's name kept moving as the version did.
Replace it with the standard towncrier flow used by Twisted, urllib3,
and pip:
- Each PR drops one fragment under news/<id>.<category>.md, where
<id> is the PR/issue number and category is one of: breaking,
security, feature, bugfix, removal, misc. Orphan fragments
(no PR/issue) use +<slug>.<category>.md.
- At release prep time the maintainer runs:
pdm run towncrier build --version <X.Y.Z> --yes
which renders fragments into docs/CHANGELOG.md and deletes them.
- The release workflow extracts the just-rendered section from
docs/CHANGELOG.md (via awk) and uses it as the human-narrative
input to the published release body, alongside the AI TL;DR and
the auto-generated PR list.
Existing docs/release_notes/{0.2.0,0.4.0,1.6.0,1.6.8,1.7.0}.md stay
untouched as historical record.
The pre-commit hook is rewritten to nudge for news/ fragments
instead of the old shared file. The version-check and staging-marker
scanner from PR #3768/#3773 are dropped — fragments don't carry
versions in their names, and towncrier's structural model removes
the staging-marker class of bug entirely. Filename validation
(category in allowlist, name matches expected pattern) is added so
typo'd categories don't silently vanish from the rendered output.
Includes news/3773.feature.md as the first fragment using the new
convention.
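The filename validation mentioned above could be sketched like this. The category allowlist comes from the commit message; the regular expression and the name `validate_fragment` are assumptions, and the pattern is deliberately simpler than everything towncrier itself accepts.

```python
import re
from typing import Optional

# Categories listed in the commit message.
CATEGORIES = {"breaking", "security", "feature", "bugfix", "removal", "misc"}

# <id>.<category>.md, where <id> is a PR/issue number or +<slug> for orphans.
# Hypothetical pattern for illustration only.
FRAGMENT_RE = re.compile(r"(?P<id>[0-9]+|\+[A-Za-z0-9_-]+)\.(?P<category>[a-z]+)\.md")


def validate_fragment(filename: str) -> Optional[str]:
    """Return None when the name is valid, else a short problem description."""
    match = FRAGMENT_RE.fullmatch(filename)
    if match is None:
        return "name does not match <id>.<category>.md"
    category = match.group("category")
    if category not in CATEGORIES:
        return f"unknown category {category!r}"
    return None
```

A misspelled category such as `3773.featur.md` is reported here instead of silently dropping out of the rendered notes.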
* fix(release): allowlist news/ fragments in .gitignore
The repo's whitelist-style .gitignore (`*` then `!<allow>`) was
silently ignoring news/<id>.<category>.md fragments, so the towncrier
migration's first fragment didn't make it into the previous commit.
Add `!news/**/*.md` next to the existing docs/ / examples/ allowlist
entries and re-add news/3773.feature.md.
* refactor(release): use per-version files instead of CHANGELOG.md
Replace the towncrier-on-CHANGELOG.md flow with per-version output
files at docs/release_notes/<version>.md, matching the existing
historical convention and dropping the awk extraction step from the
release workflow.
Towncrier doesn't support per-version filenames in [tool.towncrier]
config, so the maintainer now runs scripts/release/render-notes.sh
<version> at release prep time. The wrapper:
1. Calls `towncrier build --draft --version <X.Y.Z>` to render
fragments to stdout (no file mutation).
2. Captures the output into docs/release_notes/<X.Y.Z>.md.
3. `git rm`s the consumed fragments (deletion staged for commit).
4. Stages the new release-notes file.
Workflow changes:
- Sparse-checkout reverts from docs/CHANGELOG.md to docs/release_notes
- Body composition replaces awk section extraction with `cat
docs/release_notes/${RELEASE_VERSION}.md` — simpler, matches the
layout of historical pre-towncrier release notes (1.6.0.md etc.).
pyproject.toml changes:
- filename now points to docs/release_notes/_pending.md as a guarded
placeholder. Only sees writes if a maintainer bypasses the wrapper
script — clearly named so the mistake is recoverable.
- title_format=false suppresses the inline `## <version> (<date>)`
header. The release page already shows the version as title, and
per-version files don't need an inline version header either.
* fix(hooks): align staged-notice text with per-version-file flow
The previous commit refactored to per-version files, but the
pre-commit hook still pointed contributors at the old
docs/CHANGELOG.md target. Update the staged-notice text to reference
docs/release_notes/<version>.md and the wrapper script that produces it.
* fix(release): correct stale CHANGELOG.md comment and avoid orphan target file
Two follow-ups from review of the towncrier migration:
- pyproject.toml: the [tool.towncrier] block comment still described
the old `pdm run towncrier build --version <X.Y.Z> --yes` →
docs/CHANGELOG.md flow that was abandoned in
|
||
|
|
982d36fb96 |
fix(release): bump AI summary timeout + diagnose empty content (#3783)
* fix(release): bump AI summary timeout + diagnose empty content
The v1.6.8 release run hit two related failures in the AI TL;DR step:
curl: (28) Operation timed out after 120001 milliseconds
WARNING: AI response 2xx but no .choices[0].message.content
— skipping summary
The 120s --max-time was too tight for kimi-k2-thinking (and likely
other thinking models) on a multi-PR release prompt. The retry
succeeded HTTP-wise but returned a response without any extractable
content, so the release shipped without a TL;DR.
Three changes:
1. Default --max-time from 120s to 300s, configurable via
vars.AI_RELEASE_SUMMARY_MAX_TIME. Releases never fail because of
the AI step (the whole block is best-effort), but giving thinking
models five minutes is realistic.
2. Fall back to `.choices[0].message.reasoning_content` when
`.content` is empty. Some providers route thinking-model output
into the reasoning field. Cheap try-and-fall-back; no harm if the
field is missing.
3. When BOTH fields are empty, dump the response shape (top-level
keys, message keys, finish_reason, error field) to step logs.
Bounded to ~4 lines, but enough to debug the next failure
without rerunning.
Behavior unchanged when the call succeeds normally.
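Changes 2 and 3 above can be sketched in Python as follows. The workflow itself presumably does this in shell, so this is only an illustration of the fallback logic; the function names are hypothetical, and the response is assumed to be an already-parsed chat-completions style dictionary.

```python
from typing import Any, Dict, List, Optional


def extract_summary(response: Dict[str, Any]) -> Optional[str]:
    """Prefer message.content, then fall back to message.reasoning_content."""
    choices = response.get("choices") or []
    message = (choices[0].get("message") or {}) if choices else {}
    for field in ("content", "reasoning_content"):
        value = message.get(field)
        if isinstance(value, str) and value.strip():
            return value
    return None


def describe_shape(response: Dict[str, Any]) -> List[str]:
    """Summarize the response structure for the step log when nothing was extracted."""
    choices = response.get("choices") or []
    first = choices[0] if choices else {}
    message = first.get("message") or {}
    return [
        f"top-level keys: {sorted(response)}",
        f"message keys: {sorted(message)}",
        f"finish_reason: {first.get('finish_reason')}",
        f"error: {response.get('error')}",
    ]
```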
* fix(release): bump AI summary timeout default to 15 min
900s (15 min) covers thinking models on multi-PR release prompts
with comfortable headroom. Still configurable via
vars.AI_RELEASE_SUMMARY_MAX_TIME.
|
||
|
|
0f0707abea |
feat(release): prepend docs/release_notes/<version>.md to release body (#3768)
* feat(release): prepend docs/release_notes/<version>.md to GitHub release body
When a release is cut, look for docs/release_notes/${RELEASE_VERSION}.md
and prepend its prose to the GitHub release body. The auto-generated,
label-categorized PR list (from .github/release.yml + GitHub's
generate-notes API) is appended below a horizontal-rule + "## What's
Changed" heading. If the md file is missing, the workflow falls back
silently to auto-notes only — no failure.
The pre-commit hook recommend-release-notes.py already nudges
contributors to stage entries under docs/release_notes/ for substantial
changes, so this wires the end-to-end flow: contributor writes prose →
release publishes prose-first body.
* fix(release): address review findings on notes-prepend logic
Three fixes from the review pass:
1. Drop the manual "## What's Changed" heading. GitHub's
generate-notes API already emits it as the first line of the
auto-body (verified against v1.6.0). Manually inserting another
produced a duplicate heading in the published release.
2. Validate RELEASE_VERSION against a strict semver regex before
using it in a filesystem path. Defense-in-depth — RELEASE_VERSION
already comes from a Git refname or __version__.py, but neither
path validates strictly enough to rule out path traversal.
3. Wrap the gh api generate-notes call so its failure aborts the
step. `set -e` does NOT exit on a failing command substitution
inside an assignment — without this the workflow would silently
publish an empty/partial release body on a transient API error.
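The version validation in fix 2 can be illustrated with a short sketch. The commit does not show the workflow's actual regex, so the pattern below is an assumption (bare `MAJOR.MINOR.PATCH`, no `v` prefix, no leading zeros), and `release_notes_path` is a hypothetical helper.

```python
import re

# Assumed strict pattern: three dot-separated numbers without leading zeros.
SEMVER_RE = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def is_strict_semver(version: str) -> bool:
    """True only when the whole string is a bare X.Y.Z version."""
    return SEMVER_RE.fullmatch(version) is not None


def release_notes_path(version: str) -> str:
    """Build the notes path only after the version has been validated."""
    if not is_strict_semver(version):
        raise ValueError(f"refusing to use {version!r} in a filesystem path")
    return f"docs/release_notes/{version}.md"
```

Because the whole string must match, values containing `/` or `..` cannot reach the path construction.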
* feat(hooks): show release-notes staged notice with format tips
Two changes to recommend-release-notes.py:
1. Always inform the committer when a file under docs/release_notes/
is staged (was: silent). Tells contributors that the file ends up
in the GitHub release body via .github/workflows/release.yml — not
just archived as docs.
2. Embed format tips in both the staged notice and the missing-notes
reminder: no leading `# H1` (release title renders separately), use
`## sections`, mark BREAKING explicitly with an `### Impact`
subsection, link PRs, and strip staging markers before tagging.
* feat(release): prepend AI-generated TL;DR to release body
Adds a best-effort AI summary at the very top of the release body,
above the hand-written notes and the auto-generated PR list. Uses
OpenRouter with vars.AI_MODEL (same convention as ai-code-reviewer.yml,
default moonshotai/kimi-k2-thinking).
Behavior:
- Builds a prompt from the hand-written notes (if any) plus the
auto-generated PR list, asking the model for a 30-second TL;DR
starting with `## TL;DR`.
- Up to 2 attempts (1 retry) with a 5s backoff to absorb transient
API hiccups.
- If OPENROUTER_API_KEY is unset, the call fails, or the response
is unparseable, the step skips the AI section silently. Releases
must never fail because of an LLM hiccup.
Body order on the published release:
1. AI TL;DR (if generated)
2. Hand-written docs/release_notes/<version>.md (if present)
3. Horizontal rule (only if 1 or 2 was emitted)
4. Auto-generated `## What's Changed` PR list
Configuration knobs (all optional, all have sensible defaults):
- secrets.OPENROUTER_API_KEY — enables the AI step
- vars.AI_MODEL — model id (default kimi-k2-thinking)
- vars.AI_RELEASE_SUMMARY_TEMPERATURE (default 0.3)
- vars.AI_RELEASE_SUMMARY_MAX_TOKENS (default 4000)
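The body order listed above can be sketched as a small composition function. This is an illustration under assumptions: the function name `compose_body`, the `---` rule, and the blank-line separators are hypothetical choices, not taken from release.yml.

```python
from typing import List, Optional


def compose_body(
    auto_notes: str,
    ai_summary: Optional[str] = None,
    hand_written: Optional[str] = None,
) -> str:
    """Assemble the release body: TL;DR, hand-written notes, rule, PR list."""
    parts: List[str] = []
    if ai_summary:
        parts.append(ai_summary.strip())
    if hand_written:
        parts.append(hand_written.strip())
    # The horizontal rule appears only when something precedes the PR list.
    if parts:
        parts.append("---")
    parts.append(auto_notes.strip())
    return "\n\n".join(parts)
```

When both optional sections are missing, the result is just the auto-generated list, matching the silent fallback described earlier in this commit.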
* feat(hooks): warn when staged release-notes contain staging markers
When a file under docs/release_notes/ is staged, scan its content
(via git show :path) for in-progress staging language and print a
yellow warning listing each hit by line number. Non-blocking — the
hook still always exits 0.
Markers checked:
- "(pending)"
- "Staging notes"
- "Fold into the next tagged version"
These would otherwise publish verbatim into the GitHub release body,
since release.yml prepends the file as-is. The warning lets a
contributor catch leftover staging text before they push the
release commit.
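The marker scan described above can be sketched as follows. The marker strings are the ones listed in the commit message; the function name and the case-sensitive substring match are assumptions about how the hook compares them.

```python
from typing import List, Tuple

# Marker strings listed in the commit message.
MARKERS = ("(pending)", "Staging notes", "Fold into the next tagged version")


def find_staging_markers(text: str) -> List[Tuple[int, str]]:
    """Return (line number, marker) pairs for every hit, 1-indexed."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for marker in MARKERS:
            if marker in line:
                hits.append((lineno, marker))
    return hits
```

A caller would print each hit as a warning and still exit 0, in keeping with the non-blocking behavior of the hook.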
* fix(release): clarify semver-regex comment + remove prompt heading contradiction
Two follow-ups from the latest AI review:
1. The semver-regex comment was misleading. Updated to spell out that
RELEASE_VERSION is the bare semver (no `v` prefix) because the build
job's "Determine version" step strips `refs/tags/v` and reads
__version__.py (bare "1.6.7"), so a leading `v` reaching this check
means an upstream contract change and should hard-fail. Behavior of
the regex itself is unchanged — it correctly rejects `v1.7.0`.
2. The SUMMARY_PROMPT had a self-contradiction: "Open with the literal
heading `## TL;DR`" (H2) vs "no headings above level 3" (which would
forbid H2). Reworded to be explicit: use `## TL;DR`, `###` for
subsections, no `#` and nothing deeper than `###`.
|
||
|
|
89aa0228b8 |
chore(labels): update GitHub labels for release automation clarity (#3764)
- Remove emoji variant `maintenance 🔧` from workflows and release.yml
in favor of plain `maintenance` label
- Replace deleted `automated`/`version-bump` labels in version_check.yml
with `automation`/`maintenance`
- Add 7 previously-uncategorized labels to release.yml categories:
css, ui-ux, accessibility, snappy → Frontend Changes
dev-bugfix → Bug Fixes
dev-enhancement → New Features
developer-experience → Code Quality & Refactoring
- Update workflows README documentation
Accompanied by GitHub label operations (via gh CLI, not in this commit):
- Created 4 missing labels: bugfix, docs, CI/CD, ai-review-requested
- Deleted 17 junk/duplicate labels
- Updated descriptions for all 72 labels with release note priority info
|
||
|
|
1abd8b9138 |
fix(ci): use pdm lock instead of pdm update in dependency workflow (#3755)
The workflow used `pdm update -u --no-sync --no-self` which caused `pdm lock --check` to fail in pre-commit CI. The `-u` (--unconstrained) flag modifies pyproject.toml with relaxed version constraints, but the workflow only commits pdm.lock — discarding those pyproject.toml changes. When `pdm lock --check` re-resolves against the original pyproject.toml, it produces a different lockfile, causing the check to fail. Using `pdm lock` fixes this because it uses the same resolution code path as `pdm lock --check`, respects pyproject.toml constraints, and never modifies pyproject.toml. |
||
|
|
ff1cda3e7c |
fix(ci): give gh CLI repo context in monitor-publish (#3742)
monitor-publish has no checkout step, so the gh CLI cannot infer the repository — and it does not fall back to GITHUB_REPOSITORY. Every `gh run list` call therefore failed with "failed to determine base repo", the error was swallowed by `2>/dev/null`, and the polling loop saw an empty result for all 40 minutes. Result: every release produced a false "Partial publish failure" issue showing both Docker and PyPI as `timed_out`, even though both publishes succeeded. Set GH_REPO from github.repository, and stop hiding gh's stderr so future failures are visible in the runner log instead of silent. |
||
|
|
5a3705c7d2 |
ci: temporarily disable nuclei DAST scan from release gate (#3720)
Nuclei scan is causing significant slowdowns. Commenting it out of the release gate so the current release can proceed. The scan workflow file (.github/workflows/nuclei.yml) is left intact for easy re-enablement. |
||
|
|
37335c907d |
ci(playwright-webkit): drop checks: write to satisfy Scorecard (#3704)
Removes the `checks: write` job-level permission from both the desktop-safari and mobile-safari jobs in playwright-webkit-tests.yml. The permission was only needed by the EnricoMi/publish-unit-test-result-action "Publish Test Results" step in each job, which is also removed. Test results remain available via the "Upload Playwright Report" artifact step (already uploads test-results/results.xml). Failing tests still fail the job in the "Run ... Safari Tests" step, so the release-pipeline gate is unchanged. Closes Scorecard alerts #7715, #7716. |