ci(release): build-once-promote refactor for Docker pipeline (#3977)

* ci(release): build-once-promote refactor for Docker pipeline

Today the release pipeline builds the Docker image twice: once in prerelease-docker.yml for "testing" and again in docker-publish.yml for the actual release. The image you tested is not the image you ship: base layer patches, transitive deps, and apt/pip resolution can diverge between the two builds.

This refactor makes prerelease-docker.yml the canonical build and turns docker-publish.yml into a thin retag step. `docker buildx imagetools create` is a registry-side metadata operation that takes seconds and preserves the manifest digest, so the released image is bit-identical to the one tested. Cosign signatures, SBOM attestations, and SLSA provenance are stored at sha256-<digest>.{sig,att} keyed by digest, so signing once in prerelease covers the release tags transitively.

Pipeline shape changes:

- prerelease-docker.yml is now a reusable workflow (workflow_call) called from release.yml. It builds, scans (Trivy), signs (cosign), attests the SBOM (cosign attest --type spdxjson, replacing the deprecated cosign attach sbom), and emits SLSA provenance. The manifest_digest is exposed as a workflow output. The `prerelease` environment gates the first build job for human approval.
- docker-publish.yml shrinks from ~457 to ~250 lines. It receives source_tag and expected_digest in the dispatch payload, verifies the source digest before retag, retags via imagetools create, verifies the digest is preserved (defense against re-encoding), re-runs Trivy against the digest (catches CVE-DB updates between prerelease and promote), verifies the cosign signature transitivity, and runs the existing prerelease cleanup loop.
- release.yml adds prerelease-docker to create-release.needs and trigger-workflows.needs, so the GitHub Release and the publish dispatch only happen after the canonical Docker build completes. The dispatch payload now carries source_tag and expected_digest.
  A new cleanup-on-rejection job removes orphan prerelease tags and cosign artifacts when the release is rejected (without it, every rejection would leave dangling sha256-<digest>.{sig,att} on Docker Hub).
- README cosign verify example updated to the keyless invocation users actually need (identity regex pointing at prerelease-docker.yml, --certificate-oidc-issuer, --certificate-github-workflow-repository), plus the SBOM verify-attestation command.

Notable design decisions (verified across multiple subagent review rounds):

- SLSA provenance entryPoint stays as release.yml (the top-level caller). Per the SLSA GHA buildtype v1 spec and the canonical slsa-github-generator behavior, reusable workflows are explicitly NOT entryPoints; pointing at prerelease-docker.yml would break verifier policies that allowlist trigger workflows.
- Cosign cert identity for verification matches Fulcio's SAN URI, which is built from job_workflow_ref, the CALLEE for reusable workflows. So the identity regex matches prerelease-docker.yml even though the build is invoked from release.yml. Hardened with escaped dots, refs/(heads|tags)/ constraint, and --certificate-github-workflow-repository to defend against the reusable-workflow-identity-reuse class of attacks.
- cleanup-on-rejection uses an allowlist if (failure || cancelled), not a denylist (!= 'success'), to avoid firing on `skipped` (e.g. when release_exists short-circuits the run). It also fails loudly on 401/403 from the Docker Hub API so a missing Delete scope on the PAT can't silently let orphans accumulate.

Supersedes #3969 (split-environment): the env split is preserved by the new structure, with the prerelease env on the called workflow's first job and the release env on create-release/trigger-workflows.

Pre-merge checklist for the maintainer:

- Create the `prerelease` environment in GitHub Settings with the same required reviewers as `release`.
  Without it, the called workflow's approval gate auto-creates the env with no protection rules and silently approves the build.
- Verify DOCKER_USERNAME / DOCKER_PASSWORD remain repo-level secrets (they currently are). Environment-scoped secrets do not propagate across reusable workflow calls except via the called job's own environment.

* ci(release): fixes from multi-round subagent review

Round 1 surfaced 14 candidate findings; Round 2 verified 7 as real bugs and refuted 4 as false positives. This commit applies the verified fixes.

CONFIRMED bugs fixed:

1. **Approval gate was per-job, not workflow-wide.** The previous `environment: prerelease`, set on `build-amd64` only, still let `build-arm64` and `security-scan` run pre-approval (GitHub environments are job-scoped per docs + community/discussions/174381). Replaced with a sentinel `approval-gate` job that all three build jobs `needs:`. Single approval click still gates everything, but now actually blocks all parallel jobs.
2. **`cleanup-on-rejection` if-condition missed the prerelease-rejection path.** When prerelease-docker.result was `failure`, both create-release and trigger-workflows became `skipped` (their `if:` requires success), and the cleanup `if:` only fired on `failure`/`cancelled` of dependents. Added explicit `prerelease-docker.result == 'failure'` clause so the most common rejection path actually triggers cleanup.
3. **Trivy re-scan ran AFTER retag.** A failing scan would leave release tags `:1.6.9`, `:1.6`, `:latest` publicly published with no rollback. Reordered: scan source digest BEFORE retag. Content is bit-identical (same digest), so scanning the prerelease tag tests what would be promoted, but failure now leaves no public broken tags. Also moved cosign verify before retag for the same reason.
4. **Trivy only scanned linux/amd64 by default** against a manifest list digest (per Trivy docs + aquasecurity/trivy#7847).
   Replaced single scan with two explicit per-platform invocations (`--platform linux/amd64`, then `linux/arm64`) so arm64 layers are also gated by the freshness check.
5. **Trivy DB freshness wasn't guaranteed.** apt-installed Trivy may use a stale embedded DB. Added explicit `trivy image --download-db-only` before the scans so the CVE-DB freshness window the re-scan exists for is actually exercised.
6. **`cosign attest` re-runs accumulated attestation layers** (verified via cosign 2.x `mutate.go` `dedupeAndReplace`). Added `--replace` to both attest calls (SLSA provenance + SBOM). Sigstore spec allows multi-sig so `cosign sign` is left as-is.
7. **SLSA provenance values inherited from old code were misleading.**
   - `builder.id`: changed from `https://github.com/actions/runner` (the agent binary) to the workflow ref the build is actually defined in (per SLSA v0.2 spec: builder.id should be a verifiable trust root).
   - `completeness.{parameters,environment,materials}`: flipped from `true` to `false`. The predicate captures no workflow_call inputs, no environment, and the build does network I/O, so claiming completeness was a public signed false statement.
   - `buildInvocationId`: now includes `${run_id}-${run_attempt}` so re-runs are distinguishable.

REFUTED (kept as-is, with confidence):

- `imagetools create` does NOT change the digest in this case. Buildx's Combine() in util/imagetools/create.go has an explicit short-circuit for single-source manifest-list inputs that returns the bytes byte-for-byte (no annotations + same registry required, both true here).
- Concurrent rejection digest collision is not a real concern: Docker builds in this pipeline are not bit-deterministic (apt, network, file timestamps, default provenance attestations all vary).
- The `prerelease-v1.6.9-*` cleanup pattern does NOT collide with `prerelease-v1.6.91-*` (trailing dash in the prefix disambiguates).
- Reusable-workflow approval prompts appear inline on the caller run page for single-level calls, so this is not a UX regression.

* ci(release): revert most Round 2 review additions

Keep the build-once-promote refactor's structural shape but back out the defensive additions from commit 68606b299:

- approval-gate sentinel job → revert to `environment: prerelease` on build-amd64 only
- SLSA builder.id, completeness flags, buildInvocationId → revert to inherited values from the previous docker-publish.yml
- `cosign attest --replace` → drop, accept default append behavior
- Pre-promote Trivy + multi-platform scans + db refresh + pre-promote cosign verify → revert to single post-promote scan and post-promote cosign verify
- cleanup-on-rejection if-condition → drop the `prerelease-docker.result == 'failure'` allowlist clause

Rationale: keep the change set minimal vs main. The defensive additions were correct in isolation but expand the scope of this PR.

* fix(ci): drop invalid --trivyignores flag from raw trivy CLI invocation

The Round 2 promote step used `--trivyignores .trivyignore`, which is the INPUT name of the aquasecurity/trivy-action wrapper, not a flag of the raw Trivy binary. The CLI accepts only `--ignorefile` (singular) and auto-loads `.trivyignore` from cwd by default. As-was, every release run would hard-fail with `unknown flag: --trivyignores` from cobra/pflag before any scanning occurred.

Removing the flag is sufficient: Trivy auto-loads the ignorefile from the checkout root. prerelease-docker.yml is unaffected: it uses the action wrapper with `trivyignores: '.trivyignore'` as input, which IS correct usage for the action layer (it translates to --ignorefile internally via TRIVY_IGNOREFILE).
Sources:
- https://trivy.dev/latest/docs/references/configuration/cli/trivy_image/
- https://github.com/aquasecurity/trivy-action/blob/master/action.yaml

* ci(release): apply remaining bugs from multi-round review

After Round 4 verification confirmed several deferred findings, applying the bug fixes the user explicitly requested:

1. Re-introduce the `approval-gate` sentinel job in prerelease-docker.yml. GitHub Actions environments are job-scoped, so without a gate sentinel `build-arm64` and `security-scan` would run pre-approval, pushing the `-arm64` per-arch tag and consuming Trivy minutes regardless of whether the maintainer approved or rejected the gate. Single approval click still gates everything via `needs: [approval-gate]`.
2. Fix the SLSA `builder.id` to use `${{ github.workflow_ref }}` instead of the inherited `https://github.com/actions/runner` agent identity. `workflow_ref` resolves to the canonical `<owner>/<repo>/.github/workflows/<file>.yml@<callee-ref>` format that matches slsa-github-generator's output and that verifier policies can pin against.
3. Flip SLSA `completeness.{parameters,environment,materials}` from `true` to `false`. The predicate captures no workflow_call inputs, no environment, and the build does network I/O, so claiming completeness was a public signed false statement.
4. Add `${{ github.run_attempt }}` to the SLSA `buildInvocationId` so "Re-run failed jobs" attempts are distinguishable.
5. Expand `cleanup-on-rejection` `if:` to include `prerelease-docker.result == 'failure'` and `'cancelled'`. Without these clauses, the most common rejection path (env approval rejected for prerelease) leaves dependents `skipped`, which the existing allowlist doesn't match, so orphan tags persist on Docker Hub forever.
6. Drop unused `packages: write` from both the called workflow and the caller's reusable-workflow block. Docker Hub auth uses DOCKER_PASSWORD, not GITHUB_TOKEN; `packages: write` only matters for ghcr.io which the project doesn't use.
7.
   Update `docs/CI_CD_INFRASTRUCTURE.md` Build & Deploy table to reflect the build-once-promote split.
8. Update `docs/RELEASE_GUIDE.md` "Automatic Publishing" section to describe both approval gates (`prerelease` and `release`).

* ci(release): R5/R6 review fixes - cosign pin, multi-arch SBOM, orphan SBOM

Round 5 (10 agents) and Round 6 (5 agents debunking) verified these findings, all of which are now applied:

1. **Pin cosign to v2.6.0**. R6A2 verified that `sigstore/cosign-installer@v4.1.2` ships cosign v3.0.6 by default. cosign v3 enables `--new-bundle-format` ON BY DEFAULT, which changes the on-wire signature/attestation format. Sign and verify still agree in-pipeline (both on v3), but downstream verifiers running the README cosign-verify recipe on v2 would fail. Pinning all three cosign-installer steps to v2.6.0 keeps the legacy tag-based sigstore format until we deliberately migrate the entire ecosystem.
2. **Multi-arch SBOM via per-arch attestations**. R6A3 verified the claim (anchore/syft#1708, actions/attest-sbom#60): syft against a manifest list digest only scans the host platform's layers. The previous SBOM attestation against the manifest digest claimed to describe both amd64 + arm64 but actually only enumerated amd64. ARM64 consumers were verifying a misleading SBOM. Fix: iterate over manifest entries from `imagetools inspect --raw`, run `syft --platform <plat>` against each per-arch digest, and `cosign attest --replace --type spdxjson` each per-arch SBOM against the per-arch digest. ALSO keep a manifest-list-level SBOM (host arch only) so end-users running `cosign verify-attestation user/img:latest` don't get an empty result.
3. **Re-add `--replace` to cosign attest** (both SLSA and SPDX). R5A7's deeper analysis enumerated specific failure modes beyond cosmetic clutter: Kyverno `count: 1` policies, registry layer count caps, audit ambiguity (verify returns success on first matching layer), Rekor entry bloat.
   R3A5 already confirmed `--replace` is per-predicate-type, so SLSA and SPDX attestations don't disturb each other.
4. **Container-image SBOM no longer orphaned**. R6A4 verified: the Syft-produced container SBOMs were uploaded as artifact `sbom` from prerelease-docker.yml but never downloaded by `create-release`, so they were invisible on the GitHub Release page. Fix: download the `sbom` artifact, rename to `sbom-container-*` to disambiguate from the filesystem `sbom-spdx.json`, and attach to `gh release create`.
5. **Narrow `secrets: inherit` to explicit secrets**. R5A3 flagged that `secrets: inherit` propagates ALL repo secrets (PAT_TOKEN, OPENROUTER_API_KEY, SERPER_API_KEY, GITHUB_TOKEN) into a workflow that only needs Docker Hub creds. Replaced with explicit `DOCKER_USERNAME` + `DOCKER_PASSWORD` mapping; the called workflow now declares these as required `workflow_call.secrets`.
6. **Drop unused `DEPS_HASH` build-arg**. R5A2 confirmed it was declared in the Dockerfile but never referenced in any RUN/COPY, so it never busted the Docker layer cache. Cache invalidation already happens correctly via `COPY pdm.lock` (file content hash). Removed the ARG declaration from Dockerfile and the three `build-args:` passes from prerelease-docker.yml.

R6 also REFUTED two earlier claims:

- R5A8's concurrency claim: reusable workflows DO share the caller's `workflow_run` and concurrency group (R3A8 was correct). Don't add a `concurrency:` block to prerelease-docker.yml; it would create a separate group and re-introduce the race R5A8 imagined.
- R5A10's harden-runner CVE claim: v2.19.1 (used here) is well after the fix versions for both CVE-2026-32946 (v2.16.0) and CVE-2026-25598 (v2.14.2). No bump needed.

* ci(release): R7 fixes - cosign v2.6.3, drop misleading manifest-level SBOM

Round 7 (5 agents) verified the R5/R6 fixes and surfaced two real bugs:

1.
   **cosign-installer pinned cosign v2.6.0**, which has two known security advisories: GHSA-whqx-f9j3-ch6m (fixed in v2.6.2) and GHSA-w6c6-c85g-mmv6 (fixed in v2.6.3). Bumped pin to v2.6.3 in all three workflow files so the install step picks up the fixes. Same minor (v2.6.x), so no flag drift: `--replace`, `--type`, `--bundle`, `--certificate-*` all behave identically.
2. **The manifest-level SBOM attestation was misleading**. The previous step ran `syft <repo>@<manifest-list-digest>` on an amd64 runner, which (per anchore/syft#1708) only enumerates amd64 layers. The SBOM was then attested at the manifest-list digest where it was discoverable by ALL platform consumers, so an arm64 user verifying `:latest` would receive a signed SBOM that lies about the layers they actually pulled. The per-arch loop already produces accurate per-platform SBOMs; the manifest-level fallback only re-introduced the lie for UX convenience. Dropped the manifest-level attest call entirely. Per-arch SBOMs are the only honest representation.
   Updated the README's `cosign verify-attestation` recipe to resolve to the per-platform digest first (using `jq` over `imagetools inspect --raw`), so end-users on either architecture get the SBOM that actually describes what they pulled. Removed `sbom.spdx.json` from the workflow artifact + release-staging logic since it no longer exists.
3. **Empty-loop assertion**: added a defensive count check before the per-arch SBOM loop. If a future buildx output change ever produced zero per-arch entries (e.g., all entries marked architecture: unknown), the previous code would silently skip the loop and pass CI green with no SBOMs. Now it fails loud with the raw manifest dumped for debugging.

Note on round-7 reviewer's other concerns:

- "Pipe-to-while subshell scope": confirmed safe. set -euo pipefail inherited; failures in syft/cosign attest abort the subshell, and pipefail propagates to the outer step.
- "imagetools inspect --raw stability": the OCI image-index spec has been stable for ~7 years. The jq filter handles the BuildKit attestation pseudo-entries via `architecture != "unknown"`.
- "harden-runner v2.19.1 CVEs": false alarm. v2.19.1 is well above the fix versions (v2.16.0, v2.14.2). No bump needed.

* ci(release): R8 fixes from 8th review round

Round 8 (5 agents covering Dockerfile, npm/Vite, runtime image, edge cases, and post-fix smoke check) surfaced 7 real bugs the previous 7 rounds missed. All fixed here, plus a comment per user request.

1. **docker-publish.yml checkout pinned to released tag**. The promote step reads `.trivyignore` from cwd; a `repository_dispatch`-triggered checkout defaults to the default branch's tip, which can drift between prerelease scan and promote scan if `.trivyignore` is edited on main while the release awaits approval. Added `ref: ${{ github.event.client_payload.tag }}` to checkout.
2. **docker-publish.yml concurrency block added**. release.yml has its own concurrency, but docker-publish.yml is a separate workflow run. Two near-simultaneous publish-docker dispatches for the same release tag (e.g., a manual re-trigger after a transient Docker Hub 5xx) could interleave and have their cleanup-loop prefix-match deletions race each other. Group: `publish-docker-${{ github.event.client_payload.tag }}`, cancel-in-progress: false.
3. **publish.yml's frontend builder bumped from Node 20 → 24** to match `package.json`'s `engines: { node: ">=24.0.0" }`. Mismatched Node versions across the PyPI build (Node 20) and the Docker image (Node 24, installed via NodeSource) could resolve transitive deps differently and ship frontend assets that fail at runtime. Pinned to specific `node:24-alpine` SHA.
4. **HEALTHCHECK no longer leaks Python processes**.
   The old `urllib.request.urlopen(...)` had no Python-level timeout, so a hung-but-alive backend would freeze the probe until Docker's outer timeout SIGKILL'd it, leaving a Python process per probe interval leaking PIDs/FDs over time. Added `timeout=5` and an explicit `r.status == 200` check so non-200 2xx responses (e.g., from misconfigured proxies) don't pass.
5. **Removed broken `VOLUME /scripts/`**. /scripts is image content (the ollama entrypoint baked in by the layer below the VOLUME directive), not user state. A VOLUME on an image-populated path causes anonymous-volume accumulation on every `docker run` and silently shadows the script if a user ever bind-mounts it.
6. **Added `VOLUME /data`** so users who don't bind-mount don't silently lose research data + encrypted DBs on `docker rm`. The entrypoint creates the persistent state at /data/{logs,cache,encrypted_databases}, but without VOLUME the directory is part of the writable image layer.
7. **Stale comment in release.yml** (the SBOM download step) updated; it no longer mentions the manifest-level SBOM that was dropped in commit 33d69b4e4.

Plus one comment update per user request:

8. **`apt-get upgrade -y` rationale comment** added at the build-once-promote section of the Dockerfile (top stage), and cross-referenced from the other two `apt-get upgrade` sites (ldr-test stage and runtime stage). Documents that the trade-off of bit-for-bit reproducibility for always-fresh CVE patches is intentional, and explains how build-once-promote mitigates the reproducibility loss.

* ci(release): clean up per-arch cosign attestation orphans on rejection

Round 9 found that the per-arch SBOM attestations introduced in commit 11e702f7d (the multi-arch SBOM fix) live at `sha256-<per-arch-digest>.{sig,att,sbom}` keyed by the PER-ARCH manifest digests, not the manifest-list digest.
The cleanup-on-rejection job only knew the manifest-list digest, so on rejection paths the per-arch attestation artifacts were left orphaned on Docker Hub forever, and unreachable through any tag, since the per-arch leaf tags were also deleted.

Fix: before deleting the manifest tag, inspect it via `imagetools inspect --raw` to discover the per-arch digests, then queue per-arch `{sig,att,sbom}` deletions alongside the manifest-level cleanup. If the manifest tag doesn't exist (e.g., build failed before manifest creation), log a clear warning and proceed; the per-arch artifacts wouldn't have been created in that case anyway.

* ci(release): drop prerelease env gate - use single release approval

The `prerelease` environment approval was a holdover from when prerelease docker was a SEPARATE test build alongside the release build (two distinct artifacts, two distinct decisions). In the build-once-promote model the "prerelease" image IS the release image (just under a different tag), so gating the BUILD with a human approval is redundant: the only meaningful decision is whether the tested image becomes the official release.

Changes:

- Remove the `approval-gate` sentinel job in prerelease-docker.yml.
- Drop `needs: [approval-gate]` from build-amd64, build-arm64, and security-scan. They now run automatically once release.yml's security + CI gates pass.
- Update workflow comments in release.yml and prerelease-docker.yml to reflect the single-gate flow.
- Update RELEASE_GUIDE.md "Approval and Publishing" section: now describes ONE `release` env approval, not two.
- Update CI_CD_INFRASTRUCTURE.md row for prerelease-docker.yml.

The cleanup-on-rejection job is unchanged: its triggers still fire correctly on prerelease-docker `failure`/`cancelled` (build/sign/attest errors) and on create-release / trigger-workflows `failure`/`cancelled` (release env rejection). One fewer rejection path to consider, but the mechanism is the same.
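
The allowlist-style trigger that cleanup-on-rejection relies on can be modelled outside of workflow syntax. The following is only a sketch in Python, not the workflow's actual `if:` expression: the function names and the two example result maps are hypothetical, and the job names are the ones used in this message.

```python
# Values GitHub Actions reports in `needs.<job>.result`.
TRIGGERING_RESULTS = {"failure", "cancelled"}


def should_cleanup_allowlist(results: dict[str, str]) -> bool:
    """Fire only when a watched job explicitly failed or was cancelled."""
    return any(result in TRIGGERING_RESULTS for result in results.values())


def should_cleanup_denylist(results: dict[str, str]) -> bool:
    """The rejected alternative: fire on anything that is not 'success'."""
    return any(result != "success" for result in results.values())


# release_exists short-circuit: every job is skipped, nothing was pushed.
short_circuit = {"prerelease-docker": "skipped", "create-release": "skipped"}

# Build/sign/attest error: dependents are skipped, but tags may already exist.
build_failed = {"prerelease-docker": "failure", "create-release": "skipped"}
```

With these inputs the allowlist ignores `short_circuit` and fires on `build_failed`, while the denylist would also fire on the short-circuited run.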
Operational benefits:

- One fewer approval click per release
- One fewer GitHub Environment to create as a pre-merge setup step (no more "create the `prerelease` env in Settings before merging")
- Build completes during/after security gates, so the prerelease tag is ready by the time the maintainer is ready to test

* ci(docker-publish): group GITHUB_OUTPUT writes (shellcheck SC2129)

CI's actionlint hook (which runs shellcheck on workflow run blocks) flagged the 'Determine release tags' step for issuing five sequential `echo ... >> "$GITHUB_OUTPUT"` redirects. Grouped them into a single braced block + one redirect, per SC2129's recommendation.

* docs(release): correct approval flow after env-scoped secrets merge

After merging main, prerelease-docker.yml's four jobs declare `environment: release` (PRs #3978/#3983) because DOCKER_USERNAME and DOCKER_PASSWORD are env-scoped. That means the first `release` env approval now gates the canonical build, not just the publish step; the "automatic build then test then approve" flow described in earlier docs no longer matches reality.

- RELEASE_GUIDE.md: rewrite the approval section to describe two release-env approvals (release.yml + docker-publish.yml) and the narrow Docker-only test window between them.
- CI_CD_INFRASTRUCTURE.md: update the prerelease-docker.yml row.
- release.yml: rewrite the `prerelease-docker:` job comment to reflect that this step is gated, not automatic, and explain why.

* ci(release): atomic publish ordering - GitHub Release runs last (#4044)

* ci(release): make GitHub Release publishing atomic with Docker + PyPI

Before this change, `create-release` published the public GitHub Release BEFORE `docker-publish.yml` retagged and BEFORE `publish.yml` shipped to PyPI. If either downstream failed, the public Release pointed at non-existent artifacts. This change closes that window:

- Convert `docker-publish.yml` from `repository_dispatch` to `workflow_call`.
  Its result is now visible to release.yml as `needs.publish-docker.result`, which lets:
  * `create-release` block on Docker promote success
  * `cleanup-on-rejection` safely scope cosign artifact deletion to cases where retag failed (after a successful retag, release tags share the prerelease manifest digest, so cosign artifacts must stay; deleting them would invalidate release-tag verification)
- Keep `publish.yml` on `repository_dispatch`. PyPI Trusted Publishing matches the OIDC `workflow_ref` claim against the CALLER when invoked via `workflow_call`, so a reusable publish.yml would fail with `invalid-publisher`. Tracked in pypa/gh-action-pypi-publish#166 and pypi/warehouse#11096.
- Restructure release.yml job graph: prerelease-docker → publish-docker (reusable) → trigger-pypi → monitor-pypi → create-release (LAST)
- Rewrite `cleanup-on-rejection` with a partial-retag rollback preamble. `imagetools create -t :VERSION -t :MAJOR_MINOR -t :latest` is a single process with multiple registry calls, so a mid-step failure can leave some release tags landed. The cleanup script now checks each release tag against Docker Hub and rolls back any that exist BEFORE deleting cosign signature/attestation artifacts.
- Slim `monitor-publish` → `monitor-pypi` (only watches publish.yml now; Docker is tracked natively via the inline job result).
- Drop the workflow-level `concurrency:` block from docker-publish.yml. As a reusable workflow it shares release.yml's run, and release.yml's caller-level concurrency on `github.ref` already serialises releases for the same tag.
- Update `docs/CI_CD_INFRASTRUCTURE.md` workflow-table rows and `docs/RELEASE_GUIDE.md` approval-flow section to describe the new ordering, plus a "Recovery from PyPI failure" section documenting the one remaining atomicity hole (PyPI fails after Docker success: Docker release tags exist, no PyPI, no GH Release; manual re-dispatch needed).

Plan + 5-agent Round 1 review notes saved separately.
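
The ordering constraint in the partial-retag rollback (roll back landed release tags first, remove cosign artifacts only afterwards, and only when the retag did not succeed) can be sketched as a small planning function. This is an illustrative Python model, not the cleanup script itself: `plan_cleanup`, `tag_exists`, and the action names are hypothetical, and treating every non-`success` retag result as "delete cosign artifacts" is an assumption. Prerelease-tag cleanup is left out of the sketch.

```python
from typing import Callable, Iterable

Action = tuple[str, str]  # (action name, tag to act on)


def plan_cleanup(
    release_tags: Iterable[str],
    tag_exists: Callable[[str], bool],
    retag_result: str,
    digest: str,
) -> list[Action]:
    """Order cleanup so release tags are rolled back before cosign artifacts."""
    plan: list[Action] = []
    if retag_result == "success":
        # Release tags now share the prerelease digest, so the signature and
        # attestation artifacts must stay or release-tag verification breaks.
        return plan
    # A multi-tag retag can fail midway, so each tag is checked on its own.
    for tag in release_tags:
        if tag_exists(tag):
            plan.append(("delete_tag", tag))
    # Only once no release tag can point at the digest are these removed.
    prefix = digest.replace(":", "-")  # sha256:<hex> -> sha256-<hex>
    plan.append(("delete_tag", f"{prefix}.sig"))
    plan.append(("delete_tag", f"{prefix}.att"))
    return plan


# Example: the retag failed after only the version tag had landed.
landed = {"1.6.11"}
example_plan = plan_cleanup(
    ["1.6.11", "1.6", "latest"], landed.__contains__, "failure", "sha256:abc"
)
```

Returning a plan instead of performing deletions keeps the ordering easy to check in isolation; a real script would execute each step against the registry API.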
* fix(release): plug blockers found in multi-round PR review

Four fixes against the atomicity refactor: two blockers that would break the next release, two hardening items found while verifying them.

B1 (BLOCKING): docker-publish.yml checked out at `ref: inputs.tag` (e.g. v1.6.11), but the v* git tag is created by `create-release` which runs LAST in the job graph, after `publish-docker`. So on every push-to-main triggered release (the documented primary path) the checkout would fail with `fatal: couldn't find remote ref v1.6.11`. Switch to `ref: github.sha`: same triggering commit the build and prerelease-docker jobs used, exists at the moment publish-docker runs for every event type, and still satisfies the original goal of pinning .trivyignore to the scanned commit.

B2 (BLOCKING): cleanup-on-rejection referenced env-scoped DOCKER_USERNAME / DOCKER_PASSWORD but had no `environment: release`, so those secrets resolved to empty strings and the Docker Hub login exited 1, leaving the orphan tags + cosign artifacts the cleanup was meant to remove. Add `environment: release`. The `release` env approval was already granted upstream in the run, so no new prompt.

H1: monitor-pypi's `Wait for PyPI publish workflow to complete` step piped `gh run list | jq ...` without `set -euo pipefail`, so a transient gh failure (network, auth, rate limit) was swallowed by jq returning empty input, burning the full 40-minute budget on silent error rather than failing fast. Add `set -euo pipefail`.

H2: cleanup-on-rejection's step 2 did not delete the floating `:prerelease` tag. If a release was rejected after prerelease-docker re-pointed `:prerelease`, step 4 deleted the cosign signature for that manifest while `:prerelease` still pointed at it, yielding a window where pulling `:prerelease` returns an image the README cosign-verify recipe cannot verify. Include `prerelease` in step 2's delete loop; the next successful prerelease-docker re-creates it.
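
H1's failure mode, where an error from the status query is indistinguishable from "no result yet", is not specific to shell pipelines. As a rough Python analogue (not the workflow's shell step; `wait_for_conclusion` and `fetch` are hypothetical names), a poll loop can keep the two cases apart by letting query errors propagate:

```python
import time
from typing import Callable, Optional


def wait_for_conclusion(
    fetch: Callable[[], Optional[str]],
    budget_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until `fetch` returns a conclusion or the budget runs out.

    `fetch` returns None while the run is still pending and raises when the
    query itself fails. The exception is deliberately not caught, so a broken
    query aborts at once instead of being mistaken for "still pending".
    """
    waited = 0.0
    while waited < budget_s:
        conclusion = fetch()  # raises on query failure: fail fast
        if conclusion is not None:
            return conclusion
        sleep(interval_s)
        waited += interval_s
    raise TimeoutError(f"no conclusion after {budget_s:.0f}s")
```

The `sleep` parameter exists only so the loop can be exercised without real waiting.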
* chore(release): follow-up cleanups from PR review

Bundle of low-risk follow-ups from the multi-round review of this PR. All same-scope as the atomicity refactor: staleness this PR introduced in docs/comments, hardening adjacent to the changed code paths.

L1 (hardening): Drop `id-token: write` from `publish-docker` (caller) and `docker-publish.yml` `promote` (callee). cosign VERIFY is a read-only check against public Rekor/Fulcio; no GitHub OIDC token is minted, so the permission is unused. Signing (which DOES need the write) is exclusively in prerelease-docker.yml.

L7 (stale comments): prerelease-docker.yml's header comments still referenced `trigger-workflows`, a job this PR split into `publish-docker` + `trigger-pypi`. Replaced both occurrences.

L4 (doc): RELEASE_GUIDE.md "Emergency Procedures" claimed a manual GitHub release "still triggers PyPI/Docker", which is false under the new design (publish.yml is repository_dispatch-only and docker-publish.yml is workflow_call-only, neither listens on `release:` events). Replaced with the actual recovery hierarchy.

L5 (doc): RELEASE_GUIDE.md and CI_CD_INFRASTRUCTURE.md pipeline chains omitted the `provenance` job between `build` and `prerelease-docker`.

L6 (doc): RELEASE_GUIDE.md described monitor-pypi's timeout as a flat "40 min"; the inner poll loop is 40 min but the surrounding `timeout-minutes:` is 90 min, so the user-facing failure surface differs.

L4-bonus (doc): Manual-trigger section also claimed workflow_dispatch takes "version and prerelease flag" inputs, but release.yml's `workflow_dispatch:` has no inputs defined. Replaced with the actual behavior (reads __version__.py at HEAD; use tag-push for older versions).

M5 (doc): Both PAT_TOKEN comments overstated required scopes: they claimed `workflow` scope was needed (it isn't; it only governs editing .github/workflows/ via the API) and didn't make explicit that `public_repo` is rejected by `repository_dispatch`. Rewritten.
M8 (correctness): docker-publish.yml's cosign verify step targeted the mutable `:VERSION` tag instead of `@${EXPECTED_DIGEST}`. The preceding verify-promoted-tags step already confirms the tag resolves to the expected digest, but using the tag here leaves a tag-resolution TOCTOU window between the two steps. Trivy's re-scan already uses `@${EXPECTED_DIGEST}`; switching cosign to the same reference is consistent and race-free.

L2 (style): While editing the cosign step, routed `github.repository` through an `env:` var (`REPO`) instead of direct `${{ }}` template interpolation into shell args, matching the convention in the rest of this workflow.

* chore(ci): bump harden-runner pin in docker-publish.yml to match other workflows

Last remaining v2.19.1 reference: every other workflow in this PR was bumped to v2.19.3 when main moved forward. Auto-merge missed this one because the surrounding hunk was in a conflict region.

* chore(release): fixes from multi-round subagent review of the full PR

Bundle of low-risk fixes confirmed by 30 subagents across 3 rounds. None are blockers; all are worth fixing in-scope.

1. SLSA provenance builder.id: was github.workflow_ref, which inside a workflow_call callee resolves to the CALLER (release.yml), not the intended callee (prerelease-docker.yml). The Fulcio cert is still right (built from the job_workflow_ref OIDC claim), so cosign verify and slsa-verifier are unaffected, but raw-JSON consumers reading builder.id would see release.yml. Compose the value from github.repository + hardcoded path + github.ref instead: the `job` context has no workflow_ref property (actionlint confirms), and for a local-path workflow_call the callee's ref equals github.ref.
2. Dockerfile: set ENV LDR_DATA_DIR=/data so the VOLUME /data directive is actually load-bearing.
Without it, paths.py falls back to platformdirs (~/.local/share/local-deep-research) which is inside the ephemeral container layer — bare docker run -v vol:/data users would silently lose data on docker rm. 3. trigger-pypi: forward prerelease=false in client_payload. publish.yml gates Test PyPI vs prod PyPI on client_payload.prerelease == true; if absent, the expression evaluates to '' and falls through to prod. Set false explicitly to remove the silent-fallback landmine. 4. Stale/misleading cosign comments in release.yml: - line 322: said "v2.6.0" while value is "v2.6.3" — corrected and noted GHSA-w6c6-c85g-mmv6 patch coverage - line 332: attributed --bundle to v3.0.2+ but it's been in v2.4.0+ 5. release-gate.yml Node 20 → 24 (mirror publish.yml + Dockerfile). package.json declares engines.node >=24.0.0. The pip-install-check wheel is discarded so this was not a release-blocker, but the gate now validates the actual ship runtime. 6. README cosign-verify recipe: - Guard empty PLATFORM_DIGEST with a clear message for single-arch or pre-build-once-promote releases - Add docker buildx to prerequisites list - Spell out the legacy-verification substitution explicitly * fix(ci): pin Trivy in promote step via SHA-pinned action wrapper AI reviewer flagged docker-publish.yml's promote step as installing Trivy via `sudo apt-get install -y trivy` with no version pin, reintroducing a supply-chain risk to the release path. The prerelease scan in prerelease-docker.yml uses the SHA-pinned aquasecurity/trivy-action @ed142fd... wrapper with `version: 'v0.69.2'`, but the promote step switched to the bare CLI and lost that protection. Replace the apt-get install + raw `trivy image` invocation with the same pinned action wrapper. Same scan semantics (CRITICAL,HIGH, ignore-unfixed, .trivyignore, exit-code 1), same binary version (v0.69.2), same action SHA — keeps the two scans consistent and removes the unpinned apt path. 
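
For illustration only, the builder.id composition described in item 1 of the subagent-review fixes can be sketched in shell. This is not the workflow's actual step: the helper names `compose_builder_id` and `matches_callee_identity` are hypothetical, the repository slug is a placeholder, and the `https://github.com/` prefix is an assumption chosen to line up with the identity regex used by the cosign verify step in docker-publish.yml.

```shell
#!/bin/sh
# Hypothetical helper: build a builder.id for the reusable workflow from the
# repository slug and the git ref, rather than from github.workflow_ref
# (which names the caller inside a workflow_call).
# ASSUMPTION: the "https://github.com/" prefix; the commit only says the
# value is composed from repository + hardcoded path + ref.
compose_builder_id() {
  repo="$1"   # stands in for github.repository, e.g. "owner/name"
  ref="$2"    # stands in for github.ref, e.g. "refs/heads/main"
  printf 'https://github.com/%s/.github/workflows/prerelease-docker.yml@%s\n' "$repo" "$ref"
}

# Hypothetical check: does an id have the callee shape that the cosign
# identity regex in docker-publish.yml expects?
matches_callee_identity() {
  id="$1"
  repo="$2"
  printf '%s\n' "$id" |
    grep -Eq "^https://github.com/${repo}/\.github/workflows/prerelease-docker\.yml@refs/(heads|tags)/"
}

# Placeholder values, not the real repository.
BUILDER_ID=$(compose_builder_id "example-owner/example-repo" "refs/heads/main")
echo "$BUILDER_ID"
```

An id that points at release.yml (the caller) fails `matches_callee_identity`, which is the mismatch raw-JSON consumers of builder.id would otherwise have seen.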
* fix(ci): pin Trivy in release.yml build job (same fix as docker-publish.yml)

R4 review caught that the AI-reviewer-flagged unpinned Trivy install also exists in release.yml's `build` job, and is STRICTLY WORSE there because that job carries `id-token: write` (for cosign keyless signing of SBOMs). The attack chain that was open:

1. Aqua apt-repo compromise OR MITM of the unpinned GPG-key fetch
2. Malicious `trivy fs` binary installed
3. Binary exfiltrates ACTIONS_ID_TOKEN_REQUEST_URL/TOKEN env vars, minting an OIDC token under repo:LearningCircuit/local-deep-research
4. Binary tampers with sbom-spdx.json / sbom-cyclonedx.json contents
5. Next step `Sign release artifacts with Sigstore` cosign-signs the tampered SBOM with a legitimate Sigstore cert → fraudulent SBOM attached to the GitHub release with valid signature

Replace with the SHA-pinned aquasecurity/trivy-action@ed142fd0... (same pin as docker-publish.yml and prerelease-docker.yml) using scan-type=fs for the filesystem scan, with `version: 'v0.69.2'` to pin the binary itself. Two separate action invocations (one per output format) because the action takes a single format per run.

Also removes the unpinned `gpg --dearmor` of an unverified-fingerprint public key, which the prior comment misleadingly called "secure".

* fix(ci): use TRIVY_USERNAME/PASSWORD env vars for trivy-action auth

The trivy-action README prescribes TRIVY_USERNAME/TRIVY_PASSWORD env vars as the supported Docker Hub auth path. Even though docker/login-action already wrote ~/.docker/config.json earlier in the job (and Trivy reads it as a fallback), there's documented fragility with docker.io credential helpers (aquasecurity/trivy#432, aquasecurity/trivy#8385) that surfaces specifically on registry-pull scans like this one (unlike the prerelease scan, which uses a locally-loaded image).

The fallback would probably work today since localdeepresearch/local-deep-research is public (anonymous pull would succeed even without auth), but rate-limiting on anonymous Docker Hub pulls is aggressive and the documented credential-helper quirks are real. Adding the env vars uses the action's prescribed auth path, with the same DOCKER_USERNAME/DOCKER_PASSWORD secrets already passed in via workflow_call. Zero-cost defense-in-depth.
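
Both the Trivy re-scan and the cosign verify step address the image by immutable digest rather than by tag. A minimal sketch of building such a reference follows. The helper name `digest_ref` and the image name are hypothetical, and the 64-hex-character check is an assumption that is stricter than the workflow's own check, which only tests for the `sha256:` prefix.

```shell
#!/bin/sh
# Hypothetical helper: turn an image name and a digest into a digest-pinned
# reference (image@sha256:...), refusing anything that is not a sha256 digest.
# ASSUMPTION: requiring exactly 64 lowercase hex characters; the workflow
# itself only checks that the value starts with "sha256:".
digest_ref() {
  image="$1"
  digest="$2"
  if ! printf '%s\n' "$digest" | grep -Eq '^sha256:[0-9a-f]{64}$'; then
    echo "error: expected sha256:<64 hex chars>, got '$digest'" >&2
    return 1
  fi
  printf '%s@%s\n' "$image" "$digest"
}

# Placeholder digest made of 64 zeros; a real one comes from the build.
EXAMPLE_DIGEST="sha256:$(printf '%064d' 0)"
digest_ref "example/image" "$EXAMPLE_DIGEST"
```

A tag can be repointed between two steps, while a reference of this form always resolves to the same manifest, which is why a scan and a signature check that both use it are looking at the same artifact.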
2026-06-16 03:51:07 +03:00 · 2026-05-22 21:52:46 +02:00
parent 01a5b81a24
commit 1f0b0a4a95
9 changed files with 1185 additions and 573 deletions
--- a/.github/workflows/docker-publish.yml
+++ b/.github/workflows/docker-publish.yml
@@ -1,24 +1,69 @@
 name: Publish Docker image

-# SECURITY: Only triggered via repository_dispatch from release.yml
-# (after security gate passes). We intentionally do NOT support
-# workflow_dispatch because that would bypass security checks.
+# SECURITY: Only invoked as a reusable workflow (`workflow_call`) from
+# release.yml after the release security gate AND the `release` environment
+# approval pass. We intentionally do NOT support workflow_dispatch — that
+# would bypass gates. We also intentionally do NOT support
+# repository_dispatch any more: with the atomicity refactor, this workflow
+# runs as a job inside release.yml's run so its result is visible to
+# downstream jobs (create-release, cleanup-on-rejection) — a property that
+# repository_dispatch fanout broke.
+#
+# This workflow is a thin RETAG step in the build-once-promote pipeline.
+# The actual build happens in prerelease-docker.yml; the multi-arch manifest
+# is signed and attested there. Here we only:
+#   - Verify the source manifest digest matches what prerelease produced
+#   - Retag the prerelease manifest to release tags (:1.6.9, :1.6, :latest)
+#   - Verify the digest is preserved (defends against imagetools re-encoding)
+#   - Re-run Trivy against the digest (catches CVE-database updates between
+#     prerelease build and release promote)
+#   - Verify cosign signature transitivity from the original digest
+#   - Clean up the prerelease tags
+#
 # To re-publish, trigger a new release through release.yml.
 on:
-  repository_dispatch:
-    types: [publish-docker]
+  workflow_call:
+    inputs:
+      tag:
+        description: "Release tag, e.g. 'v1.6.9' (with leading 'v')"
+        type: string
+        required: true
+      source_tag:
+        description: "Prerelease manifest tag to retag, e.g. 'prerelease-v1.6.9-abc1234'"
+        type: string
+        required: true
+      expected_digest:
+        description: "sha256:... digest of the prerelease manifest, captured by prerelease-docker.yml. Used to verify retag preserves the digest end-to-end."
+        type: string
+        required: true
+    secrets:
+      DOCKER_USERNAME:
+        required: true
+        description: "Docker Hub username (env-scoped to `release`)"
+      DOCKER_PASSWORD:
+        required: true
+        description: "Docker Hub PAT with Read+Write+Delete scopes (env-scoped to `release`)"

 permissions: {}  # Minimal top-level for OSSF Scorecard Token-Permissions

+# NOTE: no workflow-level `concurrency:` block. As a reusable workflow
+# called from release.yml, this workflow runs as part of the caller's run,
+# and release.yml's caller-level concurrency (keyed on github.workflow +
+# github.ref) already serialises release runs for the same tag. Adding a
+# callee-level block would be a no-op for the documented threat model
+# (two simultaneous releases for the same ref) and would not be reachable
+# anyway because there is no longer any standalone invocation path.
+
 jobs:
-  build-amd64:
-    name: Build AMD64 Image
+  promote:
+    name: Retag prerelease manifest as release
    runs-on: ubuntu-latest
    environment: release
    permissions:
+      # cosign verify is read-only against public Rekor/Fulcio — does not
+      # mint a GitHub OIDC token, so id-token: write is not required here.
+      # Signing (which does need id-token: write) happens in prerelease-docker.yml.
      contents: read
-    outputs:
-      digest: ${{ steps.build.outputs.digest }}

    steps:
      - name: Harden Runner
@@ -26,172 +71,19 @@ jobs:
        with:
          egress-policy: audit

-      - name: Check out the repo
-        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-        with:
-          persist-credentials: false
-          fetch-depth: 0
-
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
-
-      - name: Log in to Docker Hub
-        uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
-        with:
-          username: ${{ secrets.DOCKER_USERNAME }}
-          password: ${{ secrets.DOCKER_PASSWORD }}
-
-      - name: Build and push AMD64 image
-        id: build
-        uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
-        with:
-          context: .
-          platforms: linux/amd64
-          push: true
-          tags: ${{ secrets.DOCKER_USERNAME }}/local-deep-research:amd64-${{ github.sha }}
-          cache-from: type=gha,scope=linux-amd64
-          cache-to: type=gha,mode=max,scope=linux-amd64
-          build-args: |
-            DEPS_HASH=${{ hashFiles('pdm.lock') }}
-
-  build-arm64:
-    name: Build ARM64 Image
-    runs-on: ubuntu-24.04-arm
-    environment: release
-    permissions:
-      contents: read
-    outputs:
-      digest: ${{ steps.build.outputs.digest }}
-
-    steps:
-      - name: Harden Runner
-        uses: step-security/harden-runner@ab7a9404c0f3da075243ca237b5fac12c98deaa5 # v2.19.3
-        with:
-          egress-policy: audit
-
-      - name: Check out the repo
-        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-        with:
-          persist-credentials: false
-          fetch-depth: 0
-
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
-
-      - name: Log in to Docker Hub
-        uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
-        with:
-          username: ${{ secrets.DOCKER_USERNAME }}
-          password: ${{ secrets.DOCKER_PASSWORD }}
-
-      - name: Build and push ARM64 image
-        id: build
-        uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
-        with:
-          context: .
-          platforms: linux/arm64
-          push: true
-          tags: ${{ secrets.DOCKER_USERNAME }}/local-deep-research:arm64-${{ github.sha }}
-          cache-from: type=gha,scope=linux-arm64
-          cache-to: type=gha,mode=max,scope=linux-arm64
-          build-args: |
-            DEPS_HASH=${{ hashFiles('pdm.lock') }}
-
-  security-scan:
-    name: Security Scan
-    runs-on: ubuntu-latest
-    environment: release
-    permissions:
-      contents: read
-
-    steps:
-      - name: Harden the runner (Audit all outbound calls)
-        uses: step-security/harden-runner@ab7a9404c0f3da075243ca237b5fac12c98deaa5 # v2.19.3
-        with:
-          egress-policy: audit
-
-      - name: Check out the repo
-        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
-        with:
-          persist-credentials: false
-          fetch-depth: 0
-
-      - name: Free disk space
-        run: |
-          # Remove unnecessary packages to free up space for Docker build + Trivy scan
-          # GitHub runners have limited space; Trivy needs to export the full image to /tmp
-          sudo rm -rf /usr/share/dotnet || true
-          sudo rm -rf /usr/local/lib/android || true
-          sudo rm -rf /opt/ghc || true
-          sudo rm -rf /opt/hostedtoolcache/CodeQL || true
-          sudo docker image prune --all --force || true
-          df -h
-
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
-
-      - name: Build Docker image for security scan
-        uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
-        with:
-          context: .
-          platforms: linux/amd64
-          push: false
-          load: true
-          tags: local-deep-research:security-scan
-          cache-from: type=gha,scope=linux-amd64
-          cache-to: type=gha,mode=max,scope=linux-amd64
-          build-args: |
-            DEPS_HASH=${{ hashFiles('pdm.lock') }}
-
-      # Generate SARIF report for GitHub Security tab (all severities, doesn't fail)
-      - name: Generate Trivy SARIF report
-        uses: aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25 # v0.36.0
-        with:
-          image-ref: local-deep-research:security-scan
-          format: 'sarif'
-          output: 'trivy-release-scan.sarif'
-          ignore-unfixed: true
-          exit-code: '0'
-          version: 'v0.69.2'
-
-      # Separate scan that fails build only on fixable HIGH/CRITICAL vulnerabilities
-      - name: Check for fixable HIGH/CRITICAL vulnerabilities
-        uses: aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25 # v0.36.0
-        with:
-          image-ref: local-deep-research:security-scan
-          severity: 'CRITICAL,HIGH'
-          ignore-unfixed: true  # Only fail on vulnerabilities with available fixes
-          trivyignores: '.trivyignore'  # Ignore bundled library CVEs that can't be fixed
-          exit-code: '1'
-          version: 'v0.69.2'
-
-      - name: Upload Trivy scan results from release
-        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
-        if: always()
-        with:
-          name: trivy-release-scan
-          path: trivy-release-scan.sarif
-          retention-days: 7  # Reduced for security
-
-  create-manifest:
-    name: Create Multi-Platform Manifest
-    needs: [build-amd64, build-arm64, security-scan]
-    runs-on: ubuntu-latest
-    environment: release
-    permissions:
-      contents: read
-      id-token: write  # Required for Sigstore keyless signing
-      packages: write
-
-    steps:
-      - name: Harden Runner
-        uses: step-security/harden-runner@ab7a9404c0f3da075243ca237b5fac12c98deaa5 # v2.19.3
-        with:
-          egress-policy: audit
-
-      - name: Check out the repo
+      - name: Check out the repo at the triggering commit
+        # Pin the checkout to the EXACT commit that the prerelease was
+        # built/scanned from, so .trivyignore (and any other repo-state-
+        # dependent file the promote step reads) matches that commit.
+        # We use github.sha, NOT inputs.tag — the v* git tag is created
+        # by create-release LATER in this run (after publish-docker
+        # completes), so it does not exist yet when this checkout runs
+        # on a push-to-main trigger. github.sha is the triggering commit
+        # for every event type (push to main, tag push, workflow_dispatch)
+        # and is the same SHA the build/prerelease-docker jobs used.
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
+          ref: ${{ github.sha }}
          persist-credentials: false
          fetch-depth: 0

@@ -206,192 +98,158 @@ jobs:

      - name: Install Cosign
        uses: sigstore/cosign-installer@6f9f17788090df1f26f669e9d70d6ae9567deba6 # v4.1.2
+        with:
+          # Pin to cosign v2.x to match the version that signed the artifact
+          # in prerelease-docker.yml. Mismatched versions across sign/verify
+          # work today but new-bundle-format (cosign v3 default) would only
+          # produce/consume on v3.
+          cosign-release: 'v2.6.3'

-      - name: Install Syft
-        uses: anchore/sbom-action/download-syft@e22c389904149dbc22b58101806040fa8d37a610 # v0.24.0
-
-      - name: Determine version tag
+      - name: Determine release tags
        id: version
        env:
-          EVENT_NAME: ${{ github.event_name }}
-          DISPATCH_TAG: ${{ github.event.client_payload.tag }}
-          RELEASE_TAG: ${{ github.event.release.tag_name }}
+          DISPATCH_TAG: ${{ inputs.tag }}
+          SOURCE_TAG: ${{ inputs.source_tag }}
+          EXPECTED_DIGEST: ${{ inputs.expected_digest }}
        run: |
          set -euo pipefail
-          if [ "$EVENT_NAME" = "repository_dispatch" ]; then
-            TAG="$DISPATCH_TAG"
-          else
-            TAG="$RELEASE_TAG"
+          if [[ -z "$DISPATCH_TAG" || -z "$SOURCE_TAG" || -z "$EXPECTED_DIGEST" ]]; then
+            echo "::error::Missing required workflow_call input. Got tag='${DISPATCH_TAG}' source_tag='${SOURCE_TAG}' expected_digest='${EXPECTED_DIGEST}'"
+            exit 1
          fi
-          echo "tag=$TAG" >> "$GITHUB_OUTPUT"
-          # Extract version without 'v' prefix
-          VERSION="${TAG#v}"
-          echo "version=$VERSION" >> "$GITHUB_OUTPUT"
-          # Extract major.minor
+          if [[ "$EXPECTED_DIGEST" != sha256:* ]]; then
+            echo "::error::expected_digest must be of the form sha256:... — got '${EXPECTED_DIGEST}'"
+            exit 1
+          fi
+          VERSION="${DISPATCH_TAG#v}"
          MAJOR_MINOR=$(echo "$VERSION" | cut -d. -f1,2)
-          echo "major_minor=$MAJOR_MINOR" >> "$GITHUB_OUTPUT"
-
-      - name: Extract metadata for Docker
-        id: meta
-        uses: docker/metadata-action@030e881283bb7a6894de51c315a6bfe6a94e05cf # v6.0.0
-        with:
-          images: ${{ secrets.DOCKER_USERNAME }}/local-deep-research
-          tags: |
-            type=raw,value=${{ steps.version.outputs.version }}
-            type=raw,value=${{ steps.version.outputs.major_minor }}
-            type=raw,value=latest
-
-      - name: Create and push multi-platform manifest
-        id: manifest
-        env:
-          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
-          META_TAGS: ${{ steps.meta.outputs.tags }}
-        run: |
-          set -euo pipefail
-          # Get the tags from metadata (newline separated)
-          TAGS="$META_TAGS"
-
-          # Store first tag for attestation
-          FIRST_TAG=$(echo "$TAGS" | head -n 1)
-          echo "primary_tag=$FIRST_TAG" >> "$GITHUB_OUTPUT"
-
-          # Create manifest for each tag
-          while IFS= read -r tag; do
-            if [ -n "$tag" ]; then
-              echo "Creating manifest for: $tag"
-              docker buildx imagetools create -t "$tag" \
-                "${DOCKER_USERNAME}/local-deep-research:amd64-${{ github.sha }}" \
-                "${DOCKER_USERNAME}/local-deep-research:arm64-${{ github.sha }}"
-            fi
-          done <<< "$TAGS"
-
-          # Get the manifest digest for the primary tag (used for signing)
-          DIGEST=$(docker buildx imagetools inspect "$FIRST_TAG" --format '{{json .Manifest.Digest}}' | tr -d '"')
-          echo "digest=$DIGEST" >> "$GITHUB_OUTPUT"
-          echo "Manifest digest: $DIGEST"
-
-      - name: Sign Docker images with Cosign
-        env:
-          DIGEST: ${{ steps.manifest.outputs.digest }}
-          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
-        run: |
-          set -euo pipefail
-          # Sign by digest for reliability with manifest lists
-          # All tags point to the same manifest, so we sign once by digest
-          IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${DIGEST}"
-          echo "Signing image by digest: $IMAGE_REF"
-          cosign sign --yes "$IMAGE_REF"
-
-          # Brief sleep to allow registry to propagate signature
-          echo "Waiting for signature propagation..."
-          sleep 5
-
-      - name: Generate SLSA provenance attestation
-        env:
-          DIGEST: ${{ steps.manifest.outputs.digest }}
-          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
-        run: |
-          set -euo pipefail
-          IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${DIGEST}"
-
-          # Generate SLSA provenance predicate
-          # Note: SLSA spec requires "sha1" field name for git commit digest
-          cat > provenance.json <<EOF
+          # Group writes into one redirect (shellcheck SC2129).
          {
-            "buildType": "https://github.com/${{ github.repository }}/docker-build@v1",
-            "builder": {
-              "id": "https://github.com/actions/runner"
-            },
-            "invocation": {
-              "configSource": {
-                "uri": "https://github.com/${{ github.repository }}",
-                "digest": {
-                  "sha1": "${{ github.sha }}"
-                },
-                "entryPoint": ".github/workflows/docker-publish.yml"
-              }
-            },
-            "metadata": {
-              "buildInvocationId": "${{ github.run_id }}",
-              "buildStartedOn": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
-              "completeness": {
-                "parameters": true,
-                "environment": true,
-                "materials": true
-              },
-              "reproducible": false
-            },
-            "materials": [
-              {
-                "uri": "https://github.com/${{ github.repository }}",
-                "digest": {
-                  "sha1": "${{ github.sha }}"
-                }
-              }
-            ]
-          }
-          EOF
+            echo "tag=${DISPATCH_TAG}"
+            echo "version=${VERSION}"
+            echo "major_minor=${MAJOR_MINOR}"
+            echo "source_tag=${SOURCE_TAG}"
+            echo "expected_digest=${EXPECTED_DIGEST}"
+          } >> "$GITHUB_OUTPUT"

-          # Attach provenance to image by digest
-          cosign attest --yes --predicate provenance.json --type slsaprovenance "$IMAGE_REF"
-
-      - name: Verify image signature
+      - name: Verify source digest matches expected
        env:
-          DIGEST: ${{ steps.manifest.outputs.digest }}
          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+          SOURCE_TAG: ${{ steps.version.outputs.source_tag }}
+          EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
        run: |
          set -euo pipefail
-          IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${DIGEST}"
-          echo "Verifying signature for: $IMAGE_REF"
+          # Defends against the prerelease tag being swapped between
+          # prerelease-docker's signing and this promote step.
+          SOURCE="${DOCKER_USERNAME}/local-deep-research:${SOURCE_TAG}"
+          ACTUAL=$(docker buildx imagetools inspect "$SOURCE" --format '{{json .Manifest.Digest}}' | tr -d '"')
+          if [[ "$ACTUAL" != "$EXPECTED_DIGEST" ]]; then
+            echo "::error::Source digest mismatch — possible tag tampering between prerelease and promote"
+            echo "  expected: $EXPECTED_DIGEST"
+            echo "  actual:   $ACTUAL"
+            exit 1
+          fi
+          echo "Source digest verified: $ACTUAL"

-          # Retry logic to handle registry propagation delay after signing
-          MAX_RETRIES=5
-          RETRY_DELAY=10
-          for i in $(seq 1 "$MAX_RETRIES"); do
-            echo "Verification attempt $i of $MAX_RETRIES..."
-            if cosign verify \
-              --certificate-identity-regexp="https://github.com/${{ github.repository }}" \
-              --certificate-oidc-issuer=https://token.actions.githubusercontent.com \
-              "$IMAGE_REF"; then
-              echo "Signature verification successful!"
-              exit 0
-            fi
-            if [ "$i" -lt "$MAX_RETRIES" ]; then
-              echo "Verification failed, waiting ${RETRY_DELAY}s before retry..."
-              sleep "$RETRY_DELAY"
+      - name: Promote (retag) prerelease manifest to release tags
+        env:
+          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+          SOURCE_TAG: ${{ steps.version.outputs.source_tag }}
+          VERSION: ${{ steps.version.outputs.version }}
+          MAJOR_MINOR: ${{ steps.version.outputs.major_minor }}
+        run: |
+          set -euo pipefail
+          SOURCE="${DOCKER_USERNAME}/local-deep-research:${SOURCE_TAG}"
+          # Single imagetools create with multiple -t — registry-side
+          # metadata-only operation, takes seconds, preserves digest.
+          docker buildx imagetools create \
+            -t "${DOCKER_USERNAME}/local-deep-research:${VERSION}" \
+            -t "${DOCKER_USERNAME}/local-deep-research:${MAJOR_MINOR}" \
+            -t "${DOCKER_USERNAME}/local-deep-research:latest" \
+            "$SOURCE"
+          echo "Promoted ${SOURCE} to :${VERSION}, :${MAJOR_MINOR}, :latest"
+
+      - name: Verify promoted tags share the source digest
+        env:
+          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+          VERSION: ${{ steps.version.outputs.version }}
+          MAJOR_MINOR: ${{ steps.version.outputs.major_minor }}
+          EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
+        run: |
+          set -euo pipefail
+          # Defends against `imagetools create` re-encoding the manifest.
+          # If digests diverge, signatures and attestations (keyed by the
+          # original digest) won't be discoverable from the new tags.
+          for TAG in "${VERSION}" "${MAJOR_MINOR}" "latest"; do
+            REF="${DOCKER_USERNAME}/local-deep-research:${TAG}"
+            ACTUAL=$(docker buildx imagetools inspect "$REF" --format '{{json .Manifest.Digest}}' | tr -d '"')
+            if [[ "$ACTUAL" != "$EXPECTED_DIGEST" ]]; then
+              echo "::error::Digest mismatch on ${TAG} — imagetools create may have re-encoded the manifest"
+              echo "  expected: $EXPECTED_DIGEST"
+              echo "  actual:   $ACTUAL"
+              exit 1
            fi
+            echo "${TAG} -> ${ACTUAL} ✓"
          done
-          echo "Signature verification failed after $MAX_RETRIES attempts"
-          exit 1

-      - name: Attach SBOM to image
+      # Catches CVE-database updates that landed between the prerelease
+      # build and this promote step. Use the SHA-pinned action wrapper
+      # (same pin as prerelease-docker.yml's security-scan) with an
+      # explicit binary version pin — the prior `apt-get install -y trivy`
+      # approach was unpinned and exposed the release path to the Trivy
+      # apt-repo supply chain. The pinned action downloads the v0.69.2
+      # binary from GitHub releases by exact tag, which is the same
+      # binary the prerelease scan validated, keeping the two scans
+      # consistent.
+      #
+      # Unlike prerelease-docker.yml's security-scan (which scans a
+      # locally-loaded image, no registry pull), this step scans by
+      # registry digest — Trivy must pull the manifest + layers from
+      # Docker Hub. TRIVY_USERNAME/TRIVY_PASSWORD is the action's
+      # documented auth path; the `docker/login-action` above also
+      # writes ~/.docker/config.json which Trivy reads as a fallback,
+      # but the explicit env vars are more reliable (Trivy has
+      # documented docker.io credential-helper quirks — aquasecurity/
+      # trivy#432, aquasecurity/trivy#8385) and the image we scan IS on
+      # Docker Hub so this is the path most likely to keep working.
+      - name: Re-scan release digest with Trivy
+        uses: aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25 # v0.36.0
+        env:
+          TRIVY_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+          TRIVY_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
+        with:
+          image-ref: ${{ secrets.DOCKER_USERNAME }}/local-deep-research@${{ steps.version.outputs.expected_digest }}
+          severity: 'CRITICAL,HIGH'
+          ignore-unfixed: true
+          trivyignores: '.trivyignore'
+          exit-code: '1'
+          version: 'v0.69.2'
+
+      - name: Verify cosign signature on promoted digest
        env:
-          DIGEST: ${{ steps.manifest.outputs.digest }}
          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+          EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
+          REPO: ${{ github.repository }}
        run: |
          set -euo pipefail
-          IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${DIGEST}"
-
-          # Generate SBOM using syft
-          docker pull "$IMAGE_REF"
-          syft "$IMAGE_REF" -o spdx-json > sbom.spdx.json
-
-          # Attach SBOM to image by digest
-          cosign attach sbom --sbom sbom.spdx.json "$IMAGE_REF"
-
-          # Sign the SBOM
-          cosign sign --yes --attachment sbom "$IMAGE_REF"
-
-      - name: Upload SBOM artifact
-        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
-        with:
-          name: sbom
-          path: sbom.spdx.json
-          retention-days: 90
-
-      - name: Clean up temporary tags
-        run: |
-          echo "Manifest creation complete. Temporary platform-specific tags can be removed manually if desired."
+          # The cert was issued to the prerelease-docker.yml workflow when
+          # signing happened there, so the identity regex must match that
+          # workflow's path. Fulcio's SAN is built from job_workflow_ref,
+          # which for reusable workflows is the CALLEE.
+          #
+          # Verify by IMMUTABLE digest (not the :VERSION tag) so this step
+          # is invariant under any retag race between the verify-promoted-
+          # tags step above and this one. Trivy's re-scan above also uses
+          # @${EXPECTED_DIGEST}; keeping cosign on the same reference is
+          # consistent and avoids a tag-resolution TOCTOU window.
+          IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${EXPECTED_DIGEST}"
+          echo "Verifying signature for: $IMAGE_REF"
+          cosign verify \
+            --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
+            --certificate-identity-regexp "^https://github.com/${REPO}/\.github/workflows/prerelease-docker\.yml@refs/(heads|tags)/" \
+            --certificate-github-workflow-repository "${REPO}" \
+            "$IMAGE_REF"
+          echo "Signature transitivity verified ✓"

      - name: Clean up prerelease tags
        continue-on-error: true
@@ -454,3 +312,23 @@ jobs:
          done

          echo "Prerelease tag cleanup complete. Deleted ${DELETED} tag(s) matching ${PREFIX}*."
+
+      - name: Summary
+        env:
+          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+          VERSION: ${{ steps.version.outputs.version }}
+          MAJOR_MINOR: ${{ steps.version.outputs.major_minor }}
+          EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
+        run: |
+          {
+            echo "## Docker Release Promoted"
+            echo ""
+            echo "**Digest:** \`${EXPECTED_DIGEST}\`"
+            echo ""
+            echo "**Tags:** \`${VERSION}\`, \`${MAJOR_MINOR}\`, \`latest\` — all share the same digest as the prerelease manifest, so cosign signatures, SBOM, and SLSA provenance from the prerelease step are transitively valid."
+            echo ""
+            echo '```'
+            echo "docker pull ${DOCKER_USERNAME}/local-deep-research:${VERSION}"
+            echo "docker pull ${DOCKER_USERNAME}/local-deep-research:latest"
+            echo '```'
+          } >> "$GITHUB_STEP_SUMMARY"
--- a/.github/workflows/prerelease-docker.yml
+++ b/.github/workflows/prerelease-docker.yml
@@ -1,17 +1,56 @@
 name: Prerelease Docker Image

-# Build a prerelease Docker image for local testing before the official
-# release is published.  Triggered exclusively via repository_dispatch from
-# release.yml (after all gates pass and approval is granted).
+# Build the canonical Docker image for a release.  In the build-once-promote
+# pipeline, this workflow IS the build — docker-publish.yml only retags the
+# manifest produced here.  Cosign signing, SBOM attestation, and SLSA
+# provenance are attached here once, keyed by manifest digest, so they're
+# discoverable from any tag (including the release tags later created by
+# imagetools create).
 #
-# NO workflow_dispatch — security is enforced upstream in release.yml.
+# Triggered exclusively via workflow_call from release.yml (after security
+# gates pass).  No workflow_dispatch: security and gate semantics are
+# enforced by the caller.  Jobs in this workflow that declare
+# `environment: release` wait on the same `release` env approval as
+# publish-docker, trigger-pypi, and create-release in release.yml, so
+# that approval gates the canonical build as well as the publish.

 on:
-  repository_dispatch:
-    types: [publish-prerelease-docker]
+  workflow_call:
+    inputs:
+      version:
+        description: "Bare semver, e.g. '1.6.9' (no leading 'v')"
+        type: string
+        required: true
+      short_sha:
+        description: "First 7 chars of commit SHA (used in the prerelease tag)"
+        type: string
+        required: true
+    secrets:
+      # Explicit secrets contract instead of `secrets: inherit` on the
+      # caller side — narrower blast radius if a future caller misuses
+      # this reusable workflow.
+      DOCKER_USERNAME:
+        required: true
+        description: "Docker Hub username for image push"
+      DOCKER_PASSWORD:
+        required: true
+        description: "Docker Hub PAT (Read+Write+Delete scopes)"
+    outputs:
+      manifest_digest:
+        description: "sha256:... digest of the multi-arch prerelease manifest. Used by docker-publish.yml to verify retag preserves the digest."
+        value: ${{ jobs.create-manifest.outputs.digest }}

 permissions: {}  # Minimal top-level for OSSF Scorecard Token-Permissions

+# Approval model: jobs here that declare `environment: release` (needed
+# to read the env-scoped Docker Hub secrets) wait on the single
+# `release` environment approval, which also unlocks `publish-docker`,
+# `trigger-pypi`, and `create-release` in release.yml. Security gates
+# and CI gates in release.yml must pass before that approval is
+# requested. Because one approval covers both build and promote, there
+# is no separate test-then-approve window; see the comment on the
+# `prerelease-docker` job in release.yml for the rationale.
+
 jobs:
  build-amd64:
    name: Build AMD64 Prerelease Image
@@ -48,11 +87,9 @@ jobs:
          context: .
          platforms: linux/amd64
          push: true
-          tags: ${{ secrets.DOCKER_USERNAME }}/local-deep-research:prerelease-v${{ github.event.client_payload.version }}-${{ github.event.client_payload.short_sha }}-amd64
+          tags: ${{ secrets.DOCKER_USERNAME }}/local-deep-research:prerelease-v${{ inputs.version }}-${{ inputs.short_sha }}-amd64
          cache-from: type=gha,scope=linux-amd64
          cache-to: type=gha,mode=max,scope=linux-amd64
-          build-args: |
-            DEPS_HASH=${{ hashFiles('pdm.lock') }}

  build-arm64:
    name: Build ARM64 Prerelease Image
@@ -89,11 +126,9 @@ jobs:
          context: .
          platforms: linux/arm64
          push: true
-          tags: ${{ secrets.DOCKER_USERNAME }}/local-deep-research:prerelease-v${{ github.event.client_payload.version }}-${{ github.event.client_payload.short_sha }}-arm64
+          tags: ${{ secrets.DOCKER_USERNAME }}/local-deep-research:prerelease-v${{ inputs.version }}-${{ inputs.short_sha }}-arm64
          cache-from: type=gha,scope=linux-arm64
          cache-to: type=gha,mode=max,scope=linux-arm64
-          build-args: |
-            DEPS_HASH=${{ hashFiles('pdm.lock') }}

  security-scan:
    name: Security Scan
@@ -136,8 +171,6 @@ jobs:
          tags: local-deep-research:security-scan
          cache-from: type=gha,scope=linux-amd64
          cache-to: type=gha,mode=max,scope=linux-amd64
-          build-args: |
-            DEPS_HASH=${{ hashFiles('pdm.lock') }}

      # Generate Trivy SARIF for archival as a workflow artifact (all severities, never fails).
      # Severity-gating happens in the next step.
@@ -177,6 +210,11 @@ jobs:
    environment: release
    permissions:
      contents: read
+      id-token: write    # Required for cosign keyless OIDC signing
+      # No `packages: write` — Docker Hub auth uses DOCKER_PASSWORD secret,
+      # not GITHUB_TOKEN. `packages: write` only matters for ghcr.io pushes.
+    outputs:
+      digest: ${{ steps.capture-digest.outputs.digest }}

    steps:
      - name: Harden Runner
@@ -199,11 +237,22 @@ jobs:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

+      - name: Install Cosign
+        uses: sigstore/cosign-installer@6f9f17788090df1f26f669e9d70d6ae9567deba6 # v4.1.2
+        with:
+          # Pin to cosign v2.x — see release.yml for rationale (v3 enables
+          # --new-bundle-format by default which changes the on-wire format
+          # and breaks downstream verifiers still on v2).
+          cosign-release: 'v2.6.3'
+
+      - name: Install Syft
+        uses: anchore/sbom-action/download-syft@e22c389904149dbc22b58101806040fa8d37a610 # v0.24.0
+
      - name: Create and push multi-platform manifest
        env:
          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
-          VERSION: ${{ github.event.client_payload.version }}
-          SHORT_SHA: ${{ github.event.client_payload.short_sha }}
+          VERSION: ${{ inputs.version }}
+          SHORT_SHA: ${{ inputs.short_sha }}
        run: |
          set -euo pipefail
          TAG="prerelease-v${VERSION}-${SHORT_SHA}"
@@ -216,17 +265,220 @@ jobs:
          # Floating tag: re-point :prerelease at the manifest just created so
          # testers can pin compose to `:prerelease` and pull the latest RC via
          # `docker compose pull` without editing the tag each cycle. The
-          # versioned tag above remains for reproducibility.
+          # versioned tag above remains for reproducibility (and is what
+          # docker-publish.yml retags by digest into :1.6.9 / :1.6 / :latest).
          echo "Updating floating tag: ${DOCKER_USERNAME}/local-deep-research:prerelease"
          docker buildx imagetools create -t "${DOCKER_USERNAME}/local-deep-research:prerelease" \
            "${DOCKER_USERNAME}/local-deep-research:${TAG}"
          echo "Floating :prerelease tag updated"

+      - name: Capture manifest digest
+        id: capture-digest
+        env:
+          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+          VERSION: ${{ inputs.version }}
+          SHORT_SHA: ${{ inputs.short_sha }}
+        run: |
+          set -euo pipefail
+          IMAGE_REF="${DOCKER_USERNAME}/local-deep-research:prerelease-v${VERSION}-${SHORT_SHA}"
+          # Same form as the existing docker-publish.yml inspector — avoids jq.
+          DIGEST=$(docker buildx imagetools inspect "$IMAGE_REF" --format '{{json .Manifest.Digest}}' | tr -d '"')
+          if [[ -z "$DIGEST" || "$DIGEST" != sha256:* ]]; then
+            echo "::error::Failed to capture manifest digest (got '${DIGEST}')"
+            exit 1
+          fi
+          echo "digest=${DIGEST}" >> "$GITHUB_OUTPUT"
+          echo "Manifest digest: ${DIGEST}"
+
+      - name: Sign manifest with Cosign
+        env:
+          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+          DIGEST: ${{ steps.capture-digest.outputs.digest }}
+        run: |
+          set -euo pipefail
+          # Sign by digest — signature artifact lands at sha256-<digest>.sig
+          # in the same repo, discoverable from ANY tag pointing at the same
+          # digest (including release tags created later by docker-publish.yml's
+          # imagetools-create retag).
+          IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${DIGEST}"
+          echo "Signing image by digest: $IMAGE_REF"
+          cosign sign --yes "$IMAGE_REF"
+          # Brief sleep to allow registry to propagate signature
+          sleep 5
+
+      - name: Generate SLSA provenance attestation
+        env:
+          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+          DIGEST: ${{ steps.capture-digest.outputs.digest }}
+        run: |
+          set -euo pipefail
+          IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${DIGEST}"
+
+          # entryPoint is the TOP-LEVEL caller (release.yml), not this
+          # reusable workflow. Per SLSA GHA buildtype v1 and the canonical
+          # slsa-github-generator, reusable workflows are explicitly NOT
+          # entryPoints. github.run_id / github.repository / github.sha all
+          # resolve to the caller's run context inside a reusable workflow.
+          # builder.id pins the workflow that actually defines the build
+          # steps — the trust root a verifier policy can pin against. We
+          # compose it from `github.repository` and a hardcoded path to
+          # THIS workflow file, with `github.ref` for the ref portion.
+          # Rationale: inside a workflow_call callee, the `github` context
+          # is scoped to the CALLER, so `github.workflow_ref` would point
+          # at release.yml (the wrong builder). The `job` context has no
+          # `workflow_ref` property either (only check_run_id, container,
+          # services, status — actionlint confirms). For a local-path
+          # reusable workflow (`uses: ./.github/workflows/...`), the
+          # callee's ref equals the caller's `github.ref`, so composing
+          # the path manually gives the correct
+          # `<owner>/<repo>/.github/workflows/prerelease-docker.yml@<ref>`
+          # format that matches the Fulcio cert SAN. Cosign and
+          # slsa-verifier both anchor on the cert anyway, so this fix is
+          # about correctness for raw-JSON policy engines / audit tools
+          # that read builder.id directly.
+          #
+          # completeness.* are FALSE because we don't capture invocation
+          # parameters or environment, and the build does network I/O for
+          # apt/pip/npm. Honest emptiness > false claims of completeness.
+          #
+          # buildInvocationId includes run_attempt so re-runs are
+          # distinguishable in audit logs.
+          cat > provenance.json <<EOF
+          {
+            "buildType": "https://github.com/${{ github.repository }}/docker-build@v1",
+            "builder": {
+              "id": "https://github.com/${{ github.repository }}/.github/workflows/prerelease-docker.yml@${{ github.ref }}"
+            },
+            "invocation": {
+              "configSource": {
+                "uri": "https://github.com/${{ github.repository }}",
+                "digest": {
+                  "sha1": "${{ github.sha }}"
+                },
+                "entryPoint": ".github/workflows/release.yml"
+              }
+            },
+            "metadata": {
+              "buildInvocationId": "${{ github.run_id }}-${{ github.run_attempt }}",
+              "buildStartedOn": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
+              "completeness": {
+                "parameters": false,
+                "environment": false,
+                "materials": false
+              },
+              "reproducible": false
+            },
+            "materials": [
+              {
+                "uri": "https://github.com/${{ github.repository }}",
+                "digest": {
+                  "sha1": "${{ github.sha }}"
+                }
+              }
+            ]
+          }
+          EOF
+
+          # --replace prevents duplicate SLSA attestations on re-run. Cosign's
+          # Replace logic is keyed by predicate-type URI, so it leaves the
+          # SBOM SPDX attestation (different predicateType) untouched.
+          cosign attest --yes --replace --predicate provenance.json --type slsaprovenance "$IMAGE_REF"
+
+      - name: Verify image signature
+        env:
+          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+          DIGEST: ${{ steps.capture-digest.outputs.digest }}
+        run: |
+          set -euo pipefail
+          IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${DIGEST}"
+          echo "Verifying signature for: $IMAGE_REF"
+          # Retry to handle registry propagation delay after signing
+          MAX_RETRIES=5
+          RETRY_DELAY=10
+          for i in $(seq 1 "$MAX_RETRIES"); do
+            echo "Verification attempt $i of $MAX_RETRIES..."
+            if cosign verify \
+              --certificate-identity-regexp="^https://github.com/${{ github.repository }}/" \
+              --certificate-oidc-issuer=https://token.actions.githubusercontent.com \
+              "$IMAGE_REF"; then
+              echo "Signature verification successful!"
+              exit 0
+            fi
+            if [ "$i" -lt "$MAX_RETRIES" ]; then
+              echo "Verification failed, waiting ${RETRY_DELAY}s before retry..."
+              sleep "$RETRY_DELAY"
+            fi
+          done
+          echo "::error::Signature verification failed after $MAX_RETRIES attempts"
+          exit 1
+
+      - name: Generate per-platform SBOMs and attest each
+        env:
+          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+          DIGEST: ${{ steps.capture-digest.outputs.digest }}
+        run: |
+          set -euo pipefail
+          REPO="${DOCKER_USERNAME}/local-deep-research"
+          MANIFEST_REF="${REPO}@${DIGEST}"
+
+          # Multi-arch SBOM correctness: syft against a manifest list digest
+          # only scans the host platform's layers (per anchore/syft#1708),
+          # which would lie to ARM64 consumers. We attest each per-arch
+          # digest with its OWN SBOM so end-user verification is honest.
+          # We deliberately do NOT also produce a "manifest-level SBOM" —
+          # that would be amd64-only (host arch) and re-introduce the lie
+          # for any arm64 consumer running the README verifier recipe.
+          # The README documents the per-arch verification flow instead.
+          MANIFEST_JSON=$(docker buildx imagetools inspect "${MANIFEST_REF}" --raw)
+
+          # Defense against future buildx output changes: assert at least
+          # one per-arch entry exists. Without this, an empty/malformed
+          # manifest list would silently produce zero SBOMs and pass CI green.
+          PER_ARCH_COUNT=$(echo "${MANIFEST_JSON}" \
+            | jq '[.manifests[] | select(.platform.architecture != "unknown")] | length')
+          if [[ "${PER_ARCH_COUNT}" -lt 1 ]]; then
+            echo "::error::No per-arch manifest entries found in ${MANIFEST_REF} — SBOM generation cannot proceed"
+            echo "Raw manifest: ${MANIFEST_JSON}"
+            exit 1
+          fi
+          echo "Found ${PER_ARCH_COUNT} per-arch manifest(s) to scan"
+
+          echo "$MANIFEST_JSON" \
+            | jq -r '.manifests[] | select(.platform.architecture != "unknown") | "\(.platform.os)/\(.platform.architecture)\t\(.digest)"' \
+            | while IFS=$'\t' read -r PLAT PER_ARCH_DIGEST; do
+                ARCH="${PLAT##*/}"
+                PER_ARCH_REF="${REPO}@${PER_ARCH_DIGEST}"
+                SBOM_FILE="sbom-${ARCH}.spdx.json"
+                echo "=== Scanning ${PLAT} (${PER_ARCH_DIGEST}) ==="
+                # --platform tells syft which platform variant to pull and
+                # scan; it matters when the runner's arch differs from the image's.
+                syft --platform "${PLAT}" "${PER_ARCH_REF}" -o spdx-json > "${SBOM_FILE}"
+                # --replace prevents accumulation when a re-run lands on
+                # the same digest (e.g. "Re-run failed jobs" after a flake).
+                # Per cosign source pkg/cosign/remote/remote.go, --replace
+                # is per-predicate-type, so it doesn't disturb the SLSA
+                # attestation already on the manifest list digest.
+                cosign attest --yes --replace \
+                  --predicate "${SBOM_FILE}" --type spdxjson "${PER_ARCH_REF}"
+              done
+
+      - name: Upload SBOMs artifact
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
+        with:
+          name: sbom
+          # Per-arch SBOMs only — `sbom-amd64.spdx.json`, `sbom-arm64.spdx.json`,
+          # one per platform in the manifest list. No manifest-level SBOM is
+          # produced (would be host-arch-only and misleading for non-amd64
+          # consumers).
+          path: sbom-*.spdx.json
+          retention-days: 90
+
      - name: Summary
        env:
          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
-          VERSION: ${{ github.event.client_payload.version }}
-          SHORT_SHA: ${{ github.event.client_payload.short_sha }}
+          VERSION: ${{ inputs.version }}
+          SHORT_SHA: ${{ inputs.short_sha }}
+          DIGEST: ${{ steps.capture-digest.outputs.digest }}
        run: |
          TAG="prerelease-v${VERSION}-${SHORT_SHA}"
          {
@@ -235,8 +487,14 @@ jobs:
            echo "**Versioned tag:** \`${TAG}\`"
            echo "**Floating tag:** \`prerelease\` (now points at this build)"
            echo ""
+            echo "**Digest:** \`${DIGEST}\`"
+            echo ""
            echo '```'
            echo "docker pull ${DOCKER_USERNAME}/local-deep-research:${TAG}"
            echo "docker pull ${DOCKER_USERNAME}/local-deep-research:prerelease"
            echo '```'
+            echo ""
+            echo "Signed and attested. After release approval, docker-publish.yml"
+            echo "will retag this exact digest as \`:${VERSION}\`, \`:major.minor\`,"
+            echo "and \`:latest\`."
          } >> "$GITHUB_STEP_SUMMARY"
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -17,7 +17,11 @@ jobs:
    outputs:
      has-frontend: ${{ steps.check.outputs.has-frontend }}
    container:
-      image: node:20-alpine@sha256:bcd88137d802e2482c9df3cdec71e0431857ebbbdba6973776b5593214056d86 # node:20-alpine
+      # Node 24 to match `package.json`'s `engines: { node: ">=24.0.0" }`.
+      # Was previously node:20 — npm could resolve dependencies that target
+      # APIs missing on 20 and the wheel-building publish path could ship
+      # frontend assets that break at runtime on the Node-24 Docker image.
+      image: node:24-alpine@sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f # node:24-alpine
      # Note: Network is needed for npm ci to work, but no secrets are available
      options: --user 1001
    permissions:
--- a/.github/workflows/release-gate.yml
+++ b/.github/workflows/release-gate.yml
@@ -343,7 +343,12 @@ jobs:
      - name: Install Node.js for Vite build
        uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0
        with:
-          node-version: '20'
+          # Match package.json's `engines: { node: ">=24.0.0" }` and
+          # publish.yml's node:24-alpine. Building the wheel-tested
+          # frontend on Node 20 here while the real publish runs on
+          # Node 24 would make the gate validate a different runtime
+          # than the one that ships, defeating the gate's purpose.
+          node-version: '24'

      - name: Set up PDM (build only)
        uses: pdm-project/setup-pdm@973541a5febeafcfdadf8a51211435be6ecfd90f # v4.5
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -86,9 +86,13 @@ jobs:
  # The build job depends on this gate, ensuring no release can be created
  # without passing all security checks.
  #
-  # NOTE: Docker/PyPI publishing only happens via repository_dispatch from this
-  # workflow. Creating a release via GitHub UI will NOT trigger publishing
-  # (this is by design to prevent security gate bypass).
+  # NOTE: Docker publishing now runs inline (reusable workflow_call to
+  # docker-publish.yml), and PyPI publishing is still dispatched via
+  # repository_dispatch to publish.yml (PyPI Trusted Publishing does not
+  # support reusable workflows — see pypa/gh-action-pypi-publish#166 and
+  # pypi/warehouse#11096). Neither publish path is triggered by creating a
+  # release via the GitHub UI — only this workflow can dispatch them — so
+  # the security-gate flow is preserved.
  # ============================================================================
  release-gate:
    needs: [version-check]
@@ -282,41 +286,65 @@ jobs:
          fi
          echo "Version verified: $EXPECTED_VERSION"

-      - name: Generate SBOM (Software Bill of Materials)
+      # SBOM generation uses the SHA-pinned trivy-action with an explicit
+      # binary version pin. Previously this step shelled out
+      # `apt-get install -y trivy` from the (unpinned) Aqua apt repo, which
+      # in a job carrying `id-token: write` (line 229) meant a compromised
+      # apt mirror could exfiltrate the OIDC request token AND tamper with
+      # the SBOM bytes that the next step (Sign release artifacts) signs
+      # under this repo's Sigstore identity — i.e., a fraudulent SBOM
+      # carrying a legitimate cosign cert. Pin the action by SHA + the
+      # binary by version tag so the toolchain is reproducible and matches
+      # docker-publish.yml / prerelease-docker.yml.
+      - name: Generate SBOM (SPDX JSON)
+        if: steps.check_release.outputs.exists == 'false'
+        uses: aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25 # v0.36.0
+        with:
+          scan-type: 'fs'
+          scan-ref: '.'
+          format: 'spdx-json'
+          output: 'sbom-spdx.json'
+          version: 'v0.69.2'
+
+      - name: Generate SBOM (CycloneDX)
+        if: steps.check_release.outputs.exists == 'false'
+        uses: aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25 # v0.36.0
+        with:
+          scan-type: 'fs'
+          scan-ref: '.'
+          format: 'cyclonedx'
+          output: 'sbom-cyclonedx.json'
+          version: 'v0.69.2'
+
+      - name: List SBOMs
        if: steps.check_release.outputs.exists == 'false'
        run: |
-          echo "=== Generating Software Bill of Materials ==="
-
-          # Install Trivy for SBOM generation (using signed-by for secure key management)
-          curl -fsSL https://aquasecurity.github.io/trivy-repo/deb/public.key | gpg --dearmor | sudo tee /usr/share/keyrings/trivy.gpg > /dev/null
-          echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb generic main" | sudo tee /etc/apt/sources.list.d/trivy.list
-          sudo apt-get update
-          sudo apt-get install -y trivy
-
-          # Generate SBOM in SPDX format (JSON)
-          trivy fs --format spdx-json --output sbom-spdx.json .
-          echo "Generated SBOM in SPDX-JSON format"
-
-          # Generate SBOM in CycloneDX format (JSON)
-          trivy fs --format cyclonedx --output sbom-cyclonedx.json .
-          echo "Generated SBOM in CycloneDX format"
-
-          # Display summary
-          echo ""
          echo "SBOM files generated:"
          ls -lh sbom-*.json

      - name: Install Cosign
        if: steps.check_release.outputs.exists == 'false'
        uses: sigstore/cosign-installer@6f9f17788090df1f26f669e9d70d6ae9567deba6 # v4.1.2
+        with:
+          # Pin to cosign v2.x — v4.1.2 of the installer ships cosign v3.0.6
+          # by default, but v3 enables --new-bundle-format ON BY DEFAULT.
+          # That changes the signature/attestation on-wire format and would
+          # require all downstream verifiers (including end-users running
+          # the README cosign-verify recipe) to also be on v3+. Pin to
+          # v2.6.3 (latest v2.x, includes the GHSA-w6c6-c85g-mmv6 patch)
+          # for the legacy tag-based sigstore format until we explicitly
+          # migrate to v3.
+          cosign-release: 'v2.6.3'

      - name: Sign release artifacts with Sigstore
        if: steps.check_release.outputs.exists == 'false'
        run: |
          echo "=== Signing release artifacts with Sigstore ==="

-          # Sign SBOM files using keyless signing (OIDC)
-          # cosign v3.0.2+ uses --bundle which contains signature, certificate, and metadata in one file
+          # Sign SBOM files using keyless signing (OIDC).
+          # `--bundle` writes one file containing the signature, certificate,
+          # and Rekor entry. On the pinned v2.6.3 this is cosign's legacy
+          # bundle layout unless `--new-bundle-format` is passed.
          cosign sign-blob --yes --bundle sbom-spdx.json.bundle sbom-spdx.json
          cosign sign-blob --yes --bundle sbom-cyclonedx.json.bundle sbom-cyclonedx.json

@@ -364,9 +392,87 @@ jobs:
      provenance-name: "provenance.intoto.jsonl"
      compile-generator: true  # Build from source to bypass TUF key validation issues

-  create-release:
+  # Build, sign, and push the canonical Docker image. This is the single
+  # build for the release — publish-docker (the reusable workflow_call
+  # to docker-publish.yml) later retags this manifest by digest. The jobs
+  # inside this reusable workflow declare `environment: release` so they
+  # can read the env-scoped DOCKER_USERNAME / DOCKER_PASSWORD; the FIRST
+  # (and only) `release` env approval click in this run unlocks ALL
+  # release-env jobs together: prerelease-docker, publish-docker,
+  # trigger-pypi, monitor-pypi, and create-release. After the
+  # atomicity refactor, docker-publish is inline (reusable workflow_call)
+  # so it does not require a separate second approval — its result is
+  # visible to downstream jobs (create-release, cleanup-on-rejection) in
+  # the same run. (Earlier iterations of the build-once-promote refactor
+  # tried to run the canonical build pre-approval to give a real
+  # test-then-approve window; that required repo-level Docker Hub
+  # secrets, which we deliberately don't have.) The manifest_digest
+  # output flows through to the promote step so digest preservation can
+  # be verified end-to-end.
+  prerelease-docker:
    needs: [build, provenance]
    if: ${{ !cancelled() && needs.build.result == 'success' && needs.provenance.result == 'success' && needs.build.outputs.release_exists == 'false' }}
+    uses: ./.github/workflows/prerelease-docker.yml
+    with:
+      version: ${{ needs.build.outputs.version }}
+      short_sha: ${{ needs.build.outputs.short_sha }}
+    secrets:
+      # Explicit list (not `secrets: inherit`) — defense-in-depth so a
+      # future edit to prerelease-docker.yml that references an unrelated
+      # secret would fail loudly rather than silently accessing inherited
+      # values like PAT_TOKEN, OPENROUTER_API_KEY, or SERPER_API_KEY.
+      DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+      DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
+    permissions:
+      # Caller permissions cap what the called workflow can request.
+      # Must include id-token: write so cosign keyless OIDC works on the
+      # called workflow's create-manifest job.
+      # NOTE: no `packages: write` — Docker Hub auth uses DOCKER_PASSWORD
+      # secret, not GITHUB_TOKEN. `packages: write` only matters for ghcr.io.
+      contents: read
+      id-token: write
+
+  # Retag the prerelease manifest as the release tags (:VERSION, :MAJOR_MINOR,
+  # :latest). Runs as a reusable workflow_call so its result is visible to
+  # downstream jobs in this run — specifically: create-release blocks on
+  # success here, and cleanup-on-rejection keys its cosign-deletion logic on
+  # `needs.publish-docker.result`. Before this refactor, docker-publish.yml
+  # was triggered via repository_dispatch and its outcome was invisible to
+  # the parent release.yml run, which made cosign artifact cleanup unsafe
+  # after a partial retag failure (release tags could exist sharing the
+  # prerelease manifest digest, and deleting `sha256-<digest>.{sig,att}`
+  # would invalidate those release-tag signatures).
+  publish-docker:
+    needs: [build, prerelease-docker]
+    if: ${{ !cancelled() && needs.prerelease-docker.result == 'success' && needs.build.outputs.release_exists == 'false' }}
+    uses: ./.github/workflows/docker-publish.yml
+    with:
+      tag: ${{ needs.build.outputs.tag }}
+      source_tag: prerelease-v${{ needs.build.outputs.version }}-${{ needs.build.outputs.short_sha }}
+      expected_digest: ${{ needs.prerelease-docker.outputs.manifest_digest }}
+    secrets:
+      # Explicit secret pass — env-scoped (`release`) DOCKER_USERNAME and
+      # DOCKER_PASSWORD become available to the callee's `promote` job
+      # because that job declares `environment: release`.
+      DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+      DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
+    permissions:
+      # cosign VERIFY (the only sigstore operation in the callee) is a
+      # read-only check against the public Rekor/Fulcio infrastructure —
+      # no GitHub OIDC token is minted, so id-token: write is unnecessary.
+      # OIDC is only needed for cosign SIGN, which happens upstream in
+      # prerelease-docker.yml.
+      contents: read
+
+  create-release:
+    # create-release waits for prerelease-docker (canonical image exists,
+    # signed and attested), publish-docker (release tags retagged with
+    # digest preserved, cosign transitivity verified), and monitor-pypi
+    # (PyPI publish has completed). Only then is the GitHub Release
+    # published — so the Release page never points at non-existent
+    # artifacts. Gated by the `release` environment.
+    needs: [build, provenance, prerelease-docker, publish-docker, monitor-pypi]
+    if: ${{ !cancelled() && needs.build.result == 'success' && needs.provenance.result == 'success' && needs.prerelease-docker.result == 'success' && needs.publish-docker.result == 'success' && needs.monitor-pypi.result == 'success' && needs.build.outputs.release_exists == 'false' }}
    runs-on: ubuntu-latest
    environment: release
    permissions:
@@ -436,6 +542,35 @@ jobs:
        with:
          name: provenance.intoto.jsonl

+      - name: Download container-image SBOMs
+        # Produced by prerelease-docker.yml — one SBOM per architecture
+        # (sbom-amd64.spdx.json, sbom-arm64.spdx.json) from Syft. The
+        # cosign attestations on each per-arch digest are the
+        # cryptographically authoritative copies; attaching the raw JSON
+        # to the GitHub Release makes them discoverable to humans
+        # browsing the release page.
+        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
+        with:
+          name: sbom
+          path: container-sbom
+
+      - name: Stage container-image SBOMs for release attachment
+        # Rename per-arch SBOMs to avoid collision with the filesystem SBOM
+        # `sbom-spdx.json` produced by the build job (Trivy scan of source
+        # tree). Container SBOMs are the Syft scan of built image layers
+        # per platform — different content.
+        run: |
+          set -euo pipefail
+          # Per-arch SBOMs: rename sbom-amd64.spdx.json → sbom-container-amd64.spdx.json
+          # (and similarly for arm64). prerelease-docker.yml no longer
+          # produces a manifest-level sbom.spdx.json.
+          for f in container-sbom/sbom-*.spdx.json; do
+            [ -f "$f" ] || continue
+            base=$(basename "$f")
+            mv "$f" "sbom-container-${base#sbom-}"
+          done
+          ls -la sbom-container*.spdx.json || echo "No container SBOMs found"
+
      - name: List artifacts
        run: |
          echo "Release artifacts:"
@@ -780,7 +915,18 @@ jobs:
              print(f"Release body truncated to {len(content)} chars", file=sys.stderr)
          PYEOF

-          # Includes SBOM files, Sigstore signatures, and SLSA provenance
+          # Includes filesystem SBOM (Trivy scan of source tree) AND
+          # container-image SBOM (Syft scan of built image layers, per-arch),
+          # Sigstore signatures, and SLSA provenance. The two SBOM kinds
+          # describe different surfaces — both are useful for downstream
+          # supply-chain consumers.
+          # `find ... -print0` read into a bash array handles the variable
+          # number of per-arch container SBOMs gracefully (zero or more).
+          set -euo pipefail
+          CONTAINER_SBOMS=()
+          while IFS= read -r -d '' f; do
+            CONTAINER_SBOMS+=("$f")
+          done < <(find . -maxdepth 1 -name 'sbom-container*.spdx.json' -print0)
          gh release create "$RELEASE_TAG" \
            --repo "$GITHUB_REPOSITORY" \
            --title "Release $RELEASE_VERSION" \
@@ -789,11 +935,17 @@ jobs:
            sbom-spdx.json.bundle \
            sbom-cyclonedx.json \
            sbom-cyclonedx.json.bundle \
-            provenance.intoto.jsonl
+            provenance.intoto.jsonl \
+            "${CONTAINER_SBOMS[@]}"
        env:
-          # PAT_TOKEN required here because GITHUB_TOKEN cannot trigger downstream
-          # workflows (repository_dispatch). Minimum scopes: repo (for release
-          # creation + dispatch triggers), workflow (for triggering workflows).
+          # PAT_TOKEN is required here because GITHUB_TOKEN cannot trigger
+          # workflows that listen on `release:` (backwards-compatibility.yml,
+          # sbom.yml) — that's a documented GITHUB_TOKEN limitation.
+          # Minimum scope: `repo` (full scope; `public_repo` is NOT sufficient
+          # for the release-creation API on private repos and `repo` works
+          # uniformly). The `workflow` scope is NOT needed — it only governs
+          # editing files under .github/workflows/ via the API, which this
+          # step does not do.
          GITHUB_TOKEN: ${{ secrets.PAT_TOKEN }}
          RELEASE_TAG: ${{ needs.build.outputs.tag }}
          RELEASE_VERSION: ${{ needs.build.outputs.version }}
@@ -979,47 +1131,23 @@ jobs:
            automation
            maintenance

-  trigger-prerelease-docker:
-    needs: [build, provenance]
-    if: ${{ !cancelled() && needs.build.result == 'success' && needs.provenance.result == 'success' && needs.build.outputs.release_exists == 'false' }}
-    runs-on: ubuntu-latest
-    # Separate environment from `release` so the GitHub "Review deployments"
-    # modal shows two independent checkboxes — letting maintainers approve or
-    # reject the prerelease Docker test independently of the actual release
-    # publish (create-release + trigger-workflows still gate on `release`).
-    environment: prerelease
-    permissions:
-      contents: read
+  # NOTE: trigger-prerelease-docker and the old combined trigger-workflows
+  # (which used to dispatch BOTH publish-docker AND publish-pypi) were
+  # removed in the build-once-promote + atomicity refactors.
+  # prerelease-docker.yml and docker-publish.yml are now invoked as
+  # reusable workflows earlier in this file. Only PyPI publishing remains
+  # on repository_dispatch — PyPI Trusted Publishing requires the publish
+  # step to run in a top-level (non-reusable) workflow.

-    steps:
-      - name: Harden the runner (Audit all outbound calls)
-        uses: step-security/harden-runner@ab7a9404c0f3da075243ca237b5fac12c98deaa5 # v2.19.3
-        with:
-          egress-policy: audit
-
-      - name: Trigger prerelease Docker build
-        uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
-        env:
-          RELEASE_VERSION: ${{ needs.build.outputs.version }}
-          SHORT_SHA: ${{ needs.build.outputs.short_sha }}
-        with:
-          # PAT_TOKEN: repository_dispatch requires a PAT (GITHUB_TOKEN cannot trigger workflows)
-          github-token: ${{ secrets.PAT_TOKEN }}
-          script: |
-            await github.rest.repos.createDispatchEvent({
-              owner: context.repo.owner,
-              repo: context.repo.repo,
-              event_type: 'publish-prerelease-docker',
-              client_payload: {
-                version: process.env.RELEASE_VERSION,
-                short_sha: process.env.SHORT_SHA
-              }
-            });
-            console.log('Triggered prerelease Docker build');
-
-  trigger-workflows:
-    needs: [build, create-release]
-    if: ${{ !cancelled() && needs.build.result == 'success' && needs.create-release.result == 'success' && needs.build.outputs.release_exists == 'false' }}
+  # Dispatch PyPI publish via repository_dispatch. Kept as a dispatch
+  # (rather than a reusable workflow_call) because PyPI's Trusted
+  # Publisher matches the OIDC `workflow_ref` claim, which points to the
+  # CALLER when a workflow is invoked via workflow_call — so a reusable
+  # publish.yml would fail with `invalid-publisher`. Tracked in
+  # pypa/gh-action-pypi-publish#166 and pypi/warehouse#11096.
+  trigger-pypi:
+    needs: [build, prerelease-docker, publish-docker]
+    if: ${{ !cancelled() && needs.prerelease-docker.result == 'success' && needs.publish-docker.result == 'success' && needs.build.outputs.release_exists == 'false' }}
    runs-on: ubuntu-latest
    environment: release
    permissions:
@@ -1033,63 +1161,50 @@ jobs:
        with:
          egress-policy: audit

-      # Recorded BEFORE the dispatches so monitor-publish can filter on createdAt
-      # without missing runs created during this job.
+      # Recorded BEFORE the dispatch so monitor-pypi can filter on createdAt
+      # without missing the run created during this job.
      - name: Record dispatch time
        id: dispatch-time
        run: echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$GITHUB_OUTPUT"

-      - name: Trigger Docker publish workflow
-        uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
-        env:
-          RELEASE_TAG: ${{ needs.build.outputs.tag }}
-        with:
-          # PAT_TOKEN: repository_dispatch requires a PAT (GITHUB_TOKEN cannot trigger workflows)
-          github-token: ${{ secrets.PAT_TOKEN }}
-          script: |
-            await github.rest.repos.createDispatchEvent({
-              owner: context.repo.owner,
-              repo: context.repo.repo,
-              event_type: 'publish-docker',
-              client_payload: {
-                tag: process.env.RELEASE_TAG
-              }
-            });
-            console.log('Triggered Docker publish workflow');
-
      - name: Trigger PyPI publish workflow
        uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
        env:
          RELEASE_TAG: ${{ needs.build.outputs.tag }}
        with:
-          # PAT_TOKEN: repository_dispatch requires a PAT (GITHUB_TOKEN cannot trigger workflows)
+          # PAT_TOKEN: repository_dispatch from a workflow run cannot be
+          # fired with GITHUB_TOKEN — the API rejects it to prevent
+          # workflow-trigger loops. Minimum scope: `repo` (full scope —
+          # `public_repo` is rejected by createDispatchEvent regardless
+          # of repo visibility).
          github-token: ${{ secrets.PAT_TOKEN }}
          script: |
+            // Forward `prerelease: false` explicitly. publish.yml gates
+            // Test PyPI vs prod PyPI on `client_payload.prerelease == true`;
+            // if absent, the expression evaluates to '' and falls through to
+            // prod PyPI. Setting false here makes the choice explicit. The
+            // current pipeline only releases stable versions through this
+            // dispatch — true prereleases (if added later) would need a
+            // separate trigger path that flips this flag.
            await github.rest.repos.createDispatchEvent({
              owner: context.repo.owner,
              repo: context.repo.repo,
              event_type: 'publish-pypi',
              client_payload: {
-                tag: process.env.RELEASE_TAG
+                tag: process.env.RELEASE_TAG,
+                prerelease: false
              }
            });
            console.log('Triggered PyPI publish workflow');

-      - name: Summary
-        env:
-          RELEASE_VERSION: ${{ needs.build.outputs.version }}
-        run: |
-          echo "Release $RELEASE_VERSION created successfully!"
-          echo "Triggered PyPI and Docker publishing workflows via repository_dispatch"
-          echo "SBOM files (SPDX, CycloneDX) attached to release"
-          echo "Sigstore bundles (.bundle) attached for verification (contain signature + certificate)"
-          echo "SLSA provenance (provenance.intoto.jsonl) attached for supply chain security"
-          echo "Check the releases page: https://github.com/LearningCircuit/local-deep-research/releases"
-
-  # Monitor publish workflows and create issue on partial failure
-  monitor-publish:
-    needs: [build, trigger-workflows]
-    if: ${{ !cancelled() && needs.trigger-workflows.result == 'success' }}
+  # Block on the dispatched publish.yml run so create-release downstream
+  # only fires once PyPI has actually shipped. If PyPI fails, this job
+  # fails and create-release is skipped — preventing the GH Release from
+  # publishing with a missing PyPI artifact. (Docker promote already
+  # blocked synchronously above via publish-docker reusable call.)
+  monitor-pypi:
+    needs: [build, trigger-pypi]
+    if: ${{ !cancelled() && needs.trigger-pypi.result == 'success' }}
    runs-on: ubuntu-latest
    timeout-minutes: 90
    permissions:
@@ -1103,120 +1218,106 @@ jobs:
        with:
          egress-policy: audit

-      - name: Wait for publish workflows to complete
+      - name: Wait for PyPI publish workflow to complete
+        id: wait
        env:
          GH_TOKEN: ${{ github.token }}
          # gh CLI cannot infer the repo here (this job has no checkout step),
          # and it does NOT fall back to GITHUB_REPOSITORY. Without GH_REPO,
          # `gh run list` fails with "failed to determine base repo", the
-          # error is swallowed, and every poll sees an empty result — which
-          # made monitor-publish always time out and falsely open a
-          # "Partial publish failure" issue even when both publishes succeeded.
+          # error is swallowed, and every poll sees an empty result.
          GH_REPO: ${{ github.repository }}
          RELEASE_TAG: ${{ needs.build.outputs.tag }}
-          DISPATCH_TIME: ${{ needs.trigger-workflows.outputs.dispatch_time }}
+          DISPATCH_TIME: ${{ needs.trigger-pypi.outputs.dispatch_time }}
        run: |
-          echo "Monitoring publish workflows for tag $RELEASE_TAG (dispatched at $DISPATCH_TIME)..."
+          # pipefail is essential here: the poll loop pipes `gh run list`
+          # into `jq`, and without pipefail a transient `gh` failure
+          # (network, auth, rate limit) is masked because `jq` exits 0 on
+          # empty input, causing the loop to spin silently for the full
+          # 40-minute budget rather than surfacing the error immediately.
+          set -euo pipefail
+          echo "Monitoring publish.yml for tag $RELEASE_TAG (dispatched at $DISPATCH_TIME)..."

-          # Wait for workflows to start (repository_dispatch is async)
+          # Wait for the dispatched run to start (repository_dispatch is async)
          sleep 30

-          check_workflow() {
-            local workflow_name="$1"
-            local max_wait=2400  # 40 minutes
-            local elapsed=0
-            local run status conclusion
+          max_wait=2400  # 40 minutes
+          elapsed=0
+          conclusion=""

-            while [ "$elapsed" -lt "$max_wait" ]; do
-              # Compare timestamps numerically — lexicographic comparison breaks
-              # when GitHub returns sub-second precision (e.g. "...:56.500Z"
-              # sorts before "...:56Z"). fromdateiso8601 doesn't accept
-              # fractional seconds, so strip them first. The 60s buffer
-              # tolerates minor clock skew between the runner and GitHub's API.
-              # Stderr is intentionally NOT redirected so gh failures are
-              # visible in the runner log (silent failures previously caused
-              # the loop to time out without explanation).
-              run=$(gh run list --workflow="$workflow_name" --limit=20 --json status,conclusion,createdAt \
-                | jq -r --arg since "$DISPATCH_TIME" '
-                    def to_epoch: sub("\\.[0-9]+Z$"; "Z") | fromdateiso8601;
-                    (($since | to_epoch) - 60) as $s
-                    | [.[] | select((.createdAt | to_epoch) >= $s)] | .[0]
-                  ')
-
-              if [ "$run" = "null" ] || [ -z "$run" ]; then
-                echo "$workflow_name: waiting for run to appear..." >&2
-                sleep 30
-                elapsed=$((elapsed + 30))
-                continue
-              fi
-
-              status=$(echo "$run" | jq -r '.status')
-              conclusion=$(echo "$run" | jq -r '.conclusion')
-
-              if [ "$status" = "completed" ]; then
-                echo "$workflow_name: $conclusion" >&2
-                echo "$conclusion"
-                return
-              fi
+          while [ "$elapsed" -lt "$max_wait" ]; do
+            # Compare timestamps numerically — lexicographic comparison
+            # breaks when GitHub returns sub-second precision (e.g.
+            # "...:56.500Z" sorts before "...:56Z"). fromdateiso8601
+            # doesn't accept fractional seconds, so strip them first. The
+            # 60s buffer tolerates minor clock skew between the runner and
+            # GitHub's API. Stderr is intentionally NOT redirected so gh
+            # failures stay visible in the runner log.
+            run=$(gh run list --workflow=publish.yml --limit=20 --json status,conclusion,createdAt \
+              | jq -r --arg since "$DISPATCH_TIME" '
+                  def to_epoch: sub("\\.[0-9]+Z$"; "Z") | fromdateiso8601;
+                  (($since | to_epoch) - 60) as $s
+                  | [.[] | select((.createdAt | to_epoch) >= $s)] | .[0]
+                ')

+            if [ "$run" = "null" ] || [ -z "$run" ]; then
+              echo "publish.yml: waiting for run to appear..." >&2
              sleep 30
              elapsed=$((elapsed + 30))
-            done
+              continue
+            fi

-            echo "$workflow_name: timed out after ${max_wait}s" >&2
-            echo "timed_out"
-          }
+            status=$(echo "$run" | jq -r '.status')
+            conclusion=$(echo "$run" | jq -r '.conclusion')

-          DOCKER_RESULT=$(check_workflow "docker-publish.yml")
-          PYPI_RESULT=$(check_workflow "publish.yml")
+            if [ "$status" = "completed" ]; then
+              echo "publish.yml: $conclusion" >&2
+              break
+            fi

-          # Mark that monitoring completed (distinguishes from infra failure)
-          {
-            echo "monitor_completed=true"
-            echo "docker_result=$DOCKER_RESULT"
-            echo "pypi_result=$PYPI_RESULT"
-          } >> "$GITHUB_ENV"
+            sleep 30
+            elapsed=$((elapsed + 30))
+          done

-          if [ "$DOCKER_RESULT" = "success" ] && [ "$PYPI_RESULT" = "success" ]; then
-            echo "Both publish workflows completed successfully!"
-          else
-            echo "::warning::Partial publish failure detected"
+          if [ -z "$conclusion" ]; then
+            echo "::error::publish.yml run did not start or complete within ${max_wait}s"
+            echo "pypi_result=timed_out" >> "$GITHUB_OUTPUT"
+            exit 1
          fi

-      - name: Create issue on partial failure
-        if: env.monitor_completed == 'true' && (env.docker_result != 'success' || env.pypi_result != 'success')
+          echo "pypi_result=$conclusion" >> "$GITHUB_OUTPUT"
+          if [ "$conclusion" = "success" ]; then
+            echo "PyPI publish completed successfully"
+            exit 0
+          else
+            echo "::error::PyPI publish completed with conclusion: $conclusion"
+            exit 1
+          fi
+
+      - name: Create issue on PyPI publish failure
+        # Always run on failure so maintainers see a clear, persistent
+        # record that captures the workflow run ID. The Actions UI shows
+        # the failure too but issues are easier to search and triage.
+        if: failure()
        uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
        env:
          RELEASE_TAG: ${{ needs.build.outputs.tag }}
+          PYPI_RESULT: ${{ steps.wait.outputs.pypi_result }}
        with:
          script: |
            const { owner, repo } = context.repo;
            const tag = process.env.RELEASE_TAG;
-            const docker = process.env.docker_result || 'unknown';
-            const pypi = process.env.pypi_result || 'unknown';
-            const title = `Partial publish failure for ${tag}`;
+            const pypi = process.env.PYPI_RESULT || 'unknown';
+            const title = `PyPI publish failure for ${tag}`;

-            const failed = [
-              { name: 'Docker', result: docker },
-              { name: 'PyPI', result: pypi },
-            ].filter(t => t.result !== 'success');
-            const hasTimeout = failed.some(t => t.result === 'timed_out');
-            const hasFailure = failed.some(t => t.result === 'failure');
+            const action = pypi === 'timed_out'
+              ? '**Suggested action:** publish.yml did not complete within 40 minutes. Check the [Actions tab](https://github.com/' + owner + '/' + repo + '/actions) — if the run never appeared, this likely indicates a dispatch issue; re-trigger via `repository_dispatch` event_type `publish-pypi`. If still running, no action may be needed.'
+              : '**Suggested action:** Inspect publish.yml logs in the [Actions tab](https://github.com/' + owner + '/' + repo + '/actions), fix the underlying cause, then either (a) re-dispatch publish.yml with the same tag via `repository_dispatch` and re-run release.yml to complete create-release, or (b) manually publish the GitHub Release if Docker promote also succeeded.';

-            let action;
-            if (hasTimeout && !hasFailure) {
-              action = '**Suggested action:** One or more workflows did not complete within 40 minutes. Check the [Actions tab](https://github.com/' + owner + '/' + repo + '/actions) — if a run never appeared, this likely indicates an infrastructure or dispatch issue and the publish can be re-triggered. If still running, no action may be needed.';
-            } else if (hasFailure && !hasTimeout) {
-              action = '**Suggested action:** Inspect the failed workflow logs in the [Actions tab](https://github.com/' + owner + '/' + repo + '/actions), fix the underlying cause, then re-trigger the publish via `repository_dispatch`.';
-            } else {
-              action = '**Suggested action:** Mixed results — investigate each non-success target individually in the [Actions tab](https://github.com/' + owner + '/' + repo + '/actions).';
-            }
+            const body = `## PyPI publish failed for ${tag}\n\nResult: \`${pypi}\`\n\nNote: At this point in the atomicity flow, Docker promote (publish-docker job) has already succeeded — the Docker release tags exist and are signed. Only PyPI and the GitHub Release are missing.\n\n${action}`;

-            const body = `## Publish Status for ${tag}\n\n| Target | Result |\n|--------|--------|\n| Docker | ${docker} |\n| PyPI | ${pypi} |\n\n${action}`;
-
-            // De-dup: if an open issue with the same title already exists
-            // (e.g. from a re-run of this workflow on the same tag), comment
-            // on it instead of opening a duplicate.
+            // De-dup: if an open issue with the same title already exists,
+            // comment on it instead of opening a duplicate.
            const existing = await github.paginate(github.rest.issues.listForRepo, {
              owner, repo, state: 'open', labels: 'ci-cd', per_page: 100,
            });
@@ -1232,3 +1333,218 @@ jobs:
                owner, repo, title, body, labels: ['ci-cd'],
              });
            }
+
+  # ============================================================================
+  # CLEANUP ON REJECTION
+  # ============================================================================
+  # In the build-once-promote model, prerelease-docker signs the manifest
+  # BEFORE the publish step runs. If publish-docker (the retag step) fails
+  # or the maintainer rejects the `release` env approval, the prerelease
+  # tag and its cosign artifacts (`sha256-<digest>.{sig,att}`) are left
+  # orphaned on Docker Hub forever — the existing cleanup loop inside
+  # docker-publish.yml only runs on its success path.
+  #
+  # SAFETY: cosign signature/attestation artifacts are stored at a tag named
+  # `sha256-<manifest-digest>.{sig,att}` and discovered by manifest digest,
+  # NOT by image tag. After publish-docker SUCCEEDS, the release tags
+  # (`:VERSION`, `:MAJOR_MINOR`, `:latest`) share the prerelease manifest
+  # digest (imagetools retag preserves the digest), so the cosign artifacts
+  # anchor BOTH the deleted prerelease tag AND the live release tags.
+  # Deleting them after a successful retag would invalidate release-tag
+  # cosign verification.
+  #
+  # Therefore: this job ONLY fires when publish-docker did not succeed.
+  # Beyond that point, docker-publish.yml's success-path cleanup handles
+  # prerelease-tag deletion, and cosign artifacts must stay.
+  #
+  # Edge case: partial retag failure (e.g., `:1.6.9` lands but `:latest`
+  # fails) — publish-docker exits failure with some release tags already
+  # created. The cleanup script enumerates the three possible release tags
+  # against Docker Hub and rolls back any that exist BEFORE deleting
+  # cosign artifacts. This is the only case where we delete release tags
+  # rather than leave them.
+  # ============================================================================
+  cleanup-on-rejection:
+    name: Clean up orphan prerelease tags and signatures
+    needs: [build, prerelease-docker, publish-docker]
+    if: >-
+      ${{ always()
+          && needs.prerelease-docker.result != 'skipped'
+          && needs.build.outputs.release_exists == 'false'
+          && (needs.prerelease-docker.result == 'failure'
+              || needs.prerelease-docker.result == 'cancelled'
+              || needs.publish-docker.result == 'failure'
+              || needs.publish-docker.result == 'cancelled') }}
+    runs-on: ubuntu-latest
+    # DOCKER_USERNAME / DOCKER_PASSWORD are env-scoped to `release`
+    # (deliberately not repo-level — see comment on the build job above).
+    # Without `environment: release` here, those secrets resolve to empty
+    # strings and the Docker Hub login in the cleanup step below exits 1,
+    # leaving the orphan tags + cosign artifacts the cleanup is meant to
+    # remove. The `release` env approval was already granted upstream in
+    # this run, so this does not add a new prompt.
+    environment: release
+    permissions:
+      contents: read
+
+    steps:
+      - name: Harden the runner (Audit all outbound calls)
+        uses: step-security/harden-runner@ab7a9404c0f3da075243ca237b5fac12c98deaa5 # v2.19.3
+        with:
+          egress-policy: audit
+
+      - name: Roll back partial release tags then delete prerelease + cosign artifacts
+        env:
+          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+          DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
+          VERSION: ${{ needs.build.outputs.version }}
+          SHORT_SHA: ${{ needs.build.outputs.short_sha }}
+          DIGEST: ${{ needs.prerelease-docker.outputs.manifest_digest }}
+        run: |
+          set -euo pipefail
+          # Authenticate with Docker Hub API (same JWT flow as docker-publish.yml cleanup)
+          TOKEN=$(curl -sS -X POST -H "Content-Type: application/json" \
+            -d "{\"username\":\"${DOCKER_USERNAME}\",\"password\":\"${DOCKER_PASSWORD}\"}" \
+            https://hub.docker.com/v2/users/login/ | jq -r .token)
+
+          if [ -z "$TOKEN" ] || [ "$TOKEN" = "null" ]; then
+            echo "::error::Failed to authenticate with Docker Hub API — cannot clean up orphans"
+            exit 1
+          fi
+
+          REPO="${DOCKER_USERNAME}/local-deep-research"
+          MANIFEST_TAG="prerelease-v${VERSION}-${SHORT_SHA}"
+          MAJOR_MINOR=$(echo "$VERSION" | cut -d. -f1,2)
+
+          delete_tag() {
+            local TAG="$1"
+            local STATUS
+            STATUS=$(curl -sS -o /dev/null -w "%{http_code}" -X DELETE \
+              -H "Authorization: JWT ${TOKEN}" \
+              "https://hub.docker.com/v2/repositories/${REPO}/tags/${TAG}/" || echo "ERR")
+            case "$STATUS" in
+              200|204) echo "Deleted ${TAG}";;
+              404)     echo "Skip ${TAG} (already absent — expected)";;
+              401|403)
+                echo "::error::Auth failure deleting ${TAG} — DOCKER_PASSWORD may be missing Delete scope on the Docker Hub PAT"
+                exit 1
+                ;;
+              *)       echo "::warning::Unexpected HTTP ${STATUS} for ${TAG}";;
+            esac
+          }
+
+          # =====================================================================
+          # STEP 1 — Partial retag rollback.
+          # =====================================================================
+          # If publish-docker failed mid-way through `docker buildx imagetools
+          # create -t :VERSION -t :MAJOR_MINOR -t :latest`, one or two of the
+          # three release tags may have landed. Those tags would point at the
+          # prerelease manifest digest, which we're about to delete cosign
+          # artifacts for. We MUST roll back any landed release tags BEFORE
+          # touching cosign — otherwise the rollback leaves them broken.
+          #
+          # If publish-docker was skipped (prerelease failed before reaching
+          # retag), the release tags can't exist, so this loop is a cheap no-op
+          # (three 404 responses).
+          # =====================================================================
+          echo "STEP 1 — Roll back any release tags that landed during a partial retag..."
+          for RELEASE_TAG in "${VERSION}" "${MAJOR_MINOR}" "latest"; do
+            # Existence check (GET, body discarded) before DELETE so the "expected absent" case never reaches delete_tag.
+            # Docker Hub's tag endpoint returns 200 if the tag exists, 404 if not.
+            CHECK=$(curl -sS -o /dev/null -w "%{http_code}" \
+              -H "Authorization: JWT ${TOKEN}" \
+              "https://hub.docker.com/v2/repositories/${REPO}/tags/${RELEASE_TAG}/" || echo "ERR")
+            if [ "$CHECK" = "200" ]; then
+              echo "::warning::Release tag ${RELEASE_TAG} exists from a partial retag — rolling back"
+              delete_tag "${RELEASE_TAG}"
+            else
+              echo "Release tag ${RELEASE_TAG}: not present (HTTP ${CHECK}) — nothing to roll back"
+            fi
+          done
+
+          # =====================================================================
+          # STEP 2 — Prerelease tag cleanup.
+          # =====================================================================
+          # Always-safe targets: the prerelease manifest list + its per-arch
+          # children + the floating `:prerelease` tag. Cleanup is best-effort
+          # here (404s are normal — e.g., build failed before pushing arm64).
+          #
+          # The floating `:prerelease` tag (re-pointed by prerelease-docker.yml
+          # to the current run's manifest) is included so a rejected release
+          # does not leave `:prerelease` pointing at a manifest whose cosign
+          # artifacts step 4 below is about to delete — pulling `:prerelease`
+          # in that window would yield an image whose signature is gone, and
+          # the README cosign-verify recipe would fail. Returning 404 on
+          # `:prerelease` is unambiguously safer than serving an unverifiable
+          # image; the next successful prerelease-docker run re-creates it.
+          # =====================================================================
+          echo "STEP 2 — Delete prerelease tags..."
+          for TAG in "${MANIFEST_TAG}" "${MANIFEST_TAG}-amd64" "${MANIFEST_TAG}-arm64" "prerelease"; do
+            delete_tag "${TAG}"
+          done
+
+          # =====================================================================
+          # STEP 3 — Discover per-arch digests from the manifest list BEFORE
+          # we delete cosign artifacts.
+          # =====================================================================
+          # prerelease-docker.yml's per-arch SBOM step attests against each
+          # per-arch digest, producing artifacts at
+          # `sha256-<per-arch-digest>.{sig,att,sbom}` in addition to the
+          # manifest-list-digest artifacts. We need to enumerate them from the
+          # manifest while it still exists (STEP 2 just deleted the TAG, but the
+          # manifest body persists until Docker Hub GC, so it stays inspectable by digest).
+          # Use the captured DIGEST to inspect the manifest list directly.
+          # =====================================================================
+          PER_ARCH_TAGS=()
+          if [[ -n "${DIGEST:-}" && "$DIGEST" == sha256:* ]]; then
+            echo "STEP 3 — Discovering per-arch digests from manifest list ${DIGEST}..."
+            if docker buildx imagetools inspect "${REPO}@${DIGEST}" --raw > /tmp/manifest.json 2>/dev/null; then
+              while IFS= read -r PER_ARCH_DIGEST; do
+                if [[ -n "${PER_ARCH_DIGEST}" && "${PER_ARCH_DIGEST}" == sha256:* ]]; then
+                  PER_ARCH_TAG_PREFIX="${PER_ARCH_DIGEST/:/-}"
+                  PER_ARCH_TAGS+=(
+                    "${PER_ARCH_TAG_PREFIX}.sig"
+                    "${PER_ARCH_TAG_PREFIX}.att"
+                    "${PER_ARCH_TAG_PREFIX}.sbom"
+                  )
+                  echo "  queued cleanup for per-arch artifacts at ${PER_ARCH_DIGEST}"
+                fi
+              done < <(jq -r '.manifests[] | select(.platform.architecture != "unknown") | .digest' /tmp/manifest.json)
+            else
+              echo "::warning::Could not inspect manifest at ${DIGEST} — skipping per-arch attestation cleanup"
+            fi
+          else
+            echo "STEP 3 — No valid manifest digest captured (got '${DIGEST:-<empty>}'); skipping per-arch discovery"
+          fi
+
+          # =====================================================================
+          # STEP 4 — Cosign signature/attestation artifact cleanup.
+          # =====================================================================
+          # Safe to delete only because STEP 1 already rolled back any release
+          # tags that might be sharing these digests. The .sbom entries are
+          # legacy from `cosign attach sbom` (current code uses `cosign attest
+          # --type spdxjson` which writes to .att) — kept as belt-and-suspenders
+          # cleanup for any leftovers from older releases.
+          # =====================================================================
+          COSIGN_TAGS=()
+          if [[ -n "${DIGEST:-}" && "$DIGEST" == sha256:* ]]; then
+            DIGEST_TAG_PREFIX="${DIGEST/:/-}"
+            COSIGN_TAGS+=(
+              "${DIGEST_TAG_PREFIX}.sig"
+              "${DIGEST_TAG_PREFIX}.att"
+              "${DIGEST_TAG_PREFIX}.sbom"
+            )
+          fi
+          COSIGN_TAGS+=("${PER_ARCH_TAGS[@]}")
+
+          if [ "${#COSIGN_TAGS[@]}" -gt 0 ]; then
+            echo "STEP 4 — Delete cosign signature/attestation artifacts..."
+            for TAG in "${COSIGN_TAGS[@]}"; do
+              delete_tag "${TAG}"
+            done
+          else
+            echo "STEP 4 — No cosign artifact tags to clean up (no captured digest)"
+          fi
+          # NOTE: NO `continue-on-error: true` at job level — auth failures
+          # (401/403) are loud so missing token scopes can't silently let
+          # orphans accumulate forever.
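Review note: the digest-to-tag mapping that STEP 3 and STEP 4 of the cleanup script rely on can be checked in isolation. The sketch below is only an illustration under stated assumptions: `cosign_tags` is a hypothetical helper that does not exist in the workflow, and the digest value is a made-up placeholder. It mirrors the `${DIGEST/:/-}` substitution and the `sha256:*` guard shown in the script above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# cosign_tags is a hypothetical helper (not part of release.yml): it prints
# the registry tag names the cleanup would queue for one digest.
cosign_tags() {
  local digest="$1"
  # Same guard as the workflow: only accept sha256 digests.
  [[ "$digest" == sha256:* ]] || return 1
  # Replace the first ':' with '-' (sha256:abc becomes sha256-abc).
  local prefix="${digest/:/-}"
  printf '%s\n' "${prefix}.sig" "${prefix}.att" "${prefix}.sbom"
}

# Placeholder digest, for illustration only.
cosign_tags "sha256:0123abcd"
```

Because the guard returns non-zero for anything that is not a `sha256:` digest, an empty or malformed `manifest_digest` output queues no tags at all, which matches the "no captured digest" branch in STEP 4.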
--- a/49
+++ b/49
@@ -8,6 +8,14 @@ SHELL ["/bin/bash", "-o", "pipefail", "-c"]

 ARG DEBIAN_FRONTEND=noninteractive

+# `apt-get upgrade -y` is INTENTIONAL — we want every build to pull the
+# latest patched Debian packages so security fixes flow into the image.
+# This trades bit-for-bit reproducibility (two rebuilds of the same source
+# can produce different layer digests across a Debian patch window) for
+# always-fresh-on-CVE behavior. The build-once-promote pipeline mitigates
+# the reproducibility loss: prerelease-docker.yml builds once per release
+# and the resulting digest is what gets retagged to :1.6.9 / :1.6 / :latest,
+# so the released image is bit-identical to the one tested.
 # Install system dependencies for SQLCipher and Node.js for frontend build
 # Using Acquire::Retries to handle transient Debian mirror errors during CI
 RUN apt-get update -o Acquire::Retries=3 && apt-get upgrade -y -o Acquire::Retries=3 \
@@ -63,9 +71,11 @@ ENV PDM_CHECK_UPDATE=false
 # This helps prevent httpcore.ReadTimeout errors during CI network congestion
 ENV PDM_REQUEST_TIMEOUT=120

-# Build argument to invalidate cache when dependencies change
-ARG DEPS_HASH
-
+# NOTE: `DEPS_HASH` was previously declared as a cache-invalidation arg but
+# never referenced in a RUN/COPY, so it had no effect — Docker only honors
+# ARG values for cache when they're actually used downstream. Cache
+# invalidation on dependency changes happens naturally via `COPY pdm.lock`
+# below, since the file's content hash changes when deps change.
 WORKDIR /install

 # Copy dependency files first (changes rarely)
@@ -120,6 +130,8 @@ ARG DEBIAN_FRONTEND=noninteractive
 # Install additional runtime dependencies for testing tools
 # Note: Node.js is already installed from builder-base
 # Using Acquire::Retries to handle transient Debian mirror errors during CI
+# `apt-get upgrade -y` is INTENTIONAL — see the rationale comment on the
+# corresponding upgrade in the builder-base stage (top of file).
 RUN apt-get update -o Acquire::Retries=3 && apt-get upgrade -y -o Acquire::Retries=3 \
    && apt-get install -y --no-install-recommends -o Acquire::Retries=3 \
    xauth \
@@ -238,7 +250,9 @@ ARG DEBIAN_FRONTEND=noninteractive
 # — Scorecard alert #7742 dismissed as won't-fix on the same basis.
 RUN pip3 install --no-cache-dir pip==26.1

-# Install runtime dependencies for SQLCipher and WeasyPrint
+# Install runtime dependencies for SQLCipher and WeasyPrint.
+# `apt-get upgrade -y` is INTENTIONAL — see rationale on the builder-base
+# upgrade (top of file). This trades reproducibility for always-fresh CVE patches.
 RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y --no-install-recommends \
    sqlcipher \
@@ -294,13 +308,30 @@ RUN HOME=/home/ldruser setpriv --reuid=ldruser --regid=ldruser --init-groups --
    sqlcipher = get_sqlcipher_module(); \
    print(f'✓ SQLCipher module loaded successfully: {sqlcipher}')"

-# Create volume for persistent configuration
-# Use /app for configuration to support non-root user
+# Persistent state. Without VOLUME directives the user loses all research
+# data + DBs on `docker rm`. Recommend bind-mounting these in production.
+# - /app/.config/local_deep_research: legacy config path (kept for backcompat)
+# - /data: where the entrypoint creates logs/, cache/, encrypted_databases/ —
+#   the actual user state, see scripts/ldr_entrypoint.sh.
+#
+# LDR_DATA_DIR pins the application to /data. Without this, the Python
+# code falls back to platformdirs.user_data_dir() which resolves to
+# /home/ldruser/.local/share/local-deep-research — NOT under any
+# declared VOLUME, so a `docker run -v vol:/data ...` user (without
+# also setting -e LDR_DATA_DIR=/data) would silently lose all data on
+# `docker rm`. Documented run paths (docker-compose.yml, README docker
+# run examples) already pass this env var explicitly; setting it here
+# makes the VOLUME actually load-bearing for bare `docker run -v ...`
+# invocations too.
+ENV LDR_DATA_DIR=/data
 VOLUME /app/.config/local_deep_research
+VOLUME /data

-# Create volume for Ollama start script
-VOLUME /scripts/
-# Copy the Ollama entrypoint script
+# NOTE: /scripts/ is image content (ollama entrypoint baked in below), NOT
+# user state. Previously declared as VOLUME, but a VOLUME on a directory
+# that the image populates causes anonymous-volume creation on every
+# `docker run` and silently shadows the script if a user bind-mounts it.
+# Removed for correctness.
 COPY --chown=ldruser:ldruser scripts/ollama_entrypoint.sh /scripts/ollama_entrypoint.sh

 # Copy LDR entrypoint script to handle volume permissions
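
To make the `LDR_DATA_DIR` rationale above concrete, here is a minimal shell sketch of the fallback that the Dockerfile comment describes. The real resolution happens in the application's Python code via `platformdirs`. The function name `resolve_data_dir` is hypothetical. Treating an empty `LDR_DATA_DIR` as unset is an assumption, and the `XDG_DATA_HOME` handling reflects the usual Linux behaviour of `platformdirs.user_data_dir()`, not anything shown in this diff.

```bash
# Hypothetical illustration of the data-directory fallback; the actual logic
# lives in the Python application, not in a shell script.
resolve_data_dir() (
  if [ -n "${LDR_DATA_DIR:-}" ]; then
    # Explicit override: what `ENV LDR_DATA_DIR=/data` pins in the image.
    printf '%s\n' "$LDR_DATA_DIR"
  else
    # Assumed platformdirs-style default on Linux: $XDG_DATA_HOME or
    # ~/.local/share, followed by the application name.
    printf '%s\n' "${XDG_DATA_HOME:-$HOME/.local/share}/local-deep-research"
  fi
)
```

With the variable unset and `HOME=/home/ldruser`, the sketch resolves to the path named in the comment, which is outside every declared `VOLUME`; with `LDR_DATA_DIR=/data` it resolves to the volume-backed directory.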
--- a/README.md
+++ b/README.md
@@ -158,11 +158,38 @@ Your data stays yours. Each user gets their own isolated SQLCipher database encr

 **In-memory credentials**: Like all applications that use secrets at runtime — including [password managers](https://www.ise.io/casestudies/password-manager-hacking/), browsers, and API clients — credentials are held in plain text in process memory during active sessions. This is an [industry-wide accepted reality](https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html), not specific to LDR: if an attacker can read process memory, they can also read any in-process decryption key. We mitigate this with session-scoped credential lifetimes and core dump exclusion. Ideas for further improvements are always welcome via [GitHub Issues](https://github.com/LearningCircuit/local-deep-research/issues). See our [Security Policy](SECURITY.md) for details.

-**Supply Chain Security**: Docker images are signed with [Cosign](https://github.com/sigstore/cosign), include SLSA provenance attestations, and attach SBOMs. Verify with:
+**Supply Chain Security**: Docker images are signed with [Cosign](https://github.com/sigstore/cosign) using GitHub's keyless OIDC flow, include SLSA provenance attestations, and ship with attested SPDX SBOMs. Verify the image and its SBOM before running:
+
 ```bash
-cosign verify localdeepresearch/local-deep-research:latest
+# 1. Verify image signature
+cosign verify \
+  --certificate-identity-regexp "^https://github\.com/LearningCircuit/local-deep-research/\.github/workflows/prerelease-docker\.yml@.*$" \
+  --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
+  --certificate-github-workflow-repository "LearningCircuit/local-deep-research" \
+  localdeepresearch/local-deep-research:latest
+
+# 2. Verify SBOM attestation (SPDX JSON) for YOUR platform
+#    SBOM attestations are stored per-architecture (amd64, arm64) on the
+#    per-arch image digest, not on the multi-arch manifest list. Resolve to
+#    your platform's digest first.
+ARCH=$(uname -m | sed -e 's/^x86_64$/amd64/' -e 's/^aarch64$/arm64/')
+PLATFORM_DIGEST=$(docker buildx imagetools inspect localdeepresearch/local-deep-research:latest --raw \
+  | jq -r --arg arch "$ARCH" '.manifests[] | select(.platform.architecture==$arch) | .digest')
+if [ -z "$PLATFORM_DIGEST" ]; then
+  echo "No per-arch digest found for $ARCH: image may be single-arch or" \
+       "from a pre-build-once-promote release. Skipping step 2." >&2
+else
+  cosign verify-attestation \
+    --type spdxjson \
+    --certificate-identity-regexp "^https://github\.com/LearningCircuit/local-deep-research/\.github/workflows/prerelease-docker\.yml@.*$" \
+    --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
+    --certificate-github-workflow-repository "LearningCircuit/local-deep-research" \
+    "localdeepresearch/local-deep-research@${PLATFORM_DIGEST}"
+fi
 ```

+The image-signature check confirms that the image was built by the official `prerelease-docker.yml` workflow in `LearningCircuit/local-deep-research`, not by a forked repo or with a leaked credential. The per-platform SBOM verification ensures you inspect the package set you will actually run, not the SBOM of a different architecture. The commands above require [cosign v2.0+](https://docs.sigstore.dev/cosign/installation/), [`jq`](https://jqlang.github.io/jq/), and `docker buildx` (bundled with Docker Desktop and Docker Engine ≥ 23.0; install the standalone plugin on older installs). Releases before the build-once-promote refactor were signed by `docker-publish.yml` and carried a single manifest-level SBOM rather than per-arch ones; for those, substitute `docker-publish.yml` for `prerelease-docker.yml` in the regex on both steps and skip the per-platform digest lookup (pass the manifest-list tag directly to `cosign verify-attestation`).
+
 **Security Transparency**: Scanner suppressions are documented with justifications in [Security Alerts Assessment](.github/SECURITY_ALERTS.md), [Scorecard Compliance](.github/SECURITY_SCORECARD.md), [Container CVE Suppressions](.trivyignore), and [SAST Rule Rationale](bearer.yml). Some alerts (Dependabot, code scanning) can only be dismissed or are very difficult to suppress outside the [GitHub Security tab](https://docs.github.com/en/code-security/dependabot/dependabot-alerts/viewing-and-updating-dependabot-alerts), so the files above do not cover every dismissed finding.

 [Detailed Architecture →](docs/architecture.md) | [Security Policy →](SECURITY.md) | [Security Review Process →](docs/processes/security-review-process/)
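
The `uname -m | sed` mapping in step 2 of the README snippet can also be written as a `case` statement that fails loudly on architectures the image is not published for. This is only a sketch: `normalize_arch` is a hypothetical name, and limiting the accepted set to amd64 and arm64 follows the README comment about per-architecture SBOMs.

```bash
# Hypothetical helper: map a `uname -m` machine name to the Docker platform
# architecture used in the manifest list. Defaults to the local machine.
normalize_arch() (
  machine="${1:-$(uname -m)}"
  case "$machine" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;   # macOS on Apple silicon reports arm64
    *)
      echo "unsupported architecture: $machine" >&2
      exit 1                        # subshell function: exits only the function
      ;;
  esac
)
```

A caller could then write `ARCH=$(normalize_arch) || echo "no per-arch SBOM for this machine"` before resolving the platform digest, so an unexpected machine type is reported rather than silently producing an empty `jq` result.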
--- a/docs/CI_CD_INFRASTRUCTURE.md
+++ b/docs/CI_CD_INFRASTRUCTURE.md
@@ -129,10 +129,11 @@ pre-commit install-hooks

 | Workflow | Trigger | Purpose |
 |----------|---------|---------|
-| `docker-publish.yml` | Release, push | Build and publish Docker images |
+| `prerelease-docker.yml` | `workflow_call` from release.yml | Canonical multi-arch Docker build, cosign sign, SBOM/SLSA attestations. Jobs declare `environment: release` so the first `release` env approval gates the build (env-scoped Docker Hub secrets). |
+| `docker-publish.yml` | `workflow_call` from release.yml | Retag prerelease manifest as `:1.6.9` / `:1.6` / `:latest` (gated by `release` env). No rebuild — registry-side metadata only. Inlined as a reusable workflow so its result is visible to downstream jobs in release.yml (lets create-release block on Docker success, lets cleanup-on-rejection safely scope cosign artifact deletion). |
 | `docker-multiarch-test.yml` | PR, push | Multi-architecture build test |
-| `publish.yml` | Release | Publish to PyPI |
-| `release.yml` | Manual | Create releases |
+| `publish.yml` | `repository_dispatch` from release.yml | Publish to PyPI. Stays on `repository_dispatch` (not `workflow_call`) because PyPI Trusted Publishing rejects OIDC claims from reusable workflows — `pypa/gh-action-pypi-publish#166`, `pypi/warehouse#11096`. |
+| `release.yml` | Push to `main`, tag `v*.*.*`, manual | Orchestrate release: gates → build → provenance → prerelease-docker → publish-docker → trigger-pypi → monitor-pypi → create-release (last) |

 ### Code Quality

--- a/docs/RELEASE_GUIDE.md
+++ b/docs/RELEASE_GUIDE.md
@@ -43,10 +43,82 @@ and short-circuits everything downstream).
  `version-check` job sets `should_release=false` and every downstream
  job (security gate, CI gate, build, publish) is skipped.

-### 2. **Automatic Publishing** (with approval)
- **GitHub Release** → triggers:
-  - **PyPI publishing** (requires `release` environment approval)
-  - **Docker publishing** (requires `release` environment approval)
+### 2. **Approval and Publishing**
+
+The release pipeline uses the `release` GitHub environment to gate the
+publish steps. `DOCKER_USERNAME` / `DOCKER_PASSWORD` are scoped to that
+environment, so any job that pushes to Docker Hub must declare
+`environment: release` and therefore goes through the approval gate.
+
+When you merge to `main` (or push a tag), the pipeline runs in this
+order:
+
+1. Security gates + CI gates run automatically.
+2. `build` job runs (version pin, SBOM, Sigstore bundles), then
+   `provenance` job generates SLSA provenance for those artifacts.
+3. **One `release` env approval prompt** in `release.yml`. Approving
+   unlocks all release-env jobs in the same run, which then execute
+   sequentially:
+   1. `prerelease-docker` — canonical multi-arch Docker build, cosign
+      sign, SBOM/SLSA attestations, push as `prerelease-v<ver>-<sha>`
+      and re-point the floating `:prerelease` tag.
+   2. `publish-docker` — retags the prerelease manifest as `:1.6.9`,
+      `:1.6`, `:latest` (no rebuild, digest-preserving), then re-verifies
+      digest + cosign + Trivy on the promoted tag.
+   3. `trigger-pypi` — dispatches `publish.yml` via `repository_dispatch`
+      (PyPI Trusted Publishing requires the publish step to run in a
+      top-level workflow, so this can't be a reusable workflow_call).
+   4. `monitor-pypi` — polls `publish.yml` for completion. The inner
+      polling loop times out at 40 minutes (after which the job fails);
+      the surrounding GH Actions `timeout-minutes` is 90 to leave a
+      safety margin around the poll budget.
+   5. `create-release` — publishes the GitHub Release with
+      SBOM/sig/provenance assets. Runs **last**, gated on all of the
+      above succeeding, so the public Release never points at missing
+      Docker tags or a missing PyPI version.
+
+If any of `prerelease-docker`, `publish-docker`, or `monitor-pypi` fails,
+`create-release` is skipped and no public GitHub Release is created. The
+`cleanup-on-rejection` job then handles failure-mode cleanup:
+
+- If `publish-docker` failed mid-retag (e.g., `:1.6.9` landed but
+  `:latest` failed), it rolls back any landed release tags BEFORE
+  deleting prerelease tags and cosign artifacts (deleting cosign
+  artifacts while release tags share the manifest digest would invalidate
+  release-tag signatures).
+- If `publish-docker` succeeded but a later step (PyPI or
+  create-release) failed, `cleanup-on-rejection` does NOT fire — Docker
+  release tags exist and their cosign artifacts must stay. See
+  "Recovery from PyPI failure" below.
+
+### Recovery from PyPI failure (atomicity hole)
+
+The one orphan state the pipeline cannot fully clean up: `publish-docker`
+succeeded, PyPI failed. At this point Docker `:1.6.9` / `:1.6` /
+`:latest` exist and are signed; PyPI has nothing; no GitHub Release.
+`monitor-pypi` opens a tracking issue labeled `ci-cd`. To recover:
+
+1. Inspect the `publish.yml` workflow run, fix the underlying cause.
+2. Manually re-dispatch PyPI publish:
+   ```bash
+   gh api repos/LearningCircuit/local-deep-research/dispatches \
+     -f event_type=publish-pypi \
+     -F 'client_payload[tag]=v<X.Y.Z>'
+   ```
+3. Once PyPI publishes successfully, manually create the GitHub Release
+   from the existing tag (the SBOM/sig/provenance artifacts are still
+   uploaded as workflow artifacts on the failed `release.yml` run; you
+   can download them and attach manually, or re-run `create-release`
+   manually if the run is still re-runnable in the Actions UI).
+
+> Earlier iterations of this refactor described a single approval gate
+> with a pre-approval testing window. That design required
+> `DOCKER_USERNAME` / `DOCKER_PASSWORD` to be repo-level secrets so the
+> canonical build could run without env approval. They are env-scoped to
+> `release` instead, so the gate sits in front of the build. The
+> atomicity refactor preserves this single-approval model — one click
+> unlocks the whole chain, and create-release runs last so the
+> "published Release with broken artifacts" failure mode is closed.
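
The `monitor-pypi` behaviour described in the ordered list above (an inner polling loop with its own time budget, wrapped by a larger job-level `timeout-minutes`) can be sketched as a generic poll-with-budget helper. The actual loop in `release.yml` is not shown in this section, so everything here is illustrative: `poll_until` is a hypothetical name, and the check command stands in for whatever query reports the `publish.yml` run status.

```bash
# Hypothetical helper: poll_until CHECK_CMD BUDGET_SECONDS INTERVAL_SECONDS
# Runs CHECK_CMD (a single command name, no arguments) until it succeeds or
# the budget is spent. INTERVAL_SECONDS must be greater than zero.
poll_until() (
  check="$1"
  budget="$2"
  interval="$3"
  elapsed=0
  while [ "$elapsed" -lt "$budget" ]; do
    if "$check"; then
      exit 0                      # check passed: stop polling
    fi
    sleep "$interval"
    # Only sleep time is counted; time spent inside CHECK_CMD is not.
    elapsed=$((elapsed + interval))
  done
  echo "timed out after ${budget}s" >&2
  exit 1                          # budget exhausted: the caller should fail
)
```

Under these assumptions the 40-minute inner budget would correspond to `poll_until check_publish_run 2400 30`, where `check_publish_run` and the 30-second interval are placeholders. Keeping the job-level timeout well above the inner budget means the loop's own failure path, not a hard job kill, is what normally reports the timeout.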

 ## 👥 Who Can Release

@@ -95,11 +167,14 @@ with auto-notes plus AI summary only).

 ### Option A: Manual Trigger
 - Go to Actions → "Create Release" → "Run workflow"
- Specify version and prerelease flag
+- No inputs are required: the workflow reads the version from
+  `src/local_deep_research/__version__.py` at HEAD. To release an
+  older or different version, use Option B (push a version tag).

 ### Option B: Version Tags
 - `git tag v0.4.3 && git push origin v0.4.3`
- Automatically creates release
+- Automatically creates release; the workflow uses the tag's commit
+  SHA (not `main` HEAD), so this is the correct path for backporting.

 ## 🛡️ Branch Protection

@@ -117,10 +192,27 @@ Follow [Semantic Versioning](https://semver.org/):

 ## 🚨 Emergency Procedures

-If automation fails:
-1. **Manual GitHub release** still triggers PyPI/Docker
-2. **Contact code owners** for assistance
-3. **Check workflow logs** in GitHub Actions
+If automation fails, do NOT create a GitHub release through the UI as
+the first recovery step — under the atomicity refactor, a manually
+created GitHub release does NOT trigger `publish.yml` (it listens only
+on `repository_dispatch`) and does NOT trigger `docker-publish.yml`
+(workflow_call only). The downstream `release:` listeners that DO fire
+(`backwards-compatibility.yml`, `sbom.yml`) are observability-only.
+
+Recovery, in order of preference:
+1. **Check workflow logs** in GitHub Actions to identify which job
+   failed, and use the targeted recovery for that failure mode:
+   - PyPI failure with Docker already promoted: see
+     [Recovery from PyPI failure](#recovery-from-pypi-failure-atomicity-hole) above.
+   - Any other failure: re-run the failed job via the Actions UI if
+     it's still re-runnable (typically within 30 days).
+2. **Re-trigger the full pipeline** via `workflow_dispatch` if
+   re-running individual jobs isn't possible. Safe for digest-keyed
+   cosign verification — old digests remain valid because their cosign
+   artifacts persist; the new run produces a new digest with its own
+   signatures.
+3. **Contact code owners** if recovery requires manual Docker Hub or
+   PyPI intervention.

 ## 📝 Release-notes flow (towncrier news fragments)