ci(release): build-once-promote refactor for Docker pipeline (#3977)

* ci(release): build-once-promote refactor for Docker pipeline

Today the release pipeline builds the Docker image twice — once in
prerelease-docker.yml for "testing" and again in docker-publish.yml for
the actual release. The image you tested is not the image you ship: base
layer patches, transitive deps, and apt/pip resolution can diverge
between the two builds.

This refactor makes prerelease-docker.yml the canonical build and turns
docker-publish.yml into a thin retag step. `docker buildx imagetools
create` is a registry-side metadata operation that takes seconds and
preserves the manifest digest, so the released image is bit-identical
to the one tested. Cosign signatures, SBOM attestations, and SLSA
provenance are stored at sha256-<digest>.{sig,att} keyed by digest, so
signing once in prerelease covers the release tags transitively.
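
The digest argument can be illustrated with a minimal sketch, assuming (as in the OCI distribution model) that a manifest digest is the SHA-256 of the raw manifest bytes. `manifest_digest` and the sample bytes are hypothetical and not part of the workflows; they only show why a byte-for-byte retag keeps the digest while any re-encoding changes it.

```python
import hashlib


def manifest_digest(raw_manifest: bytes) -> str:
    """Return the content digest (sha256:<hex>) of raw manifest bytes."""
    return "sha256:" + hashlib.sha256(raw_manifest).hexdigest()


# Illustrative manifest-list bytes (not a real image).
raw = b'{"schemaVersion":2,"manifests":[]}'

# A retag that copies the bytes unchanged keeps the digest.
tested = manifest_digest(raw)
released = manifest_digest(bytes(raw))

# Any re-encoding (here: extra whitespace after colons) changes it.
reencoded = manifest_digest(raw.replace(b":", b": "))

print("retag preserves digest:", tested == released)
print("re-encoding preserves digest:", tested == reencoded)
```

This is why the promote step compares the post-retag digest against expected_digest rather than trusting the tag name.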

Pipeline shape changes:

- prerelease-docker.yml is now a reusable workflow (workflow_call) called
  from release.yml. It builds, scans (Trivy), signs (cosign), attests
  the SBOM (cosign attest --type spdxjson, replacing the deprecated
  cosign attach sbom), and emits SLSA provenance. The manifest_digest is
  exposed as a workflow output. The `prerelease` environment gates the
  first build job for human approval.
- docker-publish.yml shrinks from ~457 to ~250 lines. It receives source_tag
  and expected_digest in the dispatch payload, verifies the source digest
  before retag, retags via imagetools create, verifies the digest is
  preserved (defense against re-encoding), re-runs Trivy against the
  digest (catches CVE-DB updates between prerelease and promote),
  verifies the cosign signature transitivity, and runs the existing
  prerelease cleanup loop.
- release.yml adds prerelease-docker to create-release.needs and
  trigger-workflows.needs, so the GitHub Release and the publish dispatch
  only happen after the canonical Docker build completes. The dispatch
  payload now carries source_tag and expected_digest. A new
  cleanup-on-rejection job removes orphan prerelease tags and cosign
  artifacts when the release is rejected (without it, every rejection
  would leave dangling sha256-<digest>.{sig,att} on Docker Hub).
- README cosign verify example updated to the keyless invocation users
  actually need (identity regex pointing at prerelease-docker.yml,
  --certificate-oidc-issuer, --certificate-github-workflow-repository),
  plus the SBOM verify-attestation command.

Notable design decisions (verified across multiple subagent review
rounds):

- SLSA provenance entryPoint stays as release.yml (the top-level caller).
  Per the SLSA GHA buildtype v1 spec and the canonical
  slsa-github-generator behavior, reusable workflows are explicitly NOT
  entryPoints — pointing at prerelease-docker.yml would break verifier
  policies that allowlist trigger workflows.
- Cosign cert identity for verification matches Fulcio's SAN URI, which
  is built from job_workflow_ref — the CALLEE for reusable workflows. So
  the identity regex matches prerelease-docker.yml even though the build
  is invoked from release.yml. Hardened with escaped dots, refs/(heads|tags)/
  constraint, and --certificate-github-workflow-repository to defend
  against the reusable-workflow-identity-reuse class of attacks.
- cleanup-on-rejection uses an allowlist if (failure || cancelled), not
  a denylist (!= 'success'), to avoid firing on `skipped` (e.g. when
  release_exists short-circuits the run). It also fails loudly on 401/403
  from the Docker Hub API so a missing Delete scope on the PAT can't
  silently let orphans accumulate.
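
The identity-regex hardening can be sketched as follows. This is an illustration in Python's `re`, not the exact pattern in the README (cosign evaluates its own regex flavor); the repository name `example-org/example-repo` is a placeholder assumption. It shows what the escaped dots and the `refs/(heads|tags)/` constraint reject.

```python
import re

# Placeholder repository (assumption); substitute the real owner/repo.
REPO = "example-org/example-repo"

# Fulcio SAN URI shape for a GitHub Actions job:
#   https://github.com/<owner>/<repo>/.github/workflows/<file>@<ref>
IDENTITY_RE = re.compile(
    r"^https://github\.com/"
    + re.escape(REPO)
    + r"/\.github/workflows/prerelease-docker\.yml@refs/(heads|tags)/.+$"
)


def identity_matches(san_uri: str) -> bool:
    """True if the certificate identity is the callee workflow on a branch or tag."""
    return IDENTITY_RE.fullmatch(san_uri) is not None


base = "https://github.com/example-org/example-repo/.github/workflows/"
print(identity_matches(base + "prerelease-docker.yml@refs/heads/main"))
print(identity_matches(base + "prerelease-docker.yml@refs/pull/1/merge"))
print(identity_matches(base + "prerelease-dockerXyml@refs/heads/main"))
```

The regex alone does not prove which repository triggered the run, which is why the recipe also passes --certificate-github-workflow-repository.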

Supersedes #3969 (split-environment): the env split is preserved by the
new structure — prerelease env on the called workflow's first job,
release env on create-release/trigger-workflows.

Pre-merge checklist for the maintainer:
- Create the `prerelease` environment in GitHub Settings with the same
  required reviewers as `release`. Without it, the called workflow's
  approval gate auto-creates the env with no protection rules and
  silently approves the build.
- Verify DOCKER_USERNAME / DOCKER_PASSWORD remain repo-level secrets
  (they currently are). Environment-scoped secrets do not propagate
  across reusable workflow calls except via the called job's own
  environment.

* ci(release): fixes from multi-round subagent review

Round 1 surfaced 14 candidate findings; Round 2 verified 7 as real bugs and
refuted 4 as false positives. This commit applies the verified fixes.

CONFIRMED bugs fixed:

1. **Approval gate was per-job, not workflow-wide.** The previous
   `environment: prerelease` on `build-amd64` gated only that job, so
   `build-arm64` and `security-scan` ran pre-approval (GitHub environments
   are job-scoped per docs + community/discussions/174381). Replaced with a
   sentinel `approval-gate` job that all three build jobs `needs:`. Single
   approval click still gates everything, but now actually blocks all
   parallel jobs.

2. **`cleanup-on-rejection` if-condition missed the prerelease-rejection
   path.** When prerelease-docker.result was `failure`, both create-release
   and trigger-workflows became `skipped` (their `if:` requires success),
   and the cleanup `if:` only fired on `failure`/`cancelled` of dependents.
   Added explicit `prerelease-docker.result == 'failure'` clause so the
   most common rejection path actually triggers cleanup.

3. **Trivy re-scan ran AFTER retag.** A failing scan would leave release
   tags `:1.6.9`, `:1.6`, `:latest` publicly published with no rollback.
   Reordered: scan source digest BEFORE retag. Content is bit-identical
   (same digest), so scanning the prerelease tag tests what would be
   promoted — but failure now leaves no public broken tags. Also moved
   cosign verify before retag for the same reason.

4. **Trivy only scanned linux/amd64 by default** against a manifest list
   digest (per Trivy docs + aquasecurity/trivy#7847). Replaced single scan
   with two explicit per-platform invocations
   (`--platform linux/amd64`, then `linux/arm64`) so arm64 layers are also
   gated by the freshness check.

5. **Trivy DB freshness wasn't guaranteed.** apt-installed Trivy may use a
   stale embedded DB. Added explicit `trivy image --download-db-only`
   before the scans so the CVE-DB freshness window the re-scan exists for
   is actually exercised.

6. **Re-running `cosign attest` accumulated attestation layers** (verified via
   cosign 2.x `mutate.go` `dedupeAndReplace`). Added `--replace` to both
   attest calls (SLSA provenance + SBOM). Sigstore spec allows multi-sig
   so `cosign sign` is left as-is.

7. **SLSA provenance values inherited from old code were misleading.**
   - `builder.id`: changed from `https://github.com/actions/runner` (the
     agent binary) to the workflow ref the build is actually defined in
     (per SLSA v0.2 spec — builder.id should be a verifiable trust root).
   - `completeness.{parameters,environment,materials}`: flipped from
     `true` to `false`. The predicate captures no workflow_call inputs,
     no environment, and the build does network I/O — claiming
     completeness was a public signed false statement.
   - `buildInvocationId`: now includes `${run_id}-${run_attempt}` so
     re-runs are distinguishable.

REFUTED (kept as-is, with confidence):

- `imagetools create` does NOT change the digest in this case. Buildx's
  Combine() in util/imagetools/create.go has an explicit short-circuit
  for single-source manifest-list inputs that returns the bytes
  byte-for-byte (no annotations + same registry required, both true here).
- Concurrent rejection digest collision is not a real concern — Docker
  builds in this pipeline are not bit-deterministic (apt, network, file
  timestamps, default provenance attestations all vary).
- The `prerelease-v1.6.9-*` cleanup pattern does NOT collide with
  `prerelease-v1.6.91-*` (trailing dash in the prefix disambiguates).
- Reusable-workflow approval prompts appear inline on the caller run
  page for single-level calls — not a UX regression.
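
The prefix-collision point in the refuted list can be checked with a small sketch. The tag suffixes below are invented for illustration, and `tags_to_clean` is a hypothetical helper rather than the cleanup loop itself; it only demonstrates that a prefix ending in a dash cannot match a longer version number.

```python
def tags_to_clean(all_tags, version):
    """Select prerelease tags for one version using a dash-terminated prefix."""
    prefix = f"prerelease-v{version}-"
    return [tag for tag in all_tags if tag.startswith(prefix)]


# Invented tag names for illustration only.
tags = [
    "prerelease-v1.6.9-abc123",
    "prerelease-v1.6.9-abc123-arm64",
    "prerelease-v1.6.91-def456",
    "1.6.9",
]

print(tags_to_clean(tags, "1.6.9"))
```

Without the trailing dash, the prefix `prerelease-v1.6.9` would also match the `1.6.91` tag.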

* ci(release): revert most Round 2 review additions

Keep the build-once-promote refactor's structural shape but back out the
defensive additions from commit 68606b299:

- approval-gate sentinel job → revert to `environment: prerelease` on
  build-amd64 only
- SLSA builder.id, completeness flags, buildInvocationId → revert to
  inherited values from the previous docker-publish.yml
- `cosign attest --replace` → drop, accept default append behavior
- Pre-promote Trivy + multi-platform scans + db refresh + pre-promote
  cosign verify → revert to single post-promote scan and post-promote
  cosign verify
- cleanup-on-rejection if-condition → drop the
  `prerelease-docker.result == 'failure'` allowlist clause

Rationale: keep the change set minimal vs main. The defensive additions
were correct in isolation but expand the scope of this PR.

* fix(ci): drop invalid --trivyignores flag from raw trivy CLI invocation

The Round 2 promote step used `--trivyignores .trivyignore`, which is the
INPUT name of the aquasecurity/trivy-action wrapper, not a flag of the
raw Trivy binary. The CLI accepts only `--ignorefile` (singular) and
auto-loads `.trivyignore` from cwd by default.

As written, every release run would hard-fail with `unknown flag:
--trivyignores` from cobra/pflag before any scanning occurred. Removing
the flag is sufficient: Trivy auto-loads the ignorefile from the
checkout root.

prerelease-docker.yml is unaffected: it uses the action wrapper with
`trivyignores: '.trivyignore'` as input, which IS correct usage for the
action layer (it translates to --ignorefile internally via
TRIVY_IGNOREFILE).

Sources:
- https://trivy.dev/latest/docs/references/configuration/cli/trivy_image/
- https://github.com/aquasecurity/trivy-action/blob/master/action.yaml

* ci(release): apply remaining bug fixes from multi-round review

After Round 4 verification confirmed several deferred findings, applying
the bug fixes the user explicitly requested:

1. Re-introduce the `approval-gate` sentinel job in prerelease-docker.yml.
   GitHub Actions environments are job-scoped, so without a gate sentinel
   `build-arm64` and `security-scan` would run pre-approval — pushing
   the `-arm64` per-arch tag and consuming Trivy minutes regardless of
   whether the maintainer approved or rejected the gate. Single approval
   click still gates everything via `needs: [approval-gate]`.

2. Fix the SLSA `builder.id` to use `${{ github.workflow_ref }}` instead
   of the inherited `https://github.com/actions/runner` agent identity.
   `workflow_ref` resolves to the canonical
   `<owner>/<repo>/.github/workflows/<file>.yml@<callee-ref>` format that
   matches slsa-github-generator's output and that verifier policies can
   pin against.

3. Flip SLSA `completeness.{parameters,environment,materials}` from
   `true` to `false`. The predicate captures no workflow_call inputs, no
   environment, and the build does network I/O — claiming completeness
   was a public signed false statement.

4. Add `${{ github.run_attempt }}` to the SLSA `buildInvocationId` so
   "Re-run failed jobs" attempts are distinguishable.

5. Expand `cleanup-on-rejection` `if:` to include
   `prerelease-docker.result == 'failure'` and `'cancelled'`. Without
   these clauses, the most common rejection path (env approval rejected
   for prerelease) leaves dependents `skipped`, which the existing
   allowlist doesn't match — orphan tags persist on Docker Hub forever.

6. Drop unused `packages: write` from both the called workflow and the
   caller's reusable-workflow block. Docker Hub auth uses
   DOCKER_PASSWORD, not GITHUB_TOKEN; `packages: write` only matters for
   ghcr.io which the project doesn't use.

7. Update `docs/CI_CD_INFRASTRUCTURE.md` Build & Deploy table to reflect
   the build-once-promote split.

8. Update `docs/RELEASE_GUIDE.md` "Automatic Publishing" section to
   describe both approval gates (`prerelease` and `release`).
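
The allowlist semantics of the `cleanup-on-rejection` condition (item 5) can be modelled as a plain predicate. This is a sketch of the logic only, not the actual `if:` expression; the function names are hypothetical, and the three arguments stand for the `needs.<job>.result` values of prerelease-docker, create-release, and trigger-workflows.

```python
REJECTED = {"failure", "cancelled"}


def should_cleanup(prerelease_docker, create_release, trigger_workflows):
    """Allowlist: fire only when some job actually failed or was cancelled."""
    results = (prerelease_docker, create_release, trigger_workflows)
    return any(result in REJECTED for result in results)


def denylist_would_fire(*results):
    """The rejected alternative (!= 'success'), shown for comparison."""
    return any(result != "success" for result in results)


# Prerelease rejected: dependents are skipped, cleanup must still run.
print(should_cleanup("failure", "skipped", "skipped"))

# release_exists short-circuit: everything skipped, cleanup must not run.
print(should_cleanup("skipped", "skipped", "skipped"))
print(denylist_would_fire("skipped", "skipped", "skipped"))
```

The last two lines show the difference: the denylist form would delete tags on a run that was merely skipped.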

* ci(release): R5/R6 review fixes — cosign pin, multi-arch SBOM, orphan SBOM

Round 5 (10 agents) and Round 6 (5 agents debunking) verified these
findings, all of which are now applied:

1. **Pin cosign to v2.6.0**. R6A2 verified that `sigstore/cosign-installer@v4.1.2`
   ships cosign v3.0.6 by default. cosign v3 enables `--new-bundle-format`
   ON BY DEFAULT, which changes the on-wire signature/attestation format.
   Sign and verify still agree in-pipeline (both would run v3), but
   downstream verifiers running the README cosign-verify recipe on v2
   would fail. Pinning all three cosign-installer steps to v2.6.0 keeps
   the legacy tag-based sigstore format until we deliberately migrate
   the entire ecosystem.

2. **Multi-arch SBOM via per-arch attestations**. R6A3 verified the claim
   (anchore/syft#1708, actions/attest-sbom#60): syft against a manifest
   list digest only scans the host platform's layers. The previous SBOM
   attestation against the manifest digest claimed to describe both
   amd64 + arm64 but actually only enumerated amd64. ARM64 consumers
   were verifying a misleading SBOM. Fix: iterate over manifest entries
   from `imagetools inspect --raw`, run `syft --platform <plat>` against
   each per-arch digest, and `cosign attest --replace --type spdxjson`
   each per-arch SBOM against the per-arch digest. ALSO keep a
   manifest-list-level SBOM (host arch only) so end-users running
   `cosign verify-attestation user/img:latest` don't get an empty result.

3. **Re-add `--replace` to cosign attest** (both SLSA and SPDX). R5A7's
   deeper analysis enumerated specific failure modes beyond cosmetic
   clutter: Kyverno `count: 1` policies, registry layer count caps,
   audit ambiguity (verify returns success on first matching layer),
   Rekor entry bloat. R3A5 already confirmed `--replace` is per-
   predicate-type, so SLSA and SPDX attestations don't disturb each
   other.

4. **Container-image SBOM no longer orphaned**. R6A4 verified: the
   Syft-produced container SBOMs were uploaded as artifact `sbom` from
   prerelease-docker.yml but never downloaded by `create-release` — they
   were invisible on the GitHub Release page. Fix: download the `sbom`
   artifact, rename to `sbom-container-*` to disambiguate from the
   filesystem `sbom-spdx.json`, and attach to `gh release create`.

5. **Narrow `secrets: inherit` to explicit secrets**. R5A3 flagged that
   `secrets: inherit` propagates ALL repo secrets (PAT_TOKEN,
   OPENROUTER_API_KEY, SERPER_API_KEY, GITHUB_TOKEN) into a workflow
   that only needs Docker Hub creds. Replaced with explicit
   `DOCKER_USERNAME` + `DOCKER_PASSWORD` mapping; the called workflow
   now declares these as required `workflow_call.secrets`.

6. **Drop unused `DEPS_HASH` build-arg**. R5A2 confirmed it was declared
   in the Dockerfile but never referenced in any RUN/COPY, so it never
   busted the Docker layer cache. Cache invalidation already happens
   correctly via `COPY pdm.lock` (file content hash). Removed the ARG
   declaration from Dockerfile and the three `build-args:` passes from
   prerelease-docker.yml.

R6 also REFUTED two earlier claims:
- R5A8's concurrency claim: reusable workflows DO share the caller's
  workflow run and concurrency group (R3A8 was correct). Don't add a
  `concurrency:` block to prerelease-docker.yml: it would create a
  separate group and re-introduce the race R5A8 imagined.
- R5A10's harden-runner CVE claim: v2.19.1 (used here) is well after
  the fix versions for both CVE-2026-32946 (v2.16.0) and CVE-2026-25598
  (v2.14.2). No bump needed.

* ci(release): R7 fixes — cosign v2.6.3, drop misleading manifest-level SBOM

Round 7 (5 agents) verified the R5/R6 fixes and surfaced two real bugs
plus one hardening gap:

1. **cosign-installer pinned cosign v2.6.0**, which has two known security
   advisories: GHSA-whqx-f9j3-ch6m (fixed in v2.6.2) and GHSA-w6c6-c85g-mmv6
   (fixed in v2.6.3). Bumped pin to v2.6.3 in all three workflow files so
   the install step picks up the fixes. Same minor (v2.6.x), so no flag
   drift — `--replace`, `--type`, `--bundle`, `--certificate-*` all behave
   identically.

2. **The manifest-level SBOM attestation was misleading**. The previous
   step ran `syft <repo>@<manifest-list-digest>` on an amd64 runner,
   which (per anchore/syft#1708) only enumerates amd64 layers. The SBOM
   was then attested at the manifest-list digest where it was discoverable
   by ALL platform consumers — so an arm64 user verifying `:latest` would
   receive a signed SBOM that lies about the layers they actually pulled.
   The per-arch loop already produces accurate per-platform SBOMs; the
   manifest-level fallback only re-introduced the lie for UX convenience.

   Dropped the manifest-level attest call entirely. Per-arch SBOMs are the
   only honest representation. Updated the README's `cosign
   verify-attestation` recipe to resolve to the per-platform digest first
   (using `jq` over `imagetools inspect --raw`), so end-users on either
   architecture get the SBOM that actually describes what they pulled.
   Removed `sbom.spdx.json` from the workflow artifact + release-staging
   logic since it no longer exists.

3. **Empty-loop assertion**: added a defensive count check before the
   per-arch SBOM loop. If a future buildx output change ever produced
   zero per-arch entries (e.g., all entries marked architecture: unknown),
   the previous code would silently skip the loop and pass CI green with
   no SBOMs. Now it fails loud with the raw manifest dumped for debugging.

Note on round-7 reviewer's other concerns:
- "Pipe-to-while subshell scope": confirmed safe. set -euo pipefail
  inherited; failures in syft/cosign attest abort the subshell, and
  pipefail propagates to the outer step.
- "imagetools inspect --raw stability": the OCI image-index spec has been
  stable for ~7 years. The jq filter handles the BuildKit attestation
  pseudo-entries via `architecture != "unknown"`.
- "harden-runner v2.19.1 CVEs": false alarm. v2.19.1 is well above the
  fix versions (v2.16.0, v2.14.2). No bump needed.
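
The per-arch selection and the empty-loop assertion can be sketched like this. The workflow does this with `jq` in shell; the Python below is an equivalent illustration under the assumption that the index follows the OCI image-index layout (`manifests[].platform.architecture`). `per_arch_entries` and the sample digests are hypothetical.

```python
import json


def per_arch_entries(raw_index: str):
    """Return (platform, digest) pairs, skipping BuildKit attestation entries.

    Raises instead of returning an empty list, so a changed index layout
    cannot silently produce zero SBOMs.
    """
    index = json.loads(raw_index)
    entries = [
        (f"{m['platform']['os']}/{m['platform']['architecture']}", m["digest"])
        for m in index.get("manifests", [])
        if m.get("platform", {}).get("architecture", "unknown") != "unknown"
    ]
    if not entries:
        raise RuntimeError(f"no per-arch manifests in index: {raw_index}")
    return entries


# Made-up index: two real platforms plus one attestation pseudo-entry.
SAMPLE_INDEX = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:" + "a" * 64,
         "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": "sha256:" + "b" * 64,
         "platform": {"os": "linux", "architecture": "arm64"}},
        {"digest": "sha256:" + "c" * 64,
         "platform": {"os": "unknown", "architecture": "unknown"}},
    ],
})

for platform, digest in per_arch_entries(SAMPLE_INDEX):
    print(platform, digest)
```

Each returned pair corresponds to one `syft --platform <plat>` run and one per-arch `cosign attest` call.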

* ci(release): R8 fixes from 8th review round

Round 8 (5 agents covering Dockerfile, npm/Vite, runtime image, edge
cases, and post-fix smoke check) surfaced 7 real bugs the previous 7
rounds missed. All fixed here, plus a comment per user request.

1. **docker-publish.yml checkout pinned to released tag**. The promote
   step reads `.trivyignore` from cwd; a `repository_dispatch`-triggered
   checkout defaults to the default branch's tip, which can drift between
   prerelease scan and promote scan if `.trivyignore` is edited on main
   while the release awaits approval. Added `ref: ${{
   github.event.client_payload.tag }}` to checkout.

2. **docker-publish.yml concurrency block added**. release.yml has its
   own concurrency, but docker-publish.yml is a separate workflow run.
   Two near-simultaneous publish-docker dispatches for the same release
   tag (e.g., a manual re-trigger after a transient Docker Hub 5xx) could
   interleave and have their cleanup-loop prefix-match deletions race
   each other. Group: `publish-docker-${{ github.event.client_payload.tag
   }}`, cancel-in-progress: false.

3. **publish.yml's frontend builder bumped from Node 20 → 24** to match
   `package.json`'s `engines: { node: ">=24.0.0" }`. Mismatched Node
   versions across the PyPI build (Node 20) and the Docker image (Node
   24, installed via NodeSource) could resolve transitive deps differently
   and ship frontend assets that fail at runtime. Pinned to specific
   `node:24-alpine` SHA.

4. **HEALTHCHECK no longer leaks Python processes**. The old
   `urllib.request.urlopen(...)` had no Python-level timeout, so a
   hung-but-alive backend would freeze the probe until Docker's outer
   timeout SIGKILL'd it — leaving a Python process per probe interval
   leaking PIDs/FDs over time. Added `timeout=5` and an explicit `r.status
   == 200` check so non-200 2xx responses (e.g., from misconfigured
   proxies) don't pass.

5. **Removed broken `VOLUME /scripts/`**. /scripts is image content (the
   ollama entrypoint baked in by the layer below the VOLUME directive),
   not user state. A VOLUME on an image-populated path causes anonymous-
   volume accumulation on every `docker run` and silently shadows the
   script if a user ever bind-mounts it.

6. **Added `VOLUME /data`** so users who don't bind-mount don't silently
   lose research data + encrypted DBs on `docker rm`. The entrypoint
   creates the persistent state at /data/{logs,cache,encrypted_databases},
   but without VOLUME the directory lives in the container's writable
   layer, which `docker rm` discards.

7. **Stale comment in release.yml** (the SBOM download step) updated —
   no longer mentions the manifest-level SBOM that was dropped in
   commit 33d69b4e4.

Plus one comment update per user request:
8. **`apt-get upgrade -y` rationale comment** added at the
   build-once-promote section of the Dockerfile (top stage), and
   cross-referenced from the other two `apt-get upgrade` sites
   (ldr-test stage and runtime stage). Documents that the trade-off of
   bit-for-bit reproducibility for always-fresh CVE patches is
   intentional, and explains how build-once-promote mitigates the
   reproducibility loss.
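
The HEALTHCHECK change in item 4 amounts to the following probe shape. This is a sketch, not the exact Dockerfile one-liner: the URL is left as a parameter because the commit does not state it, and `probe` and `main` are hypothetical names.

```python
import http.client
import urllib.request


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True only for an HTTP 200 received within `timeout` seconds."""
    try:
        # timeout= bounds connect and read, so a hung-but-alive backend
        # cannot stall the probe until Docker kills it.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Other 2xx codes (e.g. 204 from a misconfigured proxy) fail.
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def main(url: str) -> int:
    """Map the probe result to a HEALTHCHECK exit code (0 healthy, 1 not)."""
    return 0 if probe(url) else 1
```

A HEALTHCHECK would call something like `sys.exit(main(<health URL>))`; the process then always terminates on its own within roughly the Python-level timeout.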

* ci(release): clean up per-arch cosign attestation orphans on rejection

Round 9 found that the per-arch SBOM attestations introduced in commit
11e702f7d (the multi-arch SBOM fix) live at
`sha256-<per-arch-digest>.{sig,att,sbom}` keyed by the PER-ARCH manifest
digests, not the manifest-list digest. The cleanup-on-rejection job only
knew the manifest-list digest, so on rejection paths the per-arch
attestation artifacts were left orphaned on Docker Hub forever — and
unreachable through any tag, since the per-arch leaf tags were also
deleted.

Fix: before deleting the manifest tag, inspect it via `imagetools inspect
--raw` to discover the per-arch digests, then queue per-arch
`{sig,att,sbom}` deletions alongside the manifest-level cleanup. If the
manifest tag doesn't exist (e.g., build failed before manifest creation),
log a clear warning and proceed — the per-arch artifacts wouldn't have
been created in that case anyway.
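
The deletion queue described above can be sketched as follows, assuming cosign's tag-based layout where artifacts for `sha256:<hex>` live at the tag `sha256-<hex>.<suffix>`. The helper names are hypothetical, and the per-arch digests are passed in directly (in the workflow they come from `imagetools inspect --raw`); `None` models the "manifest tag doesn't exist" case.

```python
def cosign_artifact_tags(digest, suffixes):
    """Map 'sha256:<hex>' to the cosign tag names 'sha256-<hex>.<suffix>'."""
    algo, _, hex_part = digest.partition(":")
    return [f"{algo}-{hex_part}.{suffix}" for suffix in suffixes]


def cleanup_queue(list_digest, per_arch_digests):
    """Queue manifest-list artifacts plus per-arch artifacts for deletion.

    per_arch_digests is None when the manifest tag could not be inspected
    (e.g. the build failed before manifest creation).
    """
    queue = cosign_artifact_tags(list_digest, ("sig", "att"))
    if per_arch_digests is None:
        print("warning: manifest tag not found, skipping per-arch cleanup")
        return queue
    for digest in per_arch_digests:
        queue.extend(cosign_artifact_tags(digest, ("sig", "att", "sbom")))
    return queue


# Made-up digests for illustration.
LIST_DIGEST = "sha256:" + "1" * 64
ARCH_DIGESTS = ["sha256:" + "a" * 64, "sha256:" + "b" * 64]

for tag in cleanup_queue(LIST_DIGEST, ARCH_DIGESTS):
    print(tag)
```

The inspection has to happen before the manifest tag is deleted, since afterwards nothing references the per-arch digests.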

* ci(release): drop prerelease env gate — use single release approval

The `prerelease` environment approval was a holdover from when prerelease
docker was a SEPARATE test build alongside the release build (two
distinct artifacts, two distinct decisions). In the build-once-promote
model the "prerelease" image IS the release image (just under a
different tag), so gating the BUILD with a human approval is redundant —
the only meaningful decision is whether the tested image becomes the
official release.

Changes:
- Remove the `approval-gate` sentinel job in prerelease-docker.yml.
- Drop `needs: [approval-gate]` from build-amd64, build-arm64, and
  security-scan. They now run automatically once release.yml's security
  + CI gates pass.
- Update workflow comments in release.yml and prerelease-docker.yml to
  reflect the single-gate flow.
- Update RELEASE_GUIDE.md "Approval and Publishing" section: now
  describes ONE `release` env approval, not two.
- Update CI_CD_INFRASTRUCTURE.md row for prerelease-docker.yml.

The cleanup-on-rejection job is unchanged — its triggers still fire
correctly on prerelease-docker `failure`/`cancelled` (build/sign/attest
errors) and on create-release / trigger-workflows `failure`/`cancelled`
(release env rejection). One fewer rejection path to consider, but the
mechanism is the same.

Operational benefits:
- One fewer approval click per release
- One fewer GitHub Environment to create as a pre-merge setup step
  (no more "create the `prerelease` env in Settings before merging")
- Build completes during/after security gates, so the prerelease tag is
  ready by the time the maintainer is ready to test

* ci(docker-publish): group GITHUB_OUTPUT writes (shellcheck SC2129)

CI's actionlint hook (which runs shellcheck on workflow run blocks)
flagged the 'Determine release tags' step for issuing five sequential
`echo ... >> "$GITHUB_OUTPUT"` redirects. Grouped them into a single
braced block + one redirect, per SC2129's recommendation.

* docs(release): correct approval flow after env-scoped secrets merge

After merging main, prerelease-docker.yml's four jobs declare
`environment: release` (PRs #3978/#3983) because DOCKER_USERNAME and
DOCKER_PASSWORD are env-scoped. That means the first `release` env
approval now gates the canonical build, not just the publish step —
the "automatic build then test then approve" flow described in earlier
docs no longer matches reality.

- RELEASE_GUIDE.md: rewrite the approval section to describe two
  release-env approvals (release.yml + docker-publish.yml) and the
  narrow Docker-only test window between them.
- CI_CD_INFRASTRUCTURE.md: update the prerelease-docker.yml row.
- release.yml: rewrite the `prerelease-docker:` job comment to reflect
  that this step is gated, not automatic, and explain why.

* ci(release): atomic publish ordering — GitHub Release runs last (#4044)

* ci(release): make GitHub Release publishing atomic with Docker + PyPI

Before this change, `create-release` published the public GitHub Release
BEFORE `docker-publish.yml` retagged and BEFORE `publish.yml` shipped to
PyPI. If either downstream failed, the public Release pointed at
non-existent artifacts.

This change closes that window:

- Convert `docker-publish.yml` from `repository_dispatch` to
  `workflow_call`. Its result is now visible to release.yml as
  `needs.publish-docker.result`, which lets:
  * `create-release` block on Docker promote success
  * `cleanup-on-rejection` safely scope cosign artifact deletion to
    cases where retag failed (after a successful retag, release tags
    share the prerelease manifest digest, so cosign artifacts must
    stay — deleting them would invalidate release-tag verification)
- Keep `publish.yml` on `repository_dispatch`. PyPI Trusted Publishing
  matches the OIDC `workflow_ref` claim against the CALLER when invoked
  via `workflow_call`, so a reusable publish.yml would fail with
  `invalid-publisher`. Tracked in pypa/gh-action-pypi-publish#166 and
  pypi/warehouse#11096.
- Restructure release.yml job graph:
    prerelease-docker → publish-docker (reusable) → trigger-pypi
      → monitor-pypi → create-release (LAST)
- Rewrite `cleanup-on-rejection` with a partial-retag rollback preamble.
  `imagetools create -t :VERSION -t :MAJOR_MINOR -t :latest` is a single
  process with multiple registry calls, so a mid-step failure can leave
  some release tags landed. The cleanup script now checks each release
  tag against Docker Hub and rolls back any that exist BEFORE deleting
  cosign signature/attestation artifacts.
- Slim `monitor-publish` → `monitor-pypi` (only watches publish.yml now;
  Docker is tracked natively via the inline job result).
- Drop the workflow-level `concurrency:` block from docker-publish.yml.
  As a reusable workflow it shares release.yml's run, and release.yml's
  caller-level concurrency on `github.ref` already serialises releases
  for the same tag.
- Update `docs/CI_CD_INFRASTRUCTURE.md` workflow-table rows and
  `docs/RELEASE_GUIDE.md` approval-flow section to describe the new
  ordering, plus a "Recovery from PyPI failure" section documenting the
  one remaining atomicity hole (PyPI fails after Docker success — Docker
  release tags exist, no PyPI, no GH Release; manual re-dispatch needed).
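
The ordering of the partial-retag rollback can be sketched as a pure planning function. Two assumptions are made loudly here: a release tag counts as "landed" when it resolves to the rejected manifest digest (the actual script may test differently), and the registry state is passed in as a plain dict instead of being queried from Docker Hub. All names are hypothetical.

```python
def release_tags(version):
    """'1.6.9' -> ['1.6.9', '1.6', 'latest'] (VERSION, MAJOR_MINOR, latest)."""
    major, minor = version.split(".")[:2]
    return [version, f"{major}.{minor}", "latest"]


def rollback_plan(version, registry_tags, rejected_digest):
    """Return ordered actions: tag rollbacks first, cosign cleanup last.

    registry_tags maps tag name -> digest it currently resolves to.
    """
    landed = [
        tag for tag in release_tags(version)
        if registry_tags.get(tag) == rejected_digest  # assumption, see lead-in
    ]
    actions = [("rollback-tag", tag) for tag in landed]
    # Cosign artifacts go last so no public tag ever points at a manifest
    # whose signature has already been deleted.
    actions.append(("delete-cosign-artifacts", rejected_digest))
    return actions


# Made-up state: the retag died after :1.6.9 landed but before :1.6/:latest.
REJECTED = "sha256:" + "d" * 64
PREVIOUS = "sha256:" + "e" * 64
state = {"1.6.9": REJECTED, "1.6": PREVIOUS, "latest": PREVIOUS}

for action in rollback_plan("1.6.9", state, REJECTED):
    print(action)
```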

Plan + 5-agent Round 1 review notes saved separately.

* fix(release): plug blockers found in multi-round PR review

Four fixes against the atomicity refactor — two blockers that would
break the next release, two hardening items found while verifying them.

B1 (BLOCKING): docker-publish.yml checked out at `ref: inputs.tag`
(e.g. v1.6.11), but the v* git tag is created by `create-release`
which runs LAST in the job graph — after `publish-docker`. So on every
push-to-main triggered release (the documented primary path) the
checkout would fail with `fatal: couldn't find remote ref v1.6.11`.
Switch to `ref: github.sha`: same triggering commit the build and
prerelease-docker jobs used, exists at the moment publish-docker
runs for every event type, and still satisfies the original goal
of pinning .trivyignore to the scanned commit.

B2 (BLOCKING): cleanup-on-rejection referenced env-scoped
DOCKER_USERNAME / DOCKER_PASSWORD but had no `environment: release`,
so those secrets resolved to empty strings and the Docker Hub login
exited 1 — leaving the orphan tags + cosign artifacts the cleanup
was meant to remove. Add `environment: release`. The `release` env
approval was already granted upstream in the run, so no new prompt.

H1: monitor-pypi's `Wait for PyPI publish workflow to complete` step
piped `gh run list | jq ...` without `set -euo pipefail`, so a
transient gh failure (network, auth, rate limit) was swallowed: jq
received empty input and produced no output, burning the full 40-minute
budget on a silent error rather than failing fast. Add `set -euo pipefail`.

H2: cleanup-on-rejection's step 2 did not delete the floating
`:prerelease` tag. If a release was rejected after prerelease-docker
re-pointed `:prerelease`, step 4 deleted the cosign signature for
that manifest while `:prerelease` still pointed at it — yielding a
window where pulling `:prerelease` returns an image the README
cosign-verify recipe cannot verify. Include `prerelease` in step 2's
delete loop; the next successful prerelease-docker re-creates it.

* chore(release): follow-up cleanups from PR review

Bundle of low-risk follow-ups from the multi-round review of this PR.
All same-scope as the atomicity refactor — staleness this PR introduced
in docs/comments, hardening adjacent to the changed code paths.

L1 (hardening): Drop `id-token: write` from `publish-docker` (caller)
and `docker-publish.yml` `promote` (callee). cosign VERIFY is a
read-only check against public Rekor/Fulcio; no GitHub OIDC token is
minted, so the permission is unused. Signing (which DOES need the
write) is exclusively in prerelease-docker.yml.

L7 (stale comments): prerelease-docker.yml's header comments still
referenced `trigger-workflows` — a job this PR split into
`publish-docker` + `trigger-pypi`. Replaced both occurrences.

L4 (doc): RELEASE_GUIDE.md "Emergency Procedures" claimed a manual
GitHub release "still triggers PyPI/Docker" — false under the new
design (publish.yml is repository_dispatch-only and docker-publish.yml
is workflow_call-only, neither listens on `release:` events). Replaced
with the actual recovery hierarchy.

L5 (doc): RELEASE_GUIDE.md and CI_CD_INFRASTRUCTURE.md pipeline chains
omitted the `provenance` job between `build` and `prerelease-docker`.

L6 (doc): RELEASE_GUIDE.md described monitor-pypi's timeout as a flat
"40 min" — the inner poll loop is 40 min but the surrounding
`timeout-minutes:` is 90 min, so the user-facing failure surface differs.

L4-bonus (doc): Manual-trigger section also claimed workflow_dispatch
takes "version and prerelease flag" inputs — release.yml's
`workflow_dispatch:` has no inputs defined. Replaced with the actual
behavior (reads __version__.py at HEAD; use tag-push for older versions).

M5 (doc): Both PAT_TOKEN comments overstated required scopes — claimed
`workflow` scope was needed (it isn't; it only governs editing
.github/workflows/ via the API) and didn't make explicit that
`public_repo` is rejected by `repository_dispatch`. Rewritten.

M8 (correctness): docker-publish.yml's cosign verify step targeted the
mutable `:VERSION` tag instead of `@${EXPECTED_DIGEST}`. The preceding
verify-promoted-tags step already confirms the tag resolves to the
expected digest, but using the tag here leaves a tag-resolution TOCTOU
window between the two steps. Trivy's re-scan already uses
`@${EXPECTED_DIGEST}`; switching cosign to the same reference is
consistent and race-free.

L2 (style): While editing the cosign step, routed `github.repository`
through an `env:` var (`REPO`) instead of direct `${{ }}` template
interpolation into shell args, matching the convention in the rest of
this workflow.

* chore(ci): bump harden-runner pin in docker-publish.yml to match other workflows

Last remaining v2.19.1 reference — every other workflow in this PR was
bumped to v2.19.3 when main moved forward. Auto-merge missed this one
because the surrounding hunk was in a conflict region.

* chore(release): fixes from multi-round subagent review of the full PR

Bundle of low-risk fixes confirmed by 30 subagents across 3 rounds.
None are blockers; all are worth fixing in-scope.

1. SLSA provenance builder.id: was github.workflow_ref, which inside a
   workflow_call callee resolves to the CALLER (release.yml), not the
   intended callee (prerelease-docker.yml). The Fulcio cert is still
   right (built from the job_workflow_ref OIDC claim), so cosign verify
   and slsa-verifier are unaffected, but raw-JSON consumers reading
   builder.id would see release.yml. Compose the value from
   github.repository + hardcoded path + github.ref instead — the `job`
   context has no workflow_ref property (actionlint confirms), and for
   a local-path workflow_call the callee's ref equals github.ref.

2. Dockerfile: set ENV LDR_DATA_DIR=/data so the VOLUME /data directive
   is actually load-bearing. Without it, paths.py falls back to
   platformdirs (~/.local/share/local-deep-research) which is inside the
   ephemeral container layer — bare docker run -v vol:/data users would
   silently lose data on docker rm.

3. trigger-pypi: forward prerelease=false in client_payload. publish.yml
   gates Test PyPI vs prod PyPI on client_payload.prerelease == true; if
   absent, the expression evaluates to '' and falls through to prod. Set
   false explicitly to remove the silent-fallback landmine.

4. Stale/misleading cosign comments in release.yml:
   - line 322: said "v2.6.0" while value is "v2.6.3" — corrected and
     noted GHSA-w6c6-c85g-mmv6 patch coverage
   - line 332: attributed --bundle to v3.0.2+ but it's been in v2.4.0+

5. release-gate.yml Node 20 → 24 (mirror publish.yml + Dockerfile).
   package.json declares engines.node >=24.0.0. The pip-install-check
   wheel is discarded so this was not a release-blocker, but the gate
   now validates the actual ship runtime.

6. README cosign-verify recipe:
   - Guard empty PLATFORM_DIGEST with a clear message for single-arch
     or pre-build-once-promote releases
   - Add docker buildx to prerequisites list
   - Spell out the legacy-verification substitution explicitly
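
Item 1 above composes builder.id by hand. A minimal sketch of that composition, assuming the value takes the same `https://github.com/<repo>/<path>@<ref>` shape that the cosign identity regexp in docker-publish.yml matches. The function name and the exact string layout are assumptions, not copied from the workflow:

```shell
# Hypothetical helper: build a builder.id that names the callee workflow
# (prerelease-docker.yml) rather than the caller (release.yml).
builder_id() {
  bi_repo=$1   # e.g. the value of github.repository
  bi_ref=$2    # e.g. the value of github.ref, such as refs/heads/main
  printf 'https://github.com/%s/.github/workflows/prerelease-docker.yml@%s\n' \
    "$bi_repo" "$bi_ref"
}
```

Hardcoding the path is what keeps the value stable inside a workflow_call callee, where github.workflow_ref would name the caller instead.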

* fix(ci): pin Trivy in promote step via SHA-pinned action wrapper

AI reviewer flagged docker-publish.yml's promote step as installing Trivy
via `sudo apt-get install -y trivy` with no version pin, reintroducing a
supply-chain risk to the release path. The prerelease scan in
prerelease-docker.yml uses the SHA-pinned
aquasecurity/trivy-action@ed142fd... wrapper with `version: 'v0.69.2'`,
but the promote step switched to the bare CLI and lost that protection.

Replace the apt-get install + raw `trivy image` invocation with the same
pinned action wrapper. Same scan semantics (CRITICAL,HIGH, ignore-unfixed,
.trivyignore, exit-code 1), same binary version (v0.69.2), same action
SHA — keeps the two scans consistent and removes the unpinned apt path.
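
To make "SHA-pinned" concrete, here is a minimal sketch of a check that a `uses:` reference ends in a full 40-character commit SHA rather than a movable tag. `is_sha_pinned` is a hypothetical helper, not something this PR adds:

```shell
# Hypothetical lint helper: succeed only if the action reference is pinned
# to a full 40-character lowercase hex commit SHA.
is_sha_pinned() {
  sp_ref=${1##*@}
  # No '@' in the input means there is no ref to inspect.
  [ "$sp_ref" != "$1" ] || return 1
  [ "${#sp_ref}" -eq 40 ] || return 1
  case $sp_ref in
    *[!0-9a-f]*) return 1 ;;
  esac
  return 0
}
```

With this check, `aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25` passes and `aquasecurity/trivy-action@v0.36.0` does not. Pinning the action SHA does not by itself pin the Trivy binary, which is why the step also sets `version: 'v0.69.2'`.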

* fix(ci): pin Trivy in release.yml build job — same fix as docker-publish.yml

R4 review caught that the AI-reviewer-flagged unpinned Trivy install also
exists in release.yml's `build` job, and is STRICTLY WORSE there because
that job carries `id-token: write` (for cosign keyless signing of SBOMs).

The attack chain that was open:
1. Aqua apt-repo compromise OR MITM of the unpinned GPG-key fetch
2. Malicious `trivy fs` binary installed
3. Binary exfiltrates the ACTIONS_ID_TOKEN_REQUEST_URL and
   ACTIONS_ID_TOKEN_REQUEST_TOKEN env vars, minting an OIDC token under
   repo:LearningCircuit/local-deep-research
4. Binary tampers with sbom-spdx.json / sbom-cyclonedx.json contents
5. Next step `Sign release artifacts with Sigstore` cosign-signs the
   tampered SBOM with a legitimate Sigstore cert → fraudulent SBOM
   attached to the GitHub release with valid signature

Replace with the SHA-pinned aquasecurity/trivy-action@ed142fd0... (same
pin as docker-publish.yml and prerelease-docker.yml) using scan-type=fs
for the filesystem scan, with `version: 'v0.69.2'` to pin the binary
itself. Two separate action invocations (one per output format) because
the action takes a single format per run.

Also removes the unpinned `gpg --dearmor` of an unverified-fingerprint
public key, which the prior comment misleadingly called "secure".

* fix(ci): use TRIVY_USERNAME/PASSWORD env vars for trivy-action auth

The trivy-action README prescribes TRIVY_USERNAME/TRIVY_PASSWORD env
vars as the supported Docker Hub auth path. Even though
docker/login-action already wrote ~/.docker/config.json earlier in the
job (and Trivy reads it as a fallback), there's documented fragility
with docker.io credential helpers (aquasecurity/trivy#432,
aquasecurity/trivy#8385) that surfaces specifically on registry-pull
scans like this one (unlike the prerelease scan, which uses a
locally-loaded image).

The fallback would probably work today since
localdeepresearch/local-deep-research is public (anonymous pull would
succeed even without auth), but rate-limiting on anonymous Docker Hub
pulls is aggressive and the documented credential-helper quirks are
real. Adding the env vars uses the action's prescribed auth path, with
the same DOCKER_USERNAME/DOCKER_PASSWORD secrets already passed in via
workflow_call. Zero-cost defense-in-depth.
LearningCircuit
2026-05-22 21:52:46 +02:00
committed by GitHub
parent 01a5b81a24
commit 1f0b0a4a95
9 changed files with 1185 additions and 573 deletions

@@ -1,24 +1,69 @@
name: Publish Docker image
# SECURITY: Only triggered via repository_dispatch from release.yml
# (after security gate passes). We intentionally do NOT support
# workflow_dispatch because that would bypass security checks.
# SECURITY: Only invoked as a reusable workflow (`workflow_call`) from
# release.yml after the release security gate AND the `release` environment
# approval pass. We intentionally do NOT support workflow_dispatch — that
# would bypass gates. We also intentionally do NOT support
# repository_dispatch any more: with the atomicity refactor, this workflow
# runs as a job inside release.yml's run so its result is visible to
# downstream jobs (create-release, cleanup-on-rejection) — a property that
# repository_dispatch fanout broke.
#
# This workflow is a thin RETAG step in the build-once-promote pipeline.
# The actual build happens in prerelease-docker.yml; the multi-arch manifest
# is signed and attested there. Here we only:
# - Verify the source manifest digest matches what prerelease produced
# - Retag the prerelease manifest to release tags (:1.6.9, :1.6, :latest)
# - Verify the digest is preserved (defends against imagetools re-encoding)
# - Re-run Trivy against the digest (catches CVE-database updates between
# prerelease build and release promote)
# - Verify cosign signature transitivity from the original digest
# - Clean up the prerelease tags
#
# To re-publish, trigger a new release through release.yml.
on:
repository_dispatch:
types: [publish-docker]
workflow_call:
inputs:
tag:
description: "Release tag, e.g. 'v1.6.9' (with leading 'v')"
type: string
required: true
source_tag:
description: "Prerelease manifest tag to retag, e.g. 'prerelease-v1.6.9-abc1234'"
type: string
required: true
expected_digest:
description: "sha256:... digest of the prerelease manifest, captured by prerelease-docker.yml. Used to verify retag preserves the digest end-to-end."
type: string
required: true
secrets:
DOCKER_USERNAME:
required: true
description: "Docker Hub username (env-scoped to `release`)"
DOCKER_PASSWORD:
required: true
description: "Docker Hub PAT with Read+Write+Delete scopes (env-scoped to `release`)"
permissions: {} # Minimal top-level for OSSF Scorecard Token-Permissions
# NOTE: no workflow-level `concurrency:` block. As a reusable workflow
# called from release.yml, this workflow runs as part of the caller's run,
# and release.yml's caller-level concurrency (keyed on github.workflow +
# github.ref) already serialises release runs for the same tag. Adding a
# callee-level block would be a no-op for the documented threat model
# (two simultaneous releases for the same ref) and would not be reachable
# anyway because there is no longer any standalone invocation path.
jobs:
build-amd64:
name: Build AMD64 Image
promote:
name: Retag prerelease manifest as release
runs-on: ubuntu-latest
environment: release
permissions:
# cosign verify is read-only against public Rekor/Fulcio — does not
# mint a GitHub OIDC token, so id-token: write is not required here.
# Signing (which does need id-token: write) happens in prerelease-docker.yml.
contents: read
outputs:
digest: ${{ steps.build.outputs.digest }}
steps:
- name: Harden Runner
@@ -26,172 +71,19 @@ jobs:
with:
egress-policy: audit
- name: Check out the repo
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
fetch-depth: 0
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
- name: Log in to Docker Hub
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Build and push AMD64 image
id: build
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
platforms: linux/amd64
push: true
tags: ${{ secrets.DOCKER_USERNAME }}/local-deep-research:amd64-${{ github.sha }}
cache-from: type=gha,scope=linux-amd64
cache-to: type=gha,mode=max,scope=linux-amd64
build-args: |
DEPS_HASH=${{ hashFiles('pdm.lock') }}
build-arm64:
name: Build ARM64 Image
runs-on: ubuntu-24.04-arm
environment: release
permissions:
contents: read
outputs:
digest: ${{ steps.build.outputs.digest }}
steps:
- name: Harden Runner
uses: step-security/harden-runner@ab7a9404c0f3da075243ca237b5fac12c98deaa5 # v2.19.3
with:
egress-policy: audit
- name: Check out the repo
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
fetch-depth: 0
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
- name: Log in to Docker Hub
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Build and push ARM64 image
id: build
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
platforms: linux/arm64
push: true
tags: ${{ secrets.DOCKER_USERNAME }}/local-deep-research:arm64-${{ github.sha }}
cache-from: type=gha,scope=linux-arm64
cache-to: type=gha,mode=max,scope=linux-arm64
build-args: |
DEPS_HASH=${{ hashFiles('pdm.lock') }}
security-scan:
name: Security Scan
runs-on: ubuntu-latest
environment: release
permissions:
contents: read
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@ab7a9404c0f3da075243ca237b5fac12c98deaa5 # v2.19.3
with:
egress-policy: audit
- name: Check out the repo
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
fetch-depth: 0
- name: Free disk space
run: |
# Remove unnecessary packages to free up space for Docker build + Trivy scan
# GitHub runners have limited space; Trivy needs to export the full image to /tmp
sudo rm -rf /usr/share/dotnet || true
sudo rm -rf /usr/local/lib/android || true
sudo rm -rf /opt/ghc || true
sudo rm -rf /opt/hostedtoolcache/CodeQL || true
sudo docker image prune --all --force || true
df -h
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
- name: Build Docker image for security scan
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
platforms: linux/amd64
push: false
load: true
tags: local-deep-research:security-scan
cache-from: type=gha,scope=linux-amd64
cache-to: type=gha,mode=max,scope=linux-amd64
build-args: |
DEPS_HASH=${{ hashFiles('pdm.lock') }}
# Generate SARIF report for GitHub Security tab (all severities, doesn't fail)
- name: Generate Trivy SARIF report
uses: aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25 # v0.36.0
with:
image-ref: local-deep-research:security-scan
format: 'sarif'
output: 'trivy-release-scan.sarif'
ignore-unfixed: true
exit-code: '0'
version: 'v0.69.2'
# Separate scan that fails build only on fixable HIGH/CRITICAL vulnerabilities
- name: Check for fixable HIGH/CRITICAL vulnerabilities
uses: aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25 # v0.36.0
with:
image-ref: local-deep-research:security-scan
severity: 'CRITICAL,HIGH'
ignore-unfixed: true # Only fail on vulnerabilities with available fixes
trivyignores: '.trivyignore' # Ignore bundled library CVEs that can't be fixed
exit-code: '1'
version: 'v0.69.2'
- name: Upload Trivy scan results from release
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
if: always()
with:
name: trivy-release-scan
path: trivy-release-scan.sarif
retention-days: 7 # Reduced for security
create-manifest:
name: Create Multi-Platform Manifest
needs: [build-amd64, build-arm64, security-scan]
runs-on: ubuntu-latest
environment: release
permissions:
contents: read
id-token: write # Required for Sigstore keyless signing
packages: write
steps:
- name: Harden Runner
uses: step-security/harden-runner@ab7a9404c0f3da075243ca237b5fac12c98deaa5 # v2.19.3
with:
egress-policy: audit
- name: Check out the repo
- name: Check out the repo at the triggering commit
# Pin the checkout to the EXACT commit that the prerelease was
# built/scanned from, so .trivyignore (and any other repo-state-
# dependent file the promote step reads) matches that commit.
# We use github.sha, NOT inputs.tag — the v* git tag is created
# by create-release LATER in this run (after publish-docker
# completes), so it does not exist yet when this checkout runs
# on a push-to-main trigger. github.sha is the triggering commit
# for every event type (push to main, tag push, workflow_dispatch)
# and is the same SHA the build/prerelease-docker jobs used.
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ github.sha }}
persist-credentials: false
fetch-depth: 0
@@ -206,192 +98,158 @@ jobs:
- name: Install Cosign
uses: sigstore/cosign-installer@6f9f17788090df1f26f669e9d70d6ae9567deba6 # v4.1.2
with:
# Pin to cosign v2.x to match the version that signed the artifact
# in prerelease-docker.yml. Mismatched versions across sign/verify
# work today but new-bundle-format (cosign v3 default) would only
# produce/consume on v3.
cosign-release: 'v2.6.3'
- name: Install Syft
uses: anchore/sbom-action/download-syft@e22c389904149dbc22b58101806040fa8d37a610 # v0.24.0
- name: Determine version tag
- name: Determine release tags
id: version
env:
EVENT_NAME: ${{ github.event_name }}
DISPATCH_TAG: ${{ github.event.client_payload.tag }}
RELEASE_TAG: ${{ github.event.release.tag_name }}
DISPATCH_TAG: ${{ inputs.tag }}
SOURCE_TAG: ${{ inputs.source_tag }}
EXPECTED_DIGEST: ${{ inputs.expected_digest }}
run: |
set -euo pipefail
if [ "$EVENT_NAME" = "repository_dispatch" ]; then
TAG="$DISPATCH_TAG"
else
TAG="$RELEASE_TAG"
if [[ -z "$DISPATCH_TAG" || -z "$SOURCE_TAG" || -z "$EXPECTED_DIGEST" ]]; then
echo "::error::Missing required workflow_call input. Got tag='${DISPATCH_TAG}' source_tag='${SOURCE_TAG}' expected_digest='${EXPECTED_DIGEST}'"
exit 1
fi
echo "tag=$TAG" >> "$GITHUB_OUTPUT"
# Extract version without 'v' prefix
VERSION="${TAG#v}"
echo "version=$VERSION" >> "$GITHUB_OUTPUT"
# Extract major.minor
if [[ "$EXPECTED_DIGEST" != sha256:* ]]; then
echo "::error::expected_digest must be of the form sha256:... — got '${EXPECTED_DIGEST}'"
exit 1
fi
VERSION="${DISPATCH_TAG#v}"
MAJOR_MINOR=$(echo "$VERSION" | cut -d. -f1,2)
echo "major_minor=$MAJOR_MINOR" >> "$GITHUB_OUTPUT"
- name: Extract metadata for Docker
id: meta
uses: docker/metadata-action@030e881283bb7a6894de51c315a6bfe6a94e05cf # v6.0.0
with:
images: ${{ secrets.DOCKER_USERNAME }}/local-deep-research
tags: |
type=raw,value=${{ steps.version.outputs.version }}
type=raw,value=${{ steps.version.outputs.major_minor }}
type=raw,value=latest
- name: Create and push multi-platform manifest
id: manifest
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
META_TAGS: ${{ steps.meta.outputs.tags }}
run: |
set -euo pipefail
# Get the tags from metadata (newline separated)
TAGS="$META_TAGS"
# Store first tag for attestation
FIRST_TAG=$(echo "$TAGS" | head -n 1)
echo "primary_tag=$FIRST_TAG" >> "$GITHUB_OUTPUT"
# Create manifest for each tag
while IFS= read -r tag; do
if [ -n "$tag" ]; then
echo "Creating manifest for: $tag"
docker buildx imagetools create -t "$tag" \
"${DOCKER_USERNAME}/local-deep-research:amd64-${{ github.sha }}" \
"${DOCKER_USERNAME}/local-deep-research:arm64-${{ github.sha }}"
fi
done <<< "$TAGS"
# Get the manifest digest for the primary tag (used for signing)
DIGEST=$(docker buildx imagetools inspect "$FIRST_TAG" --format '{{json .Manifest.Digest}}' | tr -d '"')
echo "digest=$DIGEST" >> "$GITHUB_OUTPUT"
echo "Manifest digest: $DIGEST"
- name: Sign Docker images with Cosign
env:
DIGEST: ${{ steps.manifest.outputs.digest }}
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
run: |
set -euo pipefail
# Sign by digest for reliability with manifest lists
# All tags point to the same manifest, so we sign once by digest
IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${DIGEST}"
echo "Signing image by digest: $IMAGE_REF"
cosign sign --yes "$IMAGE_REF"
# Brief sleep to allow registry to propagate signature
echo "Waiting for signature propagation..."
sleep 5
- name: Generate SLSA provenance attestation
env:
DIGEST: ${{ steps.manifest.outputs.digest }}
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
run: |
set -euo pipefail
IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${DIGEST}"
# Generate SLSA provenance predicate
# Note: SLSA spec requires "sha1" field name for git commit digest
cat > provenance.json <<EOF
# Group writes into one redirect (shellcheck SC2129).
{
"buildType": "https://github.com/${{ github.repository }}/docker-build@v1",
"builder": {
"id": "https://github.com/actions/runner"
},
"invocation": {
"configSource": {
"uri": "https://github.com/${{ github.repository }}",
"digest": {
"sha1": "${{ github.sha }}"
},
"entryPoint": ".github/workflows/docker-publish.yml"
}
},
"metadata": {
"buildInvocationId": "${{ github.run_id }}",
"buildStartedOn": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
"completeness": {
"parameters": true,
"environment": true,
"materials": true
},
"reproducible": false
},
"materials": [
{
"uri": "https://github.com/${{ github.repository }}",
"digest": {
"sha1": "${{ github.sha }}"
}
}
]
}
EOF
echo "tag=${DISPATCH_TAG}"
echo "version=${VERSION}"
echo "major_minor=${MAJOR_MINOR}"
echo "source_tag=${SOURCE_TAG}"
echo "expected_digest=${EXPECTED_DIGEST}"
} >> "$GITHUB_OUTPUT"
# Attach provenance to image by digest
cosign attest --yes --predicate provenance.json --type slsaprovenance "$IMAGE_REF"
- name: Verify image signature
- name: Verify source digest matches expected
env:
DIGEST: ${{ steps.manifest.outputs.digest }}
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
SOURCE_TAG: ${{ steps.version.outputs.source_tag }}
EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
run: |
set -euo pipefail
IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${DIGEST}"
echo "Verifying signature for: $IMAGE_REF"
# Defends against the prerelease tag being swapped between
# prerelease-docker's signing and this promote step.
SOURCE="${DOCKER_USERNAME}/local-deep-research:${SOURCE_TAG}"
ACTUAL=$(docker buildx imagetools inspect "$SOURCE" --format '{{json .Manifest.Digest}}' | tr -d '"')
if [[ "$ACTUAL" != "$EXPECTED_DIGEST" ]]; then
echo "::error::Source digest mismatch — possible tag tampering between prerelease and promote"
echo " expected: $EXPECTED_DIGEST"
echo " actual: $ACTUAL"
exit 1
fi
echo "Source digest verified: $ACTUAL"
# Retry logic to handle registry propagation delay after signing
MAX_RETRIES=5
RETRY_DELAY=10
for i in $(seq 1 "$MAX_RETRIES"); do
echo "Verification attempt $i of $MAX_RETRIES..."
if cosign verify \
--certificate-identity-regexp="https://github.com/${{ github.repository }}" \
--certificate-oidc-issuer=https://token.actions.githubusercontent.com \
"$IMAGE_REF"; then
echo "Signature verification successful!"
exit 0
fi
if [ "$i" -lt "$MAX_RETRIES" ]; then
echo "Verification failed, waiting ${RETRY_DELAY}s before retry..."
sleep "$RETRY_DELAY"
- name: Promote (retag) prerelease manifest to release tags
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
SOURCE_TAG: ${{ steps.version.outputs.source_tag }}
VERSION: ${{ steps.version.outputs.version }}
MAJOR_MINOR: ${{ steps.version.outputs.major_minor }}
run: |
set -euo pipefail
SOURCE="${DOCKER_USERNAME}/local-deep-research:${SOURCE_TAG}"
# Single imagetools create with multiple -t — registry-side
# metadata-only operation, takes seconds, preserves digest.
docker buildx imagetools create \
-t "${DOCKER_USERNAME}/local-deep-research:${VERSION}" \
-t "${DOCKER_USERNAME}/local-deep-research:${MAJOR_MINOR}" \
-t "${DOCKER_USERNAME}/local-deep-research:latest" \
"$SOURCE"
echo "Promoted ${SOURCE} to :${VERSION}, :${MAJOR_MINOR}, :latest"
- name: Verify promoted tags share the source digest
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
VERSION: ${{ steps.version.outputs.version }}
MAJOR_MINOR: ${{ steps.version.outputs.major_minor }}
EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
run: |
set -euo pipefail
# Defends against `imagetools create` re-encoding the manifest.
# If digests diverge, signatures and attestations (keyed by the
# original digest) won't be discoverable from the new tags.
for TAG in "${VERSION}" "${MAJOR_MINOR}" "latest"; do
REF="${DOCKER_USERNAME}/local-deep-research:${TAG}"
ACTUAL=$(docker buildx imagetools inspect "$REF" --format '{{json .Manifest.Digest}}' | tr -d '"')
if [[ "$ACTUAL" != "$EXPECTED_DIGEST" ]]; then
echo "::error::Digest mismatch on ${TAG} — imagetools create may have re-encoded the manifest"
echo " expected: $EXPECTED_DIGEST"
echo " actual: $ACTUAL"
exit 1
fi
echo "${TAG} -> ${ACTUAL} ✓"
done
echo "Signature verification failed after $MAX_RETRIES attempts"
exit 1
- name: Attach SBOM to image
# Catches CVE-database updates that landed between the prerelease
# build and this promote step. Use the SHA-pinned action wrapper
# (same pin as prerelease-docker.yml's security-scan) with an
# explicit binary version pin — the prior `apt-get install -y trivy`
# approach was unpinned and exposed the release path to the Trivy
# apt-repo supply chain. The pinned action downloads the v0.69.2
# binary from GitHub releases by exact tag, which is the same
# binary the prerelease scan validated, keeping the two scans
# consistent.
#
# Unlike prerelease-docker.yml's security-scan (which scans a
# locally-loaded image, no registry pull), this step scans by
# registry digest — Trivy must pull the manifest + layers from
# Docker Hub. TRIVY_USERNAME/TRIVY_PASSWORD is the action's
# documented auth path; the `docker/login-action` above also
# writes ~/.docker/config.json which Trivy reads as a fallback,
# but the explicit env vars are more reliable (Trivy has
# documented docker.io credential-helper quirks — aquasecurity/
# trivy#432, aquasecurity/trivy#8385) and the image we scan IS on
# Docker Hub so this is the path most likely to keep working.
- name: Re-scan release digest with Trivy
uses: aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25 # v0.36.0
env:
TRIVY_USERNAME: ${{ secrets.DOCKER_USERNAME }}
TRIVY_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
with:
image-ref: ${{ secrets.DOCKER_USERNAME }}/local-deep-research@${{ steps.version.outputs.expected_digest }}
severity: 'CRITICAL,HIGH'
ignore-unfixed: true
trivyignores: '.trivyignore'
exit-code: '1'
version: 'v0.69.2'
- name: Verify cosign signature on promoted digest
env:
DIGEST: ${{ steps.manifest.outputs.digest }}
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
REPO: ${{ github.repository }}
run: |
set -euo pipefail
IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${DIGEST}"
# Generate SBOM using syft
docker pull "$IMAGE_REF"
syft "$IMAGE_REF" -o spdx-json > sbom.spdx.json
# Attach SBOM to image by digest
cosign attach sbom --sbom sbom.spdx.json "$IMAGE_REF"
# Sign the SBOM
cosign sign --yes --attachment sbom "$IMAGE_REF"
- name: Upload SBOM artifact
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: sbom
path: sbom.spdx.json
retention-days: 90
- name: Clean up temporary tags
run: |
echo "Manifest creation complete. Temporary platform-specific tags can be removed manually if desired."
# The cert was issued to the prerelease-docker.yml workflow when
# signing happened there, so the identity regex must match that
# workflow's path. Fulcio's SAN is built from job_workflow_ref,
# which for reusable workflows is the CALLEE.
#
# Verify by IMMUTABLE digest (not the :VERSION tag) so this step
# is invariant under any retag race between the verify-promoted-
# tags step above and this one. Trivy's re-scan above also uses
# @${EXPECTED_DIGEST}; keeping cosign on the same reference is
# consistent and avoids a tag-resolution TOCTOU window.
IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${EXPECTED_DIGEST}"
echo "Verifying signature for: $IMAGE_REF"
cosign verify \
--certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
--certificate-identity-regexp "^https://github.com/${REPO}/\.github/workflows/prerelease-docker\.yml@refs/(heads|tags)/" \
--certificate-github-workflow-repository "${REPO}" \
"$IMAGE_REF"
echo "Signature transitivity verified ✓"
- name: Clean up prerelease tags
continue-on-error: true
@@ -454,3 +312,23 @@ jobs:
done
echo "Prerelease tag cleanup complete. Deleted ${DELETED} tag(s) matching ${PREFIX}*."
- name: Summary
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
VERSION: ${{ steps.version.outputs.version }}
MAJOR_MINOR: ${{ steps.version.outputs.major_minor }}
EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
run: |
{
echo "## Docker Release Promoted"
echo ""
echo "**Digest:** \`${EXPECTED_DIGEST}\`"
echo ""
echo "**Tags:** \`${VERSION}\`, \`${MAJOR_MINOR}\`, \`latest\` — all share the same digest as the prerelease manifest, so cosign signatures, SBOM, and SLSA provenance from the prerelease step are transitively valid."
echo ""
echo '```'
echo "docker pull ${DOCKER_USERNAME}/local-deep-research:${VERSION}"
echo "docker pull ${DOCKER_USERNAME}/local-deep-research:latest"
echo '```'
} >> "$GITHUB_STEP_SUMMARY"

@@ -1,17 +1,56 @@
name: Prerelease Docker Image
# Build a prerelease Docker image for local testing before the official
# release is published. Triggered exclusively via repository_dispatch from
# release.yml (after all gates pass and approval is granted).
# Build the canonical Docker image for a release. In the build-once-promote
# pipeline, this workflow IS the build — docker-publish.yml only retags the
# manifest produced here. Cosign signing, SBOM attestation, and SLSA
# provenance are attached here once, keyed by manifest digest, so they're
# discoverable from any tag (including the release tags later created by
# imagetools create).
#
# NO workflow_dispatch — security is enforced upstream in release.yml.
# Triggered exclusively via workflow_call from release.yml (after security
# gates pass). No workflow_dispatch — security and gate semantics are
# enforced by the caller. The build runs automatically; the only human
# approval in the release flow is the `release` env on this workflow's
# jobs + publish-docker + trigger-pypi + create-release in release.yml
# (gates the actual publish, not the canonical build).
on:
repository_dispatch:
types: [publish-prerelease-docker]
workflow_call:
inputs:
version:
description: "Bare semver, e.g. '1.6.9' (no leading 'v')"
type: string
required: true
short_sha:
description: "First 7 chars of commit SHA (used in the prerelease tag)"
type: string
required: true
secrets:
# Explicit secrets contract instead of `secrets: inherit` on the
# caller side — narrower blast radius if a future caller misuses
# this reusable workflow.
DOCKER_USERNAME:
required: true
description: "Docker Hub username for image push"
DOCKER_PASSWORD:
required: true
description: "Docker Hub PAT (Read+Write+Delete scopes)"
outputs:
manifest_digest:
description: "sha256:... digest of the multi-arch prerelease manifest. Used by docker-publish.yml to verify retag preserves the digest."
value: ${{ jobs.create-manifest.outputs.digest }}
permissions: {} # Minimal top-level for OSSF Scorecard Token-Permissions
# No approval gate at the build step — the build runs automatically once
# security gates and CI gates in release.yml pass. The only meaningful
# human decision in the release flow is "should this signed, attested,
# tested image become the official release?" — gated by the `release`
# environment on this workflow's jobs + `publish-docker` + `trigger-pypi`
# + `create-release` in release.yml. The maintainer can pull
# `:prerelease-v<ver>-<sha>` and smoke-test between build completion
# and approving the release env.
jobs:
build-amd64:
name: Build AMD64 Prerelease Image
@@ -48,11 +87,9 @@ jobs:
context: .
platforms: linux/amd64
push: true
tags: ${{ secrets.DOCKER_USERNAME }}/local-deep-research:prerelease-v${{ github.event.client_payload.version }}-${{ github.event.client_payload.short_sha }}-amd64
tags: ${{ secrets.DOCKER_USERNAME }}/local-deep-research:prerelease-v${{ inputs.version }}-${{ inputs.short_sha }}-amd64
cache-from: type=gha,scope=linux-amd64
cache-to: type=gha,mode=max,scope=linux-amd64
build-args: |
DEPS_HASH=${{ hashFiles('pdm.lock') }}
build-arm64:
name: Build ARM64 Prerelease Image
@@ -89,11 +126,9 @@ jobs:
context: .
platforms: linux/arm64
push: true
tags: ${{ secrets.DOCKER_USERNAME }}/local-deep-research:prerelease-v${{ github.event.client_payload.version }}-${{ github.event.client_payload.short_sha }}-arm64
tags: ${{ secrets.DOCKER_USERNAME }}/local-deep-research:prerelease-v${{ inputs.version }}-${{ inputs.short_sha }}-arm64
cache-from: type=gha,scope=linux-arm64
cache-to: type=gha,mode=max,scope=linux-arm64
build-args: |
DEPS_HASH=${{ hashFiles('pdm.lock') }}
security-scan:
name: Security Scan
@@ -136,8 +171,6 @@ jobs:
tags: local-deep-research:security-scan
cache-from: type=gha,scope=linux-amd64
cache-to: type=gha,mode=max,scope=linux-amd64
build-args: |
DEPS_HASH=${{ hashFiles('pdm.lock') }}
# Generate Trivy SARIF for archival as a workflow artifact (all severities, never fails).
# Severity-gating happens in the next step.
@@ -177,6 +210,11 @@ jobs:
environment: release
permissions:
contents: read
id-token: write # Required for cosign keyless OIDC signing
# No `packages: write` — Docker Hub auth uses DOCKER_PASSWORD secret,
# not GITHUB_TOKEN. `packages: write` only matters for ghcr.io pushes.
outputs:
digest: ${{ steps.capture-digest.outputs.digest }}
steps:
- name: Harden Runner
@@ -199,11 +237,22 @@ jobs:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Install Cosign
uses: sigstore/cosign-installer@6f9f17788090df1f26f669e9d70d6ae9567deba6 # v4.1.2
with:
# Pin to cosign v2.x — see release.yml for rationale (v3 enables
# --new-bundle-format by default which changes the on-wire format
# and breaks downstream verifiers still on v2).
cosign-release: 'v2.6.3'
- name: Install Syft
uses: anchore/sbom-action/download-syft@e22c389904149dbc22b58101806040fa8d37a610 # v0.24.0
- name: Create and push multi-platform manifest
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
VERSION: ${{ github.event.client_payload.version }}
SHORT_SHA: ${{ github.event.client_payload.short_sha }}
VERSION: ${{ inputs.version }}
SHORT_SHA: ${{ inputs.short_sha }}
run: |
set -euo pipefail
TAG="prerelease-v${VERSION}-${SHORT_SHA}"
@@ -216,17 +265,220 @@ jobs:
# Floating tag: re-point :prerelease at the manifest just created so
# testers can pin compose to `:prerelease` and pull the latest RC via
# `docker compose pull` without editing the tag each cycle. The
# versioned tag above remains for reproducibility.
# versioned tag above remains for reproducibility (and is what
# docker-publish.yml retags by digest into :1.6.9 / :1.6 / :latest).
echo "Updating floating tag: ${DOCKER_USERNAME}/local-deep-research:prerelease"
docker buildx imagetools create -t "${DOCKER_USERNAME}/local-deep-research:prerelease" \
"${DOCKER_USERNAME}/local-deep-research:${TAG}"
echo "Floating :prerelease tag updated"
- name: Capture manifest digest
id: capture-digest
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
VERSION: ${{ inputs.version }}
SHORT_SHA: ${{ inputs.short_sha }}
run: |
set -euo pipefail
IMAGE_REF="${DOCKER_USERNAME}/local-deep-research:prerelease-v${VERSION}-${SHORT_SHA}"
# Same form as the existing docker-publish.yml inspector — avoids jq.
DIGEST=$(docker buildx imagetools inspect "$IMAGE_REF" --format '{{json .Manifest.Digest}}' | tr -d '"')
if [[ -z "$DIGEST" || "$DIGEST" != sha256:* ]]; then
echo "::error::Failed to capture manifest digest (got '${DIGEST}')"
exit 1
fi
echo "digest=${DIGEST}" >> "$GITHUB_OUTPUT"
echo "Manifest digest: ${DIGEST}"
- name: Sign manifest with Cosign
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
DIGEST: ${{ steps.capture-digest.outputs.digest }}
run: |
set -euo pipefail
# Sign by digest — signature artifact lands at sha256-<digest>.sig
# in the same repo, discoverable from ANY tag pointing at the same
# digest (including release tags created later by docker-publish.yml's
# imagetools-create retag).
IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${DIGEST}"
echo "Signing image by digest: $IMAGE_REF"
cosign sign --yes "$IMAGE_REF"
# Brief sleep to allow registry to propagate signature
sleep 5
- name: Generate SLSA provenance attestation
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
DIGEST: ${{ steps.capture-digest.outputs.digest }}
run: |
set -euo pipefail
IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${DIGEST}"
# entryPoint is the TOP-LEVEL caller (release.yml), not this
# reusable workflow. Per SLSA GHA buildtype v1 and the canonical
# slsa-github-generator, reusable workflows are explicitly NOT
# entryPoints. github.run_id / github.repository / github.sha all
# resolve to the caller's run context inside a reusable workflow.
# builder.id pins the workflow that actually defines the build
# steps — the trust root a verifier policy can pin against. We
# compose it from `github.repository` and a hardcoded path to
# THIS workflow file, with `github.ref` for the ref portion.
# Rationale: inside a workflow_call callee, the `github` context
# is scoped to the CALLER, so `github.workflow_ref` would point
# at release.yml (the wrong builder). The `job` context has no
# `workflow_ref` property either (only check_run_id, container,
# services, status — actionlint confirms). For a local-path
# reusable workflow (`uses: ./.github/workflows/...`), the
# callee's ref equals the caller's `github.ref`, so composing
# the path manually gives the correct
# `<owner>/<repo>/.github/workflows/prerelease-docker.yml@<ref>`
# format that matches the Fulcio cert SAN. Cosign and
# slsa-verifier both anchor on the cert anyway, so this fix is
# about correctness for raw-JSON policy engines / audit tools
# that read builder.id directly.
#
# completeness.* are FALSE because we don't capture invocation
# parameters or environment, and the build does network I/O for
# apt/pip/npm. Honest emptiness > false claims of completeness.
#
# buildInvocationId includes run_attempt so re-runs are
# distinguishable in audit logs.
cat > provenance.json <<EOF
{
"buildType": "https://github.com/${{ github.repository }}/docker-build@v1",
"builder": {
"id": "https://github.com/${{ github.repository }}/.github/workflows/prerelease-docker.yml@${{ github.ref }}"
},
"invocation": {
"configSource": {
"uri": "https://github.com/${{ github.repository }}",
"digest": {
"sha1": "${{ github.sha }}"
},
"entryPoint": ".github/workflows/release.yml"
}
},
"metadata": {
"buildInvocationId": "${{ github.run_id }}-${{ github.run_attempt }}",
"buildStartedOn": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
"completeness": {
"parameters": false,
"environment": false,
"materials": false
},
"reproducible": false
},
"materials": [
{
"uri": "https://github.com/${{ github.repository }}",
"digest": {
"sha1": "${{ github.sha }}"
}
}
]
}
EOF
# --replace prevents duplicate SLSA attestations on re-run. Cosign's
# Replace logic is keyed by predicate-type URI, so it leaves the
# SBOM SPDX attestation (different predicateType) untouched.
cosign attest --yes --replace --predicate provenance.json --type slsaprovenance "$IMAGE_REF"
- name: Verify image signature
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
DIGEST: ${{ steps.capture-digest.outputs.digest }}
run: |
set -euo pipefail
IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${DIGEST}"
echo "Verifying signature for: $IMAGE_REF"
# Retry to handle registry propagation delay after signing
MAX_RETRIES=5
RETRY_DELAY=10
for i in $(seq 1 "$MAX_RETRIES"); do
echo "Verification attempt $i of $MAX_RETRIES..."
if cosign verify \
--certificate-identity-regexp="https://github.com/${{ github.repository }}" \
--certificate-oidc-issuer=https://token.actions.githubusercontent.com \
"$IMAGE_REF"; then
echo "Signature verification successful!"
exit 0
fi
if [ "$i" -lt "$MAX_RETRIES" ]; then
echo "Verification failed, waiting ${RETRY_DELAY}s before retry..."
sleep "$RETRY_DELAY"
fi
done
echo "::error::Signature verification failed after $MAX_RETRIES attempts"
exit 1
- name: Generate per-platform SBOMs and attest each
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
DIGEST: ${{ steps.capture-digest.outputs.digest }}
run: |
set -euo pipefail
REPO="${DOCKER_USERNAME}/local-deep-research"
MANIFEST_REF="${REPO}@${DIGEST}"
# Multi-arch SBOM correctness: syft against a manifest list digest
# only scans the host platform's layers (per anchore/syft#1708),
# which would lie to ARM64 consumers. We attest each per-arch
# digest with its OWN SBOM so end-user verification is honest.
# We deliberately do NOT also produce a "manifest-level SBOM" —
# that would be amd64-only (host arch) and re-introduce the lie
# for any arm64 consumer running the README verifier recipe.
# The README documents the per-arch verification flow instead.
MANIFEST_JSON=$(docker buildx imagetools inspect "${MANIFEST_REF}" --raw)
# Defense against future buildx output changes: assert at least
# one per-arch entry exists. Without this, an empty/malformed
# manifest list would silently produce zero SBOMs and pass CI green.
PER_ARCH_COUNT=$(echo "${MANIFEST_JSON}" \
| jq '[.manifests[] | select(.platform.architecture != "unknown")] | length')
if [[ "${PER_ARCH_COUNT}" -lt 1 ]]; then
echo "::error::No per-arch manifest entries found in ${MANIFEST_REF} — SBOM generation cannot proceed"
echo "Raw manifest: ${MANIFEST_JSON}"
exit 1
fi
echo "Found ${PER_ARCH_COUNT} per-arch manifest(s) to scan"
echo "$MANIFEST_JSON" \
| jq -r '.manifests[] | select(.platform.architecture != "unknown") | "\(.platform.os)/\(.platform.architecture)\t\(.digest)"' \
| while IFS=$'\t' read -r PLAT PER_ARCH_DIGEST; do
ARCH="${PLAT##*/}"
PER_ARCH_REF="${REPO}@${PER_ARCH_DIGEST}"
SBOM_FILE="sbom-${ARCH}.spdx.json"
echo "=== Scanning ${PLAT} (${PER_ARCH_DIGEST}) ==="
# --platform tells syft which arch to scan — matters when
# the host runner can't natively execute the image.
syft --platform "${PLAT}" "${PER_ARCH_REF}" -o spdx-json > "${SBOM_FILE}"
# --replace prevents accumulation when a re-run lands on
# the same digest (e.g. "Re-run failed jobs" after a flake).
# Per cosign source pkg/cosign/remote/remote.go, --replace
# is per-predicate-type, so it doesn't disturb the SLSA
# attestation already on the manifest list digest.
cosign attest --yes --replace \
--predicate "${SBOM_FILE}" --type spdxjson "${PER_ARCH_REF}"
done
- name: Upload SBOMs artifact
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: sbom
# Per-arch SBOMs only — `sbom-amd64.spdx.json`, `sbom-arm64.spdx.json`,
# one per platform in the manifest list. No manifest-level SBOM is
# produced (would be host-arch-only and misleading for non-amd64
# consumers).
path: sbom-*.spdx.json
retention-days: 90
- name: Summary
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
VERSION: ${{ github.event.client_payload.version }}
SHORT_SHA: ${{ github.event.client_payload.short_sha }}
VERSION: ${{ inputs.version }}
SHORT_SHA: ${{ inputs.short_sha }}
DIGEST: ${{ steps.capture-digest.outputs.digest }}
run: |
TAG="prerelease-v${VERSION}-${SHORT_SHA}"
{
@@ -235,8 +487,14 @@ jobs:
echo "**Versioned tag:** \`${TAG}\`"
echo "**Floating tag:** \`prerelease\` (now points at this build)"
echo ""
echo "**Digest:** \`${DIGEST}\`"
echo ""
echo '```'
echo "docker pull ${DOCKER_USERNAME}/local-deep-research:${TAG}"
echo "docker pull ${DOCKER_USERNAME}/local-deep-research:prerelease"
echo '```'
echo ""
echo "Signed and attested. After release approval, docker-publish.yml"
echo "will retag this exact digest as \`:${VERSION}\`, \`:major.minor\`,"
echo "and \`:latest\`."
} >> "$GITHUB_STEP_SUMMARY"
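
The capture-digest step and the per-platform SBOM loop in this file both rest on small string checks: the digest must look like `sha256:...` before it is written to `$GITHUB_OUTPUT`, and the SBOM filename is derived from the `os/arch` platform string via `${PLAT##*/}`. The sketch below isolates those two checks in POSIX shell. The helper names `is_sha256_digest` and `sbom_file_for_platform` are hypothetical and do not appear in the workflow, and the 64-hex-character length check is an assumption that is stricter than the workflow's `sha256:*` prefix test.

```shell
# Hypothetical helpers for illustration; not part of prerelease-docker.yml.

# Succeeds only for "sha256:" followed by exactly 64 lowercase hex characters.
# (Assumption: stricter than the workflow, which only checks the prefix.)
is_sha256_digest() {
  case "$1" in
    sha256:*) hex=${1#sha256:} ;;
    *) return 1 ;;
  esac
  [ "${#hex}" -eq 64 ] || return 1
  case "$hex" in
    *[!0123456789abcdef]*) return 1 ;;
  esac
  return 0
}

# Derives the per-arch SBOM filename from an "os/arch" platform string,
# mirroring ARCH="${PLAT##*/}" and SBOM_FILE="sbom-${ARCH}.spdx.json".
sbom_file_for_platform() {
  arch=${1##*/}
  printf 'sbom-%s.spdx.json\n' "$arch"
}
```

Calling `sbom_file_for_platform linux/arm64` prints `sbom-arm64.spdx.json`, which matches the `sbom-*.spdx.json` glob that the upload step collects.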

@@ -17,7 +17,11 @@ jobs:
outputs:
has-frontend: ${{ steps.check.outputs.has-frontend }}
container:
image: node:20-alpine@sha256:bcd88137d802e2482c9df3cdec71e0431857ebbbdba6973776b5593214056d86 # node:20-alpine
# Node 24 to match `package.json`'s `engines: { node: ">=24.0.0" }`.
# Was previously node:20 — npm could resolve dependencies that target
# APIs missing on 20 and the wheel-building publish path could ship
# frontend assets that break at runtime on the Node-24 Docker image.
image: node:24-alpine@sha256:d1b3b4da11eefd5941e7f0b9cf17783fc99d9c6fc34884a665f40a06dbdfc94f # node:24-alpine
# Note: Network is needed for npm ci to work, but no secrets are available
options: --user 1001
permissions:

@@ -343,7 +343,12 @@ jobs:
- name: Install Node.js for Vite build
uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6.4.0
with:
node-version: '20'
# Match package.json's `engines: { node: ">=24.0.0" }` and
# publish.yml's node:24-alpine. Building the wheel-tested
# frontend on Node 20 here while the real publish runs on
# Node 24 means the gate validates a different runtime than
# ships, defeating the gate's stated purpose.
node-version: '24'
- name: Set up PDM (build only)
uses: pdm-project/setup-pdm@973541a5febeafcfdadf8a51211435be6ecfd90f # v4.5
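
The Node version bump above exists so that the gate builds on the same major version that `package.json` requires and that the publish image runs. One way to make such a mismatch fail loudly is an explicit version check before the build. The following is only a sketch: `node_major_at_least` is a hypothetical helper, not something this workflow defines, and it assumes the version string has the form printed by `node --version` (for example `v24.1.0`).

```shell
# Hypothetical helper for illustration; not part of the workflow.
# Succeeds if the Node version string in $1 ("v24.1.0" or "24.1.0")
# has a major version greater than or equal to $2.
node_major_at_least() {
  version=${1#v}        # drop an optional leading "v"
  major=${version%%.*}  # keep everything before the first dot
  [ "$major" -ge "$2" ]
}

# Possible use in a build step (assumes node is on PATH):
#   node_major_at_least "$(node --version)" 24 || exit 1
```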

@@ -86,9 +86,13 @@ jobs:
# The build job depends on this gate, ensuring no release can be created
# without passing all security checks.
#
# NOTE: Docker/PyPI publishing only happens via repository_dispatch from this
# workflow. Creating a release via GitHub UI will NOT trigger publishing
# (this is by design to prevent security gate bypass).
# NOTE: Docker publishing now runs inline (reusable workflow_call to
# docker-publish.yml), and PyPI publishing is still dispatched via
# repository_dispatch to publish.yml (PyPI Trusted Publishing does not
# support reusable workflows — see pypa/gh-action-pypi-publish#166 and
# pypi/warehouse#11096). Neither publish path is triggered by creating a
# release via the GitHub UI — only this workflow can dispatch them — so
# the security-gate flow is preserved.
# ============================================================================
release-gate:
needs: [version-check]
@@ -282,41 +286,65 @@ jobs:
fi
echo "Version verified: $EXPECTED_VERSION"
- name: Generate SBOM (Software Bill of Materials)
# SBOM generation uses the SHA-pinned trivy-action with an explicit
# binary version pin. Previously this step shelled out
# `apt-get install -y trivy` from the (unpinned) Aqua apt repo, which
# in a job carrying `id-token: write` (line 229) meant a compromised
# apt mirror could exfiltrate the OIDC request token AND tamper with
# the SBOM bytes that the next step (Sign release artifacts) signs
# under this repo's Sigstore identity — i.e., a fraudulent SBOM
# carrying a legitimate cosign cert. Pin the action by SHA + the
# binary by version tag so the toolchain is reproducible and matches
# docker-publish.yml / prerelease-docker.yml.
- name: Generate SBOM (SPDX JSON)
if: steps.check_release.outputs.exists == 'false'
uses: aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25 # v0.36.0
with:
scan-type: 'fs'
scan-ref: '.'
format: 'spdx-json'
output: 'sbom-spdx.json'
version: 'v0.69.2'
- name: Generate SBOM (CycloneDX)
if: steps.check_release.outputs.exists == 'false'
uses: aquasecurity/trivy-action@ed142fd0673e97e23eac54620cfb913e5ce36c25 # v0.36.0
with:
scan-type: 'fs'
scan-ref: '.'
format: 'cyclonedx'
output: 'sbom-cyclonedx.json'
version: 'v0.69.2'
- name: List SBOMs
if: steps.check_release.outputs.exists == 'false'
run: |
echo "=== Generating Software Bill of Materials ==="
# Install Trivy for SBOM generation (using signed-by for secure key management)
curl -fsSL https://aquasecurity.github.io/trivy-repo/deb/public.key | gpg --dearmor | sudo tee /usr/share/keyrings/trivy.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb generic main" | sudo tee /etc/apt/sources.list.d/trivy.list
sudo apt-get update
sudo apt-get install -y trivy
# Generate SBOM in SPDX format (JSON)
trivy fs --format spdx-json --output sbom-spdx.json .
echo "Generated SBOM in SPDX-JSON format"
# Generate SBOM in CycloneDX format (JSON)
trivy fs --format cyclonedx --output sbom-cyclonedx.json .
echo "Generated SBOM in CycloneDX format"
# Display summary
echo ""
echo "SBOM files generated:"
ls -lh sbom-*.json
- name: Install Cosign
if: steps.check_release.outputs.exists == 'false'
uses: sigstore/cosign-installer@6f9f17788090df1f26f669e9d70d6ae9567deba6 # v4.1.2
with:
# Pin to cosign v2.x — v4.1.2 of the installer ships cosign v3.0.6
# by default, but v3 enables --new-bundle-format ON BY DEFAULT.
# That changes the signature/attestation on-wire format and would
# require all downstream verifiers (including end-users running
# the README cosign-verify recipe) to also be on v3+. Pin to
# v2.6.3 (latest v2.x, includes the GHSA-w6c6-c85g-mmv6 patch)
# for the legacy tag-based sigstore format until we explicitly
# migrate to v3.
cosign-release: 'v2.6.3'
- name: Sign release artifacts with Sigstore
if: steps.check_release.outputs.exists == 'false'
run: |
echo "=== Signing release artifacts with Sigstore ==="
# Sign SBOM files using keyless signing (OIDC)
# cosign v3.0.2+ uses --bundle which contains signature, certificate, and metadata in one file
# Sign SBOM files using keyless signing (OIDC).
# `--bundle` writes a Sigstore protobuf bundle containing the
# signature, certificate, and Rekor inclusion proof in one file.
# Supported in cosign v2.4.0+ (we're pinned to v2.6.3 above).
cosign sign-blob --yes --bundle sbom-spdx.json.bundle sbom-spdx.json
cosign sign-blob --yes --bundle sbom-cyclonedx.json.bundle sbom-cyclonedx.json
@@ -364,9 +392,87 @@ jobs:
provenance-name: "provenance.intoto.jsonl"
compile-generator: true # Build from source to bypass TUF key validation issues
create-release:
# Build, sign, and push the canonical Docker image. This is the single
# build for the release — publish-docker (the reusable workflow_call
# to docker-publish.yml) later retags this manifest by digest. The jobs
# inside this reusable workflow declare `environment: release` so they
# can read the env-scoped DOCKER_USERNAME / DOCKER_PASSWORD; the FIRST
# (and only) `release` env approval click in this run unlocks ALL
# release-env jobs together: prerelease-docker, publish-docker,
# trigger-pypi, monitor-pypi, and create-release. After the
# atomicity refactor, docker-publish is inline (reusable workflow_call)
# so it does not require a separate second approval — its result is
# visible to downstream jobs (create-release, cleanup-on-rejection) in
# the same run. (Earlier iterations of the build-once-promote refactor
# tried to run the canonical build pre-approval to give a real
# test-then-approve window; that required repo-level Docker Hub
# secrets, which we deliberately don't have.) The manifest_digest
# output flows through to the promote step so digest preservation can
# be verified end-to-end.
prerelease-docker:
needs: [build, provenance]
if: ${{ !cancelled() && needs.build.result == 'success' && needs.provenance.result == 'success' && needs.build.outputs.release_exists == 'false' }}
uses: ./.github/workflows/prerelease-docker.yml
with:
version: ${{ needs.build.outputs.version }}
short_sha: ${{ needs.build.outputs.short_sha }}
secrets:
# Explicit list (not `secrets: inherit`) — defense-in-depth so a
# future edit to prerelease-docker.yml that references an unrelated
# secret would fail loudly rather than silently accessing inherited
# values like PAT_TOKEN, OPENROUTER_API_KEY, or SERPER_API_KEY.
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
permissions:
# Caller permissions cap what the called workflow can request.
# Must include id-token: write so cosign keyless OIDC works on the
# called workflow's create-manifest job.
# NOTE: no `packages: write` — Docker Hub auth uses DOCKER_PASSWORD
# secret, not GITHUB_TOKEN. `packages: write` only matters for ghcr.io.
contents: read
id-token: write
# Retag the prerelease manifest as the release tags (:VERSION, :MAJOR_MINOR,
# :latest). Runs as a reusable workflow_call so its result is visible to
# downstream jobs in this run — specifically: create-release blocks on
# success here, and cleanup-on-rejection keys its cosign-deletion logic on
# `needs.publish-docker.result`. Before this refactor, docker-publish.yml
# was triggered via repository_dispatch and its outcome was invisible to
# the parent release.yml run, which made cosign artifact cleanup unsafe
# after a partial retag failure (release tags could exist sharing the
# prerelease manifest digest, and deleting `sha256-<digest>.{sig,att}`
# would invalidate those release-tag signatures).
publish-docker:
needs: [build, prerelease-docker]
if: ${{ !cancelled() && needs.prerelease-docker.result == 'success' && needs.build.outputs.release_exists == 'false' }}
uses: ./.github/workflows/docker-publish.yml
with:
tag: ${{ needs.build.outputs.tag }}
source_tag: prerelease-v${{ needs.build.outputs.version }}-${{ needs.build.outputs.short_sha }}
expected_digest: ${{ needs.prerelease-docker.outputs.manifest_digest }}
secrets:
# Explicit secret pass — env-scoped (`release`) DOCKER_USERNAME and
# DOCKER_PASSWORD become available to the callee's `promote` job
# because that job declares `environment: release`.
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
permissions:
# cosign VERIFY (the only sigstore operation in the callee) is a
# read-only check against the public Rekor/Fulcio infrastructure —
# no GitHub OIDC token is minted, so id-token: write is unnecessary.
# OIDC is only needed for cosign SIGN, which happens upstream in
# prerelease-docker.yml.
contents: read
create-release:
# create-release waits for prerelease-docker (canonical image exists,
# signed and attested), publish-docker (release tags retagged with
# digest preserved, cosign transitivity verified), and monitor-pypi
# (PyPI publish has completed). Only then is the GitHub Release
# published — so the Release page never points at non-existent
# artifacts. Gated by the `release` environment.
needs: [build, provenance, prerelease-docker, publish-docker, monitor-pypi]
if: ${{ !cancelled() && needs.build.result == 'success' && needs.provenance.result == 'success' && needs.prerelease-docker.result == 'success' && needs.publish-docker.result == 'success' && needs.monitor-pypi.result == 'success' && needs.build.outputs.release_exists == 'false' }}
runs-on: ubuntu-latest
environment: release
permissions:
@@ -436,6 +542,35 @@ jobs:
with:
name: provenance.intoto.jsonl
- name: Download container-image SBOMs
# Produced by prerelease-docker.yml — one SBOM per architecture
# (sbom-amd64.spdx.json, sbom-arm64.spdx.json) from Syft. The
# cosign attestations on each per-arch digest are the
# cryptographically authoritative copies; attaching the raw JSON
# to the GitHub Release makes them discoverable to humans
# browsing the release page.
uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
with:
name: sbom
path: container-sbom
- name: Stage container-image SBOMs for release attachment
# Rename per-arch SBOMs to avoid collision with the filesystem SBOM
# `sbom-spdx.json` produced by the build job (Trivy scan of source
# tree). Container SBOMs are the Syft scan of built image layers
# per platform — different content.
run: |
set -euo pipefail
# Per-arch SBOMs: rename sbom-amd64.spdx.json → sbom-container-amd64.spdx.json
# (and similarly for arm64). prerelease-docker.yml no longer
# produces a manifest-level sbom.spdx.json.
for f in container-sbom/sbom-*.spdx.json; do
[ -f "$f" ] || continue
base=$(basename "$f")
mv "$f" "sbom-container-${base#sbom-}"
done
ls -la sbom-container*.spdx.json || echo "No container SBOMs found"
- name: List artifacts
run: |
echo "Release artifacts:"
@@ -780,7 +915,18 @@ jobs:
print(f"Release body truncated to {len(content)} chars", file=sys.stderr)
PYEOF
# Includes SBOM files, Sigstore signatures, and SLSA provenance
# Includes filesystem SBOM (Trivy scan of source tree) AND
# container-image SBOM (Syft scan of built image layers, per-arch),
# Sigstore signatures, and SLSA provenance. The two SBOM kinds
# describe different surfaces — both are useful for downstream
# supply-chain consumers.
# `find ... -print0` read into a bash array via `read -d ''` handles
# the variable number of per-arch container SBOMs gracefully (zero or more).
set -euo pipefail
CONTAINER_SBOMS=()
while IFS= read -r -d '' f; do
CONTAINER_SBOMS+=("$f")
done < <(find . -maxdepth 1 -name 'sbom-container*.spdx.json' -print0)
gh release create "$RELEASE_TAG" \
--repo "$GITHUB_REPOSITORY" \
--title "Release $RELEASE_VERSION" \
@@ -789,11 +935,17 @@ jobs:
sbom-spdx.json.bundle \
sbom-cyclonedx.json \
sbom-cyclonedx.json.bundle \
provenance.intoto.jsonl
provenance.intoto.jsonl \
"${CONTAINER_SBOMS[@]}"
env:
# PAT_TOKEN required here because GITHUB_TOKEN cannot trigger downstream
# workflows (repository_dispatch). Minimum scopes: repo (for release
# creation + dispatch triggers), workflow (for triggering workflows).
# PAT_TOKEN is required here because GITHUB_TOKEN cannot trigger
# workflows that listen on `release:` (backwards-compatibility.yml,
# sbom.yml) — that's a documented GITHUB_TOKEN limitation.
# Minimum scope: `repo` (full scope; `public_repo` is NOT sufficient
# for the release-creation API on private repos and `repo` works
# uniformly). The `workflow` scope is NOT needed — it only governs
# editing files under .github/workflows/ via the API, which this
# step does not do.
GITHUB_TOKEN: ${{ secrets.PAT_TOKEN }}
RELEASE_TAG: ${{ needs.build.outputs.tag }}
RELEASE_VERSION: ${{ needs.build.outputs.version }}
@@ -979,47 +1131,23 @@ jobs:
automation
maintenance
trigger-prerelease-docker:
needs: [build, provenance]
if: ${{ !cancelled() && needs.build.result == 'success' && needs.provenance.result == 'success' && needs.build.outputs.release_exists == 'false' }}
runs-on: ubuntu-latest
# Separate environment from `release` so the GitHub "Review deployments"
# modal shows two independent checkboxes — letting maintainers approve or
# reject the prerelease Docker test independently of the actual release
# publish (create-release + trigger-workflows still gate on `release`).
environment: prerelease
permissions:
contents: read
# NOTE: trigger-prerelease-docker and the old combined trigger-workflows
# (which used to dispatch BOTH publish-docker AND publish-pypi) were
# removed in the build-once-promote + atomicity refactors.
# prerelease-docker.yml and docker-publish.yml are now invoked as
# reusable workflows earlier in this file. Only PyPI publishing remains
# on repository_dispatch — PyPI Trusted Publishing requires the publish
# step to run in a top-level (non-reusable) workflow.
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@ab7a9404c0f3da075243ca237b5fac12c98deaa5 # v2.19.3
with:
egress-policy: audit
- name: Trigger prerelease Docker build
uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
env:
RELEASE_VERSION: ${{ needs.build.outputs.version }}
SHORT_SHA: ${{ needs.build.outputs.short_sha }}
with:
# PAT_TOKEN: repository_dispatch requires a PAT (GITHUB_TOKEN cannot trigger workflows)
github-token: ${{ secrets.PAT_TOKEN }}
script: |
await github.rest.repos.createDispatchEvent({
owner: context.repo.owner,
repo: context.repo.repo,
event_type: 'publish-prerelease-docker',
client_payload: {
version: process.env.RELEASE_VERSION,
short_sha: process.env.SHORT_SHA
}
});
console.log('Triggered prerelease Docker build');
trigger-workflows:
needs: [build, create-release]
if: ${{ !cancelled() && needs.build.result == 'success' && needs.create-release.result == 'success' && needs.build.outputs.release_exists == 'false' }}
# Dispatch PyPI publish via repository_dispatch. Kept as a dispatch
# (rather than a reusable workflow_call) because PyPI's Trusted
# Publisher matches the OIDC `workflow_ref` claim, which points to the
# CALLER when a workflow is invoked via workflow_call — so a reusable
# publish.yml would fail with `invalid-publisher`. Tracked in
# pypa/gh-action-pypi-publish#166 and pypi/warehouse#11096.
trigger-pypi:
needs: [build, prerelease-docker, publish-docker]
if: ${{ !cancelled() && needs.prerelease-docker.result == 'success' && needs.publish-docker.result == 'success' && needs.build.outputs.release_exists == 'false' }}
runs-on: ubuntu-latest
environment: release
permissions:
@@ -1033,63 +1161,50 @@ jobs:
with:
egress-policy: audit
# Recorded BEFORE the dispatches so monitor-publish can filter on createdAt
# without missing runs created during this job.
# Recorded BEFORE the dispatch so monitor-pypi can filter on createdAt
# without missing the run created during this job.
- name: Record dispatch time
id: dispatch-time
run: echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$GITHUB_OUTPUT"
- name: Trigger Docker publish workflow
uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
env:
RELEASE_TAG: ${{ needs.build.outputs.tag }}
with:
# PAT_TOKEN: repository_dispatch requires a PAT (GITHUB_TOKEN cannot trigger workflows)
github-token: ${{ secrets.PAT_TOKEN }}
script: |
await github.rest.repos.createDispatchEvent({
owner: context.repo.owner,
repo: context.repo.repo,
event_type: 'publish-docker',
client_payload: {
tag: process.env.RELEASE_TAG
}
});
console.log('Triggered Docker publish workflow');
- name: Trigger PyPI publish workflow
uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
env:
RELEASE_TAG: ${{ needs.build.outputs.tag }}
with:
# PAT_TOKEN: repository_dispatch requires a PAT (GITHUB_TOKEN cannot trigger workflows)
# PAT_TOKEN: repository_dispatch from a workflow run cannot be
# fired with GITHUB_TOKEN — the API rejects it to prevent
# workflow-trigger loops. Minimum scope: `repo` (full scope —
# `public_repo` is rejected by createDispatchEvent regardless
# of repo visibility).
github-token: ${{ secrets.PAT_TOKEN }}
script: |
// Forward `prerelease: false` explicitly. publish.yml gates
// Test PyPI vs prod PyPI on `client_payload.prerelease == true`;
// if absent, the expression evaluates to '' and falls through to
// prod PyPI. Setting false here makes the choice explicit. The
// current pipeline only releases stable versions through this
// dispatch — true prereleases (if added later) would need a
// separate trigger path that flips this flag.
await github.rest.repos.createDispatchEvent({
owner: context.repo.owner,
repo: context.repo.repo,
event_type: 'publish-pypi',
client_payload: {
tag: process.env.RELEASE_TAG
tag: process.env.RELEASE_TAG,
prerelease: false
}
});
console.log('Triggered PyPI publish workflow');
- name: Summary
env:
RELEASE_VERSION: ${{ needs.build.outputs.version }}
run: |
echo "Release $RELEASE_VERSION created successfully!"
echo "Triggered PyPI and Docker publishing workflows via repository_dispatch"
echo "SBOM files (SPDX, CycloneDX) attached to release"
echo "Sigstore bundles (.bundle) attached for verification (contain signature + certificate)"
echo "SLSA provenance (provenance.intoto.jsonl) attached for supply chain security"
echo "Check the releases page: https://github.com/LearningCircuit/local-deep-research/releases"
# Monitor publish workflows and create issue on partial failure
monitor-publish:
needs: [build, trigger-workflows]
if: ${{ !cancelled() && needs.trigger-workflows.result == 'success' }}
# Block on the dispatched publish.yml run so create-release downstream
# only fires once PyPI has actually shipped. If PyPI fails, this job
# fails and create-release is skipped — preventing the GH Release from
# publishing with a missing PyPI artifact. (Docker promote already
# blocked synchronously above via publish-docker reusable call.)
monitor-pypi:
needs: [build, trigger-pypi]
if: ${{ !cancelled() && needs.trigger-pypi.result == 'success' }}
runs-on: ubuntu-latest
timeout-minutes: 90
permissions:
@@ -1103,120 +1218,106 @@ jobs:
with:
egress-policy: audit
- name: Wait for publish workflows to complete
- name: Wait for PyPI publish workflow to complete
id: wait
env:
GH_TOKEN: ${{ github.token }}
# gh CLI cannot infer the repo here (this job has no checkout step),
# and it does NOT fall back to GITHUB_REPOSITORY. Without GH_REPO,
# `gh run list` fails with "failed to determine base repo", the
# error is swallowed, and every poll sees an empty result — which
# made monitor-publish always time out and falsely open a
# "Partial publish failure" issue even when both publishes succeeded.
# error is swallowed, and every poll sees an empty result.
GH_REPO: ${{ github.repository }}
RELEASE_TAG: ${{ needs.build.outputs.tag }}
DISPATCH_TIME: ${{ needs.trigger-workflows.outputs.dispatch_time }}
DISPATCH_TIME: ${{ needs.trigger-pypi.outputs.dispatch_time }}
run: |
echo "Monitoring publish workflows for tag $RELEASE_TAG (dispatched at $DISPATCH_TIME)..."
# pipefail is essential here: the poll loop pipes `gh run list`
# into `jq`, and without pipefail a transient `gh` failure
# (network, auth, rate limit) is masked because `jq` exits 0 on
# empty input, so the loop spins silently for the full
# 40-minute budget instead of surfacing the error immediately.
set -euo pipefail
echo "Monitoring publish.yml for tag $RELEASE_TAG (dispatched at $DISPATCH_TIME)..."
# Wait for workflows to start (repository_dispatch is async)
# Wait for the dispatched run to start (repository_dispatch is async)
sleep 30
check_workflow() {
local workflow_name="$1"
local max_wait=2400 # 40 minutes
local elapsed=0
local run status conclusion
max_wait=2400 # 40 minutes
elapsed=0
conclusion=""
while [ "$elapsed" -lt "$max_wait" ]; do
# Compare timestamps numerically — lexicographic comparison breaks
# when GitHub returns sub-second precision (e.g. "...:56.500Z"
# sorts before "...:56Z"). fromdateiso8601 doesn't accept
# fractional seconds, so strip them first. The 60s buffer
# tolerates minor clock skew between the runner and GitHub's API.
# Stderr is intentionally NOT redirected so gh failures are
# visible in the runner log (silent failures previously caused
# the loop to time out without explanation).
run=$(gh run list --workflow="$workflow_name" --limit=20 --json status,conclusion,createdAt \
| jq -r --arg since "$DISPATCH_TIME" '
def to_epoch: sub("\\.[0-9]+Z$"; "Z") | fromdateiso8601;
(($since | to_epoch) - 60) as $s
| [.[] | select((.createdAt | to_epoch) >= $s)] | .[0]
')
if [ "$run" = "null" ] || [ -z "$run" ]; then
echo "$workflow_name: waiting for run to appear..." >&2
sleep 30
elapsed=$((elapsed + 30))
continue
fi
status=$(echo "$run" | jq -r '.status')
conclusion=$(echo "$run" | jq -r '.conclusion')
if [ "$status" = "completed" ]; then
echo "$workflow_name: $conclusion" >&2
echo "$conclusion"
return
fi
while [ "$elapsed" -lt "$max_wait" ]; do
# Compare timestamps numerically — lexicographic comparison
# breaks when GitHub returns sub-second precision (e.g.
# "...:56.500Z" sorts before "...:56Z"). fromdateiso8601
# doesn't accept fractional seconds, so strip them first. The
# 60s buffer tolerates minor clock skew between the runner and
# GitHub's API. Stderr is intentionally NOT redirected so gh
# failures stay visible in the runner log.
run=$(gh run list --workflow=publish.yml --limit=20 --json status,conclusion,createdAt \
| jq -r --arg since "$DISPATCH_TIME" '
def to_epoch: sub("\\.[0-9]+Z$"; "Z") | fromdateiso8601;
(($since | to_epoch) - 60) as $s
| [.[] | select((.createdAt | to_epoch) >= $s)] | .[0]
')
if [ "$run" = "null" ] || [ -z "$run" ]; then
echo "publish.yml: waiting for run to appear..." >&2
sleep 30
elapsed=$((elapsed + 30))
done
continue
fi
echo "$workflow_name: timed out after ${max_wait}s" >&2
echo "timed_out"
}
status=$(echo "$run" | jq -r '.status')
conclusion=$(echo "$run" | jq -r '.conclusion')
DOCKER_RESULT=$(check_workflow "docker-publish.yml")
PYPI_RESULT=$(check_workflow "publish.yml")
if [ "$status" = "completed" ]; then
echo "publish.yml: $conclusion" >&2
break
fi
# Mark that monitoring completed (distinguishes from infra failure)
{
echo "monitor_completed=true"
echo "docker_result=$DOCKER_RESULT"
echo "pypi_result=$PYPI_RESULT"
} >> "$GITHUB_ENV"
sleep 30
elapsed=$((elapsed + 30))
done
if [ "$DOCKER_RESULT" = "success" ] && [ "$PYPI_RESULT" = "success" ]; then
echo "Both publish workflows completed successfully!"
else
echo "::warning::Partial publish failure detected"
if [ -z "$conclusion" ]; then
echo "::error::publish.yml run did not start or complete within ${max_wait}s"
echo "pypi_result=timed_out" >> "$GITHUB_OUTPUT"
exit 1
fi
- name: Create issue on partial failure
if: env.monitor_completed == 'true' && (env.docker_result != 'success' || env.pypi_result != 'success')
echo "pypi_result=$conclusion" >> "$GITHUB_OUTPUT"
if [ "$conclusion" = "success" ]; then
echo "PyPI publish completed successfully"
exit 0
else
echo "::error::PyPI publish completed with conclusion: $conclusion"
exit 1
fi
- name: Create issue on PyPI publish failure
# Always run on failure so maintainers see a clear, persistent
# record that captures the workflow run ID. The Actions UI shows
# the failure too but issues are easier to search and triage.
if: failure()
uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9.0.0
env:
RELEASE_TAG: ${{ needs.build.outputs.tag }}
PYPI_RESULT: ${{ steps.wait.outputs.pypi_result }}
with:
script: |
const { owner, repo } = context.repo;
const tag = process.env.RELEASE_TAG;
const docker = process.env.docker_result || 'unknown';
const pypi = process.env.pypi_result || 'unknown';
const title = `Partial publish failure for ${tag}`;
const pypi = process.env.PYPI_RESULT || 'unknown';
const title = `PyPI publish failure for ${tag}`;
const failed = [
{ name: 'Docker', result: docker },
{ name: 'PyPI', result: pypi },
].filter(t => t.result !== 'success');
const hasTimeout = failed.some(t => t.result === 'timed_out');
const hasFailure = failed.some(t => t.result === 'failure');
const action = pypi === 'timed_out'
? '**Suggested action:** publish.yml did not complete within 40 minutes. Check the [Actions tab](https://github.com/' + owner + '/' + repo + '/actions) — if the run never appeared, this likely indicates a dispatch issue; re-trigger via `repository_dispatch` event_type `publish-pypi`. If still running, no action may be needed.'
: '**Suggested action:** Inspect publish.yml logs in the [Actions tab](https://github.com/' + owner + '/' + repo + '/actions), fix the underlying cause, then either (a) re-dispatch publish.yml with the same tag via `repository_dispatch` and re-run release.yml to complete create-release, or (b) manually publish the GitHub Release if Docker promote also succeeded.';
let action;
if (hasTimeout && !hasFailure) {
action = '**Suggested action:** One or more workflows did not complete within 40 minutes. Check the [Actions tab](https://github.com/' + owner + '/' + repo + '/actions) — if a run never appeared, this likely indicates an infrastructure or dispatch issue and the publish can be re-triggered. If still running, no action may be needed.';
} else if (hasFailure && !hasTimeout) {
action = '**Suggested action:** Inspect the failed workflow logs in the [Actions tab](https://github.com/' + owner + '/' + repo + '/actions), fix the underlying cause, then re-trigger the publish via `repository_dispatch`.';
} else {
action = '**Suggested action:** Mixed results — investigate each non-success target individually in the [Actions tab](https://github.com/' + owner + '/' + repo + '/actions).';
}
const body = `## PyPI publish failed for ${tag}\n\nResult: \`${pypi}\`\n\nNote: At this point in the atomicity flow, Docker promote (publish-docker job) has already succeeded — the Docker release tags exist and are signed. Only PyPI and the GitHub Release are missing.\n\n${action}`;
const body = `## Publish Status for ${tag}\n\n| Target | Result |\n|--------|--------|\n| Docker | ${docker} |\n| PyPI | ${pypi} |\n\n${action}`;
// De-dup: if an open issue with the same title already exists
// (e.g. from a re-run of this workflow on the same tag), comment
// on it instead of opening a duplicate.
// De-dup: if an open issue with the same title already exists,
// comment on it instead of opening a duplicate.
const existing = await github.paginate(github.rest.issues.listForRepo, {
owner, repo, state: 'open', labels: 'ci-cd', per_page: 100,
});
@@ -1232,3 +1333,218 @@ jobs:
owner, repo, title, body, labels: ['ci-cd'],
});
}
# ============================================================================
# CLEANUP ON REJECTION
# ============================================================================
# In the build-once-promote model, prerelease-docker signs the manifest
# BEFORE the publish step runs. If publish-docker (the retag step) fails
# or the maintainer rejects the `release` env approval, the prerelease
# tag and its cosign artifacts (`sha256-<digest>.{sig,att}`) are left
# orphaned on Docker Hub forever — the existing cleanup loop inside
# docker-publish.yml only runs on its success path.
#
# SAFETY: cosign signature/attestation artifacts are stored at a tag named
# `sha256-<manifest-digest>.{sig,att}` and discovered by manifest digest,
# NOT by image tag. After publish-docker SUCCEEDS, the release tags
# (`:VERSION`, `:MAJOR_MINOR`, `:latest`) share the prerelease manifest
# digest (imagetools retag preserves the digest), so the cosign artifacts
# anchor BOTH the deleted prerelease tag AND the live release tags.
# Deleting them after a successful retag would invalidate release-tag
# cosign verification.
#
# Therefore: this job ONLY fires when publish-docker did not succeed.
# Beyond that point, docker-publish.yml's success-path cleanup handles
# prerelease-tag deletion, and cosign artifacts must stay.
#
# Edge case: partial retag failure (e.g., `:1.6.9` lands but `:latest`
# fails) — publish-docker exits failure with some release tags already
# created. The cleanup script enumerates the three possible release tags
# against Docker Hub and rolls back any that exist BEFORE deleting
# cosign artifacts. This is the only case where we delete release tags
# rather than leave them.
# ============================================================================
cleanup-on-rejection:
name: Clean up orphan prerelease tags and signatures
needs: [build, prerelease-docker, publish-docker]
if: >-
${{ always()
&& needs.prerelease-docker.result != 'skipped'
&& needs.build.outputs.release_exists == 'false'
&& (needs.prerelease-docker.result == 'failure'
|| needs.prerelease-docker.result == 'cancelled'
|| needs.publish-docker.result == 'failure'
|| needs.publish-docker.result == 'cancelled') }}
runs-on: ubuntu-latest
# DOCKER_USERNAME / DOCKER_PASSWORD are env-scoped to `release`
# (deliberately not repo-level — see comment on the build job above).
# Without `environment: release` here, those secrets resolve to empty
# strings and the Docker Hub login at the bottom of this job exits 1,
# leaving the orphan tags + cosign artifacts the cleanup is meant to
# remove. The `release` env approval was already granted upstream in
# this run, so this does not add a new prompt.
environment: release
permissions:
contents: read
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@ab7a9404c0f3da075243ca237b5fac12c98deaa5 # v2.19.3
with:
egress-policy: audit
- name: Roll back partial release tags then delete prerelease + cosign artifacts
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
VERSION: ${{ needs.build.outputs.version }}
SHORT_SHA: ${{ needs.build.outputs.short_sha }}
DIGEST: ${{ needs.prerelease-docker.outputs.manifest_digest }}
run: |
set -euo pipefail
# Authenticate with Docker Hub API (same JWT flow as docker-publish.yml cleanup)
TOKEN=$(curl -sS -X POST -H "Content-Type: application/json" \
-d "{\"username\":\"${DOCKER_USERNAME}\",\"password\":\"${DOCKER_PASSWORD}\"}" \
https://hub.docker.com/v2/users/login/ | jq -r .token)
if [ -z "$TOKEN" ] || [ "$TOKEN" = "null" ]; then
echo "::error::Failed to authenticate with Docker Hub API — cannot clean up orphans"
exit 1
fi
REPO="${DOCKER_USERNAME}/local-deep-research"
MANIFEST_TAG="prerelease-v${VERSION}-${SHORT_SHA}"
MAJOR_MINOR=$(echo "$VERSION" | cut -d. -f1,2)
delete_tag() {
local TAG="$1"
local STATUS
STATUS=$(curl -sS -o /dev/null -w "%{http_code}" -X DELETE \
-H "Authorization: JWT ${TOKEN}" \
"https://hub.docker.com/v2/repositories/${REPO}/tags/${TAG}/" || echo "ERR")
case "$STATUS" in
200|204) echo "Deleted ${TAG}";;
404) echo "Skip ${TAG} (already absent — expected)";;
401|403)
echo "::error::Auth failure deleting ${TAG} — DOCKER_PASSWORD may be missing Delete scope on the Docker Hub PAT"
exit 1
;;
*) echo "::warning::Unexpected HTTP ${STATUS} for ${TAG}";;
esac
}
# =====================================================================
# STEP 1 — Partial retag rollback.
# =====================================================================
# If publish-docker failed mid-way through `docker buildx imagetools
# create -t :VERSION -t :MAJOR_MINOR -t :latest`, one or two of the
# three release tags may have landed. Those tags would point at the
# prerelease manifest digest, which we're about to delete cosign
# artifacts for. We MUST roll back any landed release tags BEFORE
# touching cosign — otherwise the rollback leaves them broken.
#
# If publish-docker was skipped (prerelease failed before reaching
# retag), no release tag created by this run can exist, so this loop
# has nothing to roll back.
# =====================================================================
echo "STEP 1 — Roll back any release tags that landed during a partial retag..."
for RELEASE_TAG in "${VERSION}" "${MAJOR_MINOR}" "latest"; do
# Resolve the tag's current digest and roll back ONLY if it matches this
# run's prerelease manifest. `:latest` and `:MAJOR_MINOR` normally exist
# from earlier releases and must not be deleted when they point elsewhere.
TAG_DIGEST=$(docker buildx imagetools inspect "${REPO}:${RELEASE_TAG}" \
--format '{{json .Manifest}}' 2>/dev/null | jq -r '.digest // empty' || true)
if [ -z "$TAG_DIGEST" ]; then
echo "Release tag ${RELEASE_TAG}: not present or not inspectable; nothing to roll back"
elif [[ -n "${DIGEST:-}" && "$TAG_DIGEST" == "$DIGEST" ]]; then
echo "::warning::Release tag ${RELEASE_TAG} exists from a partial retag; rolling back"
delete_tag "${RELEASE_TAG}"
else
echo "Release tag ${RELEASE_TAG}: points at ${TAG_DIGEST}, not this run's digest; leaving it in place"
fi
done
# =====================================================================
# STEP 2 — Prerelease tag cleanup.
# =====================================================================
# Always-safe targets: the prerelease manifest list + its per-arch
# children + the floating `:prerelease` tag. Cleanup is best-effort
# here (404s are normal — e.g., build failed before pushing arm64).
#
# The floating `:prerelease` tag (re-pointed by prerelease-docker.yml
# to the current run's manifest) is included so a rejected release
# does not leave `:prerelease` pointing at a manifest whose cosign
# artifacts step 4 below is about to delete — pulling `:prerelease`
# in that window would yield an image whose signature is gone, and
# the README cosign-verify recipe would fail. Returning 404 on
# `:prerelease` is unambiguously safer than serving an unverifiable
# image; the next successful prerelease-docker run re-creates it.
# =====================================================================
echo "STEP 2 — Delete prerelease tags..."
for TAG in "${MANIFEST_TAG}" "${MANIFEST_TAG}-amd64" "${MANIFEST_TAG}-arm64" "prerelease"; do
delete_tag "${TAG}"
done
# =====================================================================
# STEP 3 — Discover per-arch digests from the manifest list BEFORE
# we delete cosign artifacts.
# =====================================================================
# prerelease-docker.yml's per-arch SBOM step attests against each
# per-arch digest, producing artifacts at
# `sha256-<per-arch-digest>.{sig,att,sbom}` in addition to the
# manifest-list-digest artifacts. The per-arch digests must be read from
# the manifest list while it still exists: STEP 2 deleted only the TAG,
# and the manifest body persists until Docker Hub GC and stays inspectable
# by digest, so the captured DIGEST is used to inspect it directly.
# =====================================================================
PER_ARCH_TAGS=()
if [[ -n "${DIGEST:-}" && "$DIGEST" == sha256:* ]]; then
echo "STEP 3 — Discovering per-arch digests from manifest list ${DIGEST}..."
if docker buildx imagetools inspect "${REPO}@${DIGEST}" --raw > /tmp/manifest.json 2>/dev/null; then
while IFS= read -r PER_ARCH_DIGEST; do
if [[ -n "${PER_ARCH_DIGEST}" && "${PER_ARCH_DIGEST}" == sha256:* ]]; then
PER_ARCH_TAG_PREFIX="${PER_ARCH_DIGEST/:/-}"
PER_ARCH_TAGS+=(
"${PER_ARCH_TAG_PREFIX}.sig"
"${PER_ARCH_TAG_PREFIX}.att"
"${PER_ARCH_TAG_PREFIX}.sbom"
)
echo " queued cleanup for per-arch artifacts at ${PER_ARCH_DIGEST}"
fi
done < <(jq -r '.manifests[] | select(.platform.architecture != "unknown") | .digest' /tmp/manifest.json)
else
echo "::warning::Could not inspect manifest at ${DIGEST} — skipping per-arch attestation cleanup"
fi
else
echo "STEP 3 — No valid manifest digest captured (got '${DIGEST:-<empty>}'); skipping per-arch discovery"
fi
# =====================================================================
# STEP 4 — Cosign signature/attestation artifact cleanup.
# =====================================================================
# Safe to delete only because STEP 1 already rolled back any release
# tags that might be sharing these digests. The .sbom entries are
# legacy from `cosign attach sbom` (current code uses `cosign attest
# --type spdxjson` which writes to .att) — kept as belt-and-suspenders
# cleanup for any leftovers from older releases.
# =====================================================================
COSIGN_TAGS=()
if [[ -n "${DIGEST:-}" && "$DIGEST" == sha256:* ]]; then
DIGEST_TAG_PREFIX="${DIGEST/:/-}"
COSIGN_TAGS+=(
"${DIGEST_TAG_PREFIX}.sig"
"${DIGEST_TAG_PREFIX}.att"
"${DIGEST_TAG_PREFIX}.sbom"
)
fi
COSIGN_TAGS+=("${PER_ARCH_TAGS[@]}")
if [ "${#COSIGN_TAGS[@]}" -gt 0 ]; then
echo "STEP 4 — Delete cosign signature/attestation artifacts..."
for TAG in "${COSIGN_TAGS[@]}"; do
delete_tag "${TAG}"
done
else
echo "STEP 4 — No cosign artifact tags to clean up (no captured digest)"
fi
# NOTE: NO `continue-on-error: true` at job level — auth failures
# (401/403) are loud so missing token scopes can't silently let
# orphans accumulate forever.


@@ -8,6 +8,14 @@ SHELL ["/bin/bash", "-o", "pipefail", "-c"]
ARG DEBIAN_FRONTEND=noninteractive
# `apt-get upgrade -y` is INTENTIONAL — we want every build to pull the
# latest patched Debian packages so security fixes flow into the image.
# This trades bit-for-bit reproducibility (two rebuilds of the same source
# can produce different layer digests across a Debian patch window) for
# always-fresh-on-CVE behavior. The build-once-promote pipeline mitigates
# the reproducibility loss: prerelease-docker.yml builds once per release
# and the resulting digest is what gets retagged to :1.6.9 / :1.6 / :latest,
# so the released image is bit-identical to the one tested.
# Install system dependencies for SQLCipher and Node.js for frontend build
# Using Acquire::Retries to handle transient Debian mirror errors during CI
RUN apt-get update -o Acquire::Retries=3 && apt-get upgrade -y -o Acquire::Retries=3 \
@@ -63,9 +71,11 @@ ENV PDM_CHECK_UPDATE=false
# This helps prevent httpcore.ReadTimeout errors during CI network congestion
ENV PDM_REQUEST_TIMEOUT=120
# Build argument to invalidate cache when dependencies change
ARG DEPS_HASH
# NOTE: `DEPS_HASH` was previously declared as a cache-invalidation arg but
# never referenced in a RUN/COPY, so it had no effect — Docker only honors
# ARG values for cache when they're actually used downstream. Cache
# invalidation on dependency changes happens naturally via `COPY pdm.lock`
# below, since the file's content hash changes when deps change.
WORKDIR /install
# Copy dependency files first (changes rarely)
@@ -120,6 +130,8 @@ ARG DEBIAN_FRONTEND=noninteractive
# Install additional runtime dependencies for testing tools
# Note: Node.js is already installed from builder-base
# Using Acquire::Retries to handle transient Debian mirror errors during CI
# `apt-get upgrade -y` is INTENTIONAL — see the rationale comment on the
# corresponding upgrade in the builder-base stage (top of file).
RUN apt-get update -o Acquire::Retries=3 && apt-get upgrade -y -o Acquire::Retries=3 \
&& apt-get install -y --no-install-recommends -o Acquire::Retries=3 \
xauth \
@@ -238,7 +250,9 @@ ARG DEBIAN_FRONTEND=noninteractive
# — Scorecard alert #7742 dismissed as won't-fix on the same basis.
RUN pip3 install --no-cache-dir pip==26.1
# Install runtime dependencies for SQLCipher and WeasyPrint
# Install runtime dependencies for SQLCipher and WeasyPrint.
# `apt-get upgrade -y` is INTENTIONAL — see rationale on the builder-base
# upgrade (top of file). Trades reproducibility for always-fresh CVE patches.
RUN apt-get update && apt-get upgrade -y \
&& apt-get install -y --no-install-recommends \
sqlcipher \
@@ -294,13 +308,30 @@ RUN HOME=/home/ldruser setpriv --reuid=ldruser --regid=ldruser --init-groups --
sqlcipher = get_sqlcipher_module(); \
print(f'✓ SQLCipher module loaded successfully: {sqlcipher}')"
# Create volume for persistent configuration
# Use /app for configuration to support non-root user
# Persistent state. Without VOLUME directives, users lose all research
# data and databases on `docker rm`. Bind-mounting these paths is recommended in production.
# - /app/.config/local_deep_research: legacy config path (kept for backcompat)
# - /data: where the entrypoint creates logs/, cache/, encrypted_databases/ —
# the actual user state, see scripts/ldr_entrypoint.sh.
#
# LDR_DATA_DIR pins the application to /data. Without this, the Python
# code falls back to platformdirs.user_data_dir() which resolves to
# /home/ldruser/.local/share/local-deep-research — NOT under any
# declared VOLUME, so a `docker run -v vol:/data ...` user (without
# also setting -e LDR_DATA_DIR=/data) would silently lose all data on
# `docker rm`. Documented run paths (docker-compose.yml, README docker
# run examples) already pass this env var explicitly; setting it here
# makes the VOLUME actually load-bearing for bare `docker run -v ...`
# invocations too.
ENV LDR_DATA_DIR=/data
VOLUME /app/.config/local_deep_research
VOLUME /data
# Create volume for Ollama start script
VOLUME /scripts/
# Copy the Ollama entrypoint script
# NOTE: /scripts/ is image content (ollama entrypoint baked in below), NOT
# user state. Previously declared as VOLUME, but a VOLUME on a directory
# that the image populates causes anonymous-volume creation on every
# `docker run` and silently shadows the script if a user bind-mounts it.
# Removed for correctness.
COPY --chown=ldruser:ldruser scripts/ollama_entrypoint.sh /scripts/ollama_entrypoint.sh
# Copy LDR entrypoint script to handle volume permissions


@@ -158,11 +158,38 @@ Your data stays yours. Each user gets their own isolated SQLCipher database encr
**In-memory credentials**: Like all applications that use secrets at runtime — including [password managers](https://www.ise.io/casestudies/password-manager-hacking/), browsers, and API clients — credentials are held in plain text in process memory during active sessions. This is an [industry-wide accepted reality](https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html), not specific to LDR: if an attacker can read process memory, they can also read any in-process decryption key. We mitigate this with session-scoped credential lifetimes and core dump exclusion. Ideas for further improvements are always welcome via [GitHub Issues](https://github.com/LearningCircuit/local-deep-research/issues). See our [Security Policy](SECURITY.md) for details.
**Supply Chain Security**: Docker images are signed with [Cosign](https://github.com/sigstore/cosign), include SLSA provenance attestations, and attach SBOMs. Verify with:
**Supply Chain Security**: Docker images are signed with [Cosign](https://github.com/sigstore/cosign) using GitHub's keyless OIDC flow, include SLSA provenance attestations, and ship with attested SPDX SBOMs. Verify the image and its SBOM before running:
```bash
cosign verify localdeepresearch/local-deep-research:latest
# 1. Verify image signature
cosign verify \
--certificate-identity-regexp "^https://github\.com/LearningCircuit/local-deep-research/\.github/workflows/prerelease-docker\.yml@.*$" \
--certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
--certificate-github-workflow-repository "LearningCircuit/local-deep-research" \
localdeepresearch/local-deep-research:latest
# 2. Verify SBOM attestation (SPDX JSON) for YOUR platform
# SBOM attestations are stored per-architecture (amd64, arm64) on the
# per-arch image digest, not on the multi-arch manifest list. Resolve to
# your platform's digest first.
ARCH=$(uname -m | sed -e 's/^x86_64$/amd64/' -e 's/^aarch64$/arm64/')
PLATFORM_DIGEST=$(docker buildx imagetools inspect localdeepresearch/local-deep-research:latest --raw \
| jq -r --arg arch "$ARCH" '.manifests[] | select(.platform.architecture==$arch) | .digest')
if [ -z "$PLATFORM_DIGEST" ]; then
echo "No per-arch digest found for $ARCH — image may be single-arch or" \
"from a pre-build-once-promote release. Skip step 2 in that case."
exit 1
fi
cosign verify-attestation \
--type spdxjson \
--certificate-identity-regexp "^https://github\.com/LearningCircuit/local-deep-research/\.github/workflows/prerelease-docker\.yml@.*$" \
--certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
--certificate-github-workflow-repository "LearningCircuit/local-deep-research" \
"localdeepresearch/local-deep-research@${PLATFORM_DIGEST}"
```
The image-signature check confirms the image was built by the official `prerelease-docker.yml` workflow in `LearningCircuit/local-deep-research` — not by a forked repo or a leaked credential. The per-platform SBOM verification ensures you're inspecting the actual package set you're going to run, not the SBOM of a different architecture. Requires [cosign v2.0+](https://docs.sigstore.dev/cosign/installation/), [`jq`](https://jqlang.github.io/jq/), and `docker buildx` (bundled with Docker Desktop and Docker Engine ≥ 23.0; install the standalone plugin on older installs). Releases before the build-once-promote refactor were signed by `docker-publish.yml` and carried a single manifest-level SBOM rather than per-arch ones; for those, substitute `docker-publish.yml` for `prerelease-docker.yml` in the regex on both steps and skip the per-platform digest lookup (use the manifest list tag directly).
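For scripting the workflow substitution described above, here is a minimal sketch. It assumes a hypothetical helper named `identity_regex` (not part of the repository) that builds the `--certificate-identity-regexp` value for whichever workflow signed the release:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): build the cosign
# --certificate-identity-regexp value for a given signing workflow.
identity_regex() {
  local workflow="$1"   # e.g. prerelease-docker.yml or docker-publish.yml
  local escaped
  # Escape dots so they match literally inside the regex.
  escaped=$(printf '%s' "$workflow" | sed 's/\./\\./g')
  printf '^https://github\\.com/LearningCircuit/local-deep-research/\\.github/workflows/%s@.*$' "$escaped"
}

# Current releases are signed by prerelease-docker.yml:
identity_regex "prerelease-docker.yml"; echo
# Older releases were signed by docker-publish.yml:
identity_regex "docker-publish.yml"; echo
```

The output can be passed as `--certificate-identity-regexp "$(identity_regex docker-publish.yml)"` to either `cosign verify` or `cosign verify-attestation`.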
**Security Transparency**: Scanner suppressions are documented with justifications in [Security Alerts Assessment](.github/SECURITY_ALERTS.md), [Scorecard Compliance](.github/SECURITY_SCORECARD.md), [Container CVE Suppressions](.trivyignore), and [SAST Rule Rationale](bearer.yml). Some alerts (Dependabot, code scanning) can only be dismissed or are very difficult to suppress outside the [GitHub Security tab](https://docs.github.com/en/code-security/dependabot/dependabot-alerts/viewing-and-updating-dependabot-alerts), so the files above do not cover every dismissed finding.
[Detailed Architecture →](docs/architecture.md) | [Security Policy →](SECURITY.md) | [Security Review Process →](docs/processes/security-review-process/)


@@ -129,10 +129,11 @@ pre-commit install-hooks
| Workflow | Trigger | Purpose |
|----------|---------|---------|
| `docker-publish.yml` | Release, push | Build and publish Docker images |
| `prerelease-docker.yml` | `workflow_call` from release.yml | Canonical multi-arch Docker build, cosign sign, SBOM/SLSA attestations. Jobs declare `environment: release` so the first `release` env approval gates the build (env-scoped Docker Hub secrets). |
| `docker-publish.yml` | `workflow_call` from release.yml | Retag prerelease manifest as `:1.6.9` / `:1.6` / `:latest` (gated by `release` env). No rebuild — registry-side metadata only. Inlined as a reusable workflow so its result is visible to downstream jobs in release.yml (lets create-release block on Docker success, lets cleanup-on-rejection safely scope cosign artifact deletion). |
| `docker-multiarch-test.yml` | PR, push | Multi-architecture build test |
| `publish.yml` | Release | Publish to PyPI |
| `release.yml` | Manual | Create releases |
| `publish.yml` | `repository_dispatch` from release.yml | Publish to PyPI. Stays on `repository_dispatch` (not `workflow_call`) because PyPI Trusted Publishing rejects OIDC claims from reusable workflows — `pypa/gh-action-pypi-publish#166`, `pypi/warehouse#11096`. |
| `release.yml` | Push to `main`, tag `v*.*.*`, manual | Orchestrate release: gates → build → provenance → prerelease-docker → publish-docker → trigger-pypi → monitor-pypi → create-release (last) |
### Code Quality


@@ -43,10 +43,82 @@ and short-circuits everything downstream).
`version-check` job sets `should_release=false` and every downstream
job (security gate, CI gate, build, publish) is skipped.
### 2. **Automatic Publishing** (with approval)
- **GitHub Release** → triggers:
- **PyPI publishing** (requires `release` environment approval)
- **Docker publishing** (requires `release` environment approval)
### 2. **Approval and Publishing**
The release pipeline uses the `release` GitHub environment to gate the
publish steps. `DOCKER_USERNAME` / `DOCKER_PASSWORD` are scoped to that
environment, so any job that pushes to Docker Hub must declare
`environment: release` and therefore goes through the approval gate.
When you merge to `main` (or push a tag), the pipeline runs in this
order:
1. Security gates + CI gates run automatically.
2. `build` job runs (version pin, SBOM, Sigstore bundles), then
`provenance` job generates SLSA provenance for those artifacts.
3. **One `release` env approval prompt** in `release.yml`. Approving
unlocks all release-env jobs in the same run, which then execute
sequentially:
1. `prerelease-docker` — canonical multi-arch Docker build, cosign
sign, SBOM/SLSA attestations, push as `prerelease-v<ver>-<sha>`
and re-point the floating `:prerelease` tag.
2. `publish-docker` — retags the prerelease manifest as `:1.6.9`,
`:1.6`, `:latest` (no rebuild, digest-preserving), then re-verifies
digest + cosign + Trivy on the promoted tag.
3. `trigger-pypi` — dispatches `publish.yml` via `repository_dispatch`
(PyPI Trusted Publishing requires the publish step to run in a
top-level workflow, so this can't be a reusable workflow_call).
4. `monitor-pypi` — polls `publish.yml` for completion. The inner
polling loop times out at 40 minutes (after which the job fails);
the surrounding GH Actions `timeout-minutes` is 90 to leave a
safety margin around the poll budget.
5. `create-release` — publishes the GitHub Release with
SBOM/sig/provenance assets. Runs **last**, gated on all of the
above succeeding, so the public Release never points at missing
Docker tags or a missing PyPI version.
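The bounded polling used by `monitor-pypi` can be sketched as follows. This is an illustration under stated assumptions, not the workflow's code: `poll_with_budget` is a hypothetical name, and the check is any command passed as arguments, whereas the real job queries `gh run list` with a 2400-second budget and a 30-second interval.

```bash
#!/usr/bin/env bash
# Sketch only: poll_with_budget is a hypothetical helper, not the workflow's code.
# Usage: poll_with_budget MAX_WAIT_SECONDS INTERVAL_SECONDS CHECK_COMMAND [ARGS...]
poll_with_budget() {
  local max_wait="$1" interval="$2"
  shift 2
  local elapsed=0
  while [ "$elapsed" -lt "$max_wait" ]; do
    # Stop as soon as the check succeeds.
    if "$@"; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  # Budget exhausted without a successful check.
  return 1
}

# Example: a 2-second budget, polled every second, with a check that never succeeds.
if poll_with_budget 2 1 false; then
  echo "completed"
else
  echo "timed out"
fi
```

Keeping the poll budget (40 minutes) well below the job's `timeout-minutes` (90) means the loop can report its own timeout instead of being killed by the runner.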
If any of `prerelease-docker`, `publish-docker`, or `monitor-pypi` fails,
`create-release` is skipped and no public GitHub Release is created. The
`cleanup-on-rejection` job then handles failure-mode cleanup:
- If `publish-docker` failed mid-retag (e.g., `:1.6.9` landed but
`:latest` failed), it rolls back any landed release tags BEFORE
deleting prerelease tags and cosign artifacts (deleting cosign
artifacts while release tags share the manifest digest would invalidate
release-tag signatures).
- If `publish-docker` succeeded but a later step (PyPI or
create-release) failed, `cleanup-on-rejection` does NOT fire — Docker
release tags exist and their cosign artifacts must stay. See
"Recovery from PyPI failure" below.
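The firing rule for `cleanup-on-rejection` can be expressed as a small decision function. This is a sketch that mirrors the `if:` expression on that job in `release.yml`; `should_cleanup` is a hypothetical helper name, and the real gate is a GitHub Actions expression rather than shell.

```bash
#!/usr/bin/env bash
# Sketch only: should_cleanup is a hypothetical helper mirroring the job-level
# condition; the real gate is a GitHub Actions `if:` expression.
# Arguments: PRERELEASE_RESULT PUBLISH_RESULT RELEASE_EXISTS
should_cleanup() {
  local prerelease="$1" publish="$2" release_exists="$3"
  # Never run when the prerelease build was skipped or the release already exists.
  [ "$prerelease" != "skipped" ] || return 1
  [ "$release_exists" = "false" ] || return 1
  # Run when either Docker stage failed or was cancelled.
  case "$prerelease" in failure|cancelled) return 0 ;; esac
  case "$publish" in failure|cancelled) return 0 ;; esac
  return 1
}

should_cleanup success failure false && echo "cleanup: publish-docker failed"
should_cleanup success success false || echo "no cleanup: Docker tags are live and signed"
```

A later PyPI failure never reaches this rule because neither Docker stage reports `failure` or `cancelled` in that case.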
### Recovery from PyPI failure (atomicity hole)
The one orphan state the pipeline cannot fully clean up is when `publish-docker`
succeeded but PyPI failed. At this point the Docker tags `:1.6.9` / `:1.6` /
`:latest` exist and are signed, PyPI has nothing, and no GitHub Release exists.
`monitor-pypi` opens a tracking issue labeled `ci-cd`. To recover:
1. Inspect the `publish.yml` workflow run, fix the underlying cause.
2. Manually re-dispatch PyPI publish:
```bash
gh api repos/LearningCircuit/local-deep-research/dispatches \
-f event_type=publish-pypi \
-F 'client_payload[tag]=v<X.Y.Z>'
```
3. Once PyPI publishes successfully, manually create the GitHub Release
from the existing tag. The SBOM/sig/provenance files remain available as
workflow artifacts on the failed `release.yml` run: download them and
attach them manually, or re-run the `create-release` job if the run is
still re-runnable in the Actions UI.
> Earlier iterations of this refactor described a single approval gate
> with a pre-approval testing window. That design required
> `DOCKER_USERNAME` / `DOCKER_PASSWORD` to be repo-level secrets so the
> canonical build could run without env approval. They are env-scoped to
> `release` instead, so the gate sits in front of the build. The
> atomicity refactor preserves this single-approval model — one click
> unlocks the whole chain, and create-release runs last so the
> "published Release with broken artifacts" failure mode is closed.
## 👥 Who Can Release
@@ -95,11 +167,14 @@ with auto-notes plus AI summary only).
### Option A: Manual Trigger
- Go to Actions → "Create Release" → "Run workflow"
- Specify version and prerelease flag
- No inputs are required: the workflow reads the version from
`src/local_deep_research/__version__.py` at HEAD. To release an
older or different version, use Option B (push a version tag).
### Option B: Version Tags
- `git tag v0.4.3 && git push origin v0.4.3`
- Automatically creates release
- Automatically creates release; the workflow uses the tag's commit
SHA (not `main` HEAD), so this is the correct path for backporting.
## 🛡️ Branch Protection
@@ -117,10 +192,27 @@ Follow [Semantic Versioning](https://semver.org/):
## 🚨 Emergency Procedures
If automation fails:
1. **Manual GitHub release** still triggers PyPI/Docker
2. **Contact code owners** for assistance
3. **Check workflow logs** in GitHub Actions
If automation fails, do NOT create a GitHub release through the UI as
the first recovery step — under the atomicity refactor, a manually
created GitHub release does NOT trigger `publish.yml` (it listens only
on `repository_dispatch`) and does NOT trigger `docker-publish.yml`
(workflow_call only). The downstream `release:` listeners that DO fire
(`backwards-compatibility.yml`, `sbom.yml`) are observability-only.
Recovery, in order of preference:
1. **Check workflow logs** in GitHub Actions to identify which job
failed, and use the targeted recovery for that failure mode:
- PyPI failure with Docker already promoted: see
[Recovery from PyPI failure](#recovery-from-pypi-failure-atomicity-hole) above.
- Any other failure: re-run the failed job via the Actions UI if
it's still re-runnable (typically within 30 days).
2. **Re-trigger the full pipeline** via `workflow_dispatch` if
re-running individual jobs isn't possible. This is safe for digest-keyed
cosign verification: old digests remain valid because their cosign
artifacts persist, and the new run produces a new digest with its own
signatures.
3. **Contact code owners** if recovery requires manual Docker Hub or
PyPI intervention.
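Digest-keyed storage is what makes a re-run safe: each manifest digest maps to its own artifact tags, so a new digest never collides with an old one. A minimal sketch of that mapping, using the same `${DIGEST/:/-}` substitution as the cleanup job (`cosign_artifact_tag` is a hypothetical helper name):

```bash
#!/usr/bin/env bash
# Sketch only: cosign_artifact_tag is a hypothetical helper name.
# Maps a manifest digest to the registry tag where cosign stores an artifact.
cosign_artifact_tag() {
  local digest="$1" suffix="$2"   # suffix: sig or att
  case "$digest" in
    sha256:*) printf '%s.%s\n' "${digest/:/-}" "$suffix" ;;
    *) echo "not a sha256 digest: $digest" >&2; return 1 ;;
  esac
}

cosign_artifact_tag "sha256:0123abcd" sig   # prints sha256-0123abcd.sig
cosign_artifact_tag "sha256:0123abcd" att   # prints sha256-0123abcd.att
```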
## 📝 Release-notes flow (towncrier news fragments)