ci(release): fixes from multi-round subagent review

Round 1 surfaced 14 candidate findings; Round 2 verified 7 as real bugs and
refuted 4 as false positives. This commit applies the verified fixes.

CONFIRMED bugs fixed:

1. **Approval gate was per-job, not workflow-wide.** The previous
   `environment: prerelease` on `build-amd64` only let `build-arm64` and
   `security-scan` run pre-approval (GitHub environments are
   job-scoped per docs + community/discussions/174381). Replaced with a
   sentinel `approval-gate` job that all three build jobs `needs:`. Single
   approval click still gates everything, but now actually blocks all
   parallel jobs.

2. **`cleanup-on-rejection` if-condition missed the prerelease-rejection
   path.** When prerelease-docker.result was `failure`, both create-release
   and trigger-workflows became `skipped` (their `if:` requires success),
   and the cleanup `if:` only fired on `failure`/`cancelled` of dependents.
   Added explicit `prerelease-docker.result == 'failure'` clause so the
   most common rejection path actually triggers cleanup.

3. **Trivy re-scan ran AFTER retag.** A failing scan would leave release
   tags `:1.6.9`, `:1.6`, `:latest` publicly published with no rollback.
   Reordered: scan source digest BEFORE retag. Content is bit-identical
   (same digest), so scanning the prerelease tag tests what would be
   promoted — but failure now leaves no public broken tags. Also moved
   cosign verify before retag for the same reason.

4. **Trivy only scanned linux/amd64 by default** against a manifest list
   digest (per Trivy docs + aquasecurity/trivy#7847). Replaced single scan
   with two explicit per-platform invocations
   (`--platform linux/amd64`, then `linux/arm64`) so arm64 layers are also
   gated by the freshness check.

5. **Trivy DB freshness wasn't guaranteed.** apt-installed Trivy may use a
   stale embedded DB. Added explicit `trivy image --download-db-only`
   before the scans so the CVE-DB freshness window the re-scan exists for
   is actually exercised.

6. **`cosign attest` re-runs accumulated attestation layers** (verified via
   cosign 2.x `mutate.go` `dedupeAndReplace`). Added `--replace` to both
   attest calls (SLSA provenance + SBOM). Sigstore spec allows multi-sig
   so `cosign sign` is left as-is.

7. **SLSA provenance values inherited from old code were misleading.**
   - `builder.id`: changed from `https://github.com/actions/runner` (the
     agent binary) to the workflow ref the build is actually defined in
     (per SLSA v0.2 spec — builder.id should be a verifiable trust root).
   - `completeness.{parameters,environment,materials}`: flipped from
     `true` to `false`. The predicate captures no workflow_call inputs,
     no environment, and the build does network I/O — claiming
     completeness was a public signed false statement.
   - `buildInvocationId`: now includes `${run_id}-${run_attempt}` so
     re-runs are distinguishable.

REFUTED (kept as-is, with confidence):

- `imagetools create` does NOT change the digest in this case. Buildx's
  Combine() in util/imagetools/create.go has an explicit short-circuit
  for single-source manifest-list inputs that returns the bytes
  byte-for-byte (no annotations + same registry required, both true here).
- Concurrent rejection digest collision is not a real concern — Docker
  builds in this pipeline are not bit-deterministic (apt, network, file
  timestamps, default provenance attestations all vary).
- The `prerelease-v1.6.9-*` cleanup pattern does NOT collide with
  `prerelease-v1.6.91-*` (trailing dash in the prefix disambiguates).
- Reusable-workflow approval prompts appear inline on the caller run
  page for single-level calls — not a UX regression.
This commit is contained in:
LearningCircuit
2026-05-10 14:14:30 +02:00
parent 19170ea625
commit 68606b2990
3 changed files with 134 additions and 68 deletions

View File

@@ -98,6 +98,74 @@ jobs:
fi
echo "Source digest verified: $ACTUAL"
# Pre-promote gating: scan + verify cosign BEFORE retag so a failure
# leaves no public release tags pointing at a broken image. The
# content of the prerelease manifest is bit-identical to what the
# release tags would point at (same digest), so scanning the source
# tag tests the same content.
- name: Install Trivy
run: |
set -euo pipefail
curl -fsSL https://aquasecurity.github.io/trivy-repo/deb/public.key | gpg --dearmor | sudo tee /usr/share/keyrings/trivy.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb generic main" | sudo tee /etc/apt/sources.list.d/trivy.list
sudo apt-get update
sudo apt-get install -y trivy
- name: Refresh Trivy CVE database
# apt-installed trivy may carry a slightly stale cache; force a
# fresh DB pull so the freshness window between prerelease build
# and promote is actually exercised.
run: trivy image --download-db-only
- name: Re-scan source digest with Trivy (linux/amd64)
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
run: |
set -euo pipefail
# Trivy defaults to amd64 only when given a manifest list digest
# (per Trivy docs + aquasecurity/trivy#7847). Scan each platform
# explicitly so arm64 layers are also gated.
trivy image \
--platform linux/amd64 \
--severity CRITICAL,HIGH \
--ignore-unfixed \
--exit-code 1 \
--trivyignores .trivyignore \
"${DOCKER_USERNAME}/local-deep-research@${EXPECTED_DIGEST}"
- name: Re-scan source digest with Trivy (linux/arm64)
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
run: |
set -euo pipefail
trivy image \
--platform linux/arm64 \
--severity CRITICAL,HIGH \
--ignore-unfixed \
--exit-code 1 \
--trivyignores .trivyignore \
"${DOCKER_USERNAME}/local-deep-research@${EXPECTED_DIGEST}"
- name: Verify cosign signature on source digest
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
run: |
set -euo pipefail
# Verify by digest BEFORE promoting — if signature is bad, no
# release tags get created. Cert identity matches the called
# workflow's path (Fulcio SAN = job_workflow_ref = callee).
IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${EXPECTED_DIGEST}"
echo "Verifying signature for: $IMAGE_REF"
cosign verify \
--certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
--certificate-identity-regexp "^https://github.com/${{ github.repository }}/\.github/workflows/prerelease-docker\.yml@refs/(heads|tags)/" \
--certificate-github-workflow-repository "${{ github.repository }}" \
"$IMAGE_REF"
echo "Signature verified on source digest ✓"
- name: Promote (retag) prerelease manifest to release tags
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
@@ -108,7 +176,11 @@ jobs:
set -euo pipefail
SOURCE="${DOCKER_USERNAME}/local-deep-research:${SOURCE_TAG}"
# Single imagetools create with multiple -t — registry-side
# metadata-only operation, takes seconds, preserves digest.
# metadata-only operation, takes seconds, preserves digest. Per
# buildx Combine() in util/imagetools/create.go, when a single
# source that is already a manifest list is passed with NO
# --annotation flags (and same registry), the source manifest
# bytes are returned byte-for-byte. Digest is preserved.
docker buildx imagetools create \
-t "${DOCKER_USERNAME}/local-deep-research:${VERSION}" \
-t "${DOCKER_USERNAME}/local-deep-research:${MAJOR_MINOR}" \
@@ -124,9 +196,10 @@ jobs:
EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
run: |
set -euo pipefail
# Defends against `imagetools create` re-encoding the manifest.
# If digests diverge, signatures and attestations (keyed by the
# original digest) won't be discoverable from the new tags.
# Defense-in-depth against any future buildx behavior change
# that could re-encode the manifest. If digests diverge,
# signatures (keyed by digest) won't be discoverable from the
# new tags.
for TAG in "${VERSION}" "${MAJOR_MINOR}" "latest"; do
REF="${DOCKER_USERNAME}/local-deep-research:${TAG}"
ACTUAL=$(docker buildx imagetools inspect "$REF" --format '{{json .Manifest.Digest}}' | tr -d '"')
@@ -139,48 +212,6 @@ jobs:
echo "${TAG} -> ${ACTUAL} ✓"
done
- name: Install Trivy
run: |
set -euo pipefail
curl -fsSL https://aquasecurity.github.io/trivy-repo/deb/public.key | gpg --dearmor | sudo tee /usr/share/keyrings/trivy.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb generic main" | sudo tee /etc/apt/sources.list.d/trivy.list
sudo apt-get update
sudo apt-get install -y trivy
- name: Re-scan release digest with Trivy
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
run: |
set -euo pipefail
# Catches CVE-database updates that landed between prerelease build
# and release promote. Same severity gate as the prerelease scan.
trivy image \
--severity CRITICAL,HIGH \
--ignore-unfixed \
--exit-code 1 \
--trivyignores .trivyignore \
"${DOCKER_USERNAME}/local-deep-research@${EXPECTED_DIGEST}"
- name: Verify cosign signature on promoted tag
env:
DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
VERSION: ${{ steps.version.outputs.version }}
run: |
set -euo pipefail
# The cert was issued to the prerelease-docker.yml workflow when
# signing happened there, so the identity regex must match that
# workflow's path. Fulcio's SAN is built from job_workflow_ref,
# which for reusable workflows is the CALLEE.
IMAGE_REF="${DOCKER_USERNAME}/local-deep-research:${VERSION}"
echo "Verifying signature for: $IMAGE_REF"
cosign verify \
--certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
--certificate-identity-regexp "^https://github.com/${{ github.repository }}/\.github/workflows/prerelease-docker\.yml@refs/(heads|tags)/" \
--certificate-github-workflow-repository "${{ github.repository }}" \
"$IMAGE_REF"
echo "Signature transitivity verified ✓"
- name: Clean up prerelease tags
continue-on-error: true
env:

View File

@@ -31,14 +31,24 @@ on:
permissions: {} # Minimal top-level for OSSF Scorecard Token-Permissions
jobs:
# Single approval gate that blocks ALL parallel build jobs. Putting
# `environment: prerelease` only on `build-amd64` would let `build-arm64`
# and `security-scan` run pre-approval (environments are job-scoped per
# GitHub Actions docs), defeating the gate's purpose. The sentinel job
# below is the canonical pattern from community/discussions/174381.
approval-gate:
name: Wait for prerelease approval
runs-on: ubuntu-latest
environment: prerelease
permissions: {}
steps:
- name: Approved
run: echo "Prerelease build approved by maintainer."
build-amd64:
name: Build AMD64 Prerelease Image
needs: [approval-gate]
runs-on: ubuntu-latest
# `prerelease` environment gates the canonical build — maintainer must
# approve before any tags are pushed. Rejecting marks this job
# `failure`, which cascades to create-manifest (skipped) and to release.yml's
# downstream `needs:` chain (create-release/trigger-workflows skip).
environment: prerelease
permissions:
contents: read
@@ -78,6 +88,7 @@ jobs:
build-arm64:
name: Build ARM64 Prerelease Image
needs: [approval-gate]
runs-on: ubuntu-24.04-arm
permissions:
contents: read
@@ -118,6 +129,7 @@ jobs:
security-scan:
name: Security Scan
needs: [approval-gate]
runs-on: ubuntu-latest
permissions:
contents: read
@@ -289,11 +301,22 @@ jobs:
# slsa-github-generator, reusable workflows are explicitly NOT
# entryPoints. github.run_id / github.repository / github.sha all
# resolve to the caller's run context inside a reusable workflow.
#
# builder.id points at the reusable workflow that actually defines
# the build steps — what a verifier's policy can pin. Per SLSA v0.2,
# builder.id should identify the trusted entity that performed the
# build, not the agent binary.
#
# completeness.* are FALSE because we don't capture invocation
# parameters (workflow_call inputs) or environment (runner image,
# buildx version, base-image digest) and the build does network
# I/O for apt/pip/npm. Honest emptiness beats false claims of
# completeness.
cat > provenance.json <<EOF
{
"buildType": "https://github.com/${{ github.repository }}/docker-build@v1",
"builder": {
"id": "https://github.com/actions/runner"
"id": "https://github.com/${{ github.repository }}/.github/workflows/prerelease-docker.yml@${{ github.ref }}"
},
"invocation": {
"configSource": {
@@ -305,12 +328,12 @@ jobs:
}
},
"metadata": {
"buildInvocationId": "${{ github.run_id }}",
"buildInvocationId": "${{ github.run_id }}-${{ github.run_attempt }}",
"buildStartedOn": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
"completeness": {
"parameters": true,
"environment": true,
"materials": true
"parameters": false,
"environment": false,
"materials": false
},
"reproducible": false
},
@@ -325,7 +348,10 @@ jobs:
}
EOF
cosign attest --yes --predicate provenance.json --type slsaprovenance "$IMAGE_REF"
# --replace ensures retries (e.g., "Re-run failed jobs" after a
# Rekor flake) don't accumulate duplicate attestation layers at
# the same digest.
cosign attest --yes --replace --predicate provenance.json --type slsaprovenance "$IMAGE_REF"
- name: Verify image signature
env:
@@ -370,7 +396,8 @@ jobs:
# Attest the SBOM (NOT the deprecated `cosign attach sbom`).
# `cosign attest --type spdxjson` writes a signed in-toto envelope
# to sha256-<digest>.att, also discoverable by digest.
cosign attest --yes --predicate sbom.spdx.json --type spdxjson "$IMAGE_REF"
# --replace ensures retries don't accumulate duplicate attestations.
cosign attest --yes --replace --predicate sbom.spdx.json --type spdxjson "$IMAGE_REF"
- name: Upload SBOM artifact
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1

View File

@@ -873,21 +873,29 @@ jobs:
# CLEANUP ON REJECTION
# ============================================================================
# In the build-once-promote model, prerelease-docker signs the manifest
# BEFORE release approval. If the maintainer rejects the `release` env (or
# create-release/trigger-workflows fails for any other reason), the
# prerelease tag and its cosign artifacts (sha256-<digest>.sig / .att) are
# left orphaned on Docker Hub forever — the existing cleanup loop in
# docker-publish.yml only runs on the success path.
# BEFORE release approval. If the prerelease env was rejected, OR the
# maintainer rejects the `release` env (or create-release/trigger-workflows
# fails for any other reason), the prerelease tag and its cosign artifacts
# (sha256-<digest>.sig / .att) are left orphaned on Docker Hub forever —
# the existing cleanup loop in docker-publish.yml only runs on the success
# path.
#
# This job runs only when prerelease completed (success OR failure — the
# latter catches "signing failed mid-way" leaving partial artifacts) AND
# the release path did NOT succeed. Allowlist `failure` and `cancelled`
# explicitly to avoid firing on `skipped` (e.g., release_exists guard).
# The `if:` fires in three rejection paths:
# 1. prerelease-docker itself failed (env rejection or build/sign failure).
# Dependents become `skipped`, so we must check this explicitly —
# otherwise the most common rejection path doesn't trigger cleanup.
# 2. create-release failed/cancelled (after a successful prerelease).
# 3. trigger-workflows failed/cancelled (after a successful prerelease
# and successful create-release).
#
# The `release_exists == 'false'` guard prevents firing on duplicate-run
# short-circuit paths, where preserving existing release artifacts is
# correct.
# ============================================================================
cleanup-on-rejection:
name: Clean up orphan prerelease tags and signatures
needs: [build, prerelease-docker, create-release, trigger-workflows]
if: ${{ always() && needs.prerelease-docker.result != 'skipped' && (needs.create-release.result == 'failure' || needs.create-release.result == 'cancelled' || needs.trigger-workflows.result == 'failure' || needs.trigger-workflows.result == 'cancelled') && needs.build.outputs.release_exists == 'false' }}
if: ${{ always() && needs.prerelease-docker.result != 'skipped' && needs.build.outputs.release_exists == 'false' && (needs.prerelease-docker.result == 'failure' || needs.create-release.result == 'failure' || needs.create-release.result == 'cancelled' || needs.trigger-workflows.result == 'failure' || needs.trigger-workflows.result == 'cancelled') }}
runs-on: ubuntu-latest
permissions:
contents: read