ci(release): fixes from multi-round subagent review

Round 1 surfaced 14 candidate findings; Round 2 verified 7 as real bugs and refuted 4 as false positives. This commit applies the verified fixes. CONFIRMED bugs fixed: 1. **Approval gate was per-job, not workflow-wide.** The previous `environment: prerelease` on `build-amd64` only let `build-arm64` and `security-scan` run pre-approval (GitHub environments are job-scoped per docs + community/discussions/174381). Replaced with a sentinel `approval-gate` job that all three build jobs `needs:`. Single approval click still gates everything, but now actually blocks all parallel jobs. 2. **`cleanup-on-rejection` if-condition missed the prerelease-rejection path.** When prerelease-docker.result was `failure`, both create-release and trigger-workflows became `skipped` (their `if:` requires success), and the cleanup `if:` only fired on `failure`/`cancelled` of dependents. Added explicit `prerelease-docker.result == 'failure'` clause so the most common rejection path actually triggers cleanup. 3. **Trivy re-scan ran AFTER retag.** A failing scan would leave release tags `:1.6.9`, `:1.6`, `:latest` publicly published with no rollback. Reordered: scan source digest BEFORE retag. Content is bit-identical (same digest), so scanning the prerelease tag tests what would be promoted — but failure now leaves no public broken tags. Also moved cosign verify before retag for the same reason. 4. **Trivy only scanned linux/amd64 by default** against a manifest list digest (per Trivy docs + aquasecurity/trivy#7847). Replaced single scan with two explicit per-platform invocations (`--platform linux/amd64`, then `linux/arm64`) so arm64 layers are also gated by the freshness check. 5. **Trivy DB freshness wasn't guaranteed.** apt-installed Trivy may use a stale embedded DB. Added explicit `trivy image --download-db-only` before the scans so the CVE-DB freshness window the re-scan exists for is actually exercised. 6. **`cosign attest` re-runs accumulated attestation layers** (verified via cosign 2.x `mutate.go` `dedupeAndReplace`). Added `--replace` to both attest calls (SLSA provenance + SBOM). Sigstore spec allows multi-sig so `cosign sign` is left as-is. 7. **SLSA provenance values inherited from old code were misleading.** - `builder.id`: changed from `https://github.com/actions/runner` (the agent binary) to the workflow ref the build is actually defined in (per SLSA v0.2 spec — builder.id should be a verifiable trust root). - `completeness.{parameters,environment,materials}`: flipped from `true` to `false`. The predicate captures no workflow_call inputs, no environment, and the build does network I/O — claiming completeness was a public signed false statement. - `buildInvocationId`: now includes `${run_id}-${run_attempt}` so re-runs are distinguishable. REFUTED (kept as-is, with confidence): - `imagetools create` does NOT change the digest in this case. Buildx's Combine() in util/imagetools/create.go has an explicit short-circuit for single-source manifest-list inputs that returns the bytes byte-for-byte (no annotations + same registry required, both true here). - Concurrent rejection digest collision is not a real concern — Docker builds in this pipeline are not bit-deterministic (apt, network, file timestamps, default provenance attestations all vary). - The `prerelease-v1.6.9-*` cleanup pattern does NOT collide with `prerelease-v1.6.91-*` (trailing dash in the prefix disambiguates). - Reusable-workflow approval prompts appear inline on the caller run page for single-level calls — not a UX regression.
2026-06-16 12:02:34 +03:00 · 2026-05-10 14:14:30 +02:00
parent 19170ea625
commit 68606b2990
3 changed files with 134 additions and 68 deletions
--- a/.github/workflows/docker-publish.yml
+++ b/.github/workflows/docker-publish.yml
@@ -98,6 +98,74 @@ jobs:
          fi
          echo "Source digest verified: $ACTUAL"

+      # Pre-promote gating: scan + verify cosign BEFORE retag so a failure
+      # leaves no public release tags pointing at a broken image. The
+      # content of the prerelease manifest is bit-identical to what the
+      # release tags would point at (same digest), so scanning the source
+      # tag tests the same content.
+      - name: Install Trivy
+        run: |
+          set -euo pipefail
+          curl -fsSL https://aquasecurity.github.io/trivy-repo/deb/public.key | gpg --dearmor | sudo tee /usr/share/keyrings/trivy.gpg > /dev/null
+          echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb generic main" | sudo tee /etc/apt/sources.list.d/trivy.list
+          sudo apt-get update
+          sudo apt-get install -y trivy
+
+      - name: Refresh Trivy CVE database
+        # apt-installed trivy may carry a slightly stale cache; force a
+        # fresh DB pull so the freshness window between prerelease build
+        # and promote is actually exercised.
+        run: trivy image --download-db-only
+
+      - name: Re-scan source digest with Trivy (linux/amd64)
+        env:
+          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+          EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
+        run: |
+          set -euo pipefail
+          # Trivy defaults to amd64 only when given a manifest list digest
+          # (per Trivy docs + aquasecurity/trivy#7847). Scan each platform
+          # explicitly so arm64 layers are also gated.
+          trivy image \
+            --platform linux/amd64 \
+            --severity CRITICAL,HIGH \
+            --ignore-unfixed \
+            --exit-code 1 \
+            --trivyignores .trivyignore \
+            "${DOCKER_USERNAME}/local-deep-research@${EXPECTED_DIGEST}"
+
+      - name: Re-scan source digest with Trivy (linux/arm64)
+        env:
+          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+          EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
+        run: |
+          set -euo pipefail
+          trivy image \
+            --platform linux/arm64 \
+            --severity CRITICAL,HIGH \
+            --ignore-unfixed \
+            --exit-code 1 \
+            --trivyignores .trivyignore \
+            "${DOCKER_USERNAME}/local-deep-research@${EXPECTED_DIGEST}"
+
+      - name: Verify cosign signature on source digest
+        env:
+          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
+          EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
+        run: |
+          set -euo pipefail
+          # Verify by digest BEFORE promoting — if signature is bad, no
+          # release tags get created. Cert identity matches the called
+          # workflow's path (Fulcio SAN = job_workflow_ref = callee).
+          IMAGE_REF="${DOCKER_USERNAME}/local-deep-research@${EXPECTED_DIGEST}"
+          echo "Verifying signature for: $IMAGE_REF"
+          cosign verify \
+            --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
+            --certificate-identity-regexp "^https://github.com/${{ github.repository }}/\.github/workflows/prerelease-docker\.yml@refs/(heads|tags)/" \
+            --certificate-github-workflow-repository "${{ github.repository }}" \
+            "$IMAGE_REF"
+          echo "Signature verified on source digest ✓"
+
      - name: Promote (retag) prerelease manifest to release tags
        env:
          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
@@ -108,7 +176,11 @@ jobs:
          set -euo pipefail
          SOURCE="${DOCKER_USERNAME}/local-deep-research:${SOURCE_TAG}"
          # Single imagetools create with multiple -t — registry-side
-          # metadata-only operation, takes seconds, preserves digest.
+          # metadata-only operation, takes seconds, preserves digest. Per
+          # buildx Combine() in util/imagetools/create.go, when a single
+          # source that is already a manifest list is passed with NO
+          # --annotation flags (and same registry), the source manifest
+          # bytes are returned byte-for-byte. Digest is preserved.
          docker buildx imagetools create \
            -t "${DOCKER_USERNAME}/local-deep-research:${VERSION}" \
            -t "${DOCKER_USERNAME}/local-deep-research:${MAJOR_MINOR}" \
@@ -124,9 +196,10 @@ jobs:
          EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
        run: |
          set -euo pipefail
-          # Defends against `imagetools create` re-encoding the manifest.
-          # If digests diverge, signatures and attestations (keyed by the
-          # original digest) won't be discoverable from the new tags.
+          # Defense-in-depth against any future buildx behavior change
+          # that could re-encode the manifest. If digests diverge,
+          # signatures (keyed by digest) won't be discoverable from the
+          # new tags.
          for TAG in "${VERSION}" "${MAJOR_MINOR}" "latest"; do
            REF="${DOCKER_USERNAME}/local-deep-research:${TAG}"
            ACTUAL=$(docker buildx imagetools inspect "$REF" --format '{{json .Manifest.Digest}}' | tr -d '"')
@@ -139,48 +212,6 @@ jobs:
            echo "${TAG} -> ${ACTUAL} ✓"
          done

-      - name: Install Trivy
-        run: |
-          set -euo pipefail
-          curl -fsSL https://aquasecurity.github.io/trivy-repo/deb/public.key | gpg --dearmor | sudo tee /usr/share/keyrings/trivy.gpg > /dev/null
-          echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb generic main" | sudo tee /etc/apt/sources.list.d/trivy.list
-          sudo apt-get update
-          sudo apt-get install -y trivy
-
-      - name: Re-scan release digest with Trivy
-        env:
-          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
-          EXPECTED_DIGEST: ${{ steps.version.outputs.expected_digest }}
-        run: |
-          set -euo pipefail
-          # Catches CVE-database updates that landed between prerelease build
-          # and release promote. Same severity gate as the prerelease scan.
-          trivy image \
-            --severity CRITICAL,HIGH \
-            --ignore-unfixed \
-            --exit-code 1 \
-            --trivyignores .trivyignore \
-            "${DOCKER_USERNAME}/local-deep-research@${EXPECTED_DIGEST}"
-
-      - name: Verify cosign signature on promoted tag
-        env:
-          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
-          VERSION: ${{ steps.version.outputs.version }}
-        run: |
-          set -euo pipefail
-          # The cert was issued to the prerelease-docker.yml workflow when
-          # signing happened there, so the identity regex must match that
-          # workflow's path. Fulcio's SAN is built from job_workflow_ref,
-          # which for reusable workflows is the CALLEE.
-          IMAGE_REF="${DOCKER_USERNAME}/local-deep-research:${VERSION}"
-          echo "Verifying signature for: $IMAGE_REF"
-          cosign verify \
-            --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
-            --certificate-identity-regexp "^https://github.com/${{ github.repository }}/\.github/workflows/prerelease-docker\.yml@refs/(heads|tags)/" \
-            --certificate-github-workflow-repository "${{ github.repository }}" \
-            "$IMAGE_REF"
-          echo "Signature transitivity verified ✓"
-
      - name: Clean up prerelease tags
        continue-on-error: true
        env:
--- a/.github/workflows/prerelease-docker.yml
+++ b/.github/workflows/prerelease-docker.yml
@@ -31,14 +31,24 @@ on:
 permissions: {}  # Minimal top-level for OSSF Scorecard Token-Permissions

 jobs:
+  # Single approval gate that blocks ALL parallel build jobs. Putting
+  # `environment: prerelease` only on `build-amd64` would let `build-arm64`
+  # and `security-scan` run pre-approval (environments are job-scoped per
+  # GitHub Actions docs), defeating the gate's purpose. The sentinel job
+  # below is the canonical pattern from community/discussions/174381.
+  approval-gate:
+    name: Wait for prerelease approval
+    runs-on: ubuntu-latest
+    environment: prerelease
+    permissions: {}
+    steps:
+      - name: Approved
+        run: echo "Prerelease build approved by maintainer."
+
  build-amd64:
    name: Build AMD64 Prerelease Image
+    needs: [approval-gate]
    runs-on: ubuntu-latest
-    # `prerelease` environment gates the canonical build — maintainer must
-    # approve before any tags are pushed.  Rejecting marks this job
-    # `failure`, which cascades to create-manifest (skipped) and to release.yml's
-    # downstream `needs:` chain (create-release/trigger-workflows skip).
-    environment: prerelease
    permissions:
      contents: read

@@ -78,6 +88,7 @@ jobs:

  build-arm64:
    name: Build ARM64 Prerelease Image
+    needs: [approval-gate]
    runs-on: ubuntu-24.04-arm
    permissions:
      contents: read
@@ -118,6 +129,7 @@ jobs:

  security-scan:
    name: Security Scan
+    needs: [approval-gate]
    runs-on: ubuntu-latest
    permissions:
      contents: read
@@ -289,11 +301,22 @@ jobs:
          # slsa-github-generator, reusable workflows are explicitly NOT
          # entryPoints. github.run_id / github.repository / github.sha all
          # resolve to the caller's run context inside a reusable workflow.
+          #
+          # builder.id points at the reusable workflow that actually defines
+          # the build steps — what a verifier's policy can pin. Per SLSA v0.2,
+          # builder.id should identify the trusted entity that performed the
+          # build, not the agent binary.
+          #
+          # completeness.* are FALSE because we don't capture invocation
+          # parameters (workflow_call inputs) or environment (runner image,
+          # buildx version, base-image digest) and the build does network
+          # I/O for apt/pip/npm. Honest emptiness beats false claims of
+          # completeness.
          cat > provenance.json <<EOF
          {
            "buildType": "https://github.com/${{ github.repository }}/docker-build@v1",
            "builder": {
-              "id": "https://github.com/actions/runner"
+              "id": "https://github.com/${{ github.repository }}/.github/workflows/prerelease-docker.yml@${{ github.ref }}"
            },
            "invocation": {
              "configSource": {
@@ -305,12 +328,12 @@ jobs:
              }
            },
            "metadata": {
-              "buildInvocationId": "${{ github.run_id }}",
+              "buildInvocationId": "${{ github.run_id }}-${{ github.run_attempt }}",
              "buildStartedOn": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
              "completeness": {
-                "parameters": true,
-                "environment": true,
-                "materials": true
+                "parameters": false,
+                "environment": false,
+                "materials": false
              },
              "reproducible": false
            },
@@ -325,7 +348,10 @@ jobs:
          }
          EOF

-          cosign attest --yes --predicate provenance.json --type slsaprovenance "$IMAGE_REF"
+          # --replace ensures retries (e.g., "Re-run failed jobs" after a
+          # Rekor flake) don't accumulate duplicate attestation layers at
+          # the same digest.
+          cosign attest --yes --replace --predicate provenance.json --type slsaprovenance "$IMAGE_REF"

      - name: Verify image signature
        env:
@@ -370,7 +396,8 @@ jobs:
          # Attest the SBOM (NOT the deprecated `cosign attach sbom`).
          # `cosign attest --type spdxjson` writes a signed in-toto envelope
          # to sha256-<digest>.att, also discoverable by digest.
-          cosign attest --yes --predicate sbom.spdx.json --type spdxjson "$IMAGE_REF"
+          # --replace ensures retries don't accumulate duplicate attestations.
+          cosign attest --yes --replace --predicate sbom.spdx.json --type spdxjson "$IMAGE_REF"

      - name: Upload SBOM artifact
        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -873,21 +873,29 @@ jobs:
  # CLEANUP ON REJECTION
  # ============================================================================
  # In the build-once-promote model, prerelease-docker signs the manifest
-  # BEFORE release approval. If the maintainer rejects the `release` env (or
-  # create-release/trigger-workflows fails for any other reason), the
-  # prerelease tag and its cosign artifacts (sha256-<digest>.sig / .att) are
-  # left orphaned on Docker Hub forever — the existing cleanup loop in
-  # docker-publish.yml only runs on the success path.
+  # BEFORE release approval. If the prerelease env was rejected, OR the
+  # maintainer rejects the `release` env (or create-release/trigger-workflows
+  # fails for any other reason), the prerelease tag and its cosign artifacts
+  # (sha256-<digest>.sig / .att) are left orphaned on Docker Hub forever —
+  # the existing cleanup loop in docker-publish.yml only runs on the success
+  # path.
  #
-  # This job runs only when prerelease completed (success OR failure — the
-  # latter catches "signing failed mid-way" leaving partial artifacts) AND
-  # the release path did NOT succeed. Allowlist `failure` and `cancelled`
-  # explicitly to avoid firing on `skipped` (e.g., release_exists guard).
+  # The `if:` fires in three rejection paths:
+  #   1. prerelease-docker itself failed (env rejection or build/sign failure).
+  #      Dependents become `skipped`, so we must check this explicitly —
+  #      otherwise the most common rejection path doesn't trigger cleanup.
+  #   2. create-release failed/cancelled (after a successful prerelease).
+  #   3. trigger-workflows failed/cancelled (after a successful prerelease
+  #      and successful create-release).
+  #
+  # The `release_exists == 'false'` guard prevents firing on duplicate-run
+  # short-circuit paths, where preserving existing release artifacts is
+  # correct.
  # ============================================================================
  cleanup-on-rejection:
    name: Clean up orphan prerelease tags and signatures
    needs: [build, prerelease-docker, create-release, trigger-workflows]
-    if: ${{ always() && needs.prerelease-docker.result != 'skipped' && (needs.create-release.result == 'failure' || needs.create-release.result == 'cancelled' || needs.trigger-workflows.result == 'failure' || needs.trigger-workflows.result == 'cancelled') && needs.build.outputs.release_exists == 'false' }}
+    if: ${{ always() && needs.prerelease-docker.result != 'skipped' && needs.build.outputs.release_exists == 'false' && (needs.prerelease-docker.result == 'failure' || needs.create-release.result == 'failure' || needs.create-release.result == 'cancelled' || needs.trigger-workflows.result == 'failure' || needs.trigger-workflows.result == 'cancelled') }}
    runs-on: ubuntu-latest
    permissions:
      contents: read