The existing TBD-A6 + TBD-A20 system catches drift between Chart.yaml, bootstrap-kit pin, and blueprint.yaml spec.version AFTER chart-publish commits land on main, but it cannot detect the "chart bumped but never published" failure mode: the bootstrap-kit pin points at a chart version that GHCR never received because blueprint-release.yaml failed (e.g. TBD-A20 YAML scanner break, race with TBD-A20 lockstep, runner cancellation, transient GHCR push 5xx).

Concrete observed failure (2026-05-18/19): bp-catalyst-platform 1.4.180 and 1.4.181 were "lost" during the TBD-A20 scanner break window (21:04Z → 22:07Z). The pin sync audit reported chart=pin=1.4.181 PASS while ghcr.io/openova-io/bp-catalyst-platform:1.4.181 did NOT exist until A58 manually re-fired the workflow via dispatch. Fresh Sovereigns silently fell back to the last working tag.

What this adds

- scripts/check-bootstrap-kit-pin-sync.sh gains `--check-ghcr` (and optional `--ghcr-org <org>`). For every chart pinned in the kit, it lists ghcr.io/<org>/<chart> tags via `gh api /orgs/<org>/packages/container/<chart>/versions --paginate`, then asserts the pinned version appears. Exits 1 on any missing tag.
- A per-chart tag cache avoids redundant paginations.
- .github/workflows/test-bootstrap-kit.yaml `pin-sync-audit` job now passes `--check-ghcr` on `push` to main + `workflow_dispatch` (PR mode stays `--changed-only` and skips GHCR; PRs cannot publish to GHCR anyway). The job stays `continue-on-error: true` under the same observational umbrella as the existing post-merge full sweep so a transient API blip cannot red-flag every chart bump; the missing-tag list still surfaces on the run summary for operator attention.
- Job grants `packages: read` so the workflow GITHUB_TOKEN can list private package versions.

Verification (origin/main snapshot, 2026-05-19)

- Full sweep default: 50/50 chart→pin pairs OK, no GHCR check.
- Full sweep `--check-ghcr`: 50/50 pairs OK AND 50/50 GHCR tags present; PASS exit 0.
- Negative test: with products/catalyst/chart/Chart.yaml + slot 13 both set to a non-existent 99.99.99, the script exits 1 with `GHCR MISS bp-catalyst-platform:99.99.99 — tag NOT FOUND` and the remediation hint pointing at `gh workflow run blueprint-release.yaml`.
- `--changed-only --base origin/main` against a no-change tree: clean exit 0 with the existing "nothing to check" message.

Refs #1872, #1864, #1856.
Closes #1872

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
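The `--check-ghcr` phase described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of scripts/check-bootstrap-kit-pin-sync.sh: the function names `fetch_tags`, `has_tag`, and `check_pins`, the `GHCR_ORG` variable, the file-based cache, the "chart version" stdin format, and the output wording are all assumptions made for the example. Only the `gh api .../versions --paginate` call and the exit-1-on-missing-tag behaviour come from the description.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper names, not the real pin-sync script.
set -eu

GHCR_ORG="${GHCR_ORG:-openova-io}"   # assumed default org
CACHE_DIR="$(mktemp -d)"             # one file per chart holds its tag list
trap 'rm -rf "$CACHE_DIR"' EXIT

# Print every tag published for a chart, one per line.
# Needs an authenticated gh (GH_TOKEN with packages read access).
fetch_tags() {
  gh api "/orgs/${GHCR_ORG}/packages/container/$1/versions" --paginate \
    --jq '.[].metadata.container.tags[]' </dev/null
}

# Succeed when <version> is an exact tag of <chart>.
# The tag list is fetched at most once per chart and then read from the cache.
has_tag() {
  local chart version cache_file
  chart="$1"
  version="$2"
  cache_file="${CACHE_DIR}/${chart}.tags"
  if [ ! -f "$cache_file" ]; then
    # On API failure, leave nothing cached so a later call can retry.
    fetch_tags "$chart" >"${cache_file}.tmp" || { rm -f "${cache_file}.tmp"; return 2; }
    mv "${cache_file}.tmp" "$cache_file"
  fi
  # -F fixed string, -x whole line: 1.4.18 must not match 1.4.181.
  grep -Fxq -- "$version" "$cache_file"
}

# Read "chart version" pairs from stdin; return 1 if any tag is missing.
# Note: an API failure is reported as a miss in this sketch.
check_pins() {
  local chart version missing=0
  while read -r chart version; do
    [ -n "$chart" ] || continue
    if has_tag "$chart" "$version"; then
      echo "GHCR OK ${chart}:${version}"
    else
      echo "GHCR MISS ${chart}:${version}"
      missing=1
    fi
  done
  return "$missing"
}
```

With these definitions, something like `printf '%s\n' 'bp-catalyst-platform 1.4.181' | check_pins` would check one pin. Keeping the cache in files rather than in a shell variable means it also survives calls made from subshells or pipelines, and the whole-line match avoids treating a version prefix as a hit.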
190 lines
7.7 KiB
YAML
name: Test — Bootstrap Kit (kind cluster + Flux)

# Closes #145 — integration test that the 11-component bootstrap kit's
# Flux Kustomizations are well-formed and accepted by a real K8s API
# server. Spins up a kind cluster, installs Flux, and asserts that all
# 11 Kustomizations get registered. Does NOT wait for full reconciliation
# (chart pulls + cloud creds belong to #141 Hetzner E2E).

on:
  push:
    paths:
      - 'tests/e2e/bootstrap-kit/**'
      - 'platform/**/blueprint.yaml'
      - 'platform/**/chart/**'
      - 'products/**/chart/**'
      - 'clusters/**'
      - 'scripts/check-bootstrap-deps.sh'
      - 'scripts/check-bootstrap-kit-pin-sync.sh'
      - 'scripts/expected-bootstrap-deps.yaml'
      - '.github/workflows/test-bootstrap-kit.yaml'
    branches: [main]
  pull_request:
    paths:
      - 'tests/e2e/bootstrap-kit/**'
      - 'platform/**/blueprint.yaml'
      - 'platform/**/chart/**'
      - 'products/**/chart/**'
      - 'clusters/**'
      - 'scripts/check-bootstrap-deps.sh'
      - 'scripts/check-bootstrap-kit-pin-sync.sh'
      - 'scripts/expected-bootstrap-deps.yaml'
      - '.github/workflows/test-bootstrap-kit.yaml'
  workflow_dispatch:

jobs:
  dependency-graph-audit:
    # Audit the bootstrap-kit dependency graph against the expected DAG declared
    # in scripts/expected-bootstrap-deps.yaml. Mechanically verifies every HR's
    # spec.dependsOn matches the design contract in
    # docs/BOOTSTRAP-KIT-EXPANSION-PLAN.md §2 + §3, and detects cycles. Runs on
    # every PR that touches a bootstrap-kit HR or the audit data files. Owned by
    # W2.K0; consumed by W2.K1-K4 PRs to validate slot 15-48 additions.
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install yq
        run: |
          sudo wget -qO /usr/local/bin/yq \
            https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64
          sudo chmod +x /usr/local/bin/yq
          yq --version

      - name: Run bootstrap-kit dependency audit
        run: bash scripts/check-bootstrap-deps.sh

  pin-sync-audit:
    # TBD-A6 regression test. Asserts every Chart.yaml in platform/* or
    # products/* whose chart is pinned in clusters/_template/bootstrap-
    # kit/ has the SAME version on both sides.
    #
    # On `pull_request` we use --changed-only --base <base-ref> so a PR
    # is only blocked on chart→pin pairs IT modified. This keeps the
    # gate effective (every new chart bump must update the pin) without
    # forcing pre-existing drifts (13 charts as of 2026-05-18) to be
    # fixed before any unrelated PR can land. The auto-bump hook in
    # blueprint-release.yaml will heal those drifts on the next bump
    # of each lagging chart.
    #
    # On `push` to main and `workflow_dispatch` we run the FULL sweep
    # so post-merge drift is observable on the run summary even if the
    # PR gate let it through.
    #
    # TBD-A17 mitigation (#1849, 2026-05-18): the full sweep on `push`
    # to main races with the blueprint-release auto-bump hook. When a
    # PR bumps a Chart.yaml version, the merge commit (which is what
    # this push event sees) does NOT yet contain the matching
    # bootstrap-kit pin bump — the auto-bump hook runs in a DIFFERENT
    # workflow (blueprint-release.yaml) and pushes the pin bump as a
    # follow-up bot commit, which (per GITHUB_TOKEN convention) does
    # NOT retrigger this workflow. So the FIRST run on every chart-
    # bumping merge sees `chart=N pin=N-1` drift and would block.
    # The actual desired-state is that the follow-up bot commit heals
    # the drift within ~60s. Push-mode is therefore observational, not
    # blocking; we use `continue-on-error: true` so the workflow stays
    # green while the drift is still visible on the run summary.
    #
    # TBD-A26 (issue #1872, 2026-05-19): full-sweep mode ALSO runs the
    # `--check-ghcr` phase, which verifies every pinned chart version
    # exists as a tag on ghcr.io/openova-io/<chart>. Catches the
    # "chart bumped but never published" failure mode that TBD-A6 +
    # TBD-A20 cannot see (e.g. blueprint-release.yaml failed with
    # startup_failure, race against TBD-A20 lockstep). Stays under the
    # same continue-on-error umbrella — observational on push/dispatch,
    # so a transient GHCR API blip doesn't red-flag every chart bump.
    # The job summary surfaces the missing-tag list for any operator
    # who notices the warning.
    runs-on: ubuntu-latest
    continue-on-error: ${{ github.event_name == 'push' || github.event_name == 'workflow_dispatch' }}
    permissions:
      # `gh api /orgs/<org>/packages/container/<chart>/versions` needs
      # the read:packages scope for private package metadata. The
      # workflow GITHUB_TOKEN inherits this from the `packages: read`
      # block when explicitly requested.
      contents: read
      packages: read
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          # Need history back to the PR base for the --changed-only diff.
          fetch-depth: 0

      - name: Run pin-sync audit (changed-only on PR, full sweep + --check-ghcr otherwise)
        env:
          # `gh` defers to GH_TOKEN when running on a runner; pass the
          # workflow token explicitly so the package-listing API call
          # picks up the `packages: read` scope granted above.
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          set -euo pipefail
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            base="${{ github.event.pull_request.base.sha }}"
            echo "Running --changed-only against base ${base}"
            bash scripts/check-bootstrap-kit-pin-sync.sh --changed-only --base "${base}"
          else
            echo "Running full sweep + --check-ghcr (event=${{ github.event_name }})"
            bash scripts/check-bootstrap-kit-pin-sync.sh --check-ghcr
          fi

  manifest-validation:
    # Static-only validation: blueprint.yaml + chart Chart.yaml + clusters/_template
    # parsing + dependency order check. Runs on every push.
    runs-on: ubuntu-latest
    needs: dependency-graph-audit
    defaults:
      run:
        working-directory: tests/e2e/bootstrap-kit
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22'
          cache-dependency-path: tests/e2e/bootstrap-kit/go.sum

      - name: Run static validation
        run: go test -v -count=1

  kind-reconciliation:
    # Kind-cluster reconciliation: brings up kubernetes-in-docker, installs
    # Flux, and verifies the API server accepts our 11 bootstrap-kit
    # Kustomizations. Runs only on main to keep PRs fast — the ticket calls
    # for "all 11 phases install in sequence on a kind cluster (CI)" so this
    # is the long-form gate.
    runs-on: ubuntu-latest
    needs: manifest-validation
    if: github.event_name == 'push' || github.event_name == 'workflow_dispatch'
    defaults:
      run:
        working-directory: tests/e2e/bootstrap-kit
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22'
          cache-dependency-path: tests/e2e/bootstrap-kit/go.sum

      - name: Set up kind
        uses: helm/kind-action@v1
        with:
          cluster_name: bootstrap-kit-test
          version: v0.25.0
          node_image: kindest/node:v1.30.6

      - name: Install Flux CLI
        uses: fluxcd/flux2/action@main

      - name: Run kind-reconciliation test
        env:
          BOOTSTRAP_KIT_KIND_TEST: '1'
          BOOTSTRAP_KIT_GIT_URL: https://github.com/${{ github.repository }}
        run: go test -v -count=1 -run TestBootstrapKit_KindReconciliation -timeout 10m