Two disjoint regressions stack-failed test-bootstrap-kit.yaml on every push to main: 1. manifest-validation — TestBootstrapKit_BlueprintCardsHaveRequiredFields asserts platform/<bp>/blueprint.yaml spec.version == chart/Chart.yaml version. Six blueprints had drifted: cilium (1.3.0->1.3.5), cert-manager (1.2.0->1.2.2), flux (1.2.0->1.2.2), openbao (1.2.14->1.2.16), keycloak (1.5.0->1.4.5 — blueprint led chart, sync to authoritative Chart.yaml), gitea (1.2.5->1.2.7). Chart.yaml is canonical (drives bootstrap-kit pin -> Sovereign install); blueprint.yaml gets resynced down/up to match. 2. pin-sync-audit on push — full-sweep audit races the blueprint-release auto-bump hook. Chart-bump merge commit has chart=N pin=N-1 drift until the auto-bump bot commits the pin update ~60s later; the bot push (GITHUB_TOKEN convention) does not retrigger this workflow, so the failure remains in run history. Fix: set continue-on-error: true on push/workflow_dispatch events (PR remains blocking via --changed-only). The full-sweep output still surfaces drift on the run summary; it just doesn't fail the overall run while the heal-in- ~60s window is open. Documented inline in the job header. Net effect: every push to main re-runs cleanly green. The 13 pre-existing drifts called out in the existing job comment will continue to heal as each lagging chart gets its next bump (auto-bump hook + this PR's manifest-validation alignment). Refs PRs #1666 #1687 #1695 #1698 #1706 #1707 (the manual collector PRs TBD-A6 eliminated for bootstrap-kit pins; this PR extends the convergence to blueprint.yaml versions which the test asserts but the auto-bump hook does not yet update). Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com> |
||
|---|---|---|
| .. | ||
| chart | ||
| blueprint.yaml | ||
| README.md | ||
cert-manager
TLS certificate automation. Per-host-cluster infrastructure (see docs/PLATFORM-TECH-STACK.md §3.3) — runs on every host cluster a Sovereign owns.
Status: Accepted | Updated: 2026-04-27
Overview
cert-manager provides automated TLS certificate management using Let's Encrypt with automatic renewal and Kubernetes-native integration.
Architecture
flowchart TB
subgraph CM["cert-manager"]
Controller[Controller]
Webhook[Webhook]
CAInjector[CA Injector]
end
subgraph Issuers["Issuers"]
LE[Let's Encrypt]
CA[Internal CA]
end
subgraph Resources["K8s Resources"]
Cert[Certificate]
Secret[TLS Secret]
Ingress[Gateway/Ingress]
end
Controller --> LE
Controller --> CA
Cert --> Controller
Controller --> Secret
Secret --> Ingress
Challenge Types
| Challenge | Use Case | DNS Provider |
|---|---|---|
| HTTP-01 | Public endpoints | Not required |
| DNS-01 | Wildcards, internal | Cloudflare, Route53, etc. |
Recommended: DNS-01 for wildcard certificates
Configuration
ClusterIssuer (Let's Encrypt)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: admin@<domain>
privateKeySecretRef:
name: letsencrypt-prod-key
solvers:
- dns01:
cloudflare:
apiTokenSecretRef:
name: cloudflare-api-token
key: api-token
Certificate
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: wildcard-cert
namespace: cilium-gateway
spec:
secretName: wildcard-tls
issuerRef:
name: letsencrypt-prod
kind: ClusterIssuer
dnsNames:
- "*.<domain>"
- "<domain>"
Gateway API Integration
cert-manager integrates with Cilium Gateway API:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: main-gateway
namespace: cilium-gateway
spec:
gatewayClassName: cilium
listeners:
- name: https
protocol: HTTPS
port: 443
tls:
mode: Terminate
certificateRefs:
- name: wildcard-tls
Renewal
| Setting | Value |
|---|---|
| Renewal window | 30 days before expiry |
| Check interval | 24 hours |
| Retry interval | 1 hour on failure |
cert-manager automatically renews certificates before expiration.
Monitoring
| Metric | Description |
|---|---|
certmanager_certificate_expiration_timestamp_seconds |
Certificate expiry time |
certmanager_certificate_ready_status |
Certificate readiness |
certmanager_http_acme_client_request_count |
ACME requests |
Part of OpenOva