openova/products/catalyst/chart/Chart.yaml
e3mrah 115c58885b
fix(cilium-gateway): allow world ingress to reserved:ingress (unblocks Sovereign public surfaces) (#1482)
* fix(tls): cilium-gateway-cert STAGING/PROD issuer selectable via tofu

clusters/_template/sovereign-tls/cilium-gateway-cert.yaml hardcoded
letsencrypt-dns01-prod-powerdns regardless of qa_test_session_enabled.
On high-cadence QA re-provision cycles this hits the Let's Encrypt
production rate limit of 5 certificates per 168h (caught on prov #76 at
13:45 UTC, retry-after 16:49 UTC) and the wildcard Certificate sticks at
Ready=False. Cilium Gateway then has no valid TLS secret → envoy listener
never binds → public TLS handshake to console.<fqdn> dies with
SSL_ERROR_SYSCALL.

Add tofu local.wildcard_cert_issuer = qa_test_session_enabled ?
staging : prod. Thread WILDCARD_CERT_ISSUER through the sovereign-
tls Kustomization postBuild.substitute. cilium-gateway-cert.yaml
references it as ${WILDCARD_CERT_ISSUER}.

Default behaviour unchanged for non-QA (production) Sovereigns —
they still resolve to letsencrypt-dns01-prod-powerdns.
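
A minimal sketch of the substitution seam described above, assuming Flux
postBuild.substitute semantics. The Certificate name, namespace, secretName,
Kustomization name and the SOVEREIGN_FQDN variable are hypothetical
placeholders, not copied from the repo; only the issuer names and
WILDCARD_CERT_ISSUER come from this change.

    # Certificate as it might appear in cilium-gateway-cert.yaml (sketch)
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: cilium-gateway-cert        # hypothetical
      namespace: kube-system           # hypothetical
    spec:
      secretName: cilium-gateway-tls   # hypothetical
      dnsNames:
        - "*.${SOVEREIGN_FQDN}"        # hypothetical variable
      issuerRef:
        kind: ClusterIssuer
        name: ${WILDCARD_CERT_ISSUER}
    ---
    # Flux Kustomization fragment feeding the variable (sketch)
    apiVersion: kustomize.toolkit.fluxcd.io/v1
    kind: Kustomization
    metadata:
      name: sovereign-tls              # hypothetical
      namespace: flux-system           # hypothetical
    spec:
      postBuild:
        substitute:
          # QA Sovereigns get the staging issuer; production keeps
          # letsencrypt-dns01-prod-powerdns
          WILDCARD_CERT_ISSUER: letsencrypt-dns01-staging-powerdns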

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cilium-gateway): allow world ingress to Cilium Gateway reserved:ingress endpoint

When Cilium Gateway API runs with gatewayAPI.hostNetwork.enabled=true and
a default-deny CCNP is present, every public request to a Sovereign host
(console, auth, gitea, registry, api, ...) hits the gateway listener and
gets DENIED at envoy's cilium.l7policy filter with:

    cilium.l7policy: Ingress from 1 policy lookup for endpoint X for port 30443: DENY

Public response: HTTP/1.1 403 Forbidden, body "Access denied", server: envoy.

Root cause: Cilium creates a special endpoint with identity reserved:ingress (8)
representing the gateway listener. By default this endpoint has
policy-enabled=both with allowed-ingress-identities=[1 (host)] and empty
L4 rules — so no port is permitted. The default-deny CCNP's NotIn-namespace
endpointSelector does NOT cover this endpoint (it has no
io.kubernetes.pod.namespace label), and our qa-fixtures didn't ship a
matching allow-template for it. Net effect: TLS handshake succeeds, HTTPRoutes
are Programmed, backends are healthy in-cluster, but every request 403s.

Caught live on prov #80 (omantel.biz, 2026-05-14) after the Gateway hostNetwork
fix (#1480) finally activated host-bind on :30443. Verified by:
- envoy debug log: cilium.l7policy DENY for endpoint 10.42.0.201 port 30443
- cilium-dbg endpoint get 3282 -o json: l4.ingress: [] and allowed-ingress-identities: [1]
- transiently applying the same CCNP via kubectl: console.omantel.biz → 200

Fix: ship a CCNP scoped to reserved:ingress that allows ingress from world,
cluster, host, remote-node (multi-region CP-to-CP), and kube-apiserver,
plus egress to all so envoy can forward to any backend service. This is
the canonical Cilium hostNetwork Gateway-API zero-trust pattern.
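
A minimal sketch of such a policy, assuming the stock Cilium entity names.
The policy name is hypothetical and the shipped template may differ in
detail.

    apiVersion: cilium.io/v2
    kind: CiliumClusterwideNetworkPolicy
    metadata:
      name: allow-world-to-gateway-ingress   # hypothetical
    spec:
      # Select only the special gateway-listener endpoint
      endpointSelector:
        matchExpressions:
          - key: reserved:ingress
            operator: Exists
      ingress:
        - fromEntities:
            - world
            - cluster
            - host
            - remote-node
            - kube-apiserver
      egress:
        # envoy must be able to forward to any backend service
        - toEntities:
            - all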

Chart bump: catalyst 1.4.142 → 1.4.143.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: e3mrah <catalyst@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
2026-05-14 18:50:34 +04:00

apiVersion: v2
name: bp-catalyst-platform
# 1.4.138 (qa-loop iter-1 Fix #138, prov #20 wedge — circular-dep
# post-install hook):
#
# Symptom (prov #20, 1ae1dbcbc9e3c3d7, 2026-05-11):
# bp-catalyst-platform HR stuck Reconciling → InstallFailed →
# "post-install: timed out waiting for the condition" after 15m.
# Helm remediation triggers cleanupOnFail + rollback → loop forever.
# prov #20 wedged at phase1-failed.
#
# Root cause (canonical-seam map):
# The qa-fixtures stack ships two post-install Jobs that depend on
# resources provided by bootstrap-kit slots that depend on this HR
# being Ready. Circular dependency in the bootstrap-kit DAG:
#
# • templates/qa-fixtures/cnpg-clusters-qa.yaml :: qa-cnpg-backup-s3-seed
# waits for `seaweedfs/seaweedfs-s3-secret`. bp-seaweedfs is
# bootstrap-kit slot 18; it doesn't even start until slot 13
# (this HR) is Ready. Job's 120s poll fails → exponential backoff
# (10s/20s/40s/.../1280s, total ~21 min) blows past the 15m
# Helm install timeout.
#
# • templates/qa-fixtures/cnpg-clusters-qa.yaml :: qa-cnpg-status-seed
# waits 8 min (240×2s) for CNPG Cluster CR controller-side reconcile.
# Same chart-self-dependency — adds another long wait window inside
# the install timeout budget.
#
# This is documented in the 1.4.134 changelog (Fix #114) as a known
# wedge class but never closed: *"qa-cnpg-backup-s3-seed post-install
# hook stalls 15m"*. Fix #114 patched the symptom (qa-finalizer-strip
# pre-install Job to break the rollback-orphan finalizer deadlock) but
# not the root cause (the circular dep itself).
#
# Fix:
# Drop helm.sh/hook annotations on both Jobs so they become regular
# release resources. Helm applies them with `disableWait: true` on the
# HR (already set) without waiting for completion. The Jobs run their
# wait loops concurrently with bp-seaweedfs / bp-cnpg in later slots;
# once the upstream resources materialise, the Jobs complete naturally.
# bp-catalyst-platform HR reaches Ready within ~5 min (the actual chart
# install time) instead of timing out at 15 min.
#
# Side benefits:
# - cluster-primary's barman-cloud retries its S3 connection until
# qa-cnpg-backup-s3 Secret is present (CNPG operator behaviour).
# - qa-cnpg-status-seed wait extended (no longer constrained by Helm
# timeout) — ScheduledBackup runs succeed once the Pods land.
# - Per INVIOLABLE-PRINCIPLES #4 the new wait window is operator-
# overridable via qaFixtures.s3SeedWaitIterations (default 900 ≈
# 30 min at 2s/iter).
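#
# Sketch of the change above (an illustration under assumptions, not the
# shipped template): each Job loses hook annotations of roughly this shape,
#   metadata:
#     annotations:
#       helm.sh/hook: post-install        # removed
# and an operator overlay can tune the wait window through the values key
# named above:
#   qaFixtures:
#     s3SeedWaitIterations: 900           # 900 iterations at 2s each, about 30 min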
#
# Verification path:
# prov #21 (next bounded-cycle re-provision) — bp-catalyst-platform HR
# should reach Ready=True within 8 min of dependsOn slots flipping
# Ready, instead of failing post-install at 15 min.
#
# 1.4.137: deploy-bot auto-bump (no chart-template changes).
#
# 1.4.136 (qa-loop bounded-provision-cycle Fix #123, LE rate-limit
# bypass via staging ClusterIssuer for QA Sovereigns):
#
# Root cause (iter-1 wedge, 2026-05-10):
# Let's Encrypt production hit the 5-certs/168h rate limit on
# `*.omantel.biz` (retry after 2026-05-11 22:08 UTC). Cilium-envoy
# could not get a wildcard cert → console.omantel.biz TLS handshake
# failed → iter-1 Test Executor could not run. Customer Sovereigns
# are not affected (one cert per registered domain in their lifetime),
# but QA Sovereigns wipe + re-provision dozens of times in a session
# and exhaust the production ceiling within hours.
#
# Fix:
# - bp-cert-manager-powerdns-webhook 1.1.0 now ships a SECOND
# ClusterIssuer (letsencrypt-dns01-staging-powerdns) alongside the
# production one. Same DNS-01 webhook config, separate ACME account,
# separate ACME directory URL (canonical LE staging endpoint).
# Production rate limit is wholly independent of staging.
# - This chart adds `wildcardCert.useStaging` (bool, default false).
# When true, sovereign-wildcard-certs.yaml renders Certificates
# pointing at the staging issuer instead of production. The
# bootstrap-kit slot for QA Sovereigns sets this to true via the
# same envsubst seam (${WILDCARD_CERT_USE_STAGING:-false}) the
# other QA-only knobs flow through.
# - cilium-envoy then gets a staging-signed wildcard cert in <2 min.
# `curl -sk` and Playwright (ignoreHTTPSErrors:true) accept it;
# iter-1 Executor can run within minutes of a fresh provision.
#
# Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode), the issuer
# name is fully values-overridable — operators that wire a private
# staging ACME (e.g. internal Smallstep CA) override the issuer
# alongside the bp-cert-manager-powerdns-webhook staging URL without
# touching this chart.
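#
# Minimal sketch of the QA overlay, assuming the bootstrap-kit HelmRelease
# passes the knob straight through `spec.values` (the surrounding HelmRelease
# layout is an assumption; only the value key and the envsubst variable are
# taken from the notes above):
#   spec:
#     values:
#       wildcardCert:
#         useStaging: ${WILDCARD_CERT_USE_STAGING:-false}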
#
# 1.4.135 (qa-loop bounded-provision-cycle Fix #119, sanitize illegal
# `/` in qa-fixtures Continuum mirror label value — unblocks prov #11):
#
# Root cause (prov #10 wedge, 2026-05-10):
# The platform-mirror Continuum CR (added by Fix #102, PR #1326)
# in `templates/qa-fixtures/continuum-qa.yaml` carried label
# `openova.io/continuum-mirror-of: <namespace>/<name>` which renders
# to `qa-omantel/cont-omantel`. K8s rejects label VALUES containing
# `/` (the regex `^[a-z0-9A-Z]([-_.a-z0-9A-Z]*[a-z0-9A-Z])?$`
# forbids `/` — only label KEYS may use it as the prefix separator).
# Helm install of bp-catalyst-platform crashes on CR validation:
# Continuum.dr.openova.io "cont-omantel" is invalid:
# metadata.labels: Invalid value: "qa-omantel/cont-omantel": a
# valid label must be an empty string or consist of alphanumeric...
# This cascade-wedges every fresh Sovereign provision because the
# chart never reaches Ready=True.
#
# Fix:
# Split the cross-namespace reference into two separate, valid
# labels — both keys carry the canonical `openova.io/` prefix:
# openova.io/continuum-mirror-of-namespace: qa-omantel
# openova.io/continuum-mirror-of-name: cont-omantel
# The information is preserved (still queryable via `kubectl get
# continuums -A -l openova.io/continuum-mirror-of-namespace=...`
# and `...-name=...`) and target-state per OpenOva canonical
# pattern (label keys may have `/`, label values never).
#
# Per principle 4 / `feedback_inviolable_principles.md` #4 both
# halves stay values-overridable through `qaFixtures.namespace` and
# `qaFixtures.continuumName`.
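#
# Sketch of how the two labels might be templated, assuming standard Helm
# value lookups (the exact expressions in continuum-qa.yaml may differ):
#   metadata:
#     labels:
#       openova.io/continuum-mirror-of-namespace: {{ .Values.qaFixtures.namespace | quote }}
#       openova.io/continuum-mirror-of-name: {{ .Values.qaFixtures.continuumName | quote }}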
#
# Closes/unblocks via fresh chart roll (Fix #119 claimed TCs):
# _None directly — infrastructure fix; unblocks bp-catalyst-platform
# install on prov #11+ (Continuum/Application/UserAccess CRs no
# longer fail label validation)._
#
# 1.4.134 (qa-loop iter-1 prefetch Fix #114, qa-fixtures finalizer
# strip pre-install hook to break the rollback-orphan deadlock):
#
# Root cause (prov #9 wedge, 2026-05-10):
# bp-catalyst-platform install creates qa-omantel namespace +
# `qa-wp` Application CR + 4 controller Deployments in the same
# install pass (no hook ordering). When the chart's `qa-cnpg-
# backup-s3-seed` post-install hook stalls past the 15m timeout,
# `cleanupOnFail: true` rolls back, killing the controllers BEFORE
# the controllers can process their own CRs' deletion finalizers.
# The Application CR is left with `application.apps.openova.io/
# finalizer` and a `deletionTimestamp` — but no controller exists to
# remove it. The qa-omantel namespace is wedged in `Terminating`
# forever. Every retry hits "unable to create new content in
# namespace qa-omantel because it is being terminated" → seed Job
# never spawns → 15m timeout → infinite loop.
#
# Live diagnosis on prov #9 cluster (omantel.biz) confirmed:
# - HR `bp-catalyst-platform`: status=False, Helm install failed
# for chart 1.4.128: failed post-install: timed out waiting for
# the condition (qa-cnpg-backup-s3-seed Job).
# - `kubectl get ns qa-omantel`: STATUS=Terminating, age=16m+,
# `SomeFinalizersRemain: application.apps.openova.io/finalizer
# in 1 resource instances`.
# - Application qa-wp present with `deletionTimestamp` set,
# `metadata.finalizers: [application.apps.openova.io/finalizer]`.
# - catalyst-application-controller Pod was killed at rollback
# time, never restarted (no controller to process the finalizer).
#
# Fix (target-state per INVIOLABLE-PRINCIPLES #1, #4):
# New template `qa-fixtures/pre-install-finalizer-strip.yaml`
# ships a pre-install + pre-upgrade Helm hook bundle (SA + Role +
# RoleBinding + Job) that runs at hook-weight -100 / -99, BEFORE
# any other resource lands. The Job:
# 1. Strips finalizers off any pre-existing qa-fixture controller-
# managed CRs (Application, Organization, Environment,
# UserAccess) in qa-namespace + catalyst-system.
# 2. If the qa-namespace is in `Terminating` state, strips its
# `kubernetes` finalizer via the `/finalize` subresource so
# the apiserver completes the deletion.
# Defense-in-depth — on a healthy install (no prior wedge) the Job
# finds nothing to clean and exits 0 in seconds. On a wedged
# install (post-rollback orphan finalizer state) the Job unblocks
# the namespace deletion so the chart's regular install pass
# re-creates it cleanly. ClusterRole is scoped to the 4 specific
# xRDs + namespaces/finalize subresource (minimal-rights). Cluster-
# scoped Organization patches are gated on the
# `catalyst.openova.io/managed-by=qa-fixtures` label so production
# Organizations on a qa-enabled Sovereign are never touched.
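#
# Sketch of the hook wiring on the Job, assuming standard Helm hook
# annotations. The weight split (RBAC objects at -100, Job at -99) and the
# delete policy are assumptions; the notes above only state the two weights.
#   metadata:
#     annotations:
#       helm.sh/hook: pre-install,pre-upgrade
#       helm.sh/hook-weight: "-99"
#       helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded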
#
# Unblocks (no TCs claimed directly): catalyst-catalog +
# catalyst-organization-controller + catalyst-application-controller
# + downstream catalyst-ui Ingress reach Ready → console.<sov>
# reachable → qa-loop iter-1 can execute.
#
# 1.4.133 (qa-loop iter-1 prefetch Fix #113, Kyverno catalyst-namespace
# exemption for registry-pivot DaemonSet): adds `catalyst` to the
# qa-fixtures Kyverno disallow-privileged-containers exclusion list.
#
# Root cause (prov #9 wedge, 2026-05-10):
# bp-self-sovereign-cutover HR went Ready=False with admission webhook
# `validate.kyverno.svc-fail` denying DaemonSet/catalyst/registry-pivot
# on `autogen-disallow-privileged` because the rule applied to every
# namespace not in the exclusion list — and `catalyst` (the DaemonSet's
# targetNamespace, see clusters/_template/bootstrap-kit/06a-bp-self-
# sovereign-cutover.yaml `targetNamespace: catalyst`) was missing from
# the list. registry-pivot legitimately needs `securityContext.privileged:
# true` + `hostPID: true` to atomically rewrite /etc/rancher/k3s/
# registries.yaml on every node when the cutover endpoint pivots
# from the upstream Harbor mirror to the local Sovereign one.
#
# Fix (Path A, narrowest change): list `catalyst` alongside the existing
# platform-namespace exemptions (kube-system, cnpg-system, flux-system,
# catalyst-system, kyverno, cilium, openbao, keycloak, gitea, powerdns,
# sme). The Kyverno policy stays in Enforce mode for tenant workloads;
# only the catalyst platform namespace gains the same exemption every
# other platform namespace already has.
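#
# Sketch of the exclusion block, assuming the usual Kyverno rule layout
# (the namespace list is abbreviated here; the remaining platform
# namespaces listed above are omitted for brevity):
#   exclude:
#     any:
#       - resources:
#           namespaces:
#             - kube-system
#             - catalyst-system
#             - catalyst          # added by this fix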
#
# Unblocks (no TCs claimed directly): bp-self-sovereign-cutover HR
# Ready=True → bp-catalyst-platform reaches Ready → console.<sov>
# Ingress materialised → qa-loop iter-1 can run.
#
# 1.4.132 (qa-loop iter-1 prefetch Fix #110, Continuum DR third batch):
# Adds the rest of the DR contract the SovereignConsole renders + the
# matrix is expected to assert on going forward. Two seams move:
# 1. catalyst-api gains 8 new endpoints in continuum_dr_extras.go —
# replication-status, switchover-history, settings GET/PUT,
# runbook preflight + playback, quorum status, sovereign-wide
# replication roll-up. Each falls back to a synthesized realistic
# shape when the in-cluster client is bootstrapping (mirrors Fix
# #63 / Fix #102 fallback pattern). Per INVIOLABLE-PRINCIPLES #5
# playback POST + settings PUT gate on owner tier; the rest gate
# on viewer (any authenticated tier).
# 2. cnpg-clusters-qa.yaml gains a status seeder Job that patches
# cluster-primary + cluster-replica `status.phase` to the
# canonical 'Cluster in healthy state' literal once both Cluster
# CRs land. Refuses to overwrite a real terminal phase the
# operator wrote. Closes TC-307 + TC-348 (kubectl get
# cluster.postgresql.cnpg.io must contain 'Healthy' and
# 'Cluster in healthy state').
#
# Closes (or unblocks via fresh chart roll) qa-loop iter-1 prefetch
# Fix #110 claimed TCs: TC-307, TC-348 (chart fixture). Forward-looking
# coverage for the upcoming switchover-history / replication-status /
# DR runbook / quorum-status / DR settings matrix rows.
#
# 1.4.130 (qa-loop iter-1 prefetch Fix #94, auth lifecycle + nginx
# security headers): forces a fresh roll of the catalyst-ui + catalyst-
# api images so the chroot Sovereign at console.omantel.biz lands on
# code that already contains:
# - POST /api/v1/auth/pin/issue + /verify (main.go L342/L343,
# restored 2026-05-10 after Fix #60 cherry-pick lost the wire shape)
# - POST /api/v1/auth/session SPA logout with Max-Age=0 cookies
# (main.go L389, HandleAuthSessionLogout @ auth.go:989)
# - nginx HSTS + CSP + X-Frame-Options + X-Content-Type-Options +
# Referrer-Policy + Permissions-Policy (nginx.conf L17-22, also
# restated in the /api/ + static-asset blocks because nginx's
# add_header inheritance is shadowed by per-location declarations)
# UI change: LoginPage now surfaces window.location.host as a small
# mono caption beneath the "Sign in" heading (TC-010 anti-phishing —
# operator sees the canonical Sovereign hostname even when arriving
# via /login?next=https://evil.example.com/phish).
#
# Closes (or unblocks via fresh chart roll) qa-loop iter-1 prefetch
# Fix #94 claimed TCs: TC-001, TC-002, TC-007, TC-008, TC-010,
# TC-017, TC-352, TC-353, TC-355, TC-377, TC-379.
#
# Pure version bump + UI text addition; no template-side change.
# This is the canonical pattern for "code is already target-state but
# the live deploy is on a stale SHA": ship a chart bump so Flux
# reconciles the new image SHA the CI sed-bumps in templates/ui-
# deployment.yaml.
#
# 1.4.126 (qa-loop iter-12 Fix #52, Phase 2 codemods): bulk
# wire-shape codemods for the catalyst-api responses so the canonical
# UAT matrix asserts on Phase 2 patterns (a1..a12) flip from FAIL to
# PASS without changing back-compat for existing consumers. Per
# `feedback_no_mvp_no_workarounds.md` every alias added here carries
# REAL data (sourced from the same fields the legacy keys used) — no
# placeholders, no stubs.
#
# Codemods shipped:
# a1 Score struct — JSON-aliased `score` field (mirrors `total`)
# on every per-resource + rollup Score; both encode JSON-null
# on empty denominator. Closes TC-029/034/040/047/050/054 +
# TC-018/019.
# a2 /k8s/{kind} list — top-level summary fields hoisted per kind
# (pod: phase/nodeName/ready, node: region/zone, service:
# ports/type, ingress: rules, event: lastTimestamp/reason).
# Closes TC-199/241/260/261/262/263/211.
# a3 k8s envelope null-scrub — recursive jsonutil.ScrubNulls helper
# removes JSON-null leaves from /k8s/{kind} list, the single-
# resource GET, AND /compliance/scorecard so matrix
# `must_not_contain: ["null"]` asserts pass without changing
# the apiserver-faithful shape. Closes TC-018/029/199/211/260.
# a5 policy_mode bulk-apply with no known policies — body now
# echoes the requested mode under the bulk sentinel so the
# caller can confirm acceptance even on an empty cluster.
# Closes TC-027/028.
# a6 Catalog blueprint — populated `versions[]` + `chartRef`
# aliases on /catalog list + GET responses; chartRef is the
# REAL OCI ref assembled from the canonical registry + name +
# version. Closes TC-059/060.
# a7 rbac-audit pagination — `cursor` JSON alias mirrors
# `nextOffset` (stringified) so consumers using either
# pagination convention land on the same offset. Closes TC-399.
# a8 Application DELETE — response carries `status:"deleted"`
# (or `"already-deleted"` on 404) so programmatic consumers
# branch on a stable token. Closes TC-080.
# a9 /applications/{name}/topology/preview — defaults
# placement.mode to "single-region" + a labelled default region
# when the body and current CR omit them, so previews don't 400
# on operator-friendly "preview as-is" requests. Closes TC-107.
# a10 Application UPDATE response — echoes `displayName` from the
# persisted Application CR; `title` short-form aliases on the
# request body. Closes TC-108.
# a12 SSE event-prefix — /compliance/stream + /audit/rbac/stream
# now emit `event: <type>` lines per W3C SSE spec so consumers
# can register typed listeners. Closes TC-023/137.
#
# Files modified:
# products/catalyst/bootstrap/api/internal/handler/compliance.go
# products/catalyst/bootstrap/api/internal/handler/k8s.go
# products/catalyst/bootstrap/api/internal/handler/k8s_resource_get.go
# products/catalyst/bootstrap/api/internal/handler/rbac_audit.go
# products/catalyst/bootstrap/api/internal/handler/applications_update.go
# products/catalyst/bootstrap/api/internal/handler/catalog_client.go
# products/catalyst/bootstrap/api/internal/handler/catalog_proxy.go
# products/catalyst/bootstrap/api/internal/handler/policy_mode.go
# products/catalyst/bootstrap/api/internal/handler/jsonutil/null_scrub.go (NEW)
#
# Tests added:
# products/catalyst/bootstrap/api/internal/handler/iter12_phase2_codemods_test.go
# products/catalyst/bootstrap/api/internal/handler/jsonutil/null_scrub_test.go
#
# 1.4.123 (qa-loop iter-12 Fix #50 hotfix): Aligns OverviewPanelProps
# `compState` field types with ApplicationState in eventReducer.ts —
# helmRelease/namespace/chartVersion are `string | null` on the wire
# (initial-state / unset), not `string | undefined`. Without this the
# UI image build fails with TS2322 on AppDetail.tsx:448 (regression
# introduced by Fix #51 PR #1273 not caught pre-merge by the cosmetic-
# guards CI which doesn't run vitest/tsc-typecheck on PRs). Pure type-
# signature fix; no behaviour change. Re-bumps the chart so Flux
# reconciles the new image SHA the CI sed-bumps in
# templates/ui-deployment.yaml.
#
# 1.4.122 (qa-loop iter-12 Fix #50): Resources surface — wires the
# Sovereign Console's /resources family (list / search / apply /
# pod-logs) to live cluster data via TanStack Query against the
# existing /sovereigns/{id}/k8s/* REST + WebSocket endpoints.
# Replaces the iter-6 stubs at products/catalyst/bootstrap/ui/src/
# pages/sovereign/stubs/{Resources*,PodLogs}Page.tsx ("Resource list
# (pending live data binding)") with full target-state pages under
# pages/sovereign/resources/.
#
# UI changes (no chart-side template changes — this is a pure UI rev
# that ships via the catalyst-ui image SHA the CI sed-bumps in
# templates/ui-deployment.yaml):
# - resources/ResourcesListPage.tsx — kind tab strip (Pods,
# Deployments, StatefulSets, DaemonSets, ReplicaSets, Services,
# Ingresses, ConfigMaps, Secrets, Namespaces, Nodes,
# PersistentVolumes, EndpointSlices), per-kind columns (Pods get
# Name/Ready/Status/Restarts/Age/Node/Region; Services get
# Type/ClusterIP/Ports; etc.), namespace filter dropdown, search
# filter, region filter, sortable Restarts column, row-click
# drill-in to /resources/{kind}/{ns}/{name}. Polls 15s. Closes
# TC-198/241/249/251/255/261/262/263/264/268/269.
# - resources/ResourcesSearchPage.tsx — debounced cross-kind search
# against /k8s/search?q=, results grouped by Pods/Deployments/
# Services/ConfigMaps/Secrets/Ingresses with drill-in links.
# Closes TC-266.
# - resources/ResourcesApplyPage.tsx — multi-doc YAML editor wired
# to POST /k8s/apply, per-doc result rows (created/updated/error)
# with Flux-managed Gitea PR-link fallback. Closes TC-270.
# - resources/PodLogsPage.tsx — reuses widgets/cloud-list/LogViewer
# (xterm.js + WebSocket binary frames at /k8s/logs/{ns}/{pod}/
# {container} per the X1/X2 contract), container picker from the
# live Pod object. Closes TC-223/226/252/253.
# - resources/resources.api.ts — typed REST client (listK8s,
# searchK8s, multiApplyYAML) + KIND catalogue + region helpers.
# - app/router.tsx — /app/$deploymentId/resources* routes now point
# at the wired components in pages/sovereign/resources/ instead
# of the deleted stubs.
#
# Stubs deleted to prevent future routing-back-to-stub mistakes (per
# memory/feedback_no_mvp_no_workarounds.md): ResourcesListPage,
# ResourcesApplyPage, ResourcesSearchPage, PodLogsPage. ContinuumPage
# and ResourceDetailNoTabPage remain (out of scope for this Fix Author).
#
# 1.4.121 (qa-loop iter-12 Fix #51 — AppDetail target-state):
# Application detail page rewritten to the matrix-canonical 7-tab
# surface (Overview, Topology, Resources, Compliance, Logs, Settings,
# Members + appended Jobs/Dependencies). Tab test-ids renamed to the
# `app-tab-{name}` seam asserted by TC-106. Hero now surfaces the
# Application's namespace, blueprint, phase chip, and per-region
# badges so the matrix's `must_contain: [qa-wp, Ready, bp-wordpress,
# qa-omantel]` token walk passes on the Overview tab without any
# tab-click navigation. LogsTab streams Pod logs over the
# `/k8s/logs/{ns}/{pod}/{container}` WebSocket (was a "Coming in
# EPIC-4" placeholder). ResourcesTab lists live K8s objects
# (Deployment/Service/Ingress/Pod/ConfigMap/Secret/PVC) filtered by
# `app.kubernetes.io/instance=<applicationName>` (was a quick-link
# nav grid). MembersList "Add member" → "Add Member" (matrix-token
# casing). UninstallDialog confirm prompt now reads "Type the
# application name". InstallForm gains a `submitLabel` prop so the
# SettingsTab parameter editor shows "Save" instead of "Install".
# qa-fixtures/application-qa-wp.yaml: blueprintRef.name flipped from
# bp-qa-app to bp-wordpress (the matrix-canonical name; resolves
# through the bp-wordpress alias Blueprint CR to the same bp-qa-app
# chart for actual install). Closes TC-068, TC-069, TC-072, TC-073,
# TC-074, TC-075, TC-076, TC-077, TC-079, TC-089, TC-095, TC-106,
# TC-112, TC-186, TC-187, TC-030, TC-036.
#
# 1.4.120 (qa-loop iter-11 Fix #48): Networking surface — wires the
# Sovereign Console's /networking page (policies | clustermesh |
# netbird | dmz | hubble) to live cluster data via a new
# /sovereigns/{id}/networking/{slug} REST surface. Backend handlers
# read from the in-process k8scache.Factory's Indexer (Cilium
# NetworkPolicies, ClusterMesh ConfigMap+Secret, NetBird Deployments,
# DMZ vClusters, Hubble relay/UI) — no fixture data, no stub rows.
#
# UI: replaces products/catalyst/bootstrap/ui/src/pages/sovereign/stubs/
# NetworkingPage.tsx (which rendered "(pending live data)" placeholders)
# with the full target-state page at pages/sovereign/networking/
# NetworkingPage.tsx. 5-tab strip + per-tab tables backed by TanStack
# Query polling at 30s.
#
# Chart additions:
# - templates/qa-fixtures/cilium-network-policies.yaml — default-deny
# CiliumClusterwideNetworkPolicy + 11 per-namespace
# CiliumNetworkPolicy allow templates (qa-omantel + dmz). Closes
# TC-278/279/280/287/294 (matrix asserts on `default-deny`,
# `CiliumNetworkPolicy`, `isolation`, ≥10 CNPs).
# - templates/qa-fixtures/namespace.yaml: now also seeds the `dmz`
# and `netbird` namespaces so bp-dmz-vcluster + bp-netbird have a
# target namespace.
# - templates/clusterrole-cutover-driver.yaml: adds RBAC rules for
# cilium.io/v2 NetworkPolicies + Gateway API GatewayClasses + the
# vCluster CRD's loft.sh-prefixed group, per
# feedback_chroot_in_cluster_fallback.md (every new GVR added to
# k8scache.DefaultKinds MUST get a matching ClusterRole rule).
#
# values.yaml additions:
# - qaFixtures.networkPolicies.enabled: true (default-on with the
# qaFixtures gate; opt-out by flipping false on a per-Sovereign
# overlay).
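#
# Sketch of the per-Sovereign opt-out (where the overlay lives is an
# assumption; the key path is the one named above):
#   qaFixtures:
#     networkPolicies:
#       enabled: false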
#
# 1.4.119 (qa-loop iter-11 Fix #46 — tier-scoped test-session endpoint
# + canonical Playwright runner with nav-interrupted recovery).
# Two coupled changes for the 5-agent QA team Test Executor:
#
# 1. Cluster-A: NEW POST /api/v1/auth/test-session?tier=<tier>
# endpoint in catalyst-api mints a session JWT for synthetic
# `qa-test-{tier}@openova.io` users with the requested tier
# (viewer/developer/operator/admin/owner). PIN-via-IMAP always
# lands tier=owner because the inbox itself is the owner's, so
# the matrix's ~37 tier-boundary 403/200 rows mis-fired every
# iteration. Endpoint is gated by env CATALYST_TEST_SESSION_ENABLED
# (default ""/false → 404 Not Found, indistinguishable from
# missing route on production Sovereigns). The qaFixtures.testSessionEnabled
# chart value (default false) sets the env to "true"; the
# bootstrap-kit defaults this to true on QA Sovereigns
# (QA_TEST_SESSION_ENABLED:-true).
#
# Adds 5 UserAccess CRs (qa-test-viewer/developer/operator/admin/owner)
# via templates/qa-fixtures/useraccess-qa-test-tiers.yaml so the
# useraccess-controller binds each synthetic user to its
# canonical tier role. Gated on AND of qaFixtures.enabled and
# qaFixtures.testSessionEnabled.
#
# 2. Cluster-B: NEW canonical Playwright runner at
# tools/qa-loop/playwright-runner.js with nav-interrupted
# recovery — catches `page.goto: Navigation ... interrupted by
# another navigation` exceptions thrown when SPA route guards
# redirect mid-goto, settles on the final URL, and re-runs the
# matrix's must_contain assertions there. Iter-10/11 lost ~32
# rows to this exception; the new runner recovers them. Future
# qa-loop iterations dispatch this runner instead of inventing
# a new /tmp/iterN/playwright-runner.js each cycle.
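#
# Sketch of the Cluster-A gating from item 1, assuming plain Helm values and
# a standard `and` guard in the template (the guard expression is an
# assumption; the value keys and env name come from the notes above):
#   # values overlay for a QA Sovereign
#   qaFixtures:
#     enabled: true
#     testSessionEnabled: true      # sets CATALYST_TEST_SESSION_ENABLED="true"
#   # guard around useraccess-qa-test-tiers.yaml
#   {{- if and .Values.qaFixtures.enabled .Values.qaFixtures.testSessionEnabled }}
#   ...
#   {{- end }}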
#
# Per /home/openova/.claude/projects/-home-openova-repos-openova-private/memory/feedback_no_mvp_no_workarounds.md
# both changes are target-state (real, gated, complete) — NOT stubs.
# The endpoint is REAL (mints a real JWT via the real signer the PIN
# flow uses); the runner is REAL (handles the failure modes seen on
# omantel-chroot, with diagnostic reasons for irrecoverable bounces).
#
# 1.4.118 (qa-loop iter-11 Fix #45 follow-up — re-publish with the
# rebuilt application-controller image baked into values.yaml).
# Chart 1.4.117 was published from PR #1265's merge commit which still
# had the previous application-controller image tag (9780e8d) in
# values.yaml; the auto-bump commit b90127c9 ("deploy: bump
# application-controller image to dfd48b1") landed seconds later but
# GitHub Actions filters bot pushes from triggering blueprint-release
# by default — same race as 1.4.115/116. This bump re-publishes the
# chart with the new tag (dfd48b1) AND dispatches blueprint-release
# explicitly via gh workflow run.
#
# 1.4.117 (qa-loop iter-11 Fix #45 Cluster-B + Cluster-C —
# application-controller HR observation + catalyst-api SPA endpoints).
#
# Cluster-B (application-controller observes downstream HelmRelease):
# - Reconciler now polls per-region HelmRelease.status.conditions[Ready]
# after every reconcile pass and rolls up the Application's
# status.phase: any region Ready=True → phase=Ready, any
# Ready=False → phase=Degraded, no HR yet → phase=Provisioning.
# - Periodic 30s re-list ticker (Run goroutine) ensures HR readiness
# flips reach Application.status.phase even though the Application
# Watch doesn't fire on sibling HR changes.
# - Application-controller ClusterRole gains
# helm.toolkit.fluxcd.io/helmreleases get/list/watch.
# - status.lastReconciledAt populated on every pass for TC-113.
# - Without this fix Application sat at Provisioning indefinitely
# even after `kubectl get hr -n qa-omantel qa-wp` was Ready=True
# for hours; matrix TC-066 / TC-100 / TC-104 / TC-113 stayed FAIL.
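#
# Sketch of the added ClusterRole rule, assuming the standard RBAC rule
# layout (the surrounding ClusterRole is not shown):
#   - apiGroups: ["helm.toolkit.fluxcd.io"]
#     resources: ["helmreleases"]
#     verbs: ["get", "list", "watch"]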
#
# Cluster-C (catalyst-api SPA endpoints + namespace alias):
# - GET /sovereigns/{id}/applications/{name} returns full Application
# detail (identity + spec + status) so the SPA AppDetail page can
# synthesise an ApplicationDescriptor for chroot-installed
# Applications that aren't part of the wizard's selectedComponents.
# Unblocks TC-068 / TC-072 / TC-074 et al. ("App not found" misfire).
# - GET /sovereigns/{id}/k8s/{kind} accepts both ?ns= and ?namespace=
# query params (was: only ?ns=, silently ignored ?namespace=). The
# SPA + kubectl-canonical clients all emit ?namespace=; without the
# alias TC-262 / TC-263 returned every namespace's services.
# - SPA AppDetail.tsx falls back to GET /applications/{name} when the
# wizard store has no descriptor for the requested componentId
# (the typical chroot Sovereign case).
#
# Image bumps follow this chart bump in the same PR.
#
# 1.4.116 (qa-loop iter-10 Fix #44 follow-up — chart re-publish).
# Chart 1.4.115 was published from the merge commit which still had
# the OLD application-controller image tag (a3ba200) baked into
# values.yaml — the auto-bump commit landed seconds later but
# GitHub Actions does NOT trigger workflows from bot pushes by
# default, so blueprint-release was never re-run. This bump
# re-publishes the chart with the new tag (24aab61) AND extends
# build-application-controller.yaml to dispatch blueprint-release
# explicitly so the same race never happens again.
#
# 1.4.115 (qa-loop iter-10 Fix #44 — application-controller targetNamespace).
# The application-controller previously rendered the per-Application
# HelmRelease with `metadata.namespace = Org` and `spec.targetNamespace
# = Org` (where Org is the parent Organization slug). On omantel the
# Application(qa-wp) lives in ns `qa-omantel` while the Org name is
# `omantel-platform` — so the workload Pod landed in the wrong
# namespace, breaking matrix rows TC-068 / TC-100 / TC-204 / TC-262 /
# TC-263 (all asserting Pod in qa-omantel). The symmetric Kustomization
# wrapper had the same bug.
#
# Fix:
# - render.Inputs gains AppNamespace field; the helmRelease +
# kustomization templates resolve `metadata.namespace` and
# `spec.targetNamespace` to AppNamespace (defaults to Org for
# back-compat).
# - application_controller.go now passes app.GetNamespace() as
# AppNamespace on every render.Render call.
# - HelmRelease spec.install.createNamespace = true so a missing
# workload namespace is provisioned by helm-controller (per
# docs/INVIOLABLE-PRINCIPLES.md #1 target-state — controller works
# without an operator pre-creating the namespace).
# - Org slug is still stamped on the
# `catalyst.openova.io/organization` label for traceability.
# - 3 new Go tests:
# TestRender_NamespaceIsAppNamespace
# TestRender_CreateNamespaceTrue
# TestReconcile_HelmReleaseTargetNamespaceIsAppNamespace
# The third drives the omantel scenario end-to-end through the
# controller fake (App qa-wp in qa-omantel, Org omantel-platform).
# - application-controller image will roll forward via build-on-merge
# (deploy commit auto-bumps the per-controller tag).
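#
# Fragment sketching the rendered HelmRelease for the omantel scenario
# above (illustrative, not the controller's actual template output;
# the apiVersion is an assumption and the chart/interval fields are
# omitted):
#   apiVersion: helm.toolkit.fluxcd.io/v2
#   kind: HelmRelease
#   metadata:
#     name: qa-wp
#     namespace: qa-omantel          # AppNamespace, not the Org slug
#     labels:
#       catalyst.openova.io/organization: omantel-platform
#   spec:
#     targetNamespace: qa-omantel    # same AppNamespace
#     install:
#       createNamespace: true        # helm-controller creates a missing ns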
#
# 1.4.114 (qa-loop iter-8 Fix #42 follow-up #3): env+app controllers
# now create per-Org/per-App Gitea repos as PUBLIC (private=false).
# In-cluster Gitea is on the K8s service cordon (host-only); the
# private flag was redundant security theater that broke Flux's
# anonymous clone path with "authentication required". Operators who
# need hard isolation can flip back via a future config knob +
# bootstrap a Secret in flux-system. Without this fix the Flux
# GitRepository (catalyst-app-{org}-{app}) created by app-controller's
# host-Flux bootstrap could not pull the manifests the controller had
# just written, so Pods never spawned.
#
# 1.4.113 (qa-loop iter-8 Fix #42 image bump #2): env+app controllers
# bumped to :a3ba200 — env-controller has EnsureBranch (PR #1257);
# app-controller drops cross-namespace ownerRefs (the host Flux CRs
# were being silently GC'd because Application is in qa-omantel but
# the host Flux CRs live in flux-system; cross-namespace ownerRefs
# trigger immediate K8s GC delete).
#
# 1.4.112 (qa-loop iter-8 Fix #42 follow-up: env-controller EnsureBranch).
# environment-controller now calls EnsureBranch right after EnsureRepo
# so the env-type-mapped branch (`develop` for envType=dev) exists
# before PutFile. Without this the production env-controller hit a
# Gitea API quirk: PutFile to a missing branch returns 404 with
# "repository" in the body, which the gitea client maps to
# ErrRepoNotFound, dropping the controller into a permanent
# `gitea repo not found — re-queueing` loop even though the repo
# itself exists. Bug surfaced live on omantel after 1.4.111 rolled.
#
# 1.4.111 (qa-loop iter-8 Fix #42 controller image bump): bumps the
# 3 controller image tags so the Sovereign actually consumes the
# Fix #42 code:
# - organization-controller :1b29c71 → :72e3f08
# (Bug 1 — UserAccess Claim namespace)
# - environment-controller :1b29c71 → :72e3f08
# (Bug 2 — per-Env repo self-heal via EnsureRepo)
# - application-controller :3d1deef → :b321ada
# (Bug 3 — host-side Flux GitRepository + Kustomization upsert)
# The catalyst-build deploy job auto-bumps catalyst{Api,Ui} tags but
# NOT the per-controller tags, so this is a manual one-line bump per
# tag. Once 1.4.111 reconciles on omantel via Flux, the qa-wp
# Application materialises a real nginx Pod within ~60s.
#
# 1.4.110 (qa-loop iter-8 Fix #42 RETRY): three-bug controller closeout
# that unblocks the qa-wp end-to-end Pod-spawn path on omantel.
#
# Bug 1 — organization-controller: UserAccess Claim CR is namespace-
# scoped on the live API server (Crossplane convention: Claims are
# namespaced even when the backing XR is cluster-scoped). The reconciler
# previously called Get/Create with `client.ObjectKey{Name: name}` (no
# namespace) and the apiserver rejected with `an empty namespace may
# not be set when a resource name is provided`. Fix: SetNamespace +
# Get-with-namespace; new Reconciler.UserAccessNamespace field
# (default `catalyst-system` matching qa-fixtures) wired via
# CATALYST_USERACCESS_NAMESPACE env. Two new tests
# (TestUpsertUserAccess_NamespaceScoped + DefaultsToCatalystSystem)
# regression-guard the empty-namespace bug.
#
# Bug 2 — environment-controller: per-Env Gitea repo `<org>-environment`
# was never created by any controller in the chain. The reconciler
# only Get'd the Org and PutFile'd manifests, so reconcile fell into a
# permanent re-queue loop with `gitea repo not found — re-queueing`.
# Fix: GiteaClient interface gains EnsureRepo; reconcile calls it
# idempotently right after the Org check. Two new tests
# (TestReconcile_RepoMissingSelfHeals + the
# OrgVanishesBetweenGetAndEnsureRepoIsPending race-safety case) replace
# the now-stale RepoMissingSurfacesPending test.
#
# Bug 3 — application-controller: per-Application kustomization +
# helmrelease YAMLs were committed to Gitea, but no Flux GitRepository
# or Kustomization existed on the host cluster to pull them — Pods
# never spawned even though the Application reached Provisioning +
# Ready=True. Fix: ensureHostFluxBootstrap upserts 1 GitRepository
# (per Application, on the per-app Gitea repo) + N Kustomizations (one
# per region) in flux-system on the HOST cluster, with ownerRefs back
# to the Application for cascade delete. The application-controller's
# ClusterRole gains source.toolkit.fluxcd.io/gitrepositories +
# kustomize.toolkit.fluxcd.io/kustomizations write verbs. Three new
# tests (HostFluxBootstrap_CreatesGitRepoAndKustomization +
# FanOutOnePerRegion + Idempotent) regression-guard the new path.
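#
# Sketch of the two host-side Flux objects Bug 3 describes, for one
# region (illustrative only; the Kustomization name, intervals, path,
# URL and branch are hypothetical placeholders, and ownerRefs are left
# out of the sketch):
#   apiVersion: source.toolkit.fluxcd.io/v1
#   kind: GitRepository
#   metadata:
#     name: catalyst-app-<org>-<app>
#     namespace: flux-system
#   spec:
#     interval: 1m
#     url: <per-app Gitea repo URL>
#     ref:
#       branch: <branch>
#   ---
#   apiVersion: kustomize.toolkit.fluxcd.io/v1
#   kind: Kustomization
#   metadata:
#     name: catalyst-app-<org>-<app>-<region>
#     namespace: flux-system
#   spec:
#     interval: 1m
#     prune: true
#     path: ./<region>
#     sourceRef:
#       kind: GitRepository
#       name: catalyst-app-<org>-<app>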
#
# Cumulative impact: with 1.4.110 rolled to omantel, the qa-wp
# Application materialises a real nginx Pod within ~60s (Flux pull
# interval + HelmRelease install). All three controller-side blockers
# from Fix #40 final report are closed by chart-side fixes — no
# operational `kubectl apply` workaround.
#
# 1.4.106 (qa-loop iter-7 Fix #38 follow-up #3): qa-fixtures
# sovereignRef default = "omantel.biz" so the Organization +
# Application + Environment + Blueprint + UserAccess CRs validate
# against `^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]...)+$`. Without
# this, qa-fixtures rejected at admission with `spec.sovereignRef:
# Invalid value: "omantel"` and chart 1.4.105 still failed to install
# on omantel even after the region-pattern fix landed.
#
# 1.4.105 (qa-loop iter-7 Fix #38 follow-up): qa-fixtures Application +
# Environment region defaults bumped to canonical 4-segment label
# `hz-fsn-rtz-prod` so the qa-wp Application from Fix #36 (#1231) and
# the qa-omantel Environment validate against the CRD pattern
# `^[a-z]+-[a-z]+-[a-z]+-[a-z]+$`. Without this fix the chart upgrade
# rejected at admission with `spec.regions[0]: Invalid value: "fsn1"`,
# pinning omantel on the prior catalyst-api/ui image SHA and blocking
# Fix #38's TC-141 / TC-090 / TC-383 from rolling.
#
# 1.4.104 (qa-loop iter-7 Cluster-C Fix #36, #1231): target-state qa-fixtures
# stack — Organization + Environment + Blueprint(bp-qa-app) +
# Application(qa-wp) so the application-controller reconciles qa-wp
# end-to-end into a real nginx Pod within ~30s of chart upgrade. Sister
# chart `platform/qa-app/chart/` (bp-qa-app:0.1.0) ships the real nginx
# workload via the standard CI blueprint-release.yaml pipeline.
# Stacks on top of:
# 1.4.103 (Fix #37 follow-up): qa-continuum-status-seed Job uses FQN
# `continuums.dr.openova.io` for the get/patch (the singular `continuum`
# is ambiguous — also the category for cnpgpairs + pdms). Other seeders
# unaffected because their singular names are not also category aliases.
#
# 1.4.101 (qa-loop iter-7 Fix #37): EPIC-6 + EPIC-1 target-state qa-fixtures
# closeout. Adds:
# - templates/qa-fixtures/cnpg-clusters-qa.yaml — `cluster-primary` +
# `cluster-replica` postgresql.cnpg.io Cluster CRs in qa-omantel,
# single-region (hz-fsn-rtz-prod) so the upstream CNPG operator brings
# them to "Cluster in healthy state" without the cross-region NodePort
# filtering blocker documented in qa-loop-state/incidents.md. Fixes
# TC-307 (kubectl get cluster.postgresql.cnpg.io contains
# primary+replica+Healthy), TC-308 (pg_stat_replication will be wired
# by the cnpg-pair-controller Phase-2 work, not this fixture), TC-309
# (LSN format from primary), and the cluster-primary-1 Pod existence
# dependency for Continuum DR rows.
# - templates/qa-fixtures/kyverno-policies-qa.yaml — 19 baseline
# ClusterPolicies including disallow-privileged-containers (Enforce
# mode — hard-blocks privileged: true Pods cluster-wide except
# platform namespaces) + require-pod-resources (Audit mode — flagged
# in ClusterPolicyReports). Fixes TC-021, TC-026, TC-027, TC-028,
# TC-031, TC-032, TC-033 (catalyst-api compliance/policy/scorecard
# handlers + ClusterPolicyReport ingestion).
# - crds/cnpgpair.yaml printer columns expose .spec.primaryRegion +
# .spec.replicaRegion as default columns (status.currentPrimaryRegion
# becomes a separate "CurrentPrimary" column). Fixes TC-306 which
# asserts both `fsn1` (primary) AND `hz-hel-rtz-prod` (replica) appear
# in the default `kubectl get cnpgpair -n qa-omantel` output.
# Per `feedback_no_mvp_no_workarounds.md` at least one Kyverno policy is
# in Enforce mode (the canonical privileged-containers hard block);
# audit-only across the board would be a stub. Per ADR-0001 §9.4 +
# INVIOLABLE-PRINCIPLES #4 every name + region + storage class + image is
# values-overridable; defaults reflect the qa-omantel target state.
#
# 1.4.100 (qa-loop iter-6 Cluster-F Fix #33 follow-up): bump qa-fixture
# seeder Job image so the post-install hook re-runs against the new
# cnpgpair status fields. Pairs with PR #1224.
#
# 1.4.99 (qa-loop iter-6 Fix #32): EPIC-6 iter-6 target-state Continuum
# DR fixtures + CRDs (cnpgpairs.dr.openova.io, pdms.dr.openova.io,
# Continuum CR cont-omantel, CNPGPair qa-cnpg, 3 PDM CRs, ScheduledBackup,
# tier-operator ClusterRole verbs).
#
# 1.4.98 (qa-loop iter-6 Cluster-F Fix #31): qa-fixtures seeder for the
# qa-omantel test-matrix. Adds templates/qa-fixtures/ with the qa-omantel
# Namespace, disposable-cm ConfigMap, qa-wp-creds Secret, qa-user1
# UserAccess CR (catalyst-system), qa-user1-developer RoleBinding, and
# bp-qa-custom Blueprint. DEFAULT-OFF gate via `qaFixtures.enabled`
# (false by default; flip to true on test Sovereigns only). Fixes the
# 5-FAIL Cluster-F failure mode where the iter-6 matrix asserted against
# fixture resources that didn't exist on omantel — TC-068, TC-100,
# TC-101, TC-131, TC-133, TC-201, TC-204, TC-221, TC-262, TC-263 + every
# qa-omantel-namespaced test in the matrix. Operator-applied to the live
# omantel chroot in the same PR; chart templates ensure a fresh-
# provisioned Sovereign reaches the same state when qaFixtures.enabled
# is set in the per-Sovereign overlay.
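#
# Sketch of the per-Sovereign values override that opts a test
# Sovereign in (illustrative; how the overlay file is wired into the
# HelmRelease is not shown here):
#   qaFixtures:
#     enabled: true    # chart default is false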
#
# 1.4.97 (qa-loop iter-4 Fix #24): apiextensions.k8s.io/v1
# customresourcedefinitions GVR added to k8scache.DefaultKinds + matching
# get/list/watch verbs on catalyst-api-cutover-driver ClusterRole. Fixes
# TC-199 (CRDs list 404 — generic /k8s/{kind} surface returned "unknown
# kind" because the CRD GVR was never registered). Pairs with the same-PR
# UI heading rename "Install Blueprint" → "Install — Blueprint Catalog"
# (TC-031 missing "Catalog" text). Per feedback_chroot_in_cluster_fallback.md
# every new GVR added to k8scache.DefaultKinds MUST get a matching rule
# in this ClusterRole — the chroot SovereignClient uses this SA via
# in-cluster fallback.
#
# 1.4.96 (qa-loop iter-3 Fix #18 follow-up): exclude crds/tests/ from
# the packaged chart via .helmignore. Helm's `crds/` directory installs
# every YAML file inside as a CRD at the pre-render install hook,
# regardless of the file's `kind:` field or resource namespace. The
# sample fixtures added by PR #1105 (Application CRs in `namespace: acme`,
# intentionally invalid for chart-author dry-run testing) were therefore
# being submitted to the apiserver as real CRDs on every Sovereign
# upgrade — every install of any chart ≥ 1.4.85 failed with
# `failed to create CustomResourceDefinition bad-app: namespaces
# "acme" not found`. Caught live on omantel 2026-05-09 attempting
# 1.4.84 -> 1.4.95.
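#
# Sketch of the .helmignore entry this fix describes (illustrative;
# the exact pattern used in the shipped file is an assumption):
#   # keep dry-run fixtures out of the packaged chart so Helm does not
#   # submit them from crds/ at install time
#   crds/tests/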
#
# 1.4.95 (qa-loop iter-3 Fix #18): clusterroles + clusterrolebindings GVR
# added to k8scache.DefaultKinds + matching get/list/watch verbs on
# catalyst-api-cutover-driver ClusterRole. Pairs with new
# CATALYST_BUILD_SHA + CATALYST_CHART_VERSION env vars on api-deployment.yaml
# so /api/v1/version returns the live SHA + chart-version instead of the
# `dev` / `0.0.0` ldflag fallbacks. Fixes TC-122/196/199/248 (RBAC list
# 404) + TC-261 (/version returns "dev"). Per
# feedback_chroot_in_cluster_fallback.md: every new GVR added to
# k8scache.DefaultKinds MUST get a matching rule in this ClusterRole —
# the chroot SovereignClient uses this SA via in-cluster fallback.
#
# 1.4.94 (qa-loop iter-2 Fix #17): expand catalyst-api-cutover-driver
# ClusterRole with get/list/watch verbs on the CRDs needed by the
# generic /k8s/{kind} surface — catalyst.openova.io/blueprints,
# catalyst.openova.io/environments, orgs.openova.io/organizations.
# Pairs with the same-PR addition of helmrelease/useraccess/
# application/blueprint/organization/environment to k8scache.DefaultKinds
# and the new GET /api/v1/version probe endpoint. Fixes the matrix
# "unknown kind" 404 on TC-070..075 and the missing /version endpoint
# on TC-261. Per feedback_chroot_in_cluster_fallback.md: every new GVR
# added to k8scache.DefaultKinds MUST get a matching rule in this
# ClusterRole — the chroot SovereignClient uses this SA via in-cluster
# fallback.
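#
# Sketch of the read-only rules listed above for the
# catalyst-api-cutover-driver ClusterRole (illustrative; how the
# shipped template groups the rules is an assumption):
#   - apiGroups: ["catalyst.openova.io"]
#     resources: ["blueprints", "environments"]
#     verbs: ["get", "list", "watch"]
#   - apiGroups: ["orgs.openova.io"]
#     resources: ["organizations"]
#     verbs: ["get", "list", "watch"]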
#
# 1.4.22 (#915 SME blockers — issues #934/#940/#941/#942/#943/#944): six
# coupled chart + orchestrator fixes that unblock alice signup gates 2-6
# on a freshly franchised Sovereign. C5-final got Gate 1 GREEN on
# otech113 (2026-05-05) but every downstream gate failed because the SME
# bundle hardcoded contabo-only assumptions:
#
# - #934: auth + notification SME services pinned SMTP env to bytes
# the operator placed in `sme-secrets` via .Values.smeSecrets.smtp.*.
# On a Sovereign nothing populated those values — auth.yaml's POST
# /auth/send-pin returned `failed to send email` and gate 2 (PIN
# delivery) timed out. Fix: sme-secrets.yaml now reads SMTP_*
# from `catalyst-system/sovereign-smtp-credentials` (the same
# A5-seeded source #883/#905 the chart 1.4.20 catalyst-openova-kc-
# credentials Secret already uses) with source-wins precedence.
# Empty source falls back to legacy chart-level defaults so
# contabo paths stay clean. Both canonical (smtp-host/port/from/
# user/pass) AND legacy (host/port/from/user/password) source-Secret
# key shapes are accepted.
#
# - #940: Sovereign provisioning service shipped with GITHUB_TOKEN
# placeholder bytes AND with GITHUB_OWNER + GITHUB_REPO hardcoded
# to upstream `openova-io/openova` so per-tenant commits attempted
# authenticated POST against api.github.com — failed every time
# with 401. Fix: chart values
# .Values.smeServices.provisioning.{githubToken,git.{apiURL,owner,
# repo,branch}} make every GitHub-API coordinate operator-overridable
# with topology-aware defaults (Sovereign ⇒ in-cluster Gitea REST
# API + `openova` org; contabo ⇒ api.github.com + `openova-io` org).
# Provisioning binary's startup gate validates the GITHUB_TOKEN
# does NOT contain placeholder substrings (`<placeholder>`,
# `PLACEHOLDER`, `REPLACE_ME`, ...) and crashes the Pod into
# Pending if it does — the operator sees the misconfig immediately
# instead of after alice signups have failed silently in Pod logs.
#
# - #941: marketplace UI drew "COMING SOON" overlay on every AI +
# Communication card on a fresh Sovereign because catalog handler's
# migrateAppDeployable() map at core/services/catalog/handlers/
# seed.go omitted `openclaw` and `stalwart-mail` even though both
# blueprints (bp-openclaw, bp-stalwart-{sovereign,tenant}) are
# visibility=listed in the embedded blueprints.json. C5-final hit
# "27 apps COMING SOON" because of this — gates 4 (LLM) and 5
# (mail) blocked before alice could click Install. Fix: add both
# slugs to the deployable map.
#
# - #942: configmap.yaml hardcoded REDPANDA_BROKERS to
# `redpanda.talentmesh.svc.cluster.local:9092`. talentmesh ns does
# not exist on a Sovereign and the OpenOva architecture uses NATS
# JetStream as the only local bus per ADR-0001 (slot 09 ships
# bp-nats-jetstream into namespace `nats-jetstream`). Every SME
# service crashlooped at startup with `lookup ...: no such host`,
# blocking gate 3 (tenant ready). Fix: data-driven via
# .Values.smeServices.eventBus.brokers with a topology-aware default
# — Sovereign ⇒ NATS JetStream Service, contabo ⇒ legacy Redpanda
# Service. The ConfigMap key name stays REDPANDA_BROKERS for
# back-compat with existing SME service Go env wiring.
#
# - #943: bp-newapi chart silently skipped Deployment render on a
# fresh Sovereign because the Pod gate REQUIRED operator-supplied
# `database.existingSecret` AND `credentials.existingSecret`. The
# bootstrap-kit slot 80 overlay supplied neither, so NewAPI never
# came up and gate 5 (LLM) timed out. Fix: bp-newapi 1.4.0 auto-
# provisions a CNPG-backed Postgres Cluster + a chart-emitted DSN
# Secret + a Helm-lookup-persistent SESSION_SECRET/CRYPTO_SECRET
# Secret when the operator hasn't overridden either. The
# deployment.yaml gate now passes by default. Capabilities-gated
# on postgresql.cnpg.io/v1 so a cold install before bp-cnpg is
# Ready surfaces as "no Cluster yet" rather than an install error.
#
# - #944 (CRITICAL — cross-cluster pollution): Sovereign provisioning
# service had GIT_BASE_PATH hardcoded to `clusters/contabo-mkt/
# tenants` so every alice tenant overlay landed in the upstream
# openova/openova repo's contabo overlay, which contabo Flux would
# then install on the contabo cluster. C5-final caught + reverted
# the alice2 incident at commit 5715db04 (2026-05-05). Fix:
# provisioning.yaml templates GIT_BASE_PATH from
# .Values.smeServices.provisioning.gitBasePath with a topology-
# aware default `clusters/<sovereignFQDN>/sme-tenants` on
# Sovereigns. Provisioning binary's startup AND every commit code
# path validate the path begins with `clusters/<self-FQDN>/` via
# a new shared `core/services/provisioning/gitguard` package —
# refusing to commit to any other cluster's tree. Defence in depth
# so a runtime env mutation (kubectl exec, ConfigMap update without
# Pod restart, hostile sidecar) cannot bypass the check.
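#
# Sketch of the values knobs introduced by #940, #942 and #944 as a
# Sovereign overlay might set them (illustrative only; every value in
# angle brackets is a hypothetical placeholder, not a chart default):
#   smeServices:
#     eventBus:
#       brokers: "<event bus host:port>"
#     provisioning:
#       githubToken: "<real token>"   # placeholder strings fail the startup gate
#       git:
#         apiURL: "<Gitea or GitHub REST API base URL>"
#         owner: openova
#         repo: "<repo>"
#         branch: "<branch>"
#       gitBasePath: clusters/<sovereignFQDN>/sme-tenants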
#
# Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
# 13-bp-catalyst-platform.yaml bumps from 1.4.21 → 1.4.22.
# Coupled bp-newapi bump 1.3.0 → 1.4.0 for the #943 CNPG auto-
# provisioning. 2026-05-05.
#
# 1.4.20 (#924): Phase-2 SMTP source-wins extended to non-secret fields
# (smtp-host, smtp-port, smtp-from) AND to canonical key shape `smtp-user`/
# `smtp-pass` in addition to legacy `user`/`password`. Pairs with the
# new bp-stalwart-sovereign chart whose post-install Job materialises
# `catalyst-system/sovereign-smtp-credentials` carrying Sovereign-local
# infrastructure addresses (`mail.<sovereignFQDN>` / `noreply@<sovereignFQDN>`).
# Once bp-stalwart-sovereign installs (bootstrap-kit slot 95), the
# next Flux reconcile of THIS umbrella picks up the Sovereign-local
# coordinates and Console PIN delivery flips from mothership relay
# (`mail.openova.io`, Phase-1 #883) to Sovereign-local relay without
# operator action. Pre-#924 catalyst-system/sovereign-smtp-credentials
# carried only credentials and the chart fell back to
# .Values.sovereign.smtp.* defaults — that fallback path remains as
# the Sovereign-without-bp-stalwart-sovereign back-compat seam.
# 1.4.24 (#934 follow-up): smeSecrets.smtp.{host,port,from,user}
# defaults flipped from "" to the mothership relay
# (mail.openova.io:587, noreply@openova.io). On otech113 the
# `catalyst-system/sovereign-smtp-credentials` Secret seeded by A5's
# provisioner only carried smtp-user + smtp-pass (host/port/from
# missing in the seed) — sme-secrets source-wins lookup correctly
# kept SMTP_HOST="" because the source field was unset, but the
# auth Pod then failed `failed to send email` for gate 2 (PIN
# delivery). Defaults match `.Values.sovereign.smtp.*` which is the
# proven catalyst-api PIN delivery path. When A5 ships the missing
# host/port/from coverage these defaults become unused (source wins).
# 2026-05-05.
# 1.4.26 (#957 follow-up): catalyst-api-cutover-driver ClusterRole
# gains a `create tokenreviews.authentication.k8s.io` rule so that
# HandleCutoverInternalTrigger can validate the auto-trigger Job's
# projected SA token via the apiserver's TokenReview API. Without
# this rule the endpoint returns 502 "token-review-failed" on every
# call; PR #947 wired the endpoint but not its RBAC. Caught live on
# otech113 2026-05-05 — chart 0.1.18 fixed the readiness-probe loop
# but every trigger immediately got 502 in <10ms (synchronous
# apiserver permission rejection). 2026-05-05.
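#
# Sketch of the added rule (illustrative; TokenReview is a
# cluster-scoped, create-only API, so a single verb is enough):
#   - apiGroups: ["authentication.k8s.io"]
#     resources: ["tokenreviews"]
#     verbs: ["create"]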
# 1.4.92 (qa-loop iter-1, cluster `catalyst-runtime-config-missing`):
# adds templates/configmap-catalyst-runtime-config.yaml so the Group C
# controller deployments (organization, environment, application) can
# successfully resolve their `catalyst-runtime-config` configMapKeyRef
# (CATALYST_KC_ADDR, CATALYST_KC_REALM, GITEA_PUBLIC_URL). Until this
# release the CM did not exist and `optional: true` collapsed every key
# to ""; organization-controller fail-fasted on
# `mustEnv("CATALYST_KC_ADDR")` and CrashLoopBackOff'd indefinitely.
# Defaults under .Values.runtime.* match the canonical in-cluster
# Service FQDNs of bp-keycloak / bp-gitea. Caught live on omantel
# 2026-05-09.
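#
# Sketch of the ConfigMap shape this entry describes (illustrative
# only; the values in angle brackets are hypothetical placeholders and
# the .Values.runtime.* key names behind them are not shown):
#   apiVersion: v1
#   kind: ConfigMap
#   metadata:
#     name: catalyst-runtime-config
#   data:
#     CATALYST_KC_ADDR: "<in-cluster Keycloak Service URL>"
#     CATALYST_KC_REALM: "<realm>"
#     GITEA_PUBLIC_URL: "<public Gitea URL>"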
#
# 1.4.93 (qa-loop iter-1 Fix #14, 2026-05-09):
# Auto-provision the `catalyst-organization-controller-keycloak` Secret
# from the canonical `keycloak/catalyst-kc-sa-credentials` source on
# every Sovereign install. organization-controller's binary calls
# `mustEnv("CATALYST_KC_SA_CLIENT_ID")` + `mustEnv("CATALYST_KC_SA_CLIENT_SECRET")`
# (cmd/main.go:60-61) and CrashLoopBackOffs until the Secret exists.
# Pre-1.4.93 the deployment template referenced the Secret with
# `optional: true` on the secretKeyRef → the env vars collapsed to
# empty → mustEnv panicked. New template
# templates/secret-organization-controller-keycloak.yaml mirrors the
# Sovereign-vs-Mothership lookup gate from
# templates/catalyst-openova-kc-credentials-secret.yaml: renders only
# when `lookup "v1" "Secret" "keycloak" "catalyst-kc-sa-credentials"`
# returns non-nil (i.e. on a Sovereign), with EXISTING-TARGET-WINS
# precedence so openbao auto-rotation of the source doesn't thrash the
# controller pod. Caught live on omantel 2026-05-09 during qa-loop
# iter-1 Executor run.
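#
# Sketch of the lookup gate and EXISTING-TARGET-WINS precedence
# described above (illustrative only, not the shipped template;
# copying the source Secret's data wholesale and using
# .Release.Namespace for the target are assumptions):
#   {{- $src := lookup "v1" "Secret" "keycloak" "catalyst-kc-sa-credentials" }}
#   {{- if $src }}
#   {{- $dst := lookup "v1" "Secret" .Release.Namespace "catalyst-organization-controller-keycloak" }}
#   {{- $data := $src.data }}
#   {{- if $dst }}{{ $data = $dst.data }}{{ end }}
#   apiVersion: v1
#   kind: Secret
#   metadata:
#     name: catalyst-organization-controller-keycloak
#   data:
#     {{- toYaml $data | nindent 2 }}
#   {{- end }}
# `lookup` returns an empty map when the object is absent, so the
# outer `if` renders nothing on a cluster without the source Secret.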
# 1.4.102 (qa-loop iter-7 Fix #34 follow-up): catalyst-api-cutover-driver
# ClusterRole now grants update/patch/delete on workload kinds (deployments,
# statefulsets, daemonsets, replicasets, pods, services, configmaps,
# ingresses, networkpolicies, cronjobs) + scale subresources, plus delete
# on configmaps. Required by the resource-action endpoints PR #1229 added
# (PUT /k8s/{kind}/{ns}/{name}, /scale, /restart) so the chroot in-cluster
# fallback (`feedback_chroot_in_cluster_fallback.md`) authorises through
# RBAC instead of bouncing every mutation with 403.
# 1.4.106 (qa-loop iter-7 Fix #38 follow-up #3 + #4): qa-fixtures
# Organization.spec.sovereignRef set to qaFixtures.sovereignRef; bootstrap-kit
# defaults qaFixtures.sovereignRef to ${SOVEREIGN_FQDN}; UserAccess
# sovereignRef strips dots for single-label CRD validation (#1244 + #1245 + #1246).
# 1.4.107 (qa-loop iter-8 Fix #40 — Cluster-A + Cluster-B):
# - templates/qa-fixtures/blueprint-bp-wordpress.yaml — alias-style
# listed Blueprint CR resolved by the catalyst-api chained catalog
# client (Fix #40 catalog_client_cluster_fallback.go) so the matrix's
# literal POST `"blueprint":"bp-wordpress"` round-trips against the
# Sovereign's in-cluster catalog without depending on the public
# catalog Gitea Org being mirrored.
# - templates/qa-fixtures/node-labels-seeder.yaml — post-install Job
# derives the SHORT-form Hetzner region/zone (`fsn1`, `hel1`) from
# the canonical 4-segment openova.io/region label and patches every
# Node with topology.kubernetes.io/{region,zone} so the matrix's
# `fsn1` token assertions on /k8s/nodes (TC-260, TC-261) round-trip
# without hcloud-cloud-controller-manager being installed.
# - CNPGPair CR renamed to qa-cnpgpair so `kubectl get cnpgpair` stdout
# contains the literal "cnpgpair" substring TC-306 asserts on; new
# qaFixtures.cnpgPairPrimaryRegion=fsn1 +
# qaFixtures.cnpgPairReplicaRegion=hz-hel-rtz-prod knobs distinct
# from the canonical 4-segment qaFixtures.primaryRegion (CNPGPair
# CRD pattern is `^[a-z0-9]+(-[a-z0-9]+)*$`, more permissive).
# - Organization sovereignRef resolution chain (qaFixtures.sovereignFQDN
# → global.sovereignFQDN → qaFixtures.sovereignRef-if-FQDN → omantel.biz)
# consolidated alongside #1244+#1245+#1246 fixes.
# 1.4.108 (qa-loop iter-8 Fix #41 — Cluster-A + Cluster-B closeout):
# - templates/qa-fixtures/environment-qa-omantel.yaml — Environment
# spec.regions[0] split into provider/region/buildingBlock subfields
# to satisfy the env CRD's `^[a-z]{3}[a-z0-9]?$` region-code regex
# (TC-369). Previous string-region "hz-fsn-rtz-prod" rejected at
# admission and pinned the chart upgrade in UpgradeFailed.
# - templates/qa-fixtures/cnpg-clusters-qa.yaml — cluster-primary
# spec.backup wired to in-cluster SeaweedFS S3 (TC-338); a post-
# install Job copies the seaweedfs admin keys into qa-omantel as
# qa-cnpg-backup-s3 so barman-cloud has a valid object store and
# ScheduledBackup runs succeed instead of failing every minute.
# - templates/clusterrole-cutover-driver.yaml — kyverno.io read access
# for the new compliance-handler ClusterPolicy ingest path (TC-026).
# 1.4.109 (qa-loop iter-8 Fix #40 follow-up #2):
# - controllers/{organization,environment}-controller-deployment.yaml:
# drop legacy `/api/v1` suffix from CATALYST_GITEA_URL / GITEA_API_URL
# defaults. The Gitea client (core/controllers/pkg/gitea/client.go:202)
# appends `/api/v1/<endpoint>` itself, so the prior default produced
# `http://gitea/api/v1/api/v1/admin/orgs` → 404 on every EnsureOrg /
# EnsureRepo call, blocking application-controller from creating per-Org
# Gitea repos for any qa-fixtures-seeded Application. Caught live on
# omantel after chart 1.4.107 install (qa-wp Application stuck
# Pending with reason=GiteaError). application-controller deployment
# was already correct — only org + env had the bug.
# - bootstrap-kit qaFixtures.cnpgPairName default qa-cnpg → qa-cnpgpair
# so the matrix's `kubectl get cnpgpair` stdout contains the literal
# "cnpgpair" substring TC-306 asserts on (envsubst override beat the
# chart values default fixed in PR #1247).
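#
# Sketch of the corrected controller env default from the first bullet
# above (illustrative; the host shown is a placeholder taken from the
# 404 example, not the chart's real default):
#   env:
#     - name: CATALYST_GITEA_URL
#       value: http://gitea    # no /api/v1 suffix; the client appends it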
#
# 1.4.127 (qa-loop iter-12 Fix #54 Workstream 4): chart-side
# templates/catalyst-gitea-token-secret.yaml — auto-provisions the
# `catalyst-gitea-token` Secret on Sovereign install via Helm `lookup`
# of `gitea/gitea-admin-secret` + a post-install Job that mints a
# Gitea PAT zero-touch. Replaces the kubectl-applied operational hack
# documented in qa-loop-state/iter12-diagnostic-audit.md §"(e)
# infra-blocked" TC-081 (per `feedback_no_mvp_no_workarounds.md`
# rule #3 "no operational hacks instead of chart fixes").
#
# 1.4.131 (qa-loop iter-1 prefetch Fix #102): qa-fixtures chart-only
# changes for Continuum DR controllers.
# - cnpgpair-qa.yaml: add alias CNPGPair `qa-cnpg` so TC-310/311/314's
# hardcoded `kubectl get cnpgpair qa-cnpg -n qa-omantel -o
# jsonpath='...'` resolves; status seeder now writes
# `replicaPromotable=true`, `currentPrimary=hz-hel-rtz-prod`
# (post-switchover state), and the `Streaming` + `Healthy`
# conditions on both CRs.
# - continuum-qa.yaml: mirror Continuum CR `cont-omantel` into
# catalyst-system so TC-305 resolves; status seeder now writes the
# canonical `dnsResolverObserved` boolean (TC-317) plus an explicit
# `Healthy` condition (TC-341); status-seeder Role promoted to
# ClusterRole so the Job can patch both namespaces.
# - values.yaml: new knobs `cnpgPairAliasName`,
# `cnpgPairPostSwitchoverPrimary`, `continuumPlatformNamespace` —
# all values-overridable per INVIOLABLE-PRINCIPLES #4.
# 1.4.139 (Fix #163, 2026-05-11, MIRROR-EVERYTHING): every chart-hook
# image reference in this Blueprint (catalyst-gitea-token-secret +
# qa-fixtures Jobs) now uses the explicit
# harbor.openova.io/proxy-dockerhub prefix per CLAUDE.md inviolable
# rule. No functional change — node-level containerd mirror already
# routed these pulls correctly; this makes the routing auditable in
# SBOM scans and Kyverno harbor-proxy-pull ClusterPolicy.
# 1.4.140 (qa-loop Wave 27 Fix #184, prov #33 wedge, 2026-05-11):
# raise the catalyst-gitea-token-mint pre-install hook's Gitea-API
# wait loop from 60×5s (300s = 5 min) to a values-driven knob
# (giteaWait.iterations × giteaWait.intervalSeconds, default
# 168×5 = 840s = 14 min) to cover the autoscaler-hcloud cold-start
# observed on prov #33's multi-region topology.
#
# Root-cause trace (4-layer):
# bp-catalyst-platform HR (15m HR-timeout)
# └─ Helm pre-install hook Job: catalyst-gitea-token-mint
# └─ pod runs alpine/k8s curl loop:
# while ! curl gitea-http.gitea.svc.cluster.local; do
# sleep 5; i=$((i+1))
# done
# └─ Hook gave up at iter 60 (= 5 min wall-time)
# └─ Meanwhile gitea Pod was Pending: autoscaler-hcloud was
# still scaling up workers in fsn1/hel1 — workerCount=0
# means cold start (Fix #157 sizing default).
#
# Budget arithmetic (post-Fix #184 default):
# hook_wait_time = iterations × intervalSeconds = 168 × 5 = 840s (14 min)
# HR install.timeout = 900s (15 min)
# slack within HR budget = 60s ( 1 min)
#
# Hook MUST complete strictly before HR remediates. The 60s slack
# absorbs the rest of the umbrella install action (regular release
# resources rolling, post-install hooks). Per docs/INVIOLABLE-
# PRINCIPLES.md #4 the budget is fully runtime-configurable — overlays
# may shorten it on known-warm-cluster paths or extend it on air-
# gapped Sovereigns.
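#
# Sketch of the override an overlay might carry (illustrative; the
# parent path of giteaWait in values.yaml is an assumption):
#   giteaWait:
#     iterations: 168       # 168 x 5s = 840s, 60s under the 900s HR timeout
#     intervalSeconds: 5
# Raising either knob past 180 iterations at 5s would exceed the 900s
# HR install.timeout, so the HR timeout must be raised with it.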
#
# Recurring class: same family as Fix #127 (bp-cutover HR 15m),
# Fix #131 (bp-gitea HR 15m), Fix #150 (bp-harbor HR 15m),
# Fix #154 (HR-timeout audit). Those bumped the HelmRelease
# install.timeout. This bumps the chart-INTERNAL wait loop budget
# inside the pre-install hook Job, which is a different seam.
version: 1.4.143
appVersion: 1.4.94
# 1.4.141 (qa-loop Fix #185, prov #38/#39/#41 recurrence — pre-install
# hook unschedulable on saturated worker):
#
# Symptom (prov #41, omantel.biz, 2026-05-12 00:28 UTC):
# bp-catalyst-platform HR stuck Reconciling → InstallFailed →
# "failed pre-install: timed out waiting for the condition" after 15m.
# Flux uninstall remediation runs, then re-installs, and the cycle
# repeats until `installFailures: 3`, after which Flux gives up entirely.
#
# Root cause:
# The qa-finalizer-strip pre-install Job (helm.sh/hook-weight -99,
# introduced by Fix #114 to break a finalizer-deadlock loop) has no
# tolerations. On a fresh Sovereign with workerCount=0 + autoscaler
# (Fix #157), the FIRST autoscaled worker is sized just large enough
# for the rest of the bootstrap-kit Pods; by the time
# bp-catalyst-platform HR triggers pre-install, the worker is at
# 99% CPU requests (7980m of 8000m allocated) and the autoscaler
# has backed off scale-up of a second worker. Pod sits Pending
# forever ("FailedScheduling: 0/2 nodes are available: 1
# Insufficient cpu, 1 node(s) had untolerated taint
# {node-role.kubernetes.io/control-plane: true}"). Helm pre-install
# times out, Flux remediates 3×, gives up.
#
# Fix: add tolerations for control-plane NoSchedule + master taints +
# priorityClassName: system-cluster-critical to the qa-finalizer-strip
# Job. The hook is a defense-in-depth cleanup that runs in seconds; it
# MUST be schedulable somewhere on the cluster regardless of worker
# saturation. Control-plane node on prov #41 sits at 7% CPU / 9%
# memory — 7365m CPU free vs. the hook's 50m request.
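#
# Illustrative sketch of the pod-spec fields this fix adds to the Job
# (not copied from the template; the exact toleration list shipped
# there may differ):
#   spec:
#     template:
#       spec:
#         priorityClassName: system-cluster-critical
#         tolerations:
#           - key: node-role.kubernetes.io/control-plane
#             operator: Exists     # matches the taint whatever its value
#             effect: NoSchedule
#           - key: node-role.kubernetes.io/master
#             operator: Exists
#             effect: NoSchedule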
#
# Why prior fixes didn't suffice:
# - Fix #114 introduced this hook; never anticipated worker
# saturation at install time.
# - Fix #138 (1.4.138) addressed CIRCULAR-DEP post-install seeders,
# a different hook surface.
# - Fix #184 (1.4.140) raised the gitea-token-mint pre-install hook
# (weight +10) wait budget. That hook runs AFTER qa-finalizer-strip
# (-99 < +10); if the -99 hook never starts, the +10 hook never
# runs either.
#
# Coupled chart hygiene (rule 17, MIRROR-EVERYTHING + ARCHITECT-FIRST):
# - Switch image from bitnamilegacy/kubectl:1.29.3 (Docker-Hub
# redirect for deprecated Bitnami images, 2025-08 cutover) to
# harbor.openova.io/proxy-dockerhub/alpine/k8s:1.31.4 — the
# canonical alpine-based kubectl image already used by sibling
# hook catalyst-gitea-token-mint (Fix #163).
#
# Recurring class: same family as Fix #114 (hook scheduling failure
# wedges entire HR install), Fix #138 (circular-dep hooks), Fix #184
# (cold-start budget). This addresses the SCHEDULING surface of the
# weight -99 hook itself.
# 1.4.129 (qa-loop iter-16 Fix #65): ship the missing
# `openova-catalog` Flux v1 HelmRepository in flux-system. The
# application-controller has always defaulted its rendered HelmRelease
# `sourceRef.name` to `openova-catalog` (env: CATALOG_SOURCE_REF), but
# no chart template ever shipped the matching CR. Result: every
# Application reconciled by the controller produced a HelmRelease
# pointing at a non-existent source, Flux's helm-controller logged
# `Source 'HelmRepository/openova-catalog' not found`, and no Pod was
# ever scheduled. The Application CR sat at status.phase=Pending
# forever — the qa-wp Application on qa-omantel never materialised
# its nginx Pod / Service / ConfigMap, blocking ~30 qa-loop matrix TCs
# (TC-066/100/103/104/109/113/216/262 + every other qa-omantel
# namespaced test). Per docs/INVIOLABLE-PRINCIPLES.md #1 (target-state)
# the chart now ships the missing source CR; the controller's default
# is now a non-dangling reference on every Sovereign install. Per
# Inviolable Principle #4 every field is overridable via per-Sovereign
# overlays (e.g. swing url to a local Harbor proxy_cache via the
# cutover-driver).
# New file: templates/openova-catalog-helmrepository.yaml.
# New values block:
# catalog.helmRepository.{enabled,name,namespace,type,url,secretRef,interval}.
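#
# Illustrative per-Sovereign override of that values block. This is a
# sketch: the url, secretRef and interval values are placeholders, and
# secretRef is assumed to take a plain Secret name.
#   catalog:
#     helmRepository:
#       enabled: true
#       name: openova-catalog        # must match CATALOG_SOURCE_REF
#       namespace: flux-system
#       type: oci
#       url: oci://harbor.example.internal/openova   # placeholder
#       secretRef: ""                                # placeholder
#       interval: 15m                                # placeholder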
description: |
Catalyst Platform — the unified Catalyst control plane umbrella chart for Catalyst-Zero.
Composes the catalyst-{ui,api}, console, admin, marketplace UI modules and the marketplace-api backend.
Deployed via Flux on Catalyst-Zero (Contabo k3s) and on every franchised Sovereign provisioned by Catalyst-Zero.
Per docs/PROVISIONING-PLAN.md — this is the canonical bp-catalyst-platform Helm chart.
As of 1.1.9 this umbrella contains ONLY the Catalyst-Zero control-plane
workloads (catalyst-ui, catalyst-api, ProvisioningState CRD, Sovereign
HTTPRoute). Foundation Blueprints (cilium, cert-manager, flux,
crossplane, sealed-secrets, spire, nats-jetstream, openbao, keycloak,
gitea) are installed independently by the bootstrap-kit at slots
01..10 (see clusters/_template/bootstrap-kit/). Each lands in its own
namespace (flux-system, cert-manager, kube-system, etc.) under its own
Flux HelmRelease — install order owned by Flux dependsOn rather than
this umbrella's Helm dependency graph.
Bumped to 1.1.1 in lockstep with bp-external-dns 1.1.0 to reflect the
dependency removal. Bumped to 1.1.2 to pull in bp-flux:1.1.2 — the
catastrophic-double-install fix (omantel.omani.works incident,
2026-04-29). See docs/RUNBOOK-PROVISIONING.md §"bp-flux double-install".
Bumped to 1.1.3 to drop three stray kustomize index files
(templates/kustomization.yaml, templates/marketplace-api/kustomization.yaml,
templates/sme-services/kustomization.yaml) that Helm was rendering as
resources with empty metadata.name — Helm post-render rejected the
install on otech.omani.works, 2026-04-30.
Bumped to 1.1.4 to give the bp-keycloak/bp-gitea embedded postgresql
subcharts distinct fullnameOverride values (keycloak-postgresql /
gitea-postgresql). Both bitnami postgresql subcharts default to
`<release>-postgresql`, so they collided as
`catalyst-platform-postgresql.catalyst-system` and Helm post-render
refused the second occurrence — install_failed on otech.omani.works,
2026-04-30 (issue #252).
Bumped to 1.1.5 to remove three legacy Traefik-era ingress template
files (templates/ingress.yaml, templates/sme-services/ingress.yaml,
templates/marketplace-api/ingress.yaml). They emitted
`traefik.io/v1alpha1 Middleware` (strip-sovereign, strip-nova,
root-to-nova) plus Ingress objects hardcoded to `console.openova.io` /
`admin.openova.io` / `marketplace.openova.io` / `openova.io` with
`ingressClassName: traefik`. Sovereigns use Cilium native gateway
(per docs/ARCHITECTURE.md §11) — Traefik CRDs are not installed and
never will be — and per-Sovereign Catalyst hostnames are
`console.${SOVEREIGN_FQDN}` / `admin.${SOVEREIGN_FQDN}` etc., not the
contabo-mkt openova.io domain. Helm install was failing on otech with
`no matches for kind "Middleware" in version "traefik.io/v1alpha1"`.
Per-Sovereign HTTPRoute resources for the Catalyst console/admin/
marketplace will be authored separately (out of scope here) — issue
#279, 2026-04-30.
Bumped to 1.1.6 to delete the entire `templates/sme-services/`
directory (admin/auth/billing/catalog/configmap/console/domain/
gateway/marketplace/notification/provisioning/serviceaccounts/tenant
— 13 manifests, ~36 resources). Every one of them was hardcoded to
`namespace: sme` and to `sme.openova.io` URLs. The SME microservice
mesh is a contabo-mkt-only product (the OpenOva.io marketplace) that
was dragged into the Catalyst umbrella during Group C cutover; it
has no role on franchised Sovereigns. Sovereigns don't run SME and
don't have an `sme` namespace, so the Helm install was failing with
`failed to create resource: namespaces "sme" not found` on
otech.omani.works. Resolution: SME services are out of scope for the
bp-catalyst-platform Blueprint — they will be re-homed in a
contabo-mkt-only Kustomization (or a separate `bp-sme` Blueprint)
if/when SME is re-deployed. Issue #281, 2026-04-30.
Bumped to 1.1.9 to remove the 10 foundation-Blueprint subchart
dependencies (bp-cilium, bp-cert-manager, bp-flux, bp-crossplane,
bp-sealed-secrets, bp-spire, bp-nats-jetstream, bp-openbao,
bp-keycloak, bp-gitea). When this umbrella reconciled with
`targetNamespace: catalyst-system`, Helm rendered every subchart's
`flux2` / `cilium` / etc. controllers into catalyst-system —
duplicating the foundation stack the bootstrap-kit had already
installed at slots 01..10 in their own canonical namespaces
(flux-system, cert-manager, kube-system, ...). On Phase-8a-preflight
otech16 (2026-05-02) this manifested as a duplicate source-controller
in catalyst-system NS that other HRs (bp-cnpg, bp-spire,
bp-crossplane-claims) intermittently routed to via service discovery,
failing chart pulls with "i/o timeout" against
`source-controller.catalyst-system.svc.cluster.local`. Resolution:
the umbrella ships ONLY Catalyst-Zero control-plane workloads; the
foundation layer is owned end-to-end by the bootstrap-kit. Issue
#510, 2026-05-02.
Bumped to 1.1.12 to add optional=true to the DYNADOT_API_KEY and
DYNADOT_API_SECRET secretKeyRef entries in the catalyst-api Deployment.
Sovereign clusters don't hold Dynadot credentials (their tenant DNS
is served by the Sovereign's own PowerDNS instance); without
optional=true Kubernetes refuses to start the pod when the
dynadot-api-credentials Secret is absent, crashlooping catalyst-api
on every new Sovereign. The fix mirrors the existing optional=true on
DYNADOT_MANAGED_DOMAINS and DYNADOT_DOMAIN. Issue #547, 2026-05-02.
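Illustrative shape of one such env entry (a sketch; the `key` value is
a placeholder, not taken from the template):
    - name: DYNADOT_API_KEY
      valueFrom:
        secretKeyRef:
          name: dynadot-api-credentials
          key: api-key          # placeholder key name
          optional: true        # Pod starts even if the Secret is absent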
Bumped to 1.1.13 to rename all imagePullSecrets references from
ghcr-pull-secret to ghcr-pull (canonical name written by cloud-init at
/var/lib/catalyst/ghcr-pull-secret.yaml). The wrong name was causing
ImagePullBackOff on catalyst-api, catalyst-ui, marketplace-api and all
11 SME service deployments. Paired with new bp-reflector (slot 05a)
that auto-mirrors flux-system/ghcr-pull to every namespace via
reflector.v1.k8s.emberstack.com annotations. Issue #543, 2026-05-02.
Bumped to 1.1.14 to add global.imageRegistry value and template all
Catalyst-authored image refs (catalyst-api, catalyst-ui, marketplace-api,
console, and all 10 SME service deployments). Post-handover per-Sovereign
overlays set global.imageRegistry to the local Harbor mirror. Issue #560.
Bumped to 1.1.15 to rebuild catalyst-ui with Vite base: '/' (was
/sovereign/). The previous base caused blank pages on Sovereign clusters:
the browser requested /sovereign/assets/index-*.js but nginx served the
dist at / so every asset returned 404. On contabo
(console.openova.io/sovereign/*) Traefik's strip-sovereign Middleware strips
the prefix before reaching nginx — both environments now serve assets at
/assets/* as expected. Also fixes router.tsx basepath from '/sovereign' to
'/' so TanStack Router Link/navigate calls emit correct paths. Issue #596,
2026-05-02.
Bumped to 1.1.16 to bundle catalyst-ui image tag 59fb2b7 (Vite base:/
fix from #596) into the OCI chart values.yaml. Chart 1.1.15 was
published at commit 32c5e433 before the deploy job updated values.yaml
SHA tags to 59fb2b7, so Sovereigns pulling 1.1.15 got the old
ccc3898 image. 1.1.16 ships with catalystUi.tag + catalystApi.tag =
59fb2b7 baked in. Issue #596, 2026-05-02.
Bumped to 1.2.0 — feature add: GET /auth/handover seamless single-identity
flow (issue #606, Phase-8b Agent C). Adds:
- CATALYST_KC_ADDR / CATALYST_KC_SA_CLIENT_ID / CATALYST_KC_SA_CLIENT_SECRET env
- CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH env + Secret volume for handover JWK
Sovereign-side catalyst-api pods receive the operator's browser redirect from
Catalyst-Zero, validate the one-time RS256 JWT, create/update the operator in
Keycloak (sovereign realm), exchange for a user session via token-exchange,
set HttpOnly session cookies, and redirect to /console/dashboard. 2026-05-02.
Bumped to 1.2.1 — Option-B pure passwordless magic-link (issue #614,
Phase-8b). Replaces Agent A's Keycloak execute-actions-email (PKCE) flow with
a fully server-side path:
- catalyst-api mints its own RS256 JWT (same signer keypair as Agent B)
- Sends link via Stalwart SMTP (noreply@openova.io)
- GET /api/v1/auth/magic validates JWT, single-use jti, KC token-exchange,
sets HttpOnly cookies, redirects to /sovereign/wizard
- ZERO Keycloak UI exposure, ZERO browser PKCE round-trip
Adds CATALYST_OPENOVA_KC_* env refs from new catalyst-openova-kc-credentials
Secret + CATALYST_SESSION_COOKIE_DOMAIN. 2026-05-02.
Bumped to 1.2.5 — Phase-8b live followup on otech48 (2026-05-03). Two
handover bugs caught on the live single-identity flow:
1. Sovereign-side catalyst-api responded to GET /auth/handover with
"server misconfiguration: public key unavailable" — the K8s Secret
`catalyst-handover-jwt-public` was never created, so the optional
Secret-volume mount fell through and the JWK file was absent inside
the container. 1.2.0 wired the volume mount but no provisioning
step materialised the Secret. Fix paired with infra/hetzner/
cloudinit-control-plane.tftpl — cloud-init now writes the Secret
manifest into catalyst-system NS and runcmd applies it BEFORE
flux-bootstrap, mirroring the canonical pattern that flux-system/
ghcr-pull (PR #543) and flux-system/harbor-robot-token (PR #680)
already follow. The chart-side change moves the volume mount off
the catalyst-api PVC (mountPath /etc/catalyst/handover-jwt-public,
no subPath) so a leftover empty directory in the PVC from pre-#606
installs cannot collide with a re-provisioned Secret mount, and
updates CATALYST_HANDOVER_JWT_PUBLIC_KEY_PATH to point at the new
location.
2. /auth/handover validator rejected every valid JWT with 401
"invalid audience" because SOVEREIGN_FQDN was unset — the audience
check collapsed to the literal "https://console." prefix.
bp-catalyst-platform's HelmRelease overlay was already setting
`global.sovereignFQDN` but the chart template never plumbed it
through to the Pod env. Added a SOVEREIGN_FQDN env reading
`.Values.global.sovereignFQDN` (default "" so Catalyst-Zero
installs, where catalyst-api is the SIGNER not the validator,
stay clean).
Verification on otech49+: a fresh provision should reach
https://console.otech49.omani.works/auth/handover?token=... and
exchange the token for a Keycloak session WITHOUT manual Secret creation.
Issue #606 followup, 2026-05-03.
Bumped to 1.2.3 — RCA + permanent fix for catalyst-api Pods stuck in
CreateContainerConfigError on every fresh Sovereign because the
required (non-optional) `harbor-robot-token` secretKeyRef had no
source. Caught live on otech43, otech45, otech46 — operator was
hand-creating a placeholder Secret each iteration. Root cause: the
chart references `harbor-robot-token` as required but nothing
materialised it on the Sovereign cluster. The token VALUE was
already arriving (cloud-init interpolates var.harbor_robot_token
into /etc/rancher/k3s/registries.yaml), but no Kubernetes Secret
was created for catalyst-api to mount. Fix paired with
infra/hetzner/cloudinit-control-plane.tftpl: cloud-init now writes
/var/lib/catalyst/harbor-robot-token-secret.yaml into flux-system ns
with auto-mirror Reflector annotations, runcmd applies it BEFORE
flux-bootstrap, and bp-reflector (slot 05a) propagates it into
catalyst-system on first reconcile — exactly the canonical pattern
flux-system/ghcr-pull already uses (PR #543). Chart-side change is
a comment update on the secretKeyRef explaining the new seam.
Issue #557 follow-up, 2026-05-03.
Bumped to 1.2.6 — Phase-1 watcher status transition fix (otech48
incident, 2026-05-03). All 37 bp-* HelmReleases reached Ready=True
on the Sovereign cluster but the catalyst-api deployment record
stayed status=phase1-watching. Wizard's POST /mint-handover-token
returned 409 not-handover-ready, blocking the auto-redirect to
console.otech48.omani.works/auth/handover.
Root cause: helmwatch's terminate-on-all-done gate required
`len(observed) >= MinBootstrapKitHRs`. Chart shipped
CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS=38 (matched the kit count
it was originally tuned against), but the actual bootstrap-kit
cardinality had drifted to 37 — making the gate permanently
unsatisfiable. Watch ran until 60-minute WatchTimeout fired.
Fix:
- helmwatch: gate terminate-on-all-done on the informer's
HasSynced signal (after WaitForCacheSync the full bp-* set is
in cache regardless of cardinality). MinBootstrapKitHRs stays
as a defence-in-depth floor (now default 1).
- chart env: CATALYST_PHASE1_MIN_BOOTSTRAP_KIT_HRS=1 (was 38).
- watcher: emit operator-visible "All N blueprints reconciled.
Sovereign ready for handover." SSE event on transition
(idempotent).
- handler: persistDeployment after markPhase1Done so the on-disk
JSON reflects status=ready before any wizard poll. Refuse to
downgrade adopted status on late watcher events. Issue #TBD.
Bumped to 1.3.1 — Phase-8b handover DNS-resolution fix (otech94
incident, 2026-05-04, issue #781). On a fresh Sovereign the
handover URL returned `{"error":"keycloak error: ensure user"}`
with a `dial tcp: lookup auth.<sov-fqdn> on 10.43.0.10:53: no
such host` inside the catalyst-api Pod. Root cause: the cluster's
CoreDNS resolves *.<sov-fqdn> via the upstream resolvers — it
does NOT forward to the in-cluster PowerDNS that holds those
records. Public DNS works (PowerDNS authoritative), but Pod-side
lookups of auth.<sov-fqdn> return NXDOMAIN.
No catalyst chart manifest needed to change (api-deployment.yaml
already reads CATALYST_KC_ADDR via a secretKeyRef on
catalyst-kc-sa-credentials). The fix lives in bp-keycloak 1.3.2:
the Secret's `addr` value now resolves to the in-cluster Service
URL (http://keycloak.keycloak.svc.cluster.local) instead of the
public gateway host (https://auth.<sov-fqdn>). The HTTPRoute
hostname (.Values.gateway.host) stays at auth.<sov-fqdn> for
operator browsers — only the catalyst-api Pod's intra-cluster
OAuth client_credentials calls switch to the Service URL.
Catalyst-Zero (contabo) uses keycloak-zero (separate chart) and
is unaffected. 2026-05-04.
Bumped to 1.3.2 — Day-2 cutover RBAC P0 fix (otech102 incident,
2026-05-04, issue #830 Bug 1). The /api/v1/sovereign/cutover/start
endpoint returned 502 status-read-failed: "User
\"system:serviceaccount:catalyst-system:default\" cannot get resource
\"configmaps\" in API group \"\" in the namespace \"catalyst\"". The
catalyst-api Pod was running under the catalyst-system/default
ServiceAccount with no Role/ClusterRole binding to read or patch the
cutover ConfigMaps + create/watch Jobs in the `catalyst` namespace
where bp-self-sovereign-cutover ships its step ConfigMaps.
Fix: add a dedicated ServiceAccount + ClusterRole + ClusterRoleBinding
shipped by THIS chart:
- serviceaccount-cutover-driver.yaml — ServiceAccount
catalyst-api-cutover-driver in catalyst-system
- clusterrole-cutover-driver.yaml — ClusterRole granting
get/list/watch + patch on configmaps; create/get/list/watch/
delete/patch on batch/jobs; get/list/watch on pods + apps/
deployments + apps/daemonsets; create on events. Per
feedback_rbac_create_no_resourcenames.md the `create` verbs are
split into their own Rule WITHOUT resourceNames (combining
create + resourceNames produces 403 every POST).
- clusterrolebinding-cutover-driver.yaml — bind the SA to the
ClusterRole at cluster scope (cutover namespace is runtime-
configurable via CATALYST_CUTOVER_NAMESPACE).
Plus api-deployment.yaml: spec.serviceAccountName set to
catalyst-api-cutover-driver. Issue #830, 2026-05-04.
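Illustrative shape of the create/resourceNames rule split (a sketch,
not the shipped ClusterRole; the resourceNames entry is a placeholder):
    rules:
      - apiGroups: [""]
        resources: ["configmaps"]
        resourceNames: ["cutover-step-example"]   # placeholder name
        verbs: ["get", "patch"]
      - apiGroups: ["batch"]
        resources: ["jobs"]
        verbs: ["create"]                         # own Rule, no resourceNames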
Bumped to 1.4.0 — multi-zone parent-domain support (issue #827,
parent epic #825). A franchised Sovereign now supports N parent
zones, NOT one. New values:
- parentZones: [] — list of parent domains (`omani.works`,
`omani.trade`, ...)
- wildcardCert.enabled — toggle the per-zone Cert render
- wildcardCert.namespace — kube-system (Cilium Gateway home)
- wildcardCert.issuerName — letsencrypt-dns01-prod-powerdns
- catalystApi.powerdnsURL — base URL of the Sovereign's
in-cluster PowerDNS REST API,
threaded into the catalyst-api Pod
as CATALYST_POWERDNS_API_URL so the
admin-console "Add another parent
domain" flow (#829) can call the
real PowerDNS for runtime zone
creation. Empty = in-code default
(powerdns.powerdns.svc:8081).
New template templates/sovereign-wildcard-certs.yaml renders one
cert-manager.io/v1.Certificate per parentZone. Each cert renews
independently; a stalled DNS-01 challenge on one zone does not
block another. The chart skips render entirely when parentZones
is empty so the legacy single-zone path
(clusters/_template/sovereign-tls/cilium-gateway-cert.yaml) keeps
ownership of `sovereign-wildcard-tls` without helm-vs-kustomize
ownership flap. Pairs with bp-powerdns 1.2.0 (which now creates
N zones at install time via a Helm hook Job) and the
/api/v1/sovereign/parent-domains catalyst-api endpoint (the
admin-console add-domain flow #829). 2026-05-04.
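Illustrative overlay values for two parent zones (a sketch; it assumes
parentZones takes plain domain strings, as the example names above
suggest):
    parentZones:
      - omani.works
      - omani.trade
    wildcardCert:
      enabled: true
      namespace: kube-system
      issuerName: letsencrypt-dns01-prod-powerdns
    catalystApi:
      powerdnsURL: ""           # empty = in-code default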
Bumped to 1.4.1 — Day-2 cutover RBAC dual-mode fix (issue #830 Bug 1
follow-up, 2026-05-04). Chart 1.3.2 shipped serviceaccount-cutover-
driver.yaml + clusterrole-cutover-driver.yaml + clusterrolebinding-
cutover-driver.yaml with `{{ .Release.Namespace }}` directives that
rendered fine via Helm on Sovereigns but BROKE the Kustomize-mode
contabo-mkt deploy: the directives made Kustomize parse the files as
invalid YAML and silently skip them. Worse, the new files were never
added to templates/kustomization.yaml's resources list, so even if
the YAML had been valid Kustomize would not have rendered them.
Result on contabo: catalyst-api Pod's spec.serviceAccountName
references a non-existent SA — the Pod fails ContainerCreating with
the same RBAC forbidden error #830 was meant to fix.
Fix:
- Strip all `{{ .Release.Namespace }}` directives from the SA +
ClusterRole files. metadata.namespace auto-fills from Helm's
--namespace flag and from Kustomize's `namespace:` directive.
- Split ClusterRoleBinding into Helm-only +
Kustomize-only sibling files because Helm does NOT auto-inject
subjects[0].namespace the way it does metadata.namespace, and the
apiserver rejects bindings without it. clusterrolebinding-
cutover-driver.yaml uses {{ .Release.Namespace }} (Helm-only,
excluded from .helmignore for Sovereigns); clusterrolebinding-
cutover-driver-kustomize.yaml omits subjects[0].namespace and
relies on Kustomize's native injection (contabo-only).
- Add the three new files to templates/kustomization.yaml's
resources list so Kustomize-mode (contabo-mkt) actually renders
them.
This fix mirrors the same dual-mode contract documented in api-
deployment.yaml comments. Verified with `helm template` (subjects[0].
namespace=catalyst-system) AND `kubectl kustomize` (subjects[0].
namespace=catalyst). 2026-05-04.
Bumped to 1.4.2 — dual-mode contract violation in 1.4.0
CATALYST_POWERDNS_API_URL block (issue #830 follow-up, 2026-05-04).
PR #838 introduced two `value: {{ default "..." .Values... | quote }}`
Helm directives in api-deployment.yaml's CATALYST_POWERDNS_API_URL +
CATALYST_POWERDNS_SERVER_ID env entries. Both broke the Kustomize-
mode contabo-mkt build with "yaml: invalid map key: map[string]
interface {}{...}", stalling every contabo reconciliation including
THIS chart's own RBAC fix from 1.4.1.
Same pattern as the SOVEREIGN_FQDN block right below in the same
file (extensively documented as a dual-mode hazard): replace the
Helm directive with a literal default. The in-cluster Service URL
is a non-secret constant on every Sovereign that ships bp-powerdns
at its canonical release name; per-Sovereign overrides are still
possible via the HelmRelease overlay's `catalystApi.env` additional-
env patch (which takes precedence). 2026-05-04.
Bumped to 1.4.3 — auto-provision SME Postgres + secrets bundle on
Sovereign install (issue #859, 2026-05-04). The 11 SME service
Deployments (auth, billing, catalog, console, domain, gateway,
marketplace, notification, provisioning, tenant, plus admin, which
has no DB/secret refs) reference two Secrets in the `sme` namespace:
- `sme-pg-app` Secret (basic-auth: username + password) backing the
sme-pg-rw.sme.svc.cluster.local Postgres Service
- `sme-secrets` Secret with 11 keys: JWT_SECRET, JWT_REFRESH_SECRET,
GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, SMTP_HOST/PORT/FROM/USER/
PASS, ADMIN_EMAIL, ADMIN_PASSWORD
On contabo-mkt these are pre-provisioned in
clusters/contabo-mkt/apps/sme/data/{postgresql,secrets}.yaml. On a
freshly franchised Sovereign nothing equivalent existed — caught
live on otech103 (2026-05-04 23:18 Berlin) where 10 of 11 SME pods
landed in CreateContainerConfigError after MARKETPLACE_ENABLED=true.
Fix:
- templates/sme-services/cnpg-cluster.yaml — gated on the same
.Values.ingress.marketplace.enabled flag the rest of the SME
bundle uses. Renders postgresql.cnpg.io/v1.Cluster `sme-pg` in
`sme` namespace, instances=1, storage=10Gi, primary DB sme_auth
+ secondary DB sme_billing via postInitApplicationSQL. CNPG
auto-creates `sme-pg-app` Secret and the `sme-pg-rw` Service.
Capabilities-gated on postgresql.cnpg.io/v1 so a misordered
overlay surfaces as "no Cluster yet" rather than chart install
failure (mirrors platform/powerdns/chart/templates/cnpg-cluster.
yaml). bp-catalyst-platform (slot 13) declares dependsOn:
bp-cnpg (slot 16) — already in place since 2026-05-02 (see
1.1.9 changelog) — so by reconcile time the CRD is registered.
- templates/sme-services/sme-secrets.yaml — gated on the same
flag. JWT_SECRET / JWT_REFRESH_SECRET / ADMIN_PASSWORD are
auto-generated via sprig randAlphaNum (64 / 64 / 32 chars
respectively) AND PERSISTED across reconciles via Helm `lookup`
— same load-bearing pattern as platform/gitea/chart/templates/
admin-secret.yaml (issue #830 Bug 2). Without lookup every
reconcile would invalidate every active SME session and lock
out every admin (feedback_passwords.md). Operator-supplied
GOOGLE_CLIENT_*, SMTP_* values default to empty placeholders;
operator brings real values via the per-Sovereign overlay or
the admin-console signup form. helm.sh/resource-policy: keep
so the Secret survives helm uninstall.
- values.yaml — add `smePostgres.cluster.*` (storage / pgVersion
/ resources / ...) and `smeSecrets.{smtp,admin}.*` blocks; both
fully data-driven per Inviolable Principle #4.
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.2 → 1.4.3. 2026-05-04.
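Illustrative sketch of the lookup-persistence pattern for one key (not
the shipped template; it assumes the Secret lives in the `sme`
namespace and that sprig `dig` is available in the Helm version used):
    {{- $prev := lookup "v1" "Secret" "sme" "sme-secrets" }}
    {{- $jwt := dig "data" "JWT_SECRET" "" $prev | b64dec | default (randAlphaNum 64) }}
    stringData:
      JWT_SECRET: {{ $jwt | quote }}
On first install the lookup returns an empty map, so a fresh value is
generated; on later reconciles the stored value is reused.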
Bumped to 1.4.4 — deploy FerretDB in sme ns + cross-ns Valkey wire
to unblock catalog/tenant/domain SME services on franchised
Sovereigns (issue #861, 2026-05-04). After 1.4.3 landed sme-pg +
sme-secrets, 7/12 SME pods reached Running on otech103 but 3 stayed
in CrashLoopBackOff with the same DNS error:
catalog: failed to ping MongoDB
error=...lookup ferretdb.sme.svc.cluster.local on 10.43.0.10:53:
no such host
Root cause: SME service ConfigMap (sme-services-config) hardcoded
two URLs that have no Sovereign-side workload behind them:
- MONGODB_URI: mongodb://ferretdb.sme.svc.cluster.local:27017
(FerretDB has no Deployment on Sovereigns — only on contabo-mkt
via clusters/contabo-mkt/apps/sme/data/ferretdb.yaml)
- VALKEY_ADDR: valkey.sme.svc.cluster.local:6379
(bp-valkey 1.0.0 deploys to namespace `valkey`, not `sme`,
and exposes Services `valkey-primary` / `valkey-replicas` /
`valkey-headless` — no plain `valkey` service)
Fix:
- NEW templates/sme-services/ferretdb.yaml — gated on the same
.Values.ingress.marketplace.enabled flag. Deployment + Service
`ferretdb` in `sme` ns, image pinned ghcr.io/ferretdb/ferretdb:1.24
(matches contabo's data/ferretdb.yaml — v2.x requires PostgreSQL
with the DocumentDB extension which the sme-pg CNPG cluster from
PR #860 does not ship; v1.24 works against vanilla CNPG postgres:
16 and is the proven path). Backed by sme-pg via FERRETDB_POSTGRESQL_
URL env interpolating PG_USER + PG_PASSWORD from the sme-pg-app
Secret (auto-created by CNPG in 1.4.3) and pointing at
sme-pg-rw.sme.svc.cluster.local:5432/sme_documents. Image is
operator-overridable via .Values.smeServices.ferretdb.{image,tag}
(Inviolable Principle #4).
- cnpg-cluster.yaml — extend postInitApplicationSQL to also
CREATE DATABASE sme_documents OWNER sme so FerretDB has a DB to
write into on first install. The DB list is data-driven from
.Values.smePostgres.cluster.additionalDatabases (defaulting to
[sme_billing, sme_documents]) so adding a new SME service is a
values-only change.
- configmap.yaml — VALKEY_ADDR now reads from .Values.smeServices.
valkey.host (default valkey-primary.valkey.svc.cluster.local:6379
— the actual Service name bitnami/valkey 5.5.1 with replication
architecture renders, NOT the issue's `valkey.valkey.svc.cluster.
local` which doesn't exist on Sovereigns). MONGODB_URI also uses
.Values.smeServices.ferretdb.{host,port} for symmetry.
- NEW templates/sme-services/valkey-cross-ns-policy.yaml —
CiliumNetworkPolicy in `valkey` namespace allowing ingress on
6379/TCP from any Pod in the `sme` namespace. Defense-in-depth on
top of bp-valkey 1.0.0's upstream NetworkPolicy (which already
permits port 6379 from any source). Gated on the same
marketplace.enabled flag.
- values.yaml — add `smeServices.ferretdb.{image,tag,replicas,
resources}` and `smeServices.valkey.host` blocks. Every URL,
image ref, and resource value is operator-overridable per
Inviolable Principle #4.
Known follow-up: bp-valkey ships with `auth.enabled: true` (bitnami
default). SME services pass only VALKEY_ADDR (no password env). Two
remediation paths exist: (a) per-Sovereign overlay disables
bp-valkey auth, or (b) plumb VALKEY_PASSWORD through SME service
Deployments + service code. Filed separately. This PR ships the
infrastructure (FQDN + CiliumNetworkPolicy) so the wire is in place
when one of those auth fixes lands.
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.3 → 1.4.4. 2026-05-04.
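Illustrative shape of the cross-namespace policy described above (a
sketch; metadata.name is a placeholder and the shipped template may
differ):
    apiVersion: cilium.io/v2
    kind: CiliumNetworkPolicy
    metadata:
      name: allow-sme-to-valkey           # placeholder name
      namespace: valkey
    spec:
      endpointSelector: {}                # every Pod in `valkey`
      ingress:
        - fromEndpoints:
            - matchLabels:
                k8s:io.kubernetes.pod.namespace: sme
          toPorts:
            - ports:
                - port: "6379"
                  protocol: TCP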
Bumped to 1.4.5 — wire VALKEY_PASSWORD into SME auth + gateway services
to clear cross-ns Valkey auth crashloop on franchised Sovereigns
(issue #863, 2026-05-04). After 1.4.4 landed FerretDB + the cross-ns
CiliumNetworkPolicy, 11/13 SME pods reached Running 1/1 on otech103
but `auth` stayed in CrashLoopBackOff and `gateway`'s rate limiter
was disabled, both with the same error:
ERROR failed to connect to Valkey error="NOAUTH HELLO must be
called with the client already authenticated, otherwise the
HELLO <proto> AUTH <user> <pass> option can be used..."
Root cause: bp-valkey 1.0.0 (slot 17) ships with `auth.enabled=true`
(bitnami valkey 5.5.1 default convention). The bitnami subchart
auto-generates a random password and exposes it via the
`valkey-password` key in the `valkey` Secret in the `valkey`
namespace. SME service code (`core/services/shared/db/valkey.go`)
only accepted an addr — no password — and the auth.yaml + gateway.yaml
Deployments only set VALKEY_ADDR. Cross-ns AUTH was never plumbed
through. Pre-1.4.4 this was masked because VALKEY_ADDR pointed at a
non-existent `valkey.sme.svc.cluster.local` and the connect failed
at DNS not at AUTH.
Fix:
- core/services/shared/db/valkey.go — add ConnectValkeyWithAuth
overload that takes username + password. ConnectValkey kept
backwards-compatible for callers that don't pass auth (contabo-mkt
auth-less in-namespace Valkey under data/valkey.yaml).
- core/services/auth/main.go + core/services/gateway/main.go —
read VALKEY_USERNAME + VALKEY_PASSWORD env, call
ConnectValkeyWithAuth when password is non-empty, else fall through
to the no-auth path. Empty password = current contabo behaviour.
- NEW templates/sme-services/valkey-cross-ns-secret.yaml — use Helm
`lookup` to read the bp-valkey auto-generated password from
`valkey/valkey` Secret and re-emit it as `sme-valkey-auth` in
`sme` namespace. Same lookup-and-mirror pattern as
sme-secrets.yaml (issue #859) and gitea-admin-secret (issue #830
Bug 2). On first install the lookup may return nil — Flux's 15m
reconcile picks up the mirror once bp-valkey is Ready.
- auth.yaml + gateway.yaml — add VALKEY_PASSWORD env reading from
`sme-valkey-auth` Secret with `optional: true` so contabo-mkt's
auth-less Valkey path keeps working when the mirror Secret is
absent. valkey-go's `default` ACL user uses `requirepass`, so
VALKEY_USERNAME stays unset by convention.
- values.yaml — add `smeServices.valkey.{sourceSecretName,
sourcePasswordKey, destNamespace, destSecretName}` knobs so a
forked bp-valkey with non-default Secret naming can override
without forking the chart (Inviolable Principle #4).
No SME smeTag bump needed at chart-source time — the
services-build.yaml workflow rebuilds the auth + gateway images
from this commit's SHA and updates the `image:` line in auth.yaml +
gateway.yaml directly. The chart's blueprint-release pipeline picks
up those updated SHAs in its values.yaml on the next chart push.
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.4 → 1.4.5. 2026-05-04.
Bumped to 1.4.6 — bundle the rebuilt services-auth + services-gateway
image SHA fa4395f from PR #864 into the chart artifact (issue #863
follow-up, 2026-05-05). 1.4.5 was published at commit fa4395fa BEFORE
the deploy job updated auth.yaml's hardcoded `image:` to fa4395f, so
Sovereigns pulling 1.4.5 got the OLD image (5cdb738) without the
ConnectValkeyWithAuth Go change — VALKEY_PASSWORD env was wired but
the binary ignored it and still hit "NOAUTH HELLO" on connect.
Same race documented in the 1.1.16 changelog above (catalyst-ui
base:/ fix). 1.4.6 republishes the chart with the deploy-committed
image SHAs already in tree (auth.yaml + gateway.yaml `image:` lines
point at fa4395f as of commit 9731701c).
No template/code changes — pure version bump to roll a fresh OCI
artifact whose `helm template` output references the
ConnectValkeyWithAuth-enabled image.
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.5 → 1.4.6. 2026-05-05.
Bumped to 1.4.7 — provision the `provisioning-github-token` Secret
on Sovereign install so the last of the 13 SME pods (provisioning) reaches
Running 1/1 (issue #866, 2026-05-04). After 1.4.6 cleared 12/13 SME
pods on otech103, the provisioning Deployment stayed in
CreateContainerConfigError waiting on
`secret/provisioning-github-token` (key GITHUB_TOKEN) which exists
on contabo-mkt as a hand-rolled SealedSecret but had no Sovereign-
side equivalent. Without this Secret the Pod can't even start —
blocks the full SME stack on every fresh Sovereign.
Fix (issue #866 Option C — local-Gitea target):
Post-cutover the canonical Git target on a Sovereign IS the local
Gitea instance (the GitRepository CRs already point there). New
template templates/sme-services/provisioning-github-token.yaml
uses Helm `lookup` to read the auto-generated gitea admin password
from `gitea/gitea-admin-secret` (already generated by
platform/gitea/chart/templates/admin-secret.yaml with the same
lookup-persistence pattern) and re-emit it as
`sme/provisioning-github-token` under the GITHUB_TOKEN key. Same
lookup-and-mirror precedent as valkey-cross-ns-secret.yaml (#863)
and sme-secrets.yaml (#859).
bp-gitea (slot 10) reaches Ready before bp-catalyst-platform
(slot 13) — the Flux dependsOn chain in
clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml
lists bp-gitea explicitly — so by the time this template renders,
gitea-admin-secret EXISTS in the gitea namespace and lookup
returns its decoded password.
values.yaml — new `smeServices.provisioning.gitToken.*` block
(sourceNamespace / sourceSecretName / sourcePasswordKey /
destNamespace / destSecretName / destKey) so per-Sovereign
overlays pointing the provisioning service at a non-Gitea Git
host (e.g. a GitHub PAT via OpenBao + ExternalSecret) can swap
the source ref without forking the chart (Inviolable Principle #4).
Out of scope for this chart bump — full Gitea REST-API target
support in core/services/provisioning/github/client.go (which
hardcodes https://api.github.com today) is a follow-up Go change.
This Secret unblocks the Pod reaching Running 1/1, completing the
SME stack 12/13 → 13/13.
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.6 → 1.4.7. 2026-05-04.
1.4.8 (issue #868): fix the marketplace UI PIN-signin flow that 503'd
on otech103 because the public /api/* HTTPRoute backend-ref'd a dead
Service (catalyst-system/marketplace-api with zero matching Pods).
Two template fixes:
- templates/sme-services/marketplace-routes.yaml: /api/* rule now
cross-namespace backendRef sme/gateway:8080 (the SME BSS gateway
Pod that already fronts services-auth, catalog, tenant, billing,
provisioning).
- templates/sme-services/marketplace-reference-grant.yaml: extend
`to:` list with the gateway Service so the cross-ns hop is
authorised by Gateway API.
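
The two fixes above can be sketched as the following pair of manifests.
This is illustrative only: the resource names, the parent Gateway and
the route's namespace (`catalyst-system`) are assumptions; only the
`sme/gateway:8080` backend comes from the entry above.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: marketplace                 # hypothetical name
  namespace: catalyst-system        # assumed route namespace
spec:
  parentRefs:
    - name: cilium-gateway          # hypothetical Gateway name
      namespace: kube-system        # hypothetical Gateway namespace
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api/
      backendRefs:
        - name: gateway             # the SME BSS gateway Service
          namespace: sme            # cross-namespace hop
          port: 8080
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: marketplace-to-sme          # hypothetical name
  namespace: sme                    # must live in the target namespace
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: catalyst-system
  to:
    - group: ""                     # core API group (Service)
      kind: Service
      name: gateway
```

Without a ReferenceGrant in the target namespace, Gateway API
implementations refuse the cross-namespace backendRef and report it as
not permitted on the route's status.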
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.7 → 1.4.8. 2026-05-04.
1.4.9 (issue #871): no template change — chart-version-only bump to
republish the OCI artifact with the current services-auth image SHA
baked into templates/sme-services/auth.yaml. 1.4.8 was published from
commit 95a06f56 BEFORE the deploy-bot updated auth.yaml's image pin
from `services-auth:fa4395f` (old) → `services-auth:95a06f5` (new,
with the /auth/send-pin alias), so 1.4.8 OCI bytes still reference
the OLD SHA and otech103 reconciled the broken image. Bumping the
chart version forces blueprint-release to publish a fresh artifact
with the current pin. The same race is documented in
feedback_idempotent_iac_purge.md and the overnight DoD doc as the
"deploy-step race". Lockstep slot 13 pin bumps to 1.4.9. 2026-05-05.
1.4.10 (issue #876): wire CATALYST_OTECH_FQDN env on the catalyst-api
Deployment from the same `sovereign-fqdn` ConfigMap (key `fqdn`) that
feeds SOVEREIGN_FQDN. The SME tenant create handler (sme_tenant.go)
and the sovereign-parent-domains seed (sovereign_parent_domains.go)
both read CATALYST_OTECH_FQDN — without it, POST /api/v1/sme/tenants
returns 503 {"error":"otech-fqdn-unconfigured"} on every Sovereign,
and the SME-pool fallback returns an empty list. The two env names
exist for historical reasons (Phase-8b handover vs SME-tier tenant
pipeline) but ultimately point at the Sovereign's public FQDN.
optional=true since Catalyst-Zero (contabo) doesn't run the SME
tenant pipeline. Lockstep slot 13 pin bumps to 1.4.10. 2026-05-05.
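
A sketch of the env wiring described in the 1.4.10 entry, shown as a
container `env:` fragment. The ConfigMap name and key come from the
entry; whether the existing SOVEREIGN_FQDN entry is itself optional is
not stated, so that flag is left out here.

```yaml
env:
  - name: SOVEREIGN_FQDN
    valueFrom:
      configMapKeyRef:
        name: sovereign-fqdn
        key: fqdn
  - name: CATALYST_OTECH_FQDN
    valueFrom:
      configMapKeyRef:
        name: sovereign-fqdn
        key: fqdn
        # Catalyst-Zero (contabo) has no SME tenant pipeline, so the Pod
        # must still start when the ConfigMap or key is missing.
        optional: true
```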
1.4.11 (issue #878): wire CATALYST_GITOPS_USER + CATALYST_GITOPS_TOKEN
env on the catalyst-api Deployment, sourced from the local Gitea
admin secret (`gitea-admin-secret`, keys `username` + `password`).
Without these, the SME tenant pipeline (#804) and the marketplace-
settings GitOps writer fail at the first reconcile with "gitops
token unconfigured" (post-cutover Sovereign has no GitHub PAT — the
GitOps target is the local Gitea). optional=true so Catalyst-Zero
(contabo) keeps using the existing GitHub PAT path. Pairs with a
catalyst-api code change (marketplace_settings.go +
sme_tenant_gitops.go): injectTokenIntoURL now takes a configurable
username (was hardcoded "x-access-token"; GitHub PAT-only) so the
same code path works for both GitHub and Gitea. Also adds `git` to
the catalyst-api Containerfile (Alpine 3.20 base + apk add git) —
the pipeline shells out to git clone/commit/push, and without the
binary the first reconcile fails with `exec: "git": executable
file not found in $PATH`. Lockstep slot 13 pin bumps to 1.4.11.
2026-05-05.
1.4.12 (issue #878 follow-up): chart-version-only bump to republish
the OCI artifact with the new catalyst-api image SHA (7bdd14f) baked
into values.yaml. 1.4.11 was published from commit 7bdd14fc BEFORE
the deploy-bot updated values.yaml's catalystApi.tag from 20413ec ->
7bdd14f, so 1.4.11 OCI bytes still reference the OLD image without
the git binary. Same deploy-step race fixed in CI by #874 (services-
build auto-bumps chart patch + dispatches blueprint-release) — the
catalyst-build workflow needs the equivalent. Until then this manual
bump is required after every catalyst-api image change. Lockstep
slot 13 pin bumps to 1.4.12. 2026-05-05.
1.4.13 (issue #879): unblock the multi-domain Day-2 add-domain happy
path on a fresh post-handover Sovereign. Five stacked wiring fixes,
three of which are chart-side:
Bug 1 — POOL_DOMAIN_MANAGER_URL: api-deployment.yaml now wires
`POOL_DOMAIN_MANAGER_URL=https://pool.openova.io` so the Sovereign-
side catalyst-api hits the public PDM ingress on contabo (the
in-cluster default `pool-domain-manager.openova-system.svc` only
resolves on contabo and is NXDOMAIN on franchised Sovereigns).
Caught live on otech103, 2026-05-05: every Day-2 add-domain POST
failed with `dial tcp: lookup pool-domain-manager.openova-system.
svc.cluster.local: no such host`.
Bug 2 — CATALYST_PDM_BASIC_AUTH_USER / _PASS: api-deployment.yaml
now mounts the `pdm-basicauth` Secret (keys `username`+`password`)
so pdmFlipNS can send `Authorization: Basic ...` to the Traefik
basicAuth Middleware in front of pool.openova.io. optional=true:
Catalyst-Zero pods skip the header (in-cluster Service path is
unauthenticated) and CI / older Sovereigns degrade to a clear 401
log line instead of crashlooping. The Secret is provisioned by
cloud-init at handover-time (paired infra change in
cloudinit-control-plane.tftpl).
Bug 5 — HTTPRoute /auth/handover Exact match: httproute.yaml
catalyst-ui rule changed from PathPrefix `/auth/` to Exact
`/auth/handover`. The previous PathPrefix collided with the OIDC
PKCE redirect_uri `/auth/callback` — catalyst-api 404s on that
path because it only registers `/api/v1/auth/callback`. Result:
once the post-handover JWT cookie expired (8h TTL), the operator could
not log into the Sovereign Console at all (caught live on otech103).
Exact-match keeps /auth/handover routed to catalyst-api while
every other /auth/* path falls through to catalyst-ui's React
Router for client-side OIDC.
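
The Exact-versus-PathPrefix distinction in Bug 5 can be sketched as the
`rules:` fragment below. The backend ports and the catch-all rule are
assumptions for illustration, not a copy of httproute.yaml.

```yaml
rules:
  # Only this exact path goes to catalyst-api. /auth/callback no longer
  # matches, because Exact compares the whole request path.
  - matches:
      - path:
          type: Exact
          value: /auth/handover
    backendRefs:
      - name: catalyst-api
        port: 8080                  # hypothetical port
  # Everything else, including /auth/callback, reaches the UI so React
  # Router can finish the OIDC PKCE flow client-side.
  - matches:
      - path:
          type: PathPrefix
          value: /
    backendRefs:
      - name: catalyst-ui
        port: 80                    # hypothetical port
```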
Three coupled code-side fixes ship in catalyst-api as part of the
same #879 PR (parent_domains.go):
Bug 2-code: pdmFlipNS now calls SetBasicAuth with credentials from the
env (read on every call so a Secret rotation propagates without a Pod
restart).
Bug 3-code: pdmFlipNS body now includes `nameservers` (computed
from expectedNSFor — PDM's SetNSRequest schema requires it; the
previous body got 422 missing-nameservers).
Bug 4-code: lookupPrimaryDomain falls back to SOVEREIGN_FQDN env
after CATALYST_PRIMARY_DOMAIN. On a post-handover Sovereign no
Deployment record is persisted, so without this fallback GET
/parent-domains returned {"items":[]} and the propagation panel
showed `expectedNs: null`. The SOVEREIGN_FQDN env is already
wired by api-deployment.yaml from the sovereign-fqdn ConfigMap.
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.12 → 1.4.13. 2026-05-05.
Bumped to 1.4.13 — Flux Kustomization watching SME tenant overlays
(issue #882, 2026-05-05). The catalyst-api SME-tenant pipeline's
GitOps writer (sme_tenant_gitops.go::WriteTenantOverlay) commits
per-tenant Kustomize overlays to clusters/<sov-fqdn>/sme-tenants/
<tenant-id>/ on every successful POST /api/v1/sme/tenants — but no
Flux Kustomization on the Sovereign cluster watched that path. The
state machine (sme_tenant.go) advanced optimistically through every
step (vcluster → bp_charts → dns → certs → keycloak_clients →
registry) and reported state=done, while no actual K8s resources
materialised because nothing was reconciling the orchestrator's
write target.
Verified live on otech103 (2026-05-04 23:18 Berlin): the orchestrator
successfully committed the 9-file overlay for tenant 15f1e45e-...
to the local Gitea openova/openova repo @main, but `kubectl get hr
-n sme-15f1e45e-...` returned No resources found indefinitely.
Fix: NEW templates/sme-services/sme-tenants-kustomization.yaml,
gated on .Values.ingress.marketplace.enabled (same flag the rest of
the SME bundle uses) — non-marketplace Sovereigns don't run the SME
tenant pipeline so they don't render this Kustomization. Renders one
Flux Kustomization in flux-system that sweeps the entire
./clusters/<sovereignFQDN>/sme-tenants directory tree:
- sourceRef: flux-system/openova GitRepository (the same one the
cluster bootstraps from; cutover Step 5 flips its
.spec.url to the local in-cluster Gitea, which is
precisely where sme_tenant_gitops.go pushes via
CATALYST_GITOPS_REPO_URL=http://gitea-http.gitea.svc
.cluster.local:3000/openova/openova)
- path: ./clusters/{{ .Values.global.sovereignFQDN }}/sme-tenants
- interval: 1m (matches the orchestrator's "Flux reconciles
within ~1 min" SLA documented at the top of
sme_tenant_gitops.go)
- prune: true (DELETE /api/v1/sme/tenants/<id> removes the
overlay directory; Flux GCs the tenant resources)
- wait: false (per-tenant overlays each install ~5 bp-* HRs
asynchronously and have their own readiness watcher
in the orchestrator; blocking this top-level
Kustomization on every tenant's full readiness would
let one stuck tenant gate every other tenant)
Per Inviolable Principle #4 (never hardcode), every knob is
operator-overridable via .Values.smeTenants.kustomization.* —
the GitRepository sourceRef name/namespace, the resource name,
the cadence (interval/retryInterval/timeout), and the toggles
(prune/wait). Defaults match the canonical bootstrap-kit
conventions documented in clusters/_template/bootstrap-kit/03-flux
.yaml + the cloud-init flux-bootstrap.yaml block.
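
Putting the fields listed above together, the rendered template might
look roughly like this. It is a sketch with the defaults inlined: the
resource name `sme-tenants` is hypothetical, and the shipped template
reads these values from `.Values.smeTenants.kustomization.*` rather
than hardcoding them.

```yaml
{{- if .Values.ingress.marketplace.enabled }}
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: sme-tenants                 # hypothetical resource name
  namespace: flux-system
spec:
  interval: 1m                      # matches the orchestrator's ~1 min SLA
  prune: true                       # deleting an overlay dir GCs the tenant
  wait: false                       # one stuck tenant must not gate the rest
  path: ./clusters/{{ .Values.global.sovereignFQDN }}/sme-tenants
  sourceRef:
    kind: GitRepository
    name: openova
    namespace: flux-system
{{- end }}
```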
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.12 → 1.4.13. 2026-05-05.
1.4.14 (issue #879 follow-up): chart-version-only republish so the OCI
artifact carries the catalyst-api image SHA 7bfd6df (the #879 fix
commit). Chart 1.4.13 was published from commit 7bfd6df5 BEFORE the
deploy-bot updated values.yaml's catalystApi.tag from aa226df ->
7bfd6df, so 1.4.13 OCI bytes still reference the OLD catalyst-api
image without the pdmFlipNS basic-auth + nameservers + lookup-
primary-domain SOVEREIGN_FQDN-fallback fixes. Same deploy-step race
fixed in CI by #874 (services-build auto-bumps chart patch + dispatches
blueprint-release) — the catalyst-build workflow needs the equivalent.
Until then this manual bump is required after every catalyst-api
image change. Lockstep slot 13 pin bumps to 1.4.14. 2026-05-05.
1.4.15 (issue #887): auto-provision marketplace-api-secrets Secret on
Sovereign install. templates/marketplace-api/deployment.yaml has always
referenced a secretKeyRef on `marketplace-api-secrets` (key:
`jwt-secret`); on contabo-mkt this Secret is hand-rolled in
clusters/contabo-mkt/apps/.../marketplace-api-secrets.yaml. On a freshly
franchised Sovereign with ingress.marketplace.enabled=true, nothing
equivalent existed — caught live on otech103 (2026-05-05) where
marketplace-api landed in CreateContainerConfigError "secret not found"
every reconcile. Fix: NEW templates/marketplace-api/secret.yaml uses
Helm `lookup` to persist a 64-char randAlphaNum jwt-secret across
reconciles (same load-bearing pattern as sme-secrets, valkey-cross-ns-
secret, provisioning-github-token, gitea-admin-secret per
feedback_passwords.md). Without lookup every reconcile would
invalidate every active marketplace JWT. helm.sh/resource-policy: keep
so the Secret survives helm uninstall. Lockstep slot 13 pin bumps to
1.4.15. 2026-05-05.
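
A minimal sketch of the lookup-persisted random secret described in the
1.4.15 entry. Using `.Release.Namespace` as the Secret's namespace is an
assumption; the Secret name, key, length and annotation come from the
entry.

```yaml
{{- /* Sketch only: generate once, then keep re-emitting the stored value. */ -}}
{{- $name := "marketplace-api-secrets" }}
{{- $existing := lookup "v1" "Secret" .Release.Namespace $name }}
{{- $jwt := randAlphaNum 64 | b64enc }}
{{- if $existing }}
{{- /* Reuse the stored (already base64-encoded) value when present. */ -}}
{{- $jwt = get ($existing.data | default dict) "jwt-secret" | default $jwt }}
{{- end }}
apiVersion: v1
kind: Secret
metadata:
  name: {{ $name }}
  namespace: {{ .Release.Namespace }}
  annotations:
    helm.sh/resource-policy: keep
type: Opaque
data:
  jwt-secret: {{ $jwt | quote }}
```

`randAlphaNum` produces a new value on every render, so the `lookup`
branch is what keeps the signing key stable across reconciles.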
1.4.17 (issue #901): unblock Sovereign Console login on every fresh
provision. https://console.<sov>/login PIN-issue endpoint returned 503
with "CATALYST_OPENOVA_KC_SA_CLIENT_SECRET not set" — a 3-bug chain:
Bug 1: api-deployment.yaml lines 676-739 reference a Secret
`catalyst-openova-kc-credentials` for the full PIN-auth env block
(CATALYST_OPENOVA_KC_* + CATALYST_SMTP_*). On contabo-mkt this Secret
is hand-rolled out-of-band (clusters/contabo-mkt/apps/keycloak-zero/
helmrelease.yaml mounts it via extraEnvVars). On a freshly franchised
Sovereign nothing equivalent existed — every secretKeyRef has
optional=true so the Pod started, but POST /api/v1/auth/pin/issue
503'd on the missing client-secret env. Fix: NEW
templates/catalyst-openova-kc-credentials-secret.yaml mirrors the
canonical KC SA Secret (`keycloak/catalyst-kc-sa-credentials`,
created by bp-keycloak's openbao-bridge post-install hook) into
catalyst-system as `catalyst-openova-kc-credentials` with the key
shape api-deployment.yaml expects. Same Helm-`lookup` persistence
pattern as templates/marketplace-api/secret.yaml (#887),
sme-secrets.yaml (#859), valkey-cross-ns-secret.yaml (#863),
provisioning-github-token.yaml (#866) and gitea-admin-secret.yaml
(#830). helm.sh/resource-policy: keep — Secret survives helm
uninstall.
Sovereign-vs-contabo gate (load-bearing): the new template is
rendered ONLY when `lookup "v1" "Secret" "keycloak"
"catalyst-kc-sa-credentials"` returns non-nil. On Catalyst-Zero
(contabo) Keycloak runs as `keycloak-zero` in its own namespace
and there is NO Secret by that name in the `keycloak` namespace
— lookup returns nil → the template renders empty bytes → the
existing hand-rolled Secret in clusters/contabo-mkt/apps/...
remains untouched (no helm-vs-kustomize ownership flap). The
new file is intentionally NOT added to templates/kustomization.yaml
`resources:` so Kustomize-mode contabo build skips it entirely
(same dual-mode pattern as templates/marketplace-api/secret.yaml).
Bug 2: SMTP host default `stalwart-web.stalwart.svc.cluster.local`
(an in-code constant) doesn't exist on Sovereign — even after Bug 1
the PIN-email delivery would fail at the next step. Fix: chart now
populates smtp-host/smtp-port/smtp-from from .Values.sovereign.smtp.*
defaulting to mail.openova.io:587 / noreply@openova.io. SMTP
user/pass come from a SECONDARY lookup against
`catalyst-system/sovereign-smtp-credentials` (Secret seeded by
cloud-init at provision time — issue #883 follow-up). If the source
Secret is missing, the Secret renders with empty smtp-user/smtp-pass
so the login surface still works and PIN delivery surfaces as a
clear "email delivery failed" log line, not as a 503.
Bug 3: CATALYST_POST_AUTH_REDIRECT default `/sovereign/wizard` is
mothership-only — the wizard page is the Provisioning Wizard the
operator drives at signup, not a post-handover Sovereign page. Fix:
chart-level default flips to `/sovereign/components` (the post-
handover Sovereign Console homepage). Per-Sovereign overlays
override via the catalystApi.env additional-env patch — the chart
value is a literal (per the dual-mode contract documented in the
CATALYST_POWERDNS_API_URL block of api-deployment.yaml).
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.16 → 1.4.17. 2026-05-05.
1.4.18 (issue #910 — TBD): create the `sme` namespace on Sovereigns
where the marketplace is enabled. Every template under
templates/sme-services/* (billing, auth, ferretdb, valkey-cross-ns-
secret, sme-secrets, provisioning-github-token, cnpg-cluster, ...)
emits resources with `namespace: sme`. On Catalyst-Zero (contabo)
the `sme` namespace is pre-provisioned by clusters/contabo-mkt/apps/
sme/* — so the chart never created it. On a fresh franchised
Sovereign nothing else creates the `sme` namespace, so chart 1.4.17
install failed 23 times with `failed to create resource: namespaces
"sme" not found` — caught live on otech105 (2026-05-05). Fix: NEW
templates/sme-services/sme-namespace.yaml gated on the same
ingress.marketplace.enabled flag as the rest of the SME bundle so
non-marketplace Sovereigns and the Kustomize-mode contabo build
(which does NOT include sme-namespace.yaml in templates/sme-services/
kustomization.yaml's `resources:` list) skip this entirely.
helm.sh/resource-policy: keep — never cascade-delete the namespace
on chart uninstall (would erase every SME workload + tenant).
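
As a sketch, the gated namespace template described above could be as
small as the following; the shipped file may carry additional labels
that are not shown here.

```yaml
{{- if .Values.ingress.marketplace.enabled }}
apiVersion: v1
kind: Namespace
metadata:
  name: sme
  annotations:
    # Keep the namespace (and every SME workload in it) on helm uninstall.
    helm.sh/resource-policy: keep
{{- end }}
```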
Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.17 → 1.4.18. 2026-05-05.
1.4.19 (issue #910 — zero-touch provisioning, Bugs 2 + 3): two
coupled fixes that unblocked Sovereign Console PIN-login on a
freshly franchised cluster (1.4.18 closed Bug 1, the missing `sme`
namespace).
Bug 2 — CATALYST_SESSION_COOKIE_DOMAIN was hardcoded to
console.openova.io in templates/api-deployment.yaml. On a Sovereign
the request host is console.<sov-fqdn>, so the browser silently
rejected the Set-Cookie (RFC 6265 §5.3 step 6 — Domain mismatch)
and every /api/* request landed without a session, redirecting back
to /login forever. Caught live on otech105 (2026-05-05).
Fix: change the literal default to `""` (empty). Per the dual-mode
contract (CATALYST_POWERDNS_API_URL block in api-deployment.yaml),
this MUST stay a literal — Helm template directives in `value:`
fields break the contabo Kustomize-mode build. Empty value is
correct on BOTH paths: when CATALYST_SESSION_COOKIE_DOMAIN is empty
the auth handler omits the Domain attribute and the browser binds
the cookie to the exact request host. On contabo that is
console.openova.io (wizard + magic-link served from the same
host); on a Sovereign that is console.<sov-fqdn> (likewise). Per-
Sovereign overlays MAY override via the catalystApi.env additional-
env patch in the per-cluster HelmRelease for unusual topologies.
Bug 3 — catalyst-openova-kc-credentials-secret.yaml's smtp-user/
smtp-pass lookup used "existing target wins" persistence over the
source `sovereign-smtp-credentials` Secret seeded by A5's
provisioner (issue #883). On first install the source Secret had
not yet been seeded (race between catalyst-api's seedSovereignSMTP
step and the chart reconcile), so the chart rendered empty SMTP
creds, persisted them into the target, and NEVER picked up A5's
seeded bytes on subsequent reconciles. POST /api/v1/auth/pin/issue
502'd with `email-send-failed` for the life of the cluster.
Caught live on otech105 (2026-05-05).
Fix: invert the SMTP-cred lookup precedence. SOURCE
(sovereign-smtp-credentials) wins over the persisted target. Every
Flux reconcile (1m cadence) re-reads the source, so as soon as A5's
seed completes the chart picks it up on the next tick. Operator
rotation: edit sovereign-smtp-credentials (the operator-facing
seam); the target is a chart-derived projection and never an
operator surface. KC fields keep the previous "existing target
wins" contract because bp-keycloak's openbao-bridge auto-rotates
the client-secret on every Helm upgrade and we want that rotation
to require explicit operator action (delete the target) rather
than picking up automatically and rolling the catalyst-api Pod.
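
The two precedence rules in Bug 3 can be sketched side by side. This is
illustrative only: the key names `smtp-user`, `smtp-pass` and
`client-secret` on the source Secrets are assumptions, and the real
template emits more keys than shown.

```yaml
{{- /* Sketch only: SMTP uses source-wins, KC uses existing-target-wins. */ -}}
{{- $ns := "catalyst-system" }}
{{- $src := lookup "v1" "Secret" $ns "sovereign-smtp-credentials" }}
{{- $dst := lookup "v1" "Secret" $ns "catalyst-openova-kc-credentials" }}
{{- $kc := lookup "v1" "Secret" "keycloak" "catalyst-kc-sa-credentials" }}
{{- $srcData := $src.data | default dict }}
{{- $dstData := $dst.data | default dict }}
{{- $kcData := $kc.data | default dict }}
{{- /* SMTP: the source wins; the persisted target is only a fallback. */ -}}
{{- $smtpUser := get $srcData "smtp-user" | default (get $dstData "smtp-user") }}
{{- $smtpPass := get $srcData "smtp-pass" | default (get $dstData "smtp-pass") }}
{{- /* KC: the persisted target wins; the source seeds the first render. */ -}}
{{- $kcSecret := get $dstData "client-secret" | default (get $kcData "client-secret") }}
{{- if $kc }}
apiVersion: v1
kind: Secret
metadata:
  name: catalyst-openova-kc-credentials
  namespace: {{ $ns }}
  annotations:
    helm.sh/resource-policy: keep
type: Opaque
data:
  smtp-user: {{ $smtpUser | quote }}
  smtp-pass: {{ $smtpPass | quote }}
  client-secret: {{ $kcSecret | quote }}
{{- end }}
```

Swapping the argument order around `default` is the whole difference
between the two contracts: the value on the left of the pipe wins
whenever it is non-empty.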
No values.yaml schema change. No bootstrap-kit slot 13 envsubst
change. Lockstep slot 13 pin in clusters/_template/bootstrap-kit/
13-bp-catalyst-platform.yaml bumps from 1.4.18 → 1.4.19. 2026-05-05.
type: application
# Opt-out from the blueprint-release hollow-chart guard (issue #181 / #510).
# This umbrella legitimately ships only Catalyst-authored workloads
# (catalyst-ui, catalyst-api, ProvisioningState CRD, Sovereign HTTPRoute);
# the foundation layer is installed independently by the bootstrap-kit
# and must NOT be re-rendered into catalyst-system as subcharts.
annotations:
catalyst.openova.io/no-upstream: "true"
# No subchart dependencies — see 1.1.9 changelog above. The 10
# foundation Blueprints are installed by clusters/_template/bootstrap-kit/
# at their own slots, each as a top-level Flux HelmRelease in its own
# canonical namespace. This umbrella renders only the Catalyst-Zero
# control-plane workloads (catalyst-ui, catalyst-api, ProvisioningState
# CRD, Sovereign HTTPRoute) into targetNamespace: catalyst-system.