fix(cnp): allow ingress from catalyst ns (cutover Pods) — fresh-prov handover blocker (Refs PR #1785 regression, t24 zero-touch finding) (#1847)
PR #1785 (chart 1.4.171) shipped a baseline default-deny CiliumNetworkPolicy in catalyst-system whose ingress allowlist was limited to: - reserved.ingress: "" (cilium-gateway endpoint) - same-namespace catalyst-system Pods - host / remote-node / kube-apiserver entities The bp-self-sovereign-cutover chart stamps Jobs into the `catalyst` namespace, including the 10-auto-trigger Job whose Pod curls catalyst-api.catalyst-system.svc.cluster.local:8080 to fire /api/v1/internal/cutover/trigger. With #1785 in effect on a FRESH prov, every auto-trigger Pod times out at WAIT_TIMEOUT_SECONDS=1500s, handoverFiredAt stays null, and the D0 auto-redirect to the Sovereign Console never happens — the operator is stuck on mothership /jobs forever. Caught by t24 zero-touch verification (2026-05-18): handover_status: "BLOCKED — cutover auto-trigger Pod in 'catalyst' ns cannot reach catalyst-api in 'catalyst-system' ns because baseline-default-deny CNP allows ingress only from {reserved.ingress, catalyst-system ns, host entities}" The companion symptom on t22 was masked because t22's cutover Job had already completed before the CNP rolled out — the CNP did not gate ingress there. Fix ───────────────────────────────────────────────────────────────── Add a fourth ingress rule to baseline-default-deny allowing fromEndpoints in the operator-tunable list .Values.security.baselineCnp.allowedIngressNamespaces. Defaults: - catalyst — cutover Pods (the load-bearing fix) - flux-system — Helm/Kustomize/Source controllers probing Service readiness for HelmRelease health rollups (worked pre-#1785 via no-CNP default) - kube-system — Cilium operator + hcloud-ccm + CoreDNS that do cluster introspection calls (the reserved.ingress gateway endpoint here is still matched by rule 1's reserved.ingress: "" selector — this rule covers non-gateway Pods) The list mirrors the existing allowedPlatformNamespaces pattern on the egress side. No other rule semantics change. Chart bump 1.4.177 → 1.4.178. Companion regression to chart 1.4.177 (PR #1803, SMTP egress) — both are sub-regressions from the same #1785 baseline-CNP ship. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
61948474b5
commit
31b7dc5859
@ -1169,6 +1169,29 @@ name: bp-catalyst-platform
|
||||
# overwritten by the mirror rerun at 17:54:55Z, Organization CR never
|
||||
# materialised, customer-journey step 16 hung indefinitely. (Closes
|
||||
# #1794 — 5th C18 layer, the last customer-journey blocker.)
|
||||
# 1.4.178 (TBD-CNP-Catalyst-NS-Ingress, 2026-05-18): SECOND regression from
|
||||
# #1785 (1.4.171 baseline CNPs). The catalyst-system baseline-default-deny
|
||||
# CNP allowed ingress only from {reserved.ingress (cilium-gateway),
|
||||
# same-namespace, host/remote-node/kube-apiserver}. The bp-self-sovereign-
|
||||
# cutover chart stamps Jobs into the `catalyst` namespace, including the
|
||||
# 10-auto-trigger Job whose Pod curls catalyst-api in `catalyst-system` to
|
||||
# fire /api/v1/internal/cutover/trigger. With #1785 in effect on a FRESH
|
||||
# prov, every auto-trigger Pod times out at WAIT_TIMEOUT_SECONDS=1500s →
|
||||
# handoverFiredAt stays null → D0 auto-redirect to Sovereign Console never
|
||||
# happens → operator is stuck on mothership /jobs forever. Caught by t24
|
||||
# zero-touch verification (2026-05-18). The companion symptom on t22 was
|
||||
# masked because t22's cutover Job had already completed before the CNP
|
||||
# rolled out, so the CNP did not gate ingress.
|
||||
#
|
||||
# Fix: add an explicit fromEndpoints rule allowing ingress from the
|
||||
# `catalyst` namespace (cutover Pods), `flux-system` (Helm/Kustomize/Source
|
||||
# controllers probing Service readiness for HelmRelease health rollups),
|
||||
# and `kube-system` Pods (Cilium operator + hcloud-ccm + CoreDNS that do
|
||||
# cluster introspection calls — the reserved.ingress gateway endpoint also
|
||||
# in kube-system is matched separately via reserved.ingress: ""). The list
|
||||
# is operator-tunable via `.Values.security.baselineCnp.allowedIngressNamespaces`,
|
||||
# mirroring the existing allowedPlatformNamespaces pattern on the egress
|
||||
# side. Refs PR #1785 (1.4.171), t24 zero-touch finding.
|
||||
# 1.4.177 (TBD-CNP-SMTP-Egress, 2026-05-18): fix regression introduced by
|
||||
# #1785 (1.4.171 baseline CNPs). The catalyst-system default-deny CNP
|
||||
# limited world egress to TCP/443 only, which silently broke SMTP
|
||||
@ -1186,8 +1209,8 @@ name: bp-catalyst-platform
|
||||
# 25/TCP (legacy SMTP fallback). All three are explicitly scoped to
|
||||
# `toEntities: world`, matching the existing 443/TCP allow. No other
|
||||
# rule semantics change. (Fixes PIN-issue 502 regression from #1785.)
|
||||
version: 1.4.177
|
||||
appVersion: 1.4.177
|
||||
version: 1.4.178
|
||||
appVersion: 1.4.178
|
||||
# 1.4.172 — feat(security): baseline CiliumNetworkPolicies for
|
||||
# catalyst-system + cilium-gateway (Closes #1746, Refs TBD-Cov-12 /
|
||||
# C12-009). Cov-bench audit on the live Sovereign cluster confirmed
|
||||
|
||||
@ -71,6 +71,7 @@ when qaFixtures is disabled.
|
||||
*/}}
|
||||
{{- if .Values.security.baselineCnp.enabled }}
|
||||
{{- $allowedPlatform := .Values.security.baselineCnp.allowedPlatformNamespaces | default (list "keycloak" "gitea" "powerdns" "cnpg-system" "openbao" "harbor" "nats-system" "loki" "mimir" "tempo" "alloy" "opentelemetry" "external-secrets-system" "cert-manager") }}
|
||||
{{- $allowedIngressNs := .Values.security.baselineCnp.allowedIngressNamespaces | default (list "catalyst" "flux-system" "kube-system") }}
|
||||
apiVersion: cilium.io/v2
|
||||
kind: CiliumNetworkPolicy
|
||||
metadata:
|
||||
@ -116,6 +117,46 @@ spec:
|
||||
- host
|
||||
- remote-node
|
||||
- kube-apiserver
|
||||
# 4. Allow ingress from operator-tunable adjacent namespaces.
|
||||
# Defaults cover three load-bearing senders:
|
||||
#
|
||||
# - `catalyst` (the cutover namespace) — the
|
||||
# bp-self-sovereign-cutover chart stamps Jobs into this
|
||||
# namespace, INCLUDING the 10-auto-trigger Job whose Pod
|
||||
# curls `catalyst-api.catalyst-system.svc.cluster.local:8080`
|
||||
# to fire /api/v1/internal/cutover/trigger. Without this
|
||||
# allow the auto-trigger Pod cannot reach the API service,
|
||||
# the trigger never fires, handoverFiredAt stays null, and
|
||||
# D0 auto-redirect to the Sovereign Console never happens —
|
||||
# the operator is stuck on mothership /jobs forever
|
||||
# (regression caught by t24 zero-touch verification, fresh
|
||||
# prov 2026-05-18).
|
||||
#
|
||||
# - `flux-system` (the Flux controllers) — Helm Controller,
|
||||
# Kustomize Controller, and Source Controller probe the
|
||||
# backend Services they reconcile (e.g. catalyst-api
|
||||
# readinessProbes for HelmRelease health rollups). Pre-#1785
|
||||
# this worked implicitly because no CNP existed.
|
||||
#
|
||||
# - `kube-system` (the Cilium operator + kube-proxy + CCM) —
|
||||
# beyond the kubelet-host probes already covered by
|
||||
# rule 3, kube-system Pods (cilium-operator, hcloud-ccm,
|
||||
# coredns recursive probes) need to be allowed for cluster
|
||||
# introspection. The reserved.ingress endpoint that handles
|
||||
# public traffic is also in kube-system but is matched by
|
||||
# rule 1's `reserved.ingress: ""` selector — this rule
|
||||
# covers the non-gateway kube-system Pods.
|
||||
#
|
||||
# Operators may extend or shrink this list via
|
||||
# `.Values.security.baselineCnp.allowedIngressNamespaces`.
|
||||
- fromEndpoints:
|
||||
- matchExpressions:
|
||||
- key: k8s:io.kubernetes.pod.namespace
|
||||
operator: In
|
||||
values:
|
||||
{{- range $allowedIngressNs }}
|
||||
- {{ . | quote }}
|
||||
{{- end }}
|
||||
egress:
|
||||
# 1. kube-apiserver — every controller informer (organization,
|
||||
# application, environment, useraccess, blueprint, tenant)
|
||||
|
||||
@ -1509,3 +1509,19 @@ security:
|
||||
- opentelemetry
|
||||
- external-secrets-system
|
||||
- cert-manager
|
||||
# Adjacent namespaces whose Pods are allowed to INGRESS into
|
||||
# catalyst-system. Defaults cover:
|
||||
# - catalyst — bp-self-sovereign-cutover Jobs (incl. the
|
||||
# 10-auto-trigger Pod that fires /api/v1/internal/cutover/
|
||||
# trigger; without this allow handover NEVER FIRES on a
|
||||
# fresh prov — regression caught by t24 zero-touch).
|
||||
# - flux-system — Helm/Kustomize/Source controllers probing
|
||||
# Service readiness for HelmRelease health rollups.
|
||||
# - kube-system — Cilium operator + hcloud-ccm + CoreDNS that
|
||||
# do cluster introspection calls into catalyst-system. The
|
||||
# reserved.ingress gateway endpoint also in kube-system is
|
||||
# matched by a separate rule via reserved.ingress: "" label.
|
||||
allowedIngressNamespaces:
|
||||
- catalyst
|
||||
- flux-system
|
||||
- kube-system
|
||||
|
||||
Loading…
Reference in New Issue
Block a user