fix(cilium-gateway): allow world ingress to reserved:ingress (unblocks Sovereign public surfaces) (#1482)
* fix(tls): cilium-gateway-cert STAGING/PROD issuer selectable via tofu

  clusters/_template/sovereign-tls/cilium-gateway-cert.yaml hardcoded
  letsencrypt-dns01-prod-powerdns regardless of qa_test_session_enabled.
  On high-cadence QA reprov cycles this hits the LE PROD 5/168h rate
  limit (caught on prov #76 at 13:45 UTC, retry-after 16:49 UTC) and the
  wildcard Certificate sticks Ready=False: Cilium Gateway has no valid
  TLS secret → envoy listener never binds → public TLS handshake to
  console.<fqdn> dies with SSL_ERROR_SYSCALL.

  Add tofu local.wildcard_cert_issuer = qa_test_session_enabled ?
  staging : prod. Thread WILDCARD_CERT_ISSUER through the sovereign-tls
  Kustomization postBuild.substitute. cilium-gateway-cert.yaml
  references it as ${WILDCARD_CERT_ISSUER}.

  Default behaviour unchanged for non-QA (production) Sovereigns: they
  still resolve to letsencrypt-dns01-prod-powerdns.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cilium-gateway): allow world ingress to Cilium Gateway reserved:ingress endpoint

  When Cilium Gateway API runs with gatewayAPI.hostNetwork.enabled=true
  and a default-deny CCNP is present, every public request to a
  Sovereign host (console, auth, gitea, registry, api, ...) hits the
  gateway listener and gets DENIED at envoy's cilium.l7policy filter
  with:

    cilium.l7policy: Ingress from 1 policy lookup for endpoint X for port 30443: DENY

  Public response: HTTP/1.1 403 Forbidden, body "Access denied",
  server: envoy.

  Root cause: Cilium creates a special endpoint with identity
  reserved:ingress (8) representing the gateway listener. By default
  this endpoint has policy-enabled=both with
  allowed-ingress-identities=[1 (host)] and empty L4 rules, so no port
  is permitted. The default-deny CCNP's NotIn-namespace endpointSelector
  does NOT cover this endpoint (it has no io.kubernetes.pod.namespace
  label), and our qa-fixtures didn't ship a matching allow-template for
  it.

  Net effect: TLS handshake succeeds, HTTPRoutes are Programmed,
  backends are healthy in-cluster, but every request 403s. Caught live
  on prov #80 (omantel.biz, 2026-05-14) after the Gateway hostNetwork
  fix (#1480) finally activated host-bind on :30443.

  Verified by:
  - envoy debug log: cilium.l7policy DENY for endpoint 10.42.0.201
    port 30443
  - cilium-dbg endpoint get 3282 -o json: l4.ingress: [] and
    allowed-ingress-identities: [1]
  - transiently applying the same CCNP via kubectl:
    console.omantel.biz → 200

  Fix: ship a CCNP scoped to reserved:ingress that allows ingress from
  world, cluster, host, remote-node (multi-region CP-to-CP), and
  kube-apiserver, plus egress to all so envoy can forward to any
  backend service. This is the canonical Cilium hostNetwork Gateway-API
  zero-trust pattern.

  Chart bump: catalyst 1.4.142 → 1.4.143.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: e3mrah <catalyst@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
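
The issuer threading described in the first fix can be sketched roughly as follows. This is an illustrative sketch, not the files from the commit: the Kustomization name, the variable name and the production issuer name come from the commit message, while the namespaces, interval, source name, path layout, secret name, DNS name and the ClusterIssuer kind are assumptions marked as such in the comments.

```yaml
# Flux Kustomization: postBuild.substitute carries the issuer chosen by tofu.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: sovereign-tls
  namespace: flux-system              # assumption
spec:
  interval: 10m                       # assumption
  prune: true                         # assumption
  path: ./clusters/_template/sovereign-tls   # assumption: real path is per cluster
  sourceRef:
    kind: GitRepository
    name: flux-system                 # assumption
  postBuild:
    substitute:
      # Rendered from tofu local.wildcard_cert_issuer: the staging issuer when
      # qa_test_session_enabled is true, otherwise the production issuer below.
      WILDCARD_CERT_ISSUER: letsencrypt-dns01-prod-powerdns
---
# cilium-gateway-cert.yaml: the issuer is no longer hardcoded.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cilium-gateway-cert
  namespace: kube-system              # assumption
spec:
  secretName: cilium-gateway-tls      # hypothetical secret name
  dnsNames:
    - "*.example.test"                # hypothetical wildcard host
  issuerRef:
    kind: ClusterIssuer               # assumption: could also be a namespaced Issuer
    name: ${WILDCARD_CERT_ISSUER}     # replaced by Flux at reconcile time
```

With this shape, a QA Sovereign only differs in the substituted value, so repeated reprovisioning draws on the staging quota instead of the LE PROD rate limit.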
parent fb99ae5fd0
commit 115c58885b
@@ -473,7 +473,7 @@ spec:
       # from bitnamilegacy/kubectl:1.29.3 → alpine/k8s:1.31.4 in same
       # commit (rule-17 MIRROR-EVERYTHING hygiene; bitnamilegacy is
       # the Docker-Hub redirect for deprecated Bitnami 2025-08 cutover).
-      version: 1.4.142
+      version: 1.4.143
       sourceRef:
         kind: HelmRepository
         name: bp-catalyst-platform

@@ -1058,7 +1058,7 @@ name: bp-catalyst-platform
 # Fix #154 (HR-timeout audit). Those bumped the HelmRelease
 # install.timeout. This bumps the chart-INTERNAL wait loop budget
 # inside the pre-install hook Job, which is a different seam.
-version: 1.4.142
+version: 1.4.143
 appVersion: 1.4.94
 # 1.4.141 (qa-loop Fix #185, prov #38/#39/#41 recurrence — pre-install
 # hook unscheduable on saturated worker):

@@ -67,6 +67,54 @@ spec:
   egress:
     - {}
 ---
+# 1b/12 — Allow external traffic into Cilium Gateway (reserved:ingress).
+#
+# Root cause: Cilium Gateway API (gatewayAPI.hostNetwork.enabled=true)
+# creates a special endpoint with identity `reserved:ingress` (8) that
+# represents the gateway listener. By default this endpoint has
+# policy-enabled=both, allowed-ingress-identities=[1 (host)], and an
+# empty L4 rule set — i.e. world traffic that arrives at the gateway
+# is dropped by cilium.l7policy with a 403 "Access denied" before any
+# HTTPRoute is evaluated.
+#
+# Symptom: every public Sovereign host (console, auth, gitea, api, …)
+# returns `HTTP/1.1 403 Forbidden` body=`Access denied` server=envoy
+# even though the HTTPRoutes are Programmed, the Gateway is Accepted,
+# and the backend services are healthy in-cluster. Caught live on
+# prov #80 (omantel.biz, 2026-05-14): TLS handshake OK with the
+# correct cert, envoy reachable on :30443, but every request 403'd.
+# Confirmed via `cilium-dbg endpoint get 3282 -o json` showing
+# `l4.ingress: []` and `allowed-ingress-identities: [1]` only.
+#
+# Fix: a CCNP scoped to the `reserved:ingress` endpoint that allows
+# ingress from `world`, `cluster`, `host`, `remote-node` (multi-region
+# CP-to-CP), and `kube-apiserver`, plus egress to `all` so envoy can
+# forward to any backend service. This is the canonical Cilium pattern
+# for hostNetwork Gateway-API zero-trust — without it the gateway
+# becomes a black hole the moment a default-deny CCNP is present.
+apiVersion: cilium.io/v2
+kind: CiliumClusterwideNetworkPolicy
+metadata:
+  name: allow-gateway-world-ingress
+  labels:
+    openova.io/managed-by: qa-fixtures
+    openova.io/policy-tier: gateway-allow
+spec:
+  description: "Allow world + cluster traffic to reach the Cilium Gateway listener; default-deny would otherwise drop all public requests at the gateway."
+  endpointSelector:
+    matchLabels:
+      reserved.ingress: ""
+  ingress:
+    - fromEntities:
+        - world
+        - cluster
+        - host
+        - remote-node
+        - kube-apiserver
+  egress:
+    - toEntities:
+        - all
+---
 # 2/12 — qa-omantel: allow DNS egress (kube-dns)
 apiVersion: cilium.io/v2
 kind: CiliumNetworkPolicy
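
For orientation, the listener that the `allow-gateway-world-ingress` policy unblocks might look roughly like the Gateway below. This is a sketch under stated assumptions rather than the manifest used by the platform: port 30443 and the hostNetwork mode come from the commit message, and `cilium` is the GatewayClass name Cilium installs by default; the Gateway name, namespace, hostname, secret name and `allowedRoutes` setting are hypothetical.

```yaml
# Assumes the Cilium chart is installed with
# gatewayAPI.enabled=true and gatewayAPI.hostNetwork.enabled=true,
# so the listener port is bound directly on the node.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: cilium-gateway                # hypothetical name
  namespace: kube-system              # assumption
spec:
  gatewayClassName: cilium
  listeners:
    - name: https
      protocol: HTTPS
      port: 30443                     # host-bound port named in the commit message
      hostname: "*.example.test"      # hypothetical wildcard host
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: cilium-gateway-tls  # hypothetical: the wildcard Certificate's secret
      allowedRoutes:
        namespaces:
          from: All                   # assumption
```

Traffic arriving on this listener is attributed to the `reserved:ingress` endpoint, which is why the policy selects that identity rather than any pod or namespace label.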