openova/products/catalyst/bootstrap
e3mrah 50bf7a59ed
fix: F8 - double bp-catalyst-platform HR timeout (15m→30m) + catalyst-api phase1 budget (60m→120m) (#1428)
prov #44 (d9399223c3caa4f9) hit the catalyst-api 60m phase1 watch cap
with bp-catalyst-platform HR still mid-retry (failures=3) and 41/45 HRs
True. F1-F7 are correct and live on main (qa-finalizer-strip Completed,
autoscaler workers joined). The remaining wall is total bootstrap-kit
install time exceeding the outer watch budget on a fresh cpx42×1
Sovereign without a warm Harbor proxy-cache.

Two lock-step changes widen both bounds:

1. clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml:
   install.timeout 15m → 30m, upgrade.timeout 15m → 30m. The umbrella
   chart genuinely needs >15m worst case when the full SME + Catalyst
   service stack rolls cold.

2. products/catalyst/bootstrap/api/internal/helmwatch/helmwatch.go:
   DefaultWatchTimeout 60m → 120m. Worst-case inner HR retry chain is
   now 30m × 3 = 90m; the outer phase1 budget MUST be larger so the
   watch never terminates while helm-controller still has remediation
   attempts left. CATALYST_PHASE1_WATCH_TIMEOUT env-var override path
   was already wired (issue #538 baseline) — chart template now
   declares the explicit "120m" value so the runtime knob is
   discoverable for capacity-bounded environments. Per INVIOLABLE-
   PRINCIPLES.md #4 the knob remains runtime-configurable.

New unit test TestPhase1WatchConfig_ProductionDefaultIs120m pins the
F8 floor against future regression. Existing env-var override + field-
override tests still pass unchanged.

Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 08:10:24 +04:00
..
api fix: F8 - double bp-catalyst-platform HR timeout (15m→30m) + catalyst-api phase1 budget (60m→120m) (#1428) 2026-05-12 08:10:24 +04:00
ui fix(JobsPage): use FlowNode.id in row anchor href (region prefix) (#1414) 2026-05-11 22:29:46 +04:00