prov #61 (2e197a934a0e0461, 2026-05-13): catalyst-api OOMKilled 6× during
phase-1 watch on a 3-region Sovereign. The in-memory state has grown
substantially since the 1Gi limit was set:
- 1 primary helmwatch.Watcher (45 HRs + informer cache)
- N secondary helmwatch.Watchers (45 HRs × 2 secondary regions, each
with its own informer cache)
- jobs.Store backed by on-disk + in-memory tree
- per-/snapshot poll: composes per-region region groups across all
Job rows + cross-references hrDeps from the live primary watcher
Combined steady-state exceeds 1Gi on cpx32-equivalent loads. Bumped
limits to 4Gi (request 512Mi up from 128Mi). The mothership node has
8GB+ resident, no other tight constraint. Future fix: persist region
in Job rows so secondary watchers don't need to be retained post
phase-1 (orthogonal cleanup).
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>