openova

Author	SHA1	Message	Date
Hatice Yildiz	200e0038b3	feat(catalyst): add Go API backend, Containerfiles, and nginx SPA config - Add catalyst-api: Chi router, SSE provisioning logs, Hetzner token validation, deployment lifecycle simulation (hexagonal-lite layout) - Add catalyst-ui Containerfile: multi-stage Node→nginx build with VITE_APP_MODE baked in at build time - Add nginx.conf: SPA routing, /api/ proxy to catalyst-api, SSE support, /healthz endpoint, K3s DNS resolver - Wire wizard to real API: StepCredentials validates token via POST /api/v1/credentials/validate; StepReview POSTs to /api/v1/deployments and stores deploymentId in Zustand; ProvisionPage streams SSE logs - Add deploymentId to wizard state; fix currentStep initial value (0→1) - Add region code/countryCode fields; fix cluster context naming convention Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-19 18:35:11 +01:00
Eren Baysal	5ff6c17e50	fix(catalyst): wizard step navigation, naming convention in cluster context - Fix currentStep initial value (0→1) so wizard advances correctly on first Continue - Add `code` + `countryCode` to HETZNER_REGIONS; drop deprecated `role: primary|dr` from Region model - Use region code in cluster context names: hz-fsn-rtz-prod (not hz-fsn1-rtz-prod) - Derive success page URLs and kubeconfig from wizard store (orgDomain, region) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-19 13:59:46 +01:00
Emrah Baysal	3e3f9428ff	feat(catalyst): Bootstrap UI — full wizard, auth, dashboard, provision, success Complete React 18 + TypeScript + Tailwind v4 UI for OpenOva Catalyst Bootstrap. Architecture: - Feature-Sliced Design (FSD): app/, pages/, widgets/, features/, entities/, shared/ - TanStack Router (type-safe), TanStack Query, Zustand wizard store - React Hook Form + Zod validation on every form - Framer Motion animated step transitions and micro-interactions - Dark-first design system with Tailwind v4 CSS custom properties Pages: - Auth: Login, Signup, Forgot Password (SaaS mode) - Dashboard: deployment list with status badges and stats - Wizard: 6-step animated wizard (Org → Provider → Credentials → Infrastructure → Components → Review) - Provision: real-time SSE log stream with phase tracker and progress bar - Success: kubeconfig download, service URLs, next-steps guide Shared UI: - Button, Input, Badge, Card, Separator, Tooltip, Checkbox, Switch, Progress, Dialog, DropdownMenu, Select, Avatar (all owned, Radix primitives) Widgets: - StepIndicator with animated connectors and completion state - CloudProviderCard with coming-soon state and tooltip Features: - Two runtime modes: saas (full auth) and selfhosted (direct to wizard) - Live credential validation UX with feedback states - Component dependency enforcement with tooltip explanations - Cost estimate per region derived from Hetzner node size selection - Cluster context names auto-derived from naming convention (NAMING-CONVENTION.md) Builds clean: tsc -b + vite build, 34KB CSS, 723KB JS (pre-split) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-19 12:34:03 +01:00
e3mrah	2df03b7ea3	feat(axon): add V1 query() alongside V2 session pool with profile routing - Add thinking, effort, profile fields to ChatCompletionRequest - Add chatV1() and chatV1Stream() using query() with persistSession=false - Route to V1 when thinking/effort params present or profile='deep' - V2 session pool unchanged; V1 runs stateless with native systemPrompt Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-19 12:26:34 +01:00
e3mrah	5b04c6bd02	fix: increase axon-valkey probe initialDelaySeconds to survive slow RDB load	2026-03-18 16:10:21 +01:00
e3mrah	cf4c37b2df	fix: use tcpSocket probes for axon-valkey — exec probes fail on k3s OCI runtime	2026-03-18 14:02:09 +01:00
e3mrah	baf2d8445d	fix(axon): persistent Valkey reconnect — never give up retryStrategy Previous retryStrategy(times > 5) returned null, permanently destroying the ioredis client after 5 failed reconnects. After idle, the TCP connection drops, all 5 retries fail, and every subsequent command throws 'Connection is closed'. Changes: - retryStrategy now retries indefinitely (max 30s interval) — connection is always restored when Valkey comes back - 'end' event handler restarts the client if ioredis somehow stops retrying - getValkey() returns null when client.status is 'end'/'close' so callers skip persistence gracefully instead of throwing - maxRetriesPerRequest: 3 kept — commands fail fast, background reconnect handles recovery Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-18 06:09:07 +01:00
e3mrah	63a86772f7	fix(axon): evict idle sessions older than 5 min before handing to caller Sessions whose Claude CLI subprocess has exited (idle > MAX_IDLE_MS) are recycled in acquire() rather than returned. This prevents all-stale-pool scenarios that caused WriteRecsActivity/ExtractIntentActivity to fail with 'Connection is closed' after Axon sits idle overnight. - Added lastUsed: number to PoolEntry, set on warmup and release - acquire() skips idle entries older than 5 min, recycles each one - release() stamps lastUsed so the TTL resets on every successful use Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-18 06:05:05 +01:00
e3mrah	5f0421e967	fix(axon): restore word-level streaming in chatStream Re-add 2-3 word chunk splitting with 25-60ms delays that was lost during the includePartialMessages refactor. Fixes the "10s wait then dump" UX. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 16:09:18 +01:00
e3mrah	bc069483af	fix(axon): restore working assistant message handler, revert broken includePartialMessages	2026-03-12 15:54:23 +01:00
e3mrah	6f81cc7e79	debug(axon): log msg.type from session.stream() to diagnose empty output	2026-03-12 15:51:42 +01:00
e3mrah	3f010b4cca	fix(axon): fallback to complete assistant msg if stream_event not emitted	2026-03-12 15:49:07 +01:00
e3mrah	9aba8fe80c	fix(axon): cast includePartialMessages to bypass older SDK type version	2026-03-12 15:45:23 +01:00
e3mrah	5113295960	feat(axon): enable real token streaming via includePartialMessages Set includePartialMessages: true on SDK sessions so stream() emits SDKPartialAssistantMessage (stream_event) carrying content_block_delta events. chatStream() now yields actual token text as it is generated instead of waiting for the complete response and fake-streaming it with word-splits and delays. This gives true token-by-token TTFT (~200ms first token) rather than the previous 3-8s wait for the full response before any text appeared. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 15:34:24 +01:00
e3mrah	e783b9329c	fix(axon): use XML tags in formatPrompt to prevent injection detection The Claude Agent SDK reuses sessions across conversations. When the full system prompt was re-sent on subsequent turns wrapped in [System instructions] tags, Claude flagged it as a prompt injection attempt. Switch to XML-style tags (<context>, <conversation>) that Claude recognises as structured prompt sections. Add <new_conversation/> boundary marker to isolate reused sessions from prior context. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 21:26:30 +01:00
e3mrah	b911a74e7a	feat(axon): progressive word-level streaming for chat completions The Claude Agent SDK yields complete assistant messages rather than individual token deltas. This change splits the full text into 2-3 word groups and yields them as separate SSE chunks with small random delays (25-60ms), giving a natural typing experience on the client. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 21:05:59 +01:00
e3mrah	8fb961c897	docs: rewrite Axon README as client integration guide SDK examples (Python, Node.js), API reference, model aliases, streaming, conversations, self-hosting instructions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 13:09:58 +01:00
e3mrah	643cdd9d29	Revert "feat: add request tracing spans to chat completion path" This reverts commit `a2685dd158`.	2026-03-04 11:39:29 +01:00
e3mrah	a2685dd158	feat: add request tracing spans to chat completion path Traces: convLookup, formatPrompt, acquire, send, firstMsg, stream, release, convStore — logged per request for profiling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 11:20:54 +01:00
e3mrah	023ee6d5e4	chore: increase Axon resource limits for single-node overprovisioning Axon: 2 CPU / 2Gi memory limits (50m/128Mi requests) Valkey: 500m CPU / 256Mi memory limits (10m/32Mi requests) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 11:08:10 +01:00
e3mrah	97009daf59	fix: set HOME env and mount credentials at subpath K8s doesn't set HOME from Dockerfile USER directive. Mount credential file at subpath to preserve debug/ directory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 09:41:47 +01:00
e3mrah	5ff69bd41b	fix: use fixed UID 1001 for axon user in container K8s runAsNonRoot requires numeric UID. Pin to 1001 in both Containerfile and Helm chart deployment template. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 09:39:09 +01:00
e3mrah	215ba69272	fix: create .claude/debug directory in Axon container Claude Agent SDK writes debug logs to ~/.claude/debug/ which must exist before session creation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 09:26:28 +01:00
e3mrah	fe2e349246	feat: add Axon Helm chart and CI workflow Helm chart for deploying Axon LLM gateway with Valkey backing store, Traefik ingress with TLS, and Claude auth volume mount. CI workflow builds container image on push to products/axon/ and pushes SHA-pinned tags to GHCR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 09:22:54 +01:00
talent-mesh	616914cf45	feat: OpenOva Axon — stateless SDK, Valkey state store, 100% OpenAI-compatible API Rewrite Axon SaaS LLM gateway with three core changes: 1. Session pool acquire/release pattern — sessions stay alive and are reused across requests instead of killed after one use. Turn counting with automatic recycling after 200 turns. 2. Valkey-backed conversation store — all conversation state (messages, metadata, TTL) lives in Valkey, not filesystem. Sessions are stateless workers; any session can serve any conversation. 3. 100% OpenAI /v1/chat/completions compatibility — accepts every OpenAI request parameter (temperature, top_p, stop, frequency_penalty, presence_penalty, logit_bias, logprobs, seed, tools, tool_choice, response_format, stream_options, max_completion_tokens, user, store, metadata). Response shape matches OpenAI exactly: chatcmpl-* id, system_fingerprint, logprobs:null, refusal:null, usage chunk in streaming. OpenAI model names (gpt-4o, gpt-4) auto-mapped to Claude. Axon extension: conversation_id field for multi-turn conversations backed by Valkey with 7-day TTL. GET /v1/conversations/:id for history. Includes E2E test suite (67 tests, scripts/e2e-test.sh). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 18:36:26 +04:00
talent-mesh	435f49738d	feat: restructure platform to 52 components and 9 products Technology forecast and strategic review restructure: - Remove 13 components (backstage, mongodb, activemq, vitess, airflow, camel, dapr, superset, searxng, langserve, trino, lago, rabbitmq) - Add 10 components (sigstore, syft-grype, nemo-guardrails, langfuse, reloader, matrix, ferretdb, litmus, livekit, coraza) - Rename product: Synapse → Axon (SaaS LLM Gateway) - Merge products: Titan + Fuse → Fabric (Data & Integration) - New product: Relay (Communication) - Replace Backstage with Catalyst IDP - Replace MongoDB with FerretDB (MongoDB wire protocol on CNPG) - Add supply chain security (Sigstore/Cosign, Syft+Grype) - Add AI safety and observability (NeMo Guardrails, LangFuse) - Add technology forecast 2027-2030 document - Full verification pass: zero stale references across all docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 21:00:19 +00:00
talent-mesh	10245dff98	feat: ecosystem expansion to 55 components with license compliance - Replace BSL-licensed components with open-source alternatives: Terraform→OpenTofu (MPL 2.0), Vault→OpenBao (MPL 2.0), Redpanda→Strimzi/Kafka (Apache 2.0), n8n→Airflow (Apache 2.0) - Add 14 new platform components: activemq, camel, clickhouse, dapr, debezium, falco, flink, iceberg, opensearch, rabbitmq, superset, temporal, trino, vitess - Rename meta-platforms/ to products/ with new product names: Cortex (AI Hub), Fingate (Open Banking), Titan (Data Lakehouse), Fuse (Microservices Integration) - Update all documentation, READMEs, and cross-references Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 18:15:11 +00:00

27 Commits
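
Note on baf2d8445d: the message describes replacing a retry strategy that gave up after 5 attempts with one that retries indefinitely at a capped interval. The sketch below illustrates that idea only; it is written in Python rather than the project's ioredis code. The function names, the 1 s linear step, and the delay formula of the old version are assumptions. Only the "give up after 5", "retry indefinitely", and "max 30s interval" details come from the log.

```python
MAX_DELAY_MS = 30_000  # cap taken from the commit message ("max 30s interval")
STEP_MS = 1_000        # assumed growth per attempt; the log does not state it


def retry_delay_ms(attempt: int) -> int:
    """Return the wait in milliseconds before reconnect attempt `attempt` (1-based).

    Always returns a number, never None, so the caller keeps retrying forever.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(attempt * STEP_MS, MAX_DELAY_MS)


def old_retry_delay_ms(attempt: int):
    """Hypothetical model of the replaced behaviour: stop after 5 attempts."""
    if attempt > 5:
        return None  # a non-numeric result means "stop reconnecting"
    return min(attempt * STEP_MS, MAX_DELAY_MS)


if __name__ == "__main__":
    for n in (1, 5, 6, 100):
        print(n, old_retry_delay_ms(n), retry_delay_ms(n))
```

With the capped version the delay grows until it reaches the ceiling and then stays there, so a client that has been idle overnight still reconnects once the server is reachable again.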