openova

Author	SHA1	Message	Date
e3mrah	69706a80ec	feat(axon): make qwen3-coder thinking mode toggleable via request parameter Client sends `thinking: true` to enable reasoning tokens. Default remains disabled for instant streaming. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-26 09:20:33 +02:00
e3mrah	63fc7a381f	fix(axon): disable qwen3-coder thinking mode for instant streaming Qwen3-coder generates hundreds of `reasoning` tokens before `content` tokens, causing 10+ second perceived delay. The reasoning tokens stream through Axon but the ChatWidget only renders `delta.content`, so users see a long pause then a burst. Passing `enable_thinking: false` via chat_template_kwargs skips the reasoning phase entirely. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-26 09:08:47 +02:00
e3mrah	5201bdc962	fix(axon): tighten WAF payload limits — system 4000, assistant 800, total 8000 3-turn conversations passed at ~9120 chars but 4-turn failed at ~10640. WAF anomaly threshold is between those values. Lowered all limits to keep multi-turn conversations well under the threshold. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-26 08:52:04 +02:00
e3mrah	00ddc1437c	fix(axon): cap assistant messages and total payload to prevent WAF rejection on long conversations WAF anomaly scoring accumulates across the entire request body. After 2-3 turns, assistant responses containing infrastructure terms (security, scanning, etc.) push the total past the threshold. Added per-assistant trim (1500 chars) and a 12000-char sliding window that drops oldest messages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-26 08:44:33 +02:00
e3mrah	40c4abe4f6	fix(axon): deduplicate system messages before forwarding to vLLM vLLM requires system messages to be at the beginning. When Axon merges conversation history with new messages, duplicate system messages cause a 400 error. Strip all but the first system message. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-26 08:35:28 +02:00
e3mrah	4110161577	fix(axon): trim large system prompts to avoid vLLM WAF rejection The vLLM backend at Bank Dhofar runs behind an Istio/Envoy WAF with ModSecurity-style anomaly scoring. The ChatWidget's 41KB system prompt accumulates enough infrastructure/security keywords to trigger a 403. Trim system messages to 6000 chars (70% head + 30% tail) before forwarding to vLLM — preserves identity/behavior instructions at the start and FAQ/response guidelines at the end. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-26 08:27:14 +02:00
e3mrah	85e1319e01	fix(axon): resolve unknown model names to vLLM default Clients (e.g. ChatWidget) send OpenAI model names like gpt-4o-mini which vLLM doesn't recognize. The provider now queries available models on startup and remaps any unrecognized name to the configured default. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-26 07:54:07 +02:00
e3mrah	68fcbe1aed	feat(axon): add toggleable vLLM provider backend Introduces a provider abstraction so Axon can proxy to either Claude SDK (existing behavior) or a vLLM-compatible endpoint. Toggled via AXON_PROVIDER env var ("claude" | "vllm"). When vllm, requests pass through as-is (no prompt translation), session pool and OAuth are skipped. Closes openova-io/openova#36 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-26 07:36:58 +02:00
e3mrah	dd2e9b1de3	fix(axon): handle missing credentials file in token refresh Skip refresh gracefully when .credentials.json doesn't exist (e.g. CI smoke test with no Claude auth mounted). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 15:08:28 +02:00
e3mrah	0cfe1bc361	feat(axon): add OAuth token refresh on startup and periodic timer The Claude Agent SDK does not refresh OAuth tokens. Axon now: 1. Refreshes the token on startup before creating session pool 2. Runs a periodic refresh every 4 hours 3. Writes refreshed credentials to disk so session subprocesses use them Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 15:07:07 +02:00
e3mrah	2da38e9f7a	feat(axon): CronJob for automatic OAuth token refresh The Claude Agent SDK does not handle OAuth token refresh. Adds a CronJob (every 4h) that refreshes the token via Anthropic's OAuth endpoint and updates the K8s secret. Disabled by default. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 14:44:40 +02:00
e3mrah	9a878336e3	revert: remove accidentally committed untracked files	2026-03-30 14:44:16 +02:00
e3mrah	52358eb8e7	feat(axon): CronJob for automatic OAuth token refresh The Claude Agent SDK does not handle OAuth token refresh — it reads the accessToken from .credentials.json and uses it directly. When the token expires (~8h), Axon returns 401 until manually refreshed. Adds a CronJob (every 4h by default) that: 1. Reads the refreshToken from the K8s secret 2. Calls Anthropic's OAuth token endpoint to get a fresh accessToken 3. Updates the K8s secret with the new credentials 4. Restarts the Axon deployment to pick up the new token Includes ServiceAccount, Role, and RoleBinding for least-privilege access. Disabled by default (axon.tokenRefresh.enabled: false). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 14:42:51 +02:00
e3mrah	a33cd5f9d3	fix(axon): writable credentials mount for OAuth token refresh The credentials were mounted as a read-only K8s secret subPath. When the Claude SDK refreshed the OAuth token, it couldn't persist the new token back to disk. On pod restart, the stale expired token was loaded again, causing 401 auth failures. Fix: initContainer copies credentials from secret to a writable emptyDir volume. The SDK can now refresh tokens and persist them within the pod lifecycle. Also creates the debug/ directory the SDK requires. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 14:38:10 +02:00
e3mrah	361db09507	fix(axon): add valkey maxmemory config to prevent OOM crash loop Valkey was crash-looping (372 restarts) because the 521MB RDB exceeded the 512Mi memory limit. Adds maxmemory and maxmemory-policy args to the valkey deployment template with configurable defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-25 06:37:42 +01:00
e3mrah	2df03b7ea3	feat(axon): add V1 query() alongside V2 session pool with profile routing - Add thinking, effort, profile fields to ChatCompletionRequest - Add chatV1() and chatV1Stream() using query() with persistSession=false - Route to V1 when thinking/effort params present or profile='deep' - V2 session pool unchanged; V1 runs stateless with native systemPrompt Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-19 12:26:34 +01:00
e3mrah	5b04c6bd02	fix: increase axon-valkey probe initialDelaySeconds to survive slow RDB load	2026-03-18 16:10:21 +01:00
e3mrah	cf4c37b2df	fix: use tcpSocket probes for axon-valkey — exec probes fail on k3s OCI runtime	2026-03-18 14:02:09 +01:00
e3mrah	baf2d8445d	fix(axon): persistent Valkey reconnect — never give up retryStrategy Previous retryStrategy(times > 5) returned null, permanently destroying the ioredis client after 5 failed reconnects. After idle, the TCP connection drops, all 5 retries fail, and every subsequent command throws 'Connection is closed'. Changes: - retryStrategy now retries indefinitely (max 30s interval) — connection is always restored when Valkey comes back - 'end' event handler restarts the client if ioredis somehow stops retrying - getValkey() returns null when client.status is 'end'/'close' so callers skip persistence gracefully instead of throwing - maxRetriesPerRequest: 3 kept — commands fail fast, background reconnect handles recovery Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-18 06:09:07 +01:00
e3mrah	63a86772f7	fix(axon): evict idle sessions older than 5 min before handing to caller Sessions whose Claude CLI subprocess has exited (idle > MAX_IDLE_MS) are recycled in acquire() rather than returned. This prevents all-stale-pool scenarios that caused WriteRecsActivity/ExtractIntentActivity to fail with 'Connection is closed' after Axon sits idle overnight. - Added lastUsed: number to PoolEntry, set on warmup and release - acquire() skips idle entries older than 5 min, recycles each one - release() stamps lastUsed so the TTL resets on every successful use Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-18 06:05:05 +01:00
e3mrah	5f0421e967	fix(axon): restore word-level streaming in chatStream Re-add 2-3 word chunk splitting with 25-60ms delays that was lost during the includePartialMessages refactor. Fixes the "10s wait then dump" UX. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 16:09:18 +01:00
e3mrah	bc069483af	fix(axon): restore working assistant message handler, revert broken includePartialMessages	2026-03-12 15:54:23 +01:00
e3mrah	6f81cc7e79	debug(axon): log msg.type from session.stream() to diagnose empty output	2026-03-12 15:51:42 +01:00
e3mrah	3f010b4cca	fix(axon): fallback to complete assistant msg if stream_event not emitted	2026-03-12 15:49:07 +01:00
e3mrah	9aba8fe80c	fix(axon): cast includePartialMessages to bypass older SDK type version	2026-03-12 15:45:23 +01:00
e3mrah	5113295960	feat(axon): enable real token streaming via includePartialMessages Set includePartialMessages: true on SDK sessions so stream() emits SDKPartialAssistantMessage (stream_event) carrying content_block_delta events. chatStream() now yields actual token text as it is generated instead of waiting for the complete response and fake-streaming it with word-splits and delays. This gives true token-by-token TTFT (~200ms first token) rather than the previous 3-8s wait for the full response before any text appeared. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 15:34:24 +01:00
e3mrah	e783b9329c	fix(axon): use XML tags in formatPrompt to prevent injection detection The Claude Agent SDK reuses sessions across conversations. When the full system prompt was re-sent on subsequent turns wrapped in [System instructions] tags, Claude flagged it as a prompt injection attempt. Switch to XML-style tags (<context>, <conversation>) that Claude recognises as structured prompt sections. Add <new_conversation/> boundary marker to isolate reused sessions from prior context. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 21:26:30 +01:00
e3mrah	b911a74e7a	feat(axon): progressive word-level streaming for chat completions The Claude Agent SDK yields complete assistant messages rather than individual token deltas. This change splits the full text into 2-3 word groups and yields them as separate SSE chunks with small random delays (25-60ms), giving a natural typing experience on the client. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 21:05:59 +01:00
e3mrah	8fb961c897	docs: rewrite Axon README as client integration guide SDK examples (Python, Node.js), API reference, model aliases, streaming, conversations, self-hosting instructions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 13:09:58 +01:00
e3mrah	643cdd9d29	Revert "feat: add request tracing spans to chat completion path" This reverts commit `a2685dd158`.	2026-03-04 11:39:29 +01:00
e3mrah	a2685dd158	feat: add request tracing spans to chat completion path Traces: convLookup, formatPrompt, acquire, send, firstMsg, stream, release, convStore — logged per request for profiling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 11:20:54 +01:00
e3mrah	023ee6d5e4	chore: increase Axon resource limits for single-node overprovisioning Axon: 2 CPU / 2Gi memory limits (50m/128Mi requests) Valkey: 500m CPU / 256Mi memory limits (10m/32Mi requests) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 11:08:10 +01:00
e3mrah	97009daf59	fix: set HOME env and mount credentials at subpath K8s doesn't set HOME from Dockerfile USER directive. Mount credential file at subpath to preserve debug/ directory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 09:41:47 +01:00
e3mrah	5ff69bd41b	fix: use fixed UID 1001 for axon user in container K8s runAsNonRoot requires numeric UID. Pin to 1001 in both Containerfile and Helm chart deployment template. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 09:39:09 +01:00
e3mrah	215ba69272	fix: create .claude/debug directory in Axon container Claude Agent SDK writes debug logs to ~/.claude/debug/ which must exist before session creation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 09:26:28 +01:00
e3mrah	fe2e349246	feat: add Axon Helm chart and CI workflow Helm chart for deploying Axon LLM gateway with Valkey backing store, Traefik ingress with TLS, and Claude auth volume mount. CI workflow builds container image on push to products/axon/ and pushes SHA-pinned tags to GHCR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 09:22:54 +01:00
talent-mesh	616914cf45	feat: OpenOva Axon — stateless SDK, Valkey state store, 100% OpenAI-compatible API Rewrite Axon SaaS LLM gateway with three core changes: 1. Session pool acquire/release pattern — sessions stay alive and are reused across requests instead of killed after one use. Turn counting with automatic recycling after 200 turns. 2. Valkey-backed conversation store — all conversation state (messages, metadata, TTL) lives in Valkey, not filesystem. Sessions are stateless workers; any session can serve any conversation. 3. 100% OpenAI /v1/chat/completions compatibility — accepts every OpenAI request parameter (temperature, top_p, stop, frequency_penalty, presence_penalty, logit_bias, logprobs, seed, tools, tool_choice, response_format, stream_options, max_completion_tokens, user, store, metadata). Response shape matches OpenAI exactly: chatcmpl-* id, system_fingerprint, logprobs:null, refusal:null, usage chunk in streaming. OpenAI model names (gpt-4o, gpt-4) auto-mapped to Claude. Axon extension: conversation_id field for multi-turn conversations backed by Valkey with 7-day TTL. GET /v1/conversations/:id for history. Includes E2E test suite (67 tests, scripts/e2e-test.sh). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 18:36:26 +04:00
talent-mesh	435f49738d	feat: restructure platform to 52 components and 9 products Technology forecast and strategic review restructure: - Remove 13 components (backstage, mongodb, activemq, vitess, airflow, camel, dapr, superset, searxng, langserve, trino, lago, rabbitmq) - Add 10 components (sigstore, syft-grype, nemo-guardrails, langfuse, reloader, matrix, ferretdb, litmus, livekit, coraza) - Rename product: Synapse → Axon (SaaS LLM Gateway) - Merge products: Titan + Fuse → Fabric (Data & Integration) - New product: Relay (Communication) - Replace Backstage with Catalyst IDP - Replace MongoDB with FerretDB (MongoDB wire protocol on CNPG) - Add supply chain security (Sigstore/Cosign, Syft+Grype) - Add AI safety and observability (NeMo Guardrails, LangFuse) - Add technology forecast 2027-2030 document - Full verification pass: zero stale references across all docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 21:00:19 +00:00

38 Commits
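
Illustrative sketch (not taken from the repository): commits 4110161577, 40c4abe4f6, 00ddc1437c and 5201bdc962 describe a payload-shaping pipeline applied before forwarding to vLLM: keep only the first system message, trim it to a 70% head + 30% tail excerpt, cap each assistant message, and drop the oldest messages until the total fits. The TypeScript below is one way that could look. The ChatMessage type and all function names are hypothetical. The log does not say how assistant messages are trimmed or which messages the sliding window protects, so plain head truncation and "never drop the system message or the newest message" are assumptions. The limits are the ones quoted in 5201bdc962.

```typescript
// Hypothetical message shape; Axon's real types are not shown in the log.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Limits quoted in commit 5201bdc962 (characters, not tokens).
const LIMITS = { system: 4000, assistant: 800, total: 8000 };

// Keep 70% of the budget from the start and 30% from the end (commit 4110161577).
function trimHeadTail(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  const headLen = Math.floor(maxChars * 0.7);
  const tailLen = maxChars - headLen;
  return text.slice(0, headLen) + text.slice(text.length - tailLen);
}

// Strip all but the first system message (commit 40c4abe4f6).
function dedupeSystem(messages: ChatMessage[]): ChatMessage[] {
  let seen = false;
  return messages.filter((m) => {
    if (m.role !== "system") return true;
    if (seen) return false;
    seen = true;
    return true;
  });
}

// Drop the oldest non-system messages until the total length fits (commit 00ddc1437c).
// Assumption: the system message and the newest message are never dropped.
function applyWindow(messages: ChatMessage[], maxTotal: number): ChatMessage[] {
  const kept = [...messages];
  const total = () => kept.reduce((n, m) => n + m.content.length, 0);
  while (total() > maxTotal) {
    const idx = kept.findIndex((m) => m.role !== "system");
    if (idx === -1 || idx === kept.length - 1) break;
    kept.splice(idx, 1);
  }
  return kept;
}

// Full pipeline: dedupe, per-role trim, then sliding window.
function preparePayload(messages: ChatMessage[]): ChatMessage[] {
  const trimmed = dedupeSystem(messages).map((m) => {
    if (m.role === "system") {
      return { ...m, content: trimHeadTail(m.content, LIMITS.system) };
    }
    if (m.role === "assistant") {
      // Assumption: simple head truncation for assistant turns.
      return { ...m, content: m.content.slice(0, LIMITS.assistant) };
    }
    return m;
  });
  return applyWindow(trimmed, LIMITS.total);
}
```

Trimming before windowing means the sliding window measures what will actually be sent, so fewer turns are dropped than if the raw lengths were counted.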