- Fix currentStep initial value (0→1) so wizard advances correctly on first Continue
- Add `code` + `countryCode` to HETZNER_REGIONS; drop deprecated `role: primary|dr` from Region model
- Use region code in cluster context names: hz-fsn-rtz-prod (not hz-fsn1-rtz-prod)
- Derive success page URLs and kubeconfig from wizard store (orgDomain, region)
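
A minimal sketch of how a context name could be derived from the region code. Only `code`, `countryCode`, and the `hz-fsn-rtz-prod` example come from this commit; the `id` field, the `tenant` and `env` parameters, and `clusterContextName` are hypothetical names used for illustration.

```typescript
// Hypothetical Region shape: `code` and `countryCode` are the new fields,
// `id` stands in for the existing location identifier (e.g. "fsn1").
interface Region {
  id: string;
  code: string;
  countryCode: string;
}

const HETZNER_REGIONS: Region[] = [
  { id: "fsn1", code: "fsn", countryCode: "DE" },
];

// Build a kube context name from the short region code, not the location id.
function clusterContextName(region: Region, tenant: string, env: string): string {
  return `hz-${region.code}-${tenant}-${env}`;
}
```

With these assumptions, `clusterContextName(HETZNER_REGIONS[0], "rtz", "prod")` yields `hz-fsn-rtz-prod`.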
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add thinking, effort, profile fields to ChatCompletionRequest
- Add chatV1() and chatV1Stream() using query() with persistSession=false
- Route to V1 when thinking/effort params present or profile='deep'
- V2 session pool unchanged; V1 runs stateless with native systemPrompt
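
The routing rule above can be sketched as a single predicate. The field types here are assumptions (the commit only names the fields), and `shouldUseV1` is a hypothetical helper name.

```typescript
// Assumed subset of ChatCompletionRequest; only the field names are from the commit.
interface ChatCompletionRequest {
  thinking?: unknown;
  effort?: string;
  profile?: string;
}

// Route to the stateless V1 path when thinking/effort are present
// or when the caller asks for the 'deep' profile.
function shouldUseV1(req: ChatCompletionRequest): boolean {
  return (
    req.thinking !== undefined ||
    req.effort !== undefined ||
    req.profile === "deep"
  );
}
```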
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous retryStrategy returned null once times > 5, permanently shutting
down the ioredis client after 5 failed reconnects. After an idle period the TCP
connection drops, all 5 retries fail, and every subsequent command throws
'Connection is closed'.
Changes:
- retryStrategy now retries indefinitely (interval capped at 30s), so the
  connection is always restored when Valkey comes back
- 'end' event handler restarts the client if ioredis somehow stops retrying
- getValkey() returns null when client.status is 'end' or 'close', so callers
  skip persistence gracefully instead of throwing
- maxRetriesPerRequest: 3 is kept: commands fail fast while the background
  reconnect handles recovery
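
A rough sketch of the retry policy and the status guard described above. `retryStrategy` and `maxRetriesPerRequest` are real ioredis options, and 'end'/'close' are real ioredis status values; the 500ms backoff step and the `usableClient` helper are assumptions, not the actual Axon code.

```typescript
const MAX_RETRY_DELAY_MS = 30000;

// Always returns a number, so ioredis never stops reconnecting.
// Linear backoff with an assumed 500ms step, capped at 30s.
function retryStrategy(times: number): number {
  return Math.min(times * 500, MAX_RETRY_DELAY_MS);
}

// Options in the shape ioredis accepts; commands still fail fast after 3 tries.
const valkeyOptions = { retryStrategy, maxRetriesPerRequest: 3 };

// Hypothetical guard in the spirit of getValkey(): hand back null when the
// client is closed so callers can skip persistence instead of throwing.
function usableClient<T extends { status: string }>(client: T): T | null {
  return client.status === "end" || client.status === "close" ? null : client;
}
```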
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sessions whose Claude CLI subprocess has exited (idle > MAX_IDLE_MS) are
recycled in acquire() rather than returned. This prevents all-stale-pool
scenarios that caused WriteRecsActivity/ExtractIntentActivity to fail with
'Connection is closed' after Axon sits idle overnight.
- Added lastUsed: number to PoolEntry, set on warmup and release
- acquire() skips idle entries older than 5 min, recycles each one
- release() stamps lastUsed so the TTL resets on every successful use
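
The idle check can be sketched roughly as follows. This is a simplified, synchronous illustration: the `busy` flag, the `create` factory, and the injectable clock are assumptions, and the stale entry is recycled in place before being handed out rather than skipped.

```typescript
const MAX_IDLE_MS = 5 * 60 * 1000; // 5 minutes, per the commit text

interface PoolEntry<S> {
  session: S;
  lastUsed: number; // epoch ms, stamped on warmup and release
  busy: boolean;    // hypothetical bookkeeping flag
}

class SessionPool<S> {
  private entries: PoolEntry<S>[] = [];

  constructor(
    private create: () => S,
    size: number,
    private now: () => number = Date.now,
  ) {
    // Warmup: every entry starts with a fresh lastUsed stamp.
    for (let i = 0; i < size; i++) {
      this.entries.push({ session: create(), lastUsed: now(), busy: false });
    }
  }

  acquire(): PoolEntry<S> | undefined {
    for (const entry of this.entries) {
      if (entry.busy) continue;
      // A session idle past the TTL is assumed dead: replace it, never return it.
      if (this.now() - entry.lastUsed > MAX_IDLE_MS) {
        entry.session = this.create();
        entry.lastUsed = this.now();
      }
      entry.busy = true;
      return entry;
    }
    return undefined; // pool exhausted
  }

  release(entry: PoolEntry<S>): void {
    entry.lastUsed = this.now(); // TTL resets on every successful use
    entry.busy = false;
  }
}
```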
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Re-add 2-3 word chunk splitting with 25-60ms delays that was lost during
the includePartialMessages refactor. Fixes the "10s wait then dump" UX.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Set includePartialMessages: true on SDK sessions so stream() emits
SDKPartialAssistantMessage (stream_event) carrying content_block_delta
events. chatStream() now yields actual token text as it is generated
instead of waiting for the complete response and fake-streaming it
with word-splits and delays.
This gives true token-by-token streaming with a TTFT of ~200ms, rather than
the previous 3-8s wait for the full response before any text appeared.
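
For illustration, pulling the text out of a partial message might look like the sketch below. The message shape is an assumption reconstructed from the names in this commit (`stream_event`, `content_block_delta`) plus the `text_delta` payload of the Anthropic streaming format; it is declared locally rather than imported from the SDK, and `textFromMessage` is a hypothetical helper.

```typescript
// Loosely typed local stand-in for the SDK's partial assistant message.
interface StreamEventMessage {
  type: string;
  event?: {
    type: string;
    delta?: { type: string; text?: string };
  };
}

// Returns the token text carried by a content_block_delta, or null for
// every other message kind (complete messages, tool deltas, and so on).
function textFromMessage(msg: StreamEventMessage): string | null {
  if (msg.type !== "stream_event") return null;
  const event = msg.event;
  if (!event || event.type !== "content_block_delta") return null;
  const delta = event.delta;
  if (!delta || delta.type !== "text_delta" || typeof delta.text !== "string") {
    return null;
  }
  return delta.text;
}
```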
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Claude Agent SDK reuses sessions across conversations. When the
full system prompt was re-sent on subsequent turns wrapped in
[System instructions] tags, Claude flagged it as a prompt injection
attempt. Switch to XML-style tags (<context>, <conversation>) that
Claude recognises as structured prompt sections. Add <new_conversation/>
boundary marker to isolate reused sessions from prior context.
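
A sketch of what the tagged prompt could look like. The tag names come from this commit; the exact layout, the `Turn` type, and the `formatPrompt` signature are assumptions for illustration.

```typescript
interface Turn {
  role: "user" | "assistant";
  content: string;
}

// Wrap the system prompt and history in XML-style sections, and prefix a
// boundary marker when a reused session starts serving a new conversation.
function formatPrompt(
  systemPrompt: string,
  turns: Turn[],
  isNewConversation: boolean,
): string {
  const parts: string[] = [];
  if (isNewConversation) parts.push("<new_conversation/>");
  parts.push(`<context>\n${systemPrompt}\n</context>`);
  const history = turns.map((t) => `${t.role}: ${t.content}`).join("\n");
  parts.push(`<conversation>\n${history}\n</conversation>`);
  return parts.join("\n");
}
```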
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Claude Agent SDK yields complete assistant messages rather than
individual token deltas. This change splits the full text into 2-3
word groups and yields them as separate SSE chunks with small random
delays (25-60ms), giving a natural typing experience on the client.
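
The splitting and pacing can be sketched as below. The helper names and the injectable `rng` are hypothetical, and `emit` stands in for whatever writes an SSE chunk to the client.

```typescript
// Split text into groups of 2 or 3 words; the last group may be shorter.
function splitIntoWordGroups(text: string, rng: () => number = Math.random): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const groups: string[] = [];
  let i = 0;
  while (i < words.length) {
    const size = 2 + Math.floor(rng() * 2); // 2 or 3
    groups.push(words.slice(i, i + size).join(" "));
    i += size;
  }
  return groups;
}

// Random delay in the 25-60ms range (inclusive) for rng() in [0, 1).
function chunkDelayMs(rng: () => number = Math.random): number {
  return 25 + Math.floor(rng() * 36);
}

// Emit each group, then pause briefly before the next one.
async function emitChunks(text: string, emit: (chunk: string) => void): Promise<void> {
  for (const group of splitIntoWordGroups(text)) {
    emit(group + " ");
    await new Promise<void>((resolve) => setTimeout(resolve, chunkDelayMs()));
  }
}
```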
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Covers SDK examples (Python, Node.js), the API reference, model aliases,
streaming, conversations, and self-hosting instructions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Traces: convLookup, formatPrompt, acquire, send, firstMsg,
stream, release, convStore — logged per request for profiling.
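
One way to collect such per-stage timings is sketched here. The stage names are the ones listed above; the `RequestTrace` class, its injectable clock, and the summary format are assumptions rather than the actual logging code.

```typescript
class RequestTrace {
  readonly spans: Record<string, number> = {};
  private readonly start: number;
  private last: number;

  constructor(private readonly now: () => number = Date.now) {
    this.start = now();
    this.last = this.start;
  }

  // Record the time elapsed since the previous mark under the given stage name.
  mark(stage: string): void {
    const t = this.now();
    this.spans[stage] = t - this.last;
    this.last = t;
  }

  // One log line per request, e.g. "convLookup=5ms acquire=7ms total=12ms".
  summary(): string {
    const parts = Object.keys(this.spans).map((k) => `${k}=${this.spans[k]}ms`);
    parts.push(`total=${this.last - this.start}ms`);
    return parts.join(" ");
  }
}
```

A handler would call `trace.mark("convLookup")`, `trace.mark("acquire")`, and so on as each stage completes, then log `trace.summary()` once at the end of the request.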
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
K8s doesn't set HOME from Dockerfile USER directive. Mount
credential file at subpath to preserve debug/ directory.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
K8s runAsNonRoot requires numeric UID. Pin to 1001 in both
Containerfile and Helm chart deployment template.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Agent SDK writes debug logs to ~/.claude/debug/ which must
exist before session creation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Helm chart for deploying Axon LLM gateway with Valkey backing store,
Traefik ingress with TLS, and Claude auth volume mount.
CI workflow builds container image on push to products/axon/ and pushes
SHA-pinned tags to GHCR.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rewrite Axon SaaS LLM gateway with three core changes:
1. Session pool acquire/release pattern — sessions stay alive and are
   reused across requests instead of killed after one use. Turn counting
   with automatic recycling after 200 turns.
2. Valkey-backed conversation store — all conversation state (messages,
   metadata, TTL) lives in Valkey, not filesystem. Sessions are stateless
   workers; any session can serve any conversation.
3. 100% OpenAI /v1/chat/completions compatibility — accepts every OpenAI
   request parameter (temperature, top_p, stop, frequency_penalty,
   presence_penalty, logit_bias, logprobs, seed, tools, tool_choice,
   response_format, stream_options, max_completion_tokens, user, store,
   metadata). Response shape matches OpenAI exactly: chatcmpl-* id,
   system_fingerprint, logprobs:null, refusal:null, usage chunk in
   streaming. OpenAI model names (gpt-4o, gpt-4) auto-mapped to Claude.
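
As an illustration of the response shape in item 3, a non-streaming body might be assembled like this. The field names follow the OpenAI chat completion format; the alias targets, the `fp_axon` fingerprint, echoing the requested model name, and both helper names are placeholders, not the values the gateway uses.

```typescript
// Hypothetical alias table: the commit names the OpenAI models, not the targets.
const MODEL_ALIASES: Record<string, string> = {
  "gpt-4o": "sonnet",
  "gpt-4": "opus",
};

// Unknown names pass through unchanged.
function mapModel(requested: string): string {
  return MODEL_ALIASES[requested] || requested;
}

interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

function buildChatCompletion(
  idSuffix: string,
  requestedModel: string,
  content: string,
  usage: Usage,
  createdSeconds: number,
) {
  return {
    id: `chatcmpl-${idSuffix}`,
    object: "chat.completion",
    created: createdSeconds,
    model: requestedModel, // assumption: echo the name the client sent
    system_fingerprint: "fp_axon", // placeholder value
    choices: [
      {
        index: 0,
        message: { role: "assistant", content, refusal: null },
        logprobs: null,
        finish_reason: "stop",
      },
    ],
    usage,
  };
}
```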
Axon extension: conversation_id field for multi-turn conversations
backed by Valkey with 7-day TTL. GET /v1/conversations/:id for history.
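
A small sketch of how the 7-day TTL could be applied when a conversation is written. The `conv:` key prefix, the JSON encoding, and the helper name are assumptions; the returned tuple matches the `SET key value EX seconds` form that Valkey and ioredis accept.

```typescript
const CONVERSATION_TTL_SECONDS = 7 * 24 * 60 * 60; // 7 days = 604800s

interface StoredMessage {
  role: string;
  content: string;
}

// Build the arguments for a SET with expiry; spread them into client.set(...).
function conversationSetArgs(
  conversationId: string,
  messages: StoredMessage[],
): [string, string, "EX", number] {
  return [
    `conv:${conversationId}`, // hypothetical key layout
    JSON.stringify(messages),
    "EX",
    CONVERSATION_TTL_SECONDS,
  ];
}
```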
Includes E2E test suite (67 tests, scripts/e2e-test.sh).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>