Transport Contract
The platform exposes two real-time transports for browser clients. Both deliver the same event envelope; the only difference is the connection mechanism.
| Transport | Use case | Peer identifier |
|---|---|---|
| SSE | Standard browser apps. One-way server-to-client stream; chat requests sent via POST /sse/chat. | Client-provided stream_id query param |
| Socket.IO | Apps that need bidirectional messaging or already use Socket.IO. | Connection sid (assigned by server) |
SSE Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| /sse/stream | GET | Open the long-lived event stream. Requires stream_id query param. |
| /sse/chat | POST | Send a chat message. Returns a synchronous acknowledgement (processing_started, followup_accepted, or steer_accepted). |
| /sse/conv_status.get | POST | Request the current conversation status. |
SSE Stream Query Parameters
| Param | Required | Purpose |
|---|---|---|
| stream_id | Yes | Unique peer identifier for this connection |
| user_session_id | No | Reuse an existing authenticated session |
| bearer_token | No | Access token fallback when headers are unavailable |
| id_token | No | ID token fallback when headers are unavailable |
| tenant | No | Override tenant for the stream |
| project | No | Override project for the stream |
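As a sketch of how these parameters combine, the following hypothetical helper builds the stream URL (the base URL and token values are placeholders, not part of the contract):

```javascript
// Sketch: assemble the /sse/stream URL from the query parameters above.
// Only stream_id is required; the rest are appended when provided.
function buildStreamUrl(baseUrl, { streamId, userSessionId, bearerToken, idToken, tenant, project }) {
  const url = new URL("/sse/stream", baseUrl);
  url.searchParams.set("stream_id", streamId); // required
  if (userSessionId) url.searchParams.set("user_session_id", userSessionId);
  if (bearerToken) url.searchParams.set("bearer_token", bearerToken);
  if (idToken) url.searchParams.set("id_token", idToken);
  if (tenant) url.searchParams.set("tenant", tenant);
  if (project) url.searchParams.set("project", project);
  return url.toString();
}
// In the browser: const es = new EventSource(buildStreamUrl(baseUrl, { streamId }));
```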
SSE uses the browser's native EventSource and requires no extra library. Choose Socket.IO only when you need bidirectional event delivery from client to server beyond the POST /sse/chat send path.
SSE Event Catalog
Each SSE frame carries an event name (transport route) and a JSON data payload with a semantic type. The event name tells the browser which listener fires; the payload type tells your code what the event means.
SSE Event Names
| SSE event | Payload type | Purpose |
|---|---|---|
| ready | — | Stream is open and authenticated. Payload includes session_id, user_type, stream_id. |
| chat_start | chat.start | Turn accepted and processing started. |
| chat_step | chat.step or custom | Structured step update (progress, tool results, decisions). |
| chat_delta | chat.delta | Streaming text chunks (answer, thinking, artifacts). |
| chat_complete | chat.complete | Turn completed. Contains data.final_answer and optional data.followups. |
| chat_error | chat.error | Turn failed. Contains data.error, optional data.error_type. |
| chat_service | chat.service, gateway.*, rate_limit.* | Service-level events: rate limits, gateway rejections, queue status. |
| conv_status | conv.status | Conversation state snapshot (idle, in_progress, error). |
| server_shutdown | — | Server is draining. Reconnect after a short delay. |
Common Envelope Shape
All chat events share this JSON structure:
```json
{
  "type": "chat.step",
  "timestamp": "2026-02-26T21:14:05.267Z",
  "ts": 1700000000000,
  "service": {
    "request_id": "...", "tenant": "...",
    "user": "...", "user_type": "registered"
  },
  "conversation": {
    "session_id": "...", "conversation_id": "...",
    "turn_id": "..."
  },
  "event": {
    "agent": "...", "step": "...",
    "status": "started|running|completed|error",
    "title": "...", "markdown": "..."
  },
  "data": { },
  "delta": { },
  "extra": { }
}
```
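A minimal sketch of reading the envelope on the client, assuming the field names shown above (the helper name and the selection of fields are illustrative only):

```javascript
// Sketch: parse one event payload and pull out the fields most clients
// route on: semantic type, turn id, step status, plus data/delta bodies.
function parseEnvelope(json) {
  const e = JSON.parse(json);
  return {
    type: e.type,
    turnId: e.conversation?.turn_id ?? null,
    status: e.event?.status ?? null,
    data: e.data ?? {},
    delta: e.delta ?? {},
  };
}
```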
Delta Markers
Streaming chunks (chat_delta) use a marker field to fan out to different UI channels:
| Marker | Meaning | Typical usage |
|---|---|---|
| answer | Assistant response stream | Main answer text rendered in the chat bubble |
| thinking | Reasoning stream | Internal analysis, shown in a collapsible panel |
| canvas | Artifact stream | Documents, rendered HTML/JSON content. Uses extra.artifact_name for grouping. |
| timeline_text | Timeline stream | Short status entries for an activity log |
| subsystem | Structured JSON payloads | Widgets and tools. Routed by extra.sub_type (e.g. code_exec.status, web_search.filtered_results). |
Each delta chunk looks like:
```json
{
  "delta": {
    "text": "Here is the answer.",
    "index": 0,
    "marker": "answer",
    "completed": false
  },
  "extra": {
    "format": "markdown",
    "artifact_name": "...",
    "sub_type": "..."
  }
}
```
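The marker fan-out can be sketched as a small router that accumulates text per UI channel; the channel-key scheme (grouping canvas by artifact_name and subsystem by sub_type) follows the table above, while the function shape is an assumption:

```javascript
// Sketch: route chat_delta chunks to UI channels by marker, keyed so that
// separate artifacts and subsystem payloads accumulate independently.
function createDeltaRouter() {
  const channels = new Map(); // channel key -> accumulated text
  return function route(envelope) {
    const { marker, text = "", completed } = envelope.delta;
    const extra = envelope.extra || {};
    const key =
      marker === "canvas" ? `canvas:${extra.artifact_name}` :
      marker === "subsystem" ? `subsystem:${extra.sub_type}` :
      marker;
    channels.set(key, (channels.get(key) || "") + text);
    // completed === true means no more chunks for this marker/artifact:
    // the UI can close the corresponding stream.
    return { key, text: channels.get(key), completed: !!completed };
  };
}
```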
When delta.completed is true, the server has finished sending chunks for that marker/artifact. Close the corresponding UI stream.
Usage and Token Counting
After a turn completes, the server emits an accounting.usage event (on the chat_step route) containing a cost breakdown:
```json
{
  "type": "accounting.usage",
  "data": {
    "breakdown": [ ... ],
    "cost_total_usd": 0.0042
  },
  "event": {
    "step": "accounting",
    "markdown": "Token usage: 1,240 in / 380 out"
  }
}
```
Socket.IO Events
Connection Setup
Connect to the platform namespace and pass authentication fields in the auth payload:
```javascript
const socket = io(baseUrl, {
  auth: {
    bearer_token: accessToken,
    id_token: idToken,
    user_session_id: sessionId, // optional: reuse existing session
    tenant: "my-tenant",        // optional override
    project: "my-project"       // optional override
  }
});
```
On successful connection, the server assigns a sid that acts as the peer stream identifier for targeted event delivery (equivalent to SSE's stream_id).
Event Names and Payloads
Socket.IO events use the same semantic envelope as SSE. The event names match the SSE transport routes:
| Event | Direction | Payload |
|---|---|---|
| ready | Server → Client | Session info: session_id, user_type, stream_id |
| chat_start | Server → Client | Same envelope as SSE chat_start |
| chat_step | Server → Client | Same envelope as SSE chat_step |
| chat_delta | Server → Client | Same envelope as SSE chat_delta |
| chat_complete | Server → Client | Same envelope as SSE chat_complete |
| chat_error | Server → Client | Same envelope as SSE chat_error |
| chat_service | Server → Client | Same envelope as SSE chat_service |
| conv_status | Server → Client | Same envelope as SSE conv_status |
| server_shutdown | Server → Client | Drain signal; reconnect with backoff |
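Because the event names and envelopes match across transports, a single handler map can drive both. A sketch, assuming a Socket.IO-style object exposing on(event, fn) (for SSE, wrap EventSource so listeners receive the parsed JSON payload); the handler bodies are placeholders:

```javascript
// Sketch: one handler map shared by SSE and Socket.IO clients.
const handlers = {
  ready: (p) => console.log("session", p.session_id),
  chat_delta: (p) => console.log("delta", p.delta?.marker),
  chat_complete: (p) => console.log("answer", p.data?.final_answer),
  chat_error: (p) => console.error(p.data?.error),
};

// Wire every handler onto anything exposing on(event, fn),
// e.g. a Socket.IO socket or an EventSource adapter.
function attach(socketLike, handlers) {
  for (const [event, fn] of Object.entries(handlers)) {
    socketLike.on(event, fn);
  }
}
```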
Namespace and Room Patterns
Events are scoped to the authenticated session. The server manages rooms internally based on session_id. Clients do not join or leave rooms manually. Broadcast events go to all peers in the session room; peer-targeted events go only to the specific sid.
If a request to /api/integrations/* includes the KDC-Stream-ID header with the Socket.IO sid, bundle-emitted events will target only that peer instead of broadcasting to the entire session.
Authentication for Clients
The server resolves credentials in a fixed priority order. The first source that provides a token wins.
1. Explicit Headers (highest priority)
Set on REST, SSE POST, and integration requests:
| Header | Purpose |
|---|---|
| Authorization: Bearer <token> | Access token |
| X-ID-Token | ID token |
| User-Session-ID | Reuse an existing session |
2. SSE / Socket.IO Auth Payload
When headers are unavailable (e.g. EventSource does not support custom headers), pass tokens as query params on the SSE stream URL or in the Socket.IO auth object:
| Param | Purpose |
|---|---|
| bearer_token | Access token |
| id_token | ID token |
3. Cookies (lowest priority)
Fallback for cookie-based / proxylogin deployments. The browser sends these automatically:
| Cookie | Purpose |
|---|---|
| __Secure-LATC | Access token cookie |
| __Secure-LITC | ID token cookie |
Useful Request Headers
| Header | Purpose |
|---|---|
| KDC-Stream-ID | Peer identifier for targeted event delivery from REST/integration calls |
| X-User-Timezone | User timezone (e.g. America/New_York) for server-formatted messages |
| X-User-UTC-Offset | UTC offset in minutes |
Response Headers to Observe
| Header | Action |
|---|---|
| X-Session-ID | Store and reuse to maintain session continuity |
| X-User-Type | Resolved user type for the request |
| Retry-After | Honor on 429 and 503 responses before retrying |
Token Refresh Pattern
When you receive a 401 or 403, refresh your access token through your identity provider and retry the request. For SSE streams, close the current EventSource, obtain fresh tokens, and reconnect with the new credentials. Keep stream_id stable across reconnects so the server can associate the new connection with the same peer.
Error Handling & Reconnection
SSE Reconnect Strategy
Use exponential backoff with jitter. The server does not guarantee sticky connections — any replica may serve your reconnect.
delay = min(30s, 2^attempt seconds + jitter(0..1s))
| Signal | Meaning | Action |
|---|---|---|
| server_shutdown event | Instance is draining | Close stream immediately. Reconnect after 1–2s + jitter. |
| Connection drop (no event) | Network issue or scaled-down replica | Reconnect with exponential backoff (start 1–2s, cap 30s). |
| HTTP 503 with {"status":"draining"} | Instance is draining (on REST calls) | Retry after 1–3s + jitter. |
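The backoff formula can be sketched directly (the function name is illustrative; times are in milliseconds):

```javascript
// Sketch: exponential backoff with jitter, capped at 30s,
// matching delay = min(30s, 2^attempt seconds + jitter(0..1s)).
function reconnectDelayMs(attempt) {
  const jitterMs = Math.random() * 1000;
  return Math.min(30_000, 2 ** attempt * 1000 + jitterMs);
}
```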
Rate Limit Responses
Rate limits arrive as chat_service events and/or HTTP status codes:
| HTTP Status | Meaning | Action |
|---|---|---|
| 429 | Rate limit exceeded | Back off 2–5s + jitter. Honor Retry-After header. Max 5 retries. |
| 503 | Backpressure or draining | Back off 1–3s + jitter. Do not retry immediately. |
| 401 / 403 | Auth missing or invalid | Refresh tokens or redirect to login. |
In-stream rate-limit events (rate_limit.denied, rate_limit.warning) include a data.rate_limit object with retry_after_sec, reset_text, and a ready-to-display user_message. Prefer showing user_message directly.
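The status-code handling above can be condensed into a small classifier; the return shape and function name are assumptions for illustration, with retryAfterSec standing in for a parsed Retry-After header:

```javascript
// Sketch: map a failed HTTP response to a client action per the table above.
function classifyFailure(status, retryAfterSec) {
  if (status === 429) {
    // honor Retry-After when present; otherwise 2–5s with jitter, max 5 retries
    return { action: "retry", delaySec: retryAfterSec ?? 2 + Math.random() * 3, maxRetries: 5 };
  }
  if (status === 503) {
    // backpressure or draining: 1–3s with jitter, never retry immediately
    return { action: "retry", delaySec: retryAfterSec ?? 1 + Math.random() * 2 };
  }
  if (status === 401 || status === 403) {
    return { action: "refresh_auth" };
  }
  return { action: "fail" };
}
```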
Backpressure Signals
Gateway-level rejections arrive on chat_service with types such as gateway.backpressure, gateway.rate_limit, and gateway.circuit_breaker. These indicate the ingress is protecting the backend. Back off and retry.
Turn Interruption
If the processing worker dies after a turn has started, you may have already rendered partial chat_delta content. The server signals interruption with:
conv_status with data.completion = "interrupted", and chat_error with data.error_type = "turn_interrupted".
Keep partial output visible, mark the turn as failed, and offer the user a manual retry. Do not auto-resubmit.
Multi-Tab Coordination
Leader Election
Use localStorage or BroadcastChannel to elect a single leader tab. Only the leader maintains the SSE connection. Follower tabs read events from shared storage or request on demand.
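One possible shape for the election, assuming a lease-based scheme over a shared key-value store (localStorage in the browser; injected here so the logic is self-contained). The key name and lease duration are hypothetical:

```javascript
// Sketch: naive lease-based leader election. The winning tab opens the
// SSE connection; others act as followers. A leader that stops
// refreshing its heartbeat loses the lease after LEASE_MS.
const LEASE_MS = 5000; // hypothetical lease duration

function tryBecomeLeader(store, tabId, now = Date.now()) {
  const raw = store.getItem("sse-leader");
  const current = raw ? JSON.parse(raw) : null;
  if (!current || now - current.heartbeat > LEASE_MS || current.tabId === tabId) {
    store.setItem("sse-leader", JSON.stringify({ tabId, heartbeat: now }));
    return true;  // leader: open (or keep) the SSE connection
  }
  return false;   // follower: read events from shared storage
}
```

In the browser, each tab would call tryBecomeLeader(localStorage, tabId) on an interval so followers take over when the leader tab closes.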
Burst Control
Coalesce requests on page load (aim for fewer than 10–15 requests in the first 10 seconds). Serialize chat sends — never fire concurrent POST /sse/chat calls. If polling is unavoidable, use intervals of 5–10s minimum.
Draining / Maintenance Mode
When the platform enters a drain cycle, active SSE streams receive a server_shutdown event with reason: "draining". REST endpoints return 503. This is expected, not fatal. Close connections gracefully and reconnect after a short delay. The load balancer will route you to a healthy replica.
Keep session_id and auth tokens consistent, and the server will associate your requests correctly regardless of which instance handles them.