How it works

One runtime. Inside the org, public, or on-prem.

KDCube uses the same code path everywhere. A gateway handles admission, auth, and budgets. Workers run the bundle code. An isolated execution sidecar sandboxes any code the app generates. Postgres and Redis underneath. Push a new bundle version and the runtime hot-reloads it.

Ingress + control plane

chat-ingress :8010 handles JWT/Cognito auth, rate limits, ClamAV scanning, and admission. RBAC, per-customer budgets, and feature gates are enforced here, before anything is queued.
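To illustrate the "check everything before queueing" idea, here is a minimal sketch in Python. The `Request` fields, the policy tables, and the `admit` function are all hypothetical names chosen for this example; they are not KDCube's actual API, and a real gateway would verify tokens and read budgets from its own stores.

```python
from dataclasses import dataclass


@dataclass
class Request:
    customer_id: str
    role: str
    token_valid: bool      # stands in for real JWT/Cognito verification
    estimated_cost: float  # projected spend for this request


# Hypothetical policy tables, assumed for illustration only.
ALLOWED_ROLES = {"user", "admin"}
BUDGET_REMAINING = {"acme": 5.0, "globex": 0.0}


def admit(req: Request, queue: list) -> str:
    """Run every check first; only admitted requests reach the queue."""
    if not req.token_valid:
        return "rejected: auth"
    if req.role not in ALLOWED_ROLES:
        return "rejected: rbac"
    if BUDGET_REMAINING.get(req.customer_id, 0.0) < req.estimated_cost:
        return "rejected: budget"
    queue.append(req)
    return "queued"
```

Because rejection happens before the append, a customer with an exhausted budget never consumes worker capacity in this sketch.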

Gateway docs →

Processor workers

chat-proc :8020 dequeues from Redis and runs the agent loop — ReAct v3, Claude Code, or your own. Hot-reloadable bundles deploy without restarting workers.
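One way to picture hot reload without a worker restart: the worker resolves the active bundle each time it dequeues a task, so a version switch takes effect on the next task. The sketch below assumes this design and uses an in-process `queue.Queue` as a stand-in for Redis; `BUNDLES`, `active`, and `process_next` are hypothetical names, not KDCube internals.

```python
import queue

# Hypothetical bundle registry: version -> handler.
BUNDLES = {
    "v1": lambda msg: f"v1 handled {msg}",
    "v2": lambda msg: f"v2 handled {msg}",
}

# Pointer to the active version; swapping it stands in for a hot reload.
active = {"version": "v1"}


def process_next(tasks: queue.Queue):
    """Dequeue one task and run it with whichever bundle is active now."""
    try:
        msg = tasks.get_nowait()
    except queue.Empty:
        return None  # nothing to do
    handler = BUNDLES[active["version"]]  # looked up per task, not per process
    return handler(msg)
```

In this sketch, setting `active["version"] = "v2"` between two calls changes which handler runs for the second task while the worker process keeps running.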

Processor docs →

Isolated execution

py-code-exec runs untrusted code in ephemeral Docker sandboxes (or on Fargate for distributed execution). @venv(...) declarations pin per-task dependencies.
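As a rough picture of what an ephemeral sandbox involves, the sketch below builds a `docker run` command with standard Docker CLI flags. The image name, the resource limits, and the `sandbox_argv` helper are assumptions made for this example; the flags py-code-exec actually uses are not stated here, and the `@venv(...)` mechanism is not shown.

```python
def sandbox_argv(code: str, image: str = "python:3.12-slim") -> list[str]:
    """Build a docker command that runs `code` in a throwaway container."""
    return [
        "docker", "run",
        "--rm",               # delete the container when it exits
        "--network", "none",  # no network access from inside the sandbox
        "--read-only",        # root filesystem is read-only
        "--memory", "256m",   # cap memory use
        "--pids-limit", "64", # cap the number of processes
        image,
        "python", "-c", code,
    ]
```

The returned list could be passed to `subprocess.run(argv, capture_output=True, text=True, timeout=30)` on a host with Docker installed; passing a list rather than a shell string avoids shell interpolation of the untrusted code.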

Execution docs →

Timeline + context

One conv.timeline.v1 artifact per session: plan, sources, tool calls, costs. Three-checkpoint caching and edit-over-rewrite compaction keep long sessions cheap.
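A per-session artifact of this kind might look roughly like the following. Only the `conv.timeline.v1` name and the four categories (plan, sources, tool calls, costs) come from the text above; every field name and method is a hypothetical choice for this sketch, and checkpoint caching and compaction are not modeled.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Timeline:
    session_id: str
    schema: str = "conv.timeline.v1"
    plan: list = field(default_factory=list)
    sources: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    costs: list = field(default_factory=list)

    def record_tool_call(self, name: str, cost_usd: float) -> None:
        """Append a tool call and its cost to the same artifact."""
        self.tool_calls.append({"name": name})
        self.costs.append({"tool": name, "usd": cost_usd})

    def total_cost(self) -> float:
        return sum(c["usd"] for c in self.costs)

    def to_json(self) -> str:
        return json.dumps(asdict(self))
```

Keeping costs beside the tool calls that incurred them means a single document can answer both "what happened" and "what did it cost" for a session.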

ReAct v3 docs →

Streaming relay

Redis Pub/Sub fans chat events out to SSE and Socket.IO with the same channel contract; REST for non-streaming endpoints.
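The fan-out idea can be sketched with one event shape encoded two ways. The SSE framing (`event:` and `data:` lines followed by a blank line) follows the standard event-stream format; the `Channel` class, the `chat.delta` event name, and the in-memory subscriber lists are assumptions for illustration and stand in for Redis Pub/Sub and real transports.

```python
import json


def to_sse(event: dict) -> str:
    """Encode an event as a Server-Sent Events frame."""
    return f"event: {event['type']}\ndata: {json.dumps(event['data'])}\n\n"


def to_socketio(event: dict) -> tuple:
    """Encode an event as the (name, payload) pair a Socket.IO emit takes."""
    return (event["type"], event["data"])


class Channel:
    """In-memory stand-in for a Pub/Sub channel with per-transport encoders."""

    def __init__(self) -> None:
        self.subscribers = []  # list of (encoder, sink) pairs

    def subscribe(self, encode, sink: list) -> None:
        self.subscribers.append((encode, sink))

    def publish(self, event: dict) -> None:
        # Every subscriber receives the same event; only the encoding differs.
        for encode, sink in self.subscribers:
            sink.append(encode(event))
```

Publishing once and encoding per transport is what lets both client types share a single event contract in this sketch.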

Communication docs →

Deploy modes

Docker Compose, Kubernetes manifests, or ECS/Fargate via Terraform. Same images, three runways. Self-hosted in your VPC.

Deployment docs →

From laptop to production VPC, on the same code path. Tenancy, governance, and observability come standard.

GitHub repo →