How it works

A platform that absorbs the burdens of productization.

A gateway and control plane for admission, RBAC, and budgets; processor workers running ReAct v2, Claude Code, or bundle code; and an isolated execution sidecar for sandboxed Python and tools, with Postgres and Redis underneath. The same code path runs from laptop to production VPC, so you can stay in prototyping mode while the platform handles tenancy, governance, observability, and the rest of the heavy lifting.

Ingress + control plane

chat-ingress :8010 handles JWT/Cognito auth, rate limiting, ClamAV scanning, and admission control. RBAC, per-customer budgets, and feature gates are enforced here, before anything is queued.
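The ordering matters: every check runs before a request touches the queue. A minimal sketch of that admission pipeline, with hypothetical field and gate names (the gateway's real request shape is in the docs):

```python
from dataclasses import dataclass

# Hypothetical request envelope; the real gateway's fields will differ.
@dataclass
class Request:
    jwt_valid: bool          # outcome of JWT/Cognito verification
    customer_id: str
    budget_remaining: float  # per-customer budget, in dollars
    attachment_clean: bool   # outcome of the ClamAV scan

def admit(req: Request, feature_gates: dict[str, bool]) -> tuple[bool, str]:
    """Run ordered admission checks; only a fully passing request is queued."""
    if not req.jwt_valid:
        return False, "auth: invalid JWT"
    if not req.attachment_clean:
        return False, "scan: attachment rejected"
    if req.budget_remaining <= 0:
        return False, f"budget: exhausted for {req.customer_id}"
    if not feature_gates.get("chat", False):
        return False, "gate: feature disabled"
    return True, "queued"
```

Rejections are cheap here because nothing downstream (workers, sandboxes) is ever invoked for a request that fails admission.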

Gateway docs →

Processor workers

chat-proc :8020 dequeues from Redis and runs the agent loop — ReAct v2, Claude Code, or your own. Hot-reloadable bundles deploy without restarting workers.

Processor docs →

Isolated execution

py-code-exec runs untrusted code in ephemeral Docker sandboxes (or on Fargate for distributed execution). @venv(...) declarations pin per-task dependencies.
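One plausible reading of a @venv(...) declaration is a decorator that attaches pinned requirements for the sandbox launcher to read. A hedged sketch only, with a hypothetical attribute name; the real decorator's behavior is defined in the execution docs:

```python
# Hypothetical sketch of @venv(...); the actual implementation may differ.
def venv(*requirements: str):
    """Pin per-task dependencies by tagging the function with them."""
    def wrap(fn):
        # Read by the (hypothetical) sandbox launcher before execution.
        fn.__venv_requirements__ = requirements
        return fn
    return wrap

@venv("pandas==2.2.0", "numpy>=1.26")
def analyze(csv_text: str) -> int:
    # Would run inside an ephemeral sandbox with the pinned packages installed.
    return len(csv_text.splitlines())
```

Pinning at the task level means two tasks in the same session can depend on conflicting package versions without polluting each other's sandbox.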

Execution docs →

Timeline + context

One conv.timeline.v1 artifact per session: plan, sources, tool calls, costs. Three-checkpoint caching and edit-over-rewrite compaction keep long sessions cheap.
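To make the "one artifact per session" idea concrete, here is an illustrative shape only; the actual conv.timeline.v1 schema lives in the ReAct v2 docs, and the entry types and field names below are assumptions:

```python
# Illustrative shape; not the real conv.timeline.v1 schema.
timeline = {
    "kind": "conv.timeline.v1",
    "session": "s1",
    "entries": [
        {"type": "plan", "text": "search, then summarize"},
        {"type": "tool_call", "name": "web.search", "cost_usd": 0.002},
        {"type": "source", "url": "https://example.com"},
    ],
}

def total_cost(tl: dict) -> float:
    """Costs roll up from the single artifact, not from scattered logs."""
    return sum(e.get("cost_usd", 0.0) for e in tl["entries"])
```

Keeping plan, sources, tool calls, and costs in one append-only artifact is what makes checkpoint caching and compaction tractable: there is exactly one object to cache and compact.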

ReAct v2 docs →

Streaming relay

Redis Pub/Sub fans chat events out to SSE and Socket.IO with the same channel contract; REST for non-streaming endpoints.
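The key property is that SSE and Socket.IO consume identical payloads from the same channel. A sketch with an in-memory dict standing in for Redis Pub/Sub (channel naming here is hypothetical):

```python
import json
from typing import Callable

# In-memory stand-in for Redis Pub/Sub.
subscribers: dict = {}

def subscribe(channel: str, handler: Callable[[str], None]) -> None:
    subscribers.setdefault(channel, []).append(handler)

def publish(channel: str, event: dict) -> None:
    # One serialized payload fans out to every transport: the channel contract.
    payload = json.dumps(event)
    for handler in subscribers.get(channel, []):
        handler(payload)

sse_out, sio_out = [], []
subscribe("chat:s1", sse_out.append)  # SSE relay
subscribe("chat:s1", sio_out.append)  # Socket.IO relay
publish("chat:s1", {"type": "token", "text": "Hel"})
```

Because both relays see byte-identical events, a client can switch transports without the server needing a second event format.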

Communication docs →

Deploy modes

Docker Compose, Kubernetes manifests, or ECS/Fargate via Terraform. Same images, three runways. Self-hosted in your own VPC.

Deployment docs →

From laptop to production VPC, on the same code path. Tenancy, governance, and observability come standard.

GitHub repo →