Deployment Options

Deployment options overview

KDCube supports four deployment modes, from local development to production on AWS ECS:

• Platform Dev (DEV): services run on the host; infra via local-infra-stack; fast iteration cycles; for SDK / platform devs.
• All-in-One Compose (LOCAL): full stack in Docker Compose; bootstrap with kdcube-setup; best for bundle dev and platform evaluation.
• Custom UI + Infra (PROD-READY): KDCube plus a custom frontend; Postgres/Redis external; OpenResty auth templates; Cognito / delegated auth.
• AWS ECS (PRODUCTION): descriptor-driven via Terraform; chat-proc on EC2-backed tasks; ALB + ACM TLS, Cloud Map; GitHub Actions CI/CD.
⚠️
Fargate has 10–30s startup latency — suitable for batch processing only. Interactive agentic loops require EC2-backed tasks. Use Docker exec mode (default) for live agent interactions.

ECS Topology

ECS topology overview

Client traffic enters over HTTPS :443 at the ALB (ACM TLS termination), which forwards to web-proxy (OpenResty, :80). Inside the ECS cluster, web-proxy routes to web-ui (Fargate), chat-ingress (Fargate), chat-proc (EC2-backed), and proxylogin (Fargate), with Cloud Map providing service discovery. Data services inside the VPC: RDS PostgreSQL, ElastiCache Redis, EFS for bundle storage, and ECR for images.

Monitoring & Observability

The platform exposes operational signals through a combination of log streams, Redis-backed queue metrics, and a health endpoint. These signals are intended for operators monitoring a deployed instance.

Queue Depth

The chat-proc worker consumes a Redis-backed message queue. Queue depth (number of pending messages) is observable via standard Redis list length commands against the configured queue key. Elevated queue depth indicates processor backpressure or under-provisioning.
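Queue depth can be probed with a standard list-length call. A minimal sketch, assuming a redis-py-style client; the queue key name is deployment-specific and the alert thresholds here are illustrative, not platform defaults:

```python
def queue_depth(redis_client, queue_key: str) -> int:
    """Number of pending messages in the Redis list backing the queue."""
    return redis_client.llen(queue_key)

def classify_depth(depth: int, warn: int = 100, crit: int = 1000) -> str:
    """Map a raw depth to an alert level. Tune the thresholds to your
    processor throughput; these values are examples only."""
    if depth >= crit:
        return "critical"   # sustained backpressure: scale chat-proc
    if depth >= warn:
        return "warning"    # queue growing faster than it drains
    return "ok"
```

A one-off check from the shell is equivalent to `redis-cli llen <queue_key>`.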

Processor Health

Each processor instance reports liveness through its internal health check. In ECS deployments the ALB target group tracks instance health; in Compose deployments a container health check is defined. Unhealthy processor instances are replaced or restarted by the orchestrator.

Log Streams

All platform services emit structured log output to stdout, captured by the container runtime. In ECS deployments logs flow to CloudWatch Logs; in Compose deployments they are available via docker compose logs. Log lines include request IDs, tenant/project scope, bundle ID, and error classification where applicable.
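Because the log output is structured, it can be filtered mechanically rather than grepped. A sketch assuming JSON-formatted lines with hypothetical field names (request_id, tenant; the actual schema may differ):

```python
import json

def filter_logs(lines, request_id=None, tenant=None):
    """Yield parsed log records matching the given scope fields.
    Non-JSON lines (e.g. startup banners) are skipped."""
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue
        if request_id and rec.get("request_id") != request_id:
            continue
        if tenant and rec.get("tenant") != tenant:
            continue
        yield rec
```

In Compose, the same idea applies to `docker compose logs chat-proc` piped through this filter; in ECS, CloudWatch Logs Insights serves the equivalent query role.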

Metrics Endpoint

The Operations REST API (served by chat-proc) exposes operational data accessible to authorized callers. Bundle-level metrics (turn counts, economics state, error rates) can be queried via the Operations API using a privileged token. Prometheus-format scrape endpoints are not currently built in but can be added as a bundle operation.
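Since bundle metrics are already queryable over the Operations API, a bundle operation could render them in the Prometheus text exposition format. A rough sketch; the metric names and input shape are assumptions, not part of the platform API:

```python
def to_prometheus_text(metrics: dict) -> str:
    """Render a flat dict of bundle metrics as Prometheus text
    exposition format. An optional 'labels' sub-dict becomes the
    label set attached to every sample."""
    labels = metrics.get("labels", {})
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    suffix = f"{{{label_str}}}" if label_str else ""
    lines = []
    for name, value in sorted(metrics.items()):
        if name == "labels":
            continue
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"
```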

ℹ️
Monitoring capabilities described here reflect what is documented in source. Dedicated metrics infrastructure (Prometheus, OpenTelemetry export) is not yet built in — see Roadmap for planned observability improvements.

Proxy Configuration

The OpenResty reverse proxy (web-proxy) sits in front of all backend services and handles TLS termination, security headers, rate limiting, auth cookie unmasking, and path-based routing.

Request Pipeline

  1. TLS termination: unwraps HTTPS, redirects HTTP to HTTPS, and blocks direct IP access (returns 444). Injects HSTS, X-Frame-Options, X-XSS-Protection, and Referrer-Policy headers. Gzip compression on all text responses.

  2. Rate limiting: per-IP limit_req zones for chat, KB, upload, and auth routes. Runs before auth unmask so DDoS bursts are dropped before burning a proxylogin round-trip.

  3. Auth cookie unmask: for every protected route, an access_by_lua call to unmask_token() issues an internal subrequest to proxylogin /v1/unmask, extracts the real session cookies, and injects them into the upstream request.

  4. Upstream routing: path-based dispatch to backend services. SSE locations disable buffering and use a 600s timeout; WebSocket locations handle protocol upgrade; SPA routes use a 404 fallback.
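The four stages map onto an OpenResty server block roughly as follows. This is an illustrative sketch, not the shipped template: the zone name and unmask helper follow the names used above, but module paths and exact directives are assumptions.

```nginx
server {
    listen 443 ssl;
    server_name chat.example.com;   # direct-IP requests fall through to a
                                    # default server that returns 444

    # Stage 1: security headers on the TLS-terminating server
    add_header Strict-Transport-Security "max-age=31536000" always;
    add_header X-Frame-Options DENY always;
    gzip on;

    location /sse/ {
        limit_req zone=chat_api_zone burst=20 nodelay;          # stage 2
        access_by_lua_block { require("auth").unmask_token() }  # stage 3
        proxy_pass http://chat-ingress:8010;                    # stage 4
        proxy_buffering off;         # SSE: stream, don't buffer
        proxy_read_timeout 600s;
    }
}
```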

Upstream Backends

| Backend | Address | Routes |
|---|---|---|
| web-ui | web-ui:80 | /chatbot/*, SPA fallback |
| proxylogin | proxylogin | /auth/*, internal /auth/unmask |
| chat-ingress | chat-ingress:8010 | /sse/, /api/chat/, /api/cb/*, /admin/*, /monitoring |
| chat-proc | chat-proc:8020 | /api/integrations/, /admin/integrations/ |
| kb | kb:8000 | /api/kb/ |

Rate Limit Zones

| Zone | Rate | Applied To |
|---|---|---|
| chat_api_zone | 10 r/s | /api/chat/, /sse/ |
| kb_api_zone | 10 r/s | /api/kb/ |
| monitoring_api | 10 r/s | /monitoring |
| upload | 2 r/s | Upload endpoints |
ℹ️
Rate limit directives are defined but commented out by default. Enable per location with limit_req zone=chat_api_zone burst=20 nodelay;. For anti-DDoS hardening, add limit_conn zones, timeout hardening, and auth-specific rate zones. See the proxy ops guide for WAF options (lua-resty-waf or ModSecurity + OWASP CRS).
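Enabling a zone takes two pieces: an http-level zone definition and a per-location limit_req. A sketch using the zones from the table; the 10m zone sizes are assumptions (one 10 MB zone tracks on the order of 160k client IPs):

```nginx
# http {} context: define the shared-memory zones
limit_req_zone $binary_remote_addr zone=chat_api_zone:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=kb_api_zone:10m   rate=10r/s;
limit_req_zone $binary_remote_addr zone=upload:10m        rate=2r/s;

# location {} context: allow short bursts, reject the excess
location /api/chat/ {
    limit_req zone=chat_api_zone burst=20 nodelay;
    # proxy_pass / auth directives as elsewhere in the template
}
```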

Connection Pooling

Redis and Postgres connection pools are created once per worker process during FastAPI lifespan startup and stored in app.state. Total connections scale with worker count.

Redis Pools

| Component | Pools per Worker | Max Connections Formula |
|---|---|---|
| ingress / metrics | 3 (async, async_decode, sync) | 3 × redis_max_connections |
| proc | 1 (shared async) | redis_max_connections |

Pool size is controlled by GATEWAY_CONFIG_JSON.pools.<component>.redis_max_connections. If unset, the pool is unbounded and grows with load. Pub/sub and blocking calls consume connections from the pool, not on top of the cap.

Postgres Pool

Each worker creates one asyncpg pool. Size is controlled by pools.<component>.pg_pool_max_size and pg_pool_min_size. If not set, it defaults to service_capacity.<component>.concurrent_requests_per_process.
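Both knobs live under GATEWAY_CONFIG_JSON. An illustrative fragment for a component named proc; the values are examples, not defaults:

```json
{
  "pools": {
    "proc": {
      "redis_max_connections": 20,
      "pg_pool_min_size": 2,
      "pg_pool_max_size": 10
    }
  },
  "service_capacity": {
    "proc": {
      "concurrent_requests_per_process": 10
    }
  }
}
```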

# Total connections per instance
pg_conns_per_instance = workers × pg_pool_max_size
redis_conns_per_instance = workers × pools_per_worker × redis_max_connections

Connection Lifecycle

Shared Redis clients set socket_connect_timeout=5, health_check_interval=30, socket_keepalive=True, and retry_on_timeout=True. A centralized Redis health monitor tracks PING health and emits up/down events. On reconnect, the service automatically rebuilds SSE relay subscriptions and reconnects gateway config pub/sub listeners.

⚠️
When you raise processes_per_instance, total Postgres and Redis connections increase proportionally. Ensure total_pg_connections fits within max_connections on the database and total_redis_connections fits within ElastiCache maxclients.
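The budget check implied by the formulas above can be automated before a scale-up. A sketch in pure arithmetic; variable names follow the formulas, and the headroom margin is a convention of this example, not a platform default:

```python
def connection_budget_ok(workers: int,
                         pg_pool_max_size: int,
                         pools_per_worker: int,
                         redis_max_connections: int,
                         instances: int,
                         pg_max_connections: int,
                         redis_maxclients: int,
                         headroom: float = 0.8) -> bool:
    """True if total connections across all instances fit within the
    database/cache limits, keeping headroom for admin sessions and
    reconnect storms."""
    pg_total = instances * workers * pg_pool_max_size
    redis_total = instances * workers * pools_per_worker * redis_max_connections
    return (pg_total <= headroom * pg_max_connections
            and redis_total <= headroom * redis_maxclients)
```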

Health Checks

Every service exposes a GET /health endpoint for use by orchestrators (ECS target groups, Docker Compose health checks) and load balancers.

Endpoints by Service

| Service | Endpoint | Checks | Readiness |
|---|---|---|---|
| chat-ingress | GET /health | Service up, draining flag, Socket.IO enabled, SSE enabled, instance ID | 200 healthy / 503 draining |
| chat-proc | GET /health | Service up, draining flag, git bundle readiness + errors, instance ID | 200 healthy & bundles ready / 503 draining or bundles not ready |
| metrics | GET /health | Service up | 200 healthy |
| kb | GET /api/kb/health | KB stats, orchestrator health, queue stats, storage path | 200 healthy / 503 unavailable |
| kb (process) | GET /api/kb/health/process | Per-process capacity | 200 healthy / 503 unavailable |

Readiness vs Liveness

All health endpoints serve as both liveness (is the process running?) and readiness (can it accept traffic?) probes. The draining flag differentiates the two: a draining instance returns 503 so the load balancer stops sending new requests while in-flight work completes. For chat-proc, readiness additionally requires that git bundles have been cloned and are error-free.
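The chat-proc readiness decision reduces to a small pure function. A sketch of the logic described above; the real handler lives inside the service and its field names may differ:

```python
def health_status(draining: bool,
                  bundles_ready: bool = True,
                  bundle_errors: int = 0) -> tuple:
    """Return (http_status, state) for a /health probe. A draining
    instance, or one whose git bundles are unready or errored,
    answers 503 so the load balancer stops routing new traffic."""
    if draining:
        return 503, "draining"
    if not bundles_ready or bundle_errors > 0:
        return 503, "bundles_not_ready"
    return 200, "healthy"
```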

ℹ️
In ECS deployments, the ALB target group tracks instance health automatically. In Compose deployments, configure a container health check against the /health endpoint. Unhealthy instances are replaced or restarted by the orchestrator.
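A Compose-side health check might look like the fragment below. The interval, timeout, and retry values are illustrative, and the container image is assumed to ship curl (or an equivalent probe command):

```yaml
services:
  chat-proc:
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8020/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 20s   # grace period while git bundles clone
```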