Security & Governance

For the full security architecture, threat model, and compliance mapping, see Security & Governance.

Auth Flow

Figure: Auth Flow. Request authorization pipeline from client through JWT validation to bundle execution.
Stages: Client Request (Bearer / Cookie) → Token Extraction (header / cookie / SSE param) → Auth Provider (Cognito · SimpleIDP · Delegated) → Role Resolution (privileged → paid → registered) → Session + UserType (user_id · user_type · roles · tenant · project scope).
Auth providers: Cognito (production): Bearer + ID token. SimpleIDP (dev): idp_users.json. Delegated: proxylogin token exchange.

Security Features

Feature | Description
Content Filtering | ClamAV scans file uploads. event_filter.py in your bundle controls which tool events are visible to clients, preventing sensitive tool results from leaking.
Bundle Firewall | Per-bundle tool access rules. Restricts which tools a bundle can call at the platform level, independent of bundle code.
Gateway Controls | Per-endpoint throttling (guarded_rest_patterns), bypass patterns, backpressure, and circuit breakers. Configured in gateway.yaml.
Network Isolation | All services are private in the VPC. Only web-proxy is ALB-facing. TLS is terminated at the ALB (ECS) or at the proxy (Compose).

Enforcement Model

KDCube's runtime enforcement is pre-execution. Budget, rate, and tenant checks happen before the LLM call is charged and before any external system receives a request. Every agent request passes through a fixed 6-step gate sequence:

  1. Session Resolution — authenticate and resolve user/session context.
  2. Rate Limit — enforce per-session and per-user-type limits.
  3. Backpressure — evaluate queue/capacity admission thresholds.
  4. Circuit Breaker — fail-fast on unhealthy system states.
  5. Economics / Quota — budget reservation and commit gates before execution.
  6. Atomic Enqueue — admit or reject at the enqueue boundary with an auditable decision event.
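
The ordering and short-circuit behavior of the gate sequence can be sketched as follows. This is a minimal illustration, not KDCube's implementation: `GateDecision`, `run_gates`, and the stand-in check functions are hypothetical names, and each real gate does far more than return a flag.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# The six gates, in the documented order.
GATE_ORDER = [
    "session_resolution",
    "rate_limit",
    "backpressure",
    "circuit_breaker",
    "economics_quota",
    "atomic_enqueue",
]

# A check receives the request context and returns (ok, reason).
Check = Callable[[dict], Tuple[bool, str]]


@dataclass
class GateDecision:
    admitted: bool
    failed_gate: Optional[str] = None  # gate that rejected the request, if any
    reason: str = ""


def run_gates(ctx: dict, checks: Dict[str, Check]) -> GateDecision:
    """Run every gate in order and stop at the first rejection."""
    trace = ctx.setdefault("trace", [])
    for name in GATE_ORDER:
        trace.append(name)  # record which gates actually ran
        ok, reason = checks[name](ctx)
        if not ok:
            return GateDecision(False, name, reason)
    return GateDecision(True)


# Hypothetical stand-in checks for the example.
def always_pass(ctx: dict) -> Tuple[bool, str]:
    return True, ""


def over_rate_limit(ctx: dict) -> Tuple[bool, str]:
    return False, "burst limit exceeded"


checks = {name: always_pass for name in GATE_ORDER}
checks["rate_limit"] = over_rate_limit

ctx = {"session_id": "s-1"}
decision = run_gates(ctx, checks)
print(decision.failed_gate, ctx["trace"])
```

Because the rate-limit check rejects the request, the trace stops after two gates: the later gates, including quota reservation and enqueue, never run.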

This model differs fundamentally from output filtering (which operates after the LLM responds). Budget, rate, and isolation controls prevent wrong actions from executing — not just detect them after the fact. See Security → Enforcement Model for the full sequence diagram and compliance mapping.

Gateway & Admission Control

The gateway is the admission-control layer that sits in front of every chat and REST request. It enforces capacity, throttling, and health checks before any work is enqueued — tying directly into the 6-step enforcement model above (steps 2-6 all run inside the gateway).

Capacity Configuration

Gateway capacity is driven by GATEWAY_CONFIG_JSON and is component-aware — ingress and processor each declare their own slice:

Parameter | Scope | Purpose
concurrent_requests_per_process | Per process | How many tasks a single Uvicorn worker can handle simultaneously.
processes_per_instance | Per service instance | Drives the Uvicorn worker count (ingress) or processor worker count (proc). Changes require a restart.
avg_processing_time_seconds | Per component | Fallback estimate used by backpressure calculations when live metrics are unavailable.
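
To show how these three parameters might combine, here is a rough sketch. The JSON nesting (top-level `ingress` and `proc` keys), the numeric values, and both formulas are assumptions made for illustration; the actual GATEWAY_CONFIG_JSON schema and capacity math may differ.

```python
import json

# Hypothetical config shape and values; only the three parameter names
# come from the table above.
RAW_CONFIG = """
{
  "ingress": {
    "concurrent_requests_per_process": 8,
    "processes_per_instance": 4,
    "avg_processing_time_seconds": 2.0
  },
  "proc": {
    "concurrent_requests_per_process": 5,
    "processes_per_instance": 2,
    "avg_processing_time_seconds": 25.0
  }
}
"""

config = json.loads(RAW_CONFIG)


def instance_capacity(component: dict) -> int:
    """Assumed formula: concurrent slots per instance = per-process slots x processes."""
    return (
        component["concurrent_requests_per_process"]
        * component["processes_per_instance"]
    )


def drain_rate_per_second(component: dict) -> float:
    """Assumed fallback estimate: tasks completed per second per instance."""
    return instance_capacity(component) / component["avg_processing_time_seconds"]


for name, component in config.items():
    print(name, instance_capacity(component), drain_rate_per_second(component))
```

Under these assumed values a processor instance offers 10 concurrent slots and drains roughly 0.4 tasks per second, which is the kind of fallback figure a backpressure calculation could use when live metrics are missing.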

Configuration is stored in Redis per tenant/project. Updates published via POST /admin/gateway/update-config propagate to all replicas at runtime through a tenant-scoped pub/sub channel.

Throttling Rules

Rate limits are per session, keyed by session_id and scoped to the tenant/project in Redis. Each user type has independent burst and hourly caps:

User Type | Burst (per window) | Hourly
anonymous | 10 / 60 s | 120
registered | 100 / 60 s | 2,000
paid | 150 / 60 s | 4,000
privileged | 300 / 60 s | unlimited

Rate-limited requests receive an HTTP 429 (Too Many Requests) response. Limits are independently configurable per component (ingress vs proc).
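
The per-session burst and hourly caps can be illustrated with an in-memory sliding-window limiter. This is a sketch only: the platform keeps its counters in Redis, and `SessionRateLimiter`, its key layout, and the sliding-window algorithm are assumptions chosen for the example. The numeric limits are the ones listed above.

```python
import time
from collections import defaultdict, deque

# Limits from the table above; hourly=None means unlimited.
LIMITS = {
    "anonymous": {"burst": 10, "window": 60, "hourly": 120},
    "registered": {"burst": 100, "window": 60, "hourly": 2000},
    "paid": {"burst": 150, "window": 60, "hourly": 4000},
    "privileged": {"burst": 300, "window": 60, "hourly": None},
}


class SessionRateLimiter:
    """In-memory stand-in for a Redis-backed, per-session limiter."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        # (tenant, project, session_id) -> timestamps of admitted requests
        self._hits = defaultdict(deque)

    def allow(self, tenant: str, project: str, session_id: str, user_type: str) -> bool:
        limits = LIMITS[user_type]
        now = self._clock()
        hits = self._hits[(tenant, project, session_id)]

        # Forget requests older than one hour.
        while hits and now - hits[0] >= 3600:
            hits.popleft()

        # Burst check: admitted requests inside the short window.
        in_window = sum(1 for t in hits if now - t < limits["window"])
        if in_window >= limits["burst"]:
            return False

        # Hourly check, skipped for unlimited tiers.
        if limits["hourly"] is not None and len(hits) >= limits["hourly"]:
            return False

        hits.append(now)
        return True


# Example with a controllable clock: the 11th anonymous request is rejected.
clock = [0.0]
limiter = SessionRateLimiter(clock=lambda: clock[0])
results = [limiter.allow("t1", "p1", "s-1", "anonymous") for _ in range(11)]
status = 200 if results[-1] else 429
print(results.count(True), status)
```

Keying on tenant, project, and session means a second session for the same user starts with a fresh budget, which matches the per-session scoping described above.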

Backpressure Handling

Backpressure is per tenant/project (not per session). It evaluates the global queue depth plus healthy processor capacity derived from heartbeat data. Key thresholds include:

  • Pressure thresholds — per user type (e.g., anonymous at 60 %, paid at 90 %). When queue pressure exceeds the threshold for a user type, new requests from that tier are rejected with 503.
  • Hard limit — a system-wide ceiling (default 95-98 %) above which all requests are rejected regardless of user type.
  • Capacity buffer — a headroom percentage reserved so the system never saturates completely.

Backpressure is checked in two places: once in the gateway (BackpressureManager.check_capacity) and again at atomic enqueue (AtomicChatQueueManager.enqueue_chat_task_atomic). The double check prevents race conditions where pressure changes between gate and enqueue.
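
The tiered admission decision can be sketched as below. The anonymous (60 %) and paid (90 %) thresholds come from the text; the registered threshold, the treatment of privileged users, the 95 % hard limit (picked from the stated default range), and the pressure formula are assumptions. The capacity buffer is not modeled.

```python
# Thresholds as fractions of capacity. "registered" is a hypothetical value.
PRESSURE_THRESHOLDS = {
    "anonymous": 0.60,
    "registered": 0.80,  # assumption, not stated in the text
    "paid": 0.90,
}
HARD_LIMIT = 0.95  # assumption: low end of the 95-98 % default range


def queue_pressure(queue_depth: int, healthy_processors: int, slots_per_processor: int) -> float:
    """Assumed formula: queued tasks divided by healthy processing capacity."""
    capacity = healthy_processors * slots_per_processor
    if capacity <= 0:
        return 1.0  # no healthy capacity: treat as fully saturated
    return queue_depth / capacity


def admit(user_type: str, pressure: float) -> bool:
    """Reject above the hard limit, or above the tier's own threshold."""
    if pressure > HARD_LIMIT:
        return False
    # Tiers without an explicit threshold (e.g. privileged) are only
    # subject to the hard limit in this sketch.
    threshold = PRESSURE_THRESHOLDS.get(user_type, HARD_LIMIT)
    return pressure <= threshold


pressure = queue_pressure(queue_depth=14, healthy_processors=2, slots_per_processor=10)
print(pressure, admit("anonymous", pressure), admit("paid", pressure))
```

At 70 % pressure an anonymous request is turned away with 503 while a paid request is still admitted. Running the same check again at enqueue time, against fresh numbers, is what closes the race window between the gate and the queue.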

Circuit Breaker

The circuit breaker provides fail-fast behavior when the system is in an unhealthy state. If upstream dependencies (Redis, processor heartbeats) are degraded, the circuit opens and incoming requests are rejected immediately rather than queued into a system that cannot process them. This is step 4 in the enforcement model.
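
A generic circuit breaker of the kind described here can be sketched as follows. This is the classic textbook pattern, not KDCube's code: the class name, the consecutive-failure threshold, and the cooldown before probing are all assumptions.

```python
import time


class CircuitBreaker:
    """Opens after consecutive failed health checks; probes again after a cooldown."""

    def __init__(self, failure_threshold: int = 3, cooldown_seconds: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def record_health(self, healthy: bool) -> None:
        """Feed in the result of a dependency check (e.g. Redis ping, heartbeat)."""
        if healthy:
            self._failures = 0
            self._opened_at = None  # close the circuit
            return
        self._failures += 1
        if self._failures >= self.failure_threshold and self._opened_at is None:
            self._opened_at = self._clock()  # open the circuit

    def allow_request(self) -> bool:
        """Fail fast while open; allow probe traffic once the cooldown has elapsed."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self.cooldown_seconds


# Example with a controllable clock.
now = [0.0]
breaker = CircuitBreaker(clock=lambda: now[0])
for _ in range(3):
    breaker.record_health(False)
blocked = not breaker.allow_request()

now[0] = 30.0
probing = breaker.allow_request()
print(blocked, probing)
```

Rejecting immediately while the circuit is open keeps requests from piling up in a queue that no healthy processor can drain.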

Guarded vs Bypass REST Patterns

By default, REST endpoints are session-only — they authenticate but skip throttling and backpressure. Heavy endpoints can be opted into full gateway protection via two regex lists in GATEWAY_CONFIG_JSON:

  • guarded_rest_patterns — endpoints that receive the same rate-limit and backpressure checks as chat ingestion (e.g., /resources/link-preview, /integrations/.../operations/...).
  • bypass_throttling_patterns — endpoints that should skip rate limiting entirely (e.g., ^/webhooks/stripe$).

Patterns are regex-based with optional suffix matching, so short patterns work across different API base prefixes. Both lists are component-aware (ingress vs proc).
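
One way to read "regex-based with optional suffix matching" is sketched below. The helper names, the suffix strategy (retrying the pattern at every `/` boundary), and the rule that bypass wins over guarded are assumptions for illustration; the example patterns reuse the two complete paths mentioned above.

```python
import re
from typing import Iterable


def matches(pattern: str, path: str) -> bool:
    """Try the pattern on the full path, then on each suffix starting at a '/'.

    This lets an anchored pattern such as ^/webhooks/stripe$ match
    regardless of the API base prefix in front of it.
    """
    regex = re.compile(pattern)
    starts = sorted({0, *(i for i, ch in enumerate(path) if ch == "/")})
    return any(regex.search(path[i:]) for i in starts)


def classify(path: str, guarded: Iterable[str], bypass: Iterable[str]) -> str:
    """Return 'bypass', 'guarded', or the default 'session_only'."""
    # Assumption: a bypass match takes precedence over a guarded match.
    if any(matches(p, path) for p in bypass):
        return "bypass"
    if any(matches(p, path) for p in guarded):
        return "guarded"
    return "session_only"


GUARDED = [r"^/resources/link-preview$"]
BYPASS = [r"^/webhooks/stripe$"]

for path in ("/api/v1/webhooks/stripe", "/api/resources/link-preview", "/api/profile"):
    print(path, classify(path, GUARDED, BYPASS))
```

With suffix matching the same short pattern covers `/webhooks/stripe` and `/api/v1/webhooks/stripe`, while a path such as `/api/v1/webhooks/stripe-test` still falls through to the session-only default.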

Relationship to the 6-Step Enforcement Model

The gateway is the runtime implementation of steps 2 through 6. After session resolution (step 1), the gateway runs rate limiting, backpressure evaluation, circuit-breaker check, economics/quota reservation, and atomic enqueue in sequence. A failure at any step short-circuits the pipeline and returns an appropriate HTTP status (401, 403, 429, or 503) — no LLM call is made and no downstream side effect occurs.