Security & Governance

Auth Flow

Security Features

Feature	Description
Content Filtering	ClamAV scans file uploads. `event_filter.py` in your bundle controls which tool events are visible to clients, which keeps sensitive tool results from leaking.
Bundle Firewall	Per-bundle tool access rules. Restricts what tools a bundle can call at the platform level — independent of bundle code.
Gateway Controls	Per-endpoint throttling (`guarded_rest_patterns`), bypass patterns, backpressure, and circuit breakers. Configured in `gateway.yaml`.
Network Isolation	All services are private in the VPC. Only web-proxy is ALB-facing. TLS terminated at ALB (ECS) or at the proxy (Compose).
Generated-Code Isolation	When a product enables generated-code execution, Docker split mode runs untrusted generated code in a separate networkless executor container. Tool calls cross an authenticated supervisor channel; descriptor payloads, bundle storage, secret-provider access, and networked tools stay on the supervisor side.

Enforcement Model

KDCube's runtime enforcement is pre-execution. Budget, rate, and tenant checks happen before the LLM call is charged and before any external system receives a request. Every agent request passes through a fixed 6-step gate sequence:

1. Session Resolution: authenticate and resolve user/session context.
2. Rate Limit: enforce per-session and per-user-type limits.
3. Backpressure: evaluate queue/capacity admission thresholds.
4. Circuit Breaker: fail fast on unhealthy system states.
5. Economics / Quota: budget reservation and commit gates before execution.
6. Atomic Enqueue: admit or reject at the enqueue boundary with an auditable decision event.

This model differs fundamentally from output filtering (which operates after the LLM responds). Budget, rate, and isolation controls prevent wrong actions from executing — not just detect them after the fact. Generated-code isolation is applied when the bundle/product uses the execution runtime; products that only use deterministic service tools or bundle APIs do not need that path. See Security → Enforcement Model for the full sequence diagram and compliance mapping.

Gateway & Admission Control

The gateway is the admission-control layer that sits in front of every chat and REST request. It enforces capacity, throttling, and health checks before any work is enqueued — tying directly into the 6-step enforcement model above (steps 2-6 all run inside the gateway).

Capacity Configuration

Gateway capacity is driven by GATEWAY_CONFIG_JSON and is component-aware — ingress and processor each declare their own slice:

Parameter	Scope	Purpose
`concurrent_requests_per_process`	Per process	How many tasks a single Uvicorn worker can handle simultaneously.
`processes_per_instance`	Per service instance	Drives the Uvicorn worker count (ingress) or processor worker count (proc). Changes require a restart.
`avg_processing_time_seconds`	Per component	Fallback estimate used by backpressure calculations when live metrics are unavailable.

Configuration is stored in Redis per tenant/project. Updates published via POST /admin/gateway/update-config propagate to all replicas at runtime through a tenant-scoped pub/sub channel.
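
As a rough illustration of how the three parameters relate, the sketch below parses a hypothetical GATEWAY_CONFIG_JSON value and derives a per-instance capacity and a fallback throughput estimate. The parameter names come from the table above; the nesting under `ingress` and `proc`, the sample values, and both formulas are assumptions made for this example, not the platform's documented schema or calculation.

```python
import json

# Hypothetical config value. Only the three parameter names are taken from the
# table above; the "ingress"/"proc" nesting and the numbers are assumptions.
GATEWAY_CONFIG_JSON = """
{
  "ingress": {
    "concurrent_requests_per_process": 8,
    "processes_per_instance": 4,
    "avg_processing_time_seconds": 0.5
  },
  "proc": {
    "concurrent_requests_per_process": 4,
    "processes_per_instance": 2,
    "avg_processing_time_seconds": 25.0
  }
}
"""


def instance_capacity(component_cfg: dict) -> int:
    """Concurrent tasks one service instance can hold (assumed formula)."""
    return (
        component_cfg["concurrent_requests_per_process"]
        * component_cfg["processes_per_instance"]
    )


def fallback_throughput_per_second(component_cfg: dict) -> float:
    """Estimated tasks per second per instance when live metrics are missing."""
    return instance_capacity(component_cfg) / component_cfg["avg_processing_time_seconds"]


config = json.loads(GATEWAY_CONFIG_JSON)
for component, cfg in config.items():
    print(component, instance_capacity(cfg), fallback_throughput_per_second(cfg))
```

Under these assumed values, one processor instance holds 8 concurrent tasks, so doubling `processes_per_instance` doubles the slots a backpressure calculation could count on for that instance.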

Throttling Rules

Rate limits are per session, keyed by session_id and scoped to the tenant/project in Redis. Each user type has independent burst and hourly caps:

User Type	Burst (per window)	Hourly
anonymous	10 / 60 s	120
registered	100 / 60 s	2 000
paid	150 / 60 s	4 000
privileged	300 / 60 s	unlimited

Rate-limited requests receive a 429. Limits are independently configurable per component (ingress vs proc).
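
The burst and hourly caps in the table can be sketched as a sliding-window counter keyed by tenant, project, and session. This is a minimal in-memory illustration, not the gateway's Redis-backed implementation: the `SessionRateLimiter` class, its method names, and the sliding-window approach are assumptions. Only the limits themselves come from the table.

```python
import time
from collections import defaultdict, deque
from typing import Optional

# (burst_limit, burst_window_seconds, hourly_limit); None means unlimited.
# Values are copied from the table above.
LIMITS = {
    "anonymous": (10, 60, 120),
    "registered": (100, 60, 2000),
    "paid": (150, 60, 4000),
    "privileged": (300, 60, None),
}


class SessionRateLimiter:
    """Hypothetical in-memory stand-in for a Redis-backed per-session limiter."""

    def __init__(self) -> None:
        # Accepted request timestamps per (tenant, project, session_id).
        self._hits = defaultdict(deque)

    def allow(self, tenant: str, project: str, session_id: str,
              user_type: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        burst, window, hourly = LIMITS[user_type]
        hits = self._hits[(tenant, project, session_id)]

        # Forget requests older than one hour.
        while hits and hits[0] <= now - 3600:
            hits.popleft()

        # Burst check: accepted requests inside the most recent window.
        in_window = sum(1 for t in hits if t > now - window)
        if in_window >= burst:
            return False  # the caller would answer with HTTP 429

        # Hourly check: accepted requests inside the last hour.
        if hourly is not None and len(hits) >= hourly:
            return False  # the caller would answer with HTTP 429

        hits.append(now)
        return True
```

Because the key includes the session, one noisy anonymous session exhausts only its own budget. A second session in the same tenant/project starts with a full allowance.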

Backpressure Handling

Backpressure is per tenant/project (not per session). It evaluates the global queue depth plus healthy processor capacity derived from heartbeat data. Key thresholds include:

Pressure thresholds — per user type (e.g., anonymous at 60 %, paid at 90 %). When queue pressure exceeds the threshold for a user type, new requests from that tier are rejected with 503.
Hard limit — a system-wide ceiling (default 95-98 %) above which all requests are rejected regardless of user type.
Capacity buffer — a headroom percentage reserved so the system never saturates completely.

Backpressure is checked in two places: once in the gateway (BackpressureManager.check_capacity) and again at atomic enqueue (AtomicChatQueueManager.enqueue_chat_task_atomic). The second check closes the race window in which queue pressure changes between the gateway check and the enqueue.
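
The threshold logic and the second check at enqueue can be sketched as follows. The 60 % and 90 % thresholds are the examples given above; `BackpressurePolicy`, `admit`, `enqueue_with_recheck`, the pressure formula, and the default buffer are hypothetical and do not reflect the real signatures of BackpressureManager or AtomicChatQueueManager.

```python
from dataclasses import dataclass, field


@dataclass
class BackpressurePolicy:
    # Per-user-type pressure thresholds as fractions of usable capacity.
    thresholds: dict = field(
        default_factory=lambda: {"anonymous": 0.60, "paid": 0.90}
    )
    hard_limit: float = 0.95        # system-wide ceiling
    capacity_buffer: float = 0.10   # reserved headroom (assumed default)


def admit(queue_depth: int, healthy_capacity: int,
          user_type: str, policy: BackpressurePolicy) -> bool:
    """Return True if a request of this user type may be admitted."""
    usable = healthy_capacity * (1 - policy.capacity_buffer)
    if usable <= 0:
        return False  # no healthy processors: reject everything
    pressure = queue_depth / usable  # assumed definition of queue pressure
    if pressure > policy.hard_limit:
        return False  # above the hard limit nobody is admitted
    return pressure <= policy.thresholds[user_type]


def enqueue_with_recheck(queue: list, task: object, healthy_capacity: int,
                         user_type: str, policy: BackpressurePolicy) -> bool:
    """Second check at the enqueue boundary, using the current queue depth."""
    if not admit(len(queue), healthy_capacity, user_type, policy):
        return False
    queue.append(task)
    return True
```

This sketch is single-threaded. In a real deployment the re-check and the append would have to execute as one atomic operation against the shared queue, which is what the "atomic" in atomic enqueue implies.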

Circuit Breaker

The circuit breaker provides fail-fast behavior when the system is in an unhealthy state. If upstream dependencies (Redis, processor heartbeats) are degraded, the circuit opens and incoming requests are rejected immediately rather than queued into a system that cannot process them. This is step 4 in the enforcement model.
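
For readers unfamiliar with the pattern, here is a minimal generic circuit breaker driven by health probes. It shows the fail-fast idea only. The class name, the failure threshold, and the recovery window are assumptions, and the platform's actual breaker may track different signals or states.

```python
import time
from typing import Optional


class CircuitBreaker:
    """Generic sketch: opens after consecutive unhealthy probes."""

    def __init__(self, failure_threshold: int = 3,
                 recovery_seconds: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self._failures = 0
        self._opened_at: Optional[float] = None

    def record_health(self, healthy: bool, now: Optional[float] = None) -> None:
        """Feed in the result of a dependency probe (e.g. Redis, heartbeats)."""
        now = time.monotonic() if now is None else now
        if healthy:
            # A healthy probe closes the circuit and resets the counter.
            self._failures = 0
            self._opened_at = None
        else:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = now  # open, or keep open and restart the timer

    def allow_request(self, now: Optional[float] = None) -> bool:
        """False while the circuit is open and the recovery window is running."""
        now = time.monotonic() if now is None else now
        if self._opened_at is None:
            return True
        # After the recovery window, let requests probe the system again.
        return now - self._opened_at >= self.recovery_seconds
```

While `allow_request` returns False, the caller rejects immediately instead of queueing work, which is the behavior described above.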

Guarded vs Bypass REST Patterns

By default, REST endpoints are session-only — they authenticate but skip throttling and backpressure. Heavy endpoints can be opted into full gateway protection via two regex lists in GATEWAY_CONFIG_JSON:

guarded_rest_patterns — endpoints that receive the same rate-limit and backpressure checks as chat ingestion (e.g., /resources/link-preview, /integrations/.../operations/...).
bypass_throttling_patterns — endpoints that should skip rate limiting entirely (e.g., ^/webhooks/stripe$).

Patterns are regex-based with optional suffix matching, so short patterns work across different API base prefixes. Both lists are component-aware (ingress vs proc).
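
One way to read "regex-based with optional suffix matching" is to try each pattern against the full path and against every trailing portion that starts at a slash. The sketch below implements that interpretation. The function names, the precedence of bypass over guarded, and the sample paths are assumptions. The two patterns are taken from the examples above.

```python
import re
from typing import Iterable, Iterator


def _suffixes(path: str) -> Iterator[str]:
    """Yield the path and each trailing portion that begins at a '/'."""
    for i, ch in enumerate(path):
        if ch == "/":
            yield path[i:]


def matches_any(path: str, patterns: Iterable[str]) -> bool:
    """True if any pattern matches at the start of the path or of a suffix."""
    patterns = list(patterns)
    return any(re.match(p, s) for s in _suffixes(path) for p in patterns)


def classify(path: str, guarded: Iterable[str], bypass: Iterable[str]) -> str:
    """Return 'bypass', 'guarded', or the default 'session_only'."""
    # Assumption: a bypass match wins if both lists match the same path.
    if matches_any(path, bypass):
        return "bypass"
    if matches_any(path, guarded):
        return "guarded"
    return "session_only"


GUARDED = [r"/resources/link-preview"]
BYPASS = [r"^/webhooks/stripe$"]

print(classify("/api/v1/webhooks/stripe", GUARDED, BYPASS))
print(classify("/api/resources/link-preview", GUARDED, BYPASS))
print(classify("/api/profile", GUARDED, BYPASS))
```

With suffix matching, the anchored pattern `^/webhooks/stripe$` still matches under an `/api/v1` base prefix, but not a longer path such as `/webhooks/stripe/extra`.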

Relationship to the 6-Step Enforcement Model

The gateway is the runtime implementation of steps 2 through 6. After session resolution (step 1), the gateway runs rate limiting, backpressure evaluation, circuit-breaker check, economics/quota reservation, and atomic enqueue in sequence. A failure at any step short-circuits the pipeline and returns an appropriate HTTP status (401, 403, 429, or 503) — no LLM call is made and no downstream side effect occurs.
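
The short-circuit behavior can be sketched as an ordered list of gate functions, each returning either nothing (pass) or an HTTP status (reject). The step names follow the enforcement model. The `Decision` type, the request flags, and the status mapping for session resolution (401), circuit breaker (503), economics/quota (403), and atomic enqueue (503) are assumptions for illustration. Only 429 for rate limiting and 503 for backpressure are stated above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Decision:
    admitted: bool
    status: Optional[int] = None       # HTTP status when rejected
    failed_step: Optional[str] = None  # name of the gate that rejected


# A gate returns None to pass, or an HTTP status code to reject.
Gate = Tuple[str, Callable[[dict], Optional[int]]]


def run_gates(request: dict, gates: List[Gate]) -> Decision:
    """Run gates in order and stop at the first rejection."""
    for name, check in gates:
        status = check(request)
        if status is not None:
            return Decision(admitted=False, status=status, failed_step=name)
    return Decision(admitted=True)


# Toy gates driven by flags on the request dict (hypothetical).
GATES: List[Gate] = [
    ("session_resolution", lambda r: None if r.get("session_id") else 401),
    ("rate_limit", lambda r: 429 if r.get("rate_limited") else None),
    ("backpressure", lambda r: 503 if r.get("queue_full") else None),
    ("circuit_breaker", lambda r: 503 if r.get("circuit_open") else None),
    ("economics_quota", lambda r: 403 if r.get("over_budget") else None),
    ("atomic_enqueue", lambda r: 503 if r.get("enqueue_rejected") else None),
]

print(run_gates({"session_id": "s1", "rate_limited": True}, GATES))
```

Because evaluation stops at the first rejecting gate, a rate-limited request never reaches the economics step, so no budget is reserved for work that will not run.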