Security & Governance
For the full security architecture, threat model, and compliance mapping, see Security & Governance.
Auth Flow
Security Features
| Feature | Description |
|---|---|
| Content Filtering | ClamAV scans file uploads. event_filter.py in your bundle controls which tool events are visible to clients, so sensitive tool results do not leak. |
| Bundle Firewall | Per-bundle tool access rules. Restricts what tools a bundle can call at the platform level — independent of bundle code. |
| Gateway Controls | Per-endpoint throttling (guarded_rest_patterns), bypass patterns, backpressure, and circuit breakers. Configured in gateway.yaml. |
| Network Isolation | All services are private in the VPC. Only web-proxy is ALB-facing. TLS terminated at ALB (ECS) or at the proxy (Compose). |
Enforcement Model
KDCube's runtime enforcement is pre-execution. Budget, rate, and tenant checks happen before the LLM call is charged and before any external system receives a request. Every agent request passes through a fixed 6-step gate sequence:
- Session Resolution — authenticate and resolve user/session context.
- Rate Limit — enforce per-session and per-user-type limits.
- Backpressure — evaluate queue/capacity admission thresholds.
- Circuit Breaker — fail-fast on unhealthy system states.
- Economics / Quota — budget reservation and commit gates before execution.
- Atomic Enqueue — admit or reject at the enqueue boundary with an auditable decision event.
This model differs fundamentally from output filtering (which operates after the LLM responds). Budget, rate, and isolation controls prevent wrong actions from executing — not just detect them after the fact. See Security → Enforcement Model for the full sequence diagram and compliance mapping.
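To make the ordering concrete, here is a minimal sketch of a fixed gate sequence that short-circuits on the first rejection. It is not KDCube's implementation: the `Decision` type, the `run_gates` helper, the request keys, and the status code attached to each gate are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Decision:
    admitted: bool
    failed_gate: Optional[str] = None
    status: int = 200


# A gate returns None to pass, or an HTTP status code to reject.
Gate = Tuple[str, Callable[[dict], Optional[int]]]

# Hypothetical checks; the request keys and per-gate status codes are assumptions.
GATES: List[Gate] = [
    ("session_resolution", lambda r: None if r.get("session_id") else 401),
    ("rate_limit", lambda r: 429 if r.get("over_rate_limit") else None),
    ("backpressure", lambda r: 503 if r.get("queue_pressure", 0.0) > 0.95 else None),
    ("circuit_breaker", lambda r: 503 if r.get("circuit_open") else None),
    ("economics_quota", lambda r: 403 if r.get("budget_exhausted") else None),
    ("atomic_enqueue", lambda r: 503 if r.get("queue_full") else None),
]


def run_gates(request: dict, gates: List[Gate] = GATES) -> Decision:
    """Run gates in order and stop at the first rejection."""
    for name, check in gates:
        status = check(request)
        if status is not None:
            # Later gates never run, so no work is enqueued or charged.
            return Decision(False, name, status)
    return Decision(True)
```

Because the loop returns on the first failing gate, a request that is both rate-limited and facing an open circuit is reported as a rate-limit rejection; later gates are never evaluated.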
Gateway & Admission Control
The gateway is the admission-control layer that sits in front of every chat and REST request. It enforces capacity, throttling, and health checks before any work is enqueued — tying directly into the 6-step enforcement model above (steps 2-6 all run inside the gateway).
Capacity Configuration
Gateway capacity is driven by GATEWAY_CONFIG_JSON and is component-aware — ingress and processor each declare their own slice:
| Parameter | Scope | Purpose |
|---|---|---|
| concurrent_requests_per_process | Per process | How many tasks a single Uvicorn worker can handle simultaneously. |
| processes_per_instance | Per service instance | Drives the Uvicorn worker count (ingress) or processor worker count (proc). Changes require a restart. |
| avg_processing_time_seconds | Per component | Fallback estimate used by backpressure calculations when live metrics are unavailable. |
Configuration is stored in Redis per tenant/project. Updates published via POST /admin/gateway/update-config propagate to all replicas at runtime through a tenant-scoped pub/sub channel.
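As a rough illustration of how the three parameters combine, the sketch below parses a hypothetical GATEWAY_CONFIG_JSON payload and derives per-instance capacity. The JSON layout, the numeric values, and the helper names are assumptions; only the parameter names and the ingress/proc split come from the table above.

```python
import json

# Hypothetical shape and values for GATEWAY_CONFIG_JSON (assumption).
GATEWAY_CONFIG_JSON = """
{
  "ingress": {
    "concurrent_requests_per_process": 8,
    "processes_per_instance": 4,
    "avg_processing_time_seconds": 2.0
  },
  "proc": {
    "concurrent_requests_per_process": 5,
    "processes_per_instance": 2,
    "avg_processing_time_seconds": 25.0
  }
}
"""


def instance_capacity(component: dict) -> int:
    """Concurrent tasks one service instance can hold."""
    return (
        component["concurrent_requests_per_process"]
        * component["processes_per_instance"]
    )


def fallback_throughput_per_second(component: dict) -> float:
    """Estimated tasks per second when only the fallback processing time is known."""
    return instance_capacity(component) / component["avg_processing_time_seconds"]


config = json.loads(GATEWAY_CONFIG_JSON)
for name, component in config.items():
    print(name, instance_capacity(component), fallback_throughput_per_second(component))
```

With these made-up numbers, one proc instance holds 10 concurrent tasks, so a 25-second average processing time gives a fallback estimate of 0.4 tasks per second per instance.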
Throttling Rules
Rate limits are per session, keyed by session_id and scoped to the tenant/project in Redis. Each user type has independent burst and hourly caps:
| User Type | Burst (per window) | Hourly |
|---|---|---|
| anonymous | 10 / 60 s | 120 |
| registered | 100 / 60 s | 2 000 |
| paid | 150 / 60 s | 4 000 |
| privileged | 300 / 60 s | unlimited |
Rate-limited requests receive a 429. Limits are independently configurable per component (ingress vs proc).
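The table can be read as two independent caps per session. The following sketch applies both using an in-memory sliding window keyed by tenant, project, and session_id. The platform keeps this state in Redis; the dictionary here is a stand-in, and the class name, method signature, and sliding-window approach are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

# (burst limit, burst window in seconds, hourly limit); None means unlimited.
LIMITS: Dict[str, Tuple[int, int, Optional[int]]] = {
    "anonymous": (10, 60, 120),
    "registered": (100, 60, 2000),
    "paid": (150, 60, 4000),
    "privileged": (300, 60, None),
}


class SessionRateLimiter:
    """In-memory stand-in for the Redis-backed limiter (assumption)."""

    def __init__(self) -> None:
        self._hits: Dict[Tuple[str, str, str], Deque[float]] = defaultdict(deque)

    def allow(self, tenant: str, project: str, session_id: str,
              user_type: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        burst, window, hourly = LIMITS[user_type]
        hits = self._hits[(tenant, project, session_id)]
        # Forget requests older than one hour.
        while hits and now - hits[0] >= 3600:
            hits.popleft()
        # Burst cap: requests inside the short window.
        if sum(1 for t in hits if now - t < window) >= burst:
            return False  # caller responds with 429
        # Hourly cap: requests inside the last hour.
        if hourly is not None and len(hits) >= hourly:
            return False  # caller responds with 429
        hits.append(now)
        return True
```

Keying on the session rather than the user type means two anonymous sessions in the same tenant do not consume each other's allowance.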
Backpressure Handling
Backpressure is per tenant/project (not per session). It evaluates the global queue depth plus healthy processor capacity derived from heartbeat data. Key thresholds include:
- Pressure thresholds — per user type (e.g., anonymous at 60 %, paid at 90 %). When queue pressure exceeds the threshold for a user type, new requests from that tier are rejected with 503.
- Hard limit — a system-wide ceiling (default 95-98 %) above which all requests are rejected regardless of user type.
- Capacity buffer — a headroom percentage reserved so the system never saturates completely.
Backpressure is checked in two places: once in the gateway (BackpressureManager.check_capacity) and again at atomic enqueue (AtomicChatQueueManager.enqueue_chat_task_atomic). The double check prevents race conditions where pressure changes between gate and enqueue.
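A small sketch of the tiered admission decision follows. The anonymous and paid thresholds are taken from the examples above and the hard limit uses the low end of the stated default range; the pressure formula, the treatment of the capacity buffer, the fallback for unlisted user types, and the function names are assumptions rather than the behavior of BackpressureManager.

```python
from typing import Optional

PRESSURE_THRESHOLDS = {"anonymous": 0.60, "paid": 0.90}  # examples from the text
HARD_LIMIT = 0.95  # low end of the default 95-98 % range


def queue_pressure(queue_depth: int, healthy_processes: int,
                   concurrent_per_process: int,
                   capacity_buffer: float = 0.10) -> float:
    """Queue depth as a fraction of usable capacity (assumed formula)."""
    usable = healthy_processes * concurrent_per_process * (1.0 - capacity_buffer)
    if usable <= 0:
        return 1.0  # no healthy capacity: treat as fully saturated
    return queue_depth / usable


def check_admission(user_type: str, pressure: float) -> Optional[int]:
    """Return 503 to reject, or None to admit."""
    if pressure > HARD_LIMIT:
        return 503  # hard limit applies to every user type
    # Unlisted user types fall back to the hard limit (assumption).
    threshold = PRESSURE_THRESHOLDS.get(user_type, HARD_LIMIT)
    return 503 if pressure > threshold else None
```

At a pressure of 0.7, for example, this sketch rejects anonymous traffic while still admitting paid traffic, which is the load-shedding order the thresholds are meant to produce.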
Circuit Breaker
The circuit breaker provides fail-fast behavior when the system is in an unhealthy state. If upstream dependencies (Redis, processor heartbeats) are degraded, the circuit opens and incoming requests are rejected immediately rather than queued into a system that cannot process them. This is step 4 in the enforcement model.
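The fail-fast idea can be sketched as a health-driven breaker that opens when Redis is marked unhealthy or when processor heartbeats go stale. The class, the heartbeat TTL, and the 503 response are assumptions for illustration, not the platform's actual breaker.

```python
import time
from typing import Optional


class HealthCircuitBreaker:
    """Opens when Redis is unhealthy or processor heartbeats are stale."""

    def __init__(self, heartbeat_ttl_seconds: float = 30.0) -> None:
        self.heartbeat_ttl_seconds = heartbeat_ttl_seconds  # assumed default
        self.redis_healthy = True
        self.last_heartbeat: Optional[float] = None

    def record_heartbeat(self, now: Optional[float] = None) -> None:
        self.last_heartbeat = time.monotonic() if now is None else now

    def is_open(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if not self.redis_healthy:
            return True
        if self.last_heartbeat is None:
            return True  # no processor has reported yet
        return now - self.last_heartbeat > self.heartbeat_ttl_seconds


def check_circuit(breaker: HealthCircuitBreaker,
                  now: Optional[float] = None) -> Optional[int]:
    """Reject immediately with 503 (assumed status) while the circuit is open."""
    return 503 if breaker.is_open(now) else None
```

The breaker closes again as soon as a fresh heartbeat arrives and Redis is marked healthy, so recovery needs no manual reset in this sketch.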
Guarded vs Bypass REST Patterns
By default, REST endpoints are session-only — they authenticate but skip throttling and backpressure. Heavy endpoints can be opted into full gateway protection via two regex lists in GATEWAY_CONFIG_JSON:
- guarded_rest_patterns: endpoints that receive the same rate-limit and backpressure checks as chat ingestion (e.g., /resources/link-preview, /integrations/.../operations/...).
- bypass_throttling_patterns: endpoints that should skip rate limiting entirely (e.g., ^/webhooks/stripe$).
Patterns are regex-based with optional suffix matching, so short patterns work across different API base prefixes. Both lists are component-aware (ingress vs proc).
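One way to picture suffix matching is to test each pattern against every slash-delimited tail of the request path, so a pattern written without the API base prefix still matches. This is a sketch under that assumption; the helper names, the suffix strategy, and the choice to check bypass patterns before guarded ones are not confirmed by the source.

```python
import re
from typing import Iterable, List

GUARDED_REST_PATTERNS = [r"/resources/link-preview"]
BYPASS_THROTTLING_PATTERNS = [r"^/webhooks/stripe$"]


def path_suffixes(path: str) -> List[str]:
    """'/api/v1/x' -> ['/api/v1/x', '/v1/x', '/x']."""
    starts = [i for i, ch in enumerate(path) if ch == "/"]
    return [path[i:] for i in starts] or [path]


def matches_any(path: str, patterns: Iterable[str]) -> bool:
    # re.match anchors at the start of each suffix, so '^' behaves as expected.
    return any(re.match(p, s) for p in patterns for s in path_suffixes(path))


def classify(path: str) -> str:
    # Assumed precedence: bypass wins over guarded.
    if matches_any(path, BYPASS_THROTTLING_PATTERNS):
        return "bypass"
    if matches_any(path, GUARDED_REST_PATTERNS):
        return "guarded"
    return "session-only"


print(classify("/api/v1/webhooks/stripe"))
print(classify("/api/resources/link-preview"))
print(classify("/api/profile"))
```

Paths that match neither list keep the default session-only treatment, which mirrors the opt-in behavior described above.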
Relationship to the 6-Step Enforcement Model
The gateway is the runtime implementation of steps 2 through 6. After session resolution (step 1), the gateway runs rate limiting, backpressure evaluation, circuit-breaker check, economics/quota reservation, and atomic enqueue in sequence. A failure at any step short-circuits the pipeline and returns an appropriate HTTP status (401, 403, 429, or 503) — no LLM call is made and no downstream side effect occurs.