Ship AI copilots to your customers.
Without losing control over cost, data, or compliance.
Other tools let you build agents. They leave policy enforcement, per-customer spend control, and tenant isolation as your problem. KDCube solves those before execution — so you can ship AI products your enterprise customers will actually approve.
Implementation depth lives in Docs. Threat model and enforcement details live in Security.
How components fit together #
Every request goes through admission, enforcement, execution, and audit layers. This page summarizes business value; full mechanics are in docs architecture.
1. Agent / Copilot
Any framework or custom Python. Sends a message or tool call into KDCube.
2. Gateway / Admission
Resolves session and applies rate, backpressure, circuit-breaker, and quota checks before atomic enqueue.
3. Semantic Runtime
Routes admitted requests into orchestration, streaming, and context lifecycle handling.
4. Execution Firewall
Applies pre-execution enforcement for budget/rate/tenant boundaries and supervisor-routed tool dispatch. Semantic per-action policy DSL remains roadmap.
5. Sandbox Execution
Subprocess-isolated executor in Docker ISO mode. Read-only root filesystem, no network access, no secret env passthrough. Writes only to workdir and outdir.
6. External Systems
DBs, APIs, LLMs. Reached only after Firewall allow.
Decision Log — Every allow/deny decision recorded: timestamp, agent, tenant, action. Runs alongside every layer. Self-hosted on your infrastructure.
Runtime components #
Each component addresses one business risk category: unsafe actions, overspend, cross-tenant leakage, or weak auditability.
Semantic Runtime #
Orchestration layer: request routing, streaming, and per-tenant context lifecycle.
Execution Firewall #
Works with Gateway / Admission to enforce session/rate/backpressure/circuit-breaker/economics checks before execution, then routes tool dispatch through the trusted supervisor boundary.
Sandbox Execution #
Subprocess-isolated executor in Docker ISO mode — read-only root FS, no network access, no secret env passthrough, non-root UID (1001). Workdir and outdir only.
Budget & Cost Controls #
Ledger-backed reservation and commit accounting across user/project/org scopes with hard caps enforced at admission — not post-hoc reporting.
Provenance & Citations #
Per-turn source pool with stable SIDs, citation-token linking in outputs, and artifact-level sources_used metadata for auditable claim tracing.
Audit Trail #
Timestamped log of every decision: agent, tenant, action, constraint, outcome. Self-hosted on your infrastructure.
Auth & Access Control #
Multi-provider auth: AWS Cognito (production) and delegated auth via proxy login service. Four role tiers — anonymous, registered, paid, privileged — drive rate limits and admission policy at runtime.
Economics controls #
Cost control is enforced before execution, not only observed in dashboards after incidents.
Layer 1: reserve then commit #
- Budget is checked at admission time
- Spend is finalized after execution
- Unused reservation is released
- Controls apply at user, project, and org scope
Layer 2: per-turn guardrails #
- Caps loop depth and high-cost retries
- Constrains tool attempts and token growth
- Blocks pathological turns early
- Produces explicit block events for audit
Why this matters: enterprise teams need both monthly budget governance and turn-level abuse containment. See detailed accounting in docs economics.
Provenance chain as a runtime component #
KDCube keeps response lineage auditable for compliance and internal review.
- Sources are normalized per turn
- Stable source IDs support traceable citations
- Artifacts retain
sources_usedlinkage - Timeline enables source-to-claim reconstruction
Full provenance internals: docs streaming/events.
Feedback & Quality Signals #
KDCube captures structured quality signals on every turn — from direct user reactions and automated Gate Agent assessments — stored as first-class timeline artifacts.
User Feedback API #
- Thumbs-up, thumbs-down, flag, and custom reaction labels
POST /conversations/{t}/{p}/{cid}/turns/{tid}/feedback- Add, replace, or clear reactions per turn
- Stored as
artifact:turn.log.reactionwith full provenance
Machine Feedback (Gate Agent) #
- Automated quality assessment post-generation
- Automated quality assessment runs post-generation
- Writes
origin: machinereaction — no user action required - Reaction counts queryable by origin and type for analytics
Full feedback architecture: docs feedback system.
Bundles & Extensibility #
Bundles are the primary extensibility unit — hot-loadable workflow plugins that run inside the platform's execution boundary. Each bundle can include local tools, MCP servers, and skills.
Bundle system #
- Hot-loadable workflow plugins (no platform restart needed)
- Dynamic bundle UIs — bundles ship their own React SPA interface, authorization-guarded and embedded in the platform UI; each bundle can expose a fully custom frontend experience
- Bundles deployed per-tenant or per-project via
AGENTIC_BUNDLES_ROOT - Each bundle is a directory of Python + optional frontend assets
No tool-calling lock-in #
- Bring your own tools or use the SDK — no forced framework
- Tool IDs:
<alias>.<name>local,mcp.<alias>.<name>for MCP-sourced - Memory tools:
react.read,react.hide,react.memsearch,react.patch - Source pools: Perplexity-style citation traceability built in
MCP Integration #
- Connect any MCP server via
stdio,http,sse, orstreamable-httptransports - Configure via
MCP_TOOL_SPECS+MCP_SERVICESenv vars - Auth:
bearer,api_key,header— with named secrets support - Named secrets keep external credentials out of bundle config
Skills Subsystem #
- Reusable agent behaviors defined in
SKILL.mdfiles (SKILL.md-compatible) - Namespaced as
public,internal, orcustom - Per-consumer filtering via
agents_configwildcard patterns controls which skills appear in each agent's catalog - Composable across bundles; reusable across agents
Rendering Tools #
- Built-in
rendering_toolsfamily generates PDF, DOCX, PPTX, PNG, and HTML artifacts - Agents produce structured documents as first-class outputs
- Artifacts integrate with the timeline and trigger downloadable-artifact UI events
- No external rendering dependency required
Web Search & Fetch #
web_tools.web_searchandweb_tools.web_fetchwith model-assisted content filtering- URL deduplication and result segmentation built in
- Results injected into the per-conversation source pool for cross-turn provenance
- Traceable citations link fetched content to agent outputs
Technical extension model: docs extensibility.
Autoscaling signals & observability #
Operational telemetry supports scaling decisions, incident response, and governance reviews.
Metrics service #
- Dedicated metrics service aggregates SSE connection counts and REST latency
- Latency percentiles: p50, p95, p99 per endpoint
- Queue depth and pool utilization tracked in real time
- Pre-built signals for autoscaling and ops dashboards
Latency & admission signals #
- Queue pressure and admission telemetry expose runtime health
- Latency trends support autoscaling thresholds
- Circuit-breaker transitions show fail-fast/recovery state
Draining & processor health #
- Ingress and processor capacity are tracked independently
- Heartbeat and queue depth drive scaling actions
- Rate-limit windows visible for control loops
Endpoint-level metrics references are documented in docs deployment.
Drop-in integration #
No rewrite required. Route existing agent logic through KDCube runtime controls.
Self-hosted: Agent traffic stays on your infrastructure. You control which LLMs, databases, and APIs agents can reach.
Endpoint note: Runtime ingress endpoints are transport-scoped (/sse/chat, /sse/stream, /socket.io/chat). If you expose additional app-specific routes, document them separately in your integration layer.
Protocol and endpoint specifics are maintained in docs frontend integration.
Not a framework replacement. LangGraph, LangChain, CrewAI, AutoGen, or custom Python — route through KDCube without modification.
Execution Runtime #
Agent code runs in an isolated executor. Privileged operations are mediated through a supervisor boundary before execution continues.
ReAct v2: timeline-native decision loop #
Every turn event is captured as a typed block on a conversation timeline — the single source of truth for memory, artifacts, and future reasoning. Enables reproducible, auditable, cache-efficient execution.
- Single-agent decision loop grounded in the ordered event timeline
- All blocks typed: messages, tool calls, artifacts, errors, usage
- Timeline supports state recovery, replayability, and cache reuse
Multi-channel streaming #
A tag-based protocol routes a single LLM stream into named logical channels in one LLM call.
- Channels: thinking, answer, follow-up suggestions, canvas (live JSON), usage sidecar
- No multi-call overhead — one stream, multiple named outputs
- Canvas channel delivers live structured JSON for UI rendering
Per-turn protection: additional turn-level controls can cap loop depth and reduce pathological retries. Full runtime mechanics are documented in docs execution model.
Executor (isolated) #
- Runs agent-generated code
- No network, no env vars
- Filesystem: workdir only
- Cannot call tools directly
Supervisor (trusted tools) #
- All tool dispatch routes here
- Socket-proxy interception, policy enforced
- Internet access limited by policy
- Returns string, bytes, or structured output
Guarantee: The executor never calls tools directly. Every tool call is intercepted and proxied through the supervisor.
Context Lifecycle & Timeline #
Ordered event timeline with shifting cache points, flagged-not-deleted history, and on-demand artifact re-fetch — keeps context windows lean at scale.
Timeline #
Ordered event sequence. Turn snapshots versioned per turn; state is recoverable and replayable.
Cache points #
Shift forward as conversation grows; preserve context efficiently.
Hidden items #
Flagged, not deleted. Replaced with summary text; re-fetchable on demand.
Lazy fetching: Fetch artifacts by name on demand, or pre-pull via scan. Both keep the active context window lean.
Execution Outputs #
Two reporting modes. Both produce canonical artifact shapes that integrate with the timeline.
Contract-based #
- Declare expected outputs upfront (path, MIME, description)
- Output verified against contract
- Missing or mismatched outputs are errors
Side-effect-based #
- All created/updated files reported automatically
- Text → string; binary → base64
- Runtime vs. program errors reported separately
Artifact shape #
Every artifact: name, MIME, size, description, readable payload. Integrates with the timeline; triggers downloadable-artifact UI events.
Runtime Internals #
Tool interception, error separation, and artifact reporting — the three enforcement mechanics.
Tool interception & proxying #
All tool calls from agent code intercepted via socket proxy. The Executor never calls tools directly — the Supervisor enforces policy before forwarding, unconditionally.
Error separation #
- Runtime errors — crash, timeout, resource exhaustion
- Program errors — exceptions in agent-generated code with traceback
Both surfaced in structured format; agents distinguish infrastructure from code-level failures.
Artifact reporting #
Canonical shape: name, MIME, size, description, readable payload. Integrates with timeline; triggers downloadable-artifact UI events.
Isolation scope: The Executor has zero outbound connections. Network tool calls are permitted only through the Supervisor, explicitly allowed by policy. The Supervisor operates with configurable permissions.
Build AI that doesn't break trust
Deploy runtime controls in under an hour. Review the code, run it in your environment, and evaluate the enforcement layer directly.