Ship AI copilots to your customers.
Without losing control over cost, data, or compliance.

Other tools let you build agents. They leave policy enforcement, per-customer spend control, and tenant isolation as your problem. KDCube solves those before execution — so you can ship AI products your enterprise customers will actually approve.

Implementation depth lives in Docs. Threat model and enforcement details live in Security.

How components fit together #

Every request goes through admission, enforcement, execution, and audit layers. This page summarizes business value; full mechanics are in docs architecture.

1. Agent / Copilot

Any framework or custom Python. Sends a message or tool call into KDCube.

→

2. Gateway / Admission

Resolves session and applies rate, backpressure, circuit-breaker, and quota checks before atomic enqueue.

→

3. Semantic Runtime

Routes admitted requests into orchestration, streaming, and context lifecycle handling.

→

4. Execution Firewall

Applies pre-execution enforcement for budget, rate, and tenant boundaries, and routes tool dispatch through the supervisor. A semantic per-action policy DSL remains on the roadmap.

→

5. Sandbox Execution

Subprocess-isolated executor in Docker ISO mode. Read-only root filesystem, no network access, no secret env passthrough. Writes only to workdir and outdir.

→

6. External Systems

DBs, APIs, LLMs. Reached only after Firewall allow.

Decision Log — Every allow/deny decision recorded: timestamp, agent, tenant, action. Runs alongside every layer. Self-hosted on your infrastructure.
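As a rough illustration of the flow above, admission can be pictured as an ordered chain of checks where the first failure denies the request and every outcome is written to a decision log. This is a minimal in-memory sketch, not KDCube's implementation: the `Request`, `Decision`, and `Admission` names and the check signature are assumptions made for the example, and a plain list stands in for the atomic enqueue.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional, Tuple


@dataclass
class Request:
    agent: str
    tenant: str
    action: str


@dataclass
class Decision:
    timestamp: str
    agent: str
    tenant: str
    action: str
    outcome: str                      # "allow" or "deny"
    constraint: Optional[str] = None  # name of the check that denied, if any


# Hypothetical check signature: return True to let the request through.
Check = Callable[[Request], bool]


@dataclass
class Admission:
    checks: List[Tuple[str, Check]]   # evaluated in order; first failure denies
    queue: List[Request] = field(default_factory=list)
    log: List[Decision] = field(default_factory=list)

    def admit(self, req: Request) -> bool:
        denied_by = next((name for name, check in self.checks if not check(req)), None)
        outcome = "allow" if denied_by is None else "deny"
        self.log.append(Decision(
            datetime.now(timezone.utc).isoformat(),
            req.agent, req.tenant, req.action, outcome, denied_by,
        ))
        if denied_by is None:
            self.queue.append(req)    # enqueue only after every check passes
        return denied_by is None
```

Rate, backpressure, circuit-breaker, and quota checks would each be one `(name, check)` pair in `checks`.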

Runtime components #

Each component addresses one business risk category: unsafe actions, overspend, cross-tenant leakage, or weak auditability.

Semantic Runtime #

Orchestration layer: request routing, streaming, and per-tenant context lifecycle.

Execution Firewall #

Works with Gateway / Admission to enforce session/rate/backpressure/circuit-breaker/economics checks before execution, then routes tool dispatch through the trusted supervisor boundary.

Sandbox Execution #

Subprocess-isolated executor in Docker ISO mode — read-only root FS, no network access, no secret env passthrough, non-root UID (1001). Workdir and outdir only.

Budget & Cost Controls #

Ledger-backed reservation and commit accounting across user/project/org scopes with hard caps enforced at admission — not post-hoc reporting.

Provenance & Citations #

Per-turn source pool with stable SIDs, citation-token linking in outputs, and artifact-level sources_used metadata for auditable claim tracing.

Audit Trail #

Timestamped log of every decision: agent, tenant, action, constraint, outcome. Self-hosted on your infrastructure.

Auth & Access Control #

Multi-provider auth: AWS Cognito (production) and delegated auth via proxy login service. Four role tiers — anonymous, registered, paid, privileged — drive rate limits and admission policy at runtime.

Economics controls #

Cost control is enforced before execution, not only observed in dashboards after incidents.

Layer 1: reserve then commit #

Budget is checked at admission time
Spend is finalized after execution
Unused reservation is released
Controls apply at user, project, and org scope
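The reserve-then-commit pattern listed above can be sketched with a small in-memory ledger. This is an illustration under stated assumptions, not KDCube's ledger: the `Ledger` class, its method names, and the use of integer cents are hypothetical, and a single scope is shown where the platform applies controls at user, project, and org scope.

```python
from dataclasses import dataclass, field
from typing import Dict


class BudgetExceeded(Exception):
    """Raised at admission time when a reservation would break the hard cap."""


@dataclass
class Ledger:
    cap_cents: int
    committed_cents: int = 0
    reserved: Dict[str, int] = field(default_factory=dict)  # request_id -> held amount

    def available(self) -> int:
        return self.cap_cents - self.committed_cents - sum(self.reserved.values())

    def reserve(self, request_id: str, estimate_cents: int) -> None:
        # Checked before execution: deny if the estimate does not fit.
        if estimate_cents > self.available():
            raise BudgetExceeded(f"{request_id}: needs {estimate_cents}, has {self.available()}")
        self.reserved[request_id] = estimate_cents

    def commit(self, request_id: str, actual_cents: int) -> int:
        # Finalize real spend and release whatever part of the hold was unused.
        held = self.reserved.pop(request_id)
        self.committed_cents += actual_cents
        return max(held - actual_cents, 0)
```

Holding the estimate until commit means concurrent requests cannot jointly overshoot the cap, which is the difference between admission-time enforcement and post-hoc reporting.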

Layer 2: per-turn guardrails #

Caps loop depth and high-cost retries
Constrains tool attempts and token growth
Blocks pathological turns early
Produces explicit block events for audit
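A per-turn guard along these lines could be as small as a few counters with caps. The sketch below assumes hypothetical names (`TurnGuard`, `record_step`, the `turn.blocked` event type) and arbitrary limits; it only illustrates how a turn can be stopped early while leaving an explicit block event behind.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TurnGuard:
    max_steps: int
    max_tool_attempts: int
    max_tokens: int
    steps: int = 0
    tool_attempts: int = 0
    tokens: int = 0
    block_events: List[dict] = field(default_factory=list)

    def record_step(self, tool_call: bool, tokens: int) -> bool:
        """Count one loop step; return False and log a block event once a cap is exceeded."""
        self.steps += 1
        self.tool_attempts += int(tool_call)
        self.tokens += tokens
        usage = (
            ("steps", self.steps, self.max_steps),
            ("tool_attempts", self.tool_attempts, self.max_tool_attempts),
            ("tokens", self.tokens, self.max_tokens),
        )
        for limit, used, cap in usage:
            if used > cap:
                self.block_events.append(
                    {"type": "turn.blocked", "limit": limit, "used": used, "cap": cap}
                )
                return False
        return True
```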

Why this matters: enterprise teams need both monthly budget governance and turn-level abuse containment. See detailed accounting in docs economics.

Provenance chain as a runtime component #

KDCube keeps response lineage auditable for compliance and internal review.

Sources are normalized per turn
Stable source IDs support traceable citations
Artifacts retain sources_used linkage
Timeline enables source-to-claim reconstruction
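To make the idea of stable source IDs concrete, here is a minimal sketch of a source pool that assigns one ID per normalized URL and recovers the IDs cited in a piece of output. The `[[S:n]]` token format, the `SourcePool` class, and the normalization rule are assumptions for illustration only; the actual token syntax is not specified on this page.

```python
import re
from typing import Dict, List


class SourcePool:
    """Assigns a stable integer SID to each distinct (normalized) source URL."""

    def __init__(self) -> None:
        self._sid_by_url: Dict[str, int] = {}

    @staticmethod
    def normalize(url: str) -> str:
        # Illustrative normalization only: trim whitespace and a trailing slash.
        return url.strip().rstrip("/")

    def add(self, url: str) -> int:
        key = self.normalize(url)
        if key not in self._sid_by_url:
            self._sid_by_url[key] = len(self._sid_by_url) + 1
        return self._sid_by_url[key]


# Hypothetical citation token, e.g. "[[S:2]]".
CITATION = re.compile(r"\[\[S:(\d+)\]\]")


def sources_used(text: str) -> List[int]:
    """Return the sorted, de-duplicated SIDs cited in the text."""
    return sorted({int(sid) for sid in CITATION.findall(text)})
```

Because an ID never changes once assigned, a `sources_used` list stored on an artifact can be resolved back to the same sources later.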

Full provenance internals: docs streaming/events.

Feedback & Quality Signals #

KDCube captures structured quality signals on every turn, from both direct user reactions and automated Gate Agent assessments, and stores them as first-class timeline artifacts.

User Feedback API #

Thumbs-up, thumbs-down, flag, and custom reaction labels
POST /conversations/{t}/{p}/{cid}/turns/{tid}/feedback
Add, replace, or clear reactions per turn
Stored as artifact:turn.log.reaction with full provenance
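A client call to the feedback endpoint might be assembled as follows. Only the path template comes from this page. The base URL, the JSON body shape (`{"reaction": ...}`), and the reading of `t`, `p`, `cid`, and `tid` as tenant, project, conversation ID, and turn ID are assumptions, so check the docs before relying on them. The sketch builds the request without sending it.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_feedback_request(base_url: str, t: str, p: str, cid: str, tid: str,
                           reaction: str) -> Request:
    """Build (but do not send) a POST to the turn feedback endpoint."""
    # Percent-encode each path segment so IDs cannot alter the route.
    segments = [quote(part, safe="") for part in (t, p, cid, tid)]
    path = "/conversations/{}/{}/{}/turns/{}/feedback".format(*segments)
    body = json.dumps({"reaction": reaction}).encode("utf-8")  # hypothetical payload shape
    return Request(
        base_url.rstrip("/") + path,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Passing the result to `urllib.request.urlopen` would send it; authentication headers are omitted here because they depend on the configured auth provider.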

Machine Feedback (Gate Agent) #

Automated quality assessment runs post-generation
Writes a reaction with origin: machine; no user action required
Reaction counts queryable by origin and type for analytics

Full feedback architecture: docs feedback system.

Bundles & Extensibility #

Bundles are the primary extensibility unit — hot-loadable workflow plugins that run inside the platform's execution boundary. Each bundle can include local tools, MCP servers, and skills.

Bundle system #

Hot-loadable workflow plugins (no platform restart needed)
Dynamic bundle UIs — bundles ship their own React SPA interface, authorization-guarded and embedded in the platform UI; each bundle can expose a fully custom frontend experience
Bundles deployed per-tenant or per-project via AGENTIC_BUNDLES_ROOT
Each bundle is a directory of Python + optional frontend assets

No tool-calling lock-in #

Bring your own tools or use the SDK — no forced framework
Tool IDs: <alias>.<name> local, mcp.<alias>.<name> for MCP-sourced
Memory tools: react.read, react.hide, react.memsearch, react.patch
Source pools: Perplexity-style citation traceability built in
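The two tool ID shapes above, `<alias>.<name>` for local tools and `mcp.<alias>.<name>` for MCP-sourced tools, can be told apart with a small parser. This sketch assumes that aliases and names contain no dots and that `mcp` is reserved as a prefix; `ToolRef` and `parse_tool_id` are hypothetical helper names, not part of the SDK.

```python
from typing import NamedTuple


class ToolRef(NamedTuple):
    source: str  # "local" or "mcp"
    alias: str
    name: str


def parse_tool_id(tool_id: str) -> ToolRef:
    """Split a tool ID into its source, alias, and name."""
    parts = tool_id.split(".")
    if not all(parts):
        raise ValueError(f"empty segment in tool id: {tool_id!r}")
    if len(parts) == 3 and parts[0] == "mcp":
        return ToolRef("mcp", parts[1], parts[2])
    if len(parts) == 2 and parts[0] != "mcp":
        return ToolRef("local", parts[0], parts[1])
    raise ValueError(f"unrecognized tool id: {tool_id!r}")
```

For example, `parse_tool_id("react.read")` yields a local reference with alias `react`, while an ID starting with `mcp.` is attributed to the named MCP server alias.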

MCP Integration #

Connect any MCP server via stdio, http, sse, or streamable-http transports
Configure via MCP_TOOL_SPECS + MCP_SERVICES env vars
Auth: bearer, api_key, header — with named secrets support
Named secrets keep external credentials out of bundle config

Skills Subsystem #

Reusable agent behaviors defined in SKILL.md-compatible files
Namespaced as public, internal, or custom
Per-consumer filtering via agents_config wildcard patterns controls which skills appear in each agent's catalog
Composable across bundles; reusable across agents
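Wildcard-based filtering of a skill catalog can be sketched with the standard library's `fnmatch`. The skill identifiers (`namespace.name`) and the idea that each consumer has a list of patterns are assumptions for this example; the real `agents_config` schema is defined in the docs.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def visible_skills(all_skills: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep the skills that match at least one wildcard pattern, preserving order."""
    patterns = list(patterns)
    return [
        skill for skill in all_skills
        if any(fnmatchcase(skill, pattern) for pattern in patterns)
    ]
```

With a catalog of `public.summarize`, `internal.triage`, and `custom.acme.report`, the patterns `public.*` and `custom.acme.*` would expose the first and third skill and hide the internal one.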

Rendering Tools #

Built-in rendering_tools family generates PDF, DOCX, PPTX, PNG, and HTML artifacts
Agents produce structured documents as first-class outputs
Artifacts integrate with the timeline and trigger downloadable-artifact UI events
No external rendering dependency required

Web Search & Fetch #

web_tools.web_search and web_tools.web_fetch with model-assisted content filtering
URL deduplication and result segmentation built in
Results injected into the per-conversation source pool for cross-turn provenance
Traceable citations link fetched content to agent outputs

Technical extension model: docs extensibility.

Autoscaling signals & observability #

Operational telemetry supports scaling decisions, incident response, and governance reviews.

Metrics service #

Dedicated metrics service aggregates SSE connection counts and REST latency
Latency percentiles: p50, p95, p99 per endpoint
Queue depth and pool utilization tracked in real time
Pre-built signals for autoscaling and ops dashboards
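For readers unfamiliar with latency percentiles, the sketch below computes p50, p95, and p99 from raw samples using the nearest-rank method. It is a generic illustration, not the metrics service's code, which may use a different estimator or a streaming approximation.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples: Sequence[float]) -> Dict[str, float]:
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}
```

Tail percentiles such as p99 are the usual input to autoscaling thresholds because averages hide the slow requests that users notice.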

Latency & admission signals #

Queue pressure and admission telemetry expose runtime health
Latency trends support autoscaling thresholds
Circuit-breaker transitions show fail-fast/recovery state
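The fail-fast and recovery states mentioned above follow the usual circuit-breaker pattern: closed, open, then half-open for a trial request. The sketch below is a generic version of that pattern with an injected clock. The thresholds, field names, and state labels are assumptions, not KDCube's configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CircuitBreaker:
    failure_threshold: int
    recovery_seconds: float
    state: str = "closed"          # "closed", "open", or "half_open"
    failures: int = 0
    opened_at: Optional[float] = None

    def allow(self, now: float) -> bool:
        """Fail fast while open; let a trial request through once the recovery window has passed."""
        if self.state == "open" and now - self.opened_at >= self.recovery_seconds:
            self.state = "half_open"
        return self.state != "open"

    def record(self, success: bool, now: float) -> None:
        if success:
            self.state, self.failures, self.opened_at = "closed", 0, None
            return
        self.failures += 1
        # A failed trial request, or too many failures while closed, opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state, self.opened_at = "open", now
```

Emitting a metric on each change of `state` would give the transition signal described in the list above.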

Draining & processor health #

Ingress and processor capacity are tracked independently
Heartbeat and queue depth drive scaling actions
Rate-limit windows visible for control loops

Endpoint-level metrics references are documented in docs deployment.

Drop-in integration #

No rewrite required. Route existing agent logic through KDCube runtime controls.

REST API #

Synchronous request/response integration path.

✅ Available

SSE #

Token streaming over standard HTTP.

✅ Available

Socket.IO #

Full-duplex real-time integration path.

✅ Available

Self-hosted: Agent traffic stays on your infrastructure. You control which LLMs, databases, and APIs agents can reach.

Endpoint note: Runtime ingress endpoints are transport-scoped (/sse/chat, /sse/stream, /socket.io/chat). If you expose additional app-specific routes, document them separately in your integration layer.

Protocol and endpoint specifics are maintained in docs frontend integration.

Not a framework replacement. LangGraph, LangChain, CrewAI, AutoGen, or custom Python — route through KDCube without modification.

Execution Runtime #

Agent code runs in an isolated executor. Privileged operations are mediated through a supervisor boundary before execution continues.

ReAct v2: timeline-native decision loop #

Every turn event is captured as a typed block on a conversation timeline, the single source of truth for memory, artifacts, and future reasoning. This makes execution reproducible, auditable, and cache-efficient.

Single-agent decision loop grounded in the ordered event timeline
All blocks typed: messages, tool calls, artifacts, errors, usage
Timeline supports state recovery, replayability, and cache reuse

Multi-channel streaming #

A tag-based protocol routes a single LLM stream into named logical channels in one LLM call.

Channels: thinking, answer, follow-up suggestions, canvas (live JSON), usage sidecar
No multi-call overhead — one stream, multiple named outputs
Canvas channel delivers live structured JSON for UI rendering
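One way to picture a tag-based channel protocol is a demultiplexer that reads chunks from a single stream and appends tagged segments to named channels. The `<name>...</name>` tag syntax and the `ChannelDemux` class are assumptions for illustration; the real wire format is documented elsewhere. For brevity this sketch releases a segment only once its closing tag arrives, whereas a production demultiplexer would forward partial content as it streams.

```python
import re
from collections import defaultdict
from typing import DefaultDict

# Hypothetical tag syntax: <channel_name> ... </channel_name>
SEGMENT = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


class ChannelDemux:
    def __init__(self) -> None:
        self.buffer = ""
        self.channels: DefaultDict[str, str] = defaultdict(str)

    def feed(self, chunk: str) -> None:
        """Append a chunk and move every completed segment into its channel."""
        self.buffer += chunk
        consumed = 0
        for match in SEGMENT.finditer(self.buffer):
            self.channels[match.group(1)] += match.group(2)
            consumed = match.end()
        # Keep only the unfinished tail, which may hold a partial tag.
        self.buffer = self.buffer[consumed:]
```

Chunk boundaries can fall anywhere, including inside a tag, which is why unfinished input stays in the buffer until the next chunk.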

Per-turn protection: additional turn-level controls can cap loop depth and reduce pathological retries. Full runtime mechanics are documented in docs execution model.

Executor (isolated) #

Runs agent-generated code
No network access, no secret env passthrough
Filesystem: writes to workdir and outdir only
Cannot call tools directly

Supervisor (trusted tools) #

All tool dispatch routes here
Socket-proxy interception, policy enforced
Internet access limited by policy
Returns string, bytes, or structured output

Guarantee: The executor never calls tools directly. Every tool call is intercepted and proxied through the supervisor.
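The executor and supervisor split can be illustrated in-process: agent code receives only a dispatch handle, and the supervisor checks policy before running any tool. This is a conceptual sketch with hypothetical names (`Supervisor`, `PolicyDenied`, `run_in_executor`). It does not provide real isolation, which on the platform comes from the sandbox and socket-proxy interception.

```python
from typing import Any, Callable, Dict, Set


class PolicyDenied(Exception):
    """Raised when a tool call is not allowed by policy."""


class Supervisor:
    def __init__(self, tools: Dict[str, Callable[..., Any]], allowed: Set[str]) -> None:
        self._tools = tools
        self._allowed = allowed

    def dispatch(self, tool_id: str, **kwargs: Any) -> Any:
        # Policy is checked on every call, before the tool runs.
        if tool_id not in self._allowed:
            raise PolicyDenied(tool_id)
        return self._tools[tool_id](**kwargs)


def run_in_executor(agent_code: Callable[[Callable[..., Any]], Any],
                    supervisor: Supervisor) -> Any:
    # The agent code is handed the dispatch function only, never the tool objects.
    return agent_code(supervisor.dispatch)
```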

Context Lifecycle & Timeline #

Ordered event timeline with shifting cache points, flagged-not-deleted history, and on-demand artifact re-fetch — keeps context windows lean at scale.

Timeline #

Ordered event sequence. Turn snapshots versioned per turn; state is recoverable and replayable.

Cache points #

Shift forward as conversation grows; preserve context efficiently.

Hidden items #

Flagged, not deleted. Replaced with summary text; re-fetchable on demand.

Lazy fetching: Fetch artifacts by name on demand, or pre-pull via scan. Both keep the active context window lean.

Execution Outputs #

Two reporting modes. Both produce canonical artifact shapes that integrate with the timeline.

Contract-based #

Declare expected outputs upfront (path, MIME, description)
Output verified against contract
Missing or mismatched outputs are errors
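Contract verification of this kind can be sketched as a check of declared outputs against what actually exists in the output directory. The `ExpectedOutput` type and `verify_contract` function are hypothetical, and MIME detection here is a simple guess from the file extension rather than whatever the runtime really uses.

```python
import mimetypes
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class ExpectedOutput:
    path: str          # relative to the output directory
    mime: str
    description: str


def verify_contract(outdir: Path, contract: List[ExpectedOutput]) -> List[str]:
    """Return one error string per missing or mismatched output; empty means the contract holds."""
    errors: List[str] = []
    for item in contract:
        target = outdir / item.path
        if not target.is_file():
            errors.append(f"missing: {item.path}")
            continue
        guessed, _ = mimetypes.guess_type(target.name)  # extension-based guess
        if guessed != item.mime:
            errors.append(f"mime mismatch: {item.path} (expected {item.mime}, got {guessed})")
    return errors
```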

Side-effect-based #

All created/updated files reported automatically
Text → string; binary → base64
Runtime vs. program errors reported separately
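Side-effect reporting can be approximated by snapshotting a directory before execution and reporting every file whose content is new or changed afterwards, with text returned as a string and binary data as base64. The function names and the report fields below are assumptions for illustration, and treating any valid UTF-8 file as text is a simplification.

```python
import base64
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(root: Path) -> Dict[str, str]:
    """Map each file's relative path to a hash of its content."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*") if p.is_file()
    }


def report_changes(root: Path, before: Dict[str, str]) -> List[dict]:
    """Report files created or updated since the `before` snapshot."""
    report: List[dict] = []
    for name, digest in sorted(snapshot(root).items()):
        if before.get(name) == digest:
            continue  # unchanged file, nothing to report
        raw = (root / name).read_bytes()
        try:
            payload, encoding = raw.decode("utf-8"), "text"
        except UnicodeDecodeError:
            payload, encoding = base64.b64encode(raw).decode("ascii"), "base64"
        report.append({"name": name, "size": len(raw), "encoding": encoding, "payload": payload})
    return report
```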

Artifact shape #

Every artifact: name, MIME, size, description, readable payload. Integrates with the timeline; triggers downloadable-artifact UI events.

Runtime Internals #

Tool interception, error separation, and artifact reporting — the three enforcement mechanics.

Tool interception & proxying #

All tool calls from agent code intercepted via socket proxy. The Executor never calls tools directly — the Supervisor enforces policy before forwarding, unconditionally.

Error separation #

Runtime errors — crash, timeout, resource exhaustion
Program errors — exceptions in agent-generated code with traceback

Both surfaced in structured format; agents distinguish infrastructure from code-level failures.
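The distinction between runtime and program errors can be shown with a plain subprocess: a timeout or a kill signal is an infrastructure failure, while a non-zero exit with a traceback is a failure in the generated code. This is a simplified stand-in for the sandboxed executor, and the result fields (`kind`, `reason`, `traceback`) are assumptions rather than the platform's schema.

```python
import subprocess
import sys


def run_snippet(code: str, timeout: float = 5.0) -> dict:
    """Run Python code in a child process and classify the outcome."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "kind": "runtime", "reason": "timeout"}
    if proc.returncode < 0:
        # On POSIX a negative return code means the process was killed by a signal.
        return {"ok": False, "kind": "runtime", "reason": f"signal {-proc.returncode}"}
    if proc.returncode != 0:
        return {"ok": False, "kind": "program", "traceback": proc.stderr}
    return {"ok": True, "stdout": proc.stdout}
```

An agent that sees `kind == "program"` can try to fix its own code, whereas a `runtime` failure usually calls for a retry or escalation.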

Artifact reporting #

Canonical shape: name, MIME, size, description, readable payload. Integrates with timeline; triggers downloadable-artifact UI events.

Isolation scope: The Executor has zero outbound connections. Network tool calls are permitted only through the Supervisor, and only when policy explicitly allows them. The Supervisor operates with configurable permissions.

Build AI that doesn't break trust

Deploy runtime controls in under an hour. Review the code, run it in your environment, and evaluate the enforcement layer directly.

MIT Licensed · Self-Hosted · Open Source
