Ship AI copilots to your customers.
Without losing control over cost, data, or compliance.

Other tools let you build agents. They leave policy enforcement, per-customer spend control, and tenant isolation as your problem. KDCube solves those before execution — so you can ship AI products your enterprise customers will actually approve.

Implementation depth lives in Docs. Threat model and enforcement details live in Security.

How components fit together #

Every request goes through admission, enforcement, execution, and audit layers. This page summarizes business value; full mechanics are in docs architecture.

1. Agent / Copilot

Any framework or custom Python. Sends a message or tool call into KDCube.

2. Gateway / Admission

Resolves session and applies rate, backpressure, circuit-breaker, and quota checks before atomic enqueue.

3. Semantic Runtime

Routes admitted requests into orchestration, streaming, and context lifecycle handling.

4. Execution Firewall

Applies pre-execution enforcement for budget/rate/tenant boundaries and supervisor-routed tool dispatch. A semantic per-action policy DSL remains on the roadmap.

5. Sandbox Execution

Subprocess-isolated executor in Docker ISO mode. Read-only root filesystem, no network access, no secret env passthrough. Writes only to workdir and outdir.

6. External Systems

DBs, APIs, LLMs. Reached only after Firewall allow.

Decision Log — Every allow/deny decision recorded: timestamp, agent, tenant, action. Runs alongside every layer. Self-hosted on your infrastructure.

Runtime components #

Each component addresses one business risk category: unsafe actions, overspend, cross-tenant leakage, or weak auditability.

Semantic Runtime #

Orchestration layer: request routing, streaming, and per-tenant context lifecycle.

Execution Firewall #

Works with Gateway / Admission to enforce session/rate/backpressure/circuit-breaker/economics checks before execution, then routes tool dispatch through the trusted supervisor boundary.

Sandbox Execution #

Subprocess-isolated executor in Docker ISO mode — read-only root FS, no network access, no secret env passthrough, non-root UID (1001). Workdir and outdir only.

Budget & Cost Controls #

Ledger-backed reservation and commit accounting across user/project/org scopes with hard caps enforced at admission — not post-hoc reporting.

Provenance & Citations #

Per-turn source pool with stable SIDs, citation-token linking in outputs, and artifact-level sources_used metadata for auditable claim tracing.

Audit Trail #

Timestamped log of every decision: agent, tenant, action, constraint, outcome. Self-hosted on your infrastructure.

Auth & Access Control #

Multi-provider auth: AWS Cognito (production) and delegated auth via proxy login service. Four role tiers — anonymous, registered, paid, privileged — drive rate limits and admission policy at runtime.

Economics controls #

Cost control is enforced before execution, not only observed in dashboards after incidents.

Layer 1: reserve then commit #

  • Budget is checked at admission time
  • Spend is finalized after execution
  • Unused reservation is released
  • Controls apply at user, project, and org scope
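The reserve-then-commit flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not KDCube's actual ledger: the `Ledger` class, the scope keys, and the integer cost units are all hypothetical.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised at admission when a reservation would break a hard cap."""


@dataclass
class Ledger:
    # Hypothetical hard caps and running totals in integer cost units,
    # keyed by scope name (for example "user:u1" or "org:o1").
    caps: dict[str, int]
    committed: dict[str, int] = field(default_factory=dict)
    reserved: dict[str, int] = field(default_factory=dict)

    def reserve(self, scopes: list[str], estimate: int) -> None:
        # Check every scope first so a failure leaves no partial reservation.
        for scope in scopes:
            used = self.committed.get(scope, 0) + self.reserved.get(scope, 0)
            if used + estimate > self.caps[scope]:
                raise BudgetExceeded(scope)
        for scope in scopes:
            self.reserved[scope] = self.reserved.get(scope, 0) + estimate

    def commit(self, scopes: list[str], estimate: int, actual: int) -> None:
        # Record the real spend and drop the whole reservation, which
        # releases whatever part of the estimate went unused.
        for scope in scopes:
            self.reserved[scope] -= estimate
            self.committed[scope] = self.committed.get(scope, 0) + actual
```

In this sketch a request that would exceed any scope's cap is rejected before it runs, and capacity freed by a commit becomes available to the next reservation.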

Layer 2: per-turn guardrails #

  • Caps loop depth and high-cost retries
  • Constrains tool attempts and token growth
  • Blocks pathological turns early
  • Produces explicit block events for audit
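One way to picture these per-turn guardrails is a small counter object consulted before each loop step. The sketch below is illustrative only: `TurnGuard`, its limits, and the event shape are assumptions, not KDCube's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TurnGuard:
    # Hypothetical limits; real names and values would come from configuration.
    max_iterations: int = 8
    max_tool_attempts: int = 3
    max_tokens: int = 20_000
    iterations: int = 0
    tokens: int = 0
    tool_attempts: dict[str, int] = field(default_factory=dict)
    block_events: list[dict[str, str]] = field(default_factory=list)

    def _block(self, reason: str) -> bool:
        # Record an explicit block event so the refusal can be audited.
        self.block_events.append({"type": "turn.blocked", "reason": reason})
        return False

    def allow_step(self, tool_id: str, tokens_used: int) -> bool:
        """Return True if the next loop step may run, else log a block event."""
        self.iterations += 1
        self.tokens += tokens_used
        self.tool_attempts[tool_id] = self.tool_attempts.get(tool_id, 0) + 1
        if self.iterations > self.max_iterations:
            return self._block("loop_depth")
        if self.tool_attempts[tool_id] > self.max_tool_attempts:
            return self._block(f"tool_attempts:{tool_id}")
        if self.tokens > self.max_tokens:
            return self._block("token_growth")
        return True
```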

Why this matters: enterprise teams need both monthly budget governance and turn-level abuse containment. See detailed accounting in docs economics.

Provenance chain as a runtime component #

KDCube keeps response lineage auditable for compliance and internal review.

  • Sources are normalized per turn
  • Stable source IDs support traceable citations
  • Artifacts retain sources_used linkage
  • Timeline enables source-to-claim reconstruction

Full provenance internals: docs streaming/events.

Feedback & Quality Signals #

KDCube captures structured quality signals on every turn — from direct user reactions and automated Gate Agent assessments — stored as first-class timeline artifacts.

User Feedback API #

  • Thumbs-up, thumbs-down, flag, and custom reaction labels
  • POST /conversations/{t}/{p}/{cid}/turns/{tid}/feedback
  • Add, replace, or clear reactions per turn
  • Stored as artifact:turn.log.reaction with full provenance
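As a rough illustration of calling the endpoint above, the sketch below builds (but does not send) a POST request with the standard library. The path template comes from this page; the base URL and the JSON body field `reaction` are assumptions, since the request schema is not given here.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_feedback_request(base_url: str, t: str, p: str, cid: str, tid: str,
                           reaction: str) -> Request:
    # Path template as listed on this page; each segment is URL-escaped.
    path = "/conversations/{}/{}/{}/turns/{}/feedback".format(
        *(quote(part, safe="") for part in (t, p, cid, tid))
    )
    # ASSUMPTION: the body schema is not documented here; "reaction" is a guess.
    body = json.dumps({"reaction": reaction}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object could be passed to `urllib.request.urlopen` once authentication headers, which this sketch omits, are added.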

Machine Feedback (Gate Agent) #

  • Automated quality assessment runs post-generation
  • Writes a reaction with origin: machine; no user action required
  • Reaction counts queryable by origin and type for analytics

Full feedback architecture: docs feedback system.

Bundles & Extensibility #

Bundles are the primary extensibility unit — hot-loadable workflow plugins that run inside the platform's execution boundary. Each bundle can include local tools, MCP servers, and skills.

Bundle system #

  • Hot-loadable workflow plugins (no platform restart needed)
  • Dynamic bundle UIs — bundles ship their own React SPA interface, authorization-guarded and embedded in the platform UI; each bundle can expose a fully custom frontend experience
  • Bundles deployed per-tenant or per-project via AGENTIC_BUNDLES_ROOT
  • Each bundle is a directory of Python + optional frontend assets

No tool-calling lock-in #

  • Bring your own tools or use the SDK — no forced framework
  • Tool IDs: <alias>.<name> local, mcp.<alias>.<name> for MCP-sourced
  • Memory tools: react.read, react.hide, react.memsearch, react.patch
  • Source pools: Perplexity-style citation traceability built in
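The two tool ID shapes listed above can be told apart with a small parser. This is a sketch that assumes aliases and names never contain dots; `ToolRef` and `parse_tool_id` are hypothetical helper names, not part of the SDK.

```python
from typing import NamedTuple


class ToolRef(NamedTuple):
    source: str  # "local" or "mcp"
    alias: str
    name: str


def parse_tool_id(tool_id: str) -> ToolRef:
    # ASSUMPTION: neither alias nor name contains a dot.
    parts = tool_id.split(".")
    if len(parts) == 3 and parts[0] == "mcp" and all(parts):
        return ToolRef("mcp", parts[1], parts[2])
    if len(parts) == 2 and all(parts):
        return ToolRef("local", parts[0], parts[1])
    raise ValueError(f"unrecognized tool id: {tool_id!r}")
```

For example, `parse_tool_id("react.read")` yields a local reference, while an ID such as `mcp.docs.search` (a made-up MCP alias) yields an MCP-sourced one.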

MCP Integration #

  • Connect any MCP server via stdio, http, sse, or streamable-http transports
  • Configure via MCP_TOOL_SPECS + MCP_SERVICES env vars
  • Auth: bearer, api_key, header — with named secrets support
  • Named secrets keep external credentials out of bundle config

Skills Subsystem #

  • Reusable agent behaviors defined in SKILL.md-compatible files
  • Namespaced as public, internal, or custom
  • Per-consumer filtering via agents_config wildcard patterns controls which skills appear in each agent's catalog
  • Composable across bundles; reusable across agents
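Wildcard-based skill filtering might look like the sketch below. It assumes skill IDs take the form `<namespace>.<name>` and that an agent's entry in agents_config reduces to a list of glob patterns; both are illustrative assumptions, and `visible_skills` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def visible_skills(all_skills: list[str], patterns: list[str]) -> list[str]:
    # Keep a skill if it matches at least one of the agent's patterns.
    # fnmatchcase gives case-sensitive, shell-style wildcard matching.
    return [
        skill for skill in all_skills
        if any(fnmatchcase(skill, pattern) for pattern in patterns)
    ]


# Hypothetical catalog and per-agent patterns.
catalog = ["public.summarize", "internal.triage", "custom.acme_report"]
print(visible_skills(catalog, ["public.*", "custom.acme_*"]))
```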

Rendering Tools #

  • Built-in rendering_tools family generates PDF, DOCX, PPTX, PNG, and HTML artifacts
  • Agents produce structured documents as first-class outputs
  • Artifacts integrate with the timeline and trigger downloadable-artifact UI events
  • No external rendering dependency required

Web Search & Fetch #

  • web_tools.web_search and web_tools.web_fetch with model-assisted content filtering
  • URL deduplication and result segmentation built in
  • Results injected into the per-conversation source pool for cross-turn provenance
  • Traceable citations link fetched content to agent outputs

Technical extension model: docs extensibility.

Autoscaling signals & observability #

Operational telemetry supports scaling decisions, incident response, and governance reviews.

Metrics service #

  • Dedicated metrics service aggregates SSE connection counts and REST latency
  • Latency percentiles: p50, p95, p99 per endpoint
  • Queue depth and pool utilization tracked in real time
  • Pre-built signals for autoscaling and ops dashboards
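For readers who want to reproduce the percentile figures from raw samples, a minimal sketch with Python's standard library follows. The function name and the choice of the inclusive interpolation method are assumptions; the metrics service may compute percentiles differently.

```python
from statistics import quantiles


def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    # Requires at least two samples.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```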

Latency & admission signals #

  • Queue pressure and admission telemetry expose runtime health
  • Latency trends support autoscaling thresholds
  • Circuit-breaker transitions show fail-fast/recovery state

Draining & processor health #

  • Ingress and processor capacity are tracked independently
  • Heartbeat and queue depth drive scaling actions
  • Rate-limit windows visible for control loops

Endpoint-level metrics references are documented in docs deployment.

Drop-in integration #

No rewrite required. Route existing agent logic through KDCube runtime controls.

REST API #

Synchronous request/response integration path.

✅ Available

SSE #

Token streaming over standard HTTP.

✅ Available

Socket.IO #

Full-duplex real-time integration path.

✅ Available

Self-hosted: Agent traffic stays on your infrastructure. You control which LLMs, databases, and APIs agents can reach.

Endpoint note: Runtime ingress endpoints are transport-scoped (/sse/chat, /sse/stream, /socket.io/chat). If you expose additional app-specific routes, document them separately in your integration layer.

Protocol and endpoint specifics are maintained in docs frontend integration.

Not a framework replacement. LangGraph, LangChain, CrewAI, AutoGen, or custom Python — route through KDCube without modification.

Execution Runtime #

Agent code runs in an isolated executor. Privileged operations are mediated through a supervisor boundary before execution continues.

ReAct v2: timeline-native decision loop #

Every turn event is captured as a typed block on a conversation timeline — the single source of truth for memory, artifacts, and future reasoning. Enables reproducible, auditable, cache-efficient execution.

  • Single-agent decision loop grounded in the ordered event timeline
  • All blocks typed: messages, tool calls, artifacts, errors, usage
  • Timeline supports state recovery, replayability, and cache reuse

Multi-channel streaming #

A tag-based protocol splits a single LLM stream into named logical channels, all within one LLM call.

  • Channels: thinking, answer, follow-up suggestions, canvas (live JSON), usage sidecar
  • No multi-call overhead — one stream, multiple named outputs
  • Canvas channel delivers live structured JSON for UI rendering
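To make the idea of tag-based channel routing concrete, here is a small demultiplexer sketch. The tag syntax `<channel:NAME>...</channel:NAME>` is invented for illustration; the actual wire format is not described on this page.

```python
import re
from collections import defaultdict
from typing import Optional

# ASSUMPTION: hypothetical tag syntax, not the real protocol.
TAG = re.compile(r"</?channel:([a-z_]+)>")


class ChannelDemux:
    def __init__(self) -> None:
        self.buffer = ""
        self.current: Optional[str] = None
        self.outputs: dict[str, str] = defaultdict(str)

    def _emit(self, text: str) -> None:
        # Text outside any channel is dropped in this sketch.
        if self.current is not None and text:
            self.outputs[self.current] += text

    def feed(self, chunk: str) -> None:
        self.buffer += chunk
        while True:
            match = TAG.search(self.buffer)
            if match is None:
                break
            self._emit(self.buffer[: match.start()])
            is_close = match.group(0).startswith("</")
            self.current = None if is_close else match.group(1)
            self.buffer = self.buffer[match.end():]
        # Hold back a trailing "<..." that may be the start of a split tag.
        cut = self.buffer.rfind("<")
        safe = self.buffer if cut == -1 else self.buffer[:cut]
        self._emit(safe)
        self.buffer = self.buffer[len(safe):]
```

Feeding chunks as they arrive fills `outputs` per channel, so a UI could render the answer channel while a thinking channel accumulates separately. A production parser would also need an end-of-stream flush and handling for literal "<" characters in content.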

Per-turn protection: additional turn-level controls can cap loop depth and reduce pathological retries. Full runtime mechanics are documented in docs execution model.

[Diagram: an agent request (tool call or code execution) enters the Executor, an isolated container. Tool calls pass through a socket proxy to the Supervisor, the trusted tool layer, which returns the result. Output becomes an artifact: a timeline entry or downloadable file.]

Executor (isolated) #

  • Runs agent-generated code
  • No network, no env vars
  • Filesystem: workdir only
  • Cannot call tools directly

Supervisor (trusted tools) #

  • All tool dispatch routes here
  • Socket-proxy interception, policy enforced
  • Internet access limited by policy
  • Returns string, bytes, or structured output

Guarantee: The executor never calls tools directly. Every tool call is intercepted and proxied through the supervisor.
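The guarantee above amounts to a single choke point for tool dispatch. The sketch below models that choke point in-process; it does not model the socket proxy or container boundary, and the allow-list policy, class names, and log shape are all assumptions.

```python
from typing import Any, Callable


class ToolDenied(Exception):
    """Raised when policy rejects a tool call before it runs."""


class Supervisor:
    def __init__(self, tools: dict[str, Callable[..., Any]], allowed: set[str]) -> None:
        self.tools = tools
        self.allowed = allowed  # hypothetical policy: a plain allow-list
        self.decision_log: list[dict[str, str]] = []

    def dispatch(self, tool_id: str, **kwargs: Any) -> Any:
        # Decide and record before anything executes.
        permitted = tool_id in self.allowed and tool_id in self.tools
        decision = "allow" if permitted else "deny"
        self.decision_log.append({"tool": tool_id, "decision": decision})
        if not permitted:
            raise ToolDenied(tool_id)
        return self.tools[tool_id](**kwargs)
```

Because the decision is logged before the tool runs, denied calls leave an audit record even though no tool code executes.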

Context Lifecycle & Timeline #

Ordered event timeline with shifting cache points, flagged-not-deleted history, and on-demand artifact re-fetch — keeps context windows lean at scale.

[Diagram: a timeline of events 1 through N, each a message or tool event. A cache point marks the preserved context window and shifts forward as the conversation grows. A hidden item is flagged, not deleted: it is replaced with a summary and is re-fetchable on demand. Legend: event, cache point, flagged (hidden), latest event.]

Timeline #

Ordered event sequence. Turn snapshots versioned per turn; state is recoverable and replayable.

Cache points #

Shift forward as conversation grows; preserve context efficiently.

Hidden items #

Flagged, not deleted. Replaced with summary text; re-fetchable on demand.
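The flagged-not-deleted behavior can be sketched with a tiny in-memory timeline. `Event`, `Timeline`, and their methods are hypothetical names chosen for illustration; cache points and persistence are left out.

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_id: str
    content: str
    hidden: bool = False
    summary: str = ""


class Timeline:
    def __init__(self) -> None:
        self.events: list[Event] = []

    def append(self, event_id: str, content: str) -> None:
        self.events.append(Event(event_id, content))

    def hide(self, event_id: str, summary: str) -> None:
        # Flag the event instead of deleting it; the content is retained.
        event = self._get(event_id)
        event.hidden, event.summary = True, summary

    def context(self) -> list[str]:
        # Hidden events contribute only their summary to the active context.
        return [e.summary if e.hidden else e.content for e in self.events]

    def fetch(self, event_id: str) -> str:
        # Re-fetch the original content on demand, hidden or not.
        return self._get(event_id).content

    def _get(self, event_id: str) -> Event:
        return next(e for e in self.events if e.event_id == event_id)
```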

Lazy fetching: Fetch artifacts by name on demand, or pre-pull via scan. Both keep the active context window lean.

Execution Outputs #

Two reporting modes. Both produce canonical artifact shapes that integrate with the timeline.

Contract-based #

  • Declare expected outputs upfront (path, MIME, description)
  • Output verified against contract
  • Missing or mismatched outputs are errors
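A contract check along these lines might be sketched as follows. The `ExpectedOutput` fields mirror the path, MIME, and description named above; the `produced` mapping of path to detected MIME type and the error strings are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedOutput:
    path: str
    mime: str
    description: str


def verify_outputs(contract: list[ExpectedOutput],
                   produced: dict[str, str]) -> list[str]:
    """Return contract violations; an empty list means the contract holds."""
    errors = []
    for expected in contract:
        actual_mime = produced.get(expected.path)
        if actual_mime is None:
            errors.append(f"missing: {expected.path}")
        elif actual_mime != expected.mime:
            errors.append(
                f"mismatch: {expected.path} is {actual_mime}, expected {expected.mime}"
            )
    return errors
```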

Side-effect-based #

  • All created/updated files reported automatically
  • Text → string; binary → base64
  • Runtime vs. program errors reported separately
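The text-versus-binary rule above can be illustrated with a short helper. Treating "decodes as UTF-8" as the test for text is an assumption, as are the function name and the returned field names.

```python
import base64


def encode_payload(data: bytes) -> dict[str, str]:
    # ASSUMPTION: a file counts as text if it decodes cleanly as UTF-8.
    try:
        return {"encoding": "text", "payload": data.decode("utf-8")}
    except UnicodeDecodeError:
        # Binary content is carried as base64 so it stays JSON-safe.
        return {"encoding": "base64", "payload": base64.b64encode(data).decode("ascii")}
```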

Artifact shape #

Every artifact: name, MIME, size, description, readable payload. Integrates with the timeline; triggers downloadable-artifact UI events.

Runtime Internals #

Tool interception, error separation, and artifact reporting — the three enforcement mechanics.

Tool interception & proxying #

All tool calls from agent code are intercepted via socket proxy. The Executor never calls tools directly; the Supervisor always enforces policy before forwarding a call.

Error separation #

  • Runtime errors — crash, timeout, resource exhaustion
  • Program errors — exceptions in agent-generated code with traceback

Both surfaced in structured format; agents distinguish infrastructure from code-level failures.

Artifact reporting #

Canonical shape: name, MIME, size, description, readable payload. Integrates with timeline; triggers downloadable-artifact UI events.

Isolation scope: The Executor has zero outbound connections. Network tool calls are permitted only through the Supervisor, and only when explicitly allowed by policy. The Supervisor operates with configurable permissions.

Build AI that doesn't break trust

Deploy runtime controls in under an hour. Review the code, run it in your environment, and evaluate the enforcement layer directly.

MIT Licensed · Self-Hosted · Open Source