KDCube Platform Documentation
Self-hosted, multi-tenant AI platform and SDK. Architecture, configuration, deployment, and SDK — from system to bundle.
Python 3.11+ · SSE Streaming · ReAct Agent · Multi-Tenant · Docker / ECS · OpenAI · Anthropic · Gemini
Concepts
KDCube AI App is a self-hosted, multi-tenant platform and SDK for building AI assistants and agentic applications. The platform has two core services: Ingress (API gateway, auth, SSE emitter) and Processor/Proc (bundle execution, queue worker, integrations API).
System at a Glance
Key Terms
| Term | Description |
|---|---|
| Bundle | Python package registered with @agentic_workflow. The unit of AI application deployment — contains agent logic, tools, skills, and UI widgets. |
| Ingress | API gateway service: auth, SSE streaming, task enqueueing, rate limiting. Handles all inbound traffic before the bundle sees it. |
| Processor / Proc | Queue worker: executes bundles, hosts the Operations REST API. Loads bundle singletons and calls execute_core() per turn. |
| Timeline | Rolling cross-turn context for a conversation, persisted as artifact:conv.timeline.v1. Ordered blocks (oldest → newest) streamed as SSE events. Grows across turns and is compacted when the context budget is reached. The turn log is the portion appended during each turn. |
| ReAct Agent | Autonomous loop (ReactStateV2, max 6 iterations). Plan is a tool — not a separate component. No coordinator needed. |
| Tenant / Project | Multi-tenancy units. All data, config, bundles, and budgets are scoped to tenant + project. Each tenant gets an isolated schema in PostgreSQL. |
Supported Providers
Quick Start
1. Install the KDCube CLI

   # Recommended: pipx (isolated installation)
   pipx install kdcube-cli
   # Or with pip
   pip install kdcube-cli

   Prerequisites: Python 3.11+, Docker, Git.

2. Run the setup wizard

   kdcube-setup

   The wizard is fully interactive: you can enter everything manually, or point it at deployment descriptors (assembly.yaml, gateway.yaml, bundles.yaml) and it applies them automatically. Choose release-latest, then answer yes to "Run docker compose now?"; no local build is needed. The wizard creates ~/.kdcube/kdcube-runtime/, pulls images, and starts Docker Compose. Secrets are injected into the kdcube-secrets service at setup time and are never written to disk.

3. Open the UI

   http://localhost:${KDCUBE_UI_PORT}/chatbot/chat

   You're now running a full KDCube stack with the built-in example bundles.
Setup Flow
Workdir Layout
~/.kdcube/kdcube-runtime/
├─ config/
│ ├─ .env # Base Docker Compose env
│ ├─ assembly.yaml # Platform version, auth type, domain
│ ├─ bundles.yaml # Bundle definitions ← edit this
│ ├─ gateway.yaml # Rate limits, circuit breaker
│ ├─ install-meta.json # CLI install metadata
│ ├─ frontend.config.*.json # UI auth config
│ ├─ nginx_proxy.conf # Nginx reverse proxy
│ └─ nginx_ui.conf # Nginx UI config
├─ data/
│ ├─ bundle-storage/ # Per-bundle persistent storage
│ ├─ bundles/ # Bundle source cache
│ ├─ exec-workspace/ # Code execution sandboxes
│ ├─ kdcube-storage/ # Conversations, artifacts, files
│ ├─ nginx/ # Nginx runtime data
│ ├─ postgres/ # Database volume
│ └─ redis/ # Redis persistence
└─ logs/ # Service logs
Secrets are never stored on disk — they are injected into kdcube-secrets at setup time and resolved in-memory at runtime.
Full CLI reference: kdcube_cli/README.md
Platform
Ingress (chat-ingress)
Handles all inbound traffic. Authenticates users, enforces rate limits, validates the bundle, enqueues the task, and opens an SSE stream back to the client. Your bundle rarely needs to know about this — the platform wires it up automatically.
Processor (chat-proc)
Dequeues tasks, loads your bundle singleton, and calls execute_core(). Also hosts the Operations API — the REST endpoint that UI widgets call directly (no SSE needed for widget interactions).
SSE Streaming Flow
Modules
Platform Architecture — Detail
Services
| Service | Port | Role | Required? |
|---|---|---|---|
| web-proxy | :443 / :80 | TLS termination, token unmasking, routing | Required |
| chat-ingress | :8010 | Auth, SSE/Socket.IO gateway, task enqueueing | Required |
| chat-proc | :8020 | Bundle execution, integrations REST API | Required |
| web-ui | :80 | SPA frontend | Required |
| kdcube-secrets | internal | In-memory secrets service (secrets never written to disk) | Required (local) |
| metrics | internal | Autoscaling metric export (CloudWatch) — not needed for single-node | Optional |
| proxylogin | internal | Delegated auth token exchange | Optional |
| clamav | internal | Antivirus scanning for file attachments | Optional |
| exec (on-demand) | — | Ephemeral Docker/Fargate container for isolated code execution | Optional |
Routing
| Path Pattern | Routes To |
|---|---|
| /sse/*, /api/chat/*, /admin/* | chat-ingress |
| /api/integrations/* | chat-proc |
| /auth/* | proxylogin (delegated auth only) |
| /* | web-ui |
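The routing table can be read as a first-match prefix lookup. The sketch below is only an illustration of that reading: `ROUTES` and `route_for` are hypothetical names, and the real proxy configuration may order or match its rules differently.

```python
from fnmatch import fnmatchcase

# Ordered (pattern, service) pairs copied from the routing table.
# The catch-all "/*" stays last so that the specific prefixes win.
ROUTES = [
    ("/sse/*", "chat-ingress"),
    ("/api/chat/*", "chat-ingress"),
    ("/admin/*", "chat-ingress"),
    ("/api/integrations/*", "chat-proc"),
    ("/auth/*", "proxylogin"),
    ("/*", "web-ui"),
]


def route_for(path: str) -> str:
    """Return the service named by the first pattern that matches the path."""
    for pattern, service in ROUTES:
        if fnmatchcase(path, pattern):
            return service
    raise ValueError(f"no route for {path!r}")
```

Keeping the catch-all entry last is the important design point: with first-match semantics, moving it up would shadow every other rule.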
Communication Subsystem
The platform communication subsystem is a three-layer pipeline that delivers async events from bundles to connected clients. It is transport-agnostic — the same bundle code streams to SSE, Socket.IO, or REST without changes.
Filter / Firewall
Each event can be intercepted and filtered before delivery. Bundles can define event filters to transform, block, or enrich events. The bundle firewall enforces per-tenant policies on what events flow to clients.
Broadcast / P2P Channels
The relay supports session-scoped pub/sub channels. Bundles can broadcast to all subscribers of a room, or send point-to-point to a specific target_sid. The same mechanism powers streaming from proc to multiple ingress instances.
Recorder
Events flowing through the pipeline can be recorded — useful for audit trails, replay, and observability. The recorder captures the envelope payload including session_id, event, and timestamp.
ChatCommunicator — Bundle Producer API
Every bundle workflow receives a ChatCommunicator instance. It wraps Redis Pub/Sub with a typed API for all outbound async events. The same instance works whether the bundle runs in-process or inside an isolated execution runtime (Docker / Fargate sandbox) — the iso executor routes events to the supervisor over a Unix socket, which then publishes to Redis. The client SSE stream is seamless in both cases.
# Standard streaming lifecycle
await communicator.start() # chat_start event
await communicator.step("Searching…") # chat_step — visible progress
await communicator.delta("answer", chunk) # chat_delta — streaming text
await communicator.complete() # chat_complete
# Delta markers: answer · thinking · canvas · timeline_text · subsystem
await communicator.delta("thinking", reasoning_chunk)
await communicator.delta("canvas", json_payload)
# Custom service event — broadcast to all session subscribers
await communicator.event(
    event_type="chat.service",
    data={"key": "status.update", "value": "processing"},
)
# P2P: deliver only to a specific connected client (target_sid)
await communicator.event(
    event_type="chat.service",
    data={"key": "private", "value": result},
    target_sid=sid,  # omit for broadcast
)
# Error
await communicator.error("Something went wrong")
Outbound Firewall (per bundle)
Bundles can attach an IEventFilter to the workflow to suppress or gate events before they reach Redis. The filter sees the caller's user_type, user_id, and full event metadata. Filters are fail-open: an exception allows the event through.
from kdcube_ai_app.apps.chat.sdk.comm import IEventFilter

class MyBundleFilter(IEventFilter):
    def allow_event(self, user_type, user_id, event) -> bool:
        # hide internal step events from non-privileged users
        if user_type != "privileged":
            if event.get("type") == "chat.step" and event.get("broadcast"):
                return False
        return True

# Wire at entrypoint: passed to the workflow factory via the event_filter param
workflow = MyWorkflow(..., event_filter=MyBundleFilter())
ChatCommunicator is fully available inside sandboxed execution (Docker / Fargate). Events published from the sandbox travel: exec process → Unix socket → supervisor → Redis Pub/Sub → SSEHub → client. No code change is needed between in-process and iso modes.
See comm-system.md and README-comm.md for full detail.
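The fail-open rule for outbound filters can be pictured as a small wrapper around the filter call. This is a sketch of the behaviour described above, not the platform's code: `is_allowed` and the two sample filters are hypothetical, and plain functions stand in for `IEventFilter.allow_event`.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]
FilterFn = Callable[[str, str, Event], bool]


def is_allowed(filter_fn: FilterFn, user_type: str, user_id: str, event: Event) -> bool:
    """Apply a filter with fail-open semantics: an exception lets the event through."""
    try:
        return bool(filter_fn(user_type, user_id, event))
    except Exception:
        return True


def hide_broadcast_steps(user_type: str, user_id: str, event: Event) -> bool:
    # Same rule as the MyBundleFilter example: non-privileged users
    # do not see broadcast step events.
    if user_type != "privileged":
        if event.get("type") == "chat.step" and event.get("broadcast"):
            return False
    return True


def broken_filter(user_type: str, user_id: str, event: Event) -> bool:
    # Simulates a bug inside a bundle's filter.
    raise RuntimeError("bug in filter")
```

Fail-open keeps a buggy filter from silencing the whole stream, at the cost that a crashing filter no longer hides anything. A filter that guards sensitive data should therefore avoid raising.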
Configuration
Config Hierarchy
assembly.yaml
release_name: "prod-2026-03-21"
platform:
  repo: "kdcube-ai-app"
  ref: "v0.3.2"
auth:
  type: "delegated"  # simple | cognito | delegated
domain: "chat.example.com"
bundles.yaml
bundles:
  version: "1"
  default_bundle_id: "react@2026-02-10-02-44"
  items:
    - id: "react@2026-02-10-02-44"
      repo: "git@github.com:org/repo.git"
      ref: "bundle-v2026.03.21"
      module: "react@2026-02-10-02-44.entrypoint"
      config:
        model_id: "gpt-4o-mini"
        role_models:
          solver.react.v2.decision.v2.strong:
            provider: "anthropic"
            model: "claude-sonnet-4-6"
Platform-Reserved Config Keys
| Key | Description |
|---|---|
| role_models | Override LLM per role. Maps logical agent roles to concrete provider + model combinations. |
| embedding | Embedding provider + model for RAG and vector search in knowledge spaces. |
| economics.reservation_amount_dollars | Cost reserved before execution begins. Default 2.0 USD. Committed on completion, released on failure. |
| execution.runtime | Code exec backend: docker (default, fast) or fargate (batch/async workloads only). |
| mcp.services | MCP connector config per bundle. Matched by server_id in MCP_TOOL_SPECS. |
Deployment Options
ECS Topology
Platform Economics
The economics subsystem enforces rate limits, quotas, and funding lanes per user per request. Cost is reserved before execution and committed on success, released on failure.
Economics Flow
Key Concepts
| Concept | Description |
|---|---|
| Plan | Quota policy identity — defines requests/second, concurrency limits, and token quotas per class of users (anonymous, registered, paid, privileged). |
| Funding Lane | Subscription budget (monthly USD) · wallet credits · project budget. Determines which funding source covers the request. |
| Reservation | Cost reserved before execution begins. Committed on completion, automatically released on failure or EconomicsLimitException. |
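The reserve, commit, and release lifecycle can be sketched with a small in-memory budget. Everything here is hypothetical (`Budget`, `run_with_reservation`, and the rule that the unused part of a reservation returns to the balance on commit); the platform's economics subsystem may account for costs differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Budget:
    available: float
    reserved: float = 0.0
    spent: float = 0.0

    def reserve(self, amount: float) -> None:
        if amount > self.available:
            raise RuntimeError("insufficient budget")
        self.available -= amount
        self.reserved += amount

    def commit(self, reserved: float, actual: float) -> None:
        # Assumption: only the actual cost is charged; the rest is returned.
        self.reserved -= reserved
        self.spent += actual
        self.available += reserved - actual

    def release(self, reserved: float) -> None:
        self.reserved -= reserved
        self.available += reserved


def run_with_reservation(budget: Budget, amount: float, work: Callable[[], float]) -> float:
    """Reserve before running; commit on success, release on failure."""
    budget.reserve(amount)
    try:
        actual_cost = work()
    except Exception:
        budget.release(amount)
        raise
    budget.commit(amount, actual_cost)
    return actual_cost
```

Reserving first means a request that cannot be funded is rejected before any model call is made, which matches the pre-execution stance described in this section.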
Rate Limit Layers
1. Gateway layer
   Requests/second throttling at Ingress. Configurable per tenant via gateway.yaml. Enforces backpressure before tasks are enqueued.

2. Economics layer
   Per-user quotas (daily/monthly requests, concurrent sessions, hourly token budgets). Enforced by the economics subsystem at execution time.

Use BaseEntrypointWithEconomics for automatic enforcement. Always re-raise EconomicsLimitException unchanged.
Security & Governance
For the full security architecture, threat model, and compliance mapping, see Security & Governance.
Auth Flow
Security Features
| Feature | Description |
|---|---|
| Content Filtering | ClamAV scans file uploads. event_filter.py in your bundle controls which tool events are visible to clients — prevent sensitive tool results from leaking. |
| Bundle Firewall | Per-bundle tool access rules. Restricts what tools a bundle can call at the platform level — independent of bundle code. |
| Gateway Controls | Per-endpoint throttling (guarded_rest_patterns), bypass patterns, backpressure, and circuit breakers. Configured in gateway.yaml. |
| Network Isolation | All services are private in the VPC. Only web-proxy is ALB-facing. TLS terminated at ALB (ECS) or at the proxy (Compose). |
Enforcement Model
KDCube's runtime enforcement is pre-execution. Budget, rate, and tenant checks happen before the LLM call is charged and before any external system receives a request. Every agent request passes through a fixed 6-step gate sequence:
- Session Resolution — authenticate and resolve user/session context.
- Rate Limit — enforce per-session and per-user-type limits.
- Backpressure — evaluate queue/capacity admission thresholds.
- Circuit Breaker — fail-fast on unhealthy system states.
- Economics / Quota — budget reservation and commit gates before execution.
- Atomic Enqueue — admit or reject at the enqueue boundary with an auditable decision event.
This model differs fundamentally from output filtering (which operates after the LLM responds). Budget, rate, and isolation controls prevent wrong actions from executing — not just detect them after the fact. See Security → Enforcement Model for the full sequence diagram and compliance mapping.
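The gate sequence can be sketched as an ordered list of checks where the first failure rejects the request. The gate names follow the list above; the predicates, thresholds, and request fields are invented for illustration and do not reflect the platform's actual limits.

```python
from typing import Callable, List, Optional, Tuple

Gate = Tuple[str, Callable[[dict], bool]]

# Hypothetical checks; each returns True when the request may proceed.
GATES: List[Gate] = [
    ("session_resolution", lambda r: r.get("user_id") is not None),
    ("rate_limit", lambda r: r.get("requests_this_window", 0) < 10),
    ("backpressure", lambda r: r.get("queue_depth", 0) < 100),
    ("circuit_breaker", lambda r: not r.get("circuit_open", False)),
    ("economics", lambda r: r.get("budget_usd", 0.0) >= 2.0),
    ("atomic_enqueue", lambda r: True),
]


def admit(request: dict, gates: List[Gate] = GATES) -> Optional[str]:
    """Run gates in order; return the first failing gate's name, or None if admitted."""
    for name, check in gates:
        if not check(request):
            return name
    return None
```

Because evaluation stops at the first failing gate, cheap checks such as session resolution shield the more expensive ones, and a rejected request never reaches the enqueue step.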
What Is a Bundle?
A Bundle is your application package. It combines Python backend code with optional TypeScript UI widgets and views into a single deployable unit. The package becomes a bundle by subclassing the platform's base class (BaseEntrypoint or a derived variant) and registering with @agentic_workflow.
App sources live in git. The platform's assembly descriptor points to a git repo, ref, and subdirectory. On deploy, the CLI or CI pipeline pulls the ref and registers the bundle. You can update a running deployment by pushing a new ref and re-running the deploy step — no service restart required.
Most example bundles use the ReAct agent (ReactStateV2), but a bundle does not have to. You can use any LangGraph graph, a simple request→response pipeline, or no LLM at all.
Example Bundles
Start from any of the examples in sdk/examples/bundles/. The simplest starting point is react@2026-02-10-02-44 — a minimal ReAct agent with no custom tools.
| Bundle | Description |
|---|---|
| react@2026-02-10-02-44 | Simplest ReAct agent bundle. Recommended starting point: minimal boilerplate, no custom tools. |
| react.doc@2026-03-02-22-10 | ReAct agent over the KDCube knowledge base. Answers questions about the platform and can guide you through building your own bundle. Uses custom bundle-local tools and MCP connectors. |
| react.mcp@2026-03-09 | ReAct agent demonstrating MCP tool configuration: how to connect external tool servers via stdio, HTTP, and SSE transports. |
| eco@2026-02-18-15-06 | Shows how to add economic rate limits to a bundle: per-user quotas, reservation amounts, and BaseEntrypointWithEconomics. |
| openrouter-data@2026-03-11 | Single-turn data-processing pipeline (extraction, classification, tagging, summarization) via OpenRouter. No ReAct agent; demonstrates a simple request → LLM completion → response pattern. |
| with-isoruntime@2026-02-16-14-00 | No AI agent. Demonstrates the exec tool and isolated runtime: how to run user-supplied Python code safely in a sandboxed subprocess (Docker or Fargate). |
Platform Architecture for Bundle Developers
See also: Platform Overview
KDCube is a multi-tenant AI application platform. It provides the infrastructure for deploying conversational AI agents at scale — with streaming, storage, economics, tooling, and extensibility all built in.
A Bundle is your unit of deployment. It's a Python package that plugs into the platform and defines how your AI agent thinks, what tools it uses, how it stores data, what UI widgets it exposes, and how it bills users.
Big Picture: How It All Fits Together
Two Services, One Platform
Ingress (chat-ingress)
Handles all inbound traffic. Authenticates users, enforces rate limits, validates the bundle, enqueues the task, and opens an SSE stream back to the client. Your bundle rarely needs to know about this — the platform wires it up automatically.
Processor (chat-proc)
Dequeues tasks, loads your bundle singleton, and calls execute_core(). It also hosts the Operations API — the REST endpoint that your UI widgets call directly (no SSE needed for widget interactions).
Bundle Anatomy
A bundle is a Python package with a required entrypoint.py and optional supporting files:
Minimal Entrypoint
from kdcube_ai_app.infra.plugin.agentic_loader import agentic_workflow
from kdcube_ai_app.apps.chat.sdk.solutions.chatbot.entrypoint import BaseEntrypoint
from langgraph.graph import StateGraph, START, END
from typing import Dict, Any

BUNDLE_ID = "my-bundle"

@agentic_workflow(bundle_id=BUNDLE_ID)
class MyBundle(BaseEntrypoint):
    @property
    def configuration(self) -> Dict[str, Any]:
        return {
            "role_models": {
                "solver.react.v2.decision.v2.strong": {
                    "provider": "anthropic",
                    "model": "claude-sonnet-4-6"
                }
            }
        }

    def on_bundle_load(self, **kwargs):
        # Called ONCE per process when bundle first loads.
        self.logger.info("Bundle loaded!")

    async def execute_core(self, state, thread_id, params):
        graph = StateGraph(dict)
        graph.add_node("run", self._run_node)
        graph.add_edge(START, "run")
        graph.add_edge("run", END)
        return await graph.compile().ainvoke(state)

    async def _run_node(self, state):
        await self._comm.delta(text="Hello from my bundle!", index=0, marker="answer")
        return {"final_answer": "Done", "followups": []}
BaseEntrypoint Lifecycle
| Method / Property | When Called | Typical Use |
|---|---|---|
| configuration (property) | On bundle load | Declare defaults: role_models, embedding, knowledge, economics, execution |
| on_bundle_load(**kwargs) | Once per process, per tenant/project | Build knowledge index, connect external services, warm caches |
| pre_run_hook(state) | Before each turn | Re-reconcile knowledge space if config changed |
| execute_core(state, thread_id, params) | Every turn | Build + invoke your LangGraph workflow |
| rebind_request_context(...) | On cached singleton reuse | Refresh request-bound state (comm, user, etc.) |
Interface: In & Out
📥 Inbound: ChatTaskPayload
Every turn arrives as a ChatTaskPayload (Pydantic). Key fields:
- request.message: user's text
- request.chat_history: prior messages
- request.payload: arbitrary JSON (for REST ops)
- actor: tenant_id, project_id
- routing: conversation_id, turn_id, bundle_id
- user: user_id, user_type, roles, timezone
- continuation: follow-up or steer type
📤 Outbound: SSE Streaming
Your bundle streams back via the Communicator. The client receives ChatEnvelope events in real time.
- delta — streaming text chunks (thinking / answer)
- step — tool calls, status updates, timeline events
- complete — turn finished with final data
- error — propagate errors cleanly
- event — custom events (artifacts, reactions, etc.)
Using the Communicator
# Stream answer text
await self._comm.delta(text="chunk...", index=0, marker="answer")
# Announce a step
await self._comm.step(step="web_search", status="started", title="Searching the web...")
await self._comm.step(step="web_search", status="completed")
# Emit follow-up suggestions
await self._comm.followups(["Tell me more", "Show examples"])
# Final complete signal
await self._comm.complete(data={"answer": "..."})
REST Operations (for UI Widgets)
Your bundle exposes additional REST operations via the Operations API hosted by the Processor:
POST /bundles/{tenant}/{project}/operations/{operation}
The platform calls your_workflow.{operation}(**request_body). Define operations as async methods in your workflow or entrypoint. This is how UI widgets communicate back to your bundle (see Widgets / UI).
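The call `your_workflow.{operation}(**request_body)` amounts to a name-based dispatch onto async methods. A minimal sketch of that idea follows; `dispatch_operation`, `DemoWorkflow`, and the guard against underscore-prefixed names are assumptions made for the example, not the Processor's actual implementation.

```python
import asyncio
import inspect
from typing import Any, Dict


class DemoWorkflow:
    """Hypothetical workflow exposing one operation for a UI widget."""

    async def save_preferences(self, theme: str = "light") -> Dict[str, Any]:
        return {"ok": True, "theme": theme}


async def dispatch_operation(workflow: Any, operation: str, request_body: Dict[str, Any]) -> Any:
    """Look up an async method by name and call it with the request body as kwargs."""
    handler = getattr(workflow, operation, None)
    # Assumption: private names and non-coroutine attributes are not callable operations.
    if operation.startswith("_") or not inspect.iscoroutinefunction(handler):
        raise LookupError(f"unknown operation: {operation}")
    return await handler(**request_body)
```

A widget posting `{"theme": "dark"}` to an operation named `save_preferences` would end up as `await workflow.save_preferences(theme="dark")`, so the JSON keys of the request body must match the method's parameter names.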
Continuation Types
| Type | Description |
|---|---|
| regular | Normal new message |
| followup | User clicked a suggested follow-up |
| steer | User is redirecting the ongoing turn |
Storage
KDCube is a distributed, multi-tenant system. Three storage tiers are available to your bundle:
Cloud Storage (AIBundleStorage)
from kdcube_ai_app.apps.chat.sdk.storage.ai_bundle_storage import AIBundleStorage
storage = AIBundleStorage(
    tenant="my-tenant", project="my-project",
    ai_bundle_id="my-bundle@1-0",
    storage_uri="s3://my-bucket"  # or file:///data/bundle-storage
)
storage.write("reports/latest.json", data='{"count": 42}')
content = storage.read("reports/latest.json", as_text=True)
keys = storage.list("reports/")
Local FS (Shared Bundle Storage)
# In entrypoint or workflow
root = self.bundle_storage_root() # pathlib.Path
index_path = root / "knowledge_index"
index_path.mkdir(exist_ok=True)
Path namespaced: {BUNDLE_STORAGE_ROOT}/{tenant}/{project}/{bundle_id}/. Ideal for knowledge indexes, model weights, large caches. In production, mount an EFS volume.
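The namespacing scheme can be illustrated by building the path by hand. This is a sketch only: `bundle_storage_path` is a hypothetical helper, and the segment validation is an added assumption. In bundle code, `self.bundle_storage_root()` already returns the resolved directory.

```python
from pathlib import Path


def bundle_storage_path(root: str, tenant: str, project: str, bundle_id: str) -> Path:
    """Join the namespace segments under the shared storage root."""
    parts = (tenant, project, bundle_id)
    for part in parts:
        # Reject empty segments and anything that could escape the namespace.
        if not part or part in (".", "..") or "/" in part or "\\" in part:
            raise ValueError(f"invalid path segment: {part!r}")
    return Path(root).joinpath(*parts)
```

Because tenant, project, and bundle ID each form one directory level, two bundles with the same ID in different projects never share files.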
Redis Cache
# Low-level Redis client (aioredis)
await self.redis.set("my:key", "value", ex=3600)
val = await self.redis.get("my:key")
# KVCache wrapper
await self.kv_cache.set("user_prefs", {"theme": "dark"}, ttl=86400)
Configuration & Secrets
bundles.yaml — Your Bundle Definition
bundles:
  version: "1"
  default_bundle_id: "my-bundle@1-0"
  items:
    - id: "my-bundle@1-0"
      name: "My Bundle"
      repo: "git@github.com:org/my-bundle-repo.git"  # optional git source
      ref: "v1.0.0"
      subdir: "src/my_bundle"
      module: "my-bundle@1-0.entrypoint"
      config:
        embedding:
          provider: "openai"
          model: "text-embedding-3-small"
        role_models:
          solver.react.v2.decision.v2.strong:
            provider: "anthropic"
            model: "claude-sonnet-4-6"
        economics:
          reservation_amount_dollars: 2.0
        execution:
          runtime:
            mode: "docker"  # docker is default; see Exec Runtime section
            enabled: true
Bundle Secrets — Managed by kdcube-secrets (Never on Disk)
Bundle secrets are provisioned via the Admin UI or injected into kdcube-secrets at setup time. They are resolved in-memory at runtime and are never written to disk. The bundles.secrets.yaml format below is only used as a transient CLI input during initial provisioning — it is not stored in the workdir.
bundles:
  version: "1"
  items:
    - id: "my-bundle@1-0"
      secrets:
        openai:
          api_key: null  # null = resolve from env OPENAI_API_KEY
        my_service:
          api_key: "sk-live-..."  # inline value: consumed by CLI, not stored
          webhook_url: "env:MY_WEBHOOK_URL"  # or env: reference
Reading Config & Secrets in Code
value = self.bundle_prop("some.nested.key") # dot-path navigation
all_props = self.bundle_props # full merged dict
from kdcube_ai_app.apps.chat.sdk.config import get_secret
api_key = get_secret("bundles.my-bundle@1-0.secrets.my_service.api_key")
Configuration Resolution Order
| Priority | Source | How |
|---|---|---|
| 1 (highest) | Admin UI / runtime overrides | Stored in Redis, applied per tenant/project |
| 2 | bundles.yaml config section | Loaded at startup, seeded into Redis |
| 3 (lowest) | entrypoint.configuration | Bundle code defaults |
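The resolution order can be sketched as a layered merge followed by a dot-path lookup. The helpers below (`deep_merge`, `resolve_props`, `get_prop`) are hypothetical, and whether the platform merges nested dictionaries key by key or replaces them wholesale is an assumption of this sketch.

```python
from functools import reduce
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with override applied; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_props(code_defaults: dict, bundles_yaml: dict, runtime_overrides: dict) -> dict:
    # Lowest priority first, so later layers win.
    return reduce(deep_merge, [bundles_yaml, runtime_overrides], code_defaults)


def get_prop(props: dict, dotted_path: str, default: Any = None) -> Any:
    """Walk a dot-separated path; keys that themselves contain dots are not handled."""
    node: Any = props
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```

For example, a code default of `docker` for `execution.runtime.mode` survives only if neither bundles.yaml nor a runtime override sets that key.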
Reserved Property Paths
| Path | Purpose |
|---|---|
| role_models | Maps logical agent roles → concrete LLM (provider + model) |
| embedding | Embedding provider/model for RAG and vector search |
| economics.reservation_amount_dollars | Pre-run cost reservation floor (default 2.0) |
| execution.runtime | Code exec configuration (Docker, Fargate) |
| knowledge | Knowledge space repo/paths configuration |
See full config docs: bundle-configuration-README.md
Workflow Orchestration
The platform uses LangGraph for workflow orchestration. Your execute_core builds and invokes a StateGraph. For common patterns, use BaseWorkflow:
from kdcube_ai_app.apps.chat.sdk.solutions.chatbot.base_workflow import BaseWorkflow

class MyWorkflow(BaseWorkflow):
    def __init__(self, *args, bundle_props=None, **kwargs):
        super().__init__(*args, bundle_props=bundle_props, **kwargs)

    async def process(self, payload):
        scratchpad = self.start_turn(payload)
        try:
            react = await self.build_react(
                scratchpad=scratchpad,
                tools_module="my_bundle.tools_descriptor",
                skills_module="my_bundle.skills_descriptor",
            )
            result = await react.run(payload)
            self.finish_turn(scratchpad, ok=True)
            return result
        except Exception:
            # Mark the turn as failed, then let the error propagate.
            self.finish_turn(scratchpad, ok=False)
            raise
Use BaseWorkflow for the quickest path. It wires up ConvMemories, TurnStatus, ContextRAG, and ApplicationHosting, and gives you build_react(), which assembles the full ReAct agent with all tools and skills resolved.
BaseWorkflow Key Parameters
| Parameter | Type | Description |
|---|---|---|
| conv_idx | ConvIndex | Conversation vector index for semantic search |
| store | ConversationStore | File/S3-backed conversation storage |
| comm | ChatCommunicator | SSE streaming channel |
| model_service | ModelServiceBase | LLM registry / router |
| ctx_client | ContextRAGClient | Context retrieval and RAG |
| bundle_props | Dict | Bundle runtime configuration |
| graph | GraphCtx | Optional knowledge graph context |
Simplest Agentic Workflow Pattern
A bundle can use any orchestration pattern — it is not required to use the ReAct agent. This is one common pattern that combines a Gate agent with the ReAct agent:
Tools System
| Tool | Namespace | Description |
|---|---|---|
| web_search | web_tools | Neural web search with ranking (Brave / DuckDuckGo) |
| web_fetch | web_tools | Fetch + parse web pages (readability-enabled) |
| execute_code_python | exec_tools | Isolated Python code execution: Docker (default), Fargate, or in-process per bundle config |
| write_pdf | rendering_tools | Generate PDF from Markdown + table of contents |
| write_png | rendering_tools | Render HTML/SVG to PNG image |
| write_docx | rendering_tools | Generate DOCX from Markdown |
| write_pptx | rendering_tools | Generate PPTX slide deck |
| write_html | rendering_tools | Generate standalone HTML artifact |
| fetch_ctx | ctx_tools | Fetch artifacts by logical path (ar:/fi:/ks: addresses) |
| read | react → react.read | Load artifact into timeline (fi:/ar:/ks:/so: paths) |
| write | react → react.write | Author content and stream to canvas/timeline_text/internal channel; optionally share as file |
| plan | react → react.plan | Create / update / close the current turn plan (shown in ANNOUNCE) |
| patch | react → react.patch | Patch an existing artifact (unified diff or full replacement) |
| memsearch | react → react.memsearch | Semantic search in past conversation turns |
| search_files | react → react.search_files | Search files in execution workspace |
| hide | react → react.hide | Replace timeline snippet with placeholder |
Bundle-local tools use @kernel_function from Semantic Kernel and are registered in tools_descriptor.py.
# tools/my_tools.py
from typing import Annotated
import semantic_kernel as sk
from semantic_kernel.functions import kernel_function

class MyTools:
    @kernel_function(name="search", description="Search product catalog")
    async def search(
        self,
        query: Annotated[str, "Search query"],
        limit: Annotated[int, "Max results"] = 5,
    ) -> str:
        # your logic here
        return "results..."

# tools_descriptor.py
TOOLS_SPECS = [
    # SDK built-in tool modules (installed package)
    {"module": "kdcube_ai_app.apps.chat.sdk.tools.web_tools", "alias": "web_tools", "use_sk": True},
    {"module": "kdcube_ai_app.apps.chat.sdk.tools.exec_tools", "alias": "exec_tools", "use_sk": True},
    # Bundle-local tools ("ref" = path relative to bundle root, works in Docker too)
    {"ref": "tools/my_tools.py", "alias": "my_tools", "use_sk": True},
]
# Tool IDs: web_tools.web_search, exec_tools.execute_code_python, my_tools.search

# Optional: per-tool runtime overrides
TOOL_RUNTIME = {
    "web_tools.web_search": "local",  # subprocess sandbox
    "exec_tools.execute_code_python": "docker",  # Docker container
}
See custom-tools-README.md and the example bundle react.doc tools_descriptor.py.
MCP (Model Context Protocol) servers are declared in MCP_TOOL_SPECS and configured in bundle props.
# tools_descriptor.py
MCP_TOOL_SPECS = [
    {"server_id": "web_search", "alias": "web_search", "tools": ["web_search"]},
    {"server_id": "docs", "alias": "docs", "tools": ["*"]},  # all tools
    {"server_id": "stack", "alias": "stack", "tools": ["*"]},
]
# Tool IDs: mcp.docs.some_tool, mcp.stack.some_tool

# bundles.yaml config section: how to connect
config:
  mcp:
    services:
      mcpServers:
        docs:
          transport: http
          url: https://mcp.example.com
          auth:
            type: bearer
            secret: bundles.my-bundle.secrets.docs.token
        stack:
          transport: stdio
          command: npx
          args: ["mcp-remote", "mcp.stackoverflow.com"]
See mcp-README.md for all transports (stdio, http, streamable-http, sse) and auth modes.
Artifact Path Families
| Prefix | Resolves To | Example |
|---|---|---|
| fi: | File artifact (output or attachment) | fi:turn_123.files/report.pdf |
| ar: | Artifact from timeline | ar:turn_123.artifacts.summary |
| ks: | Knowledge space (read-only, docs/src) | ks:docs/architecture.md |
| sk: | Skills space (skill instruction files) | sk:public.pdf-press/SKILL.md |
| so: | Sources from context pool | so:sources_pool[1-5] |
| tc: | Tool call block | tc:turn_123.abc.call |
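Each address is a family prefix followed by a family-specific path. The sketch below shows one way to split and validate such a reference; `ArtifactRef` and `parse_ref` are hypothetical names, and the SDK's own resolver may accept more forms than this.

```python
from typing import NamedTuple

# Prefixes taken from the table above.
FAMILIES = {
    "fi": "file artifact",
    "ar": "timeline artifact",
    "ks": "knowledge space",
    "sk": "skills space",
    "so": "sources pool",
    "tc": "tool call block",
}


class ArtifactRef(NamedTuple):
    family: str
    path: str


def parse_ref(ref: str) -> ArtifactRef:
    """Split 'family:path' at the first colon and check the family is known."""
    family, sep, path = ref.partition(":")
    if not sep or family not in FAMILIES or not path:
        raise ValueError(f"not an artifact reference: {ref!r}")
    return ArtifactRef(family, path)
```

Splitting at the first colon only keeps any later colons inside the path intact.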
Skills System
Skills are reusable instruction sets that give agents specialized capabilities. A skill bundles a natural-language instruction (SKILL.md), tool references, and source references.
Built-in Platform Skills
| Skill ID | Namespace | Description |
|---|---|---|
| url-gen | public | Generate hosted URLs for file artifacts |
| pdf-press | public | PDF generation and manipulation |
| docx-press | public | DOCX document generation |
| pptx-press | public | PPTX presentation generation |
| png-press | public | PNG image rendering from HTML/SVG |
| mermaid | public | Mermaid diagram generation |
| link-evidence | internal | Citation and evidence linking |
| sources-section | internal | Automatic sources section generation |
Custom Bundle Skill
# skills/my_skill/SKILL.md
You are an expert in our product catalog.
When asked about products, use the `product_search` tool to find relevant items.
Always include pricing and availability.

# skills/my_skill/tools.yaml
tools:
  - id: product_search
    role: search
    why: Search the product catalog

# skills_descriptor.py
AGENTS_CONFIG = {
    "solver": {
        "enabled_skills": ["my_skill", "pdf-press", "url-gen"],
        "disabled_skills": []
    }
}
ReAct Agent (V2)
The ReAct V2 agent is a single autonomous loop — no separate planner, no gate. It starts in runtime.py as a ReactStateV2 with up to max_iterations=6 rounds. In each round the agent makes a decision (LLM call) then executes one or more tools. Planning is available as a tool the agent can call on itself — not a separate orchestration layer.
Calling the plan tool produces a PlanSnapshot that is tracked as a react.plan block in the timeline. On subsequent rounds the agent can update step statuses (✓ done, ✗ failed, … in-progress). All of this happens within the same loop, so no coordinator is needed and there is no cache miss from a different system prompt.
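The loop shape can be sketched in a few lines: decide, run a tool, record the result, and stop at a final answer or at the iteration cap. This is a simplified illustration, not ReactStateV2: `run_react`, the `(action, argument)` decision tuple, and the one-tool-per-round limit are assumptions, and a stub function stands in for the LLM decision call.

```python
from typing import Callable, Dict, List, Tuple

Decision = Tuple[str, str]  # (action, argument); the action "final" ends the loop


def run_react(
    decide: Callable[[List[str]], Decision],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 6,
) -> Tuple[str, List[str]]:
    """Alternate decisions and tool calls until a final answer or the round limit."""
    history: List[str] = []
    for _ in range(max_iterations):
        action, arg = decide(history)
        if action == "final":
            return arg, history
        # Planning is just another tool: it goes through the same dispatch.
        result = tools[action](arg)
        history.append(f"{action}({arg}) -> {result}")
    return "stopped: iteration limit reached", history
```

Treating `plan` as an ordinary entry in `tools` is what keeps planning inside the loop: the same decision function sees the plan in its history on the next round.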
ReAct Agent V2 — Loop & Tool Integration
Creating and Running the ReAct Agent
# In your workflow (BaseWorkflow subclass)
react = await self.build_react(
    scratchpad=scratchpad,
    tools_module="my_bundle.tools_descriptor",
    skills_module="my_bundle.skills_descriptor",
    knowledge_space_fn=self._search_knowledge,  # optional
    knowledge_read_fn=self._read_knowledge,  # optional
)
result = await react.run(payload)
Timeline
The timeline (timeline.py) is the single source of truth for turn context. It is persisted as artifact:conv.timeline.v1 and loaded at the start of each turn. A separate artifact:conv:sources_pool tracks all sources referenced in the conversation.
Cache checkpoints are computed by rounds (tool call rounds + final completion). They allow LLM context caching to skip retokenizing earlier parts of long conversations. See timeline-README.md, source-pool-README.md, and react-announce-README.md.
Multi-Channel Streaming
ReAct Agent Documentation
Full docs live in docs/sdk/agents/react/. Key files:
Architecture & Flow
Timeline & Artifacts
- timeline-README.md — blocks, cache points, compaction
- react-announce-README.md — announce channel
- source-pool-README.md
- artifact-storage-README.md
- artifact-discovery-README.md
- conversation-artifacts-README.md
Isolated Execution Runtime
The platform provides a sandboxed code execution runtime — your agent can generate and run arbitrary Python code in complete isolation. The runtime has a clear two-zone model:
- Supervisor: networked, has env secrets and full runtime context. All bundle tools from tools_descriptor.py execute here, including MCP tools, bundle-local tools, and custom SDK tools. The ChatCommunicator is also available to tool code, streaming events via Redis Pub/Sub to the client SSE.
- Executor: completely isolated, with no network, no env secrets, and a separate Linux namespace (UID 1001). Runs LLM-generated code. All tool calls are proxied to the Supervisor over a Unix socket. Can only write to /workspace/work and /workspace/out.
Two execution backends are practical:
🐳 Docker (default)
Runs code in an isolated Docker container on the same EC2 host as the Processor. Low latency, ideal for interactive agentic loops. The container shares the host's Docker daemon — fast spin-up, full isolation.
execution:
  runtime:
    mode: "docker"
    enabled: true  # default
☁️ AWS Fargate (async only)
Serverless container on a separate compute plane. Recommended for long-running, non-live workloads — batch data processing, heavy computation, report generation — where startup latency (10–30s) is acceptable. Not suitable for fast interactive agentic loops.
execution:
  runtime:
    mode: "fargate"
    cluster: "arn:aws:ecs:..."
    task_definition: "exec-task"
Exec Environment Variables (Inside Executed Code)
| Variable | Description |
|---|---|
| WORKDIR | Working directory (source, helpers) |
| OUTPUT_DIR | Output directory (write files here) |
| EXECUTION_ID | Unique execution identifier |
| RUNTIME_GLOBALS_JSON | Serialized runtime context (tools, state) |
| RUNTIME_TOOL_MODULES | Tool module names available |
| BUNDLE_ROOT | Bundle root path (access your bundle files) |
| BUNDLE_ID | Current bundle ID |
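Executed code might collect these variables once at startup, as in the sketch below. It assumes RUNTIME_GLOBALS_JSON holds a JSON object; the function name `read_exec_env` and the returned dict layout are hypothetical. RUNTIME_TOOL_MODULES is left out because its serialization format is not described here.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def read_exec_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the exec variables from the table above into one dict.

    Hypothetical helper; pass a mapping to test without touching os.environ.
    """
    env = os.environ if env is None else env
    # Assumption: the value is a JSON object; fall back to {} when it is unset.
    runtime_globals = json.loads(env.get("RUNTIME_GLOBALS_JSON", "{}"))
    return {
        "workdir": Path(env.get("WORKDIR", ".")),
        "output_dir": Path(env.get("OUTPUT_DIR", ".")),
        "execution_id": env.get("EXECUTION_ID", ""),
        "bundle_id": env.get("BUNDLE_ID", ""),
        "bundle_root": Path(env.get("BUNDLE_ROOT", ".")),
        "runtime_globals": runtime_globals,
    }
```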
Knowledge Space
Bundles can expose a searchable knowledge space built from a Git repository's docs, source code, deployment configs, and tests.
return {
    "knowledge": {
        "repo": "https://github.com/org/repo.git",  # "" = local repo
        "ref": "main",
        "docs_root": "app/docs",
        "src_root": "app/src",
        "deploy_root": "app/deploy",
        "tests_root": "app/tests",
        "validate_refs": True
    }
}
- on_bundle_load(): builds the index once per process (file-locked, signature-cached)
- pre_run_hook(): reconciles the index if the config changed
Agent access via ks: paths: react.search_knowledge(query=..., limit=5) and react.read(["ks:docs/architecture.md"])
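One way to picture how a ks: path relates to the configured roots is the sketch below. It assumes the first path segment (docs, src, deploy, tests) selects the matching `*_root` entry from the knowledge config; that mapping and the function name `resolve_ks_path` are assumptions made for illustration, not the documented resolution logic.

```python
from pathlib import PurePosixPath


def resolve_ks_path(ks_path: str, knowledge: dict) -> str:
    """Map 'ks:<section>/<rest>' to a repo-relative path.

    Assumption: <section> selects the '<section>_root' key of the config.
    """
    prefix = "ks:"
    if not ks_path.startswith(prefix):
        raise ValueError(f"not a knowledge-space path: {ks_path!r}")
    section, _, rest = ks_path[len(prefix):].partition("/")
    root = knowledge.get(f"{section}_root")
    if root is None:
        raise KeyError(f"no root configured for section {section!r}")
    return str(PurePosixPath(root) / rest)
```

With the config shown above, `resolve_ks_path("ks:docs/architecture.md", {"docs_root": "app/docs"})` yields `app/docs/architecture.md`.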
Context, RAG & Conversations
Context RAG Client
# self.ctx_client is ContextRAGClient
results = await self.ctx_client.search(
    query="previous analysis of sales data",
    kind="assistant",  # or "user" | "attachment"
    limit=5
)
artifact = await self.ctx_client.fetch_ctx(["ar:turn_abc.artifacts.summary"])
Conversations API Endpoints
GET /conversations/{tenant}/{project}
POST /conversations/{tenant}/{project}/fetch
POST /conversations/{tenant}/{project}/{conv_id}/turns-with-feedbacks
POST /conversations/{tenant}/{project}/feedback/conversations-in-period
The react.memsearch tool provides vector search in past turns directly inside the agent loop. The ConversationStore (accessible via BaseWorkflow.store) manages turn payloads, timelines, and artifacts.
Timeline & Context Layout
Each conversation maintains a rolling timeline of turn artifacts stored as artifact:conv.timeline.v1. The timeline is the canonical cross-turn context passed to the LLM. It is structured as an ordered sequence of turn records, each containing user input, assistant output, tool calls, and any attached artifacts.
Cache Points
The platform inserts up to three LLM-level cache checkpoints per turn: prev-turn (the end of the prior turn), pre-tail (just before the current turn's tail), and tail (after the current turn). These cache points allow the LLM inference layer to reuse context prefix KV-cache across turns, reducing both latency and token cost for multi-turn conversations.
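The placement of the three checkpoints can be sketched over an oldest-to-newest list of blocks. This is one interpretation under stated assumptions: each block is represented only by its turn ID, and pre-tail is taken to be the block immediately before the tail when that block belongs to the current turn. The function name `cache_points` is hypothetical, and the platform's actual placement is computed by rounds and may differ.

```python
def cache_points(turn_ids: list[str]) -> dict[str, int]:
    """Map checkpoint name to block index for an oldest-to-newest block list.

    turn_ids[i] is the ID of the turn that block i belongs to.
    """
    if not turn_ids:
        return {}
    tail = len(turn_ids) - 1
    current = turn_ids[tail]
    points = {"tail": tail}
    # Walk back to the first block of the current turn.
    start = tail
    while start > 0 and turn_ids[start - 1] == current:
        start -= 1
    # The block just before it closes the prior turn, if there is one.
    if start > 0:
        points["prev-turn"] = start - 1
    # Assumption: pre-tail sits one block before the tail, inside the current turn.
    if tail - 1 >= start:
        points["pre-tail"] = tail - 1
    return points
```

A conversation with only one block gets a single tail checkpoint, which matches the "up to three" wording above.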
Compaction
When the accumulated timeline approaches the configured context budget ceiling, the platform triggers compaction: older turn ranges are summarized into a compact conv.range.summary artifact and replaced in the timeline. This is a hard-ceiling guard — it ensures context never silently overflows the model's context window. Compaction is transparent to bundle code.
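The compaction step can be sketched as follows. This is a simplified illustration, not the platform's algorithm: the `Block` type, the `keep_recent` parameter, and the placeholder summary text are assumptions, and a real implementation would produce the summary with an LLM and count tokens with a tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str    # e.g. "turn" or "conv.range.summary"
    text: str
    tokens: int  # token count, supplied by the caller in this sketch


def compact(timeline: list[Block], budget: int, keep_recent: int = 2) -> list[Block]:
    """Replace older blocks with one summary block when the budget is exceeded."""
    total = sum(block.tokens for block in timeline)
    split = len(timeline) - keep_recent
    if total <= budget or split <= 0:
        return timeline  # under budget, or nothing old enough to summarize
    old, recent = timeline[:split], timeline[split:]
    # Placeholder summary; a real system would summarize `old` with a model.
    summary_text = f"Summary of {len(old)} earlier blocks"
    summary = Block("conv.range.summary", summary_text, max(1, len(summary_text) // 4))
    return [summary, *recent]
```

Because the summary replaces the older range in place, code that reads the timeline afterwards sees a shorter list with the same ordering.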
Hosting & File Resources
Your bundle can produce files (PDFs, PNGs, data exports) and make them available via hosted URLs. The platform handles upload, serving, and access control automatically.
# ApplicationHostingService (via BaseWorkflow.hosting_service)
url = hosting.get_artifact_url("fi:turn_123.files/report.pdf")
# Resource Name format
# ef:{tenant}:{project}:chatbot:{stage}:{user_id}:{conv_id}:{turn_id}:{role}:{path}
# Resolved by POST /by-rn with authentication enforced by platform
Files written to OUTPUT_DIR/turn_{id}/files/ during exec are automatically tracked and can be served. User attachments appear as fi:{turn_id}.user.attachments/{filename}.
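The resource name format shown above can be split into its fields with a small parser. This is a sketch: the `ResourceName` type and `parse_resource_name` function are hypothetical, and it assumes that only the trailing path segment may itself contain colons.

```python
from typing import NamedTuple


class ResourceName(NamedTuple):
    tenant: str
    project: str
    stage: str
    user_id: str
    conv_id: str
    turn_id: str
    role: str
    path: str


def parse_resource_name(rn: str) -> ResourceName:
    """Parse ef:{tenant}:{project}:chatbot:{stage}:{user_id}:{conv_id}:{turn_id}:{role}:{path}."""
    # Split at most 9 times so that colons inside the path are preserved.
    parts = rn.split(":", 9)
    if len(parts) != 10 or parts[0] != "ef" or parts[3] != "chatbot":
        raise ValueError(f"not a valid resource name: {rn!r}")
    _, tenant, project, _, stage, user_id, conv_id, turn_id, role, path = parts
    return ResourceName(tenant, project, stage, user_id, conv_id, turn_id, role, path)
```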
Citations & Sources
Citation Tokens
The company was founded in 2015 [[S:1]] and expanded by 2020 [[S:2,3]].
According to multiple sources [[S:1-4]], the trend is clear.
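A client that renders answers needs to turn these tokens back into source IDs. The sketch below handles the three forms shown above (single, comma list, range); it assumes ranges are inclusive, and the function name `extract_sids` is hypothetical.

```python
import re

# Matches [[S:1]], [[S:2,3]] and [[S:1-4]], including mixed forms like [[S:1,3-5]].
CITATION = re.compile(r"\[\[S:(\d+(?:-\d+)?(?:\s*,\s*\d+(?:-\d+)?)*)\]\]")


def extract_sids(text: str) -> list[int]:
    """Return the cited source IDs in order of first appearance, without repeats."""
    sids: list[int] = []
    for match in CITATION.finditer(text):
        for part in match.group(1).split(","):
            part = part.strip()
            if "-" in part:
                low, high = (int(x) for x in part.split("-", 1))
                sids.extend(range(low, high + 1))  # assumption: inclusive range
            else:
                sids.append(int(part))
    # dict.fromkeys keeps the first occurrence of each ID and preserves order.
    return list(dict.fromkeys(sids))
```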
Sources Pool Fields
| Field | Description |
|---|---|
| sid | Source ID (integer, per-conversation, deduplicated) |
| title | Page or file title |
| url | URL or file path |
| source_type | One of: web, file, attachment, manual |
| objective_relevance | Semantic relevance score (0–1) |
| published_time_iso | Publication timestamp |
| favicon_url | Source favicon for UI display |
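The "per-conversation, deduplicated" behaviour of sid can be sketched with a small in-memory pool. This is an illustration under assumptions: deduplication is keyed on the URL, IDs start at 1, and the `Source` and `SourcePool` classes are hypothetical stand-ins for the platform's sources pool artifact. Only a subset of the fields above is modelled.

```python
from dataclasses import dataclass


@dataclass
class Source:
    sid: int
    title: str
    url: str
    source_type: str = "web"  # web, file, attachment, or manual


class SourcePool:
    """Assigns one stable sid per distinct URL within a conversation."""

    def __init__(self) -> None:
        self._by_url: dict[str, Source] = {}

    def add(self, title: str, url: str, source_type: str = "web") -> int:
        existing = self._by_url.get(url)
        if existing is not None:
            return existing.sid  # already pooled: reuse the same sid
        sid = len(self._by_url) + 1
        self._by_url[url] = Source(sid, title, url, source_type)
        return sid

    def get(self, sid: int) -> Source:
        return next(s for s in self._by_url.values() if s.sid == sid)
```

Reusing the sid for a repeated URL keeps citation tokens stable across turns of the same conversation.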
Feedback System
POST /conversations/{tenant}/{project}/{conv_id}/turns/{turn_id}/feedback
{ "reaction": "ok", "text": "Very helpful!", "ts": "2026-03-21T10:00:00Z" }
# reaction: ok | not_ok | neutral | null
Your bundle can also emit machine feedback (origin: "machine") for confidence scores or quality checks — additive, not replacing user feedback. Satisfaction rate: ok / (ok + not_ok + neutral).
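The satisfaction rate formula translates directly into code. A minimal sketch, assuming feedback arrives as a list of reaction values and that null reactions are excluded from the denominator; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def satisfaction_rate(reactions: Iterable[Optional[str]]) -> Optional[float]:
    """Compute ok / (ok + not_ok + neutral); None when there is no countable feedback."""
    counts = Counter(r for r in reactions if r in ("ok", "not_ok", "neutral"))
    total = counts["ok"] + counts["not_ok"] + counts["neutral"]
    return counts["ok"] / total if total else None
```

For two ok, one not_ok, and one neutral reaction the rate is 2 / 4 = 0.5.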
Bundle Economics & Rate Limiting
from kdcube_ai_app.apps.chat.sdk.solutions.chatbot.entrypoint_with_economic import (
    BaseEntrypointWithEconomics
)

@agentic_workflow(bundle_id=BUNDLE_ID)
class MyBundle(BaseEntrypointWithEconomics):
    @property
    def configuration(self):
        return {"economics": {"reservation_amount_dollars": 2.0}, ...}
Default Quota Policies
| User Type | Concurrent | Req/Day | Req/Month | Tokens/Hour |
|---|---|---|---|---|
| anonymous | 1 | 2 | 60 | 150k |
| free | 2 | 100 | 30k | 500k |
| payasyougo | 2 | 200 | 6k | 1.5M |
| admin | 10 | unlimited | unlimited | unlimited |
Always re-raise EconomicsLimitException unchanged. The base report_turn_error() handles this correctly; never catch it silently.
Widgets & Custom UI
Your bundle can expose React SPA widgets embedded in the platform UI. Define an operation that returns HTML (a React app), and the UI renders it. The React app can in turn call back to POST /operations/{op} for any further interactions.
Example: Return a React SPA
async def my_dashboard(self, **kwargs) -> dict:
    """My Dashboard widget — loaded by UI when user clicks the button."""
    html = """<!DOCTYPE html><html><body>
      <div id="root"></div>
      <script>
        // Your bundled React app
        // Can POST back to /bundles/{tenant}/{project}/operations/my_api
      </script>
    </body></html>"""
    return {"html": html, "content_type": "text/html"}
Platform Built-in Operations
| Operation | Description |
|---|---|
| ai_bundle | Bundle admin dashboard (props editor, status); inherited by all bundles |
| control_plane | Economics dashboard (usage, billing), via BaseEntrypointWithEconomics |
| suggestions | Suggested prompts for new conversations |
| news | News/updates from the bundle |
Deploying Your Bundle
Option A: With the KDCube Platform
1. Push your bundle to Git
   git push origin v1.0.0
2. Add to bundles.yaml
   - id: "my-bundle@1-0"
     repo: "git@github.com:org/my-bundle.git"
     ref: "v1.0.0"
     module: "my_bundle.entrypoint"
3. Inject secrets and apply
   kdcube-setup --secrets-set GIT_HTTP_TOKEN=ghp_...
   kdcube-setup --secrets-prompt # for LLM keys
   The bundle config is applied immediately via Redis; no restart is needed. Change the ref and re-run to switch bundle versions on the fly.
4. Set as default bundle via the Admin Dashboard
   Open the AI Bundle Dashboard (/api/integrations/bundles/{tenant}/{project}/operations/ai_bundle). Your registered bundle appears in the list. Set it as the default_bundle_id for the tenant/project. The change is applied immediately via Redis; no restart is needed.
Option B: Standalone (Without Platform)
Bundles can run outside the platform. The SDK is a plain Python package — build your own FastAPI app that imports and invokes it directly. Or build a custom Docker image that runs just your bundle with its own server. The platform's value is in hosting, auth, SSE, storage, economics, and UI — none of that is required for the core agent logic.
Bundle Git Auth
| Mode | bundles.yaml | Secret |
|---|---|---|
| SSH key | git@github.com:org/repo.git | SSH key mounted in container |
| HTTPS token | https://github.com/org/repo.git | GIT_HTTP_TOKEN secret |
Example Bundles
| Bundle ID | What It Shows | Key Features |
|---|---|---|
| react@2026-02-10-02-44 | Basic ReAct + economics | BaseEntrypointWithEconomics, quota policies, web search, news, suggestions |
| react.doc@2026-03-02-22-10 | Docs reader with knowledge space | Knowledge space (docs/src/deploy), file locking, signature caching |
| react.mcp@2026-03-09 | ReAct with MCP tools | MCP server integration, custom tool namespacing |
| openrouter-data@2026-03-11 | OpenRouter LLM provider | OpenRouter integration, model switching, data analysis |
| with-isoruntime@2026-02-16 | Isolated exec runtime | Docker exec, code generation + execution, file output |
| customer@marketing | Real-world external bundle | Economics, complex orchestration, custom widgets, news API operation |
react.doc — Knowledge Space pattern deep-dive
Builds the index in on_bundle_load() with file locking (multi-worker safe) and signature-based caching (skip rebuilds when nothing changed). Reconciles on config change in pre_run_hook(). The agent searches with ks: paths via react.search_knowledge and react.read.
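Signature-based caching can be sketched as a hash over the knowledge config. This is a simplified assumption about what the signature covers: the real bundle may also include the resolved commit or file contents. The function names `config_signature` and `needs_rebuild` are hypothetical.

```python
import hashlib
import json
from typing import Optional


def config_signature(knowledge: dict) -> str:
    """Hash the knowledge config in a key-order-independent way."""
    canonical = json.dumps(knowledge, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_rebuild(knowledge: dict, cached_signature: Optional[str]) -> bool:
    """True when no index was built yet or the config changed since the last build."""
    return cached_signature != config_signature(knowledge)
```

A load hook would store the signature next to the index and skip the rebuild when `needs_rebuild` returns False.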
Documentation Reference
SDK & Bundle
Deployment Descriptors
ReAct Agent (full folder: docs/sdk/agents/react/)
- flow-README.md
- react-context-README.md
- timeline-README.md
- react-turn-workspace-README.md
- artifact-discovery-README.md
- artifact-storage-README.md
- runtime-configuration-README.md
- source-pool-README.md
- turn-log-README.md
- turn-data-README.md
- external-exec-README.md
- event-blocks-README.md
- tool-call-blocks-README.md
- conversation-artifacts-README.md
- agent-workspace-collaboration-README.md
Monitoring & Observability
The platform exposes operational signals through a combination of log streams, Redis-backed queue metrics, and a health endpoint. These signals are intended for operators monitoring a deployed instance.
Queue Depth
The chat-proc worker consumes a Redis-backed message queue. Queue depth (number of pending messages) is observable via standard Redis list length commands against the configured queue key. Elevated queue depth indicates processor backpressure or under-provisioning.
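A queue depth probe can be sketched around the Redis LLEN command. The function accepts any client exposing an `llen(name)` method, which redis-py's `redis.Redis` provides. The queue key and the warning threshold are assumptions, since the actual key depends on deployment configuration.

```python
from typing import Protocol


class ListLengthClient(Protocol):
    """Anything with an llen(name) method, such as a redis-py client."""

    def llen(self, name: str) -> int: ...


def queue_status(client: ListLengthClient, queue_key: str, warn_at: int = 100) -> dict:
    """Report the pending-message count and flag likely backpressure.

    `queue_key` and `warn_at` are hypothetical values chosen by the operator.
    """
    depth = int(client.llen(queue_key))
    return {"queue": queue_key, "depth": depth, "backpressure": depth >= warn_at}
```

Polling this on an interval and alerting on sustained backpressure gives an early signal of under-provisioned processors.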
Processor Health
Each processor instance reports liveness through its internal health check. In ECS deployments the ALB target group tracks instance health; in Compose deployments a container health check is defined. Unhealthy processor instances are replaced or restarted by the orchestrator.
Log Streams
All platform services emit structured log output to stdout, captured by the container runtime. In ECS deployments logs flow to CloudWatch Logs; in Compose deployments they are available via docker compose logs. Log lines include request IDs, tenant/project scope, bundle ID, and error classification where applicable.
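For bundle code that wants to emit log lines in a similar structured shape, a JSON formatter built on the standard logging module is one option. This is a sketch, not the platform's formatter; the field names (`request_id`, `tenant`, `project`, `bundle_id`, `error_class`) are assumptions derived from the description above.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with optional scope fields."""

    # Hypothetical field names; attach them via logger.info(..., extra={...}).
    FIELDS = ("request_id", "tenant", "project", "bundle_id", "error_class")

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        for name in self.FIELDS:
            value = getattr(record, name, None)
            if value is not None:
                payload[name] = value
        return json.dumps(payload, sort_keys=True)
```

Attaching the formatter to a `logging.StreamHandler` writes one JSON line per event to the stream, which the container runtime then captures.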
Metrics Endpoint
The Operations REST API (served by chat-proc) exposes operational data accessible to authorized callers. Bundle-level metrics (turn counts, economics state, error rates) can be queried via the Operations API using a privileged token. Prometheus-format scrape endpoints are not currently built in but can be added as a bundle operation.
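A bundle operation that exposes metrics for scraping would need to render them in the Prometheus text exposition format. The sketch below shows only that rendering step; the metric names, the label set, and the function name `render_prometheus` are hypothetical, and wiring it into an operation is left to the bundle.

```python
def _escape(value: str) -> str:
    """Escape a label value per the Prometheus text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_prometheus(metrics: dict[str, float], labels: dict[str, str]) -> str:
    """Render name/value pairs as Prometheus text lines sharing one label set."""
    label_str = ",".join(f'{key}="{_escape(val)}"' for key, val in sorted(labels.items()))
    lines = []
    for name, value in sorted(metrics.items()):
        # Omit the braces entirely when there are no labels.
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"
```

For example, `render_prometheus({"kdcube_turns_total": 3}, {"bundle_id": "my-bundle@1-0"})` produces the single line `kdcube_turns_total{bundle_id="my-bundle@1-0"} 3`.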
Coming Soon & Current Status
Here's the current state of platform capabilities and what's next on the roadmap:
| Feature | Status | Notes |
|---|---|---|
| Dynamic widget buttons | Soon | Bundle declares widget list in props (widgets: [{id, label, icon}]); UI reads it and renders buttons dynamically — no hardcoding on the platform side needed. Requires adding widgets to the bundle props schema and updating the frontend bundle selector. |
| Custom main view | Soon | Bundle can override the default chat view with a custom HTML/SPA — ideal for dashboards and data-driven UIs that don't need SSE. Requires main_view.operation in bundle props and frontend reading it to render an operation result instead of the chat panel. |
| Static asset serving from bundle | Available now | A bundle can already serve static files (CSS, JS, images) by encoding them in an operation endpoint that reads from its own package. Adding a GET variant to the Operations API (alongside the existing POST) would make this cleaner — a small extension that can be done now. |
| Bundle marketplace | Soon | Browse and install community bundles from a registry. |
| Live bundle reload | Available | Bundle code is loaded per-process and cached as a singleton. Config updates (bundle props, role_models, etc.) propagate immediately via Redis pub/sub. Restarting the processor picks up any code changes. |
| Bundle versioning | Available | Update the ref (branch/tag/commit) in bundles.yaml or via the Admin API and the change applies immediately to new requests — no downtime. The bundle registry is stored in Redis and updated atomically. |
| Multi-bundle conversations | Partial | Technically possible today via routing, but both bundles must understand and agree on the shared conversation format (timeline structure, artifact paths, turn state). Requires alignment between bundle developers on the protocol. |
| Streaming SSE for operations | Soon | Today, widget operations are synchronous REST POST. Adding an SSE channel for operations would allow widgets to stream responses back — enabling live-updating dashboards and progress indicators from widget calls. |
| Policy DSL | Roadmap | A declarative policy language for expressing security rules — bundle access restrictions, data handling constraints, and tenant-scoped controls — without writing Python enforcement code. |
| Deterministic Enforcement Engine | Roadmap | A rule evaluation engine that guarantees enforcement decisions are deterministic and auditable — decoupled from LLM judgment. Designed to complement the existing pre-execution gate sequence. |
| Workflow Invariants | Roadmap | Declarative assertions on workflow state that must hold at defined checkpoints — e.g., "tool X may only be called after step Y" or "context artifact Z must be present before LLM call." Violations halt execution before they propagate. |
| Cross-Agent Approval Gates | Roadmap | Human-in-the-loop or supervisor-agent approval checkpoints for multi-agent workflows. A sub-agent can pause and request explicit approval from a coordinating agent or operator before proceeding with a high-impact action. |