KDCube Platform Documentation

Self-hosted, multi-tenant AI platform and SDK. Architecture, configuration, deployment, and SDK — from system to bundle.

Python 3.11+ · SSE Streaming · ReAct Agent · Multi-Tenant · Docker / ECS · OpenAI · Anthropic · Gemini

Concepts

KDCube AI App is a self-hosted, multi-tenant platform and SDK for building AI assistants and agentic applications. The platform has two core services: Ingress (API gateway, auth, SSE emitter) and Processor/Proc (bundle execution, queue worker, integrations API).

System at a Glance

Figure: System Architecture Overview. Platform architecture showing Ingress and Processor services with client, bundle, and storage interactions.
Flow: User (browser / app, SSE + REST) → Ingress (auth and rate limit, bundle routing, SSE emitter, task enqueueing) → Redis queue (task buffer) → Processor (bundle loader, LangGraph exec, async Communicator, Operations API at POST /bundles/{t}/{p}/operations/{op}) → your bundle (entrypoint, workflow, tools, skills, ReAct agent, widgets / UI). The Communicator streams events back to the client over SSE. The ISO runtime (Docker / Fargate) pairs a networked supervisor, which holds all bundle tools, network access, and secrets, with an executor that runs LLM-generated code with no network and no secrets and reaches tools over a Unix socket. Platform-managed persistence: PostgreSQL (conversations), Redis (props / cache), S3 / local FS (artifacts / files).

Key Terms

Term | Description
--- | ---
Bundle | Python package registered with @agentic_workflow. The unit of AI application deployment: contains agent logic, tools, skills, and UI widgets.
Ingress | API gateway service: auth, SSE streaming, task enqueueing, rate limiting. Handles all inbound traffic before the bundle sees it.
Processor / Proc | Queue worker: executes bundles, hosts the Operations REST API. Loads bundle singletons and calls execute_core() per turn.
Timeline | Rolling cross-conversation context persisted as artifact:conv.timeline.v1. Ordered blocks (oldest → newest) streamed as SSE events. Grows across turns; compacted when context budget is reached. The turn log is the per-turn portion appended each turn.
ReAct Agent | Autonomous loop (ReactStateV2, max 6 iterations). Plan is a tool, not a separate component. No coordinator needed.
Tenant / Project | Multi-tenancy units. All data, config, bundles, and budgets are scoped to tenant + project. Each tenant gets an isolated schema in PostgreSQL.
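
The Timeline entry says the context is compacted once the budget is reached. A minimal sketch of that idea follows. It is not the platform's implementation: `Block`, `estimate_tokens`, `compact_timeline`, the four-characters-per-token estimate, and the "fold the oldest blocks into one summary" strategy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    kind: str   # hypothetical block kinds, e.g. "user", "assistant", "summary"
    text: str


def estimate_tokens(block: Block) -> int:
    # Crude stand-in for a real tokenizer: roughly four characters per token.
    return max(1, len(block.text) // 4)


def compact_timeline(blocks: List[Block], budget: int, keep_recent: int = 2) -> List[Block]:
    """Fold the oldest blocks into one summary block once the budget is exceeded."""
    total = sum(estimate_tokens(b) for b in blocks)
    split = len(blocks) - keep_recent
    if total <= budget or split <= 0:
        return list(blocks)
    old, recent = blocks[:split], blocks[split:]
    summary = Block("summary", f"[{len(old)} earlier blocks compacted]")
    return [summary] + recent


timeline = [Block("user", "x" * 40) for _ in range(5)]   # 5 blocks, about 10 tokens each
compacted = compact_timeline(timeline, budget=30)
```

The order stays oldest → newest: the summary block takes the place of the blocks it replaces, and the most recent blocks are kept verbatim.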

Supported Providers

🔌
Transports: SSE · Socket.IO · REST  |  LLMs: OpenAI · Anthropic · Gemini · OpenRouter  |  Search: Brave · DuckDuckGo  |  Auth: Cognito · SimpleIDP · Delegated

Quick Start

  1. Install the KDCube CLI

    # Recommended: pipx (isolated installation)
    pipx install kdcube-cli
    
    # Or with pip
    pip install kdcube-cli

    Prerequisites: Python 3.11+, Docker, Git.

  2. Run the setup wizard

    kdcube-setup

    The wizard is fully interactive: enter everything manually, or point it at deployment descriptors (assembly.yaml, gateway.yaml, bundles.yaml) and it applies them automatically. Choose release-latest and answer yes to "Run docker compose now?"; no local build is needed. The wizard creates ~/.kdcube/kdcube-runtime/, pulls images, and starts Docker Compose. Secrets are injected into the kdcube-secrets service at setup time and are never written to disk.

  3. Open the UI

    http://localhost:${KDCUBE_UI_PORT}/chatbot/chat

    You're now running a full KDCube stack with the built-in example bundles.

Setup Flow

Figure: Setup Flow. Sequential steps of the KDCube CLI setup wizard from install through configuration.
Flow: Descriptors ×5 (assembly, secrets, gateway, bundles) → kdcube-setup (CLI wizard) → Workdir + Env (~/.kdcube/kdcube-runtime) → Docker Compose (full stack up) → Platform live (ready for bundles).

Workdir Layout

~/.kdcube/kdcube-runtime/
├─ config/
│  ├─ .env                           # Base Docker Compose env
│  ├─ assembly.yaml                  # Platform version, auth type, domain
│  ├─ bundles.yaml                   # Bundle definitions        ← edit this
│  ├─ gateway.yaml                   # Rate limits, circuit breaker
│  ├─ install-meta.json              # CLI install metadata
│  ├─ frontend.config.*.json         # UI auth config
│  ├─ nginx_proxy.conf               # Nginx reverse proxy
│  └─ nginx_ui.conf                  # Nginx UI config
├─ data/
│  ├─ bundle-storage/                # Per-bundle persistent storage
│  ├─ bundles/                       # Bundle source cache
│  ├─ exec-workspace/                # Code execution sandboxes
│  ├─ kdcube-storage/                # Conversations, artifacts, files
│  ├─ nginx/                         # Nginx runtime data
│  ├─ postgres/                      # Database volume
│  └─ redis/                         # Redis persistence
└─ logs/                             # Service logs

Secrets are never stored on disk — they are injected into kdcube-secrets at setup time and resolved in-memory at runtime.

Full CLI reference: kdcube_cli/README.md

Platform

Ingress (chat-ingress)

Handles all inbound traffic. Authenticates users, enforces rate limits, validates the bundle, enqueues the task, and opens an SSE stream back to the client. Your bundle rarely needs to know about this — the platform wires it up automatically.

Processor (chat-proc)

Dequeues tasks, loads your bundle singleton, and calls execute_core(). Also hosts the Operations API — the REST endpoint that UI widgets call directly (no SSE needed for widget interactions).

SSE Streaming Flow

Figure: SSE Streaming Flow. Real-time event delivery from bundle to client via Server-Sent Events.
Flow: Client (opens SSE stream, sends message) → Ingress (auth, rate limit, enqueue task, fan-out SSE) → Redis queue (task buffer) → Proc (dequeue, execute bundle.run(), communicator) → Bundle (emits events via the communicator). The SSE stream reaches the client in real time through a Redis pub/sub relay.
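
On the client side of this flow, the stream arrives in the standard text/event-stream wire format. The sketch below parses such a stream with the standard library only and reassembles the streamed answer. The event names come from the communicator comments in this document; the JSON payload shape (`marker`, `text`, `step`, `status`) is an assumption for illustration, not the platform's documented envelope.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one {"event", "data"} dict per frame; a blank line ends a frame."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space from the value.
            data.append(value[1:] if value.startswith(" ") else value)


# Hypothetical stream; the payload fields are illustrative only.
SAMPLE = (
    "event: chat_step\n"
    'data: {"step": "web_search", "status": "started"}\n'
    "\n"
    ": keep-alive\n"
    "event: chat_delta\n"
    'data: {"marker": "answer", "text": "Hello "}\n'
    "\n"
    "event: chat_delta\n"
    'data: {"marker": "answer", "text": "world"}\n'
    "\n"
    "event: chat_complete\n"
    "data: {}\n"
    "\n"
)

frames = list(parse_sse(SAMPLE.splitlines()))
answer = "".join(
    json.loads(f["data"])["text"] for f in frames if f["event"] == "chat_delta"
)
```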

Modules

Platform Architecture — Detail

Figure: Platform Architecture Detail. Detailed view of KDCube service modules including ingress, processor, storage, and Redis layers.
Flow: Client browser (HTTPS :443) → web-proxy (OpenResty: TLS termination, token unmasking, routing) → chat-ingress :8010 (auth, JWT, rate limit; /api/chat/* SSE gateway; /api/conversations/*; /api/resources/* · opex/*; /api/economics/* · ctrl/*) → Redis queue (per user_type) → chat-proc :8020 (bundle loader, LangGraph; /integrations/.../ops/{op}; admin: bundle registration, props, secrets, cleanup; Communicator over pub/sub) → your bundle (@agentic_workflow; run(), on_bundle_load(); tools, skills, ReAct agent loop). The bundle's communicator feeds the SSE stream. Shared persistence, managed by the platform: PostgreSQL (RDS, conversations), Redis (ElastiCache: cache, pub/sub, queue), EFS / S3 (bundle storage).

Services

Service | Port | Role | Required?
--- | --- | --- | ---
web-proxy | :443 / :80 | TLS termination, token unmasking, routing | Required
chat-ingress | :8010 | Auth, SSE/Socket.IO gateway, task enqueueing | Required
chat-proc | :8020 | Bundle execution, integrations REST API | Required
web-ui | :80 | SPA frontend | Required
kdcube-secrets | internal | In-memory secrets service (secrets never written to disk) | Required (local)
metrics | internal | Autoscaling metric export (CloudWatch); not needed for single-node | Optional
proxylogin | internal | Delegated auth token exchange | Optional
clamav | internal | Antivirus scanning for file attachments | Optional
exec | (on-demand) | Ephemeral Docker/Fargate container for isolated code execution | Optional

Routing

Path Pattern | Routes To
--- | ---
/sse/*, /api/chat/*, /admin/* | chat-ingress
/api/integrations/* | chat-proc
/auth/* | proxylogin (delegated auth only)
/* | web-ui
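
The routing table can be read as a first-match prefix lookup with web-ui as the catch-all. The sketch below mirrors the table in plain Python. The actual routing lives in the web-proxy configuration, so `ROUTES` and `route` are illustrative names, and the sketch does not model the "delegated auth only" condition on /auth/*.

```python
from typing import List, Tuple

# Prefix → service, in the order given by the routing table above.
ROUTES: List[Tuple[str, str]] = [
    ("/sse/", "chat-ingress"),
    ("/api/chat/", "chat-ingress"),
    ("/admin/", "chat-ingress"),
    ("/api/integrations/", "chat-proc"),
    ("/auth/", "proxylogin"),   # only present in delegated-auth deployments
]


def route(path: str) -> str:
    """Return the service that handles `path`; anything unmatched goes to web-ui."""
    for prefix, service in ROUTES:
        if path.startswith(prefix):
            return service
    return "web-ui"
```

For example, `route("/api/chat/send")` resolves to chat-ingress, while `route("/chatbot/chat")` falls through to web-ui.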

Communication Subsystem

The platform communication subsystem is a three-layer pipeline that delivers async events from bundles to connected clients. It is transport-agnostic — the same bundle code streams to SSE, Socket.IO, or REST without changes.

Figure: Communication Subsystem Pipeline. Three-layer async event pipeline from bundle producer to connected clients.
Flow: Bundle (chat-proc, ChatCommunicator) publishes → Comm pipeline (filter / firewall, broadcast / P2P channels, recorder) → Redis Pub/Sub (per-session channels) → Chat relay (session fan-out, sub/unsub lifecycle) → SSE transport (/sse/stream + /sse/chat) or Socket.IO transport (chat_message event) → browser / app client.

Filter / Firewall

Each event can be intercepted and filtered before delivery. Bundles can define event filters to transform, block, or enrich events. The bundle firewall enforces per-tenant policies on what events flow to clients.

Broadcast / P2P Channels

The relay supports session-scoped pub/sub channels. Bundles can broadcast to all subscribers of a room, or send point-to-point to a specific target_sid. The same mechanism powers streaming from proc to multiple ingress instances.

Recorder

Events flowing through the pipeline can be recorded — useful for audit trails, replay, and observability. The recorder captures the envelope payload including session_id, event, and timestamp.
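
The three stages above (filter, channel delivery, recording) can be sketched with an in-memory stand-in for the relay. This is a toy model under stated assumptions: `InMemoryRelay` and its methods are hypothetical, Redis Pub/Sub is replaced by plain Python lists, and the envelope carries only the fields named in the Recorder paragraph plus the event data.

```python
import time
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]


class InMemoryRelay:
    """Toy relay: filter → broadcast or P2P delivery → recorder."""

    def __init__(self, event_filter: Optional[Callable[[Event], bool]] = None):
        self.event_filter = event_filter
        # session_id → {sid → inbox of delivered envelopes}
        self.subscribers: Dict[str, Dict[str, List[Event]]] = {}
        self.recorded: List[Event] = []

    def subscribe(self, session_id: str, sid: str) -> List[Event]:
        return self.subscribers.setdefault(session_id, {}).setdefault(sid, [])

    def publish(self, session_id: str, event: str, data: Any,
                target_sid: Optional[str] = None) -> int:
        envelope = {"session_id": session_id, "event": event,
                    "data": data, "timestamp": time.time()}
        # Stage 1: filter / firewall may drop the event entirely.
        if self.event_filter and not self.event_filter(envelope):
            return 0
        # Stage 2: broadcast to every subscriber, or P2P to one target_sid.
        inboxes = self.subscribers.get(session_id, {})
        targets = [target_sid] if target_sid else list(inboxes)
        delivered = 0
        for sid in targets:
            if sid in inboxes:
                inboxes[sid].append(envelope)
                delivered += 1
        # Stage 3: recorder keeps the envelope for audit or replay.
        self.recorded.append(envelope)
        return delivered


relay = InMemoryRelay(event_filter=lambda e: e["event"] != "internal.debug")
inbox_a = relay.subscribe("s1", "sid-a")
inbox_b = relay.subscribe("s1", "sid-b")
n_broadcast = relay.publish("s1", "chat.service", {"key": "status.update"})
n_p2p = relay.publish("s1", "chat.service", {"key": "private"}, target_sid="sid-b")
n_blocked = relay.publish("s1", "internal.debug", {})
```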

ChatCommunicator — Bundle Producer API

Every bundle workflow receives a ChatCommunicator instance. It wraps Redis Pub/Sub with a typed API for all outbound async events. The same instance works whether the bundle runs in-process or inside an isolated execution runtime (Docker / Fargate sandbox) — the iso executor routes events to the supervisor over a Unix socket, which then publishes to Redis. The client SSE stream is seamless in both cases.

# Standard streaming lifecycle
await communicator.start()                   # chat_start event
await communicator.step("Searching…")       # chat_step — visible progress
await communicator.delta("answer", chunk)   # chat_delta — streaming text
await communicator.complete()                # chat_complete

# Delta markers: answer · thinking · canvas · timeline_text · subsystem
await communicator.delta("thinking", reasoning_chunk)
await communicator.delta("canvas", json_payload)

# Custom service event — broadcast to all session subscribers
await communicator.event(
    event_type="chat.service",
    data={"key": "status.update", "value": "processing"},
)

# P2P — deliver only to a specific connected client (target_sid)
await communicator.event(
    event_type="chat.service",
    data={"key": "private", "value": result},
    target_sid=sid,                           # omit for broadcast
)

# Error
await communicator.error("Something went wrong")

Outbound Firewall (per bundle)

Bundles can attach an IEventFilter to the workflow to suppress or gate events before they reach Redis. The filter sees the caller's user_type, user_id, and full event metadata. Filters are fail-open: an exception allows the event through.

from kdcube_ai_app.apps.chat.sdk.comm import IEventFilter

class MyBundleFilter(IEventFilter):
    def allow_event(self, user_type, user_id, event) -> bool:
        # hide internal step events from non-privileged users
        if user_type != "privileged":
            if event.get("type") == "chat.step" and event.get("broadcast"):
                return False
        return True

# Wire at entrypoint — passed to workflow factory via event_filter param
workflow = MyWorkflow(..., event_filter=MyBundleFilter())

Iso runtime transparency: ChatCommunicator is fully available inside sandboxed execution (Docker / Fargate). Events published from the sandbox travel: exec process → Unix socket → supervisor → Redis Pub/Sub → SSEHub → client. No code change needed between in-process and iso modes.
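
The fail-open rule for outbound filters (an exception lets the event through) can be sketched as a small wrapper. This is only an illustration of the rule, not the platform's dispatch code: `allow_fail_open`, `BrokenFilter`, and `HideSteps` are hypothetical names, and the filters are duck-typed stand-ins rather than real IEventFilter subclasses so the block runs without the SDK.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("event_filter_demo")


def allow_fail_open(event_filter, user_type: str, user_id: str,
                    event: Dict[str, Any]) -> bool:
    """Ask the filter; if the filter itself raises, allow the event (fail-open)."""
    try:
        return bool(event_filter.allow_event(user_type, user_id, event))
    except Exception:
        logger.exception("event filter raised; allowing event (fail-open)")
        return True


class BrokenFilter:
    def allow_event(self, user_type, user_id, event):
        raise RuntimeError("bug in filter")


class HideSteps:
    def allow_event(self, user_type, user_id, event):
        # Hide step events from everyone except privileged users.
        return not (user_type != "privileged" and event.get("type") == "chat.step")


step_event = {"type": "chat.step", "broadcast": True}
```

Fail-open favors delivery over suppression: a buggy filter cannot silence the stream, but it also cannot be relied on as the only guard for sensitive data.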

See comm-system.md and README-comm.md for full detail.

Configuration

Config Hierarchy

Figure: Configuration Hierarchy. Resolution order of platform configuration from assembly to bundle to gateway layers.

  • assembly.yaml: platform version, auth type, domain.
  • gateway.yaml: rate limits, capacity, throttling.
  • bundles.yaml: bundle definitions, git refs, per-bundle config.
  • kdcube-secrets: API keys and tokens; in-memory only, never on disk.
  • Admin UI: runtime overrides per tenant/project, stored in Redis.

Precedence: Admin UI override → bundles.yaml config → code defaults

assembly.yaml

release_name: "prod-2026-03-21"
platform:
  repo: "kdcube-ai-app"
  ref: "v0.3.2"
auth:
  type: "delegated"   # simple | cognito | delegated
domain: "chat.example.com"

bundles.yaml

bundles:
  version: "1"
  default_bundle_id: "react@2026-02-10-02-44"
  items:
    - id: "react@2026-02-10-02-44"
      repo: "git@github.com:org/repo.git"
      ref: "bundle-v2026.03.21"
      module: "react@2026-02-10-02-44.entrypoint"
      config:
        model_id: "gpt-4o-mini"
        role_models:
          solver.react.v2.decision.v2.strong:
            provider: "anthropic"
            model: "claude-sonnet-4-6"

Platform-Reserved Config Keys

Key | Description
--- | ---
role_models | Override LLM per role. Maps logical agent roles to concrete provider + model combinations.
embedding | Embedding provider + model for RAG and vector search in knowledge spaces.
economics.reservation_amount_dollars | Cost reserved before execution begins. Default 2.0 USD. Committed on completion, released on failure.
execution.runtime | Code exec backend: docker (default, fast) or fargate (batch/async workloads only).
mcp.services | MCP connector config per bundle. Matched by server_id in MCP_TOOL_SPECS.

Deployment Options

Figure: Deployment Options. Overview of KDCube deployment modes including local Docker Compose and AWS ECS.

  • Platform Dev (DEV): services run on host; infra via local-infra-stack; fast iteration cycles; for SDK / platform devs.
  • All-in-One Compose (LOCAL): full stack in Docker Compose; bootstrap with kdcube-setup; best for bundle dev and platform evaluation.
  • Custom UI + Infra (PROD-READY): KDCube + custom frontend; external Postgres/Redis; OpenResty auth templates; Cognito / Delegated auth.
  • AWS ECS (PRODUCTION): descriptor-driven via Terraform; chat-proc on EC2-backed tasks; ALB + ACM TLS, Cloud Map; GitHub Actions CI/CD.

⚠️
Fargate has 10–30s startup latency, so it is suitable for batch processing only. Interactive agentic loops require EC2-backed tasks. Use Docker exec mode (default) for live agent interactions.

ECS Topology

Figure: ECS Topology. AWS ECS deployment topology showing load balancer, ingress, processor, and data services.
Flow (inside the AWS VPC): Client (HTTPS :443) → ALB (ACM TLS) → web-proxy (OpenResty, :80) → ECS cluster running web-ui (Fargate), chat-ingress (Fargate), chat-proc (EC2-backed), and proxylogin (Fargate), with Cloud Map for service discovery. Data services: RDS PostgreSQL, Redis ElastiCache, EFS (bundle storage), ECR (images).

Platform Economics

The economics subsystem enforces rate limits, quotas, and funding lanes per user per request. Cost is reserved before execution and committed on success, released on failure.

Economics Flow

Figure: Economics Flow. Token budget and rate limit enforcement flow through the platform economics subsystem.
Flow: Request (user_type, bundle_id) → Plan check (rate limits, concurrency) → Reservation (reserve $2.00 from the funding lane) → Execute (bundle.run(), token counting) → Commit / Release (commit on success, release on failure). An EconomicsLimitException releases the reservation.

Key Concepts

Concept | Description
--- | ---
Plan | Quota policy identity: defines requests/second, concurrency limits, and token quotas per class of users (anonymous, registered, paid, privileged).
Funding Lane | Subscription budget (monthly USD) · wallet credits · project budget. Determines which funding source covers the request.
Reservation | Cost reserved before execution begins. Committed on completion, automatically released on failure or EconomicsLimitException.
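
The reserve → execute → commit-or-release cycle can be sketched as follows. This is a simplified model, not the economics subsystem: `Budget` and `run_with_reservation` are hypothetical, the exception class is a local stand-in for the platform's EconomicsLimitException, and charging the actual cost at commit time (rather than the reserved amount) is an assumption.

```python
class EconomicsLimitException(Exception):
    """Local stand-in for the platform exception of the same name."""


class Budget:
    def __init__(self, balance: float):
        self.balance = balance
        self.reserved = 0.0

    def reserve(self, amount: float) -> None:
        # Refuse the reservation if the unreserved balance cannot cover it.
        if amount > self.balance - self.reserved:
            raise EconomicsLimitException("insufficient funds for reservation")
        self.reserved += amount

    def commit(self, reserved: float, actual_cost: float) -> None:
        self.reserved -= reserved
        self.balance -= actual_cost

    def release(self, reserved: float) -> None:
        self.reserved -= reserved


def run_with_reservation(budget: Budget, work, reservation: float = 2.0) -> float:
    """Reserve before running `work`; commit its cost on success, release on failure."""
    budget.reserve(reservation)
    try:
        cost = work()
    except Exception:
        budget.release(reservation)
        raise   # re-raise unchanged, including EconomicsLimitException
    budget.commit(reservation, cost)
    return cost


def failing_turn() -> float:
    raise RuntimeError("tool crashed")


budget = Budget(balance=10.0)
run_with_reservation(budget, lambda: 0.5)       # successful turn costing $0.50
try:
    run_with_reservation(budget, failing_turn)  # failed turn: reservation released
except RuntimeError:
    pass
```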

Rate Limit Layers

  1. Gateway layer

    Requests/second throttling at Ingress. Configurable per tenant via gateway.yaml. Enforces backpressure before tasks are enqueued.

  2. Economics layer

    Per-user quotas (daily/monthly requests, concurrent sessions, hourly token budgets). Enforced by the economics subsystem at execution time.
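
Requests/second throttling of the kind the gateway layer applies is commonly built on a token bucket. The sketch below shows that technique in isolation. The document does not say which algorithm the gateway uses, so the token bucket itself, the `TokenBucket` class, and its parameters are assumptions; the clock is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=2.0, burst=2)
decisions = [
    bucket.allow(0.0),   # burst token 1
    bucket.allow(0.0),   # burst token 2
    bucket.allow(0.0),   # bucket empty: rejected
    bucket.allow(0.5),   # 0.5 s at 2 tokens/s refills one token
]
```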

💡
Stripe integration is available for subscription and wallet top-up flows. Use BaseEntrypointWithEconomics for automatic enforcement. Always re-raise EconomicsLimitException unchanged.

Security & Governance

For the full security architecture, threat model, and compliance mapping, see Security & Governance.

Auth Flow

Figure: Auth Flow. Request authorization pipeline from client through JWT validation to bundle execution.
Flow: Client request (Bearer / cookie) → Token extraction (header / cookie / SSE param) → Auth provider (Cognito · SimpleIDP · Delegated) → Role resolution (privileged → paid → registered) → Session + UserType (user_id, user_type, roles; tenant and project scope). Providers: Cognito (production; Bearer + ID token), SimpleIDP (dev; idp_users.json), Delegated (proxylogin token exchange).

Security Features

Feature | Description
--- | ---
Content Filtering | ClamAV scans file uploads. event_filter.py in your bundle controls which tool events are visible to clients, which prevents sensitive tool results from leaking.
Bundle Firewall | Per-bundle tool access rules. Restricts what tools a bundle can call at the platform level, independent of bundle code.
Gateway Controls | Per-endpoint throttling (guarded_rest_patterns), bypass patterns, backpressure, and circuit breakers. Configured in gateway.yaml.
Network Isolation | All services are private in the VPC. Only web-proxy is ALB-facing. TLS terminated at ALB (ECS) or at the proxy (Compose).

Enforcement Model

KDCube's runtime enforcement is pre-execution. Budget, rate, and tenant checks happen before the LLM call is charged and before any external system receives a request. Every agent request passes through a fixed 6-step gate sequence:

  1. Session Resolution — authenticate and resolve user/session context.
  2. Rate Limit — enforce per-session and per-user-type limits.
  3. Backpressure — evaluate queue/capacity admission thresholds.
  4. Circuit Breaker — fail-fast on unhealthy system states.
  5. Economics / Quota — budget reservation and commit gates before execution.
  6. Atomic Enqueue — admit or reject at the enqueue boundary with an auditable decision event.

This model differs fundamentally from output filtering (which operates after the LLM responds). Budget, rate, and isolation controls prevent wrong actions from executing — not just detect them after the fact. See Security → Enforcement Model for the full sequence diagram and compliance mapping.
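
The six gates can be pictured as an ordered list of checks that stops at the first rejection and only enqueues when every check passes. The sketch below is a schematic of that ordering, not the platform's admission code: the request fields (`recent_requests`, `queue_depth`, `system_unhealthy`, `budget`), the thresholds, and the shape of the decision record are all assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

Request = Dict[str, object]
Gate = Callable[[Request], Optional[str]]   # returns a rejection reason, or None


def session_resolution(req: Request) -> Optional[str]:
    return None if req.get("user_id") else "unauthenticated"

def rate_limit(req: Request) -> Optional[str]:
    return "rate_limited" if req.get("recent_requests", 0) >= 10 else None

def backpressure(req: Request) -> Optional[str]:
    return "queue_full" if req.get("queue_depth", 0) >= 100 else None

def circuit_breaker(req: Request) -> Optional[str]:
    return "circuit_open" if req.get("system_unhealthy") else None

def economics(req: Request) -> Optional[str]:
    return "insufficient_budget" if req.get("budget", 0.0) < 2.0 else None


GATES: List[Tuple[str, Gate]] = [
    ("session_resolution", session_resolution),
    ("rate_limit", rate_limit),
    ("backpressure", backpressure),
    ("circuit_breaker", circuit_breaker),
    ("economics", economics),
]


def admit(req: Request, queue: List[Request]) -> Dict[str, object]:
    """Run gates 1-5 in order; enqueue (gate 6) only if none rejects."""
    for name, gate in GATES:
        reason = gate(req)
        if reason:
            return {"admitted": False, "gate": name, "reason": reason}
    queue.append(req)
    return {"admitted": True, "gate": "atomic_enqueue", "reason": None}


queue: List[Request] = []
ok = admit({"user_id": "u1", "budget": 5.0}, queue)
rejected = admit({"user_id": "u2", "budget": 5.0, "recent_requests": 50}, queue)
```

Because a rejected request never reaches the queue, nothing downstream (LLM call, tool, external system) runs for it, which is the pre-execution property described above.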

What Is a Bundle?

A Bundle is your application package. It combines Python backend code with optional TypeScript UI widgets and views into a single deployable unit. The package becomes a bundle by subclassing the platform's base class (BaseEntrypoint or a derived variant) and registering with @agentic_workflow.

App sources live in git. The bundle's entry in bundles.yaml points to a git repo, ref, and subdirectory. On deploy, the CLI or CI pipeline pulls the ref and registers the bundle. You can update a running deployment by pushing a new ref and re-running the deploy step; no service restart is required.

Most example bundles use the ReAct agent (ReactStateV2), but a bundle does not have to. You can use any LangGraph graph, a simple request→response pipeline, or no LLM at all.

Example Bundles

Start from any of the examples in sdk/examples/bundles/. The simplest starting point is react@2026-02-10-02-44 — a minimal ReAct agent with no custom tools.

Bundle | Description
--- | ---
react@2026-02-10-02-44 | Simplest ReAct agent bundle. Recommended starting point: minimal boilerplate, no custom tools.
react.doc@2026-03-02-22-10 | ReAct agent over the KDCube knowledge base. Answers questions about the platform and can guide you through building your own bundle. Uses custom bundle-local tools and MCP connectors.
react.mcp@2026-03-09 | ReAct agent demonstrating MCP tool configuration: how to connect external tool servers via stdio, HTTP, and SSE transports.
eco@2026-02-18-15-06 | Shows how to add economic rate limits to a bundle: per-user quotas, reservation amounts, and BaseEntrypointWithEconomics.
openrouter-data@2026-03-11 | Single-turn data-processing pipeline (extraction, classification, tagging, summarization) via OpenRouter. No ReAct agent; demonstrates a simple request → LLM completion → response pattern.
with-isoruntime@2026-02-16-14-00 | No AI agent. Demonstrates the exec tool and isolated runtime: how to run user-supplied Python code safely in a sandboxed subprocess (Docker or Fargate).

Platform Architecture for Bundle Developers

See also: Platform Overview

KDCube is a multi-tenant AI application platform. It provides the infrastructure for deploying conversational AI agents at scale — with streaming, storage, economics, tooling, and extensibility all built in.

A Bundle is your unit of deployment. It's a Python package that plugs into the platform and defines how your AI agent thinks, what tools it uses, how it stores data, what UI widgets it exposes, and how it bills users.

💡
Think of a Bundle as a self-contained AI product — it can include its own agent logic, knowledge base, custom tools, UI dashboards, billing configuration, and more. The platform provides everything else (auth, routing, scaling, storage backends, rate limiting).

Big Picture: How It All Fits Together

Figure: Big Picture: How It All Fits Together. Complete platform overview showing all components from client through ingress, processor, bundles, storage, and integrations.
Flow: User (browser / app) → Ingress → Redis queue → Processor → your bundle, with the Communicator streaming events back to the client over SSE. Platform-managed persistence: PostgreSQL (conversations), Redis (props / cache), S3 / local FS (artifacts / files).

Bundle Anatomy

A bundle is a Python package with a required entrypoint.py and optional supporting files:

my-bundle@1-0/
├─ entrypoint.py                     # REQUIRED: @agentic_workflow
├─ orchestrator/
│  └─ workflow.py                    # BaseWorkflow subclass
├─ agents/
│  └─ gate.py                        # Optional: custom intent gate (demo pattern)
├─ tools/
│  └─ my_tools.py                    # Custom tool implementations
├─ tools_descriptor.py               # Register which tools are enabled
├─ skills/
│  └─ my_skill/
│     └─ SKILL.md                    # Agent instruction sets
├─ skills_descriptor.py              # Which skills are visible to which agents
├─ knowledge/
│  └─ resolver.py                    # Optional: docs/code knowledge space
├─ resources.py                      # User-facing error messages
└─ event_filter.py                   # Optional: filter/transform events

Minimal Entrypoint

from kdcube_ai_app.infra.plugin.agentic_loader import agentic_workflow
from kdcube_ai_app.apps.chat.sdk.solutions.chatbot.entrypoint import BaseEntrypoint
from langgraph.graph import StateGraph, START, END
from typing import Dict, Any

BUNDLE_ID = "my-bundle"

@agentic_workflow(bundle_id=BUNDLE_ID)
class MyBundle(BaseEntrypoint):

    @property
    def configuration(self) -> Dict[str, Any]:
        return {
            "role_models": {
                "solver.react.v2.decision.v2.strong": {
                    "provider": "anthropic",
                    "model": "claude-sonnet-4-6"
                }
            }
        }

    def on_bundle_load(self, **kwargs):
        # Called ONCE per process when bundle first loads.
        self.logger.info("Bundle loaded!")

    async def execute_core(self, state, thread_id, params):
        graph = StateGraph(dict)
        graph.add_node("run", self._run_node)
        graph.add_edge(START, "run")
        graph.add_edge("run", END)
        return await graph.compile().ainvoke(state)

    async def _run_node(self, state):
        await self._comm.delta(text="Hello from my bundle!", index=0, marker="answer")
        return {"final_answer": "Done", "followups": []}

BaseEntrypoint Lifecycle

Method / Property | When Called | Typical Use
--- | --- | ---
configuration (property) | On bundle load | Declare defaults: role_models, embedding, knowledge, economics, execution
on_bundle_load(**kwargs) | Once per process, per tenant/project | Build knowledge index, connect external services, warm caches
pre_run_hook(state) | Before each turn | Reconcile knowledge space if config changed
execute_core(state, thread_id, params) | Every turn | Build + invoke your LangGraph workflow
rebind_request_context(...) | On cached singleton reuse | Refresh request-bound state (comm, user, etc.)

Interface: In & Out

📥 Inbound: ChatTaskPayload

Every turn arrives as a ChatTaskPayload (Pydantic). Key fields:

  • request.message — user's text
  • request.chat_history — prior messages
  • request.payload — arbitrary JSON (for REST ops)
  • actor — tenant_id, project_id
  • routing — conversation_id, turn_id, bundle_id
  • user — user_id, user_type, roles, timezone
  • continuation — follow-up or steer type
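
To make the shape of the inbound payload concrete, the sketch below models the listed fields with standard-library dataclasses. The real ChatTaskPayload is a Pydantic model whose exact class and field definitions are not shown here, so every class name in the sketch (`RequestPart`, `ActorPart`, `RoutingPart`, `UserPart`, `TaskPayloadSketch`) and every default value is a hypothetical stand-in.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RequestPart:
    message: str
    chat_history: List[Dict[str, Any]] = field(default_factory=list)
    payload: Dict[str, Any] = field(default_factory=dict)   # arbitrary JSON for REST ops


@dataclass
class ActorPart:
    tenant_id: str
    project_id: str


@dataclass
class RoutingPart:
    conversation_id: str
    turn_id: str
    bundle_id: str


@dataclass
class UserPart:
    user_id: str
    user_type: str
    roles: List[str] = field(default_factory=list)
    timezone: str = "UTC"


@dataclass
class TaskPayloadSketch:
    request: RequestPart
    actor: ActorPart
    routing: RoutingPart
    user: UserPart
    continuation: str = "regular"   # regular | followup | steer


def describe(p: TaskPayloadSketch) -> str:
    """One-line summary of who sent what to which bundle."""
    scope = f"{p.actor.tenant_id}/{p.actor.project_id}"
    return f"{scope} {p.routing.bundle_id} [{p.continuation}] {p.user.user_type}: {p.request.message}"


payload = TaskPayloadSketch(
    request=RequestPart(message="Summarize the report"),
    actor=ActorPart(tenant_id="my-tenant", project_id="my-project"),
    routing=RoutingPart(conversation_id="c1", turn_id="t1", bundle_id="my-bundle@1-0"),
    user=UserPart(user_id="u1", user_type="registered", roles=["user"]),
)
summary = describe(payload)
```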

📤 Outbound: SSE Streaming

Your bundle streams back via the Communicator. The client receives ChatEnvelope events in real time.

  • delta — streaming text chunks (thinking / answer)
  • step — tool calls, status updates, timeline events
  • complete — turn finished with final data
  • error — propagate errors cleanly
  • event — custom events (artifacts, reactions, etc.)

Using the Communicator

# Stream answer text
await self._comm.delta(text="chunk...", index=0, marker="answer")

# Announce a step
await self._comm.step(step="web_search", status="started", title="Searching the web...")
await self._comm.step(step="web_search", status="completed")

# Emit follow-up suggestions
await self._comm.followups(["Tell me more", "Show examples"])

# Final complete signal
await self._comm.complete(data={"answer": "..."})

REST Operations (for UI Widgets)

Your bundle exposes additional REST operations via the Operations API hosted by the Processor:

POST /bundles/{tenant}/{project}/operations/{operation}

The platform calls your_workflow.{operation}(**request_body). Define operations as async methods in your workflow or entrypoint. This is how UI widgets communicate back to your bundle (see Widgets / UI).
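
The dispatch described above (operation name → async method, request body → keyword arguments) can be sketched in a few lines. This is an illustration of the pattern only: `dispatch_operation` and `DemoWorkflow` are hypothetical, and the guard that refuses underscore-prefixed names is an added assumption, not documented platform behavior.

```python
import asyncio
import inspect
from typing import Any, Dict


class DemoWorkflow:
    async def save_preferences(self, theme: str = "light") -> Dict[str, Any]:
        return {"ok": True, "theme": theme}

    async def _internal(self) -> None:   # not meant to be callable over REST
        pass


async def dispatch_operation(workflow: Any, operation: str,
                             request_body: Dict[str, Any]) -> Any:
    """Resolve `operation` to an async method and call it with the body as kwargs."""
    if operation.startswith("_"):
        raise PermissionError(f"operation not exposed: {operation}")
    handler = getattr(workflow, operation, None)
    if handler is None or not inspect.iscoroutinefunction(handler):
        raise LookupError(f"unknown operation: {operation}")
    return await handler(**request_body)


result = asyncio.run(
    dispatch_operation(DemoWorkflow(), "save_preferences", {"theme": "dark"})
)
```

Because the body is splatted into keyword arguments, the JSON keys a widget sends must match the method's parameter names.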

Continuation Types

Type | Description
--- | ---
regular | Normal new message
followup | User clicked a suggested follow-up
steer | User is redirecting the ongoing turn

Storage

KDCube is a distributed, multi-tenant system. Three storage tiers are available to your bundle:

Figure: Storage Tiers. Three storage tiers available to bundles: cloud object storage, local filesystem, and Redis cache.

  • ☁️ Cloud Storage (AIBundleStorage): file:// or s3://; survives restarts and scaling; scoped per tenant / project / bundle; read/write arbitrary keys; configured via CB_BUNDLE_STORAGE_URL.
  • 📁 Local FS (Shared): bundle_storage_root(), rooted at the BUNDLE_STORAGE_ROOT env var; per bundle version namespace; for knowledge indexes and caches; EFS in production; accessed as pathlib.Path.
  • ⚡ Redis Cache (self.kv_cache / self.redis): in-memory, fast; bundle props and config; session state, rate limits; pub/sub for bundle updates; aioredis client.

Cloud Storage (AIBundleStorage)

from kdcube_ai_app.apps.chat.sdk.storage.ai_bundle_storage import AIBundleStorage

storage = AIBundleStorage(
    tenant="my-tenant", project="my-project",
    ai_bundle_id="my-bundle@1-0",
    storage_uri="s3://my-bucket"  # or file:///data/bundle-storage
)

storage.write("reports/latest.json", data='{"count": 42}')
content = storage.read("reports/latest.json", as_text=True)
keys = storage.list("reports/")

Local FS (Shared Bundle Storage)

# In entrypoint or workflow
root = self.bundle_storage_root()  # pathlib.Path
index_path = root / "knowledge_index"
index_path.mkdir(parents=True, exist_ok=True)  # also creates missing parent directories

Path namespaced: {BUNDLE_STORAGE_ROOT}/{tenant}/{project}/{bundle_id}/. Ideal for knowledge indexes, model weights, large caches. In production, mount an EFS volume.
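
The namespacing scheme above can be reproduced with pathlib to see how tenants, projects, and bundles end up in separate directories. The helper below is a sketch: `bundle_storage_path` is a hypothetical function, not the SDK's bundle_storage_root(), and a temporary directory stands in for BUNDLE_STORAGE_ROOT.

```python
import tempfile
from pathlib import Path


def bundle_storage_path(storage_root: Path, tenant: str, project: str,
                        bundle_id: str) -> Path:
    """Build {root}/{tenant}/{project}/{bundle_id}/ and make sure it exists."""
    root = storage_root / tenant / project / bundle_id
    root.mkdir(parents=True, exist_ok=True)
    return root


demo_root = Path(tempfile.mkdtemp())   # stand-in for BUNDLE_STORAGE_ROOT
root = bundle_storage_path(demo_root, "my-tenant", "my-project", "my-bundle@1-0")
other = bundle_storage_path(demo_root, "other-tenant", "my-project", "my-bundle@1-0")

index_path = root / "knowledge_index"
index_path.mkdir(parents=True, exist_ok=True)
```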

Redis Cache

# Low-level Redis client (aioredis)
await self.redis.set("my:key", "value", ex=3600)
val = await self.redis.get("my:key")

# KVCache wrapper
await self.kv_cache.set("user_prefs", {"theme": "dark"}, ttl=86400)

Configuration & Secrets

bundles.yaml — Your Bundle Definition

bundles:
  version: "1"
  default_bundle_id: "my-bundle@1-0"
  items:
    - id: "my-bundle@1-0"
      name: "My Bundle"
      repo: "git@github.com:org/my-bundle-repo.git"   # optional git source
      ref: "v1.0.0"
      subdir: "src/my_bundle"
      module: "my-bundle@1-0.entrypoint"
      config:
        embedding:
          provider: "openai"
          model: "text-embedding-3-small"
        role_models:
          solver.react.v2.decision.v2.strong:
            provider: "anthropic"
            model: "claude-sonnet-4-6"
        economics:
          reservation_amount_dollars: 2.0
        execution:
          runtime:
            mode: "docker"   # docker is default — see Exec Runtime section
            enabled: true

Bundle Secrets — Managed by kdcube-secrets (Never on Disk)

Bundle secrets are provisioned via the Admin UI or injected into kdcube-secrets at setup time. They are resolved in-memory at runtime and are never written to disk. The bundles.secrets.yaml format below is only used as a transient CLI input during initial provisioning — it is not stored in the workdir.

bundles:
  version: "1"
  items:
    - id: "my-bundle@1-0"
      secrets:
        openai:
          api_key: null          # null = resolve from env OPENAI_API_KEY
        my_service:
          api_key: "sk-live-..."  # inline value — consumed by CLI, not stored
          webhook_url: "env:MY_WEBHOOK_URL"  # or env: reference
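
The three value forms in this file (null, inline, and env: reference) follow a simple resolution rule, sketched below. This is not the CLI's implementation: `resolve_secret` is hypothetical, and because the document does not state how the fallback variable name (such as OPENAI_API_KEY) is derived for a null value, the sketch takes it as an explicit argument.

```python
import os
from typing import Mapping, Optional


def resolve_secret(value: Optional[str], fallback_env_var: str,
                   env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Resolve one secret value from a bundles.secrets.yaml-style entry."""
    if value is None:
        # null: fall back to a conventional environment variable.
        return env.get(fallback_env_var)
    if value.startswith("env:"):
        # "env:NAME": read the named environment variable.
        return env.get(value[len("env:"):])
    # Anything else is an inline value.
    return value


# Fake environment so the example does not depend on the real one.
fake_env = {
    "OPENAI_API_KEY": "sk-from-env",
    "MY_WEBHOOK_URL": "https://example.com/hook",
}

openai_key = resolve_secret(None, "OPENAI_API_KEY", fake_env)
service_key = resolve_secret("sk-live-...", "MY_SERVICE_API_KEY", fake_env)
webhook_url = resolve_secret("env:MY_WEBHOOK_URL", "MY_SERVICE_WEBHOOK_URL", fake_env)
```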

Reading Config & Secrets in Code

value = self.bundle_prop("some.nested.key")   # dot-path navigation
all_props = self.bundle_props                    # full merged dict

from kdcube_ai_app.apps.chat.sdk.config import get_secret
api_key = get_secret("bundles.my-bundle@1-0.secrets.my_service.api_key")

Configuration Resolution Order

Priority | Source | How
--- | --- | ---
1 (highest) | Admin UI / runtime overrides | Stored in Redis, applied per tenant/project
2 | bundles.yaml config section | Loaded at startup, seeded into Redis
3 (lowest) | entrypoint.configuration | Bundle code defaults
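
The precedence order can be illustrated as a layered merge followed by a dot-path lookup. The sketch below assumes a recursive (deep) merge of nested dictionaries; whether the platform merges deeply or replaces whole sections is not stated in this document. `deep_merge`, `resolve_props`, and `prop` are hypothetical helpers, not SDK functions.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `override` applied; nested dicts merge recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_props(code_defaults: Dict[str, Any], bundles_yaml: Dict[str, Any],
                  admin_overrides: Dict[str, Any]) -> Dict[str, Any]:
    # Lowest priority first; each later layer wins on conflicts.
    return deep_merge(deep_merge(code_defaults, bundles_yaml), admin_overrides)


def prop(props: Dict[str, Any], dotted: str, default: Any = None) -> Any:
    """Dot-path navigation, e.g. prop(props, "execution.runtime.mode")."""
    node: Any = props
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


code_defaults = {
    "economics": {"reservation_amount_dollars": 2.0},
    "execution": {"runtime": {"mode": "docker", "enabled": True}},
}
bundles_yaml = {"economics": {"reservation_amount_dollars": 1.0}}
admin_overrides = {"execution": {"runtime": {"mode": "fargate"}}}

props = resolve_props(code_defaults, bundles_yaml, admin_overrides)
```

One caveat of naive dot-path splitting: keys that themselves contain dots, such as the role name solver.react.v2.decision.v2.strong, cannot be addressed this way and would need an escaping rule or a direct dictionary lookup.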

Reserved Property Paths

Path | Purpose
--- | ---
role_models | Maps logical agent roles → concrete LLM (provider + model)
embedding | Embedding provider/model for RAG and vector search
economics.reservation_amount_dollars | Pre-run cost reservation floor (default 2.0)
execution.runtime | Code exec configuration (Docker, Fargate)
knowledge | Knowledge space repo/paths configuration

See full config docs: bundle-configuration-README.md

Workflow Orchestration

The platform uses LangGraph for workflow orchestration. Your execute_core builds and invokes a StateGraph. For common patterns, use BaseWorkflow:

from kdcube_ai_app.apps.chat.sdk.solutions.chatbot.base_workflow import BaseWorkflow

class MyWorkflow(BaseWorkflow):
    def __init__(self, *args, bundle_props=None, **kwargs):
        super().__init__(*args, bundle_props=bundle_props, **kwargs)

    async def process(self, payload):
        scratchpad = self.start_turn(payload)
        try:
            react = await self.build_react(
                scratchpad=scratchpad,
                tools_module="my_bundle.tools_descriptor",
                skills_module="my_bundle.skills_descriptor",
            )
            result = await react.run(payload)
            self.finish_turn(scratchpad, ok=True)
            return result
        except Exception:
            # Mark the turn as failed, then let the original error propagate
            self.finish_turn(scratchpad, ok=False)
            raise
🚀
Use BaseWorkflow for the quickest path. It wires up ConvMemories, TurnStatus, ContextRAG, ApplicationHosting, and gives you build_react() which assembles the full ReAct agent with all tools and skills resolved.

BaseWorkflow Key Parameters

Parameter | Type | Description
conv_idx | ConvIndex | Conversation vector index for semantic search
store | ConversationStore | File/S3-backed conversation storage
comm | ChatCommunicator | SSE streaming channel
model_service | ModelServiceBase | LLM registry / router
ctx_client | ContextRAGClient | Context retrieval and RAG
bundle_props | Dict | Bundle runtime configuration
graph | GraphCtx | Optional knowledge graph context

Simplest Agentic Workflow Pattern

A bundle can use any orchestration pattern — it is not required to use the ReAct agent. This is one common pattern that combines a Gate agent with the ReAct agent:

[Figure: Simplest Agentic Workflow Pattern (optional gate + react layers). User Message → Gate Agent (optional layer: intent classification, route decision) → ReAct Agent (decision loop, tool calls, self-planning) → Answer Gen (optional layer: final answer, follow-ups). The timeline (tool_calls, answer, thinking, artifacts, plan blocks) is streamed to the client in real time.]

Tools System

Tool | Namespace | Description
web_search | web_tools | Neural web search with ranking (Brave / DuckDuckGo)
web_fetch | web_tools | Fetch + parse web pages (readability-enabled)
execute_code_python | exec_tools | Isolated Python code execution: Docker (default), Fargate, or in-process per bundle config
write_pdf | rendering_tools | Generate PDF from Markdown + table of contents
write_png | rendering_tools | Render HTML/SVG to PNG image
write_docx | rendering_tools | Generate DOCX from Markdown
write_pptx | rendering_tools | Generate PPTX slide deck
write_html | rendering_tools | Generate standalone HTML artifact
fetch_ctx | ctx_tools | Fetch artifacts by logical path (ar:/fi:/ks: addresses)
read | react | react.read: Load artifact into timeline (fi:/ar:/ks:/so: paths)
write | react | react.write: Author content and stream to canvas/timeline_text/internal channel; optionally share as file
plan | react | react.plan: Create / update / close the current turn plan (shown in ANNOUNCE)
patch | react | react.patch: Patch an existing artifact (unified diff or full replacement)
memsearch | react | react.memsearch: Semantic search in past conversation turns
search_files | react | react.search_files: Search files in execution workspace
hide | react | react.hide: Replace timeline snippet with placeholder

Bundle-local tools use @kernel_function from Semantic Kernel and are registered in tools_descriptor.py.

# tools/my_tools.py
from typing import Annotated
from semantic_kernel.functions import kernel_function

class MyTools:
    @kernel_function(name="search", description="Search product catalog")
    async def search(self,
        query: Annotated[str, "Search query"],
        limit: Annotated[int, "Max results"] = 5
    ) -> str:
        # your logic here
        return "results..."

# tools_descriptor.py
TOOLS_SPECS = [
    # SDK built-in tool modules (installed package)
    {"module": "kdcube_ai_app.apps.chat.sdk.tools.web_tools", "alias": "web_tools", "use_sk": True},
    {"module": "kdcube_ai_app.apps.chat.sdk.tools.exec_tools", "alias": "exec_tools", "use_sk": True},
    # Bundle-local tools ("ref" = path relative to bundle root, works in Docker too)
    {"ref": "tools/my_tools.py", "alias": "my_tools", "use_sk": True},
]
# Tool IDs: web_tools.web_search, exec_tools.execute_code_python, my_tools.search

# Optional: per-tool runtime overrides
TOOL_RUNTIME = {
    "web_tools.web_search": "local",       # subprocess sandbox
    "exec_tools.execute_code_python": "docker",  # Docker container
}

See custom-tools-README.md and the example bundle react.doc tools_descriptor.py.

MCP (Model Context Protocol) servers are declared in MCP_TOOL_SPECS and configured in bundle props.

# tools_descriptor.py
MCP_TOOL_SPECS = [
    {"server_id": "web_search", "alias": "web_search", "tools": ["web_search"]},
    {"server_id": "docs", "alias": "docs", "tools": ["*"]},  # all tools
    {"server_id": "stack", "alias": "stack", "tools": ["*"]},
]
# Tool IDs: mcp.docs.some_tool, mcp.stack.some_tool

# bundles.yaml config section: how to connect
config:
  mcp:
    services:
      mcpServers:
        docs:
          transport: http
          url: https://mcp.example.com
          auth:
            type: bearer
            secret: bundles.my-bundle.secrets.docs.token
        stack:
          transport: stdio
          command: npx
          args: ["mcp-remote", "mcp.stackoverflow.com"]

See mcp-README.md for all transports (stdio, http, streamable-http, sse) and auth modes.
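
To make the `"tools": ["*"]` wildcard and the `mcp.<alias>.<tool>` ID pattern concrete, here is a small sketch. `select_mcp_tool_ids` is a hypothetical helper, and the list of tools a server advertises is invented for illustration; the real resolution logic lives in the SDK and may differ.

```python
def select_mcp_tool_ids(spec: dict, advertised: list[str]) -> list[str]:
    """Build tool IDs for one MCP_TOOL_SPECS entry.

    `advertised` is the list of tool names the MCP server reports (assumed input).
    A "*" entry selects every advertised tool; otherwise only listed names are kept.
    """
    wanted = spec["tools"]
    if "*" in wanted:
        names = list(advertised)
    else:
        names = [name for name in advertised if name in wanted]
    return [f"mcp.{spec['alias']}.{name}" for name in names]


docs_spec = {"server_id": "docs", "alias": "docs", "tools": ["*"]}
search_spec = {"server_id": "web_search", "alias": "web_search", "tools": ["web_search"]}

docs_ids = select_mcp_tool_ids(docs_spec, ["search", "get_page"])
search_ids = select_mcp_tool_ids(search_spec, ["web_search", "image_search"])
```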

Artifact Path Families

Prefix | Resolves To | Example
fi: | File artifact (output or attachment) | fi:turn_123.files/report.pdf
ar: | Artifact from timeline | ar:turn_123.artifacts.summary
ks: | Knowledge space (read-only, docs/src) | ks:docs/architecture.md
sk: | Skills space (skill instruction files) | sk:public.pdf-press/SKILL.md
so: | Sources from context pool | so:sources_pool[1-5]
tc: | Tool call block | tc:turn_123.abc.call
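
Bundle code that receives a logical path often needs to know which family it belongs to before deciding how to handle it. The following sketch splits a path on its prefix; `parse_logical_path` and `LogicalPath` are hypothetical names, not SDK APIs, and the family labels simply restate the table above.

```python
from typing import NamedTuple

# Prefix -> family label, restating the table above
PATH_FAMILIES = {
    "fi": "file artifact",
    "ar": "timeline artifact",
    "ks": "knowledge space",
    "sk": "skills space",
    "so": "sources pool",
    "tc": "tool call block",
}


class LogicalPath(NamedTuple):
    prefix: str
    family: str
    rest: str


def parse_logical_path(path: str) -> LogicalPath:
    """Split 'fi:turn_123.files/report.pdf' into prefix, family, and remainder."""
    prefix, sep, rest = path.partition(":")
    if not sep or prefix not in PATH_FAMILIES:
        raise ValueError(f"unknown artifact path family: {path!r}")
    return LogicalPath(prefix, PATH_FAMILIES[prefix], rest)


parsed = parse_logical_path("fi:turn_123.files/report.pdf")
```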

Skills System

Skills are reusable instruction sets that give agents specialized capabilities. A skill bundles a natural-language instruction (SKILL.md), tool references, and source references.

Built-in Platform Skills

Skill ID | Namespace | Description
url-gen | public | Generate hosted URLs for file artifacts
pdf-press | public | PDF generation and manipulation
docx-press | public | DOCX document generation
pptx-press | public | PPTX presentation generation
png-press | public | PNG image rendering from HTML/SVG
mermaid | public | Mermaid diagram generation
link-evidence | internal | Citation and evidence linking
sources-section | internal | Automatic sources section generation

Custom Bundle Skill

# skills/my_skill/SKILL.md
You are an expert in our product catalog.
When asked about products, use the `product_search` tool to find relevant items.
Always include pricing and availability.

# skills/my_skill/tools.yaml
tools:
  - id: product_search
    role: search
    why: Search the product catalog

# skills_descriptor.py
AGENTS_CONFIG = {
    "solver": {
        "enabled_skills": ["my_skill", "pdf-press", "url-gen"],
        "disabled_skills": []
    }
}

ReAct Agent (V2)

The ReAct V2 agent is a single autonomous loop — no separate planner, no gate. It starts in runtime.py as a ReactStateV2 with up to max_iterations=6 rounds. In each round the agent makes a decision (LLM call) then executes one or more tools. Planning is available as a tool the agent can call on itself — not a separate orchestration layer.

🔁
Plan is a tool, not a component. When the agent calls the plan tool, it creates a PlanSnapshot that is tracked as a react.plan block in the timeline. On subsequent rounds the agent can update step statuses (✓ done, ✗ failed, … in-progress). This is all within the same loop — no coordinator needed, and no cache miss from a different system prompt.
📄
Dedicated ReAct v2 page: For the full production reference — timeline model, multi-channel streaming, tool catalog, source management, code examples, and architecture diagrams — see KDCube ReAct v2 Agent →

ReAct Agent V2 — Loop & Tool Integration

[Figure: ReAct Agent V2 loop and tool integration. Context (user message, timeline history, plan state) feeds the ReAct loop (max 6 rounds): (1) Decision: an LLM call (react_decision_stream_v2) selects tools or produces an answer; (2) Tool Execution: the selected tool(s) run and stream results; (3) Observe & Update State: results are appended to the timeline and plan status is updated, then the next round starts. When an answer is ready, the loop emits final_answer, followups, and the timeline. Available tools: web_search / web_fetch, exec_tools.execute_code_python (Docker by default), react.write (canvas / timeline_text; multi-channel stream, file, hosting), react.patch / react.read, and plan (self-planning). The timeline is streamed block by block as the loop progresses.]

Creating and Running the ReAct Agent

# In your workflow (BaseWorkflow subclass)
react = await self.build_react(
    scratchpad=scratchpad,
    tools_module="my_bundle.tools_descriptor",
    skills_module="my_bundle.skills_descriptor",
    knowledge_space_fn=self._search_knowledge,   # optional
    knowledge_read_fn=self._read_knowledge,      # optional
)
result = await react.run(payload)

Timeline

The timeline (timeline.py) is the single source of truth for turn context. It is persisted as artifact:conv.timeline.v1 and loaded at the start of each turn. A separate artifact:conv:sources_pool tracks all sources referenced in the conversation.

[Figure: ReAct timeline: blocks, cache points, and announce. Blocks are ordered oldest to newest.]

Block order shown in the diagram:

  • Previous turn (N-1): earlier blocks, ending with assistant.completion.
  • Current turn (N): turn.header, user.prompt, optional user.attachment.meta, then one react.tool.call / react.tool.result pair per round. A react.plan block can appear in any round.
  • Final round: assistant.completion, stage.suggested_followups, react.exit, react.state.
  • Ephemeral: react.announce (iteration, plan, budget) is added on each decision round after the stable prefix. It is never cached and is cleared on turn exit.

Three cache checkpoints per turn, placed on stable blocks only:

  • (1) prev-turn: the last block before the current turn.header.
  • (2) pre-tail: the end of round [total - 1 - offset] (default offset=2).
  • (3) tail: the end of the last complete round. The stable prefix ends here.

Sources pool: stored as a separate artifact (artifact:conv:sources_pool), not in the block sequence. It is loaded alongside the timeline and appended to context after the cache points and before announce.

Compaction (hard-ceiling guard): when system + blocks >= 0.9 x budget, a conv.range.summary block is inserted at the cut point and earlier blocks are hidden. The cut prefers user/assistant/tool-call boundaries, and cache points are recomputed on the visible blocks.

Render pipeline (each decision call): 1. TTL pruning; 2. Compaction (if tokens >= 0.9 x budget); 3. Cache points recomputed on visible blocks; 4. Sources pool appended after the stable prefix; 5. Announce appended last (never cached).

Cache checkpoints are computed by rounds (tool call rounds + final completion). They allow LLM context caching to skip retokenizing earlier parts of long conversations. See timeline-README.md, source-pool-README.md, and react-announce-README.md.

Multi-Channel Streaming

📡
The ReAct agent supports multiple streaming channels simultaneously. The canvas channel streams large content (HTML, Markdown, JSON) for display in a widget panel. The timeline_text channel streams short text visible in the main chat. The internal channel captures agent notes invisible to the user. This is what enables live-updating widget dashboards while the agent is still running. See channeled-streamer-README.md and streaming-widget-README.md.

ReAct Agent Documentation

Full docs live in docs/sdk/agents/react/.

Isolated Execution Runtime

The platform provides a sandboxed code execution runtime — your agent can generate and run arbitrary Python code in complete isolation. The runtime has a clear two-zone model:

  • Supervisor — networked, has env secrets and full runtime context. All bundle tools from tools_descriptor.py execute here, including MCP tools, bundle-local tools, and custom SDK tools. The ChatCommunicator is also available to tool code, streaming events via Redis Pub/Sub to the client SSE.
  • Executor — completely isolated: no network, no env secrets, separate Linux namespace (UID 1001). Runs LLM-generated code. All tool calls are proxied to the Supervisor over a Unix socket. Can only write to /workspace/work and /workspace/out.
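
The proxying idea can be demonstrated with a connected socket pair: the executor side never runs a tool itself, it only sends a request and reads the reply. This is a toy model under stated assumptions: the newline-delimited JSON wire format, `supervisor_serve`, and `ExecutorProxy` are all hypothetical and do not describe the platform's actual protocol. `socket.socketpair()` creates an AF_UNIX pair on Linux and macOS.

```python
import json
import socket
import threading


def supervisor_serve(conn: socket.socket, tools: dict) -> None:
    """Supervisor side: read one JSON request per line, run the tool, reply with JSON."""
    with conn, conn.makefile("rwb") as stream:
        for line in stream:
            request = json.loads(line)
            try:
                result = tools[request["tool"]](**request["args"])
                reply = {"ok": True, "result": result}
            except Exception as exc:  # report failures instead of crashing the loop
                reply = {"ok": False, "error": str(exc)}
            stream.write(json.dumps(reply).encode() + b"\n")
            stream.flush()


class ExecutorProxy:
    """Executor side: every tool call becomes a message on the socket."""

    def __init__(self, conn: socket.socket) -> None:
        self._conn = conn
        self._stream = conn.makefile("rwb")

    def tool_call(self, tool: str, **args):
        self._stream.write(json.dumps({"tool": tool, "args": args}).encode() + b"\n")
        self._stream.flush()
        reply = json.loads(self._stream.readline())
        if not reply["ok"]:
            raise RuntimeError(reply["error"])
        return reply["result"]

    def close(self) -> None:
        self._stream.close()
        self._conn.close()


def demo() -> list:
    supervisor_end, executor_end = socket.socketpair()
    # The tool body stands in for networked code that only the supervisor may run
    tools = {"web_tools.web_search": lambda query: [f"result for {query}"]}
    worker = threading.Thread(target=supervisor_serve, args=(supervisor_end, tools), daemon=True)
    worker.start()
    proxy = ExecutorProxy(executor_end)
    try:
        return proxy.tool_call("web_tools.web_search", query="kdcube")
    finally:
        proxy.close()          # closing the executor end lets the supervisor loop finish
        worker.join(timeout=2)
```

The design consequence is that secrets and network access never need to exist in the executor process: it only ever sees serialized return values.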

Two execution backends are practical:

[Figure: Isolated execution runtime (ISO runtime container on Docker or Fargate). Supervisor: network access, env secrets, full runtime context, and the Communicator (Redis Pub/Sub → Ingress SSE → client). It runs all bundle tools from tools_descriptor.py: io_tools, web_tools, exec_tools, rendering_tools, react_tools (knowledge search), bundle_data, MCP tools (stdio / http / SSE transports), and bundle-local tools (tools/*.py). Executor: no network, no env secrets, isolated namespace (UID 1001). It runs user_code.py (LLM-generated code); calls such as react.search_knowledge(query=...), web_tools.fetch_url(url), rendering_tools.render_chart(spec), and io_tools.read_file("data.json") all travel over the Unix socket to the Supervisor. It writes only to /workspace/work and /workspace/out. All side effects (network, secrets, Communicator streaming) stay in the Supervisor; the Executor only sees return values.]
🐳
Docker is the default and recommended mode — even in production. Each Processor instance runs on an EC2-backed ECS task, so Docker exec is local and fast. The Processor has a bounded number of in-flight requests and can run Docker containers synchronously within its capacity. Use Docker for live agentic loops.

🐳 Docker (default)

Runs code in an isolated Docker container on the same EC2 host as the Processor. Low latency, ideal for interactive agentic loops. The container shares the host's Docker daemon — fast spin-up, full isolation.

execution:
  runtime:
    mode: "docker"
    enabled: true  # default

☁️ AWS Fargate (async only)

Serverless container on a separate compute plane. Recommended for long-running, non-live workloads — batch data processing, heavy computation, report generation — where startup latency (10–30s) is acceptable. Not suitable for fast interactive agentic loops.

execution:
  runtime:
    mode: "fargate"
    cluster: "arn:aws:ecs:..."
    task_definition: "exec-task"

Exec Environment Variables (Inside Executed Code)

Variable | Description
WORKDIR | Working directory (source, helpers)
OUTPUT_DIR | Output directory (write files here)
EXECUTION_ID | Unique execution identifier
RUNTIME_GLOBALS_JSON | Serialized runtime context (tools, state)
RUNTIME_TOOL_MODULES | Tool module names available
BUNDLE_ROOT | Bundle root path (access your bundle files)
BUNDLE_ID | Current bundle ID
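
Executed code typically reads these variables to decide where to write its results. The snippet below is a minimal sketch of that pattern: `write_result` is a hypothetical helper, and the fallbacks (current directory, "unknown") exist only so the example also runs outside the sandbox.

```python
import json
import os
from pathlib import Path


def write_result(name: str, payload: dict) -> Path:
    """Write a JSON result file into OUTPUT_DIR, tagged with the execution and bundle IDs."""
    out_dir = Path(os.environ.get("OUTPUT_DIR", "."))   # fallback is for local runs only
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "execution_id": os.environ.get("EXECUTION_ID", "unknown"),
        "bundle_id": os.environ.get("BUNDLE_ID", "unknown"),
        **payload,
    }
    target = out_dir / name
    target.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return target
```

A call such as `write_result("summary.json", {"rows": 3})` leaves the file under the output directory, which is where the platform looks for files to track.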

See external-exec-README.md

Knowledge Space

Bundles can expose a searchable knowledge space built from a Git repository's docs, source code, deployment configs, and tests.

return {
    "knowledge": {
        "repo": "https://github.com/org/repo.git",  # "" = local repo
        "ref": "main",
        "docs_root": "app/docs",
        "src_root": "app/src",
        "deploy_root": "app/deploy",
        "tests_root": "app/tests",
        "validate_refs": True
    }
}

  • on_bundle_load(): builds the index once per process (file-locked, signature-cached)
  • pre_run_hook(): reconciles if config changed

Agent access via ks: paths: react.search_knowledge(query=..., limit=5) and react.read(["ks:docs/architecture.md"])

Context, RAG & Conversations

Context RAG Client

# self.ctx_client is ContextRAGClient
results = await self.ctx_client.search(
    query="previous analysis of sales data",
    kind="assistant",   # or "user" | "attachment"
    limit=5
)
artifact = await self.ctx_client.fetch_ctx(["ar:turn_abc.artifacts.summary"])

Conversations API Endpoints

GET  /conversations/{tenant}/{project}
POST /conversations/{tenant}/{project}/fetch
POST /conversations/{tenant}/{project}/{conv_id}/turns-with-feedbacks
POST /conversations/{tenant}/{project}/feedback/conversations-in-period

The react.memsearch tool provides vector search in past turns directly inside the agent loop. The ConversationStore (accessible via BaseWorkflow.store) manages turn payloads, timelines, and artifacts.

Timeline & Context Layout

Each conversation maintains a rolling timeline of turn artifacts stored as artifact:conv.timeline.v1. The timeline is the canonical cross-turn context passed to the LLM. It is structured as an ordered sequence of turn records, each containing user input, assistant output, tool calls, and any attached artifacts.

Cache Points

The platform inserts up to three LLM-level cache checkpoints per turn: prev-turn (the last block of the prior turn), pre-tail (the end of an earlier round in the current turn, two rounds before the tail by default), and tail (the end of the last complete round). These cache points allow the LLM inference layer to reuse the KV-cache for the shared context prefix across rounds and turns, reducing both latency and token cost for multi-turn conversations.

Compaction

When the system prompt plus the accumulated timeline blocks reach 90% of the configured context budget, the platform triggers compaction: older turn ranges are summarized into a compact conv.range.summary block that replaces them in the visible timeline. This is a hard-ceiling guard that ensures context never silently overflows the model's context window. Compaction is transparent to bundle code.
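
The trigger condition from the timeline diagram (system + blocks >= 0.9 x budget) is simple arithmetic. The sketch below expresses it as a predicate; `needs_compaction` is a hypothetical name, and the token counts are made-up inputs rather than output from a real tokenizer.

```python
def needs_compaction(system_tokens: int, block_tokens: list[int], budget: int,
                     threshold: float = 0.9) -> bool:
    """True when the system prompt plus all visible blocks reach the compaction threshold."""
    used = system_tokens + sum(block_tokens)
    return used >= threshold * budget


# 1,500 system tokens + 8,000 block tokens against a 10,000-token budget
over = needs_compaction(1_500, [4_000, 4_000], budget=10_000)
under = needs_compaction(1_000, [3_000, 4_000], budget=10_000)
```

In the first case 9,500 tokens exceed the 9,000-token threshold, so compaction would run before the next decision call; in the second, 8,000 tokens leave the timeline untouched.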

Hosting & File Resources

Your bundle can produce files (PDFs, PNGs, data exports) and make them available via hosted URLs. The platform handles upload, serving, and access control automatically.

# ApplicationHostingService (via BaseWorkflow.hosting_service)
url = hosting.get_artifact_url("fi:turn_123.files/report.pdf")

# Resource Name format
# ef:{tenant}:{project}:chatbot:{stage}:{user_id}:{conv_id}:{turn_id}:{role}:{path}
# Resolved by POST /by-rn with authentication enforced by platform

Files written to OUTPUT_DIR/turn_{id}/files/ during exec are automatically tracked and can be served. User attachments appear as fi:{turn_id}.user.attachments/{filename}.
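
The Resource Name format shown above has ten colon-separated fields, with the path last. A parser therefore needs to cap the number of splits so that colons inside the path survive. The sketch below does this; `ResourceName` and `parse_resource_name` are hypothetical helpers, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceName:
    tenant: str
    project: str
    stage: str
    user_id: str
    conv_id: str
    turn_id: str
    role: str
    path: str

    def __str__(self) -> str:
        return ":".join(["ef", self.tenant, self.project, "chatbot", self.stage,
                         self.user_id, self.conv_id, self.turn_id, self.role, self.path])


def parse_resource_name(rn: str) -> ResourceName:
    """Parse ef:{tenant}:{project}:chatbot:{stage}:{user_id}:{conv_id}:{turn_id}:{role}:{path}."""
    parts = rn.split(":", 9)  # at most 10 fields; the path keeps any further colons
    if len(parts) != 10 or parts[0] != "ef" or parts[3] != "chatbot":
        raise ValueError(f"not a resource name: {rn!r}")
    _, tenant, project, _, stage, user_id, conv_id, turn_id, role, path = parts
    return ResourceName(tenant, project, stage, user_id, conv_id, turn_id, role, path)


sample = "ef:acme:demo:chatbot:prod:u1:c1:turn_123:assistant:files/report.pdf"
rn = parse_resource_name(sample)
```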

Citations & Sources

Citation Tokens

The company was founded in 2015 [[S:1]] and expanded by 2020 [[S:2,3]].
According to multiple sources [[S:1-4]], the trend is clear.
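
The tokens above use three shapes: a single ID, a comma-separated list, and an inclusive range. A small regex-based sketch can expand them into source IDs, for example to check that every cited ID exists in the sources pool. `expand_sids` and `cited_sids` are hypothetical helpers written for this example, not part of the SDK.

```python
import re

# Matches [[S:1]], [[S:2,3]], and [[S:1-4]]
CITATION_RE = re.compile(r"\[\[S:([0-9,\-\s]+)\]\]")


def expand_sids(spec: str) -> list[int]:
    """Expand '1', '2,3', or '1-4' into a list of integer source IDs."""
    sids: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            sids.extend(range(start, end + 1))   # ranges are inclusive
        elif part:
            sids.append(int(part))
    return sids


def cited_sids(text: str) -> list[int]:
    """Return every cited source ID in order of first appearance, without duplicates."""
    seen: list[int] = []
    for match in CITATION_RE.finditer(text):
        for sid in expand_sids(match.group(1)):
            if sid not in seen:
                seen.append(sid)
    return seen


first = cited_sids("The company was founded in 2015 [[S:1]] and expanded by 2020 [[S:2,3]].")
second = cited_sids("According to multiple sources [[S:1-4]], the trend is clear.")
```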

Sources Pool Fields

Field | Description
sid | Source ID (integer, per-conversation, deduplicated)
title | Page or file title
url | URL or file path
source_type | web | file | attachment | manual
objective_relevance | Semantic relevance score (0–1)
published_time_iso | Publication timestamp
favicon_url | Source favicon for UI display

See citations-system.md and source-pool-README.md

Feedback System

POST /conversations/{tenant}/{project}/{conv_id}/turns/{turn_id}/feedback

{ "reaction": "ok", "text": "Very helpful!", "ts": "2026-03-21T10:00:00Z" }
# reaction: ok | not_ok | neutral | null

Your bundle can also emit machine feedback (origin: "machine") for confidence scores or quality checks — additive, not replacing user feedback. Satisfaction rate: ok / (ok + not_ok + neutral).

See feedback-system.md

Bundle Economics & Rate Limiting

from kdcube_ai_app.apps.chat.sdk.solutions.chatbot.entrypoint_with_economic import (
    BaseEntrypointWithEconomics
)

@agentic_workflow(bundle_id=BUNDLE_ID)
class MyBundle(BaseEntrypointWithEconomics):
    @property
    def configuration(self):
        return {
            "economics": {"reservation_amount_dollars": 2.0},
            # ... other bundle defaults
        }

Default Quota Policies

User Type | Concurrent | Req/Day | Req/Month | Tokens/Hour
anonymous | 1 | 2 | 60 | 150k
free | 2 | 100 | 30k | 500k
payasyougo | 2 | 200 | 6k | 1.5M
admin | 10 | unlimited | unlimited | unlimited
⚠️
Always re-raise EconomicsLimitException unchanged. The base report_turn_error() handles this correctly — never catch it silently.
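
In practice this means any broad `except` in your workflow needs a narrower clause ahead of it that lets the limit exception through untouched. The sketch below shows the ordering. The `EconomicsLimitException` class here is a local stand-in defined only so the example runs (import the real one from the SDK), and `run_turn` and `on_error` are hypothetical names.

```python
class EconomicsLimitException(Exception):
    """Local stand-in for the SDK exception, defined only so this sketch is runnable."""


def run_turn(step, on_error):
    """Run one turn; let quota errors pass through unchanged, report everything else."""
    try:
        return step()
    except EconomicsLimitException:
        raise                      # re-raise unchanged: no wrapping, no logging-and-swallowing
    except Exception as exc:
        on_error(exc)              # other failures are reported before propagating
        raise


reported = []


def over_budget():
    raise EconomicsLimitException("quota exceeded")


try:
    run_turn(over_budget, reported.append)
except EconomicsLimitException as exc:
    caught = exc
```

Because the narrow clause comes first, the generic handler never sees the limit exception and `reported` stays empty.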

Widgets & Custom UI

Your bundle can expose React SPA widgets embedded in the platform UI. Define an operation that returns HTML (a React app), and the UI renders it. The React app can in turn call back to POST /operations/{op} for any further interactions.

[Figure: Widget integration flow. The UI panel calls POST /operations/{op} on the Proc Operations API (/bundles/{t}/{p}/operations/{op}). Your bundle's workflow.{op}() returns HTML/SPA. The React SPA is rendered in the UI panel and can POST back to /operations for further actions.]

Example: Return a React SPA

async def my_dashboard(self, **kwargs) -> dict:
    """My Dashboard widget — loaded by UI when user clicks the button."""
    html = """<!DOCTYPE html><html><body>
      <div id="root"></div>
      <script>
        // Your bundled React app
        // Can POST back to /bundles/{tenant}/{project}/operations/my_api
      </script>
    </body></html>"""
    return {"html": html, "content_type": "text/html"}

Platform Built-in Operations

Operation | Description
ai_bundle | Bundle admin dashboard (props editor, status); all bundles inherit this
control_plane | Economics dashboard (usage, billing), via BaseEntrypointWithEconomics
suggestions | Suggested prompts for new conversations
news | News/updates from the bundle

Deploying Your Bundle

Option A: With the KDCube Platform

  1. Push your bundle to Git

    git push origin v1.0.0
  2. Add to bundles.yaml

    - id: "my-bundle@1-0"
      repo: "git@github.com:org/my-bundle.git"
      ref: "v1.0.0"
      module: "my_bundle.entrypoint"
  3. Inject secrets and apply

    kdcube-setup --secrets-set GIT_HTTP_TOKEN=ghp_...
    kdcube-setup --secrets-prompt  # for LLM keys

    The bundle config is applied immediately via Redis — no restart needed. Change the ref and re-run to switch bundle versions on the fly.

  4. Set as default bundle via the Admin Dashboard

    Open the AI Bundle Dashboard (/api/integrations/bundles/{tenant}/{project}/operations/ai_bundle). Your registered bundle appears in the list. Set it as the default_bundle_id for the tenant/project. The change is applied immediately via Redis — no restart needed.

Option B: Standalone (Without Platform)

Bundles can run outside the platform. The SDK is a plain Python package — build your own FastAPI app that imports and invokes it directly. Or build a custom Docker image that runs just your bundle with its own server. The platform's value is in hosting, auth, SSE, storage, economics, and UI — none of that is required for the core agent logic.

Bundle Git Auth

Mode | bundles.yaml repo URL | Secret
SSH key | git@github.com:org/repo.git | SSH key mounted in container
HTTPS token | https://github.com/org/repo.git | GIT_HTTP_TOKEN secret

Example Bundles

Bundle ID | What It Shows | Key Features
react@2026-02-10-02-44 | Basic ReAct + economics | BaseEntrypointWithEconomics, quota policies, web search, news, suggestions
react.doc@2026-03-02-22-10 | Docs reader with knowledge space | Knowledge space (docs/src/deploy), file locking, signature caching
react.mcp@2026-03-09 | ReAct with MCP tools | MCP server integration, custom tool namespacing
openrouter-data@2026-03-11 | OpenRouter LLM provider | OpenRouter integration, model switching, data analysis
with-isoruntime@2026-02-16 | Isolated exec runtime | Docker exec, code generation + execution, file output
customer@marketing | Real-world external bundle | Economics, complex orchestration, custom widgets, news API operation

react.doc: Knowledge Space pattern deep-dive

Builds the index in on_bundle_load() with file locking (multi-worker safe) and signature-based caching (skip rebuilds when nothing changed). Reconciles on config change in pre_run_hook(). The agent searches with ks: paths via react.search_knowledge and react.read.

View entrypoint.py on GitHub →

Monitoring & Observability

The platform exposes operational signals through a combination of log streams, Redis-backed queue metrics, and a health endpoint. These signals are intended for operators monitoring a deployed instance.

Queue Depth

The chat-proc worker consumes a Redis-backed message queue. Queue depth (number of pending messages) is observable via standard Redis list length commands against the configured queue key. Elevated queue depth indicates processor backpressure or under-provisioning.
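
A queue-depth probe needs only a list-length call and a threshold. The sketch below is written against any client exposing `llen(name)`, which redis-py's `Redis` client does. The queue key name and the warning threshold are assumptions for illustration; use the key configured for your deployment.

```python
from typing import Protocol


class SupportsLlen(Protocol):
    """Anything with a Redis-style LLEN call, e.g. a redis.Redis client."""

    def llen(self, name: str) -> int: ...


def queue_depth_status(client: SupportsLlen, queue_key: str, warn_at: int = 100) -> dict:
    """Return the pending-message count and whether it crosses the warning threshold."""
    depth = int(client.llen(queue_key))
    return {"queue": queue_key, "depth": depth, "backpressure": depth >= warn_at}


class FakeRedis:
    """In-memory stand-in so the sketch runs without a Redis server."""

    def __init__(self, sizes: dict[str, int]) -> None:
        self._sizes = sizes

    def llen(self, name: str) -> int:
        return self._sizes.get(name, 0)


# "chat:queue" is a hypothetical key name
status = queue_depth_status(FakeRedis({"chat:queue": 250}), "chat:queue", warn_at=100)
```

Against a live instance you would pass a real client instead of `FakeRedis` and run the probe on a schedule, alerting when `backpressure` stays true.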

Processor Health

Each processor instance reports liveness through its internal health check. In ECS deployments the ALB target group tracks instance health; in Compose deployments a container health check is defined. Unhealthy processor instances are replaced or restarted by the orchestrator.

Log Streams

All platform services emit structured log output to stdout, captured by the container runtime. In ECS deployments logs flow to CloudWatch Logs; in Compose deployments they are available via docker compose logs. Log lines include request IDs, tenant/project scope, bundle ID, and error classification where applicable.

Metrics Endpoint

The Operations REST API (served by chat-proc) exposes operational data accessible to authorized callers. Bundle-level metrics (turn counts, economics state, error rates) can be queried via the Operations API using a privileged token. Prometheus-format scrape endpoints are not currently built in but can be added as a bundle operation.

ℹ️
Monitoring capabilities described here reflect what is documented in source. Dedicated metrics infrastructure (Prometheus, OpenTelemetry export) is not yet built in — see Roadmap for planned observability improvements.

Coming Soon & Current Status

Here's the current state of platform capabilities and what's next on the roadmap:

Feature | Status | Notes
Dynamic widget buttons | Soon | Bundle declares widget list in props (widgets: [{id, label, icon}]); UI reads it and renders buttons dynamically, with no hardcoding needed on the platform side. Requires adding widgets to the bundle props schema and updating the frontend bundle selector.
Custom main view | Soon | Bundle can override the default chat view with a custom HTML/SPA, ideal for dashboards and data-driven UIs that don't need SSE. Requires main_view.operation in bundle props and frontend reading it to render an operation result instead of the chat panel.
Static asset serving from bundle | Available now | A bundle can already serve static files (CSS, JS, images) by encoding them in an operation endpoint that reads from its own package. Adding a GET variant to the Operations API (alongside the existing POST) would make this cleaner; it is a small extension that can be done now.
Bundle marketplace | Soon | Browse and install community bundles from a registry.
Live bundle reload | Available | Bundle code is loaded per-process and cached as a singleton. Config updates (bundle props, role_models, etc.) propagate immediately via Redis pub/sub. Restarting the processor picks up any code changes.
Bundle versioning | Available | Update the ref (branch/tag/commit) in bundles.yaml or via the Admin API and the change applies immediately to new requests, with no downtime. The bundle registry is stored in Redis and updated atomically.
Multi-bundle conversations | Partial | Technically possible today via routing, but both bundles must understand and agree on the shared conversation format (timeline structure, artifact paths, turn state). Requires alignment between bundle developers on the protocol.
Streaming SSE for operations | Soon | Today, widget operations are synchronous REST POST. Adding an SSE channel for operations would allow widgets to stream responses back, enabling live-updating dashboards and progress indicators from widget calls.
Policy DSL | Roadmap | A declarative policy language for expressing security rules (bundle access restrictions, data handling constraints, and tenant-scoped controls) without writing Python enforcement code.
Deterministic Enforcement Engine | Roadmap | A rule evaluation engine that guarantees enforcement decisions are deterministic and auditable, decoupled from LLM judgment. Designed to complement the existing pre-execution gate sequence.
Workflow Invariants | Roadmap | Declarative assertions on workflow state that must hold at defined checkpoints, e.g., "tool X may only be called after step Y" or "context artifact Z must be present before LLM call." Violations halt execution before they propagate.
Cross-Agent Approval Gates | Roadmap | Human-in-the-loop or supervisor-agent approval checkpoints for multi-agent workflows. A sub-agent can pause and request explicit approval from a coordinating agent or operator before proceeding with a high-impact action.

🎯 Priority: Dynamic Widget Registry + Custom Main View

These two features together unlock fully self-describing bundles — a bundle that declares its own UI surface, widgets, and main view without any platform-side configuration. We're reviewing the cleanest implementation path now.

💬
Have feedback or want to prioritize something? Open an issue on the GitHub repo.