ℹ️

For the full interactive reference with code examples and comparison tables, see ReAct v2 — Full Reference.

ReAct Agent (V2)

The ReAct V2 agent is a single autonomous loop — no separate planner, no gate. It starts in runtime.py as a ReactStateV2 with up to max_iterations=15 rounds by default. In each round the agent makes a decision (LLM call) then executes one or more tools. Planning is available as a tool the agent can call on itself — not a separate orchestration layer.

🔁
Plan is a tool, not a component. When the agent calls the react.plan tool, it creates a PlanSnapshot that is tracked as a react.plan block in the timeline. On subsequent rounds the agent can update step statuses (✓ done, ✗ failed, … in-progress). All of this happens within the same loop: no coordinator is needed, and no cache miss occurs from switching to a different system prompt.
📄
Dedicated ReAct v2 page: For the full production reference (timeline model, multi-channel streaming, tool catalog, source management, code examples, and architecture diagrams), see KDCube ReAct v2 Agent.
✍️
Engineering blog: For design write-ups on the attention area, cache strategy, memory model, and why ReAct is not built as pure provider-native tool-calling, see KDCube Engineering Blog.

ReAct Agent V2 — Loop & Tool Integration

[Figure: ReAct Agent V2 loop and tool integration. Context (user message, timeline history, plan state) feeds a loop of at most 15 rounds: (1) Decision, an LLM call via react_decision_stream_v2 that selects tools or an answer; (2) Tool Execution, which runs the selected tools and streams results; (3) Observe & Update State, which appends to the timeline and updates plan status. The loop exits with final_answer, followups, and the timeline. Available tools: web_search / web_fetch, exec_tools.execute_code_python (Docker by default), react.write (canvas / timeline_text), react.patch / react.read, and plan (self-planning). The timeline is streamed block by block as the loop progresses.]

Creating and Running the ReAct Agent

# In your workflow (BaseWorkflow subclass)
react = await self.build_react(
    scratchpad=scratchpad,
    tools_module="my_bundle.tools_descriptor",
    skills_module="my_bundle.skills_descriptor",
    knowledge_space_fn=self._search_knowledge,   # optional
    knowledge_read_fn=self._read_knowledge,      # optional
)
result = await react.run(payload)

Timeline

The timeline (timeline.py) is the single source of truth for turn context. It is persisted as artifact:conv.timeline.v1 and loaded at the start of each turn. A separate artifact:conv:sources_pool tracks all sources referenced in the conversation.

[Figure: Conversation timeline structure. Blocks are ordered oldest to newest and annotated with cache points, the ephemeral announce block, the sources pool, and compaction.]

Block order (oldest to newest):

  • Previous turn (N-1): earlier blocks, ending with assistant.completion.
  • Current turn (N): turn.header, user.prompt, user.attachment.meta (optional).
  • Each round: react.tool.call, react.tool.result. A react.plan block can appear in any round.
  • Final round: assistant.completion, stage.suggested_followups, react.exit, react.state.
  • Ephemeral: react.announce (iteration, plan, budget). Appended after the stable prefix on each decision round, never cached, and cleared on turn exit. Announce is always the last block in context.

Cache checkpoints (three per turn, placed on stable blocks only):

  1. prev-turn: the last block before the current turn.header.
  2. pre-tail: the end of round[total - 1 - offset] (default offset=2).
  3. tail: the end of the last complete round. The stable prefix ends here.

Sources pool: stored as a separate artifact (artifact:conv:sources_pool), not in the block sequence. It is loaded alongside the timeline and appended to context after the cache points and before announce.

Compaction (hard-ceiling guard): when system + blocks >= 0.9 x budget, a conv.range.summary block is inserted at the cut point and earlier blocks are hidden. The cut prefers user/assistant/tool-call boundaries. Cache points are recomputed on the visible blocks.

Render pipeline (each decision call):

  1. TTL pruning
  2. Compaction (if tokens >= 0.9 x budget)
  3. Cache points recomputed on visible blocks
  4. Sources pool appended after the stable prefix
  5. Announce appended last (never cached)

Cache checkpoints are computed by rounds (tool call rounds + final completion). They let LLM context caching reuse the already-processed prefix instead of reprocessing the earlier parts of long conversations. See timeline-README.md, source-pool-README.md, and react-announce-README.md.
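
As a rough illustration of round-based checkpoints, the sketch below computes which rounds would carry the pre-tail and tail cache points. It follows the rule stated in the timeline diagram (pre-tail at round[total - 1 - offset] with a default offset of 2, tail at the last complete round). The function name, the 0-based indexing, and the choice to skip pre-tail when there are too few rounds are assumptions for illustration, not the actual timeline.py logic.

```python
from typing import Optional


def cache_round_indices(total_rounds: int, offset: int = 2) -> dict[str, Optional[int]]:
    """Return 0-based round indices for the pre-tail and tail checkpoints.

    Assumption: rounds are numbered 0..total_rounds-1 and only complete
    rounds are counted. None means "no checkpoint of that kind yet".
    """
    if total_rounds <= 0:
        return {"pre_tail": None, "tail": None}
    tail = total_rounds - 1
    pre_tail = total_rounds - 1 - offset
    return {"pre_tail": pre_tail if pre_tail >= 0 else None, "tail": tail}
```

With five complete rounds this places pre-tail at round index 2 and tail at round index 4, so the two most recent rounds sit between the checkpoints.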

Multi-Channel Streaming

📡
The ReAct agent supports multiple streaming channels simultaneously. The canvas channel streams large content (HTML, Markdown, JSON) for display in a widget panel. The timeline_text channel streams short text visible in the main chat. The internal channel captures agent notes invisible to the user. This is what enables live-updating widget dashboards while the agent is still running. See channeled-streamer-README.md and streaming-widget-README.md.

ReAct Agent Documentation

Full docs live in docs/sdk/agents/react/. Key files:

Plan Tracking

Plans are a first-class timeline concept, not a separate orchestration layer. The agent creates and manages plans through the react.plan tool, and every plan is persisted as an append-only sequence of react.plan snapshot blocks in the timeline.

PlanSnapshot Structure

Each plan snapshot is stored as a timeline block of type react.plan with a stable plan_id and ordered steps. Key fields:

| Field | Description |
| --- | --- |
| plan_id | Stable identifier for the plan lineage (opaque string) |
| steps | Ordered list of step descriptions |
| status | Current plan status |
| origin_turn_id | Turn where the plan was first created |
| last_turn_id | Turn of the most recent update |
| closed_ts / superseded_ts | Terminal timestamps (set when the plan is closed or replaced) |

The react.plan Tool

The agent manages plans through four lifecycle modes:

mode="new"

Creates a fresh plan lineage with a new plan_id and ordered steps. Becomes the current plan immediately and appears in ANNOUNCE.

mode="replace"

Retires an existing plan (marks it superseded) and creates a new lineage as its replacement. The old plan disappears from the open-plans view.

mode="activate"

Re-activates an older open plan as the current plan. Does not create a new plan_id. Progress acknowledgements apply only to the current plan.

mode="close"

Terminates a plan without replacement. The lineage stays in history but disappears from ANNOUNCE.

Plan Block in Timeline

Plans appear in the timeline as react.plan blocks with a stable reread handle:

# Stable latest-snapshot alias for any plan lineage
ar:plan.latest:<plan_id>

# Model creates a plan
react.plan(mode="new", steps=["collect metrics", "compare trends", "draft answer"])

# ANNOUNCE shows open plans with step markers
# [OPEN PLANS]
#   plan_id=plan_alpha (current)
#     □ [1] collect metrics
#     □ [2] compare trends
#     □ [3] draft answer

Step Statuses

The agent reports step progress via notes using status markers. The runtime parses these markers and updates the plan snapshot automatically.

| Marker | Status | Meaning |
| --- | --- | --- |
| ✓ [n] | Done | Step completed successfully |
| ✗ [n] | Failed | Step failed or was abandoned |
| … [n] | In-progress | Step is currently being worked on |
| □ [n] | Pending | Step not yet started (default) |
ℹ️
Progress and lifecycle are separated by round. Status-marker notes are applied only in rounds that are not also changing plan lifecycle. If the agent calls react.plan(mode="activate"|"replace"|"close"), it should acknowledge progress in a later round, not the same one.
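
To make the marker convention concrete, here is a minimal sketch of how status markers in agent notes could be turned into step updates. The regular expression, the function name, and the status strings are assumptions for illustration; the runtime's actual parser may accept other forms.

```python
import re

# Marker-to-status mapping taken from the table above; the status strings
# themselves are hypothetical names.
MARKER_STATUS = {"✓": "done", "✗": "failed", "…": "in_progress", "□": "pending"}

_MARKER_RE = re.compile(r"([✓✗…□])\s*\[(\d+)\]")


def parse_step_markers(notes: str) -> dict[int, str]:
    """Map step number -> status for every marker found in the notes.

    If a step is mentioned more than once, the last marker wins.
    """
    updates: dict[int, str] = {}
    for marker, step in _MARKER_RE.findall(notes):
        updates[int(step)] = MARKER_STATUS[marker]
    return updates
```

For example, notes containing "✓ [1]" and "… [2]" would yield an update marking step 1 as done and step 2 as in progress.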

Multi-Round Plan Tracking

Plans survive across rounds and turns through the following mechanisms:

  • ANNOUNCE lists the last 4 open plans each round, marking the current one explicitly with (current).
  • The stable alias ar:plan.latest:<plan_id> always resolves to the newest snapshot for a lineage, regardless of which turn last updated it.
  • On a new turn, the runtime rehydrates only the current open plan automatically. Older plans must be inspected explicitly via react.read if they become relevant again.
  • When history is compacted, older plans appear in a react.plan.history block with step skeletons, statuses, and stable snapshot_refs for recovery.

A plan lineage is considered open only if its latest snapshot is not closed, superseded, or complete. Only the plan tagged (current) in ANNOUNCE may receive step acknowledgements.
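
The open-plan rule and the four-plan ANNOUNCE window can be sketched as follows. The PlanSnapshot dataclass here is a simplified stand-in with only the fields needed for the check, and the "complete" status string and oldest-to-newest ordering are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanSnapshot:
    """Simplified stand-in for the latest snapshot of one plan lineage."""
    plan_id: str
    status: str = "open"                 # assumed values: "open", "complete"
    closed_ts: Optional[str] = None
    superseded_ts: Optional[str] = None


def is_open(snapshot: PlanSnapshot) -> bool:
    """A lineage is open if it is not closed, superseded, or complete."""
    return (
        snapshot.closed_ts is None
        and snapshot.superseded_ts is None
        and snapshot.status != "complete"
    )


def plans_for_announce(latest: list[PlanSnapshot], limit: int = 4) -> list[PlanSnapshot]:
    """Return the last `limit` open plans (input ordered oldest to newest)."""
    open_plans = [s for s in latest if is_open(s)]
    return open_plans[-limit:]
```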

See plan-README.md

Isolated Execution Runtime

The platform provides a sandboxed code execution runtime — your agent can generate and run arbitrary Python code in complete isolation. The runtime has a clear two-zone model:

  • Supervisor — networked, has env secrets and full runtime context. All bundle tools from tools_descriptor.py execute here, including MCP tools, bundle-local tools, and custom SDK tools. The ChatCommunicator is also available to tool code, streaming events via Redis Pub/Sub to the client SSE.
  • Executor — completely isolated: no network, no env secrets, separate Linux namespace (UID 1001). Runs LLM-generated code. All tool calls are proxied to the Supervisor over a Unix socket. Can only write to /workspace/work and /workspace/out.

Two execution backends are practical:

[Figure: Isolated execution runtime. A single ISO runtime container (Docker or Fargate) holds two zones connected by a Unix socket. Supervisor: network access, env secrets, full runtime context, the Communicator (Redis Pub/Sub → Ingress SSE → client), and all bundle tools from tools_descriptor.py (io_tools, web_tools, exec_tools, rendering_tools, react_tools for knowledge search, bundle_data, MCP tools over stdio / http / SSE, and bundle-local tools in tools/*.py). Executor: no network, no env secrets, isolated namespace (UID 1001); runs the LLM-generated user_code.py, writes only to /workspace/work and /workspace/out, and sees only the return values of tool calls proxied to the Supervisor.]
🐳
Docker is the default and recommended mode — even in production. Each Processor instance runs on an EC2-backed ECS task, so Docker exec is local and fast. The Processor has a bounded number of in-flight requests and can run Docker containers synchronously within its capacity. Use Docker for live agentic loops.

🐳 Docker (default)

Runs code in an isolated Docker container on the same EC2 host as the Processor. Low latency, ideal for interactive agentic loops. The container shares the host's Docker daemon — fast spin-up, full isolation.

execution:
  runtime:
    mode: "docker"
    enabled: true  # default

☁️ AWS Fargate (async only)

Serverless container on a separate compute plane. Recommended for long-running, non-live workloads — batch data processing, heavy computation, report generation — where startup latency (10–30s) is acceptable. Not suitable for fast interactive agentic loops.

execution:
  runtime:
    mode: "fargate"
    cluster: "arn:aws:ecs:..."
    task_definition: "exec-task"

Exec Environment Variables (Inside Executed Code)

| Variable | Description |
| --- | --- |
| WORKDIR | Working directory (source, helpers) |
| OUTPUT_DIR | Output directory (write files here) |
| EXECUTION_ID | Unique execution identifier |
| RUNTIME_GLOBALS_JSON | Serialized runtime context (tools, state) |
| RUNTIME_TOOL_MODULES | Tool module names available |
| BUNDLE_ROOT | Bundle root path (access your bundle files) |
| BUNDLE_ID | Current bundle ID |
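
Generated code can read these variables with the standard library. The sketch below is illustrative only: the helper name and the returned dictionary shape are hypothetical, it assumes RUNTIME_GLOBALS_JSON holds a JSON object, and it leaves RUNTIME_TOOL_MODULES as a raw string because its encoding is not specified here.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def load_exec_context(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the exec environment variables into one dictionary.

    `env` defaults to os.environ; passing a mapping makes the helper
    easy to try outside the sandbox.
    """
    env = os.environ if env is None else env
    return {
        "workdir": Path(env["WORKDIR"]),
        "output_dir": Path(env["OUTPUT_DIR"]),
        "execution_id": env.get("EXECUTION_ID", ""),
        # Assumption: the value is a JSON object.
        "runtime_globals": json.loads(env.get("RUNTIME_GLOBALS_JSON", "{}")),
        # Kept as-is; the separator or encoding is not documented here.
        "tool_modules_raw": env.get("RUNTIME_TOOL_MODULES", ""),
        "bundle_root": env.get("BUNDLE_ROOT", ""),
        "bundle_id": env.get("BUNDLE_ID", ""),
    }
```

A typical use would be to write results under ctx["output_dir"] rather than hard-coding a path.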

See external-exec-README.md

Supervisor vs Executor Architecture

The execution runtime uses a strict two-process model within a single container (Docker or Fargate):

  • The Supervisor (PID 1) bootstraps the full runtime: loads dynamic tool modules, initializes ModelService, KB client, Redis communication, and starts a PrivilegedSupervisor listening on /tmp/supervisor.sock.
  • The Executor subprocess drops privileges to UID 1001, optionally calls unshare(CLONE_NEWNET) for network isolation, and runs the LLM-generated user_code.py.
  • Every tool call from executor code (io_tools, web_tools, react_tools, etc.) is proxied over the Unix socket to the supervisor. The executor never has direct access to network, secrets, or databases.
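
The proxying idea can be illustrated with a toy request/reply exchange over a connected socket pair. Everything about the wire format below (one JSON object per line, the "tool" and "kwargs" keys, the reply shape) is an assumption made for the sketch, and socket.socketpair() stands in for the real /tmp/supervisor.sock listener; the PrivilegedSupervisor protocol itself is not described here.

```python
import json
import socket
import threading


def serve_one_call(conn: socket.socket, tools: dict) -> None:
    """Supervisor side: read one JSON request line, run the tool, reply."""
    with conn, conn.makefile("rwb") as stream:
        request = json.loads(stream.readline())
        try:
            result = tools[request["tool"]](**request["kwargs"])
            reply = {"ok": True, "result": result}
        except Exception as exc:  # report failures instead of crashing
            reply = {"ok": False, "error": str(exc)}
        stream.write(json.dumps(reply).encode("utf-8") + b"\n")
        stream.flush()


def call_tool(conn: socket.socket, tool: str, **kwargs) -> dict:
    """Executor side: send one request and block until the reply arrives."""
    with conn.makefile("rwb") as stream:
        payload = json.dumps({"tool": tool, "kwargs": kwargs})
        stream.write(payload.encode("utf-8") + b"\n")
        stream.flush()
        return json.loads(stream.readline())


def round_trip_demo() -> dict:
    """Run one proxied call between two ends of a socket pair."""
    supervisor_end, executor_end = socket.socketpair()
    tools = {"shout": lambda text: text.upper()}  # hypothetical tool
    worker = threading.Thread(target=serve_one_call, args=(supervisor_end, tools))
    worker.start()
    try:
        return call_tool(executor_end, "shout", text="hi")
    finally:
        worker.join()
        executor_end.close()
```

The point of the design is visible even in this toy: the executor side only serializes a request and receives a return value, while the callable that does the work lives entirely on the supervisor side.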

Docker Execution Mode

🐳
Docker is the default and fastest execution backend. The Processor (chat-proc) spawns a Docker child container on the same EC2 host, sharing workdir/outdir via bind mounts. Startup is near-instant. Network isolation uses unshare(CLONE_NEWNET). Docker mode supports custom images, CPU/memory limits, and PID limits via bundle configuration.

# Docker profile in bundle props
execution:
  runtime:
    profiles:
      docker:
        mode: "docker"
        image: "py-code-exec:latest"
        network_mode: "host"
        cpus: "1.5"
        memory: "2g"
        extra_args: ["--pids-limit", "256"]

Fargate Execution Mode

Fargate exec runs the same supervisor/executor architecture as Docker, but on a dedicated ECS Fargate task instead of a local container. This is the replacement for Docker-on-node in environments where Fargate containers cannot access the Docker daemon.

| Aspect | Docker Mode | Fargate Mode |
| --- | --- | --- |
| Startup latency | Sub-second | 10–30 seconds |
| Workdir sharing | Host bind mount | S3 snapshot + restore |
| Network isolation | unshare(CLONE_NEWNET) | Task-level VPC security group |
| Task lifetime | Container exits, docker rm | ECS task STOPPED |
| Caller waits via | proc.communicate() | Poll describe_tasks until STOPPED |
| Best for | Interactive agentic loops | Batch workloads, heavy computation |

The caller (chat-proc) snapshots the workdir and outdir to S3, launches the Fargate task via ecs.run_task, polls until completion, then restores output zips back to the local workspace. From the agent's perspective, the result contract is identical to Docker mode.
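
The "poll until STOPPED" step might look roughly like the sketch below. It assumes `ecs` is a boto3 ECS client (or any object with a compatible describe_tasks method); the function name, poll interval, and timeout are illustrative choices, not values taken from chat-proc.

```python
import time
from typing import Any, Callable


def wait_for_task_stopped(
    ecs: Any,
    cluster: str,
    task_arn: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll describe_tasks until the task's lastStatus is STOPPED.

    Returns the final task description, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])
        tasks = response.get("tasks", [])
        if tasks and tasks[0].get("lastStatus") == "STOPPED":
            return tasks[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_arn} did not stop in time")
        sleep(poll_seconds)
```

Injecting `sleep` keeps the loop testable without real delays; a caller would then inspect the returned task description before restoring outputs.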

Environment Variable Injection

The Fargate task receives its full runtime context via containerOverrides.environment at run_task time. Key variables include RUNTIME_GLOBALS_JSON (serialized runtime context), RUNTIME_TOOL_MODULES, EXEC_SNAPSHOT URIs for workspace restoration, and all proc-level secrets (Postgres, Redis, API keys). Bundle tool module paths are rewritten from host paths to container paths (/workspace/bundles/{bundle_dir}/...).

Network Isolation & Unix Socket Communication

In both Docker and Fargate modes, the executor subprocess is network-isolated. All tool calls from generated code are routed over a Unix domain socket (/tmp/supervisor.sock) to the supervisor process. The supervisor has full access to Redis, Postgres, ModelService, S3, and external APIs. In Fargate, the supervisor connects to backing services via VPC DNS (Cloud Map private DNS or direct ElastiCache/RDS endpoints).

Error Propagation

Runtime-specific failures (ECS startup failure, Fargate timeout, snapshot restore failure) are surfaced through the same report_text / error envelope as local Docker execution. The agent sees a unified result contract regardless of backend:

# Unified result fields (both Docker and Fargate)
ok: bool          # execution succeeded
artifacts: list   # produced files
error: str        # error message if failed
report_text: str  # human-readable summary
user_out_tail: str       # last lines of user.log
runtime_err_tail: str    # last lines of runtime errors

See distributed-exec-README.md and exec-logging-error-propagation-README.md

Knowledge Space

Bundles can expose a searchable knowledge space built from a Git repository's docs, source code, deployment configs, and tests.

return {
    "knowledge": {
        "repo": "https://github.com/org/repo.git",  # "" = local repo
        "ref": "main",
        "docs_root": "app/docs",
        "src_root": "app/src",
        "deploy_root": "app/deploy",
        "tests_root": "app/tests",
        "validate_refs": True
    }
}

Two lifecycle hooks keep the index current:

  • on_bundle_load(): builds the index once per process (file-locked, signature-cached).
  • pre_run_hook(): reconciles the index if the config changed.

The agent accesses the knowledge space through ks: paths, for example react.search_knowledge(query=..., limit=5) and react.read(["ks:docs/architecture.md"]).

Context, RAG & Conversations

Context RAG Client

# self.ctx_client is ContextRAGClient
results = await self.ctx_client.search(
    query="previous analysis of sales data",
    kind="assistant",   # or "user" | "attachment"
    limit=5
)
artifact = await self.ctx_client.fetch_ctx(["ar:turn_abc.artifacts.summary"])

Conversations API Endpoints

GET  /conversations/{tenant}/{project}
POST /conversations/{tenant}/{project}/fetch
POST /conversations/{tenant}/{project}/{conv_id}/turns-with-feedbacks
POST /conversations/{tenant}/{project}/feedback/conversations-in-period

The react.memsearch tool provides vector search in past turns directly inside the agent loop. The ConversationStore (accessible via BaseWorkflow.store) manages turn payloads, timelines, and artifacts.

Timeline & Context Layout

Each conversation maintains a rolling timeline of turn artifacts stored as artifact:conv.timeline.v1. The timeline is the canonical cross-turn context passed to the LLM. It is structured as an ordered sequence of turn records, each containing user input, assistant output, tool calls, and any attached artifacts.

Cache Points

The platform inserts up to three LLM-level cache checkpoints per turn: prev-turn (the last block of the prior turn), pre-tail (the end of an earlier round in the current turn, a few rounds before the tail), and tail (the end of the last complete round). These cache points let the LLM inference layer reuse the KV cache for the shared context prefix across rounds and turns, reducing both latency and token cost in multi-turn conversations.

Compaction

When the accumulated timeline approaches the configured context budget ceiling, the platform triggers compaction: older turn ranges are summarized into a compact conv.range.summary artifact and replaced in the timeline. This is a hard-ceiling guard — it ensures context never silently overflows the model's context window. Compaction is transparent to bundle code.
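
The trigger condition can be expressed in a few lines. This sketch uses the 0.9 x budget threshold from the timeline diagram; the function name and the idea of passing per-block token counts are assumptions, and the real check may count tokens differently.

```python
from typing import Iterable


def needs_compaction(
    system_tokens: int,
    block_tokens: Iterable[int],
    budget: int,
    ceiling_ratio: float = 0.9,
) -> bool:
    """True when system prompt plus visible blocks reach the ceiling."""
    used = system_tokens + sum(block_tokens)
    return used >= ceiling_ratio * budget
```

For a 10,000-token budget, a 1,000-token system prompt with 8,500 tokens of blocks would trigger compaction, while the same prompt with 1,000 tokens of blocks would not.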

Hosting & File Resources

Your bundle can produce files (PDFs, PNGs, data exports) and make them available via hosted URLs. The platform handles upload, serving, and access control automatically.

# ApplicationHostingService (via BaseWorkflow.hosting_service)
hosting = self.hosting_service  # inside a BaseWorkflow subclass
url = hosting.get_artifact_url("fi:turn_123.outputs/export/report.pdf")

# Resource Name format
# ef:{tenant}:{project}:chatbot:{stage}:{user_id}:{conv_id}:{turn_id}:{role}:{path}
# Resolved by POST /by-rn with authentication enforced by platform

Files written to OUTPUT_DIR/turn_{id}/files/ remain part of the durable workspace tree, while files written to OUTPUT_DIR/turn_{id}/outputs/ are tracked as non-workspace produced artifacts. User-facing downloads should typically come from outputs/ with external visibility. User attachments appear as fi:{turn_id}.user.attachments/{filename}.
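
For orientation, the resource-name format shown above can be split into its fields as in the sketch below. The ResourceName type and the parser are hypothetical helpers, and the sketch assumes that only the trailing path component may contain colons.

```python
from typing import NamedTuple


class ResourceName(NamedTuple):
    tenant: str
    project: str
    stage: str
    user_id: str
    conv_id: str
    turn_id: str
    role: str
    path: str


def parse_resource_name(rn: str) -> ResourceName:
    """Split ef:{tenant}:{project}:chatbot:{stage}:...:{role}:{path}."""
    parts = rn.split(":", 9)  # at most 10 fields; path keeps any extra colons
    if len(parts) != 10 or parts[0] != "ef" or parts[3] != "chatbot":
        raise ValueError(f"not a resource name: {rn!r}")
    _, tenant, project, _, stage, user_id, conv_id, turn_id, role, path = parts
    return ResourceName(tenant, project, stage, user_id, conv_id, turn_id, role, path)
```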

Attachments & Limits

User-uploaded files enter the system via the chat API (SSE or Socket.IO), pass through security scanning, are stored in the ConversationStore, and then flow to two downstream paths: multimodal LLM inference (as base64 blocks) and code execution (as rehosted files in the workspace).

User Upload Flow

When a user submits attachments, the ingress layer enforces size caps and runs security preflight before storage:

  1. Collect raw bytes + metadata (filename, MIME type)
  2. Enforce per-file and total-message size caps
  3. Run ClamAV antivirus scan (when APP_AV_SCAN=1, always enabled in production)
  4. Run preflight validation: MIME-type allowlist via magic sniffing, PDF heuristic checks, ZIP/OOXML structural checks, macro blocking
  5. If allowed, store via ConversationStore.put_attachment()
🛡️
Macro-free policy: Any macro-enabled OOXML file (.docm, .pptm, VBA projects) is rejected at ingress. Generic ZIP archives are also disallowed by default.

Supported File Types

| Category | Accepted Types |
| --- | --- |
| Documents | application/pdf, .docx, .pptx, .xlsx |
| Images | image/jpeg, image/png, image/gif, image/webp |
| Text | text/* (subject to size limit) |

File Rehosting for Execution

For code execution, attachments are rehosted into the execution workspace at workdir/<turn_id>/attachments/<filename>, so generated code can read them as local files inside the sandboxed container.

Artifact Size & Count Limits

| Limit | Value |
| --- | --- |
| Per-image cap | 5 MB (MODALITY_MAX_IMAGE_BYTES) |
| Per-PDF cap | 10 MB (MODALITY_MAX_DOC_BYTES) |
| Total message cap (text + attachments) | 25 MB (MESSAGE_MAX_BYTES) |
| PDF max pages | 500 |
| ZIP max entries | 2,000 |
| ZIP max uncompressed total | 120 MB |
| ZIP max compression ratio | 200x |
| Text file max size | 10 MB |
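
The per-file and total-message caps could be enforced with a check along these lines. The constant names mirror the settings in the table, but the function, the (MIME type, size) input shape, and the interpretation of 1 MB as 1024 * 1024 bytes are assumptions for the sketch.

```python
MB = 1024 * 1024  # assumption: binary megabytes

MODALITY_MAX_IMAGE_BYTES = 5 * MB
MODALITY_MAX_DOC_BYTES = 10 * MB
MESSAGE_MAX_BYTES = 25 * MB


def check_size_caps(text: str, attachments: list[tuple[str, int]]) -> list[str]:
    """Return a list of violations; an empty list means the message passes.

    `attachments` holds (mime_type, size_in_bytes) pairs.
    """
    errors: list[str] = []
    total = len(text.encode("utf-8"))
    for index, (mime, size) in enumerate(attachments):
        total += size
        if mime.startswith("image/") and size > MODALITY_MAX_IMAGE_BYTES:
            errors.append(f"attachment {index}: image exceeds per-image cap")
        elif mime == "application/pdf" and size > MODALITY_MAX_DOC_BYTES:
            errors.append(f"attachment {index}: PDF exceeds per-PDF cap")
    if total > MESSAGE_MAX_BYTES:
        errors.append("message exceeds total size cap")
    return errors
```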

Timeline Truncation Limits

To prevent context blowup, the platform applies truncation policies to older timeline blocks:

| Limit | Default |
| --- | --- |
| User/assistant text truncation | 4,000 chars |
| Tool result text truncation | 400 chars |
| Tool result list items cap | 50 items |
| Tool result dict keys cap | 80 keys |
| Base64 in timeline blocks | 4,000 chars (oversized values are replaced with a placeholder) |
| Sources pool base64 cap | 4,000 chars (dropped if exceeded) |
🔄
Truncated artifacts can be recovered. Use react.read to rehydrate hidden or pruned artifacts when needed. Skills loaded by react.read are pruned in old turns with a placeholder containing the original sk: reference for re-reading.
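
As an illustration of how the tool-result limits might be applied, the sketch below truncates strings, lists, and dictionaries recursively using the defaults from the table. The function name, the recursion, and the truncation suffix are assumptions; the platform's actual policy may differ in detail.

```python
from typing import Any

TRUNCATION_SUFFIX = "...[truncated]"  # hypothetical marker


def truncate_tool_result(
    value: Any,
    max_chars: int = 400,
    max_items: int = 50,
    max_keys: int = 80,
) -> Any:
    """Apply text, list-item, and dict-key caps to a tool result."""
    if isinstance(value, str):
        if len(value) <= max_chars:
            return value
        return value[:max_chars] + TRUNCATION_SUFFIX
    if isinstance(value, list):
        return [
            truncate_tool_result(item, max_chars, max_items, max_keys)
            for item in value[:max_items]
        ]
    if isinstance(value, dict):
        kept = list(value.items())[:max_keys]
        return {
            key: truncate_tool_result(item, max_chars, max_items, max_keys)
            for key, item in kept
        }
    return value  # numbers, booleans, None pass through unchanged
```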

See attachments-system.md and artifacts-limits-README.md

Citations & Sources

Citation Tokens

The company was founded in 2015 [[S:1]] and expanded by 2020 [[S:2,3]].
According to multiple sources [[S:1-4]], the trend is clear.
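
The token forms in the example (a single ID, a comma-separated list, and a range) can be expanded into source IDs with a small parser. This is a sketch based only on the three forms shown; the function name is hypothetical, and other token variants may exist.

```python
import re

# Matches [[S:1]], [[S:2,3]], [[S:1-4]] and mixes such as [[S:1,3-5]].
_CITATION_RE = re.compile(r"\[\[S:(\d+(?:-\d+)?(?:,\s*\d+(?:-\d+)?)*)\]\]")


def extract_source_ids(text: str) -> list[int]:
    """Return the sorted, de-duplicated source IDs cited in the text."""
    sids: set[int] = set()
    for group in _CITATION_RE.findall(text):
        for part in group.split(","):
            part = part.strip()
            if "-" in part:
                start, end = part.split("-", 1)
                sids.update(range(int(start), int(end) + 1))
            else:
                sids.add(int(part))
    return sorted(sids)
```

The resulting IDs correspond to sid values in the sources pool described next.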

Sources Pool Fields

| Field | Description |
| --- | --- |
| sid | Source ID (integer, per-conversation, deduplicated) |
| title | Page or file title |
| url | URL or file path |
| source_type | web, file, attachment, or manual |
| objective_relevance | Semantic relevance score (0–1) |
| published_time_iso | Publication timestamp |
| favicon_url | Source favicon for UI display |

See citations-system.md and source-pool-README.md

Feedback System

POST /conversations/{tenant}/{project}/{conv_id}/turns/{turn_id}/feedback

{ "reaction": "ok", "text": "Very helpful!", "ts": "2026-03-21T10:00:00Z" }
# reaction: ok | not_ok | neutral | null

Your bundle can also emit machine feedback (origin: "machine") for confidence scores or quality checks — additive, not replacing user feedback. Satisfaction rate: ok / (ok + not_ok + neutral).
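
The satisfaction-rate formula can be computed directly from a list of reactions. In this sketch, null reactions are left out of the denominator because the formula only counts ok, not_ok, and neutral; the function name and the choice to return None when nothing is counted are assumptions.

```python
from collections import Counter
from typing import Iterable, Optional

COUNTED = ("ok", "not_ok", "neutral")


def satisfaction_rate(reactions: Iterable[Optional[str]]) -> Optional[float]:
    """Compute ok / (ok + not_ok + neutral); None if nothing is counted."""
    counts = Counter(r for r in reactions if r in COUNTED)
    total = sum(counts.values())
    if total == 0:
        return None
    return counts["ok"] / total
```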

See feedback-system.md