ReAct V3 Agent

The featured ReAct agent is v3: a single autonomous loop with no separate planner and no gate. It preserves the timeline-first runtime model while adding safe multi-action rounds when AI_REACT_AGENT_MULTI_ACTION=safe_fanout is enabled. In normal single-action operation, each round still makes one decision and executes one action: either one tool call or a final-answer action. The base round cap is resolved in order: first from bundle props config.react.max_iterations / react.max_iterations, then from assembly/env ai.react.max_iterations / AI_REACT_MAX_ITERATIONS, and finally from the fallback value 15. Planning is available as a tool the agent can call on itself, not a separate orchestration layer.
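The precedence order for the round cap can be sketched as follows. This is a minimal illustration, not the runtime's actual resolver: the function name resolve_max_iterations, the helper _dig, and the assumption that bundle props and assembly config arrive as nested dictionaries are all hypothetical.

```python
import os
from typing import Any, Mapping, Optional

DEFAULT_MAX_ITERATIONS = 15  # fallback value stated in the text


def _dig(mapping: Mapping[str, Any], dotted: str) -> Optional[Any]:
    """Walk a nested mapping by a dotted key; return None if any part is missing."""
    node: Any = mapping
    for part in dotted.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return None
        node = node[part]
    return node


def resolve_max_iterations(
    bundle_props: Mapping[str, Any],
    assembly: Mapping[str, Any],
    env: Mapping[str, str] = os.environ,
) -> int:
    """Return the first configured round cap in the documented precedence order."""
    candidates = (
        _dig(bundle_props, "config.react.max_iterations"),
        _dig(bundle_props, "react.max_iterations"),
        _dig(assembly, "ai.react.max_iterations"),
        env.get("AI_REACT_MAX_ITERATIONS"),
    )
    for value in candidates:
        if value is not None and value != "":
            return int(value)
    return DEFAULT_MAX_ITERATIONS
```

With this sketch, a bundle that sets react.max_iterations to 8 wins over an assembly value of 20, and an empty configuration yields 15.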

🧠

Why ReAct matters. The ReAct loop — Reason, Act, Observe — is what turns a language model from a text predictor into an agent. A raw LLM can only answer from its frozen weights; the moment it needs to query a database, call an API, read a file, or verify a claim against current data, it hallucinates unless it can act. ReAct closes that gap by interleaving explicit reasoning with real tool calls and feeding the results back into the next reasoning step, so the agent converges on the right answer iteratively instead of guessing in one shot. Just as importantly for production systems, each Reason / Act / Observe step is a discrete, inspectable block — which is exactly where governance lives. Budget caps, tenant boundaries, policy gates, and approval checkpoints all attach to step boundaries, not to opaque prompt internals. Without ReAct, an AI application is either a stateless wrapper around the model (narrow and brittle) or a black box that occasionally touches external systems (unauditable and unsafe). With ReAct, it becomes a system you can reason about, constrain before execution, and prove what happened after — the prerequisite for any agent that handles real customer data, real money, or real consequences.

ReAct also does not treat provider-native tool calling as its model contract. The runtime asks for its own generation shape directly, so thinking, decision JSON, raw code, and other channels do not have to be squeezed into assistant/tool/result framing.

🧪

ReAct V3: v3 is selected with AI_REACT_AGENT_VERSION=v3. In AI_REACT_AGENT_MULTI_ACTION=safe_fanout mode it can accept multiple action requests in the same round by repeating only the <channel:ReactDecisionOutV2>...</channel:ReactDecisionOutV2> block, once per action. The intended model contract still emits one thinking block and one code block per response, but the streamer tolerates repeated declared channels and routes them separately. Accepted multi-action bundles are executed sequentially, not in parallel. Because every action in a bundle is requested in the same response, before any of them has run, an action scheduled later in the round must not depend on results from an earlier action in that bundle. v3 uses the same timeline, working-summary, react.read, react.memsearch, and artifact path model.

🔁

Plan is a tool, not a component. When the agent calls the react.plan tool, it creates a PlanSnapshot that is tracked as a react.plan block in the timeline. On subsequent rounds the agent can update step statuses (✓ done, ✗ failed, … in-progress). All of this happens within the same loop: no coordinator is needed, and there is no cache miss from a different system prompt.

✍️

Engineering blog: For design write-ups on the attention area, cache strategy, memory model, and why ReAct is not built as pure provider-native tool-calling, see KDCube Engineering Blog →

🔌

Separate subagent runtime: ReAct tools and skills are configured through bundle descriptors. Claude Code is configured separately through SDK-managed workspace files, MCP config, and Claude project Skills. See Claude Code Integration →

ReAct V3 Agent — Detailed Loop & Tool Integration

ReAct V3 loop showing rendered context, model decision, agent-called tools including memsearch, observations, append-only timeline, pruning, compaction, and ANNOUNCE.

What the Agent Sees Each Round

Every decision round is rendered from durable runtime state into one model-facing context. The shape is intentional: old conversation memory appears first, the active turn stays readable, and ANNOUNCE stays in the uncached tail where operational facts cannot be hidden behind stale cache.

[COMPACTED PRIOR CONVERSATION MEMORY]   # if older raw turns were compacted
PRUNED PRIOR TURNS                      # working summaries or retrieval rows
RECENT INTACT TURNS                     # newest turns rendered normally
CURRENT TURN                            # current user input, rounds, tools, files
SOURCES POOL                            # citation/source inventory
ANNOUNCE                                # uncached budget, plans, live events, workspace state

The model replies through runtime-owned channels. The primary structured channel is ReactDecisionOutV2, which carries exactly one action per channel instance: call a tool, complete, or exit. In safe fanout mode, the model may emit repeated ReactDecisionOutV2 channel instances in the same response. It must not put multiple JSON decisions into one channel block. Repeated declared channels are tolerated by the streamer: for example, a second thinking block is emitted as another thinking instance. The code channel is parsed as raw executable text, so backticks inside generated HTML, JavaScript, or Python do not hide the </channel:code> boundary.
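To make the channel contract concrete, here is a small non-streaming sketch that splits a complete response into channel instances and parses each ReactDecisionOutV2 instance as one JSON decision. It is an illustration only: the real streamer works incrementally, and the function names split_channels and parse_decisions are hypothetical. Only the tag shape <channel:NAME>...</channel:NAME> is taken from the text above.

```python
import json
import re
from typing import List, Tuple

# Matches <channel:NAME>...</channel:NAME>. The backreference ties each block
# to its own closing tag, so backticks or markup in the body are left alone.
_CHANNEL_RE = re.compile(r"<channel:([A-Za-z0-9_]+)>(.*?)</channel:\1>", re.DOTALL)


def split_channels(response: str) -> List[Tuple[str, str]]:
    """Return (channel_name, raw_body) pairs in emission order."""
    return [(m.group(1), m.group(2)) for m in _CHANNEL_RE.finditer(response)]


def parse_decisions(response: str) -> List[dict]:
    """Parse every ReactDecisionOutV2 instance as exactly one JSON object."""
    decisions = []
    for name, body in split_channels(response):
        if name != "ReactDecisionOutV2":
            continue
        # json.loads raises ValueError if the block holds more than one document.
        decision = json.loads(body)
        if not isinstance(decision, dict):
            raise ValueError("each channel instance must carry one JSON decision")
        decisions.append(decision)
    return decisions
```

Repeated channels simply show up as repeated pairs, and a block that tries to carry two JSON decisions fails to parse instead of being silently accepted.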

Creating and Running the ReAct Agent

# In your workflow (BaseWorkflow subclass)
react = await self.build_react(
    scratchpad=scratchpad,
    tools_module="my_bundle.tools_descriptor",
    skills_module="my_bundle.skills_descriptor",
    knowledge_space_fn=self._search_knowledge,   # optional
    knowledge_read_fn=self._read_knowledge,      # optional
)
result = await react.run(payload)

Timeline

The timeline (timeline.py) is the single source of truth for turn context. It is persisted as artifact:conv.timeline.v1 and loaded at the start of each turn. A separate artifact:conv:sources_pool tracks all sources referenced in the conversation; the timeline payload also carries the current full source rows so react.read and exec fetch_ctx can recover fetched web content.

It is also a live event surface while a turn is running. External events that arrive while a conversation is busy, such as followup, steer, and the same general family of reactive inputs used by forms, wizards, or alert acknowledgements, enter a shared event source for the conversation. Every such event lands on the append-only timeline first. If the event is reactive and the active React turn owns the live listener, the runtime folds it into the current turn and re-enters the loop at the next decision boundary; steer-like controls may also trigger an engineering-layer interrupt that tries to cancel the active generation or cancellable tool phase immediately.

The timeline also records round start explicitly, so a reactive event that arrives while the agent is already thinking is shown inside that open round instead of looking as if it happened before the round existed. If a new reactive event lands after a visible completion attempt, the same turn can later append another assistant.completion; the latest completion keeps the stable path ar:<turn_id>.assistant.completion and earlier visible completions use numbered paths such as ar:<turn_id>.assistant.completion.1. External-event message blocks follow the same path pattern, for example ar:<turn_id>.external.followup.<message_id> or ar:<turn_id>.external.alert.<message_id>. The raw timeline remains append-only; restored conversation order is reconstructed later from truthful event start timestamps rather than from append position alone. React then re-enters with the new reactive material already on the timeline and gets a short bounded finalize phase to close the turn cleanly while preserving the progress made so far.

When context compaction actually starts or completes, the runtime emits chat.compaction on the dedicated chat_compaction transport route. Browser clients and adapters such as Telegram can append this as a short progress item while the same ReAct turn continues running.

Detailed ReAct timeline diagram showing previous turn, current turn blocks, round blocks, three cache checkpoints, sources pool, compaction, render pipeline, and ANNOUNCE.

ReAct timeline visibility diagram showing the durable raw timeline, TTL pruning, compaction, cache points, pruned old turn generations, recent intact turns, current turn, sources pool, and ANNOUNCE.

Cache checkpoints are computed per round (tool-call rounds plus the final completion). They let LLM context caching skip retokenizing earlier parts of long conversations. See timeline-README.md, source-pool-README.md, and react-announce-README.md.

ReAct cache strategy diagram showing previous-turn, pre-tail, and tail checkpoints, tail-only react.hide, TTL pruning, compaction, and uncached tail blocks.

Pruning, Compaction, and Recovery

TTL pruning and hard compaction do different jobs. TTL pruning keeps the cache useful by replacing older visible blocks with compact recovery rows. Hard compaction is the context-window safety valve: it summarizes an older range into su:<turn_id>.conv.range.summary and removes the compacted raw blocks from the visible stream. Neither process deletes artifacts, tool logs, turn logs, or source rows. The visible text becomes a map; logical paths remain the handles for exact recovery. Compaction lifecycle is visible to clients through chat_compaction stream events.

Layer	Visible Shape	Recovery Route
Working summary	`ws:<turn_id>.conv.working.summary` with goal, outcome, facts, refs	`react.read([ws_path])`, then read exact refs
TTL-pruned turn	compact turn data rows or summary cards, not full old chatter	`react.read([ar/tc/fi/so path])`
Compacted range	`[COMPACTED PRIOR CONVERSATION MEMORY]` checkpoint	`react.memsearch` or paths carried by the summary
Exact file	`fi:` logical file path	`react.pull([fi_path])` when code needs a local file

Multi-Channel Streaming

📡

The ReAct agent supports multiple streaming channels simultaneously. The canvas channel streams large content (HTML, Markdown, JSON) for display in a widget panel. The timeline_text channel streams short text visible in the main chat. The internal channel captures internal memory anchors: short user-invisible notes the agent leaves for future turns when it has something stable and reusable to preserve. Final and exit decisions may also emit a hidden summary channel. That becomes the turn's working summary, is indexed for react.memsearch, and is not shown as user-facing assistant text.

These are runtime-defined channels, not tool-call arguments. ReAct can stream raw code, thinking, decision JSON, and widget/subsystem payloads independently instead of forcing generation into a provider-native tool-calling format. Repeated declared channels are handled as repeated channel instances, and raw code is kept isolated even when generated HTML/JS contains backticks.

This is what enables live-updating widget dashboards while the agent is still running. When those dashboards are part of a bundle, they remain normal bundle UI surfaces served by KDCube. A client shell may embed them, but iframe embedding is outside the ReAct protocol and outside the bundle surface model. See channeled-streamer-README.md and streaming-widget-README.md.

Built-In React Tool Surface

The built-in react.* tools are control-plane tools for the loop itself. Bundle tools, MCP tools, web/email tools, and isolated exec tools sit beside them, but these are the primitives the agent uses to manage memory, files, plans, and context size.

Large initial tool results are prompt-capped before the next decision round: the full tc: result remains stored, while the model sees a bounded preview with size metadata, a depth-limited shape, and recovery instructions for react.read or exec ctx_tools.fetch_ctx.
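The idea of a prompt-capped preview can be sketched like this. The cap value, the field names in the returned dictionary, and the helper names shape_of and cap_for_prompt are assumptions made for illustration; the actual preview format is defined by the runtime.

```python
import json
from typing import Any, Dict


def shape_of(value: Any, depth: int = 2) -> Any:
    """Describe the structure of a value down to a fixed depth."""
    if isinstance(value, dict):
        if depth == 0:
            return "{...}"
        return {key: shape_of(item, depth - 1) for key, item in value.items()}
    if isinstance(value, list):
        if depth == 0:
            return "[...]"
        if not value:
            return []
        return [shape_of(value[0], depth - 1), f"... {len(value)} items total"]
    return type(value).__name__


def cap_for_prompt(path: str, result: Any, max_chars: int = 200) -> Dict[str, Any]:
    """Return the full result if small, else a bounded preview with recovery hints."""
    text = json.dumps(result, ensure_ascii=False)
    if len(text) <= max_chars:
        return {"path": path, "truncated": False, "content": result}
    return {
        "path": path,
        "truncated": True,
        "size_chars": len(text),          # size metadata for the model
        "preview": text[:max_chars],      # bounded slice of the stored result
        "shape": shape_of(result),        # depth-limited structure
        "recover_with": f'react.read(["{path}"])',
    }
```

The stored result is never modified; only the view rendered into the next decision round is bounded.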

Tool	Purpose	Typical Use
`react.read`	Reopen logical paths and exact ranges	Read `ar:`, `tc:`, `fi:`, `so:`, `ws:`, `su:`, `sk:`, or `ks:` refs. Large text returns a configured bounded preview by default. For large text files, pass `items=[{path,line_start,line_count}]` from `react.rg` to materialize line-numbered ranges. Text previews report fully visible lines as `[start-end]/total`; a mid-line cut is marked separately. `max_text_symbols` requests a smaller explicit preview, and `stats_only` returns metadata without content. PDF/image payloads are attached whole only when under the raw byte cap.
`react.rg`	Find local files and text regions	Search materialized artifact files by filename regex and text-like files by content regex. Roots may be `files/...`, `outputs/...`, `attachments/...`, `turn_.../...`, or `fi:...`. It does not search unpulled refs or the endless conversation timeline. Returns `size_bytes`, `text_symbols`, `line_count`, `logical_path`, and ready-to-read `read_item` ranges for `react.read`.
`react.memsearch`	Search prior conversation memory	Find summaries by topic, ordinal turn, or time window
`react.pull`	Materialize historical files	Bring old `fi:` refs onto the current worker as readonly local reference material. Pull before checkout when a prior turn path must become editable.
`react.checkout`	Rebuild an editable workspace	Copy prior `files/` paths into the current editable workspace. It is for editing and testing current-turn copies, not for simply reading old outputs or attachments.
`react.write` / `react.patch`	Create or edit current-turn text artifacts	Write Markdown, HTML, JSON, notes, or internal files, and patch current workspace files. Text previews may be line-numbered for reading; those prefixes are display-only and must never be generated in patch or replacement content. `channel="internal"` creates an internal file by default; add `scratchpad=true` only for short inline `react.note` anchors.
`react.plan`	Manage open plans	Create, replace, activate, close, and update step state
`react.hide`	Shrink visible tail blocks	Replace bulky but recoverable content with a short placeholder

Visible read limits are unit-specific and apply per requested path: text previews use text-character and token caps; all payloads use a raw byte cap. Unsupported binaries remain metadata-only and should be inspected through exec or related text/source refs.

ReAct V3 Agent Documentation

Full docs live in docs/sdk/agents/react/. Key files:

Architecture & Flow

Timeline & Artifacts

timeline-README.md — blocks, cache points, compaction
context-caching-README.md — cache checkpoints, TTL pruning, hide
compaction-README.md — hard ceiling and compacted memory
react-announce-README.md — announce channel
source-pool-README.md
artifact-storage-README.md
artifact-discovery-README.md
conversation-artifacts-README.md

Tools & Execution

react-tools-README.md — built-in react.* tools
memory-recovery-path-README.md — memsearch, read, turn index
external-exec-README.md
tool-call-blocks-README.md
event-blocks-README.md
turn-log-README.md
turn-data-README.md

Plan Tracking

Plans are a first-class timeline concept, not a separate orchestration layer. The agent creates and manages plans through the react.plan tool, and every plan is persisted as an append-only sequence of react.plan snapshot blocks in the timeline.

PlanSnapshot Structure

Each plan snapshot is stored as a timeline block of type react.plan with a stable plan_id and ordered steps. Key fields:

Field	Description
`plan_id`	Stable identifier for the plan lineage (opaque string)
`steps`	Ordered list of step descriptions
`status`	Current plan status
`origin_turn_id`	Turn where the plan was first created
`last_turn_id`	Turn of the most recent update
`closed_ts` / `superseded_ts`	Terminal timestamps (set when plan is closed or replaced)
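As a rough mental model, the fields above could be represented with an immutable record where every update produces a new snapshot. This is a sketch under assumptions: the field types, the status strings "open" and "closed", the ISO-string timestamps, and the helper close_plan are hypothetical and not taken from the SDK.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class PlanSnapshot:
    plan_id: str                         # stable across the whole lineage
    steps: Tuple[str, ...]               # ordered step descriptions
    status: str                          # assumed values, e.g. "open" or "closed"
    origin_turn_id: str                  # turn where the plan was first created
    last_turn_id: str                    # turn of the most recent update
    closed_ts: Optional[str] = None      # set when the plan is closed
    superseded_ts: Optional[str] = None  # set when the plan is replaced


def close_plan(snapshot: PlanSnapshot, turn_id: str, now_iso: str) -> PlanSnapshot:
    """Return a new closed snapshot; the earlier one stays untouched (append-only)."""
    return replace(snapshot, status="closed", last_turn_id=turn_id, closed_ts=now_iso)
```

Because the record is frozen, an update cannot mutate history: the timeline gains one more react.plan block while earlier snapshots remain readable.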

The `react.plan` Tool

The agent manages plans through four lifecycle modes:

`mode="new"`

Creates a fresh plan lineage with a new plan_id and ordered steps. Becomes the current plan immediately and appears in ANNOUNCE.

`mode="replace"`

Retires an existing plan (marks it superseded) and creates a new lineage as its replacement. The old plan disappears from the open-plans view.

`mode="activate"`

Re-activates an older open plan as the current plan. Does not create a new plan_id. Progress acknowledgements apply only to the current plan.

`mode="close"`

Terminates a plan without replacement. The lineage stays in history but disappears from ANNOUNCE.

Plan Block in Timeline

Plans appear in the timeline as react.plan blocks with a stable reread handle:

# Stable latest-snapshot alias for any plan lineage
ar:plan.latest:<plan_id>

# Model creates a plan
react.plan(mode="new", steps=["collect metrics", "compare trends", "draft answer"])

# ANNOUNCE shows open plans with step markers
# [OPEN PLANS]
#   plan_id=plan_alpha (current)
#     □ [1] collect metrics
#     □ [2] compare trends
#     □ [3] draft answer

Step Statuses

The agent reports step progress via notes using status markers. The runtime parses these markers and updates the plan snapshot automatically.

Marker	Status	Meaning
`✓ [n]`	Done	Step completed successfully
`✗ [n]`	Failed	Step failed or was abandoned
`… [n]`	In-progress	Step is currently being worked on
`□ [n]`	Pending	Step not yet started (default)

ℹ️

Progress and lifecycle are separated by round. Status-marker notes are applied only in rounds that are not also changing plan lifecycle. If the agent calls react.plan(mode="activate"|"replace"|"close"), it should acknowledge progress in a later round, not the same one.
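The marker table and the round-separation rule can be illustrated with a short parser. This is a sketch, not the runtime's parser: the function names and the status strings are hypothetical, and the exact note syntax the runtime accepts may be stricter or looser than this regular expression.

```python
import re
from typing import Dict

MARKER_STATUS = {"✓": "done", "✗": "failed", "…": "in-progress", "□": "pending"}
_MARKER_RE = re.compile(r"([✓✗…□])\s*\[(\d+)\]")


def parse_step_markers(note: str) -> Dict[int, str]:
    """Map step number to status for every marker in the note (last one wins)."""
    return {int(number): MARKER_STATUS[symbol] for symbol, number in _MARKER_RE.findall(note)}


def apply_markers(statuses: Dict[int, str], note: str, lifecycle_changed: bool) -> Dict[int, str]:
    """Return updated step statuses; markers are ignored in lifecycle rounds."""
    if lifecycle_changed:
        # activate / replace / close happened this round: progress waits for a later round
        return dict(statuses)
    return {**statuses, **parse_step_markers(note)}
```

A note such as "✓ [1] metrics collected" marks step 1 as done, unless the same round also changed the plan lifecycle, in which case the statuses are returned unchanged.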

Multi-Round Plan Tracking

Plans survive across rounds and turns through the following mechanisms:

ANNOUNCE lists the last 4 open plans each round, marking the current one explicitly with (current).
The stable alias ar:plan.latest:<plan_id> always resolves to the newest snapshot for a lineage, regardless of which turn last updated it.
On a new turn, the runtime rehydrates only the current open plan automatically. Older plans must be inspected explicitly via react.read if they become relevant again.
When history is compacted, older plans appear in a react.plan.history block with step skeletons, statuses, and stable snapshot_refs for recovery.

A plan lineage is considered open only if its latest snapshot is not closed, superseded, or complete. Only the plan tagged (current) in ANNOUNCE may receive step acknowledgements.
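The open-plan rule and the ANNOUNCE window can be sketched as a filter over the latest snapshot of each lineage. The status strings, the dictionary layout, and the function name announce_open_plans are assumptions for illustration; the input list is assumed to be ordered from oldest to most recently updated.

```python
from typing import Dict, List

TERMINAL_STATUSES = {"closed", "superseded", "complete"}  # assumed status strings


def announce_open_plans(
    latest_snapshots: List[Dict[str, str]],
    current_plan_id: str,
    limit: int = 4,
) -> List[str]:
    """Render the last `limit` open plans, tagging the current one."""
    open_plans = [s for s in latest_snapshots if s["status"] not in TERMINAL_STATUSES]
    lines = []
    for snapshot in open_plans[-limit:]:
        suffix = " (current)" if snapshot["plan_id"] == current_plan_id else ""
        lines.append(f"plan_id={snapshot['plan_id']}{suffix}")
    return lines


def may_acknowledge(plan_id: str, current_plan_id: str) -> bool:
    """Only the current plan may receive step acknowledgements."""
    return plan_id == current_plan_id
```

Plans that fall outside the window or are no longer open stay recoverable through their ar:plan.latest:<plan_id> alias; they are just not listed.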

See plan-README.md

Isolated Execution Runtime

The platform provides a sandboxed code execution runtime: the agent can generate Python programs, execute them under policy, and receive a normalized result envelope. The runtime has two logical zones. Docker can run those zones in the default split topology with sibling supervisor and executor containers, or in the legacy combined container strategy. Fargate runs the same logical contract in a remote ECS task.

Supervisor — networked, has full runtime context, and resolves settings/secrets through the descriptor-backed provider. All bundle tools from tools_descriptor.py execute here, including MCP tools, bundle-local tools, and custom SDK tools. The ChatCommunicator is also available to tool code, streaming events via Redis Pub/Sub to the client SSE.
Executor — completely isolated: no network, no descriptor payloads, no provider secret material, separate Linux namespace (UID 1001). Runs LLM-generated code. All tool calls are proxied to the Supervisor over a Unix socket. Can only write to /workspace/work and /workspace/out.

Two execution backends are practical:

ReAct isolated execution runtime diagram showing Docker split topology with chat processor, supervisor container, executor container, shared supervisor socket, safe executor mounts, and normalized result return.

🐳

Docker is the default and recommended mode — even in production. Each Processor instance runs on an EC2-backed ECS task, so Docker exec is local and fast. The Processor has a bounded number of in-flight requests and can run Docker containers synchronously within its capacity. Use Docker for live agentic loops.

🐳 Docker Default

Runs code on the same EC2 host as the Processor. Low latency, ideal for interactive agentic loops. Docker supports combined and split container strategies; split gives the executor a separate no-network container with only work, output, logs, and the supervisor socket mounted.

execution:
  runtime:
    mode: "docker"
    enabled: true  # default

☁️ AWS Fargate Async only

Serverless container on a separate compute plane. Recommended for long-running, non-live workloads — batch data processing, heavy computation, report generation — where startup latency (10–30s) is acceptable. Not suitable for fast interactive agentic loops.

execution:
  runtime:
    mode: "fargate"
    cluster: "arn:aws:ecs:..."
    task_definition: "exec-task"

Executor Environment Variables (Generated Code)

Variable	Description
`WORKDIR`	Working directory (source, helpers)
`OUTPUT_DIR`	Output directory (write files here)
`EXECUTION_ID`	Unique execution identifier
`AGENT_IO_CONTEXT`	Limited tool-proxy context for Unix socket calls

Supervisor launch env is different from generated-code env. Docker receives the exec launch payload inline as RUNTIME_GLOBALS_JSON. Fargate receives KDCUBE_EXEC_PAYLOAD_SECRET_ID, an AWS Secrets Manager secret name for temporary launch JSON; the entrypoint calls GetSecretValue, parses the JSON, and restores RUNTIME_GLOBALS_JSON, RUNTIME_TOOL_MODULES, and packaged supervisor env before bootstrap. The supervisor also receives descriptor payloads such as KDCUBE_RUNTIME_ASSEMBLY_YAML_B64, KDCUBE_RUNTIME_BUNDLES_YAML_B64, KDCUBE_RUNTIME_GATEWAY_YAML_B64, KDCUBE_RUNTIME_SECRETS_YAML_B64, and KDCUBE_RUNTIME_BUNDLES_SECRETS_YAML_B64. It materializes those descriptors before tool bootstrap so bundle tools can use normal get_settings(), get_plain(), get_secret(...), bundle props, and get_secret("b:...") bundle-secret lookups. By default descriptor payloads are full; setting execution.runtime.descriptor_payload_scope: active_bundle filters only bundles.yaml and bundles.secrets.yaml to the active caller bundle.

See external-exec-README.md

Supervisor vs Executor Architecture

The execution runtime uses a strict supervisor/executor boundary. In Docker combined, that boundary is inside one py-code-exec container. In Docker split, the supervisor and executor are sibling containers. In Fargate, the same logical contract runs inside the remote exec task.

The Supervisor bootstraps the full runtime: loads dynamic tool modules, initializes ModelService, KB client, Redis communication, and starts a PrivilegedSupervisor listening on the supervisor socket.
The Executor drops privileges to UID 1001, uses runtime-level network isolation, and runs the LLM-generated user_code.py.
Every tool call from executor code (io_tools, web_tools, react_tools, etc.) is proxied over the Unix socket to the supervisor. The executor never has direct access to network, secrets, or databases.
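The proxy pattern described above can be demonstrated in a few lines with a connected socket pair. This is only a sketch of the pattern: the newline-delimited JSON wire format, the tool name echo.upper, and the functions serve_one and call_tool are hypothetical and do not describe the actual supervisor protocol.

```python
import json
import socket
import threading
from typing import Any, Callable, Dict

# Hypothetical tool registry that lives only on the supervisor side.
TOOLS: Dict[str, Callable[..., Any]] = {
    "echo.upper": lambda text: text.upper(),
}


def serve_one(conn: socket.socket) -> None:
    """Supervisor side: read one JSON request line, run the tool, reply."""
    with conn, conn.makefile("rwb") as stream:
        request = json.loads(stream.readline())
        try:
            result = TOOLS[request["tool"]](**request["args"])
            reply = {"ok": True, "result": result}
        except Exception as exc:  # report failures instead of crashing the server
            reply = {"ok": False, "error": str(exc)}
        stream.write(json.dumps(reply).encode() + b"\n")
        stream.flush()


def call_tool(conn: socket.socket, tool: str, **args: Any) -> Dict[str, Any]:
    """Executor side: send the call over the socket and wait for the reply."""
    with conn.makefile("rwb") as stream:
        stream.write(json.dumps({"tool": tool, "args": args}).encode() + b"\n")
        stream.flush()
        return json.loads(stream.readline())


def demo() -> Dict[str, Any]:
    supervisor_end, executor_end = socket.socketpair()  # AF_UNIX pair on POSIX
    worker = threading.Thread(target=serve_one, args=(supervisor_end,))
    worker.start()
    try:
        return call_tool(executor_end, "echo.upper", text="hello")
    finally:
        executor_end.close()
        worker.join()
```

The executor side holds nothing but the socket: the tool registry, and anything a tool needs to reach the network or secrets, exists only in the supervisor's process.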

Docker Execution Mode

🐳

Docker is the default and fastest execution backend. The Processor (chat-proc) spawns a Docker child container on the same EC2 host, sharing workdir/outdir via bind mounts. Startup is near-instant. Network isolation uses unshare(CLONE_NEWNET). Docker mode supports custom images, CPU/memory limits, and PID limits via bundle configuration. For stronger filesystem isolation, configure py_code_exec_container_strategy: "split" so supervisor-only bundle mounts and descriptor material are not mounted into the executor container.

# Docker profile in bundle props
execution:
  runtime:
    profiles:
      docker:
        mode: "docker"
        image: "py-code-exec:latest"
        container_strategy: "split"  # default; combined | split
        network_mode: "host"
        cpus: "1.5"
        memory: "2g"
        extra_args: ["--pids-limit", "256"]

Fargate Execution Mode

Fargate exec runs the same supervisor/executor architecture as Docker, but on a dedicated ECS Fargate task instead of a local container. This is the replacement for Docker-on-node in environments where Fargate containers cannot access the Docker daemon.

Aspect	Docker Mode	Fargate Mode
Startup latency	Sub-second	10-30 seconds
Workdir sharing	Host bind mount	S3 snapshot + restore
Network isolation	`unshare(CLONE_NEWNET)`	Task-level VPC security group
Task lifetime	Container exits, docker rm	ECS task STOPPED
Caller waits via	`proc.communicate()`	Poll `describe_tasks` until STOPPED
Best for	Interactive agentic loops	Batch workloads, heavy computation

The caller (chat-proc) snapshots the workdir and outdir to S3, launches the Fargate task via ecs.run_task, polls until completion, then restores output zips back to the local workspace. From the agent's perspective, the result contract is identical to Docker mode.
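The polling step can be sketched as below. The ecs argument is expected to behave like a boto3 ECS client, whose describe_tasks call returns a tasks list with a lastStatus field; the function name wait_for_stopped, the timeout and interval defaults, and the FakeEcs stand-in are assumptions so the example runs without AWS access.

```python
import time
from typing import Any, Callable, Dict


def wait_for_stopped(
    ecs: Any,
    cluster: str,
    task_arn: str,
    timeout_s: float = 900.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll describe_tasks until the task reports lastStatus == 'STOPPED'."""
    deadline = time.monotonic() + timeout_s
    while True:
        response = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])
        tasks = response.get("tasks", [])
        if tasks and tasks[0].get("lastStatus") == "STOPPED":
            return tasks[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_arn} did not stop within {timeout_s}s")
        sleep(interval_s)


class FakeEcs:
    """Local stand-in that reports STOPPED on the third poll."""

    def __init__(self) -> None:
        self.calls = 0

    def describe_tasks(self, cluster: str, tasks: list) -> Dict[str, Any]:
        self.calls += 1
        status = "STOPPED" if self.calls >= 3 else "RUNNING"
        return {"tasks": [{"taskArn": tasks[0], "lastStatus": status}]}
```

Injecting the sleep function keeps the loop testable, and the deadline ensures a task that never stops surfaces as a timeout rather than an endless wait.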

Environment Variable Injection

The Fargate task receives supervisor launch state through containerOverrides.environment at run_task time. The current Fargate path stores the exec launch payload in AWS Secrets Manager under a name like kdcube/runtime/exec-payloads/<exec_id> and passes that name as KDCUBE_EXEC_PAYLOAD_SECRET_ID. The task entrypoint reads it with GetSecretValue and restores the runtime env; chat-proc then deletes the temporary secret after the task finishes. Platform and bundle config are shipped separately as descriptor payloads (KDCUBE_RUNTIME_*_YAML_B64), not as raw provider API-key env promotion. Bundle tool module paths are rewritten from host paths to container paths (/workspace/bundles/{bundle_dir}/...).

Network Isolation & Unix Socket Communication

In both Docker and Fargate modes, the executor side is network-isolated. All tool calls from generated code are routed over a Unix domain socket to the supervisor side. In Docker split mode the socket is shared through a small socket volume. The supervisor has full access to Redis, Postgres, ModelService, S3, and external APIs. In Fargate, the supervisor connects to backing services via VPC DNS (Cloud Map private DNS or direct ElastiCache/RDS endpoints).

Error Propagation

Runtime-specific failures (ECS startup failure, Fargate timeout, snapshot restore failure) are surfaced through the same report_text / error envelope as local Docker execution. The agent sees a unified result contract regardless of backend:

# Unified result fields (both Docker and Fargate)
ok: bool          # execution succeeded
artifacts: list   # produced files
error: str        # error message if failed
report_text: str  # human-readable summary
user_out_tail: str       # last lines of user.log
runtime_err_tail: str    # last lines of runtime errors
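Because the envelope is the same for both backends, calling code can handle results without knowing where they ran. The sketch below mirrors the fields listed above in a dataclass; the class name ExecResult, the default values, and the summarize helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExecResult:
    ok: bool
    artifacts: List[str] = field(default_factory=list)
    error: str = ""
    report_text: str = ""
    user_out_tail: str = ""
    runtime_err_tail: str = ""


def summarize(result: ExecResult) -> str:
    """Build a one-line status the same way for Docker and Fargate results."""
    if result.ok:
        return f"ok: {len(result.artifacts)} artifact(s)"
    tail = (result.runtime_err_tail or result.user_out_tail).strip()
    last_line = tail.splitlines()[-1] if tail else ""
    message = result.error or result.report_text or "unknown failure"
    return f"failed: {message}" + (f" ({last_line})" if last_line else "")
```

A Fargate timeout and a local Docker crash both arrive as ok=False with an error string, so the same branch covers them.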

See distributed-exec-README.md and exec-logging-error-propagation-README.md

Knowledge Space

Bundles can expose a searchable knowledge space built from a Git repository's docs, source code, deployment configs, and tests.

return {
    "knowledge": {
        "repo": "https://github.com/org/repo.git",  # "" = local repo
        "ref": "main",
        "docs_root": "app/docs",
        "src_root": "app/src",
        "deploy_root": "app/deploy",
        "tests_root": "app/tests",
        "validate_refs": True
    }
}

on_bundle_load() — Builds the index once per process (file-locked, signature-cached)
pre_run_hook() — Reconciles if config changed

The agent accesses the knowledge space through ks: paths, using react.search_knowledge(query=..., limit=5) to search and react.read(["ks:docs/architecture.md"]) to read.

Context, RAG & Conversations

Context RAG Client

# self.ctx_client is ContextRAGClient
results = await self.ctx_client.search(
    query="previous analysis of sales data",
    kind="assistant",   # or "user" | "attachment"
    limit=5
)
artifact = await self.ctx_client.fetch_ctx(["ar:turn_abc.artifacts.summary"])

Conversations API Endpoints

GET  /conversations/{tenant}/{project}
POST /conversations/{tenant}/{project}/fetch
POST /conversations/{tenant}/{project}/{conv_id}/turns-with-feedbacks
POST /conversations/{tenant}/{project}/feedback/conversations-in-period

The react.memsearch tool searches past conversation memory directly inside the agent loop. It has two families: semantic search over indexed snippets and catalog search over Postgres turn-log rows for timeline, ordinal, and temporal questions. That distinction matters: broad questions like "what have we discussed so far?" should use mode="timeline" over targets=["summary"], not a generic semantic query. Questions like "what was the second turn about?" use mode="ordinal". Questions like "what did we discuss in March?" use mode="temporal". The ConversationStore (accessible via BaseWorkflow.store) manages turn payloads, timelines, and artifacts.

Timeline & Context Layout

Each conversation maintains a rolling timeline of turn artifacts stored as artifact:conv.timeline.v1. The timeline is the canonical cross-turn context passed to the LLM. It is structured as an ordered sequence of turn records, each containing user input, assistant output, tool calls, internal notes, working summaries, external-event blocks (such as live followup, steer, form, wizard, and alert events), and any attached artifacts. A single turn may contain multiple prompt-like user entries and multiple visible assistant.completion blocks. The latest assistant completion keeps the stable alias ar:<turn_id>.assistant.completion; earlier visible completions use numbered paths such as ar:<turn_id>.assistant.completion.1. External-event message artifacts use the matching family ar:<turn_id>.external.<kind>.<message_id>. The runtime also surfaces a compact [LIVE TURN EVENTS] area inside ANNOUNCE so the model can orient to same-turn control input without rereading the whole tail.

Cache Points

The platform inserts up to three LLM-level cache checkpoints per turn: prev-turn (the end of the prior turn), pre-tail (just before the current turn's tail), and tail (after the current turn). These cache points allow the LLM inference layer to reuse context prefix KV-cache across turns, reducing both latency and token cost for multi-turn conversations.

Compaction

When the accumulated timeline approaches the configured context budget ceiling, the platform triggers compaction: older turn ranges are summarized into a compact conv.range.summary artifact and replaced in the visible timeline. This is a hard-ceiling guard — it ensures context never silently overflows the model's context window. Working summaries are injected into the compaction prompt, internal notes can be preserved as stable anchors, and consumed followup / steer controls remain visible through preserved event copies because they are treated as first-class user intent rather than disposable transport noise. Compaction is transparent to bundle code.

Hosting & File Resources

Your bundle can produce files (PDFs, PNGs, data exports) and make them available via hosted URLs. The platform handles upload, serving, and access control automatically.

# ApplicationHostingService (via BaseWorkflow.hosting_service)
url = hosting.get_artifact_url("fi:turn_123.outputs/export/report.pdf")

# Resource Name format
# ef:{tenant}:{project}:chatbot:{stage}:{user_id}:{conv_id}:{turn_id}:{role}:{path}
# Resolved by POST /by-rn with authentication enforced by platform

Files written to OUTPUT_DIR/turn_{id}/files/ remain part of the durable workspace tree, while files written to OUTPUT_DIR/turn_{id}/outputs/ are tracked as non-workspace produced artifacts. User-facing downloads should typically come from outputs/ with external visibility. Original user attachments appear as fi:{turn_id}.user.attachments/{filename}. Later external events can also carry attachment payloads. Those keep message-level identity in a separate logical path family such as fi:{turn_id}.external.followup.attachments/{message_id}/{filename}, fi:{turn_id}.external.form.attachments/{message_id}/{filename}, or more generally fi:{turn_id}.external.<kind>.attachments/{message_id}/{filename}, so repeated filenames from different user messages do not collide while still materializing into the live turn timeline.

Attachments & Limits

User-uploaded files enter the system via the chat API (SSE or Socket.IO), pass through security scanning, are stored in the ConversationStore, and then flow to two downstream paths: multimodal LLM inference and code execution. Original turn attachments follow the normal attachment path. Later busy-turn external events may also carry attachment payloads; those belong to the continuation-event contract, but reactive kinds can still fold them into the active turn timeline under the corresponding external.<kind> path family.

📎

Continuation attachments: if a conversation is busy and a reactive external event carries files, ingress hosts those files first and the live event carries only hosted reference metadata (hosted_uri, rn, key, filename, MIME type, message id). If the active ReAct turn owns the live listener, it hydrates readable attachment content from hosting and folds the attachment into the same turn under paths such as fi:<turn_id>.external.followup.attachments/<message_id>/<filename> or, more generally, fi:<turn_id>.external.<kind>.attachments/<message_id>/<filename>. If the event is instead promoted into a later turn, the same hosted attachment payload remains available there too.

User Upload Flow

When a user submits attachments, the ingress layer enforces size caps and runs security preflight before storage:

1. Collect raw bytes + metadata (filename, MIME type)
2. Enforce per-file and total-message size caps
3. Run ClamAV antivirus scan (when APP_AV_SCAN=1; always enabled in production)
4. Run preflight validation: MIME-type allowlist via magic sniffing, PDF heuristic checks, ZIP/OOXML structural checks, macro blocking
5. If allowed, store via ConversationStore.put_attachment()
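
The ordering of these steps can be sketched as a small pipeline. Everything here is illustrative: Upload, Rejected, check_size_caps, and ingest are hypothetical names, and the scan, preflight, and store steps are passed in as plain callables rather than real ClamAV or ConversationStore calls. The byte caps reuse the values from the limits table in this document, read as binary megabytes. Applying the document cap to every non-image file is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

MB = 1024 * 1024
MAX_IMAGE_BYTES = 5 * MB     # MODALITY_MAX_IMAGE_BYTES
MAX_DOC_BYTES = 10 * MB      # MODALITY_MAX_DOC_BYTES
MAX_MESSAGE_BYTES = 25 * MB  # MESSAGE_MAX_BYTES (text + attachments)


@dataclass
class Upload:
    filename: str
    mime: str
    data: bytes


class Rejected(Exception):
    """Raised when an upload fails a size cap."""


def check_size_caps(text: str, uploads: List[Upload]) -> None:
    for up in uploads:
        cap = MAX_IMAGE_BYTES if up.mime.startswith("image/") else MAX_DOC_BYTES
        if len(up.data) > cap:
            raise Rejected(f"{up.filename}: exceeds per-file cap")
    total = len(text.encode("utf-8")) + sum(len(u.data) for u in uploads)
    if total > MAX_MESSAGE_BYTES:
        raise Rejected("message exceeds total size cap")


def ingest(
    text: str,
    uploads: List[Upload],
    av_scan: Callable[[Upload], None],
    preflight: Callable[[Upload], None],
    store: Callable[[Upload], None],
) -> None:
    """Run the checks in order; each check raises to reject the message."""
    check_size_caps(text, uploads)  # size caps before any scanning
    for up in uploads:
        av_scan(up)                 # antivirus scan
        preflight(up)               # type and structure validation
    for up in uploads:
        store(up)                   # nothing is stored unless every check passed
```

Storing only after every file has passed is a design choice of this sketch. The source does not say whether a single rejected file blocks the whole message.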

🛡️

Macro-free policy: Any macro-enabled OOXML file (.docm, .pptm, VBA projects) is rejected at ingress. Generic ZIP archives are also disallowed by default.
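
One common way to detect macro-enabled OOXML is to look for a VBA project part inside the ZIP container, which is usually named vbaProject.bin. The sketch below shows that single heuristic using the standard zipfile module. The function name is hypothetical, and the platform's actual checks may look at other signals, such as content types.

```python
import io
import zipfile


def has_vba_macros(data: bytes) -> bool:
    """Return True if a ZIP-based payload contains a part named vbaProject.bin."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = [n.lower() for n in zf.namelist()]
    except zipfile.BadZipFile:
        return False  # not a ZIP container, so not OOXML
    # Match on the final path segment: word/, xl/, and ppt/ all use the same part name.
    return any(n.rsplit("/", 1)[-1] == "vbaproject.bin" for n in names)
```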

Supported File Types

Category	Accepted Types
Documents	`application/pdf`, `.docx`, `.pptx`, `.xlsx`
Images	`image/jpeg`, `image/png`, `image/gif`, `image/webp`
Text	`text/*` (subject to size limit)
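
Magic sniffing means deciding the type from the file's leading bytes instead of trusting the declared MIME type. The following sketch covers the PDF and image types from the table using their well-known signatures. OOXML and text/* need structural or encoding checks and are left out. The function names, and the rule that the sniffed type must match the declared one, are assumptions for illustration.

```python
from typing import Optional

# Leading-byte signatures for some of the accepted types.
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]

ALLOWED = {"application/pdf", "image/png", "image/jpeg", "image/gif", "image/webp"}


def sniff_mime(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, or None if unknown."""
    for magic, mime in _SIGNATURES:
        if data.startswith(magic):
            return mime
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def is_allowed(data: bytes, declared_mime: str) -> bool:
    """Accept only when the sniffed type is allowlisted and matches the declaration."""
    sniffed = sniff_mime(data)
    return sniffed in ALLOWED and sniffed == declared_mime
```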

File Rehosting for Execution

For code-generated programs, attachments are materialized into the execution workspace as local files inside the sandboxed container. Original prompt attachments resolve under turn_<id>/attachments/<filename>; busy-turn continuation attachments keep their event-scoped identity under paths such as turn_<id>/external/followup/attachments/<message_id>/<filename> and, more generally, turn_<id>/external/<kind>/attachments/<message_id>/<filename>.
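
The relationship between logical fi: attachment paths and workspace-relative paths can be sketched with a regular expression. This mapping is inferred from the path patterns in this document and is not a documented API. The function name is hypothetical, and the sketch assumes the turn id contains no dots and already matches the workspace directory name (for example turn_123).

```python
import re

_ATTACHMENT = re.compile(
    r"^fi:(?P<turn>[^.]+)\.(?:user|external\.(?P<kind>[^.]+))\.attachments/(?P<rest>.+)$"
)


def workspace_path(logical: str) -> str:
    """Map a logical attachment path to its workspace-relative location."""
    m = _ATTACHMENT.match(logical)
    if m is None:
        raise ValueError(f"not an attachment path: {logical!r}")
    if m["kind"] is None:
        # Original prompt attachment.
        return f"{m['turn']}/attachments/{m['rest']}"
    # Continuation attachment: keep the event kind and message id in the path.
    return f"{m['turn']}/external/{m['kind']}/attachments/{m['rest']}"
```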

Artifact Size & Count Limits

Limit	Value
Per-image cap	5 MB (`MODALITY_MAX_IMAGE_BYTES`)
Per-PDF cap	10 MB (`MODALITY_MAX_DOC_BYTES`)
Total message cap (text + attachments)	25 MB (`MESSAGE_MAX_BYTES`)
PDF max pages	500
ZIP max entries	2,000
ZIP max uncompressed total	120 MB
ZIP max compression ratio	200x
Text file max size	10 MB
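
The three ZIP limits are the usual defences against archive bombs. A minimal sketch of such a check with the standard zipfile module follows. The function name is hypothetical, "120 MB" is read as binary megabytes, and the ratio is computed over the whole archive. A per-entry ratio is an equally plausible reading of the table.

```python
import io
import zipfile

MAX_ENTRIES = 2_000
MAX_UNCOMPRESSED = 120 * 1024 * 1024  # "120 MB", read here as MiB (assumption)
MAX_RATIO = 200


def check_zip_limits(data: bytes) -> None:
    """Raise ValueError if the archive metadata violates any of the ZIP limits."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
    if len(infos) > MAX_ENTRIES:
        raise ValueError("too many entries")
    uncompressed = sum(i.file_size for i in infos)
    compressed = sum(i.compress_size for i in infos)
    if uncompressed > MAX_UNCOMPRESSED:
        raise ValueError("uncompressed total too large")
    if compressed > 0 and uncompressed / compressed > MAX_RATIO:
        raise ValueError("compression ratio too high")
```

The sizes come from the archive's own directory, which a hostile file can misreport. A stricter check would also count bytes while actually decompressing.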

Timeline Truncation Limits

To prevent context blowup, the platform applies truncation policies to older timeline blocks:

Limit	Default
User/assistant text truncation	4,000 chars
Tool result text truncation	400 chars
Tool result list items cap	50 items
Tool result dict keys cap	80 keys
Base64 in timeline blocks	4,000 chars (oversized replaced with placeholder)
Sources pool base64 cap	4,000 chars (dropped if exceeded)
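
A minimal sketch of how the text, list, and dict caps from the table could be applied to an older block is shown below. The function names and the format of the truncation marker are assumptions. The platform's real placeholders may look different.

```python
from typing import Any

TEXT_CAP = 4_000      # user/assistant text
TOOL_TEXT_CAP = 400   # tool result text
LIST_CAP = 50         # tool result list items
DICT_CAP = 80         # tool result dict keys


def truncate_text(text: str, cap: int = TEXT_CAP) -> str:
    """Cut text at the cap and append a marker saying how much was dropped."""
    if len(text) <= cap:
        return text
    return text[:cap] + f"... [truncated {len(text) - cap} chars]"


def truncate_tool_result(value: Any) -> Any:
    """Recursively apply the text, list, and dict caps to a tool result."""
    if isinstance(value, str):
        return truncate_text(value, TOOL_TEXT_CAP)
    if isinstance(value, list):
        return [truncate_tool_result(v) for v in value[:LIST_CAP]]
    if isinstance(value, dict):
        kept = list(value.items())[:DICT_CAP]
        return {k: truncate_tool_result(v) for k, v in kept}
    return value
```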

🔄

Truncated artifacts can be recovered. Use react.read to rehydrate hidden or pruned artifacts when needed. Ranged reads are normal timeline result blocks; after TTL pruning their placeholders preserve the path and line/text-symbol range so the same range can be read again. Skills loaded by react.read are pruned in old turns with a placeholder containing the original sk: reference for re-reading.

Memory Recovery Path

Pruning and compaction are allowed to remove old raw blocks from the visible prompt because the runtime preserves recovery handles. The agent follows a short route instead of rereading everything:

[Figure: ReAct memory recovery diagram showing exact path reads, summary reads, react.memsearch, reconstructed turn index, and namespace resolution for ar, tc, ws, su, fi, so, sk, and ks paths.]

visible exact path
  -> react.read([path])
  -> react.pull([fi_path]) if execution needs a local file

visible summary path (ws:/su:)
  -> react.read([summary_path])
  -> react.read(["ar:<turn_id>.react.turn.index"]) if refs are incomplete
  -> react.read([ar_or_tc_or_so_path, ...]) or react.pull([fi_path, ...])

topic only
  -> react.memsearch(query, targets=["summary", "user", "assistant", "attachment"])
  -> read the returned refs or the returned turn_index_path

broad conversation overview
  -> react.memsearch(mode="timeline", targets=["summary"], order="asc", top_k=N)
  -> summarize returned working summaries in turn order

ordinal clue
  -> react.memsearch(mode="ordinal", ordinal=2, targets=["summary", "user", "assistant"])

temporal clue
  -> react.memsearch(mode="temporal", from="2026-03-01T00:00:00Z", to="2026-04-01T00:00:00Z", targets=["summary", "user", "assistant"])
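
The routing above amounts to picking the cheapest recovery handle that matches the available clue. The sketch below expresses that choice as plain data. The recovery_plan function, the "tool" / "args" dictionary shape, and the "paths" argument name are hypothetical. Only the tool names, modes, and targets are taken from the routes listed here, and the sketch covers just the first call of each route.

```python
from typing import Any, Dict, List, Optional

ToolCall = Dict[str, Any]


def recovery_plan(
    path: Optional[str] = None,
    topic: Optional[str] = None,
    ordinal: Optional[int] = None,
    needs_local_file: bool = False,
) -> List[ToolCall]:
    """Pick the first tool calls for a recovery attempt, most specific clue first."""
    if path is not None:
        calls: List[ToolCall] = [{"tool": "react.read", "args": {"paths": [path]}}]
        if needs_local_file and path.startswith("fi:"):
            calls.append({"tool": "react.pull", "args": {"paths": [path]}})
        return calls
    if ordinal is not None:
        return [{"tool": "react.memsearch", "args": {
            "mode": "ordinal", "ordinal": ordinal,
            "targets": ["summary", "user", "assistant"]}}]
    if topic is not None:
        return [{"tool": "react.memsearch", "args": {
            "query": topic,
            "targets": ["summary", "user", "assistant", "attachment"]}}]
    # No clue at all: fall back to a chronological overview of summaries.
    return [{"tool": "react.memsearch", "args": {
        "mode": "timeline", "targets": ["summary"], "order": "asc"}}]
```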

The turn index path ar:<turn_id>.react.turn.index is not stored as another timeline block. It is reconstructed on demand from the persisted turn log and artifact metadata, and it lists the turn's summaries, messages, events, tools, artifacts, and sources with short semantic hints.

See attachments-system.md and artifacts-limits-README.md

Citations & Sources

Citation Tokens

The company was founded in 2015 [[S:1]] and expanded by 2020 [[S:2,3]].
According to multiple sources [[S:1-4]], the trend is clear.
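
The token syntax shown above (single IDs, comma lists, and ranges) can be expanded with a short parser. This is a sketch under the assumption that ranges are inclusive and IDs are non-negative integers. The function name is hypothetical, and malformed tokens such as a dangling hyphen raise ValueError rather than being repaired.

```python
import re
from typing import List

_TOKEN = re.compile(r"\[\[S:([0-9,\-\s]+)\]\]")


def expand_citations(text: str) -> List[List[int]]:
    """Return the source IDs named by each [[S:...]] token, in order of appearance."""
    result = []
    for match in _TOKEN.finditer(text):
        sids: List[int] = []
        for part in match.group(1).split(","):
            part = part.strip()
            if "-" in part:
                lo, hi = (int(x) for x in part.split("-", 1))
                sids.extend(range(lo, hi + 1))  # inclusive range
            elif part:
                sids.append(int(part))
        result.append(sids)
    return result
```

For the two example sentences, this yields [[1], [2, 3]] and [[1, 2, 3, 4]] respectively.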

Sources Pool Fields

Field	Description
`sid`	Source ID (integer, per-conversation, deduplicated)
`title`	Page or file title
`url`	URL or file path
`source_type`	web | file | attachment | manual
`objective_relevance`	Semantic relevance score (0–1)
`published_time_iso`	Publication timestamp
`favicon_url`	Source favicon for UI display
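
As an illustration of a per-conversation, deduplicated source pool, the sketch below models the fields from the table and assigns integer IDs on insertion. The Source and SourcesPool classes are hypothetical, and two choices are assumptions rather than documented behaviour: deduplication is keyed on the exact URL, and IDs start at 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Source:
    sid: int
    title: str
    url: str
    source_type: str  # web | file | attachment | manual
    objective_relevance: Optional[float] = None
    published_time_iso: Optional[str] = None
    favicon_url: Optional[str] = None


@dataclass
class SourcesPool:
    _by_url: Dict[str, Source] = field(default_factory=dict)

    def add(self, title: str, url: str, source_type: str, **extra) -> int:
        """Return the existing sid for a known URL, else assign the next integer."""
        existing = self._by_url.get(url)
        if existing is not None:
            return existing.sid
        source = Source(len(self._by_url) + 1, title, url, source_type, **extra)
        self._by_url[url] = source
        return source.sid

    def all(self) -> List[Source]:
        return list(self._by_url.values())
```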

See citations-system.md and source-pool-README.md

Feedback System

POST /conversations/{tenant}/{project}/{conv_id}/turns/{turn_id}/feedback

{ "reaction": "ok", "text": "Very helpful!", "ts": "2026-03-21T10:00:00Z" }
# reaction: ok | not_ok | neutral | null

Your bundle can also emit machine feedback (origin: "machine") for confidence scores or quality checks. Machine feedback is additive and does not replace user feedback. The satisfaction rate is computed as ok / (ok + not_ok + neutral).
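
The satisfaction formula can be written as a short function. The function name is hypothetical. The sketch assumes null reactions are left out of the denominator, as the formula suggests, and it returns None when there is nothing to rate. Whether machine feedback is counted is not stated in the source, so the function simply uses whatever reactions it is given.

```python
from collections import Counter
from typing import Iterable, Optional


def satisfaction_rate(reactions: Iterable[Optional[str]]) -> Optional[float]:
    """ok / (ok + not_ok + neutral); null reactions are left out of the count."""
    counts = Counter(r for r in reactions if r in ("ok", "not_ok", "neutral"))
    rated = counts["ok"] + counts["not_ok"] + counts["neutral"]
    return counts["ok"] / rated if rated else None
```

With two ok, one not_ok, and one neutral reaction the rate is 2 / 4 = 0.5.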

See feedback-system.md