A timeline-native, single-agent decision loop designed for precision reasoning, structured multi-channel output, and long-lived conversational memory — built for the demands of production AI applications.
Naming note: The runtime is referred to as "ReAct v2", "React v2", and "ReAct Agent" across different parts of the codebase and documentation. These names refer to the same system; this page uses ReAct v2 as the canonical form.
ReAct v2 is KDCube's production decision-loop agent. It implements a single-agent Reason + Act loop running over a shared conversation timeline, where every tool call, artifact, plan update, and final answer is recorded as an ordered block in that timeline. The timeline acts as the single source of truth for all in-turn context, history, and compaction.
Unlike multi-agent pipelines that pass context through separate message buses, ReAct v2
keeps all state in one timeline artifact (conv.timeline.v1) and a companion
source pool (conv:sources_pool). This design makes every turn reproducible,
debuggable, and efficiently cacheable.
One ReAct agent drives the full decision loop per turn. No separate coordinator or final-answer generator is required in the reference implementation. A lightweight optional Gate agent runs only for new conversations to extract a title.
The timeline is built from contribute (persistent) and announce (ephemeral) blocks. Agents read a rendered snapshot at every round — not a raw message array — giving precise control over what is cached and what is fresh.
Tool results, artifact paths, and source citations are recorded as typed blocks with stable logical paths. The agent always has a verifiable audit trail for every action taken.
A tag-based streaming protocol lets the model emit multiple logical outputs — thinking, answer, JSON sidecars, follow-up suggestions — in a single LLM call, each routed to the correct consumer.
Production AI apps face three recurring problems: context drift (what did we discuss three turns ago?), output fragmentation (how do I show reasoning and a structured result at the same time?), and cost at scale (how do I avoid re-feeding the whole conversation on every round?). ReAct v2 addresses all three directly through its timeline design, multi-channel streamer, and dual-checkpoint caching system.
The timeline is the authoritative, ordered log of everything that happened in a conversation. It is loaded at turn start, updated as the turn progresses, and persisted at turn end as a single JSON artifact.
Contribute (persistent) blocks are written via `ctx_browser.contribute()`, saved into the timeline, and visible in all future renders. Used for:

- Tool calls and results (`react.tool.call`, `react.tool.result`)
- Plan snapshots (`react.plan`) and acknowledgements (`react.plan.ack`)
- Final answers (`assistant.completion`)

Announce (ephemeral) blocks are appended to the rendered tail when `include_announce=True` and are never persisted as part of the main timeline. Used for high-frequency state signals such as iteration count, remaining budget, and active-plan status.
The announce model keeps high-frequency state signals out of the cached timeline. This preserves cache hits across rounds while still giving the agent fresh situational awareness on every decision call.
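The contribute/announce split can be sketched as a minimal timeline model. This is illustrative only: the `Timeline` class and its internals below are assumptions, not the SDK API; only the `contribute` / `include_announce` names come from the docs.

```python
# Minimal sketch of the contribute/announce split (illustrative; the
# Timeline class and its internals are assumptions, not the SDK API).
class Timeline:
    def __init__(self):
        self.blocks = []           # persistent, cached across rounds
        self.announce_blocks = []  # ephemeral, rebuilt every round

    def contribute(self, block: dict) -> None:
        # Persisted: appears in all future renders and in the saved artifact.
        self.blocks.append(block)

    def announce(self, block: dict) -> None:
        # Ephemeral: visible only in the next render, never persisted.
        self.announce_blocks.append(block)

    def render(self, include_announce: bool = True) -> list[dict]:
        tail = self.announce_blocks if include_announce else []
        return self.blocks + tail  # announce always sits at the uncached tail

tl = Timeline()
tl.contribute({"type": "user.prompt", "text": "hello"})
tl.announce({"type": "react.budget", "iterations_left": 5})

print(len(tl.render(include_announce=True)))   # 2: prompt + budget signal
print(len(tl.render(include_announce=False)))  # 1: persistent block only
```

Because the announce tail is rebuilt rather than persisted, the cached prefix of the render stays byte-stable across rounds.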
A turn proceeds in five steps:

1. Load timeline: `ctx_browser.load_timeline()` fetches `artifact:conv.timeline.v1` and `artifact:conv:sources_pool`, hydrating in-memory state for the turn.
2. Contribute prompt: the user prompt and any attachments are contributed as persistent blocks, joining the timeline block stream.
3. Gate (new conversations only): the gate agent renders the timeline without sources or announce, emits a title block, and contributes it. All subsequent turns skip the gate entirely.
4. Decision loop: each round, render the timeline (with sources + announce), call the LLM, execute the chosen tool, and contribute results back into the timeline. Repeat until the agent emits a final answer or the iteration budget is exhausted.
5. Persist: both `conv.timeline.v1` and `conv:sources_pool` are written back to storage. The turn log (`artifact:turn.log`) records the current-turn blocks for fast next-turn reconstruction.
When the rendered timeline would exceed the model's context budget, the runtime compacts
earlier blocks into a single conv.range.summary block. The compacted blocks
are removed from the persisted payload; future renders start from the summary forward.
Separately, when a session cache_ttl_seconds is configured, blocks older
than the TTL are replaced with truncated placeholders on render. A system.message
block is appended advising the agent to use react.read(path) to restore
any specific logical path.
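The compaction step described above can be sketched as a single edit over the block list. This is illustrative: the summarization call and cut-point policy shown here are assumptions, not SDK code.

```python
# Sketch of edit-over-rewrite compaction: blocks before the cut point are
# replaced by one conv.range.summary block (summarize is a stand-in for
# whatever summarization the runtime actually performs).
def compact(blocks: list[dict], cut: int, summarize) -> list[dict]:
    if cut <= 0:
        return blocks
    summary = {
        "type": "conv.range.summary",
        "text": summarize(blocks[:cut]),  # e.g. an LLM summarization call
    }
    # Future renders start from the summary forward.
    return [summary] + blocks[cut:]

blocks = [{"type": "user.prompt", "text": f"turn {i}"} for i in range(6)]
compacted = compact(blocks, cut=4,
                    summarize=lambda bs: f"{len(bs)} blocks summarized")
print([b["type"] for b in compacted])
# ['conv.range.summary', 'user.prompt', 'user.prompt']
```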
# Typical timeline block sequence (condensed)
[TURN turn_A]
user.prompt
user.attachment.meta
stage.gate # optional, new conv only
stage.react
react.tool.call # tool params
react.tool.result # rendered artifact metadata
react.plan # JSON snapshot of plan
react.plan.ack # human-readable ack
assistant.completion # [sources_used: [1,2,3]]
[TURN turn_B]
user.prompt
...
[SOURCES POOL] # appended at tail (uncached)
[ANNOUNCE] # appended at tail (uncached)
In-loop tools (react.*)
Standard LLM tool calling routes tool invocations to external executors. ReAct v2 supplements
this with a set of in-loop react.* tools that operate directly
on the timeline and the agent's data spaces — without leaving the decision loop.
Naming note: Documentation references both react.stream and react.file for output writing. Earlier drafts sometimes used react.write. The authoritative tool names per current docs are react.stream (streaming to display/file) and react.file (write to file). Use those names in new bundles.
- `ks:` (Knowledge Space): read-only. Reference files prepared by the system: docs, indexes, cloned repos. Accessed via `react.read("ks:path/to/doc.md")`.
- `fi:` (OUT_DIR): read/write during the turn. Per-turn execution artifacts. Files map to `fi:<turn_id>.files/<relpath>`. Writable via `react.file` / `react.stream`.
- `workspace:` (future): not yet implemented. A shared, writable workspace that persists across turns, designed for copilot-style, long-running project flows.
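The `fi:` mapping can be sketched as a simple path translation. The resolver function and the physical OUT_DIR layout shown here are assumptions for illustration, not the SDK's actual resolver.

```python
import posixpath

# Sketch: map a logical fi: path to a physical OUT_DIR location
# (hypothetical helper; the real layout may differ).
def resolve_fi(logical: str, out_dir: str) -> str:
    assert logical.startswith("fi:")
    turn_and_rel = logical[len("fi:"):]          # "<turn_id>.files/<relpath>"
    turn_id, _, relpath = turn_and_rel.partition(".files/")
    return posixpath.join(out_dir, turn_id, "files", relpath)

print(resolve_fi("fi:turn_123.files/report.md", "/out"))
# /out/turn_123/files/report.md
```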
| Tool | Purpose | Key behavior | Path family |
|---|---|---|---|
| react.read | Load an existing artifact into timeline context | Emits a status block first (dedup check), then artifact blocks. Re-exposes hidden artifacts. Clears hidden flag. | `fi:` `ar:` `so:` `su:` `tc:` `ks:` |
| react.stream | Stream or generate content for a new artifact | `kind=display` shows to user; `kind=file` records as file. Content goes to the named channel (canvas / timeline_text). | OUT_DIR-relative path under `<turn_id>/files/` |
| react.file | Write content to a file and emit result metadata | Stores file on disk; emits `react.tool.result` metadata block. Suitable for non-streamed file writes. | OUT_DIR-relative path under `<turn_id>/files/` |
| react.patch | Patch an existing file; supports unified diff | If patch starts with `---`/`+++`/`@@`, it is applied as a unified diff; otherwise full replacement. Emits patch text as result. | OUT_DIR-relative path under `<turn_id>/files/` |
| react.memsearch | Semantic search over past turns | Returns compact snippets with turn_id, timestamps, relevance scores. Targets: assistant / user / attachment. | Conversation index (no path argument) |
| react.hide | Replace a large in-context snippet with a placeholder | Original content remains retrievable via `react.read`. Only blocks within the editable tail window can be hidden. | `ar:` `fi:` `tc:` `so:` `ks:` |
| react.search_files | Safely enumerate files under OUT_DIR or workdir | Returns paths + sizes + logical paths. Does not load content. Use `react.read` on results to load. | `outdir` / `workdir` prefixes only |
| react.search_knowledge | Search knowledge space (bundle-provided) | Available only when the active bundle registers the tool (e.g., the react.doc bundle). Returns ranked doc hits. | `ks:` namespaces |
Every tool call follows a strict ordering contract: parameters must be supplied in the
documented field order (first field, second field, etc.). If the agent emits notes
with a decision, a react.notes block is contributed before the tool call.
Protocol errors and validation notices are emitted as react.notice blocks,
visible to the agent on the next round.
Artifact paths use stable logical namespaces (fi:, ar:, tc:, so:).
Physical execution paths returned by bundle namespace resolvers are valid only inside that
exec runtime and are explicitly not valid as inputs to react.* tools.
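The path contract can be sketched as a simple validator. The accepted logical prefixes follow the namespaces named in this section, but the function itself is an assumption, not the SDK API.

```python
# Sketch: reject physical exec paths as react.* tool inputs; accept only
# stable logical namespaces (hypothetical validator, not SDK code).
LOGICAL_PREFIXES = ("fi:", "ar:", "tc:", "so:", "su:", "ks:")

def is_valid_tool_path(path: str) -> bool:
    return path.startswith(LOGICAL_PREFIXES)

print(is_valid_tool_path("fi:turn_123.files/report.md"))  # True
print(is_valid_tool_path("/tmp/exec/sandbox/report.md"))  # False: physical path
```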
ReAct v2's output layer is built on the Channeled Streamer
(versatile_streamer.py), a tag-based protocol that routes
a single LLM stream into multiple named logical channels, each with independent
format, citation replacement, and subscriber fanout.
The model wraps each logical output in XML-like channel tags:
# Model output (single LLM call, multiple channels)
<channel:thinking>
Let me check the document structure first...
</channel:thinking>
<channel:answer>
Based on the report [[S:1]], the key findings are...
</channel:answer>
<channel:followup>
{"followups": ["Show me the full table", "Compare with last quarter"]}
</channel:followup>
The stream_with_channels() function parses these tags incrementally,
routing each chunk to the correct channel handler as it streams. Citation tokens
([[S:n]]) are replaced per-channel without modifying the stored raw output.
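A reduced version of the tag routing can be sketched as follows. This is illustrative only: `stream_with_channels` parses chunks incrementally with far more care around split tags and citations, while this hypothetical helper works on the complete text for clarity.

```python
import re

# Sketch: route a complete tagged output into per-channel buffers
# (hypothetical helper, not the SDK's incremental parser).
def route_channels(raw: str) -> dict[str, str]:
    # \1 backreference ensures open/close tags name the same channel.
    pattern = re.compile(r"<channel:(\w+)>(.*?)</channel:\1>", re.S)
    return {name: body.strip() for name, body in pattern.findall(raw)}

raw = (
    "<channel:thinking>Check the report first.</channel:thinking>"
    "<channel:answer>Key findings [[S:1]] follow.</channel:answer>"
)
out = route_channels(raw)
print(sorted(out))      # ['answer', 'thinking']
print(out["thinking"])  # Check the report first.
```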
- `thinking`: internal reasoning trace. Routed to `marker="thinking"`. Typically markdown format. Can be shown in a dedicated reasoning UI panel or suppressed from the end user.
- `answer`: user-visible main response. Routed to `marker="answer"`. Supports markdown with live citation replacement (`[[S:n]]` → linked references).
- `followup`: JSON list of suggested follow-up prompts. Routed to a `chat.followups` step. Parsed incrementally for live UI rendering.
- Composite (canvas): when composite streaming is enabled, JSON content is parsed field-by-field and each attribute delta is streamed to the `canvas` marker for live structured UI updates.
- `usage`: JSON sidecar with token counts, model metadata, and source IDs used in the answer. Enables client-side analytics without touching the main answer stream.
- Decision channels: structured JSON channels carrying the agent's next action (tool call + params + reasoning). Parsed against a Pydantic model for protocol validation.
Each channel can carry one or more subscribers — side-effect handlers
that run alongside the primary emit function. A subscriber on the
usage channel, for example, can write source IDs to a timeline record
streamer while the answer channel continues streaming to the user uninterrupted.
# Illustrative adaptation (from docs pattern)
from kdcube_ai_app.apps.chat.sdk.streaming.versatile_streamer import (
ChannelSpec, ChannelSubscribers, stream_with_channels
)
channels = [
ChannelSpec(name="thinking", format="markdown", replace_citations=False, emit_marker="thinking"),
ChannelSpec(name="answer", format="markdown", replace_citations=True, emit_marker="answer"),
ChannelSpec(name="followup", format="json", replace_citations=False, emit_marker="followup"),
ChannelSpec(name="usage", format="json", replace_citations=False, emit_marker="answer"),
]
results, meta = await stream_with_channels(
svc=svc,
messages=[system_msg, user_msg],
role="answer.generator.regular",
channels=channels,
emit=_emit_wrapper,
agent="my.react.agent",
artifact_name="response",
sources_list=sources_list,
subscribers=ChannelSubscribers().subscribe("usage", _record_usage_sids),
return_full_raw=True,
)
answer_text = results["answer"].raw
used_sids = results["answer"].used_sources
service_err = (meta or {}).get("service_error")
Citation tokens ([[S:1]], [[S:1,3]], [[S:2-4]])
are replaced at stream time for markdown and text channels,
and replaced with <sup class="cite"> tags for html
channels. The stored raw output is never modified; replacement happens
only in the bytes sent to the client.
A stateful per-channel citation tokenizer handles tokens split across chunk
boundaries, ensuring [[S:n]] is never partially emitted to the client.
This allows citation-heavy responses to stream cleanly without client-side parsing.
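The split-token handling can be sketched as a small stateful filter. This is a simplified assumption: the real tokenizer in the streamer is per-channel and more general than this hypothetical class.

```python
import re

# Sketch: hold back a possibly-incomplete [[S:...]] token at the end of
# each chunk so it is never partially emitted (hypothetical, not SDK code).
class CitationBuffer:
    TOKEN = re.compile(r"\[\[S:[\d,\-\s]+\]\]")

    def __init__(self, resolve):
        self.resolve = resolve  # e.g. turn a token into a markdown link
        self.pending = ""

    def feed(self, chunk: str) -> str:
        text = self.pending + chunk
        # Keep any trailing partial token (an unclosed "[[...") for later.
        cut = text.rfind("[[")
        if cut != -1 and "]]" not in text[cut:]:
            self.pending, text = text[cut:], text[:cut]
        else:
            self.pending = ""
        return self.TOKEN.sub(lambda m: self.resolve(m.group()), text)

buf = CitationBuffer(resolve=lambda tok: "<cite>")
out = buf.feed("Findings [[S:")    # token split across chunks
out += buf.feed("1]] confirm it.")
print(out)  # Findings <cite> confirm it.
```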
For managed JSON artifacts, the CompositeJsonArtifactStreamer can
be attached to a JSON channel. As the model streams a JSON object, each top-level
field is parsed incrementally and emitted as a per-attribute delta to a separate
marker (typically canvas). This enables live structured UI rendering
before the full JSON closes.
The sources pool is a per-conversation registry of canonical source rows accumulated across the entire conversation lifecycle — not just the current turn.
Sources enter the pool from several origins:

- Web results returned by `web_tools.web_search` and `web_tools.web_fetch`
- User attachments of image type (`image/*`) for rendering/embedding
- Turn files of image type (`image/*`)
- Skill sources declared in `sources.yaml` when a skill is loaded via `react.read`

Non-image file types (XLSX, PPTX, DOCX, PDFs, archives) are not added to the sources pool. Only images are eligible as attachment/file sources.
Sources are merged by normalized URL (or physical_path for local files).
Once a source is assigned a sequential ID (sid), that ID is
stable for the entire conversation. Duplicate URLs reuse the existing SID;
only genuinely new sources receive the next integer.
This means [[S:3]] in turn 2 refers to the same source as [[S:3]]
in turn 12. Clients can rehydrate citations by matching sources_used SIDs
in any artifact against the current sources_pool in timeline.json.
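The stable-SID merge can be sketched as follows. The URL normalization shown (lowercasing scheme/host, stripping trailing slashes, dropping fragments) is a simplifying assumption; the SDK's actual normalization rules may differ.

```python
from urllib.parse import urlsplit, urlunsplit

# Sketch: merge sources by normalized URL, assigning stable sequential SIDs
# (hypothetical helper; normalization rules here are assumptions).
def normalize(url: str) -> str:
    p = urlsplit(url)
    return urlunsplit((p.scheme.lower(), p.netloc.lower(),
                       p.path.rstrip("/") or "/", p.query, ""))

class SourcesPool:
    def __init__(self):
        self.by_key: dict[str, int] = {}
        self.rows: list[dict] = []

    def add(self, url: str, title: str) -> int:
        key = normalize(url)
        if key in self.by_key:
            return self.by_key[key]  # duplicate URL: reuse existing SID
        sid = len(self.rows) + 1     # genuinely new: next integer
        self.by_key[key] = sid
        self.rows.append({"sid": sid, "url": url, "title": title})
        return sid

pool = SourcesPool()
print(pool.add("https://Example.com/report/", "Report"))  # 1
print(pool.add("https://example.com/report", "Report"))   # 1 (same source)
print(pool.add("https://example.com/other", "Other"))     # 2
```

Because `add` never reassigns an existing key, `[[S:3]]` resolves to the same row no matter which turn cites it.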
The full pool is stored as artifact:conv:sources_pool. A compact snapshot
(sid, title, url, short text, and limited metadata) is embedded in the timeline artifact
for fast local access. The compact snapshot is rendered as a single SOURCES POOL
tail block when the timeline is rendered with include_sources=True.
# Load specific source rows from docs pattern
react.read(["so:sources_pool[1-5]"])
react.read(["so:sources_pool[1,3,7]"])
# Inline citation tokens (docs pattern)
[[S:1]] # single source
[[S:1,3]] # multiple sources
[[S:2-4]] # range of sources
# Note: only web sources (http/https) should be cited as evidence.
# Image sources are for rendering only, not evidence citations.
Because SIDs are stable across turns, a source indexed as [[S:5]]
from a web fetch in turn 1 remains [[S:5]] in the final answer
in turn 15. The streamer resolves citations from the current pool state at
stream time, so the client always receives correct hyperlinks regardless of
when the source was first added.
ReAct v2 approaches memory not as a separate subsystem but as a set of layered strategies on the timeline: caching, compaction, TTL pruning, semantic search, and selective hiding all operate on the same block stream.
The context browser inserts up to three cache checkpoints per rendered timeline, enabling providers that support prompt caching (such as Anthropic) to serve most of the context from cache on repeated calls. Two checkpoints bracket the stable prefix of earlier rounds, and a pre-tail checkpoint sits `offset_rounds` before the tail round (when enough rounds exist). Sources and announce blocks are appended after all checkpoints and remain uncached every round, ensuring the agent always sees fresh source data and plan status without invalidating the stable prefix cache.
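The effect on caching can be sketched as marking which blocks fall before the last checkpoint. This is illustrative only: provider cache mechanics (such as Anthropic-style cache breakpoints) and the placement logic below are assumptions, not the context browser's actual implementation.

```python
# Sketch: mark the stable prefix cacheable and leave the tail fresh
# (hypothetical; real checkpoint placement is handled by the context browser).
def mark_cacheable(blocks: list[dict], pre_tail_offset: int) -> list[dict]:
    checkpoint = max(len(blocks) - pre_tail_offset, 0)
    return [
        {**b, "cacheable": i < checkpoint}  # prefix: served from cache
        for i, b in enumerate(blocks)
    ]

blocks = [{"type": f"round_{i}"} for i in range(5)]
blocks += [{"type": "sources_pool"}, {"type": "announce"}]  # uncached tail
marked = mark_cacheable(blocks, pre_tail_offset=3)
print([b["cacheable"] for b in marked])
# [True, True, True, True, False, False, False]
```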
When the visible timeline exceeds the model budget, earlier blocks are
compacted into a conv.range.summary block at the cut point.
The original blocks are removed from the persisted payload; future renders
start from the summary forward. This is an edit-over-rewrite
approach — the timeline is surgically trimmed, not discarded and restarted.
When cache_ttl_seconds is set in the session configuration,
blocks from previous turns are replaced with compact truncated placeholders
after the TTL expires. A system.message block is appended to
explain the pruning and guide the agent to restore specific paths via
react.read(path). This mechanism keeps the active context window
efficient for long-lived sessions without losing the ability to
recall prior content.
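TTL pruning can be sketched as a pass over the block list. The timestamp field and placeholder format below are assumptions for illustration, not SDK code.

```python
import time

# Sketch: replace expired blocks with truncated placeholders that keep the
# logical path for later react.read restoration (hypothetical, not SDK code).
def ttl_prune(blocks: list[dict], ttl_seconds: float, now: float) -> list[dict]:
    pruned = []
    for b in blocks:
        if now - b["ts"] > ttl_seconds:
            pruned.append({
                "type": b["type"],
                "path": b["path"],
                "text": b["text"][:40] + "... [pruned; react.read(path) to restore]",
                "ts": b["ts"],
            })
        else:
            pruned.append(b)
    return pruned

now = time.time()
blocks = [
    {"type": "react.tool.result", "path": "fi:turn_1.files/big.md",
     "text": "x" * 500, "ts": now - 7200},        # 2h old: pruned
    {"type": "user.prompt", "path": "ar:turn_2.prompt",
     "text": "latest question", "ts": now - 60},  # fresh: kept intact
]
out = ttl_prune(blocks, ttl_seconds=3600, now=now)
print(len(out[0]["text"]) < 100, out[1]["text"])  # True latest question
```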
react.memsearch — Persistent recall
Even after compaction or TTL pruning, prior turn content is indexed semantically.
react.memsearch allows the agent to query that index for relevant
snippets from any historical turn, surfacing them with scores and timestamps.
The agent can then selectively reload full content via react.read.
# Docs pattern: semantic recall from past turns
react.memsearch(
query="database schema discussed last week",
targets=["assistant", "user"],
top_k=5,
days=30
)
# Returns: [{turn_id, text, score, ts}, ...]
react.hide — Selective context pruning
Within a single turn, the agent can replace large blocks in the editable tail
window with short placeholders using react.hide. The original
content remains accessible via react.read. This is useful for
hiding large file artifacts that are no longer needed in the active context
window, freeing space for new tool results.
react.hide is constrained to the editable tail window — blocks before the pre-tail cache checkpoint cannot be hidden. This prevents cache invalidation from inadvertent edits to stable prefixes.
Plans are tracked as explicit react.plan JSON snapshot blocks
in the timeline, updated each time the decision agent acknowledges progress
in its reasoning notes. A human-readable react.plan.ack block
is emitted alongside. The active plan is surfaced in the announce section
each round; closed or completed plans remain in the timeline history but are
no longer re-announced.
# Plan snapshot block (docs pattern, simplified)
{
"type": "react.plan",
"mime": "application/json",
"path": "ar:<turn_id>.react.plan.<plan_id>",
"text": {
"plan_id": "plan_abc123",
"origin_turn_id": "turn_001",
"steps": [
{ "n": 1, "label": "Gather sources", "status": "done" },
{ "n": 2, "label": "Draft report", "status": "pending" }
]
}
}
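Updating a plan step can be sketched as producing a new snapshot plus the ack text that accompanies it. The helper below is an assumption for illustration, not the SDK API.

```python
import copy

# Sketch: mark a plan step done and produce the human-readable ack text
# that accompanies the new react.plan snapshot (hypothetical helper).
def advance_plan(plan: dict, step_n: int) -> tuple[dict, str]:
    updated = copy.deepcopy(plan)  # snapshot semantics: never mutate in place
    for step in updated["steps"]:
        if step["n"] == step_n:
            step["status"] = "done"
            ack = f"Step {step_n} '{step['label']}' completed."
            return updated, ack
    raise ValueError(f"no step {step_n} in plan {plan['plan_id']}")

plan = {
    "plan_id": "plan_abc123",
    "steps": [
        {"n": 1, "label": "Gather sources", "status": "done"},
        {"n": 2, "label": "Draft report", "status": "pending"},
    ],
}
new_plan, ack = advance_plan(plan, 2)
print(new_plan["steps"][1]["status"], "|", ack)
# done | Step 2 'Draft report' completed.
```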
Examples below are labeled either FROM DOCS PATTERN (directly reproduced or lightly formatted from SDK documentation) or ILLUSTRATIVE ADAPTATION (derived structure; not a verbatim API contract).
from kdcube_ai_app.apps.chat.sdk.streaming.versatile_streamer import (
ChannelSpec, ChannelSubscribers, stream_with_channels
)
# Define channels emitted by the model
channels = [
ChannelSpec(
name="answer",
format="markdown",
replace_citations=True,
emit_marker="answer",
),
ChannelSpec(
name="usage",
format="json",
replace_citations=False,
emit_marker="answer",
),
]
# Attach a subscriber to the "usage" channel for side-effect fanout
subscribers = ChannelSubscribers().subscribe("usage", _usage_json_fanout)
# Invoke streaming (docs pattern; see versatile_streamer.py)
results, meta = await stream_with_channels(
svc=svc,
messages=[system_msg, user_msg],
role="answer.generator.regular",
channels=channels,
emit=_emit_wrapper,
agent="my.agent",
artifact_name="report",
sources_list=sources_list,
subscribers=subscribers,
return_full_raw=True,
)
answer_raw = results["answer"].raw
used_sources = results["answer"].used_sources
service_err = (meta or {}).get("service_error")
from kdcube_ai_app.apps.chat.sdk.streaming.versatile_streamer import ChannelSpec, stream_with_channels
from kdcube_ai_app.apps.chat.sdk.streaming.artifacts_channeled_streaming import CompositeJsonArtifactStreamer
# Answer channel carries JSON; composite streamer fans it out to canvas
channels = [
ChannelSpec(name="answer", format="json", replace_citations=False, emit_marker="answer"),
ChannelSpec(name="usage", format="json", replace_citations=False, emit_marker="answer"),
]
results = await stream_with_channels(
svc=svc,
messages=[system_msg, user_msg],
role="answer.generator.regular",
channels=channels,
emit=emit_delta,
agent="my.agent",
artifact_name="my.json.artifact",
composite_cfg={"artifactA": "path.to.schema"},
composite_channel="answer", # route JSON channel into composite streamer
composite_marker="canvas", # emit per-attribute deltas to canvas
)
# Step 1: Sources are accumulated across turns into conv:sources_pool.
# The pool is loaded at turn start via ctx_browser.load_timeline().
# Step 2: Decision agent loads specific sources by SID range (docs pattern)
# Inside the react decision loop:
react.read(["so:sources_pool[1-5]"]) # loads sources 1 through 5
react.read(["so:sources_pool[1,3,7]"]) # loads sources 1, 3, 7
# Step 3: Agent cites sources inline in generated text
# The streamer replaces [[S:n]] tokens during streaming (never in stored raw)
answer_text = """
According to the analysis [[S:1]], the platform supports
multi-channel streaming [[S:2,3]]. Cache efficiency results
in up to 60% cost reduction [[S:1]].
"""
# Step 4: Client rehydrates citations using sources_used SIDs from the artifact
# matched against current sources_pool in timeline.json
used_sids = results["answer"].used_sources # e.g. [1, 2, 3]
pool = timeline_json["sources_pool"]
refs = [s for s in pool if s["sid"] in used_sids]
# Semantic search over past turns — useful after compaction or TTL pruning
# Inside the react decision loop (react.memsearch is a react.* in-loop tool):
react.memsearch(
query="authentication configuration from last session",
targets=["assistant", "user"],
top_k=5,
days=90
)
# Returns blocks like:
# [{"turn_id": "turn_123", "text": "...", "score": 0.84, "ts": "2026-02-01T...Z"}]
# Then restore the full content if needed:
react.read(["fi:turn_123.files/config.yaml"])
Conceptual comparison only. KDCube does not publish direct benchmarks or first-party technical comparisons with third-party systems. The table below describes architectural design properties of ReAct v2 as documented in the SDK, and contrasts them with common characteristics of general-purpose agentic frameworks (e.g., tool-calling patterns in Claude Artifacts, ChatGPT Code Interpreter, or generic LangGraph-style agents). These comparisons are reasoned from design intent, not empirically measured.
| Capability | KDCube ReAct v2 | General agentic frameworks / tool-calling agents |
|---|---|---|
| Context representation | Single ordered timeline artifact with typed blocks; every tool call, result, and plan stored as a block with stable logical paths. | Typically a message array or in-memory list; tool results appended as assistant/tool messages. Less structured for audit or replay. |
| Multi-channel streaming output | Tag-based protocol (`<channel:thinking>`, `<channel:answer>`, etc.) with per-channel format, citation replacement, and subscriber fanout in a single LLM call. | Typically single output stream or separate API calls. Structured output and reasoning trace usually require multiple round-trips or post-processing. |
| Prompt cache efficiency | Dual-checkpoint + pre-tail caching strategy. Stable prefix is cached across rounds; sources/announce remain uncached. Designed to minimize cache invalidation during multi-round loops. | Depends on provider and framework. Many frameworks rebuild the full message array each call, reducing cache hit rate. Cache strategy often manual or absent. |
| Cross-turn source continuity | Sources pool persisted as a conversation-level artifact with stable SIDs. Citations (`[[S:n]]`) consistent across all turns in a conversation. | Web search results typically ephemeral per turn. Cross-turn citation continuity requires custom implementation. |
| In-loop context management tools | `react.hide`, `react.memsearch`, `react.read`: agent controls context compaction and recall directly from within the decision loop. | Context management is generally external (framework truncation, summarization). Agent typically cannot selectively load/hide prior content. |
| Plan tracking with timeline persistence | Explicit `react.plan` JSON snapshots + `react.plan.ack` blocks in the timeline. Active plan surfaced via announce. Plan history is auditable in the block stream. | Plans are usually in-prompt instructions or ephemeral state. No standardized plan snapshot or audit trail in the context artifact. |
| Contribute vs announce separation | Explicit split: persistent contribute blocks vs ephemeral announce signals. High-frequency state (iteration, budget, time) never pollutes the cached timeline. | System messages and injected context usually rebuilt each call, increasing cache miss risk. No formal separation of ephemeral vs persistent signals. |
| Protocol validation on tool calls | `react.notice` blocks emitted on protocol errors. Strict parameter ordering enforced per tool spec. Agent sees validation feedback on the next round. | Tool call validation generally provider-side (JSON schema). Runtime protocol feedback to the agent typically absent or requires custom middleware. |
| Open source / self-hostable | Available via the KDCube open-source AI app repository. Deployable to Kubernetes, local, or cloud infrastructure. | Varies. LangGraph and LlamaIndex are open source. Claude Artifacts and ChatGPT Code Interpreter are hosted-only. |
ReAct v2 is designed as a production-first SDK rather than a research agent framework. The architectural choices prioritize operational properties — cache efficiency, auditability, context control — that matter in deployed applications with real users and cost constraints.