A timeline-native, single-agent decision loop designed for precision reasoning, structured multi-channel output, and long-lived conversational memory — built for the demands of production AI applications.
Naming note: The runtime is referred to as "ReAct v2", "React v2", and "ReAct Agent" across different parts of the codebase and documentation. These names refer to the same system; this page uses ReAct v2 as the canonical form.
ReAct v2 is KDCube's production decision-loop agent. It implements a single-agent Reason + Act loop running over a shared conversation timeline, where every tool call, artifact, plan update, and final answer is recorded as an ordered block in that timeline. The timeline acts as the single source of truth for all in-turn context, history, and compaction.
Unlike multi-agent pipelines that pass context through separate message buses, ReAct v2
keeps all state in one timeline artifact (conv.timeline.v1) and a companion
source pool (conv:sources_pool). This design makes every turn reproducible,
debuggable, and efficiently cacheable.
One ReAct agent drives the full decision loop per turn. No separate coordinator or final-answer generator is required in the reference implementation. A lightweight optional Gate agent runs only for new conversations to extract a title.
The timeline is built from contribute (persistent) and announce (ephemeral) blocks. Agents read a rendered snapshot at every round — not a raw message array — giving precise control over what is cached and what is fresh.
Tool results, artifact paths, and source citations are recorded as typed blocks with stable logical paths. The agent always has a verifiable audit trail for every action taken.
A tag-based streaming protocol lets the model emit multiple logical outputs — thinking, answer, JSON sidecars, follow-up suggestions — in a single LLM call, each routed to the correct consumer.
Production AI apps face three recurring problems: context drift (what did we discuss three turns ago?), output fragmentation (how do I show reasoning and a structured result at the same time?), and cost at scale (how do I avoid re-feeding the whole conversation on every round?). ReAct v2 addresses all three directly through its timeline design, multi-channel streamer, and dual-checkpoint caching system.
The timeline is the authoritative, ordered log of everything that happened in a conversation. It is loaded at turn start, updated as the turn progresses, and persisted at turn end as a single JSON artifact.
Blocks written via ctx_browser.contribute() are saved into the timeline and appear in all future renders. Used for:
- Tool calls and results (react.tool.call, react.tool.result)
- Plan snapshots (react.plan) and acknowledgements (react.plan.ack)
- Final answers (assistant.completion)

Announce blocks are appended to the rendered tail when include_announce=True — never persisted as part of the main timeline.
The announce model keeps high-frequency state signals out of the cached timeline. This preserves cache hits across rounds while still giving the agent fresh situational awareness on every decision call.
1. Load — ctx_browser.load_timeline() fetches artifact:conv.timeline.v1 and artifact:conv:sources_pool, hydrating in-memory state for the turn.
2. Contribute — The user prompt and any attachments are contributed as persistent blocks, joining the timeline block stream.
3. Gate (optional) — Only for new conversations. The gate agent renders the timeline without sources or announce, emits a title block, and contributes it. All subsequent turns skip the gate entirely.
4. Decision loop — Each round: render the timeline (with sources + announce), call the LLM, execute the chosen tool, and contribute results back into the timeline. Repeat until the agent emits a final answer or the iteration budget is exhausted.
5. Persist — Both conv.timeline.v1 and conv:sources_pool are written back to storage. The turn log (artifact:turn.log) records the current-turn blocks for fast next-turn reconstruction.
When the rendered timeline would exceed the model's context budget, the runtime compacts
earlier blocks into a single conv.range.summary block. The compacted blocks
are removed from the persisted payload; future renders start from the summary forward.
Separately, when a session cache_ttl_seconds is configured, blocks older
than the TTL are replaced with truncated placeholders on render. A system.message
block is appended advising the agent to use react.read(path) to restore
any specific logical path.
# Typical timeline block sequence (condensed)
[TURN turn_A]
user.prompt
user.attachment.meta
stage.gate # optional, new conv only
stage.react
react.tool.call # tool params
react.tool.result # rendered artifact metadata
react.plan # JSON snapshot of plan
react.plan.ack # human-readable ack
assistant.completion # [sources_used: [1,2,3]]
[TURN turn_B]
user.prompt
...
[SOURCES POOL] # appended at tail (uncached)
[ANNOUNCE] # appended at tail (uncached)
In-loop timeline tools (react.*)
Standard LLM tool calling routes tool invocations to external executors. ReAct v2 supplements
this with a set of in-loop react.* tools that operate directly
on the timeline and the agent's data spaces — without leaving the decision loop.
Current write contract: The authoritative in-loop file authoring tool is react.write. It writes current-turn text artifacts into files/<scope>/... for durable workspace state or outputs/<scope>/... for non-workspace produced artifacts. Historical refs can be materialized locally with react.pull(paths=[...]) as readonly reference material. The active current workspace under turn_<current_turn>/files/... is defined explicitly with react.checkout(...).
- ks: Knowledge space — read-only. Reference files prepared by the system — docs, indexes, cloned repos. Accessed via react.read("ks:path/to/doc.md").
- fi: Versioned turn artifacts — logical snapshot namespace. Used for historical workspace files, non-workspace outputs, and attachments such as fi:<turn_id>.files/<scope>/..., fi:<turn_id>.outputs/<scope>/..., and fi:<turn_id>.user.attachments/<name>.
- Current-turn workspace — the active writable surface. The agent writes durable project state to turn_<current_turn>/files/... and produced artifacts to turn_<current_turn>/outputs/.... In git mode this turn root is also a sparse local git repo.
Workspace activation is explicit. ReAct should inspect ANNOUNCE first, then use react.checkout(mode="replace", paths=["fi:<turn>.files/<scope>"]) to seed or continue the active current-turn workspace. Use react.checkout(mode="overlay", paths=[...]) to import selected historical files into the existing workspace. Use react.pull(paths=["fi:..."]) when older versions should be available locally as readonly side views only. In git mode, ANNOUNCE also exposes ls workspace so the agent can choose which existing scope to continue.
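The activation flow above can be sketched in the same react.* call notation used elsewhere on this page (illustrative only — the turn IDs and scope names here are hypothetical):

```
# Inspect ANNOUNCE, then seed the current-turn workspace from a prior turn
react.checkout(mode="replace", paths=["fi:turn_007.files/report"])

# Import one historical file into the existing workspace (overwrites overlaps)
react.checkout(mode="overlay", paths=["fi:turn_003.files/report/notes.md"])

# Materialize an older output locally as a readonly side view
react.pull(paths=["fi:turn_002.outputs/report/draft_v1.md"])
```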
| Tool | Purpose | Key behavior | Path family |
|---|---|---|---|
| react.read | Load an existing artifact into timeline context | Emits a status block first (dedup check), then artifact blocks. Re-exposes hidden artifacts. Clears hidden flag. | fi: ar: so: su: tc: ks: |
| react.write | Write current-turn text content and optionally emit a user-visible file | Writes to files/<scope>/... for durable workspace state or outputs/<scope>/... for non-workspace artifacts. Supports channel=canvas\|timeline_text\|internal and kind=display\|file. | Current-turn paths under <turn_id>/files/ or <turn_id>/outputs/ |
| react.pull | Explicitly materialize historical workspace files or outputs locally as readonly reference material | Accepts fi: refs only. Supports subtree pulls for fi:<turn_id>.files/.... Outputs and attachments must be exact file refs. Pulled refs land under their historical turn roots and do not modify the active current-turn workspace. Historical files are not auto-hydrated for exec or patching. | fi:<turn_id>.files/..., fi:<turn_id>.outputs/..., exact attachment refs |
| react.patch | Patch an existing current-turn file; supports unified diff | If the patch starts with ---/+++/@@ → unified diff; otherwise full replacement. Historical cross-turn patching requires the file to be pulled first. | Current-turn paths under <turn_id>/files/ or <turn_id>/outputs/ |
| react.checkout | Construct or update the active current-turn workspace from historical refs | Accepts ordered paths=[fi:<turn>.files/...] with mode="replace"\|"overlay". replace seeds the current workspace from scratch. overlay imports selected historical files into the existing workspace and overwrites overlaps without deleting unspecified files. Legacy version="turn_..." remains as compatibility for whole-tree checkout. | paths=[fi:<turn>.files/...], optional mode |
| react.memsearch | Semantic search over past turns | Returns compact snippets with turn_id, timestamps, relevance scores. Targets: assistant / user / attachment. | Conversation index (no path argument) |
| react.hide | Replace a large in-context snippet with a placeholder | Original content remains retrievable via react.read. Only blocks within the editable tail window can be hidden. | ar: fi: tc: so: ks: |
| react.search_files | Safely enumerate files under OUT_DIR or workdir | Returns paths + sizes + logical paths. Does not load content. Use react.read on results to load. | outdir / workdir prefixes only |
| react.search_knowledge | Search knowledge space (bundle-provided) | Available only when the active bundle registers the tool (e.g., react.doc bundle). Returns ranked doc hits. | ks: namespaces |
Every tool call follows a strict ordering contract: parameters must be supplied in the
documented field order (first field, second field, etc.). If the agent emits notes
with a decision, a react.notes block is contributed before the tool call.
Protocol errors and validation notices are emitted as react.notice blocks,
visible to the agent on the next round.
Artifact paths use stable logical namespaces (fi:, ar:, tc:, so:).
Physical execution paths returned by bundle namespace resolvers are valid only inside that
exec runtime and are explicitly not valid as inputs to react.* tools.
ReAct v2's output layer is built on the Channeled Streamer
(versatile_streamer.py), a tag-based protocol that routes
a single LLM stream into multiple named logical channels, each with independent
format, citation replacement, and subscriber fanout.
The model wraps each logical output in XML-like channel tags:
# Model output (single LLM call, multiple channels)
<channel:thinking>
Let me check the document structure first...
</channel:thinking>
<channel:answer>
Based on the report [[S:1]], the key findings are...
</channel:answer>
<channel:followup>
{"followups": ["Show me the full table", "Compare with last quarter"]}
</channel:followup>
The stream_with_channels() function parses these tags incrementally,
routing each chunk to the correct channel handler as it streams. Citation tokens
([[S:n]]) are replaced per-channel without modifying the stored raw output.
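As a rough sketch of what the tag protocol implies — this is not the SDK parser, which consumes the stream incrementally chunk by chunk — a completed raw output can be split into its channels with a single regex:

```python
import re

def split_channels(raw: str) -> dict:
    """Split a completed tagged stream into {channel_name: content} (sketch)."""
    pattern = r"<channel:(\w+)>(.*?)</channel:\1>"
    return {m.group(1): m.group(2).strip() for m in re.finditer(pattern, raw, re.S)}

raw = "<channel:thinking>check the doc</channel:thinking><channel:answer>Done [[S:1]]</channel:answer>"
# split_channels(raw) -> {"thinking": "check the doc", "answer": "Done [[S:1]]"}
```

The real streamer routes each chunk as it arrives rather than waiting for closing tags, but the channel boundaries it recovers are the same.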
- thinking — internal reasoning trace. Routed to marker="thinking". Typically markdown format. Can be shown in a dedicated reasoning UI panel or suppressed from the end user.
- answer — user-visible main response. Routed to marker="answer". Supports markdown with live citation replacement ([[S:n]] → linked references).
- followup — JSON list of suggested follow-up prompts. Routed to a chat.followups step. Parsed incrementally for live UI rendering.
- canvas — when composite streaming is enabled, JSON content is parsed field-by-field and each attribute delta is streamed to the canvas marker for live structured UI updates.
- usage — JSON sidecar with token counts, model metadata, and source IDs used in the answer. Enables client-side analytics without touching the main answer stream.
- Decision channels — structured JSON channels carrying the agent's next action (tool call + params + reasoning). Parsed against a Pydantic model for protocol validation.
Each channel can carry one or more subscribers — side-effect handlers
that run alongside the primary emit function. A subscriber on the
usage channel, for example, can write source IDs to a timeline record
streamer while the answer channel continues streaming to the user uninterrupted.
# Illustrative adaptation (from docs pattern)
from kdcube_ai_app.apps.chat.sdk.streaming.versatile_streamer import (
ChannelSpec, ChannelSubscribers, stream_with_channels
)
channels = [
ChannelSpec(name="thinking", format="markdown", replace_citations=False, emit_marker="thinking"),
ChannelSpec(name="answer", format="markdown", replace_citations=True, emit_marker="answer"),
ChannelSpec(name="followup", format="json", replace_citations=False, emit_marker="followup"),
ChannelSpec(name="usage", format="json", replace_citations=False, emit_marker="answer"),
]
results, meta = await stream_with_channels(
svc=svc,
messages=[system_msg, user_msg],
role="answer.generator.regular",
channels=channels,
emit=_emit_wrapper,
agent="my.react.agent",
artifact_name="response",
sources_list=sources_list,
subscribers=ChannelSubscribers().subscribe("usage", _record_usage_sids),
return_full_raw=True,
)
answer_text = results["answer"].raw
used_sids = results["answer"].used_sources
service_err = (meta or {}).get("service_error")
Citation tokens ([[S:1]], [[S:1,3]], [[S:2-4]])
are replaced at stream time for markdown and text channels,
and replaced with <sup class="cite"> tags for html
channels. The stored raw output is never modified; replacement happens
only in the bytes sent to the client.
A stateful per-channel citation tokenizer handles tokens split across chunk
boundaries, ensuring [[S:n]] is never partially emitted to the client.
This allows citation-heavy responses to stream cleanly without client-side parsing.
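A minimal, non-streaming illustration of the replacement rule (the pool shape here — a dict mapping SID to URL — is an assumption for the sketch; the SDK's tokenizer additionally handles tokens split across chunks):

```python
import re

def expand_sids(token: str) -> list:
    """Expand "1", "1,3", or "2-4" into a list of integer SIDs."""
    sids = []
    for part in token.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            sids.extend(range(int(lo), int(hi) + 1))
        else:
            sids.append(int(part))
    return sids

def replace_citations(text: str, pool: dict) -> str:
    """Replace [[S:n]] tokens with markdown links; unknown SIDs are dropped."""
    def sub(m):
        return "".join(f"[{s}]({pool[s]})" for s in expand_sids(m.group(1)) if s in pool)
    return re.sub(r"\[\[S:([0-9,\-]+)\]\]", sub, text)
```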
For managed JSON artifacts, the CompositeJsonArtifactStreamer can
be attached to a JSON channel. As the model streams a JSON object, each top-level
field is parsed incrementally and emitted as a per-attribute delta to a separate
marker (typically canvas). This enables live structured UI rendering
before the full JSON closes.
The sources pool is a per-conversation registry of canonical source rows accumulated across the entire conversation lifecycle — not just the current turn.
Sources enter the pool from several producers:
- Web results from web_tools.web_search and web_tools.web_fetch
- Image attachments (image/*) for rendering/embedding
- Image files (image/*)
- Entries from sources.yaml when a skill is loaded via react.read

Non-image file types (XLSX, PPTX, DOCX, PDFs, archives) are not added to the sources pool. Only images are eligible as attachment/file sources.
Sources are merged by normalized URL (or physical_path for local files).
Once a source is assigned a sequential ID (sid), that ID is
stable for the entire conversation. Duplicate URLs reuse the existing SID;
only genuinely new sources receive the next integer.
This means [[S:3]] in turn 2 refers to the same source as [[S:3]]
in turn 12. Clients can rehydrate citations by matching sources_used SIDs
in any artifact against the current sources_pool in timeline.json.
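A minimal sketch of the merge-and-assign rule; the row fields (sid, url, title) and the URL normalization shown are assumptions, not the SDK's exact schema:

```python
def add_source(pool: list, url: str, title: str = "") -> int:
    """Merge by normalized URL: duplicates reuse the existing SID, new URLs get the next integer."""
    key = url.rstrip("/").lower()
    for row in pool:
        if row["url"].rstrip("/").lower() == key:
            return row["sid"]  # duplicate URL reuses the stable SID
    sid = max((r["sid"] for r in pool), default=0) + 1
    pool.append({"sid": sid, "url": url, "title": title})
    return sid
```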
The full pool is stored as artifact:conv:sources_pool. A compact snapshot
(sid, title, url, short text, and limited metadata) is embedded in the timeline artifact
for fast local access. The compact snapshot is rendered as a single SOURCES POOL
tail block when the timeline is rendered with include_sources=True.
# Load specific source rows from docs pattern
react.read(["so:sources_pool[1-5]"])
react.read(["so:sources_pool[1,3,7]"])
# Inline citation tokens (docs pattern)
[[S:1]] # single source
[[S:1,3]] # multiple sources
[[S:2-4]] # range of sources
# Note: only web sources (http/https) should be cited as evidence.
# Image sources are for rendering only, not evidence citations.
Because SIDs are stable across turns, a source indexed as [[S:5]]
from a web fetch in turn 1 remains [[S:5]] in the final answer
in turn 15. The streamer resolves citations from the current pool state at
stream time, so the client always receives correct hyperlinks regardless of
when the source was first added.
ReAct v2 approaches memory not as a separate subsystem but as a set of layered strategies on the timeline: caching, compaction, TTL pruning, semantic search, and selective hiding all operate on the same block stream.
The context browser inserts up to three cache checkpoints per rendered timeline, enabling providers that support prompt caching (such as Anthropic) to serve most of the context from cache on repeated calls:
One checkpoint is placed offset_rounds before the tail round (when enough rounds exist). Sources and announce blocks are appended after all checkpoints and remain uncached every round, ensuring the agent always sees fresh source data and plan status without invalidating the stable prefix cache.
When the visible timeline exceeds the model budget, earlier blocks are
compacted into a conv.range.summary block at the cut point.
The original blocks are removed from the persisted payload; future renders
start from the summary forward. This is an edit-over-rewrite
approach — the timeline is surgically trimmed, not discarded and restarted.
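A simplified sketch of that step, assuming a character budget and a caller-supplied summarize function (both are illustrative stand-ins for the runtime's token accounting and summarizer):

```python
def compact(blocks: list, budget: int, summarize) -> list:
    """Fold the earliest blocks into one conv.range.summary block once the rendered size exceeds budget."""
    total = sum(len(b["text"]) for b in blocks)
    if total <= budget:
        return blocks  # within budget: timeline untouched
    cut = 0
    while cut < len(blocks) - 1 and total > budget:
        total -= len(blocks[cut]["text"])
        cut += 1
    summary = {"type": "conv.range.summary", "text": summarize(blocks[:cut])}
    return [summary] + blocks[cut:]  # future renders start from the summary forward
```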
When cache_ttl_seconds is set in the session configuration,
blocks from previous turns are replaced with compact truncated placeholders
after the TTL expires. A system.message block is appended to
explain the pruning and guide the agent to restore specific paths via
react.read(path). This mechanism keeps the active context window
efficient for long-lived sessions without losing the ability to
recall prior content.
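The TTL rule can be sketched as follows; the block fields (ts, path, text) are assumed for illustration:

```python
import time

def prune_expired(blocks: list, cache_ttl_seconds: float, now: float = None) -> list:
    """Replace blocks older than the TTL with placeholders naming the restore path."""
    now = time.time() if now is None else now
    out = []
    for b in blocks:
        if now - b["ts"] > cache_ttl_seconds:
            # Truncated placeholder pointing the agent at react.read for recovery
            out.append({**b, "text": f"[expired] restore with react.read('{b['path']}')"})
        else:
            out.append(b)
    return out
```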
react.memsearch — Persistent recall
Even after compaction or TTL pruning, prior turn content is indexed semantically.
react.memsearch allows the agent to query that index for relevant
snippets from any historical turn, surfacing them with scores and timestamps.
The agent can then selectively reload full content via react.read.
# Docs pattern: semantic recall from past turns
react.memsearch(
query="database schema discussed last week",
targets=["assistant", "user"],
top_k=5,
days=30
)
# Returns: [{turn_id, text, score, ts}, ...]
react.hide — Selective context pruning
Within a single turn, the agent can replace large blocks in the editable tail
window with short placeholders using react.hide. The original
content remains accessible via react.read. This is useful for
hiding large file artifacts that are no longer needed in the active context
window, freeing space for new tool results.
react.hide is constrained to the editable tail window — blocks before the pre-tail cache checkpoint cannot be hidden. This prevents cache invalidation from inadvertent edits to stable prefixes.
Plans are tracked as explicit react.plan JSON snapshot blocks
in the timeline, updated each time the decision agent acknowledges progress
in its reasoning notes. A human-readable react.plan.ack block
is emitted alongside. The active plan is surfaced in the announce section
each round; closed or completed plans remain in the timeline history but are
no longer re-announced.
# Plan snapshot block (docs pattern, simplified)
{
"type": "react.plan",
"mime": "application/json",
"path": "ar:<turn_id>.react.plan.<plan_id>",
"text": {
"plan_id": "plan_abc123",
"origin_turn_id": "turn_001",
"steps": [
{ "n": 1, "label": "Gather sources", "status": "done" },
{ "n": 2, "label": "Draft report", "status": "pending" }
]
}
}
Examples below are labeled either FROM DOCS PATTERN (directly reproduced or lightly formatted from SDK documentation) or ILLUSTRATIVE ADAPTATION (derived structure; not a verbatim API contract).
from kdcube_ai_app.apps.chat.sdk.streaming.versatile_streamer import (
ChannelSpec, ChannelSubscribers, stream_with_channels
)
# Define channels emitted by the model
channels = [
ChannelSpec(
name="answer",
format="markdown",
replace_citations=True,
emit_marker="answer",
),
ChannelSpec(
name="usage",
format="json",
replace_citations=False,
emit_marker="answer",
),
]
# Attach a subscriber to the "usage" channel for side-effect fanout
subscribers = ChannelSubscribers().subscribe("usage", _usage_json_fanout)
# Invoke streaming (docs pattern; see versatile_streamer.py)
results, meta = await stream_with_channels(
svc=svc,
messages=[system_msg, user_msg],
role="answer.generator.regular",
channels=channels,
emit=_emit_wrapper,
agent="my.agent",
artifact_name="report",
sources_list=sources_list,
subscribers=subscribers,
return_full_raw=True,
)
answer_raw = results["answer"].raw
used_sources = results["answer"].used_sources
service_err = (meta or {}).get("service_error")
from kdcube_ai_app.apps.chat.sdk.streaming.versatile_streamer import ChannelSpec, stream_with_channels
from kdcube_ai_app.apps.chat.sdk.streaming.artifacts_channeled_streaming import CompositeJsonArtifactStreamer
# Answer channel carries JSON; composite streamer fans it out to canvas
channels = [
ChannelSpec(name="answer", format="json", replace_citations=False, emit_marker="answer"),
ChannelSpec(name="usage", format="json", replace_citations=False, emit_marker="answer"),
]
results, _meta = await stream_with_channels(
svc=svc,
messages=[system_msg, user_msg],
role="answer.generator.regular",
channels=channels,
emit=emit_delta,
agent="my.agent",
artifact_name="my.json.artifact",
composite_cfg={"artifactA": "path.to.schema"},
composite_channel="answer", # route JSON channel into composite streamer
composite_marker="canvas", # emit per-attribute deltas to canvas
)
# Step 1: Sources are accumulated across turns into conv:sources_pool.
# The pool is loaded at turn start via ctx_browser.load_timeline().
# Step 2: Decision agent loads specific sources by SID range (docs pattern)
# Inside the react decision loop:
react.read(["so:sources_pool[1-5]"]) # loads sources 1 through 5
react.read(["so:sources_pool[1,3,7]"]) # loads sources 1, 3, 7
# Step 3: Agent cites sources inline in generated text
# The streamer replaces [[S:n]] tokens during streaming (never in stored raw)
answer_text = """
According to the analysis [[S:1]], the platform supports
multi-channel streaming [[S:2,3]]. Cache efficiency results
in up to 60% cost reduction [[S:1]].
"""
# Step 4: Client rehydrates citations using sources_used SIDs from the artifact
# matched against current sources_pool in timeline.json
used_sids = results["answer"].used_sources # e.g. [1, 2, 3]
pool = timeline_json["sources_pool"]
refs = [s for s in pool if s["sid"] in used_sids]
# Semantic search over past turns — useful after compaction or TTL pruning
# Inside the react decision loop (react.memsearch is a react.* in-loop tool):
react.memsearch(
query="authentication configuration from last session",
targets=["assistant", "user"],
top_k=5,
days=90
)
# Returns blocks like:
# [{"turn_id": "turn_123", "text": "...", "score": 0.84, "ts": "2026-02-01T...Z"}]
# Then restore the full content if needed:
react.read(["fi:turn_123.files/config.yaml"])
Conceptual comparison only. KDCube does not publish direct benchmarks or first-party technical comparisons with third-party systems. The table below focuses on the design properties that make this runtime distinct: timeline-first context, the attention area, cache-aware context shaping, layered memory, and explicit workspace control. These are architectural contrasts, not benchmark claims.
| Architectural property | KDCube ReAct v2 | Common general-purpose agentic pattern |
|---|---|---|
| Primary context model | Timeline-first. The agent reads one ordered block stream that can contain user prompts, tool activity, plans, source updates, runtime notices, feedback, and other events on the same temporal surface. | Usually transcript-first. The dominant abstraction is a message array plus assistant/tool-call/tool-result turns, with other runtime facts injected ad hoc. |
| Shared event landscape | The runtime explicitly models that not all important events are caused by the agent. Steer, feedback, pruning notices, source-pool updates, workspace publish notices, and service alerts can all appear beside tool-caused events. | Usually optimized for assistant-caused tool use. External or runtime-originated events tend to be bolted on as extra prompt text or ignored by the core protocol. |
| Attention area / signal board | A fixed tail surface keeps SOURCES POOL and ANNOUNCE directly in front of the model. High-priority runtime signals, budget, time, plan state, and workspace status always appear in the same place. | No formal attention-area concept. Current state is usually scattered across system prompts, hidden middleware state, or freshly rebuilt messages each round. |
| Streaming contract | Channeled generation. One model call can drive thinking, structured decision output, and code as separate streams that widgets and runtime components consume live. | Usually one assistant stream, or multiple independent calls stitched together after the fact. Tool intent and code payloads often compete for the same output channel. |
| Cache-aware context shaping | Cached system message + three moving timeline checkpoints. The stable prefix stays reusable while the active tail remains editable. react.hide, TTL pruning, and compaction all work within that design. | Depends on provider and framework. Many systems rebuild the full context each round and treat hiding/pruning/summarization as external maintenance rather than part of the runtime contract. |
| Memory architecture | Memory is layered: timeline, attention area, turn log, workspace, hosted artifacts, indexed conversation memory, compaction summaries, hidden replacement text, plans, notes, and feedback all play distinct roles and are reopened through logical paths. | Often presented as one generic memory layer or one RAG step. Different memory roles usually require custom app logic rather than being first-class in the runtime. |
| Workspace control | Workspace state is explicit. react.pull(...) brings historical refs locally as readonly material; react.checkout(mode="replace"\|"overlay", ...) defines what is materialized into the active current-turn workspace. files/... and outputs/... are separate namespaces. Both custom and git backends are supported. | Often the local filesystem is treated as the implicit truth. Historical versions, active workspace state, and produced artifacts are commonly mixed together or left to app-specific conventions. |
| Working memory and decision trace | react.plan, react.plan.ack, react.notes, and internal note lines tagged with [P], [D], and [S] let the runtime persist preferences, rationale, and technical structure as part of the conversation memory model. | Usually left in transient prompt text or app-specific side stores. The runtime rarely gives these forms stable paths and a clear relationship to the main context artifact. |
| Runtime-owned enforcement and feedback | Tool protocol is owned by the SDK runtime. Validation failures become visible notices inside the loop, so the agent can correct itself against runtime rules rather than only provider schemas. | Validation is often pushed down to provider JSON schema or external middleware. Runtime feedback inside the next model round is commonly absent or inconsistent. |
| Execution model | Designed for distributed isolated execution. Generated code can run in controlled local or remote sandboxes while still seeing logical paths, workspace rules, and runtime-owned tool proxies. | Frequently tied to a single host process, a personal local agent model, or a thin wrapper around provider-native tool calling with less explicit isolation semantics. |
| Open source / self-hostable | Available via KDCube open-source AI app repository. Deployable to Kubernetes, local, or cloud infrastructure. | Varies. Some frameworks are open source; many polished hosted agent experiences are not self-hostable. |
ReAct v2 is designed as a production-first SDK. The distinctive part is not only that it can call tools. It is that the runtime owns the event landscape, the attention area, cache behavior, memory surfaces, and workspace semantics around those tools.