SDK · Single-Agent Loop · Timeline-Native · Multi-Channel Streaming

KDCube ReAct v2 Agent

A timeline-native, single-agent decision loop designed for precision reasoning, structured multi-channel output, and long-lived conversational memory — built for the demands of production AI applications.

Timeline as single source of truth · Contribute vs announce model · Multi-channel streamer · Cross-conversation source pool · react.* in-loop tools

What is KDCube ReAct v2?

ℹ️

Naming note: The runtime is referred to as "ReAct v2", "React v2", and "ReAct Agent" across different parts of the codebase and documentation. These names refer to the same system; this page uses ReAct v2 as the canonical form.

ReAct v2 is KDCube's production decision-loop agent. It implements a single-agent Reason + Act loop running over a shared conversation timeline, where every tool call, artifact, plan update, and final answer is recorded as an ordered block in that timeline. The timeline acts as the single source of truth for all in-turn context, history, and compaction.

Unlike multi-agent pipelines that pass context through separate message buses, ReAct v2 keeps all state in one timeline artifact (conv.timeline.v1) and a companion source pool (conv:sources_pool). This design makes every turn reproducible, debuggable, and efficiently cacheable.

Core philosophy

🔁

Single-agent loop

One ReAct agent drives the full decision loop per turn. No separate coordinator or final-answer generator is required in the reference implementation. A lightweight optional Gate agent runs only for new conversations to extract a title.

📋

Timeline-first design

The timeline is built from contribute (persistent) and announce (ephemeral) blocks. Agents read a rendered snapshot at every round — not a raw message array — giving precise control over what is cached and what is fresh.

🔒

Evidence over inference

Tool results, artifact paths, and source citations are recorded as typed blocks with stable logical paths. The agent always has a verifiable audit trail for every action taken.

📡

Multi-channel output

A tag-based streaming protocol lets the model emit multiple logical outputs — thinking, answer, JSON sidecars, follow-up suggestions — in a single LLM call, each routed to the correct consumer.

Why it matters for production AI

Production AI apps face three recurring problems: context drift (what did we discuss three turns ago?), output fragmentation (how do I show reasoning and a structured result at the same time?), and cost at scale (how do I avoid re-feeding the whole conversation on every round?). ReAct v2 addresses all three directly through its timeline design, multi-channel streamer, and three-checkpoint caching system.

Timeline & Side Effects Model

The timeline is the authoritative, ordered log of everything that happened in a conversation. It is loaded at turn start, updated as the turn progresses, and persisted at turn end as a single JSON artifact.

Contribute vs Announce

Contribute (persistent)

Blocks written via ctx_browser.contribute() are saved into the timeline and appear in all future renders. Used for:

  • User prompts and attachments
  • Gate and ReAct stage outputs
  • Tool call / tool result blocks (react.tool.call, react.tool.result)
  • Plan snapshots (react.plan) and acknowledgements (react.plan.ack)
  • Final assistant completion (assistant.completion)

Announce (ephemeral)

Appended to the rendered tail when include_announce=True — never persisted as part of the main timeline. Used for:

  • Current iteration count and remaining budget
  • Authoritative temporal context (UTC, user timezone)
  • Active plan status with step markers (✓ / ✗ / □)
  • System notices (e.g., cache TTL pruning alerts)
  • Feedback updates (shown until incorporated)

💡

The announce model keeps high-frequency state signals out of the cached timeline. This preserves cache hits across rounds while still giving the agent fresh situational awareness on every decision call.
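
The split can be modeled in a few lines. This is an illustrative sketch — Block and render_timeline below are stand-ins for ctx_browser internals, not SDK names:

```python
# Illustrative model of the contribute/announce split — not the SDK API.
from dataclasses import dataclass

@dataclass
class Block:
    path: str                 # stable logical path, e.g. "ar:turn_A.react.plan"
    text: str
    ephemeral: bool = False   # True → announce, False → contribute

def render_timeline(blocks, include_announce=True):
    """Persistent blocks form the stable (cacheable) prefix; announce
    blocks are appended at the tail and never persisted."""
    prefix = [b.text for b in blocks if not b.ephemeral]
    tail = [b.text for b in blocks if b.ephemeral] if include_announce else []
    return "\n".join(prefix + tail)

timeline = [
    Block("us:turn_A.prompt", "user: summarize the report"),
    Block("ar:turn_A.react.plan", '{"steps": [...]}'),
    Block("announce:iteration", "iteration 3/12, budget ok", ephemeral=True),
]
print(render_timeline(timeline))                          # cached prefix + fresh tail
print(render_timeline(timeline, include_announce=False))  # cacheable prefix only
```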

Timeline block stream with side effects
[Diagram: within TURN turn_A, persisted contribute blocks flow in order — user.prompt → tool.call → tool.result → react.plan → assistant.completion — followed by a sources_pool update and turn.log persist. Announce blocks (iteration / budget / active plan / temporal context) are ephemeral, appended at the tail, and never persisted.]

Turn lifecycle

  1. Load timeline

     ctx_browser.load_timeline() fetches artifact:conv.timeline.v1 and artifact:conv:sources_pool, hydrating in-memory state for the turn.

  2. Contribute user input

     The user prompt and any attachments are contributed as persistent blocks, joining the timeline block stream.

  3. Gate agent (optional)

     Runs only for new conversations. The gate agent renders the timeline without sources or announce, emits a title block, and contributes it. All subsequent turns skip the gate entirely.

  4. ReAct decision loop

     Each round: render the timeline (with sources + announce), call the LLM, execute the chosen tool, and contribute the results back into the timeline. Repeat until the agent emits a final answer or the iteration budget is exhausted.

  5. Persist

     Both conv.timeline.v1 and conv:sources_pool are written back to storage. The turn log (artifact:turn.log) records the current-turn blocks for fast next-turn reconstruction.
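
The five steps above can be sketched as a single loop. This is an illustrative adaptation: every function here is an injected stub standing in for runtime internals (load, contribute, render, decide, execute, and persist are assumptions, not SDK APIs).

```python
# Hedged sketch of the turn lifecycle — stand-in callables, not the SDK surface.
MAX_ITERATIONS = 12

def run_turn(user_prompt, load, contribute, render, decide, execute, persist):
    timeline = load()                         # 1. load conv.timeline.v1 + sources pool
    contribute(timeline, {"type": "user.prompt", "text": user_prompt})   # 2.
    for _ in range(MAX_ITERATIONS):           # 4. decision loop (gate omitted here)
        snapshot = render(timeline, include_sources=True, include_announce=True)
        decision = decide(snapshot)           # LLM call → action + tool + params
        if decision["action"] == "final_answer":
            contribute(timeline, {"type": "assistant.completion",
                                  "text": decision["text"]})
            break
        result = execute(decision)            # react.* in-loop or external tool
        contribute(timeline, {"type": "react.tool.result", "text": result})
    persist(timeline)                         # 5. write back timeline + sources pool
    return timeline
```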

Compaction and TTL pruning

When the rendered timeline would exceed the model's context budget, the runtime compacts earlier blocks into a single conv.range.summary block. The compacted blocks are removed from the persisted payload; future renders start from the summary forward.

Separately, when a session cache_ttl_seconds is configured, blocks older than the TTL are replaced with truncated placeholders on render. A system.message block is appended advising the agent to use react.read(path) to restore any specific logical path.

Block ordering (schematic)

# Typical timeline block sequence (condensed)
[TURN turn_A]
  user.prompt
  user.attachment.meta
  stage.gate                  # optional, new conv only
  stage.react
  react.tool.call             # tool params
  react.tool.result           # rendered artifact metadata
  react.plan                  # JSON snapshot of plan
  react.plan.ack              # human-readable ack
  assistant.completion        # [sources_used: [1,2,3]]

[TURN turn_B]
  user.prompt
  ...
[SOURCES POOL]                # appended at tail (uncached)
[ANNOUNCE]                    # appended at tail (uncached)

Custom Tool-Calling Architecture (react.*)

ReAct v2 — Loop & Tool Integration (detailed)

[Diagram: the autonomous loop runs up to max_iterations rounds. User input (user.prompt + attachments) → context assembly (render_timeline() + sources + announce + three cache checkpoints) → LLM decision (react_decision_stream_v2, structured JSON output: action + tool + params + notes) → output channels streamed to the client (thinking / answer / followup / canvas / usage sidecar). Tool calls pass a validation gate (protocol order check, path-namespace validation; react.notice on error, visible to the agent next round). In-loop react.* tools (react.read, react.stream, react.file, react.patch, react.hide, react.memsearch) operate on the timeline and data spaces; external tools (web_tools, code_exec, mcp_tools, skill tools) run in a subprocess or over the network, with results returned to the agent. Each round contributes tool.call + tool.result blocks via ctx_browser.contribute(), then the budget/iteration check decides between the next round and the final answer. At turn end: persist conv.timeline.v1 + conv:sources_pool + turn.log + artifacts.]

Standard LLM tool calling routes tool invocations to external executors. ReAct v2 supplements this with a set of in-loop react.* tools that operate directly on the timeline and the agent's data spaces — without leaving the decision loop.

⚠️

Naming note: Documentation references both react.stream and react.file for output writing. Earlier drafts sometimes used react.write. The authoritative tool names per current docs are react.stream (streaming to display/file) and react.file (write to file). Use those names in new bundles.

Data spaces

ks: Knowledge Space

Read-only. Reference files prepared by the system — docs, indexes, cloned repos. Accessed via react.read("ks:path/to/doc.md").

fi: OUT_DIR

Read/write during the turn. Per-turn execution artifacts. Files map to fi:<turn_id>.files/<relpath>. Writable via react.file / react.stream.

workspace: (future)

Not yet implemented. A shared, writable workspace that persists across turns — designed for copilot-style, long-running project flows.

Tool catalog

| Tool | Purpose | Key behavior | Path family |
|---|---|---|---|
| react.read | Load an existing artifact into timeline context | Emits a status block first (dedup check), then artifact blocks. Re-exposes hidden artifacts and clears the hidden flag. | fi: ar: so: su: tc: ks: |
| react.stream | Stream or generate content for a new artifact | kind=display shows to the user; kind=file records as a file. Content goes to the named channel (canvas / timeline_text). | OUT_DIR-relative path under <turn_id>/files/ |
| react.file | Write content to a file and emit result metadata | Stores the file on disk; emits a react.tool.result metadata block. Suitable for non-streamed file writes. | OUT_DIR-relative path under <turn_id>/files/ |
| react.patch | Patch an existing file; supports unified diff | If the patch starts with ---/+++/@@, it is applied as a unified diff; otherwise it is a full replacement. Emits the patch text as the result. | OUT_DIR-relative path under <turn_id>/files/ |
| react.memsearch | Semantic search over past turns | Returns compact snippets with turn_id, timestamps, and relevance scores. Targets: assistant / user / attachment. | Conversation index (no path argument) |
| react.hide | Replace a large in-context snippet with a placeholder | The original content remains retrievable via react.read. Only blocks within the editable tail window can be hidden. | ar: fi: tc: so: ks: |
| react.search_files | Safely enumerate files under OUT_DIR or workdir | Returns paths, sizes, and logical paths; does not load content. Use react.read on results to load. | outdir / workdir prefixes only |
| react.search_knowledge | Search the knowledge space (bundle-provided) | Available only when the active bundle registers the tool (e.g., the react.doc bundle). Returns ranked doc hits. | ks: namespaces |

Protocol validation

Every tool call follows a strict ordering contract: parameters must be supplied in the documented field order (first field, second field, etc.). If the agent emits notes with a decision, a react.notes block is contributed before the tool call. Protocol errors and validation notices are emitted as react.notice blocks, visible to the agent on the next round.

Artifact paths use stable logical namespaces (fi:, ar:, tc:, so:). Physical execution paths returned by bundle namespace resolvers are valid only inside that exec runtime and are explicitly not valid as inputs to react.* tools.
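
A minimal validator along these lines can make the contract concrete. This is an illustrative sketch — the function and notice shape are assumptions; only the namespace prefixes come from the docs:

```python
# Illustrative logical-path validation — emits a react.notice-style block on error.
VALID_NAMESPACES = {"fi", "ar", "tc", "so", "su", "ks"}

def validate_logical_path(path: str):
    """Return None if the path is valid, else a notice block the agent
    would see on the next round."""
    ns, sep, rest = path.partition(":")
    if not sep or ns not in VALID_NAMESPACES or not rest:
        return {"type": "react.notice",
                "text": f"invalid logical path {path!r}; expected <ns>:<path> "
                        f"with ns in {sorted(VALID_NAMESPACES)}"}
    if rest.startswith("/"):   # physical exec paths are not valid react.* inputs
        return {"type": "react.notice",
                "text": f"physical path {path!r} rejected; use a logical path"}
    return None

assert validate_logical_path("fi:turn_A.files/report.md") is None
assert validate_logical_path("/tmp/report.md") is not None
```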

Multi-Channel Streaming

ReAct v2's output layer is built on the Channeled Streamer (versatile_streamer.py), a tag-based protocol that routes a single LLM stream into multiple named logical channels, each with independent format, citation replacement, and subscriber fanout.

How it works

The model wraps each logical output in XML-like channel tags:

# Model output (single LLM call, multiple channels)
<channel:thinking>
  Let me check the document structure first...
</channel:thinking>

<channel:answer>
  Based on the report [[S:1]], the key findings are...
</channel:answer>

<channel:followup>
  {"followups": ["Show me the full table", "Compare with last quarter"]}
</channel:followup>

The stream_with_channels() function parses these tags incrementally, routing each chunk to the correct channel handler as it streams. Citation tokens ([[S:n]]) are replaced per-channel without modifying the stored raw output.
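
A minimal incremental parser for the tag protocol might look like this — a sketch of the idea behind stream_with_channels(), not its actual implementation:

```python
# Route a chunked stream into channels as <channel:name>…</channel:name> tags
# arrive; partial tags at a chunk boundary are held back until completed.
import re

TAG = re.compile(r"</?channel:([a-zA-Z_]+)>")

def route_chunks(chunks):
    """Yield (channel, text) pairs as tags are encountered in the stream."""
    buf, current = "", None
    for chunk in chunks:
        buf += chunk
        while True:
            m = TAG.search(buf)
            if not m:
                break
            text, tag = buf[:m.start()], m.group(0)
            if current and text:
                yield current, text
            current = None if tag.startswith("</") else m.group(1)
            buf = buf[m.end():]
        # hold back a possibly-partial tag at the buffer tail; flush the rest
        cut = buf.rfind("<")
        safe = buf if cut == -1 else buf[:cut]
        if current and safe:
            yield current, safe
            buf = buf[len(safe):]

stream = ["<channel:thin", "king>checking docs</channel:thinking>",
          "<channel:answer>42</channel:answer>"]
print(list(route_chunks(stream)))
# → [('thinking', 'checking docs'), ('answer', '42')]
```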

Channel types in ReAct v2

thinking

Thinking

Internal reasoning trace. Routed to marker="thinking". Typically markdown format. Can be shown in a dedicated reasoning UI panel or suppressed from the end user.

answer

Answer

User-visible main response. Routed to marker="answer". Supports markdown with live citation replacement ([[S:n]] → linked references).

followup

Follow-up

JSON list of suggested follow-up prompts. Routed to a chat.followups step. Parsed incrementally for live UI rendering.

canvas

Canvas / JSON artifact

When composite streaming is enabled, JSON content is parsed field-by-field and each attribute delta is streamed to the canvas marker for live structured UI updates.

usage

Usage sidecar

JSON sidecar with token counts, model metadata, and source IDs used in the answer. Enables client-side analytics without touching the main answer stream.

ReactDecisionOut / code

Structured decision output

Structured JSON channels carrying the agent's next action (tool call + params + reasoning). Parsed against a Pydantic model for protocol validation.

Single-round multi-channel streaming flow
[Diagram: LLM stream → channel parser (versatile_streamer) → thinking / answer / canvas / usage channels → client (reasoning panel, answer stream, canvas/JSON rendering, analytics sidecar).]

Listener attachment and subscriber fanout

Each channel can carry one or more subscribers — side-effect handlers that run alongside the primary emit function. A subscriber on the usage channel, for example, can write source IDs to a timeline record streamer while the answer channel continues streaming to the user uninterrupted.

# Illustrative adaptation (from docs pattern)
from kdcube_ai_app.apps.chat.sdk.streaming.versatile_streamer import (
    ChannelSpec, ChannelSubscribers, stream_with_channels
)

channels = [
    ChannelSpec(name="thinking",  format="markdown", replace_citations=False, emit_marker="thinking"),
    ChannelSpec(name="answer",    format="markdown", replace_citations=True,  emit_marker="answer"),
    ChannelSpec(name="followup",  format="json",     replace_citations=False, emit_marker="followup"),
    ChannelSpec(name="usage",     format="json",     replace_citations=False, emit_marker="answer"),
]

results, meta = await stream_with_channels(
    svc=svc,
    messages=[system_msg, user_msg],
    role="answer.generator.regular",
    channels=channels,
    emit=_emit_wrapper,
    agent="my.react.agent",
    artifact_name="response",
    sources_list=sources_list,
    subscribers=ChannelSubscribers().subscribe("usage", _record_usage_sids),
    return_full_raw=True,
)

answer_text = results["answer"].raw
used_sids   = results["answer"].used_sources
service_err = (meta or {}).get("service_error")

Subscriber / listener attachment pattern

[Diagram: stream_with_channels() drives the primary emit (all channels → client); a subscriber on the "usage" channel records source IDs to a timeline record (sources_used + meta). The subscriber runs alongside emit and does not interrupt the answer stream.]

Citation replacement and token savings

Citation tokens ([[S:1]], [[S:1,3]], [[S:2-4]]) are replaced at stream time for markdown and text channels, and replaced with <sup class="cite"> tags for html channels. The stored raw output is never modified; replacement happens only in the bytes sent to the client.

A stateful per-channel citation tokenizer handles tokens split across chunk boundaries, ensuring [[S:n]] is never partially emitted to the client. This allows citation-heavy responses to stream cleanly without client-side parsing.
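
The hold-back behavior can be sketched as a small stateful replacer. This is illustrative — CitationReplacer and its resolve callback are assumptions, not the SDK tokenizer:

```python
# Tokens like [[S:1]] or [[S:2-4]] may be split across chunks; a partial
# "[[…" tail is held back until the token completes or the stream ends.
import re

CITE = re.compile(r"\[\[S:([0-9,\- ]+)\]\]")

class CitationReplacer:
    def __init__(self, resolve):
        self.resolve = resolve   # maps a SID spec like "1" or "2-4" to link text
        self.tail = ""           # held-back bytes that may start a split token

    def feed(self, chunk: str) -> str:
        buf = self.tail + chunk
        idx = buf.rfind("[[")
        if idx != -1 and "]]" not in buf[idx:]:
            self.tail, buf = buf[idx:], buf[:idx]   # incomplete token → hold back
        elif buf.endswith("["):
            self.tail, buf = "[", buf[:-1]
        else:
            self.tail = ""
        return CITE.sub(lambda m: self.resolve(m.group(1)), buf)

    def flush(self) -> str:
        out, self.tail = self.tail, ""
        return out               # emit any leftover literal text at stream end

r = CitationReplacer(lambda spec: f"[src {spec}]")
print(r.feed("see [[S:") + r.feed("1]] and [[S:2-4]]"))
# → see [src 1] and [src 2-4]
```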

Composite JSON streaming

For managed JSON artifacts, the CompositeJsonArtifactStreamer can be attached to a JSON channel. As the model streams a JSON object, each top-level field is parsed incrementally and emitted as a per-attribute delta to a separate marker (typically canvas). This enables live structured UI rendering before the full JSON closes.

Source Management

The sources pool is a per-conversation registry of canonical source rows accumulated across the entire conversation lifecycle — not just the current turn.

What goes into the pool

Web results (URLs returned by tools such as web_search and web_fetch) and image attachments are merged into the pool as canonical source rows.

ℹ️

Non-image file types (XLSX, PPTX, DOCX, PDFs, archives) are not added to the sources pool. Only images are eligible as attachment/file sources.

Stable SID numbering and deduplication

Sources are merged by normalized URL (or physical_path for local files). Once a source is assigned a sequential ID (sid), that ID is stable for the entire conversation. Duplicate URLs reuse the existing SID; only genuinely new sources receive the next integer.

This means [[S:3]] in turn 2 refers to the same source as [[S:3]] in turn 12. Clients can rehydrate citations by matching sources_used SIDs in any artifact against the current sources_pool in timeline.json.
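
The dedup logic can be sketched as follows, assuming normalized URL as the merge key (the normalization rules and row shape here are illustrative, not the exact pool schema):

```python
# Stable-SID assignment: duplicates reuse their SID; new sources get the next integer.
from urllib.parse import urlsplit, urlunsplit

def normalize(url: str) -> str:
    p = urlsplit(url.strip())
    return urlunsplit((p.scheme.lower(), p.netloc.lower(),
                       p.path.rstrip("/") or "/", p.query, ""))

class SourcesPool:
    def __init__(self):
        self.rows, self._by_key = [], {}

    def add(self, url, title=""):
        key = normalize(url)
        if key in self._by_key:            # duplicate URL → reuse existing SID
            return self._by_key[key]
        sid = len(self.rows) + 1           # next sequential SID, never reassigned
        self.rows.append({"sid": sid, "url": url, "title": title})
        self._by_key[key] = sid
        return sid

pool = SourcesPool()
assert pool.add("https://example.com/a") == 1
assert pool.add("https://example.com/b") == 2
assert pool.add("HTTPS://EXAMPLE.COM/a/") == 1   # dedup: same SID across turns
```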

Storage and rendering

The full pool is stored as artifact:conv:sources_pool. A compact snapshot (sid, title, url, short text, and limited metadata) is embedded in the timeline artifact for fast local access. The compact snapshot is rendered as a single SOURCES POOL tail block when the timeline is rendered with include_sources=True.

Accessing and citing sources

In the decision loop

# Load specific source rows from docs pattern
react.read(["so:sources_pool[1-5]"])
react.read(["so:sources_pool[1,3,7]"])

In generated text

# Inline citation tokens (docs pattern)
[[S:1]]       # single source
[[S:1,3]]     # multiple sources
[[S:2-4]]     # range of sources
# Note: only web sources (http/https) should be cited as evidence.
# Image sources are for rendering only, not evidence citations.

Cross-conversation source pool with stable SIDs

[Diagram: turn 1 — web_search returns 3 URLs → SIDs 1, 2, 3 assigned. Turn 5 — web_fetch returns 2 new URLs → SIDs 4, 5 assigned. Turn 12 — web_search returns URL 2 again → SID 2 reused (dedup). conv:sources_pool persists SIDs 1–5 for the entire conversation, so citations [[S:n]] resolve identically in turn 1 and turn 15.]

Cross-document citation continuity

Because SIDs are stable across turns, a source indexed as [[S:5]] from a web fetch in turn 1 remains [[S:5]] in the final answer in turn 15. The streamer resolves citations from the current pool state at stream time, so the client always receives correct hyperlinks regardless of when the source was first added.

Memory & Context Management

ReAct v2 approaches memory not as a separate subsystem but as a set of layered strategies on the timeline: caching, compaction, TTL pruning, semantic search, and selective hiding all operate on the same block stream.

Three-checkpoint caching

The context browser inserts up to three cache checkpoints per rendered timeline, enabling providers that support prompt caching (such as Anthropic) to serve most of the context from cache on repeated calls.

Sources and announce blocks are appended after all checkpoints and remain uncached every round, ensuring the agent always sees fresh source data and plan status without invalidating the stable prefix cache.
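
The layout can be sketched in the Anthropic cache_control style. cache_control breakpoints are the provider's real mechanism; assemble_context and the exact block arrangement below are assumptions for illustration:

```python
# Checkpointed context assembly: cached spans end at cache_control breakpoints;
# sources/announce sit after the last checkpoint and stay uncached every round.
def assemble_context(stable_prefix, recent_blocks, sources_tail, announce_tail):
    content = [
        {"type": "text", "text": stable_prefix,
         "cache_control": {"type": "ephemeral"}},   # checkpoint over stable prefix
        {"type": "text", "text": recent_blocks,
         "cache_control": {"type": "ephemeral"}},   # pre-tail checkpoint
        {"type": "text", "text": sources_tail},     # uncached: fresh sources pool
        {"type": "text", "text": announce_tail},    # uncached: fresh announce
    ]
    return [{"role": "user", "content": content}]

msgs = assemble_context("timeline prefix", "tail blocks", "SOURCES POOL", "ANNOUNCE")
cached = [b for b in msgs[0]["content"] if "cache_control" in b]
assert len(cached) == 2
assert "cache_control" not in msgs[0]["content"][-1]
```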

Compaction (edit-over-rewrite framing)

When the visible timeline exceeds the model budget, earlier blocks are compacted into a conv.range.summary block at the cut point. The original blocks are removed from the persisted payload; future renders start from the summary forward. This is an edit-over-rewrite approach — the timeline is surgically trimmed, not discarded and restarted.
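
The compaction edit can be sketched in a few lines. compact and summarize are illustrative stand-ins (summarize would be an LLM summarizer in practice); only the conv.range.summary block type comes from the docs:

```python
# Edit-over-rewrite compaction: earlier blocks collapse into one summary block;
# the tail of recent blocks is kept verbatim and renders resume from the summary.
def compact(blocks, keep_last: int, summarize):
    if len(blocks) <= keep_last:
        return blocks
    head, tail = blocks[:-keep_last], blocks[-keep_last:]
    summary = {"type": "conv.range.summary",
               "range": (head[0]["turn"], head[-1]["turn"]),
               "text": summarize(head)}
    return [summary] + tail   # future renders start from the summary forward

blocks = [{"turn": f"turn_{i}", "type": "user.prompt", "text": f"msg {i}"}
          for i in range(6)]
out = compact(blocks, keep_last=2, summarize=lambda h: f"{len(h)} blocks summarized")
assert out[0]["type"] == "conv.range.summary"
assert len(out) == 3
```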

TTL-based context expiry

When cache_ttl_seconds is set in the session configuration, blocks from previous turns are replaced with compact truncated placeholders after the TTL expires. A system.message block is appended to explain the pruning and guide the agent to restore specific paths via react.read(path). This mechanism keeps the active context window efficient for long-lived sessions without losing the ability to recall prior content.

react.memsearch — Persistent recall

Even after compaction or TTL pruning, prior turn content is indexed semantically. react.memsearch allows the agent to query that index for relevant snippets from any historical turn, surfacing them with scores and timestamps. The agent can then selectively reload full content via react.read.

# Docs pattern: semantic recall from past turns
react.memsearch(
    query="database schema discussed last week",
    targets=["assistant", "user"],
    top_k=5,
    days=30
)
# Returns: [{turn_id, text, score, ts}, ...]

react.hide — Selective context pruning

Within a single turn, the agent can replace large blocks in the editable tail window with short placeholders using react.hide. The original content remains accessible via react.read. This is useful for hiding large file artifacts that are no longer needed in the active context window, freeing space for new tool results.
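
A hypothetical call shape, following the react.read pattern shown elsewhere in this page (the parameter list is an assumption, not the documented API):

```
# Hide a large artifact from the active context (illustrative, docs-style)
react.hide(["ar:turn_123.react.tool.result.web_fetch_1"])
# The block is replaced with a short placeholder; restore it later with:
react.read(["ar:turn_123.react.tool.result.web_fetch_1"])
```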

⚠️

react.hide is constrained to the editable tail window — blocks before the pre-tail cache checkpoint cannot be hidden. This prevents cache invalidation from inadvertent edits to stable prefixes.

Plan snapshots

Plans are tracked as explicit react.plan JSON snapshot blocks in the timeline, updated each time the decision agent acknowledges progress in its reasoning notes. A human-readable react.plan.ack block is emitted alongside. The active plan is surfaced in the announce section each round; closed or completed plans remain in the timeline history but are no longer re-announced.

# Plan snapshot block (docs pattern, simplified)
{
  "type": "react.plan",
  "mime": "application/json",
  "path": "ar:<turn_id>.react.plan.<plan_id>",
  "text": {
    "plan_id": "plan_abc123",
    "origin_turn_id": "turn_001",
    "steps": [
      { "n": 1, "label": "Gather sources",  "status": "done" },
      { "n": 2, "label": "Draft report",    "status": "pending" }
    ]
  }
}

Code Examples

ℹ️

Examples below are labeled either FROM DOCS PATTERN (directly reproduced or lightly formatted from SDK documentation) or ILLUSTRATIVE ADAPTATION (derived structure; not a verbatim API contract).

1 — Channel definition and listener setup

FROM DOCS PATTERN
from kdcube_ai_app.apps.chat.sdk.streaming.versatile_streamer import (
    ChannelSpec, ChannelSubscribers, stream_with_channels
)

# Define channels emitted by the model
channels = [
    ChannelSpec(
        name="answer",
        format="markdown",
        replace_citations=True,
        emit_marker="answer",
    ),
    ChannelSpec(
        name="usage",
        format="json",
        replace_citations=False,
        emit_marker="answer",
    ),
]

# Attach a subscriber to the "usage" channel for side-effect fanout
subscribers = ChannelSubscribers().subscribe("usage", _usage_json_fanout)

# Invoke streaming (docs pattern; see versatile_streamer.py)
results, meta = await stream_with_channels(
    svc=svc,
    messages=[system_msg, user_msg],
    role="answer.generator.regular",
    channels=channels,
    emit=_emit_wrapper,
    agent="my.agent",
    artifact_name="report",
    sources_list=sources_list,
    subscribers=subscribers,
    return_full_raw=True,
)

answer_raw   = results["answer"].raw
used_sources = results["answer"].used_sources
service_err  = (meta or {}).get("service_error")

2 — Multi-channel streaming round with composite JSON

FROM DOCS PATTERN
from kdcube_ai_app.apps.chat.sdk.streaming.versatile_streamer import ChannelSpec, stream_with_channels
from kdcube_ai_app.apps.chat.sdk.streaming.artifacts_channeled_streaming import CompositeJsonArtifactStreamer

# Answer channel carries JSON; composite streamer fans it out to canvas
channels = [
    ChannelSpec(name="answer", format="json", replace_citations=False, emit_marker="answer"),
    ChannelSpec(name="usage",  format="json", replace_citations=False, emit_marker="answer"),
]

results = await stream_with_channels(
    svc=svc,
    messages=[system_msg, user_msg],
    role="answer.generator.regular",
    channels=channels,
    emit=emit_delta,
    agent="my.agent",
    artifact_name="my.json.artifact",
    composite_cfg={"artifactA": "path.to.schema"},
    composite_channel="answer",   # route JSON channel into composite streamer
    composite_marker="canvas",    # emit per-attribute deltas to canvas
)

3 — Source referencing across documents

ILLUSTRATIVE ADAPTATION
# Step 1: Sources are accumulated across turns into conv:sources_pool.
# The pool is loaded at turn start via ctx_browser.load_timeline().

# Step 2: Decision agent loads specific sources by SID range (docs pattern)
# Inside the react decision loop:
react.read(["so:sources_pool[1-5]"])   # loads sources 1 through 5
react.read(["so:sources_pool[1,3,7]"]) # loads sources 1, 3, 7

# Step 3: Agent cites sources inline in generated text
# The streamer replaces [[S:n]] tokens during streaming (never in stored raw)
answer_text = """
According to the analysis [[S:1]], the platform supports
multi-channel streaming [[S:2,3]]. Cache efficiency results
in up to 60% cost reduction [[S:1]].
"""

# Step 4: Client rehydrates citations using sources_used SIDs from the artifact
# matched against current sources_pool in timeline.json
used_sids = results["answer"].used_sources  # e.g. [1, 2, 3]
pool      = timeline_json["sources_pool"]
refs      = [s for s in pool if s["sid"] in used_sids]

4 — react.memsearch for persistent recall

FROM DOCS PATTERN
# Semantic search over past turns — useful after compaction or TTL pruning
# Inside the react decision loop (react.memsearch is a react.* in-loop tool):

react.memsearch(
    query="authentication configuration from last session",
    targets=["assistant", "user"],
    top_k=5,
    days=90
)

# Returns blocks like:
# [{"turn_id": "turn_123", "text": "...", "score": 0.84, "ts": "2026-02-01T...Z"}]

# Then restore the full content if needed:
react.read(["fi:turn_123.files/config.yaml"])

Comparison: ReAct v2 vs Standard Tool-Calling Agents

⚠️

Conceptual comparison only. KDCube does not publish direct benchmarks or first-party technical comparisons with third-party systems. The table below describes architectural design properties of ReAct v2 as documented in the SDK, and contrasts them with common characteristics of general-purpose agentic frameworks (e.g., tool-calling patterns in Claude Artifacts, ChatGPT Code Interpreter, or generic LangGraph-style agents). These comparisons are reasoned from design intent, not empirically measured.

| Capability | KDCube ReAct v2 | General agentic frameworks / tool-calling agents |
|---|---|---|
| Context representation | Single ordered timeline artifact with typed blocks; every tool call, result, and plan stored as a block with stable logical paths. | Typically a message array or in-memory list; tool results appended as assistant/tool messages. Less structured for audit or replay. |
| Multi-channel streaming output | Tag-based protocol (<channel:thinking>, <channel:answer>, etc.) with per-channel format, citation replacement, and subscriber fanout in a single LLM call. | Typically a single output stream or separate API calls. Structured output and reasoning trace usually require multiple round-trips or post-processing. |
| Prompt cache efficiency | Dual-checkpoint + pre-tail caching strategy. Stable prefix is cached across rounds; sources/announce remain uncached. Designed to minimize cache invalidation during multi-round loops. | Depends on provider and framework. Many frameworks rebuild the full message array each call, reducing cache hit rate. Cache strategy often manual or absent. |
| Cross-turn source continuity | Sources pool persisted as a conversation-level artifact with stable SIDs. Citations ([[S:n]]) consistent across all turns in a conversation. | Web search results typically ephemeral per turn. Cross-turn citation continuity requires custom implementation. |
| In-loop context management tools | react.hide, react.memsearch, react.read — the agent controls context compaction and recall directly from within the decision loop. | Context management is generally external (framework truncation, summarization). The agent typically cannot selectively load/hide prior content. |
| Plan tracking with timeline persistence | Explicit react.plan JSON snapshots + react.plan.ack blocks in the timeline. Active plan surfaced via announce. Plan history is auditable in the block stream. | Plans are usually in-prompt instructions or ephemeral state. No standardized plan snapshot or audit trail in the context artifact. |
| Contribute vs announce separation | Explicit split: persistent contribute blocks vs ephemeral announce signals. High-frequency state (iteration, budget, time) never pollutes the cached timeline. | System messages and injected context usually rebuilt each call, increasing cache miss risk. No formal separation of ephemeral vs persistent signals. |
| Protocol validation on tool calls | react.notice blocks emitted on protocol errors. Strict parameter ordering enforced per tool spec. Agent sees validation feedback on the next round. | Tool call validation generally provider-side (JSON schema). Runtime protocol feedback to the agent typically absent or requires custom middleware. |
| Open source / self-hostable | Available via the KDCube open-source AI app repository. Deployable to Kubernetes, local, or cloud infrastructure. | Varies. LangGraph and LlamaIndex are open source. Claude Artifacts and ChatGPT Code Interpreter are hosted-only. |
💡

ReAct v2 is designed as a production-first SDK rather than a research agent framework. The architectural choices prioritize operational properties — cache efficiency, auditability, context control — that matter in deployed applications with real users and cost constraints.