The key thing to understand is that React is not only chasing cache hits. It is managing what remains visible to the model as the turn evolves. The stable prefix, the three cache points, react.hide, TTL pruning, and compaction all serve that same goal: keep context efficient while still leaving the important things recoverable.

Short version: previous-turn cache point keeps the earlier prefix reusable, pre-tail cache point defines the editable tail, tail cache point moves with the latest stable round, react.hide clears garbage only in that tail, TTL pruning hides old payloads, and compaction inserts a summary only when the visible stream reaches the hard ceiling.

1. The strategy starts with the shape of the rendered context

React renders a stable stream first, then appends SOURCES POOL and ANNOUNCE at the tail. That tail is intentionally uncached. The cache points live only in the stable stream. This matters because it lets the expensive, reusable prefix remain stable while the live operational surface keeps moving every round.

Inside the rendered stable stream, React uses three anchors:

  • CP1 prev-turn: the last block before the current turn header
  • CP2 pre-tail: a checkpoint offset from the tail by cache_point_offset_rounds, present only when enough rounds exist
  • CP3 tail: the last block of the latest stable round

CP1 is about cross-turn reuse. CP2 and CP3 are about the current turn’s moving frontier.
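The anchor placement can be sketched as a small function. This is an illustrative model only: the names (`place_cache_points`, `Round` indexes, the default offset value) are assumptions, not the runtime's real API. The one property it demonstrates is from the article: CP2 exists only when enough rounds exist inside the turn, and CP2/CP3 march forward as rounds accumulate.

```python
# Hypothetical sketch of how the three timeline cache anchors could be
# computed from a list of stable rounds. All names here are illustrative.

def place_cache_points(rounds, turn_start, cache_point_offset_rounds=3):
    """Return (cp1, cp2, cp3) as round indexes, or None when absent.

    rounds                     -- stable rounds rendered so far, oldest first
    turn_start                 -- index of the first round of the current turn
    cache_point_offset_rounds  -- how far CP2 sits back from the tail
    """
    # CP1 prev-turn: the last round before the current turn header.
    cp1 = turn_start - 1 if turn_start > 0 else None

    # CP3 tail: the last block of the latest stable round.
    cp3 = len(rounds) - 1 if rounds else None

    # CP2 pre-tail: offset back from the tail, present only once enough
    # rounds exist inside the current turn to make the offset meaningful.
    cp2 = None
    if cp3 is not None:
        candidate = cp3 - cache_point_offset_rounds
        if candidate >= turn_start:
            cp2 = candidate
    return cp1, cp2, cp3
```

Early in a turn the function returns `cp2 = None`; after a few more rounds the same call starts returning a pre-tail index, which mirrors how the pre-tail checkpoint "appears" later in the turn.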

There is also one more cache anchor outside the timeline stream. The decision call builds the system instruction as a separately cached system message, so the full request has a fixed system-instruction cache point first, then the three timeline checkpoints shown here. In other words: this article’s diagram focuses on the three cache points in the rendered timeline, not the prompt-level system cache marker.
One focused picture of the rendered-timeline strategy: three cache points across turns, tail-only react.hide, uncached SOURCES POOL and ANNOUNCE, then TTL pruning and compaction as separate mechanisms.

2. The extra checkpoint exists so the agent can forget garbage early

The extra checkpoint is not decoration. It exists so the editable tail begins somewhere before the very end of the visible stream. Without that intermediate checkpoint, every hide decision would either be too risky for cache reuse or too constrained to be useful.

With CP2 in place, React can discover some large but useless result, decide it is junk, and hide it while keeping the earlier cached prefix intact. That is why the system has a real react.hide tool instead of asking the model to simply “ignore” old text.

react.hide is the early-forgetting tool. It removes bulky, no-longer-useful snippets from the model’s eyes, leaves a short replacement that explains what was hidden and why, and preserves the path so the agent can restore the original with react.read(path).

The runtime enforces this very explicitly:

  • the target must be after the pre-tail checkpoint
  • the target can also be rejected if it is too far from the tail by token distance
  • the tool accepts a logical path, not a search query
  • the original payload is not destroyed; it becomes hidden and recoverable
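The four rules above can be expressed as a short validation-then-hide sketch. Everything here is an assumption made for illustration: the block shape, the `MAX_HIDE_DISTANCE_TOKENS` value, and the function names do not come from the runtime; only the rules themselves (pre-tail boundary, token-distance limit, path addressing, non-destructive hiding) are from the article.

```python
# Hypothetical sketch of tail-only hide validation. The constant below is an
# assumed limit, not a documented value.

MAX_HIDE_DISTANCE_TOKENS = 8_000

def validate_hide(blocks, target_path, pre_tail_index):
    """Reject hide targets that would invalidate the cached prefix."""
    index = next(
        (i for i, b in enumerate(blocks) if b["path"] == target_path), None
    )
    if index is None:
        raise ValueError(f"no block at path {target_path!r}")

    # Rule 1: the target must sit after the pre-tail checkpoint (CP2);
    # hiding anything before it would break the reusable cached prefix.
    if index <= pre_tail_index:
        raise ValueError("target is inside the cached prefix; refusing to hide")

    # Rule 2: the target can also be too far from the tail by token distance.
    distance = sum(b["tokens"] for b in blocks[index + 1:])
    if distance > MAX_HIDE_DISTANCE_TOKENS:
        raise ValueError("target is too far from the tail to hide safely")
    return index

def hide(blocks, target_path, pre_tail_index, reason):
    """Hide a block non-destructively: keep a short replacement plus the path."""
    i = validate_hide(blocks, target_path, pre_tail_index)
    blocks[i]["hidden_payload"] = blocks[i]["payload"]  # recoverable later
    blocks[i]["payload"] = (
        f"[hidden: {reason}; restore with react.read({target_path!r})]"
    )
    return blocks[i]
```

Note that the original payload is stashed, not deleted, which is what makes the later `react.read(path)` read-back possible.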

There is even one more nuance. If CP2 moves above CP1 during the turn, the runtime can treat CP2 as the effective previous-turn cache point for the rest of the turn, so hide is not blocked by an older cache anchor that no longer makes operational sense.

3. Cache points move with the turn

CP1 is fixed once the current turn begins. CP2 and CP3 are not. They move as rounds accumulate. That is exactly what you want in a live agent loop.

At the start of the turn, there may not even be enough rounds to place CP2. Later, once enough tool activity exists, the pre-tail checkpoint appears. As more rounds are added, both CP2 and CP3 march forward. So the editable tail is not a static byte range. It is a moving window tied to round structure.

React is not trying to preserve the whole current turn equally. It preserves a reusable prefix and lets the active tail remain editable.

4. Session TTL pruning hides old payloads, not logical availability

TTL pruning is the session-level freshness rule. When the session cache has gone cold enough, React hides older visible payloads with replacement text. The important point is that this is still path-oriented, not destructive amnesia.

The runtime keeps the last keep_recent_turns turns visible, keeps the last keep_recent_intact_turns turns fully intact, and hides older blocks by path. It can also lightly hide heavy recent tool results and oversized recent images or PDFs without blinding the agent.

When pruning happens, the runtime also appends a one-time announce notice and persists a system message explaining that hidden context can be restored with react.read(path). So the visible stream becomes lighter, but the context remains highly available through the surviving path hints.
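A minimal sketch of this path-oriented pruning, under stated assumptions: the parameter names `keep_recent_turns` and `keep_recent_intact_turns` come from the article, but the data shapes, the `heavy` flag, and the replacement wording are all hypothetical.

```python
# Hypothetical sketch of session TTL pruning: hide by path, never destroy.

def ttl_prune(turns, keep_recent_turns=4, keep_recent_intact_turns=2):
    """Hide older visible payloads and return (pruned_paths, notice)."""
    pruned_paths = []
    cutoff = len(turns) - keep_recent_turns          # older than this: hide
    intact_cutoff = len(turns) - keep_recent_intact_turns  # fully intact after

    for t, turn in enumerate(turns):
        for block in turn:
            if t < cutoff:
                # Outside the visibility window: hide the payload by path.
                block["visible"] = False
                block["replacement"] = f"[pruned; react.read({block['path']!r})]"
                pruned_paths.append(block["path"])
            elif t < intact_cutoff and block.get("heavy"):
                # Recent but heavy (big tool result, image, PDF): lightly hide.
                block["visible"] = False
                block["replacement"] = (
                    f"[heavy payload hidden; react.read({block['path']!r})]"
                )
                pruned_paths.append(block["path"])

    if pruned_paths:
        # One-time announce notice so the agent knows how to restore things.
        notice = "Older context was pruned; restore any path with react.read(path)."
        return pruned_paths, notice
    return pruned_paths, None
```

The key design point survives even in this toy version: every hidden block leaves behind a path, so availability is preserved while visibility shrinks.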

5. Compaction is different: it is the hard ceiling

Compaction is not the everyday cleanup mechanism. It is the hard ceiling. In the current implementation, it runs when the estimated token count of the system message plus the visible blocks crosses roughly 0.9 * max_tokens, or when a retry after a context-limit error forces sanitization.

When compaction runs, React inserts a conv.range.summary block at a carefully chosen cut point, slices the visible stream after the latest summary, and keeps the old prefix represented by the summary instead of raw blocks. That is a very different operation from hide or TTL pruning.
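The trigger-and-cut shape can be sketched as follows. The 0.9 ratio is from the article; everything else (the naive midpoint cut, the fixed summary budget, the `summarize` callback) is a placeholder for the runtime's actual, more careful cut-point selection.

```python
# Hypothetical sketch of the compaction hard ceiling. Only the 0.9 ratio
# comes from the article; the cut heuristic here is deliberately naive.

COMPACTION_RATIO = 0.9

def maybe_compact(system_tokens, blocks, max_tokens, summarize):
    """Insert a conv.range.summary block once the visible stream nears the ceiling."""
    visible = [b for b in blocks if b.get("visible", True)]
    estimated = system_tokens + sum(b["tokens"] for b in visible)
    if estimated < COMPACTION_RATIO * max_tokens:
        return blocks  # under the ceiling; nothing to do

    # Choose a cut point and represent everything before it as one summary.
    cut = len(visible) // 2  # placeholder for the real cut-point choice
    summary = {
        "type": "conv.range.summary",
        "payload": summarize(visible[:cut]),
        "tokens": 200,  # assumed fixed summary budget
        "visible": True,
    }
    # The visible stream is sliced after the latest summary; the old prefix
    # now survives only as the summary block.
    return [summary] + visible[cut:]
```

Contrast this with hide and TTL pruning: those keep the original blocks recoverable by path, while compaction replaces a whole prefix range with its summary representation.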

| Mechanism | Why it runs | What happens | What stays available |
| --- | --- | --- | --- |
| react.hide | agent decides a tail snippet is garbage | hide path, keep replacement text | original is recoverable via react.read(path) |
| TTL pruning | session freshness / prune buffer | hide old or heavy visible payloads | paths, replacement text, notices, and read-back remain |
| Compaction | visible stream approaches hard token ceiling | insert conv.range.summary and move visible boundary | older context survives as summary + digest + carried refs |

6. The render order is part of the design

In the actual render path, the order is deliberate:

1. apply session TTL pruning
2. compact if needed
3. slice after the latest summary
4. apply hidden replacements
5. recompute cache markers
6. append SOURCES POOL and ANNOUNCE

That ordering prevents the cache logic from being computed on a stream that is about to change. It also keeps the uncached tail truly at the end.
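The six steps above can be sketched as a pipeline. All of the step functions here are simplified stand-ins (the real runtime's implementations are not public in this article); the point the sketch preserves is purely the ordering: every stream-mutating step runs before cache markers are recomputed, and the uncached tail is appended last.

```python
# Hypothetical sketch of the render pipeline ordering. Each step is a
# deliberately tiny stand-in for the runtime's real logic.

def apply_ttl_pruning(stream):
    return [b for b in stream if not b.get("expired")]

def compact_if_needed(stream):
    return stream  # stand-in: no ceiling reached in this sketch

def slice_after_latest_summary(stream):
    idx = max(
        (i for i, b in enumerate(stream) if b["type"] == "conv.range.summary"),
        default=-1,
    )
    return stream[idx:] if idx >= 0 else stream

def apply_hidden_replacements(stream):
    return [
        dict(b, payload=b.get("replacement", b["payload"])) if b.get("hidden") else b
        for b in stream
    ]

def recompute_cache_markers(stream):
    return [i for i, b in enumerate(stream) if b.get("cache_point")]

def render(stream, tail_blocks):
    stream = apply_ttl_pruning(stream)           # 1. freshness first
    stream = compact_if_needed(stream)           # 2. hard ceiling next
    stream = slice_after_latest_summary(stream)  # 3. drop pre-summary blocks
    stream = apply_hidden_replacements(stream)   # 4. swap in replacement text
    markers = recompute_cache_markers(stream)    # 5. markers see the settled stream
    return stream + tail_blocks, markers         # 6. uncached tail stays last
```

Because the markers are computed in step 5, they can never point into blocks that a later step would move or remove, and the SOURCES POOL / ANNOUNCE tail is guaranteed to sit after every cache point.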

7. The point is availability, not perfect recall in the prompt

The runtime is not trying to keep everything continuously visible. It is trying to keep what matters visible now, what is stable cached for reuse, and what was hidden still discoverable. That is a much more realistic goal for a distributed long-running agent.

So the promise is not “the model never forgets.” The promise is stronger and more honest: the model can forget aggressively when it should, while the system leaves enough path and summary structure behind for the agent to recover what matters.