The key thing to understand is that React is not only chasing cache hits. It is managing what remains visible to the model as the turn evolves. The stable prefix, the three cache points, react.hide, TTL pruning, and compaction all serve that same goal: keep context efficient while still leaving the important things recoverable.

Short version: previous-turn cache point keeps the earlier prefix reusable, pre-tail cache point defines the editable tail, tail cache point moves with the latest stable round, react.hide clears garbage only in that tail, TTL pruning hides old payloads, and compaction inserts a summary only when the visible stream reaches the hard ceiling.

1. The strategy starts with the shape of the rendered context

React renders a stable stream first, then appends SOURCES POOL and ANNOUNCE at the tail. That tail is intentionally uncached. The cache points live only in the stable stream. This matters because it lets the expensive, reusable prefix remain stable while the live operational surface keeps moving every round.

Inside the rendered stable stream, React uses three anchors:

  • CP1 prev-turn: the last block before the current turn header
  • CP2 pre-tail: a checkpoint offset from the tail by cache_point_offset_rounds, present only when enough rounds exist
  • CP3 tail: the last block of the latest stable round

CP1 is about cross-turn reuse. CP2 and CP3 are about the current turn’s moving frontier.
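The anchor placement can be sketched as a small function. This is an illustrative model only: the names (`place_cache_points`, `Round` indexes, the default offset value) are assumptions, not the runtime's real API. The one property it demonstrates is from the article: CP2 exists only when enough rounds exist inside the turn, and CP2/CP3 march forward as rounds accumulate.

```python
# Hypothetical sketch of how the three timeline cache anchors could be
# computed from a list of stable rounds. All names here are illustrative.

def place_cache_points(rounds, turn_start, cache_point_offset_rounds=3):
    """Return (cp1, cp2, cp3) as round indexes, or None when absent.

    rounds                     -- stable rounds rendered so far, oldest first
    turn_start                 -- index of the first round of the current turn
    cache_point_offset_rounds  -- how far CP2 sits back from the tail
    """
    # CP1 prev-turn: the last round before the current turn header.
    cp1 = turn_start - 1 if turn_start > 0 else None

    # CP3 tail: the last block of the latest stable round.
    cp3 = len(rounds) - 1 if rounds else None

    # CP2 pre-tail: offset back from the tail, present only once enough
    # rounds exist inside the current turn to make the offset meaningful.
    cp2 = None
    if cp3 is not None:
        candidate = cp3 - cache_point_offset_rounds
        if candidate >= turn_start:
            cp2 = candidate
    return cp1, cp2, cp3
```

Early in a turn the function returns `cp2 = None`; after a few more rounds the same call starts returning a pre-tail index, which mirrors how the pre-tail checkpoint "appears" later in the turn.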

There is also one more cache anchor outside the timeline stream. The decision call builds the system instruction as a separately cached system message, so the full request has a fixed system-instruction cache point first, then the three timeline checkpoints shown here. In other words: this article’s diagram focuses on the three cache points in the rendered timeline, not the prompt-level system cache marker.
One focused picture of the rendered-timeline strategy: three cache points across turns, tail-only react.hide, uncached SOURCES POOL and ANNOUNCE, then TTL pruning and compaction as separate mechanisms.

2. The extra checkpoint exists so the agent can forget garbage early

The extra checkpoint is not decoration. It exists so the editable tail begins somewhere before the very end of the visible stream. Without that intermediate checkpoint, every hide decision would either be too risky for cache reuse or too constrained to be useful.

With CP2 in place, React can discover some large but useless result, decide it is junk, and hide it while keeping the earlier cached prefix intact. That is why the system has a real react.hide tool instead of asking the model to simply “ignore” old text.

react.hide is the early-forgetting tool. It removes bulky, no-longer-useful snippets from the model’s eyes, leaves a short replacement that explains what was hidden and why, and preserves the path so the agent can restore the original with react.read(path).

The runtime enforces this very explicitly:

  • the target must be after the pre-tail checkpoint
  • the target can also be rejected if it is too far from the tail by token distance
  • the tool accepts a logical path, not a search query
  • the original payload is not destroyed; it becomes hidden and recoverable
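The four rules above can be expressed as a short validation-then-hide sketch. Everything here is an assumption made for illustration: the block shape, the `MAX_HIDE_DISTANCE_TOKENS` value, and the function names do not come from the runtime; only the rules themselves (pre-tail boundary, token-distance limit, path addressing, non-destructive hiding) are from the article.

```python
# Hypothetical sketch of tail-only hide validation. The constant below is an
# assumed limit, not a documented value.

MAX_HIDE_DISTANCE_TOKENS = 8_000

def validate_hide(blocks, target_path, pre_tail_index):
    """Reject hide targets that would invalidate the cached prefix."""
    index = next(
        (i for i, b in enumerate(blocks) if b["path"] == target_path), None
    )
    if index is None:
        raise ValueError(f"no block at path {target_path!r}")

    # Rule 1: the target must sit after the pre-tail checkpoint (CP2);
    # hiding anything before it would break the reusable cached prefix.
    if index <= pre_tail_index:
        raise ValueError("target is inside the cached prefix; refusing to hide")

    # Rule 2: the target can also be too far from the tail by token distance.
    distance = sum(b["tokens"] for b in blocks[index + 1:])
    if distance > MAX_HIDE_DISTANCE_TOKENS:
        raise ValueError("target is too far from the tail to hide safely")
    return index

def hide(blocks, target_path, pre_tail_index, reason):
    """Hide a block non-destructively: keep a short replacement plus the path."""
    i = validate_hide(blocks, target_path, pre_tail_index)
    blocks[i]["hidden_payload"] = blocks[i]["payload"]  # recoverable later
    blocks[i]["payload"] = (
        f"[hidden: {reason}; restore with react.read({target_path!r})]"
    )
    return blocks[i]
```

Note that the original payload is stashed, not deleted, which is what makes the later `react.read(path)` read-back possible.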

There is even one more nuance. If CP2 moves above CP1 during the turn, the runtime can treat CP2 as the effective previous-turn cache point for the rest of the turn, so hide is not blocked by an older cache anchor that no longer makes operational sense.

3. Cache points move with the turn

CP1 is fixed once the current turn begins. CP2 and CP3 are not. They move as rounds accumulate. That is exactly what you want in a live agent loop.

At the start of the turn, there may not even be enough rounds to place CP2. Later, once enough tool activity exists, the pre-tail checkpoint appears. As more rounds are added, both CP2 and CP3 march forward. So the editable tail is not a static byte range. It is a moving window tied to round structure.

React is not trying to preserve the whole current turn equally. It preserves a reusable prefix and lets the active tail remain editable.

4. Session TTL pruning hides old payloads, not logical availability

TTL pruning is the session-level freshness rule. When the session cache has gone cold enough, React hides older visible payloads with replacement text. The important point is that this is still path-oriented, not destructive amnesia.

The runtime keeps the last keep_recent_turns turns visible, keeps the last keep_recent_intact_turns turns fully intact, and hides older blocks by path. It can also lightly hide heavy recent tool results and oversized recent images or PDFs without blinding the agent.

When pruning happens, the runtime also appends a one-time announce notice and persists a system message explaining that hidden context can be restored with react.read(path). So the visible stream becomes lighter, but the context remains highly available through the surviving path hints.
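A minimal sketch of this path-oriented pruning, under stated assumptions: the parameter names `keep_recent_turns` and `keep_recent_intact_turns` come from the article, but the data shapes, the `heavy` flag, and the replacement wording are all hypothetical.

```python
# Hypothetical sketch of session TTL pruning: hide by path, never destroy.

def ttl_prune(turns, keep_recent_turns=4, keep_recent_intact_turns=2):
    """Hide older visible payloads and return (pruned_paths, notice)."""
    pruned_paths = []
    cutoff = len(turns) - keep_recent_turns          # older than this: hide
    intact_cutoff = len(turns) - keep_recent_intact_turns  # fully intact after

    for t, turn in enumerate(turns):
        for block in turn:
            if t < cutoff:
                # Outside the visibility window: hide the payload by path.
                block["visible"] = False
                block["replacement"] = f"[pruned; react.read({block['path']!r})]"
                pruned_paths.append(block["path"])
            elif t < intact_cutoff and block.get("heavy"):
                # Recent but heavy (big tool result, image, PDF): lightly hide.
                block["visible"] = False
                block["replacement"] = (
                    f"[heavy payload hidden; react.read({block['path']!r})]"
                )
                pruned_paths.append(block["path"])

    if pruned_paths:
        # One-time announce notice so the agent knows how to restore things.
        notice = "Older context was pruned; restore any path with react.read(path)."
        return pruned_paths, notice
    return pruned_paths, None
```

The key design point survives even in this toy version: every hidden block leaves behind a path, so availability is preserved while visibility shrinks.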

5. Compaction is different: it is the hard ceiling

Compaction is not the everyday cleanup mechanism. It is the hard ceiling. In the current implementation, it runs when the estimated token count of the system message plus the visible blocks crosses roughly 0.9 * max_tokens, or when a retry after a context-limit error forces sanitization.

When compaction runs, React inserts a conv.range.summary block at a carefully chosen cut point, slices the visible stream after the latest summary, and keeps the old prefix represented by the summary instead of raw blocks. That is a very different operation from hide or TTL pruning.
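The trigger-and-cut shape can be sketched as follows. The 0.9 ratio is from the article; everything else (the naive midpoint cut, the fixed summary budget, the `summarize` callback) is a placeholder for the runtime's actual, more careful cut-point selection.

```python
# Hypothetical sketch of the compaction hard ceiling. Only the 0.9 ratio
# comes from the article; the cut heuristic here is deliberately naive.

COMPACTION_RATIO = 0.9

def maybe_compact(system_tokens, blocks, max_tokens, summarize):
    """Insert a conv.range.summary block once the visible stream nears the ceiling."""
    visible = [b for b in blocks if b.get("visible", True)]
    estimated = system_tokens + sum(b["tokens"] for b in visible)
    if estimated < COMPACTION_RATIO * max_tokens:
        return blocks  # under the ceiling; nothing to do

    # Choose a cut point and represent everything before it as one summary.
    cut = len(visible) // 2  # placeholder for the real cut-point choice
    summary = {
        "type": "conv.range.summary",
        "payload": summarize(visible[:cut]),
        "tokens": 200,  # assumed fixed summary budget
        "visible": True,
    }
    # The visible stream is sliced after the latest summary; the old prefix
    # now survives only as the summary block.
    return [summary] + visible[cut:]
```

Contrast this with hide and TTL pruning: those keep the original blocks recoverable by path, while compaction replaces a whole prefix range with its summary representation.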

| Mechanism | Why it runs | What happens | What stays available |
| --- | --- | --- | --- |
| react.hide | agent decides a tail snippet is garbage | hide path, keep replacement text | original is recoverable via react.read(path) |
| TTL pruning | session freshness / prune buffer | hide old or heavy visible payloads | paths, replacement text, notices, and read-back remain |
| Compaction | visible stream approaches hard token ceiling | insert conv.range.summary and move visible boundary | older context survives as summary + digest + carried refs |

6. The render order is part of the design

In the actual render path, the order is deliberate:

1. apply session TTL pruning
2. compact if needed
3. slice after the latest summary
4. apply hidden replacements
5. recompute cache markers
6. append SOURCES POOL and ANNOUNCE

That ordering prevents the cache logic from being computed on a stream that is about to change. It also keeps the uncached tail truly at the end.
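The six steps above can be sketched as a pipeline. All of the step functions here are simplified stand-ins (the real runtime's implementations are not public in this article); the point the sketch preserves is purely the ordering: every stream-mutating step runs before cache markers are recomputed, and the uncached tail is appended last.

```python
# Hypothetical sketch of the render pipeline ordering. Each step is a
# deliberately tiny stand-in for the runtime's real logic.

def apply_ttl_pruning(stream):
    return [b for b in stream if not b.get("expired")]

def compact_if_needed(stream):
    return stream  # stand-in: no ceiling reached in this sketch

def slice_after_latest_summary(stream):
    idx = max(
        (i for i, b in enumerate(stream) if b["type"] == "conv.range.summary"),
        default=-1,
    )
    return stream[idx:] if idx >= 0 else stream

def apply_hidden_replacements(stream):
    return [
        dict(b, payload=b.get("replacement", b["payload"])) if b.get("hidden") else b
        for b in stream
    ]

def recompute_cache_markers(stream):
    return [i for i, b in enumerate(stream) if b.get("cache_point")]

def render(stream, tail_blocks):
    stream = apply_ttl_pruning(stream)           # 1. freshness first
    stream = compact_if_needed(stream)           # 2. hard ceiling next
    stream = slice_after_latest_summary(stream)  # 3. drop pre-summary blocks
    stream = apply_hidden_replacements(stream)   # 4. swap in replacement text
    markers = recompute_cache_markers(stream)    # 5. markers see the settled stream
    return stream + tail_blocks, markers         # 6. uncached tail stays last
```

Because the markers are computed in step 5, they can never point into blocks that a later step would move or remove, and the SOURCES POOL / ANNOUNCE tail is guaranteed to sit after every cache point.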

7. The point is availability, not perfect recall in the prompt

The runtime is not trying to keep everything continuously visible. It is trying to keep what matters visible now, what is stable cached for reuse, and what was hidden still discoverable. That is a much more realistic goal for a distributed long-running agent.

So the promise is not “the model never forgets.” The promise is stronger and more honest: the model can forget aggressively when it should, while the system leaves enough path and summary structure behind for the agent to recover what matters.