The rule is simple: generated code gets a small place to work and a narrow way to request approved tools. It does not get the whole runtime. Network access, descriptor-backed configuration, bundle storage, runtime storage, and most logs stay on the trusted supervisor side.

Important scope: this applies when a product enables generated-code execution. A product that only uses deterministic bundle APIs or service tools does not need to put every request through the generated-code executor.

The current shape

Docker split mode runs two containers for an execution: one supervisor container and one executor container. They share only the surfaces needed to complete the request.
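
To make the asymmetry concrete, here is a hedged sketch of what the two `docker run` argument sets could look like. The image names, mount paths, and socket directory are illustrative assumptions, not the real configuration; the point is which flags and mounts appear on which side.

```python
# Illustrative only: image names, paths, and mounts are assumptions,
# not the actual KDCube split-mode configuration.

def executor_args(ctx: str) -> list[str]:
    """Executor container: no network, read-only root, narrow mounts."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                        # no direct outbound network
        "--read-only",                              # read-only root filesystem
        "-v", f"{ctx}/work:/work",                  # scratch
        "-v", f"{ctx}/out/workdir:/out",            # artifact output
        "-v", f"{ctx}/out/logs/executor:/logs",     # executor-local logs only
        "-v", f"{ctx}/sock:/sock",                  # supervisor socket mount
        "executor-image",
    ]

def supervisor_args(ctx: str) -> list[str]:
    """Supervisor container: sees the whole execution tree plus platform context."""
    return [
        "docker", "run", "--rm",
        "-v", f"{ctx}:/ctx",                        # full execution tree
        "-v", "/srv/bundles:/bundles:ro",           # bundle code (trusted side)
        "supervisor-image",                          # network left enabled
    ]
```

Note that the boundary is visible in the arguments themselves: the executor gets `--network none` and a read-only root, and nothing under `logs/supervisor` is ever mounted into it.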

[Diagram: Split isolated runtime architecture. An agent decision is routed to a trusted supervisor and a networkless executor. The supervisor side holds the networked tools and runtime services: bundle tools, rendering, web access, descriptors, secrets, storage, and supervisor logs, with authenticated tool dispatch. The executor side has a small runtime surface: no direct network, a read-only root filesystem, only the work, output, executor-log, and socket mounts, and no descriptors or secret environment. Artifacts and logs from both sides are merged into a persisted execution package containing artifacts, executor logs, supervisor diagnostics, a validation report, and a packaged workspace snapshot. Caption: Generated code is narrow. Tool access stays possible, but crosses a supervised boundary.]
The split is intentionally asymmetric. The executor has enough filesystem to compute and produce artifacts; the supervisor has the platform context needed to run tools safely and report errors clearly.

What the executor can see

The executor side is not meant to be comfortable as a platform shell. It is meant to be comfortable as a generated-code work area. In split mode it receives:

  • a work directory for scratch data;
  • an artifact output directory where the requested files are produced;
  • executor-local logs for stdout/stderr and generated-code diagnostics;
  • a supervisor socket and short-lived execution token for approved tool calls;
  • a minimal environment for Python, fonts, plotting, and file generation.
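
The socket and token are the executor's only path to platform tools. A minimal sketch of how a generated-code stub might frame such a request follows; the socket path, environment variable, and newline-delimited JSON framing are assumptions for illustration, not the real wire protocol.

```python
# Hypothetical executor-side tool stub. SOCKET_PATH, EXECUTION_TOKEN,
# and the JSON frame shape are illustrative assumptions.
import json
import os
import socket

SOCKET_PATH = "/sock/supervisor.sock"   # assumed socket mount point

def make_request(tool_alias: str, args: dict, token: str) -> bytes:
    """Frame a tool request as one line of JSON carrying the short-lived token."""
    frame = {"token": token, "tool": tool_alias, "args": args}
    return (json.dumps(frame) + "\n").encode()

def call_tool(tool_alias: str, args: dict) -> dict:
    """Send a framed request over the supervisor socket and read one reply."""
    token = os.environ["EXECUTION_TOKEN"]   # injected per execution, short-lived
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(SOCKET_PATH)
        s.sendall(make_request(tool_alias, args, token))
        reply = s.makefile().readline()
    return json.loads(reply)
```

From the generated code's point of view this looks like an ordinary function call; everything that requires authority happens on the other side of the socket.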

It does not receive platform descriptor payloads, bundle storage mounts, managed-bundle roots, runtime storage roots, cloud credentials, or supervisor logs. The important point is not only "can it write outside the output directory?" but also "can it read outer runtime data?" The split design addresses both.

On-host execution tree
  ctx_v2_.../
    work/                  runtime scratch
    out/
      workdir/             executor-visible artifact output
      logs/
        executor/          visible from executor
        supervisor/        not mounted into executor
      timeline / sources   runtime metadata, not executor workspace
    ctx_v2_....zip         persisted package for later inspection
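
The final zip can be produced by simply archiving the whole execution tree, which preserves both log sides for later inspection even though only one was ever mounted into the executor. A minimal sketch, assuming the layout above:

```python
# Sketch of persisting the execution tree as a package. The naming
# convention mirrors the tree above; the exact scheme is illustrative.
import zipfile
from pathlib import Path

def persist_package(ctx: Path) -> Path:
    """Zip the full execution tree (work, out, logs) next to it."""
    archive = ctx.with_suffix(".zip")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(ctx.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so the archive
                # unpacks back into a single ctx_v2_... directory.
                zf.write(path, path.relative_to(ctx.parent))
    return archive
```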

Why not put every tool inside the executor?

Some tools need network, secrets, browser binaries, bundle configuration, or storage access. If those tools were imported and executed directly by generated code, the isolation boundary would become mostly cosmetic. The executor would need the same authority as the platform.

KDCube keeps the tool implementation on the supervisor side. Generated code can ask for a tool through a stub, but the call crosses an explicit boundary where the runtime can authenticate the request, resolve the allowed alias, execute the tool, log the operation, and return a result.
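
The steps at that boundary — authenticate, resolve the allowed alias, execute, log, return — can be sketched as a small dispatch function. The names here (`verify_token`, `TOOL_REGISTRY`, the HMAC token scheme) are illustrative assumptions, not KDCube APIs:

```python
# Hedged sketch of supervisor-side tool dispatch. Token scheme and
# registry shape are assumptions for illustration.
import hashlib
import hmac
import logging

log = logging.getLogger("supervisor")

TOOL_REGISTRY = {                                   # allowed aliases only
    "web_search": lambda args: {"results": []},     # placeholder implementation
}

def verify_token(token: str, secret: bytes, execution_id: str) -> bool:
    """Check the short-lived execution token in constant time."""
    expected = hmac.new(secret, execution_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(token, expected)

def dispatch(frame: dict, secret: bytes, execution_id: str) -> dict:
    # 1. Authenticate the request.
    if not verify_token(frame.get("token", ""), secret, execution_id):
        return {"ok": False, "error": "unauthenticated"}
    # 2. Resolve the allowed alias; unknown tools are refused, not imported.
    tool = TOOL_REGISTRY.get(frame.get("tool"))
    if tool is None:
        return {"ok": False, "error": "tool not allowed"}
    # 3. Log the operation, execute on the trusted side, return a result.
    log.info("tool call: %s", frame["tool"])
    try:
        return {"ok": True, "result": tool(frame.get("args", {}))}
    except Exception as exc:                        # tool failure stays diagnosable
        return {"ok": False, "error": str(exc)}
```

Because the registry lives on the supervisor side, generated code can never widen its own tool set; it can only name an alias and let the trusted side decide.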

  Surface                         Executor                               Supervisor
  Outbound network                No direct path                         Available for configured tools
  Platform descriptors, secrets   Not mounted / not inherited            Available where needed for tool bootstrap
  Generated files                 Writes to artifact output workspace    Validates, packages, and exposes contracted outputs
  Logs                            Sees executor-local logs only          Merges executor, supervisor, Docker, and infra diagnostics into the tool result
  Rendering and web search        Calls through stubs                    Runs the actual network/browser/tool work

The rejected simpler path

A tempting design is to keep one Docker container and add more in-process sandboxing around the generated-code subprocess. That can work for some environments, but it tends to become fragile: kernel capabilities, mount behavior, browser dependencies, and host-specific runtime settings all leak into the correctness story.

The split container model is less clever and more explicit. The executor container is the untrusted side. The supervisor container is the trusted side. The boundary is visible in Docker arguments, logs, mounts, and tests. That is easier to explain to operators and easier to audit.

What this does not claim

This is not a claim that generated-code execution is risk-free. Container isolation still depends on the container runtime, kernel, host configuration, and deployment policy. The design reduces what generated code can normally read, write, and contact, while preserving the ability to run approved platform tools.

That distinction matters. A useful runtime should not hide failures or silently block diagnostics. The execution result still includes runtime failure summaries, missing artifact reports, program log tails, and infra diagnostics so the agent and operator can understand whether the issue came from user code, a tool, or infrastructure.

The executor is not a smaller copy of the platform. It is a narrow generated-code workspace with a supervised request path back to approved tools.

Why this fits React

React already separates generation channels, runtime validation, tool dispatch, workspace state, and user-visible progress. The split isolated runtime follows the same philosophy: keep the channel for generated code useful, but do not confuse it with platform authority.

That is the shape we want for production agents: enough flexibility to generate real files and use real tools, enough containment that generated code cannot browse the platform around it, and enough diagnostics that failures remain understandable.