TESTED Microdesign v1.1 — new, tested on the evening of April 27, 2026. Cluster sandbox 9/9 green. Reference: runtime/sandbox.py.
Status sequence: under approval → approved → tested → implemented. Replaces the obsolete v1.0 (~700 lines) written before implementation.

Metnos

sandbox — the kernel-level shell around executors
Microdesign v1.1 — status TESTED (April 27, 2026 evening)
Audience: those who want to understand how Metnos isolates executors from the rest of the system.

Reading time: 10 minutes.

Contents

  1. What the sandbox is
  2. Public API
  3. Manifest-driven flag derivation
  4. Graceful fallback
  5. Integration in agent_runtime
  6. Profiles and autonomy levels
  7. Tests
  8. State of bwrap on the system
  9. Limits and what is deferred to v1.2+

1. What the sandbox is

The sandbox is layer 3 of the Metnos architecture (ch. 6 of the Architecture): it wraps the execution of an executor in bubblewrap to isolate it from the rest of the system. The module is small (~180 lines) because all that is needed is to map the executor's manifest onto bwrap flags; there is no daemon and no persistent state.

The runtime's pseudo-sandbox — path and host filtering inside the executor wrappers, in cooperation with the Vaglio — remains as the first line of defence: it performs checks before the subprocess is even launched. bwrap adds, on top of it, a kernel-level shell: even if an executor managed to evade the application-level checks, it would find separate namespaces, a read-only filesystem and no network.

A sandbox as a library, not as a service. No daemon, no socket, no separate policy file to edit. Everything is derived from the executor's manifest: the planner calls sandbox.wrap_command(executor, cmd) and gets back a wrapped command ready for subprocess.run. If bwrap is missing, the command is passed through unchanged.

2. Public API

The runtime/sandbox.py module exposes four public functions. There are no classes, and no global state apart from the cached result of the bwrap lookup.

Function | What it does | Citation
bwrap_available() | True if bwrap is in PATH. Looked up via shutil.which; the result is cached at first access. | runtime/sandbox.py:30-32
sandbox_disabled() | True if the user has explicitly disabled the sandbox via METNOS_SANDBOX (recognised values: 0|off|no|false, case-insensitive). | runtime/sandbox.py:35-40
wrap_command(executor, cmd, autonomy="supervised", extra_ro=None, extra_rw=None) | Main function. Returns the command wrapped in bwrap if bwrap is available and not disabled; otherwise returns the command unchanged. executor must expose code_path (Path) and capabilities (list, manifest format). | runtime/sandbox.py:178-207
status() | Dict with bwrap_available, bwrap_path, disabled_via_env, active. For dashboards and debugging. | runtime/sandbox.py:212-219
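
The two smallest functions in the table can be approximated as follows. This is a minimal sketch, not the module's code: the environ parameter is a hypothetical addition that makes the functions easy to test, and the rule that active means "bwrap found and not disabled" is an assumption drawn from the descriptions above.

```python
import os
import shutil

# The "off" values listed in the table above (compared case-insensitively).
_DISABLED_VALUES = {"0", "off", "no", "false"}


def sandbox_disabled(environ=None):
    """True when METNOS_SANDBOX holds one of the recognised 'off' values."""
    env = os.environ if environ is None else environ
    return env.get("METNOS_SANDBOX", "").strip().lower() in _DISABLED_VALUES


def status(environ=None):
    """Summarise the sandbox state using the four keys named in the table."""
    path = shutil.which("bwrap")  # None when bwrap is not in PATH
    disabled = sandbox_disabled(environ)
    return {
        "bwrap_available": path is not None,
        "bwrap_path": path,
        "disabled_via_env": disabled,
        # Assumption: the sandbox is active only if found AND not disabled.
        "active": path is not None and not disabled,
    }
```

Calling status({"METNOS_SANDBOX": "off"}) reports disabled_via_env as True and active as False whether or not bwrap is installed.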

Internally, wrap_command delegates to three private helpers:

3. Manifest-driven flag derivation

The heart of the module is _build_bwrap_args: it takes the executor's code_path and its capability list, and produces the exact sequence of flags to pass to bwrap. The rules are applied in the order given below.

3.1 Read-only system paths

Bwrap starts from an empty root: the system paths needed by the Python interpreter and libraries must be mounted explicitly. The module mounts read-only only those that actually exist (otherwise bwrap errors out):

_SYSTEM_RO_PATHS = (
    "/usr", "/bin", "/sbin", "/lib", "/lib64", "/lib32",
    "/etc", "/opt", "/var/lib/python3",
)

For each, if Path(p).exists(), --ro-bind p p is appended. On minimal systems (e.g. an Alpine container without /lib32) missing paths are skipped without error (runtime/sandbox.py:100-124).
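
The existence check can be sketched as below. The function name system_ro_binds is hypothetical, and the real module may build its list differently; only the tuple of paths and the --ro-bind flag come from the text above.

```python
from pathlib import Path

_SYSTEM_RO_PATHS = (
    "/usr", "/bin", "/sbin", "/lib", "/lib64", "/lib32",
    "/etc", "/opt", "/var/lib/python3",
)


def system_ro_binds(paths=_SYSTEM_RO_PATHS):
    """Return --ro-bind triples for every path that exists on this host."""
    args = []
    for p in paths:
        # bwrap fails on a bind whose source is missing, so skip those.
        if Path(p).exists():
            args += ["--ro-bind", p, p]
    return args
```

On a host without /lib32, the returned list simply contains no entry for it.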

3.2 Private filesystems

Three mandatory mounts, always present:

Citation: runtime/sandbox.py:127-129.

3.3 Executor code

The executor's Python file must be readable. The whole containing directory is mounted read-only:

code_dir = code_path.parent
args += ["--ro-bind", str(code_dir), str(code_dir)]

This way the executor can import accessory modules that live in the same package, but cannot modify its own code (runtime/sandbox.py:131-133).

3.4 fs:read and fs:write capabilities

For each capability in the manifest, the module looks at kind (family) and mode:

Capability | Effect
fs:read with hint | For each hint, the ancestor is computed (see 3.5) and --ro-bind <path> <path> is appended.
fs:write with hint | Same as above, but with --bind (read-write).
network:* | No extra bind, but the flag has_network = True is set (see 3.6).
code:exec | No extra bind: the usual tools already come from /usr/bin per 3.1.
other families (mail, time, …) | No effect on the sandbox.

Only paths that exist are actually mounted: a hint pointing to a not-yet-created folder is skipped silently, no error (runtime/sandbox.py:135-155).
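
The table translates into a short loop. The sketch below assumes a simplified capability form, a (family, mode, paths) tuple whose paths are already expanded; the real manifest format (dict or string) is not reproduced here, and capability_binds is a hypothetical name.

```python
from pathlib import Path


def capability_binds(capabilities):
    """Map simplified (family, mode, paths) capabilities to bwrap binds.

    Returns (args, has_network).
    """
    args = []
    has_network = False
    for family, mode, paths in capabilities:
        if family == "network":
            has_network = True  # no bind; only affects network isolation
            continue
        if family != "fs" or mode not in ("read", "write"):
            continue  # code:exec, mail, time, ...: nothing to mount
        flag = "--bind" if mode == "write" else "--ro-bind"
        for p in paths:
            if Path(p).exists():  # missing paths are skipped silently
                args += [flag, p, p]
    return args, has_network
```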

3.5 Hint expansion

Hints in the manifest are glob-like (e.g. ~/notes/**, /tmp/**): bwrap does not understand them, it mounts directories. _expand_hints_to_paths truncates each hint at the first glob segment, expands ~, deduplicates:

"~/notes/**"   → "/home/roberto/notes"
"/tmp/**"      → "/tmp"
"/tmp/*"       → "/tmp"
"~/Pictures"   → "/home/roberto/Pictures"

Result: bwrap mounts the whole directory that the hint is rooted in, not the individual matching files. Fine-grained gating remains the job of the runtime's application-level filter (runtime/sandbox.py:45-69).
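
One way to implement the truncation is sketched here. It illustrates the rule rather than the module's actual code: the set of glob characters and the handling of hints with no fixed prefix are assumptions, and expand_hints_sketch is a hypothetical name.

```python
import os

_GLOB_CHARS = set("*?[")  # assumption: characters that mark a glob segment


def expand_hints_sketch(hints):
    """Truncate each hint at its first glob segment, expand ~, deduplicate."""
    result = []
    for hint in hints:
        kept = []
        for segment in hint.split("/"):
            if any(ch in _GLOB_CHARS for ch in segment):
                break  # everything from the first glob segment on is dropped
            kept.append(segment)
        path = "/".join(kept)
        if not path:
            if not hint.startswith("/"):
                continue  # relative hint with no fixed prefix: nothing to mount
            path = "/"  # e.g. "/**" truncates to the filesystem root
        path = os.path.expanduser(path)
        if path not in result:  # keep first occurrence, preserve order
            result.append(path)
    return result
```

With HOME set to /home/roberto, the four hints listed above yield three paths, because /tmp/** and /tmp/* collapse to the same directory.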

3.6 Network isolation

If no capability belongs to the network family, --unshare-net is appended: the executor starts in its own network namespace, with no interface other than the loopback lo. If at least one capability is network:*, the flag is not added and the executor inherits the host's network (runtime/sandbox.py:166-167).

3.7 Always-on isolations

Regardless of the manifest, every sandbox includes:

--unshare-user --unshare-ipc --unshare-uts --die-with-parent

Citation: runtime/sandbox.py:170-173.
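
Sections 3.6 and 3.7 together amount to the following sketch. The function name and the relative order of the two groups are assumptions; only the flags themselves come from the text.

```python
# Flags applied to every sandbox, independent of the manifest (3.7).
ALWAYS_ON = ["--unshare-user", "--unshare-ipc", "--unshare-uts", "--die-with-parent"]


def isolation_flags(has_network):
    """Return namespace flags; the network is unshared unless declared (3.6)."""
    flags = [] if has_network else ["--unshare-net"]
    return flags + ALWAYS_ON
```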

4. Graceful fallback

The sandbox must be a benefit, not a blocker. Three fallback cases ensure the system keeps working when bwrap is absent, disabled, or cannot be located:

Case | Behaviour | Citation
(a) bwrap not installed | bwrap_available() returns False, wrap_command returns the unchanged command. | runtime/sandbox.py:30-32, 195-196
(b) METNOS_SANDBOX=0|off|no|false | sandbox_disabled() returns True, wrap_command returns the unchanged command. Useful for local debugging or CI without bwrap. | runtime/sandbox.py:35-40, 195-196
(c) shutil.which exceptions | The exception surfaces as a False from bwrap_available(): command unchanged. No crash on a "broken PATH" case. | runtime/sandbox.py:32, 195

In all three cases, the runtime's pseudo-sandbox (path/host filter inside executor wrappers + Vaglio) stays active: the application-level defence does not disappear because the kernel-level one is missing. The outer shell is lost, not the inner filter.

No explicit error when bwrap is missing. This is a deliberate choice: forcing bwrap's presence would make the system fragile on developer machines, minimal containers, CI environments. The "mandatory sandbox" policy can be enforced at deployment level (by checking in boot.py that sandbox.status()["active"] is True), but the module itself does not demand it.
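
A deployment that does want a mandatory sandbox could add a boot-time guard along these lines. This is a sketch under stated assumptions: require_active_sandbox is a hypothetical helper, and it takes the status dict as an argument so that it does not depend on how boot.py imports the sandbox module.

```python
def require_active_sandbox(status):
    """Raise RuntimeError unless the status dict reports an active sandbox."""
    if not status.get("active", False):
        raise RuntimeError(
            "sandbox inactive: bwrap_available=%r, disabled_via_env=%r"
            % (status.get("bwrap_available"), status.get("disabled_via_env"))
        )
```

In boot.py the call would look like require_active_sandbox(sandbox.status()), which keeps the policy decision outside the sandbox module itself.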

5. Integration in agent_runtime

The planner calls the sandbox at a single point: the invoke_executor function. Here is the code (runtime/agent_runtime.py:194-212):

def invoke_executor(executor, args, timeout_s=30, *, autonomy="supervised"):
    """Invoke an executor, optionally inside a bubblewrap sandbox.

    If `bwrap` is installed and `METNOS_SANDBOX` is not disabled,
    the command is wrapped; otherwise it runs as a plain Python
    subprocess (the runtime's pseudo-sandbox stays active:
    path/host filter + Vaglio).
    """
    import sandbox as _sandbox  # lazy: avoids circular import and overhead for modules that do not use it
    payload = json.dumps(args)
    base_cmd = ["python3", str(executor.code_path)]
    cmd = _sandbox.wrap_command(executor, base_cmd, autonomy=autonomy)
    result = subprocess.run(
        cmd, input=payload, capture_output=True, text=True, timeout=timeout_s,
    )
    ...

Three details of the code deserve attention.

Lazy import. The sandbox module is imported inside the function, not at the top of the file. This avoids two problems: import cycles (the sandbox module does not depend on agent_runtime, but the lazy pattern is defensive) and overhead for modules that use agent_runtime but never call invoke_executor (e.g. tests that exercise only the dry-run ReAct loop). Python's module cache (sys.modules) makes the cost of the lazy import negligible after the first call.

Constant base command. base_cmd is always ["python3", <code_path>]. The sandbox prefixes it with ["bwrap", *flags, "--", ...]; without sandbox, base_cmd stays intact. subprocess.run does not distinguish the two cases: it works on the final list.

Pass-through autonomy. Today wrap_command receives autonomy but does not use it to differentiate flags (see ch. 6). It accepts it as a reserved parameter: when separate profiles arrive, only _build_bwrap_args needs to change, no call-site does.

The call from the ReAct loop is at runtime/agent_runtime.py:540 (obs = invoke_executor(executor, args)): no explicit autonomy parameter, default "supervised".
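
The shape of the final command in the two cases can be illustrated with a small stand-in. The function below is not wrap_command: its name and its bwrap_path and disabled parameters are hypothetical, and the flag list is passed in rather than derived from a manifest.

```python
def wrap_or_passthrough(cmd, flags, *, bwrap_path, disabled):
    """Prefix cmd with bwrap and its flags, or return it unchanged."""
    if bwrap_path is None or disabled:
        return list(cmd)  # fallback: plain subprocess command
    # "--" ends bwrap's own options; everything after it is the command to run.
    return [bwrap_path, *flags, "--", *cmd]
```

Either result is a plain list, so the same subprocess.run call handles both cases.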

6. Profiles and autonomy levels

The Architecture, ch. 12 defines three autonomy levels — ReadOnly, Supervised, Full — with different policies for system access. The v1.1 sandbox exposes the autonomy parameter but does not apply separate profiles: today every wrap derives the same scheme from the manifest, regardless of the level.

This is a stated choice. The manifest already carries the needed capabilities and their hints; introducing a second axis "profile per level" here would produce duplication (each capability would be filtered twice) and would push the policy decision into the wrong module. The right place for an autonomy×capability table is policy.html: the runtime, depending on the level Roberto picks, will pass to wrap_command the profile that policy has computed. Then autonomy will become a real selector, not a pass-through parameter.

The rewrite of policy.html v1.1 is the last gate of phase 5: at that point _build_bwrap_args will receive a derived profile argument and will apply differentiated restrictions (e.g. ReadOnly forces --ro-bind even for capabilities declaring fs:write; Full disables --unshare-net regardless of declared capabilities).
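
To make the two examples concrete, here is one hypothetical way a profile could post-process the flag list. Nothing like this exists in v1.1: the function name, the lowercase level strings "readonly" and "full", and the choice to rewrite flags after derivation are all assumptions made for illustration.

```python
def apply_autonomy_profile(args, autonomy):
    """Adjust a derived bwrap flag list according to a hypothetical profile."""
    if autonomy == "readonly":
        # Downgrade every read-write bind to a read-only bind.
        return ["--ro-bind" if a == "--bind" else a for a in args]
    if autonomy == "full":
        # Keep the host network even if no network capability was declared.
        return [a for a in args if a != "--unshare-net"]
    return list(args)  # "supervised": manifest-derived flags unchanged
```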

7. Tests

Cluster sandbox in the runtime test framework: 9/9 green as of April 27, 2026. The cases are designed to exercise every derivation rule and every fallback level without requiring bwrap to be installed.

# | Case | What it verifies
1 | status_torna_dict | status() returns a dict with the expected keys (bwrap_available, bwrap_path, disabled_via_env, active).
2 | wrap_command_no_bwrap_passa_invariato | When bwrap_available() is False, wrap_command returns exactly the input command (equal list).
3 | sandbox_disabled_rispetta_env | METNOS_SANDBOX=0 (and variants) disables wrapping even when bwrap is present.
4 | expand_hints_tronca_al_glob | Glob-like hints (e.g. /tmp/**) are truncated at the first glob segment; ~ is expanded; duplicates are removed.
5 | capability_kind_e_mode_parse | _capability_kind and _capability_mode recognise fs:read, network:http and code:exec, in both dict and string forms.
6 | build_bwrap_args_isola_rete_se_no_network_cap | Manifest without a network:* capability → args contain --unshare-net.
7 | build_bwrap_args_lascia_rete_se_network_cap | Manifest with network:http → args do not contain --unshare-net.
8 | build_bwrap_args_bind_rw_per_fs_write | Capability fs:write with hint produces --bind; fs:read produces --ro-bind.
9 | build_bwrap_args_include_code_dir_ro | Args always include --ro-bind <code_dir> <code_dir> derived from executor.code_path.parent.

Cases 6-9 exercise _build_bwrap_args without invoking bwrap: the produced flag list is verified. This way the cluster runs green even on a development server where bwrap is not installed, while still covering the derivation rules that are the module's true contract.
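
The same idea, checking the produced list instead of running bwrap, also works for the fallback cases by patching shutil.which. The sketch below uses a simplified stand-in for wrap_command; the stand-in, its single flag and the test names are hypothetical and are not taken from the runtime test framework.

```python
import shutil
from unittest import mock


def wrap_command_sketch(cmd):
    """Simplified stand-in: wrap cmd only when bwrap is found in PATH."""
    if shutil.which("bwrap") is None:
        return list(cmd)
    return ["bwrap", "--die-with-parent", "--", *cmd]


def test_passthrough_without_bwrap():
    # Simulate a host where bwrap is not installed.
    with mock.patch("shutil.which", return_value=None):
        assert wrap_command_sketch(["python3", "x.py"]) == ["python3", "x.py"]


def test_wrapped_when_bwrap_present():
    # Simulate a host where bwrap is installed, without needing the binary.
    with mock.patch("shutil.which", return_value="/usr/bin/bwrap"):
        out = wrap_command_sketch(["python3", "x.py"])
    assert out[0] == "bwrap"
    assert out[-3:] == ["--", "python3", "x.py"]
```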

8. State of bwrap on the system

For full transparency: on the development server, on the evening of April 27, 2026, bwrap is not installed. The sandbox module runs in fallback mode (case (a) of ch. 4): every executor command runs as subprocess.run(["python3", <code_path>], ...), with no kernel-level wrapping.

Activation requires a single system operation:

# Debian/Ubuntu
sudo apt install bubblewrap

# Fedora/RHEL
sudo dnf install bubblewrap

# Arch
sudo pacman -S bubblewrap

No code changes, no runtime restart. On the next access, bwrap_available() returns True (cached), and from that moment every invoke_executor wraps automatically. status() will reflect active: True.

When to activate. On metnos-server (the project's .33 machine) it makes sense to install bwrap as soon as the pure POC phase ends and synthesised executors begin running on uncontrolled input. The pseudo-sandbox covers normal cases, but a synthesised executor produced from an "exotic" prompt can always step out of expected boundaries: the kernel-level shell is the safety net.

9. Limits and what is deferred to v1.2+

v1.1 limit | When it is removed
No landlock. Fine-grained filesystem filtering via landlock requires kernel ≥ 5.13 and dedicated syscalls. For now we rely on bwrap binds. | v1.2, when the approval pipeline with mature dispatcher callbacks stabilises. Landlock can replace some read-only binds with finer permissions (read yes, exec no, etc.).
No Docker namespaces. For cases requiring even stronger isolation (e.g. executors running local LLMs with heavy native dependencies), a Docker or podman container would be more appropriate. | When a specific executor demands full isolation (e.g. CUDA, native scientific computing libraries): a second backend will be added, selectable from the manifest (sandbox_backend = "docker").
No custom seccomp. bwrap installs a seccomp filter only when one is passed via --seccomp, and v1.1 passes none, so there is no syscall policy per executor family. | When concrete threats arise that namespace isolation does not cover. Today the complexity of maintaining seccomp profiles per capability is not worth it.
No separate profiles per autonomy level. Today autonomy is pass-through and everything is derived from the manifest identically for every level. | Rewrite of policy.html v1.1 (last gate of phase 5): it will introduce an autonomy×capability table that lets the runtime pass a computed profile to wrap_command.
No network whitelist. A network:* capability today leaves the network fully open; no per-host filtering (e.g. only *.example.com). | When the executor pool contains enough web callers to make per-host filtering a net win. Implementation: nftables inside the network namespace, or a dedicated LAN proxy that performs enforcement.

Final notes

The sandbox is a small component (~180 lines) but central to the Metnos security posture. Its smallness is the point: all the complexity lives in the executor's manifest, which is the readable contract. The sandbox.py module is pure mechanical translation.

The graceful fallback, in particular, reflects an ethical as well as a pragmatic choice: security must not become an entry barrier. On a developer laptop or in a minimal container, the system runs all the same, with the application-level pseudo-sandbox active. When moving to the production server, an apt install bubblewrap adds the kernel-level shell without touching the code.