TESTED Microdesign v1.1 — new, tested on April 26, 2026. The component has been built and tested at module and cluster level in the v1.1 POC (cluster scratchpad 34/34 green, integrated into agent_runtime). Implementation reference: /opt/myclaw/runtime/scratchpad.py.
Status in the microdesign sequence: under approval → approved → tested (current) → implemented. Concept introduced in the POC of April 26, 2026, after the D-obs stress test (large observations) showed the need for a mechanism that separates the full observation from its view in the LLM context.


scratchpad — the notepad for what does not fit in the conversation
Microdesign v1.1 — status TESTED (April 26, 2026)
Audience: those who want to understand how Metnos handles large data without saturating the LLM context.

Reading time: 12 minutes.

Contents

  1. The problem to solve
  2. The scratchpad idea
  3. When an observation goes to the scratchpad
  4. What the LLM sees instead of the full observation
  5. How the LLM reads from the scratchpad: scratchpad_read
  6. Storage and lifecycle
  7. Builtin: why scratchpad_read is special
  8. Limits and what is deferred to v1.2+

1. The problem to solve

Imagine a user asks: "read the file /tmp/big_log.txt and tell me the errors". The file weighs 800 KB. The fs_read executor reads it without issue. But now the content has to reach the LLM so it can decide which lines are the errors. And here lies the problem: 800 KB in the context window of a small LLM (8K-32K tokens) means a context explosion.

A first naive answer: we truncate. Say we keep only the first 1500 characters of the observation. It works for small files. But if the user looks for "errors" and the errors are at the end of the log, we have lost exactly the part that would matter. A blind truncation destroys useful information.

A second answer: we ask the LLM to express in advance what it wants. If it asks for "the errors", we add a filter_regex parameter to fs_read and we pass it only the lines that match. It works, but it forces the LLM to know beforehand what it will look for, and to be able to say it as a regex. For many real cases the LLM wants to explore: "show me the start to understand the format, then I look for the errors, then I want the details of the first error".

The scratchpad is the third answer: we separate the storage of the observation from its view in the context. The actual, complete observation goes into a temporary store (the scratchpad). Into the LLM context we put only a brief summary + an identifier. When the LLM needs to see slices of it, it asks explicitly. It explores.

[Diagram: an executor (e.g. fs_read /tmp/big.log) produces an 89 KB observation → the runtime checks size > 4 KB → yes: it offloads the whole observation to the scratchpad (SQLite: id, turn_id, content, summary, ttl=1h) and replaces it with a synthetic observation → the LLM sees only {ok:true, scratchpad_id:"485d7b…", summary:"INFO ... ERROR", size:89042} and decides what it needs → tool_call: scratchpad_read id="485d7b…", mode="tail", n=200 → the runtime looks up the entry in SQLite and returns the slice → final_answer to the user]
Typical sequence: the executor produces a large observation, the runtime offloads it into the scratchpad and shows the LLM only the summary; the LLM, if it needs more, explicitly calls scratchpad_read with the mode it wants.

2. The scratchpad idea

The scratchpad is a small local SQLite database (~/.local/share/metnos/scratchpad.db) where the runtime parks large observations. For each entry it saves: an id, the turn_id of the turn that produced it, the step number and executor name, the content kind (text or binary), the full content, its size in bytes, the summary shown to the LLM, and the creation and expiry timestamps.

The entry lives only for the duration of the turn (and a bit beyond, for safety). A garbage-collection pass removes expired entries at the start of the next turn.

3. When an observation goes to the scratchpad

The planner, after an executor has produced its observation, applies a simple rule:

if len(json.dumps(observation)) > SCRATCHPAD_THRESHOLD_BYTES (default 4096):
    save the observation in scratchpad
    create a "synthetic" version to put in the LLM history
else:
    put the full observation into history (with optional truncation to 1500 chars)

The 4 KB threshold is calibrated on the fact that beyond 4 KB of JSON an observation occupies ~1000 tokens in the context, which is already significant for a multi-step turn of 3-5 steps. Below 4 KB the cost is negligible and putting it in the scratchpad would be needless ceremony.

The threshold is configurable in the config; the default is generous enough for most uses.
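The rule above can be sketched as runnable code. Helper names (`route_observation`, the scratchpad's `put`) are illustrative, not the actual Metnos API:

```python
import json

SCRATCHPAD_THRESHOLD_BYTES = 4096  # default threshold from the config


def route_observation(observation, scratchpad, history):
    """Decide whether an observation is offloaded or kept inline.

    `scratchpad` is anything with a put(obs) -> id method; the synthetic
    observation here is abbreviated compared to the real one.
    """
    raw = json.dumps(observation)
    if len(raw) > SCRATCHPAD_THRESHOLD_BYTES:
        entry_id = scratchpad.put(observation)
        history.append({
            "ok": observation.get("ok", True),
            "scratchpad_id": entry_id,
            "size_bytes": len(raw),
            "_note": "Large observation saved in scratchpad.",
        })
    else:
        # small observation: goes into history as-is (optionally truncated)
        history.append(observation)
    return history[-1]
```

The decision point is deliberately a single size check: no content inspection happens at offload time.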

4. What the LLM sees instead of the full observation

When an observation is put in the scratchpad, the planner builds a synthetic observation for the LLM history. Real example from the POC:

Case: the user asks to read an 89 KB log

The fs_read executor returned an 89 KB observation. The runtime saves the 89 KB into the scratchpad and puts in the LLM history:

{
  "ok": true,
  "scratchpad_id": "485d7beae4e144eb",
  "size_bytes": 89042,
  "kind": "text",
  "summary": "INFO 2026-04-26 10:00:01 event number 0\nINFO 2026-04-26 10:00:02 event number 1\n[... 88500 characters omitted ...]\nERROR 2026-04-26 23:59:59 LAST_CRITICAL_EVENT\n",
  "metadata": {
    "path": "/tmp/big_log.txt",
    "bytes": 89042,
    "encoding": "utf-8",
    ...
  },
  "_note": "Large observation saved in scratchpad. To read it in full or partially, use the scratchpad_read tool."
}

The summary is a smart truncation: the first 500 characters + a placeholder with the exact number of characters omitted + the last 500. The LLM sees the start and the end of the content at once, which in most cases is enough to recognize the format and to decide whether, and where, to read further.

The original metadata of the observation is preserved: the LLM still knows the path, the size, the encoding, etc.
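The head-plus-tail truncation reduces to a few lines. A minimal sketch, assuming the 500-character edges described above (the function name is illustrative):

```python
def head_tail_summary(text: str, edge: int = 500) -> str:
    """First `edge` chars + omission marker + last `edge` chars."""
    if len(text) <= 2 * edge:
        return text  # nothing worth omitting
    omitted = len(text) - 2 * edge
    return f"{text[:edge]}[... {omitted} characters omitted ...]{text[-edge:]}"
```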

Smart summary for binaries

For binary-kind observations (e.g. a zip file read with encoding=binary), the summary becomes a technical note:

"[BINARY: 45123 bytes, sha256=abc123def456...]"

No textual preview (it would be gibberish), but the LLM sees size and fingerprint. To read slices, it must call scratchpad_read in binary mode.
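The binary note can be sketched as follows; whether and how much the digest is truncated for display is an assumption, as is the function name:

```python
import hashlib


def binary_summary(data: bytes) -> str:
    """Technical note for binary content: size + sha256 fingerprint."""
    digest = hashlib.sha256(data).hexdigest()
    # display a truncated digest (truncation length is illustrative)
    return f"[BINARY: {len(data)} bytes, sha256={digest[:16]}...]"
```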

5. How the LLM reads from the scratchpad: scratchpad_read

The LLM, seeing in the history an observation with scratchpad_id, knows it can call a builtin tool named scratchpad_read to access the full content or parts of it.

The tool schema:

{
  "name": "scratchpad_read",
  "description": "Reads from the scratchpad an observation previously saved...",
  "parameters": {
    "type": "object",
    "required": ["scratchpad_id"],
    "properties": {
      "scratchpad_id": {"type": "string", "description": "Id obtained from the 'scratchpad_id' field of a previous observation."},
      "mode": {"type": "string", "enum": ["full", "head", "tail", "range"], "default": "head"},
      "n":     {"type": "integer", "description": "For mode head/tail: number of characters to read. Default 2000."},
      "start": {"type": "integer", "description": "For mode range: start index."},
      "end":   {"type": "integer", "description": "For mode range: end index (exclusive)."}
    }
  }
}

The four modes

Mode  | What it returns                           | When to use it
full  | The entire content.                       | Only when it is really small (under the threshold or a little above). Discouraged for large files.
head  | First N characters (default 2000).        | When the user asks for the start or when understanding the format is needed.
tail  | Last N characters (default 2000).         | When the user asks for the end, the latest events, the most recent lines.
range | Characters from start to end (exclusive). | When the LLM wants to explore a specific slice already identified.
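All four modes reduce to simple string slicing on the runtime side. A sketch (the function name is illustrative):

```python
def scratchpad_slice(content: str, mode: str = "head",
                     n: int = 2000, start: int = 0, end=None) -> str:
    """Return the requested view of a scratchpad entry's content."""
    if mode == "full":
        return content
    if mode == "head":
        return content[:n]
    if mode == "tail":
        return content[-n:]
    if mode == "range":
        return content[start:end]  # end is exclusive, like Python slices
    raise ValueError(f"unknown mode: {mode}")
```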
Example: real pattern from the POC

User: "download https://httpbin.org/get and save it in /tmp/out.txt".

Step 1: web_fetch(url=https://httpbin.org/get) returns 14 KB of JSON. It goes into the scratchpad, id eae04122bd704636.

Step 2: the LLM, seeing in the history the observation with scratchpad_id, knows that the actual content is there. It proposes:
fs_write(path="/tmp/out.txt", content="{{step1.content}}").

The runtime resolves {{step1.content}} by retrieving from the scratchpad the full content (not the summary), and passes it to fs_write.

Result: file written correctly, 14 KB of actual bytes.

The interesting bit: the LLM understood by itself that it had to refer to the content of the previous step, even though in its history it saw the summary. The {{stepN.field}} syntax and the scratchpad cooperate: the reference resolves by reconstructing from the database, not from the visible summary.
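The resolution step can be sketched as a regex substitution that pulls the field from the full observation (retrieved from the scratchpad, not from the visible summary). This is a simplification of the real resolver:

```python
import re

# matches placeholders like {{step1.content}}
_REF = re.compile(r"\{\{(step\d+)\.(\w+)\}\}")


def resolve_refs(value: str, full_observations: dict) -> str:
    """Replace {{stepN.field}} with the field from the full observation.

    `full_observations` maps step name -> full observation dict,
    reconstructed from the scratchpad when the step was offloaded.
    """
    def repl(match):
        step, field = match.group(1), match.group(2)
        return str(full_observations[step][field])
    return _REF.sub(repl, value)
```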

6. Storage and lifecycle

SQL schema

CREATE TABLE entries (
    id            TEXT PRIMARY KEY,
    turn_id       TEXT NOT NULL,
    step_num      INTEGER,
    executor_name TEXT,
    content_kind  TEXT NOT NULL,    -- 'text' | 'binary'
    content       BLOB NOT NULL,
    size_bytes    INTEGER NOT NULL,
    summary       TEXT,
    created_at    REAL NOT NULL,
    expires_at    REAL NOT NULL
);

SQLite, single file in ~/.local/share/metnos/scratchpad.db. Indexes on turn_id (to retrieve entries of a turn) and expires_at (for the GC).

Garbage collection

The planner calls scratchpad.gc() at the start of every turn, which removes all entries with expires_at < now(). Default TTL: 1 hour from creation. Configurable via the ttl_seconds parameter in put().

This way a scratchpad heavily used for a few minutes goes back to clean after an hour. No unbounded growth.

Isolation between turns

Every turn has its own UUID turn_id. The planner shows the LLM only the entries of the current turn (via list_for_turn(turn_id)). Even though the SQLite database contains entries from other turns (waiting for GC), the LLM does not see them and cannot access them.
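A compact lifecycle sketch against the schema above, using an in-memory SQLite database; the method names (put, gc, list_for_turn) follow the ones mentioned in the text, the rest is illustrative:

```python
import sqlite3
import time
import uuid

SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    id TEXT PRIMARY KEY, turn_id TEXT NOT NULL, step_num INTEGER,
    executor_name TEXT, content_kind TEXT NOT NULL, content BLOB NOT NULL,
    size_bytes INTEGER NOT NULL, summary TEXT,
    created_at REAL NOT NULL, expires_at REAL NOT NULL);
CREATE INDEX IF NOT EXISTS idx_turn ON entries(turn_id);
CREATE INDEX IF NOT EXISTS idx_expiry ON entries(expires_at);
"""


class Scratchpad:
    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.executescript(SCHEMA)

    def put(self, turn_id, content, kind="text", summary=None,
            ttl_seconds=3600):
        entry_id = uuid.uuid4().hex[:16]
        now = time.time()
        self.conn.execute(
            "INSERT INTO entries VALUES (?,?,?,?,?,?,?,?,?,?)",
            (entry_id, turn_id, None, None, kind, content,
             len(content), summary, now, now + ttl_seconds))
        return entry_id

    def gc(self, now=None):
        """Remove expired entries; called at the start of every turn."""
        self.conn.execute("DELETE FROM entries WHERE expires_at < ?",
                          (now or time.time(),))

    def list_for_turn(self, turn_id):
        rows = self.conn.execute(
            "SELECT id FROM entries WHERE turn_id = ?", (turn_id,))
        return [r[0] for r in rows]
```

Note how isolation falls out of the schema: list_for_turn filters by turn_id, so entries from other turns are invisible even before the GC removes them.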

7. Builtin: why scratchpad_read is special

Unlike "normal" executors (fs_read, web_fetch, etc.), scratchpad_read does not live on disk as a signed package. It exists only in the runtime: the planner builds its schema dynamically (a constant Python dict) and adds it to the tools passed to the LLM when there are active scratchpad entries in the current turn.

If there is no observation in the scratchpad, the LLM does not see the tool: less noise in the context. As soon as the first large observation is offloaded, scratchpad_read enters the tool catalogue for the next steps.
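The conditional exposure can be sketched as a filter applied when assembling the tool catalogue for a step (schema abbreviated; function name illustrative):

```python
SCRATCHPAD_READ_SCHEMA = {
    "name": "scratchpad_read",
    "description": "Reads from the scratchpad an observation previously saved...",
    # "parameters" omitted for brevity; see the schema in section 5
}


def tools_for_step(base_tools, scratchpad, turn_id):
    """Expose scratchpad_read only when the current turn has entries."""
    tools = list(base_tools)
    if scratchpad.list_for_turn(turn_id):
        tools.append(SCRATCHPAD_READ_SCHEMA)
    return tools
```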

The "duplicate read" guard (see agent_runtime ch. 10) does not apply to scratchpad_read, because calling it multiple times on the same scratchpad_id with different mode/range is the normal use case.

Why a builtin and not a full-fledged executor? Because the scratchpad is a runtime primitive, not a capability that a synthesizer (synt) might invent or write. A signed on-disk "scratchpad_read" executor would be needlessly rich in ceremony for something the runtime already knows intimately. The family of builtins (of which scratchpad_read is the first) will probably grow to include a scheduler and similar runtime primitives.

8. Limits and what is deferred to v1.2+

v1.1 limit | When it is removed
Fixed TTL (1 hour) for all entries | When an observation has to survive beyond the turn (e.g. to be retrieved in a future turn): TTL declarable per entry.
Summary only "head + tail" | When a semantic summary (LLM-generated) for 50KB-1MB observations becomes useful. Implies a mini LLM call at offload time, to be balanced against the cost.
Range only by character/byte indices | When the LLM wants "lines N..M" or "lines that match regex": extension of the range mode with sub-modes line, grep.
No content compression | When the scratchpad volume grows (for now it is negligible, the GC keeps it clean).
No "federated scratchpad" between turns / between remote instances | When remote execution or distributed synt will require sharing (currently each Metnos instance has its own).

Final notes

The scratchpad is a small component (~200 lines of Python) but architecturally important: without it, the system would be limited to observations that fit comfortably in 1500 characters of context, which is far too little to be useful. With the scratchpad, the scale of usable data grows by three orders of magnitude (MB-scale files readable in slices) without saturating the LLM.

Concept that emerged from the POC of April 26, 2026 after the D-obs stress test showed that blind truncation to 1500 characters was throwing away 99% of the content of read executors, even when that content would have been the key piece for the user's answer.