scratchpad — the notepad for what does not fit in the conversation
Imagine a user asks: "read the file /tmp/big_log.txt and tell me the errors". The file weighs 800 KB. The fs_read executor reads it without issue. But now the content has to reach the LLM so it can decide which lines are errors. And here lies the problem: 800 KB in the context window of a small LLM (8K-32K tokens) means context explosion.
A first, naive answer: truncate. Say we keep only the first 1500 characters of the observation. It works for small files. But if the user is looking for "errors" and the errors are at the end of the log, we have lost exactly the part that mattered. Blind truncation destroys useful information.
A second answer: we ask the LLM to express in advance what it wants. If it asks for "the errors", we add a filter_regex parameter to fs_read and we pass it only the lines that match. It works, but it forces the LLM to know beforehand what it will look for, and to be able to say it as a regex. For many real cases the LLM wants to explore: "show me the start to understand the format, then I look for the errors, then I want the details of the first error".
The scratchpad is the third answer: we separate the storage of the observation from its view in the context. The actual, complete observation goes into a temporary store (the scratchpad). Into the LLM context we put only a brief summary plus an identifier. When the LLM needs to see slices of it, it asks explicitly: it explores, calling scratchpad_read with whatever mode it wants.
The scratchpad is a small local SQLite database (~/.local/share/metnos/scratchpad.db) where the runtime parks large observations. For each entry it saves:
- scratchpad_id (16 hexadecimal characters);
- turn_id it belongs to (so we can isolate entries from different turns);
- kind: text or binary;
- content (UTF-8 text or raw bytes);
- size_bytes;
- summary — see below;
- created_at and expires_at timestamps (default TTL of one hour).

The entry lives only for the duration of the turn (and a bit beyond, for safety). A garbage-collection function removes expired entries at the boot of the next turn.
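The write side can be sketched in a few lines. This is a sketch under assumptions: the function name `put`, its signature, and the column subset are mine, not the actual Metnos API; only the fields and the 1-hour default TTL come from the description above.

```python
import sqlite3
import time
import uuid

def put(con: sqlite3.Connection, turn_id: str, content: bytes,
        kind: str = "text", summary: str = "",
        ttl_seconds: float = 3600.0) -> str:
    """Park one observation in the scratchpad (hypothetical helper).
    Returns the 16-hex-character entry id."""
    entry_id = uuid.uuid4().hex[:16]  # 16 hexadecimal characters
    now = time.time()
    con.execute(
        "INSERT INTO entries (id, turn_id, content_kind, content,"
        " size_bytes, summary, created_at, expires_at)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (entry_id, turn_id, kind, content, len(content),
         summary, now, now + ttl_seconds))
    con.commit()
    return entry_id
```

The `ttl_seconds` parameter matches the configurable TTL mentioned later; everything else about error handling and schema migration is elided.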
The planner, after an executor has produced its observation, applies a simple rule:
if len(json.dumps(observation)) > SCRATCHPAD_THRESHOLD_BYTES (default 4096):
save the observation in scratchpad
create a "synthetic" version to put in the LLM history
else:
put the full observation into history (with optional truncation to 1500 chars)
The 4 KB threshold reflects the fact that beyond 4 KB of JSON an observation occupies roughly 1000 tokens of context, which is already significant for a multistep turn of 3-5 steps. Below 4 KB the cost is negligible, and routing it through the scratchpad would be needless ceremony.
The threshold is configurable in the config; the default is generous enough for most uses.
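The rule above can be sketched in a few lines of Python. This is a sketch, not the actual Metnos code: `offload_if_large`, the dict-backed store, and the inline summary are assumptions standing in for the real SQLite-backed scratchpad.

```python
import json
import uuid

SCRATCHPAD_THRESHOLD_BYTES = 4096  # configurable default, as described above

def offload_if_large(observation: dict, store: dict) -> dict:
    """Apply the threshold rule: large observations go into `store`
    (a dict standing in for the scratchpad), small ones pass through."""
    raw = json.dumps(observation)
    if len(raw) <= SCRATCHPAD_THRESHOLD_BYTES:
        return observation  # cheap enough to keep in the LLM history as-is
    entry_id = uuid.uuid4().hex[:16]  # 16 hex chars, like the real ids
    store[entry_id] = raw
    omitted = len(raw) - 1000
    return {
        "ok": True,
        "scratchpad_id": entry_id,
        "size_bytes": len(raw.encode("utf-8")),
        "kind": "text",
        "summary": f"{raw[:500]}\n[... {omitted} characters omitted ...]\n{raw[-500:]}",
        "_note": "Large observation saved in scratchpad. Use scratchpad_read to access it.",
    }
```

The synthetic dict returned in the large case is what lands in the LLM history; the full `raw` payload never reaches the context.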
When an observation is put in the scratchpad, the planner builds a synthetic observation for the LLM history. Real example from the POC:
The fs_read executor returned an 89 KB observation. The runtime saves the 89 KB into the scratchpad and puts in the LLM history:
{
"ok": true,
"scratchpad_id": "485d7beae4e144eb",
"size_bytes": 89042,
"kind": "text",
"summary": "INFO 2026-04-26 10:00:01 event number 0\nINFO 2026-04-26 10:00:02 event number 1\n[... 88500 characters omitted ...]\nERROR 2026-04-26 23:59:59 LAST_CRITICAL_EVENT\n",
"metadata": {
"path": "/tmp/big_log.txt",
"bytes": 89042,
"encoding": "utf-8",
...
},
"_note": "Large observation saved in scratchpad. To read it in full or partially, use the scratchpad_read tool."
}
The summary is a smart truncation: the first 500 characters, a placeholder with the exact number of characters omitted, and the last 500. The LLM sees the start and the end of the content at once, which in most cases is enough to understand the format and to spot how the content ends.
The original metadata of the observation is preserved: the LLM still knows the path, the size, the encoding, etc.
For binary-kind observations (e.g. a zip file read with encoding=binary), the summary becomes a technical note:
"[BINARY: 45123 bytes, sha256=abc123def456...]"
No textual preview (it would be gibberish), but the LLM sees size and fingerprint. To read slices, it must call scratchpad_read in binary mode.
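Both summary flavours can come from one helper. A minimal sketch, assuming the name `make_summary` and the 500-character head/tail window described above:

```python
import hashlib

def make_summary(content: bytes, kind: str, keep: int = 500) -> str:
    """Head+tail preview for text; size + fingerprint note for binary."""
    if kind == "binary":
        digest = hashlib.sha256(content).hexdigest()
        return f"[BINARY: {len(content)} bytes, sha256={digest}]"
    text = content.decode("utf-8")
    if len(text) <= 2 * keep:
        return text  # short enough to show whole
    omitted = len(text) - 2 * keep
    return f"{text[:keep]}\n[... {omitted} characters omitted ...]\n{text[-keep:]}"
```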
scratchpad_read
The LLM, seeing in the history an observation with scratchpad_id, knows it can call a builtin tool named scratchpad_read to access the full content or parts of it.
The tool schema:
{
"name": "scratchpad_read",
"description": "Reads from the scratchpad an observation previously saved...",
"parameters": {
"type": "object",
"required": ["scratchpad_id"],
"properties": {
"scratchpad_id": {"type": "string", "description": "Id obtained from the 'scratchpad_id' field of a previous observation."},
"mode": {"type": "string", "enum": ["full", "head", "tail", "range"], "default": "head"},
"n": {"type": "integer", "description": "For mode head/tail: number of characters to read. Default 2000."},
"start": {"type": "integer", "description": "For mode range: start index."},
"end": {"type": "integer", "description": "For mode range: end index (exclusive)."}
}
}
}
| Mode | What it returns | When to use it |
|---|---|---|
| full | The entire content. | Only when it is genuinely small (under the threshold or a little above). Discouraged for large files. |
| head | First N characters (default 2000). | When the user asks for the start, or when the format needs to be understood. |
| tail | Last N characters (default 2000). | When the user asks for the end, the latest events, the most recent lines. |
| range | Characters from start to end (exclusive). | When the LLM wants to explore a specific slice it has already identified. |
User: "download https://httpbin.org/get and save it in /tmp/out.txt".
Step 1: web_fetch(url=https://httpbin.org/get) returns 14 KB of JSON. It goes into the scratchpad, id eae04122bd704636.
Step 2: the LLM, seeing in the history the observation with scratchpad_id, knows that the actual content is there. It proposes:
fs_write(path="/tmp/out.txt", content="{{step1.content}}").
The runtime resolves {{step1.content}} by retrieving from the scratchpad the full content (not the summary), and passes it to fs_write.
Result: file written correctly, 14 KB of actual bytes.
The interesting bit: the LLM understood by itself that it had to refer to the content of the previous step, even though in its history it saw the summary. The {{stepN.field}} syntax and the scratchpad cooperate: the reference resolves by reconstructing from the database, not from the visible summary.
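The resolution step can be sketched as a template pass over the proposed tool arguments. This is a hypothetical reconstruction: `resolve_refs` and the `step_results` mapping are my names; only the `{{stepN.field}}` syntax comes from the text.

```python
import re

_REF = re.compile(r"\{\{step(\d+)\.(\w+)\}\}")

def resolve_refs(args: dict, step_results: dict) -> dict:
    """Replace {{stepN.field}} placeholders with the full stored values
    (step_results stands in for the scratchpad lookup by step)."""
    def substitute(value):
        if not isinstance(value, str):
            return value
        match = _REF.fullmatch(value)
        if match:  # whole-value reference: return the field verbatim
            return step_results[int(match.group(1))][match.group(2)]
        # embedded references: substitute each as a string
        return _REF.sub(
            lambda m: str(step_results[int(m.group(1))][m.group(2)]), value)
    return {key: substitute(val) for key, val in args.items()}
```

The key point mirrored here is that substitution reads from the store holding the full content, never from the truncated summary the LLM saw.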
CREATE TABLE entries (
id TEXT PRIMARY KEY,
turn_id TEXT NOT NULL,
step_num INTEGER,
executor_name TEXT,
content_kind TEXT NOT NULL, -- 'text' | 'binary'
content BLOB NOT NULL,
size_bytes INTEGER NOT NULL,
summary TEXT,
created_at REAL NOT NULL,
expires_at REAL NOT NULL
);
SQLite, single file in ~/.local/share/metnos/scratchpad.db. Indexes on turn_id (to retrieve entries of a turn) and expires_at (for the GC).
The planner calls scratchpad.gc() at the start of every turn, which removes all entries with expires_at < now(). Default TTL: 1 hour from creation. Configurable via the ttl_seconds parameter in put().
This way a scratchpad heavily used for a few minutes goes back to clean after an hour. No unbounded growth.
Every turn has its own uuid turn_id. The planner shows the LLM only the entries of the current turn (via list_for_turn(turn_id)). Even though the SQLite contains entries from other turns (waiting for GC), the LLM does not see them and cannot access them.
Unlike "normal" executors (fs_read, web_fetch, etc.), scratchpad_read does not live on disk as a signed package. It exists only in the runtime: the planner builds its schema dynamically (a constant Python dict) and adds it to the tools passed to the LLM when there are active scratchpad entries in the current turn.
If there is no observation in the scratchpad, the LLM does not see the tool: less noise in the context. As soon as the first large observation is offloaded, scratchpad_read enters the tool catalogue for the next steps.
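The conditional exposure is a one-line check at tool-catalogue build time. A sketch (the names `SCRATCHPAD_READ_SCHEMA` and `tools_for_step` are assumptions; the behaviour is as described above):

```python
# The real schema is the constant dict shown earlier; elided here.
SCRATCHPAD_READ_SCHEMA = {"name": "scratchpad_read", "parameters": {}}

def tools_for_step(base_tools: list, scratchpad_entries: list) -> list:
    """Advertise scratchpad_read only when the current turn has entries."""
    tools = list(base_tools)
    if scratchpad_entries:  # at least one offloaded observation this turn
        tools.append(SCRATCHPAD_READ_SCHEMA)
    return tools
```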
The "duplicate read" guard (see agent_runtime ch. 10) does not apply to scratchpad_read, because calling it multiple times on the same scratchpad_id with different mode/range is the normal use case.
| v1.1 limit | When it is removed |
|---|---|
| Fixed TTL (1 hour) for all entries | When an observation has to survive beyond the turn (e.g. to be retrieved in a future turn): TTL declarable per entry. |
| Summary only "head + tail" | When a semantic summary (LLM-generated) for 50KB-1MB observations becomes useful. Implies a mini LLM call at offload time, to be balanced against the cost. |
| Range only by character/byte indices | When the LLM wants "lines N..M" or "lines that match regex": extension of the range mode with sub-modes line, grep. |
| No content compression | When the scratchpad volume grows (for now it is negligible, the GC keeps it clean). |
| No "federated scratchpad" between turns / between remote instances | When remote execution or distributed synt will require sharing (currently each Metnos instance has its own). |
The scratchpad is a small component (~200 lines of Python) but architecturally important: without it, the system would be limited to observations that fit comfortably in 1500 characters of context, which rules out almost anything useful. With the scratchpad, the scale of usable observations grows by three orders of magnitude (MB-scale files, readable in slices) without saturating the LLM.
Concept that emerged from the POC of April 26, 2026 after the D-obs stress test showed that blind truncation to 1500 characters was throwing away 99% of the content of read executors, even when that content would have been the key piece for the user's answer.