
Metnos

agent_runtime — the turn-by-turn runtime

Microdesign — the agent runtime.
Audience: those who read to understand how Metnos works inside;
those who implement or extend a runtime component.

Reading time: 18 minutes.

How Metnos receives a request, decides what to do, executes and answers. The agent runtime is the heart that orchestrates every turn: it extracts the intent, routes the request to the cognitive engine, executes the steps and composes the answer, with fine-grained telemetry for each sub-phase (intent_ms, prefilter_ms, vaglio_ms, exec_ms, rerank_ms). Implementation reference: runtime/agent_runtime.py.

Index

What it does: the story of one request
The loop of one turn, step by step
Modes: local, online, hybrid
The LLM tiers: fast, middle, wise (local) + frontier
Pipeline shape: E+ (F | A)? invariant
Auto-remediation registry
The pre-filter: choosing the sub-catalog
Native tool-use (no JSON parsing)
Data piping between steps: from_step and {{stepN.field}}
Scratchpad for large observations
Vaglio of the plan
Safety caps and runtime guards
What it writes to logs
When things go wrong
What is deferred

1. What it does: the story of one request

Imagine writing to Metnos: «read the file ~/notes/diary.md and tell me the last three lines». From that moment on a turn begins. The planner — the module we describe here — receives the sentence, decides what to do, sets up the steps, and produces the answer.

To understand what "deciding what to do" means, let us follow a concrete example. The request is the one above. The planner has no hardcoded logic that says "if the user says 'read a file' call read_files"; the idea is different:

1. The planner looks at the catalog of executors available (small signed programs that know how to do one thing only: read a file, write one, make an HTTP call, read the clock, etc.).
2. It selects a subset relevant to the request (here the candidates will be read_files, write_files, get_urls).
3. It passes those candidates to an LLM (a language model) as "tools" it can use.
4. The LLM reads the request and proposes the next step: "call read_files with paths=["~/notes/diary.md"] (reading the last lines)".
5. The planner validates the proposal (sandbox, args, constitutional vaglio), invokes the executor, collects the output.
6. It hands back to the LLM with: the original request + the output of the previous step.
7. The LLM says: "I have everything, the answer is: here are the last three lines of your diary".
8. The planner delivers that answer to the user, writes the log, ends the turn.

The key point: useful behaviour emerges from composition, not from hardcoded rules. If tomorrow the user asks "fetch a page, save it to a file, and tell me how many bytes you wrote", the planner will compose get_urls + write_files + final_answer with no need for any special case. The same pipeline, a different request, a different sequence of steps.

What the planner does not do. It never directly resolves user requests. It does not read files, does not call URLs, does not write anything. All of that is done by the executors (specialised programs). The planner only decides whom to call and with what arguments, then collects the results.

Drop-in intelligent executors. The public contract does not change: the planner sends the same arguments and receives the same result. A specialised executor may resolve intermediate states internally under a fixed budget. For example, login_sites observes the page, reaches the sign-in area, handles privacy prompts, off-screen targets, complete forms, and username-first/continue/password-next flows, then verifies the outcome. After login, act_sites can keep a goal and traverse menus under a bounded re-observation loop. The model may only choose among elements enumerated by the broker; credential fields and origins remain deterministic and invisible to the planner.

Persistent credential mandates. New web credentials display a secret-free form: interactive or sites.read. The latter permits automated login and reading, but not changes, sending, payments, downloads, or application POSTs. This mandate is the default for every interactive query; a query may narrow it, while a wider action requires ordinary one-shot consent and does not change it. A scheduled task also carries a subordinate envelope bound to its actor, query hash, previously verified exact hosts, and allowed operations. A new host or out-of-scope effect fails with mandate_scope_exceeded instead of suspending the task on a dialog.

Often the planner is not even bothered. Before calling the model, the runtime consults its two stateful memory levels: the fastpath (L0), which recognises an identical request already seen and replays its sequence in a few milliseconds, and the autopath (L1), which recognises a request of the same kind as earlier ones and reuses their generalised plan. Only when both levels miss does the planner — the local model in the wise tier — enter the turn. The structure described below applies in full to that case; the fast path is covered in its dedicated page.

2. The loop of one turn, step by step

Now let us see the same path more precisely. A turn always has this structure:

Flow of a turn: from the user query to the final log, with the multistep loop at the centre.

The loop is the last layer, not the first. Before reaching the loop described here, the request goes through the cognitive engine cascade. The first two layers have memory: the fastpath (L0) replays the sequence of an identical request already seen; the autopath (L1) reuses the generalised plan of a group of similar requests. Either can answer in a few milliseconds, with no model call at all. The step-by-step loop described in this section is the deepest layer, the engine, preceded by a stateless validator that may correct the plan once before execution: it fires only when neither the memory nor the shortcuts recognise the request. In that case the engine proposes the whole plan in a single call and the execution that follows is the deterministic one described below.

Phase 0 — preparation

The catalog is obtained — it lives in memory since boot. On the first load (cold) the loader discovers all executors on disk (folders under executors/); for each it verifies that the manifest is signed by the author's key and that the code digest matches. Those failing verification are discarded with a reason. On subsequent boots and every turn (hot), a lightweight signature check (mtime of manifest.toml, .py, .sig files + lifecycle DB) returns the cached catalog in O(1). Any modification (synth-on-the-fly, manual re-sign, ageing demote/archive) automatically invalidates the cache on the next turn.
The mode is chosen (local/online/hybrid — see ch. 3) and the LLM tier (fast/middle/wise — see ch. 4).
A unique turn_id is generated to identify this turn in the logs.
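
The hot-path check above can be sketched as a cheap fingerprint over file modification times. This is a minimal sketch, not the actual loader: the names catalog_signature and CatalogCache are hypothetical, the lifecycle DB is left out, and the sketch walks the directory tree instead of achieving the O(1) lookup described above.

```python
from pathlib import Path

# Assumption: these suffixes cover manifest.toml, the executor code and its signature.
WATCHED_SUFFIXES = {".toml", ".py", ".sig"}


def catalog_signature(executors_dir: Path) -> tuple:
    """Fingerprint of the catalog: (relative path, mtime in ns) of every watched file."""
    entries = []
    for path in sorted(executors_dir.rglob("*")):
        if path.is_file() and path.suffix in WATCHED_SUFFIXES:
            rel = str(path.relative_to(executors_dir))
            entries.append((rel, path.stat().st_mtime_ns))
    return tuple(entries)


class CatalogCache:
    """Returns the cached catalog until the fingerprint changes."""

    def __init__(self, executors_dir, loader):
        self.executors_dir = Path(executors_dir)
        self.loader = loader  # hypothetical cold loader (signature + digest checks)
        self._signature = None
        self._catalog = None

    def get(self):
        signature = catalog_signature(self.executors_dir)
        if signature != self._signature:
            # Cold path: something on disk changed, rebuild the catalog.
            self._catalog = self.loader(self.executors_dir)
            self._signature = signature
        return self._catalog
```

Any touched manifest, code file or signature changes the fingerprint, so the next call to get() falls back to the cold loader.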

Phase 1 — pre-filter

The pre-filter ranks the catalog by relevance to the user query and selects a subset: the top-K. Without a pre-filter, the LLM would receive all executors (potentially dozens or hundreds), and both the quality of the choice and the latency would degrade. The pre-filter solves the problem in a simple yet effective way (see ch. 5).

Phase 2 — step loop

At this point the turn enters a loop. For each step:

1. The LLM is called with: the user query, the sub-catalog (as native "tools"), and the history of observations from previous steps.
2. The LLM replies in one of two ways:
   - tool_call: it wants to invoke an executor. It returns the name and arguments in structured form.
   - text: it has everything it needs, it produces the final answer and stops.
3. If it is a tool_call, the planner runs a series of checks:
   1. Resolve from_step: if the args contain from_step: N (reference to a list produced by an earlier step), the runtime retrieves the list from the scratchpad and injects it as entries (see ch. 7).
   2. Resolve references: if the args contain {{stepN.field}} (for non-list args, e.g. content, dst_template), it substitutes them with the real value (see ch. 7).
   3. Validate: do the args respect the JSON Schema declared in the executor's manifest?
   4. Sandbox check: is the requested path/host within the scope declared in the executor's hint?
   5. Vaglio: does the constitutional evaluator give the green light on this specific use of the executor? Executor signatures were already verified at load time (phase 0); the vaglio does not re-verify components, it inspects the single call: is this delete_files(paths=...), with these args, in this context, allowed? Two phases: guard (binary: does it touch a forbidden path? Does it violate one of the 4 Laws?) and judge (graduated: alignment to the user's telos). See vaglio.html.
   6. Guard duplicate read: are we re-reading the same path/url as a previous step? If so, intercept and prompt the LLM to formulate the final_answer.
4. If all checks pass, the executor is run as a subprocess. The observation (a JSON with {ok, content?, metadata?, error?}) returns to the planner.
5. If the observation is larger than a threshold (4 KB of JSON), it is saved to the scratchpad and the LLM history gets a synthetic version with id + summary (see ch. 8).
6. The turn history grows by one step. The loop resumes from point 1.
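
The skeleton of this loop can be sketched in a few lines. This is an illustration under stated assumptions, not the code in runtime/agent_runtime.py: run_turn and the llm callable are hypothetical, executors are plain Python callables instead of subprocesses, and the checks of point 3 and the scratchpad are omitted.

```python
def run_turn(query, llm, executors, cap_steps=30):
    """Multistep loop: ask the LLM, run the chosen executor, feed the observation back.

    llm(query, history) is a hypothetical callable returning either
    {"tool": name, "args": {...}} or {"text": "..."}.
    """
    history = []
    for step_no in range(1, cap_steps + 1):
        reply = llm(query, history)
        if "text" in reply:
            # No tool call: the text is the final answer.
            return {"final_kind": "answer", "final_message": reply["text"], "steps": history}
        name, args = reply["tool"], reply.get("args", {})
        executor = executors.get(name)
        if executor is None:
            observation = {"ok": False, "error": f"nonexistent executor: {name}"}
        else:
            observation = executor(**args)
        history.append({"step": step_no, "tool": name, "args": args, "observation": observation})
    # The LLM never produced a final answer within the step budget.
    return {"final_kind": "cap_steps", "final_message": "", "steps": history}
```

The only state carried between iterations is the history, which is why the same loop serves any sequence of executors.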

Phase 3 — turn closure

The turn ends in one of the following ways:

final_kind = "answer" — the LLM produced the final answer.
final_kind = "ask" — a choice or clarification from the user is needed before proceeding.
final_kind = "awaiting" — the turn waits for an external action (e.g. a shared location) before completing.
final_kind = "needs_inputs" — a required datum is missing (e.g. a credential): a dialog opens to collect it.
final_kind = "cap_steps" — the step limit was exceeded (default 30).
final_kind = "cap_same_executor" — the same executor was called too many times (default 10; 2 for vectorial executors).
final_kind = "loop_break" — a loop was detected (the same step repeated with no progress): the turn stops.
final_kind = "error" — an external component is unreachable (e.g. the local llama-server down), or the catalog is empty, etc.

In every case, the planner writes a complete JSONL record of the turn (see ch. 11) and returns (log, final_message).

Example

Query: «fetch https://httpbin.org/uuid and tell me only the UUID».

Step 1 — the LLM proposes get_urls(urls=["https://httpbin.org/uuid"]). Execution: returns JSON {"uuid":"7c089d54-..."}.

Step 2 — the LLM reads the history, understands it has what it needs, produces final_answer: "7c089d54-...".

No pre-coding for "extract uuid": the local LLM reasons over the JSON and extracts the field on its own.

3. Modes: local, online, hybrid

The planner works in three modes, chosen by configuration. The main difference among the three is how many round-trips to the LLM are needed to complete a turn and where the LLM runs.

Mode	Loop	Typical LLM	When it makes sense
`local`	multistep ReAct (one round-trip per step)	local (llama-server)	Default. Maximum privacy, zero cost, decent latency.
`online`	single-shot (one round-trip for the whole plan)	frontier (Anthropic, OpenAI)	Complex tasks worth the money spent. A frontier model succeeds in one call where a local one would have needed five.
`hybrid`	local by default; escalation to online for critical tasks	mixed	Balance of cost and quality. For domestic Metnos.

The principle: the shape of the loop follows from the per-call cost. A free local LLM can afford to iterate; an expensive frontier LLM compresses everything into one call. Same planner, behaviour adapted to cost.

The default mode is local: everything runs on the local model, at zero cost and with maximum privacy. The online and hybrid modes route the more complex tasks to a frontier model; they require configuring an online provider and the routing rules in [runtime.hybrid].

4. The LLM tiers: fast, middle, wise (local) + frontier

Independently of the mode (which says where the LLM runs), the runtime exposes four tiers. The three local tiers all point to the same local model (on llama-server:8080, with MTP self-speculative): the difference is not the model but the per-call parameters (reasoning on or off, token budget). The fourth tier, frontier, is a cloud model, opt-in.

Tier	Characteristic	Model
fast	Direct answers, reasoning off, low token budget.	local model, `think=false`
middle	Intermediate reasoning. The tier of the intent extractor and the vaglio.	local model
wise	Maximum local reflection, reasoning on, high token budget. The tier that proposes the plans.	local model, `think=true`
frontier	Cloud model, opt-in, for the cases that warrant it. Accepts latency and cost.	Anthropic Opus 4.8 (online)

The engine's proposer uses wise. The intent extractor and the vaglio use middle. On-the-fly fillers (${FILLER}) use fast. The frontier is invoked only explicitly, as a last resort. The canonical source of the tiers is runtime/llm_router.py.

The four tiers coexist in the runtime; each component picks the one it needs.

Specialisation by role. Local tiers may share an endpoint or use different backends. Model, reasoning and token budget are role configuration; the caller does not depend on those choices.

Minimal config schema:

[runtime.llm.fast]
provider = "llamacpp"
model = "<fast-model-id>"
think = false

[runtime.llm.middle]
provider = "llamacpp"
model = "<middle-model-id>"

[runtime.llm.wise]
provider = "llamacpp"
model = "<wise-model-id>"
think = true

[runtime.llm.frontier]
provider = "<frontier-provider>"
model = "<frontier-model-id>"

Tiers declare roles, not a topology: they may share the same model or not. Differentiation includes endpoints, per-call parameters and the role-specific system prompt. The frontier tier is opt-in and may leave the machine only when explicitly invoked.
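
The idea that callers name a role and never a model can be sketched as a lookup. This is a hedged illustration: the canonical source is runtime/llm_router.py, and the ROLE_TIER table, the role names and params_for below are hypothetical. The configuration is modelled as a plain dict with the same keys as the schema above.

```python
# Assumption: role names are illustrative; the tier assignments follow the text above.
ROLE_TIER = {
    "proposer": "wise",
    "intent_extractor": "middle",
    "vaglio": "middle",
    "filler": "fast",
}


def params_for(role, llm_config):
    """Resolve a role to its tier and return that tier's per-call parameters."""
    tier = ROLE_TIER[role]
    if tier not in llm_config:
        raise KeyError(f"tier not configured: {tier}")
    return {"tier": tier, **llm_config[tier]}


config = {
    "fast": {"provider": "llamacpp", "model": "<fast-model-id>", "think": False},
    "middle": {"provider": "llamacpp", "model": "<middle-model-id>"},
    "wise": {"provider": "llamacpp", "model": "<wise-model-id>", "think": True},
}
proposer_params = params_for("proposer", config)
```

Swapping the model behind a tier only touches the configuration; the caller keeps asking for its role.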

4-bis. Pipeline shape: `E+ (F | A)?` invariant

The planner may emit steps in an order that violates the data-flow. To prevent silent failures (a consumer without upstream data, an action without target, a step after a terminator), the runtime enforces a universal shape rule, deterministic, derived from the closed vocabulary §2.2. Every turn must match the regex:

E+ (F | A)? optionally followed by final_answer (always allowed)

Three categories, classified from the verb prefix (never from executor name):

Cat	Verbs	Role
E	`read, find, list, get, filter, sort, group, classify, compute, compare, extract`	Emits entries reusable downstream (producer-out).
F	`describe, render`	User-facing presentation. Closes the pipeline.
A	`move, delete, send, share, write, set, create, change, order, compress`	State mutation, output = outcome metadata. Closes the pipeline.

The five system pseudo-verbs (final_answer, undo_last_turn, request_new_executor, admin, request_disambiguation_from_user) bypass the FSM: meta-operations of the runtime, outside the data-flow.

Deterministic 3-state FSM implemented in runtime/pipeline_shape.py:

         E            E          F | A
START ─────► CHAIN ─────► CHAIN ─────► TERMINAL
  │                         │              │
  │ F | A                   │              │ *
  ▼                         ▼              ▼
ERROR                     VALID          ERROR
(no source)             (post-F|A)   (post-terminator)

Minimal API: compute_state(history) → str rebuilds accumulated state; next_state(state, name, args) → (new_state, error_class) simulates the next transition. The helper has_literal_source(args) recognizes from_step or any non-empty list-arg as implicit producer, so delete_files(paths=["/x"]) is a valid 1-step pipeline (literal counts as implicit E).

Three illegal transitions, three canonical error_class:

Error	Cause	Strategy
`needs_data_source`	F or consumer-E without upstream source	auto-recoverable: cascade `intent.object → OBJECT_PRIMARY_TOOLS → find_urls`
`needs_action_target`	A without target (no `from_step`, no literal)	fail-fast: `get_inputs` dialog, NEVER external cascade (a mutation target is never fabricated)
`pipeline_already_closed`	step after F/A terminator	force immediate `final_answer`, log warning

The check lives in a single point in agent_runtime.py, right after the planner emits chosen_name. The FSM is a safety net: the planner prompt (rule 0-PRE in _core.j2) already teaches the pattern upstream, so runtime remediation rarely fires.
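
A minimal sketch of such an FSM, reusing the function names from the minimal API above. It is not the code in runtime/pipeline_shape.py: the history format (a list of (name, args) pairs), the "ERROR" pseudo-state returned on violations and the ValueError for unknown verbs are assumptions, and the detection of a consumer-E without upstream source is omitted.

```python
PSEUDO_VERBS = {"final_answer", "undo_last_turn", "request_new_executor",
                "admin", "request_disambiguation_from_user"}
E_VERBS = {"read", "find", "list", "get", "filter", "sort", "group",
           "classify", "compute", "compare", "extract"}
F_VERBS = {"describe", "render"}
A_VERBS = {"move", "delete", "send", "share", "write", "set",
           "create", "change", "order", "compress"}


def category(name):
    """Classify a step from its verb prefix, never from the full executor name."""
    verb = name.split("_", 1)[0]
    for cat, verbs in (("E", E_VERBS), ("F", F_VERBS), ("A", A_VERBS)):
        if verb in verbs:
            return cat
    raise ValueError(f"verb outside the closed vocabulary: {verb}")  # assumption


def has_literal_source(args):
    """from_step or any non-empty list argument counts as an implicit producer."""
    if "from_step" in args:
        return True
    return any(isinstance(v, list) and len(v) > 0 for v in args.values())


def next_state(state, name, args):
    """Simulate one transition; returns (new_state, error_class)."""
    if name in PSEUDO_VERBS:
        return state, None  # meta-operations bypass the FSM
    if state == "TERMINAL":
        return "ERROR", "pipeline_already_closed"
    cat = category(name)
    if cat == "E":
        return "CHAIN", None
    if state == "CHAIN" or has_literal_source(args):
        return "TERMINAL", None
    return "ERROR", ("needs_data_source" if cat == "F" else "needs_action_target")


def compute_state(history):
    """Rebuild the accumulated state from the (name, args) pairs already executed."""
    state = "START"
    for name, args in history:
        new_state, error = next_state(state, name, args)
        if error is None:
            state = new_state
    return state
```

With this sketch, delete_files(paths=["/x"]) goes from START straight to TERMINAL, while delete_files with no arguments yields needs_action_target.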

4-ter. Auto-remediation registry

The FSM reports violations via canonical error_class values; a centralized registry in runtime/auto_remediation.py maps them to remediation plans. Same pattern as install_on_demand for missing binaries, generalized to any prerequisite synthesizable on the fly.

from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass(frozen=True)
class RemediationPlan:
    prereq_tool: Any                    # str | Callable[[obs], str] (dynamic chooser)
    hint_field: Optional[str]           # None = pass whole obs
    arg_builder: Callable[[Any], dict]
    merge_field: str = "entries"
    merge_source: str = "entries"
    skip_retry: bool = False            # fail-fast (get_inputs dialog)

Current registry, append-only:

error_class	Prereq	Hint / Strategy
`needs_content_fetch`	`read_urls_html`	fetch top-5 URLs from `needs_urls_html` hint, retry original executor with enriched entries
`needs_data_source`	dynamic chooser	cascade `intent.object → OBJECT_PRIMARY_TOOLS[obj][0] → find_urls`; retry with entries
`needs_action_target`	`get_inputs`	free_text dialog with `verb + object`; `skip_retry=True`, turn ends via `needs_inputs` orchestrator

Extending the pattern is one line in REMEDIATIONS. Future possible examples: needs_ocr → change_files_ocr; needs_embedding → create_<dom>_indices; needs_voicemail → read_messages_voice.
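
How a registry entry could turn an error_class into a prerequisite call is sketched below. The dataclass is repeated to keep the block self-contained; build_prereq_call and the urls argument name of read_urls_html are assumptions made for illustration, not the contents of runtime/auto_remediation.py.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class RemediationPlan:
    prereq_tool: Any
    hint_field: Optional[str]
    arg_builder: Callable[[Any], dict]
    merge_field: str = "entries"
    merge_source: str = "entries"
    skip_retry: bool = False


REMEDIATIONS = {
    "needs_content_fetch": RemediationPlan(
        prereq_tool="read_urls_html",
        hint_field="needs_urls_html",
        # Assumption: the prerequisite takes a "urls" list; only the top 5 are fetched.
        arg_builder=lambda urls: {"urls": list(urls)[:5]},
    ),
}


def build_prereq_call(error_class, obs):
    """Return (tool_name, args) for the prerequisite step, or None if unregistered."""
    plan = REMEDIATIONS.get(error_class)
    if plan is None:
        return None
    # prereq_tool may be a fixed name or a chooser that inspects the observation.
    tool = plan.prereq_tool(obs) if callable(plan.prereq_tool) else plan.prereq_tool
    hint = obs if plan.hint_field is None else obs.get(plan.hint_field)
    return tool, plan.arg_builder(hint)
```

Adding a new remediation is then a new key in the dict; the lookup code does not change.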

5. The pre-filter: choosing the sub-catalog

With a catalog of 30+ executors, passing them all as "tools" to the LLM blows up the prompt and dilutes the attention. The pre-filter ranks by relevance and passes to the LLM only the most promising ones.

How it ranks

The ranking is bag-of-words: it tokenises the user query, tokenises the affinity and the description declared in each manifest, and sums the matches (an affinity match scores 4× a description match; description matches are capped at 3). Extracting the tokens means: lowercase, alphanumeric words, no accents.

Example

Query: «fetch https://httpbin.org/get». Tokens: {fetch, httpbin, org, get, https}.

Match against affinity of get_urls: web, http, url, fetch, scarica, leggi, pagina, api, rest. Match: fetch. Score: 2.

Match against affinity of read_files: read, leggi, lettura, file,... No match. Score: 0.

get_urls wins with high confidence.
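
The ranking can be sketched as follows. The weights are read off the "4×" and "capped at 3" statements; the absolute scale of the real scores is not specified here, so only the ordering produced by this sketch is meaningful. The catalog layout (a dict with affinity and description keys) and the description texts are hypothetical.

```python
import re
import unicodedata

AFFINITY_WEIGHT = 4   # an affinity match counts 4x a description match
DESCRIPTION_CAP = 3   # description matches contribute at most 3


def tokenize(text):
    """Lowercase, strip accents, keep alphanumeric words."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return set(re.findall(r"[a-z0-9]+", no_accents))


def score(query_tokens, manifest):
    affinity_hits = len(query_tokens & tokenize(" ".join(manifest["affinity"])))
    description_hits = len(query_tokens & tokenize(manifest["description"]))
    return AFFINITY_WEIGHT * affinity_hits + min(description_hits, DESCRIPTION_CAP)


def rank(query, catalog):
    """Return (name, score) pairs, best first."""
    tokens = tokenize(query)
    scored = [(name, score(tokens, manifest)) for name, manifest in catalog.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


catalog = {
    "get_urls": {
        "affinity": ["web", "http", "url", "fetch", "scarica", "leggi", "pagina", "api", "rest"],
        "description": "Makes an HTTP call",   # hypothetical description
    },
    "read_files": {
        "affinity": ["read", "leggi", "lettura", "file"],
        "description": "Reads files from disk",  # hypothetical description
    },
}
ranking = rank("fetch https://httpbin.org/get", catalog)
```

Note that the token http in the affinity does not match the token https from the URL: matching is on whole tokens, not on prefixes.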

Adaptive K

K (number of executors to pass to the LLM) is not fixed: it depends on the confidence with which the pre-filter distinguishes the top-1 from the others.

High confidence (top-1 well above) → small K (default k_min=5). No point in passing zero-score candidates.
Low confidence (close scores or all zero) → large K. Let the LLM decide.
K is in any case capped at the number of executors with score ≥ 1 (no padding).

In production the pool has a fixed configurable size via METNOS_ENGINE_POOL_SIZE (default 12). The pre-filter runs sub-millisecond up to 300+ executors.

Cross-object recall

A second mechanism, affinity_phrase_recall, injects into the pool any tool whose multi-word affinity tag (≥ 2 distinctive tokens, excluding stopwords and generic verbs) fully matches the query, even when the intent's verb or object differ. This unlocks cross-object recall: for instance, «which mail accounts do you have» activates find_credentials and read_persons even though the intent points to object=messages. The mechanism is deterministic and capped at 3 additional tools per query.

6. Native tool-use (no JSON parsing)

Modern LLMs (Anthropic, OpenAI from the start; llama-server for local models such as Qwen 3, Llama 3.1+, Mistral) support a native tool-calling protocol. The runtime declares the available tools in the API, each with its JSON Schema taken from the manifest. When the LLM decides to call a tool, it returns a structured field:

{
  "tool_calls": [
    {
      "id": "call_abc123",
      "function": {
        "name": "read_files",
        "arguments": {"paths": ["/tmp/note.txt"]}
      }
    }
  ]
}

It is already a Python dict by the time the planner sees it: the client decodes it from the JSON body of the response. No regex on text, no markdown blocks to extract, no edge cases of "the LLM forgot to close the brackets". When the LLM is ready for the final answer, it simply does not call any tool and produces only text: the planner recognises this as final_answer.

Consequence for manifests. The manifest of an executor declares the args in JSON Schema (section [args]) and that JSON Schema is passed directly to the provider as parameters of the tool. A single source of truth on the shape of the arguments, zero translation. See executor.html for the manifest details.
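
The "zero translation" point can be illustrated with a sketch that wraps a manifest into a tool declaration. The manifest is modelled as a hypothetical dict (the real one is a TOML file, see executor.html), and the envelope shown is the OpenAI-style function-tool format; other providers use a different wrapper around the same JSON Schema.

```python
def tool_declaration(manifest):
    """Wrap a manifest's JSON Schema into an OpenAI-style tool entry, unchanged."""
    return {
        "type": "function",
        "function": {
            "name": manifest["name"],
            "description": manifest.get("description", ""),
            # The args schema is passed through as-is: one source of truth.
            "parameters": manifest["args"],
        },
    }


# Hypothetical manifest content, for illustration only.
read_files_manifest = {
    "name": "read_files",
    "description": "Reads one or more files",
    "args": {
        "type": "object",
        "properties": {"paths": {"type": "array", "items": {"type": "string"}}},
        "required": ["paths"],
    },
}
declaration = tool_declaration(read_files_manifest)
```

The same schema object can also drive argument validation before execution, so the declared shape and the validated shape cannot drift apart.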

7. Data piping between steps: `from_step` and `{{stepN.field}}`

In multistep, the LLM at step N+1 needs to refer to the output of step N. Two distinct mechanisms, chosen based on the data type.

7.1 Lists: `from_step: N`

For args that receive a list of entries (mail, files, web_results, etc.) produced by a previous step, the runtime exposes a dedicated arg from_step: integer. The LLM passes the number of the step that produced the list; the runtime retrieves it from the scratchpad and automatically injects entries into the kwargs before invoking the executor.

Example: classify+filter+describe pipeline

// Step 1
{ "tool": "read_messages", "args": { "account": "knowcastle", "time_window": "today" } }
// observation handle: {ok: true, scratchpad_id, count: 12, list_field: "entries", schema: [...]}

// Step 2 (proposed by the LLM)
{ "tool": "classify_entries", "args": {
 "from_step": 1,
 "dimension": "relevance",
 "pre_filter": true
} }

// The runtime retrieves entries from scratchpad[step1] and invokes:
classify_entries(entries=[12 mails], dimension="relevance", pre_filter=true)

The schema of tools that receive lists declares only from_step: integer required. entries: array is no longer exposed to the LLM: the schema-guided decoder emits an integer, not an array of dicts, and the problem of inline data fabrication (LLM concocting plausible dicts instead of consulting the scratchpad) disappears by construction.
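
The injection step can be sketched as below. The scratchpad is modelled as a plain dict keyed by step number, which is an assumption for illustration (the real store is described in ch. 8); resolve_from_step is a hypothetical name.

```python
def resolve_from_step(args, scratchpad, list_field="entries"):
    """Replace from_step: N with the list produced by step N."""
    if "from_step" not in args:
        return dict(args)
    step = args["from_step"]
    if step not in scratchpad:
        raise KeyError(f"no stored observation for step {step}")
    resolved = {key: value for key, value in args.items() if key != "from_step"}
    # The list comes from the runtime's own store, never from LLM output.
    resolved[list_field] = scratchpad[step][list_field]
    return resolved


scratchpad = {1: {"ok": True, "entries": [{"subject": "a"}, {"subject": "b"}]}}
kwargs = resolve_from_step(
    {"from_step": 1, "dimension": "relevance", "pre_filter": True}, scratchpad
)
```

Because the LLM only ever emits the integer, the entries that reach the executor are exactly those produced by the earlier step.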

7.2 Single values: `{{stepN.field}}`

For args of NON-list type (strings, content, paths), the syntax remains the template placeholder {{stepN.field}}:

N is the step number (1-indexed: step1 is the first).
field is the key inside the observation of that step. It can be nested: {{step1.metadata.path}}, {{step2.content}}.

Example: fetch and save

// Step 1
{ "tool": "get_urls", "args": { "urls": ["https://httpbin.org/get"] } }
// observation: {ok: true, content: "", metadata: {...}}

// Step 2 (proposed by the LLM)
{ "tool": "write_files", "args": {
 "path": "/tmp/out.txt",
 "content": "{{step1.content}}"
} }

// The runtime substitutes and invokes:
write_files(path="/tmp/out.txt", content="")

The syntax holds ONLY in the args. In the text of the final_answer write the actual values (e.g. "I wrote 173 bytes"); NEVER {{step1.bytes_written}}. The planner's prompt explicitly instructs the LLM about this limit. A violation of this rule causes the silent failure of the placeholder in the final text.

The reference must be the sole value of the arg: interpolation within a longer string is not supported.
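
A sketch of the substitution, assuming observations are kept in a list where index 0 holds step 1. resolve_placeholders is a hypothetical name; a value that merely contains a placeholder inside a longer string is left untouched, in line with the sole-value rule.

```python
import re

# Matches a value that is exactly one placeholder, e.g. {{step1.metadata.path}}.
_PLACEHOLDER = re.compile(r"\{\{step(\d+)\.([A-Za-z0-9_.]+)\}\}")


def resolve_placeholders(args, observations):
    """Substitute whole-value {{stepN.field}} references with the real values."""
    resolved = {}
    for key, value in args.items():
        match = _PLACEHOLDER.fullmatch(value) if isinstance(value, str) else None
        if match is None:
            resolved[key] = value
            continue
        current = observations[int(match.group(1)) - 1]  # steps are 1-indexed
        for part in match.group(2).split("."):
            current = current[part]  # walk nested keys
        resolved[key] = current
    return resolved


observations = [{"ok": True, "content": "hello", "metadata": {"path": "/tmp/a"}}]
call_args = resolve_placeholders(
    {"path": "/tmp/out.txt", "content": "{{step1.content}}"}, observations
)
```

A reference to a missing step or key raises IndexError or KeyError in this sketch; how the runtime reports that case is not covered here.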

8. Scratchpad for large observations

When an executor returns a lot of content (a 100 KB file, a long HTML page, a verbose API body), passing it whole into the LLM history blows up the context. Cutting at 1500 characters loses useful information.

The solution: scratchpad. When an observation exceeds the threshold (4 KB of serialised JSON), the runtime saves it to a local SQLite (~/.local/share/metnos/scratchpad.db) and puts in the history a synthetic observation:

{
  "ok": true,
  "scratchpad_id": "eae04122bd704636",
  "size_bytes": 14144,
  "kind": "text",
  "summary": "hello this is a test note\n\n[... 13900 characters omitted...]\n\nINFO 2026-04-26 23:59:59 LAST_CRITICAL_EVENT\n",
  "metadata": {"path": "/tmp/big_log.txt", "bytes": 14144,...}
}

The summary is a smart truncation: the first 500 characters, a placeholder with the number of characters omitted, then the last 500. This way the LLM sees the start and end of the content.
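
The truncation described above fits in a few lines. A minimal sketch: summarize is a hypothetical name, and the placeholder wording is copied from the example observation.

```python
def summarize(text, head=500, tail=500):
    """Keep the first and last characters, replacing the middle with a placeholder."""
    if len(text) <= head + tail:
        return text  # nothing to omit
    omitted = len(text) - head - tail
    return f"{text[:head]}\n\n[... {omitted} characters omitted...]\n\n{text[-tail:]}"


sample = "a" * 500 + "b" * 1000 + "c" * 500
summary = summarize(sample)
```

Keeping the tail matters for logs, where the most recent and often most relevant lines sit at the end.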

The LLM at the next step, seeing the summary, decides:

If the summary is enough, it formulates the final_answer.
If it needs more, it calls the builtin executor scratchpad_read with mode:
- full: the entire content (not recommended if very large).
- head: the first N characters (default 2000).
- tail: the last N characters.
- range: an interval [start, end).

scratchpad_read is a builtin: it lives in the runtime, has no manifest on disk, is added to the tool catalog dynamically when active scratchpad entries exist in the current turn.
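
The four modes map directly onto string slicing. The sketch below works on content already loaded in memory; the parameter names and the use of the same default N for tail as for head are assumptions.

```python
def scratchpad_read(content, mode="head", n=2000, start=0, end=None):
    """Return a slice of a stored observation according to the requested mode."""
    if mode == "full":
        return content
    if mode == "head":
        return content[:n]
    if mode == "tail":
        return content[-n:] if n > 0 else ""
    if mode == "range":
        return content[start:end]  # half-open interval [start, end)
    raise ValueError(f"unknown mode: {mode}")


stored = "0123456789" * 1000  # 10,000 characters
first = scratchpad_read(stored, mode="head", n=5)
last = scratchpad_read(stored, mode="tail", n=3)
middle = scratchpad_read(stored, mode="range", start=10, end=14)
```

The half-open range means consecutive reads [0, 2000), [2000, 4000) cover the content without overlap.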

Full details in the dedicated doc: scratchpad.html.

9. Vaglio of the plan

The vaglio is the constitutional evaluator: before a tool_call becomes action, it decides whether it is lawful. In multistep it runs between one step and the next; in single-shot it runs post-hoc on the entire plan. The vaglio is active and works in two distinct phases.

9.1 Guard (binary)

Blocks violations of the 4 Laws. The encoded rules are:

Forbidden paths: ~/.ssh, /etc/passwd|shadow|sudoers, /root, /boot, /sys, /proc, /dev/sd*|nvme*, ~/.aws/credentials, ~/.config/*/credentials.env, ~/.gnupg. If even just MENTIONED in a tool argument, the action is denied.
Quasi-irrecoverable shell commands (Law 1: no irrecoverable state): rm -rf /, rm -rf ~, mkfs, dd of=/dev/..., fork bomb, recursive chmod 7XX on the root. Match only for executors with capability: code:exec.

The list does not relax with autonomy level: it is the "non-negotiable core" of ch. 5. If the guard stops the call, no score is computed: the verdict is blocked_by="guard".
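
The forbidden-path rule can be sketched as a regular expression applied to every string in the arguments. This is an illustration, not the vaglio module: the boundary rules (a path must start at the beginning of the string or after a separator) are an assumption, the shell-command rules are omitted, and expansion of ~ to the real home directory is not handled.

```python
import re

_PATTERNS = [
    r"~/\.ssh",
    r"/etc/(?:passwd|shadow|sudoers)",
    r"/root", r"/boot", r"/sys", r"/proc",
    r"/dev/(?:sd|nvme)\w*",
    r"~/\.aws/credentials",
    r"~/\.config/[^/\s]+/credentials\.env",
    r"~/\.gnupg",
]
_SEP = r"""[\s"'=:,;]"""  # assumption: characters that may precede or follow a path
_FORBIDDEN_RE = re.compile(
    r"(?:^|(?<=" + _SEP + r"))(?:" + "|".join(_PATTERNS) + r")(?=$|/|" + _SEP + r")"
)


def _strings(value):
    """Yield every string nested inside dicts, lists and tuples."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from _strings(item)


def guard(args):
    """Binary verdict: deny if any argument merely mentions a forbidden path."""
    for text in _strings(args):
        if _FORBIDDEN_RE.search(text):
            return {"approved": False, "blocked_by": "guard", "reason": "forbidden path"}
    return {"approved": True, "blocked_by": None, "reason": ""}
```

The search runs on the whole string, so a forbidden path embedded in a shell command is caught as well as one passed as a plain path argument.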

9.2 Judge (graduated)

If the guard lets it through, the judge measures the alignment of the action to the user's telos in [0, 1]. Below the threshold METNOS_JUDGE_THRESHOLD (default 0.30, configurable via env) the action is denied with blocked_by="judge". At or above it, the action is approved.

Today the judge is rule-based: local heuristics, microseconds, zero cost. Base score 0.7, with a bonus if the intent mentions the executor name (signal of explicit intent), a penalty for .. in a path (possible path traversal), and a penalty for keys in args with non-alphanumeric characters (anomaly). The LLM judge (middle tier, separate context from the proposer to avoid self-confirmation) is deferred: it requires a configured middle tier and an explicit budget. The deontology/teleology split is already in the right place, so the judge's implementation can evolve without touching the guard.

The Verdict exposed by the vaglio module contains {approved, reason, score, blocked_by, judge_kind, ts}. The JSONL log on ~/.local/share/metnos/vaglio/YYYY-MM.jsonl records only the keys of args (not the values), for privacy.
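
A rule-based scorer of this kind can be sketched as follows. Only the base score (0.7) and the default threshold (0.30) come from the text: the sizes of the bonus and of the two penalties are not stated and are assumptions here, as is the choice to accept underscores in arg keys. The returned dict carries a subset of the Verdict fields.

```python
import re

BASE_SCORE = 0.7
NAME_BONUS = 0.2          # assumption: magnitude not specified
TRAVERSAL_PENALTY = 0.3   # assumption: magnitude not specified
ODD_KEY_PENALTY = 0.2     # assumption: magnitude not specified


def _strings(value):
    """Yield every string nested inside dicts, lists and tuples."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from _strings(item)


def judge(intent_text, executor_name, args, threshold=0.30):
    """Graduated verdict: heuristic alignment score in [0, 1] against a threshold."""
    score = BASE_SCORE
    if executor_name.lower() in intent_text.lower():
        score += NAME_BONUS  # the user named the executor explicitly
    if any(".." in text for text in _strings(args)):
        score -= TRAVERSAL_PENALTY  # possible path traversal
    if any(not re.fullmatch(r"[A-Za-z0-9_]+", key) for key in args):
        score -= ODD_KEY_PENALTY  # anomalous argument key
    score = max(0.0, min(1.0, score))
    approved = score >= threshold
    return {"approved": approved, "score": score,
            "blocked_by": None if approved else "judge"}
```

With these illustrative magnitudes a single anomaly lowers the score but does not block the call; two together push it below the threshold.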

10. Safety caps and runtime guards

The planner has three safety mechanisms against loops and ill-posed actions, based on real observed LLM behaviour.

Mechanism	What it does	Default
`cap_steps`	Maximum number of steps per turn.	30
`cap_same_executor`	Limit of calls of the same executor in the turn.	10 (2 for vectorial executors)
guard duplicate read	If the LLM re-invokes `read_files`/`write_files`/`get_urls` with the same path/url as a previous step, the runtime does not re-execute: it returns an observation that says "you already have this data at step X, formulate the final_answer".	active

The guard duplicate read avoids a common waste: without it, the LLM tends to re-read the same file with slightly different args hoping for a better result, ending up in cap_same_executor. The guard intercepts upstream and unblocks the formulation.

Exception: scratchpad_read is not subject to the guard, because calling it more than once with different mode/range on the same scratchpad_id is the normal use case.
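
The three mechanisms can be sketched over the turn history. The history format (dicts with step, tool and args), the function names and the arg keys inspected for targets (path, paths, urls) are assumptions; the lower cap for vectorial executors is left out.

```python
from collections import Counter

GUARDED = {"read_files", "write_files", "get_urls"}
TARGET_KEYS = ("path", "paths", "urls")  # assumption: where paths/urls live in args


def _targets(args):
    """Collect the paths/urls a call points at, ignoring every other argument."""
    found = set()
    for key in TARGET_KEYS:
        value = args.get(key)
        if isinstance(value, str):
            found.add(value)
        elif isinstance(value, list):
            found.update(v for v in value if isinstance(v, str))
    return found


def check_caps(history, name, cap_steps=30, cap_same_executor=10):
    """Return the final_kind to stop with, or None if the step may proceed."""
    if len(history) >= cap_steps:
        return "cap_steps"
    if Counter(h["tool"] for h in history)[name] >= cap_same_executor:
        return "cap_same_executor"
    return None


def duplicate_read(history, name, args):
    """Return a synthetic observation if the same target was already handled."""
    if name not in GUARDED:
        return None  # scratchpad_read and all other tools are exempt
    wanted = _targets(args)
    for h in history:
        if h["tool"] == name and wanted and wanted <= _targets(h["args"]):
            step = h["step"]
            return {"ok": True, "duplicate_of_step": step,
                    "hint": f"you already have this data at step {step}, formulate the final_answer"}
    return None
```

Comparing only the targets, not the full args, is what catches the "same file, slightly different args" pattern.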

11. What it writes to logs

For each turn the planner writes a JSONL line to ~/.local/share/metnos/turns/YYYY-MM-DD.jsonl with:

turn_id — uuid of the turn.
ts_start, ts_end — Unix timestamps.
user_query — original user text.
mode — chosen mode (local/online/hybrid).
candidates — names of the executors passed to the LLM after pre-filter.
steps — list of steps with: number, llm in/out tokens, latency, tool called, raw and resolved args, validation/sandbox/vaglio outcome, result.
final_message — final text to the user.
final_kind — one of answer | ask | awaiting | needs_inputs | cap_steps | cap_same_executor | loop_break | error.

JSONL append-only, one file per day. No automatic rotation (with normal use ~3 MB/month, negligible).
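
Appending one record per turn to a per-day file can be sketched as below. append_turn_record is a hypothetical name, and deriving the file's day from ts_start in UTC is an assumption; the record keys follow the list above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_turn_record(log_dir, record):
    """Append one JSON line to <log_dir>/YYYY-MM-DD.jsonl and return the file path."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the day is taken from ts_start, interpreted as UTC.
    day = datetime.fromtimestamp(record["ts_start"], tz=timezone.utc).strftime("%Y-%m-%d")
    path = log_dir / f"{day}.jsonl"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

Opening in append mode and writing one complete line per turn keeps earlier records untouched, which is what makes the file safe to tail or parse line by line.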

11.1 Writing into the mnestome

In parallel with the JSONL log, the planner updates the mnestome (SQLite, single file). Two simple hooks, activated only when there is observed piping between steps:

mnest active — at the end of a step with obs.ok = true, if the raw_args contained at least one resolved {{stepM.field}} reference and executor M was real (not a proto, not scratchpad), it invokes Mnestoma.record_passing(src=executor_M, dst=current_executor, dst_exists=True). The mnest grows or is born with bootstrap weight; every future passage reinforces it.
proto-mnest — when the LLM calls a tool name that the catalog does not contain (nonexistent_executor) and the raw_args had references to previous steps, it invokes record_passing(src=executor_M, dst=desired_name, dst_exists=False, desired_signature=...). The desired signature is inferred conservatively from the requested tool name, args and turn context (build_desired_signature).

Without piping no mnest: the isolated invocation of a single executor does not represent a «passage between A and B» in the sense of ch. 2 of mnest. The write is fail-safe: an error on the mnestome is logged (verbosely) but does not interrupt the turn.

11.1.1 Synt-on-the-fly: immediate suggestion to the planner

When a proto-mnest is just registered (the case above: nonexistent tool with piping from a previous step), the planner immediately tries a reactive synthesis compose-only by calling Synt.react(req) with router=None (the generate mode stays reserved for the nightly scheduler and the introspective cycles of ch. 11.2). The outcome of this call, if positive, is added to the observation as a synt field:

if state == "composed": the composer found a chain of signed executors that closes the proto-mnest. The observation carries {strategy: "compose", state: "composed", chain: [...], first_hop: "X", suggestion: "Retry by invoking 'X' as the next step"}. The planner does not re-launch the first hop automatically: it lets the LLM decide whether to follow the suggestion at the next step (preserving ReAct discipline — the LLM remains the master of the sequence).
if state == "abandoned" or "rejected": the observation carries {state: "abandoned", suggestion: "There is no executor available for this need, look for another way"}. Non-retreat telos: the planner does not give up at the first error, it tells the LLM the road is closed so it can search for another.

Cost: one call to Composer.find_chain (BFS on the mnestome, milliseconds) and a possible lock-check. No LLM, no budget. The call is fail-safe: any synt exception does not propagate and the observation remains the default one ("nonexistent executor: X"). See runtime/agent_runtime.py:_try_synt_compose.

11.2 Builtin scheduler

The scheduler is a builtin of the runtime, not an executor: it has no capability, is not signed, has no sandbox. It is a cron-style loop that runs recurring system tasks without user input. Three schedule kinds are supported:

daily@HH:MM — every day at hour HH:MM (UTC), once.
every_N_minutes — every N minutes since the last run (or immediately if never).
manual — only via scheduler run-now <task>.

State persisted in workspace/.scheduler/state.sqlite (table tasks with last_run_at + last_status, table runs append-only). The due-time check is idempotent: two ticks in the same slot do not run the task twice.
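
The idempotent due-time check can be sketched as a pure function of the schedule, the current time and last_run_at. is_due is a hypothetical name, and the literal form every_15_minutes for the interval schedule is an assumption about how N is written.

```python
import re
from datetime import datetime, timedelta, timezone


def is_due(schedule, now, last_run_at):
    """True if the task should run at `now`, given when it last ran (or None)."""
    if schedule == "manual":
        return False  # only via scheduler run-now <task>
    match = re.fullmatch(r"daily@(\d{2}):(\d{2})", schedule)
    if match:
        slot = now.replace(hour=int(match[1]), minute=int(match[2]),
                           second=0, microsecond=0)
        # Due once the slot has passed and no run has happened since the slot.
        return now >= slot and (last_run_at is None or last_run_at < slot)
    match = re.fullmatch(r"every_(\d+)_minutes", schedule)
    if match:
        interval = timedelta(minutes=int(match[1]))
        return last_run_at is None or now - last_run_at >= interval
    raise ValueError(f"unknown schedule: {schedule}")


now = datetime(2026, 1, 2, 3, 0, tzinfo=timezone.utc)
```

Once a run is recorded, last_run_at falls at or after the slot, so a second tick in the same slot returns False without any extra bookkeeping.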

Built-in tasks registered by default (defined in runtime/scheduler_v2/builtin_callbacks.py::_BUILTIN_JOBS, auto-installed on first HTTP daemon boot via install_default_jobs(scheduler)):

Name	Schedule	What it does
`i18n_translate_pending`	`daily@02:00`	Translates 20 keys in `i18n.sqlite` flagged `needs_translation=1` (GPU throttling cap). Wise tier default. See multilang.
`images_index_refresh`	`daily@03:00`	Incremental refresh of the unified images index: walk + stat ~11s, pipeline EXIF + ArcFace + VLM + BGE on new/modified.
`apply_executor_ager`	`daily@03:30`	Decay of inactive executors: active → deprecated after 30d idle; deprecated → archived after another 14d.
`apply_ager`	`daily@04:00`	Calls `Mnestoma.apply_ager`: decay + demote + proto purge on the mnestome.
`synt_suggest`	`daily@04:30`	For each recurring proto-mnest (`uses≥3`, `weight≥0.30`) calls `Synt.react` in compose-only and logs the outcome.
`multi_tool_maintenance`	`daily@04:30`	Housekeeping for the L2 fast-path: expire stale (TTL N effective-activity days, default 30) + promote mature pipelines (`uses≥K_synth`, default 50) into proto-mnests in the mnestome for `synth_request`.
`proposals_eta_aggregate`	`daily@04:30`	Latency aggregator per `path_shape`: scans turn JSONL from the last 7d, computes p50/p95 into `proposals_eta.sqlite`.
`promoter`	`daily@04:45`	Promoter daemon: evaluates + promotes synth proposals via `proposal_evaluator` (6 killer + 7 signal → verdict).
`introvertiva_propose`	`daily@05:00`	Introvertiva: produces dedupe proposals (orphaned/duplicate mnests) and projects them into proposals_state (no auto-apply, JSONL audit). The generalize and specialize generators were retired on 2026-07-02 (ADR 0180: the layer rule).
`proposals_cleanup`	`daily@06:00`	Lifecycle backlog maintenance: archive aged synt_proposals, dedupe candidates, auto-decay legacy_orphan. NO delete: move + UPDATE only.
`lifecycle_summary`	`daily@06:30`	READ-ONLY aggregator of the 4 agers: produces a daily summary of the mnestome lifecycle.
`skill_sandbox_watchdog`	`daily@06:35`	Checks the trigger threshold for per-skill sandbox (≥ 5 third-party skills OR ≥ 1 paired guest); notifies admin via Telegram.
`promoter_digest`	`daily@07:00`	Telegram digest of proposals in promoted_grace not yet user-approved.
GitHub maintenance	`run_user_query`	GitHub maintenance now runs via the canonical executors (`find/read/write_issues`) driven by scheduled user commands; there is no always-on watcher.

Schedule entry lifecycle. Rows live in ~/.local/state/metnos/scheduler_v2.sqlite (schedule_entries table). Three guarantees:

Fresh install boot: install_default_jobs is called on first HTTP daemon startup (metnos_http_server.py:131) and inserts the _BUILTIN_JOBS with INSERT-OR-IGNORE — existing rows are preserved (keep last_run_at, total_runs...).
Subsequent boot: the scheduler reads persistent entries, computes next_fire_at, starts the loop. Events missed during downtime are ignored (next valid window): no chaotic catch-up.
Shutdown: scheduler.stop(timeout=5s) sets shutdown_evt, waits for the current task to finish, then pool.shutdown(cancel_futures=True). In-flight jobs past the timeout are cancelled gracefully.
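The first guarantee relies on SQLite's INSERT OR IGNORE. A minimal sketch with the standard sqlite3 module follows; the function name, the two sample jobs and the column layout are assumptions (only the table name and the last_run_at / total_runs fields appear in the text).

```python
import sqlite3

# Hypothetical subset of the builtin jobs, for illustration only.
BUILTIN_JOBS = [("apply_ager", "daily@04:00"), ("promoter", "daily@04:45")]


def install_default_jobs_sketch(conn: sqlite3.Connection) -> None:
    """Insert default jobs without touching rows that already exist."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS schedule_entries (
               name        TEXT PRIMARY KEY,
               schedule    TEXT NOT NULL,
               last_run_at TEXT,
               total_runs  INTEGER NOT NULL DEFAULT 0
           )"""
    )
    # A conflict on the primary key makes SQLite skip the row silently,
    # so accumulated state (last_run_at, total_runs) survives a re-install.
    conn.executemany(
        "INSERT OR IGNORE INTO schedule_entries (name, schedule) VALUES (?, ?)",
        BUILTIN_JOBS,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
install_default_jobs_sketch(conn)
conn.execute("UPDATE schedule_entries SET total_runs = 7 WHERE name = 'apply_ager'")
install_default_jobs_sketch(conn)  # second boot: the existing row is preserved
```

The design choice is that boot is safe to repeat: calling the installer twice neither duplicates rows nor resets counters.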

The daemon loop (scheduler daemon) is a single process that runs a tick every 60s and then goes idle. No concurrency, no inter-process locking: the scheduler is a singleton on metnos-server. The error policy is "do not let the loop fall": every task exception is caught and recorded as last_status='error' with the traceback in last_output, and the tick continues with the following tasks.
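The error policy can be illustrated with a short sketch. The run_tick function and the in-memory state dict are assumptions standing in for the real loop and its SQLite state; the 'ok' status value is also assumed, only 'error' is named in the text.

```python
from __future__ import annotations

import traceback
from typing import Callable


def run_tick(tasks: dict[str, Callable[[], str]], state: dict[str, dict]) -> None:
    """Run every task once; a failing task never stops the following ones."""
    for name, task in tasks.items():
        try:
            output = task()
            state[name] = {"last_status": "ok", "last_output": output}
        except Exception:
            # Record the failure and move on: the loop must not fall.
            state[name] = {
                "last_status": "error",
                "last_output": traceback.format_exc(),
            }
```

Catching Exception rather than BaseException leaves KeyboardInterrupt and SystemExit free to stop the process, which a supervised daemon usually wants.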

The scheduler design as builtin is consistent with the decision in the Dialogue on executors: system maintenance (decay, nightly synthesis) is part of the runtime, not an executor that the system "decides to call". See also the memory builtin executors proposals for the future builtin triad (scheduler, ager, snapshot).

11.2.1 User recurring tasks

On top of the builtin scheduler runs runtime/recurring_tasks.py: the registry of recurring tasks defined by the user via the conversational channel ("every five minutes check important mail"). Seven builtin tools are callable by the PLANNER: schedule_recurring, cancel_recurring, list_recurring, show_recurring, toggle_recurring, history_recurring, run_now_recurring.

Callback resolution is by string key: the task persists the symbolic callback name; the dispatcher resolves it at runtime against a centralized registry. No serialization of callable, no stale references on daemon restart.
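Resolution by string key can be sketched with a plain dictionary registry. Apart from run_user_query, which the text mentions, the names here (CALLBACKS, register, dispatch) and the task layout are illustrative assumptions.

```python
from __future__ import annotations

import json
from typing import Callable

# Central registry: symbolic name -> callable (hypothetical layout).
CALLBACKS: dict[str, Callable[[dict], str]] = {}


def register(name: str):
    """Decorator that stores a callback under its symbolic name."""
    def deco(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        CALLBACKS[name] = fn
        return fn
    return deco


@register("run_user_query")
def run_user_query(params: dict) -> str:
    return f"ran: {params['query']}"


def dispatch(task: dict) -> str:
    """Resolve the stored name at call time, never a pickled callable."""
    fn = CALLBACKS.get(task["callback"])
    if fn is None:
        raise KeyError(f"unknown callback: {task['callback']}")
    return fn(task.get("params", {}))


# The persisted task is plain JSON, so it survives a daemon restart.
stored = json.dumps({"callback": "run_user_query", "params": {"query": "check mail"}})
print(dispatch(json.loads(stored)))
```

Because only the name is persisted, renaming or replacing the function behind a key takes effect for all stored tasks on the next dispatch.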

A callback may return a structured CallbackOutcome: success, partial, or error, with output and error kept separate. For run_user_query, status comes from the pipeline's actual effects and failures. Delivering a channel message no longer turns a failed turn into success; errors nested in failed[] feed last_error and the circuit breaker.
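One possible shape for such an outcome, as a hedged sketch. The field names follow the text (status, output, error); the helper outcome_from_turn and the exact rule for "partial" (some effects and some failures) are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CallbackOutcome:
    status: str       # "success", "partial" or "error"
    output: str = ""
    error: str = ""


def outcome_from_turn(effects: list[str], failed: list[dict],
                      delivered: bool) -> CallbackOutcome:
    """Derive status from effects and failures only.

    `delivered` is accepted but deliberately ignored: sending a channel
    message must not turn a failed turn into a success.
    """
    error = "; ".join(str(f.get("error", "unknown error")) for f in failed)
    if failed and not effects:
        status = "error"
    elif failed:
        status = "partial"  # assumption: mixed results count as partial
    else:
        status = "success"
    return CallbackOutcome(status=status, output="\n".join(effects), error=error)
```

In this sketch the error string is what would feed last_error; how the circuit breaker counts consecutive failures is not shown.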

Fine semantics: times governs the task lifetime (1 = one-shot, N = max executions, NULL = forever) with auto-cancel on completion; grace_window_minutes enables recover-missed with created_at as discriminator (a task created after the scheduled time does not fire retroactively); is_due applies the rule "done=false && time>scheduled && !new_post_target". The daemon loop has re-entrancy lock + cooperative timeout + per-task try/except, supervised by metnos-scheduler.service (systemd user, Restart=always). Async-ready scaffolding (local refactor, not global): _run_with_timeout standalone, _acquire_lock/_release_lock abstract, dispatch_callback with coroutine detection + sync→async bridge.
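The is_due rule and the times counter can be sketched as below. This is one reading of the rule, not the real implementation: the function names are hypothetical, and treating grace_window_minutes as the maximum lateness for a missed run is an assumption.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def recurring_is_due(now: datetime, scheduled: datetime, created_at: datetime,
                     done: bool, grace_window_minutes: int) -> bool:
    """done=false && time>scheduled && !new_post_target, within the grace window."""
    if done:
        return False
    if now <= scheduled:
        return False
    if created_at > scheduled:
        # Created after the target time: do not fire retroactively.
        return False
    # Assumption: a missed run is recovered only inside the grace window.
    return now - scheduled <= timedelta(minutes=grace_window_minutes)


def after_run(runs: int, times: int | None) -> tuple[int, bool]:
    """Count one execution; return (runs, done). times=None means forever."""
    runs += 1
    done = times is not None and runs >= times
    return runs, done
```

With times=1 the first execution sets done, which is the auto-cancel for one-shot tasks; with times=None the task never completes on its own.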

11.2.2 Location request UX (PLANNER §2-quater)

When get_location returns ok:false on a location-relative query (e.g. "the nearest pharmacy"), the PLANNER triggers the builtin tool request_location_from_user: an atomic dialog that asks the user for the location over the channel and resumes the plan on receipt. The pattern mirrors §2-ter (atomic undo): a tool that halts and resumes the turn without intermediate steps polluting the history. The "atomic dialog tool" pattern family follows this same approach.

11.2.3 Hybrid geo provider

runtime/geo_provider.py introduces the hybrid strategy Google Places primary + Photon fallback: Google's POI coverage is superior to OSM in observed real cases (live case: a pharmacy missing from OSM, present in Google), but Photon stays as fallback when Google is down or its key is unconfigured. The "OSS-first" policy is unchanged for other capabilities: for POIs, the coverage advantage justifies the exception.
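The primary-plus-fallback strategy reduces to a few lines. In this sketch the providers are plain callables and find_poi is a hypothetical name; passing None as primary models the case of an unconfigured Google key.

```python
from __future__ import annotations

from typing import Callable, Optional

Provider = Callable[[str], list]


def find_poi(query: str, primary: Optional[Provider], fallback: Provider) -> list:
    """Try the primary provider; use the fallback if it is absent or fails."""
    if primary is not None:
        try:
            return primary(query)
        except Exception:
            # Primary is down: fall through to the fallback provider.
            pass
    return fallback(query)
```

Whether an empty result from the primary should also trigger the fallback is a policy question this sketch leaves open; here only an exception or a missing provider does.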

11.2.4 Multi-user step 1: actor resolver

runtime/actor_resolver.py implements step 1 of the multi-user model (host + guest): each pairing exposes an actor field; new pairings get auto-assigned host (first one) or guest_<id6> (subsequent). The actor is propagated end-to-end across the turn (log, scratchpad, vaglio, scheduler) and is the discriminator for per-user decisions (telos, autonomy, scheduler quotas). The approval router stays single-channel in step 1; per-user channel separation enters step 2.
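The auto-assignment rule can be sketched as follows. The function name assign_actor is hypothetical, and reading id6 as six random hex characters is an assumption.

```python
import secrets


def assign_actor(existing_actors: list) -> str:
    """First pairing becomes the host; every later one gets a guest id."""
    if not existing_actors:
        return "host"
    while True:
        # token_hex(3) yields 6 hex characters (assumed format of <id6>).
        candidate = f"guest_{secrets.token_hex(3)}"
        if candidate not in existing_actors:
            return candidate
```

The loop only guards against the unlikely collision with an actor that already exists.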

11.2.5 Centralized i18n DB

runtime/i18n.py + the timer daemon runtime/i18n_translator.py centralize user-facing texts and LLM-targeted prompts in a single SQLite DB (~/.local/share/metnos/i18n.sqlite). Auto-invalidation by source_hash: a text changes → all translations are regenerated. Two prompt templates: one for user-facing texts (middle tier, batched), one for LLM-targeted prompts (wise tier, 1-per-call for quality). Admin CLI in runtime/admin/i18n_cli.py. All runtime modules load strings via i18n.t(code, lang, **kwargs); this supersedes the local tables in runtime/messages.py (which remains for internal non-user-facing templates).
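Auto-invalidation by source_hash can be illustrated with a small sketch. The use of SHA-256 and the row layout are assumptions; the text only states that a changed source text invalidates all its translations.

```python
from __future__ import annotations

import hashlib


def source_hash(text: str) -> str:
    """Hash of the source text (SHA-256 is assumed here)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_langs(source_text: str, translations: dict[str, dict]) -> list[str]:
    """Languages whose translation was made from an older source text."""
    current = source_hash(source_text)
    return sorted(
        lang for lang, row in translations.items()
        if row["source_hash"] != current
    )


old = source_hash("Hello")
rows = {"it": {"text": "Ciao", "source_hash": old},
        "fr": {"text": "Bonjour", "source_hash": old}}
print(stale_langs("Hello", rows))         # nothing to regenerate
print(stale_langs("Hello there", rows))   # every language is stale
```

Storing the hash next to each translation means staleness is detected by comparison alone, with no need to track edit events.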

11.2.7 Multilingual support (three layers, latest wins)

The multilingual subsystem of Metnos lives across three layers: (1) LLM prompts in runtime/prompts/<lang>/<role>.j2 (MiniJinja, 26 roles); (2) executor descriptions in TOML manifests ([description].<lang> table + companion manifest.lang_state.json); (3) user-facing messages in i18n.sqlite (118 standard keys + 79 migrated descriptions). The alignment rule is simple: the latest edit wins. No language is canonical by construction; whichever was edited most recently prevails (version_hash + source_hash per resource). The nightly daemon i18n_translator.run_loop regenerates candidates in _pending/; metnos-prompts review shows the diff, mark-synced promotes. Adding a new language is a single command: metnos-prompts add-language fr. See the canonical document multilang.

11.2.6 Centralized config.py + logging_setup

runtime/config.py exposes 24 tunable constants and 11 paths with env override (METNOS_INSTALL_ROOT, METNOS_USER_DATA, METNOS_LOG_LEVEL, etc.). runtime/logging_setup.py sets up a root logger metnos.* with stdout + rotating file. Companion clean-up: 54 occurrences of except: pass replaced with log.warning + pass across 17 modules. Still in TODO: migration of 19 legacy paths and 9 residual constants to config.C.
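A minimal sketch of constants and paths with env override. METNOS_USER_DATA is named in the text; the helpers env_path and env_int, the default path and the METNOS_TICK_SECONDS variable are hypothetical.

```python
import os
from pathlib import Path


def env_path(var: str, default: str) -> Path:
    """Path taken from the environment, else the default; '~' is expanded."""
    return Path(os.environ.get(var, default)).expanduser()


def env_int(var: str, default: int) -> int:
    """Integer constant with env override; a malformed value keeps the default."""
    raw = os.environ.get(var)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


USER_DATA = env_path("METNOS_USER_DATA", "~/.local/share/metnos")  # default assumed
TICK_SECONDS = env_int("METNOS_TICK_SECONDS", 60)                  # hypothetical name
```

Falling back silently on a malformed value is one possible policy; logging a warning in that branch would match the clean-up described above.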

12. When things go wrong

Five typical failure modes, all covered by permanent test cases in the test framework.

What breaks	What happens
llama-server unreachable	The provider raises `ProviderError`; the planner ends the turn with `final_kind=error` and a clear message.
Executor crashes (uncaught Python exception)	The subprocess exits with stderr, the runtime returns `{ok: false, error: "non-JSON output: …; stderr: …"}`.
Executor code modified after signing	The loader rejects it at load time (digest mismatch). The executor never enters the catalog.
Executor returns non-JSON stdout	The runtime detects it and returns `{ok: false, error: "non-JSON output"}`.
Empty catalog	The turn ends cleanly with `final_kind=error`, message `"(empty catalog)"`.

The principle: do not let the system fall; always return a structured response, even when it is an error. The user understands, the LLM at the next step (in multistep) can correct, downstream executors do not see strange input.
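The "always return a structured response" principle, applied to an executor subprocess, might look like the sketch below. The function run_executor, the stdin/stdout JSON protocol and the truncation limits are assumptions; only the error string format follows the table above.

```python
import json
import subprocess
import sys


def run_executor(argv: list, payload: dict, timeout: float = 10.0) -> dict:
    """Run an executor and always return a dict, even on failure."""
    try:
        proc = subprocess.run(argv, input=json.dumps(payload),
                              capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return {"ok": False, "error": f"executor failed to run: {exc}"}

    try:
        result = json.loads(proc.stdout)
    except json.JSONDecodeError:
        # Covers both a crash (empty stdout) and plain non-JSON stdout.
        error = f"non-JSON output: {proc.stdout[:200]!r}"
        if proc.stderr:
            error += f"; stderr: {proc.stderr[:500]}"
        return {"ok": False, "error": error}

    if not isinstance(result, dict):
        return {"ok": False, "error": "non-JSON output: expected an object"}
    return result


crashed = run_executor([sys.executable, "-c", "raise RuntimeError('boom')"], {})
print(crashed["ok"])
```

A crashed executor and one that prints free text take the same path here, so downstream steps only ever see a dict with an ok field.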

13. What is deferred

Feature	When it lands
Real `online` mode	When an Anthropic provider with API key is configured.
`hybrid` mode with real escalation rules	When the vocabulary of `critical_capabilities` is decided and auto-routing is desired.
Probabilistic vaglio (LLM-judge)	When the `constitution` exists as reference.
Auto-escalation of tier (fast → middle → wise if steps fail)	When at least two distinct tiers are configured and a use case motivates it.
Per-tier prompt differentiation (fast/middle/wise) even with the same model	When the use case requires multiple points of view on the same situation.
Parallel-tool-call within a turn	When a real case shows a significant latency win. Binding constraint.
Async approval (deferred execution with durable per_target grants)	When channel + scheduler require asynchronous interaction.
Mnestome history-driven in the pre-filter (boost from history)	When the operational mnestome exists.
Embedding (local MiniLM) in the pre-filter	When bag-of-words shows practical limits (not yet observed in production).
Automatic replay of orphan turns from a crash	When executors are guaranteed idempotent.
Automatic JSONL rotation	When the volume becomes relevant.

Closing notes

This document describes what the planner does today, validated by tests. The “deferred to” sections describe what we expect, not promises. When an extension lands, this doc will be updated: rather than speculating on how it will be, we write what works.

The layers, top to bottom. The first two have state (they learn from past turns); the last two are stateless (they decide on the current turn):

L0 — fast-path: exact hash match plus cosine search through the configured embedder. Approved from the chat with the «approve fast-path» button; wins over autopath.
L1 — autopath: semantic cluster match through the configured embedder, indexed on the intent hash, with champion/challenger scoring and a TTL on anti-autopaths.
L2 — validator: a guard that can re-propose once before the engine runs (on by default).
L3 — engine: the full ReAct loop — proposer, executor, recovery, terminator. The METNOS_ENGINE selector picks simple|metis|frontier.
