Version 2.0 — June 2026
Reference: Metnos 0.1.0 (pre-1.0) — a daily-driven system
Self-contained HTML — printable as PDF
Audience: anyone who wants to understand, in 30 minutes, how Metnos is built and why,
without jargon but without naivety. Fifteen chapters, fifteen diagrams.
Metnos is a self-hosted personal assistant with an unusual idea
at its core: instead of shipping with a fixed catalog of tools, it
synthesizes its own executors — small signed programs,
generated on the fly inside a closed vocabulary — and orchestrates them
with a local LLM planner. The cloud is required neither for
thinking nor for acting: frontier models are an optional consult, not the
engine.
The name comes from mētis (cunning intelligence) +
noûs (mind). It lives on a machine under your physical and legal
control; you talk to it from the channels you already use —
Telegram or the browser (port 8770) — and it
touches files, mail, photos, calendar, web and GitHub: only what you
switch on, one skill at a time.
Figure 1 — Metnos at a glance. The process lives on your machine: channels receive, the mind plans with the local LLM, the guards filter, executors act on the backends you enabled. The frontier tier is the only thing outside the fence — and it is opt-in.
The identity card
| Item | Actual state |
| --- | --- |
| Shape | Python ≥ 3.11 process, executor-based microarchitecture; ReAct runtime with one-shot planning (the Mētis engine, ch. 5). |
| Tools | 79 signed executors in the repo, plus those synthesized on the fly by the instance (ch. 7) and those imported behind a gate (ch. 7). All vectorized: list in, list out. |
| Brain | Local LLM served by llama-server; four abstract tiers fast / middle / wise / frontier (ch. 8). Frontier = cloud opt-in. |
| Channels | Telegram (outbound long-poll, no open ports) + web on port 8770 (chat and admin dashboards), ch. 11. |
| Senses | In-process image pipeline: semantics + faces + EXIF in one unified index (ch. 10). |
| Language | i18n by construction: every string and prompt is per-language data. IT + EN validated; other languages = drop-in translation packs (not yet tested). |
| License / status | AGPL-3.0; pre-1.0. Public repo: github.com/brunialti/metnos, a deterministic export-subset of the daily-driven instance. |
An honest showcase, not a polished product. Metnos is a real
system, used every day — but built by one person, for one person, on one
machine. It is shared so that homelab and AI-architecture enthusiasts can
read it, run it, and build on it. Many capabilities exist but have barely
been exercised outside the reference instance.
2. The three bets
The whole project rests on three architectural bets. They are deliberate
positions, not optimizations: each one reverses a widespread habit of
agent frameworks.
Figure 2 — The three bets. Each one reverses an agent-framework habit: skills imported on trust, cloud-first design, the LLM as an oracle re-rolled every turn.
The comparison, with no discounts
| | Typical agent framework | Metnos |
| --- | --- | --- |
| Tools | Hand-written, imported or generated free-form, then run as-is with the assistant's privileges | Synthesized at runtime too, but from a closed, audited vocabulary: signed, aged, smoke-tested and screened before they can ever run |
| Safety | Trust the author of the package | Don't trust the package: the package must pass the checks (7-layer gate, ch. 7) |
| LLM | Often cloud-first | Local first; frontier opt-in |
| Routing | The model picks a tool each turn: non-reproducible | Deterministic by construction: seed-pinned local inference, ties broken by curated affinity (ch. 8) |
| Output | Free-form, different per tool | Uniform: list in / list out, pipeable between steps (ch. 6) |
| Undo | Rare or best-effort | First-class: a closed catalog of reverse patterns, moves = COPY-then-DELETE, honest ok_count (ch. 12) |
| Language | English only, strings in the code | i18n by construction: strings and prompts are per-language data |
Why determinism pays off. Most agents treat the LLM as an
oracle to re-roll: ask twice, get two different plans. Metnos makes the
opposite bet: a local planner, constrained to a closed
vocabulary, can be made reproducible. Routing can then be measured, put
under regression tests, audited — like ordinary software. And it
compounds: a request that has been solved once is replayed by a fast path
with no LLM call at all (ch. 9).
3. The key concepts, in seven cards
Seven words carry the whole document. Defining them now saves you half an
hour of confusion thirty lines from here; each one has its own microdesign
page in architecture/.
executor — an executable capability: a small program that does
one thing well (read files, send an email, move messages, search photos).
It accepts lists as input and produces lists as output, carries a
manifest that describes it, an Ed25519 signature that authenticates it and a
sandbox profile that confines it. It is the only class of things that
act in the system.
closed vocabulary — every executor is named
verb_object[_qualifier[_descriptor]], composing 23 canonical actions
and 22 canonical objects plus qualifiers in four families. It is not an
aesthetic convention: it is the boundary of what the system can name —
and therefore synthesize. New terms enter only through explicit governance
(necessary · general · understandable).
manifest — the TOML identity card of an executor: a description
in prescriptive chapters (SCOPE / PATTERN / NOT / OUT), the argument schema,
affinity keywords, the reversibility pattern, the code digest. It is not
documentation for humans: it is the tool's prompt, written so that a
mid-size LLM uses it well (ch. 6).
synt — the process that brings into existence what the pool cannot
do yet: a cascade of strategies ordered by cost that first composes existing
executors and only as a documented exception generates new code, in five
stages plus a semantic check (ch. 7). It proposes; the human approves.
vaglio — (Italian for «sifting») the filter that always sits
before execution: a deterministic guard (forbidden paths, unrecoverable
commands) followed by a judge that weighs grey-zone operations and, above
threshold, asks the user for explicit confirmation with buttons on the
channel (ch. 12).
mnest · mnestome — a mnest is the thread linking two
executors that were activated together: it is born from context, reinforced
by use, and decays if not reused. The mnestome is the graph of all
mnests: the system's associative memory, on SQLite, curated by a nightly
process (the ager). It gives the planner the intuition of «which
executor usually follows which» (ch. 9).
skill ↔ backend — two orthogonal axes: a skill decides
whether a group of capabilities is active, trusted and configured
(dormant until its prerequisite appears); a backend decides how an
action runs against a concrete service (calendar = local ICS or Google),
chosen by configuration — never by the LLM. The planner never sees the
provider.
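The mnest card above describes links that are reinforced by use and decay when idle. A minimal sketch of that behaviour follows, assuming a simple additive reinforcement and an exponential half-life; the class `Mnest`, the constants and the function names are hypothetical and are not taken from the Metnos code, which stores the graph in SQLite.

```python
from dataclasses import dataclass

# Hypothetical tuning values, chosen only for illustration.
REINFORCE_STEP = 1.0    # weight added on each co-activation
HALF_LIFE_DAYS = 30.0   # idle time after which a weight halves


@dataclass
class Mnest:
    """A weighted link between two executors activated in the same turn."""
    src: str
    dst: str
    weight: float = 1.0


def reinforce(m: Mnest) -> None:
    """Strengthen the link after another co-activation."""
    m.weight += REINFORCE_STEP


def decay(m: Mnest, idle_days: float) -> None:
    """Weaken the link exponentially with the time it stayed unused."""
    m.weight *= 0.5 ** (idle_days / HALF_LIFE_DAYS)


def likely_next(mnests: list[Mnest], src: str) -> list[str]:
    """Executors that usually follow `src`, strongest first, name as tie-break."""
    linked = [m for m in mnests if m.src == src]
    linked.sort(key=lambda m: (-m.weight, m.dst))
    return [m.dst for m in linked]
```

The explicit tie-break on the name keeps the ordering stable, in the spirit of the deterministic routing described later.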
The anatomy of a name
The closed vocabulary is the project's most fertile idea: it makes names
composable (the planner can predict what a capability it has never seen
is called), filterable (the prefilter reasons over verb and object) and
synthesizable (synt cannot name anything outside the grammar).
Figure 3 — The anatomy of a name. Four positional levels, the last two optional; the five producer verbs are distinguished by their primary input, so the planner never has to choose among synonyms.
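To make the grammar concrete, here is a small sketch of a name parser for the `verb_object[_qualifier[_descriptor]]` shape. The verb and object sets are illustrative subsets built from names that appear in this document; the real vocabulary has 23 actions and 22 objects, and `parse_name` and `ExecutorName` are hypothetical names. Qualifier families are not validated here because the text does not list them.

```python
from typing import NamedTuple, Optional

# Illustrative subsets only, not the real canonical lists.
VERBS = {"find", "get", "list", "classify", "filter", "move"}
OBJECTS = {"files", "dirs", "messages", "entries", "images"}


class ExecutorName(NamedTuple):
    verb: str
    obj: str
    qualifier: Optional[str] = None
    descriptor: Optional[str] = None


def parse_name(name: str) -> ExecutorName:
    """Split a name into its positional levels and reject anything off-grammar."""
    parts = name.split("_")
    if not 2 <= len(parts) <= 4 or not all(parts):
        raise ValueError(f"{name!r}: expected 2 to 4 non-empty segments")
    verb, obj, *rest = parts
    if verb not in VERBS:
        raise ValueError(f"{name!r}: unknown verb {verb!r}")
    if obj not in OBJECTS:
        raise ValueError(f"{name!r}: unknown object {obj!r}")
    return ExecutorName(verb, obj, *rest)
```

With these subsets, `parse_name("find_images_indices")` yields verb `find`, object `images` and qualifier `indices`, while a name with an unknown verb is rejected before anything can be synthesized under it.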
4. The layered architecture
Metnos is an onion: the outside talks to the world, the inside executes.
Each layer trusts only the one beneath it, and privileges shrink as you move
toward the core. A request — whether from a user or from a scheduled
task — crosses all of them, in order.
Figure 4 — The seven real layers, with the modules that implement them. The cognitive engine (layer 3) is the heart of chapter 5; the guards (layer 4) sit always between the plan and the effect.
1. Channels. A channel is an adapter: it converts an external interface (Telegram, browser) into messages and replies. Adding one does not touch the core (ch. 11).
2. Turn runtime. The shell that measures and orchestrates: per-phase telemetry (intent_ms, prefilter_ms, vaglio_ms, exec_ms), safety caps, logs.
3. The Mētis engine. It plans once, executes deterministically, recovers with judgment, and when there is no way out, says so (ch. 5).
4. Guards. No bare subprocess, ever: every effect passes through policy, vaglio and sandbox (ch. 12).
5. Executors and backends. Who acts and against what: the skill↔backend separation keeps the provider out of the planner's head (ch. 3 and 6).
6. Tissues. What survives between turns: associative memory, learned shortcuts, undo history, audit.
5. Anatomy of a multitool turn
If you read only one chapter, read this one. We follow a real request
— «find the spam mails and move them to the trash»
— from entry to answer: four tools chained together, a single call to
the model, every step measured and annotated.
5.1 The cascade, step by step
The ground rule: the model is the last resort, not the first.
Memory is tried first (zero LLM, milliseconds); if the request is new, the
model is asked once, for the whole plan; the execution that follows is
pure deterministic mechanics.
Figure 5 — The anatomy of a multitool turn. Shortcuts (0 LLM) are tried first; if the request is new, the Proposer asks the model for the whole plan in a single constrained call; execution is deterministic, with the vaglio in front of the only state-changing step. On the right, the two error paths: targeted recovery and the honest dead end.
1. Literal shortcuts. A closed table recognizes the most common phrases («what time is it») in microseconds. Here: no match.
2. Intent. One call to the fast tier (reasoning off, ~0.4 s) extracts the canonical verb, the object and keywords. Compound requests become an ordered list of clauses, each with its own pool.
3. Plan memory. Fastpath (shortcuts you approved with the ★ button) and Autopath (plans learned on their own) answer without the model if they recognize the request. Here: miss, it's the first time.
4. Prefilter. The catalog shrinks to the relevant pool for the clause: verb+object match, qualifier bonus, and, to break ties among siblings, the curated affinity bonus (cap +3). All deterministic: same query, same pool, same order.
5. Mētis Proposer. ONE call to the wise tier produces the whole plan: steps, links, final message. It generates up to N candidates (adaptive, with early-stop), each physically constrained by the pool's GBNF grammar; a teleological ranking picks the best.
6. Validator. A typecheck of the plan before running it: existing tools, well-formed args, real references. A trivial error costs one re-proposal, not one wrong execution.
7. Execution. Pure mechanics: for every step the runtime resolves the placeholders, passes through the vaglio, invokes in the sandbox, accumulates the observation. Caps: 12 steps per turn, same executor max 3 times in a row.
8. Closing. The final message is a template filled with the real results. If the turn succeeds, Autopath records it: next time we jump straight to point 3.
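The prefilter step above (verb+object match, qualifier bonus, capped affinity bonus, stable order) can be sketched as a pure scoring function. Only the +3 cap comes from the text; the other weights, the `Candidate` class and the `limit` parameter are assumptions made for illustration.

```python
from dataclasses import dataclass

AFFINITY_CAP = 3  # from the text: the curated affinity bonus is capped at +3


@dataclass(frozen=True)
class Candidate:
    name: str                               # e.g. "find_messages"
    affinity: frozenset = frozenset()       # curated keywords from the manifest


def score(c: Candidate, verb: str, obj: str, qualifier, keywords) -> int:
    """Hypothetical weights: 2 per verb/object match, 1 for the qualifier."""
    parts = c.name.split("_")
    total = 0
    if parts[0] == verb:
        total += 2
    if len(parts) > 1 and parts[1] == obj:
        total += 2
    if qualifier and qualifier in parts[2:]:
        total += 1
    total += min(AFFINITY_CAP, len(c.affinity & set(keywords)))
    return total


def prefilter(catalog, verb, obj, qualifier=None, keywords=(), limit=5):
    """Return the pool for one clause: same query, same pool, same order."""
    scored = [(score(c, verb, obj, qualifier, keywords), c.name) for c in catalog]
    scored = [(s, n) for s, n in scored if s > 0]
    scored.sort(key=lambda t: (-t[0], t[1]))  # score first, then name: no random ties
    return [name for _, name in scored[:limit]]
```

Sorting on `(-score, name)` is what makes the result independent of the catalog's iteration order.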
5.2 The plan: what the model actually proposes
The Proposer does not produce prose: it produces a structured object —
steps, slots to fill (fillers), final message. This is the real plan
for our request:
{
  "steps": [
    {"tool": "find_messages",
     "args": {"folder": "INBOX", "query": "is:unread"}},
    {"tool": "classify_entries",
     "args": {"from_step": 1, "dimension": "spam"}},
    {"tool": "filter_entries",
     "args": {"from_step": 2, "where_field": "spam", "where_value": "spam"}},
    {"tool": "move_messages",
     "args": {"from_step": 3, "dst_folder": "${FILLER:trash_folder}"}}
  ],
  "fillers": {
    "trash_folder": {
      "prompt": "What is the trash folder called for this account?",
      "default": "Trash",
      "tier": "fast"
    }
  },
  "final_message": "Moved ${step4.ok_count} mails to the trash."
}
Worth noting: the model does not know the account's trash folder name
— and does not make one up. It declares a slot
(${FILLER:trash_folder}) that the runtime will fill at the right
moment with a cheap micro-call (cached) or with the default.
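Before a plan like the one above runs, the Validator step of the cascade typechecks it. The sketch below shows what such a check could look like for the three properties named there (existing tools, well-formed args, real references); `validate_plan` is a hypothetical function, not the Metnos implementation, and it only covers `from_step` and `${FILLER:...}` references.

```python
import re

_FILLER = re.compile(r"\$\{FILLER:(\w+)\}")


def validate_plan(plan: dict, known_tools: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the plan may run."""
    errors = []
    fillers = plan.get("fillers", {})
    for i, step in enumerate(plan.get("steps", []), start=1):
        tool = step.get("tool")
        if tool not in known_tools:
            errors.append(f"step {i}: unknown tool {tool!r}")
        args = step.get("args", {})
        if not isinstance(args, dict):
            errors.append(f"step {i}: args must be an object")
            continue
        ref = args.get("from_step")
        # from_step is 1-based and may only point backwards.
        if ref is not None and not (isinstance(ref, int) and 1 <= ref < i):
            errors.append(f"step {i}: from_step must point to an earlier step")
        for value in args.values():
            if isinstance(value, str):
                for name in _FILLER.findall(value):
                    if name not in fillers:
                        errors.append(f"step {i}: undeclared filler {name!r}")
    return errors
```

Returning the full list of problems, rather than stopping at the first, gives a re-proposal everything it needs in one round.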
5.3 Data piping: how the steps talk to each other
| Placeholder | What it does |
| --- | --- |
| from_step: N | Take the entries produced by step N (1-based) and pass them whole to this step. Lists travel only this way: never pasted back into the prompt. |
| ${stepN.field} | Extract a scalar field from step N's result (nested paths supported). Used mostly in the final message. |
| ${FILLER:name} | A slot filled on the fly by a micro-call to the fast tier (cached) or by the declared default. |
| ${RUNTIME:key} | Turn context, resolved by the runtime: actor (who is speaking), lang, channel. |
Figure 6 — The plan of Figure 5 seen as a data flow. Lists stream between steps via from_step; scalars, slots and context pass through typed placeholders that the runtime resolves deterministically.
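The scalar placeholders in the table can be resolved with plain string substitution. This is a sketch under assumptions: `resolve` is a hypothetical helper, and the filler values are passed in already resolved (in the text they come from a cached micro-call or from the declared default). Lists are deliberately not handled here, since they travel through `from_step`.

```python
import re

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")
_STEP_REF = re.compile(r"step(\d+)\.(.+)")


def resolve(text: str, observations: list[dict], fillers: dict, runtime: dict) -> str:
    """Replace ${stepN.path}, ${FILLER:name} and ${RUNTIME:key} inside `text`."""

    def substitute(match: re.Match) -> str:
        token = match.group(1)
        if token.startswith("FILLER:"):
            return str(fillers[token[len("FILLER:"):]])
        if token.startswith("RUNTIME:"):
            return str(runtime[token[len("RUNTIME:"):]])
        step = _STEP_REF.fullmatch(token)
        if step is None:
            raise KeyError(f"unknown placeholder: {token}")
        value = observations[int(step.group(1)) - 1]  # steps are 1-based
        for key in step.group(2).split("."):           # nested paths
            value = value[key]
        return str(value)

    return _PLACEHOLDER.sub(substitute, text)
```

An unknown placeholder raises instead of being left in the output, which keeps a malformed template from reaching the user unnoticed.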
When a cap bites, you see it. If a limit truncates a result
(entries, bytes, steps), the executor declares it in the fields
(truncated: true, used, available_total) and the runtime
says so in the reply — offering to widen only if technically
possible, and never widening on its own. A partial result presented as
complete is considered a bug, not an optimization.
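The rule above, a cap that declares itself, can be sketched in a few lines. The field names `truncated`, `used` and `available_total` come from the text; their exact semantics here (items returned and items that existed before the cap) are an assumption, as is the helper name `apply_cap`.

```python
def apply_cap(entries: list[dict], limit: int = 0) -> dict:
    """Apply an explicit cap and report it; a limit of 0 means "no limit"."""
    total = len(entries)
    kept = entries if limit <= 0 else entries[:limit]
    return {
        "entries": kept,
        "truncated": len(kept) < total,  # never hide that something was cut
        "used": len(kept),
        "available_total": total,
    }
```

A caller can then build the reply from `truncated` and `available_total` instead of guessing whether the list was complete.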
6. Executors: vectorized by construction
Every executor accepts a list and returns a
list — even when the list has zero or one element. There is
no *_batch anywhere: the batch version is the executor. It is
the decision that keeps plans short and results composable.
Figure 7 — The vectorized contract. Zero, one or a thousand elements cross the same code; caps are explicit arguments and truncation is declared in the fields, never hidden.
Three conventions follow from the contract, and you will see them everywhere:
entries vs results — whatever enriches or reads a list returns entries (the record schema is preserved, the pipeline can continue); whatever transforms (move, write, delete) returns results (the schema changes: outcomes, not records).
Robustness at the natural-language boundary — 0 as a placeholder means «no limit»; comparisons are case-insensitive by default; on open text domains values with */? are globs, on closed domains (ids, slugs, scopes) matching is strict and exact. LLM biases never turn into silent failures.
Honest counting — ok_count counts the elements that were actually processed. Never declare an outcome that does not match reality.
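As an illustration of the first two conventions, here is a hedged sketch of a list-in, list-out filter with case-insensitive matching and `*`/`?` globs on open text. The argument names mirror the `filter_entries` step of the plan in chapter 5, but the body is an assumption, not the shipped executor.

```python
import re


def _glob_to_regex(pattern: str) -> re.Pattern:
    """Treat only * and ? as wildcards; every other character is literal."""
    escaped = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile(escaped, re.IGNORECASE | re.DOTALL)


def filter_entries(entries: list[dict], where_field: str, where_value: str) -> dict:
    """Keep the entries whose field matches; the record schema is preserved."""
    matcher = _glob_to_regex(where_value)
    kept = [e for e in entries if matcher.fullmatch(str(e.get(where_field, "")))]
    return {"entries": kept}
```

The same code path handles zero, one or many entries, and because it returns `entries` with the original records, a later step can keep piping them.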
The manifest: the tool's prompt
Every executor carries a TOML manifest. It is not courtesy documentation:
it is what the planner reads when it decides whether and how to
use the tool — written for a mid-size local LLM, not for a frontier
model. Short sentences, literal examples, defaults spelled out; the
description follows four prescriptive chapters:
[description]
en = """SCOPE: search files by pattern in directory.
PATTERN: find_files(base_path=\"/\", patterns=[\"*.jpg\"]).
NOT: list_dirs+filter_entries; get_files (ID lookup).
OUT: entries=[{path,name,type,mime,kind,size,mtime}]."""
Figure 8 — One manifest, four consumers: prefilter, planner pool, grammar and undo each read different fields of the same TOML. The digest binds the manifest to the signed code.
7. Synt: the tool factory
When the pool cannot do something, the planner does not improvise code in
the middle of the turn: it hands over to synt, the process that
brings into existence what is missing. It first tries to compose
existing executors; only as a documented exception does it generate a
new one — in five stages, each with its own contract.
Figure 9 — The synthesis pipeline: four procedural stages on the middle tier, the code on the top tier, then independent semantic verification, signature and birth tests. The multi-stage design converges where the single prompt failed.
Two triggers, one cascade
| Mode | Trigger | Timing |
| --- | --- | --- |
| Reactive | During a turn: the planner finds no executor that satisfies the request. | Synchronous: the user is waiting; composition of existing executors is tried first. |
| Introvert | At night: the ager walks the mnestome and finds recurrences, overlapping traces, families with the same shape. | Asynchronous, in homeostasis: it proposes merges, generalizations, specializations. |
In both cases the same rule holds: synt proposes, the human
approves. No self-modification without a filter; every proposal comes
with its rationale, and is reversible.
The 7-layer gate
The same funnel applies to synthesized code and to skills imported
from outside: no package runs on trust.
Figure 10 — The 7-layer gate, identical for synthesized and imported executors: signature, vocabulary, usage quarantine, sandbox, smoke test, semantic verification, audit. Only at the end of the funnel does a package become a trusted executor.
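The funnel can be pictured as an ordered list of checks where the first failure rejects the package. The sketch below implements only two stand-in layers: a SHA-256 digest comparison (a simplification, since the text specifies Ed25519 signatures) and a coarse vocabulary check with an illustrative verb subset. `Package`, `run_gate` and both check functions are hypothetical names; the remaining layers (quarantine, sandbox, smoke test, semantic verification, audit) would plug in as further callables.

```python
import hashlib
from typing import Callable, NamedTuple, Optional


class Package(NamedTuple):
    name: str
    code: bytes
    declared_digest: str  # hex digest recorded in the manifest (assumed SHA-256)


Check = Callable[[Package], bool]

ALLOWED_VERBS = {"find", "get", "list", "move"}  # illustrative subset only


def digest_matches(pkg: Package) -> bool:
    """Stand-in for the signature layer: the code must match its manifest."""
    return hashlib.sha256(pkg.code).hexdigest() == pkg.declared_digest


def name_in_vocabulary(pkg: Package) -> bool:
    """Coarse vocabulary layer: 2 to 4 segments, known verb."""
    parts = pkg.name.split("_")
    return 2 <= len(parts) <= 4 and parts[0] in ALLOWED_VERBS


def run_gate(pkg: Package, layers: list[tuple[str, Check]]) -> tuple[bool, Optional[str]]:
    """Run the layers in order; report the first one that rejects the package."""
    for label, check in layers:
        if not check(pkg):
            return False, label
    return True, None
```

Returning the label of the failing layer gives the audit trail a concrete reason for every rejection.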
8. Four tiers, one deterministic routing
Tiers are abstract roles, not pinned models:
fast / middle / wise are assignments you bind to whatever endpoint
you have, and frontier is the only cloud opt-in. In the reference
instance the three local tiers all point to the same instance of
llama-server: only the per-call parameters change.
| Tier | Role | In the reference instance |
| --- | --- | --- |
| fast | Short structured extractions: intent, fillers, classifications. Reasoning off. | llama-server :8080, a quantized ~35B MoE, think=False, short replies. Mandatory (the safety net). |
| wise | The planner: proposes the whole plan; writes the stage-5 code. | Same instance. Mandatory: it never degrades to fast. |
| frontier | An external consult when explicitly requested (e.g. analyzing an issue). | Cloud API, opt-in, with managed fallback if the key is absent. |
Tier ≠ model. No GPU or NPU is required by construction: a
CPU endpoint, a model you already serve, or the frontier fallback are all
first-class paths. A weaker local model means weaker planning, not a broken
install.
The three locks of determinism
An LLM at temperature zero is not enough to make routing
reproducible: the local server stays non-deterministic because of
speculative decoding with a random seed. Metnos closes the door with three
locks, one per noise source:
Figure 11 — On the left, tiers as roles bound to a single local instance (frontier aside); on the right, the three locks that make routing reproducible: pinned seed, curated affinity for ties, GBNF grammar on the decode.
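Two of the locks, the pinned seed and the grammar on the decode, can be illustrated by the request a planner might send to the local server. This is a sketch under assumptions: the field names (`prompt`, `temperature`, `seed`, `grammar`, `n_predict`) follow the llama.cpp server completion endpoint as commonly documented and should be checked against the version in use; the seed value and both function names are hypothetical; and the grammar here only constrains a single tool name, whereas the text describes a grammar over the whole plan. The payload is built, not sent.

```python
import json


def tool_grammar(pool: list[str]) -> str:
    """A one-rule GBNF grammar that admits only the tool names in the pool."""
    # json.dumps yields a double-quoted literal, which GBNF accepts as a terminal.
    alternatives = " | ".join(json.dumps(name) for name in sorted(pool))
    return f"root ::= {alternatives}"


def completion_payload(prompt: str, pool: list[str], seed: int = 42) -> dict:
    """Request body with the sampling noise pinned down."""
    return {
        "prompt": prompt,
        "temperature": 0.0,             # greedy decoding
        "seed": seed,                   # lock 1: fixed, not random
        "grammar": tool_grammar(pool),  # lock 3: output constrained to the pool
        "n_predict": 16,
    }
```

Sorting the pool before building the grammar keeps the request byte-identical for the same pool, whatever order the prefilter produced it in.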
No fragile parsers. Tool use is native: the model emits
structured tool_calls, and the grammar guarantees the shape upstream.
There is no fishing JSON out of prose — the classic weak point of
home-grown agents.
9. The memory that speeds things up
Metnos trains no models: no fine-tuning, no RLHF. Everything it learns is
inspectable data — plans, traces, shortcuts — and anything
learned can be read, corrected, deleted. The practical effect: the more you
use it, the less it calls the model.
Figure 12 — The circle of learning without training: successful plans become shortcuts (Autopath, Fastpath ★), co-activations become mnests, nightly recurrences become synthesis proposals. Everything is readable, reversible data.
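The "solved once, replayed without the model" idea can be sketched as a small lookup in front of the planner. `PlanMemory`, `normalize` and `plan_for` are hypothetical names; matching here is a naive normalized-text key, which is an assumption, and the plan is recorded immediately, whereas the text says Autopath records only after the turn succeeds.

```python
import re
from typing import Callable, Optional


def normalize(request: str) -> str:
    """Case-fold and drop punctuation so trivially different phrasings collide."""
    return " ".join(re.findall(r"\w+", request.casefold()))


class PlanMemory:
    """Inspectable plan store: every entry can be read, replaced or deleted."""

    def __init__(self) -> None:
        self._plans: dict[str, dict] = {}

    def record(self, request: str, plan: dict) -> None:
        self._plans[normalize(request)] = plan

    def recall(self, request: str) -> Optional[dict]:
        return self._plans.get(normalize(request))

    def forget(self, request: str) -> bool:
        return self._plans.pop(normalize(request), None) is not None


def plan_for(request: str, memory: PlanMemory, ask_model: Callable[[str], dict]):
    """Try memory first; call the model only for requests never seen before."""
    plan = memory.recall(request)
    if plan is not None:
        return plan, "memory"
    plan = ask_model(request)
    memory.record(request, plan)  # simplification: success of the turn is assumed
    return plan, "model"
```

Because the store is plain data, `forget` is all it takes to unlearn a shortcut.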
10. The senses: the image pipeline
To search your photos, Metnos ships nothing to anyone: three
in-process extractors turn every image into three signals —
what is seen, who is there, where and when — fused into one unified
index queried through the ordinary vocabulary.
Figure 13 — The image pipeline: SigLIP for the scene, RetinaFace+ArcFace for identities, EXIF for place and time. The three signals converge into a unified index queried by an ordinary executor of the vocabulary.
A search arrives from the channel like any other request: the planner
composes find_images_indices with the criteria extracted from the
sentence, and the channel shows inline previews. Building the index is a
background job, incremental and restartable, started with a sentence
(«index the photos in…»).
11. The channels: Telegram and web
A channel is an adapter: it converts an external interface into messages
and replies, plus one optional capability — rendering buttons for
confirmations and choices. Two channels come with the install; adding more
does not touch the core.
Figure 14 — The two channels. The browser talks directly to the server on 8770 (streaming chat + dashboards); Telegram works by outbound long-poll, so no open ports and no public IP. Below, the pairing that decides who may speak.
| Channel | What it offers |
| --- | --- |
| Web :8770 | Chat in the browser with streaming replies (SSE), image previews, feedback badges; admin dashboards for proposals, executors, runs, safety and turns. The same API answers JSON or HTML depending on Accept. Admin key auto-created on first start, file with 0600 permissions. |
| Telegram | Your personal bot: messages, photos, inline buttons for vaglio confirmations and multiple-choice inputs. Pairing via the /pair command and a signed, expiring code. |
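The table mentions pairing through a signed, expiring code. One common way to build such a code is an HMAC over the user id and an expiry timestamp; the sketch below assumes that scheme purely for illustration, and the function names, the code layout and the 16-character truncation are hypothetical rather than the Metnos format.

```python
import hashlib
import hmac
import time
from typing import Optional


def _sign(secret: bytes, user_id: str, expires: int) -> str:
    message = f"{user_id}:{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:16]


def make_pairing_code(secret: bytes, user_id: str, ttl_s: int = 300,
                      now: Optional[float] = None) -> str:
    """Return "<expiry>.<signature>" valid for ttl_s seconds."""
    expires = int((time.time() if now is None else now) + ttl_s)
    return f"{expires}.{_sign(secret, user_id, expires)}"


def check_pairing_code(secret: bytes, user_id: str, code: str,
                       now: Optional[float] = None) -> bool:
    """Accept only an untampered code, for this user, before its expiry."""
    try:
        expires_text, signature = code.split(".")
        expires = int(expires_text)
    except ValueError:
        return False
    expected = _sign(secret, user_id, expires)
    current = time.time() if now is None else now
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(signature.encode(), expected.encode()) and current < expires
```

The `now` parameter exists only so the expiry logic can be exercised without waiting.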
12. Safety and reversibility
Safety is not a module: it is a chain of independent guards, and an action
must pass all of them. And since even the best guard makes mistakes,
the last defense is being able to go back: honest undo, by construction.
Figure 15 — Five guards in series (pairing, policy, vaglio, sandbox, signature+audit) and, below, the safety net: an undo with a closed catalog of reverse patterns, verified copies before any deletion, and honest counts.
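The "verified copy before any deletion" rule and the honest count can be sketched for local files as follows. This is an illustration under assumptions: `move_files` is a hypothetical name, verification is a SHA-256 comparison, and existing targets are refused rather than overwritten; the Metnos executors and their undo records are not reproduced here.

```python
import hashlib
import shutil
from pathlib import Path


def _sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def move_files(paths: list[str], dst_dir: str) -> dict:
    """Move each file as copy, verify, delete; report what really happened."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    results = []
    for raw in paths:
        src = Path(raw)
        target = dst / src.name
        try:
            if target.exists():
                raise FileExistsError(f"{target} already exists")
            shutil.copy2(src, target)
            if _sha256(src) != _sha256(target):
                target.unlink()
                raise OSError("copy verification failed")
            src.unlink()  # the source goes away only after a confirmed copy
            results.append({"src": str(src), "dst": str(target), "ok": True})
        except OSError as exc:
            results.append({"src": str(src), "ok": False, "error": str(exc)})
    return {"results": results, "ok_count": sum(r["ok"] for r in results)}
```

A failed item leaves its source untouched and simply does not count, so `ok_count` never exceeds the number of files that actually arrived.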
The power is real, which is why it is bridled. Metnos can
genuinely administer the machine (shell, sudo, packages, mounts)
— that is what makes it a host assistant rather than a chatbot. But
every privileged action passes through the vaglio with explicit
confirmation, runs in the sandbox, lands in the audit; and the whole system
skill can be disabled, locking Metnos out of the operating system.
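The two-stage vaglio described in this document, a deterministic guard followed by a judge with a confirmation threshold, can be sketched as a function returning one of three verdicts. The forbidden paths, the threshold and the judge signature are all hypothetical, and this version does no symlink or `..` resolution, which a real guard would need.

```python
from pathlib import PurePosixPath
from typing import Callable

# Hypothetical policy values, for illustration only.
FORBIDDEN = (PurePosixPath("/boot"), PurePosixPath("/etc/shadow"))
CONFIRM_THRESHOLD = 0.5

Judge = Callable[[str, list[str]], float]  # returns a risk score in [0, 1]


def guard(paths: list[str]) -> bool:
    """Deterministic stage: False if any path is a forbidden path or inside one."""
    for raw in paths:
        path = PurePosixPath(raw)
        if any(path == f or f in path.parents for f in FORBIDDEN):
            return False
    return True


def vaglio(tool: str, paths: list[str], judge: Judge) -> str:
    """Return "deny", "confirm" (ask the user on the channel) or "allow"."""
    if not guard(paths):
        return "deny"  # the judge is never consulted for forbidden targets
    if judge(tool, paths) >= CONFIRM_THRESHOLD:
        return "confirm"
    return "allow"
```

Running the deterministic guard first means a forbidden target is refused the same way every time, regardless of what the judge would have said.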
13. The principles, in eight cards
If you remember only eight sentences from this document, make it these.
Everything else — code, prompts, conventions — follows from here.
1. Vectorized by construction. Every executor accepts a list and returns a list, even a degenerate one. The batch version is the executor: *_batch does not exist.
2. A closed, governed vocabulary. Everything that acts has a composable name inside a closed grammar. A new term enters only if necessary, general and understandable.
3. No silent failure. Counts reflect what actually happened; truncation is declared, not hidden; a partial result presented as complete is a bug.
4. Deterministic > LLM. Where an automaton or a table suffices, the model is not used. The LLM enters only where an equivalent hand-written parser would genuinely be too complex, and even then it enters constrained.
5. Never an implicit delete. Every move is copy → check → delete; never DELETE without a confirmed COPY.
6. Reversibility with a rationale. Every evolutionary act (synthesis, merge, archive) is reversible and motivated. Saying yes costs less when you can go back.
7. i18n by construction. Every user-facing string and prompt is per-language data: a new language is a translation pack, not a fork of the code.
8. Understandability as a duty. If the user does not understand the system, the system is useless. Simplicity is not aesthetics: it is the criterion that selected everything else.
14. What Metnos is NOT
Half of the design lives in the no's. Every temptation to add an item from
this list must be resisted.
Not a general-purpose framework. It is an agent for one person and one home. If another agent with different rules is needed, you run another instance: you do not abstract.
It does not run third-party skills as-is. Drop-in formats are execution of someone else's code with your privileges. Here every package passes the 7-layer gate, or it does not run (ch. 7).
It trains no models. No fine-tuning, no RLHF. Growth is inspectable memory + synthesis behind the human filter (ch. 7 and 9).
Not a cloud agent. It runs at home; the frontier is an explicit consult, never the residence. No opening you did not choose.
Not an IDE or a dev assistant. It does not write code in other projects on your behalf; at most it analyzes with read-only executors.
Not a home-automation replacement. It can ask a home-automation system; it does not duplicate it.
Not multi-channel at all costs. Two channels done well; the others when truly needed.
15. Where to go next
This was Level 1: the system from above. Level 2 — the
microdesign — has one page per component, with enough detail to write
its code without inventing anything.
Metnos — Architecture: Introduction (v2). mētis + noûs: cunning intelligence in the service of the mind — on your own hardware.
Bilingual IT+EN documentation at metnos.com; code at
github.com/brunialti/metnos.