You talk to it from the browser or from Telegram, in natural language. It reads mail, files, photos, calendar and the web; it acts (moves, writes, sends, schedules), and it stops to ask before anything that cannot be undone. There is no Metnos cloud and no “sign up”: data and logic stay home.
What happens on every request, in one drawing:
The four properties that set it apart from a generic agent:
Self-hosted agents already exist — the terminal kind (OpenClaw and friends) and the ecosystem of importable “skills”. Metnos makes the opposite choice on four axes, and for a precise reason: a local planner is not a frontier model, and it must be put in a position where it cannot go wrong.
| | Typical agent framework | Metnos |
|---|---|---|
| Tools | Hand-written, imported, or generated free-form at runtime — then run as-is, with the assistant's privileges. | Synthesized at runtime too, but from a closed, audited vocabulary: signed, birth-tested, aged, admitted only through a 7-layer gate. |
| Routing | The model picks a tool each turn: ask twice, get two plans. | Deterministic by construction: pinned seed, ties broken by curated affinity. Same request, same plan, every run. |
| Output | Free-form, per tool. | Uniform: lists in, lists out, composable between steps with no glue. |
| Undo | Rare, or at best “fingers crossed”. | First-class: closed catalog of reverse patterns, move = COPY-then-DELETE, honest ok_count. |
| Language | English, strings hard-coded. | i18n by construction: every string and prompt is per-language data. IT+EN validated; other languages drop-in, not yet tested. |
| Security | Trust the package author. | Do not trust the package: the package has to pass the checks. |
The popular skill formats are convenient and dangerous in equal measure: you import a package and the assistant executes its code with its own privileges. For an assistant that touches your files, mail and shell, that is remote code execution by design: one malicious or sloppy package is enough. Metnos chooses security by construction: a closed vocabulary (you cannot name a tool the grammar doesn't allow), a 7-layer admission gate for every new or imported executor (signature → affinity overlap → aging → sandbox → smoke test → LLM verification → append-only audit), explicit consent before anything destructive. The slogan: don't trust the package — the package has to earn its place. The public skill ecosystem isn't ignored: it gets translated into this model, never executed raw.
Most agents treat the LLM as an oracle you re-roll every turn. Metnos makes the opposite bet: a local planner over a closed vocabulary can be made reproducible — the generation seed is pinned, and when two sibling tools tie, a curated affinity signal decides, not chance. Measurably: the internal routing bench repeats every request five times and demands the same plan, five out of five. And it compounds: a request shape solved once is replayed by Praxis with no LLM call at all — lower latency, zero cost, identical outcome. Flexibility (the frontier model) stays available, but it's the reserve, not the engine.
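The tie-breaking idea can be sketched in a few lines. This is an illustration, not Metnos code: the `Candidate` type, its `score` and `affinity` fields, the `pick_tool` function and the `copy_files` tool name are hypothetical, and the final name-based tie-break is an added assumption that makes the ordering total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str      # tool name from the closed vocabulary
    score: float   # planner's match score for the request (hypothetical field)
    affinity: int  # curated affinity for this request shape (hypothetical field)


def pick_tool(candidates: list[Candidate]) -> str:
    """Pick the best tool with no randomness involved."""
    if not candidates:
        raise ValueError("no candidate tools")
    # Highest score wins; ties go to the higher affinity; the name is a last,
    # purely lexical tie-break so the result never depends on input order.
    best = min(candidates, key=lambda c: (-c.score, -c.affinity, c.name))
    return best.name


candidates = [
    Candidate("move_files", score=0.9, affinity=1),
    Candidate("copy_files", score=0.9, affinity=3),  # hypothetical sibling tool
    Candidate("find_files", score=0.4, affinity=5),
]

# Mimic the bench: the same request five times, one distinct plan expected.
plans = {pick_tool(candidates) for _ in range(5)}
```

Because the selection is a pure function of the candidate set, repeating it or shuffling the input cannot change the outcome. That is the property the five-out-of-five bench checks from the outside.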
Due honesty: as a companion for people writing code — inside a repository, in real time — a mature terminal agent remains superior, with an extension ecosystem Metnos neither has nor chases. Metnos is a different thing: a household assistant for mail, photos, calendar, archive, web and the care of its own server. The two coexist just fine.
The fastest way to understand is to watch. Every scene below is a real request made to the reference instance; the heading says when it was verified live. Where there is a limit, it is written next to the merit.
Give me a summary of today's important mail.
Ok, you answer the lawyer: tell him I'll send the document by Thursday.
An outbound mail to an external recipient never leaves silently, at any autonomy level. Metnos drafts the reply and this bubble lands on Telegram, two buttons attached:
You tap Approve: the daemon checks that the token is still pending, not expired, and that the person deciding is the same paired sender who asked. Then: “Sent at 11:34. Audit line #1247.”
The card's three lines are not a flourish: they are what, on what, how redoable — the minimum to decide at a glance. By the third identical occurrence the card shrinks to two lines, then one. See approval_ux.
Sort all image files in ~/images into subfolders by year, prefixing each name with shooting date and place. If the place is missing, use “unknown”.
There is no “sort photos” tool: the planner composes four from the library,
wiring each output into the next input. Raw data never transits through the model:
it passes by reference (from_step).
find_files(base_path="~/images", patterns=["*.jpg","*.png","*.heic"], recursive=true)
→ 98 entries (size and mtime already included)

filter_entries(from_step=1, kind="image")
→ 98 entries · in-memory filter, confirms the MIME-declared type

get_files(from_step=2, fields=["dates","place"])
→ 98 enriched entries: EXIF → date; GPS → place via reverse-geocode (local cache); no GPS → "unknown"

move_files(from_step=3, dst_template="~/images/{year}/{date}_{place}_{name}")
→ 98 moved, 0 errors · every source→destination pair recorded in the undo log

Four executors that don't know each other, chosen and wired by the planner. Real run, 2026-04-28, 98 photos.
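Passing data by reference can be illustrated with a toy plan runner. Everything here is a sketch under assumptions: `run_plan` and the plan layout are hypothetical, `find_files` is a stand-in over a fake listing rather than the real executor, and `add_year` and `plan_names` are invented helpers. Only the `from_step` key comes from the plan shown above.

```python
from typing import Callable


# Stand-in executors: each takes a list of entries plus keyword arguments and
# returns a list, so any step's output can feed any other step's input.
def find_files(entries: list, patterns: list[str]) -> list[dict]:
    listing = ["a.jpg", "b.png", "notes.txt"]          # fake directory listing
    suffixes = tuple(p.lstrip("*") for p in patterns)  # "*.jpg" -> ".jpg"
    return [{"name": n} for n in listing if n.endswith(suffixes)]


def add_year(entries: list[dict], year: int) -> list[dict]:
    return [{**e, "year": year} for e in entries]


def plan_names(entries: list[dict], template: str) -> list[str]:
    return [template.format(**e) for e in entries]


EXECUTORS: dict[str, Callable[..., list]] = {
    "find_files": find_files,
    "add_year": add_year,
    "plan_names": plan_names,
}


def run_plan(plan: list[dict]) -> dict[int, list]:
    """Run steps in order; 'from_step' names an earlier result by number."""
    results: dict[int, list] = {}
    for number, step in enumerate(plan, start=1):
        args = dict(step["args"])
        source = args.pop("from_step", None)
        entries = results[source] if source is not None else []
        results[number] = EXECUTORS[step["tool"]](entries, **args)
    return results


# The plan holds only tool names, small arguments and step numbers: the
# entries themselves never appear in it.
plan = [
    {"tool": "find_files", "args": {"patterns": ["*.jpg", "*.png"]}},
    {"tool": "add_year", "args": {"from_step": 1, "year": 2026}},
    {"tool": "plan_names", "args": {"from_step": 2, "template": "{year}/{name}"}},
]
results = run_plan(plan)
```

The planner in this sketch only ever writes the `plan` list. The 98 entries of the real run would live in `results`, held by the runtime and never serialized into a prompt.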
The next day, change of heart:
Undo the last operation.
The school portal requires a login. The summary must reach the daughter's Telegram; she is paired as a guest. Three messages of setup, then it runs on its own.
Add credentials for school-portal.example: user roberto.b, password ●●●●●●●.
Metnos recognizes a value that looks like a password and raises a dedicated card:
it offers to encrypt the secret in a local vault and to scrub the clear value
from the turn log (<REDACTED:cred>). Once approved, it attempts a
test login.
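Scrubbing the clear value from the turn log can be as simple as the sketch below. The function names and the log lines are hypothetical, and the password is a made-up placeholder; only the `<REDACTED:cred>` marker is taken from the text above. How Metnos detects that a value "looks like a password" is not shown here.

```python
MARKER = "<REDACTED:cred>"  # marker text as quoted in the document


def scrub_secret(text: str, secret: str) -> str:
    """Replace every occurrence of a known secret with the marker."""
    if not secret:
        return text  # nothing to scrub; never call replace() with ""
    return text.replace(secret, MARKER)


def scrub_log(lines: list[str], secret: str) -> tuple[list[str], int]:
    """Scrub all lines; return the cleaned lines and how many were changed."""
    cleaned = [scrub_secret(line, secret) for line in lines]
    changed = sum(1 for before, after in zip(lines, cleaned) if before != after)
    return cleaned, changed


# Hypothetical turn log; "example-pass-123" is a placeholder secret.
turn_log = [
    "user: Add credentials for school-portal.example: "
    "user roberto.b, password example-pass-123.",
    "system: test login with example-pass-123 succeeded",
    "system: schedule created",
]
cleaned, changed = scrub_log(turn_log, "example-pass-123")
```

Returning the number of changed lines keeps the operation countable, in the same spirit as the honest counts described elsewhere in this document.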
Every morning at 7, look for news on the portal (homework, grades, circulars), read the attached PDFs too, and send the 5 most relevant items to my daughter on Telegram.
Sunday morning, five sentences in a row — a sheet on Drive, bills from the mail, a recap in a Doc, sharing, recurrence:
Create a spreadsheet “Budget 2026-05” on Drive with columns Date, Category, Amount, Notes, Paid. — Fill it from the last 30 days of mail containing “invoice”. — Make me a per-category recap in a Google Doc. — Share everything with my family member. — Update me every morning at 8.
Compress /tmp/cfg.json to gzip.
The planner inspects the catalog: no fitting executor, and no chain of existing ones closes the gap. Synthesis kicks in — five small stages instead of one monster prompt, each stage seeing only the slice of context it needs:
cfg.json.gz created.
This is how the catalog reached 77 executors: one capability at a time, motivated by a real request, never in advance. The full mechanism is in chapter 7.
Tens of thousands of photos on a network drive. The first time, Metnos offers to index them: indexing runs in the background and pings you on Telegram when done. From then on:
Find the mountain photos.
My daughter's photos at the seaside?
The project's own support runs this way — no dedicated service, no ad-hoc code: two natural-language requests, registered as recurring tasks, that the planner turns into the usual executor chains.
Every 30 minutes: find the repo's new issues, skip those already in the local db, for each one search the db for similar already-resolved issues, classify and analyze it with the frontier tier, save the draft reply with status “prepared”, and notify me.
After my approval: read from the db the “approved” issues not yet posted, publish the reply as a GitHub comment, mark them “posted”.
Metnos runs on a Linux machine of yours. The reference instance is a small unified-memory box (96 GB) serving a ~35B local model at roughly 80 tokens/s; but the LLM tiers are abstract roles, not pinned models — a CPU endpoint, a model you already serve, or the frontier fallback alone are all first-class paths. Phones and laptops are clients: the mind, the memory and the audit live on one machine.
The browser: a chat served by the server itself (port 8770), with
photo previews, the gallery, and the admin dashboards (synt proposals, executors,
runs, turns). Telegram: the same brain in your pocket, with approval
cards as inline buttons. For automation, the HTTP API (/agent/turn,
streaming too) is the same door the chat uses.
The web chat also shows ✓/✗ feedback badges on every reply: your verdicts feed the care of the catalog.
Telegram is a public surface: anyone who knows the bot's name can write to it.
Metnos binds every (channel, sender) pair to an autonomy level with a
single-use signed code: /pair PAIR.<code> (Ed25519, expires in
minutes) for pairings decided by the host; /start <token> for
guests created from the admin UI. No central account, no password: a
per-channel-per-sender directory only the host can edit.
See pairing.
Every visible message, every tool description and every LLM prompt lives as
per-language data. Italian and English are validated; a new language is
added by translating the packs, with no code changes
(prompts_cli add-language <code> scaffolds the structure; the
translation is reviewed and promoted by hand). Honesty: beyond IT and EN,
it is untested ground today. See multilang.
This is the most important piece of the experience. Every action that changes the state of the world — sending, writing outside the granted perimeter, running shell — goes through four real components, in this order. The point is not to always ask: it is to ask well, and less and less.
| Phase | What it does |
|---|---|
| 1 · Planner | Prepares the step (executor + arguments) in memory. Nothing is executed yet. |
| 2 · Vaglio | First the binary guard: forbidden paths (~/.ssh, /etc…) and near-unrecoverable shell patterns (rm -rf /, mkfs, fork bombs) are denied outright, at every autonomy level. Then the graded judge: an alignment score against your telos (the aims written in TELOS.md). |
| 3 · Policy | Capability class × autonomy level × persistent grants → one of allow_silent, approval_required, deny. |
| 4 · Card | Three lines (what · on what · how redoable) + two buttons. Opaque single-use token, TTL 600 s, verification that the decider is the requester. Only after approval does the executor run — inside a bubblewrap sandbox with the profile declared in its signed manifest. |
Less and less friction, never less control. The card modulates
with recurrence: full the first times, then two lines, then one. And on the first
approval of a kind you can grant a territory (“all writes inside
~/Documents/invoices/”) — from then on, there, Metnos stops asking; the
grant is revocable whenever you want. See
approval_ux.
And once it has acted, it can walk back. Undo is not an LLM “doing its best”: it is a closed catalog of reverse patterns declared in each mutating executor's manifest (swap source/destination, delete what was created, restore the blob backup, delete by id). Honest counts: if it says it undid three things, it was three.
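One reverse pattern, "swap source/destination" for a move, can be sketched as follows. The function names and the undo-log layout (a list of path pairs) are hypothetical. The COPY-then-DELETE order and the rule that only real restores are counted follow the text above.

```python
import shutil
import tempfile
from pathlib import Path


def move_files(pairs: list[tuple[Path, Path]], undo_log: list) -> int:
    """Move each source to its destination as COPY, then DELETE."""
    ok_count = 0
    for src, dst in pairs:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)        # copy first: the source survives a crash
        src.unlink()                  # delete only after the copy succeeded
        undo_log.append((src, dst))   # record the pair for a later undo
        ok_count += 1
    return ok_count


def undo_moves(undo_log: list) -> int:
    """Reverse recorded moves, newest first; count only real restores."""
    ok_count = 0
    while undo_log:
        src, dst = undo_log.pop()
        if not dst.exists():
            continue                  # nothing to restore: not counted
        src.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(dst, src)        # same pattern with roles swapped
        dst.unlink()
        ok_count += 1
    return ok_count


def demo() -> tuple[int, int, list[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pairs = []
        for name in ("a.jpg", "b.jpg", "c.jpg"):
            src = root / name
            src.write_text(name)
            pairs.append((src, root / "2026" / name))
        log: list = []
        moved = move_files(pairs, log)
        (root / "2026" / "c.jpg").unlink()  # one file vanished in the meantime
        undone = undo_moves(log)
        restored = sorted(p.name for p in root.glob("*.jpg"))
    return moved, undone, restored


moved, undone, restored = demo()
```

In the demo three files are moved, one destination disappears before the undo, and the undo reports two restores rather than three.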
SOUL.md, six
operating principles): the synt proposes executors only, never changes to
itself, to the vaglio or to the runtime — and every proposal goes through your yes.
Four distinct stores, with different lifetimes and write rules — SQLite tables and text files you can open, not an opaque vector somewhere.
| Store | Scope | What it holds | Rule |
|---|---|---|---|
| Scratchpad | the turn | The observations of intermediate steps. | emptied at end of turn |
| Turn history | days | “What did I ask you last Tuesday?” | one record per turn, feeds the dashboards |
| Memories about you | forever | “Mom's birthday is April 23rd.” | never written covertly: it proposes, you approve |
| Mnestome | about itself | Which tools worked together, with what outcome; and the attempts left halfway. | the substrate the synt reads before proposing |
Memory of the world (the first three rows) and memory of itself (the mnestome), kept separate by design. The fourth store is the unusual one: the mnestome also records the gaps — requests no tool could close — because gaps are the engine of growth.
The detail that matters: Metnos never promotes a fact to permanent memory on its own. It proposes (“I noticed you often mention the new dentist: shall I save him as a regular contact?”), you decide. And a failed attempt is not forgotten: it remains as a proto-mnest — a recorded aspiration, with enough context to recognize the same shape next time. See mnestome and scratchpad.
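The "it proposes, you approve" rule maps naturally onto two SQLite tables. The schema, the table names and the function names below are hypothetical; the document only states that the stores are SQLite tables and text files, and that nothing becomes a permanent memory without approval.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with a proposals table and a memories table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE proposals ("
        " id INTEGER PRIMARY KEY,"
        " fact TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'proposed')"
    )
    db.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, fact TEXT NOT NULL)")
    return db


def propose(db: sqlite3.Connection, fact: str) -> int:
    """Record a candidate fact; it is not a memory yet."""
    cur = db.execute("INSERT INTO proposals (fact) VALUES (?)", (fact,))
    return cur.lastrowid


def approve(db: sqlite3.Connection, proposal_id: int) -> bool:
    """Promote a proposal to a memory, once, and only on explicit approval."""
    row = db.execute(
        "SELECT fact FROM proposals WHERE id = ? AND status = 'proposed'",
        (proposal_id,),
    ).fetchone()
    if row is None:
        return False  # unknown id or already decided
    with db:  # both writes commit together or not at all
        db.execute("INSERT INTO memories (fact) VALUES (?)", row)
        db.execute(
            "UPDATE proposals SET status = 'approved' WHERE id = ?",
            (proposal_id,),
        )
    return True


def recall(db: sqlite3.Connection) -> list[str]:
    return [fact for (fact,) in db.execute("SELECT fact FROM memories ORDER BY id")]


db = open_store()
pid = propose(db, "Mom's birthday is April 23rd.")
before = recall(db)           # the fact is only a proposal at this point
accepted = approve(db, pid)
after = recall(db)
```

The only code path that writes to `memories` is `approve`, which is what makes covert promotion structurally impossible in this sketch.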
An executor is a small Python file that does one thing, with a manifest declaring its arguments, capabilities and sandbox profile. Once signed it is a stable artifact: it doesn't learn, it doesn't change. The growth intelligence lives elsewhere — in the synthesizer.
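To make that shape concrete, here is what such a file could look like for the gzip request of scene 6. This is a guess at the form, not a real Metnos executor: the manifest keys and values, the `compress_files` name and the result fields are all assumptions. The `invoke(args)` entry point returning a list follows the synthesis stages described in this chapter.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

# Hypothetical manifest: key names and values are illustrative only.
MANIFEST = {
    "name": "compress_files",
    "args": {"paths": "list[str]"},
    "capabilities": ["fs.read", "fs.write"],
    "sandbox_profile": "fs-local",
    "reverse_pattern": "delete_created",
}


def invoke(args: dict) -> list[dict]:
    """Gzip each given file next to the original; one result entry per input."""
    results = []
    for raw in args["paths"]:
        src = Path(raw)
        dst = src.with_name(src.name + ".gz")
        try:
            # The source is opened first, so a missing file creates no output.
            with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
                shutil.copyfileobj(fin, fout)
            results.append({"path": str(dst), "ok": True})
        except OSError as exc:
            results.append({"path": str(src), "ok": False, "error": str(exc)})
    return results


# Small self-check in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "cfg.json"
    source.write_text('{"debug": false}')
    outcome = invoke({"paths": [str(source), str(Path(tmp) / "missing.json")]})
    with gzip.open(outcome[0]["path"], "rt") as fh:
        round_trip = fh.read()
```

A failure is reported as an entry with `ok` set to false rather than raised, so a partial outcome stays countable by the caller.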
| Stage | What it produces |
|---|---|
| 1 · Name | from the closed vocabulary — middle tier |
| 2 · Signature | args schema, capabilities, reverse pattern |
| 3 · Tests | 4–6 birth tests: happy, empty, invalid args |
| 4 · Description | the “tool's prompt” + affinity keywords |
| 5 · Code | wise tier, local — def invoke(args) → lists |
At the exit, the admission gate (7 layers): signature → affinity overlap → aging → sandbox → smoke test → LLM description-vs-code check → append-only audit · then human approval and Ed25519 signature.
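The gate's ordering can be sketched as a list of checks that stops at the first failure and records every verdict. The layer names follow the sequence above; the checks themselves are trivial stand-ins, and the candidate fields, the thresholds and the reading of "signature" as a declared argument schema are assumptions made for the example.

```python
from typing import Callable

Check = Callable[[dict], bool]

# Six checking layers in document order; the seventh layer is the audit trail
# itself. Every predicate below is a placeholder, not the real check.
LAYERS: list[tuple[str, Check]] = [
    ("signature", lambda c: bool(c.get("args_schema"))),
    ("affinity_overlap", lambda c: c.get("overlap", 1.0) < 0.8),
    ("aging", lambda c: c.get("age_days", 0) >= 1),
    ("sandbox", lambda c: c.get("sandbox_profile") is not None),
    ("smoke_test", lambda c: c.get("smoke_ok", False)),
    ("llm_verification", lambda c: c.get("description_matches_code", False)),
]


def admit(candidate: dict, audit: list[dict]) -> bool:
    """Run the layers in order; append every verdict; stop at the first failure."""
    for layer, check in LAYERS:
        passed = check(candidate)
        audit.append({"executor": candidate["name"], "layer": layer, "passed": passed})
        if not passed:
            return False
    audit.append({"executor": candidate["name"], "layer": "audit", "passed": True})
    return True


good = {
    "name": "compress_files",       # hypothetical executor name
    "args_schema": {"paths": "list[str]"},
    "overlap": 0.1,
    "age_days": 3,
    "sandbox_profile": "fs-local",
    "smoke_ok": True,
    "description_matches_code": True,
}
bad = {**good, "smoke_ok": False}

audit_log: list[dict] = []          # only ever appended to
good_admitted = admit(good, audit_log)
entries_after_good = len(audit_log)
bad_admitted = admit(bad, audit_log)
```

In this sketch a rejected candidate still leaves a trail: the audit shows which layer stopped it. Human approval and the Ed25519 signature would come after `admit` returns true and are not modeled here.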
Five small, focused prompts instead of one monolith: at convergence it produced 4 correct executors out of 4, against 32% for the single-prompt approach. Only stage 1 sees the closed vocabulary; only stage 5 writes code.
Reactive synthesis (it happens mid-turn, scene 6) is the first step of a cascade ordered by cost: first compose existing tools, then generate the missing one; at night the introvert passes propose merges (two tools with overlapping traces), generalizations (three specialized tools collapse into one parametric) and specializations (a hot case splits off). Everything goes through your yes. Two installations, in two homes, will have different catalogs after six months — each shaped by real use. See synt and lifecycle.
Everything that ties Metnos to an external service is a skill: a
group of capabilities that stays dormant until configured and that you can
enable or disable at will
(system · photos · mail · web ·
geo · calendar · github ·
google-workspace · frontier). Enabling a skill without its
prerequisites is harmless: it stays visible and inert until its service or credential
appears. You manage skills from the command line or just by asking in chat (“which
skills do I have?”, “disable the web”).
The backend is the other axis: how an action reaches a
concrete service. The provider is chosen from configuration, deterministically — the
planner never sees “Google”: it sees create_events, and the resolver
routes. Adding a second provider (say, GitLab next to GitHub) is +1 backend file,
zero new executors: no per-provider photocopy tools, no provider bias in the
local model. See skills_backends.
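The split between what the planner sees and where the call lands can be shown with a dictionary-based resolver. All names here are hypothetical except `create_events`, which is the action quoted above: the `caldav` backend, the configuration layout and the `resolve` and `run` functions are invented for the example, and the backends are stubs that call no real service.

```python
from typing import Callable


# Stub backends: one function per provider, same signature, same list output.
def google_create_events(events: list[dict]) -> list[dict]:
    return [{**e, "provider": "google"} for e in events]


def caldav_create_events(events: list[dict]) -> list[dict]:
    return [{**e, "provider": "caldav"} for e in events]


BACKENDS: dict[str, dict[str, Callable[[list[dict]], list[dict]]]] = {
    "google": {"create_events": google_create_events},
    "caldav": {"create_events": caldav_create_events},  # hypothetical second provider
}

# Hypothetical configuration: which provider serves which action.
CONFIG = {"create_events": "google"}


def resolve(action: str, config: dict[str, str]) -> Callable[[list[dict]], list[dict]]:
    """Pick the backend function from configuration alone."""
    return BACKENDS[config[action]][action]


def run(action: str, events: list[dict], config: dict[str, str] = CONFIG) -> list[dict]:
    # The caller names only the action; the provider never appears in the plan.
    return resolve(action, config)(events)


created = run("create_events", [{"title": "Dentist"}])
switched = run("create_events", [{"title": "Dentist"}],
               config={"create_events": "caldav"})
```

Switching provider changes one configuration value and nothing in the plan, which is the "+1 backend file, zero new executors" property in miniature.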
system is the skill that makes Metnos a computer assistant, not
just a chatbot: shell, sudo, packages, network mounts. Every privileged
action requires a vaglio judgment, and the skill can be switched off entirely —
Metnos stays out of the system until you switch it back on.
Metnos is not multi-tenant in the SaaS sense: it is one household. Inside it there is a host — the server's owner, keeper of the signing key — and zero or more guests, each with their own level.
| Level | For whom | What it allows |
|---|---|---|
| ReadOnly | A guest, a delicate channel | Read-only executors only; every write is politely refused before it even reaches the planner. |
| Supervised | Everyday use | The full turn; delicate actions raise the approval card on the same channel. |
| Full | The host | The widest perimeter the policy admits. Forbidden paths stay forbidden here too. |
Scene 4 shows the model at work: the host configures, the guest receives, audit lines stay separate per pairing. Every channel has its own door, its own key, its own history.
One server, but not one filesystem: many things you'd want acted upon (laptop files, an open application, a screen) live elsewhere. The charted direction is to keep gateway, policy and memory on the server and let some executors run on registered devices — a thin client with its own Ed25519 identity, admitted with a single-use signed code, revocable from the server in one gesture.
Forbidden paths (~/.ssh, /etc, ~/.gnupg and the like): a list wired in code, not in a config file.

Near-unrecoverable shell patterns (rm -rf /, mkfs, fork bombs): denied at any autonomy level.

Cross-cutting rule: no silent failure. Cap reached → it says so (how many used, how many available) and asks before widening. Partial outcome → counted as partial. Never a “done” that doesn't match reality.
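A binary guard of this kind fits in a few lines. The sketch below is illustrative only: the function names, the default home directory and the three regular expressions are assumptions, and the lists contain just the examples this document names, not the real hard-wired lists.

```python
import posixpath
import re

# Examples named in the document; the real list is wired into Metnos's code.
FORBIDDEN_DIRS = ("~/.ssh", "/etc", "~/.gnupg")

# Illustrative patterns for the three shell examples the document gives.
SHELL_DENY = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),      # rm -rf / (but not rm -rf /tmp/x)
    re.compile(r"\bmkfs\b"),                  # mkfs, mkfs.ext4, ...
    re.compile(r":\(\)\s*\{.*\}\s*;\s*:"),    # classic bash fork bomb
]


def _normalize(path: str, home: str) -> str:
    """Expand a leading ~ and collapse '..' segments (symlinks are not resolved)."""
    if path == "~" or path.startswith("~/"):
        path = home + path[1:]
    return posixpath.normpath(path)


def path_is_forbidden(path: str, home: str = "/home/user") -> bool:
    target = _normalize(path, home)
    for raw in FORBIDDEN_DIRS:
        root = _normalize(raw, home)
        # Match the directory itself or anything beneath it, by component.
        if target == root or target.startswith(root + "/"):
            return True
    return False


def command_is_forbidden(command: str) -> bool:
    return any(pattern.search(command) for pattern in SHELL_DENY)


checks = {
    "key": path_is_forbidden("~/.ssh/id_ed25519"),
    "dotdot": path_is_forbidden("~/Documents/../.ssh/config"),
    "lookalike": path_is_forbidden("/etcetera/file"),
    "wipe": command_is_forbidden("rm -rf /"),
    "cleanup": command_is_forbidden("rm -rf /tmp/cache"),
    "format": command_is_forbidden("mkfs.ext4 /dev/sda1"),
}
```

Neither function takes an autonomy level as a parameter, so no level can relax the verdict. A production guard would also need to resolve symlinks, which this sketch deliberately leaves out.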
A working showcase, not a polished product. Metnos is the system its author lives with every day, built for one person on one machine — and shared so homelab and agent-architecture enthusiasts can read it, run it, and build on it.
| Piece | Status as of June 2026 |
|---|---|
| Daily use | Yes: the reference instance is in active service (mail, photos, scheduler, maintenance of its own repo). 77 signed executors, 2,660 automated tests, 21 canonical bilingual docs aligned with the code. |
| Installer | Included. The managed path (recommended) replicates a complete environment from a clean checkout — model, supporting services, i18n data, signed executors — profiles your hardware to pick a fitting model, and finishes with a real turn, not a ping. The custom path (declare your own LLM) exists and is loudly discouraged. |
| Public repo | A deterministic export-subset of the live instance — the run-essential slice. It can trail the running system between publishes: expected, not neglect. |
| Maturity | pre-1.0: APIs and defaults change without compatibility shims when a better design appears. Capabilities exist but are barely exercised outside the reference instance (non-Italian i18n above all). If something breaks: open an issue, and bring a little patience. |
The real requirement is hardware: a machine that can serve a capable local LLM. Tiers are abstract roles: a CPU endpoint or a model you already serve is fine, and a weaker model means weaker planning, not a broken install. In principle, no GPU is required.
$ git clone https://github.com/brunialti/metnos.git && cd metnos
$ bash install/bootstrap.sh --check   # pre-flight only: writes nothing
$ bash install/bootstrap.sh           # interactive, six phases, idempotent
And support is part of the experiment: the repo's issues are triaged by a Metnos instance (scene 8) — drafts prepared by the system, posted only after human approval. If the assistant can't help you run the assistant, that's a bug we want to see.
If you take away one thing: Metnos is a self-hosted personal assistant with three peculiar commitments — it asks before acting (an inspectable evaluator, not a feeling), it builds its tools inside verifiable rules (closed vocabulary, signature, birth tests, 7-layer gate), it is reproducible like software (deterministic routing, honest counts, undo by construction). If that combination intrigues you, you are the right audience.
TELOS.md as soft tendencies. The judge
measures proposed actions against them.