TESTED Microdesign v1.1 — new, tested on April 27, 2026. Status: TESTED. Cluster vaglio 13/13 green + integration cluster agent_runtime 172/172. Implementation reference: runtime/vaglio.py.

Status in the microdesign sequence: under approval → approved → tested → implemented. The real vaglio (no longer the always-approve stub) went live on April 27, 2026, replacing the POC stub; the binary guard and the rule-based judge passed both micro and cluster tests. The LLM judge is available only as an opt-in (ch. 5); its calibration and mnestome context remain deferred to v1.2.


Metnos

vaglio — the constitutional evaluator in two phases

Microdesign v1.1 — status TESTED (April 27, 2026)
Audience: those who want to understand how Metnos decides whether an action proposed by the LLM may become an effect on the world.

Reading time: 12 minutes.

What the vaglio is
Anatomy: the Verdict
Binary guard v1.1
Graded judge v1.1
LLM judge v1.1 (opt-in)
JSONL logging
Runtime integration
Tests
Limits of v1.1
Configurables

1. What the vaglio is

Between the moment when the LLM says «I want to call shell_exec with these arguments» and the moment the runtime executes the process, there is a mandatory checkpoint: the vaglio. It decides whether that proposal can become action. It is not an opinion, it is a filter: if the vaglio denies, the plan does not proceed.

The vaglio applies two distinct phases, and this distinction is the first thing worth grasping. The first phase is a binary guard. There is a core of violations that are not up for debate: touching ~/.ssh, executing rm -rf /, writing to /etc/passwd. If one of these patterns shows up, the action is denied, full stop. No nuance, no score, no «but the context». The second phase is a graded judge: it scores in [0, 1] how aligned the action is with the user's telos. Below threshold, denied. Above, approved.

Why two separate phases? Because otherwise a specific phenomenon occurs, described in ch. 11.1 of the Architecture as model self-confirmation (self-enhancement bias): if you put deontology (what is permissible) and teleology (what serves the user) into a single score, a teleologically convinced judge will tend to justify even what should be blocked upstream. «Yes, I am deleting /etc, but deep down the user wanted cleanup.» With the two kept separate, the guard shuts the door before the judge can rationalise.

The separation is also a promise of architectural stability: the guard today is a list of encoded regexes; tomorrow, when the judge becomes an LLM call (v1.2), the guard will stay the same. The non-negotiable core does not pass through a model.

Lexical note. The term vaglio has a literal sense: sieve. Not a judging body, not a reviewer. A sieve does two things at once: it holds back what must not pass and lets through what must. The two phases are two different meshes of the same net.

2. Anatomy: the `Verdict`

Every call to the vaglio returns a Verdict object. It is a small dataclass (runtime/vaglio.py:74-81):

@dataclass
class Verdict:
    approved: bool
    reason: str
    ts: float = field(default_factory=time.time)
    judge_kind: str = "rule-based-v1"
    score: float = 1.0  # relevant only if the guard let it through
    blocked_by: str | None = None  # "guard" | "judge" | None

Field	Type	Meaning
`approved`	bool	Final outcome: `True` if the action may proceed, `False` otherwise.
`reason`	str	Readable explanation. For a guard block: `"guard: forbidden path violated: ..."`. For a judge block: `"judge: score 0.20 < threshold 0.30 (...)"`.
`ts`	float	Unix timestamp of the decision.
`judge_kind`	str	Family of judge used. `"rule-based-v1"` (default) or `"llm-v1"` (opt-in via env `METNOS_JUDGE_KIND=llm-v1`, since the evening of Apr 27).
`score`	float	Alignment score in `[0, 1]`. Meaningful only if the guard let it through. For guard blocks, value is `0.0`.
`blocked_by`	str | None	Which phase denied: `"guard"`, `"judge"`, or `None` if approved.

The orchestrator judge(intent, executor_name, args, context) (runtime/vaglio.py:186-214) is the only public signature. It receives the user's intent (the initial query of the turn), the name of the proposed executor, the validated args, and an optional context dictionary (mode, capability, step number). It returns a single Verdict.
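The two-phase flow described so far can be sketched as follows. This is a minimal illustration, not the code in runtime/vaglio.py: the name judge_sketch and the injected guard and scorer callables are assumptions made so the example is self-contained, and the reason strings follow the formats quoted in the table above.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Any, Callable

JUDGE_THRESHOLD = 0.30  # default threshold quoted in this document


@dataclass
class Verdict:
    approved: bool
    reason: str
    ts: float = field(default_factory=time.time)
    judge_kind: str = "rule-based-v1"
    score: float = 1.0
    blocked_by: str | None = None


def judge_sketch(
    intent: str,
    executor_name: str,
    args: dict[str, Any],
    context: dict[str, Any] | None = None,
    *,
    guard: Callable[..., str | None],          # hypothetical: returns a violation text or None
    scorer: Callable[..., tuple[float, str]],  # hypothetical: returns (score, rationale)
    threshold: float = JUDGE_THRESHOLD,
) -> Verdict:
    context = context or {}
    # Phase 1: binary guard. A violation ends the evaluation; the scorer is never called.
    violation = guard(executor_name, args, context)
    if violation is not None:
        return Verdict(False, f"guard: {violation}", score=0.0, blocked_by="guard")
    # Phase 2: graded judge, compared against the threshold.
    score, rationale = scorer(intent, executor_name, args, context)
    if score < threshold:
        reason = f"judge: score {score:.2f} < threshold {threshold:.2f} ({rationale})"
        return Verdict(False, reason, score=score, blocked_by="judge")
    return Verdict(True, f"approved: score {score:.2f} ({rationale})", score=score)
```

Passing the two phases in as callables is only a device for the sketch; it makes visible that a guard violation returns before any scoring happens.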

3. Binary guard v1.1

The guard has no nuances. It performs two checks in sequence, and stops at the first violation (runtime/vaglio.py:104-126).

Check 1: forbidden paths

The guard extracts all string values from args, even nested ones (_flatten_str_values, runtime/vaglio.py:86-97), expands the tilde with the user's path (_expand_user, runtime/vaglio.py:100-101), and compares them with a list of encoded regex patterns (runtime/vaglio.py:45-58):

_FORBIDDEN_PATH_PATTERNS = [
    re.compile(r"(^|/)\.ssh(/|$)"),
    re.compile(r"^/etc/(passwd|shadow|sudoers)"),
    re.compile(r"^/etc/ssh(/|$)"),
    re.compile(r"^/root(/|$)"),
    re.compile(r"^/boot(/|$)"),
    re.compile(r"^/sys(/|$)"),
    re.compile(r"^/proc(/[0-9]|$)"),
    re.compile(r"^/dev/(sd|nvme|mmcblk|loop)"),
    re.compile(r"\.aws/credentials"),
    re.compile(r"\.config/[^/]+/credentials\.env"),
    re.compile(r"\.gnupg(/|$)"),
]

The subtle point: the guard does not distinguish between «I am reading ~/.ssh/id_rsa» and «I am just passing the string ~/.ssh/id_rsa in some parameter or other». If the pattern shows up, it blocks. This is deliberate: it means that even an executor that mentions a forbidden path in args (for example in a "glob" or "exclude" field) gets stopped. Conservative? Yes. But the cost of a false positive is a reformulated request; the cost of a false negative is a leaked SSH key.

The list does not loosen by autonomy level. The config lets you tune many things (timeouts, thresholds, modes). On forbidden paths, nothing: it is the «non-negotiable core» of ch. 5 of the Architecture. Modifying them requires a code edit and a new deploy. Deliberate.
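A minimal sketch of Check 1, assuming a recursive flattening of args and os.path.expanduser for the tilde. The function names flatten_str_values and find_forbidden_path are illustrative (the real helpers are _flatten_str_values and _expand_user, whose bodies are not shown here), and only a subset of the patterns is copied.

```python
import os
import re
from typing import Any, Iterator, Optional

# Subset of the patterns listed above, copied verbatim.
FORBIDDEN = [
    re.compile(r"(^|/)\.ssh(/|$)"),
    re.compile(r"^/etc/(passwd|shadow|sudoers)"),
    re.compile(r"\.aws/credentials"),
    re.compile(r"\.config/[^/]+/credentials\.env"),
]


def flatten_str_values(value: Any) -> Iterator[str]:
    """Yield every string found in value, descending into dicts and sequences."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from flatten_str_values(item)
    elif isinstance(value, (list, tuple, set)):
        for item in value:
            yield from flatten_str_values(item)


def find_forbidden_path(args: dict) -> Optional[str]:
    """Return a violation text for the first forbidden pattern found, else None."""
    for raw in flatten_str_values(args):
        candidate = os.path.expanduser(raw)  # "~" becomes the user's home
        for pattern in FORBIDDEN:
            if pattern.search(candidate):
                return f"forbidden path violated: pattern '{pattern.pattern}' in args"
    return None
```

Note that the check is indifferent to which key carries the string: a forbidden path nested in an "exclude" list is caught exactly like one in "path".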

Check 2: quasi-irrecoverable shell commands

This applies only if executor_name == "shell_exec" or if the context declares capability == "code:exec" (runtime/vaglio.py:117-124). The value checked is the command field (or alternatively cmd); if it is a list, it gets joined with spaces before matching. The pattern list (runtime/vaglio.py:62-69):

_DANGEROUS_SHELL_PATTERNS = [
    re.compile(r"\brm\s+-rf?\s+/(\s|$)"),         # rm -rf /
    re.compile(r"\brm\s+-rf?\s+~(\s|$|/)"),       # rm -rf ~
    re.compile(r"\bmkfs\b"),
    re.compile(r"\bdd\s+.*\bof=/dev/"),
    re.compile(r":\s*\(\s*\)\s*\{\s*:\|:&\s*\}"),  # fork bomb
    re.compile(r"\bchmod\s+-?R?\s*0?77[0-9]\s+/"),
]

The motivation is Law 1 of the constitution (see constitution): «no irrecoverable state». An rm -rf / cannot be reversed with an undo, nor can an mkfs on the system drive; a fork bomb saturates the process table. These are pathologies distinct from any ordinary file deletion: they take the system out of a state in which it can still respond.

Here too the match is textual. The guard does not emulate the shell: it looks for literal patterns in the command. This means that variants cleverly built with escape characters can pass; the defence in depth is the sandbox with filesystem namespacing for shell_exec, of which the guard is only the first layer.
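Check 2 can be sketched as below, with the six patterns copied verbatim. The function name find_dangerous_shell and the wording of the returned text are assumptions for illustration; the applicability rule (shell_exec or capability code:exec) and the command/cmd lookup follow the description above.

```python
import re
from typing import Any, Optional

# The six patterns listed above, copied verbatim.
DANGEROUS = [
    re.compile(r"\brm\s+-rf?\s+/(\s|$)"),
    re.compile(r"\brm\s+-rf?\s+~(\s|$|/)"),
    re.compile(r"\bmkfs\b"),
    re.compile(r"\bdd\s+.*\bof=/dev/"),
    re.compile(r":\s*\(\s*\)\s*\{\s*:\|:&\s*\}"),
    re.compile(r"\bchmod\s+-?R?\s*0?77[0-9]\s+/"),
]


def find_dangerous_shell(
    executor_name: str, args: dict, context: Optional[dict] = None
) -> Optional[str]:
    """Return a violation text if the command matches a dangerous pattern."""
    context = context or {}
    # The check applies only to shell_exec or to executors with code:exec.
    if executor_name != "shell_exec" and context.get("capability") != "code:exec":
        return None
    command: Any = args.get("command", args.get("cmd"))
    if isinstance(command, (list, tuple)):
        command = " ".join(str(part) for part in command)  # argv list -> one string
    if not isinstance(command, str):
        return None
    for pattern in DANGEROUS:
        if pattern.search(command):
            return f"dangerous shell command: pattern '{pattern.pattern}'"
    return None
```

The textual nature of the match shows in a simple variant: `rm -r -f /` splits the flags and is not caught by these patterns, which is why the sandbox remains the deeper layer.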

4. Graded judge v1.1

If the guard lets it through, the judge starts. In v1.1 the judge is rule-based: no LLM call, microsecond latency, zero cost. The function is judge_score(intent, executor_name, args, context) (runtime/vaglio.py:131-171) and returns (score, rationale).

Heuristics

It starts from a base score of 0.7: the v1.1 regime is benevolent by construction (see ch. 9). From there the judge adds or subtracts small adjustments, then clamps to [0, 1].

Signal	Effect on score	Rationale
The user's intent mentions the executor name (even just one of its tokens, length ≥ 3).	+0.10	Sign of explicit intent. If the user wrote «read the file» and the executor is `fs_read`, the match on `read` is a weak but real signal that the action matches the request.
Any string value in `args` contains `..` and `/` (possible path traversal).	−0.20	Coarse but effective heuristic: many filesystem attacks use `..` to escape the expected folder. The penalty is the largest because it indicates an attempt to cross a boundary.
An `args` key is not a valid Python-style identifier (it contains characters other than alphanumerics and underscore).	−0.10	Shape anomaly. A healthy executor declares args with normal keys; an odd key suggests either injection or a planning error.
The `context` declares `critical: false` (non-critical executor, read-only).	+0.05	Modest bonus: read-only executors cannot produce irrecoverable state, so the bar can be slightly lower.

The score is clamped to [0, 1] and compared against JUDGE_THRESHOLD (runtime/vaglio.py:40). Default: 0.30, configurable via env METNOS_JUDGE_THRESHOLD.

Example: harmless path, generic intent

Intent: "read my notes in /tmp/n.txt". Executor: fs_read. Args: {"path": "/tmp/n.txt"}.

Starting score: 0.7. The judge splits fs_read on separators and gets the tokens fs and read; the latter appears in the intent. The intent bonus applies (+0.10). The path is harmless and the keys are clean.

Final score: 0.80. Above threshold. approved=True, reason="approved: score 0.80 (intent matches executor)".

Example: suspicious path traversal

Args: {"path": "/tmp/../etc/foo"}. Suspicious path traversal: .. and / present.

Score: 0.7 − 0.2 = 0.5. Still above 0.30: approved nonetheless, but with the note "possible path traversal ('..' in path)" in the reason and in the log.

To actually block the action, the judge would need other cumulative negative signals, or a higher configured threshold. In v1.1 the judge marks; it does not block on its own except in edge cases.
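The heuristics and the two worked examples above can be reproduced with a short sketch. Several details are assumptions not stated in this document: the name judge_score_sketch, case-insensitive substring matching of tokens, each signal applied at most once, only top-level keys checked with str.isidentifier, and rounding to two decimals to avoid floating-point noise.

```python
import re
from typing import Any, Iterator, Optional, Tuple

BASE_SCORE = 0.7  # base score quoted in this document


def _str_values(value: Any) -> Iterator[str]:
    """Yield every string found in value, descending into dicts and lists."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _str_values(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from _str_values(item)


def judge_score_sketch(
    intent: str, executor_name: str, args: dict, context: Optional[dict] = None
) -> Tuple[float, str]:
    context = context or {}
    score = BASE_SCORE
    notes = []
    # +0.10: the intent mentions a token of the executor name (length >= 3).
    tokens = [t for t in re.split(r"[^a-z0-9]+", executor_name.lower()) if len(t) >= 3]
    if any(t in intent.lower() for t in tokens):
        score += 0.10
        notes.append("intent matches executor")
    # -0.20: some string value contains both ".." and "/".
    if any(".." in v and "/" in v for v in _str_values(args)):
        score -= 0.20
        notes.append("possible path traversal ('..' in path)")
    # -0.10: some key is not identifier-shaped.
    if any(not str(key).isidentifier() for key in args):
        score -= 0.10
        notes.append("anomalous arg key")
    # +0.05: the context explicitly declares critical: false.
    if context.get("critical") is False:
        score += 0.05
        notes.append("non-critical executor")
    score = round(min(1.0, max(0.0, score)), 2)  # clamp to [0, 1]
    return score, "; ".join(notes) or "no signals"
```

Under these assumptions the lowest reachable score is 0.7 − 0.20 − 0.10 = 0.40, still above the default threshold of 0.30, which is consistent with the judge marking rather than blocking.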

Why is the rule-based judge so benevolent? Because in v1.1 the real filter is the guard. The rule-based judge is a first layer of observability (what gives off a strange smell?) more than a gate. The score lands in the JSONL log and becomes signal for later analysis; outliers emerge. When the judge becomes LLM (v1.2), the threshold will make sense as more stringent, because the evaluation will really be able to weigh alternatives and context.

5. LLM judge v1.1 (opt-in)

Since the evening of Apr 27 (phase 5 kickoff) the vaglio can use an LLM-based judge instead of the rule-based one. It is enabled via the env variable METNOS_JUDGE_KIND=llm-v1 (default rule-based-v1): opt-in because it costs — every judge decision pays one call to the middle tier. Implementation in runtime/vaglio.py:_judge_score_llm.

5.1 Applied decisions

Middle tier. The judge uses the middle tier of the 3-tier LLM architecture (see agent_runtime ch. 4): not the fast tier (too coarse for ethical reasoning), not the wise tier (disproportionate for the frequency at which the vaglio runs).
Context separated from the proposer. The judge runs in a session distinct from the planner's: dedicated system prompt (_LLM_JUDGE_SYSTEM), no memory shared with the proposer. Explicit defence against model self-confirmation: if the proposer could listen to its own justifications as judge input, it would optimise them backwards. Separate sessions break the loop.
Prompt on the 4 Laws and seven telos. The judge prompt explicitly lists the 4 Laws of the constitution and the seven canonical telos (with weights, see telos). The judge applies them, it does not invent them.
Privacy: keys only. The user prompt contains ONLY argument keys, not values — same discipline as the JSONL log. Even if the middle-tier LLM is remote, no sensitive user value (paths, passwords, content) reaches it.
Structured output. Contract: JSON {"score": float [0,1], "reason": "max 25 words"}. Robust regex-based parser: if the LLM produces invalid JSON, fallback to 0.5 with rationale "llm-judge fallback (parse fail)".
Graceful fallback. If LLMRouter is not configured or the middle tier does not respond, the judge returns score 0.5 with explicit rationale "llm-judge fallback (router missing|chat failed)". The sequence is not blocked; the log reveals that the analysis was degraded.
The guard always precedes the judge. Even with METNOS_JUDGE_KIND=llm-v1 active, the binary guard runs first: a forbidden path or a dangerous shell command is blocked without consulting the LLM at all. Guarantee: no LLM call wasted on cases already decidable by local rules (test judge_llm_guardia_blocca_prima_di_chiamare_llm).

5.2 When to enable it

Rule-based is the default and is enough for most turns: the guard filters danger, the rule-based judge flags outliers. The LLM judge is for cases that need a finer assessment of telos alignment (e.g. proactive actions under bother budget, choice between equivalent options). Enable with:

METNOS_JUDGE_KIND=llm-v1 python3 -m channels.daemon

Tradeoff: every judge call pays a latency of ~800 ms (local qwen3:8b in middle-tier mode) or ~150 ms (Anthropic Sonnet). For a 3-step turn, that is 3 vaglio calls. A future cost policy will be able to automatically degrade to rule-based above a calls-per-day threshold.

5.3 Deferred to v1.2

Few-shot examples in the prompt to stabilise the score distribution.
Reading the mnestome as context: today the judge does not know "what happened before" beyond the current turn's context.
Threshold calibration: METNOS_JUDGE_THRESHOLD is calibrated for the rule-based (default 0.30); the LLM will produce scores in a different distribution and will need its own separate threshold.
Telemetry: who is using llm-v1, how much it costs, percentage of decisions that agree with the rule-based.

6. JSONL logging

Every call to the vaglio writes a JSONL line in ~/.local/share/metnos/vaglio/YYYY-MM.jsonl (monthly file, see runtime/vaglio.py:33 and 176-183). The write is fail-safe: if the filesystem refuses, the vaglio does not block the decision — it simply does not write the log.

Record schema

{
  "approved": false,
  "reason": "guard: forbidden path violated: pattern '(^|/)\\.ssh(/|$)' in args",
  "ts": 1714225863.214,
  "judge_kind": "rule-based-v1",
  "score": 0.0,
  "blocked_by": "guard",
  "intent": "read the key",
  "executor": "fs_read",
  "args_keys": ["path"],
  "context_keys": ["mode", "step"]
}

Only the keys of args, not the values. This is explicit (runtime/vaglio.py:197 and 212): the log records the key names ("path", "command", ...) but not the values (the actual path, the actual command). Reason: privacy. Values often contain sensitive data (user paths, file contents, payloads), and a log that recorded them would itself become an asset to protect. The keys are enough for statistical analysis (which executors get blocked most often?); the live value lives in the turn events of the agent_runtime, with different retention rules.

Rotation is monthly per file. No compression, no encryption: the logs stay under the user's ~/.local. If long-term archiving is needed, it is the job of external processes (rsync, backup, etc.).
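A minimal sketch of the fail-safe, keys-only write described in this chapter. The function name log_verdict, its parameters, the boolean return value, and the use of UTC for the month in the file name are assumptions; the record fields follow the schema above.

```python
from __future__ import annotations

import json
import time
from pathlib import Path


def log_verdict(
    log_dir: Path,
    verdict: dict,
    intent: str,
    executor_name: str,
    args: dict,
    context: dict,
) -> bool:
    """Append one JSONL record; return False instead of raising on I/O errors."""
    record = dict(verdict)
    record.update(
        intent=intent,
        executor=executor_name,
        args_keys=[str(k) for k in args],        # key names only, never values
        context_keys=[str(k) for k in context],
    )
    # Monthly file name, e.g. 2026-04.jsonl (UTC is an assumption of this sketch).
    month = time.strftime("%Y-%m", time.gmtime(record.get("ts", time.time())))
    try:
        log_dir.mkdir(parents=True, exist_ok=True)
        with open(log_dir / f"{month}.jsonl", "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
        return True
    except OSError:
        # Fail-safe: a refusing filesystem must not block the decision.
        return False
```

Catching only OSError keeps the fail-safe narrow: filesystem refusals are swallowed, while programming errors would still surface.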

7. Runtime integration

The call site lives in the planner, in runtime/agent_runtime.py:512-526. The filter sequence is strictly ordered:

# runtime/agent_runtime.py:512-526 (excerpt)
# Validation, sandbox, vaglio
validation = validate_args(args, executor.args_schema)
step.validation_failures = validation
if validation:
    obs = {"ok": False, "error": f"validation failed: {validation}"}
else:
    scope_violation = check_hints(args, executor.capabilities)
    step.scope_violation = scope_violation
    if scope_violation:
        obs = {"ok": False, "error": scope_violation}
    else:
        verdict = judge(user_query, chosen_name, args,
                        {"mode": chosen_mode, "step": step_num})
        step.vaglio_approved = verdict.approved
        if not verdict.approved:
            obs = {"ok": False, "error": f"vaglio rejects: {verdict.reason}"}
        else:
            ...
            obs = invoke_executor(executor, args)

The order is not random:

Args validation against the executor's schema (types and required). If it fails, no vaglio: the error is one of shape, not of permissibility.
Capabilities check (check_hints): is the executor stepping out of its declared perimeter? (Example: a read-only executor that writes.)
Vaglio (judge(...)): only if the two previous checks pass. It receives the user's original intent (user_query), the chosen executor name, the validated args, and a context with mode and step_num.
Invoke executor: only if the vaglio approves.

For multistep plans, the vaglio runs between one step and the next: every tool_call proposed by the LLM gets vetted before execution, so that a 5-step plan produces 5 distinct verdicts. For single-shot (a plan proposed entirely in one go, without intermediate steps), the vaglio runs post-hoc: the plan is already formulated, but the vaglio can still block the execution of each tool_call.

The flag step.vaglio_approved ends up in the turn log (agent_runtime ch. 11), and from there in the spectator: the user can see after the fact which steps were vetted and with what outcome.

8. Tests

The vaglio cluster is at 13/13 green. The cases are declared in runtime/testing/populate_cases.py (section --- vaglio ---) and cover guard, judge, log and privacy.

Case	Category	What it verifies
`approva_path_innocuo`	happy	A normal path (`/tmp/x`) passes, `judge_kind` is `"rule-based-v1"`, score in `[0, 1]`.
`guard_blocca_ssh`	security	`~/.ssh/id_rsa` blocked by the guard with `blocked_by="guard"` and reason `"forbidden path"`.
`guard_blocca_etc_passwd`	security	`/etc/passwd` blocked.
`guard_blocca_credentials_user`	security	`~/.config/metnos/credentials.env` blocked (check of the `credentials.env` pattern nested in user path).
`guard_blocca_rm_rf_root`	security	`shell_exec` with `rm -rf /` blocked as irrecoverable.
`guard_blocca_fork_bomb`	security	`shell_exec` with `:(){ :|:& };:` blocked.
`guard_lascia_passare_shell_innocua`	happy	`shell_exec` with `ls -la /tmp` passes the guard.
`judge_intent_menziona_executor_alza_score`	happy	Score with intent that mentions the executor > score with generic intent.
`judge_path_traversal_abbassa_score`	edge	Path with `..` receives lower score than the harmless path.
`judge_sotto_soglia_blocca`	failure	With `METNOS_JUDGE_THRESHOLD=0.99`, any action gets denied with `blocked_by="judge"`.
`log_jsonl_viene_scritto`	happy	After `judge(...)`, the file `YYYY-MM.jsonl` exists and contains the intent marker.
`args_keys_loggate_non_values`	security	A sensitive value in args (`"PASSWORD_segreto_..."`) does NOT appear in the log; only the key name (`"path"`) is present.

On top of these 13, integration in the agent_runtime cluster (172/172 green) exercises the vaglio at its real call site (validation + scope + vaglio + invoke), on end-to-end scenarios.

9. Limits of v1.1

Limit	Explanation	When it lifts
The vaglio does not read the mnestome.	Separation-of-concerns decision: the v1.1 vaglio evaluates only the declared intent + executor + args, it does not look at history. Meaning: it cannot see «this same action was denied 5 minutes ago» or «the user usually does not operate this way».	v1.2: the LLM judge will receive context from the mnestome (similar relevance, prior outcomes) to weigh better.
The approval_ux (card + buttons in the channel) is renderable, the callback dispatcher is not.	The card the user sees in the channels (Telegram, Slack, etc.) gets generated already via `runtime/channels/approval.py`. But if the user presses «Allow this once», the callback does not yet re-enter the live plan.	v1.2: callback dispatcher with re-injection into the current turn.
The rule-based judge is very benevolent.	Default threshold `0.30` and base score `0.7`: even with several moderate penalties, the score stays above. Conscious: in v1.1 the real filter is the guard, the judge only marks outliers in the log.	v1.2 with LLM judge: the threshold will mean more, and the score distribution will be wider.
No structured explanation of refusal to the user.	When the vaglio denies, the planner shows the textual `reason` but does not build a card «why I said no and what you can do». The refusal UX is minimal.	v1.2: integration with approval_ux for dialogic refusals.
No interactive vaglio.	The v1.1 vaglio is a pure function: input args, output Verdict. It cannot ask «are you sure?» or request confirmation.	v1.2: integration with the dialog manager for authorisation requests (see approval_ux).

10. Configurables

The configuration of the v1.1 vaglio is minimal, and this minimality is deliberate.

What	Where	Default	Notes
`METNOS_JUDGE_THRESHOLD`	Environment variable.	`0.30`	Below this, the judge denies. Read at module import time (`runtime/vaglio.py:40`): to change it at runtime use `importlib.reload(vaglio)`.
`VAGLIO_LOG_DIR`	Module constant (not env).	`~/.local/share/metnos/vaglio/`	Changing the log destination would require a code edit. Deliberately not a parameter: the log is a system invariant, not a user preference.
`_FORBIDDEN_PATH_PATTERNS`	Module constant, encoded.	11 regexes (see ch. 3).	Not configurable. Modifying them requires a code edit and a new deploy. Deliberate: ch. 5 of the Architecture, «non-negotiable core».
`_DANGEROUS_SHELL_PATTERNS`	Module constant, encoded.	6 regexes (see ch. 3).	Not configurable. Same reasoning: the list of quasi-irrecoverable shell commands does not loosen.

Deliberately few configurations. When the vaglio becomes more configurable (in v1.2 with LLM judge there will be the provider, the prompt, the budget, possibly the fallback), every new knob will need to be justified. The philosophy: the ethical filter has so few dials on purpose. What cannot be configured cannot be misconfigured.
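The import-time reading of the threshold can be illustrated with a small sketch. The helper load_threshold and its fallback to the default on an unparsable value are assumptions; the document states only that the variable is read once at module import with default 0.30.

```python
import os
from typing import Mapping, Optional

DEFAULT_THRESHOLD = 0.30


def load_threshold(env: Optional[Mapping[str, str]] = None) -> float:
    """Read METNOS_JUDGE_THRESHOLD from env, falling back to the default."""
    env = os.environ if env is None else env
    raw = env.get("METNOS_JUDGE_THRESHOLD")
    if raw is None:
        return DEFAULT_THRESHOLD
    try:
        return float(raw)
    except ValueError:
        # Assumption of this sketch: an unparsable value falls back to the default.
        return DEFAULT_THRESHOLD


# A module-level constant is evaluated once, at import time:
#   JUDGE_THRESHOLD = load_threshold()
# Later changes to the environment do not reach it until the module is reloaded.
env = {"METNOS_JUDGE_THRESHOLD": "0.99"}
snapshot = load_threshold(env)          # value captured "at import"
env["METNOS_JUDGE_THRESHOLD"] = "0.50"  # changed afterwards
stale, fresh = snapshot, load_threshold(env)
```

Here stale still holds 0.99 while fresh is 0.50, which is the reason a running process needs importlib.reload(vaglio) to pick up a new threshold.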

Final notes

The vaglio is a small component (~210 lines of Python) but architecturally central. The two-phase shape is the price of robustness: a binary guard that is not up for debate, a graded judge that evolves. Today's guard will protect even from tomorrow's judge, because it will have proved stable across versions of the judge.

The microdesign of this module was validated by the code before the document: the module went live on April 27, 2026, passed its own 13 tests and the 172 integration tests, and this doc describes what runs. Not the other way around.
