Microdesign v1.1, new, tested on April 27, 2026. Status: TESTED. vaglio cluster 13/13 green, plus agent_runtime integration cluster 172/172. Implementation reference: runtime/vaglio.py.
Status in the microdesign sequence: under approval → approved → tested → implemented. The real vaglio went live on April 27, 2026, replacing the always-approve POC stub; the binary guard and the rule-based judge passed both micro and cluster tests. The LLM judge remains deferred to v1.2.

Metnos

vaglio — the constitutional evaluator in two phases
Microdesign v1.1 — status TESTED (April 27, 2026)
Audience: those who want to understand how Metnos decides whether an action proposed by the LLM may become an effect on the world.

Reading time: 12 minutes.

Contents

  1. What the vaglio is
  2. Anatomy: the Verdict
  3. Binary guard v1.1
  4. Graded judge v1.1
  5. LLM judge v1.1 (opt-in)
  6. JSONL logging
  7. Runtime integration
  8. Tests
  9. Limits of v1.1
  10. Configurables

1. What the vaglio is

Between the moment when the LLM says «I want to call shell_exec with these arguments» and the moment the runtime executes the process, there is a mandatory checkpoint: the vaglio. It decides whether that proposal can become action. It is not an opinion, it is a filter: if the vaglio denies, the plan does not proceed.

The vaglio applies two distinct phases, and this distinction is the first thing worth grasping. The first phase is a binary guard: there is a core of violations that are not up for debate, such as touching ~/.ssh, executing rm -rf /, or writing to /etc/passwd. If one of these patterns shows up, the action is denied, full stop. No nuance, no score, no «but the context». The second phase is a graded judge: it measures how aligned the action is with the user's telos, as a score in [0, 1]. Below the threshold, the action is denied. At or above it, approved.

Why two separate phases? Because otherwise a specific phenomenon happens, described in ch. 11.1 of the Architecture as model self-confirmation (self-enhancement bias): if you put deontology (what is permissible) and teleology (what serves the user) into a single score, a teleologically convinced judge will tend to justify even what should be blocked upstream. «Yes, I am deleting /etc, but deep down the user wanted cleanup.» Keeping them separate, the guard shuts the door before the judge can rationalise.

The separation is also a promise of architectural stability: the guard today is a list of hard-coded regexes; tomorrow, when the judge becomes an LLM call (v1.2), the guard will stay the same. The non-negotiable core does not pass through a model.

Lexical note. The term vaglio has a literal sense: sieve. Not a judging body, not a reviewer. A sieve does two things at once: it holds back what must not pass and lets through what must. The two phases are two different meshes of the same net.

2. Anatomy: the Verdict

Every call to the vaglio returns a Verdict object. It is a small dataclass (runtime/vaglio.py:74-81):

@dataclass
class Verdict:
    approved: bool
    reason: str
    ts: float = field(default_factory=time.time)
    judge_kind: str = "rule-based-v1"
    score: float = 1.0  # relevant only if the guard let it through
    blocked_by: str | None = None  # "guard" | "judge" | None

| Field | Type | Meaning |
|---|---|---|
| approved | bool | Final outcome: True if the action may proceed, False otherwise. |
| reason | str | Readable explanation. For a guard block: "guard: forbidden path violated: ...". For a judge block: "judge: score 0.20 < threshold 0.30 (...)". |
| ts | float | Unix timestamp of the decision. |
| judge_kind | str | Family of judge used. "rule-based-v1" (default) or "llm-v1" (opt-in via env METNOS_JUDGE_KIND=llm-v1, since the evening of Apr 27). |
| score | float | Alignment score in [0, 1]. Meaningful only if the guard let it through. For guard blocks, the value is 0.0. |
| blocked_by | str or None | Which phase denied: "guard", "judge", or None if approved. |

The orchestrator judge(intent, executor_name, args, context) (runtime/vaglio.py:186-214) is the only public signature. It receives the user's intent (the initial query of the turn), the name of the proposed executor, the validated args, and an optional context dictionary (mode, capability, step number). It returns a single Verdict.
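The two-phase flow behind that signature can be sketched as follows. This is a minimal illustration, not the code in runtime/vaglio.py: guard_check and the judge_score body are hypothetical stand-ins, and only the Verdict fields, the default threshold and the reason formats are taken from this document.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

JUDGE_THRESHOLD = 0.30  # default stated in this document


@dataclass
class Verdict:
    approved: bool
    reason: str
    ts: float = field(default_factory=time.time)
    judge_kind: str = "rule-based-v1"
    score: float = 1.0
    blocked_by: Optional[str] = None


def guard_check(executor_name, args, context):
    """Hypothetical stand-in guard: returns a violation message or None."""
    for value in args.values():
        if isinstance(value, str) and ".ssh" in value:
            return "forbidden path violated: .ssh"
    return None


def judge_score(intent, executor_name, args, context):
    """Hypothetical stand-in judge: returns (score, rationale)."""
    return 0.7, "base score"


def judge(intent, executor_name, args, context=None):
    context = context or {}
    # Phase 1: binary guard. A violation ends the evaluation immediately,
    # so the judge never gets a chance to rationalise it.
    violation = guard_check(executor_name, args, context)
    if violation is not None:
        return Verdict(False, f"guard: {violation}", score=0.0, blocked_by="guard")
    # Phase 2: graded judge, reached only if the guard let the action through.
    score, rationale = judge_score(intent, executor_name, args, context)
    if score < JUDGE_THRESHOLD:
        reason = f"judge: score {score:.2f} < threshold {JUDGE_THRESHOLD:.2f} ({rationale})"
        return Verdict(False, reason, score=score, blocked_by="judge")
    return Verdict(True, f"approved: score {score:.2f} ({rationale})", score=score)
```

The point of the shape is the early return: the score is computed only after the guard has passed, so deontology and teleology never share a number.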

3. Binary guard v1.1

The guard has no nuances. It performs two checks in sequence, and stops at the first violation (runtime/vaglio.py:104-126).

Check 1: forbidden paths

The guard extracts all string values from args, even nested ones (_flatten_str_values, runtime/vaglio.py:86-97), expands the tilde to the user's home directory (_expand_user, runtime/vaglio.py:100-101), and compares them against a list of hard-coded regex patterns (runtime/vaglio.py:45-58):

_FORBIDDEN_PATH_PATTERNS = [
    re.compile(r"(^|/)\.ssh(/|$)"),
    re.compile(r"^/etc/(passwd|shadow|sudoers)"),
    re.compile(r"^/etc/ssh(/|$)"),
    re.compile(r"^/root(/|$)"),
    re.compile(r"^/boot(/|$)"),
    re.compile(r"^/sys(/|$)"),
    re.compile(r"^/proc(/[0-9]|$)"),
    re.compile(r"^/dev/(sd|nvme|mmcblk|loop)"),
    re.compile(r"\.aws/credentials"),
    re.compile(r"\.config/[^/]+/credentials\.env"),
    re.compile(r"\.gnupg(/|$)"),
]

The subtle point: the guard does not distinguish between «I am reading ~/.ssh/id_rsa» and «I am just passing the string ~/.ssh/id_rsa in some parameter or other». If the pattern shows up, it blocks. This is deliberate: it means that even an executor that mentions a forbidden path in args (for example in a "glob" or "exclude" field) gets stopped. Conservative? Yes. But the cost of a false positive is a reformulated request; the cost of a false negative is an SSH key leaving.

The list does not loosen by autonomy level. The config lets you tune many things (timeouts, thresholds, modes). On forbidden paths, nothing: it is the «non-negotiable core» of ch. 5 of the Architecture. Modifying them requires a code edit and a new deploy. Deliberate.
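Check 1 can be sketched as below. It is an approximation written for this document, not the module's code: flatten_str_values and first_forbidden_path are hypothetical names, the traversal of nested dicts and lists is an assumption about what «even nested ones» covers, and only three of the eleven patterns are copied.

```python
import os
import re

# Three of the eleven patterns listed above, copied for illustration.
FORBIDDEN = [
    re.compile(r"(^|/)\.ssh(/|$)"),
    re.compile(r"^/etc/(passwd|shadow|sudoers)"),
    re.compile(r"\.config/[^/]+/credentials\.env"),
]


def flatten_str_values(obj):
    """Yield every string found in a nested structure of dicts and lists."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, dict):
        for value in obj.values():
            yield from flatten_str_values(value)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from flatten_str_values(item)


def first_forbidden_path(args):
    """Return the source of the first matching pattern, or None."""
    for raw in flatten_str_values(args):
        expanded = os.path.expanduser(raw)  # "~/x" becomes "/home/<user>/x"
        for pattern in FORBIDDEN:
            if pattern.search(expanded):
                return pattern.pattern
    return None
```

Note that a path hidden in a nested "exclude" list is caught exactly like a top-level "path": the sketch, like the guard described above, looks at strings and ignores what the executor intends to do with them.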

Check 2: quasi-irrecoverable shell commands

This applies only if executor_name == "shell_exec" or if the context declares capability == "code:exec" (runtime/vaglio.py:117-124). The value checked is the command field (or alternatively cmd); if it is a list, it gets joined with spaces before matching. The pattern list (runtime/vaglio.py:62-69):

_DANGEROUS_SHELL_PATTERNS = [
    re.compile(r"\brm\s+-rf?\s+/(\s|$)"),         # rm -rf /
    re.compile(r"\brm\s+-rf?\s+~(\s|$|/)"),       # rm -rf ~
    re.compile(r"\bmkfs\b"),
    re.compile(r"\bdd\s+.*\bof=/dev/"),
    re.compile(r":\s*\(\s*\)\s*\{\s*:\|:&\s*\}"),  # fork bomb
    re.compile(r"\bchmod\s+-?R?\s*0?77[0-9]\s+/"),
]

The motivation is Law 1 of the constitution (see constitution): «no irrecoverable state». An rm -rf / cannot be undone; neither can a mkfs on the system drive; a fork bomb exhausts the process table. These are pathologies distinct from any ordinary file deletion: they take the system out of a state in which the system itself can still respond.

Here too the match is textual. The guard does not emulate the shell: it looks for literal patterns in the command. This means that variants cleverly built with escape characters can pass; the defence in depth is the sandbox with filesystem namespacing for shell_exec, of which the guard is only the first layer.
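A sketch of Check 2, using the six patterns copied from the list above. The function name dangerous_shell_pattern and the handling of a missing or non-string command are assumptions made for the example; the applicability rule and the list joining follow the description in this section.

```python
import re

# Patterns copied from the list above.
DANGEROUS = [
    re.compile(r"\brm\s+-rf?\s+/(\s|$)"),
    re.compile(r"\brm\s+-rf?\s+~(\s|$|/)"),
    re.compile(r"\bmkfs\b"),
    re.compile(r"\bdd\s+.*\bof=/dev/"),
    re.compile(r":\s*\(\s*\)\s*\{\s*:\|:&\s*\}"),
    re.compile(r"\bchmod\s+-?R?\s*0?77[0-9]\s+/"),
]


def dangerous_shell_pattern(executor_name, args, context):
    """Return the source of the first matching pattern, or None."""
    applies = executor_name == "shell_exec" or context.get("capability") == "code:exec"
    if not applies:
        return None
    command = args.get("command", args.get("cmd", ""))
    if isinstance(command, (list, tuple)):
        command = " ".join(str(part) for part in command)  # argv list -> one string
    if not isinstance(command, str):
        return None  # assumption: nothing to match against
    for pattern in DANGEROUS:
        if pattern.search(command):
            return pattern.pattern
    return None
```

With these patterns, "rm -rf /" and the argv form ["rm", "-rf", "/"] are both caught, while a variant with split flags such as "rm -r -f /" is not. That is the textual-match limit in practice, and the reason the sandbox has to sit behind the guard.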

4. Graded judge v1.1

If the guard lets it through, the judge starts. In v1.1 the judge is rule-based: no LLM call, microsecond latency, zero cost. The function is judge_score(intent, executor_name, args, context) (runtime/vaglio.py:131-171) and returns (score, rationale).

Heuristics

It starts from a base score of 0.7: the v1.1 regime is benevolent by construction (see ch. 9). From there the judge adds or subtracts small adjustments, then clamps to [0, 1].

| Signal | Effect on score | Rationale |
|---|---|---|
| The user's intent mentions the executor name (even just one of its tokens, length ≥ 3). | +0.10 | Sign of explicit intent. If the user wrote «read the file» and the executor is fs_read, the match on read is a weak but real signal that the action matches the request. |
| Any string value in args contains .. and / (possible path traversal). | −0.20 | Coarse but effective heuristic: many filesystem attacks use .. to escape the expected folder. The penalty is the largest because it indicates an attempt to cross a boundary. |
| An args key does not respect Python identifier rules (non-alphanumeric / underscore characters). | −0.10 | Shape anomaly. A healthy executor declares args with normal keys; an odd key suggests either injection or a planning error. |
| The context declares critical: false (non-critical executor, read-only). | +0.05 | Modest bonus: read-only executors cannot produce irrecoverable state, so the bar can be slightly lower. |

The score is clamped to [0, 1] and compared against JUDGE_THRESHOLD (runtime/vaglio.py:40). Default: 0.30, configurable via env METNOS_JUDGE_THRESHOLD.
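The four heuristics can be sketched as a single function. This is an illustration under assumptions, not the body of judge_score in runtime/vaglio.py: the tokenisation of the executor name, the substring match against the intent, the use of str.isidentifier for the key check, the rationale wording and the rounding to two decimals are all choices made for the example.

```python
import re

BASE_SCORE = 0.7


def _strings(obj):
    """Yield every string value in nested dicts and lists."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, dict):
        for value in obj.values():
            yield from _strings(value)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from _strings(item)


def judge_score(intent, executor_name, args, context):
    score = BASE_SCORE
    notes = []

    # +0.10: a token of the executor name (length >= 3) appears in the intent.
    tokens = [t for t in re.split(r"[^a-z0-9]+", executor_name.lower()) if len(t) >= 3]
    if any(token in intent.lower() for token in tokens):
        score += 0.10
        notes.append("intent matches executor")

    # -0.20: some string value contains both ".." and "/".
    if any(".." in value and "/" in value for value in _strings(args)):
        score -= 0.20
        notes.append("possible path traversal ('..' in path)")

    # -0.10: an args key is not a valid Python identifier.
    if any(not str(key).isidentifier() for key in args):
        score -= 0.10
        notes.append("unusual args key")

    # +0.05: the context explicitly declares a non-critical executor.
    if context.get("critical") is False:
        score += 0.05
        notes.append("non-critical executor")

    score = max(0.0, min(1.0, score))  # clamp to [0, 1]
    return round(score, 2), "; ".join(notes) or "no signals"
```

Rounding is there only to keep float noise out of the comparison; the source does not say whether the real function rounds.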

Example: harmless path, generic intent

Intent: "read my notes in /tmp/n.txt". Executor: fs_read. Args: {"path": "/tmp/n.txt"}.

Starting score: 0.7. The judge splits fs_read on separators and gets the tokens fs and read; the second one, read, appears in the intent. The intent bonus applies (+0.10). The path is harmless and the keys are clean.

Final score: 0.80. Above threshold. approved=True, reason="approved: score 0.80 (intent matches executor)".

Example: suspicious path traversal

Args: {"path": "/tmp/../etc/foo"}. Suspicious path traversal: .. and / present.

Score: 0.7 − 0.2 = 0.5. Still above 0.30: approved nonetheless, but with the note "possible path traversal ('..' in path)" in the reason and in the log.

To actually block the action, the judge would need other cumulative negative signals, or a higher configured threshold. In v1.1 the judge marks; it does not block on its own except in edge cases.

Why is the rule-based judge so benevolent? Because in v1.1 the real filter is the guard. The rule-based judge is a first layer of observability (what gives off a strange smell?) more than a gate. The score lands in the JSONL log and becomes a signal for later analysis, where outliers emerge. When the judge becomes an LLM (v1.2), a more stringent threshold will make sense, because the evaluation will really be able to weigh alternatives and context.
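How benevolent is it in numbers? The sketch below enumerates every combination of the four signals from the table, assuming each signal fires at most once per call (the source does not say whether penalties stack per offending value, so this is an assumption). Under that assumption the lowest reachable score is 0.70 − 0.20 − 0.10 = 0.40, which is still above the default threshold of 0.30.

```python
from itertools import product

BASE, THRESHOLD = 0.7, 0.30

# Adjustments from the heuristics table; each is assumed to apply at most once.
SIGNALS = {
    "intent mentions executor": +0.10,
    "path traversal": -0.20,
    "odd args key": -0.10,
    "critical: false": +0.05,
}


def reachable_scores():
    """Map each combination of active signals to its clamped score."""
    scores = {}
    for flags in product([False, True], repeat=len(SIGNALS)):
        active = tuple(name for name, on in zip(SIGNALS, flags) if on)
        total = BASE + sum(SIGNALS[name] for name in active)
        scores[active] = round(max(0.0, min(1.0, total)), 2)
    return scores


scores = reachable_scores()
lowest = min(scores.values())
highest = max(scores.values())
denied = [combo for combo, value in scores.items() if value < THRESHOLD]
```

Here lowest is 0.4, highest is 0.85 and denied is empty: with these adjustments alone, a denial by the judge requires a raised threshold, which is what the test case judge_sotto_soglia_blocca does with 0.99. If penalties can accumulate per value in the real module, the floor is lower.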

5. LLM judge v1.1 (opt-in)

Since the evening of Apr 27 (phase 5 kickoff) the vaglio can use an LLM-based judge instead of the rule-based one. It is enabled via the env variable METNOS_JUDGE_KIND=llm-v1 (default rule-based-v1): opt-in because it costs — every judge decision pays one call to the middle tier. Implementation in runtime/vaglio.py:_judge_score_llm.

5.1 Applied decisions

5.2 When to enable it

Rule-based is the default and is enough for most turns: the guard filters danger, the rule-based judge flags outliers. The LLM judge is for cases that need a finer assessment of telos alignment (e.g. proactive actions under bother budget, choice between equivalent options). Enable with:

METNOS_JUDGE_KIND=llm-v1 python3 -m channels.daemon

Tradeoff: every judge call pays a latency of ~800 ms (local qwen3:8b in middle-tier mode) or ~150 ms (Anthropic Sonnet). For a 3-step turn, that is 3 vaglio calls. A future cost policy will be able to automatically degrade to rule-based above a calls-per-day threshold.
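A small sketch of the selection and of the latency arithmetic above. select_judge_kind and added_latency_ms are hypothetical helpers, and falling back to the default on an unrecognised value is an assumption; the source only states the variable name, its two values and the per-call latencies.

```python
import os

KNOWN_KINDS = ("rule-based-v1", "llm-v1")


def select_judge_kind(environ=None):
    """Read METNOS_JUDGE_KIND, defaulting to the rule-based judge."""
    environ = os.environ if environ is None else environ
    kind = environ.get("METNOS_JUDGE_KIND", "rule-based-v1")
    # Assumption: an unknown value falls back to the default instead of failing.
    return kind if kind in KNOWN_KINDS else "rule-based-v1"


def added_latency_ms(steps, per_call_ms):
    """One vaglio call per step, each paying one judge call."""
    return steps * per_call_ms


# Figures quoted above: ~800 ms (local qwen3:8b), ~150 ms (Anthropic Sonnet).
local_turn = added_latency_ms(3, 800)   # 2400 ms for a 3-step turn
remote_turn = added_latency_ms(3, 150)  # 450 ms for a 3-step turn
```

For a 3-step turn the opt-in judge therefore adds roughly 2.4 s locally or 0.45 s remotely, on top of the executors themselves.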

5.3 Deferred to v1.2

6. JSONL logging

Every call to the vaglio writes a JSONL line in ~/.local/share/metnos/vaglio/YYYY-MM.jsonl (monthly file, see runtime/vaglio.py:33 and 176-183). The write is fail-safe: if the filesystem refuses, the vaglio does not block the decision — it simply does not write the log.

Record schema

{
  "approved": false,
  "reason": "guard: forbidden path violated: pattern '(^|/)\\.ssh(/|$)' in args",
  "ts": 1714225863.214,
  "judge_kind": "rule-based-v1",
  "score": 0.0,
  "blocked_by": "guard",
  "intent": "read the key",
  "executor": "fs_read",
  "args_keys": ["path"],
  "context_keys": ["mode", "step"]
}

Only the keys of args are logged, not the values. This is explicit (runtime/vaglio.py:197 and 212): the log records the key names ("path", "command", ...) but not the values (the actual path, the actual command). Reason: privacy. Values often contain sensitive data (user paths, file contents, payloads), and a log that recorded them would itself become an asset to protect. The keys are enough for statistical analysis (which executors get blocked most often?); the live value lives in the turn events of the agent_runtime, with different retention rules.

Rotation is monthly per file. No compression, no encryption: the logs stay under the user's ~/.local. If long-term archiving is needed, it is the job of external processes (rsync, backup, etc.).
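The logging behaviour described in this section (monthly file, keys only, fail-safe write) can be sketched like this. log_path and append_log are hypothetical names, the verdict is passed as a plain dict for brevity, and deriving the month from local time is an assumption.

```python
import json
import time
from pathlib import Path


def log_path(log_dir, ts):
    """Monthly file: <log_dir>/YYYY-MM.jsonl (local time is an assumption)."""
    return Path(log_dir) / (time.strftime("%Y-%m", time.localtime(ts)) + ".jsonl")


def append_log(log_dir, verdict, intent, executor, args, context):
    """Append one record; return False instead of raising if the write fails."""
    record = dict(verdict)
    record.update({
        "intent": intent,
        "executor": executor,
        "args_keys": [str(key) for key in args],        # key names only
        "context_keys": [str(key) for key in context],  # never the values
    })
    try:
        path = log_path(log_dir, record["ts"])
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
        return True
    except OSError:
        # Fail-safe: a refused write must not change the decision.
        return False
```

The return value is only for the caller's curiosity: in the spirit of the fail-safe rule, the verdict is returned to the runtime whether or not the line was written.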

7. Runtime integration

The call site lives in the planner, in runtime/agent_runtime.py:512-526. The filter sequence is strictly ordered:

# runtime/agent_runtime.py:512-526 (excerpt)
# Validation, sandbox, vaglio
validation = validate_args(args, executor.args_schema)
step.validation_failures = validation
if validation:
    obs = {"ok": False, "error": f"validation failed: {validation}"}
else:
    scope_violation = check_hints(args, executor.capabilities)
    step.scope_violation = scope_violation
    if scope_violation:
        obs = {"ok": False, "error": scope_violation}
    else:
        verdict = judge(user_query, chosen_name, args,
                        {"mode": chosen_mode, "step": step_num})
        step.vaglio_approved = verdict.approved
        if not verdict.approved:
            obs = {"ok": False, "error": f"vaglio rejects: {verdict.reason}"}
        else:
            ...
            obs = invoke_executor(executor, args)

The order is not random:

  1. Args validation against the executor's schema (types and required). If it fails, no vaglio: the error is one of shape, not of permissibility.
  2. Capabilities check (check_hints): is the executor stepping out of its declared perimeter? (Example: a read-only executor that writes.)
  3. Vaglio (judge(...)): only if the two previous checks pass. It receives the user's original intent (user_query), the chosen executor name, the validated args, and a context with mode and step_num.
  4. Invoke executor: only if the vaglio approves.

For multistep plans, the vaglio runs between one step and the next: every tool_call proposed by the LLM is vetted before execution, so a 5-step plan produces 5 distinct verdicts. For single-shot plans (proposed entirely in one go, without intermediate steps), the vaglio runs post-hoc: the plan is already formulated, but the vaglio can still block the execution of each tool_call.

The flag step.vaglio_approved ends up in the turn log (agent_runtime ch. 11), and from there in the spectator: the user can see after the fact which steps were vetted and with what outcome.
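The ordering of the four stages can also be read as a chain of early returns. The sketch below is not the planner's code: run_step, recording_judge and the injected callables are hypothetical stand-ins, used only to show that the vaglio is never consulted when an earlier stage fails.

```python
def run_step(user_query, name, args, *, validate, check_scope, judge, invoke):
    """Validation, then scope, then vaglio, then the executor."""
    failures = validate(args)
    if failures:
        return {"ok": False, "error": f"validation failed: {failures}"}
    violation = check_scope(args)
    if violation:
        return {"ok": False, "error": violation}
    approved, reason = judge(user_query, name, args)
    if not approved:
        return {"ok": False, "error": f"vaglio rejects: {reason}"}
    return invoke(name, args)


consulted = []  # executor names for which the vaglio was actually asked


def recording_judge(user_query, name, args):
    """Stand-in vaglio that approves everything and records each call."""
    consulted.append(name)
    return True, "approved: score 0.70"


bad_shape = run_step(
    "read the file", "fs_read", {},
    validate=lambda a: ["path: required"],
    check_scope=lambda a: None,
    judge=recording_judge,
    invoke=lambda n, a: {"ok": True},
)
good = run_step(
    "read the file", "fs_read", {"path": "/tmp/x"},
    validate=lambda a: [],
    check_scope=lambda a: None,
    judge=recording_judge,
    invoke=lambda n, a: {"ok": True},
)
```

After both calls, consulted holds a single entry: the malformed step was rejected on shape alone and produced no verdict.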

8. Tests

The vaglio cluster is at 13/13 green. The cases are declared in runtime/testing/populate_cases.py (section --- vaglio ---) and cover guard, judge, log and privacy.

| Case | Category | What it verifies |
|---|---|---|
| approva_path_innocuo | happy | A normal path (/tmp/x) passes, judge_kind is "rule-based-v1", score in [0, 1]. |
| guard_blocca_ssh | security | ~/.ssh/id_rsa blocked by the guard with blocked_by="guard" and reason "forbidden path". |
| guard_blocca_etc_passwd | security | /etc/passwd blocked. |
| guard_blocca_credentials_user | security | ~/.config/metnos/credentials.env blocked (check of the credentials.env pattern nested in user path). |
| guard_blocca_rm_rf_root | security | shell_exec with rm -rf / blocked as irrecoverable. |
| guard_blocca_fork_bomb | security | shell_exec with :(){ :|:& };: blocked. |
| guard_lascia_passare_shell_innocua | happy | shell_exec with ls -la /tmp passes the guard. |
| judge_intent_menziona_executor_alza_score | happy | Score with intent that mentions the executor > score with generic intent. |
| judge_path_traversal_abbassa_score | edge | Path with .. receives lower score than the harmless path. |
| judge_sotto_soglia_blocca | failure | With METNOS_JUDGE_THRESHOLD=0.99, any action gets denied with blocked_by="judge". |
| log_jsonl_viene_scritto | happy | After judge(...), the file YYYY-MM.jsonl exists and contains the intent marker. |
| args_keys_loggate_non_values | security | A sensitive value in args ("PASSWORD_segreto_...") does NOT appear in the log; only the key name ("path") is present. |

On top of these 13, integration in the agent_runtime cluster (172/172 green) exercises the vaglio at its real call site (validation + scope + vaglio + invoke), on end-to-end scenarios.

9. Limits of v1.1

| Limit | Explanation | When it lifts |
|---|---|---|
| The vaglio does not read the mnestoma. | Separation-of-concerns decision: the v1.1 vaglio evaluates only the declared intent + executor + args, it does not look at history. Meaning: it cannot see «this same action was denied 5 minutes ago» or «the user usually does not operate this way». | v1.2: the LLM judge will receive context from the mnestoma (similar relevance, prior outcomes) to weigh better. |
| The approval_ux (card + buttons in the channel) is renderable, the callback dispatcher is not. | The card the user sees in the channels (Telegram, Slack, etc.) is already generated via runtime/channels/approval.py. But if the user presses «Allow this once», the callback does not yet re-enter the live plan. | v1.2: callback dispatcher with re-injection into the current turn. |
| The rule-based judge is very benevolent. | Default threshold 0.30 and base score 0.7: even with several moderate penalties, the score stays above. This is deliberate: in v1.1 the real filter is the guard, the judge only marks outliers in the log. | v1.2 with LLM judge: the threshold will mean more, and the score distribution will be wider. |
| No structured explanation of refusal to the user. | When the vaglio denies, the planner shows the textual reason but does not build a card «why I said no and what you can do». The refusal UX is minimal. | v1.2: integration with approval_ux for dialogic refusals. |
| No interactive vaglio. | The v1.1 vaglio is a pure function: input args, output Verdict. It cannot ask «are you sure?» or request confirmation. | v1.2: integration with the dialog manager for authorisation requests (see approval_ux). |

10. Configurables

The configuration of the v1.1 vaglio is minimal, and this minimality is deliberate.

| What | Where | Default | Notes |
|---|---|---|---|
| METNOS_JUDGE_THRESHOLD | Environment variable. | 0.30 | Below this, the judge denies. Read at module import time (runtime/vaglio.py:40): to change it at runtime use importlib.reload(vaglio). |
| VAGLIO_LOG_DIR | Module constant (not env). | ~/.local/share/metnos/vaglio/ | Changing the log destination would require a code edit. Deliberately not a parameter: the log is a system invariant, not a user preference. |
| _FORBIDDEN_PATH_PATTERNS | Module constant, hard-coded. | 11 regexes (see ch. 3). | Not configurable. Modifying them requires a code edit and a new deploy. Deliberate: ch. 5 of the Architecture, «non-negotiable core». |
| _DANGEROUS_SHELL_PATTERNS | Module constant, hard-coded. | 6 regexes (see ch. 3). | Not configurable. Same reasoning: the list of quasi-irrecoverable shell commands does not loosen. |

Deliberately few configurations. When the vaglio becomes more configurable (in v1.2 with LLM judge there will be the provider, the prompt, the budget, possibly the fallback), every new knob will need to be justified. The philosophy: the ethical filter has so few dials on purpose. What cannot be configured cannot be misconfigured.
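Because the threshold is read at import time, changing the environment variable in a running process has no effect until the module is reloaded. The sketch below demonstrates this with a throwaway module, vaglio_demo, written to a temporary directory; it is a hypothetical stand-in that assumes the real module reads the variable with a module-level os.environ.get, which the source implies but does not show.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the module-level read in runtime/vaglio.py.
MODULE_SOURCE = (
    "import os\n"
    'JUDGE_THRESHOLD = float(os.environ.get("METNOS_JUDGE_THRESHOLD", "0.30"))\n'
)


def demo_reload():
    """Return the threshold at import, after the env change, and after reload."""
    saved = os.environ.pop("METNOS_JUDGE_THRESHOLD", None)
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "vaglio_demo.py").write_text(MODULE_SOURCE, encoding="utf-8")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("vaglio_demo")
            at_import = module.JUDGE_THRESHOLD
            os.environ["METNOS_JUDGE_THRESHOLD"] = "0.99"
            before_reload = module.JUDGE_THRESHOLD  # still the old value
            importlib.reload(module)                # re-executes the module body
            after_reload = module.JUDGE_THRESHOLD
        finally:
            # Leave the interpreter and the environment as we found them.
            sys.path.remove(tmp)
            sys.modules.pop("vaglio_demo", None)
            os.environ.pop("METNOS_JUDGE_THRESHOLD", None)
            if saved is not None:
                os.environ["METNOS_JUDGE_THRESHOLD"] = saved
    return at_import, before_reload, after_reload
```

One caveat that follows from how Python imports work: code that copied the constant with a from-import keeps the old value even after the reload, so callers should read it through the module.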

Final notes

The vaglio is a small component (~210 lines of Python) but architecturally central. The two-phase shape is the price of robustness: a binary guard that is not up for debate, a graded judge that evolves. Today's guard will still protect against tomorrow's judge, because it stays the same across versions of the judge.

The microdesign of this module was validated by the code before the document: the module went live on April 27, 2026, passed its 13 own tests and 172 integration tests, and this doc describes what runs. Not the other way around.