Verification Workflows
How to verify agent work before it ships: /review, /shadow-verify, /ship --verify, and the failure-geometry pattern.
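Agent AFK ships several verification tools, each aimed at a different point in the workflow: reviewing a diff before merge, checking a subagent's claims before acting on them, and gating a PR before it opens.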
Failure Geometry
All of Agent AFK's verification tools are built around a single pattern called Failure Geometry.
The core insight: good workflows do not eliminate failure. They make failure earlier, smaller, louder, cheaper, more local, more informative, or more recoverable.
The pattern has four steps:
- Name the default failure. What does the workflow do when it breaks without intervention? (e.g., silently approve malformed output, narrate instead of act, declare done while blockers remain)
- Choose the preferred failure. What should breaking look like instead? (e.g., reject by default, pause in a named state, surface uncertainty)
- Add structure that transforms one into the other: a gate, a parser fallback, a rubric dimension, a contract field, or an abort cascade rule.
- Verify the signal. Confirm the new failure mode is observable — if it fires and no one sees it, the geometry didn't change.
Source: docs/failure-geometry.md.
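To make the four steps concrete, here is a minimal TypeScript sketch of one mechanism from the list above, a parser fallback. Everything in it (`Verdict`, `parseVerdict`, the `VERDICT: APPROVE|REJECT` line format, the `onFallback` hook) is hypothetical and not taken from the Agent AFK codebase. The default failure is silently approving output that cannot be parsed, the preferred failure is rejecting it, the fallback branch is the structure, and `onFallback` is the observable signal.

```typescript
// Hypothetical verdict a subagent is asked to emit on a "VERDICT: ..." line.
type Verdict = "approve" | "reject";

interface ParseResult {
  verdict: Verdict;
  // True when the raw output could not be parsed and the fallback fired.
  fellBack: boolean;
}

// Step 1 (default failure): a naive parser might treat anything that is not an
// explicit REJECT as approval, silently passing malformed output.
// Step 2 (preferred failure): malformed output is rejected by default.
// Step 3 (structure): the explicit fallback branch below.
// Step 4 (signal): onFallback makes every fallback observable.
function parseVerdict(raw: string, onFallback: (raw: string) => void): ParseResult {
  const match = /^VERDICT:\s*(APPROVE|REJECT)\s*$/m.exec(raw);
  if (match) {
    return { verdict: match[1] === "APPROVE" ? "approve" : "reject", fellBack: false };
  }
  onFallback(raw);
  return { verdict: "reject", fellBack: true };
}

const fallbacks: string[] = [];
const ok = parseVerdict("Looks fine.\nVERDICT: APPROVE", (r) => fallbacks.push(r));
const bad = parseVerdict("I think this is probably fine", (r) => fallbacks.push(r));
console.log(ok.verdict, bad.verdict, fallbacks.length); // approve reject 1
```

The fallback alone only changes the failure; recording it is what satisfies step 4, because a silent reject would be as invisible as a silent approve.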
Examples in the codebase
| Default failure | Preferred failure | Mechanism |
|---|---|---|
| Trust all subagent output | Re-verify when uncertain | confidence gate (src/skills/_lib/confidence-gate.ts) |
| Fail silently at iteration cap | Pause in named state with next-step | heal-failed exit path (src/skills/mint/index.ts) |
| Child abort kills parent | Child notifies parent without auto-abort | AbortGraph cascade rule (src/agent/abort-graph.ts) |
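The second row of the table, pausing in a named state at an iteration cap, can be sketched as follows. This is not the code in src/skills/mint/index.ts: `HealOutcome`, `AttemptResult`, `runHealLoop`, `maxIterations`, and the `nextStep` wording are hypothetical names chosen for illustration.

```typescript
type AttemptResult = { ok: true } | { ok: false; error: string };

// Hypothetical outcome of a bounded self-healing loop.
type HealOutcome =
  | { status: "healed"; iterations: number }
  // Named failure state: the loop stopped at its cap and says what to do next.
  | { status: "heal-failed"; iterations: number; lastError: string; nextStep: string };

function runHealLoop(attempt: (iteration: number) => AttemptResult, maxIterations: number): HealOutcome {
  let lastError = "no attempts made";
  for (let i = 1; i <= maxIterations; i++) {
    const result = attempt(i);
    if (result.ok) {
      return { status: "healed", iterations: i };
    }
    lastError = result.error;
  }
  // The default failure would be to stop quietly here; instead, return a named
  // state that carries the last error and an actionable next step.
  return {
    status: "heal-failed",
    iterations: maxIterations,
    lastError,
    nextStep: `Inspect the last error and re-run manually: ${lastError}`,
  };
}

// Usage: an attempt that never succeeds hits the cap and pauses loudly.
const outcome = runHealLoop((i): AttemptResult => ({ ok: false, error: `type error on attempt ${i}` }), 3);
console.log(outcome.status); // heal-failed
```

Using a discriminated union makes the failure state impossible to ignore at the type level: callers must branch on `status` before they can read `nextStep`.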
Understanding this pattern helps you reason about why verification tools are structured the way they are — and how to add the same structure to your own skills.
/review — pre-merge diff review
/review dispatches parallel dimension agents across a diff, PR, commit SHA, branch, staged changes, or patch file. Each agent focuses on one concern:
- Security
- Correctness
- API compatibility
- Test coverage
- Performance and observability
The agents run in parallel and their findings are synthesized by severity. The output is a merge recommendation with specific, citable findings — not a general summary.
/review # staged changes in the current repo
/review https://github.com/... # PR URL
/review 3a1b2c4 # commit SHA
/review main..feature # branch range
/review is read-only: it analyzes and reports only. It never edits files, commits, pushes, comments on a PR, or modifies a PR description.
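The dispatch-and-synthesize shape of /review can be sketched in TypeScript. This is an illustration under assumptions, not the /review implementation: the `Finding` shape, the four severity levels, the `reviewDimension` stub, and the rule that any critical finding blocks merge are all hypothetical. Like /review, the sketch only reads and reports.

```typescript
type Severity = "critical" | "high" | "medium" | "low";
type Dimension = "security" | "correctness" | "api-compat" | "test-coverage" | "perf-observability";

interface Finding {
  dimension: Dimension;
  severity: Severity;
  // Citable location, e.g. "src/auth.ts:42".
  location: string;
  message: string;
}

const SEVERITY_RANK: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

// Hypothetical stand-in for one dimension agent; a real agent would analyze the diff.
async function reviewDimension(_dimension: Dimension, _diff: string): Promise<Finding[]> {
  return [];
}

// Merge per-dimension findings, order them by severity, and derive a recommendation.
function synthesize(perDimension: Finding[][]): { findings: Finding[]; recommendation: "merge" | "block" } {
  const findings = perDimension
    .reduce<Finding[]>((all, list) => all.concat(list), [])
    .sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
  const blocked = findings.some((f) => f.severity === "critical");
  return { findings, recommendation: blocked ? "block" : "merge" };
}

async function review(diff: string) {
  const dimensions: Dimension[] = ["security", "correctness", "api-compat", "test-coverage", "perf-observability"];
  // One agent per dimension, all running in parallel.
  const perDimension = await Promise.all(dimensions.map((d) => reviewDimension(d, diff)));
  return synthesize(perDimension);
}
```

Keeping each finding's `location` through synthesis is what lets the final recommendation stay citable rather than collapsing into a summary.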
/shadow-verify — adversarial claim verification
After any high-stakes subagent investigation — code reviews, audits, findings reports, large refactors — use /shadow-verify to independently re-derive key claims.
Shadow verifiers are parallel adversarial agents that each re-derive 2–3 claims from scratch using tool calls only. They return a verdict per claim:
| Verdict | Meaning |
|---|---|
| CONFIRMED | The claim was independently verified against source files |
| REFUTED | The claim contradicts what the verifier found in the files |
| UNVERIFIABLE | The claim could not be checked with available tools |
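A minimal sketch of how claims might be split across verifiers and how their verdicts could be combined. Only the three verdict values come from the table; `batchClaims`, `flagDisagreements`, the round-robin split, and the choice to flag every non-CONFIRMED verdict (including UNVERIFIABLE) are assumptions.

```typescript
type ShadowVerdict = "CONFIRMED" | "REFUTED" | "UNVERIFIABLE";

interface ClaimResult {
  claim: string;
  verdict: ShadowVerdict;
}

// Split claims across ceil(n / maxPerVerifier) verifiers, round-robin, so each
// verifier gets at most maxPerVerifier claims and (for n >= 2 with the default
// of 3) at least 2.
function batchClaims(claims: string[], maxPerVerifier = 3): string[][] {
  const count = Math.ceil(claims.length / maxPerVerifier);
  const batches: string[][] = [];
  for (let b = 0; b < count; b++) batches.push([]);
  claims.forEach((claim, i) => batches[i % count].push(claim));
  return batches;
}

// Anything not independently confirmed is surfaced before the result is acted on.
function flagDisagreements(results: ClaimResult[]): ClaimResult[] {
  return results.filter((r) => r.verdict !== "CONFIRMED");
}
```

Treating UNVERIFIABLE like REFUTED is the conservative choice: an unchecked claim is surfaced instead of quietly passing.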
Any disagreement between a verifier and the original claim is flagged before you act on the result. The skill fires automatically when a subagent uses high-confidence language ("confident", "certain", "clearly") or states a confidence of 80% or higher. Confidence is a trigger, not a verdict.
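A rough sketch of such a trigger follows. The three words and the 80% threshold come from the description above; the regular expressions and the `shouldShadowVerify` name are hypothetical, and the real skill may match differently.

```typescript
// Whole-word match, so "uncertain" does not count as high-confidence language.
const CONFIDENT_WORDS = /\b(confident|certain|clearly)\b/i;

function shouldShadowVerify(output: string, threshold = 80): boolean {
  if (CONFIDENT_WORDS.test(output)) return true;
  // Created per call so the global regex's lastIndex state is never shared.
  const percent = /(\d{1,3}(?:\.\d+)?)\s*%/g;
  let m: RegExpExecArray | null;
  while ((m = percent.exec(output)) !== null) {
    if (parseFloat(m[1]) >= threshold) return true;
  }
  return false;
}
```

The percentage check will also fire on unrelated numbers such as "90% of tests"; in Failure Geometry terms, an unnecessary re-verification is a cheaper failure than a missed one.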
When to reach for it explicitly:
- After a security audit run from memory rather than file reads
- After a subagent reports a complex refactor is safe
- Before acting on any finding that would drive file changes, commits, or external side-effects
Source: AGENTS.md (shadow-verify skill description).
/ship --verify
/ship handles the release pipeline for already-done local work: it dispatches a pre-flight check, runs the test suite, drafts a commit message, pushes, and opens a PR.
Passing --verify triggers an adversarial verifier wave on the diff before a human reads the PR:
/ship --verify
The verifier applies the same parallel-dimension analysis as /review to the outgoing diff and gates opening the PR on the result. If the verifier finds a critical-severity issue, the PR is not opened until the issue is resolved.
Use --verify whenever changes are non-trivial, touch security-sensitive code, or change public API surfaces.
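Where the --verify gate sits in /ship can be sketched as an ordered list of steps. The step names, the `ship` function, and the `runVerifier` callback are hypothetical. The text does not say whether verification runs before or after the push, so this sketch's choice of placing it immediately before the PR is opened is an assumption; the only rule it encodes from the text is that a critical-severity finding keeps the PR from opening.

```typescript
type Severity = "critical" | "high" | "medium" | "low";
type Step = "preflight" | "tests" | "commit-message" | "push" | "verify" | "open-pr";

// Returns the steps that ran, in order, and whether the PR was opened.
function ship(verify: boolean, runVerifier: () => Severity[]): { steps: Step[]; prOpened: boolean } {
  const steps: Step[] = ["preflight", "tests", "commit-message", "push"];
  if (verify) {
    steps.push("verify");
    const severities = runVerifier();
    if (severities.indexOf("critical") !== -1) {
      // Gate: stop before the PR exists so the issue is resolved first.
      return { steps, prOpened: false };
    }
  }
  steps.push("open-pr");
  return { steps, prOpened: true };
}
```

Without --verify, the gate is simply skipped, which is why the flag matters most for security-sensitive or API-changing diffs.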
The confidence gate
Internally, AFK checks each subagent result for uncertainty signals: a low confidence score, a detected gap, or a claim at the edge of what the subagent could verify. When any of these signals fires, the output goes to re-verification.
This is the Failure Geometry pattern applied to subagent trust: the default failure mode would be to pass all output through unchecked. The preferred failure is to surface uncertainty loudly. The structure that transforms one into the other is the confidence gate.
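A minimal sketch of a gate over those three signals. The `SubagentResult` fields, the `confidenceGate` name, and the 0.7 threshold are hypothetical and are not taken from src/skills/_lib/confidence-gate.ts; the point is that any single signal is enough to route a result to re-verification.

```typescript
interface SubagentResult {
  // Self-reported confidence in [0, 1].
  confidence: number;
  // Gaps the subagent noticed, e.g. files it could not read.
  gaps: string[];
  // Claims the subagent marked as at the edge of what it could verify.
  edgeClaims: string[];
}

type GateDecision = { route: "accept" } | { route: "re-verify"; reasons: string[] };

function confidenceGate(result: SubagentResult, minConfidence = 0.7): GateDecision {
  const reasons: string[] = [];
  if (result.confidence < minConfidence) reasons.push(`low confidence (${result.confidence})`);
  if (result.gaps.length > 0) reasons.push(`gaps: ${result.gaps.join(", ")}`);
  if (result.edgeClaims.length > 0) reasons.push(`edge claims: ${result.edgeClaims.length}`);
  // Reasons are returned rather than swallowed, so the re-verification is observable.
  return reasons.length > 0 ? { route: "re-verify", reasons } : { route: "accept" };
}
```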
Checklist for new skills
When you write a new skill, answer these before shipping:
- What is the default failure mode if the skill breaks with no hardening?
- What is the preferred failure mode — what should breaking look like?
- What structure transforms one into the other? (A gate, fallback, contract field, exit state, or rubric check — not a comment.)
- Is the preferred failure observable without reading source? (Logged, surfaced in output, or gated.)
Source: docs/failure-geometry.md (Checklist section).
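One way to keep the checklist from being skipped is to make it a required record next to each skill. The `FailureGeometrySpec` type and `checkSpec` function below are a hypothetical sketch, not an Agent AFK API. The `Mechanism` union leaves out "comment" so that answer cannot type-check, and `checkSpec` rejects blank answers and a failure that has no observability channel.

```typescript
// Structures the checklist accepts; a code comment is deliberately not an option.
type Mechanism = "gate" | "fallback" | "contract-field" | "exit-state" | "rubric-check";
// Ways the preferred failure can be observed without reading source.
type Signal = "logged" | "surfaced-in-output" | "gated";

interface FailureGeometrySpec {
  defaultFailure: string;
  preferredFailure: string;
  mechanism: Mechanism;
  mechanismDescription: string;
  signals: Signal[];
}

// Returns a list of unanswered checklist items; empty means ready to ship.
function checkSpec(spec: FailureGeometrySpec): string[] {
  const problems: string[] = [];
  if (spec.defaultFailure.trim() === "") problems.push("default failure is not named");
  if (spec.preferredFailure.trim() === "") problems.push("preferred failure is not chosen");
  if (spec.mechanismDescription.trim() === "") problems.push("mechanism is not described");
  if (spec.signals.length === 0) problems.push("preferred failure is not observable");
  return problems;
}
```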
Choosing the right tool
| Situation | Tool |
|---|---|
| Reviewing a PR or diff before merging | /review |
| Verifying a subagent's findings before acting | /shadow-verify |
| Pushing local work with pre-ship verification | /ship --verify |
| Writing a new skill and hardening its failure modes | Failure Geometry checklist |