Implementation Verification

A plan verified before execution still leaves risk if the implementation goes unverified afterwards. Implementation Verification closes the loop.

What it is

Implementation Verification is the post-execution gate that checks whether code, configuration, and scripts built from a plan are correct, complete, and faithful to the specification. It sits between plan review (pre-execution) and adversarial verification (document faithfulness), and it addresses a structural gap that manual testing and plan-anchored review both miss: the implementation that looks correct to the person who built it, because they are reading it through the lens of the intent they were trying to realise.

Three lenses run in parallel: Correctness (does the code do what the plan requires?), Safety (credentials handling, permission scope, injection surfaces), and Spec Adherence (every plan step traceable to a file change, every file change traceable to a plan step). Two adversarial agents run alongside: an Independent Extractor that reads the implementation without knowing the plan, and a Plan Baseline agent that reads the plan without seeing the implementation. Comparing their outputs exposes the gap between what the code does and what the plan required, which is the kind of omission the plan-anchored lens agents are most likely to miss.
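
The two-way traceability behind the Spec Adherence lens can be sketched in a few lines. This is a minimal illustration, not the methodology's actual tooling: the FileChange record, the plan step IDs, and the trace_gaps function are hypothetical names, and it assumes each file change has already been tagged with the plan step it claims to implement.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileChange:
    """Hypothetical record: one changed file and the plan step it claims to implement."""
    path: str
    plan_step: Optional[str]  # None when the change cites no plan step


def trace_gaps(plan_steps: set[str], changes: list[FileChange]) -> tuple[set[str], list[str]]:
    """Return (plan steps with no file change, changed paths with no valid plan step)."""
    covered = {c.plan_step for c in changes if c.plan_step in plan_steps}
    untraced_steps = plan_steps - covered
    unplanned_changes = [c.path for c in changes if c.plan_step not in plan_steps]
    return untraced_steps, unplanned_changes
```

A non-empty first result points to a plan step that was never built; a non-empty second result points to a change nobody asked for. Both directions matter, because a check that only walks from plan to code cannot see unplanned changes.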

Every CRITICAL finding is independently verified by the orchestrator before it appears in the report. In the baseline that motivated this methodology, two of three agents’ CRITICAL claims were factually wrong.
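
As a rough sketch of that gate, assuming findings arrive as simple records and that the orchestrator's independent check can be expressed as a callable: the Finding type, the gate_findings function, and the verify parameter are all hypothetical, and lower-severity findings pass straight through here as a simplification.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record produced by a lens agent."""
    severity: str  # "CRITICAL", "WARNING", or "INFO"
    claim: str


def gate_findings(
    findings: list[Finding],
    verify: Callable[[Finding], bool],
) -> tuple[list[Finding], list[tuple[str, str]]]:
    """Admit a CRITICAL finding to the report only if `verify` confirms it.

    Returns the filtered report plus a log of (claim, outcome) pairs for
    every CRITICAL finding that was checked.
    """
    report: list[Finding] = []
    log: list[tuple[str, str]] = []
    for finding in findings:
        if finding.severity != "CRITICAL":
            report.append(finding)  # simplification: no check applied here
            continue
        confirmed = verify(finding)
        log.append((finding.claim, "confirmed" if confirmed else "refuted"))
        if confirmed:
            report.append(finding)
    return report, log
```

The point of the design is that the agent raising a CRITICAL claim never decides whether it is true; that decision belongs to a separate check whose outcome is recorded either way.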

When you reach for it

Implementation Verification applies immediately after any plan has been executed — before the implementation is committed, deployed, or handed over. It is the right methodology when an agentic build has produced a large surface area of changes quickly (speed increases the rate of “obviously correct” bugs), when the implementation involves regulated environments where a defect has compliance consequences, or when the producing agent is the same agent being asked to review its own work (an architectural conflict that Implementation Verification resolves by design).

It is not the right tool for pre-execution plan review or for document faithfulness checking. Those are separate methodologies with different detection methods.

What you ship

A severity-triaged findings report — CRITICAL findings independently confirmed before inclusion, WARNING findings spot-checked, INFO findings included as-is. Every CRITICAL and WARNING finding carries an exact file, line, and change specification — not a complaint, a fix plan.
A cross-validation evidence register — for every CRITICAL finding, a record of the independent verification step: what was checked, how it was checked, and whether the finding was confirmed or refuted. Refuted findings are removed. Fabricated findings do not reach the report.
A post-deploy platform verification (when applicable) — for deployments to remote platforms, authenticated API queries confirm that deployed artefacts exist with the correct configuration. CLI status is not trusted alone.
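
The post-deploy check can be pictured as a comparison between the configuration the plan expects and what the platform itself reports. The sketch below is illustrative only: verify_deployment and its fetch parameter are hypothetical, and fetch stands in for whatever authenticated API query the target platform provides (no real platform API is assumed).

```python
from __future__ import annotations

from typing import Any, Callable, Mapping, Optional


def verify_deployment(
    expected: Mapping[str, Mapping[str, Any]],
    fetch: Callable[[str], Optional[Mapping[str, Any]]],
) -> list[str]:
    """Compare expected artefact settings with what the platform reports.

    `fetch` is a stand-in for an authenticated platform query: it takes an
    artefact name and returns its live configuration, or None if absent.
    """
    problems: list[str] = []
    for name, wanted in expected.items():
        live = fetch(name)
        if live is None:
            problems.append(f"{name}: missing")
            continue
        for key, value in wanted.items():
            if live.get(key) != value:
                problems.append(f"{name}: {key} is {live.get(key)!r}, expected {value!r}")
    return problems
```

An empty list means every expected artefact exists with the expected settings. Because the check reads state from the platform rather than from the deploy tool's exit status, a deploy that reported success but created nothing still shows up as "missing".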

Linked methodologies

Implementation Verification applies the same failure-mode awareness as the Karpathy-6 Adversarial Verification flagship — FM-1 Fabrication, FM-2 Misattribution, FM-3 Inference Leakage, FM-4 Severity Inflation, FM-5 Phantom Consensus, FM-6 Omission — extended for implementations to include Absolutist Claims (untested “never”/“always” assertions in code or configuration). The methodology draws from the same adversarial architecture: context isolation between lens agents, independent extraction without plan anchoring, cross-validation before reporting.
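
For the Absolutist Claims extension, a first pass can be as simple as surfacing every line that uses "never" or "always" so a reviewer can ask whether the assertion has actually been tested. The following is a minimal sketch under that assumption; find_absolutist_claims is a hypothetical helper, and it only locates candidates rather than judging them.

```python
import re

# Whole-word, case-insensitive match for the two absolutist terms named above.
ABSOLUTIST = re.compile(r"\b(never|always)\b", re.IGNORECASE)


def find_absolutist_claims(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line containing 'never' or 'always'."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if ABSOLUTIST.search(line)
    ]
```

Each hit is a question for the reviewer, not a finding in itself: the claim becomes a defect only if nothing in the implementation or its tests backs it up.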

The sequence is: [plan review] → plan execution → [Implementation Verification] → [Karpathy-6 on resulting deliverables if applicable].

Start here

Implementation Verification runs as part of every Blu Wingu build engagement. For external teams bringing an existing implementation for verification, the entry point is the Stream D Evaluation Audit — which covers both document faithfulness (Karpathy-6) and implementation correctness (this methodology) in a single five-day engagement. Book a Stream D audit.