Evaluation Audit — Karpathy-6 and AI Verification

Calibration register — REV-5

Independent AI evaluation with a regulator-aligned audit trail.

The Evaluation Audit is a fixed-price, independent assessment of AI-generated analytical outputs, model deployments, or AI-augmented workflows. It combines the Karpathy-6 adversarial verification methodology, implementation verification, and LLM-as-a-Judge scoring, and produces an evidence-grounded report with a documented claim-verification record designed to support due diligence, procurement assurance, and AI governance programmes. Every finding is cited to source evidence; no claim is asserted without a verification trace.

What you get

Karpathy-6 adversarial verification — a six-phase isolated-subagent pipeline testing for Fabrication, Misattribution, Inference Leakage, Severity Inflation, Phantom Consensus, and Omission across the corpus under review. Applied at enterprise scale, the methodology consistently identifies Category-A defects that standard review processes miss and delivers a corrected output set with a per-claim grounding trace.
Implementation verification — a structured review of whether the AI system or analytical pipeline you are assessing has been built to the specification claimed. Covers architecture, governance controls, data handling, and output quality at the level of evidence a technical due diligence process requires.
LLM-as-a-Judge scoring and audit trail — a documented, reproducible scoring record across the evaluation dimensions, structured for use in regulatory, procurement, or board-level reporting contexts. The audit trail is engineered for conformity assessment, not produced retrospectively.
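
For readers who want a concrete picture of a per-claim grounding trace, the following is a minimal sketch only. It assumes one record per claim, tagged with any of the six defect categories named above. The class and function names (Defect, ClaimRecord, summarise) and the sample claims are hypothetical illustrations, not the Karpathy-6 implementation or its actual report schema.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Defect(Enum):
    # The six defect categories named in the description above.
    FABRICATION = "Fabrication"
    MISATTRIBUTION = "Misattribution"
    INFERENCE_LEAKAGE = "Inference Leakage"
    SEVERITY_INFLATION = "Severity Inflation"
    PHANTOM_CONSENSUS = "Phantom Consensus"
    OMISSION = "Omission"


@dataclass
class ClaimRecord:
    """Hypothetical per-claim entry in a grounding trace."""
    claim: str
    source_citation: Optional[str]  # None means no supporting evidence was found
    defects: List[Defect] = field(default_factory=list)

    @property
    def verified(self) -> bool:
        # A claim counts as verified only if it is cited and carries no defects.
        return self.source_citation is not None and not self.defects


def summarise(records: List[ClaimRecord]) -> Dict[str, int]:
    """Count defects per category across the corpus, including zero counts."""
    counts = Counter(d for r in records for d in r.defects)
    return {d.value: counts[d] for d in Defect}


# Illustrative (made-up) records, for demonstration only.
records = [
    ClaimRecord("Latency is under 200 ms", "benchmark-log, section 2"),
    ClaimRecord("All reviewers agree", None, [Defect.PHANTOM_CONSENSUS]),
]
summary = summarise(records)
```

Keeping the citation and the defect tags on the same record means a claim cannot be marked verified without a source, which mirrors the "no claim without a verification trace" principle stated earlier.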

Who it is for

Organisations that need independent assurance on an AI system or AI-generated analytical output before a significant decision is made. Typical contexts: a procurement team assessing an AI vendor's capability claims against its own evidence requirements; a legal or compliance function requiring a verification record aligned to DORA Article 28 requirements or EU AI Act obligations; an investment team conducting technical due diligence on an AI-native company or product.

All regulatory language in this engagement and its deliverables is written in the calibration register: designed to support conformity assessment and materially reduce audit risk. Claims are stated as engineered properties with evidence citations, not as absolute compliance guarantees.

Pricing

Fixed price, scoped at initiation based on corpus size, verification depth, and reporting requirements. Minimum engagement: one structured Karpathy-6 pass across a defined document or model output set.

Commission an Evaluation Audit

Brief Chris directly

Start the conversation

No SDR, no preliminary questionnaire — a direct conversation about your context and your mandate.

Prefer email? info@bluwingu.com

Direct booking launching soon

While we wire the scheduling link, email info@bluwingu.com — we respond within one working day.