The Specification as Quality Gate: Three Hypotheses on AI-Assisted Code Review
Christo Zietsman

TL;DR
This paper critically examines AI-assisted code review, arguing that it is limited when no external specification exists, and proposes a structured approach: write specifications first, verify against them deterministically, and reserve AI review for residual defects.
Contribution
It introduces three hypotheses on the limitations of AI code review, supported by empirical evidence, and proposes an architecture prioritizing specifications and deterministic verification.
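The ordering described above (specifications first, deterministic verification next, AI review last) might be sketched as follows. This is a minimal illustration, not the paper's implementation: `GateResult`, `run_gate`, `spec_checks`, and `residual_review` are hypothetical names, and the reviewer is a plain callable standing in for whatever AI review step is used.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class GateResult:
    passed: bool
    spec_failures: List[str] = field(default_factory=list)
    review_notes: List[str] = field(default_factory=list)


def run_gate(
    candidate: Any,
    spec_checks: List[Tuple[str, Callable[[Any], bool]]],
    residual_review: Callable[[Any], List[str]],
) -> GateResult:
    """Hypothetical gate: executable specifications run first and decide pass/fail.

    The reviewer is only consulted once every specification check passes,
    so its notes concern whatever the specifications do not cover.
    """
    failures = [name for name, check in spec_checks if not check(candidate)]
    if failures:
        # Deterministic failure: no review is requested for code that violates the spec.
        return GateResult(passed=False, spec_failures=failures)
    return GateResult(passed=True, review_notes=residual_review(candidate))


# Illustrative usage with a toy candidate and a stub reviewer (both assumptions).
def add(a: int, b: int) -> int:
    return a + b


checks = [
    ("adds two positives", lambda f: f(2, 3) == 5),
    ("zero is identity", lambda f: f(7, 0) == 7),
]
result = run_gate(add, checks, lambda f: ["consider documenting overflow behaviour"])
```

In this sketch the pass/fail decision never depends on the reviewer's output, which keeps the reference for correctness external to both the generated code and the review.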
Findings
Correlated errors in homogeneous LLM pipelines tend to echo rather than cancel.
Executable specifications enable a domain transition from complex to complicated.
AI review effectively targets residual defect classes outside specifications.
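The first finding can be made concrete with a small probability sketch. Assume two checkers (for example, a generator's self-check and a reviewer) that each miss a given defect with the same probability `p_miss`, and let `rho` be the correlation between their miss events. Both the model and the numbers below are illustrative assumptions, not results from the paper.

```python
def joint_miss_probability(p_miss: float, rho: float) -> float:
    """Probability that two checkers both miss the same defect.

    Models each miss as a Bernoulli event with equal rate p_miss and
    correlation rho between the two events, which gives
    P(both miss) = p^2 + rho * p * (1 - p).
    """
    if not 0.0 <= p_miss <= 1.0:
        raise ValueError("p_miss must be in [0, 1]")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1] for this sketch")
    return p_miss ** 2 + rho * p_miss * (1.0 - p_miss)


# Hypothetical miss rate of 30% per checker.
independent = joint_miss_probability(0.3, 0.0)  # errors can cancel
correlated = joint_miss_probability(0.3, 0.5)   # errors partly echo
identical = joint_miss_probability(0.3, 1.0)    # second checker adds nothing
```

Under these assumptions, independent checkers let the defect through about 9% of the time, while fully correlated checkers let it through 30% of the time, the same as a single checker. The closer `rho` is to 1, the less a second review from a similar model contributes.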
Abstract
The dominant industry response to AI-generated code quality problems is to deploy AI reviewers. This paper argues that this response is structurally circular when executable specifications are absent: without an external reference, both the generating agent and the reviewing agent reason from the same artefact, share the same training distribution, and exhibit correlated failures. The review checks code against itself, not against intent. Three hypotheses are developed. First, that correlated errors in homogeneous LLM pipelines echo rather than cancel, a claim supported by convergent empirical evidence from multiple 2025-2026 studies and by three small contrived experiments reported here. The first two experiments are same-family (Claude reviewing Claude-generated code); the third extends to a cross-family panel of four models from three families. All use a planted bug corpus rather…
