Are LLMs Reliable Code Reviewers? Systematic Overcorrection in Requirement Conformance Judgement

Haolin Jin; Huaming Chen

arXiv:2603.00539·cs.SE·March 3, 2026

Open Access

TL;DR

This paper reveals that large language models often misjudge whether code complies with natural language requirements, especially when prompts are more detailed, exposing reliability issues in LLM-based code review systems.

Contribution

The study systematically uncovers LLM failures in code requirement conformance judgment and proposes a fix-guided verification method to improve reliability.

Findings

01

LLMs frequently misclassify correct code as non-compliant.

02

More detailed prompts increase misjudgment rates.

03

The proposed verification filter improves review accuracy.
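
One way to make the first two findings concrete is to measure how often a reviewer model rejects code that is known to be correct. The sketch below is only an illustration under stated assumptions: the `Verdict` record, its field names, and the `false_rejection_rate` helper are hypothetical and do not come from the paper, and the toy data is invented rather than taken from its results.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """One review outcome for a single code sample (hypothetical record layout)."""
    is_correct: bool        # ground truth, e.g. the sample passes the benchmark's tests
    judged_compliant: bool  # the reviewer model's verdict against the requirement


def false_rejection_rate(verdicts):
    """Share of ground-truth-correct samples that the reviewer flagged as non-compliant."""
    correct = [v for v in verdicts if v.is_correct]
    if not correct:
        return 0.0  # no correct samples, so nothing could be falsely rejected
    rejected = sum(1 for v in correct if not v.judged_compliant)
    return rejected / len(correct)


# Toy data, invented for illustration only (not results from the paper).
sample = [
    Verdict(is_correct=True, judged_compliant=True),
    Verdict(is_correct=True, judged_compliant=False),
    Verdict(is_correct=True, judged_compliant=False),
    Verdict(is_correct=False, judged_compliant=False),
]
print(false_rejection_rate(sample))
```

Computing this rate separately for each prompt variant (for example, a verdict-only prompt versus one that also asks for explanations and corrections) would be one way to compare misjudgment rates across prompt designs.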

Abstract

Large language models (LLMs) have become essential tools in software development, widely used for requirements engineering, code generation, and review tasks. Software engineers often rely on LLMs to verify whether code implementations satisfy task requirements, thereby ensuring code robustness and accuracy. However, it remains unclear whether LLMs can reliably judge code against the given task descriptions, which usually take the form of natural language specifications. In this paper, we uncover a systematic failure of LLMs in matching code to natural language requirements. Specifically, using widely adopted benchmarks and a unified prompt design, we demonstrate that LLMs frequently misclassify correct code implementations as non-compliant or defective. Surprisingly, we find that more detailed prompt design, particularly when it requires explanations and proposed corrections, leads to…

Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices · Model-Driven Software Engineering Techniques