Defective Task Descriptions in LLM-Based Code Generation: Detection and Analysis

Amal Akli; Mike Papadakis; Maxime Cordy; Yves Le Traon

arXiv:2604.24703 · cs.SE · April 28, 2026

TL;DR

This paper introduces SpecValidator, a lightweight, parameter-efficiently fine-tuned classifier that detects defects in the task descriptions given to LLMs for code generation, and evaluates it on three benchmarks with descriptions of varying structure and complexity.

Contribution

Develops SpecValidator, a parameter-efficiently fine-tuned classifier for automatically detecting defective task descriptions, which outperforms much larger models in accuracy and generalization.

Findings

01

SpecValidator achieves F1 = 0.804 and MCC = 0.745 in defect detection.

02

It significantly outperforms GPT-5-mini (F1 = 0.469, MCC = 0.281) and Claude Sonnet 4 (F1 = 0.518, MCC = 0.359).

03

Benchmarks whose task descriptions carry richer context make code generation more robust to description defects.

Abstract

Large language models are widely used for code generation, yet they rely on an implicit assumption that task descriptions are sufficiently detailed and well-formed. In practice, however, users may provide defective descriptions, which can strongly affect code correctness. To address this issue, we develop SpecValidator, a lightweight classifier based on a small, parameter-efficiently fine-tuned model, to automatically detect task description defects. We evaluate SpecValidator on three types of defects (Lexical Vagueness, Under-Specification, and Syntax-Formatting) across three benchmarks with task descriptions of varying structure and complexity. Our results show that SpecValidator achieves defect detection of F1 = 0.804 and MCC = 0.745, significantly outperforming GPT-5-mini (F1 = 0.469 and MCC = 0.281) and Claude Sonnet 4 (F1 = 0.518 and MCC = 0.359). Perhaps more…
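The results above are reported as F1 and the Matthews correlation coefficient (MCC). As a rough illustration of what these two numbers measure, the sketch below computes both from a binary confusion matrix. It assumes each task description is labeled 1 (defective) or 0 (well-formed); the paper's actual label scheme and any per-defect-type averaging are not stated here, and the function name `f1_and_mcc` is hypothetical.

```python
import math
from typing import Sequence


def f1_and_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Return (F1, MCC) for binary labels.

    Assumption for illustration: 1 = defective task description, 0 = well-formed.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    # It ignores true negatives entirely.
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0

    # MCC uses all four cells, ranges from -1 to 1, and stays informative
    # when defective and well-formed descriptions are imbalanced.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return f1, mcc


if __name__ == "__main__":
    # Toy example: 4 defective and 4 well-formed descriptions,
    # with one missed defect (FN) and one false alarm (FP).
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    predictions = [1, 1, 1, 0, 0, 0, 0, 1]
    print(f1_and_mcc(labels, predictions))  # (0.75, 0.5)
```

On this toy data the same two errors give F1 = 0.75 but MCC = 0.5, which shows why MCC, being centered at 0 for chance-level prediction, tends to read lower than F1 for the same classifier.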
