DeCon: Detecting Incorrect Assertions via Postconditions Generated by a Large Language Model
Hao Yu, Tianyu Chen, Jiaming Huang, Zongyang Li, Dezhi Ran, Xinyu Wang, Ying Li, Assaf Marron, David Harel, Yuan Xie, Tao Xie

TL;DR
DeCon detects incorrect assertions in LLM-generated test cases by using LLM-generated postconditions together with a small set of I/O examples.
Contribution
DeCon introduces a method that uses LLM-generated postconditions and I/O examples to identify incorrect assertions in test cases generated by LLMs.
Findings
Detects over 64% of incorrect assertions
Improves code generation effectiveness by 4% in terms of Pass@1
Assertions that survive filtering retain high fault-finding ability
Abstract
Recently, given the docstring for the target problem and the target function signature, large language models (LLMs) have been used not only to generate source code, but also to generate test cases, consisting of test inputs and assertions (e.g., in the form of checking an actual output against the expected output). However, as shown by our empirical study on assertions generated by four LLMs for the HumanEval benchmark, over 62% of the generated assertions are incorrect (i.e., failed on the ground-truth problem solution). To detect incorrect assertions (given the docstring and the target function signature along with a sample of example inputs and outputs), in this paper, we propose a new approach named DeCon to effectively detect incorrect assertions via LLM-generated postconditions for the target problem (a postcondition is a predicate that must always be true just after the…
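As an illustration of the idea in the abstract, the Python sketch below shows one way postconditions could flag a suspect assertion. Everything in it is hypothetical and not taken from the paper: the target problem (return the sorted distinct elements of a list), the helper names `filter_postconditions` and `is_assertion_suspect`, and the hand-written candidate postconditions that stand in for LLM output. In particular, it assumes that the I/O examples are used to discard candidate postconditions that fail on known-correct pairs. The abstract as shown here does not confirm that detail.

```python
from typing import Callable, List, Tuple

# A postcondition is modeled as a predicate over (input, output).
Postcondition = Callable[[List[int], List[int]], bool]
IOPair = Tuple[List[int], List[int]]


def filter_postconditions(
    candidates: List[Postcondition], examples: List[IOPair]
) -> List[Postcondition]:
    """Keep only candidates that hold on every trusted I/O example (assumed step)."""
    return [p for p in candidates if all(p(x, y) for x, y in examples)]


def is_assertion_suspect(
    test_input: List[int], expected: List[int], postconditions: List[Postcondition]
) -> bool:
    """Flag an assertion whose expected output violates any kept postcondition."""
    return any(not p(test_input, expected) for p in postconditions)


# Hypothetical target problem: return the sorted distinct elements of a list.
# These predicates stand in for LLM-generated postconditions.
candidates: List[Postcondition] = [
    lambda xs, out: out == sorted(out),         # output is sorted
    lambda xs, out: len(set(out)) == len(out),  # output has no duplicates
    lambda xs, out: set(out) == set(xs),        # same elements as the input
    lambda xs, out: len(out) == len(xs),        # deliberately wrong candidate
]

# Small set of known-correct I/O examples.
examples: List[IOPair] = [([3, 1, 3], [1, 3]), ([], [])]
kept = filter_postconditions(candidates, examples)

# Candidate assertions as (input, expected output) pairs.
assertions: List[IOPair] = [
    ([2, 2, 1], [1, 2]),  # consistent with all kept postconditions
    ([5, 4], [5, 4]),     # expected output is not sorted
    ([1, 1], [1, 1]),     # expected output contains a duplicate
]
flags = [is_assertion_suspect(x, y, kept) for x, y in assertions]
print(len(kept), flags)
```

In this sketch the deliberately wrong candidate is dropped because it fails on the first example, and the last two assertions are flagged. A check like this can only flag assertions that contradict a postcondition. An incorrect assertion that happens to satisfy every kept postcondition would go undetected.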
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
