Beyond Code Generation: Assessing Code LLM Maturity with Postconditions
Fusen He, Juan Zhai, Minxue Pan

TL;DR
This paper introduces a new maturity model for code LLMs based on postcondition generation, providing a more comprehensive evaluation of their understanding and generation capabilities beyond traditional code synthesis benchmarks.
Contribution
The paper proposes a novel maturity model for code LLMs centered on postcondition generation, expanding evaluation beyond code writing to include understanding and semantic capabilities.
Findings
Open-source models need significant improvements in understanding code semantics.
The postcondition benchmark reveals gaps in current code LLM capabilities.
The augmented EvalPlus dataset enables more comprehensive assessment of code LLMs.
Abstract
Most existing code Large Language Model (LLM) benchmarks, e.g., EvalPlus, focus on code generation tasks. Namely, they contain a natural language description of a problem and ask the LLM to write code to solve the problem. We argue that they do not capture all capabilities needed to assess the quality of a code LLM. In this paper, we propose a code LLM maturity model, based on the postcondition generation problem, to assess a more complete set of code LLM capabilities. We choose the postcondition generation problem as it requires the code LLM to understand the code, including its semantics and the accompanying natural language, and also to have the capability to generate unambiguous postconditions in programming languages (i.e., the generation capability). Moreover, postconditions have various types, requiring different levels of these capabilities, making the problem suitable for evaluating the maturity of the code LLM.…
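To make the notion of a postcondition concrete, here is a minimal illustrative sketch in Python. It is not taken from the paper or from EvalPlus: the function `sort_numbers`, the two checkers `post_weak` and `post_strong`, and the faulty `buggy_sort` are all hypothetical names chosen for illustration. The sketch assumes a postcondition is written as an executable predicate over a function's input and output, and shows how a shallow postcondition can accept a faulty implementation that a more complete one rejects.

```python
from collections import Counter


def sort_numbers(xs):
    """Return the elements of xs in ascending order."""
    return sorted(xs)


def buggy_sort(xs):
    # Hypothetical faulty implementation: returns the input unchanged.
    return list(xs)


def post_weak(xs, result):
    # Shallow postcondition: only checks that no elements were added or lost by count.
    return len(result) == len(xs)


def post_strong(xs, result):
    # Fuller postcondition: output is ordered AND is a permutation of the input.
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    same_elements = Counter(result) == Counter(xs)
    return ordered and same_elements


data = [3, 1, 2]
assert post_weak(data, sort_numbers(data))
assert post_strong(data, sort_numbers(data))
assert post_weak(data, buggy_sort(data))        # the weak check misses the bug
assert not post_strong(data, buggy_sort(data))  # the strong check catches it
```

Writing `post_strong` requires understanding what the code is meant to do (ordering and element preservation), not just its signature, which is the kind of semantic understanding the abstract says postcondition generation exercises.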
Taxonomy
Topics: Artificial Intelligence in Law · Law, AI, and Intellectual Property
