NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness
Manav Singhal, Tushar Aggarwal, Abhijeet Awasthi, Nagarajan Natarajan, Aditya Kanade

TL;DR
NoFunEval introduces a new benchmark to evaluate code language models on non-functional requirements like security and efficiency, revealing their limitations beyond functional correctness and questioning their understanding of real-world software needs.
Contribution
The paper presents NoFunEval, a novel benchmark for assessing code LMs on non-functional requirements and introduces the Coding Concepts prompting method for better domain knowledge communication.
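As a rough illustration of the idea behind Coding Concepts prompting, the sketch below assembles a prompt that pairs a non-functional requirement with developer-supplied concept hints. This is not the paper's actual prompt template: the function name `build_coco_style_prompt`, the prompt wording, and the example hints are all hypothetical, chosen only to show how domain knowledge might be passed to a model alongside the code to edit.

```python
# Hypothetical sketch: NOT the prompt format used in NoFunEval.
# It only illustrates passing developer-supplied concept hints to a code LM.

def build_coco_style_prompt(source_code: str, requirement: str, concepts: list[str]) -> str:
    """Combine a non-functional requirement, concept hints, and the code to edit."""
    # Render each hint as a bullet so the model sees them as separate items.
    hint_lines = "\n".join(f"- {concept}" for concept in concepts)
    return (
        f"Requirement: {requirement}\n"
        f"Relevant coding concepts:\n{hint_lines}\n"
        f"Code to edit:\n{source_code}\n"
        "Rewrite the code to satisfy the requirement without changing its behavior."
    )


# Example inputs (invented for illustration).
prompt = build_coco_style_prompt(
    source_code="def total(xs):\n    s = 0\n    for i in range(len(xs)):\n        s += xs[i]\n    return s",
    requirement="Improve runtime efficiency.",
    concepts=["built-in sum", "avoid index-based loops"],
)
print(prompt)
```

The resulting string could then be sent to any code LM; how the hints are actually worded and selected in the benchmark is described in the paper itself.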
Findings
Code LMs perform poorly on non-functional requirement tasks.
Even on functional correctness tasks, classification accuracy is surprisingly low.
The results highlight fundamental blindspots in current training setups of code LMs.
Abstract
Existing evaluation benchmarks of language models of code (code LMs) focus almost exclusively on whether the LMs can generate functionally-correct code. In real-world software engineering, developers think beyond functional correctness. They have requirements on "how" a functionality should be implemented to meet overall system design objectives like efficiency, security, and maintainability. They would also trust the code LMs more if the LMs demonstrate robust understanding of such requirements. We propose a new benchmark NoFunEval to evaluate code LMs on non-functional requirements and simple classification instances for both functional and non-functional requirements. We propose a prompting method, Coding Concepts (CoCo), as a way for a developer to communicate the domain knowledge to the LMs. We conduct an extensive evaluation of 27 code LMs. Our finding is that LMs generally…
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software Engineering Techniques and Practices
