ConCodeEval: Evaluating Large Language Models for Code Constraints in Domain-Specific Languages
Mehant Kammakomati, Sameer Pimparkhede, Srikanth Tamilselvam, Prince Kumar, Pushpak Bhattacharyya

TL;DR
ConCodeEval is a benchmark designed to evaluate large language models' ability to understand and adhere to code constraints in domain-specific languages like JSON and YAML, revealing significant challenges in controllability.
Contribution
This work introduces the first benchmark for assessing LLMs' understanding of code constraints in domain-specific languages across multiple representations.
Findings
LLMs struggle with code constraints in DSLs.
High performance in normal code tasks does not translate to constraint understanding.
LLMs show limited controllability over code constraints.
Abstract
Recent work shows that Large Language Models (LLMs) struggle to understand natural language constraints for various text generation tasks in zero- and few-shot settings. In the code domain, meanwhile, constraints expressed in code format are widely used to maintain the integrity of code written in Domain-Specific Languages (DSLs) such as JSON and YAML, which are common in system-level programming tasks in enterprises. Given that LLMs are increasingly used for system-level code tasks, evaluating whether they can comprehend these code constraints is crucial. However, no work has evaluated their controllability over code constraints. Hence, we introduce ConCodeEval, a first-of-its-kind benchmark with two novel tasks for code constraints across five representations. Our findings suggest that language models struggle with code constraints. Code languages that perform excellently for normal…
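To make the notion of a "code constraint" concrete, here is a minimal sketch of a constraint written in a JSON Schema-like style and a check of JSON samples against it. This is not the benchmark's own tooling or task format; the schema, its field names (`name`, `replicas`), and the `violations` helper are hypothetical and cover only a small subset of schema keywords.

```python
import json

# Hypothetical constraint in a JSON Schema-like style (illustrative only,
# not taken from the ConCodeEval benchmark).
SCHEMA = {
    "type": "object",
    "required": ["name", "replicas"],
    "properties": {
        "name": {"type": "string", "minLength": 1},
        "replicas": {"type": "integer", "minimum": 1, "maximum": 5},
    },
}

# Only the three types used by the sketch are supported.
TYPE_MAP = {"object": dict, "string": str, "integer": int}


def violations(sample, schema):
    """Return a list of human-readable constraint violations (empty if valid)."""
    expected = TYPE_MAP[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly for integers.
    if not isinstance(sample, expected) or (expected is int and isinstance(sample, bool)):
        return [f"expected {schema['type']}"]

    errors = []
    if expected is dict:
        for key in schema.get("required", []):
            if key not in sample:
                errors.append(f"missing required field '{key}'")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in sample:
                errors.extend(f"{key}: {e}" for e in violations(sample[key], sub_schema))
    elif expected is str:
        if len(sample) < schema.get("minLength", 0):
            errors.append(f"shorter than minLength {schema['minLength']}")
    elif expected is int:
        if "minimum" in schema and sample < schema["minimum"]:
            errors.append(f"below minimum {schema['minimum']}")
        if "maximum" in schema and sample > schema["maximum"]:
            errors.append(f"above maximum {schema['maximum']}")
    return errors


valid = json.loads('{"name": "web", "replicas": 3}')
invalid = json.loads('{"name": "web", "replicas": 9}')
print(violations(valid, SCHEMA))
print(violations(invalid, SCHEMA))
```

A model that has understood the constraint should produce samples for which such a check returns no violations, and should be able to tell a conforming sample from a violating one.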
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
