CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation
Kaiwen Yan, Hongcheng Guo, Xuanqing Shi, Shaosheng Cao, Donglin Di, Zhoujun Li

TL;DR
CodeIF is a comprehensive benchmark designed to evaluate large language models' ability to follow instructions across various code generation tasks, highlighting current strengths and limitations in automated programming.
Contribution
This paper introduces CodeIF, the first dedicated benchmark for assessing instruction-following capabilities of LLMs in diverse code generation scenarios.
Findings
LLMs show strong performance in function synthesis
Models struggle with complex debugging tasks
Instruction adherence varies across tasks and models
Abstract
With the rapid advancement of Large Language Models (LLMs), the demand for robust instruction-following capabilities in code generation tasks has grown significantly. Code generation not only facilitates faster prototyping and automated testing, but also augments developer efficiency through improved maintainability and reusability of code. In this paper, we introduce CodeIF, the first benchmark specifically designed to assess the abilities of LLMs to adhere to task-oriented instructions within diverse code generation scenarios. CodeIF encompasses a broad range of tasks, including function synthesis, error debugging, algorithmic refactoring, and code explanation, thereby providing a comprehensive suite to evaluate model performance across varying complexity levels and programming domains. We conduct extensive experiments with LLMs, analyzing their strengths and limitations in meeting…
Taxonomy
Topics: Topic Modeling · Machine Learning and Algorithms · Software Engineering Research
Methods: ALIGN
