CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation

Kaiwen Yan; Hongcheng Guo; Xuanqing Shi; Shaosheng Cao; Donglin Di; Zhoujun Li

arXiv:2502.19166 · cs.SE · August 5, 2025

TL;DR

CodeIF is a comprehensive benchmark designed to evaluate large language models' ability to follow instructions across various code generation tasks, highlighting current strengths and limitations in automated programming.

Contribution

This paper introduces CodeIF, the first dedicated benchmark for assessing instruction-following capabilities of LLMs in diverse code generation scenarios.

Findings

01. LLMs show strong performance in function synthesis.
02. Models struggle with complex debugging tasks.
03. Instruction adherence varies across tasks and models.

Abstract

With the rapid advancement of Large Language Models (LLMs), the demand for robust instruction-following capabilities in code generation tasks has grown significantly. Code generation not only facilitates faster prototyping and automated testing, but also augments developer efficiency through improved maintainability and reusability of code. In this paper, we introduce CodeIF, the first benchmark specifically designed to assess the abilities of LLMs to adhere to task-oriented instructions within diverse code generation scenarios. CodeIF encompasses a broad range of tasks, including function synthesis, error debugging, algorithmic refactoring, and code explanation, thereby providing a comprehensive suite to evaluate model performance across varying complexity levels and programming domains. We conduct extensive experiments with LLMs, analyzing their strengths and limitations in meeting…

Code & Models

Repositories

lin-rany/codeIF (Official)

Taxonomy

Topics: Topic Modeling · Machine Learning and Algorithms · Software Engineering Research

Methods: ALIGN