CFBench: A Comprehensive Constraints-Following Benchmark for LLMs

Tao Zhang; Chenglin Zhu; Yanjun Shen; Wenjing Luo; Yan Zhang; Hao Liang; Tao Zhang; Fan Yang; Mingan Lin; Yujing Qiao; Weipeng Chen; Bin Cui; Wentao Zhang; Zenan Zhou

arXiv:2408.01122·cs.CL·May 7, 2025

TL;DR

CFBench is a large-scale benchmark designed to evaluate LLMs on their ability to comprehensively follow diverse real-world constraints across multiple NLP tasks, addressing limitations of previous fragmented assessments.

Contribution

The paper introduces CFBench, a comprehensive and systematic benchmark with 1,000 samples covering diverse constraints and scenarios, along with an advanced evaluation methodology for LLMs.

Findings

1. Current LLMs show significant room for improvement in constraints following.

2. The benchmark reveals varying performance across different constraint types.

3. Evaluation methodology aligns better with user perceptions of constraint adherence.

Abstract

The adeptness of Large Language Models (LLMs) in comprehending and following natural language instructions is critical for their deployment in sophisticated real-world applications. Existing evaluations mainly focus on fragmented constraints or narrow scenarios, but they overlook the comprehensiveness and authenticity of constraints from the user's perspective. To bridge this gap, we propose CFBench, a large-scale Comprehensive Constraints Following Benchmark for LLMs, featuring 1,000 curated samples that cover more than 200 real-life scenarios and over 50 NLP tasks. CFBench meticulously compiles constraints from real-world instructions and constructs an innovative systematic framework for constraint types, which includes 10 primary categories and over 25 subcategories, and ensures each constraint is seamlessly integrated within the instructions. To make certain that the evaluation of…
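
To make the idea of typed constraints concrete, the following is a minimal sketch of how one benchmark sample and its per-constraint judgments might be represented and aggregated. It is an illustration only, not the CFBench data format or scoring code: the class names, the field names, the category labels ("format", "length"), and both aggregation functions are hypothetical, and the `satisfied` flags are assumed to come from some external checker.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Constraint:
    category: str      # hypothetical primary-category label, e.g. "format"
    description: str   # the constraint as stated in the instruction
    satisfied: bool    # judgment supplied by an external checker (assumed)


@dataclass
class Sample:
    instruction: str
    constraints: list[Constraint] = field(default_factory=list)


def satisfaction_by_category(samples):
    """Fraction of satisfied constraints, grouped by category label."""
    met = defaultdict(int)
    total = defaultdict(int)
    for sample in samples:
        for c in sample.constraints:
            total[c.category] += 1
            met[c.category] += int(c.satisfied)
    return {cat: met[cat] / total[cat] for cat in total}


def fully_satisfied_rate(samples):
    """Fraction of samples in which every constraint is satisfied."""
    if not samples:
        return 0.0
    ok = sum(all(c.satisfied for c in s.constraints) for s in samples)
    return ok / len(samples)
```

Grouping by category corresponds to the per-constraint-type comparison mentioned in the findings, while the stricter per-sample rate reflects whether a response meets all of its constraints at once.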

Code & Models

Repositories

pku-baichuan-mlsystemlab/cfbench (official)

Taxonomy

Topics: Semantic Web and Ontologies · Mathematics, Computing, and Information Processing

Methods: Focus