Benchmarking Complex Instruction-Following with Multiple Constraints Composition
Bosi Wen, Pei Ke, Xiaotao Gu, Lindong Wu, Hao Huang, Jinfeng Zhou, Wenchuang Li, Binxin Hu, Wendy Gao, Jiaxin Xu, Yiming Liu, Jie Tang, Hongning Wang, Minlie Huang

TL;DR
ComplexBench is a comprehensive benchmark designed to evaluate large language models' ability to follow complex instructions with multiple constraints, addressing a gap in existing evaluation methods by considering constraint composition.
Contribution
The paper introduces a hierarchical taxonomy of complex instructions (4 constraint types, 19 constraint dimensions, and 4 composition types), a high-quality dataset built on that taxonomy, and an augmented evaluation method intended to make scoring more reliable.
Findings
Existing LLMs show significant deficiencies in handling complex instructions with multiple constraints.
ComplexBench effectively identifies limitations in current models' instruction-following capabilities.
The benchmark provides a structured approach to assess the composition of constraints in instructions.
Abstract
Instruction following is one of the fundamental capabilities of large language models (LLMs). As the ability of LLMs is constantly improving, they have been increasingly applied to deal with complex human instructions in real-world scenarios. Therefore, how to evaluate the ability of complex instruction-following of LLMs has become a critical research problem. Existing benchmarks mainly focus on modeling different types of constraints in human instructions while neglecting the composition of different constraints, which is an indispensable constituent in complex instructions. To this end, we propose ComplexBench, a benchmark for comprehensively evaluating the ability of LLMs to follow complex instructions composed of multiple constraints. We propose a hierarchical taxonomy for complex instructions, including 4 constraint types, 19 constraint dimensions, and 4 composition types, and…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
Methods: Focus
