Benchmarking Complex Instruction-Following with Multiple Constraints Composition

Bosi Wen; Pei Ke; Xiaotao Gu; Lindong Wu; Hao Huang; Jinfeng Zhou; Wenchuang Li; Binxin Hu; Wendy Gao; Jiaxin Xu; Yiming Liu; Jie Tang; Hongning Wang; Minlie Huang

arXiv:2407.03978·cs.CL·November 1, 2024


TL;DR

ComplexBench is a comprehensive benchmark designed to evaluate large language models' ability to follow complex instructions with multiple constraints, addressing a gap in existing evaluation methods by considering constraint composition.

Contribution

The paper introduces a hierarchical taxonomy of complex instructions and a high-quality dataset built on it, along with a rule-augmented, LLM-based evaluation method that makes scoring more reliable.

Findings

1. Existing LLMs show significant deficiencies in handling complex instructions with multiple constraints.
2. ComplexBench effectively identifies limitations in current models' instruction-following capabilities.
3. The benchmark provides a structured approach to assessing how constraints are composed within instructions.

Abstract

Instruction following is one of the fundamental capabilities of large language models (LLMs). As the capabilities of LLMs continue to improve, they are increasingly applied to complex human instructions in real-world scenarios. Therefore, evaluating how well LLMs follow complex instructions has become a critical research problem. Existing benchmarks mainly focus on modeling different types of constraints in human instructions while neglecting the composition of different constraints, which is an indispensable constituent of complex instructions. To this end, we propose ComplexBench, a benchmark for comprehensively evaluating the ability of LLMs to follow complex instructions composed of multiple constraints. We propose a hierarchical taxonomy for complex instructions, including 4 constraint types, 19 constraint dimensions, and 4 composition types, and…
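The abstract frames a complex instruction as several constraints joined by composition types. As a rough illustration of that idea, the sketch below represents an instruction as a tree: leaves are individually checkable constraints and inner nodes combine them. The node kinds `"and"` and `"chain"`, the example checks, and the rule that a failed earlier step in a chain voids the later steps are all hypothetical assumptions chosen for illustration; they are not ComplexBench's actual taxonomy, data format, or scoring (see thu-coai/complexbench for the official code).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Constraint:
    """Leaf node: one verifiable requirement (hypothetical rule-based check)."""
    name: str
    check: Callable[[str], bool]


@dataclass
class Composition:
    """Inner node combining child requirements.

    kind is a hypothetical label used only in this sketch:
      "and"   -> children are independent; each is judged on its own.
      "chain" -> children are ordered; a later child only counts if every
                 earlier child in the chain was satisfied.
    """
    kind: str
    children: List["Node"] = field(default_factory=list)


Node = Union[Constraint, Composition]


def evaluate(node: Node, response: str) -> List[bool]:
    """Return one pass/fail judgment per leaf constraint, in tree order."""
    if isinstance(node, Constraint):
        return [node.check(response)]
    if node.kind not in ("and", "chain"):
        raise ValueError(f"unknown composition kind: {node.kind!r}")
    results: List[bool] = []
    for child in node.children:
        child_results = evaluate(child, response)
        if node.kind == "chain" and not all(results):
            # An earlier step in the chain failed, so this step is marked failed too.
            child_results = [False] * len(child_results)
        results.extend(child_results)
    return results


def score(node: Node, response: str) -> float:
    """Fraction of leaf constraints judged satisfied (0.0 for an empty tree)."""
    results = evaluate(node, response)
    return sum(results) / len(results) if results else 0.0


# Hypothetical instruction: "Mention Python; then answer in under 20 words
# and end with a period."
instruction = Composition("chain", [
    Constraint("mentions_python", lambda r: "Python" in r),
    Composition("and", [
        Constraint("under_20_words", lambda r: len(r.split()) < 20),
        Constraint("ends_with_period", lambda r: r.rstrip().endswith(".")),
    ]),
])

print(evaluate(instruction, "Python is a popular language."))  # [True, True, True]
print(evaluate(instruction, "Rust is a systems language."))    # [False, False, False]
```

The per-leaf list, rather than a single boolean, keeps partial credit visible: a response that satisfies some constraints but not others still scores between 0 and 1, while the chain rule shows one way dependencies between constraints could affect the final score.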


Code & Models

Repositories

thu-coai/complexbench (official)

Videos

Benchmarking Complex Instruction-Following with Multiple Constraints Composition (SlidesLive)

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning

Methods: Focus