CCR-Bench: A Comprehensive Benchmark for Evaluating LLMs on Complex Constraints, Control Flows, and Real-World Cases

Xiaona Xue; Yiqiao Huang; Jiacheng Li; Yuanhang Zheng; Huiqi Miao; Yunfei Ma; Rui Liu; Xinbao Sun; Minglu Liu; Fanyu Meng; Chao Deng; Junlan Feng

arXiv:2603.07886·cs.CL·March 10, 2026

TL;DR

CCR-Bench is a new benchmark that rigorously evaluates large language models on their ability to understand and execute complex, real-world instructions involving intricate constraints, control flows, and task decompositions.

Contribution

This paper introduces CCR-Bench, a comprehensive and realistic benchmark that captures the high-dimensional complexity of real-world instructions for evaluating LLMs.

Findings

01. State-of-the-art models show significant performance gaps on CCR-Bench.

02. CCR-Bench reveals the limitations of current LLMs in handling complex instructions.

03. Real-world industrial scenarios are effectively represented in the benchmark.

Abstract

Enhancing the ability of large language models (LLMs) to follow complex instructions is critical for their deployment in real-world applications. However, existing evaluation methods often oversimplify instruction complexity as a mere additive combination of atomic constraints, failing to adequately capture the high-dimensional complexity arising from the intricate interplay of content and format, logical workflow control, and real-world applications. This leads to a significant gap between current evaluation practices and practical demands. To bridge this gap, we introduce CCR-Bench, a novel benchmark designed to assess LLMs' adherence to complex instructions. CCR-Bench is characterized by: (1) deep entanglement of content and formatting requirements in task specifications; (2) instructions that involve intricate task decomposition, conditional reasoning, and procedural planning; and…

Code & Models

Datasets

JT-LM/CCR-Bench (dataset, 37 downloads)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications