CCTU: A Benchmark for Tool Use under Complex Constraints
Junjie Ye, Guoqiang Zhang, Wenjie Fu, Tao Gui, Qi Zhang, Xuanjing Huang

TL;DR
CCTU is a comprehensive benchmark designed to evaluate large language models' ability to use tools under complex, multi-faceted constraints, revealing significant limitations in current models' adherence and self-refinement capabilities.
Contribution
We introduce CCTU, a novel benchmark with detailed constraint categories and validation tools, enabling systematic evaluation of LLM tool use under complex constraints.
Findings
Models rarely achieve high success rates under strict constraints.
Models violate constraints in over 50% of cases, especially in the resource and response dimensions.
Models show limited self-refinement even with detailed feedback.
Abstract
Solving problems through tool use under explicit constraints constitutes a highly challenging yet unavoidable scenario for large language models (LLMs), requiring capabilities such as function calling, instruction following, and self-refinement. However, progress has been hindered by the absence of dedicated evaluations. To address this, we introduce CCTU, a benchmark for evaluating LLM tool use under complex constraints. CCTU is grounded in a taxonomy of 12 constraint categories spanning four dimensions (i.e., resource, behavior, toolset, and response). The benchmark comprises 200 carefully curated and challenging test cases across diverse tool-use scenarios, each involving an average of seven constraint types and an average prompt length exceeding 4,700 tokens. To enable reliable evaluation, we develop an executable constraint validation module that performs step-level validation and…
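To make the idea of step-level validation concrete, here is a minimal Python sketch. It is not CCTU's validation module: the names `Step`, `Constraint`, and `validate_step_by_step`, and the two example constraints, are hypothetical and chosen only for illustration. The sketch assumes each constraint can be expressed as a predicate over the tool-call trajectory so far, and it re-checks every constraint after each step so that a violation is attributed to the step that caused it.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One tool call in a trajectory (hypothetical structure)."""
    tool: str
    arguments: dict


@dataclass
class Constraint:
    """A named constraint tagged with one of the four dimensions."""
    name: str
    dimension: str  # "resource", "behavior", "toolset", or "response"
    check: Callable[[List[Step]], bool]  # True if the prefix satisfies it


def validate_step_by_step(
    trajectory: List[Step], constraints: List[Constraint]
) -> List[Tuple[int, str, str]]:
    """Check every constraint after each step; return (step index, name, dimension)."""
    violations = []
    for i in range(1, len(trajectory) + 1):
        prefix = trajectory[:i]
        for c in constraints:
            if not c.check(prefix):
                violations.append((i - 1, c.name, c.dimension))
    return violations


# Illustrative constraints (assumptions, not taken from the benchmark).
ALLOWED_TOOLS = {"search", "fetch"}
constraints = [
    Constraint("max_tool_calls", "resource", lambda steps: len(steps) <= 2),
    Constraint("allowed_tools", "toolset",
               lambda steps: steps[-1].tool in ALLOWED_TOOLS),
]

trajectory = [
    Step("search", {"query": "weather"}),
    Step("fetch", {"url": "https://example.com"}),
    Step("delete", {"id": 1}),
]

violations = validate_step_by_step(trajectory, constraints)
for step_index, name, dimension in violations:
    print(f"step {step_index}: violated {name} ({dimension})")
```

In this toy trajectory the third call both exceeds the call budget and uses a tool outside the allowed set, so both violations are reported at step index 2. A per-step report of this kind is the sort of signal that could be fed back to a model when testing self-refinement.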
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · AI in Service Interactions
