AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
Yunjia Qi, Hao Peng, Xiaozhi Wang, Amy Xin, Youfeng Liu, Bin Xu, Lei Hou, Juanzi Li

TL;DR
This paper introduces AgentIF, a benchmark that evaluates how well large language models follow long, complex instructions in realistic agentic scenarios, and shows that current models follow such instructions poorly.
Contribution
The paper presents the first benchmark for systematically assessing LLM instruction following in agentic tasks. It consists of instructions drawn from 50 real-world agentic applications, together with detailed evaluation metrics.
Findings
Current LLMs perform poorly on complex constraints.
Models struggle with tool specifications and lengthy instructions.
Error analysis reveals specific failure modes.
Abstract
Large Language Models (LLMs) have demonstrated advanced capabilities in real-world agentic applications. Growing research efforts aim to develop LLM-based agents to address practical demands, introducing a new challenge: agentic scenarios often involve lengthy instructions with complex constraints, such as extended system prompts and detailed tool specifications. While adherence to such instructions is crucial for agentic applications, whether LLMs can reliably follow them remains underexplored. In this paper, we introduce AgentIF, the first benchmark for systematically evaluating LLM instruction following ability in agentic scenarios. AgentIF features three key characteristics: (1) Realistic, constructed from 50 real-world agentic applications. (2) Long, averaging 1,723 words with a maximum of 15,630 words. (3) Complex, averaging 11.9 constraints per instruction, covering diverse…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
