How Many Instructions Can LLMs Follow at Once?
Daniel Jaroslawicz, Brendan Whiting, Parth Shah, Karime Maamari

TL;DR
This paper introduces IFScale, a benchmark of 500 keyword-inclusion instructions that measures how well large language models follow many instructions simultaneously. It reveals how performance degrades as instruction density increases and informs prompt design.
Contribution
The paper presents IFScale, a new benchmark for high-density instruction-following evaluation, and analyzes model performance and error patterns at large instruction counts.
Findings
Even the best models achieve only 68% accuracy at 500 instructions.
Model size and reasoning capability correlate with three distinct performance degradation patterns.
Models are biased towards earlier instructions and exhibit distinct categories of instruction-following errors.
Abstract
Production-grade LLM systems require robust adherence to dozens or even hundreds of instructions simultaneously. However, the instruction-following capabilities of LLMs at high instruction densities have not yet been characterized, as existing benchmarks only evaluate models on tasks with a single or few instructions. We introduce IFScale, a simple benchmark of 500 keyword-inclusion instructions for a business report writing task to measure how instruction-following performance degrades as instruction density increases. We evaluate 20 state-of-the-art models across seven major providers and find that even the best frontier models only achieve 68% accuracy at the max density of 500 instructions. Our analysis reveals model size and reasoning capability to correlate with 3 distinct performance degradation patterns, bias towards earlier instructions, and distinct categories of…
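The abstract describes scoring a generated business report against a list of keyword-inclusion instructions. A minimal sketch of how such a score might be computed is shown here. The function name `keyword_accuracy`, the whole-word case-insensitive matching rule, and the sample keywords are assumptions made for illustration; the paper's own matching rules may differ.

```python
import re


def keyword_accuracy(report: str, keywords: list[str]) -> float:
    """Return the fraction of keywords found in the report.

    Assumption: a keyword counts as followed if it appears as a whole
    word, ignoring case. This is an illustrative rule, not necessarily
    the one used by IFScale.
    """
    if not keywords:
        raise ValueError("keywords must not be empty")
    hits = 0
    for kw in keywords:
        # re.escape guards against keywords containing regex metacharacters.
        pattern = rf"\b{re.escape(kw)}\b"
        if re.search(pattern, report, flags=re.IGNORECASE):
            hits += 1
    return hits / len(keywords)


# Hypothetical example: two of the four required keywords appear.
sample_report = "Our revenue grew while liquidity held steady."
sample_keywords = ["revenue", "liquidity", "compliance", "audit"]
sample_score = keyword_accuracy(sample_report, sample_keywords)
```

Under this rule, a report that included 340 of 500 required keywords would score 0.68, which is how a figure like the 68% quoted above could be read.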
Taxonomy
Topics: Artificial Intelligence in Law · Digital Rights Management and Security · Mathematics, Computing, and Information Processing
