How Many Instructions Can LLMs Follow at Once?

Daniel Jaroslawicz; Brendan Whiting; Parth Shah; Karime Maamari

arXiv:2507.11538 · cs.AI · July 16, 2025

Open Access

TL;DR

This paper introduces IFScale, a benchmark of up to 500 simultaneous instructions that measures how well large language models follow many instructions at once. It characterizes how performance degrades as the instruction count grows, with implications for prompt design.

Contribution

The paper presents IFScale, a new benchmark for evaluating instruction following at high instruction densities, and analyzes model performance and error patterns at large instruction counts.

Findings

01  Even the best frontier models achieve only 68% accuracy at the maximum density of 500 instructions.

02  Model size and reasoning capability correlate with three distinct performance degradation patterns.

03  Models favor earlier instructions and exhibit specific categories of errors.

Abstract

Production-grade LLM systems require robust adherence to dozens or even hundreds of instructions simultaneously. However, the instruction-following capabilities of LLMs at high instruction densities have not yet been characterized, as existing benchmarks only evaluate models on tasks with a single or few instructions. We introduce IFScale, a simple benchmark of 500 keyword-inclusion instructions for a business report writing task to measure how instruction-following performance degrades as instruction density increases. We evaluate 20 state-of-the-art models across seven major providers and find that even the best frontier models only achieve 68% accuracy at the max density of 500 instructions. Our analysis reveals model size and reasoning capability to correlate with 3 distinct performance degradation patterns, bias towards earlier instructions, and distinct categories of…
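To make the keyword-inclusion metric concrete, the following is a minimal sketch of how accuracy on such a task could be scored. It is not the authors' evaluation code: the function names, the case-insensitive whole-word matching rule, and the sample report are all assumptions made for illustration.

```python
import re


def keyword_inclusion_accuracy(report: str, keywords: list[str]) -> float:
    """Return the fraction of keywords found in the report.

    Assumption: a keyword counts as included when it appears as a whole
    word, ignoring case. The benchmark's real matching rule may differ.
    """
    if not keywords:
        raise ValueError("keywords must be non-empty")
    text = report.lower()
    hits = 0
    for keyword in keywords:
        # Lookarounds stop "margin" from matching inside "marginal".
        pattern = r"(?<!\w)" + re.escape(keyword.lower()) + r"(?!\w)"
        if re.search(pattern, text):
            hits += 1
    return hits / len(keywords)


def omitted_keywords(report: str, keywords: list[str]) -> list[str]:
    """List the keywords the report failed to include, in instruction order."""
    return [k for k in keywords if keyword_inclusion_accuracy(report, [k]) == 0.0]


# Hypothetical example: four instructions, two of them satisfied.
sample_report = "Revenue grew while margin held steady."
sample_keywords = ["revenue", "margin", "liquidity", "audit"]
score = keyword_inclusion_accuracy(sample_report, sample_keywords)
missing = omitted_keywords(sample_report, sample_keywords)
```

Under these assumptions, a score of 0.68 at 500 instructions would mean 340 of the 500 requested keywords appeared in the generated report. Keeping the omitted keywords in instruction order also makes it possible to check whether misses cluster toward later instructions.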

Taxonomy

Topics: Artificial Intelligence in Law · Digital Rights Management and Security · Mathematics, Computing, and Information Processing