The Instruction Gap: LLMs get lost in Following Instruction

Vishesh Tripathi; Uday Allu; Biddwan Ahmed

arXiv:2601.03269·cs.CL·January 8, 2026

Open Access

TL;DR

This paper evaluates 13 leading LLMs and finds significant variability in instruction adherence. It highlights a critical "instruction gap" that affects enterprise deployment and provides benchmarks for future improvements.

Contribution

The paper systematically assesses instruction compliance across models, identifying the extent of the instruction gap and establishing benchmarks for enterprise-ready LLM performance.

Findings

01 Claude-Sonnet-4 and GPT-5 perform best in instruction following

02 Instruction adherence varies dramatically across models

03 The instruction gap poses a challenge for enterprise deployment

Abstract

Large Language Models (LLMs) have shown remarkable capabilities in natural language understanding and generation, yet their deployment in enterprise environments reveals a critical limitation: inconsistent adherence to custom instructions. This study presents a comprehensive evaluation of 13 leading LLMs across instruction compliance, response accuracy, and performance metrics in real-world RAG (Retrieval-Augmented Generation) scenarios. Through systematic testing with samples and enterprise-grade evaluation protocols, we demonstrate that instruction following varies dramatically across models, with Claude-Sonnet-4 and GPT-5 achieving the highest results. Our findings reveal the "instruction gap": a fundamental challenge in which models excel at general tasks but struggle with the precise instruction adherence required for enterprise deployment. This work provides practical insights for…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Machine Learning in Materials Science