Language models fail at extended rule following
Tianxiang Dai, Jonathan Fan

TL;DR
Large language models cannot reliably apply a rule over many repeated steps, as in counting, because they track it with a finite set of internal states, which points to a need for new architectures for autonomous agents.
Contribution
This paper demonstrates the limitations of current language models in rule-following tasks and identifies internal state exhaustion as the core failure mode.
Findings
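All 126 model variants tested fail to count accurately above a model-dependent, syntax-sensitive threshold.
Failures are abrupt and persist despite larger models, more inference-time computation, and external tools.
Mechanistic probing indicates that models count with a finite number of internal states and fail once those states are exhausted.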
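As a toy illustration of the last finding, the sketch below counts with a fixed number of states and goes wrong once they run out. It is not the paper's probing method or a model of any real network: the function name `finite_state_count`, the choice of 64 states, and the assumption that the counter saturates at its last state are all hypothetical, chosen only to make the failure pattern concrete.

```python
def finite_state_count(text: str, target: str, num_states: int) -> int:
    """Count occurrences of `target` using only `num_states` distinct states.

    States are 0 .. num_states - 1. Once the last state is reached the
    counter cannot advance. Saturation is an assumption made for this
    sketch; the source does not say how models behave past the threshold.
    """
    if num_states < 1:
        raise ValueError("num_states must be at least 1")
    state = 0
    for ch in text:
        if ch == target and state < num_states - 1:
            state += 1
    return state


if __name__ == "__main__":
    # With 64 states the largest representable count is 63.
    for n in (5, 63, 64, 100):
        text = "a" * n
        print(n, finite_state_count(text, "a", 64), text.count("a"))
```

The counter is exact for every input it has a state for and wrong for every longer one, so accuracy drops abruptly rather than degrading gradually.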
Abstract
Large language models are highly capable of answering difficult questions by retrieving, recombining, and attending to information in long contexts. For agentic tasks, an additional capability is required: the preservation of an exact state while repeatedly applying rules. We find that this reliability is absent across language models. To demonstrate, we query 126 leading model variants with the task of counting a long string of repeated characters, and we find that none can count accurately above a model-dependent, syntax-sensitive counting capacity threshold. Failures are abrupt and persist even with increasing model size, inference-time computation, and external tools. Mechanistic probing indicates that models use a finite number of internal states to mimic counting as a rule and fail once these states are exhausted. Furthermore, such states are the basis for performing complex…
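The counting task described in the abstract could be probed with a small harness like the one below. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the prompt wording, the helper names (`build_prompt`, `parse_count`, `first_failure`), and the `mock_model` stand-in with its cutoff of 50 are all hypothetical. A real experiment would replace `mock_model` with a call to an actual model.

```python
import re
from typing import Callable, Iterable, Optional


def build_prompt(char: str, n: int) -> str:
    # Hypothetical prompt format: question, the repeated string, an instruction.
    return (
        f"How many times does the character '{char}' appear in the following string?\n"
        f"{char * n}\n"
        "Answer with a single integer."
    )


def parse_count(reply: str) -> Optional[int]:
    # Take the first integer found in the reply, if any.
    match = re.search(r"-?\d+", reply)
    return int(match.group()) if match else None


def first_failure(
    ask_model: Callable[[str], str], char: str, lengths: Iterable[int]
) -> Optional[int]:
    """Return the first tested length the model miscounts, or None if all pass."""
    for n in lengths:
        if parse_count(ask_model(build_prompt(char, n))) != n:
            return n
    return None


def mock_model(prompt: str) -> str:
    # Hypothetical stand-in for a model: counts correctly up to 50, then stalls.
    body = prompt.split("\n")[1]
    return str(min(len(body), 50))


if __name__ == "__main__":
    print(first_failure(mock_model, "a", range(10, 101, 10)))
```

The scan is linear rather than a binary search, so it does not assume that every length above the first failure also fails. Because the abstract describes the threshold as syntax-sensitive, the same scan could be repeated with different characters or prompt formats.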
