Scaling Laws for State Dynamics in Large Language Models
Jacob X Li, Shreyas S Raman, Jessica Wan, Fahad Samman, Jazlyn Lin

TL;DR
This paper investigates how well large language models can track and predict internal state transitions across various formal tasks, revealing limitations in accuracy and the distributed nature of state representation.
Contribution
It provides a systematic evaluation of LLMs' ability to model deterministic state dynamics and identifies specific attention heads involved in state propagation.
Findings
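Next-state prediction accuracy degrades as the state space grows and as transitions become sparser.
GPT-2 XL reaches about 70% accuracy in low-complexity settings but drops below 30% when the number of boxes exceeds 5 or the number of states exceeds 10.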
State information is propagated by specific attention heads, but joint state-action reasoning is weak.
Abstract
Large Language Models (LLMs) are increasingly used in tasks requiring internal state tracking, yet their ability to model state transition dynamics remains poorly understood. We evaluate how well LLMs capture deterministic state dynamics across three domains: Box Tracking, Abstract DFA Sequences, and Complex Text Games, each formalizable as a finite-state system. Across tasks, we find that next-state prediction accuracy degrades with increasing state-space size and sparse transitions. GPT-2 XL reaches about 70% accuracy in low-complexity settings but drops below 30% when the number of boxes or states exceeds 5 or 10, respectively. In DFA tasks, Pythia-1B fails to exceed 50% accuracy when there are more than 10 states and fewer than 30 transitions. Through activation patching, we identify attention heads responsible for propagating state information: GPT-2 XL Layer 22 Head 20, and Pythia-1B Heads…
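To make the DFA setting concrete, here is a minimal sketch of a sparse deterministic finite-state system and a next-state accuracy metric. It is an illustration under stated assumptions, not the paper's benchmark: the function names, the random construction of the transition table, and the example sizes (12 states, 25 transitions) are all hypothetical, and the paper may count "transitions" differently.

```python
import random


def make_sparse_dfa(num_states, num_symbols, num_transitions, seed=0):
    """Build a partial transition table {(state, symbol): next_state}.

    Hypothetical construction: pick `num_transitions` distinct
    (state, symbol) pairs at random and give each a random target state.
    """
    rng = random.Random(seed)
    all_pairs = [(s, a) for s in range(num_states) for a in range(num_symbols)]
    if num_transitions > len(all_pairs):
        raise ValueError("more transitions requested than (state, symbol) pairs")
    chosen = rng.sample(all_pairs, num_transitions)
    return {pair: rng.randrange(num_states) for pair in chosen}


def sample_trajectory(delta, start, length, rng):
    """Walk the DFA from `start`, following only defined transitions.

    Returns a list of (state, symbol, next_state) triples. The walk stops
    early if it reaches a state with no outgoing transition.
    """
    state, steps = start, []
    for _ in range(length):
        options = [a for (s, a) in delta if s == state]
        if not options:
            break
        symbol = rng.choice(options)
        nxt = delta[(state, symbol)]
        steps.append((state, symbol, nxt))
        state = nxt
    return steps


def next_state_accuracy(predict, steps):
    """Fraction of steps where predict(state, symbol) equals the true next state."""
    if not steps:
        return 0.0
    correct = sum(predict(s, a) == nxt for s, a, nxt in steps)
    return correct / len(steps)


# Assumed sizes, chosen only to fall in the "more than 10 states,
# fewer than 30 transitions" regime mentioned in the abstract.
delta = make_sparse_dfa(num_states=12, num_symbols=4, num_transitions=25, seed=1)
steps = sample_trajectory(delta, start=0, length=20, rng=random.Random(1))
oracle_accuracy = next_state_accuracy(lambda s, a: delta[(s, a)], steps)
```

In an actual evaluation, `predict` would wrap a language model that is prompted with the transition history and asked for the next state; the oracle here only checks that the metric is wired correctly.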
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Softmax · Attention Dropout · Linear Layer · Residual Connection · Byte Pair Encoding · Weight Decay
