Scaling Laws for State Dynamics in Large Language Models

Jacob X Li; Shreyas S Raman; Jessica Wan; Fahad Samman; Jazlyn Lin

arXiv:2505.14892·cs.CL·May 22, 2025

Open Access

TL;DR

This paper investigates how well large language models track and predict state transitions in three finite-state task families (Box Tracking, Abstract DFA Sequences, and Complex Text Games). It finds that accuracy is limited and that state representation is distributed across the model.

Contribution

It provides a systematic evaluation of LLMs' ability to model deterministic state dynamics and identifies specific attention heads involved in state propagation.

Findings

01 Next-state prediction accuracy decreases with larger state spaces and sparser transitions.

02 GPT-2 XL reaches about 70% accuracy in low-complexity settings but drops below 30% when the number of boxes exceeds 5 or the number of states exceeds 10.

03 State information is propagated by specific attention heads, but joint state-action reasoning is weak.

Abstract

Large Language Models (LLMs) are increasingly used in tasks requiring internal state tracking, yet their ability to model state transition dynamics remains poorly understood. We evaluate how well LLMs capture deterministic state dynamics across three domains: Box Tracking, Abstract DFA Sequences, and Complex Text Games, each formalizable as a finite-state system. Across tasks, we find that next-state prediction accuracy degrades with increasing state-space size and sparser transitions. GPT-2 XL reaches about 70% accuracy in low-complexity settings but drops below 30% when the number of boxes exceeds 5 or the number of states exceeds 10. In DFA tasks, Pythia-1B fails to exceed 50% accuracy when there are more than 10 states and fewer than 30 transitions. Through activation patching, we identify attention heads responsible for propagating state information: GPT-2 XL Layer 22 Head 20, and Pythia-1B Heads…
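To make the DFA setting in the abstract concrete, here is a minimal sketch of a next-state prediction task over a finite-state system. It is not the authors' code: the function names (`make_random_dfa`, `next_state`, `run`, `accuracy`) are hypothetical, and treating "sparse transitions" as a partial transition table with few defined (state, symbol) pairs is an assumption made for illustration.

```python
import random


def make_random_dfa(num_states, alphabet, num_transitions, seed=0):
    """Build a partial deterministic transition table (hypothetical helper).

    Exactly `num_transitions` distinct (state, symbol) pairs get a target
    state; all other pairs are left undefined, which models sparsity.
    """
    rng = random.Random(seed)
    pairs = [(s, a) for s in range(num_states) for a in alphabet]
    if num_transitions > len(pairs):
        raise ValueError("more transitions requested than (state, symbol) pairs")
    chosen = rng.sample(pairs, num_transitions)
    return {pair: rng.randrange(num_states) for pair in chosen}


def next_state(dfa, state, symbol):
    """Return the successor state, or None if the transition is undefined."""
    return dfa.get((state, symbol))


def run(dfa, start, symbols):
    """Follow `symbols` from `start`, stopping at the first undefined transition."""
    trace = [start]
    state = start
    for symbol in symbols:
        successor = next_state(dfa, state, symbol)
        if successor is None:
            break
        state = successor
        trace.append(state)
    return trace


def accuracy(predictions, targets):
    """Fraction of positions where the predicted next state matches the target."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)
```

Under these assumptions, a model's predicted next states for a batch of (state, symbol) prompts would be compared against `next_state` outputs with `accuracy`; varying `num_states` and `num_transitions` gives the kind of complexity sweep the abstract describes.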

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Softmax · Attention Dropout · Linear Layer · Residual Connection · Byte Pair Encoding · Weight Decay