Chess as a Testbed for Language Model State Tracking
Shubham Toshniwal, Sam Wiseman, Karen Livescu, Kevin Gimpel

TL;DR
This paper explores using chess game notation as a benchmark for evaluating how well transformer language models can track game states, revealing their capabilities and limitations in a constrained, deterministic domain.
Contribution
It introduces chess as a controlled testbed for assessing how well transformer models track world state, and shows that both full attention and the amount of training data matter for that ability.
Findings
Transformers can learn to track chess pieces and predict moves with enough data.
Access to board state information during training improves performance on small training sets.
Full attention is crucial; approximations cause significant performance drops.
Abstract
Transformer language models have made tremendous strides in natural language understanding tasks. However, the complexity of natural language makes it challenging to ascertain how accurately these models are tracking the world state underlying the text. Motivated by this issue, we consider the task of language modeling for the game of chess. Unlike natural language, chess notations describe a simple, constrained, and deterministic domain. Moreover, we observe that the appropriate choice of chess notation allows for directly probing the world state, without requiring any additional probing-related machinery. We find that: (a) With enough training data, transformer language models can learn to track pieces and predict legal moves with high accuracy when trained solely on move sequences. (b) For small training sets, providing access to board state information during training can yield…
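To make "tracking the world state" from move sequences concrete, the following is a minimal Python sketch of the ground-truth bookkeeping a model would have to reproduce implicitly. It is an illustration, not the authors' code. It assumes moves are written as four-character from-square/to-square strings such as `e2e4` (a coordinate-style notation chosen here for simplicity), and it ignores castling, en passant, and promotion. The names `initial_board`, `apply_move`, and `track` are hypothetical.

```python
# Sketch only: replay coordinate-style moves to track which piece sits on each square.
# Assumptions: each move is "<from><to>" (e.g. "e2e4"); castling, en passant,
# and promotion are NOT handled; move legality is not checked.

FILES = "abcdefgh"
BACK_RANK = "RNBQKBNR"  # piece letters on files a..h of the back rank


def initial_board():
    """Return the standard starting position as {square: color + piece}."""
    board = {}
    for i, f in enumerate(FILES):
        board[f + "1"] = "w" + BACK_RANK[i]
        board[f + "2"] = "wP"
        board[f + "7"] = "bP"
        board[f + "8"] = "b" + BACK_RANK[i]
    return board


def apply_move(board, move):
    """Move the piece on the source square to the target square (captures overwrite)."""
    src, dst = move[:2], move[2:4]
    if src not in board:
        raise ValueError(f"no piece on {src}")
    board[dst] = board.pop(src)


def track(moves):
    """Replay a move sequence from the starting position and return the final board."""
    board = initial_board()
    for m in moves:
        apply_move(board, m)
    return board


board = track(["e2e4", "e7e5", "g1f3", "b8c6", "f1b5"])
print(board["b5"])      # piece that ended up on b5
print(board.get("f1"))  # None if the square is now empty
```

With a notation of this kind, asking "which piece is on the square a move starts from?" becomes a question about the board state itself, which is the sort of direct probe the abstract describes.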
Taxonomy
Topics: Sports Analytics and Performance · Topic Modeling · Natural Language Processing Techniques
