Learning State-Tracking from Code Using Linear RNNs
Julien Siems, Riccardo Grazzi, Kirill Kalinin, Hitesh Ballani, Babak Rahmani

TL;DR
This paper shows that linear RNNs capable of state-tracking excel at tracking state in code, particularly on permutation-composition tasks, while Transformers struggle. The result underscores how much model architecture matters for sequence understanding.
Contribution
The paper introduces a method for converting permutation composition into code via REPL traces, and shows that linear RNNs outperform Transformers in this setting.
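The REPL-trace idea can be illustrated with a minimal Python sketch. The paper's exact trace format is not reproduced on this page, so every detail below is an assumption made for illustration: the variable name `x`, the list-comprehension encoding of each permutation, the hypothetical helpers `apply_perm` and `make_repl_trace`, and the `reveal_every` parameter that controls how often a `print(x)` reveals the state. The point is that the result is ordinary text. A language model trained with next-token prediction can only predict each printed line correctly if it tracks the permuted list.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: a permutation of n items is a tuple of indices.
Perm = Tuple[int, ...]


def apply_perm(state: List[int], perm: Perm) -> List[int]:
    """Position i of the new state receives the item at position perm[i]."""
    return [state[j] for j in perm]


def make_repl_trace(perms: Sequence[Perm], reveal_every: int = 2) -> str:
    """Render permutations as a hypothetical Python REPL session.

    Each permutation becomes a reassignment of `x`; after every
    `reveal_every` actions, a print(x) call and its output reveal the state.
    """
    n = len(perms[0])
    state = list(range(n))
    lines = [f">>> x = {state}"]
    for t, perm in enumerate(perms, start=1):
        state = apply_perm(state, perm)
        # The emitted code performs exactly what apply_perm just did.
        lines.append(f">>> x = [x[i] for i in {perm}]")
        if t % reveal_every == 0:
            lines.append(">>> print(x)")
            lines.append(str(state))  # the state reveal the model must predict
    return "\n".join(lines)


if __name__ == "__main__":
    perms = [(1, 0, 2, 3, 4), (0, 2, 3, 4, 1), (4, 3, 2, 1, 0)]
    print(make_repl_trace(perms, reveal_every=2))
    # >>> x = [0, 1, 2, 3, 4]
    # >>> x = [x[i] for i in (1, 0, 2, 3, 4)]
    # >>> x = [x[i] for i in (0, 2, 3, 4, 1)]
    # >>> print(x)
    # [1, 2, 3, 4, 0]
    # >>> x = [x[i] for i in (4, 3, 2, 1, 0)]
```

With `reveal_every=2`, only the state after the second action is printed. The states after the first and third actions must be inferred from the transformations alone.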
Findings
Linear RNNs capable of state-tracking also excel at state-tracking in code.
Transformers still fail at permutation composition when it is expressed as code.
Tracking state in code is generally difficult because actions are not always fully observable.
Abstract
Over the last few years, state-tracking tasks, particularly permutation composition, have become a testbed for understanding the limits of sequence model architectures such as Transformers and RNNs (linear and non-linear). However, these are often sequence-to-sequence tasks: learning to map actions (permutations) to states, which is incompatible with the next-token prediction setting commonly used to train language models. We address this gap by converting permutation composition into code via REPL traces that interleave state reveals (through prints) with variable transformations. We show that linear RNNs capable of state-tracking also excel in this setting, while Transformers still fail. Motivated by this representation, we investigate why tracking states in code is generally difficult: actions are not always fully observable. We frame this as tracking the state of a probabilistic finite-state…
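For comparison, here is a minimal Python sketch of the sequence-to-sequence formulation the abstract describes. Every input action (a permutation) is paired with a state label that the model must emit at the same position. The helper names `compose` and `seq2seq_example`, the convention that applying a permutation `p` sends the item at position `p[i]` to position `i`, and the lexicographic token ids are all illustrative assumptions, not the paper's setup.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Perm = Tuple[int, ...]


def compose(p: Perm, q: Perm) -> Perm:
    """Permutation equivalent to applying p first and then q.

    Convention (assumed): applying perm to a state gives new[i] = old[perm[i]],
    so applying p then q gives new[i] = old[p[q[i]]].
    """
    return tuple(p[j] for j in q)


def seq2seq_example(actions: Sequence[Perm]) -> Tuple[List[int], List[int]]:
    """Return (input ids, target ids) for one sequence-to-sequence example.

    Token ids index the n! permutations of range(n) in lexicographic order;
    the target at step t is the composition of actions 1..t, i.e. the state.
    """
    n = len(actions[0])
    vocab: Dict[Perm, int] = {p: i for i, p in enumerate(permutations(range(n)))}
    state: Perm = tuple(range(n))  # identity permutation = initial state
    inputs: List[int] = []
    targets: List[int] = []
    for a in actions:
        state = compose(state, a)
        inputs.append(vocab[tuple(a)])
        targets.append(vocab[state])
    return inputs, targets


if __name__ == "__main__":
    inputs, targets = seq2seq_example([(1, 0, 2), (0, 2, 1)])
    print(inputs, targets)  # [2, 1] [2, 3]
```

In this setup, the targets form a separate label stream aligned one-to-one with the inputs. That is what makes the task awkward to express as plain next-token prediction.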
