The UNDO Flip-Flop: A Controlled Probe for Reversible Semantic State Management in State Space Model
Hongxu Zhou

TL;DR
This paper introduces the UNDO Flip-Flop task to evaluate whether state space models can reliably learn reversible semantic state retrieval, revealing systematic failures in current models despite their theoretical expressivity.
Contribution
The paper presents a new benchmark task that isolates reversible state retrieval in state space models, highlighting the gap between theoretical capacity and practical learnability.
Findings
Models fail to learn the stack-based rollback mechanism.
Two-layer models collapse under adversarial retraction, achieving below-chance accuracy.
Retrieval, not storage, is the bottleneck in learning reversible states.
Abstract
State space models (SSMs) have been shown to possess the theoretical capacity to model both star-free sequential tasks and bounded hierarchical structures (Sarrof et al., 2024). However, formal expressivity results do not guarantee that gradient-based optimisation will reliably discover the corresponding solutions. Existing benchmarks probe either monotonic state tracking, as in the standard Flip-Flop task, or structural nesting, as in the Dyck languages, but neither isolates reversible semantic state retrieval. We introduce the UNDO Flip-Flop task to fill this gap. By extending the standard Flip-Flop with an UNDO operation, the task requires a model to maintain an implicit bounded stack and recover historical states under non-monotonic update sequences. We evaluate one-layer and two-layer Mamba-2 under this framework. Both variants fail to acquire the provably expressible stack-based rollback…
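To make the task description concrete, the sketch below shows one way a reference oracle for such a task could be written. It is an illustration only, not the paper's specification: the instruction names (`w`, `r`, `i`, `u`), the depth bound `max_depth`, the choice to forget the oldest value on overflow, and the behaviour of UNDO on an empty stack are all assumptions made here.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction set (an assumption, not taken from the paper):
#   ("w", bit)  write: push a new bit onto the stack
#   ("r", None) read: emit the current (top) bit
#   ("i", bit)  ignore: a distractor that must not change the state
#   ("u", None) undo: roll back to the value before the last write
Op = Tuple[str, Optional[int]]


def undo_flip_flop(ops: List[Op], max_depth: int = 4) -> List[Optional[int]]:
    """Return the expected answer for every read in `ops`.

    `max_depth` bounds the stack; when it is exceeded the oldest value
    is forgotten (an assumed policy for this sketch).
    """
    stack: List[int] = []
    outputs: List[Optional[int]] = []
    for op, bit in ops:
        if op == "w":
            if bit is None:
                raise ValueError("write needs a bit")
            stack.append(bit)
            if len(stack) > max_depth:
                stack.pop(0)  # drop the oldest stored value
        elif op == "u":
            if stack:
                stack.pop()  # retract the most recent write
        elif op == "r":
            # None marks "nothing stored" (assumed convention)
            outputs.append(stack[-1] if stack else None)
        elif op == "i":
            pass  # distractor: state is unchanged
        else:
            raise ValueError(f"unknown instruction: {op!r}")
    return outputs


example = [("w", 1), ("w", 0), ("r", None), ("u", None),
           ("r", None), ("i", 1), ("r", None)]
print(undo_flip_flop(example))  # [0, 1, 1]
```

In the example, the second read must return the bit written before the retracted write, which is the non-monotonic retrieval step that a standard Flip-Flop never exercises.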
