State space models can express n-gram languages
Vinoth Nandakumar, Qiang Qu, Peng Mi, Tongliang Liu

TL;DR
This paper shows, both theoretically and experimentally, that state space models (SSMs) can encode the rules of n-gram languages. This demonstrates their expressiveness and suggests advantages over traditional n-gram models for next-word prediction.
Contribution
The paper provides a theoretical framework proving that SSMs can encode n-gram rules and shows how their context window can be controlled, helping to bridge the gap between rule-based and neural language models.
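To make "SSMs can encode n-gram rules" concrete, here is a minimal sketch under our own assumptions. It is not the paper's construction. The state of a linear SSM, h_t = A h_{t-1} + B x_t, is a shift register holding the last n-1 one-hot tokens, and a lookup readout applies a deterministic n-gram rule. The vocabulary size, n, the rule `rule`, and the helpers `run_ssm` and `predict` are all hypothetical.

```python
import numpy as np

# Hypothetical toy sizes (not from the paper): vocabulary V, n-gram order n.
V, n = 3, 3
k = n - 1          # the next-word rule depends on the last k tokens
d = k * V          # state = k stacked one-hot blocks; block 0 is the newest token

# A copies block i-1 into block i; the oldest block is dropped.
# A is nilpotent (A^k = 0), so every eigenvalue of A is 0.
A = np.zeros((d, d))
for i in range(1, k):
    A[i * V:(i + 1) * V, (i - 1) * V:i * V] = np.eye(V)

# B writes the current one-hot token into block 0.
B = np.zeros((d, V))
B[:V, :] = np.eye(V)

def rule(context):
    """Hypothetical deterministic n-gram rule: next token = sum(context) mod V."""
    return sum(context) % V

def run_ssm(tokens):
    """Linear recurrence h_t = A h_{t-1} + B x_t with one-hot inputs x_t."""
    h = np.zeros(d)
    states = []
    for tok in tokens:
        h = A @ h + B @ np.eye(V)[tok]
        states.append(h.copy())
    return states

def predict(h):
    """Readout: decode the last k tokens (oldest first) and apply the rule."""
    blocks = h.reshape(k, V)[::-1]
    context = tuple(int(np.argmax(b)) for b in blocks)
    return rule(context)

# Generate a sequence from the rule, then check the SSM predicts every next token.
seq = [0, 1]
while len(seq) < 20:
    seq.append(rule(tuple(seq[-k:])))

states = run_ssm(seq)
preds = [predict(states[t]) for t in range(k - 1, len(seq) - 1)]
print(preds == seq[k:])  # True
```

In this sketch only the readout is nonlinear (an argmax lookup). The recurrence itself is linear, and because A is nilpotent, any token older than n-1 steps has no effect on the state.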
Findings
New theoretical results show that SSMs can encode n-gram rules.
The spectrum of the state transition matrix controls the context window.
Experiments apply SSMs to data generated from n-gram rules.
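One way to see why the spectrum of the state transition matrix governs the context window: in h_t = A h_{t-1} + B x_t, an input seen k steps ago enters the state through A^k. For a diagonal mode with eigenvalue lam, that weight is lam**k. The sketch below is our illustration of this intuition, not the paper's theorem. The tolerance `eps` and the helper `effective_window` are hypothetical choices.

```python
import math
import numpy as np

def effective_window(lam, eps=1e-3):
    """Smallest k with |lam|**k <= eps: roughly how many past steps a
    diagonal SSM mode with eigenvalue lam still 'remembers'."""
    r = abs(lam)
    if r >= 1:
        return math.inf          # no decay: unbounded memory (and blow-up if r > 1)
    if r == 0:
        return 1                 # lam**k == 0 for every k >= 1
    return math.ceil(math.log(eps) / math.log(r))

# Eigenvalues closer to 1 in magnitude give a longer context window.
for lam in (0.5, 0.9, 0.99):
    print(lam, effective_window(lam))

# A nilpotent shift matrix has all eigenvalues 0 and a hard window:
# A^k is exactly zero once k reaches the matrix size.
m = 4
shift = np.eye(m, k=-1)          # ones on the first subdiagonal
print([int(np.count_nonzero(np.linalg.matrix_power(shift, k))) for k in range(m + 1)])
```

Decaying eigenvalues give a soft window whose length grows as |lam| approaches 1. A nilpotent matrix instead cuts memory off exactly after m steps, which matches the fixed context of an n-gram rule.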
Abstract
Recent advancements in recurrent neural networks (RNNs) have reinvigorated interest in their application to natural language processing tasks, particularly with the development of more efficient and parallelizable variants known as state space models (SSMs), which have shown competitive performance against transformer models while maintaining a lower memory footprint. While RNNs and SSMs (e.g., Mamba) have been empirically more successful than rule-based systems based on n-gram models, a rigorous theoretical explanation for this success has not yet been developed, as it is unclear how these models encode the combinatorial rules that govern the next-word prediction task. In this paper, we construct state space language models that can solve the next-word prediction task for languages generated from n-gram rules, thereby showing that the former are more expressive. Our proof shows how…
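For intuition about "languages generated from n-gram rules", here is a minimal sketch of how such data and the matching next-word prediction pairs could be produced. We assume each rule maps an (n-1)-token context to a set of allowed next tokens. The paper's exact rule format is not stated here, and `make_rules`, `generate`, `next_word_pairs`, and all parameter values are hypothetical.

```python
import itertools
import random

def make_rules(vocab_size, n, max_allowed=2, seed=0):
    """Hypothetical n-gram rule set: each (n-1)-token context allows a small,
    non-empty set of next tokens (requires max_allowed <= vocab_size)."""
    rng = random.Random(seed)
    rules = {}
    for ctx in itertools.product(range(vocab_size), repeat=n - 1):
        allowed = rng.sample(range(vocab_size), rng.randint(1, max_allowed))
        rules[ctx] = sorted(allowed)
    return rules

def generate(rules, n, length, seed=0):
    """Sample a sequence in which every token after the first n-1 obeys the rules."""
    rng = random.Random(seed)
    seq = list(rng.choice(list(rules)))          # random starting context
    while len(seq) < length:
        seq.append(rng.choice(rules[tuple(seq[-(n - 1):])]))
    return seq

def next_word_pairs(seq, n):
    """(context, next token) pairs for the next-word prediction task."""
    return [(tuple(seq[i - (n - 1):i]), seq[i]) for i in range(n - 1, len(seq))]

rules = make_rules(vocab_size=5, n=3)
seq = generate(rules, n=3, length=50)
pairs = next_word_pairs(seq, n=3)
print(all(tok in rules[ctx] for ctx, tok in pairs))  # True
```

A model solves this task if, for each context, it predicts a token from the allowed set. Because the allowed set depends only on the last n-1 tokens, a state that retains exactly those tokens is sufficient.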
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
