Rethinking Selectivity in State Space Models: A Minimal Predictive Sufficiency Approach
Yiyi Wang, Jian'an Zhang, Hongyi Duan, Haoyang Liu, Qingyang Li

TL;DR
This paper introduces the Principle of Predictive Sufficiency to guide state space models in optimally compressing past information for better prediction, leading to a new model that outperforms existing methods in robustness and accuracy.
Contribution
The paper proposes an information-theoretic principle for designing state space models and derives from it the MPS-SSM, which improves predictive efficiency and robustness over prior models whose selective mechanisms were designed heuristically.
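To make the idea of a state that is sufficient for prediction concrete, here is a toy sketch. It is not the paper's method or the MPS-SSM. It assumes a synthetic AR(1) process, for which the most recent observation is already a sufficient summary of the past, and checks that adding older observations barely improves a least-squares forecast. All names (`simulate_ar1`, `fit_mse`) are hypothetical.

```python
import numpy as np

def simulate_ar1(n, a, seed):
    """Toy AR(1) series x[t] = a * x[t-1] + unit Gaussian noise (an assumption, not paper data)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = a * x[t - 1] + noise[t]
    return x

def fit_mse(features, target):
    """In-sample mean squared error of an ordinary least-squares fit with intercept."""
    design = np.column_stack([features, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    return float(np.mean(residual ** 2))

x = simulate_ar1(n=20000, a=0.8, seed=0)
lags = 3
target = x[lags:]                      # value to predict, x[t+1]
last = x[lags - 1:-1]                  # compressed state: only x[t]
full = np.column_stack(                # longer past: x[t], x[t-1], x[t-2]
    [x[lags - 1 - k: len(x) - 1 - k] for k in range(lags)]
)

mse_none = float(np.var(target))       # no state at all: predict the mean
mse_last = fit_mse(last, target)       # one-number state
mse_full = fit_mse(full, target)       # three-number state

print(f"no state:   {mse_none:.4f}")
print(f"x[t] only:  {mse_last:.4f}")
print(f"three lags: {mse_full:.4f}")
```

In this toy setting, the one-number state predicts almost as well as the longer history, while discarding the state entirely is clearly worse. That gap between "enough" and "more than enough" is the intuition behind asking for a minimal sufficient state.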
Findings
MPS-SSM achieves state-of-the-art performance on benchmark datasets.
The model demonstrates superior robustness in noisy and long-term forecasting scenarios.
The principle can serve as a regularization framework for other architectures.
Abstract
State Space Models (SSMs), particularly recent selective variants like Mamba, have emerged as a leading architecture for sequence modeling, challenging the dominance of Transformers. However, the success of these state-of-the-art models largely relies on heuristically designed selective mechanisms, which lack a rigorous first-principle derivation. This theoretical gap raises questions about their optimality and robustness against spurious correlations. To address this, we introduce the Principle of Predictive Sufficiency, a novel information-theoretic criterion stipulating that an ideal hidden state should be a minimal sufficient statistic of the past for predicting the future. Based on this principle, we propose the Minimal Predictive Sufficiency State Space Model (MPS-SSM), a new framework where the selective mechanism is guided by optimizing an objective function derived from our…
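The principle stated in the abstract can be written in standard information-theoretic terms. The notation below is an assumption made for illustration and may differ from the paper's: $X_{\le t}$ is the observed past, $Y_{>t}$ is the future to be predicted, and $h_t = f(X_{\le t})$ is the hidden state. The final line is one common information-bottleneck-style relaxation, shown only as a sketch, and is not claimed to be the objective the authors optimize.

```latex
% Sufficiency: the state keeps all information the past carries about the future.
% The data processing inequality gives I(h_t; Y_{>t}) \le I(X_{\le t}; Y_{>t}),
% so sufficiency means this bound is attained.
I(h_t; Y_{>t}) = I(X_{\le t}; Y_{>t})

% Minimality: among sufficient states, keep as little of the past as possible.
h_t^{\star} \in \arg\min_{h_t \,:\, I(h_t; Y_{>t}) = I(X_{\le t}; Y_{>t})} I(X_{\le t}; h_t)

% One possible relaxation with a trade-off weight \beta > 0 (an assumption):
\min_{f} \; I(X_{\le t}; h_t) - \beta \, I(h_t; Y_{>t})
```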
