Rethinking Selectivity in State Space Models: A Minimal Predictive Sufficiency Approach

Yiyi Wang; Jian'an Zhang; Hongyi Duan; Haoyang Liu; Qingyang Li

arXiv:2508.03158·cs.LG·August 6, 2025


TL;DR

This paper introduces the Principle of Predictive Sufficiency to guide state space models in optimally compressing past information for better prediction, leading to a new model that outperforms existing methods in robustness and accuracy.

Contribution

It proposes a novel information-theoretic principle for designing state space models, resulting in the MPS-SSM that improves predictive efficiency and robustness over prior heuristic-based models.

Findings

01. MPS-SSM achieves state-of-the-art performance on benchmark datasets.

02. The model demonstrates superior robustness in noisy and long-term forecasting scenarios.

03. The principle can serve as a regularization framework for other architectures.

Abstract

State Space Models (SSMs), particularly recent selective variants like Mamba, have emerged as a leading architecture for sequence modeling, challenging the dominance of Transformers. However, the success of these state-of-the-art models largely relies on heuristically designed selective mechanisms, which lack a rigorous first-principles derivation. This theoretical gap raises questions about their optimality and robustness against spurious correlations. To address this, we introduce the Principle of Predictive Sufficiency, a novel information-theoretic criterion stipulating that an ideal hidden state should be a minimal sufficient statistic of the past for predicting the future. Based on this principle, we propose the Minimal Predictive Sufficiency State Space Model (MPS-SSM), a new framework where the selective mechanism is guided by optimizing an objective function derived from our…
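
The phrase "minimal sufficient statistic of the past for predicting the future" can be made concrete with standard mutual-information notation. The sketch below is a generic textbook-style formalization, not the objective from the paper, whose derivation is cut off in the abstract above. The symbols x_{\le t}, x_{>t}, h_t, f, \mathcal{L}_{\mathrm{pred}}, and \lambda are all assumptions introduced here for illustration.

```latex
% Notation (assumed for illustration, not taken from the paper):
%   x_{\le t} : past inputs,  x_{>t} : the future to be predicted,
%   h_t = f(x_{\le t}) : hidden state computed from the past only.
\begin{align}
  \text{(sufficiency)} \quad & I(h_t;\, x_{>t}) = I(x_{\le t};\, x_{>t}), \\
  \text{(minimality)}  \quad & h_t \in \arg\min_{h' \text{ sufficient}} I(x_{\le t};\, h'), \\
  \text{(relaxation)}  \quad & \min_{f}\; \mathcal{L}_{\mathrm{pred}}(h_t,\, x_{>t})
                               + \lambda\, I(x_{\le t};\, h_t), \qquad \lambda \ge 0 .
\end{align}
```

Because h_t is a function of the past, the data processing inequality gives I(h_t; x_{>t}) \le I(x_{\le t}; x_{>t}), so sufficiency means the state loses no predictive information. Minimality then asks the state to retain as little of the past as possible beyond that. The third line is one common way to trade the two off in a single trainable loss, in the style of an information bottleneck; whether MPS-SSM uses this exact form cannot be determined from the truncated abstract.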
