Rethinking Token Reduction for State Space Models
Zheng Zhan, Yushu Wu, Zhenglun Kong, Changdi Yang, Yifan Gong, Xuan Shen, Xue Lin, Pu Zhao, Yanzhi Wang

TL;DR
This paper introduces a novel, fine-grained token reduction method for State Space Models that improves accuracy and efficiency, addressing limitations of existing techniques and enabling broader application of large-scale models like Mamba.
Contribution
We propose a unified, intra-layer token reduction approach that combines importance and similarity, significantly enhancing performance and efficiency of SSMs like Mamba.
Findings
Improves accuracy by 5.7% to 13.1% on six benchmarks
Reduces computational and memory requirements substantially
Addresses limitations of existing token reduction methods
Abstract
Recent advancements in State Space Models (SSMs) have attracted significant interest, particularly in models optimized for parallel training and handling long-range dependencies. Architectures like Mamba have scaled to billions of parameters with selective SSM. To facilitate broader applications using Mamba, exploring its efficiency is crucial. While token reduction techniques offer a straightforward post-training strategy, we find that applying existing methods directly to SSMs leads to substantial performance drops. Through insightful analysis, we identify the reasons for this failure and the limitations of current techniques. In response, we propose a tailored, unified post-training token reduction method for SSMs. Our approach integrates token importance and similarity, thus taking advantage of both pruning and merging, to devise a fine-grained intra-layer token reduction strategy.…
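The abstract describes the method only at a high level. To illustrate what "integrating token importance and similarity" can mean in general, here is a minimal Python sketch of a generic prune-and-merge step. It rests on several assumptions. Per-token importance scores are supplied by the caller, because this summary does not say how the paper computes them. The names `reduce_tokens`, `keep`, and `merge_threshold` are hypothetical. The merge rule (averaging a low-importance token into its most cosine-similar kept token) is a common generic choice, not the authors' fine-grained intra-layer strategy.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def reduce_tokens(tokens: List[Vector], importance: List[float],
                  keep: int, merge_threshold: float = 0.5) -> List[Vector]:
    """Hypothetical prune-and-merge token reduction (NOT the paper's algorithm).

    1. Keep the `keep` tokens with the highest (caller-supplied) importance.
    2. Each remaining token is merged (averaged) into its most similar kept
       token if their cosine similarity is >= merge_threshold; otherwise it
       is pruned outright.
    3. Kept tokens are returned in their original sequence order, because an
       SSM scans tokens sequentially.
    """
    n = len(tokens)
    if keep <= 0:
        return []
    if keep >= n:
        return [list(t) for t in tokens]

    ranked = sorted(range(n), key=lambda i: importance[i], reverse=True)
    kept = sorted(ranked[:keep])   # original positions, in sequence order
    dropped = ranked[keep:]

    # Running sums and counts so each kept token becomes the mean of its group.
    sums = {i: list(tokens[i]) for i in kept}
    counts = {i: 1 for i in kept}

    for j in dropped:
        best = max(kept, key=lambda i: cosine(tokens[j], tokens[i]))
        if cosine(tokens[j], tokens[best]) >= merge_threshold:
            sums[best] = [s + x for s, x in zip(sums[best], tokens[j])]
            counts[best] += 1
        # else: token j is pruned and contributes nothing

    return [[s / counts[i] for s in sums[i]] for i in kept]


tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
importance = [0.9, 0.2, 0.8, 0.1]
reduced = reduce_tokens(tokens, importance, keep=2)
```

In this usage example, tokens 0 and 2 are kept. Token 1 is nearly parallel to token 0, so it is merged into it. Token 3 is not similar enough to either kept token, so it is pruned. Merging lets a low-importance token still contribute information, while pruning discards it entirely. Pure pruning or pure merging corresponds to setting `merge_threshold` above 1 or below -1, respectively.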
Taxonomy
Topics: Real-time simulation and control systems · Simulation Techniques and Applications
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Pruning
