Rethinking Token Reduction for State Space Models

Zheng Zhan; Yushu Wu; Zhenglun Kong; Changdi Yang; Yifan Gong; Xuan Shen; Xue Lin; Pu Zhao; Yanzhi Wang

arXiv:2410.14725 · cs.LG · October 22, 2024

TL;DR

This paper introduces a fine-grained, post-training token reduction method for State Space Models (SSMs) that combines token pruning and merging. It improves accuracy over existing token reduction techniques while lowering compute and memory cost, which makes large SSMs such as Mamba easier to deploy.

Contribution

We propose a unified, intra-layer token reduction approach for SSMs such as Mamba. It combines token importance and token similarity, drawing on both pruning and merging, to improve accuracy and efficiency over existing token reduction methods.

Findings

01. Improves average accuracy by 5.7% to 13.1% on six benchmarks compared to existing token reduction methods

02. Reduces computational and memory requirements substantially

03. Addresses limitations of existing token reduction methods

Abstract

Recent advancements in State Space Models (SSMs) have attracted significant interest, particularly in models optimized for parallel training and handling long-range dependencies. Architectures like Mamba have scaled to billions of parameters with selective SSM. To facilitate broader applications using Mamba, exploring its efficiency is crucial. While token reduction techniques offer a straightforward post-training strategy, we find that applying existing methods directly to SSMs leads to substantial performance drops. Through insightful analysis, we identify the reasons for this failure and the limitations of current techniques. In response, we propose a tailored, unified post-training token reduction method for SSMs. Our approach integrates token importance and similarity, thus taking advantage of both pruning and merging, to devise a fine-grained intra-layer token reduction strategy.…
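
As an illustration of how importance and similarity can be combined in one reduction step, here is a minimal sketch. It is not the authors' algorithm: the function name reduce_tokens, the externally supplied importance scores, the cosine-similarity matching, and the merge_threshold parameter are all assumptions made for this example. The least important tokens are selected for removal; each one is merged (averaged) into its most similar surviving token if the similarity is high enough, and pruned otherwise.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reduce_tokens(tokens, importance, num_remove, merge_threshold=0.9):
    """Hypothetical hybrid prune/merge step (illustrative, not the paper's method).

    tokens:          list of feature vectors, in sequence order
    importance:      one score per token (how it is computed is an assumption
                     left to the caller)
    num_remove:      how many tokens to eliminate from the sequence
    merge_threshold: assumed similarity cutoff deciding merge vs. prune
    """
    if not 0 <= num_remove < len(tokens):
        raise ValueError("num_remove must be in [0, len(tokens))")

    # The num_remove least important tokens are candidates for removal.
    by_importance = sorted(range(len(tokens)), key=lambda i: importance[i])
    removed = set(by_importance[:num_remove])
    # Survivors keep their original order, which matters for sequence models.
    kept = [i for i in range(len(tokens)) if i not in removed]

    sums = {i: list(tokens[i]) for i in kept}
    counts = {i: 1 for i in kept}

    for i in sorted(removed):
        # Find the surviving token most similar to this candidate.
        best = max(kept, key=lambda j: cosine(tokens[i], tokens[j]))
        if cosine(tokens[i], tokens[best]) >= merge_threshold:
            # Similar enough: merge by accumulating into the survivor.
            sums[best] = [s + x for s, x in zip(sums[best], tokens[i])]
            counts[best] += 1
        # Otherwise the candidate is pruned (dropped) outright.

    # Each survivor becomes the mean of itself and everything merged into it.
    return [[s / counts[i] for s in sums[i]] for i in kept]


tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
importance = [0.9, 0.1, 0.8, 0.2]
reduced = reduce_tokens(tokens, importance, num_remove=2)
```

In this toy input, the second token is nearly parallel to the first and is merged into it, while the fourth token resembles no survivor and is pruned, so the sequence shrinks from four tokens to two.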

Code & Models

Repositories

wuyushuwys/tor_ssm (PyTorch, official)

Videos

Rethinking Token Reduction for State Space Models · Underline

Taxonomy

Topics: Real-time simulation and control systems · Simulation Techniques and Applications

Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Pruning