TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles
Sara Shoouri, Morteza Tavakoli Taba, Hun-Seok Kim

TL;DR
TCP-SSM introduces a structured, interpretable state space model with token-conditioned poles, enhancing efficiency and adaptability in vision tasks while maintaining high accuracy.
Contribution
It proposes a novel Token-Conditioned Poles SSM framework that explicitly models recurrence dynamics with stable poles and token-dependent adaptation for vision tasks.
Findings
Reduces SSM computation complexity by up to 44%.
Maintains or surpasses baseline accuracy across vision benchmarks.
Provides interpretable recurrence dynamics through stable poles.
Abstract
State Space Models (SSMs) have emerged as a compelling alternative to attention models for long-range vision tasks, offering input-dependent recurrence with linear complexity. However, most efficient SSM variants reduce computation cost by modifying scan routes, resolutions, or traversal patterns, while largely leaving the recurrent dynamics implicit. Consequently, the model's state-dependent memory behavior is difficult to control, particularly in compact backbones where long scan paths can exceed the effective memory horizon. We propose Token-Conditioned Poles SSM (TCP-SSM), a structured selective SSM framework that improves efficiency while making recurrence dynamics explicit and interpretable through stable poles. TCP-SSM builds each scan operator with 1) real poles that model monotone or sign-alternating decay, and 2) complex-conjugate poles that capture damped oscillatory…
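To make the two pole families named in the abstract concrete, here is a minimal sketch of their impulse responses using standard linear-recurrence facts. It is not the TCP-SSM scan operator: the pole values are fixed constants chosen for illustration, whereas the paper conditions them on tokens, and the function names `real_pole_response` and `conjugate_pair_response` are hypothetical. A real pole `a` with `|a| < 1` gives the response `a**t` (monotone for `a > 0`, sign-alternating for `a < 0`); a conjugate pair `r*exp(±i*theta)` with `r < 1` gives a damped oscillation.

```python
import math


def real_pole_response(a, steps):
    """Impulse response of h[t] = a * h[t-1] + x[t] for a unit impulse x."""
    # Illustrative stability check: the pole must lie strictly inside the unit circle.
    assert abs(a) < 1.0
    h, out = 0.0, []
    for t in range(steps):
        x = 1.0 if t == 0 else 0.0
        h = a * h + x
        out.append(h)
    return out


def conjugate_pair_response(r, theta, steps):
    """Impulse response of the real second-order recurrence whose poles are
    r*exp(+i*theta) and r*exp(-i*theta):
        h[t] = 2*r*cos(theta)*h[t-1] - r**2*h[t-2] + x[t]
    """
    assert 0.0 <= r < 1.0
    h1 = h2 = 0.0  # h[t-1] and h[t-2]
    out = []
    for t in range(steps):
        x = 1.0 if t == 0 else 0.0
        h = 2.0 * r * math.cos(theta) * h1 - r * r * h2 + x
        out.append(h)
        h2, h1 = h1, h
    return out


# Assumed example values, not taken from the paper.
monotone = real_pole_response(0.8, 8)        # positive real pole
alternating = real_pole_response(-0.8, 8)    # negative real pole
oscillatory = conjugate_pair_response(0.9, math.pi / 4, 16)
```

In this sketch the pole magnitude sets how quickly the state forgets an input (the memory horizon), and the angle `theta` of a conjugate pair sets the oscillation frequency, which is what makes such a parameterization readable.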
