HAMSA: Scanning-Free Vision State Space Models via SpectralPulseNet
Badri N. Patro, Vijay S. Agneeswaran

TL;DR
HAMSA is a scanning-free vision model that operates in the spectral domain. It reaches 85.7% top-1 accuracy on ImageNet-1K, runs 2.2 times faster than DeiT-S at inference, and uses less memory and energy than previous SSMs.
Contribution
The paper proposes HAMSA, a novel scanning-free spectral vision model with simplified kernel parameterization and adaptive spectral gating, outperforming prior SSMs in accuracy, speed, and resource efficiency.
Findings
HAMSA achieves 85.7% top-1 accuracy on ImageNet-1K.
HAMSA is 2.2 times faster than DeiT-S during inference.
HAMSA uses less memory and energy than previous SSMs.
Abstract
Vision State Space Models (SSMs) like Vim, VMamba, and SiMBA rely on complex scanning strategies to adapt sequential SSMs to process 2D images, introducing computational overhead and architectural complexity. We propose HAMSA, a scanning-free SSM operating directly in the spectral domain. HAMSA introduces three key innovations: (1) simplified kernel parameterization: a single Gaussian-initialized complex kernel replacing traditional (A, B, C) matrices, eliminating discretization instabilities; (2) SpectralPulseNet (SPN): an input-dependent frequency gating mechanism enabling adaptive spectral modulation; and (3) Spectral Adaptive Gating Unit (SAGU): magnitude-based gating for stable gradient flow in the frequency domain. By leveraging FFT-based convolution, HAMSA eliminates sequential scanning while achieving O(L log L) complexity with superior simplicity and efficiency. On ImageNet-1K,…
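To make the abstract's mechanism concrete, here is a minimal NumPy sketch of scanning-free spectral mixing on one 2D feature map. It is an illustration under stated assumptions, not the authors' implementation: the function names (`gaussian_complex_kernel`, `magnitude_gate`, `spectral_mix`), the initialization scale `sigma`, and the sigmoid-of-magnitude gate are all hypothetical choices made here. The input-dependent SPN gate is left out because the abstract does not specify its form.

```python
import numpy as np


def gaussian_complex_kernel(h, w_half, sigma=0.02, seed=0):
    """Hypothetical stand-in for a single Gaussian-initialized complex kernel.

    The shape matches the output of np.fft.rfft2 on an (h, w) input,
    where w_half = w // 2 + 1. The scale `sigma` is an assumption.
    """
    rng = np.random.default_rng(seed)
    real = rng.normal(0.0, sigma, size=(h, w_half))
    imag = rng.normal(0.0, sigma, size=(h, w_half))
    return real + 1j * imag


def magnitude_gate(z):
    """Assumed magnitude-based gate: scale each coefficient by sigmoid(|z|).

    The scale factor is real and positive, so the phase of z is unchanged.
    """
    scale = 1.0 / (1.0 + np.exp(-np.abs(z)))
    return z * scale


def spectral_mix(x, kernel):
    """Mix all spatial positions at once via the FFT; no scan order is needed."""
    h, w = x.shape
    X = np.fft.rfft2(x)                  # spatial -> frequency domain
    Y = magnitude_gate(X * kernel)       # pointwise kernel product, then gate
    return np.fft.irfft2(Y, s=(h, w))    # frequency -> spatial domain


# Usage: one 8x8 feature map.
x = np.random.default_rng(1).normal(size=(8, 8))
k = gaussian_complex_kernel(8, 8 // 2 + 1)
y = spectral_mix(x, k)
```

The pointwise product in the frequency domain corresponds to a global circular convolution in the spatial domain, which is where the O(L log L) cost comes from: the FFT and its inverse dominate, and no row-wise or column-wise traversal of the image is required.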
