Closing the Theory-Practice Gap in Spiking Transformers via Effective Dimension
Dongxin Guo, Jikun Wu, Siu Ming Yiu

TL;DR
This paper develops a theoretical framework for spiking transformers that characterizes their expressivity and efficiency and provides design principles validated by experiments on vision and language tasks.
Contribution
The paper establishes the first comprehensive expressivity theory for spiking self-attention and offers practical design rules that are validated experimentally.
Findings
Spiking attention with Leaky Integrate-and-Fire neurons is a universal approximator.
Input-dependent bounds explain why few timesteps suffice in practice.
Experiments show a high correlation between the theoretical predictions and measured performance.
Abstract
Spiking transformers achieve competitive accuracy with conventional transformers while offering - energy efficiency on neuromorphic hardware, yet no theoretical framework guides their design. This paper establishes the first comprehensive expressivity theory for spiking self-attention. We prove that spiking attention with Leaky Integrate-and-Fire neurons is a universal approximator of continuous permutation-equivariant functions, providing explicit spike circuit constructions including a novel lateral inhibition network for softmax normalization with proven convergence. We derive tight spike-count lower bounds via rate-distortion theory: -approximation requires spikes, with rigorous information-theoretic derivation. Our key insight is input-dependent bounds using measured effective dimensions…
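For readers unfamiliar with Leaky Integrate-and-Fire (LIF) neurons, the following is a minimal discrete-time sketch of how one such neuron turns an input current into a spike train whose firing rate grows with the input. It is an illustration only, not the paper's construction: the function names, the leak factor `leak`, the `threshold`, the soft reset, and the constant input current are all assumptions chosen for simplicity.

```python
def lif_spike_train(current, timesteps, leak=0.9, threshold=1.0):
    """Simulate one discrete-time LIF neuron under a constant input current.

    Hypothetical illustration: leak, threshold, and the soft reset are
    assumed here, not taken from the paper.
    Returns a list of 0/1 spikes, one per timestep.
    """
    v = 0.0  # membrane potential
    spikes = []
    for _ in range(timesteps):
        v = leak * v + current      # leaky integration of the input
        if v >= threshold:
            spikes.append(1)        # emit a spike
            v -= threshold          # soft reset: subtract the threshold
        else:
            spikes.append(0)
    return spikes


def firing_rate(current, timesteps, leak=0.9, threshold=1.0):
    """Fraction of timesteps on which the neuron spikes."""
    train = lif_spike_train(current, timesteps, leak, threshold)
    return sum(train) / timesteps
```

Under these assumed parameters, an input too weak to ever reach the threshold (for example `current=0.05`, whose potential saturates at 0.5) produces no spikes, while stronger inputs fire more often. Reading a value out as a firing rate over a window of timesteps is one common way to relate spike counts to approximation accuracy.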
