Closing the Theory-Practice Gap in Spiking Transformers via Effective Dimension

Dongxin Guo; Jikun Wu; Siu Ming Yiu

arXiv:2604.15769·cs.LG·April 20, 2026

TL;DR

This paper develops a theoretical framework for spiking transformers that characterizes their expressivity and efficiency, and it derives design principles validated by experiments across vision and language tasks.

Contribution

The paper establishes the first comprehensive expressivity theory for spiking self-attention and derives practical design rules that are validated experimentally.

Findings

01

Spiking attention with Leaky Integrate-and-Fire neurons is a universal approximator of continuous permutation-equivariant functions.

02

Input-dependent bounds explain why few timesteps suffice in practice.

03

Experimental results show a high correlation between the theoretical predictions and measured performance.

Abstract

Spiking transformers achieve competitive accuracy with conventional transformers while offering $38$-$57\times$ energy efficiency on neuromorphic hardware, yet no theoretical framework guides their design. This paper establishes the first comprehensive expressivity theory for spiking self-attention. We prove that spiking attention with Leaky Integrate-and-Fire neurons is a universal approximator of continuous permutation-equivariant functions, providing explicit spike circuit constructions including a novel lateral inhibition network for softmax normalization with proven $O(1/T)$ convergence. We derive tight spike-count lower bounds via rate-distortion theory: $\varepsilon$-approximation requires $\Omega(L_{f}^{2} n d / \varepsilon^{2})$ spikes, with rigorous information-theoretic derivation. Our key insight is input-dependent bounds using measured effective dimensions…
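To give some intuition for why an $O(1/T)$ rate is plausible for spike-based encodings, the following is a minimal sketch of rate coding with a single integrate-and-fire neuron. It is not the paper's lateral inhibition construction for softmax, which is not reproduced here. The function names `lif_spike_count` and `rate_decode_error`, the soft-reset rule, the unit threshold, and the default leak factor `beta=1.0` (no leak) are all assumptions made for illustration. Under those assumptions, a constant input $x \in [0, 1]$ decoded as spike count divided by $T$ has error below $1/T$, because the residual membrane potential always stays below the threshold.

```python
def lif_spike_count(x, T, beta=1.0, threshold=1.0):
    """Count spikes emitted over T timesteps for a constant input x.

    Hypothetical illustration: discrete-time leaky integrate-and-fire
    neuron with soft reset (subtract the threshold on each spike).
    beta is the leak factor; beta=1.0 means no leak (assumed default).
    """
    v = 0.0          # membrane potential
    spikes = 0
    for _ in range(T):
        v = beta * v + x          # leak, then integrate the input
        if v >= threshold:        # fire when the threshold is reached
            spikes += 1
            v -= threshold        # soft reset keeps the residual charge
    return spikes


def rate_decode_error(x, T):
    """Absolute error of decoding x as (spike count) / T with no leak."""
    return abs(x - lif_spike_count(x, T) / T)


if __name__ == "__main__":
    # Worst-case decoding error over a grid of inputs in [0, 1],
    # printed next to the 1/T bound for comparison.
    for T in (4, 16, 64, 256):
        worst = max(rate_decode_error(k / 97, T) for k in range(98))
        print(f"T={T:4d}  worst error={worst:.5f}  1/T={1 / T:.5f}")
```

With `beta=1.0` the total injected charge is $Tx$, the emitted spikes remove exactly one threshold unit each, and the leftover potential lies in $[0, 1)$, so the decoding error is the leftover divided by $T$. A leak factor below 1 breaks this exact accounting, which is one reason the sketch should not be read as the paper's proof technique.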
