IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing

Zeyang Song; Shimin Zhang; Yuhong Chou; Jibin Wu; Haizhou Li

arXiv:2507.07396 · cs.MM · September 30, 2025

Open Access

TL;DR

The paper introduces IML-Spikeformer, a novel spiking Transformer architecture for large-scale speech processing that improves performance and energy efficiency by simulating multi-timestep spike firing within a single timestep.

Contribution

The paper proposes the Input-aware Multi-Level Spike (IMLS) mechanism and a re-parameterized self-attention module with a hierarchical decay mask, advancing scalable SNN architectures for speech tasks.

Findings

01. Achieves competitive word error rates on AiShell-1 and Librispeech-960.

02. Reduces inference energy consumption by a factor of more than 4.

03. Demonstrates that SNN performance scales to large-scale speech processing.

Abstract

Spiking Neural Networks (SNNs), inspired by biological neural mechanisms, represent a promising neuromorphic computing paradigm that offers energy-efficient alternatives to traditional Artificial Neural Networks (ANNs). Despite their proven effectiveness, SNN architectures have struggled to achieve competitive performance on large-scale speech processing tasks. Two key challenges hinder progress: (1) the high computational overhead during training caused by multi-timestep spike firing, and (2) the absence of large-scale SNN architectures tailored to speech processing tasks. To overcome these issues, we introduce the Input-aware Multi-Level Spikeformer (IML-Spikeformer), a spiking Transformer architecture specifically designed for large-scale speech processing. Central to our design is the Input-aware Multi-Level Spike (IMLS) mechanism, which simulates multi-timestep spike firing within a single…
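
To make the idea of folding several timesteps into one more concrete, here is a minimal sketch. It is not the paper's IMLS implementation: the abstract is truncated before the details, and the input-aware part is not modeled. It assumes a plain integrate-and-fire neuron with soft reset, a constant input, and a fixed threshold, and the function names `binary_spike_count` and `multi_level_spike` are hypothetical. Under those assumptions, the number of binary spikes emitted over T timesteps equals a single integer-valued spike computed in one step.

```python
import math


def binary_spike_count(x, threshold=1.0, timesteps=4):
    """Run an integrate-and-fire neuron for `timesteps` steps on a constant
    input x and return how many binary spikes it emits (assumed model)."""
    v = 0.0      # membrane potential
    count = 0
    for _ in range(timesteps):
        v += x                   # integrate the input
        if v >= threshold:       # fire at most one spike per step
            count += 1
            v -= threshold       # soft reset: subtract the threshold
    return count


def multi_level_spike(x, threshold=1.0, levels=4):
    """Compute an integer spike level in a single step (hypothetical helper).
    The level is floor(levels * x / threshold), clipped to [0, levels]."""
    level = math.floor(levels * x / threshold)
    return int(min(max(level, 0), levels))


# Inputs chosen to be exactly representable in binary floating point.
inputs = [-0.5, 0.0, 0.25, 0.5, 0.75, 1.0, 2.0]
multi_step = [binary_spike_count(x) for x in inputs]
single_step = [multi_level_spike(x) for x in inputs]

print(multi_step)   # spike counts from the 4-step simulation
print(single_step)  # integer spike levels from the single-step shortcut
```

In this toy setting the two lists agree, which illustrates why a single multi-level spike can stand in for several binary timesteps and avoid the per-timestep loop during training.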

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis