Learnable Pulse Accumulation for On-Device Speech Recognition: How Much Attention Do You Need?

Yakov Pyotr Shkolnikov

arXiv:2603.16922 · eess.AS · March 19, 2026

TL;DR

This paper introduces the Learnable Pulse Accumulator (LPA), an O(n) replacement for quadratic self-attention in transformer-based speech models, aimed at on-device speech recognition. Replacing 8 of 12 wav2vec2-base layers gives a 3.27x speedup at 120 s audio on Apple M4 Pro, at a cost of 7.24 percentage points of WER on LibriSpeech test-clean (10.61% versus a 3.37% baseline).

Contribution

The paper presents LPA, a learnable gating mechanism that replaces quadratic self-attention with a linear-time operation aimed at edge devices, and evaluates it on speech recognition (wav2vec2-base) and speech enhancement (SepFormer).

Findings

01

Replacing 8 of 12 wav2vec2-base layers yields 10.61% WER on LibriSpeech test-clean, against a 3.37% baseline.

02

LPA achieves a 3.27x speedup at 120 s audio on Apple M4 Pro via an optimized MLX inference path.

03

All 16 intra-chunk attention layers of SepFormer can be replaced without collapse in speech enhancement.
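
As a quick check on the first finding, the short Python sketch below recomputes the accuracy gap from the two WER figures quoted in the abstract (3.37% baseline, 10.61% after replacing 8 of 12 layers). The variable names are illustrative and do not come from the paper.

```python
# WER figures as quoted in the abstract, in percent.
baseline_wer = 3.37   # unmodified wav2vec2-base
lpa_wer = 10.61       # 8 of 12 attention layers replaced by LPA

# Absolute gap in percentage points (pp).
absolute_gap_pp = round(lpa_wer - baseline_wer, 2)

# Relative factor: how many times higher the error rate becomes.
relative_factor = round(lpa_wer / baseline_wer, 2)

print(f"absolute gap: {absolute_gap_pp} pp")
print(f"relative factor: {relative_factor}x")
```

The absolute gap matches the +7.24 pp stated in the abstract. The relative factor, roughly 3.15x, is a derived figure and is not reported on this page.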

Abstract

Self-attention scales quadratically with sequence length, limiting transformer-based speech models on edge devices. We introduce the Learnable Pulse Accumulator (LPA), an O(n) replacement that substitutes key-query dot products with learned gating functions: content-dependent rectangular pulses, periodic windows, and position-dependent basis functions. An MSE diagnostic sweep determines per-layer replacement difficulty and ordering. Replacing 8 of 12 wav2vec2-base layers yields 10.61% word error rate (WER) on LibriSpeech test-clean, +7.24 percentage points (pp) over the 3.37% baseline, with 3.27x speedup at 120s audio on Apple M4 Pro via an optimized MLX inference path. Cross-domain validation on SepFormer speech enhancement shows all 16 intra-chunk attention layers can be replaced without collapse, suggesting the depth wall arises from linguistic computation rather than an LPA…
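
To make the complexity argument concrete, here is a minimal NumPy sketch of one way a token mixer can avoid key-query dot products: a content-dependent scalar gate combined with a fixed-width rectangular window, evaluated with prefix sums so the cost grows linearly in sequence length. This is not the paper's LPA formulation, which the abstract does not specify. The function `pulse_accumulate`, the gate projection `gate_w`, and the half-width `width` are all assumptions made for illustration.

```python
import numpy as np

def pulse_accumulate(x, gate_w, width):
    """Gated rectangular-window average over frames in O(n * d) time.

    Hypothetical illustration only; not the LPA layer from the paper.
    x:      (n, d) sequence of frame features
    gate_w: (d,) projection producing one gate logit per frame (assumed)
    width:  half-width of the rectangular window, in frames (assumed)
    """
    n, d = x.shape
    # Content-dependent gate in (0, 1), one scalar per frame.
    gate = 1.0 / (1.0 + np.exp(-(x @ gate_w)))
    weighted = x * gate[:, None]

    # Prefix sums with a leading zero so any window sum is one subtraction.
    csum_x = np.vstack([np.zeros((1, d)), np.cumsum(weighted, axis=0)])
    csum_g = np.concatenate([[0.0], np.cumsum(gate)])

    # Window [t - width, t + width], clipped to the sequence bounds.
    idx = np.arange(n)
    lo = np.maximum(idx - width, 0)
    hi = np.minimum(idx + width + 1, n)

    num = csum_x[hi] - csum_x[lo]   # gated feature sum per window
    den = csum_g[hi] - csum_g[lo]   # gate mass per window, always > 0
    return num / den[:, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))
gate_w = rng.normal(size=4)
out = pulse_accumulate(x, gate_w, width=3)
print(out.shape)
```

Each output frame costs a constant number of operations per feature regardless of sequence length, whereas full self-attention compares every frame with every other frame. A learned or content-dependent window extent, periodic windows, and position-dependent basis functions, all named in the abstract, are beyond this sketch.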

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research