TinySpeech: Attention Condensers for Deep Speech Recognition Neural Networks on Edge Devices
Alexander Wong, Mahmoud Famouri, Maya Pavlova, and Siddharth Surana

TL;DR
This paper introduces attention condensers and uses them to build TinySpeech networks: on-device speech recognition models with far fewer parameters, multiply-add operations, and weight memory than previous networks, which makes them suitable for edge devices and TinyML applications.
Contribution
The paper proposes attention condensers as a novel self-attention mechanism and develops TinySpeech networks optimized for low-resource edge devices, with the reductions in parameters, computation, and weight memory listed under Findings.
Findings
Up to 507x fewer parameters than previous models.
Up to 48x fewer multiply-add operations.
Up to 2028x lower weight memory requirements.
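The parameter and weight-memory ratios above differ by exactly a factor of 4 (2028 / 507 = 4). One reading consistent with that factor is that the baseline stores 32-bit weights while the compact network stores 8-bit weights. This is an assumption, not something stated in this summary. The helper name below is hypothetical and the sketch only illustrates the arithmetic:

```python
def weight_memory_ratio(param_ratio, baseline_bits, compact_bits):
    """Weight-memory reduction when both the parameter count and the
    bits stored per weight shrink (hypothetical helper)."""
    return param_ratio * baseline_bits / compact_bits


PARAM_RATIO = 507  # "up to 507x fewer parameters" from the findings

# ASSUMPTION: 32-bit baseline weights versus 8-bit compact weights.
ratio = weight_memory_ratio(PARAM_RATIO, baseline_bits=32, compact_bits=8)
print(ratio)  # 2028.0
```

If both networks used the same weight precision, the memory ratio would equal the parameter ratio, so the extra factor of 4 has to come from somewhere other than the parameter count.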
Abstract
Advances in deep learning have led to state-of-the-art performance across a multitude of speech recognition tasks. Nevertheless, the widespread deployment of deep neural networks for on-device speech recognition remains a challenge, particularly in edge scenarios where the memory and computing resources are highly constrained (e.g., low-power embedded devices) or where the memory and computing budget dedicated to speech recognition is low (e.g., mobile devices performing numerous tasks besides speech recognition). In this study, we introduce the concept of attention condensers for building low-footprint, highly-efficient deep neural networks for on-device speech recognition on the edge. An attention condenser is a self-attention mechanism that learns and produces a condensed embedding characterizing joint local and cross-channel activation relationships, and performs selective attention…
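The abstract excerpt is cut off before it describes the mechanism in full, so the following is only a rough sketch of one plausible layout rather than the authors' architecture. It assumes four steps: condense the input by max-pooling over time, embed it with a pointwise channel-mixing matrix, expand it back to the input length, and use the sigmoid of the result to rescale the input. All function names, the pooling window, and the choice of sigmoid gating are assumptions made for illustration.

```python
import math


def max_pool_time(frames, window):
    """Condense: per-channel max over non-overlapping windows of time steps."""
    pooled = []
    for start in range(0, len(frames) - window + 1, window):
        block = frames[start:start + window]
        pooled.append([max(col) for col in zip(*block)])
    return pooled


def mix_channels(frames, weights):
    """Embed: pointwise channel mixing; weights[o][i] maps input channel i to output o."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in weights]
            for frame in frames]


def upsample_time(frames, factor):
    """Expand: repeat each condensed time step to restore the input length."""
    return [list(frame) for frame in frames for _ in range(factor)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_condenser(frames, weights, window=2):
    """Rescale each activation by an attention value in (0, 1).

    Assumes len(frames) is a multiple of `window` and that `weights`
    is a square channels-by-channels matrix.
    """
    condensed = max_pool_time(frames, window)
    embedded = mix_channels(condensed, weights)
    expanded = upsample_time(embedded, window)
    return [[v * sigmoid(a) for v, a in zip(frame, attn)]
            for frame, attn in zip(frames, expanded)]


# Toy input: 4 time steps, 2 channels (values are arbitrary).
frames = [[1.0, 2.0], [3.0, 0.0], [0.5, 0.5], [2.0, 4.0]]
weights = [[0.5, -0.5], [0.25, 0.25]]
out = attention_condenser(frames, weights)
print(len(out), len(out[0]))  # same shape as the input: 4 time steps, 2 channels
```

Because the attention values are computed on the pooled (shorter) sequence, the embedding step touches fewer activations than attention computed at full resolution, which is the kind of saving a low-footprint design would aim for.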
Taxonomy
Topics: Speech Recognition and Synthesis · Advanced Neural Network Applications · Speech and Audio Processing
