A neural attention model for speech command recognition
Douglas Coimbra de Andrade, Sabato Leo, Martin Loesener Da Silva Viana, Christoph Bernkopf

TL;DR
This paper presents a convolutional recurrent neural network with attention for speech command recognition. It reaches state-of-the-art accuracy with a small model and makes it possible to inspect which regions of the audio influenced each decision.
Contribution
The paper introduces an attention-based convolutional recurrent architecture that improves speech command recognition accuracy over previous convolutional models and makes the network's decisions easier to interpret.
Findings
Achieved 94.1% accuracy on Google Speech Commands V1 (20-command task)
Achieved 94.5% accuracy on Google Speech Commands V2 (20-command task)
Model has only 202K trainable parameters
Abstract
This paper introduces a convolutional recurrent network with attention for speech command recognition. Attention models are powerful tools to improve performance on natural language, image captioning and speech tasks. The proposed model establishes a new state-of-the-art accuracy of 94.1% on Google Speech Commands dataset V1 and 94.5% on V2 (for the 20-commands recognition task), while still keeping a small footprint of only 202K trainable parameters. Results are compared with previous convolutional implementations on 5 different tasks (20 commands recognition (V1 and V2), 12 commands recognition (V1), 35 word recognition (V1) and left-right (V1)). We show detailed performance results and demonstrate that the proposed attention mechanism not only improves performance but also allows inspecting what regions of the audio were taken into consideration by the network when outputting a given…
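The inspection idea mentioned in the abstract can be illustrated with a minimal sketch of attention pooling over recurrent outputs. This is not the authors' implementation: the function name `attention_pool`, the projection matrix `W_q`, the choice of the last frame as the query, and the random inputs are all assumptions made for illustration. The sketch only shows how a softmax over per-frame scores yields weights that can be read back to see which frames contributed most.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(hidden, W_q, query_index=-1):
    """Hypothetical attention pooling (illustrative, not the paper's code).

    hidden: (T, D) array of per-frame recurrent outputs.
    W_q:    (D, D) projection applied to the query frame (assumed).
    Returns the (D,) context vector and the (T,) attention weights.
    """
    query = hidden[query_index] @ W_q   # project one frame into a query vector
    scores = hidden @ query             # dot-product score for each frame
    weights = softmax(scores)           # non-negative, sums to 1 over time
    context = weights @ hidden          # weighted average of the frames
    return context, weights

# Random stand-in data: 50 frames with 16 features each (assumed sizes).
rng = np.random.default_rng(0)
T, D = 50, 16
hidden = rng.standard_normal((T, D))
W_q = rng.standard_normal((D, D)) * 0.1

context, weights = attention_pool(hidden, W_q)

# Frames with the largest weights are the ones the pooling relied on most.
top_frames = np.argsort(weights)[::-1][:5]
print("most attended frames:", top_frames)
```

Plotting `weights` against time, aligned with the input audio, is one way to visualise which regions dominated the pooled representation.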
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems
