Stacked 1D convolutional networks for end-to-end small footprint voice trigger detection
Takuya Higuchi, Mohammad Ghasemzadeh, Kisun You, Chandra Dhir

TL;DR
This paper introduces a stacked 1D convolutional neural network (S1DCNN) for efficient, end-to-end voice trigger detection on small devices, outperforming SVDF-based models in reducing false rejects while maintaining low resource usage.
Contribution
The paper presents S1DCNN, a novel model architecture that generalizes SVDFs and improves false reject rates in small-footprint voice trigger detection systems.
Findings
S1DCNN achieves 19% relative FRR reduction over SVDF.
S1DCNN keeps a model size and latency similar to those of SVDF.
Longer time delays reduce FRR further, by up to 12.2%.
Abstract
We propose a stacked 1D convolutional neural network (S1DCNN) for end-to-end small footprint voice trigger detection in a streaming scenario. Voice trigger detection is an important speech application, with which users can activate their devices by simply saying a keyword or phrase. For privacy and latency reasons, a voice trigger detection system should run on an always-on processor on device. Therefore, having a small memory and compute cost is crucial for a voice trigger detection system. Recently, singular value decomposition filters (SVDFs) have been used for end-to-end voice trigger detection. The SVDFs approximate a fully-connected layer with a low-rank approximation, which reduces the number of model parameters. In this work, we propose S1DCNN as an alternative approach for end-to-end small-footprint voice trigger detection. An S1DCNN layer consists of a 1D convolution layer…
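The parameter saving that the abstract attributes to the low-rank approximation can be illustrated with a minimal sketch. It assumes a rank-1 factorization, in which each node's weight matrix over a window of frames is the outer product of a feature filter and a time filter. The layer sizes, function names, and example values are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: rank-1 factorization of a fully-connected layer
# applied to a window of stacked frames. All sizes and names are assumptions.

def full_layer_params(num_nodes, feat_dim, num_frames):
    """Weights in a fully-connected layer over num_frames stacked frames."""
    return num_nodes * feat_dim * num_frames

def rank1_layer_params(num_nodes, feat_dim, num_frames):
    """Weights when each node uses one feature filter and one time filter."""
    return num_nodes * (feat_dim + num_frames)

def full_node_output(weights, frames):
    """Output of one node with a dense weight matrix, weights[t][f]."""
    return sum(w * x
               for w_row, frame in zip(weights, frames)
               for w, x in zip(w_row, frame))

def rank1_node_output(alpha, beta, frames):
    """Same node with weights[t][f] = beta[t] * alpha[f]."""
    # Step 1: project every frame onto the feature filter alpha.
    projected = [sum(a * x for a, x in zip(alpha, frame)) for frame in frames]
    # Step 2: combine the per-frame scalars with the time filter beta.
    return sum(b * p for b, p in zip(beta, projected))

# Hypothetical layer: 64 nodes, 40-dim features, 10-frame window.
print(full_layer_params(64, 40, 10))   # 25600
print(rank1_layer_params(64, 40, 10))  # 3200

# The factorized form matches the dense form when the dense weights are rank-1.
alpha = [0.5, -1.0, 2.0]
beta = [1.0, 0.25]
frames = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
weights = [[b * a for a in alpha] for b in beta]
print(full_node_output(weights, frames), rank1_node_output(alpha, beta, frames))
```

In this sketch the factorized layer needs `num_nodes * (feat_dim + num_frames)` weights instead of `num_nodes * feat_dim * num_frames`, which is the kind of reduction a small-footprint model relies on.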
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
