Exploring Filterbank Learning for Keyword Spotting
Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen

TL;DR
This paper investigates learned filterbanks for keyword spotting (KWS), compares them with handcrafted speech features, and finds no statistically significant difference in accuracy. Handcrafted features therefore remain an effective choice, though the results point to directions for future research.
Contribution
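It explores two filterbank learning methods for KWS: filterbank matrix learning in the power spectral domain and parameter learning of a psychoacoustically-motivated gammachirp filterbank. In both cases the filterbank parameters are optimized jointly with a deep residual neural network back-end.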
Findings
In general, no statistically significant accuracy difference between learned filterbanks and handcrafted features
Handcrafted features remain a strong choice for modern KWS
Potential information redundancy in features suggests new research directions
Abstract
Despite their great performance over the years, handcrafted speech features are not necessarily optimal for any particular speech application. Consequently, with greater or lesser success, optimal filterbank learning has been studied for different speech processing tasks. In this paper, we fill in a gap by exploring filterbank learning for keyword spotting (KWS). Two approaches are examined: filterbank matrix learning in the power spectral domain and parameter learning of a psychoacoustically-motivated gammachirp filterbank. Filterbank parameters are optimized jointly with a modern deep residual neural network-based KWS back-end. Our experimental results reveal that, in general, there are no statistically significant differences, in terms of KWS accuracy, between using a learned filterbank and handcrafted speech features. Thus, while we conclude that the latter are still a wise choice…
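To make the first approach in the abstract more concrete, the following is a minimal sketch of the forward pass of a filterbank matrix applied in the power spectral domain. It is not the authors' implementation. The Mel-scale triangular initialization, the clipping of negative weights to zero, the log compression, and all sizes (40 filters, 257 frequency bins, 16 kHz sampling rate) are assumptions chosen for illustration. In an actual learned-filterbank setup, the matrix `W` would be a trainable parameter updated by backpropagation together with the KWS back-end; the gradient step is omitted here.

```python
import math


def hz_to_mel(f):
    """Convert frequency in Hz to the Mel scale (common 2595*log10 formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, num_bins, sample_rate):
    """Triangular Mel filterbank matrix (num_filters x num_bins).

    Assumed here to serve as the initial value of the learnable weights.
    """
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)
    # num_filters + 2 edge frequencies, equally spaced on the Mel scale.
    edges = [mel_to_hz(mel_max * i / (num_filters + 1))
             for i in range(num_filters + 2)]
    # Center frequency of each bin of the power spectrum.
    bin_freqs = [nyquist * k / (num_bins - 1) for k in range(num_bins)]
    W = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_freqs:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        W.append(row)
    return W


def filterbank_features(W, power_spectrum, eps=1e-8):
    """Log filterbank energies for one frame.

    Negative weights are clipped to zero (an assumed constraint that keeps
    the energies non-negative once W is free to change during training).
    """
    feats = []
    for row in W:
        energy = sum(max(w, 0.0) * p for w, p in zip(row, power_spectrum))
        feats.append(math.log(energy + eps))
    return feats


if __name__ == "__main__":
    W = mel_filterbank(num_filters=40, num_bins=257, sample_rate=16000)
    frame = [1.0] * 257  # hypothetical flat power spectrum
    feats = filterbank_features(W, frame)
    print(len(feats), "log filterbank energies computed")
```

With the weights left at their initial values, the output corresponds to ordinary handcrafted log-Mel features, which is the baseline that a learned filterbank is compared against.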
