Vocal Tract Length Warped Features for Spoken Keyword Spotting
Achintya kr. Sarkar, Priyanka Dwivedi, Zheng-Hua Tan

TL;DR
This paper proposes three ways to use vocal tract length (VTL) warped features for spoken keyword spotting: training a single DNN on a randomly selected VTL feature per epoch, scoring unwarped test features against that DNN, and concatenating VTL warped features into one high-dimensional feature.
Contribution
It presents new VTL-based feature methods and training strategies that improve keyword spotting performance over conventional approaches.
Findings
VTL-independent KWS with random VTL feature selection improves accuracy.
Concatenating VTL warped features enhances keyword spotting performance.
The proposed methods outperform conventional (unwarped) features on the English Google Command dataset.
Abstract
In this paper, we propose several methods that incorporate vocal tract length (VTL) warped features for spoken keyword spotting (KWS). The first method, VTL-independent KWS, involves training a single deep neural network (DNN) that utilizes VTL features with various warping factors. During training, a specific VTL feature is randomly selected per epoch, allowing the exploration of VTL variations. During testing, the VTL features with different warping factors of a test utterance are scored against the DNN and combined with equal weight. The second method scores the conventional features of a test utterance (without VTL warping) against the DNN. The third method, VTL-concatenation KWS, concatenates VTL warped features to form high-dimensional features for KWS. Evaluations carried out on the English Google Command dataset demonstrate that the proposed methods improve the accuracy of…
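The three operations named in the abstract (per-epoch random selection of a warping factor, equal-weight score combination at test time, and feature concatenation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the warping factor values, the feature and score shapes, and all function names are hypothetical, and random arrays stand in for real VTL warped features and DNN outputs.

```python
import numpy as np

# Hypothetical warping factors; the abstract does not list the values used.
WARP_FACTORS = [0.9, 1.0, 1.1]

def pick_warp_factor(rng, factors=WARP_FACTORS):
    """Randomly choose one warping factor, as would be done once per epoch."""
    return factors[rng.integers(len(factors))]

def fuse_scores(scores_per_factor):
    """Combine per-factor keyword scores with equal weight (a plain mean)."""
    return np.mean(np.stack(list(scores_per_factor.values())), axis=0)

def concat_features(feats_per_factor, factors=WARP_FACTORS):
    """Concatenate warped features frame by frame along the feature axis."""
    return np.concatenate([feats_per_factor[a] for a in factors], axis=1)

rng = np.random.default_rng(0)
# Toy stand-in features: 5 frames x 4 dimensions per warping factor.
feats = {a: rng.standard_normal((5, 4)) for a in WARP_FACTORS}
# Toy stand-in DNN outputs: scores over 3 keywords per warping factor.
scores = {a: rng.dirichlet(np.ones(3)) for a in WARP_FACTORS}

chosen = pick_warp_factor(rng)    # feature set to train on this epoch
fused = fuse_scores(scores)       # equal-weight combination at test time
stacked = concat_features(feats)  # high-dimensional concatenated feature
```

With three warping factors and 4-dimensional features, the concatenated feature has 12 dimensions per frame, which shows how the third method's input size grows with the number of warping factors.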
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
