The DKU System Description for The Interspeech 2021 Auto-KWS Challenge
Yechen Wang, Yan Jia, Murong Ma, Zexin Cai, Ming Li

TL;DR
This paper describes a two-stage keyword spotting system that combines subsequence dynamic time warping (DTW) with acoustic word embeddings. It reaches an average score of 0.61 on the feedback dataset of the Auto-KWS 2021 Challenge.
Contribution
The paper introduces a novel two-stage keyword spotting approach. DTW-based template matching proposes candidate detections, and a matcher based on acoustic word embeddings then verifies them.
Findings
Achieved an average score of 0.61 on the feedback dataset.
Outperformed the baseline1 system by 0.25 on the feedback dataset.
Demonstrated the effectiveness of combining DTW-based detection with embedding-based verification.
Abstract
This paper introduces the system submitted by the DKU-SMIIP team for the Auto-KWS 2021 Challenge. Our implementation consists of a two-stage keyword spotting system based on query-by-example spoken term detection, together with a speaker verification system. The proposed keyword spotting system employs two different detection algorithms. The first stage adopts subsequence dynamic time warping for template matching based on frame-level, language-independent bottleneck features and phoneme posterior probabilities. In the second stage, a sliding-window template matching algorithm based on acoustic word embeddings further verifies the detections from the first stage. As a result, our KWS system achieves an average score of 0.61 on the feedback dataset, outperforming the baseline1 system by 0.25.
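The two detection stages described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that do not come from the paper. Frames are plain feature vectors compared with cosine distance, standing in for the bottleneck features and phoneme posteriors. The second stage uses a hypothetical `embed` function that mean-pools frames, a crude placeholder for a trained acoustic word embedding model. The `margin`, `dtw_threshold`, and `sim_threshold` parameters are illustrative values, not the system's settings.

```python
import numpy as np


def cosine_distance_matrix(query, test):
    """Pairwise cosine distance between query frames (m x d) and test frames (n x d)."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    t = test / np.linalg.norm(test, axis=1, keepdims=True)
    return 1.0 - q @ t.T


def subsequence_dtw(query, test):
    """Stage 1 sketch: align the whole query against the best-matching test segment.

    The path may start and end at any test frame (free start and free end).
    Returns (start_frame, end_frame, length-normalized path cost).
    """
    cost = cosine_distance_matrix(query, test)
    m, n = cost.shape
    acc = np.full((m, n), np.inf)           # accumulated path cost
    length = np.zeros((m, n), dtype=int)    # number of cells on the best path
    start = np.zeros((m, n), dtype=int)     # test frame where the best path began

    # First query frame: a path may begin at any test frame.
    acc[0, :] = cost[0, :]
    length[0, :] = 1
    start[0, :] = np.arange(n)

    for i in range(1, m):
        for j in range(n):
            # Predecessors: vertical, horizontal, and diagonal steps.
            candidates = [(acc[i - 1, j], i - 1, j)]
            if j > 0:
                candidates.append((acc[i, j - 1], i, j - 1))
                candidates.append((acc[i - 1, j - 1], i - 1, j - 1))
            best, pi, pj = min(candidates)
            acc[i, j] = cost[i, j] + best
            length[i, j] = length[pi, pj] + 1
            start[i, j] = start[pi, pj]

    # Free end: pick the test frame with the lowest length-normalized cost.
    normalized = acc[m - 1, :] / length[m - 1, :]
    end = int(np.argmin(normalized))
    return int(start[m - 1, end]), end, float(normalized[end])


def embed(segment):
    """Hypothetical stand-in for an acoustic word embedding model (mean pooling)."""
    v = segment.mean(axis=0)
    return v / np.linalg.norm(v)


def sliding_window_verify(query, test, start, end, margin=None):
    """Stage 2 sketch: slide a query-length window around the stage-1 hit.

    Returns (best cosine similarity between embeddings, best window start).
    """
    w = len(query)
    margin = w // 2 if margin is None else margin
    lo = max(0, start - margin)
    hi = min(len(test), end + margin + 1)
    q = embed(query)
    best_sim, best_start = -1.0, lo
    for s in range(lo, max(lo, hi - w) + 1):
        sim = float(q @ embed(test[s:s + w]))
        if sim > best_sim:
            best_sim, best_start = sim, s
    return best_sim, best_start


def two_stage_detect(query, test, dtw_threshold=0.3, sim_threshold=0.8):
    """Accept a detection only if both stages agree (thresholds are illustrative)."""
    start, end, cost = subsequence_dtw(query, test)
    if cost > dtw_threshold:
        return None
    sim, win_start = sliding_window_verify(query, test, start, end)
    return (win_start, sim) if sim >= sim_threshold else None
```

In this sketch the cheap first stage localizes a candidate region. The second stage compares fixed-length embeddings only inside that region, which mirrors the detect-then-verify structure described above without claiming to reproduce the authors' models.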
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
