A Nonparametric Bayesian Approach for Spoken Term Detection by Example Query
Amir Hossein Harati Nejad Torbati, Joseph Picone

TL;DR
This paper introduces a nonparametric Bayesian method for discovering acoustic units in low-resource languages and applies it to spoken term detection, achieving competitive results on the TIMIT dataset.
Contribution
The paper presents a novel nonparametric Bayesian approach for acoustic unit discovery and a spoken term detection algorithm based on these units, suitable for low-resource languages.
Findings
Achieved P@N of 61.2% on TIMIT
EER of 13.95% on TIMIT
Improved EER by 5%, with P@N only slightly below the best reported system
Abstract
State-of-the-art speech recognition systems use data-intensive context-dependent phonemes as acoustic units. However, these approaches do not translate well to low-resource languages, where large amounts of training data are not available. For such languages, automatic discovery of acoustic units is critical. In this paper, we demonstrate the application of nonparametric Bayesian models to acoustic unit discovery. We show that the discovered units are correlated with phonemes and therefore are linguistically meaningful. We also present a spoken term detection (STD) by example query algorithm based on these automatically learned units. We show that our proposed system produces a P@N of 61.2% and an EER of 13.95% on the TIMIT dataset. The improvement in the EER is 5%, while P@N is only slightly lower than that of the best reported system in the literature.
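The abstract reports P@N and EER without defining them. The sketch below illustrates how these two metrics are commonly computed for query-by-example detection. It assumes that P@N is the precision over the N top-scoring candidates, with N equal to the number of true occurrences of the query, and that EER is the operating point where the miss rate and the false-alarm rate coincide. The function names and the toy scores are hypothetical and are not taken from the paper, whose exact scoring protocol may differ.

```python
def precision_at_n(scores, labels):
    """P@N: fraction of true hits among the N top-scoring candidates,
    where N is the number of true occurrences (labels equal to 1)."""
    n = sum(labels)
    if n == 0:
        raise ValueError("need at least one true occurrence")
    # Rank candidates from highest to lowest detection score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    return sum(label for _, label in ranked[:n]) / n


def equal_error_rate(scores, labels):
    """Approximate EER: sweep a threshold over the observed scores and
    return the error rate where miss and false-alarm rates are closest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both true and false candidates")
    best_gap, eer = None, None
    for t in sorted(set(scores)):
        # A candidate is accepted when its score is at least t.
        misses = sum(1 for s, l in zip(scores, labels) if l == 1 and s < t)
        false_alarms = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= t)
        miss_rate = misses / n_pos
        fa_rate = false_alarms / n_neg
        gap = abs(miss_rate - fa_rate)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss_rate + fa_rate) / 2
    return eer


# Toy example (hypothetical scores): 1 marks a true occurrence of the query.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
print(precision_at_n(toy_scores, toy_labels))
print(equal_error_rate(toy_scores, toy_labels))
```

Under these definitions a higher P@N and a lower EER are both better, which is why a system can improve on one metric while trailing slightly on the other.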
