Enhancing Automatically Discovered Multi-level Acoustic Patterns Considering Context Consistency With Applications in Spoken Term Detection
Cheng-Tao Chung, Wei-Ning Hsu, Cheng-Yi Lee, Lin-Shan Lee

TL;DR
This paper introduces a method to improve automatically discovered multi-level acoustic patterns by considering context consistency, leading to better spoken term detection performance on TIMIT and Mandarin datasets.
Contribution
It proposes a novel enhancement technique for acoustic patterns that leverages context consistency across multiple HMM configurations, improving STD results.
Findings
Good improvements in spoken term detection (STD) were observed in preliminary experiments.
Pattern sets discovered under different HMM configurations are complementary, and relabeling across them exploits that complementarity.
Experiments were run on TIMIT and Mandarin datasets.
Abstract
This paper presents a novel approach for enhancing the multiple sets of acoustic patterns automatically discovered from a given corpus. In a previous work it was proposed that different HMM configurations (number of states per model, number of distinct models) for the acoustic patterns form a two-dimensional space. Multiple sets of acoustic patterns automatically discovered with the HMM configurations properly located on different points over this two-dimensional space were shown to be complementary to one another, jointly capturing the characteristics of the given corpus. By representing the given corpus as sequences of acoustic patterns on different HMM sets, the pattern indices in these sequences can be relabeled considering the context consistency across the different sequences. Good improvements were observed in preliminary experiments of pattern spoken term detection (STD)…
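The abstract describes the relabeling step only at a high level. As a rough illustration of the general idea, and not the authors' algorithm, the sketch below assumes each pattern set yields one pattern index per frame, treats the indices from the other sets at the same frame as the "context", and relabels each frame with the index most often seen under that context. The function name `relabel_by_context`, the frame-level alignment, and the majority-vote rule are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def relabel_by_context(target, contexts):
    """Toy relabeling by majority vote within a shared context.

    target:   list of pattern indices (one per frame) from one pattern set.
    contexts: list of index sequences from other pattern sets, each aligned
              frame-by-frame with `target` (an assumption of this sketch).
    Returns a new list in which every frame carries the target index most
    frequently observed under the same combination of context indices.
    """
    if not contexts:
        raise ValueError("at least one context sequence is required")
    if any(len(c) != len(target) for c in contexts):
        raise ValueError("all sequences must have the same length")

    # One key per frame: the tuple of indices from the other pattern sets.
    keys = list(zip(*contexts))

    # Count which target indices co-occur with each context key.
    votes = defaultdict(Counter)
    for key, label in zip(keys, target):
        votes[key][label] += 1

    # Replace each frame's index with the most common one for its context.
    return [votes[key].most_common(1)[0][0] for key in keys]


if __name__ == "__main__":
    # Hypothetical frame-level indices from two pattern sets.
    fine_set = [0, 0, 1, 0, 2, 2, 2, 1]
    coarse_set = [5, 5, 5, 5, 7, 7, 7, 7]
    print(relabel_by_context(fine_set, [coarse_set]))
    # prints [0, 0, 0, 0, 2, 2, 2, 2]
```

In this toy run the two outlier indices in `fine_set` are overwritten because they disagree with what the other set's index usually co-occurs with. The paper's actual relabeling criterion may differ substantially.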
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
