Unsupervised Discovery of Linguistic Structure Including Two-level Acoustic Patterns Using Three Cascaded Stages of Iterative Optimization
Cheng-Tao Chung, Chun-an Chan, Lin-shan Lee

TL;DR
This paper presents an unsupervised method for discovering linguistic structure, including two-level (subword-like and word-like) acoustic patterns, from raw speech data through three cascaded stages of iterative optimization, without requiring manual annotations.
Contribution
It introduces a three-stage cascaded iterative approach for unsupervised learning of linguistic structure, including a lexicon and an N-gram language model, directly from unlabelled speech data.
Findings
Achieved reasonable performance on a Mandarin broadcast news corpus
Complementary to supervised large vocabulary ASR systems
Demonstrated effective unsupervised learning of linguistic patterns
Abstract
Techniques for unsupervised discovery of acoustic patterns are increasingly attractive, because huge quantities of speech data are becoming available but manual annotations remain hard to acquire. In this paper, we propose an approach for unsupervised discovery of linguistic structure for the target spoken language given raw speech data. This linguistic structure includes two-level (subword-like and word-like) acoustic patterns, the lexicon of word-like patterns in terms of subword-like patterns, and the N-gram language model based on word-like patterns. All patterns, models, and parameters can be automatically learned from the unlabelled speech corpus. This is achieved by an initialization step followed by three cascaded stages for acoustic, linguistic, and lexical iterative optimization. The lexicon of word-like patterns defines allowed consecutive sequence of HMMs for…
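The two-level structure described in the abstract can be pictured with a small toy sketch. It is not the authors' implementation: the pattern labels (`w1`, `s3`, ...), the example transcripts, the helper names, the bigram order, and the add-one smoothing are all assumptions made for illustration, and the HMM acoustic models are left out entirely. It only shows how a lexicon ties word-like patterns to subword-like patterns and how an N-gram model could be estimated over word-like pattern sequences.

```python
from collections import Counter

# Hypothetical toy lexicon: each word-like pattern is spelled out as a
# sequence of subword-like pattern labels (all labels are made up).
lexicon = {
    "w1": ["s3", "s7"],
    "w2": ["s1", "s3", "s2"],
    "w3": ["s7"],
}

# Hypothetical decoded utterances, written as word-like pattern labels.
transcripts = [
    ["w1", "w2", "w1"],
    ["w2", "w1", "w3"],
]


def expand_to_subwords(words, lexicon):
    """Expand a word-like pattern sequence into subword-like patterns."""
    return [s for w in words for s in lexicon[w]]


def train_bigram(transcripts, vocab):
    """Return P(cur | prev) with add-one smoothing (an assumed choice)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for seq in transcripts:
        padded = ["<s>"] + seq  # sentence-start symbol
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    vocab_size = len(vocab)

    def prob(prev, cur):
        return (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)

    return prob


prob = train_bigram(transcripts, sorted(lexicon))
subword_seq = expand_to_subwords(["w1", "w2"], lexicon)
```

In this sketch, changing a lexicon entry changes which subword-like sequences are allowed for a word-like pattern, while the bigram table only ever sees word-like labels, which loosely mirrors the separation between the lexical and linguistic levels named in the abstract.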
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
