Unsupervised Discovery of Linguistic Structure Including Two-level Acoustic Patterns Using Three Cascaded Stages of Iterative Optimization

Cheng-Tao Chung; Chun-an Chan; Lin-shan Lee

arXiv:1509.02208 · cs.CL · September 9, 2015

Open Access

TL;DR

This paper presents an unsupervised method for discovering two-level linguistic structures, including subword and word-like acoustic patterns, from raw speech data through iterative optimization, without requiring manual annotations.

Contribution

It introduces a three-stage cascaded iterative approach for unsupervised learning of linguistic structures, including lexicon and language models, directly from unlabelled speech data.

Findings

01

Achieved reasonable performance on a Mandarin broadcast news corpus

02

Complementary to supervised large-vocabulary ASR systems

03

Demonstrated effective unsupervised learning of linguistic patterns

Abstract

Techniques for unsupervised discovery of acoustic patterns are getting increasingly attractive, because huge quantities of speech data are becoming available but manual annotations remain hard to acquire. In this paper, we propose an approach for unsupervised discovery of linguistic structure for the target spoken language given raw speech data. This linguistic structure includes two-level (subword-like and word-like) acoustic patterns, the lexicon of word-like patterns in terms of subword-like patterns and the N-gram language model based on word-like patterns. All patterns, models, and parameters can be automatically learned from the unlabelled speech corpus. This is achieved by an initialization step followed by three cascaded stages for acoustic, linguistic, and lexical iterative optimization. The lexicon of word-like patterns defines allowed consecutive sequence of HMMs for…
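The two-level structure described in the abstract can be pictured with a small data-layout sketch. This is only an illustration under stated assumptions, not the paper's algorithm: the pattern IDs (`W1`, `s3`, ...), the toy corpus, and the helper names `expand` and `bigram_mle` are all hypothetical, the language model is an unsmoothed bigram chosen for brevity, and none of the three optimization stages is shown.

```python
from collections import Counter

# Hypothetical lexicon: each word-like pattern is spelled out as a
# sequence of subword-like pattern IDs (in the paper these would be
# learned from speech; here they are made up for illustration).
LEXICON = {
    "W1": ("s3", "s7"),
    "W2": ("s1", "s3", "s5"),
    "W3": ("s7",),
}

def expand(word_seq, lexicon):
    """Flatten a sequence of word-like patterns into subword-like patterns."""
    return [s for w in word_seq for s in lexicon[w]]

def bigram_mle(corpus):
    """Unsmoothed maximum-likelihood bigram estimates P(w2 | w1)."""
    pair_counts = Counter()
    context_counts = Counter()
    for utterance in corpus:
        for w1, w2 in zip(utterance, utterance[1:]):
            pair_counts[(w1, w2)] += 1
            context_counts[w1] += 1
    return {pair: c / context_counts[pair[0]] for pair, c in pair_counts.items()}

# Toy "decoded" corpus: each utterance is a sequence of word-like patterns.
CORPUS = [["W1", "W2", "W3"], ["W1", "W2"], ["W1", "W3"]]
BIGRAMS = bigram_mle(CORPUS)
```

In this toy setup, `expand(["W1", "W3"], LEXICON)` yields the subword-level sequence for an utterance, and `BIGRAMS` holds the word-level statistics, which mirrors the lexicon and N-gram components the abstract lists.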

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques