Automatic Pronunciation Generation by Utilizing a Semi-supervised Deep Neural Networks

Naoya Takahashi; Tofigh Naghibi; Beat Pfister

arXiv:1606.05007 · cs.CL · June 17, 2016

TL;DR

This paper introduces a semi-supervised deep neural network approach to automatic pronunciation generation that jointly estimates a set of sub-word units and a pronunciation dictionary from orthographic transcriptions alone, and that outperforms phoneme-based continuous speech recognition on TIMIT.

Contribution

It presents a novel data-driven method that reduces reliance on handcrafted pronunciation dictionaries and is designed to cope with pronunciation variations.

Findings

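01 Largely outperforms phoneme-based continuous speech recognition on the TIMIT dataset

02 Needs only orthographic transcriptions, reducing the effort of creating and correcting a pronunciation dictionary by hand

03 Is aimed at under-resourced languages and dialects, where handcrafted dictionaries are hard to obtain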

Abstract

Phonemic or phonetic sub-word units are the most commonly used atomic elements for representing speech signals in modern automatic speech recognition (ASR) systems. However, they are not the optimal choice, for several reasons: the large effort required to handcraft a pronunciation dictionary, pronunciation variations, human errors, and under-resourced dialects and languages. Here, we propose a data-driven pronunciation estimation and acoustic modeling method that takes only the orthographic transcription and jointly estimates a set of sub-word units and a reliable dictionary. Experimental results show that the proposed method, which is based on semi-supervised training of a deep neural network, largely outperforms phoneme-based continuous speech recognition on the TIMIT dataset.
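The abstract does not spell out how the dictionary entries are derived. As a rough illustration only, one common data-driven step is to decode every spoken occurrence of a word into a sequence of automatically learned sub-word unit IDs, then keep the most frequent sequences as that word's pronunciations, which leaves room for pronunciation variants. The sketch assumes such decoded sequences already exist. The function name `estimate_dictionary`, the `min_share` threshold, and the input format are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical input format (not from the paper): one entry per spoken
# occurrence of a word, paired with the sub-word unit IDs decoded for it.
Occurrence = Tuple[str, Tuple[int, ...]]


def estimate_dictionary(
    occurrences: List[Occurrence], min_share: float = 0.2
) -> Dict[str, List[Tuple[int, ...]]]:
    """Build a word -> pronunciation-variants dictionary by frequency.

    Assumption for illustration: the most frequent decoded unit sequence is
    always kept, and further variants are kept only if they account for at
    least `min_share` of that word's occurrences.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, units in occurrences:
        counts[word][units] += 1

    dictionary: Dict[str, List[Tuple[int, ...]]] = {}
    for word, counter in counts.items():
        total = sum(counter.values())
        ranked = counter.most_common()  # sorted by count, most frequent first
        top_sequence = ranked[0][0]
        variants = [seq for seq, n in ranked[1:] if n / total >= min_share]
        dictionary[word] = [top_sequence] + variants
    return dictionary


if __name__ == "__main__":
    # Toy decoded data: "cat" has a main pronunciation, one variant, and noise.
    data = (
        [("cat", (3, 7, 1))] * 6
        + [("cat", (3, 5, 1))] * 3
        + [("cat", (8, 8))]
        + [("dog", (2, 4, 9))] * 2
    )
    print(estimate_dictionary(data))
```

With the toy data, the rare sequence `(8, 8)` covers only 10% of the occurrences of "cat" and is dropped as a likely decoding error, while `(3, 5, 1)` at 30% survives as a pronunciation variant. A real system would feed the resulting dictionary back into acoustic model training; that loop is omitted here.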

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing