Performance Improvements of Probabilistic Transcript-adapted ASR with Recurrent Neural Network and Language-specific Constraints

Xiang Kong; Preethi Jyothi; Mark Hasegawa-Johnson

arXiv:1612.03991·cs.CL·January 16, 2017

TL;DR

This paper presents two techniques, a recurrent neural network channel model and language-specific pronunciation constraints, for refining probabilistic transcriptions obtained from non-native transcribers, reducing the phone error rate of the cross-lingual ASR systems adapted with them.

Contribution

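It introduces a noisy-channel model of non-native phone misperception, trained with a recurrent neural network, and decodes it under minimally-resourced language-dependent pronunciation constraints to produce better probabilistic transcriptions for ASR adaptation.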

Findings

01

Reduced phone error rate by 7% with the RNN model.

02

Achieved a 9% reduction in phone error rate using pronunciation constraints.

03

Both methods improve transcription quality and ASR performance.

Abstract

Mismatched transcriptions have been proposed as a means to acquire probabilistic transcriptions from non-native speakers of a language. Prior work has demonstrated the value of these transcriptions by successfully adapting cross-lingual ASR systems for different target languages. In this work, we describe two techniques to refine these probabilistic transcriptions: a noisy-channel model of non-native phone misperception is trained using a recurrent neural network, and decoded using minimally-resourced language-dependent pronunciation constraints. Both innovations improve the quality of the transcript, and both reduce the phone error rate of a trained ASR, by 7% and 9% respectively.
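The noisy-channel idea in the abstract can be illustrated with a toy decoder. This is only a sketch under strong assumptions, not the authors' implementation: the paper trains the channel model with a recurrent neural network, whereas the hypothetical `CHANNEL` lookup table below scores each phone independently and assumes one transcriber letter per phone. The `LEXICON` entries, the probabilities, and the function names are all invented for illustration.

```python
import math

# ASSUMPTION: a hand-written table P(transcriber letter | target-language phone).
# The paper learns this channel with an RNN; a lookup table stands in for it here.
CHANNEL = {
    "p": {"p": 0.7, "b": 0.3},
    "b": {"b": 0.6, "p": 0.4},
    "a": {"a": 0.9, "e": 0.1},
    "t": {"t": 0.8, "d": 0.2},
    "d": {"d": 0.7, "t": 0.3},
}

# ASSUMPTION: a tiny hypothetical lexicon acting as the pronunciation constraint.
# Only phone sequences listed here can be produced by the decoder.
LEXICON = {
    "pat": ["p", "a", "t"],
    "bad": ["b", "a", "d"],
    "bat": ["b", "a", "t"],
}


def channel_log_prob(phones, letters):
    """Log P(letters | phones), assuming a one-to-one alignment."""
    if len(phones) != len(letters):
        return float("-inf")
    total = 0.0
    for phone, letter in zip(phones, letters):
        p = CHANNEL.get(phone, {}).get(letter, 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


def decode(letters, prior=None):
    """Return the lexicon entry maximizing log P(entry) + log P(letters | phones)."""
    if prior is None:
        # Uniform prior over lexicon entries when none is supplied.
        prior = {word: 1.0 / len(LEXICON) for word in LEXICON}
    scores = {
        word: math.log(prior[word]) + channel_log_prob(phones, letters)
        for word, phones in LEXICON.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, _ = decode(list("bet"))
    print(best)
```

Restricting the search to lexicon entries is what plays the role of the pronunciation constraint in this sketch: a letter string such as "bet" that matches no entry exactly is still mapped to the most probable permitted phone sequence.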

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques