Prediction-Adaptation-Correction Recurrent Neural Networks for Low-Resource Language Speech Recognition

Yu Zhang; Ekapol Chuangsuwanich; James Glass; Dong Yu

arXiv:1510.08985 · cs.CL · December 6, 2018 · 2 cites

TL;DR

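This paper introduces PAC-RNNs, a recurrent architecture for low-resource speech recognition in which a prediction network and a correction network exchange information in a recurrent loop. Combined with transfer learning from a similar language, the model outperforms DNN and LSTM baselines on IARPA-Babel tasks.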

Contribution

The paper proposes PAC-RNNs, which pair a prediction network with a correction network and can be combined with transfer learning, outperforming DNN and LSTM baselines on low-resource IARPA-Babel speech recognition tasks.

Findings

01 PAC-RNNs outperform DNN and LSTM baselines on IARPA-Babel tasks.

02 Transfer learning from a language similar to the target language further improves recognition accuracy.

03 The correction network uses auxiliary information from the prediction network to improve state-probability estimation.
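The summary does not spell out how knowledge is transferred from the source language. A common recipe, assumed here rather than taken from the paper, is to copy every trained parameter of the source-language model except the output layer, re-initialize that layer for the target language's state inventory, and then fine-tune on the target data. The sketch below shows only the copy-and-reinitialize step; the function `transfer_from_source`, the parameter names `w_hidden` and `w_out`, and the dict-of-matrices layout are all hypothetical.

```python
import random


def init_layer(rows, cols, rng):
    # Small random weights, stored as a list of rows.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def transfer_from_source(source_params, target_num_states, output_key="w_out", seed=0):
    """Hypothetical transfer step (an assumption, not the paper's recipe):
    copy all source-language layers, re-initialize only the output layer
    so it matches the target language's number of states."""
    rng = random.Random(seed)
    target = {}
    for name, weights in source_params.items():
        if name == output_key:
            hid_dim = len(weights[0])  # keep the input size of the output layer
            target[name] = init_layer(target_num_states, hid_dim, rng)
        else:
            target[name] = [row[:] for row in weights]  # independent copy of shared layers
    return target


rng = random.Random(1)
source = {"w_hidden": init_layer(3, 4, rng), "w_out": init_layer(10, 3, rng)}
target = transfer_from_source(source, target_num_states=7)
print(len(target["w_out"]), target["w_hidden"] == source["w_hidden"])  # prints: 7 True
```

The copied hidden layers would then be fine-tuned together with the new output layer on the low-resource target language; which layers the authors actually share is not stated in this summary.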

Abstract

In this paper, we investigate the use of prediction-adaptation-correction recurrent neural networks (PAC-RNNs) for low-resource speech recognition. A PAC-RNN consists of a pair of neural networks in which a correction network uses auxiliary information given by a prediction network to help estimate the state probability. The information from the correction network is also used by the prediction network in a recurrent loop. Our model outperforms other state-of-the-art neural networks (DNNs, LSTMs) on IARPA-Babel tasks. Moreover, transfer learning from a language that is similar to the target language can help improve performance further.
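The recurrent loop described in the abstract can be illustrated with a forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: each network is a single tanh layer, the prediction network's hidden vector from frame t-1 is the auxiliary input to the correction network at frame t, and the prediction network emits a guess of the next frame's state as a hypothetical auxiliary output. The class `PACRNNSketch` and all weight names are illustrative; the paper's layer types, sizes, and training objective are not given here.

```python
import math
import random


def matvec(w, v):
    # Multiply matrix w (list of rows) by vector v.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


class PACRNNSketch:
    """Hypothetical PAC-RNN forward pass with single tanh layers (assumption)."""

    def __init__(self, feat_dim, hid_dim, num_states, seed=0):
        rng = random.Random(seed)
        self.hid_dim = hid_dim
        self.w_cx = rand_matrix(hid_dim, feat_dim, rng)    # frame -> correction hidden
        self.w_cp = rand_matrix(hid_dim, hid_dim, rng)     # prediction info -> correction hidden
        self.w_cy = rand_matrix(num_states, hid_dim, rng)  # correction hidden -> state logits
        self.w_pc = rand_matrix(hid_dim, hid_dim, rng)     # correction hidden -> prediction hidden
        self.w_py = rand_matrix(num_states, hid_dim, rng)  # prediction hidden -> next-state logits

    def forward(self, frames):
        p_prev = [0.0] * self.hid_dim  # no prediction exists before the first frame
        posteriors, predictions = [], []
        for x_t in frames:
            # Correction: combine the current frame with auxiliary info from the prediction net.
            pre = [a + b for a, b in zip(matvec(self.w_cx, x_t), matvec(self.w_cp, p_prev))]
            c_t = [math.tanh(v) for v in pre]
            posteriors.append(softmax(matvec(self.w_cy, c_t)))
            # Prediction: consumes the correction hidden state, closing the recurrent loop.
            p_t = [math.tanh(v) for v in matvec(self.w_pc, c_t)]
            predictions.append(softmax(matvec(self.w_py, p_t)))
            p_prev = p_t  # fed back to the correction net at the next frame
        return posteriors, predictions


model = PACRNNSketch(feat_dim=4, hid_dim=3, num_states=5)
frames = [[0.1 * i, 0.2, -0.1, 0.05] for i in range(6)]
post, pred = model.forward(frames)
print(len(post), len(post[0]))  # prints: 6 5
```

Because `p_prev` carries information forward, the same acoustic frame can receive different state posteriors at different times, which is what separates this loop from a frame-by-frame DNN.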


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques