Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model

Sibo Tong, Philip N. Garner, Hervé Bourlard

arXiv:1711.10025 · eess.AS · January 24, 2018 · 29 cites

TL;DR

This paper explores multilingual CTC-based acoustic models for speech recognition, focusing on adaptation and regularization techniques such as LHUC and dropout to improve performance on under-resourced languages.

Contribution

It introduces a CTC acoustic model trained on a universal IPA-based phone set, together with adaptation methods that improve multilingual speech recognition, especially when target-language data are limited.
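To make the idea of a shared output inventory concrete, here is a minimal Python sketch, assuming each language's phones have already been mapped to IPA symbols. The toy inventories, the `<b>` blank token, and the helper names `build_universal_phone_set` and `encode` are hypothetical illustrations, not the paper's phone sets or code. The union of the per-language inventories, plus a CTC blank, defines a single output layer shared by all languages, so a phone such as /k/ gets the same output index whichever language it comes from.

```python
from typing import Dict, List

BLANK = "<b>"  # hypothetical CTC blank token, reserved at index 0


def build_universal_phone_set(inventories: Dict[str, List[str]]) -> List[str]:
    """Union of per-language IPA inventories: blank first, then sorted phones."""
    phones = set()
    for inventory in inventories.values():
        phones.update(inventory)
    return [BLANK] + sorted(phones)


def encode(phones: List[str], symbol_table: List[str]) -> List[int]:
    """Map an IPA phone sequence to indices in the shared output layer."""
    index = {p: i for i, p in enumerate(symbol_table)}
    return [index[p] for p in phones]


# Hypothetical toy inventories (already in IPA); not the paper's actual phone sets.
inventories = {
    "lang_a": ["a", "i", "k", "s", "t"],
    "lang_b": ["a", "e", "k", "ʃ", "t"],
}
table = build_universal_phone_set(inventories)

# Shared phones (/a/, /k/, /t/) receive identical indices for both languages.
kat_ids = encode(["k", "a", "t"], table)
```

Sorting the union keeps the index assignment deterministic, which matters if the same table must be rebuilt when a new target language is added.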

Findings

01  LHUC improves language adaptation performance.
02  Dropout during adaptation reduces overfitting.
03  Achieves results competitive with DNN/HMM systems.
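LHUC rescales each hidden unit by a learned amplitude, commonly parametrised as a = 2·sigmoid(r) so that it lies in (0, 2). The plain-Python sketch below assumes one vector r per language (language adaptive training) and inverted dropout on the hidden activations during adaptation. The parameter store `lhuc_params`, the layer width of 3, and the choice to apply dropout before the LHUC scaling are assumptions for illustration, not details taken from the paper.

```python
import math
import random
from typing import List


def lhuc_scale(r: List[float]) -> List[float]:
    """LHUC amplitudes a = 2 * sigmoid(r), each in the open interval (0, 2)."""
    return [2.0 / (1.0 + math.exp(-ri)) for ri in r]


def apply_lhuc(h: List[float], r: List[float]) -> List[float]:
    """Scale each hidden activation by its LHUC amplitude."""
    return [hi * ai for hi, ai in zip(h, lhuc_scale(r))]


def dropout(h: List[float], p: float, training: bool, rng: random.Random) -> List[float]:
    """Inverted dropout: zero units with probability p, rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(h)
    keep = 1.0 - p
    return [hi / keep if rng.random() < keep else 0.0 for hi in h]


# Hypothetical language-dependent LHUC parameters. r = 0 gives a = 1,
# so each language starts from the unadapted (identity) layer.
lhuc_params = {"lang_a": [0.0, 0.0, 0.0], "lang_b": [0.0, 0.0, 0.0]}


def adapted_layer(h: List[float], lang: str, p: float, training: bool,
                  rng: random.Random) -> List[float]:
    """Dropout (assumed order) followed by the selected language's LHUC scaling."""
    return apply_lhuc(dropout(h, p, training, rng), lhuc_params[lang])
```

Only the small r vectors are language-specific here, which is why LHUC is attractive when target-language data are scarce: few parameters need to be estimated per language.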

Abstract

Multilingual models for Automatic Speech Recognition (ASR) are attractive as they have been shown to benefit from more training data, and better lend themselves to adaptation to under-resourced languages. However, initialisation from monolingual context-dependent models leads to an explosion of context-dependent states. Connectionist Temporal Classification (CTC) is a potential solution to this as it performs well with monophone labels. We investigate multilingual CTC in the context of adaptation and regularisation techniques that have been shown to be beneficial in more conventional contexts. The multilingual model is trained to model a universal International Phonetic Alphabet (IPA)-based phone set using the CTC loss function. Learning Hidden Unit Contribution (LHUC) is investigated to perform language adaptive training. In addition, dropout during cross-lingual adaptation is also…
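For readers unfamiliar with the CTC loss, the following is a minimal log-space implementation of the standard CTC forward algorithm in plain Python. It assumes per-frame log-softmax outputs over a symbol set with the blank at index 0 and a monophone label sequence. It illustrates the textbook recursion, not the toolkit the authors used, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs: Sequence[Sequence[float]],
                           labels: Sequence[int], blank: int = 0) -> float:
    """-log p(labels | x) summed over all CTC alignments.

    log_probs[t][k] is the log-softmax output for symbol k at frame t.
    """
    # Extended label sequence: blank, l1, blank, l2, ..., blank.
    ext: List[int] = [blank]
    for lab in labels:
        ext += [lab, blank]
    S, T = len(ext), len(log_probs)

    # Initialisation: a path may start with the blank or with the first label.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][blank]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]              # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])  # advance by one position
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    # A valid path ends on the last label or on the trailing blank.
    total = alpha[0] if S == 1 else logsumexp(alpha[S - 1], alpha[S - 2])
    return -total
```

As a check, with two symbols at uniform probability over two frames and target `[1]`, the valid paths are "1 1", "blank 1" and "1 blank", giving p = 0.75. Because the loss sums over alignments, training needs no frame-level alignment or tied context-dependent states, which is why monophone targets are sufficient.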

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing