Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model
Sibo Tong, Philip N. Garner, Hervé Bourlard

TL;DR
This paper explores multilingual CTC-based acoustic models for speech recognition, focusing on adaptation and regularization techniques like LHUC and dropout to improve performance on under-resourced languages.
Contribution
It introduces a universal IPA-based CTC model with adaptation methods that enhance multilingual speech recognition, especially with limited data.
Findings
LHUC improves language adaptation performance
Dropout during adaptation reduces overfitting
Results are competitive with DNN/HMM systems
Abstract
Multilingual models for Automatic Speech Recognition (ASR) are attractive as they have been shown to benefit from more training data, and better lend themselves to adaptation to under-resourced languages. However, initialisation from monolingual context-dependent models leads to an explosion of context-dependent states. Connectionist Temporal Classification (CTC) is a potential solution to this as it performs well with monophone labels. We investigate multilingual CTC in the context of adaptation and regularisation techniques that have been shown to be beneficial in more conventional contexts. The multilingual model is trained to model a universal International Phonetic Alphabet (IPA)-based phone set using the CTC loss function. Learning Hidden Unit Contribution (LHUC) is investigated to perform language adaptive training. In addition, dropout during cross-lingual adaptation is also…
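As an illustration of the LHUC idea mentioned in the abstract, the sketch below rescales each hidden-unit output by a learned, language-specific amplitude. It assumes the commonly used parameterisation in which the amplitude is 2 * sigmoid(r), so that r = 0 leaves the shared layer unchanged; the abstract does not say which variant the authors use. The function names and all numeric values are hypothetical, chosen only for illustration.

```python
import math


def lhuc_amplitude(r):
    """Map an unconstrained LHUC parameter r to an amplitude in (0, 2)."""
    return 2.0 / (1.0 + math.exp(-r))


def lhuc_scale(hidden, r_params):
    """Rescale each hidden-unit output by its language-specific amplitude."""
    if len(hidden) != len(r_params):
        raise ValueError("one LHUC parameter is needed per hidden unit")
    return [lhuc_amplitude(r) * h for h, r in zip(hidden, r_params)]


# Hypothetical activations of one hidden layer with three units.
hidden = [0.5, -1.0, 2.0]

# r = 0 gives amplitude 1, so the shared multilingual layer is unchanged.
unadapted = lhuc_scale(hidden, [0.0, 0.0, 0.0])

# Hypothetical language-specific parameters: damp unit 0, boost unit 2.
adapted = lhuc_scale(hidden, [-2.0, 0.0, 2.0])
```

In this formulation only the per-unit parameters r would be trained for each language while the shared weights stay fixed, which keeps the number of language-specific parameters small.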
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
