Adaptive Activation Network For Low Resource Multilingual Speech Recognition
Jian Luo, Jianzong Wang, Ning Cheng, Zhenpeng Zheng, Jing Xiao

TL;DR
This paper introduces an adaptive activation network for low-resource multilingual speech recognition, using language-specific activation functions and training strategies to improve performance over traditional methods.
Contribution
It proposes a novel adaptive activation network with cross-lingual and multilingual training approaches for low-resource ASR, enhancing recognition accuracy.
Findings
Outperforms from-scratch training on the IARPA Babel datasets.
Improves over traditional bottleneck feature approaches.
Combining strategies yields further performance gains.
Abstract
Low-resource automatic speech recognition (ASR) is a useful but thorny task, since deep learning ASR models usually need huge amounts of training data. Existing models mostly establish a bottleneck (BN) layer by pre-training on a large source language and then transferring it to the low-resource target language. In this work, we introduced an adaptive activation network to the upper layers of the ASR model, and applied different activation functions to different languages. We also proposed two approaches to train the model: (1) cross-lingual learning, replacing the activation function of the source language with that of the target language, and (2) multilingual learning, jointly training the Connectionist Temporal Classification (CTC) loss of each language and the relevance of different languages. Our experiments on IARPA Babel datasets demonstrated that our approaches outperform the from-scratch training and…
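The idea of language-specific activations on shared upper layers can be illustrated with a minimal sketch. The excerpt above does not give the exact form of the paper's adaptive activation, so everything here is an assumption made for illustration: the leaky-ReLU-style slope per language, the names `AdaptiveActivation`, `UpperLayer`, and `transfer_to_target`, and the toy weights. It is not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveActivation:
    """Hypothetical per-language activation (assumption, not the paper's form):
    a leaky ReLU whose negative slope is stored separately for each language."""
    slopes: dict = field(default_factory=dict)  # language code -> negative slope

    def __call__(self, values, lang):
        slope = self.slopes[lang]
        return [v if v >= 0 else slope * v for v in values]


@dataclass
class UpperLayer:
    """One linear layer shared by all languages, followed by the
    language-specific activation."""
    weights: list  # shared weights, shape [out][in]
    activation: AdaptiveActivation

    def forward(self, inputs, lang):
        pre = [sum(w * x for w, x in zip(row, inputs)) for row in self.weights]
        return self.activation(pre, lang)


def transfer_to_target(layer, source_lang, target_lang):
    """Cross-lingual step (sketch): keep the shared weights and give the
    target language its own activation, initialised from the source's."""
    layer.activation.slopes[target_lang] = layer.activation.slopes[source_lang]
    return layer


layer = UpperLayer(
    weights=[[1.0, -1.0], [0.5, 0.5]],
    activation=AdaptiveActivation(slopes={"src": 0.1}),
)
transfer_to_target(layer, "src", "tgt")
layer.activation.slopes["tgt"] = 0.5  # stand-in for fine-tuning on target data

out_src = layer.forward([1.0, 2.0], "src")
out_tgt = layer.forward([1.0, 2.0], "tgt")
```

In this sketch the two languages share `weights` and differ only in the activation parameter, so the same input yields different outputs for `"src"` and `"tgt"`. A real system would learn these parameters by gradient descent under a CTC loss rather than set them by hand.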
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
