Adaptive Activation Network For Low Resource Multilingual Speech Recognition

Jian Luo; Jianzong Wang; Ning Cheng; Zhenpeng Zheng; Jing Xiao

arXiv:2205.14326 · cs.CL · May 31, 2022

Open Access

TL;DR

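This paper introduces an adaptive activation network for low-resource multilingual speech recognition. It applies language-specific activation functions in the upper layers of the ASR model and trains them with cross-lingual and multilingual learning, improving on from-scratch training and traditional bottleneck-feature methods.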

Contribution

It proposes a novel adaptive activation network with cross-lingual and multilingual training approaches for low-resource ASR, enhancing recognition accuracy.

Findings

01 The proposed approaches outperform from-scratch training.

02 They also improve on traditional bottleneck-feature approaches.

03 Combining the training strategies yields further performance gains.

Abstract

Low-resource automatic speech recognition (ASR) is a useful but thorny task, since deep learning ASR models usually need huge amounts of training data. Existing models mostly establish a bottleneck (BN) layer by pre-training on a large source language and then transferring to the low-resource target language. In this work, we introduce an adaptive activation network in the upper layers of the ASR model and apply different activation functions to different languages. We also propose two approaches to train the model: (1) cross-lingual learning, which replaces the activation function of the source language with that of the target language, and (2) multilingual learning, which jointly trains the Connectionist Temporal Classification (CTC) loss of each language and the relevance of different languages. Our experiments on IARPA Babel datasets demonstrate that our approaches outperform the from-scratch training and…
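The idea of language-specific activations on shared weights, and the cross-lingual swap in approach (1), can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the PReLU-style activation, the class names `SharedLayer` and `AdaptiveActivationLayer`, and the slope values are assumptions chosen only to show the mechanism. The paper's actual activation form and training procedure may differ.

```python
from dataclasses import dataclass, field


@dataclass
class SharedLayer:
    """Linear layer whose weights are shared by every language (assumed design)."""
    weights: list  # one row of input weights per output unit
    bias: list

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]


def parametric_relu(slope):
    """PReLU-style activation; the slope stands in for a language-specific parameter."""
    def activation(z):
        return [v if v > 0 else slope * v for v in z]
    return activation


@dataclass
class AdaptiveActivationLayer:
    """Shared linear transform followed by an activation selected per language."""
    shared: SharedLayer
    activations: dict = field(default_factory=dict)  # language id -> activation

    def add_language(self, language, slope):
        # Cross-lingual transfer in this sketch: keep the shared weights and
        # register a new activation for the target language.
        self.activations[language] = parametric_relu(slope)

    def forward(self, x, language):
        return self.activations[language](self.shared(x))


layer = AdaptiveActivationLayer(
    SharedLayer(weights=[[1.0, -1.0], [0.5, 0.5]], bias=[0.0, -2.0])
)
layer.add_language("source", slope=0.25)  # hypothetical high-resource language
layer.add_language("target", slope=0.5)   # hypothetical low-resource language

x = [1.0, 2.0]
out_source = layer.forward(x, "source")
out_target = layer.forward(x, "target")
print(out_source, out_target)
```

The same input passes through identical shared weights but produces different outputs per language, because only the activation changes. In a real system the activation parameters would be learned, for example by minimizing a CTC loss per language.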

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing