Adapt-and-Adjust: Overcoming the Long-Tail Problem of Multilingual Speech Recognition
Genta Indra Winata, Guangsen Wang, Caiming Xiong, Steven Hoi

TL;DR
This paper introduces Adapt-and-Adjust (A2), a transformer-based framework that improves multilingual speech recognition for low-resource languages by leveraging pretrained models, dual adapters, and class imbalance techniques.
Contribution
The paper presents a novel multi-task learning framework combining pretrained multilingual models, dual adapters, and class imbalance handling to address the long-tail problem in multilingual speech recognition.
Findings
A2 significantly outperforms conventional methods on CommonVoice.
The use of pretrained mBERT enhances low-resource language performance.
Dual adapters effectively balance language-specific and language-agnostic adaptation.
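To make the dual-adapter idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes each adapter is a small bottleneck (down-projection, ReLU, up-projection) and that the language-specific and shared outputs are simply added to the residual hidden state. The combination rule, the zero initialization, the dimensions, and the language codes "en" and "cy" are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def init_adapter(rng, d_model, d_bottleneck):
    """Create the weights of one bottleneck adapter (hypothetical layout)."""
    return {
        "down": rng.normal(0.0, 0.02, size=(d_model, d_bottleneck)),
        # Zero-initialised up-projection: the adapter starts as a no-op.
        "up": np.zeros((d_bottleneck, d_model)),
    }

def adapter(params, h):
    """Bottleneck transform: project down, apply ReLU, project back up."""
    return np.maximum(h @ params["down"], 0.0) @ params["up"]

def dual_adapter(h, lang, specific, shared):
    """Add a language-specific and a shared adapter output to the residual h.

    Summing the two branches is an assumption made for this sketch.
    """
    return h + adapter(specific[lang], h) + adapter(shared, h)

rng = np.random.default_rng(0)
d_model, d_bottleneck = 8, 2

shared = init_adapter(rng, d_model, d_bottleneck)          # language-agnostic
specific = {lang: init_adapter(rng, d_model, d_bottleneck)  # one per language
            for lang in ("en", "cy")}

h = rng.normal(size=(5, d_model))   # 5 frames of hidden states
out = dual_adapter(h, "cy", specific, shared)

# Each adapter adds 2 * d_model * d_bottleneck weights,
# compared with d_model ** 2 for a full dense layer.
extra_params = 2 * d_model * d_bottleneck
```

Because the bottleneck is much narrower than the model dimension, each extra language costs only a small number of parameters, which is the property the "minimal additional parameters" claim relies on.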
Abstract
One crucial challenge of real-world multilingual speech recognition is the long-tailed distribution problem: a few resource-rich languages such as English have abundant training data, while a long tail of low-resource languages has varying, limited amounts of training data. To overcome this problem, we propose Adapt-and-Adjust (A2), a transformer-based multi-task learning framework for end-to-end multilingual speech recognition. The A2 framework addresses the long tail with three techniques: (1) exploiting a pretrained multilingual language model (mBERT) to improve the performance of low-resource languages; (2) proposing dual adapters consisting of both language-specific and language-agnostic adaptation with minimal additional parameters; and (3) overcoming the class imbalance, either by imposing class priors in the loss during training or adjusting the…
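One common way to impose class priors in the loss during training is to add the log of each class's empirical frequency to the logits before the softmax cross-entropy. The sketch below illustrates that general idea in NumPy; it is not claimed to be the exact formulation used in A2. The class counts, the scaling factor tau, and the function names are hypothetical.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def prior_adjusted_loss(logits, targets, class_counts, tau=1.0):
    """Cross-entropy with log class priors added to the logits.

    tau scales the strength of the prior (an assumption of this sketch);
    tau = 0 recovers the ordinary cross-entropy.
    """
    prior = class_counts / class_counts.sum()
    adjusted = logits + tau * np.log(prior)
    logp = log_softmax(adjusted)
    return -logp[np.arange(len(targets)), targets].mean()

# Hypothetical long-tailed class counts: one frequent class, one rare class.
counts = np.array([900.0, 90.0, 10.0])
logits = np.zeros((2, 3))        # an uninformed model
targets = np.array([0, 2])       # one frequent-class and one rare-class example

plain = prior_adjusted_loss(logits, targets, counts, tau=0.0)
adjusted = prior_adjusted_loss(logits, targets, counts, tau=1.0)
```

With the prior added, an uninformed model is penalized more heavily on the rare class than on the frequent one, so training has to push the rare-class logit higher to compensate.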
Taxonomy
Methods: Softmax
