Adapting Where It Matters: Depth-Aware Adaptation for Efficient Multilingual Speech Recognition in Low-Resource Languages
Yang Xiao, Eun-Jung Holden, Ting Dang

TL;DR
This paper introduces DAMA, a depth-aware adaptation framework for multilingual speech recognition that allocates adaptation capacity based on layer roles, significantly improving efficiency and accuracy in low-resource languages.
Contribution
The paper reveals a U-shaped adaptability pattern in multilingual ASR models and proposes DAMA, a novel method that allocates adaptation based on layer importance, enhancing efficiency and performance.
Findings
DAMA matches or surpasses state-of-the-art accuracy with 80% fewer trainable parameters.
DAMA achieves a 29% error reduction in low-resource scenarios.
DAMA improves memory, training time, and computational efficiency.
Abstract
Recent speech foundation models excel at multilingual automatic speech recognition (ASR) for high-resource languages, but adapting them to low-resource languages remains challenging due to data scarcity and efficiency constraints. Full-model fine-tuning is computationally expensive and prone to overfitting, while parameter-efficient methods like LoRA apply adaptation uniformly across layers, overlooking internal representations and thus compromising both effectiveness and efficiency. We analyze multilingual ASR models and reveal a U-shaped adaptability pattern: early and late layers are language-specific and require more adaptation, while intermediate layers retain shared semantics and need less. Building on this observation, we propose DAMA, a Depth-Aware Model Adaptation framework that allocates adaptation capacity according to each layer's role. DAMA also introduces Singular Value…
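The idea of giving early and late layers more adaptation capacity than intermediate ones can be illustrated with a small sketch. This is not DAMA's actual allocation rule, which the truncated abstract does not specify. The function name `u_shaped_ranks`, the quadratic curve, and the `min_rank`/`max_rank` defaults are all assumptions chosen for illustration; the sketch only shows what a U-shaped, per-layer LoRA-style rank schedule might look like.

```python
def u_shaped_ranks(num_layers: int, min_rank: int = 2, max_rank: int = 16) -> list[int]:
    """Hypothetical per-layer rank schedule: high at both ends, low in the middle.

    Illustrative only; the quadratic shape and rank bounds are assumptions,
    not values taken from the paper.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if min_rank > max_rank:
        raise ValueError("min_rank must not exceed max_rank")
    if num_layers == 1:
        return [max_rank]

    ranks = []
    for i in range(num_layers):
        pos = i / (num_layers - 1)   # 0.0 at the first layer, 1.0 at the last
        dist = abs(2 * pos - 1)      # 1.0 at either end, 0.0 at the midpoint
        weight = dist ** 2           # quadratic curve gives the U shape
        ranks.append(round(min_rank + (max_rank - min_rank) * weight))
    return ranks


# Example: a 12-layer encoder, compared against a uniform rank-16 baseline.
ranks = u_shaped_ranks(12)
uniform_total = 16 * 12
print(ranks, sum(ranks), uniform_total)
```

Under these assumed settings the total rank budget falls well below that of a uniform schedule, which is the general mechanism by which a depth-aware allocation could reduce trainable parameters; the specific savings reported in the paper come from the authors' own method, not from this sketch.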
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Natural Language Processing Techniques
