A Language-Agnostic Hierarchical LoRA-MoE Architecture for CTC-based Multilingual ASR

Yuang Zheng; Dongxu Chen; Yuxiang Mei; Dongxing Xu; Jie Chen; Yanhua Long

arXiv:2601.00557·cs.CL·March 17, 2026

Open Access

TL;DR

This paper introduces a lightweight, language-agnostic hierarchical LoRA-MoE architecture for multilingual ASR. It improves decoding efficiency and removes the need for prior language information, which makes it suitable for resource-constrained devices.

Contribution

It presents a novel hierarchical LoRA-MoE framework integrated into an mHuBERT-CTC model, enabling true language-agnostic decoding without explicit language labels.

Findings

01 Achieves comparable performance to two-stage inference methods.

02 Reduces real-time factor (RTF) by 11.7% and 8.2%.

03 Demonstrates effectiveness on MSR-86K and MLC-SLM datasets.
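
For readers unfamiliar with the metric, RTF is conventionally the time spent decoding divided by the duration of the audio decoded, so lower is faster. The sketch below shows that ratio and a relative reduction between two systems. The function names and the timing numbers are illustrative placeholders, not values from the paper, and it is an assumption here that the reported percentages are relative reductions.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = decoding time / audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def relative_reduction(baseline_rtf: float, new_rtf: float) -> float:
    """Fraction by which new_rtf is lower than baseline_rtf."""
    return (baseline_rtf - new_rtf) / baseline_rtf


# Hypothetical timings, chosen only to illustrate the arithmetic.
baseline = real_time_factor(processing_seconds=20.0, audio_seconds=100.0)
single_pass = real_time_factor(processing_seconds=18.0, audio_seconds=100.0)
reduction_percent = 100.0 * relative_reduction(baseline, single_pass)
print(f"baseline RTF={baseline:.3f}, new RTF={single_pass:.3f}, "
      f"reduction={reduction_percent:.1f}%")
```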

Abstract

Large-scale multilingual ASR (mASR) models such as Whisper achieve strong performance but incur high computational and latency costs, limiting their deployment on resource-constrained edge devices. In this study, we propose a lightweight and language-agnostic multilingual ASR system based on a CTC architecture with domain adaptation. Specifically, we introduce a Language-agnostic Hierarchical LoRA-MoE (HLoRA) framework integrated into an mHuBERT-CTC model, enabling end-to-end decoding via LID-posterior-driven LoRA routing. The hierarchical design consists of a multilingual shared LoRA for learning language-invariant acoustic representations and language-specific LoRA experts for modeling language-dependent characteristics. The proposed routing mechanism removes the need for prior language identity information or explicit language labels during inference, achieving true language-agnostic…
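
To make the routing idea concrete, here is a minimal NumPy sketch of one adapted linear layer: a frozen base weight, one shared LoRA branch, and a bank of language-specific LoRA experts whose outputs are mixed by an LID posterior. This is an illustration under assumptions, not the authors' implementation. The class name `HLoRALinear`, the rank, the initialisation, the utterance-level soft mixing, and the random stand-in for the LID logits are all hypothetical; the abstract does not specify where the posterior comes from or whether routing is soft or hard.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class HLoRALinear:
    """Frozen linear layer + shared LoRA + LID-weighted language LoRA experts.

    Hypothetical sketch: shapes and initialisation are assumptions.
    """

    def __init__(self, d_in, d_out, num_langs, rank=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.02, (d_in, d_out))        # frozen base weight
        self.A_shared = rng.normal(0.0, 0.02, (d_in, rank))  # shared LoRA down-projection
        self.B_shared = np.zeros((rank, d_out))              # shared LoRA up-projection
        self.A_lang = rng.normal(0.0, 0.02, (num_langs, d_in, rank))
        self.B_lang = np.zeros((num_langs, rank, d_out))     # zero init: no change at start

    def __call__(self, x, lid_posterior):
        # x: (T, d_in) frames; lid_posterior: (num_langs,) and sums to 1.
        base = x @ self.W
        shared = (x @ self.A_shared) @ self.B_shared
        # Per-expert low-rank updates, shape (num_langs, T, d_out).
        experts = np.einsum("td,ldr,lro->lto", x, self.A_lang, self.B_lang)
        # Mix expert outputs with the LID posterior (soft routing, assumed).
        routed = np.einsum("l,lto->to", lid_posterior, experts)
        return base + shared + routed


rng = np.random.default_rng(1)
frames = rng.normal(size=(50, 16))   # 50 frames of 16-dim features (toy sizes)
lid_logits = rng.normal(size=3)      # stand-in for an LID head's output
posterior = softmax(lid_logits)      # no language label is supplied by the user
layer = HLoRALinear(d_in=16, d_out=8, num_langs=3)
out = layer(frames, posterior)
print(out.shape)
```

Because the posterior is computed from the audio itself, nothing in the call requires a language label. Passing a one-hot vector instead recovers hard selection of a single language expert, which is one way to compare against a pipeline that identifies the language first.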

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Domain Adaptation and Few-Shot Learning