Language Family Matters: Evaluating LLM-Based ASR Across Linguistic Boundaries
Yuchen Zhang, Ravi Shekhar, Haralambos Mouratidis

TL;DR
This paper introduces a novel language family-based connector-sharing strategy for LLM-based ASR systems, reducing parameters and improving cross-domain generalization across multiple languages.
Contribution
The paper proposes a connector-sharing approach based on linguistic families, improving efficiency and scalability in multilingual speech recognition systems.
Findings
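Family-based connectors reduce parameter count relative to training one connector per language.
Family-based connectors improve generalization across domains.
The approach is validated on two multilingual LLMs and two real-world corpora spanning curated and crowd-sourced speech.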
Abstract
Large Language Model (LLM)-powered Automatic Speech Recognition (ASR) systems achieve strong performance with limited resources by linking a frozen speech encoder to a pretrained LLM via a lightweight connector. Prior work trains a separate connector per language, overlooking linguistic relatedness. We propose an efficient and novel connector-sharing strategy based on linguistic family membership, enabling one connector per family, and empirically validate its effectiveness across two multilingual LLMs and two real-world corpora spanning curated and crowd-sourced speech. Our results show that family-based connectors reduce parameter count while improving generalization across domains, offering a practical and scalable strategy for multilingual ASR deployment.
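To make the parameter argument concrete, here is a minimal sketch of how connector sharing by family changes the number of trainable connector parameters. Everything specific in it is an assumption for illustration, not taken from the paper: the language list, the family assignments, the two-layer MLP connector shape, and the dimensions are all hypothetical. The frozen encoder and the LLM are left out because they contribute no trainable parameters in this setup.

```python
from dataclasses import dataclass

# Hypothetical language-to-family mapping (not the paper's actual language set).
LANGUAGE_FAMILY = {
    "es": "romance",
    "fr": "romance",
    "it": "romance",
    "de": "germanic",
    "nl": "germanic",
    "pl": "slavic",
}


@dataclass(frozen=True)
class ConnectorSpec:
    """Shape of a two-layer MLP connector (an assumed design, for illustration)."""
    encoder_dim: int  # output width of the frozen speech encoder
    hidden_dim: int   # width of the connector's hidden layer
    llm_dim: int      # embedding width expected by the LLM

    def num_params(self) -> int:
        # Weights plus biases of the two linear layers.
        first = self.encoder_dim * self.hidden_dim + self.hidden_dim
        second = self.hidden_dim * self.llm_dim + self.llm_dim
        return first + second


def connector_key(language: str, share_by_family: bool) -> str:
    """Return the identifier of the connector that serves this language."""
    return LANGUAGE_FAMILY[language] if share_by_family else language


def total_connector_params(languages, spec, share_by_family):
    """Count trainable connector parameters across all distinct connectors."""
    keys = {connector_key(lang, share_by_family) for lang in languages}
    return len(keys) * spec.num_params()


# Hypothetical dimensions chosen only to make the arithmetic visible.
spec = ConnectorSpec(encoder_dim=1280, hidden_dim=2048, llm_dim=4096)
langs = list(LANGUAGE_FAMILY)

per_language = total_connector_params(langs, spec, share_by_family=False)
per_family = total_connector_params(langs, spec, share_by_family=True)
print(f"per-language connectors: {per_language:,} parameters")
print(f"per-family connectors:   {per_family:,} parameters")
```

In this toy setting, six languages fall into three families, so sharing halves the connector parameter count. The actual savings depend on how many languages each family covers in a given deployment.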
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems
