LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration
Sangmin Lee, Woo-Jin Chung, Hong-Goo Kang

TL;DR
LAMA-UT introduces a language-agnostic multilingual ASR framework that unifies orthographies into a Romanized form and then applies language-specific transliteration, achieving competitive accuracy with little training data and no language-specific modules.
Contribution
The paper presents a novel orthography unification and transliteration pipeline for multilingual ASR that operates without language-specific components, matching state-of-the-art performance with minimal data.
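The orthography-unification idea is easiest to see on text. The sketch below is a toy illustration, not the paper's method. It maps words from different scripts onto one shared Roman alphabet, so that related sounds end up as the same symbols. The Cyrillic table is a hypothetical subset that covers only the example words. Diacritic stripping is done with Python's standard `unicodedata` module.

```python
import unicodedata

# Hypothetical toy table: covers only the Cyrillic letters used in the demo
# words below. A real system would need a complete romanizer.
CYRILLIC_TO_LATIN = {
    "п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t",
    "м": "m", "о": "o", "с": "s", "к": "k", "а": "a",
}

def strip_diacritics(text: str) -> str:
    """Decompose accented letters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def unify_orthography(text: str) -> str:
    """Lowercase, transliterate known Cyrillic letters, then strip diacritics.

    Characters missing from the table (e.g. Latin letters) pass through unchanged.
    """
    romanized = "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text.lower())
    return strip_diacritics(romanized)

for word in ["Привет", "Москва", "Café", "naïve"]:
    print(word, "->", unify_orthography(word))
# Привет -> privet
# Москва -> moskva
# Café -> cafe
# naïve -> naive
```

After this step, Russian and Spanish training targets share one output vocabulary. Intuitively, that is what lets a single language-agnostic model learn phonetic regularities across languages.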
Findings
Achieves a 45% relative error reduction over Whisper
Performs comparably to MMS despite being trained on only 0.1% of Whisper's training data
Operates effectively on unseen languages without language-specific modules
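A relative error reduction is measured against the baseline's own error rate, not as an absolute difference in percentage points. Below is a minimal sketch of the arithmetic. The WER values are hypothetical and chosen only to show what "45% relative" means. They are not figures from the paper.

```python
def relative_error_reduction(baseline_err: float, new_err: float) -> float:
    """Return the fractional improvement of new_err over baseline_err (0.45 == 45%)."""
    if baseline_err <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_err - new_err) / baseline_err

# Hypothetical error rates, for illustration only.
baseline_wer = 0.40
new_wer = 0.22
print(f"{relative_error_reduction(baseline_wer, new_wer):.0%}")  # prints 45%
```

In this example the absolute drop is 18 percentage points, while the relative reduction is 45%.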
Abstract
Building a universal multilingual automatic speech recognition (ASR) model that performs equitably across languages has long been challenging. To address this challenge, we introduce a Language-Agnostic Multilingual ASR pipeline through orthography Unification and language-specific Transliteration (LAMA-UT). LAMA-UT operates without any language-specific modules and matches the performance of state-of-the-art models while being trained on a minimal amount of data. Our pipeline consists of two key steps. First, we utilize a universal transcription generator to unify orthographic features into Romanized form and capture common phonetic characteristics across diverse languages. Second, we utilize a universal converter to transform these universal transcriptions into language-specific ones. In experiments, we demonstrate the effectiveness of our proposed method leveraging…
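The two-step structure described in the abstract can be sketched as a composition of two pluggable stages. This is a hypothetical skeleton, not the authors' implementation:

- Both stages are stubs.
- The word-level `TOY_LEXICON` stands in for whatever model the paper actually uses as its universal converter.
- It is an assumption that the converter receives the target language as an input tag.

What the sketch does show is the division of labor. The first stage never sees a language identifier. Only the second stage produces language-specific orthography.

```python
from typing import Callable, Dict

def universal_transcription_generator(audio: bytes) -> str:
    """Hypothetical stub for step 1: audio -> Romanized universal transcription.

    A real generator would be an acoustic model; this stub ignores its input.
    Note that it takes no language argument (language-agnostic).
    """
    return "privet mir"

# Hypothetical toy lexicon standing in for the universal converter's knowledge.
TOY_LEXICON: Dict[str, Dict[str, str]] = {
    "ru": {"privet": "привет", "mir": "мир"},
}

def universal_converter(romanized: str, lang: str) -> str:
    """Hypothetical stub for step 2: Romanized text -> target-language orthography.

    Unknown languages or words are passed through in Romanized form.
    """
    lexicon = TOY_LEXICON.get(lang, {})
    return " ".join(lexicon.get(word, word) for word in romanized.split())

def transcribe(
    audio: bytes,
    lang: str,
    generator: Callable[[bytes], str] = universal_transcription_generator,
    converter: Callable[[str, str], str] = universal_converter,
) -> str:
    """Run the two stages in sequence: unify first, then transliterate."""
    romanized = generator(audio)
    return converter(romanized, lang)

print(transcribe(b"", "ru"))  # привет мир
print(transcribe(b"", "xx"))  # privet mir (no lexicon: Romanized output kept)
```

Keeping the stages as injectable callables mirrors the separation described in the abstract. The generator can be shared across all languages, and only the conversion step depends on the target language.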
Taxonomy
Topics: Speech Recognition and Synthesis
