LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration

Sangmin Lee; Woo-Jin Chung; Hong-Goo Kang

arXiv:2412.15299·cs.CL·October 24, 2025

TL;DR

LAMA-UT is a language-agnostic multilingual ASR pipeline that first unifies orthographies into a Romanized form and then transliterates the result into each language's own orthography, reaching accuracy comparable to state-of-the-art models with minimal training data and no language-specific modules.

Contribution

The paper presents a novel orthography unification and transliteration pipeline for multilingual ASR that operates without language-specific components, matching state-of-the-art performance with minimal data.

Findings

01. Achieves a 45% relative error reduction over Whisper.

02. Performs comparably to MMS while being trained on only 0.1% of the amount of data used to train Whisper.

03. Operates effectively on unseen languages without language-specific modules.
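For readers unfamiliar with the metric in the first finding, relative error reduction is conventionally the drop in error rate divided by the baseline error rate. The sketch below illustrates that arithmetic only. The function name and the two error rates are hypothetical, chosen so the result comes out to 45%; they are not figures reported in the paper.

```python
def relative_error_reduction(baseline_error: float, system_error: float) -> float:
    """Return (baseline - system) / baseline as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - system_error) / baseline_error


# Hypothetical error rates (in percent), NOT taken from the paper.
illustrative_baseline = 20.0
illustrative_system = 11.0

rer = relative_error_reduction(illustrative_baseline, illustrative_system)
print(f"relative error reduction: {rer:.0%}")
```

Note that a relative reduction says nothing about absolute error rates on its own: the same 45% could describe a drop from 20 to 11 or from 2 to 1.1.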

Abstract

Building a universal multilingual automatic speech recognition (ASR) model that performs equitably across languages has long been a challenge due to its inherent difficulties. To address this task, we introduce a Language-Agnostic Multilingual ASR pipeline through orthography Unification and language-specific Transliteration (LAMA-UT). LAMA-UT operates without any language-specific modules and, despite being trained on a minimal amount of data, matches the performance of state-of-the-art models. Our pipeline consists of two key steps. First, we utilize a universal transcription generator to unify orthographic features into Romanized form and capture common phonetic characteristics across diverse languages. Second, we utilize a universal converter to transform these universal transcriptions into language-specific ones. In experiments, we demonstrate the effectiveness of our proposed method leveraging…
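The two steps described in the abstract can be pictured as a simple composition of two functions. The following is a minimal structural sketch, not the authors' implementation: every name in it (`TwoStagePipeline`, `toy_transcriber`, `toy_converter`, `TOY_TABLE`) is hypothetical, the stand-in components are lookup stubs rather than trained models, and passing a language tag to the converter is an assumption about how the target language is specified.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical type aliases for illustration only.
Audio = Sequence[float]
UniversalTranscriber = Callable[[Audio], str]    # audio -> Romanized transcription
UniversalConverter = Callable[[str, str], str]   # (Romanized text, language tag) -> native text


@dataclass
class TwoStagePipeline:
    """Composes a shared transcriber with a shared converter."""
    transcriber: UniversalTranscriber
    converter: UniversalConverter

    def transcribe(self, audio: Audio, language: str) -> str:
        # Step 1: map speech to a language-agnostic Romanized transcription.
        romanized = self.transcriber(audio)
        # Step 2: convert the Romanized text to the target language's orthography.
        return self.converter(romanized, language)


def toy_transcriber(audio: Audio) -> str:
    # Stub: ignores the audio and returns a fixed Romanized string.
    return "privet"


# Stub lookup table standing in for a learned converter.
TOY_TABLE = {("privet", "ru"): "привет"}


def toy_converter(romanized: str, language: str) -> str:
    # Falls back to the Romanized form when no mapping is known.
    return TOY_TABLE.get((romanized, language), romanized)


pipeline = TwoStagePipeline(toy_transcriber, toy_converter)
result = pipeline.transcribe([0.0, 0.1], "ru")
print(result)
```

The point of the structure is that both components are shared across languages: only the data passed through them changes, which is what allows the pipeline to avoid per-language modules.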

Taxonomy

Topics: Speech Recognition and Synthesis