Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR

Quy-Anh Dang; Chris Ngo

arXiv:2603.16184 · cs.CL · March 18, 2026

TL;DR

Polyglot-Lion is a family of compact multilingual ASR models for Singapore's four main languages: English, Mandarin, Tamil, and Malay. Balanced fine-tuning of Qwen3-ASR-0.6B and Qwen3-ASR-1.7B yields accuracy competitive with a much larger baseline, at a far lower training cost and with faster inference.

Contribution

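The paper introduces a balanced fine-tuning approach for multilingual ASR that equalizes the number of training utterances per language and omits language-tag conditioning, so the model learns to identify the language implicitly from audio while requiring far less compute than the baseline.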

Findings

01

Polyglot-Lion-1.7B achieves an average error rate of 14.85 across 12 benchmarks, compared with 14.32 for MERaLiON-2-10B-ASR.

02

Training costs $81 on a single RTX PRO 6000 GPU, compared with $18,862 for the 128-GPU baseline.

03

Inference throughput is approximately 20x faster than MERaLiON.
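
To put the headline figures in proportion, the short calculation below derives the training-cost ratio, the error-rate gap, and the nominal size ratio. The cost and error figures are the ones quoted in this summary and in the abstract; the parameter counts are an assumption read off the model names (1.7B and 10B), not measured values.

```python
# Figures quoted in the summary and abstract.
polyglot_cost_usd = 81
baseline_cost_usd = 18_862
polyglot_error = 14.85   # Polyglot-Lion-1.7B, average over 12 benchmarks
baseline_error = 14.32   # MERaLiON-2-10B-ASR

# Assumption: nominal parameter counts (in billions) taken from the model names.
polyglot_params_b = 1.7
baseline_params_b = 10.0

cost_ratio = baseline_cost_usd / polyglot_cost_usd
error_gap = polyglot_error - baseline_error
size_ratio = baseline_params_b / polyglot_params_b

print(f"Training cost ratio: {cost_ratio:.0f}x")
print(f"Error-rate gap: {error_gap:.2f} points")
print(f"Nominal size ratio: {size_ratio:.1f}x")
```

Under these numbers the baseline's training cost is roughly 233 times higher, while the average error rates differ by about half a point.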

Abstract

We present Polyglot-Lion, a family of compact multilingual automatic speech recognition (ASR) models tailored for the linguistic landscape of Singapore, covering English, Mandarin, Tamil, and Malay. Our models are obtained by fine-tuning Qwen3-ASR-0.6B and Qwen3-ASR-1.7B exclusively on publicly available speech corpora, using a balanced sampling strategy that equalizes the number of training utterances per language and deliberately omits language-tag conditioning so that the model learns to identify languages implicitly from audio. On 12 benchmarks spanning the four target languages, Polyglot-Lion-1.7B achieves an average error rate of 14.85, competitive with MERaLiON-2-10B-ASR (14.32) - a model 6x larger - while incurring a training cost of $81 on a single RTX PRO 6000 GPU compared to $18,862 for the 128-GPU baseline. Inference throughput is approximately 20x faster than MERaLiON at…
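
The balanced sampling idea in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical in-memory list of `(language, utterance_id)` pairs and downsamples every language to the size of the smallest one. The function name `balance_by_language`, the data layout, and the choice of downsampling are illustrative assumptions; the paper's actual procedure may differ.

```python
import random
from collections import Counter, defaultdict


def balance_by_language(utterances, seed=0):
    """Return a shuffled subset with equally many utterances per language.

    `utterances` is a list of (language, utterance_id) pairs (hypothetical layout).
    Every language is downsampled to the size of the smallest language.
    """
    rng = random.Random(seed)
    by_lang = defaultdict(list)
    for lang, utt_id in utterances:
        by_lang[lang].append(utt_id)
    if not by_lang:
        return []

    # The smallest language sets the per-language quota.
    target = min(len(ids) for ids in by_lang.values())

    balanced = []
    for lang in sorted(by_lang):
        for utt_id in rng.sample(by_lang[lang], target):
            # The label is kept only for bookkeeping; with no language-tag
            # conditioning it would not be passed to the model as input.
            balanced.append((lang, utt_id))

    rng.shuffle(balanced)  # mix languages within the training order
    return balanced


# Toy, deliberately imbalanced data (not from the paper).
toy = (
    [("en", f"en_{i}") for i in range(50)]
    + [("zh", f"zh_{i}") for i in range(30)]
    + [("ms", f"ms_{i}") for i in range(20)]
    + [("ta", f"ta_{i}") for i in range(10)]
)
balanced = balance_by_language(toy)
counts = Counter(lang for lang, _ in balanced)
print(counts)
```

Downsampling is the simplest way to equalize counts; upsampling low-resource languages or sampling with per-language weights would be alternative ways to reach the same balance.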

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Natural Language Processing Techniques