LoASR-Bench: Evaluating Large Speech Language Models on Low-Resource Automatic Speech Recognition Across Language Families

Jianan Chen; Xiaoxue Gao; Tatsuya Kawahara; Nancy F. Chen

arXiv:2603.20042·cs.CL·March 23, 2026

TL;DR

This paper introduces LoASR-Bench, a new benchmark for evaluating large speech language models on low-resource languages across diverse language families, revealing their current limitations in real-world multilingual ASR applications.

Contribution

The paper presents LoASR-Bench, a benchmark covering 25 languages from 9 language families, for evaluating SpeechLMs on low-resource languages and assessing their cross-linguistic performance.

Findings

01. SpeechLMs perform poorly on low-resource languages

02. Benchmark covers diverse scripts and language families

03. Highlights need for improved multilingual ASR models

Abstract

Large language models (LLMs) have driven substantial advances in speech language models (SpeechLMs), yielding strong performance in automatic speech recognition (ASR) under high-resource conditions. However, existing benchmarks predominantly focus on high-resource languages, leaving the ASR behavior of SpeechLMs in low-resource languages insufficiently understood. This gap is critical, as practical ASR systems must reliably support low-resource languages and generalize across diverse language families, and it directly hinders the deployment of SpeechLM-based ASR in real-world multilingual scenarios. As a result, it is essential to evaluate SpeechLMs on low-resource languages to ensure their generalizability across different language families. To address this problem, we propose LoASR-Bench, a comprehensive benchmark designed to evaluate low-resource automatic…

Taxonomy

Topics: Speech Recognition and Synthesis · ICT in Developing Communities · Face recognition and analysis