Multimodal In-context Learning for ASR of Low-resource Languages

Zhaolin Li; Jan Niehues

arXiv:2601.05707·cs.CL·April 21, 2026

TL;DR

This paper explores multimodal in-context learning (MICL) with speech LLMs to improve ASR for low-resource and unseen languages. It shows that cross-lingual transfer makes MICL more efficient on target languages, and it analyzes attention patterns to interpret how MICL works.

Contribution

The paper applies multimodal in-context learning with speech LLMs to unseen languages, reports improvements over traditional prompt-based ASR, and analyzes the underlying attention patterns.

Findings

01 MICL is effective for unseen languages and draws on both the speech and text modalities.

02 Cross-lingual transfer improves MICL efficiency on target languages without training on them.

03 MICL improves ASR performance and outperforms corpus-trained models in low-resource settings.

Abstract

Automatic speech recognition (ASR) still covers only a small fraction of the world's languages, mainly due to supervised data scarcity. In-context learning (ICL) with large language models (LLMs) addresses this problem, but prior work largely focuses on high-resource languages covered during training and text-only settings. This paper investigates whether speech LLMs can learn unseen languages with multimodal ICL (MICL), and how this learning can be used to improve ASR. We conduct experiments with two speech LLMs, Phi-4 and Qwen3-Omni, on three diverse endangered languages. Firstly, we find that MICL is effective for unseen languages, leveraging both speech and text modalities. We further show that cross-lingual transfer learning improves MICL efficiency on target languages without training on them. Moreover, we analyze attention patterns to interpret MICL mechanisms, and we observe…
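
To make the idea of a multimodal in-context prompt concrete, the sketch below interleaves a few (speech, transcript) demonstrations ahead of a query recording. It is only an illustration under stated assumptions: the names `Demonstration` and `build_micl_prompt`, the `(modality, payload)` segment layout, the instruction string, and the file paths are all hypothetical, and none of this reflects the authors' code or the input format of Phi-4 or Qwen3-Omni.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A prompt segment is (modality, payload); modality is "text" or "audio".
# This layout is an assumption for illustration, not a real model's API.
Segment = Tuple[str, str]


@dataclass
class Demonstration:
    audio_path: str   # hypothetical path to a speech recording
    transcript: str   # reference transcription of that recording


def build_micl_prompt(
    demos: List[Demonstration],
    query_audio_path: str,
    instruction: str = "Transcribe the speech.",
    include_demo_audio: bool = True,
) -> List[Segment]:
    """Interleave demonstrations before the query audio.

    With include_demo_audio=False, demonstrations contribute only their
    transcripts, which gives a text-only context for comparison.
    """
    segments: List[Segment] = [("text", instruction)]
    for demo in demos:
        if include_demo_audio:
            segments.append(("audio", demo.audio_path))
        segments.append(("text", demo.transcript))
    # The query recording always comes last; the model is expected to
    # continue with its transcription.
    segments.append(("audio", query_audio_path))
    return segments


if __name__ == "__main__":
    demos = [
        Demonstration("demo_1.wav", "first example transcript"),
        Demonstration("demo_2.wav", "second example transcript"),
    ]
    for modality, payload in build_micl_prompt(demos, "query.wav"):
        print(f"{modality}: {payload}")
```

The `include_demo_audio` switch is one simple way to compare a context that carries both modalities against one that carries transcripts alone; how a given speech LLM actually consumes such segments depends on its own processor and chat template.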
