Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning

Yexing Du; Youcheng Pan; Ziyang Ma; Bo Yang; Yifan Yang; Keqi Deng; Xie Chen; Yang Xiang; Ming Liu; Bing Qin

arXiv:2409.19510·cs.CL·June 17, 2025

TL;DR

This paper introduces a three-stage curriculum learning approach that improves many-to-many speech-to-text translation with multimodal large language models, particularly in low-resource settings, and achieves state-of-the-art average performance across 15 × 14 language pairs.

Contribution

The paper presents a three-stage curriculum learning strategy that builds on the machine translation capabilities of large language models and adapts them to many-to-many speech-to-text translation in low-resource scenarios.

Findings

1. Achieves state-of-the-art average performance across 15 × 14 language pairs.

2. Reaches competitive results with fewer than 10 hours of speech data per language.

3. Is effective across models with 3B, 7B, and 32B parameters.

Abstract

Multimodal Large Language Models (MLLMs) have achieved significant success in Speech-to-Text Translation (S2TT) tasks. While most existing research has focused on English-centric translation directions, the exploration of many-to-many translation is still limited by the scarcity of parallel data. To address this, we propose a three-stage curriculum learning strategy that leverages the machine translation capabilities of large language models and adapts them to S2TT tasks, enabling effective learning in low-resource settings. We trained MLLMs with varying parameter sizes (3B, 7B, and 32B) and evaluated the proposed strategy using the FLEURS and CoVoST-2 datasets. Experimental results show that the proposed strategy achieves state-of-the-art average performance in $15 \times 14$ language pairs, requiring fewer than 10 hours of speech data per language to achieve competitive results. The…
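
The $15 \times 14$ figure can be read as a count of ordered translation directions: each of 15 languages acts as a source and is paired with each of the other 14 as a target, which gives 210 directions. A minimal Python sketch of that count follows. The language codes are hypothetical placeholders, since the actual languages are not listed in this summary.

```python
from itertools import permutations

# Hypothetical placeholder codes: the real 15 languages are not named here.
languages = [f"lang{i:02d}" for i in range(1, 16)]

# Every ordered (source, target) pair in which source and target differ.
directions = list(permutations(languages, 2))

# Each of the 15 sources has 14 possible targets.
assert len(directions) == len(languages) * (len(languages) - 1)
print(len(directions))  # 210
```

Because the pairs are ordered, a direction and its reverse are counted separately, which is why the total is 15 × 14 rather than the 105 unordered pairs.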

Code & Models

Repositories

X-LANCE/SLAM-LLM
PyTorch · Official

Models

yxdu/llm-srt (Hugging Face model)

Taxonomy

Topics: Natural Language Processing Techniques · Translation Studies and Practices · Speech and dialogue systems