Task Arithmetic for Language Expansion in Speech Translation

Yao-Fei Cheng; Hayato Futami; Yosuke Kashiwagi; Emiru Tsunoo; Wen Shen Teo; Siddhant Arora; Shinji Watanabe

arXiv:2409.11274·cs.CL·July 30, 2025

Open Access

TL;DR

This paper introduces a task arithmetic method with language control that expands speech translation systems to new languages without re-training, reporting improvements of up to 4.92 BLEU and 11.83 COMET.

Contribution

The paper proposes a task arithmetic approach with language control for language expansion in speech translation. The approach avoids re-training and extends to language pairs that lack paired ST training data or pre-trained ST models.

Findings

01 BLEU score improvements of up to 4.92 points

02 COMET gains of up to 11.83 points

03 Effective extension to language pairs without paired ST data

Abstract

Recent progress in large language models (LLMs) has sparked interest in speech-text multimodal foundation models, which achieve strong performance on instruction-tuned speech translation (ST). However, expanding language pairs is costly because it requires re-training on the combined new and previous datasets. To address this, we aim to build a one-to-many ST system from existing one-to-one ST systems using task arithmetic, without re-training. Direct application of task arithmetic in ST leads to language confusion; therefore, we introduce an augmented task arithmetic method that incorporates a language control model to ensure correct target language generation. Our experiments on MuST-C and CoVoST-2 show BLEU score improvements of up to 4.66 and 4.92, respectively, with COMET gains of 8.87 and 11.83. In addition, we demonstrate that our framework can extend to language pairs lacking paired ST training data or pre-trained ST models…
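As a rough illustration of the merging idea the abstract describes, the sketch below shows generic task arithmetic on toy parameters: each fine-tuned model contributes a task vector (fine-tuned weights minus pretrained weights), and the scaled sum of those vectors is added back to the pretrained weights. Everything here is an assumption made for illustration, not the authors' implementation: the `Params` type, the function names, the toy values, the `scale` coefficient, and in particular the choice to treat the language control model as one more task vector in the sum.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def task_vector(pretrained: Params, finetuned: Params) -> Params:
    """Element-wise difference (finetuned - pretrained) for each parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def merge(pretrained: Params, vectors: List[Params], scale: float) -> Params:
    """Add the scaled sum of task vectors back onto the pretrained weights."""
    merged: Params = {}
    for name, base in pretrained.items():
        total = [sum(v[name][i] for v in vectors) for i in range(len(base))]
        merged[name] = [b + scale * t for b, t in zip(base, total)]
    return merged


# Hypothetical toy weights; real models hold millions of parameters.
pretrained = {"w": [1.0, 2.0]}
st_en_de = {"w": [1.5, 2.0]}    # one-to-one ST model, e.g. English -> German
st_en_fr = {"w": [1.0, 3.0]}    # one-to-one ST model, e.g. English -> French
lang_ctrl = {"w": [1.25, 2.25]}  # assumed: language control model as a task vector

vectors = [task_vector(pretrained, m) for m in (st_en_de, st_en_fr, lang_ctrl)]
merged = merge(pretrained, vectors, scale=0.5)
print(merged)
```

In this sketch no gradient step is taken at merge time, which is the sense in which the combined one-to-many model is obtained without re-training. The scaling coefficient would normally be tuned on held-out data.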

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems