Cross-Lingual Transfer Learning for Speech Translation

Rao Ma; Mengjie Qian; Yassir Fathullah; Siyuan Tang; Mark Gales; Kate Knill

arXiv:2407.01130 · cs.CL · February 12, 2025 · 1 citation

TL;DR

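This paper shows that a multilingual speech foundation model (Whisper) can be fine-tuned on limited data, here English-to-Chinese speech translation only, and still gain zero-shot speech translation into Chinese from other source languages, because its encoder maps utterances from different languages into a shared semantic space.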

Contribution

The paper introduces a method for expanding the speech translation capability of foundation models that exploits the encoder's shared semantic representations and requires only minimal fine-tuning data.

Findings

01. A shared semantic space in the encoder enables zero-shot translation across languages.

02. Fine-tuning the decoder on English-to-Chinese data alone improves translation into Chinese from multiple source languages, not only English.

03. The model can perform translation and transcription for unseen languages that are related to seen ones.

Abstract

There has been increasing interest in building multilingual foundation models for NLP and speech research. This paper examines how to expand the speech translation capability of these models with restricted data. Whisper, a speech foundation model with strong performance on speech recognition and English translation, is used as the example model. Using speech-to-speech retrieval to analyse the audio representations generated by the encoder, we show that utterances from different languages are mapped to a shared semantic space. This shared embedding space can then be leveraged for zero-shot cross-lingual transfer in speech translation. By fine-tuning the Whisper decoder with only English-to-Chinese speech translation data, improved performance for translation to Chinese can be obtained for multiple languages, in addition to English. Furthermore, for languages related to those seen in…
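The speech-to-speech retrieval analysis mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how utterance-level vectors are formed or compared, so the following assumes mean-pooling over encoder frames and cosine similarity; the function names (`mean_pool`, `retrieve`, `recall_at_1`) and the toy vectors are hypothetical stand-ins for real Whisper encoder outputs, not the authors' implementation.

```python
import math

def mean_pool(frames):
    """Average a list of frame vectors into one utterance vector (assumed pooling)."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(queries, candidates):
    """For each query utterance, return the index of the most similar candidate."""
    q_vecs = [mean_pool(q) for q in queries]
    c_vecs = [mean_pool(c) for c in candidates]
    return [
        max(range(len(c_vecs)), key=lambda j: cosine(q, c_vecs[j]))
        for q in q_vecs
    ]

def recall_at_1(queries, candidates):
    """Fraction of queries whose nearest candidate is the parallel utterance.

    Assumes queries[i] and candidates[i] are the same sentence spoken
    in two different languages.
    """
    hits = retrieve(queries, candidates)
    return sum(1 for i, j in enumerate(hits) if i == j) / len(hits)

# Toy "encoder outputs": each utterance is a list of 3-dimensional frames.
lang_a = [
    [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.2, 0.8]],
]
lang_b = [
    [[0.9, 0.1, 0.0]],
    [[0.0, 0.8, 0.2], [0.2, 0.8, 0.0]],
    [[0.1, 0.0, 0.9]],
]

print(recall_at_1(lang_a, lang_b))
```

A high recall on parallel utterances from two languages would be consistent with the encoder placing semantically equivalent speech close together regardless of language, which is the property the zero-shot transfer relies on.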

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis