Zero-resource Speech Translation and Recognition with LLMs
Karel Mundnich, Xing Niu, Prashant Mathur, Srikanth Ronanki, Brady Houston, Veera Raghavendra Elluru, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Anshu Bhatia, Daniel Garcia-Romero, Kyu J. Han, Katrin Kirchhoff

TL;DR
This paper combines a multilingual LLM with a pre-trained multilingual speech encoder to perform zero-resource speech translation and recognition in languages with no paired audio-text training data, reaching BLEU scores over 23 on CoVoST2 and WERs of up to 28.2%.
Contribution
It presents a novel approach combining pre-trained speech encoders and LLMs with a lightweight adaptation module for zero-resource speech tasks.
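To make the role of the adaptation module concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the summary only says the module maps audio representations into the LLM's token embedding space, so the frame stacking, the single linear projection, the toy dimensions, and all function names below are hypothetical choices.

```python
import random

def stack_frames(frames, k):
    """Concatenate every k consecutive frames to shorten the sequence.

    Assumption: frame stacking is one common way to downsample encoder
    output; the source does not say the paper uses it. Trailing frames
    that do not fill a group of k are dropped.
    """
    usable = len(frames) - len(frames) % k
    return [sum(frames[i:i + k], []) for i in range(0, usable, k)]

def make_linear(in_dim, out_dim, seed=0):
    """Create a randomly initialised linear layer (weights, bias)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def apply_linear(layer, x):
    """Compute W x + b for one input vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def adapt(frames, layer, k):
    """Map encoder frames to vectors with the LLM embedding dimension."""
    return [apply_linear(layer, f) for f in stack_frames(frames, k)]

# Toy dimensions, chosen only for illustration.
ENC_DIM, LLM_DIM, STACK = 4, 6, 2

# Stand-in for the frozen speech encoder output: 5 frames of size ENC_DIM.
encoder_out = [[float(t + d) for d in range(ENC_DIM)] for t in range(5)]

layer = make_linear(ENC_DIM * STACK, LLM_DIM)
audio_embeddings = adapt(encoder_out, layer, STACK)

# Stand-in for embedded text prompt tokens; the LLM would consume
# the prompt embeddings followed by the adapted audio embeddings.
prompt_embeddings = [[0.0] * LLM_DIM for _ in range(3)]
llm_input = prompt_embeddings + audio_embeddings
```

In this sketch only the small projection layer would carry trainable parameters, which is what makes such a module "lightweight" relative to the encoder and the LLM.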
Findings
Achieved BLEU scores over 23 on CoVoST2 for two previously unseen languages.
Attained WERs up to 28.2% in zero-resource ASR.
Performance limited by LLM's ability to generate text in target language.
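For readers unfamiliar with the ASR metric quoted above, the following sketch computes word error rate (WER) for a single utterance as word-level edit distance divided by reference length. This is the standard definition, not code from the paper; the function name and example sentences are illustrative, and reported scores are normally aggregated over a whole test set and may involve text normalization that is not shown here.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

A WER of 28.2% therefore corresponds to roughly 28 word-level errors per 100 reference words.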
Abstract
Despite recent advancements in speech processing, zero-resource speech translation (ST) and automatic speech recognition (ASR) remain challenging problems. In this work, we propose to leverage a multilingual Large Language Model (LLM) to perform ST and ASR in languages for which the model has never seen paired audio-text data. We achieve this by using a pre-trained multilingual speech encoder, a multilingual LLM, and a lightweight adaptation module that maps the audio representations to the token embedding space of the LLM. We perform several experiments in both ST and ASR to understand how best to train the model and which data has the most impact on performance in previously unseen languages. In ST, our best model is capable of achieving BLEU scores over 23 on CoVoST2 for two previously unseen languages, while in ASR, we achieve WERs of up to 28.2%. We finally show that the performance…
Taxonomy
Topics: Natural Language Processing Techniques
