Investigating Decoder-only Large Language Models for Speech-to-text Translation

Chao-Wei Huang; Hui Lu; Hongyu Gong; Hirofumi Inaguma; Ilia Kulikov; Ruslan Mavlyutov; Sravya Popuri

arXiv:2407.03169·cs.CL·July 4, 2024

Open Access

TL;DR

This paper explores decoder-only large language models for speech-to-text translation, analyzes several parameter-efficient fine-tuning techniques and task formulations, and reports state-of-the-art results among models trained without proprietary data.

Contribution

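The paper proposes a decoder-only architecture for speech-to-text translation (S2TT) in which the LLM directly consumes the encoded speech representation and generates the text translation, and it compares parameter-efficient fine-tuning techniques and task formulations.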

Findings

01

Achieves state-of-the-art performance on CoVoST 2 and FLEURS among models trained without proprietary data.

02

Examines the effects of different parameter-efficient fine-tuning techniques and task formulations.

03

Includes analyses that validate the proposed model's design choices and inform the integration of LLMs into speech-to-text translation.

Abstract

Large language models (LLMs), known for their exceptional reasoning capabilities, generalizability, and fluency across diverse domains, present a promising avenue for enhancing speech-related tasks. In this paper, we focus on integrating decoder-only LLMs to the task of speech-to-text translation (S2TT). We propose a decoder-only architecture that enables the LLM to directly consume the encoded speech representation and generate the text translation. Additionally, we investigate the effects of different parameter-efficient fine-tuning techniques and task formulation. Our model achieves state-of-the-art performance on CoVoST 2 and FLEURS among models trained without proprietary data. We also conduct analyses to validate the design choices of our proposed model and bring insights to the integration of LLMs to S2TT.
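To illustrate what "directly consume the encoded speech representation" could look like, here is a minimal sketch in plain Python. It is not the authors' implementation. The linear projection, the toy dimensions, the ordering (speech first, then prompt), and the names `project` and `build_decoder_input` are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List

Vector = List[float]


def project(frame: Vector, weight: List[Vector]) -> Vector:
    """Map one speech-encoder frame to the LLM embedding width.

    Hypothetical bias-free linear layer: each row of `weight` produces
    one output dimension.
    """
    return [sum(w * x for w, x in zip(row, frame)) for row in weight]


def build_decoder_input(
    speech_frames: List[Vector],
    weight: List[Vector],
    prompt_embeddings: List[Vector],
) -> List[Vector]:
    """Form a single input sequence for a decoder-only model.

    Assumption: projected speech frames are placed before the text
    prompt embeddings, so the decoder attends to both as one sequence.
    """
    speech_embeddings = [project(frame, weight) for frame in speech_frames]
    return speech_embeddings + prompt_embeddings


# Toy example: 2 speech frames of width 3, projected to LLM width 2.
speech_frames = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
prompt_embeddings = [[0.5, 0.5]]  # one stand-in prompt token embedding

decoder_input = build_decoder_input(speech_frames, weight, prompt_embeddings)
print(len(decoder_input))  # sequence length seen by the decoder
```

In a real system the projection would be learned and the decoder would then generate the translation token by token from this combined sequence; those parts are omitted here.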

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Focus