Chain-of-Thought Prompting for Speech Translation

Ke Hu; Zhehuai Chen; Chao-Han Huck Yang; Piotr Żelasko; Oleksii Hrinchuk; Vitaly Lavrukhin; Jagadeesh Balam; Boris Ginsburg

arXiv:2409.11538 · cs.CL · March 28, 2025

Open Access

TL;DR

This paper introduces a chain-of-thought prompting method for speech translation that leverages ASR transcripts to improve translation accuracy in Speech-LLMs, demonstrating significant BLEU score improvements over previous prompting techniques.

Contribution

The work presents a novel chain-of-thought prompting approach for speech translation using Speech-LLMs, combining ASR transcripts with speech encoding to enhance performance.

Findings

1. Achieved an average increase of 2.4 BLEU points across 6 translation tasks.

2. Outperformed a related CoT prediction method by 2 BLEU points on average.

3. Demonstrated the effectiveness of LoRA for model adaptation in speech translation.
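
For readers unfamiliar with LoRA, the general idea can be sketched in a few lines: a frozen weight matrix W is left untouched, and only a low-rank update B·A, scaled by alpha/rank, is trained. The sketch below is a generic pure-Python illustration, not the paper's implementation; the class name `LoRALinear`, the rank, the `alpha` value, and the toy weights are all assumptions made for the example.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weights, never updated
        # A gets a small random init and B starts at zero, so the adapted
        # layer initially behaves exactly like the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        # Only A and B would be trained: rank * (d_in + d_out) values.
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)


layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], rank=1, alpha=2.0)
# B is still zero here, so the output equals the frozen layer's output.
print(layer.forward([3.0, 4.0]))
print(layer.num_trainable())
```

The appeal of this design is that the number of trained values grows with the rank rather than with the full weight matrix size, which is why it is commonly used to adapt large pretrained models.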

Abstract

Large language models (LLMs) have demonstrated remarkable advancements in language understanding and generation. Building on the success of text-based LLMs, recent research has adapted these models to use speech embeddings for prompting, resulting in Speech-LLM models that exhibit strong performance in automatic speech recognition (ASR) and automatic speech translation (AST). In this work, we propose a novel approach to leverage ASR transcripts as prompts for AST in a Speech-LLM built on an encoder-decoder text LLM. The Speech-LLM model consists of a speech encoder and an encoder-decoder structure Megatron-T5. By first decoding speech to generate ASR transcripts and subsequently using these transcripts along with encoded speech for prompting, we guide the speech translation in a two-step process like chain-of-thought (CoT) prompting. Low-rank adaptation (LoRA) is used for the T5 LLM for…
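
The two-step process described in the abstract can be sketched roughly as follows: the speech is encoded once, a first decoding pass produces an ASR transcript, and a second pass is prompted with both the encoded speech and that transcript to produce the translation. This is a minimal sketch under stated assumptions, not the paper's code: the `Prompt` structure, the instruction strings, and the `toy_encoder` and `toy_generate` stand-ins are hypothetical placeholders for a real speech encoder and LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Prompt:
    """Hypothetical prompt container: text instruction plus speech features."""
    instruction: str
    speech_embeddings: List[List[float]]
    transcript: Optional[str] = None  # filled in only for the second step


def cot_translate(
    audio: str,
    encode_speech: Callable[[str], List[List[float]]],
    generate: Callable[[Prompt], str],
    src_lang: str,
    tgt_lang: str,
) -> Tuple[str, str]:
    """Two-step decoding: ASR first, then translation prompted by the transcript."""
    speech = encode_speech(audio)  # encoded once, reused in both steps

    # Step 1: decode the speech into an ASR transcript.
    asr_prompt = Prompt(f"Transcribe the {src_lang} speech.", speech)
    transcript = generate(asr_prompt)

    # Step 2: prompt with the encoded speech AND the transcript for translation.
    ast_prompt = Prompt(
        f"Translate the {src_lang} speech into {tgt_lang}.", speech, transcript
    )
    translation = generate(ast_prompt)
    return transcript, translation


# Toy stand-ins (assumptions) so the sketch runs without any model.
def toy_encoder(audio: str) -> List[List[float]]:
    return [[0.1, 0.2], [0.3, 0.4]]


def toy_generate(prompt: Prompt) -> str:
    if prompt.transcript is None:
        return "hello world"
    return f"[translated] {prompt.transcript}"


print(cot_translate("audio.wav", toy_encoder, toy_generate, "English", "German"))
```

Keeping the speech embeddings in the second prompt means the translation step is not limited to the transcript text alone and can still draw on the audio itself.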

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Byte Pair Encoding · Gated Linear Unit · SentencePiece · Softmax · Layer Normalization · Chain-of-thought prompting · Adafactor · Inverse Square Root Schedule