When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation

Anna Min; Chenxu Hu; Yi Ren; Hang Zhao

arXiv:2502.00377·cs.CL·February 4, 2025

TL;DR

This paper revisits cascaded speech-to-text translation, demonstrating that incorporating multiple ASR candidates and self-supervised speech features can reduce error propagation and improve translation accuracy.

Contribution

It introduces a novel approach that leverages multiple ASR outputs and self-supervised features to mitigate error propagation in cascaded speech translation.

Findings

1. Including multiple ASR candidates improves translation quality.

2. Self-supervised speech features help reduce the divergence that arises between similar speech samples when they are mapped to text.

3. The approach minimizes cascading errors between the ASR and MT stages.

Abstract

Though end-to-end speech-to-text translation has been a great success, we argue that the cascaded speech-to-text translation model, which is usually criticized for error propagation between its automatic speech recognition (ASR) and machine translation (MT) models, still has its place. In this paper, we explore the benefits of incorporating multiple candidates from ASR and self-supervised speech features into MT. Our analysis reveals that the primary cause of cascading errors is the increased divergence between similar samples in the speech domain when they are mapped to the text domain. By including multiple candidates and self-supervised speech features, our approach allows the machine translation model to choose the right words and ensure precise translation using various speech samples. This strategy minimizes error spread and takes advantage of large ASR and MT datasets, along with…
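The idea of passing several ASR candidates to the MT model can be illustrated with a minimal sketch. The abstract does not say how the candidates are combined, so everything here is an assumption made for illustration: the function name `build_mt_input`, the `<sep>` separator token, the `top_k` cutoff, and the choice to simply concatenate the hypotheses into one source string. The self-supervised speech features are left out entirely.

```python
from typing import List, Tuple


def build_mt_input(
    nbest: List[Tuple[str, float]],
    top_k: int = 3,
    sep: str = " <sep> ",
) -> str:
    """Join the top_k distinct ASR hypotheses into one MT source string.

    nbest holds (hypothesis_text, score) pairs, where a higher score means
    the ASR system considers the hypothesis more likely. The separator token
    and the concatenation scheme are hypothetical, not taken from the paper.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    # Rank hypotheses from most to least likely.
    ranked = sorted(nbest, key=lambda pair: pair[1], reverse=True)

    seen = set()
    kept = []
    for text, _score in ranked:
        # Collapse whitespace so near-identical strings count as duplicates.
        normalized = " ".join(text.split())
        if normalized and normalized not in seen:
            seen.add(normalized)
            kept.append(normalized)
        if len(kept) == top_k:
            break

    return sep.join(kept)


if __name__ == "__main__":
    hypotheses = [
        ("i scream for ice cream", -1.2),
        ("ice cream for ice cream", -0.7),
        ("eye scream for ice cream", -3.0),
    ]
    print(build_mt_input(hypotheses, top_k=2))
```

With more than one hypothesis in the source string, an MT model trained on such inputs has the chance to pick the wording that fits the context, instead of inheriting a single, possibly wrong, 1-best transcript.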

Taxonomy

Topics: Natural Language Processing Techniques