Representation Purification for End-to-End Speech Translation
Chengwei Zhang, Yue Zhou, Rui Zhao, Yidong Chen, Xiaodong Shi

TL;DR
This paper introduces SRPSE, a framework that purifies speech representations by removing content-agnostic factors, significantly enhancing end-to-end speech translation performance, especially in transcript-free scenarios.
Contribution
The paper proposes a novel speech representation purification method that improves translation accuracy by excluding irrelevant speech factors, addressing a key limitation in current end-to-end speech translation models.
Findings
SRPSE improves translation performance across multiple datasets.
Content-agnostic factors negatively impact translation quality.
Significant gains are achieved in transcript-free translation settings.
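The finding that content-agnostic factors hurt translation quality can be probed with a simple perturbation check. The sketch below is illustrative only: this summary does not say which perturbations the authors used, so the function names, the linear-interpolation speed change, and the gain change are all assumptions. Each one alters how an utterance sounds without changing its words, so the reference translation stays valid.

```python
import numpy as np

def speed_perturb(wave, factor):
    """Resample by linear interpolation (hypothetical helper).

    factor > 1 shortens the signal (faster speech). Note that plain
    resampling shifts tempo and pitch together.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(wave) / factor))
    src_pos = np.linspace(0, len(wave) - 1, num=n_out)
    return np.interp(src_pos, np.arange(len(wave)), wave)

def gain_perturb(wave, db):
    """Scale amplitude by a gain in decibels (hypothetical helper)."""
    return wave * (10.0 ** (db / 20.0))

def relative_drop(clean_score, perturbed_score):
    """Fraction of a quality score (e.g. BLEU) lost after perturbation."""
    return (clean_score - perturbed_score) / clean_score

# Toy input: one second of a 220 Hz tone at 16 kHz, standing in for speech.
sr = 16000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 220.0 * t)

fast = speed_perturb(wave, 1.25)   # same "content", faster delivery
quiet = gain_perturb(wave, -6.0)   # same "content", lower volume
```

In an actual check, the clean and perturbed versions of each utterance would be translated by the same model, and `relative_drop` would compare the resulting scores. A large drop would indicate that the model's representations are sensitive to content-agnostic factors.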
Abstract
Speech-to-text translation (ST) is a cross-modal task that involves converting spoken language into text in a different language. Previous research has primarily focused on enhancing speech translation by facilitating knowledge transfer from machine translation, exploring various methods to bridge the gap between the speech and text modalities. Despite substantial progress, factors in speech that are not relevant to translation content, such as timbre and rhythm, often limit the efficiency of knowledge transfer. In this paper, we conceptualize speech representation as a combination of content-agnostic and content-relevant factors. We examine the impact of content-agnostic factors on translation performance through preliminary experiments and observe a significant performance deterioration when content-agnostic perturbations are introduced to speech signals. To address this issue, we…
Taxonomy
Topics: Natural Language Processing Techniques
