An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation
Pengzhi Gao, Ruiqing Zhang, Zhongjun He, Hua Wu, Haifeng Wang

TL;DR
This paper studies consistency regularization for end-to-end (E2E) speech-to-text translation (ST) and proposes two training strategies, one for the regular and one for the zero-shot setting, that achieve state-of-the-art results on the MuST-C benchmark in most translation directions.
Contribution
It introduces two novel training strategies, SimRegCR and SimZeroCR, for regular and zero-shot E2E speech-to-text translation respectively, built on intra-modal and cross-modal consistency regularization.
Findings
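For regular E2E ST, the regularization effect of intra-modal consistency, rather than closing the modality gap, is what matters.
Cross-modal consistency helps close the modality gap between speech and text, improving zero-shot E2E ST.
The proposed methods achieve state-of-the-art results on the MuST-C benchmark in most translation directions.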
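Intra-modal consistency regularization in the style of R-Drop (Liang et al., 2021) can be sketched as follows: the same speech input is passed through the model twice with different dropout masks, and a symmetric KL term between the two output distributions is added to the cross-entropy of both passes. This is a minimal, framework-free sketch under stated assumptions; the function names, the weight `alpha`, and the toy logits are hypothetical, and it is not the authors' SimRegCR implementation.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one vocabulary distribution.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def kl(log_p: Sequence[float], log_q: Sequence[float]) -> float:
    # KL(P || Q) = sum_i p_i * (log p_i - log q_i), given log-probabilities.
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def rdrop_loss(
    logits_1: Sequence[Sequence[float]],
    logits_2: Sequence[Sequence[float]],
    targets: Sequence[int],
    alpha: float = 1.0,
) -> float:
    """R-Drop-style loss for one target sequence (illustrative sketch).

    logits_1, logits_2: per-token logits from two forward passes over the
        SAME speech input with different (hypothetical) dropout masks.
    targets: gold target token ids, one per decoding step.
    alpha: hypothetical weight of the consistency term.
    """
    nll, sym_kl = 0.0, 0.0
    for l1, l2, y in zip(logits_1, logits_2, targets):
        lp1, lp2 = log_softmax(l1), log_softmax(l2)
        # Cross-entropy of the gold token under both passes.
        nll -= lp1[y] + lp2[y]
        # Symmetric KL penalizes disagreement between the two passes.
        sym_kl += 0.5 * (kl(lp1, lp2) + kl(lp2, lp1))
    return nll + alpha * sym_kl
```

When both passes produce identical logits the KL term vanishes and only the two cross-entropy terms remain; in a real model the passes differ only through dropout, so the extra term penalizes dropout-induced disagreement on the same input.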
Abstract
Consistency regularization methods, such as R-Drop (Liang et al., 2021) and CrossConST (Gao et al., 2023), have achieved impressive supervised and zero-shot performance in the neural machine translation (NMT) field. Can we also boost end-to-end (E2E) speech-to-text translation (ST) by leveraging consistency regularization? In this paper, we conduct empirical studies on intra-modal and cross-modal consistency and propose two training strategies, SimRegCR and SimZeroCR, for E2E ST in regular and zero-shot scenarios. Experiments on the MuST-C benchmark show that our approaches achieve state-of-the-art (SOTA) performance in most translation directions. The analyses prove that regularization brought by the intra-modal consistency, instead of modality gap, is crucial for the regular E2E ST, and the cross-modal consistency could close the modality gap and boost the zero-shot E2E ST performance.
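Cross-modal consistency, in the spirit of CrossConST (Gao et al., 2023), compares the model's output distribution when it is conditioned on speech with its distribution when it is conditioned on the paired transcript. The sketch below adds a KL term between the two to the speech-branch cross-entropy. The KL direction KL(P_text || P_speech), the weight `alpha`, and all names are illustrative assumptions; this is not the SimZeroCR recipe, whose exact losses and training data are described in the paper.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one vocabulary distribution.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def cross_modal_consistency_loss(
    speech_logits: Sequence[Sequence[float]],
    text_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    alpha: float = 1.0,
) -> float:
    """Speech cross-entropy plus a cross-modal KL term (illustrative sketch).

    speech_logits: per-token logits when the encoder input is speech.
    text_logits: per-token logits when the encoder input is the transcript.
    targets: gold target token ids shared by both branches.
    alpha: hypothetical weight of the consistency term.
    """
    ce, consistency = 0.0, 0.0
    for ls, lt, y in zip(speech_logits, text_logits, targets):
        lps, lpt = log_softmax(ls), log_softmax(lt)
        # Cross-entropy of the gold token under the speech branch only.
        ce -= lps[y]
        # KL(P_text || P_speech): the text-conditioned distribution plays the
        # reference role in this sketch (an assumed direction).
        consistency += sum(math.exp(t) * (t - s) for s, t in zip(lps, lpt))
    return ce + alpha * consistency
```

Unlike the R-Drop-style term, which compares two stochastic passes over the same speech input, this term compares two input modalities, so lowering it pushes speech and text inputs toward the same predictions; this is the sense in which cross-modal consistency can narrow the modality gap.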
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
