A Modular-based Strategy for Mitigating Gradient Conflicts in Simultaneous Speech Translation

Xiaoqian Liu; Yangfan Du; Jianjin Wang; Yuan Ge; Chen Xu; Tong Xiao; Guocheng Chen; Jingbo Zhu

arXiv:2409.15911 · cs.CL · December 31, 2024

Open Access

TL;DR

This paper introduces MGCM, a modular gradient conflict mitigation strategy that improves simultaneous speech translation performance and reduces GPU memory usage by detecting and resolving conflicts at a fine-grained modular level.

Contribution

The paper presents a novel modular gradient conflict mitigation approach that effectively addresses optimization conflicts in multi-task SimulST, outperforming existing methods.

Findings

01

Achieves a 0.68 BLEU improvement on offline tasks.

02

Reduces GPU memory consumption by over 95%.

03

Enhances performance under medium and high latency conditions.

Abstract

Simultaneous Speech Translation (SimulST) involves generating target-language text while continuously processing streaming speech input, which presents significant real-time challenges. Multi-task learning is often employed to enhance SimulST performance, but it introduces optimization conflicts between the primary and auxiliary tasks, potentially compromising overall efficiency. Existing model-level conflict resolution methods are not well suited to this task, which exacerbates inefficiencies and leads to high GPU memory consumption. To address these challenges, we propose a Modular Gradient Conflict Mitigation (MGCM) strategy that detects conflicts at a finer-grained modular level and resolves them using gradient projection. Experimental results demonstrate that MGCM significantly improves SimulST performance, particularly under medium and high latency conditions, achieving a 0.68 BLEU…
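The idea of detecting conflicts per module and resolving them by gradient projection can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how a conflict is detected or which gradient is projected, so the sketch assumes a PCGrad-style rule (a negative dot product marks a conflict, and the auxiliary gradient is projected onto the plane orthogonal to the primary gradient). The function names `project_away` and `mitigate_per_module` and the module names `encoder` and `decoder` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two flattened gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_away(g: Vector, ref: Vector) -> Vector:
    """Remove from g its component along ref (assumed PCGrad-style rule)."""
    denom = dot(ref, ref)
    if denom == 0.0:
        # A zero reference gradient defines no direction to project against.
        return list(g)
    scale = dot(g, ref) / denom
    return [gi - scale * ri for gi, ri in zip(g, ref)]


def mitigate_per_module(
    primary: Dict[str, Vector], auxiliary: Dict[str, Vector]
) -> Dict[str, Vector]:
    """Check each module separately; project only where the two tasks conflict."""
    merged: Dict[str, Vector] = {}
    for name, g_main in primary.items():
        g_aux = auxiliary[name]
        # Assumption: a negative dot product marks a conflict for this module.
        if dot(g_main, g_aux) < 0:
            g_aux = project_away(g_aux, g_main)
        merged[name] = [m + a for m, a in zip(g_main, g_aux)]
    return merged


# Toy gradients: the tasks conflict in the encoder but agree in the decoder.
primary = {"encoder": [1.0, 0.0], "decoder": [0.0, 1.0]}
auxiliary = {"encoder": [-1.0, 1.0], "decoder": [1.0, 1.0]}
merged = mitigate_per_module(primary, auxiliary)
print(merged)
```

In this toy case only the encoder's auxiliary gradient is altered; the decoder gradients are summed unchanged. Because each module is handled on its own, the comparison never needs a single flattened copy of every parameter's gradient, which is one plausible reading of why a modular scheme could need less memory than a model-level one.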

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques