Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach
Junkun Chen, Jian Xue, Peidong Wang, Jing Pan, Jinyu Li

TL;DR
This paper introduces a revision-controllable decoding method for simultaneous speech translation that markedly reduces flickering in partial results, improving stability with little loss in translation quality.
Contribution
It proposes an allowed revision window within beam search pruning that limits flickering in real-time speech translation and can eliminate it entirely.
Findings
Substantial flickering reduction demonstrated in experiments
Translation quality remains largely unaffected
The method can eliminate flickering completely
Abstract
Simultaneous Speech-to-Text translation serves a critical role in real-time crosslingual communication. Despite the advancements in recent years, challenges remain in achieving stability in the translation process, a concern primarily manifested in the flickering of partial results. In this paper, we propose a novel revision-controllable method designed to address this issue. Our method introduces an allowed revision window within the beam search pruning process to screen out candidate translations likely to cause extensive revisions, leading to a substantial reduction in flickering and, crucially, providing the capability to completely eliminate flickering. The experiments demonstrate the proposed method can significantly improve the decoding stability without compromising substantially on the translation quality.
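The pruning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact pruning rule, so the sketch assumes that a candidate's "revision" is the number of trailing tokens of the already displayed partial result that it would change, and that candidates exceeding the allowed window are dropped before the usual top-k selection. The names `revision_length`, `prune_with_revision_window`, `window`, and `beam_size` are hypothetical.

```python
from typing import List, Tuple

# A hypothesis is (tokens, score); the score is assumed to be a log-probability.
Hypothesis = Tuple[List[str], float]


def common_prefix_len(a: List[str], b: List[str]) -> int:
    """Length of the longest shared prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def revision_length(displayed: List[str], candidate: List[str]) -> int:
    """Number of trailing displayed tokens the candidate would change or delete."""
    return len(displayed) - common_prefix_len(displayed, candidate)


def prune_with_revision_window(
    displayed: List[str],
    candidates: List[Hypothesis],
    window: int,
    beam_size: int,
) -> List[Hypothesis]:
    """Keep the best-scoring candidates whose revision fits in the window.

    Assumption: window=0 forbids any change to displayed tokens, which
    corresponds to flicker-free output in this simplified model.
    """
    allowed = [
        (tokens, score)
        for tokens, score in candidates
        if revision_length(displayed, tokens) <= window
    ]
    allowed.sort(key=lambda h: h[1], reverse=True)
    return allowed[:beam_size]


displayed = ["the", "cat", "sat"]
candidates = [
    (["the", "cat", "sat", "on"], -1.2),   # changes 0 displayed tokens
    (["the", "cat", "sits", "on"], -0.9),  # changes 1 displayed token
    (["a", "cat", "sat", "on"], -0.5),     # changes 3 displayed tokens
]

kept_w1 = prune_with_revision_window(displayed, candidates, window=1, beam_size=2)
kept_w0 = prune_with_revision_window(displayed, candidates, window=0, beam_size=2)
print(kept_w1)
print(kept_w0)
```

With `window=1` the highest-scoring candidate is rejected because it would rewrite the first displayed token; with `window=0` only the candidate that extends the displayed text survives. A real decoder would also need a fallback for the case where no candidate fits the window, which this sketch leaves out.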
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Pruning
