Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach

Junkun Chen; Jian Xue; Peidong Wang; Jing Pan; Jinyu Li

arXiv:2310.04399 · cs.CL · October 9, 2023

TL;DR

This paper introduces a revision-controllable decoding method for simultaneous speech translation that substantially reduces flickering in partial results, improving stability without substantially compromising translation quality.

Contribution

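It proposes an allowed revision window within beam search pruning that screens out candidate translations likely to cause extensive revisions, reducing flickering in real-time speech translation and, if desired, eliminating it entirely.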

Findings

01. Experiments show a substantial reduction in flickering.

02. Translation quality remains largely unaffected.

03. The method can be configured to eliminate flickering completely.

Abstract

Simultaneous Speech-to-Text translation serves a critical role in real-time crosslingual communication. Despite the advancements in recent years, challenges remain in achieving stability in the translation process, a concern primarily manifested in the flickering of partial results. In this paper, we propose a novel revision-controllable method designed to address this issue. Our method introduces an allowed revision window within the beam search pruning process to screen out candidate translations likely to cause extensive revisions, leading to a substantial reduction in flickering and, crucially, providing the capability to completely eliminate flickering. The experiments demonstrate the proposed method can significantly improve the decoding stability without compromising substantially on the translation quality.
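The pruning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the token-level revision count, and the `window` and `beam_size` parameters are all assumptions made for illustration. The sketch assumes a revision is measured as the number of trailing tokens of the already displayed partial result that a candidate would overwrite, and it discards candidates that exceed the allowed window before keeping the best-scoring ones.

```python
from typing import List, Tuple

# Hypothetical representation: (tokens, log-probability score).
Hypothesis = Tuple[List[str], float]


def common_prefix_len(a: List[str], b: List[str]) -> int:
    """Length of the longest shared prefix of two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def revisions_needed(displayed: List[str], candidate: List[str]) -> int:
    """Trailing displayed tokens that the candidate would rewrite or delete."""
    return len(displayed) - common_prefix_len(displayed, candidate)


def prune_with_revision_window(
    displayed: List[str],
    candidates: List[Hypothesis],
    window: int,
    beam_size: int,
) -> List[Hypothesis]:
    """Keep the top-scoring candidates that revise at most `window` tokens.

    Assumption: window=0 forbids any change to the displayed text, so every
    surviving candidate only extends it (no flickering).
    """
    allowed = [c for c in candidates if revisions_needed(displayed, c[0]) <= window]
    allowed.sort(key=lambda c: c[1], reverse=True)
    return allowed[:beam_size]


displayed = ["I", "like", "red", "apples"]
candidates = [
    (["I", "love", "green", "apples", "a", "lot"], -1.0),  # rewrites 3 tokens
    (["I", "like", "red", "apple", "pie"], -1.5),          # rewrites 1 token
    (["I", "like", "red", "apples", "too"], -2.0),         # pure extension
]

kept = prune_with_revision_window(displayed, candidates, window=1, beam_size=2)
stable = prune_with_revision_window(displayed, candidates, window=0, beam_size=2)
```

With `window=1` the highest-scoring candidate is dropped because it would rewrite three displayed tokens; with `window=0` only the pure extension survives. In a full decoder the displayed hypothesis would normally stay in the beam, so at least one extension would always be available; this sketch simply returns an empty list if nothing qualifies.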

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques

Methods: Pruning