Improving Simultaneous Translation by Incorporating Pseudo-References with Fewer Reorderings
Junkun Chen, Renjie Zheng, Atsuhito Kita, Mingbo Ma, Liang Huang

TL;DR
This paper improves simultaneous translation by rewriting the target side of full-sentence training data into a simultaneous style with fewer long-distance reorderings, which leads to better translation quality.
Contribution
The authors propose a novel pseudo-reference generation technique that adapts full-sentence corpora for training simultaneous translation models. It addresses both the scarcity of simultaneous translation data and the excessive reorderings found in full-sentence bitexts.
Findings
Up to +2.7 BLEU improvement on Zh->En and Ja->En tasks.
Effective reduction of unnecessary long-distance reorderings.
Higher translation quality when the generated pseudo-references are added to the training data.
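To make "unnecessary long-distance reorderings" concrete, here is a minimal Python sketch of one possible proxy, assuming the wait-k policy of Ma et al. (2019) as the simultaneous setting and 0-based word alignments as input. A target word counts as "anticipated" when it is aligned to a source word that a wait-k decoder would not yet have read. The helpers `visible_source_len` and `anticipation_rate` and the hand-made toy alignments are hypothetical illustrations, not the authors' code or evaluation metric.

```python
from typing import Dict, Iterable, List, Tuple

# An alignment is a list of (source_index, target_index) pairs, both 0-based.
Alignment = Iterable[Tuple[int, int]]


def visible_source_len(t: int, k: int, src_len: int) -> int:
    """Source words a wait-k policy has read before emitting target word t (0-based)."""
    return min(k + t, src_len)


def anticipation_rate(alignment: Alignment, src_len: int, k: int) -> float:
    """Fraction of aligned target words whose aligned source word is not yet read under wait-k.

    Illustrative reordering proxy (hypothetical), not necessarily the paper's metric.
    """
    src_by_tgt: Dict[int, List[int]] = {}
    for s, t in alignment:
        src_by_tgt.setdefault(t, []).append(s)
    if not src_by_tgt:
        return 0.0
    # A target word anticipates if any of its aligned source words lies beyond the window.
    anticipated = sum(
        1
        for t, srcs in src_by_tgt.items()
        if max(srcs) >= visible_source_len(t, k, src_len)
    )
    return anticipated / len(src_by_tgt)


# Toy example with hand-made alignments, for illustration only.
# Source: Bùshí(0) zǒngtǒng(1) zài(2) Mòsīkē(3) yǔ(4) Pǔjīng(5) huìwù(6)
# Full-sentence reference: "President Bush meets with Putin in Moscow"
reference_alignment = [(1, 0), (0, 1), (6, 2), (4, 3), (5, 4), (2, 5), (3, 6)]
# Simultaneous-style rewrite: "President Bush in Moscow meets with Putin"
rewrite_alignment = [(1, 0), (0, 1), (2, 2), (3, 3), (6, 4), (4, 5), (5, 6)]

print(anticipation_rate(reference_alignment, src_len=7, k=3))  # 1/7, about 0.143
print(anticipation_rate(rewrite_alignment, src_len=7, k=3))    # 0.0
```

In the full-sentence reference, "meets" appears before the wait-3 decoder has read "huìwù", so it must be anticipated; the rewrite defers the verb until its source word is visible.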
Abstract
Simultaneous translation is vastly different from full-sentence translation, in the sense that it starts translation before the source sentence ends, with a delay of only a few words. However, due to the lack of large-scale, high-quality simultaneous translation datasets, most such systems are still trained on conventional full-sentence bitexts. This is far from ideal for the simultaneous scenario due to the abundance of unnecessary long-distance reorderings in those bitexts. We propose a novel method that rewrites the target side of existing full-sentence corpora into simultaneous-style translation. Experiments on Zh->En and Ja->En simultaneous translation show substantial improvements (up to +2.7 BLEU) with the addition of these generated pseudo-references.
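One plausible way to realize the rewriting step, assumed here rather than stated in the abstract, is to decode each source sentence with a full-sentence model under a wait-k policy. The output can then only use the source prefix read so far, so it tends to follow source word order more closely. This is a minimal Python sketch of that idea. `wait_k_decode`, `add_pseudo_references`, the `NextWord` interface, and the glossary-based `toy_next_word` are hypothetical stand-ins for a trained model, not the authors' implementation.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical model interface: given the source prefix read so far, the target
# prefix emitted so far, and whether the whole source has been read, return the
# next target word, or None to end the sentence.
NextWord = Callable[[Sequence[str], Sequence[str], bool], Optional[str]]


def wait_k_decode(src: Sequence[str], k: int, next_word: NextWord,
                  max_len: Optional[int] = None) -> List[str]:
    """Greedy wait-k decoding: target word t sees only the first min(k + t, |src|) source words."""
    if k < 1:
        raise ValueError("k must be at least 1")
    limit = max_len if max_len is not None else 2 * len(src) + 10
    tgt: List[str] = []
    while len(tgt) < limit:
        read = min(k + len(tgt), len(src))
        word = next_word(src[:read], tgt, read == len(src))
        if word is None:
            break
        tgt.append(word)
    return tgt


def add_pseudo_references(corpus: Sequence[Tuple[List[str], List[str]]], k: int,
                          next_word: NextWord) -> List[Tuple[List[str], List[str]]]:
    """Keep every original pair and add one (source, wait-k pseudo-reference) pair."""
    augmented: List[Tuple[List[str], List[str]]] = []
    for src, ref in corpus:
        augmented.append((src, ref))
        augmented.append((src, wait_k_decode(src, k, next_word)))
    return augmented


# Toy stand-in for a trained full-sentence model: a word-by-word glossary.
GLOSSARY = {"Bùshí": "Bush", "zǒngtǒng": "President", "zài": "in",
            "Mòsīkē": "Moscow", "yǔ": "with", "Pǔjīng": "Putin", "huìwù": "meets"}


def toy_next_word(src_prefix: Sequence[str], tgt_prefix: Sequence[str],
                  src_finished: bool) -> Optional[str]:
    # src_finished is unused by this toy; a real model could use it to decide when to stop.
    i = len(tgt_prefix)
    if i < len(src_prefix):
        return GLOSSARY.get(src_prefix[i], src_prefix[i])
    return None  # nothing left to translate in the visible prefix: stop


src = "Bùshí zǒngtǒng zài Mòsīkē yǔ Pǔjīng huìwù".split()
print(" ".join(wait_k_decode(src, k=2, next_word=toy_next_word)))
# Bush President in Moscow with Putin meets
```

The glossary output is deliberately crude; a real model would produce a fluent but more monotone translation. Appending the pseudo-reference as an extra training pair is also an assumption of this sketch, since the method may filter or weight pseudo-references differently.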
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
