Improving Simultaneous Translation by Incorporating Pseudo-References with Fewer Reorderings

Junkun Chen; Renjie Zheng; Atsuhito Kita; Mingbo Ma; Liang Huang

arXiv:2010.11247·cs.CL·September 24, 2021

TL;DR

This paper improves simultaneous translation by rewriting the target side of full-sentence training corpora into simultaneous-style pseudo-references with fewer long-distance reorderings, yielding gains of up to +2.7 BLEU on Zh->En and Ja->En.

Contribution

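The authors propose a pseudo-reference generation technique that adapts full-sentence corpora for simultaneous translation training. It addresses two problems: the lack of large-scale, high-quality simultaneous translation datasets and the unnecessary long-distance reorderings found in conventional full-sentence bitexts.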

Findings

01. Up to +2.7 BLEU improvement on Zh->En and Ja->En tasks.

02. Effective reduction of unnecessary long-distance reorderings.

03. Enhanced translation quality with pseudo-references.

Abstract

Simultaneous translation is vastly different from full-sentence translation, in the sense that it starts translation before the source sentence ends, with a delay of only a few words. However, due to the lack of large-scale, high-quality simultaneous translation datasets, most such systems are still trained on conventional full-sentence bitexts. This is far from ideal for the simultaneous scenario due to the abundance of unnecessary long-distance reorderings in those bitexts. We propose a novel method that rewrites the target side of existing full-sentence corpora into simultaneous-style translation. Experiments on Zh->En and Ja->En simultaneous translation show substantial improvements (up to +2.7 BLEU) with the addition of these generated pseudo-references.
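One way to make "unnecessary long-distance reorderings" concrete is to count how often a reference forces a target word to be emitted before its aligned source word has been read. The sketch below is illustrative only and is not the authors' implementation. It assumes a wait-k style schedule (read k source words, then alternate one write and one read), which the abstract does not name; 0-based word alignments given as (source index, target index) pairs; and the hypothetical helper names `anticipated_positions` and `anticipation_rate`. The toy alignments are invented for the example.

```python
from typing import List, Tuple

Alignment = List[Tuple[int, int]]  # (source index, target index), 0-based


def anticipated_positions(alignment: Alignment, k: int) -> List[int]:
    """Target positions that must be emitted before their source word is read.

    Assumed schedule: when target word t is written, source words
    0 .. t + k - 1 have been read. A link (src, tgt) with src > tgt + k - 1
    therefore means the target word would have to be guessed.
    """
    anticipated = set()
    for src, tgt in alignment:
        if src > tgt + k - 1:
            anticipated.add(tgt)
    return sorted(anticipated)


def anticipation_rate(alignment: Alignment, target_len: int, k: int) -> float:
    """Fraction of target words that need unread source words."""
    if target_len == 0:
        return 0.0
    return len(anticipated_positions(alignment, k)) / target_len


# Toy full-sentence-style reference: the last source word is translated second.
reordered = [(0, 0), (4, 1), (1, 2), (2, 3), (3, 4)]
# Toy simultaneous-style reference: target order follows source order.
monotone = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]

print(anticipated_positions(reordered, k=2))             # [1]
print(anticipation_rate(reordered, target_len=5, k=2))   # 0.2
print(anticipation_rate(monotone, target_len=5, k=2))    # 0.0
```

Under these assumptions, a reference with a lower rate can be followed with a shorter delay, which is the property a simultaneous-style pseudo-reference is meant to have.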

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification