TL;DR
This paper introduces Gaussian Multi-head Attention (GMA), an approach to simultaneous machine translation (SiMT) that explicitly models alignment to improve the trade-off between translation quality and latency.
Contribution
The paper proposes GMA, which unifies alignment modeling and translation in SiMT using Gaussian distributions to enhance policy control.
Findings
GMA outperforms strong baselines on the En-Vi and De-En tasks.
GMA improves the trade-off between translation quality and latency.
Explicit alignment modeling benefits SiMT performance.
Abstract
Simultaneous machine translation (SiMT) outputs translation while receiving the streaming source inputs, and hence needs a policy to determine where to start translating. The alignment between target and source words often implies the most informative source word for each target word, and hence provides the unified control over translation quality and latency, but unfortunately the existing SiMT methods do not explicitly model the alignment to perform the control. In this paper, we propose Gaussian Multi-head Attention (GMA) to develop a new SiMT policy by modeling alignment and translation in a unified manner. For SiMT policy, GMA models the aligned source position of each target word, and accordingly waits until its aligned position to start translating. To integrate the learning of alignment into the translation model, a Gaussian distribution centered on predicted aligned position is…
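The mechanism described in the abstract can be illustrated with a minimal sketch. The abstract is truncated before it states how the Gaussian distribution is used, so the code below assumes it acts as a multiplicative prior on ordinary softmax attention, followed by renormalization; that combination rule is an assumption, not a detail confirmed by the text above. The function names (`gaussian_prior`, `gaussian_attention`, `should_write`), the `sigma` parameter, and the 0-indexed positions are all hypothetical and chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_prior(center, sigma, src_len):
    """Unnormalized Gaussian weights over source positions 0..src_len-1,
    centered on the predicted aligned position `center`."""
    return [math.exp(-((j - center) ** 2) / (2 * sigma ** 2))
            for j in range(src_len)]


def gaussian_attention(scores, center, sigma):
    """ASSUMPTION: combine softmax attention with the Gaussian prior by
    elementwise product, then renormalize so the weights sum to 1."""
    soft = softmax(scores)
    prior = gaussian_prior(center, sigma, len(scores))
    combined = [a * p for a, p in zip(soft, prior)]
    total = sum(combined)
    return [c / total for c in combined]


def should_write(aligned_pos, num_read):
    """Wait-until-aligned policy: emit the target word only once the source
    word at `aligned_pos` (0-indexed) is among the `num_read` words read."""
    return num_read > aligned_pos


# With uniform scores, the attention mass concentrates around the center.
weights = gaussian_attention([0.0] * 5, center=2.0, sigma=1.0)
```

Under this reading, a target word whose predicted aligned position is 2 would be held back while only two source words have been read (`should_write(2, 2)` is false) and released once the third arrives (`should_write(2, 3)` is true).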
Taxonomy
Methods: Softmax · Linear Layer
