Gaussian Multi-head Attention for Simultaneous Machine Translation

Shaolei Zhang; Yang Feng

arXiv:2203.09072 · cs.CL · March 18, 2022

TL;DR

This paper introduces Gaussian Multi-head Attention (GMA), an approach to simultaneous machine translation (SiMT) that explicitly models source-target alignment to improve the trade-off between translation quality and latency.

Contribution

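The paper proposes GMA, which models alignment and translation in a unified manner for SiMT: a Gaussian distribution centered on each target word's predicted aligned source position both guides attention and determines when the model starts translating.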
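
To make the idea concrete, here is a minimal Python sketch of attention for a single target word that is modulated by a Gaussian over source positions. It assumes, for illustration only, that the soft attention and the Gaussian prior are combined by element-wise multiplication followed by renormalization, that the predicted aligned position `center` and the width `sigma` are simply given as inputs, and that source positions not yet read are masked out. The function names (`gaussian_prior`, `gaussian_modulated_attention`) are hypothetical; how GMA actually predicts the aligned position and parameterizes the Gaussian is not reproduced here, so consult the official ictnlp/gma repository for the real implementation.

```python
import math
from typing import List, Optional


def gaussian_prior(num_src: int, center: float, sigma: float) -> List[float]:
    """Unnormalized Gaussian weights over 1-based source positions 1..num_src."""
    return [math.exp(-((j - center) ** 2) / (2.0 * sigma ** 2))
            for j in range(1, num_src + 1)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_modulated_attention(scores: List[float], center: float, sigma: float,
                                 num_read: Optional[int] = None) -> List[float]:
    """Combine soft attention with a Gaussian alignment prior (illustrative only).

    scores:   raw attention scores for one target word over all source positions
    center:   predicted aligned source position (1-based; hypothetical input)
    sigma:    Gaussian width (hypothetical fixed hyperparameter)
    num_read: if given, only the first num_read source tokens are visible
    """
    n = len(scores) if num_read is None else num_read
    attn = softmax(scores[:n])                # translation-related soft attention
    prior = gaussian_prior(n, center, sigma)  # alignment-related prior
    combined = [a * p for a, p in zip(attn, prior)]
    total = sum(combined)
    probs = [c / total for c in combined]     # renormalize to a distribution
    return probs + [0.0] * (len(scores) - n)  # unread positions get zero weight


weights = gaussian_modulated_attention([0.2, 1.5, 0.3, -0.4, 0.1], center=2.0, sigma=1.0)
print([round(w, 3) for w in weights])
```

The product-then-renormalize rule keeps the result a valid distribution while letting the prior suppress source words far from the predicted alignment; a different combination rule would be an equally plausible reading of "Gaussian distributions" here.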

Findings

1. Outperforms strong baselines on the En-Vi and De-En tasks.
2. Improves the trade-off between translation quality and latency.
3. Explicit alignment modeling benefits SiMT performance.

Abstract

Simultaneous machine translation (SiMT) outputs translation while receiving the streaming source inputs, and hence needs a policy to determine where to start translating. The alignment between target and source words often indicates the most informative source word for each target word, and hence provides unified control over translation quality and latency; unfortunately, existing SiMT methods do not explicitly model the alignment to perform this control. In this paper, we propose Gaussian Multi-head Attention (GMA) to develop a new SiMT policy by modeling alignment and translation in a unified manner. For the SiMT policy, GMA models the aligned source position of each target word, and accordingly waits until its aligned position to start translating. To integrate the learning of alignment into the translation model, a Gaussian distribution centered on the predicted aligned position is…
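
The policy described in the abstract (wait until the aligned source position has arrived, then translate) can be sketched as a small scheduler that turns one predicted aligned position per target word into READ and WRITE actions. This is an illustrative sketch, not the authors' code: the function name `read_write_schedule`, treating positions as real numbers rounded up, and clamping to at least one and at most `src_len` source tokens are all assumptions.

```python
import math
from typing import List, Tuple


def read_write_schedule(aligned_positions: List[float],
                        src_len: int) -> Tuple[List[str], List[int]]:
    """Turn predicted aligned positions into a READ/WRITE action sequence.

    aligned_positions: one (hypothetical) predicted 1-based source position per
                       target word; a smaller position than the previous one
                       simply triggers no extra READs
    src_len:           total source length; once the whole source has been
                       read, the policy can only WRITE
    Returns the action sequence and, for each target word, how many source
    tokens had been read when it was written.
    """
    actions: List[str] = []
    delays: List[int] = []
    num_read = 0
    for p in aligned_positions:
        # Wait until the aligned source position has arrived (at least one
        # token, never more than the source length).
        needed = min(src_len, max(1, math.ceil(p)))
        while num_read < needed:
            actions.append("READ")
            num_read += 1
        actions.append("WRITE")
        delays.append(num_read)
    return actions, delays


actions, delays = read_write_schedule([1.0, 2.5, 2.5, 4.0], src_len=4)
print(actions)  # ['READ', 'WRITE', 'READ', 'READ', 'WRITE', 'WRITE', 'READ', 'WRITE']
print(delays)   # [1, 3, 3, 4]
```

Because `num_read` never decreases, the returned delays are non-decreasing even if the predicted positions are not; these per-word delays are what SiMT latency metrics are computed from.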

Code & Models

Repositories

ictnlp/gma
PyTorch · Official

Taxonomy

Methods: Softmax · Linear Layer