Monotonic Multihead Attention

Xutai Ma; Juan Pino; James Cross; Liezl Puzon; Jiatao Gu

arXiv:1909.12406 · cs.CL · September 30, 2019 · 68 cites

TL;DR

This paper introduces Monotonic Multihead Attention (MMA), a novel attention mechanism for simultaneous machine translation that improves latency-quality tradeoffs by extending monotonic attention to multiple heads with interpretable latency controls.

Contribution

The paper proposes MMA, a new multihead attention mechanism with latency control methods, advancing the state of the art in simultaneous translation.

Findings

01. MMA outperforms previous methods like MILk in latency-quality tradeoffs.

02. Latency controls influence attention span and translation quality.

03. Analysis of decoder layers and heads shows their impact on performance.

Abstract

Simultaneous machine translation models start generating a target sequence before they have encoded or read the source sequence. Recent approaches for this task either apply a fixed policy on a state-of-the-art Transformer model, or a learnable monotonic attention on a weaker recurrent neural network-based structure. In this paper, we propose a new attention mechanism, Monotonic Multihead Attention (MMA), which extends the monotonic attention mechanism to multihead attention. We also introduce two novel and interpretable approaches for latency control that are specifically designed for multiple attention heads. We apply MMA to the simultaneous machine translation task and demonstrate better latency-quality tradeoffs compared to MILk, the previous state-of-the-art approach. We also analyze how the latency controls affect the attention span and we motivate the introduction of our model…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax