On Biasing Transformer Attention Towards Monotonicity

Annette Rios; Chantal Amrhein; Noëmi Aepli; Rico Sennrich

arXiv:2104.03945 · cs.CL · April 9, 2021


TL;DR

This paper introduces a monotonicity loss that works with standard attention mechanisms to encourage monotonic alignment in sequence-to-sequence tasks. Results are mixed: gains are larger on RNN baselines, while transformers improve only in isolated cases where a subset of attention heads is biased towards monotonicity.

Contribution

The work proposes a monotonicity loss that is compatible with standard attention mechanisms, tests it on grapheme-to-phoneme conversion, morphological inflection, transliteration, and dialect normalization, and examines its effect on both RNN and transformer models.

Findings

1. The loss yields largely monotonic attention behavior.

2. Gains are larger on top of RNN baselines.

3. Biasing all transformer heads towards monotonicity does not help, but biasing only a subset of heads yields isolated improvements.

Abstract

Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining. In this work, we introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it on several sequence-to-sequence tasks: grapheme-to-phoneme conversion, morphological inflection, transliteration, and dialect normalization. Experiments show that we can achieve largely monotonic behavior. Performance is mixed, with larger gains on top of RNN baselines. General monotonicity does not benefit transformer multihead attention; however, we see isolated improvements when only a subset of heads is biased towards monotonic behavior.
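
To make the idea of a monotonicity loss on standard attention concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not necessarily the formulation used in the paper: it assumes the penalty is computed from the expected attended source position at each target step and charges only for backward movement between consecutive steps. The function names `expected_positions` and `monotonicity_penalty` are hypothetical and do not come from the paper or its repository.

```python
def expected_positions(attention):
    """Expected source position for each target step.

    attention[i][j] is the weight target step i places on source position j.
    Each row is assumed to be a probability distribution (sums to 1).
    """
    return [sum(j * w for j, w in enumerate(row)) for row in attention]


def monotonicity_penalty(attention):
    """Mean backward movement of the expected attended position.

    Assumption for illustration: moving forward or staying in place is free,
    and only a decrease between consecutive target steps is penalized.
    """
    positions = expected_positions(attention)
    if len(positions) < 2:
        return 0.0
    backward = [max(0.0, prev - cur)
                for prev, cur in zip(positions, positions[1:])]
    return sum(backward) / len(backward)


# Toy attention matrices: 3 target steps over 3 source positions.
monotonic = [[1.0, 0.0, 0.0],
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 1.0]]
reversed_order = [[0.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0]]

print(monotonicity_penalty(monotonic))
print(monotonicity_penalty(reversed_order))
```

In a training setup, a term like this would be added to the usual cross-entropy loss with some weight. To bias only a subset of heads, as the abstract describes, one could compute the penalty for the chosen heads alone and leave the rest unconstrained.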


Code & Models

Repositories

ZurichNLP/monotonicity_loss (official)
