Parallel Attention Forcing for Machine Translation

Qingyun Dou; Mark Gales

arXiv:2211.03237·cs.CL·November 8, 2022

TL;DR

This paper introduces scheduled and parallel attention forcing for training Transformer-based and RNN machine translation models, addressing the mismatch between training and inference to improve translation quality.

Contribution

The paper proposes two extensions of attention forcing, scheduled attention forcing and parallel attention forcing, for better training of sequence-to-sequence models with discrete outputs.

Findings

01 Attention forcing methods improve translation quality.

02 Parallel attention forcing enables efficient training of Transformer models.

03 Scheduled attention forcing effectively manages attention guidance during training.

Abstract

Attention-based autoregressive models have achieved state-of-the-art performance in various sequence-to-sequence tasks, including Text-To-Speech (TTS) and Neural Machine Translation (NMT), but can be difficult to train. The standard training approach, teacher forcing, guides a model with the reference back-history. During inference, the generated back-history must be used. This mismatch limits the evaluation performance. Attention forcing has been introduced to address the mismatch, guiding the model with the generated back-history and reference attention. While successful in tasks with continuous outputs like TTS, attention forcing faces additional challenges in tasks with discrete outputs like NMT. This paper introduces two extensions of attention forcing to tackle these challenges. (1) Scheduled attention forcing automatically turns attention forcing on and off, which is…
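
The difference between the guidance modes named in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three-word source sentence, the hand-written `REFERENCE_ATTENTION` alignment (a stand-in for whatever reference attention the method actually uses), the deliberately flawed `EMIT` table, and the `own_attention` and `decode` helpers. It only shows which back-history and which attention feed each decoding step; it does not implement a loss, the scheduling, or the parallel variant.

```python
from typing import List

SOURCE = ["ich", "mag", "tee"]
REFERENCE = ["i", "like", "tea"]
REFERENCE_ATTENTION = [0, 1, 2]  # assumed alignment: output step t attends to source word t

# Back-history words the toy model recognises, mapped to the source position they cover.
KNOWN_HISTORY = {"i": 0, "like": 1, "tea": 2}
# Word the toy model emits when attending to each source word ("me" is a deliberate error).
EMIT = {"ich": "me", "mag": "like", "tee": "tea"}


def own_attention(prev_token: str) -> int:
    """Toy attention: move to the source word after the one covered by prev_token."""
    if prev_token == "<s>":
        return 0
    if prev_token in KNOWN_HISTORY:
        return min(KNOWN_HISTORY[prev_token] + 1, len(SOURCE) - 1)
    return 0  # unrecognised back-history: attention falls back to the first source word


def decode(mode: str) -> List[str]:
    """Run the toy decoder in 'teacher_forcing', 'free_running' or 'attention_forcing' mode."""
    outputs: List[str] = []
    prev = "<s>"
    for t in range(len(REFERENCE)):
        if mode == "attention_forcing":
            attn = REFERENCE_ATTENTION[t]  # reference attention guides the step
        else:
            attn = own_attention(prev)     # attention depends on the back-history
        token = EMIT[SOURCE[attn]]
        outputs.append(token)
        # Teacher forcing feeds the reference token; the other modes feed the generated one.
        prev = REFERENCE[t] if mode == "teacher_forcing" else token
    return outputs


if __name__ == "__main__":
    for mode in ("teacher_forcing", "free_running", "attention_forcing"):
        print(mode, decode(mode))
```

In this toy setting, teacher forcing and attention forcing both yield `['me', 'like', 'tea']`, while free running yields `['me', 'me', 'me']`: once the generated back-history contains an error, the model's own attention stalls, whereas reference attention keeps later steps aligned even though the back-history is generated.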

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications