Learn To Remember: Transformer with Recurrent Memory for Document-Level Machine Translation

Yukun Feng; Feng Li; Ziang Song; Boyuan Zheng; Philipp Koehn

arXiv:2205.01546 · cs.AI · October 21, 2022

TL;DR

This paper introduces a recurrent memory mechanism into the Transformer architecture to enhance document-level machine translation by maintaining context across sentences, leading to improved translation coherence and performance.

Contribution

The paper proposes a novel recurrent memory unit for Transformers that enables better context integration for document-level translation without significantly increasing computational complexity.
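To make the complexity argument concrete, the sketch below counts attention score entries (query-key pairs) for two setups. The first is self-attention over a whole document concatenated into one sequence, one of the alternatives this page contrasts against. The second is a sentence-by-sentence pass where each sentence also attends to a fixed set of memory slots and the memory attends back to the sentence. This is a back-of-the-envelope illustration under stated assumptions, not the paper's own cost analysis: every sentence has the same length, constants such as head count and hidden size are ignored, and the memory size `mem_slots` and both function names are hypothetical.

```python
def full_document_attention_cost(num_sentences: int, sent_len: int) -> int:
    """Query-key score entries when the whole document is one self-attention sequence."""
    total_tokens = num_sentences * sent_len
    return total_tokens * total_tokens  # quadratic in document length


def recurrent_memory_attention_cost(num_sentences: int, sent_len: int, mem_slots: int) -> int:
    """Query-key score entries for a hypothetical per-sentence pass with a fixed-size memory.

    Per sentence: tokens attend to the sentence itself plus the memory slots,
    and the memory slots attend to the sentence tokens to update themselves.
    """
    per_sentence = sent_len * (sent_len + mem_slots) + mem_slots * sent_len
    return num_sentences * per_sentence  # linear in the number of sentences


# Compare growth for documents of 10, 20 and 40 sentences (30 tokens each, 16 slots assumed).
for n_sent in (10, 20, 40):
    full = full_document_attention_cost(n_sent, sent_len=30)
    rec = recurrent_memory_attention_cost(n_sent, sent_len=30, mem_slots=16)
    print(f"{n_sent:>3} sentences: full={full:>9,}  memory={rec:>7,}")
```

Under these assumptions, doubling the number of sentences quadruples the first count but only doubles the second, which is the scaling difference the contribution statement refers to.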

Findings

01. Achieved an average improvement of 0.91 s-BLEU over the sentence-level baseline.
02. Set new state-of-the-art results on the TED and News datasets.
03. Demonstrated effective context modeling with a recurrent memory in the Transformer.
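The headline gain is reported in s-BLEU. In document-level MT work, s-BLEU usually denotes corpus BLEU computed over sentence-aligned hypothesis/reference pairs, whereas d-BLEU first concatenates each document's sentences, so n-grams may cross sentence boundaries. The sketch below illustrates that difference with a minimal corpus BLEU (whitespace tokens, one reference, no smoothing). It is a simplification under assumptions: the paper's exact evaluation script is not given here, real evaluations typically use a standard tool such as sacreBLEU with its own tokenization, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[str], refs: List[str], max_n: int = 4) -> float:
    """Minimal corpus BLEU (0-100): clipped n-gram precisions + brevity penalty, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            # Counter '&' keeps the minimum of the two counts, i.e. clipped matches
            matches[n - 1] += sum((ngram_counts(h, n) & ngram_counts(r, n)).values())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_prec)


def s_bleu(hyp_docs: List[List[str]], ref_docs: List[List[str]]) -> float:
    """Sentence-level BLEU: score the flattened, sentence-aligned pairs."""
    hyps = [s for doc in hyp_docs for s in doc]
    refs = [s for doc in ref_docs for s in doc]
    return corpus_bleu(hyps, refs)


def d_bleu(hyp_docs: List[List[str]], ref_docs: List[List[str]]) -> float:
    """Document-level BLEU: concatenate each document's sentences before scoring."""
    return corpus_bleu([" ".join(d) for d in hyp_docs], [" ".join(d) for d in ref_docs])


# Same words as the reference, but the sentence boundary falls in a different place.
hyp = [["the cat sat on", "the mat today"]]
ref = [["the cat sat", "on the mat today"]]
print(s_bleu(hyp, ref))  # 0.0: no 4-gram matches within the aligned sentences
print(d_bleu(hyp, ref))  # 100.0: the concatenated documents are identical
```

The toy case shows why s-BLEU and d-BLEU figures are not directly comparable: the same output can score very differently depending on whether sentence boundaries are respected.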

Abstract

The Transformer architecture has led to significant gains in machine translation. However, most studies focus only on sentence-level translation without considering the context dependencies within documents, which leads to inadequate document-level coherence. Some recent work has tried to mitigate this issue by introducing an additional context encoder or by translating multiple sentences, or even the entire document, at once. Such methods may lose information on the target side or incur computational complexity that increases as documents get longer. To address these problems, we introduce a recurrent memory unit into the vanilla Transformer that supports information exchange between the current sentence and the previous context. The memory unit is recurrently updated by acquiring information from sentences and passing the aggregated knowledge back to subsequent sentence states. We follow a…
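The read/write cycle described in the abstract can be sketched as a loop over sentences. This is a hypothetical simplification, not the paper's formulation (its equations are not included in this excerpt): it uses single-head scaled dot-product attention with no learned projections, layer normalization, or gating, and simply adds attention outputs back as residuals. For each sentence, the token states first read from the memory carried over from earlier sentences; the memory slots then attend to the sentence to absorb its information. The names `attend` and `process_document`, the two memory slots, and the zero initialization are illustrative assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows are vectors of the same dimension


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys_values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention; keys double as values (no projections)."""
    d = len(keys_values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise residual addition of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def process_document(sentences: List[Matrix], memory: Matrix) -> Tuple[List[Matrix], Matrix]:
    """Hypothetical recurrent memory loop over a document, one sentence at a time."""
    enriched = []
    for states in sentences:
        # Read: token states pull in context aggregated from earlier sentences.
        enriched.append(add(states, attend(states, memory)))
        # Write: memory slots attend to the current sentence and update residually.
        memory = add(memory, attend(memory, states))
    return enriched, memory


doc = [
    [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]],  # sentence 1: 2 tokens, dim 4
    [[0.5, 0.5, 0.0, 1.0]],                        # sentence 2: 1 token
]
initial_memory = [[0.0] * 4 for _ in range(2)]     # 2 hypothetical memory slots
outputs, final_memory = process_document(doc, initial_memory)
print(outputs[0] == doc[0])  # True: the empty initial memory adds nothing
print(outputs[1] == doc[1])  # False: sentence 2 now sees context from sentence 1
```

Because the memory has a fixed number of slots, each step costs the same regardless of how many sentences came before. In this toy, identically initialized slots stay identical; a trained model would rely on learned parameters to make the slots specialize.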

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Multi-Head Attention · Layer Normalization · Residual Connection · Softmax · Label Smoothing · Adam · Position-Wise Feed-Forward Layer