Learn To Remember: Transformer with Recurrent Memory for Document-Level Machine Translation
Yukun Feng, Feng Li, Ziang Song, Boyuan Zheng, Philipp Koehn

TL;DR
This paper introduces a recurrent memory mechanism into the Transformer architecture to enhance document-level machine translation by maintaining context across sentences, leading to improved translation coherence and performance.
Contribution
The paper proposes a novel recurrent memory unit for the Transformer that integrates context from earlier sentences into document-level translation without significantly increasing computational complexity.
Findings
Achieved an average s-BLEU improvement of 0.91 over the sentence-level baseline.
Set new state-of-the-art results on the TED and News datasets.
Demonstrated effective context modeling with a recurrent memory in the Transformer.
Abstract
The Transformer architecture has led to significant gains in machine translation. However, most studies focus on only sentence-level translation without considering the context dependency within documents, leading to the inadequacy of document-level coherence. Some recent research tried to mitigate this issue by introducing an additional context encoder or translating with multiple sentences or even the entire document. Such methods may lose the information on the target side or have an increasing computational complexity as documents get longer. To address such problems, we introduce a recurrent memory unit to the vanilla Transformer, which supports the information exchange between the sentence and previous context. The memory unit is recurrently updated by acquiring information from sentences, and passing the aggregated knowledge back to subsequent sentence states. We follow a…
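The read-then-write cycle the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`attend`, `add_rows`, `run_document`), the fixed number of memory slots, the zero-initialized memory, the residual additions, and the use of single-head attention without learned projections are all assumptions made for illustration. The sketch only shows the general idea: each sentence first reads from a memory built from earlier sentences, then the memory is updated from that sentence, so the per-sentence cost does not grow with document length.

```python
import math


def attend(queries, source):
    """Single-head scaled dot-product attention, no learned projections.

    Rows of `source` act as both keys and values (an illustrative
    simplification, not the paper's formulation).
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in source]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * row[j] for w, row in zip(weights, source)) / total
            for j in range(dim)
        ])
    return outputs


def add_rows(a, b):
    """Element-wise sum of two equally shaped lists of rows."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def run_document(sentences, memory_slots=4):
    """Process sentence states in order with a fixed-size recurrent memory.

    `sentences` is a list of sentences, each a list of token vectors.
    The memory size is a hypothetical hyperparameter.
    """
    dim = len(sentences[0][0])
    memory = [[0.0] * dim for _ in range(memory_slots)]  # assumed zero init
    outputs = []
    for states in sentences:
        # Read: sentence states pull in context aggregated from earlier sentences.
        outputs.append(add_rows(states, attend(states, memory)))
        # Write: the memory acquires information from the current sentence.
        memory = add_rows(memory, attend(memory, states))
    return outputs, memory
```

In this sketch the first sentence passes through unchanged because the memory is still empty, while later sentences receive context from everything before them. The memory keeps the same shape however many sentences are processed.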
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Multi-Head Attention · Layer Normalization · Residual Connection · Softmax · Label Smoothing · Adam · Position-Wise Feed-Forward Layer
