Residual Tree Aggregation of Layers for Neural Machine Translation
GuoLiang Li, Yiyang Li

TL;DR
This paper introduces RTAL, a residual tree aggregation method for Transformer models in neural machine translation, which fuses multi-layer information to improve translation quality.
Contribution
It proposes a novel residual tree aggregation approach that effectively combines information across layers in Transformer models for NMT.
Findings
RTAL outperforms baseline models on WMT14 English-German translation.
RTAL achieves significant improvements on WMT17 English-French translation.
The method effectively utilizes multi-layer information to enhance translation accuracy.
Abstract
Although attention-based Neural Machine Translation has achieved remarkable progress in recent years, it still suffers from the issue of making insufficient use of the output of each layer. The Transformer uses only the top layer of the encoder and decoder in subsequent processing, so it cannot take advantage of the useful information in other layers. To address this issue, we propose a residual tree aggregation of layers for Transformer (RTAL), which helps to fuse information across layers. Specifically, we fuse the information across layers by constructing a post-order binary tree. In addition to the last node, we add the residual connection to the process of generating child nodes. Our model is based on the Neural Machine Translation model Transformer, and we conduct our experiments on WMT14 English-to-German and WMT17 English-to-French translation tasks.…
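To make the idea of fusing layer outputs through a binary tree more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the fusion function, the tree shape, or where exactly the residual is added. The names `fuse`, `add`, and `aggregate` are hypothetical. Element-wise averaging stands in for whatever learned fusion the paper uses, the tree is assumed to be balanced, and the residual is assumed to come from the right (upper-layer) child.

```python
from typing import List, Sequence

Vector = List[float]


def fuse(left: Vector, right: Vector) -> Vector:
    """Hypothetical fusion step: element-wise average of two vectors.

    Assumption: a real model would likely use a learned transformation here.
    """
    return [(a + b) / 2.0 for a, b in zip(left, right)]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, used for the residual connection."""
    return [x + y for x, y in zip(a, b)]


def aggregate(layers: Sequence[Vector]) -> Vector:
    """Aggregate per-layer outputs with a binary tree evaluated in post-order.

    Both subtrees are evaluated before the node that combines them.
    Assumption: the tree is balanced and the residual comes from the
    right child, which covers the upper layers.
    """
    if not layers:
        raise ValueError("need at least one layer output")
    if len(layers) == 1:
        return list(layers[0])  # leaf: a single layer's output
    mid = len(layers) // 2
    left = aggregate(layers[:mid])   # left subtree (lower layers) first
    right = aggregate(layers[mid:])  # then right subtree (upper layers)
    # The node itself is computed last: fused children plus a residual.
    return add(fuse(left, right), right)


# Toy usage: four "layers", each a 1-dimensional output.
print(aggregate([[1.0], [2.0], [3.0], [4.0]]))
```

In this sketch every layer output contributes to the final representation, rather than only the top layer, which is the motivation stated in the abstract.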
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Residual Connection · Softmax · Dropout · Adam
