Modeling Recurrence for Transformer
Jie Hao, Xing Wang, Baosong Yang, Longyue Wang, Jinfeng Zhang, Zhaopeng Tu

TL;DR
This paper adds a recurrence encoder to the Transformer, combining attention and recurrence to enhance translation capacity, and reports improved results on the WMT14 English-German and WMT17 Chinese-English translation benchmarks.
Contribution
It proposes a novel recurrence modeling approach with an attentive recurrent network, enhancing Transformer performance in machine translation tasks.
Findings
The recurrence encoder improves translation quality.
The attentive recurrent network leverages strengths of attention and recurrence.
A short-cut that bridges the source and target sequences with a single recurrent layer outperforms deep models.
Abstract
Recently, the Transformer model, which is based solely on attention mechanisms, has advanced the state of the art on various machine translation tasks. However, recent studies reveal that the lack of recurrence hinders further improvement of its translation capacity. In response to this problem, we propose to directly model recurrence for Transformer with an additional recurrence encoder. In addition to the standard recurrent neural network, we introduce a novel attentive recurrent network to leverage the strengths of both attention and recurrent networks. Experimental results on the widely used WMT14 English-German and WMT17 Chinese-English translation tasks demonstrate the effectiveness of the proposed approach. Our studies also reveal that the proposed model benefits from a short-cut that bridges the source and target sequences with a single recurrent layer, which outperforms its deep…
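To make the idea of an attentive recurrent network more concrete, here is a minimal pure-Python sketch. It assumes that at each step the previous hidden state acts as the query for scaled dot-product attention over the input representations, and that the resulting context vector is fed into a plain tanh recurrent cell. The cell type, the zero initial state, the fixed step count, and all names (`attend`, `arn_encode`, `W_h`, `W_c`, `b`) are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention of one query over memory vectors (keys = values)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in memory]
    weights = softmax(scores)
    return [sum(w * vec[i] for w, vec in zip(weights, memory)) for i in range(d)]


def arn_encode(inputs, W_h, W_c, b, steps):
    """Hypothetical attentive recurrent network (assumed form):
    c_t = attend(h_{t-1}, inputs);  h_t = tanh(W_h h_{t-1} + W_c c_t + b).
    """
    d = len(inputs[0])
    h = [0.0] * d  # assumed zero initial state
    states = []
    for _ in range(steps):
        c = attend(h, inputs)  # previous state queries the whole input
        pre = [a + cc + bb for a, cc, bb in zip(matvec(W_h, h), matvec(W_c, c), b)]
        h = [math.tanh(p) for p in pre]  # simple recurrent update
        states.append(h)
    return states


def rand_matrix(rows, cols):
    """Small random matrix for the toy example."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d, n = 4, 5
inputs = rand_matrix(n, d)  # n input positions, each a d-dimensional vector
W_h, W_c = rand_matrix(d, d), rand_matrix(d, d)
b = [0.0] * d
states = arn_encode(inputs, W_h, W_c, b, steps=3)
print(len(states), len(states[0]))  # prints: 3 4
```

Because the context vector is recomputed from the full input at every step, each hidden state can draw on the whole sequence through attention, while the recurrence still carries information from one step to the next. With a zero initial state, the first attention step is uniform, so the first context is simply the mean of the inputs.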
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
