R-Transformer: Recurrent Neural Network Enhanced Transformer

Zhiwei Wang; Yao Ma; Zitao Liu; Jiliang Tang

arXiv:1907.05572 · cs.LG · July 15, 2019 · 89 cites

TL;DR

The R-Transformer combines RNN and Transformer features to effectively model local and long-term sequence dependencies without position embeddings, outperforming current methods across various tasks.
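
To make the TL;DR concrete, here is a minimal Python sketch of how one layer could compose the two ingredients. It assumes the ordering local recurrence, then multi-head attention, then a position-wise feed-forward network, with each step wrapped as LayerNorm(x + sublayer(x)). The ordering, the post-norm residual form, and the names `r_transformer_layer`, `residual`, and `layer_norm` are assumptions for illustration, not taken from this page or from the authors' repositories.

```python
import math
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[List[Vector]], List[Vector]]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual(seq: List[Vector], sublayer: Sublayer) -> List[Vector]:
    """Apply x -> LayerNorm(x + sublayer(x)) at every position."""
    return [layer_norm([a + b for a, b in zip(x, y)]) for x, y in zip(seq, sublayer(seq))]


def r_transformer_layer(
    seq: List[Vector], local_rnn: Sublayer, attention: Sublayer, ffn: Sublayer
) -> List[Vector]:
    """Hypothetical layer: local recurrence, then global attention, then a
    position-wise feed-forward network, each in a residual wrapper.
    Note that no position embedding is added to `seq` anywhere."""
    seq = residual(seq, local_rnn)  # local structure
    seq = residual(seq, attention)  # long-term dependencies
    return residual(seq, ffn)       # per-position transformation
```

The sublayers are passed in as plain functions, so the skeleton stays independent of how the local RNN or the attention is actually implemented.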

Contribution

It introduces a novel R-Transformer model that integrates RNN and multi-head attention to capture both local and global sequence structures without position embeddings.
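
One way to read "integrates RNN" for local structure is to run a small recurrent network over a short sliding window of the most recent positions, so each output summarizes only its local neighborhood. The sketch below assumes that design, with a vanilla tanh cell, a window of `window` positions zero-padded on the left, and weights shared across all windows. The cell type and the names `local_rnn`, `rnn_step`, and `matvec` are hypothetical choices, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(mat: Matrix, vec: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def rnn_step(x: Vector, h: Vector, w_x: Matrix, w_h: Matrix, b: Vector) -> Vector:
    """One vanilla tanh RNN step (assumed cell): h' = tanh(W_x x + W_h h + b)."""
    pre = [a + c + d for a, c, d in zip(matvec(w_x, x), matvec(w_h, h), b)]
    return [math.tanh(p) for p in pre]


def local_rnn(seq: List[Vector], window: int, w_x: Matrix, w_h: Matrix, b: Vector) -> List[Vector]:
    """For each position t, run the shared RNN over the `window` inputs ending
    at t (zero-padded on the left) and return its final hidden state."""
    dim_in = len(seq[0])
    padded = [[0.0] * dim_in] * (window - 1) + seq
    outputs = []
    for t in range(len(seq)):
        h = [0.0] * len(b)  # every window starts from a fresh zero state
        for x in padded[t : t + window]:
            h = rnn_step(x, h, w_x, w_h, b)
        outputs.append(h)
    return outputs


# Usage with toy 1-d weights (hypothetical values): window of 2 positions.
out = local_rnn([[1.0], [0.0], [0.0], [2.0]], window=2, w_x=[[1.0]], w_h=[[0.5]], b=[0.0])
```

Because each window restarts from a zero state, the outer loop over positions has no cross-iteration dependency and could be batched; order inside a window is still captured by the recurrence. In the usage line, position 2 sees only zeros in its window, so its state stays at zero even though position 0 held a nonzero input.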

Findings

01 Outperforms state-of-the-art methods in multiple sequence tasks.

02 Effectively models local structures and long-term dependencies.

03 Eliminates the need for position embeddings.
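
Why can position embeddings be dropped at all? Plain self-attention without position information is permutation-equivariant: shuffling the input only shuffles the output, so it cannot distinguish one ordering from another. A recurrence, by contrast, reads inputs in sequence and is sensitive to order. The toy comparison below illustrates this, assuming parameter-free attention (queries, keys, and values all equal to the input) and a hypothetical one-feature recurrence `last_state_rnn`; neither is the paper's actual layer.

```python
import math
from typing import List

Vector = List[float]


def self_attention(seq: List[Vector]) -> List[Vector]:
    """Parameter-free scaled dot-product self-attention with Q = K = V = seq
    and no position embeddings (a deliberate simplification)."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[i] for w, v in zip(weights, seq)) / total for i in range(d)])
    return out


def last_state_rnn(seq: List[Vector], decay: float = 0.5) -> float:
    """Hypothetical order-sensitive recurrence over the first feature:
    h_t = tanh(x_t[0] + decay * h_{t-1}); returns the final state."""
    h = 0.0
    for x in seq:
        h = math.tanh(x[0] + decay * h)
    return h


seq = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
rev = seq[::-1]
# Reversing the input only reverses the attention output, so order is invisible.
att_equivariant = all(
    math.isclose(a, b)
    for u, v in zip(self_attention(seq), self_attention(rev)[::-1])
    for a, b in zip(u, v)
)
# The recurrence reads positions in order, so reversal changes its final state.
rnn_sensitive = last_state_rnn(seq) != last_state_rnn(rev)
```

This is why a Transformer must inject order through position embeddings, whereas a model with a recurrent component can recover order from the recurrence itself.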

Abstract

Recurrent Neural Networks have long been the dominant choice for sequence modeling. However, they suffer severely from two issues: they are ineffective at capturing very long-term dependencies, and their sequential computation cannot be parallelized. Therefore, many non-recurrent sequence models built on convolution and attention operations have been proposed recently. Notably, models with multi-head attention such as the Transformer have proven highly effective at capturing long-term dependencies in a variety of sequence modeling tasks. Despite their success, however, these models lack the components necessary to model local structures in sequences and rely heavily on position embeddings, which have limited effect and require considerable design effort. In this paper, we propose the R-Transformer which enjoys the advantages of both RNNs and the multi-head attention…
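
The abstract credits multi-head attention with capturing long-term dependencies and contrasts it with the sequential computation of RNNs. The sketch below shows causal multi-head scaled dot-product attention in plain Python under simplifying assumptions: heads are formed by splitting the feature vector into equal contiguous chunks, there are no learned query, key, value, or output projections, and a causal mask restricts each position to itself and earlier positions. The function names are illustrative only.

```python
import math
from typing import List

Vector = List[float]


def causal_attention_head(seq: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention with Q = K = V = seq and a causal mask."""
    d = len(seq[0])
    out = []
    for t, q in enumerate(seq):
        # Causal mask: position t may only attend to positions 0..t.
        scores = [sum(a * b for a, b in zip(q, seq[j])) / math.sqrt(d) for j in range(t + 1)]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * seq[j][i] for j, w in enumerate(weights)) / total for i in range(d)])
    return out


def multi_head_causal_attention(seq: List[Vector], num_heads: int) -> List[Vector]:
    """Split features into `num_heads` chunks, attend per chunk, concatenate."""
    d = len(seq[0])
    if d % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    size = d // num_heads
    heads = [
        causal_attention_head([x[h * size:(h + 1) * size] for x in seq])
        for h in range(num_heads)
    ]
    # Concatenate the per-head outputs at each position.
    return [[v for head in heads for v in head[t]] for t in range(len(seq))]


seq = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0], [2.0, 0.0, 1.0, 1.0]]
out = multi_head_causal_attention(seq, num_heads=2)
```

Each output position depends only on the inputs, not on other outputs, so the loop over `t` could run in parallel. Any two positions, however far apart, interact directly through a single attention weight rather than through a chain of recurrent steps.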

Taxonomy

Topics: Neural Networks and Applications · Time Series Analysis and Forecasting · Anomaly Detection Techniques and Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax