A Survey of Transformers

Tianyang Lin; Yuxin Wang; Xiangyang Liu; Xipeng Qiu

arXiv:2106.04554·cs.LG·June 16, 2021

TL;DR

This survey reviews Transformer variants (X-formers) from three perspectives (architectural modification, pre-training, and applications) and outlines directions for future research.

Contribution

It proposes a new taxonomy of Transformer variants and reviews them systematically, covering architectural changes, pre-training methods, and diverse applications.

Findings

01. Extensive classification of Transformer variants.

02. Identification of key architectural modifications.

03. Outline of promising future research directions.

Abstract

Transformers have achieved great success in many artificial intelligence fields, such as natural language processing, computer vision, and audio processing, and have therefore attracted considerable interest from academic and industry researchers. A great variety of Transformer variants (a.k.a. X-formers) have been proposed to date; however, a systematic and comprehensive literature review of these variants is still missing. In this survey, we provide a comprehensive review of various X-formers. We first briefly introduce the vanilla Transformer and then propose a new taxonomy of X-formers. Next, we introduce the various X-formers from three perspectives: architectural modification, pre-training, and applications. Finally, we outline some potential directions for future research.

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Label Smoothing · Residual Connection · Dense Connections · Softmax · Dropout
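
As a rough illustration of three of the methods listed above (Softmax, Absolute Position Encodings, and Label Smoothing), the following is a minimal pure-Python sketch rather than code from the survey or its repositories. The function names are hypothetical, the position encoding assumes the fixed sinusoidal form with an even `d_model`, and the label smoothing assumes the uniform-mixture variant with `epsilon=0.1` as a default.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sinusoidal_position_encoding(num_positions, d_model):
    """Fixed sinusoidal absolute position encodings (assumes d_model is even).

    PE(pos, 2j)   = sin(pos / 10000^(2j / d_model))
    PE(pos, 2j+1) = cos(pos / 10000^(2j / d_model))
    """
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(0, d_model, 2):  # i plays the role of 2j
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))  # even dimension
            row.append(math.cos(angle))  # odd dimension
        table.append(row)
    return table


def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Uniform label smoothing: mix a one-hot target with a uniform distribution."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == target_index else uniform
        for k in range(num_classes)
    ]
```

In this sketch, `sinusoidal_position_encoding(4, 8)` returns a 4-by-8 table that would be added to token embeddings, and `smooth_labels(2, 5)` returns a target distribution that still sums to 1 but no longer puts all mass on class 2.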