Transformer models: an introduction and catalog
Xavier Amatriain, Ananth Sankar, Jie Bing, Praveen Kumar Bodigutla, Timothy J. Hazen, and Michaeel Kazi

TL;DR
This paper provides a broad but simple catalog and classification of popular Transformer models and introduces their key features and innovations. It covers models trained with self-supervised learning as well as models further trained with human-in-the-loop methods.
Contribution
It offers a simple, organized catalog of Transformer models, highlighting their main characteristics and recent advancements in the field.
Findings
Catalog includes models like BERT, GPT-3, and InstructGPT.
Highlights key innovations in Transformer architectures.
Provides a classification framework for Transformer models.
Abstract
In the past few years we have seen the meteoric appearance of dozens of foundation models of the Transformer family, all of which have memorable and sometimes funny, but not self-explanatory, names. The goal of this paper is to offer a somewhat comprehensive but simple catalog and classification of the most popular Transformer models. The paper also includes an introduction to the most important aspects and innovations in Transformer models. Our catalog includes models that are trained using self-supervised learning (e.g., BERT or GPT-3) as well as those that are further trained using a human-in-the-loop (e.g., the InstructGPT model used by ChatGPT).
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Warmup With Linear Decay · WordPiece · Attention Dropout · Weight Decay · BERT · Layer Normalization · Linear Layer
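Multi-Head Attention, listed among the methods above, is the building block shared by the Transformer models in this catalog. The following is a minimal pure-Python sketch, assuming the standard scaled dot-product formulation softmax(QK^T / sqrt(d_k))V from "Attention Is All You Need" and a single unbatched sequence. The function names and the toy identity weights are hypothetical, and masking, biases, and dropout are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major list of rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """One head: softmax(Q K^T / sqrt(d_k)) V, with softmax applied per row."""
    scale = math.sqrt(len(k[0]))  # d_k = width of each key vector
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix,
                         w_o: Matrix, num_heads: int) -> Matrix:
    """Project x to Q, K, V, split the feature dimension into heads,
    attend within each head, concatenate, and apply the output projection."""
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)

    head_outputs = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)  # this head's feature columns
        q_h = [row[cols] for row in q]
        k_h = [row[cols] for row in k]
        v_h = [row[cols] for row in v]
        head_outputs.append(scaled_dot_product_attention(q_h, k_h, v_h))

    # Concatenate the heads back to width d_model for each position.
    concat = [[val for head in head_outputs for val in head[i]] for i in range(len(x))]
    return matmul(concat, w_o)


if __name__ == "__main__":
    # Toy example: 2 positions, d_model = 2, 2 heads, identity projections.
    eye = [[1.0, 0.0], [0.0, 1.0]]
    x = [[1.0, 0.0], [0.0, 1.0]]
    print(multi_head_attention(x, eye, eye, eye, eye, num_heads=2))
```

With identity projections, each head sees one feature column, and each position attends more strongly to itself in the head where its feature is nonzero, so the printed output is approximately [[0.731, 0.5], [0.5, 0.731]]. Splitting one d_model x d_model projection by columns is equivalent to giving each head its own d_model x d_head projection, which keeps the sketch short.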
