Exploring Transformers in Natural Language Generation: GPT, BERT, and XLNet
M. Onat Topal, Anil Bas, Imke van Heerden

TL;DR
This paper reviews the impact of Transformer models like GPT, BERT, and XLNet on Natural Language Generation, highlighting their advantages over traditional RNNs and LSTMs in generating diverse and high-quality text.
Contribution
It provides an overview of three major Transformer-based models and discusses their implications and advancements in the field of Natural Language Generation.
Findings
Transformers avoid the vanishing gradient problems that affect RNNs and LSTMs.
Transformer models achieve groundbreaking results in text generation tasks.
Rapid developments in attention mechanisms enhance NLG capabilities.
Abstract
Recent years have seen a proliferation of attention mechanisms and the rise of Transformers in Natural Language Generation (NLG). Previously, state-of-the-art NLG architectures such as RNN and LSTM ran into vanishing gradient problems; as sentences grew larger, distance between positions remained linear, and sequential computation hindered parallelization since sentences were processed word by word. Transformers usher in a new era. In this paper, we explore three major Transformer-based models, namely GPT, BERT, and XLNet, that carry significant implications for the field. NLG is a burgeoning area that is now bolstered with rapid developments in attention mechanisms. From poetry generation to summarization, text generation derives benefit as Transformer-based language models achieve groundbreaking results.
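The contrast the abstract draws between word-by-word processing and attention can be illustrated with a minimal sketch of scaled dot-product self-attention. This is not code from the paper. It assumes identity projections (queries, keys, and values are the input vectors themselves), whereas a real Transformer layer learns separate projection matrices, and the toy embeddings and the names `softmax` and `self_attention` are hypothetical. The point it shows is that every position scores every other position in a single step, and that each position's output is computed independently of the others, which is what allows parallel computation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections.

    Assumption: queries, keys, and values are the input vectors themselves.
    A trained Transformer layer would apply learned projections first.
    """
    dim = len(vectors[0])
    outputs, weights = [], []
    # Each iteration depends only on the inputs, not on earlier iterations,
    # so the positions could be processed in parallel.
    for query in vectors:
        # Score this position against every position directly.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of all value vectors.
        outputs.append(
            [sum(w * v[i] for w, v in zip(row, vectors)) for i in range(dim)]
        )
    return outputs, weights


# Hypothetical toy embeddings for a three-token sentence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` sums to 1 and tells how strongly one token attends to every token in the sentence, regardless of how far apart they are. An RNN, by comparison, would need to carry information through every intermediate step between two distant positions.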
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · SentencePiece · Cosine Annealing · Byte Pair Encoding · Weight Decay · Multi-Head Attention · Discriminative Fine-Tuning · Attention Dropout · Layer Normalization
