Exploring Transformers in Natural Language Generation: GPT, BERT, and XLNet

M. Onat Topal; Anil Bas; Imke van Heerden

arXiv:2102.08036·cs.CL·February 17, 2021·20 cites

TL;DR

This paper reviews the impact of Transformer models like GPT, BERT, and XLNet on Natural Language Generation, highlighting their advantages over traditional RNNs and LSTMs in generating diverse and high-quality text.

Contribution

It provides an overview of three major Transformer-based models and discusses their implications and advancements in the field of Natural Language Generation.

Findings

01 Transformers avoid the vanishing gradient problems that affect RNNs and LSTMs.

02 Transformer-based language models achieve groundbreaking results in text generation tasks, from poetry generation to summarization.

03 Rapid developments in attention mechanisms bolster NLG capabilities.

Abstract

Recent years have seen a proliferation of attention mechanisms and the rise of Transformers in Natural Language Generation (NLG). Previously, state-of-the-art NLG architectures such as RNN and LSTM ran into vanishing gradient problems; as sentences grew larger, distance between positions remained linear, and sequential computation hindered parallelization since sentences were processed word by word. Transformers usher in a new era. In this paper, we explore three major Transformer-based models, namely GPT, BERT, and XLNet, that carry significant implications for the field. NLG is a burgeoning area that is now bolstered with rapid developments in attention mechanisms. From poetry generation to summarization, text generation derives benefit as Transformer-based language models achieve groundbreaking results.
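
The contrast the abstract draws between sequential processing and attention can be illustrated with a minimal sketch of scaled dot-product self-attention in plain Python. This is not code from the paper: the function names, the toy embeddings, and the simplification that queries, keys, and values are the raw input vectors (with no learned projections) are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections.

    Assumption: queries, keys, and values are the input vectors themselves;
    a real Transformer layer learns separate projection matrices for each.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Each position scores every position directly, so no information
        # has to pass through intermediate time steps as in an RNN.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # The output is a weighted average of all value vectors.
        mixed = [sum(w * value[i] for w, value in zip(weights, vectors))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-dimensional embeddings for a three-token sentence.
toy_sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(toy_sentence)
```

Each iteration of the loop over `query` depends only on the inputs, not on the result for the previous position, which is why this computation can be parallelized across positions, whereas a recurrent network must process the sentence word by word.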

Code & Models

Repositories

ZaidAli60/saylaniassignment3

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · SentencePiece · Cosine Annealing · Byte Pair Encoding · Weight Decay · Multi-Head Attention · Discriminative Fine-Tuning · Attention Dropout · Layer Normalization