AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, Sivanesan Sangeetha

TL;DR
This survey comprehensively reviews transformer-based pretrained language models in NLP, covering their evolution, core concepts, taxonomy, benchmarks, libraries, and future research directions.
Contribution
It introduces a new taxonomy of T-PTLMs and provides an extensive overview of concepts, benchmarks, and tools, serving as a valuable reference for researchers.
Findings
Summarizes various core concepts and pretraining methods.
Provides a taxonomy of T-PTLMs.
Highlights future research directions.
Abstract
Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task. The evolution of these models started with GPT and BERT. These models are built on top of transformers, self-supervised learning and transfer learning. Transformer-based PTLMs learn universal language representations from large volumes of text data using self-supervised learning and transfer this knowledge to downstream tasks. These models provide good background knowledge to downstream tasks, which avoids training downstream models from scratch. In this comprehensive survey paper, we initially give a brief overview of self-supervised learning. Next, we explain various core concepts like pretraining, pretraining methods, pretraining tasks, embeddings and downstream adaptation methods. Next, we present a new taxonomy of T-PTLMs and then give a brief overview of various…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Softmax · Discriminative Fine-Tuning · Dense Connections · WordPiece · Byte Pair Encoding
