The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures

Sushant Singh; Ausif Mahmood

arXiv:2104.10640·cs.CL·May 18, 2021

TL;DR

This paper reviews recent advancements in Transformer-based NLP models, focusing on their architectures, efficiency improvements, and future research directions, highlighting the balance between performance and computational costs.

Contribution

It provides a comprehensive summary and taxonomy of state-of-the-art NLP models, analyzing their architectures, efficiencies, and potential future developments.

Findings

1. Transformers have revolutionized NLP with high performance.
2. Recent models incorporate transfer learning, pruning, and distillation.
3. Efforts are ongoing to improve inference efficiency and handle longer sequences.

Abstract

In recent years, Natural Language Processing (NLP) models have achieved phenomenal success in linguistic and semantic tasks such as text classification, machine translation, cognitive dialogue systems, information retrieval via Natural Language Understanding (NLU), and Natural Language Generation (NLG). This feat is primarily attributed to the seminal Transformer architecture, which led to designs such as BERT, GPT (I, II, III), etc. Although these large models have achieved unprecedented performance, they come at high computational cost. Consequently, some recent NLP architectures have used transfer learning, pruning, quantization, and knowledge distillation to reach moderate model sizes while keeping performance close to that of their predecessors. Additionally, to mitigate the data size challenge raised by language models from a knowledge…

Taxonomy

Methods: Multi-Head Attention · Linear Layer · Knowledge Distillation · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Label Smoothing · Byte Pair Encoding · Dropout
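
Knowledge distillation appears in the abstract, the findings, and the methods list, so a small illustration may help. The sketch below shows one commonly used formulation: a temperature-softened KL term between teacher and student outputs, blended with ordinary cross-entropy on the true label. It is not taken from the paper. The function names `softmax` and `distillation_loss`, the parameters `temperature` and `alpha`, and their default values are all assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: blend a soft-target KL term with hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # Cross-entropy against the ground-truth label at temperature 1
    ce = -math.log(softmax(student_logits)[true_label])
    # Multiplying by T^2 keeps the soft term's gradient scale comparable (assumed convention)
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce

# A student that matches the teacher incurs only the cross-entropy part.
matched = distillation_loss([2.0, 0.0], [2.0, 0.0], true_label=0)
mismatched = distillation_loss([0.0, 2.0], [2.0, 0.0], true_label=0)
print(matched < mismatched)
```

In this formulation `alpha` trades off imitation of the teacher against fitting the labels. The models the paper surveys may weight or define these terms differently.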