AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models

Amrit Nagarajan; Sanchari Sen; Jacob R. Stevens; Anand Raghunathan

arXiv:2010.03688 · cs.CL · June 13, 2022

TL;DR

AxFormer is a framework that improves transformer models by accuracy-driven pruning and selective attention, resulting in faster, smaller, and more accurate NLP models tailored for specific tasks.

Contribution

It introduces a systematic approach combining pruning and hard attention to optimize transformers for downstream tasks, enhancing accuracy and efficiency.

Findings

01. Up to 4.5% accuracy improvement on NLP tasks
02. Models are up to 2.5X faster and 3.2X smaller
03. Compatible with distillation and quantization techniques

Abstract

Transformers have greatly advanced the state of the art in Natural Language Processing (NLP) in recent years, but they impose very large computation and storage requirements. We observe that the design process of Transformers (pre-train a foundation model on a large dataset in a self-supervised manner, and subsequently fine-tune it for different downstream tasks) leads to task-specific models that are highly over-parameterized, adversely impacting both accuracy and inference efficiency. We propose AxFormer, a systematic framework that applies accuracy-driven approximations to create optimized Transformer models for a given downstream task. AxFormer combines two key optimizations: accuracy-driven pruning and selective hard attention. Accuracy-driven pruning identifies and removes parts of the fine-tuned Transformer that hinder performance on the given downstream task. Sparse hard-attention…
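To make the two optimizations named in the abstract more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The function names (`accuracy_driven_prune`, `toy_evaluate`, `topk_hard_attention`), the component names, and the toy accuracy numbers are all hypothetical. The pruning loop assumes a simple greedy rule: drop a component whenever validation accuracy does not fall. Because the abstract is cut off before it describes the attention optimization, the top-k function is only one generic reading of "hard attention" and may differ from the paper's formulation.

```python
import math
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple


def accuracy_driven_prune(
    components: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], int],
) -> Tuple[Set[str], int]:
    """Greedily remove components whose removal does not lower accuracy.

    `evaluate` is a hypothetical callback that returns validation accuracy
    (here as an integer percentage) for the model with the given set of
    components removed.
    """
    removed: Set[str] = set()
    best = evaluate(frozenset(removed))
    for name in components:
        trial_accuracy = evaluate(frozenset(removed | {name}))
        # Keep the removal only if accuracy is preserved or improved.
        if trial_accuracy >= best:
            removed.add(name)
            best = trial_accuracy
    return removed, best


def toy_evaluate(removed: FrozenSet[str]) -> int:
    """Made-up stand-in for a real validation run (numbers are invented)."""
    effect = {"layer_11": 2, "head_3_7": 0, "layer_0": -30}
    return 80 + sum(effect[name] for name in removed)


def topk_hard_attention(
    scores: Sequence[float],
    values: Sequence[Sequence[float]],
    k: int,
) -> List[float]:
    """Attend to only the k highest-scoring positions for a single query.

    Assumption: "hard" attention is modeled as a softmax restricted to the
    top-k scores; all other positions receive zero weight.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    max_score = max(scores[i] for i in top)  # subtract max for stability
    exps = {i: math.exp(scores[i] - max_score) for i in top}
    total = sum(exps.values())
    out = [0.0] * len(values[0])
    for i in top:
        weight = exps[i] / total
        for d, v in enumerate(values[i]):
            out[d] += weight * v
    return out
```

With the toy evaluator, `accuracy_driven_prune(["layer_11", "head_3_7", "layer_0"], toy_evaluate)` removes the two components that do not hurt accuracy and keeps `layer_0`, whose removal would lower it. In practice `evaluate` would run the fine-tuned model on a held-out validation set, which makes each trial expensive; the order in which components are tried is therefore a design choice this sketch does not address.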

Code & Models

Repositories

amrnag/specialized-transformers (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Pruning · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · WordPiece · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Softmax