AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models
Amrit Nagarajan, Sanchari Sen, Jacob R. Stevens, Anand Raghunathan

TL;DR
AxFormer is a framework that optimizes fine-tuned transformer models through accuracy-driven pruning and selective hard attention, producing faster, smaller, and more accurate NLP models for a given downstream task.
Contribution
It introduces a systematic approach combining pruning and hard attention to optimize transformers for downstream tasks, enhancing accuracy and efficiency.
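To make the hard-attention idea concrete, here is a minimal sketch of one common form: a softmax restricted to the k highest-scoring positions, with every other position given zero weight. This summary does not specify how AxFormer selects positions, so the top-k rule, the function name `topk_hard_attention`, and the parameter `k` are all assumptions made for illustration.

```python
import math
from typing import List


def topk_hard_attention(scores: List[float], k: int) -> List[float]:
    """Softmax over only the k highest-scoring positions; all others get weight 0.

    Hypothetical illustration: the top-k selection rule is an assumption,
    not the selection rule used by AxFormer.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # Indices of the k largest attention scores.
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Subtract the peak score before exponentiating for numerical stability.
    peak = max(scores[i] for i in keep)
    exps = {i: math.exp(scores[i] - peak) for i in keep}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]


# Toy example: one query attending over four key positions.
weights = topk_hard_attention([2.0, 0.5, 1.0, -1.0], k=2)
```

In this toy call only positions 0 and 2 keep non-zero weight, so the value vectors at the other positions would not need to be read at all.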
Findings
Up to 4.5% accuracy improvement on NLP tasks
Models are up to 2.5X faster and 3.2X smaller
Compatible with distillation and quantization techniques
Abstract
Transformers have greatly advanced the state-of-the-art in Natural Language Processing (NLP) in recent years, but present very large computation and storage requirements. We observe that the design process of Transformers (pre-train a foundation model on a large dataset in a self-supervised manner, and subsequently fine-tune it for different downstream tasks) leads to task-specific models that are highly over-parameterized, adversely impacting both accuracy and inference efficiency. We propose AxFormer, a systematic framework that applies accuracy-driven approximations to create optimized transformer models for a given downstream task. AxFormer combines two key optimizations -- accuracy-driven pruning and selective hard attention. Accuracy-driven pruning identifies and removes parts of the fine-tuned transformer that hinder performance on the given downstream task. Sparse hard-attention…
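The accuracy-driven pruning step described in the abstract can be sketched as a greedy loop: tentatively remove one component, re-evaluate on the downstream task, and keep the removal only if accuracy does not drop. This is a simplified illustration, not the authors' algorithm. The component names, the `evaluate` callback, the `tolerance` parameter, and the toy accuracy numbers are all hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def accuracy_driven_prune(
    components: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
    tolerance: float = 0.0,
) -> Tuple[FrozenSet[str], float]:
    """Greedily drop components whose removal does not reduce task accuracy.

    `evaluate` is a hypothetical callback that returns validation accuracy
    for the model restricted to the given set of active components.
    """
    active = frozenset(components)
    best = evaluate(active)
    for name in sorted(active):  # fixed order keeps the result deterministic
        candidate = active - {name}
        score = evaluate(candidate)
        # Keep the removal only if accuracy is preserved (within tolerance).
        if score >= best - tolerance:
            active, best = candidate, score
    return active, best


# Toy stand-in for a real evaluation run: each component adds or subtracts
# a fixed amount of accuracy, in units of 0.1 percentage points (made-up values).
TOY_EFFECT = {"head_0": 50, "head_1": -20, "head_2": 0, "ffn_0": 100}


def toy_evaluate(active: FrozenSet[str]) -> float:
    return (700 + sum(TOY_EFFECT[c] for c in active)) / 1000


kept, score = accuracy_driven_prune(TOY_EFFECT, toy_evaluate)
```

In the toy run the loop drops `head_1`, which hurts accuracy, and `head_2`, which contributes nothing, so the pruned model is both smaller and more accurate than the starting point. In practice each `evaluate` call is a full validation pass, so the order and granularity of candidate components matter a great deal for cost.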
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Pruning · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · WordPiece · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Softmax
