Early Transformers: A study on Efficient Training of Transformer Models through Early-Bird Lottery Tickets
Shravan Cheekati

TL;DR
This paper applies the early-bird ticket hypothesis to Transformer training: it identifies well-performing pruned subnetworks early in training, which reduces the resources needed while maintaining or improving accuracy across ViT, Swin-T, GPT-2, and RoBERTa.
Contribution
It introduces a methodology that combines iterative pruning, masked distance calculation, and selective retraining to find early-bird tickets in Transformer models, improving training efficiency and saving resources.
Findings
Early-bird tickets can be identified within the first few training epochs.
Pruned models obtained from early-bird tickets match or exceed the accuracy of their unpruned counterparts.
The phenomenon is consistent across different Transformer architectures and tasks.
Abstract
The training of Transformer models has revolutionized natural language processing and computer vision, but it remains a resource-intensive and time-consuming process. This paper investigates the applicability of the early-bird ticket hypothesis to optimize the training efficiency of Transformer models. We propose a methodology that combines iterative pruning, masked distance calculation, and selective retraining to identify early-bird tickets in various Transformer architectures, including ViT, Swin-T, GPT-2, and RoBERTa. Our experimental results demonstrate that early-bird tickets can be consistently found within the first few epochs of training or fine-tuning, enabling significant resource optimization without compromising performance. The pruned models obtained from early-bird tickets achieve comparable or even superior accuracy to their unpruned counterparts while substantially…
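The abstract names iterative pruning and a masked distance calculation but does not define them here, so the following is only a minimal sketch of how such a check might look. It assumes magnitude-based pruning, a normalized Hamming distance between binary masks, and a small sliding window of recent masks; the function names, the `threshold` and `window` values, and the flat-list weight representation are all hypothetical and are not taken from the paper.

```python
from collections import deque


def magnitude_mask(weights, prune_ratio):
    """Return a 0/1 mask that keeps the largest-magnitude weights.

    weights: flat list of floats (a stand-in for a model's prunable parameters).
    prune_ratio: fraction of weights to zero out, in [0, 1).
    """
    n_prune = int(len(weights) * prune_ratio)
    # Indices sorted by ascending magnitude; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def mask_distance(mask_a, mask_b):
    """Normalized Hamming distance between two equal-length 0/1 masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    differing = sum(a != b for a, b in zip(mask_a, mask_b))
    return differing / len(mask_a)


def find_early_bird(weight_snapshots, prune_ratio=0.5, threshold=0.1, window=3):
    """Return (epoch, mask) for the first epoch whose mask is closer than
    `threshold` to every mask in the sliding window, or None if the mask
    never stabilizes. All default values are illustrative assumptions."""
    recent = deque(maxlen=window)
    for epoch, weights in enumerate(weight_snapshots):
        mask = magnitude_mask(weights, prune_ratio)
        window_full = len(recent) == window
        if window_full and all(mask_distance(mask, old) < threshold for old in recent):
            return epoch, mask
        recent.append(mask)
    return None
```

In this sketch, `weight_snapshots` would hold one flattened weight vector per epoch. Once the pruning mask stops changing between epochs, the returned mask could be applied to the model and only the surviving weights retrained.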
Taxonomy
Methods: Attention Is All You Need · WordPiece · Cosine Annealing · Dense Connections · Linear Warmup With Linear Decay · BERT · Dropout · Weight Decay · Attention Dropout
