Drawing Early-Bird Tickets: Towards More Efficient Training of Deep Networks
Haoran You, Chaojian Li, Pengfei Xu, Yonggan Fu, Yue Wang, Xiaohan Chen, Richard G. Baraniuk, Zhangyang Wang, and Yingyan Celine Lin

TL;DR
This paper introduces early-bird (EB) tickets, a method to identify critical subnetworks early in training using low-cost schemes, enabling more efficient deep network training with significant energy savings.
Contribution
The paper demonstrates that winning tickets can be identified early using low-cost training schemes and proposes a mask distance metric for efficient detection, reducing training costs.
Findings
EB tickets can be identified early in training.
Mask distance effectively finds EB tickets with low overhead.
Training only EB tickets achieves up to 4.7x energy savings.
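The mask-distance idea in the findings above can be illustrated with a minimal sketch. It rests on assumptions this summary does not state: that the mask distance is a normalized Hamming distance between binary pruning masks taken at consecutive epochs, and that an EB ticket is declared once the last few distances all fall below a threshold. The names `prune_mask`, `mask_distance`, and `EarlyBirdDetector`, and the default values for `prune_ratio`, `threshold`, and `window`, are hypothetical and chosen for illustration only.

```python
from collections import deque


def prune_mask(scores, prune_ratio):
    """Binary mask over importance scores: 1 = keep, 0 = prune.

    Assumption: the lowest-magnitude fraction `prune_ratio` is pruned.
    """
    n_prune = int(len(scores) * prune_ratio)
    # Indices ordered by ascending magnitude; the first n_prune are pruned.
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(scores))]


def mask_distance(mask_a, mask_b):
    """Normalized Hamming distance between two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    return sum(a != b for a, b in zip(mask_a, mask_b)) / len(mask_a)


class EarlyBirdDetector:
    """Tracks mask distances between consecutive epochs (hypothetical helper)."""

    def __init__(self, prune_ratio=0.5, threshold=0.1, window=5):
        self.prune_ratio = prune_ratio
        self.threshold = threshold
        self.prev_mask = None
        # Fixed-length queue holding the most recent mask distances.
        self.distances = deque(maxlen=window)

    def update(self, scores):
        """Call once per epoch; returns True when the mask has stabilized."""
        mask = prune_mask(scores, self.prune_ratio)
        if self.prev_mask is not None:
            self.distances.append(mask_distance(mask, self.prev_mask))
        self.prev_mask = mask
        window_full = len(self.distances) == self.distances.maxlen
        return window_full and max(self.distances) < self.threshold
```

In a training loop, `update` would be called once per epoch with whatever per-channel or per-weight importance scores the pruning scheme uses; when it returns `True`, dense training could stop and only the masked subnetwork would be trained further. The overhead is a sort and a comparison per epoch, which is small next to a training epoch.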
Abstract
(Frankle & Carbin, 2019) shows that there exist winning tickets (small but critical subnetworks) for dense, randomly initialized networks, that can be trained alone to achieve comparable accuracies to the latter in a similar number of iterations. However, the identification of these winning tickets still requires the costly train-prune-retrain process, limiting their practical benefits. In this paper, we discover for the first time that the winning tickets can be identified at the very early training stage, which we term as early-bird (EB) tickets, via low-cost training schemes (e.g., early stopping and low-precision training) at large learning rates. Our finding of EB tickets is consistent with recently reported observations that the key connectivity patterns of neural networks emerge early. Furthermore, we propose a mask distance metric that can be used to identify EB tickets with low…
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
Methods: Early Stopping
