Drawing Early-Bird Tickets: Towards More Efficient Training of Deep Networks

Haoran You; Chaojian Li; Pengfei Xu; Yonggan Fu; Yue Wang; Xiaohan Chen; Richard G. Baraniuk; Zhangyang Wang; and Yingyan Celine Lin

arXiv:1909.11957 · cs.LG · March 4, 2025 · 50 cites

TL;DR

This paper introduces early-bird (EB) tickets, critical subnetworks that can be identified early in training using low-cost training schemes, enabling more efficient deep network training with significant energy savings.

Contribution

The paper demonstrates that winning tickets can be identified early using low-cost training schemes and proposes a mask distance metric for efficient detection, reducing training costs.
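As a rough illustration of what a mask distance can look like, the sketch below assumes channel-level pruning in which each channel carries one scaling factor (for example, a batch-normalization gamma), the smallest-magnitude fraction of channels is pruned, and the distance between two masks is their normalized Hamming distance, i.e. the fraction of channels on which the two masks disagree. The helper names `channel_mask` and `mask_distance` are hypothetical, and this is not presented as the authors' implementation.

```python
from typing import List, Sequence


def channel_mask(scales: Sequence[float], prune_ratio: float) -> List[bool]:
    """Hypothetical helper: keep the channels whose |scale| is largest.

    Assumes one scaling factor per channel and prunes the floor(prune_ratio * n)
    channels with the smallest magnitude. True means "kept".
    """
    if not 0.0 <= prune_ratio <= 1.0:
        raise ValueError("prune_ratio must be in [0, 1]")
    n = len(scales)
    n_prune = int(prune_ratio * n)
    # Channel indices ordered by ascending magnitude; sorted() is stable,
    # so ties keep their original index order.
    order = sorted(range(n), key=lambda i: abs(scales[i]))
    pruned = set(order[:n_prune])
    return [i not in pruned for i in range(n)]


def mask_distance(m1: Sequence[bool], m2: Sequence[bool]) -> float:
    """Normalized Hamming distance: fraction of channels where the masks differ."""
    if len(m1) != len(m2) or len(m1) == 0:
        raise ValueError("masks must be non-empty and the same length")
    return sum(a != b for a, b in zip(m1, m2)) / len(m1)


if __name__ == "__main__":
    early = channel_mask([0.9, 0.1, 0.5, 0.05], prune_ratio=0.5)  # [True, False, True, False]
    later = channel_mask([0.8, 0.6, 0.05, 0.07], prune_ratio=0.5)  # [True, True, False, False]
    print(mask_distance(early, later))  # 0.5
```

Because the distance is normalized by the number of channels, it falls between 0 (identical masks) and 1 (masks that disagree everywhere), so a single threshold can be applied regardless of layer width.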

Findings

01. EB tickets can be identified early in training.
02. Mask distance effectively finds EB tickets with low overhead.
03. Training only EB tickets achieves up to 4.7x energy savings.

Abstract

Frankle & Carbin (2019) show that dense, randomly initialized networks contain winning tickets (small but critical subnetworks) that can be trained alone to reach accuracy comparable to that of the full network in a similar number of iterations. However, identifying these winning tickets still requires the costly train-prune-retrain process, which limits their practical benefits. In this paper, we discover for the first time that winning tickets can be identified at a very early training stage, which we term early-bird (EB) tickets, via low-cost training schemes (e.g., early stopping and low-precision training) at large learning rates. Our finding of EB tickets is consistent with recently reported observations that the key connectivity patterns of neural networks emerge early. Furthermore, we propose a mask distance metric that can be used to identify EB tickets with low…
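The excerpt above is cut off before it describes how the metric is used. One plausible way to turn a mask distance into an early-stopping signal is sketched below under stated assumptions: a binary pruning mask is drawn after every epoch, the newest mask is compared with the masks held in a sliding window of recent epochs, and the first epoch at which every such distance falls below a threshold is reported as the point where an EB ticket has emerged. The function name `find_early_bird_epoch` and the `window` and `threshold` defaults are hypothetical choices, not values taken from the paper.

```python
from collections import deque
from typing import Deque, Optional, Sequence

Mask = Sequence[bool]


def mask_distance(m1: Mask, m2: Mask) -> float:
    """Normalized Hamming distance between two equal-length binary masks."""
    if len(m1) != len(m2) or len(m1) == 0:
        raise ValueError("masks must be non-empty and the same length")
    return sum(a != b for a, b in zip(m1, m2)) / len(m1)


def find_early_bird_epoch(
    masks_per_epoch: Sequence[Mask],
    window: int = 5,         # hypothetical default
    threshold: float = 0.1,  # hypothetical default
) -> Optional[int]:
    """Return the first epoch whose mask is closer than `threshold` to every
    older mask in the window, or None if the masks never settle."""
    if window < 2:
        raise ValueError("window must hold at least two masks")
    recent: Deque[Mask] = deque(maxlen=window)  # oldest masks drop off automatically
    for epoch, mask in enumerate(masks_per_epoch):
        recent.append(mask)
        if len(recent) < window:
            continue  # not enough history yet
        # Largest disagreement between the newest mask and each older one.
        worst = max(mask_distance(mask, old) for old in list(recent)[:-1])
        if worst < threshold:
            return epoch
    return None


if __name__ == "__main__":
    masks = [
        [True, True, False, False],  # epoch 0
        [True, False, True, False],  # epoch 1
        [True, False, True, False],  # epoch 2
        [True, False, True, False],  # epoch 3
    ]
    print(find_early_bird_epoch(masks, window=3, threshold=0.1))  # 3
```

Keeping only a short window means the check costs a handful of mask comparisons per epoch, which is small next to the cost of a training epoch; a larger window or a smaller threshold makes the signal more conservative.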

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning

Methods: Early Stopping