Cyclic Sparse Training: Is it Enough?
Advait Gadhikar, Sree Harsha Nelaturu, Rebekka Burkholz

TL;DR
This paper investigates cyclic sparse training and proposes SCULPT-ing, a method that couples parameters and masks through repeated cyclic training and pruning, matching state-of-the-art sparse network performance at reduced computational cost.
Contribution
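The paper challenges the hypothesis that iterative pruning succeeds mainly through improved mask identification and implicit regularization, and argues instead that repeated cyclic training schedules improve optimization. It also introduces SCULPT-ing, a new method that reduces computational cost while maintaining performance at high sparsity.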
Findings
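Repeated cyclic training significantly boosts pruning at initialization, even outperforming standard iterative pruning methods.
The authors conjecture that repeated cyclic training explores the loss landscape better, leading to a lower training loss.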
SCULPT-ing matches state-of-the-art performance at high sparsity.
Abstract
The success of iterative pruning methods in achieving state-of-the-art sparse networks has largely been attributed to improved mask identification and an implicit regularization induced by pruning. We challenge this hypothesis and instead posit that their repeated cyclic training schedules enable improved optimization. To verify this, we show that pruning at initialization is significantly boosted by repeated cyclic training, even outperforming standard iterative pruning methods. We conjecture that the dominant mechanism behind this is a better exploration of the loss landscape, leading to a lower training loss. However, at high sparsity, repeated cyclic training alone is not enough for competitive performance. A strong coupling between learnt parameter initialization and mask seems to be required. Standard methods obtain this coupling via expensive…
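As a rough illustration of what "repeated cyclic training" of a fixed sparse mask can look like, the sketch below restarts the same learning rate schedule at the start of every cycle while the mask stays unchanged. It is a toy under stated assumptions, not the authors' implementation: the warmup-plus-cosine schedule shape, the quadratic stand-in loss, and the names `cyclic_lr`, `apply_mask`, and `train_cycles` are all hypothetical and do not come from the paper.

```python
import math


def cyclic_lr(step, steps_per_cycle, peak_lr=0.1, warmup_steps=10):
    """Learning rate at a global step when the schedule restarts every cycle.

    Assumption: linear warmup followed by cosine decay within each cycle.
    """
    t = step % steps_per_cycle  # position inside the current cycle
    if t < warmup_steps:
        return peak_lr * (t + 1) / warmup_steps
    progress = (t - warmup_steps) / max(1, steps_per_cycle - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


def apply_mask(weights, mask):
    """Zero out pruned weights; the binary mask is fixed across cycles."""
    return [w * m for w, m in zip(weights, mask)]


def train_cycles(weights, mask, targets, num_cycles, steps_per_cycle):
    """Gradient descent on a toy loss sum((w - target)^2) with a fixed mask.

    The quadratic loss is only a stand-in for a real training objective.
    """
    w = apply_mask(weights, mask)
    for step in range(num_cycles * steps_per_cycle):
        lr = cyclic_lr(step, steps_per_cycle)
        grads = [2.0 * (wi - ti) for wi, ti in zip(w, targets)]
        # Re-apply the mask after every update so pruned weights stay zero.
        w = apply_mask([wi - lr * gi for wi, gi in zip(w, grads)], mask)
    return w


if __name__ == "__main__":
    final = train_cycles(
        weights=[0.5, -0.1, 2.0, 0.05],
        mask=[1, 0, 1, 0],
        targets=[1.0, 1.0, 1.0, 1.0],
        num_cycles=3,
        steps_per_cycle=100,
    )
    print(final)
```

The point of the sketch is only the control flow: the learning rate returns to its warmup phase at every cycle boundary, and the mask is never recomputed. How the mask is chosen, and how a method like SCULPT-ing couples it to the learnt parameters, is outside what this toy shows.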
Taxonomy
Topics: Innovative Teaching Methods
Methods: Pruning
