Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?
Antoine Vanderschueren, Christophe De Vleeschouwer

TL;DR
This paper introduces ST-3, a simple yet effective method combining soft-thresholding and straight-through gradients to train sparse neural networks, achieving state-of-the-art accuracy and sparsity trade-offs.
Contribution
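The paper presents ST-3 (straight-through/soft-thresholding/sparse-training), which progressively sparsifies a network in a single training cycle by soft-thresholding its weights and using straight-through gradient estimation to keep updating the raw, i.e. non-thresholded, version of zeroed weights. It compares favorably to recent methods based on differentiable formulations or bio-inspired neuroregeneration principles.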
Findings
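Obtains state-of-the-art accuracy/sparsity and accuracy/FLOPS trade-offs when the sparsity ratio is increased progressively in a single training cycle.
Increases the sparsity ratio progressively without causing sharp weight discontinuities during training.
Compares favorably, despite its simplicity, to recent methods that adopt differentiable formulations or bio-inspired neuroregeneration principles.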
Abstract
Setting weights to zero when training a neural network helps reduce the computational complexity at inference. To progressively increase the sparsity ratio in the network without causing sharp weight discontinuities during training, our work combines soft-thresholding and straight-through gradient estimation to update the raw, i.e. non-thresholded, version of zeroed weights. Our method, named ST-3 for straight-through/soft-thresholding/sparse-training, obtains state-of-the-art (SoA) results, in terms of both accuracy/sparsity and accuracy/FLOPS trade-offs, when progressively increasing the sparsity ratio in a single training cycle. In particular, despite its simplicity, ST-3 compares favorably to the most recent methods, which adopt differentiable formulations or bio-inspired neuroregeneration principles. This suggests that the key ingredients for effective sparsification primarily lie in the…
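The mechanism described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (soft_threshold, magnitude_threshold, st_step), the rule that picks the threshold from the k-th smallest weight magnitude, and the plain gradient-descent update are all assumptions made for illustration. The sketch only shows the two ingredients named above: soft-thresholding shrinks weights continuously toward zero, and the straight-through estimate applies the gradient computed on the thresholded weights directly to the raw weights, so a zeroed weight keeps being updated and can become non-zero again.

```python
def soft_threshold(w, t):
    """Shrink w toward zero by t; magnitudes <= t become exactly 0."""
    mag = abs(w) - t
    if mag <= 0.0:
        return 0.0
    return mag if w > 0 else -mag


def magnitude_threshold(weights, sparsity):
    """Assumed rule: use the k-th smallest magnitude as threshold,
    so that roughly a fraction `sparsity` of the weights is zeroed."""
    k = int(round(sparsity * len(weights)))
    if k == 0:
        return 0.0
    return sorted(abs(w) for w in weights)[k - 1]


def st_step(raw, grad_effective, sparsity, lr):
    """One hypothetical update step.

    grad_effective is the loss gradient w.r.t. the thresholded weights.
    Straight-through: it is applied unchanged to the raw weights,
    including those whose thresholded value is currently zero.
    """
    new_raw = [w - lr * g for w, g in zip(raw, grad_effective)]
    t = magnitude_threshold(new_raw, sparsity)
    effective = [soft_threshold(w, t) for w in new_raw]
    return new_raw, effective


# Illustrative values only: the second weight starts below the threshold,
# receives a large gradient, and re-enters the set of non-zero weights.
raw = [0.9, -0.05, 0.3, -0.6]
grads = [0.0, -2.0, 0.0, 0.0]
new_raw, effective = st_step(raw, grads, sparsity=0.5, lr=0.5)
print(new_raw)
print(effective)
```

Because soft-thresholding subtracts the threshold from every surviving magnitude, a weight that crosses the threshold re-enters the network near zero instead of jumping to its full raw value, which is one way to read the abstract's "without causing sharp weight discontinuities".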
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
