Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?

Antoine Vanderschueren; Christophe De Vleeschouwer

arXiv:2212.01076·cs.CV·January 25, 2023

TL;DR

This paper introduces ST-3, a simple method that combines soft-thresholding and straight-through gradients to train sparse neural networks, achieving state-of-the-art accuracy/sparsity and accuracy/FLOPS trade-offs.

Contribution

The paper presents ST-3, which progressively increases sparsity during training by combining soft-thresholding with straight-through gradient estimation, so that the raw (non-thresholded) versions of zeroed weights continue to be updated. It compares favorably to recent methods.

Findings

01

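Achieves state-of-the-art accuracy/sparsity and accuracy/FLOPS trade-offs in a single training cycle.

02

Progressively increases the sparsity ratio during training without causing sharp weight discontinuities.

03

Despite its simplicity, compares favorably to recent methods that adopt differentiable formulations or bio-inspired neuroregeneration principles.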

Abstract

Setting weights to zero when training a neural network helps reduce the computational complexity at inference. To progressively increase the sparsity ratio in the network without causing sharp weight discontinuities during training, our work combines soft-thresholding and straight-through gradient estimation to update the raw, i.e. non-thresholded, version of zeroed weights. Our method, named ST-3 for straight-through/soft-thresholding/sparse-training, obtains state-of-the-art (SoA) results, both in terms of accuracy/sparsity and accuracy/FLOPS trade-offs, when progressively increasing the sparsity ratio in a single training cycle. In particular, despite its simplicity, ST-3 compares favorably to the most recent methods that adopt differentiable formulations or bio-inspired neuroregeneration principles. This suggests that the key ingredients for effective sparsification primarily lie in the…
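
The mechanism described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the magnitude-based rule for choosing the threshold, and the toy quadratic loss are all assumptions made for illustration. It shows a forward pass that uses soft-thresholded weights and an update that applies the resulting gradient, unchanged, to the raw weights.

```python
# Illustrative sketch only: function names, the magnitude-based threshold rule,
# and the toy quadratic loss are assumptions, not taken from the paper's code.

def soft_threshold(w, t):
    """Shrink w toward zero by t; any |w| <= t becomes exactly 0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def threshold_for_sparsity(weights, sparsity):
    """Pick t as the k-th smallest magnitude so about `sparsity` of weights are zeroed (assumed rule)."""
    k = int(sparsity * len(weights))
    if k == 0:
        return 0.0
    return sorted(abs(w) for w in weights)[k - 1]


def straight_through_step(raw, grad_sparse, lr):
    """Apply the gradient taken w.r.t. the thresholded weights directly to the raw weights."""
    return [w - lr * g for w, g in zip(raw, grad_sparse)]


raw = [0.9, -0.05, 0.4, -0.6]    # raw (non-thresholded) weights
target = [0.5, 0.5, 0.0, -0.2]   # toy regression target (assumption)

t = threshold_for_sparsity(raw, sparsity=0.5)
sparse = [soft_threshold(w, t) for w in raw]            # forward pass uses these
grad_sparse = [s - y for s, y in zip(sparse, target)]   # gradient of 0.5 * (s - y)^2 w.r.t. s
new_raw = straight_through_step(raw, grad_sparse, lr=0.5)
```

In this toy run the second weight is zeroed in the forward pass, yet its raw value still moves because the gradient passes straight through the thresholding. A zeroed weight can therefore grow back above the threshold later, and raising `sparsity` gradually over training would shrink the surviving weights continuously rather than cutting them abruptly.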

Code & Models

Repositories

vanderschuea/stthree
PyTorch · Official

Videos

Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training? · YouTube

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM