spred: Solving $L_1$ Penalty with SGD
Liu Ziyin, Zihao Wang

TL;DR
This paper introduces 'spred', a simple stochastic gradient descent method that directly solves $L_1$ constrained problems through a differentiable reparametrization, enabling effective sparse neural network training and compression.
Contribution
It provides a theoretically grounded, exact differentiable solver for $L_1$ penalties using reparametrization, applicable to nonconvex functions in deep learning.
Findings
Effective training of sparse neural networks for gene selection.
Successful neural network compression with $L_1$ penalty.
Bridges the gap between sparsity in deep learning and sparsity in statistical learning.
Abstract
We propose to minimize a generic differentiable objective with $L_1$ constraint using a simple reparametrization and straightforward stochastic gradient descent. Our proposal is the direct generalization of previous ideas that the $L_1$ penalty may be equivalent to a differentiable reparametrization with weight decay. We prove that the proposed method, spred, is an exact differentiable solver of $L_1$ and that the reparametrization trick is completely "benign" for a generic nonconvex function. Practically, we demonstrate the usefulness of the method in (1) training sparse neural networks to perform gene selection tasks, which involves finding relevant features in a very high dimensional space, and (2) neural network compression task, to which previous attempts at applying the $L_1$-penalty have been unsuccessful. Conceptually, our result bridges the gap between the sparsity in…
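The idea of trading an $L_1$ penalty for a reparametrization with weight decay can be illustrated on a toy problem. The sketch below is not the authors' code. It assumes the elementwise reparametrization $w = u \odot v$ and relies on the identity $\min_{uv = w} (u^2 + v^2) = 2|w|$, so that a penalty $\lambda |w|$ corresponds to weight decay $\tfrac{\lambda}{2}(u^2 + v^2)$. The function names (`spred_toy`, `soft_threshold`), the initialization, and the learning rate are all hypothetical choices, and the sketch uses plain full-batch gradient descent on a separable quadratic rather than SGD on a neural network. For this objective, $\tfrac{1}{2}(w - a)^2 + \lambda |w|$, the exact minimizer is the soft-thresholding of $a$, which gives a reference to compare against.

```python
import numpy as np

def soft_threshold(a, lam):
    # Closed-form minimizer of 0.5 * (w - a)**2 + lam * |w|, elementwise.
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def spred_toy(a, lam, lr=0.05, steps=5000):
    # Assumed reparametrization: w = u * v, with weight decay
    # (lam / 2) * (u**2 + v**2) standing in for lam * |w|.
    # u and v start unequal so that w = u * v can change sign.
    u = np.ones_like(a)
    v = 0.5 * np.ones_like(a)
    for _ in range(steps):
        r = u * v - a               # dLoss/dw for the quadratic loss
        grad_u = r * v + lam * u    # chain rule plus weight decay on u
        grad_v = r * u + lam * v    # chain rule plus weight decay on v
        u, v = u - lr * grad_u, v - lr * grad_v
    return u * v

a = np.array([2.0, -2.0, 0.3, -0.1])
lam = 0.5
w_spred = spred_toy(a, lam)
w_exact = soft_threshold(a, lam)
print(w_spred)
print(w_exact)
```

In this toy setting the smooth objective in $(u, v)$ drives the small entries of $w$ to zero and shrinks the large ones by $\lambda$, which matches the soft-thresholding reference; the unequal initialization of `u` and `v` matters, because with `u == v` the product could never become negative.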
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Sparse and Compressive Sensing Techniques · Statistical Methods and Inference
Methods: Feature Selection
