Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization
Tianyi Chen, Tianyu Ding, Bo Ji, Guanyi Wang, Jing Tian, Yixin Shi, Sheng Yi, Xiao Tu, Zhihui Zhu

TL;DR
This paper introduces OBProx-SG, a stochastic optimization method for l1-regularized problems that produces much sparser solutions while retaining convergence guarantees, and that outperforms existing methods on both convex and non-convex machine learning tasks.
Contribution
The paper proposes a new orthant-based stochastic gradient method for l1-regularized optimization that promotes sparsity more aggressively than prior methods while still guaranteeing convergence.
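For reference, the problem class can be written as below. The symbols ($f_i$, $N$, $\lambda$, $x_k$) are standard notation assumed here rather than quoted from the paper, and the finite-sum form of $f$ is the usual stochastic setting, also an assumption. The second line is why an orthant-based method helps: inside the orthant face determined by the signs of an iterate $x_k$, the $\ell_1$ term is linear and therefore smooth.

$$
\min_{x \in \mathbb{R}^n} \; F(x) := f(x) + \lambda \|x\|_1,
\qquad f(x) = \frac{1}{N} \sum_{i=1}^{N} f_i(x), \qquad \lambda > 0,
$$
$$
\|x\|_1 = \operatorname{sign}(x_k)^\top x
\quad \text{for every } x \text{ with } \operatorname{sign}(x_j) \in \{0, \operatorname{sign}([x_k]_j)\} \text{ for all } j.
$$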
Findings
OBProx-SG converges to global optima in the convex case and to stationary points in the non-convex case.
It produces substantially sparser solutions than existing methods such as Prox-SG, RDA and Prox-SVRG.
It achieves higher sparsity in deep neural networks without accuracy loss.
Abstract
Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method, the Orthant Based Proximal Stochastic Gradient Method (OBProx-SG), to solve perhaps the most popular instance, i.e., the l1-regularized problem. The OBProx-SG method contains two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the sparsity level via orthant face projection. Compared to state-of-the-art methods such as Prox-SG, RDA and Prox-SVRG, OBProx-SG not only converges to globally optimal solutions (in the convex setting) or stationary points (in the non-convex setting), but also substantially promotes the sparsity of the solutions. In particular, on a large number of convex problems,…
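The two step types described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the constant step size `alpha`, and the simplified schedule (all Prox-SG steps first, then orthant steps) are assumptions of this sketch, and `grad_fn` stands in for a mini-batch gradient of $f$. The Prox-SG step is a gradient step followed by soft-thresholding; the orthant step takes a gradient step on the smooth surrogate $f(x) + \lambda\,\operatorname{sign}(x_k)^\top x$ and then projects onto the orthant face, zeroing any coordinate whose sign flipped.

```python
# Hypothetical sketch of OBProx-SG-style steps for min f(x) + lam * ||x||_1.
# Assumptions: plain Python lists, constant step size, and a fixed schedule
# of n_prox Prox-SG steps followed by n_orthant orthant steps.


def sign(a):
    """Return 1, 0 or -1 according to the sign of a."""
    return (a > 0) - (a < 0)


def prox_sg_step(x, grad, alpha, lam):
    """Proximal (stochastic) gradient step: gradient step on f, then soft-thresholding."""
    out = []
    for xi, gi in zip(x, grad):
        z = xi - alpha * gi
        # prox of alpha*lam*|.|: shrink |z| by alpha*lam, clipping at zero
        out.append(sign(z) * max(abs(z) - alpha * lam, 0.0))
    return out


def orthant_step(x, grad, alpha, lam):
    """Gradient step inside the orthant face of x, then projection onto that face."""
    out = []
    for xi, gi in zip(x, grad):
        s = sign(xi)
        if s == 0:
            # coordinates that are already zero stay fixed at zero on the face
            out.append(0.0)
            continue
        # on the face, lam * |x_i| equals lam * s * x_i, whose derivative is lam * s
        trial = xi - alpha * (gi + lam * s)
        # projection: a coordinate that leaves its orthant is set to exactly zero
        out.append(trial if sign(trial) == s else 0.0)
    return out


def obprox_sg(grad_fn, x0, alpha, lam, n_prox, n_orthant):
    """Run n_prox Prox-SG steps (support prediction), then n_orthant orthant steps.

    grad_fn(x) should return a (mini-batch) gradient estimate of f at x.
    """
    x = list(x0)
    for _ in range(n_prox):
        x = prox_sg_step(x, grad_fn(x), alpha, lam)
    for _ in range(n_orthant):
        x = orthant_step(x, grad_fn(x), alpha, lam)
    return x


if __name__ == "__main__":
    # Toy problem f(x) = 0.5 * ||x - b||^2 with an exact gradient; its l1-regularized
    # minimizer is the soft-thresholded b.
    b = [3.0, -0.5, 0.2, -2.0]
    x = obprox_sg(lambda x: [xi - bi for xi, bi in zip(x, b)],
                  [0.0] * len(b), alpha=0.5, lam=1.0, n_prox=10, n_orthant=30)
    print(x)  # close to [2.0, 0.0, 0.0, -1.0]
```

On this toy problem the Prox-SG phase already identifies which coordinates are zero, and the orthant phase then refines only the nonzero ones without ever letting a zero coordinate drift away from exact zero. A real run would replace the exact gradient with mini-batch estimates.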
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
Methods: Feature Selection · Depthwise Convolution · Pointwise Convolution · Average Pooling · Global Average Pooling · Depthwise Separable Convolution · 1x1 Convolution · Batch Normalization · Dense Connections
