Training L1-Regularized Models with Orthant-Wise Passive Descent Algorithms

Jianqiao Wangni

arXiv:1704.07987·cs.LG·February 23, 2018

Open Access

TL;DR

This paper introduces OPDA, an orthant-wise passive descent algorithm for optimizing L1-regularized models. It encourages each parameter to keep its sign across updates, explicitly promotes sparsity, and has a proven linear convergence rate on smooth, strongly convex losses.

Contribution

The paper proposes OPDA, an algorithm that combines a stochastic variance-reduced gradient (SVRG), an alignment operator, and optional quasi-Newton updates for training L1-regularized models, positioned as an improved substitute for proximal algorithms.

Findings

01 OPDA achieves faster convergence than state-of-the-art stochastic proximal algorithms.

02 OPDA effectively maintains parameter orthants and promotes sparsity.

03 Experimental results show OPDA's superior performance on logistic regression and convolutional neural networks (CNNs).

Abstract

$L_{1}$-regularized models are widely used for sparse regression and classification tasks. In this paper, we propose the orthant-wise passive descent algorithm (OPDA) for optimizing $L_{1}$-regularized models, as an improved substitute for proximal algorithms, which are currently the standard tools for optimizing these models. OPDA uses a stochastic variance-reduced gradient (SVRG) to initialize the descent direction, then applies a novel alignment operator that encourages each element to keep the same sign after one update iteration, so that the parameter remains in the same orthant as before. It also explicitly suppresses the magnitude of each element to impose sparsity. A quasi-Newton update can be used to incorporate curvature information and accelerate convergence. We prove a linear convergence rate for OPDA on general smooth and strongly convex loss functions. By conducting experiments on…
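The two ingredients named in the abstract, a variance-reduced gradient and a sign-preserving update, can be illustrated with a minimal sketch. The abstract does not define the alignment operator, so the code below substitutes the well-known OWL-QN-style orthant projection (zeroing any coordinate whose sign would flip); it is not the paper's operator. The function names `svrg_gradient` and `orthant_step`, and the parameters `lam` (the L1 weight) and `lr` (the step size), are hypothetical names chosen for this illustration.

```python
import numpy as np


def svrg_gradient(grad_i, w, w_snap, full_grad_snap, i):
    """Standard SVRG estimator: the stochastic gradient at w, corrected by
    the same sample's gradient at a snapshot plus the snapshot's full gradient."""
    return grad_i(w, i) - grad_i(w_snap, i) + full_grad_snap


def orthant_step(w, v, lam, lr):
    """One sign-preserving step for  f(w) + lam * ||w||_1.

    ASSUMPTION: this is an OWL-QN-style projection used as a stand-in for
    the paper's alignment operator, which the abstract does not specify.
    v is a gradient estimate of the smooth part f (e.g. from svrg_gradient).
    """
    # Pseudo-gradient: for nonzero coordinates the L1 term is differentiable;
    # for zero coordinates, move only if |v| exceeds lam.
    pg = np.where(
        w != 0,
        v + lam * np.sign(w),
        np.sign(v) * np.maximum(np.abs(v) - lam, 0.0),
    )
    # Orthant to stay in: the current sign, or the descent sign at zero.
    orthant = np.where(w != 0, np.sign(w), -np.sign(pg))
    w_new = w - lr * pg
    # Coordinates that would leave the orthant are set to zero (sparsity).
    w_new[np.sign(w_new) != orthant] = 0.0
    return w_new
```

In this sketch, a coordinate that would cross zero is clipped to exactly zero, which is one simple way to get both properties the abstract mentions: the iterate stays in its orthant, and small coordinates become sparse. The paper's actual operator and its quasi-Newton variant may differ.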

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Numerical methods in inverse problems · Stochastic Gradient Optimization Techniques

Methods: Logistic Regression