Proximal Stochastic Newton-type Gradient Descent Methods for Minimizing Regularized Finite Sums

Ziqiang Shi

arXiv:1409.2979 · math.OC · October 30, 2014

TL;DR

This paper introduces PROXTONE, a proximal stochastic Newton-type gradient method that uses second-order information to achieve linear convergence in both the objective value and the solution when minimizing regularized finite sums.

Contribution

It unifies two previously separate lines of work (cited in the abstract as sohl2014fast and lee2012proximal) and incorporates second-order information to obtain linear convergence in the solution as well as in the objective value.

Findings

01 Achieves linear convergence in the objective function value.

02 Achieves linear convergence in the solution.

03 Provides a simple and intuitive proof technique.

Abstract

In this work, we generalize and unify two recent and completely different works, those of Jascha \cite{sohl2014fast} and Lee \cite{lee2012proximal}, into one by proposing the \textbf{prox}imal s\textbf{to}chastic \textbf{N}ewton-type gradient (PROXTONE) method for optimizing the sum of two convex functions: one is the average of a huge number of smooth convex functions, and the other is a non-smooth convex function. While a set of recently proposed proximal stochastic gradient methods, including MISO, Prox-SDCA, Prox-SVRG, and SAG, converge at linear rates, PROXTONE incorporates second-order information to obtain stronger convergence results: it achieves a linear convergence rate not only in the value of the objective function but also in the \emph{solution}. The proof is simple and intuitive, and the results and technique can serve as a starting point for research on the…
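
The problem class named in the abstract, the average of many smooth convex functions plus a non-smooth convex function, can be illustrated with a small sketch. This is not the paper's PROXTONE algorithm: it assumes a least-squares loss g_i(x) = 0.5 (a_i^T x - b_i)^2, an L1 regularizer h(x) = lam ||x||_1, and a scalar curvature bound ||a_i||^2 in place of a Hessian approximation, so that each step has a closed form. The function names (`soft_threshold`, `objective`, `prox_incremental_quadratic`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(A, b, x, lam):
    """(1/n) * sum_i 0.5 * (a_i^T x - b_i)^2 + lam * ||x||_1."""
    return 0.5 * np.mean((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def prox_incremental_quadratic(A, b, lam, n_iters=2000, seed=0):
    """Incremental quadratic-model method (illustrative assumption only).

    Each component g_i keeps a quadratic model built at an anchor z_i:
        g_i(z_i) + grad_i^T (x - z_i) + (c_i / 2) * ||x - z_i||^2
    with c_i = ||a_i||^2, which upper-bounds the Hessian a_i a_i^T.
    Every iteration minimizes the averaged model plus lam * ||x||_1,
    then refreshes the model of one randomly chosen component.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    c = np.sum(A ** 2, axis=1) + 1e-12       # scalar curvature bounds, shape (n,)
    anchors = np.tile(x, (n, 1))             # z_i, shape (n, d)
    grads = A * (A @ x - b)[:, None]         # grad g_i(z_i), shape (n, d)
    c_bar = c.mean()
    # Averaged linear term: (1/n) * sum_i (c_i * z_i - grad_i)
    lin = (c[:, None] * anchors - grads).mean(axis=0)
    for _ in range(n_iters):
        # Closed-form minimizer of the averaged model plus the L1 term
        x = soft_threshold(lin / c_bar, lam / c_bar)
        # Refresh one randomly chosen component model at the new point
        i = rng.integers(n)
        new_grad = A[i] * (A[i] @ x - b[i])
        lin += (c[i] * (x - anchors[i]) - (new_grad - grads[i])) / n
        anchors[i] = x
        grads[i] = new_grad
    return x
```

Because each quadratic model upper-bounds its component function under the stated assumptions, the returned point should never have a larger objective value than the starting point x = 0. Replacing the scalar bounds `c` with richer curvature estimates would bring the sketch closer in spirit to a Newton-type method, at the cost of a subproblem that no longer has a closed-form solution.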

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research