Kalman-based Stochastic Gradient Method with Stop Condition and Insensitivity to Conditioning

Vivak Patel

arXiv:1512.01139·math.OC·January 5, 2017·SIAM J. Optim.

TL;DR

This paper introduces a Kalman-based stochastic gradient method (kSGD) that is asymptotically optimal, insensitive to problem conditioning, and includes a justified stop condition, addressing key challenges in large-scale and streaming data optimization.

Contribution

The paper develops and analyzes a second order proximal/SGD method based on Kalman Filtering for the linear regression problem, providing theoretical guarantees and practical algorithms for data sets that are large, infinite, or streaming.

Findings

01

kSGD is asymptotically optimal.

02

kSGD is insensitive to problem conditioning.

03

These claims are supported by experiments on multiple regression problems.

Abstract

Modern proximal and stochastic gradient descent (SGD) methods are believed to efficiently minimize large composite objective functions, but such methods have two algorithmic challenges: (1) a lack of fast or justified stop conditions, and (2) sensitivity to the objective function's conditioning. In response to the first challenge, modern proximal and SGD methods guarantee convergence only after multiple epochs, but such a guarantee renders proximal and SGD methods infeasible when the number of component functions is very large or infinite. In response to the second challenge, second order SGD methods have been developed, but they are marred by the complexity of their analysis. In this work, we address these challenges on the limited, but important, linear regression problem by introducing and analyzing a second order proximal/SGD method based on Kalman Filtering (kSGD). Through our…
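To make the idea of a Kalman-filter-based update for streaming linear regression concrete, here is a minimal sketch. It uses the textbook Kalman filter recursion for a static parameter vector (equivalent to recursive least squares); the paper's exact kSGD update, tuning parameters, and stop condition are not given in this summary. The names `ksgd_step`, `ksgd`, and `noisy_stream`, the parameter `gamma`, and the trace-based stop rule with tolerance `tol` are all illustrative assumptions.

```python
import random


def ksgd_step(beta, M, x, y, gamma=1.0):
    """Apply one Kalman-style update for a single observation (x, y).

    beta: current parameter estimate (list of floats)
    M: current symmetric covariance estimate (list of lists)
    gamma: assumed noise-variance tuning parameter (hypothetical name)
    """
    d = len(beta)
    Mx = [sum(M[i][j] * x[j] for j in range(d)) for i in range(d)]
    denom = gamma + sum(x[i] * Mx[i] for i in range(d))
    resid = y - sum(x[i] * beta[i] for i in range(d))
    new_beta = [beta[i] + Mx[i] * resid / denom for i in range(d)]
    # Rank-one downdate; relies on M being symmetric, so (M x)(M x)^T = M x x^T M.
    new_M = [[M[i][j] - Mx[i] * Mx[j] / denom for j in range(d)] for i in range(d)]
    return new_beta, new_M


def ksgd(stream, d, gamma=1.0, tol=1e-3, max_iter=100_000):
    """Process observations one at a time, in a single pass over the stream.

    Stops when trace(M) < tol. This stop rule is an assumption made for
    illustration, not the condition derived in the paper.
    """
    beta = [0.0] * d
    M = [[float(i == j) for j in range(d)] for i in range(d)]  # identity start
    k = 0
    for k, (x, y) in enumerate(stream, start=1):
        beta, M = ksgd_step(beta, M, x, y, gamma)
        if sum(M[i][i] for i in range(d)) < tol or k >= max_iter:
            break
    return beta, M, k


def noisy_stream(rng, beta_true, scales, noise_sd=0.1):
    """Endless synthetic stream; unequal scales make the problem ill-conditioned."""
    while True:
        x = [s * rng.gauss(0.0, 1.0) for s in scales]
        y = sum(xi * bi for xi, bi in zip(x, beta_true)) + rng.gauss(0.0, noise_sd)
        yield x, y


beta_true = [2.0, -3.0]
rng = random.Random(0)
beta_hat, M, k = ksgd(noisy_stream(rng, beta_true, scales=[1.0, 100.0]), d=2)
print(k, beta_hat)
```

In this sketch the covariance estimate `M` acts as a matrix-valued step size, so no learning rate is tuned by hand even though the two features differ in scale by a factor of 100, and the same matrix supplies the quantity used to decide when to stop.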

Taxonomy

Methods: Stochastic Gradient Descent