Fast and Strong Convergence of Online Learning Algorithms

Zheng-Chu Guo; Lei Shi

arXiv:1710.03600·cs.LG·October 11, 2017·2 cites

TL;DR

This paper proves the fastest known convergence rates for online learning algorithms in RKHS, showing strong convergence of the last iterate with polynomially decaying step sizes, without explicit regularization.

Contribution

It introduces a novel capacity-dependent analysis of the last iterate that yields the best mean square convergence rate obtained so far and establishes strong convergence of the last iterate in the RKHS norm.

Findings

01. Best mean square convergence rate achieved to date.

02. First proof of strong convergence of the last iterate, with polynomially decaying step sizes, in the RKHS norm.

03. Sharp error estimates leveraging RKHS structure.

Abstract

In this paper, we study the online learning algorithm without explicit regularization terms. This algorithm is essentially a stochastic gradient descent scheme in a reproducing kernel Hilbert space (RKHS). The polynomially decaying step size in each iteration can play the role of regularization, ensuring the generalization ability of the online learning algorithm. We develop a novel capacity-dependent analysis of the performance of the last iterate of the online learning algorithm. The contribution of this paper is two-fold. First, our analysis yields a convergence rate in the standard mean square distance that is the best obtained so far. Second, we establish, for the first time, the strong convergence of the last iterate with polynomially decaying step sizes in the RKHS norm. We demonstrate that the theoretical analysis established in this paper fully exploits the fine structure of the…
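The scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss, kernel, or constants, so everything specific here is an assumption made for illustration: the least-squares loss, the Gaussian kernel and its width, the toy target sin(2πx), the values of `eta1` and `theta`, and all function names. The update used is f_{t+1} = f_t - η_t (f_t(x_t) - y_t) K(x_t, ·) with η_t = η_1 t^{-θ}, and no regularization term appears.

```python
import math
import random


def gaussian_kernel(x, z, width=0.2):
    """Gaussian kernel on the real line (an illustrative choice, not from the paper)."""
    return math.exp(-((x - z) ** 2) / (2.0 * width ** 2))


def online_kernel_sgd(samples, eta1=0.5, theta=0.5, kernel=gaussian_kernel):
    """Unregularized online least squares in an RKHS (assumed loss).

    Update: f_{t+1} = f_t - eta_t * (f_t(x_t) - y_t) * K(x_t, .),
    with eta_t = eta1 * t**(-theta) and f_1 = 0.
    Returns the centers and coefficients of the last iterate.
    """
    centers, coeffs = [], []
    for t, (x, y) in enumerate(samples, start=1):
        # Evaluate the current iterate f_t at the new input x_t.
        pred = sum(c * kernel(z, x) for z, c in zip(centers, coeffs))
        eta = eta1 * t ** (-theta)  # polynomially decaying step size
        # The gradient step adds one new kernel function centered at x_t.
        centers.append(x)
        coeffs.append(-eta * (pred - y))
    return centers, coeffs


def evaluate(centers, coeffs, x, kernel=gaussian_kernel):
    """Evaluate the kernel expansion sum_i c_i K(z_i, x)."""
    return sum(c * kernel(z, x) for z, c in zip(centers, coeffs))


def make_samples(n, seed=0):
    """Toy data (assumption): y = sin(2*pi*x) + Gaussian noise, x uniform on [0, 1]."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = rng.random()
        out.append((x, math.sin(2 * math.pi * x) + 0.1 * rng.gauss(0, 1)))
    return out


def mean_square_error(centers, coeffs, grid_size=50):
    """Approximate mean square distance to the toy target on a uniform grid."""
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    return sum(
        (evaluate(centers, coeffs, x) - math.sin(2 * math.pi * x)) ** 2 for x in grid
    ) / grid_size


if __name__ == "__main__":
    centers, coeffs = online_kernel_sgd(make_samples(400))
    print(mean_square_error(centers, coeffs))
```

Only the last iterate is returned, with no averaging over earlier iterates, which matches the object the abstract analyzes. The decaying step size is the only mechanism limiting the influence of later noisy samples. The error measured here corresponds to the mean square distance; measuring the RKHS norm would additionally require the target's representation in the RKHS, which this sketch does not attempt.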

Taxonomy

Topics: Advanced Adaptive Filtering Techniques · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques

Methods: Affine Coupling · Normalizing Flows