Generalization Error Bounds for Optimization Algorithms via Stability

Qi Meng, Yue Wang, Wei Chen, Taifeng Wang, Zhi-Ming Ma, and Tie-Yan Liu

arXiv:1609.08397 · stat.ML · September 28, 2016

Open Access

TL;DR

This paper derives bounds on the generalization error of optimization algorithms like GD, SGD, and SVRG using stability, showing how their convergence impacts test performance in convex and non-convex settings.

Contribution

It provides the first theoretical analysis linking optimization convergence rates with generalization error bounds via stability for both convex and non-convex problems.

Findings

01. Generalization error decreases with training for all algorithms studied.

02. SVRG exhibits better generalization ability than GD and SGD.

03. Experimental results confirm the theoretical bounds and insights.

Abstract

Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM) and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduced gradient (SVRG). Conventional analysis of these optimization algorithms focuses on their convergence rates during training; however, the machine learning community may care more about the generalization performance of the learned model on unseen test data. In this paper, we investigate this issue using stability as a tool. In particular, we decompose the generalization error for R-ERM and derive its upper bound for both convex and non-convex cases. In convex cases, we prove that the generalization error can be bounded by the convergence rate of the optimization algorithm and the stability of the R-ERM process, both in expectation (in…
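The three optimizers named in the abstract can be compared on a toy R-ERM problem. The following is a minimal sketch, not the paper's experimental setup: it assumes an L2-regularized least-squares objective, synthetic Gaussian data, and hand-picked step sizes, and every function name in it is hypothetical. It reports the gap between test risk and training risk as a rough empirical proxy for generalization error; it does not compute the bounds derived in the paper.

```python
import numpy as np

# Assumed toy problem: F(w) = (1/n) * sum_i 0.5*(x_i.w - y_i)^2 + (lam/2)*||w||^2

def make_data(n, d, w_true, rng, noise=0.1):
    """Draw n synthetic samples y = x.w_true + Gaussian noise."""
    X = rng.normal(size=(n, d))
    y = X @ w_true + noise * rng.normal(size=n)
    return X, y

def risk(w, X, y):
    """Unregularized average squared loss on (X, y)."""
    return 0.5 * np.mean((X @ w - y) ** 2)

def objective(w, X, y, lam):
    """Regularized empirical risk (the R-ERM objective of this sketch)."""
    return risk(w, X, y) + 0.5 * lam * (w @ w)

def full_grad(w, X, y, lam):
    return X.T @ (X @ w - y) / len(y) + lam * w

def sample_grad(w, X, y, lam, i):
    """Gradient of the regularized loss on the single sample i."""
    return (X[i] @ w - y[i]) * X[i] + lam * w

def gd(X, y, lam, lr, epochs):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * full_grad(w, X, y, lam)
    return w

def sgd(X, y, lam, lr, epochs, rng):
    n = len(y)
    w = np.zeros(X.shape[1])
    for t in range(epochs * n):
        i = rng.integers(n)
        step = lr / (1.0 + t / n)  # step size decays once per epoch-equivalent
        w -= step * sample_grad(w, X, y, lam, i)
    return w

def svrg(X, y, lam, lr, epochs, rng):
    n = len(y)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w_snap = w.copy()                    # snapshot point
        mu = full_grad(w_snap, X, y, lam)    # full gradient at the snapshot
        for _ in range(n):
            i = rng.integers(n)
            # variance-reduced stochastic gradient
            g = (sample_grad(w, X, y, lam, i)
                 - sample_grad(w_snap, X, y, lam, i) + mu)
            w -= lr * g
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, lam = 5, 0.1
    w_true = rng.normal(size=d)
    X_tr, y_tr = make_data(200, d, w_true, rng)
    X_te, y_te = make_data(5000, d, w_true, rng)
    results = {
        "GD": gd(X_tr, y_tr, lam, lr=0.1, epochs=50),
        "SGD": sgd(X_tr, y_tr, lam, lr=0.01, epochs=20, rng=rng),
        "SVRG": svrg(X_tr, y_tr, lam, lr=0.01, epochs=20, rng=rng),
    }
    for name, w in results.items():
        gap = risk(w, X_te, y_te) - risk(w, X_tr, y_tr)
        print(f"{name}: objective={objective(w, X_tr, y_tr, lam):.4f}, gap={gap:.4f}")
```

Varying `epochs` in this sketch gives a simple way to watch how the train/test gap moves as each optimizer converges, which is the relationship the abstract describes.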

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Stochastic Gradient Descent