Stability and Generalization of Learning Algorithms that Converge to Global Optima

Zachary Charles; Dimitris Papailiopoulos

arXiv:1710.08402 · stat.ML · October 25, 2017 · 19 cites

TL;DR

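This paper derives new generalization bounds for learning algorithms that converge to global minima. The bounds follow from stability results that depend only on the algorithm's convergence and on the geometry of the loss around its minimizers, and they apply to several optimization methods and to some neural networks with linear activations.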

Contribution

It introduces black-box stability results for nonconvex loss functions satisfying the Polyak-Łojasiewicz (PL) and quadratic growth (QG) conditions. These results apply to SGD, GD, RCD, and SVRG, and the conditions arise for some neural networks with linear activations.

Findings

01. The stability bounds match or improve on state-of-the-art generalization bounds.

02. The PL and QG conditions arise for some neural networks with linear activations.

03. In certain neural network settings, SGD can be stable while GD is not.

Abstract

We establish novel generalization bounds for learning algorithms that converge to global minima. We do so by deriving black-box stability results that only depend on the convergence of a learning algorithm and the geometry around the minimizers of the loss function. The results are shown for nonconvex loss functions satisfying the Polyak-Łojasiewicz (PL) and the quadratic growth (QG) conditions. We further show that these conditions arise for some neural networks with linear activations. We use our black-box results to establish the stability of optimization algorithms such as stochastic gradient descent (SGD), gradient descent (GD), randomized coordinate descent (RCD), and the stochastic variance reduced gradient method (SVRG), in both the PL and the strongly convex setting. Our results match or improve state-of-the-art generalization bounds and can easily be extended to similar…
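To make the PL condition concrete, here is a minimal sketch that is not taken from the paper. It assumes the commonly stated form of the condition, 0.5 * |f'(x)|^2 >= mu * (f(x) - f*), and uses a hypothetical one-dimensional toy loss f(x) = x^2 + 3 sin^2(x), which is nonconvex but has its only stationary point at the global minimum x = 0. The constants mu = 1/32 and L = 8 are assumptions of this sketch: the first is only checked numerically on a grid, and the second comes from bounding the second derivative of the toy loss.

```python
import math

MU = 1.0 / 32.0   # assumed PL constant for the toy loss (checked on a grid below)
L_SMOOTH = 8.0    # smoothness constant: |f''(x)| = |2 + 6 cos(2x)| <= 8


def f(x):
    """Toy nonconvex loss with global minimum f* = 0 at x = 0."""
    return x * x + 3.0 * math.sin(x) ** 2


def grad(x):
    """Derivative of f: 2x + 6 sin(x) cos(x) = 2x + 3 sin(2x)."""
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def pl_holds(x, mu=MU, f_star=0.0, tol=1e-12):
    """Check the PL inequality 0.5 * f'(x)^2 >= mu * (f(x) - f*) at one point."""
    return 0.5 * grad(x) ** 2 >= mu * (f(x) - f_star) - tol


def gradient_descent(x0, steps, step_size=1.0 / L_SMOOTH):
    """Plain gradient descent; returns the list of all iterates."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - step_size * grad(xs[-1]))
    return xs


# Numerically check the PL inequality on a grid over [-10, 10].
grid = [-10.0 + 0.01 * i for i in range(2001)]
pl_on_grid = all(pl_holds(x) for x in grid)

# Under PL and L-smoothness, GD with step 1/L satisfies
# f(x_k) - f* <= (1 - mu/L)^k * (f(x_0) - f*).
iterates = gradient_descent(x0=3.0, steps=200)
gaps = [f(x) for x in iterates]
rate = 1.0 - MU / L_SMOOTH
bound_ok = all(gap <= rate ** k * gaps[0] + 1e-12 for k, gap in enumerate(gaps))

print("PL holds on grid:", pl_on_grid)
print("final suboptimality:", gaps[-1])
print("linear-rate bound respected:", bound_ok)
```

The sketch only illustrates the kind of loss geometry the abstract refers to: convergence to a global minimum without convexity. It says nothing about the stability or generalization bounds themselves, which depend on the analysis in the paper.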

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data

Methods: Stochastic Gradient Descent