Generalization Bounds for Stochastic Gradient Descent via Localized $\varepsilon$-Covers

Sejun Park; Umut Şimşekli; Murat A. Erdogdu

arXiv:2209.08951 · stat.ML · September 20, 2022 · 1 citation

TL;DR

This paper introduces a localized covering technique for analyzing SGD trajectories, leading to dimension-independent generalization bounds for certain non-convex, non-smooth functions, and applies these bounds to multiple machine learning models.

Contribution

The paper develops a new covering method localized to SGD trajectories. The resulting complexity measure can be dimension-independent, and it yields improved generalization bounds for several models.

Findings

01

Generalization error bound of O(√(log n · log(nP) / n)) for certain non-convex, non-smooth functions

02

Dimension-independent bounds that do not require early stopping or decaying step size

03

Improved rates for multi-index linear models, SVMs, and K-means clustering
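
To give a feel for how the rate in the first finding scales, the following minimal sketch evaluates $\sqrt{\log n \, \log(nP) / n}$ for a few sample sizes. The function name `rate` and the chosen values of `n` and `P` are illustrative assumptions, and the constants hidden by the $O(\cdot)$ notation are omitted, so only the trend is meaningful.

```python
import math


def rate(n: int, P: int) -> float:
    """Evaluate sqrt(log(n) * log(n * P) / n), with O(.) constants omitted.

    n: number of data samples (n >= 2 so that log(n) > 0).
    P: number of pieces of the piecewise objective (P >= 1).
    """
    return math.sqrt(math.log(n) * math.log(n * P) / n)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the trend in n and P.
    for n in (10**3, 10**4, 10**5):
        for P in (1, 10, 1000):
            print(f"n={n:>6}  P={P:>4}  rate={rate(n, P):.4f}")
```

The expression contains no dimension term, and the number of pieces $P$ enters only through a logarithm, so increasing $P$ by orders of magnitude changes the value slowly.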

Abstract

In this paper, we propose a new covering technique localized for the trajectories of SGD. This localization provides an algorithm-specific complexity measured by the covering number, which can have dimension-independent cardinality, in contrast to standard uniform covering arguments that result in exponential dimension dependency. Based on this localized construction, we show that if the objective function is a finite perturbation of a piecewise strongly convex and smooth function with $P$ pieces, i.e., non-convex and non-smooth in general, the generalization error can be upper bounded by $O\left(\sqrt{(\log n \log(nP))/n}\right)$, where $n$ is the number of data samples. In particular, this rate is independent of dimension and does not require early stopping or a decaying step size. Finally, we employ these results in various contexts and derive generalization bounds for multi-index linear models,…
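
The contrast between a trajectory-localized cover and a uniform cover can be illustrated with a toy experiment. The sketch below is not the paper's construction: it assumes a simple strongly convex objective $f(w) = \tfrac{1}{2}\|w\|^2$ with Gaussian gradient noise, runs constant-step SGD, builds a greedy $\varepsilon$-cover of the visited iterates, and compares its size with the number of cells in a uniform grid cover of the cube $[-1, 1]^d$. All function names and parameter values are hypothetical.

```python
import math
import random


def sgd_trajectory(dim, steps, lr=0.1, noise=0.05, seed=0):
    """Constant-step SGD on the toy objective f(w) = 0.5 * ||w||^2.

    The stochastic gradient is w plus Gaussian noise (an assumption made
    for illustration only). Returns the list of all iterates.
    """
    rng = random.Random(seed)
    w = [1.0] * dim
    traj = [list(w)]
    for _ in range(steps):
        w = [wi - lr * (wi + rng.gauss(0.0, noise)) for wi in w]
        traj.append(list(w))
    return traj


def greedy_cover(points, eps):
    """Greedy eps-cover: a point becomes a center if no center is within eps."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > eps for c in centers):
            centers.append(p)
    return centers


def uniform_grid_cover_size(dim, radius, eps):
    """Cells in a grid over [-radius, radius]^dim whose centers form an eps-cover.

    A cell of side s has half-diagonal s * sqrt(dim) / 2, so s = 2 * eps / sqrt(dim).
    """
    side = 2 * eps / math.sqrt(dim)
    return math.ceil(2 * radius / side) ** dim


if __name__ == "__main__":
    eps = 0.5
    for dim in (2, 10, 50):
        traj = sgd_trajectory(dim=dim, steps=200)
        local = len(greedy_cover(traj, eps))
        uniform = uniform_grid_cover_size(dim, radius=1.0, eps=eps)
        print(f"dim={dim:>2}  trajectory cover={local}  uniform grid cover={uniform}")
```

The greedy cover can never contain more centers than there are iterates, whereas the grid count grows exponentially with the dimension. This only mirrors the intuition behind the abstract's claim; the paper's localized cover and its cardinality bound are defined and proved differently.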

Videos

Generalization Bounds for Stochastic Gradient Descent via Localized $\varepsilon$-Covers · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification · Machine Learning and Algorithms

Methods: Stochastic Gradient Descent · Early Stopping