Uniform Generalization, Concentration, and Adaptive Learning

Ibrahim Alabdulmohsin

arXiv:1608.06072 · cs.LG · October 4, 2016 · 1 citation

TL;DR

This paper studies uniform generalization in learning algorithms, connects it to concentration inequalities, and gives theoretical bounds together with tightness results.

Contribution

It proves that uniform generalization in expectation implies concentration, introduces a chain rule for uniform generalization risk, and derives a tight large deviation bound.

Findings

01. Uniform generalization in expectation implies concentration.

02. A chain rule for the uniform generalization risk is established.

03. A tight large deviation bound is derived.

Abstract

One fundamental goal in any learning algorithm is to mitigate its risk of overfitting. Mathematically, this requires that the learning algorithm enjoys a small generalization risk, which is defined either in expectation or in probability. Both types of generalization are commonly used in the literature. For instance, generalization in expectation has been used to analyze algorithms such as ridge regression and SGD, whereas generalization in probability is used in VC theory, among others. Recently, a third notion of generalization has been studied, called uniform generalization, which requires that the generalization risk vanish uniformly in expectation across all bounded parametric losses. It has been shown that uniform generalization is, in fact, equivalent to an information-theoretic stability constraint, and that it recovers classical results in learning theory. It is…
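The three notions of generalization named in the abstract can be written side by side. The following is a sketch only, in notation assumed here rather than taken from the paper: $S = (Z_1, \dots, Z_m)$ is an i.i.d. training sample, $H$ is the hypothesis the algorithm returns on $S$, $\ell(z; h) \in [0, 1]$ is a bounded parametric loss, $\hat{R}_S$ is the empirical risk, and $R$ is the true risk.

```latex
% Assumed notation (not from the source page):
%   \hat{R}_S(H) = \frac{1}{m} \sum_{i=1}^{m} \ell(Z_i; H)   (empirical risk)
%   R(H)         = \mathbb{E}_{Z}\,[\ell(Z; H)]              (true risk)
\begin{align*}
\text{in expectation:} \quad
  & \Big| \mathbb{E}_{S,H}\big[ \hat{R}_S(H) - R(H) \big] \Big|
    \;\to\; 0 \quad \text{as } m \to \infty, \\
\text{in probability:} \quad
  & \Pr_{S,H}\Big( \big| \hat{R}_S(H) - R(H) \big| > t \Big)
    \;\to\; 0 \quad \text{as } m \to \infty, \ \text{for every } t > 0, \\
\text{uniform:} \quad
  & \sup_{\ell(\cdot\,;\,\cdot)\,\in\,[0,1]}
    \Big| \mathbb{E}_{S,H}\big[ \hat{R}_S(H) - R(H) \big] \Big|
    \;\to\; 0 \quad \text{as } m \to \infty.
\end{align*}
```

Under this reading, the first condition is stated for one fixed loss, while the uniform condition takes the supremum over all bounded parametric losses at once. The listed finding that uniform generalization in expectation implies concentration then relates the third line to a statement of the second kind.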

Taxonomy

Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Distributed Sensor Networks and Detection Algorithms

Methods: Stochastic Gradient Descent