Fantastic Generalization Measures are Nowhere to be Found

Michael Gastpar; Ido Nachum; Jonathan Shafer; Thomas Weinberger

arXiv:2309.13658 · cs.LG · November 29, 2023 · 1 citation

TL;DR

This paper demonstrates that uniformly tight generalization bounds are impossible in overparameterized neural networks, providing formal proofs and highlighting a trade-off between algorithm performance and bound tightness.

Contribution

It proves mathematically that no uniformly tight generalization bounds exist for overparameterized models, and shows a trade-off between algorithm accuracy and bound tightness.

Findings

01. No uniformly tight bounds exist for overparameterized models.

02. A trade-off exists between model accuracy and the tightness of generalization bounds.

03. Formal proofs establish limitations of current generalization bounds.

Abstract

We study the notion of a generalization bound being uniformly tight, meaning that the difference between the bound and the population loss is small for all learning algorithms and all population distributions. Numerous generalization bounds have been proposed in the literature as potential explanations for the ability of neural networks to generalize in the overparameterized setting. However, in their paper "Fantastic Generalization Measures and Where to Find Them," Jiang et al. (2020) examine more than a dozen generalization bounds, and show empirically that none of them are uniformly tight. This raises the question of whether uniformly tight generalization bounds are at all possible in the overparameterized setting. We consider two types of generalization bounds: (1) bounds that may depend on the training set and the learned hypothesis (e.g., margin bounds). We prove mathematically…
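The abstract's notion of uniform tightness can be written out as a quantified statement. The following is only a sketch of one plausible formalization for bounds of type (1): the symbols $C$, $A$, $\mathcal{D}$, $L_{\mathcal{D}}$, $n$, $\varepsilon$, and $\delta$ are notation assumed here for illustration and are not taken from the paper, whose precise definition may differ (for instance, in how "small" is quantified).

```latex
% Assumed notation (not from the paper):
%   \mathcal{D}       : population distribution over labeled examples
%   S \sim \mathcal{D}^n : training set of n i.i.d. examples
%   A                 : learning algorithm, mapping S to a hypothesis A(S)
%   L_{\mathcal{D}}(h): population loss of hypothesis h
%   C(S, h)           : generalization bound that may depend on the
%                       training set and the learned hypothesis
% C is "uniformly tight" (with tolerance \varepsilon, confidence 1-\delta) if
\[
  \forall A,\ \forall \mathcal{D}:\quad
  \Pr_{S \sim \mathcal{D}^n}\Big[\,
    \big|\, C\big(S, A(S)\big) - L_{\mathcal{D}}\big(A(S)\big) \,\big|
    \le \varepsilon
  \,\Big] \;\ge\; 1 - \delta .
\]
```

The key feature is the order of quantifiers: the same bound $C$ must be close to the population loss for every algorithm and every distribution, rather than only for a favorable pairing of the two.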

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

I think the main message of the paper probably is the most interesting part: We need to develop an understanding of the formal assumptions on the algorithms and distributions under which a generalization bound can be tight. Otherwise, we can develop lower bounds as shown in this paper.

Weaknesses

- The main issue I found in the paper is that in many places the discussions are not precise. As an example, the claim about cross-validation can be misleading: the authors claim that cross-validation approaches do not lead to algorithm design principles. This is not completely correct. For instance, the well-known one-inclusion graph algorithm [Haussler et al 1988] is based on a cross-validation analysis. [Haussler et al 1988] Haussler, David, Nick Littlestone, and Manfred K. Warmuth.

Reviewer 02 · Rating 8 (accept, good paper) · Confidence 3

Strengths

- This work, especially the introduction, is utterly well-written and organised, it has been a pleasure to read it.
- The proposed results have the potential to be impacting for the whole generalisation field, as they suggest to give up the current shape of state-of-the-art generalisation bounds to direct future works on bounds focusing on the role of the data-distribution.

Weaknesses

- I have no problems with the results (although I did not read the proofs carefully). However, several paragraphs in this paper not only provide an unfair analysis of the existing literature but also contain wrong claims concerning existing works. I strongly believe those paragraphs have to be rewritten before acceptance; see the Questions section below for details.

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

1. Understanding generalization bounds at a fundamental level is an extremely important and popular topic, and this paper makes a nontrivial contribution in this direction.
2. All the formal results and proofs are sound and reasonably well written.
3. Some of the **proofs are nontrivial**, especially the **proof of Theorem 4**, which is quite impressive and relies on an ingenious splitting into many different situations depending on the rank of the design matrix and whether or not the first

Weaknesses

The paper makes sensational claims that lack nuance: in the main paper and in extensive discussions in the appendix, the paper uses the argument derived from Theorems 1 to 3 to explain that most existing generalization analyses are invalid or at least "lacking" since they "do not take the distribution into account". As far as I am concerned, there are several serious problems with this claim which can be summarized as follows:
1. The general argument is partially valid, but not as strongly as

Videos

Fantastic Generalization Measures are Nowhere to be Found · SlidesLive

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning

Methods: None