Information-Theoretic Generalization Bounds for Deep Neural Networks

Haiyun He; Ziv Goldfeld

arXiv:2404.03176 · cs.LG · May 9, 2025 · 1 citation

Open Access

TL;DR

This paper develops information-theoretic bounds on the generalization error of deep neural networks, revealing how depth and layer-wise information contraction influence generalization performance.

Contribution

The paper introduces hierarchical bounds based on the KL divergence and the 1-Wasserstein distance, and analyzes information contraction across layers for regularized DNNs, clarifying how depth affects generalization.

Findings

01. Some layer can serve as a generalization funnel, attaining the minimal 1-Wasserstein distance between the train and test distributions of the internal representations.

02. Generalization bounds tighten as the network depth increases under certain conditions.

03. Deeper and narrower networks may generalize better in specific settings.

Abstract

Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications. This work aims to capture the effect and benefits of depth for supervised learning via information-theoretic generalization bounds. We first derive two hierarchical bounds on the generalization error in terms of the Kullback-Leibler (KL) divergence or the 1-Wasserstein distance between the train and test distributions of the network internal representations. The KL divergence bound shrinks as the layer index increases, while the Wasserstein bound implies the existence of a layer that serves as a generalization funnel, which attains a minimal 1-Wasserstein distance. Analytic expressions for both bounds are derived under the setting of binary Gaussian classification with linear DNNs. To quantify the contraction of the relevant information measures when moving deeper into the network,…
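As a toy illustration of why a KL-type quantity cannot grow when moving deeper into a network, the sketch below pushes two Gaussians through a stack of random linear layers and computes the KL divergence between the resulting distributions after each layer. This is not the paper's bound or its train/test setting: the two input Gaussians, the layer widths, the random weights, and the helper names `gaussian_kl_same_cov` and `layerwise_kl` are all assumptions made for illustration. It only demonstrates the data-processing inequality in the linear Gaussian case.

```python
import numpy as np

def gaussian_kl_same_cov(mean_p, mean_q, cov):
    """KL(N(mean_p, cov) || N(mean_q, cov)) for a shared, invertible covariance."""
    diff = mean_p - mean_q
    return 0.5 * float(diff @ np.linalg.solve(cov, diff))

def layerwise_kl(mean_p, mean_q, weights):
    """KL between the two pushed-forward Gaussians at the input and after each layer.

    The inputs are N(mean_p, I) and N(mean_q, I). After the composed linear
    map A, the two laws are N(A mean_p, A A^T) and N(A mean_q, A A^T).
    """
    d = mean_p.shape[0]
    A = np.eye(d)
    kls = [gaussian_kl_same_cov(mean_p, mean_q, np.eye(d))]
    for W in weights:
        A = W @ A  # compose the next layer with everything before it
        kls.append(gaussian_kl_same_cov(A @ mean_p, A @ mean_q, A @ A.T))
    return kls

# Hypothetical toy setup: widths are non-increasing so that A A^T stays invertible.
rng = np.random.default_rng(0)
widths = [8, 6, 4, 2]
weights = [rng.standard_normal((widths[i + 1], widths[i]))
           for i in range(len(widths) - 1)]
mean_p = rng.standard_normal(widths[0])
mean_q = rng.standard_normal(widths[0])

kls = layerwise_kl(mean_p, mean_q, weights)
for layer, kl in enumerate(kls):
    print(f"layer {layer}: KL = {kl:.4f}")
```

In this toy setting the printed values are non-increasing in the layer index, because each additional linear map can only discard information that distinguishes the two distributions.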

Taxonomy

Topics: Neural Networks and Applications

Methods: Dropout · DropConnect