Information-Theoretic Generalization Bounds for Deep Neural Networks
Haiyun He, Ziv Goldfeld

TL;DR
This paper develops information-theoretic bounds on the generalization error of deep neural networks, revealing how depth and layer-wise information contraction influence generalization performance.
Contribution
It introduces hierarchical generalization bounds based on the KL divergence and the 1-Wasserstein distance, and analyzes how the relevant information measures contract across the layers of regularized DNNs, clarifying the role of depth in generalization.
Findings
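The Wasserstein bound implies that some layer acts as a generalization funnel, attaining the minimal 1-Wasserstein distance between the train and test distributions of the internal representations.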
Generalization bounds tighten as the network depth increases under certain conditions.
Deeper and narrower networks may generalize better in specific settings.
Abstract
Deep neural networks (DNNs) exhibit an exceptional capacity for generalization in practical applications. This work aims to capture the effect and benefits of depth for supervised learning via information-theoretic generalization bounds. We first derive two hierarchical bounds on the generalization error in terms of the Kullback-Leibler (KL) divergence or the 1-Wasserstein distance between the train and test distributions of the network internal representations. The KL divergence bound shrinks as the layer index increases, while the Wasserstein bound implies the existence of a layer that serves as a generalization funnel, which attains a minimal 1-Wasserstein distance. Analytic expressions for both bounds are derived under the setting of binary Gaussian classification with linear DNNs. To quantify the contraction of the relevant information measures when moving deeper into the network,…
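The claim that the KL divergence bound shrinks with the layer index can be illustrated numerically. The sketch below is not the paper's bound. It only checks the underlying data-processing property in a toy setting: two Gaussians stand in for the train and test representation distributions, and each is pushed through the same random linear layers. The function names, layer widths, and random parameters are all illustrative assumptions.

```python
import numpy as np


def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) ) for full-rank covariances."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (
        np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - k + logdet1 - logdet0
    )


def layerwise_kl(mu0, cov0, mu1, cov1, weights):
    """KL between the two Gaussians at the input and after each linear layer."""
    kls = [gaussian_kl(mu0, cov0, mu1, cov1)]
    for W in weights:
        # A linear map x -> W x sends N(mu, cov) to N(W mu, W cov W^T).
        mu0, cov0 = W @ mu0, W @ cov0 @ W.T
        mu1, cov1 = W @ mu1, W @ cov1 @ W.T
        kls.append(gaussian_kl(mu0, cov0, mu1, cov1))
    return kls


def random_linear_gaussian_demo(seed=0, dims=(8, 6, 4, 2)):
    """Hypothetical toy setup: random Gaussians and a narrowing linear network."""
    rng = np.random.default_rng(seed)
    d = dims[0]
    mu0, mu1 = rng.normal(size=d), rng.normal(size=d)
    a0, a1 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    # Adding the identity keeps both covariances positive definite.
    cov0 = a0 @ a0.T + np.eye(d)
    cov1 = a1 @ a1.T + np.eye(d)
    # Random wide-to-narrow matrices have full row rank almost surely,
    # so the pushed-forward covariances stay invertible.
    weights = [rng.normal(size=(dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
    return layerwise_kl(mu0, cov0, mu1, cov1, weights)


if __name__ == "__main__":
    for layer, kl in enumerate(random_linear_gaussian_demo()):
        print(f"layer {layer}: KL = {kl:.4f}")
```

Because each layer is a deterministic map applied to both distributions, the data-processing inequality guarantees that the printed KL values are non-increasing in the layer index. The specific values depend on the random seed.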
Taxonomy
Topics: Neural Networks and Applications
Methods: Dropout · DropConnect
