Information bottleneck theory of high-dimensional regression: relevancy, efficiency and optimality
Vudtiwat Ngampruetikorn, David J. Schwab

TL;DR
This paper applies information bottleneck theory to high-dimensional linear regression, analyzing the trade-offs between residual and relevant information, and revealing fundamental limits and phenomena like double descent in learning.
Contribution
It introduces an information-theoretic framework for understanding overfitting and optimality in high-dimensional regression, including new bounds and insights into algorithm efficiency.
Findings
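Optimal algorithms minimize residual information (bits that encode training noise) while maximizing relevant information (bits that are predictive of the generative model).
The information efficiency of randomized ridge regression is characterized relative to that of optimal algorithms.
The analysis reveals information-theoretic analogs of double and multiple descent phenomena.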
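The trade-off between relevant and residual information can be made concrete in a fully Gaussian toy model, where both quantities have closed forms. The sketch below is an illustration under assumptions that do not come from this page: weights W drawn from an isotropic Gaussian prior, labels Y = XW + noise, and a "randomized" ridge estimator T obtained by adding isotropic Gaussian noise of scale tau to the ridge solution. The function name gaussian_info_ridge and all of its parameters are hypothetical, and the paper's own definitions may differ in detail. Because T depends on W only through Y, the total information I(T;Y) splits into a relevant part I(T;W) and a residual part I(T;Y|W), and the code computes each from log-determinants of Gaussian covariances.

```python
import numpy as np


def gaussian_info_ridge(X, lam, sigma_w=1.0, sigma_eps=1.0, tau=0.1):
    """Relevant and residual information (in bits) of a noisy ridge estimator.

    Assumed toy model (not taken from the paper):
      W ~ N(0, sigma_w^2 I),  Y = X W + eps,  eps ~ N(0, sigma_eps^2 I),
      T = A Y + xi,  A = (X^T X + lam I)^{-1} X^T,  xi ~ N(0, tau^2 I).
    All quantities are conditioned on the fixed design matrix X.
    """
    n, p = X.shape
    # Ridge map from labels to weights (p x n); needs lam > 0 when p > n.
    A = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

    # Covariance of T with W and the label noise both marginalized out.
    cov_y = sigma_w**2 * X @ X.T + sigma_eps**2 * np.eye(n)
    cov_t = A @ cov_y @ A.T + tau**2 * np.eye(p)
    # Covariance of T given W: only label noise and estimator noise remain.
    cov_t_given_w = sigma_eps**2 * A @ A.T + tau**2 * np.eye(p)
    # Covariance of T given Y is tau^2 I, so its log-determinant is p * log(tau^2).
    logdet_t_given_y = p * np.log(tau**2)

    def logdet(m):
        return np.linalg.slogdet(m)[1]

    nats_to_bits = 1.0 / np.log(2.0)
    relevant = 0.5 * (logdet(cov_t) - logdet(cov_t_given_w)) * nats_to_bits
    residual = 0.5 * (logdet(cov_t_given_w) - logdet_t_given_y) * nats_to_bits
    return relevant, residual


# Usage: sweep the ridge penalty on an overparameterized design (p > n).
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((20, 50))
for lam in (0.01, 1.0, 100.0):
    rel, res = gaussian_info_ridge(X_demo, lam)
    print(f"lambda={lam:g}  relevant={rel:.2f} bits  residual={res:.2f} bits")
```

In this toy model, a larger ridge penalty shrinks the estimator's response to the labels, which lowers the residual bits; how the relevant bits change alongside them is the trade-off the findings describe.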
Abstract
Avoiding overfitting is a central challenge in machine learning, yet many large neural networks readily achieve zero training loss. This puzzling contradiction necessitates new approaches to the study of overfitting. Here we quantify overfitting via residual information, defined as the bits in fitted models that encode noise in training data. Information efficient learning algorithms minimize residual information while maximizing the relevant bits, which are predictive of the unknown generative models. We solve this optimization to obtain the information content of optimal algorithms for a linear regression problem and compare it to that of randomized ridge regression. Our results demonstrate the fundamental trade-off between residual and relevant information and characterize the relative information efficiency of randomized regression with respect to optimal algorithms. Finally, using…
Taxonomy
Topics: Statistical Mechanics and Entropy · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
Methods: Linear Regression
